1. Overview

Apache ShardingSphere is an open-source project that consists of a set of integrated tools for data processing. It provides functionality such as distributed database solutions, transactions, governance, and more.

In this tutorial, we’ll provide a quick overview of this ecosystem and a how-to-start guide.

2. What Is ShardingSphere?

Apache ShardingSphere, initially known as Sharding-JDBC, was created to tackle the problem of data sharding for Java applications. It has since grown into a suite of tools, including a proxy and a sidecar, that handles more than sharding.

When considering ShardingSphere, it’s important to know what advantages the project brings to our solutions. A few points we can highlight are the following:

  • Performance: given the project’s maturity, the driver is close to native JDBC in terms of efficiency and performance
  • Compatibility: the driver can connect to any database that implements the JDBC specification; besides that, the proxy can be used by any application that uses MySQL or PostgreSQL
  • Zero Business Intrusion: failover happens with no business impact
  • Low Ops and Maintenance Cost: fast learning curve, and it requires minimal intervention in the current stack
  • Security and Stability: adds extra capabilities while ensuring both
  • Elastic Extension: supports online expansion of computing and storage
  • Open Ecosystem: provides excellent flexibility

3. Use Cases

Now let’s go a bit further into the capabilities and briefly describe each of these use cases in the context of ShardingSphere.

3.1. Sharding

Sharding is the practice of splitting a database into smaller parts called shards, spread across multiple servers. ShardingSphere simplifies this process, allowing developers to distribute their data more effectively, improving their applications’ performance and scalability.

3.2. Distributed Transaction

A transaction may need to alter data on multiple databases in a distributed system. ShardingSphere provides a mechanism for managing these distributed transactions, ensuring data consistency across all databases involved.
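To make the idea concrete, the sketch below shows a bare-bones two-phase commit coordinator in plain Java. It only illustrates the general technique of keeping several databases consistent; it isn’t ShardingSphere’s implementation, and the Participant interface and InMemoryParticipant class are hypothetical names introduced for this example:

```java
import java.util.List;

class TwoPhaseCommitSketch {

    /** Hypothetical participant: one database taking part in the transaction. */
    public interface Participant {
        boolean prepare();  // phase 1: vote yes (true) or no (false)
        void commit();      // phase 2 when every participant voted yes
        void rollback();    // phase 2 when at least one participant voted no
    }

    /** In-memory stand-in for a database that records its final state. */
    public static class InMemoryParticipant implements Participant {
        private final boolean canPrepare;
        private String state = "INITIAL";

        public InMemoryParticipant(boolean canPrepare) {
            this.canPrepare = canPrepare;
        }

        @Override
        public boolean prepare() {
            state = canPrepare ? "PREPARED" : "FAILED";
            return canPrepare;
        }

        @Override
        public void commit() {
            state = "COMMITTED";
        }

        @Override
        public void rollback() {
            state = "ROLLED_BACK";
        }

        public String getState() {
            return state;
        }
    }

    /** Returns true if all participants committed, false if all were rolled back. */
    public static boolean execute(List<? extends Participant> participants) {
        boolean allPrepared = true;
        for (Participant participant : participants) {
            if (!participant.prepare()) {
                allPrepared = false;
                break; // one "no" vote is enough to abort
            }
        }
        for (Participant participant : participants) {
            if (allPrepared) {
                participant.commit();
            } else {
                participant.rollback();
            }
        }
        return allPrepared;
    }
}
```

The key property is that no participant commits unless every participant has successfully prepared. A production coordinator would also need to handle timeouts and crash recovery, which this sketch leaves out.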

3.3. Read/Write Splitting

This is a method of optimizing database access by directing read and write operations to different databases. ShardingSphere can automatically route read operations to replica databases and write operations to the primary database, thus balancing the load and increasing the system’s overall performance.
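The routing decision itself can be pictured with a few lines of plain Java. This is a simplified sketch rather than ShardingSphere’s actual router: the class name, the data source names, and the “starts with SELECT” check are all assumptions made for illustration:

```java
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

class ReadWriteRouterSketch {
    private final String primary;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    public ReadWriteRouterSketch(String primary, List<String> replicas) {
        this.primary = primary;
        this.replicas = List.copyOf(replicas);
    }

    /** Returns the name of the data source that should run the statement. */
    public String route(String sql) {
        // Naive classification: anything that is not a SELECT counts as a write.
        boolean isRead = sql.trim().toUpperCase(Locale.ROOT).startsWith("SELECT");
        if (!isRead || replicas.isEmpty()) {
            return primary;
        }
        // Round-robin over replicas; floorMod keeps the index valid after overflow.
        int index = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }
}
```

With a primary and two replicas, consecutive SELECT statements alternate between the replicas, while INSERT, UPDATE, and DELETE statements always go to the primary. A real router also has to account for transactions and replication lag, which are ignored here.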

3.4. DB Gateway

ShardingSphere acts as a database gateway, abstracting the complexities of multiple databases into a unified data interface for the application. This allows developers to interact with various databases as if they were a single entity, simplifying database management.

3.5. Traffic Governance

ShardingSphere allows for fine-grained control over the data traffic in the system. It provides features like data sharding, read/write splitting, and more, which can efficiently distribute the traffic load among various resources.

3.6. Data Migration

ShardingSphere provides support for data migration between shards or databases. It helps smoothly redistribute data when scaling the system by adding or removing database nodes.
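To see why scaling requires moving data, consider modulo-based sharding: changing the number of shards changes the target shard for many keys. The following sketch, with hypothetical class and method names, lists the keys that would have to be migrated. It only illustrates the problem; it doesn’t show how ShardingSphere performs the migration:

```java
import java.util.ArrayList;
import java.util.List;

class ReshardingSketch {

    /** Shard index for a key under simple modulo sharding. */
    public static int shardOf(long key, int shardCount) {
        return (int) Math.floorMod(key, (long) shardCount);
    }

    /** Keys whose shard changes when the shard count goes from oldCount to newCount. */
    public static List<Long> keysToMove(List<Long> keys, int oldCount, int newCount) {
        List<Long> moved = new ArrayList<>();
        for (long key : keys) {
            if (shardOf(key, oldCount) != shardOf(key, newCount)) {
                moved.add(key);
            }
        }
        return moved;
    }
}
```

For example, going from two to three shards with keys 0 through 5 leaves only keys 0 and 1 in place; keys 2, 3, 4, and 5 all map to a different shard and need to be moved.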

3.7. Encryption

ShardingSphere supports automatic data encryption before it’s saved to the database, providing an additional layer of security. This is particularly useful when dealing with sensitive data, such as user passwords or personally identifiable information.
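Conceptually, this is the same as encrypting a value in the application right before the INSERT and decrypting it after the SELECT, except that it happens transparently. The sketch below does this by hand with the JDK’s AES-GCM support. The class name, the choice of AES-GCM, and the Base64 storage format are assumptions for illustration and don’t reflect the algorithms ShardingSphere ships with:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class ColumnEncryptionSketch {
    private static final int IV_BYTES = 12;   // commonly used IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private final SecretKeySpec key;
    private final SecureRandom random = new SecureRandom();

    /** keyBytes must be 16, 24, or 32 bytes long (valid AES key sizes). */
    public ColumnEncryptionSketch(byte[] keyBytes) {
        this.key = new SecretKeySpec(keyBytes, "AES");
    }

    /** Encrypts a value and returns Base64(IV + ciphertext) for storage. */
    public String encrypt(String plain) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv); // fresh IV for every value
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] encrypted = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[iv.length + encrypted.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(encrypted, 0, out, iv.length, encrypted.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    /** Reverses encrypt(): splits off the IV, then decrypts the remainder. */
    public String decrypt(String stored) {
        try {
            byte[] in = Base64.getDecoder().decode(stored);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }
}
```

Because a random IV is generated for each call, encrypting the same value twice produces different stored strings, so equality searches on the encrypted column wouldn’t work with this particular scheme.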

3.8. Data Masking

Data masking is the process of hiding original data with modified content (characters or other data). ShardingSphere supports data masking, which is essential in non-production environments to ensure data privacy.
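A simple masking rule keeps a few leading and trailing characters and replaces everything in between. The helper below is a minimal sketch of that idea; the class and method names are hypothetical and aren’t part of ShardingSphere’s API:

```java
class MaskingSketch {

    /**
     * Keeps the first keepFirst and last keepLast characters and replaces
     * the rest with '*'. Values too short to keep anything are fully masked.
     */
    public static String mask(String value, int keepFirst, int keepLast) {
        if (keepFirst < 0 || keepLast < 0) {
            throw new IllegalArgumentException("keepFirst and keepLast must not be negative");
        }
        if (value == null) {
            return null;
        }
        int length = value.length();
        boolean tooShort = length <= keepFirst + keepLast;
        StringBuilder masked = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            boolean keep = !tooShort && (i < keepFirst || i >= length - keepLast);
            masked.append(keep ? value.charAt(i) : '*');
        }
        return masked.toString();
    }
}
```

For instance, masking the 16-digit string 4111111111111111 while keeping four characters on each side gives 4111********1111.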

3.9. Shadow

The Shadow feature in ShardingSphere allows you to test the impact of database updates, new SQL statements, and indexes without affecting the actual production environment. It’s done by routing certain traffic to a shadow database in parallel to the actual database.
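One way to picture this routing is a marker that flags a row as test traffic. The sketch below assumes a hypothetical boolean marker column and made-up data source names; how ShardingSphere actually identifies shadow traffic is defined by its shadow rule configuration, which isn’t covered here:

```java
import java.util.Map;

class ShadowRouterSketch {
    // Hypothetical data source names used only in this example.
    public static final String PRODUCTION = "ds_production";
    public static final String SHADOW = "ds_shadow";

    /** Routes a row to the shadow data source when its marker column is true. */
    public static String route(Map<String, Object> row, String markerColumn) {
        Object marker = row.get(markerColumn);
        return Boolean.TRUE.equals(marker) ? SHADOW : PRODUCTION;
    }
}
```

Rows without the marker, or with the marker set to false, stay on the production data source, so regular traffic is unaffected.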

3.10. Observability

ShardingSphere provides a mechanism for monitoring the health and performance of your sharded databases. It supports metrics for query tracing, latency tracking, traffic insights, and more, enabling developers to observe and diagnose issues in real-time.

4. Getting Started

To get familiar with the technology, let’s take the example of a Spring Boot application built with Maven.

As mentioned, there are multiple capabilities available in the project. To keep it simple, we’ll use only the sharding capability for now. This shows us how to configure and integrate the solution into our sample application.

4.1. Dependencies

The first step is to add the latest project dependency to our pom.xml:

<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>shardingsphere-jdbc-core</artifactId>
    <version>5.4.0</version>
</dependency>

That enables us to start configuring our data sources to use ShardingSphere.

4.2. Datasource Configuration

Now that we have the required dependencies, we must configure our data sources to use the ShardingSphere JDBC driver. Here we have to define the capabilities we want to use; in this example, that means sharding.

The data of our order table will be distributed across two MySQL instances based on the value of the order_id field modulo 2. To do so, we’ll create a sharding.yml file to hold the necessary configurations and place it under our resources folder:

dataSources:
  ds0:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:13306/ds0?serverTimezone=UTC&useSSL=false&useUnicode=true&characterEncoding=UTF-8
    username: test
    password: test
  ds1:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:13307/ds1?serverTimezone=UTC&useSSL=false&useUnicode=true&characterEncoding=UTF-8
    username: test
    password: test
rules:
  - !SHARDING
    tables:
      order:
        actualDataNodes: ds${0..1}.order
    defaultDatabaseStrategy:
      standard:
        shardingColumn: order_id
        shardingAlgorithmName: database_inline
    defaultTableStrategy:
      none:
    shardingAlgorithms:
      database_inline:
        type: INLINE
        props:
          algorithm-expression: ds${order_id % 2}
props:
  sql-show: false
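The inline expression ds${order_id % 2} is what decides where each row lands. As a rough illustration, the same calculation can be written in plain Java; the class and method names below are hypothetical, and the method only mirrors the arithmetic, not ShardingSphere’s expression engine:

```java
class InlineShardingSketch {

    /** Mirrors the expression ds${order_id % 2} from sharding.yml. */
    public static String dataSourceFor(long orderId) {
        // Even IDs map to ds0 and odd IDs to ds1 (assuming non-negative IDs).
        return "ds" + (orderId % 2);
    }
}
```

So an order with order_id 10 would be stored in ds0, and an order with order_id 7 in ds1.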

Next, we need to configure JPA to use these settings.

4.3. JPA Configuration

We need to connect our JPA/Spring Data setup to our ShardingSphere data sources. Let’s adjust our application.yml to use the configuration just mentioned:

spring:
  datasource:
    driver-class-name: org.apache.shardingsphere.driver.ShardingSphereDriver
    url: jdbc:shardingsphere:classpath:sharding.yml
  jpa:
    properties:
      hibernate:
        dialect: org.hibernate.dialect.MySQL8Dialect
        ...

For the rest, our application should follow default Spring Data JPA patterns by defining our entities and repositories. For instance, in our case, we can consider the following classes:

@Entity
@Table(name = "`order`")
public class Order {

    @Id
    @Column(name = "order_id")
    private Long orderId;

    @Column(name = "customer_id")
    private Long customerId;

    @Column(name = "total_price")
    private BigDecimal totalPrice;

    @Enumerated(EnumType.STRING)
    @Column(name = "order_status")
    private Status orderStatus;

    @Column(name = "order_date")
    private LocalDate orderDate;

    @Column(name = "delivery_address")
    private String deliveryAddress;

    // ... getters and setters
}

This is the mapping of our Order class, and next, we can also see its respective repository:

public interface OrderRepository extends JpaRepository<Order, Long> { }

As we can observe, this is standard Spring Data JPA. No other code change is necessary at this point.

5. Connecting the Dots

With minimal changes, ShardingSphere enabled us to apply a sharding strategy to our table. No significant changes were needed in the application code; only configuration changes at the persistence layer were required.

Because ShardingSphere integrates at the JDBC driver level, our application can leverage advanced capabilities with nearly no code changes.

6. Conclusion

In this article, we took our first steps with ShardingSphere. It’s a powerful tool for managing and manipulating databases in distributed systems, offering a wide range of advanced capabilities while abstracting away a good amount of their complexity.

As usual, all code samples used in this article are available over on GitHub.