Understanding and Managing Automatic RAID Resynchronization in Linux

1. Overview

Redundant Array of Independent Disks (RAID) is a technology that combines multiple physical drives into a single logical unit, called a RAID array, to improve data redundancy, performance, or both. RAID configurations are widely used on Linux systems.

Sometimes, however, a RAID array undergoes an automatic resynchronization, commonly known as a resync.

In this tutorial, we’ll explore the reasons behind this phenomenon and discuss various methods to manage and disable automatic resyncs.

2. Reasons for Automatic RAID Resynchronization

A RAID resync is a process that restores data integrity and redundancy after events such as disk failures or system crashes. It rebuilds and synchronizes data across the array’s member devices so that the redundant copies stay consistent.

Several events can trigger a resync operation:

disk replacement
system reboot
scheduled checks

In this section, we’ll explore these reasons in detail.

2.1. Disk Replacement

A common trigger for an automatic resync is the addition of a new disk to the RAID array. This typically happens when we replace a failed disk with a new one.

Let’s check the status of a RAID array:

$ sudo mdadm --detail /dev/md0

We run the command with sudo to get temporary superuser privileges. The mdadm command manages and monitors software RAID arrays in Linux, and its --detail option prints detailed information about the given array.

Let’s have a closer look at the output:

...
Number   Major   Minor   RaidDevice State
   0       8        2        0      active sync   /dev/sda2
   1       8       18        1      active sync   /dev/sdb2
   2       8       34        2      active resync /dev/sdc2

In this example, three partitions, sda2, sdb2, and sdc2, are active in the RAID array. The State column shows that sda2 and sdb2 are in sync, while sdc2 is still resyncing.
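On arrays with many members, it can be easier to filter the device table than to read it by eye. The following is a minimal sketch; the function name non_synced_members is hypothetical, and it assumes the table layout shown above, with the state words followed by the device path in the last column:

```shell
# Hypothetical helper: print member devices whose state is not "active sync".
# Reads `mdadm --detail` output on stdin; assumes the device-table layout
# shown above (four numeric columns, state words, then the /dev path last).
non_synced_members() {
    awk '$1 ~ /^[0-9]+$/ && $NF ~ /^\/dev\// {
        state = ""
        for (i = 5; i < NF; i++) state = state (state ? " " : "") $i
        if (state != "active sync") print $NF ": " state
    }'
}

# Usage (requires root and an existing array):
#   sudo mdadm --detail /dev/md0 | non_synced_members
```

For the sample output above, this would print only the sdc2 line.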

2.2. System Reboot

Following a system reboot, particularly after an unclean shutdown or crash, the RAID array may undergo an automatic resync to ensure data consistency.

Let’s monitor the resync status:

$ cat /proc/mdstat

We use the cat command to display the content of the /proc/mdstat file from the /proc pseudo-filesystem. This file holds status information about the active software RAID arrays.

Let’s look at the output:

md0 : active raid1 sdb2[1] sda2[0]
      1953511936 blocks super 1.2 [2/2] [UU]
      [==>..................]  resync = 12.5% (244618752/1953511936) finish=149.0min speed=76524K/sec
      bitmap: 15/15 pages [60KB], 65536KB chunk

Here, the output provides detailed information about the RAID array called md0. In particular, it indicates that the resync is 12.5% complete, with a projected finish time of 149.0 minutes, and it’s currently operating at a speed of 76,524 KB/sec.
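When several arrays are present, a one-line summary per array can be handy. Below is a rough sketch that extracts the percentage and the estimated time from the progress line; the function name resync_progress is hypothetical, and the parsing assumes the /proc/mdstat layout shown above:

```shell
# Hypothetical helper: summarize resync/recovery progress per array.
# Takes the path to an mdstat-formatted file (defaults to /proc/mdstat).
resync_progress() {
    awk '
        /^md[0-9]+ :/ { array = $1 }                      # remember the array name
        /(resync|recovery) *= *[0-9.]+%/ {
            pct = ""; eta = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/)       pct = $i             # e.g. 12.5%
                if ($i ~ /^finish=/) eta = substr($i, 8)  # e.g. 149.0min
            }
            print array ": " pct " done, about " eta " remaining"
        }
    ' "${1:-/proc/mdstat}"
}

# Usage:
#   resync_progress
```

Arrays that aren’t resyncing produce no output, so an empty result means nothing is in progress.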

2.3. Scheduled Checks

Aside from hardware changes, scheduled tasks like cron jobs and systemd timers can start array-wide checks, which read the whole array much like a resync does.

For example, the mdcheck tool performs periodic checks to ensure data integrity:

$ sudo systemctl start mdcheck_start

Starting this unit with systemctl kicks off a check of the arrays, which can take a long time, especially with large amounts of data.

3. Managing RAID Resync

In this section, we’ll understand how to manage RAID resync operations by using mdadm.

3.1. Forcing a Resync

Sometimes, we need to force the resync process manually.

To do so with the mdadm tool, we first stop the array:

$ sudo mdadm --stop /dev/md0
mdadm: stopped /dev/md0

The --stop option deactivates a running RAID array. Once stopped, the array is no longer accessible or actively used by the system, and its member devices are released. The array must not be mounted or otherwise in use when we stop it.

Now, let’s force a manual resync:

$ sudo mdadm --assemble --run --force --update=resync /dev/md0 /dev/sda2 /dev/sdb2 /dev/sdc2
mdadm: /dev/md0 has been started with 3 drives.

In this example, we reassembled the RAID array from three drives and forced a resync:

--assemble assembles the RAID array from its member devices
--run starts the array even if it would otherwise stay inactive, for example when fewer devices than expected are present
--force assembles the array even if some of the metadata looks out of date
--update=resync marks the array as dirty during assembly, so the kernel performs a resync to restore data consistency and redundancy

Finally, let’s validate the above operations:

$ cat /proc/mdstat
[===>.................]  resync = 19.7% (414656/2095040) finish=0.2min speed=138218K/sec

The above message indicates that the RAID array is resynchronizing data to ensure integrity and redundancy.
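If a script has to wait for the resync to finish before continuing, it can poll /proc/mdstat. This is only a sketch under the assumption that the progress line contains "resync =" or "recovery =" as in the output above; the function name resync_active is hypothetical:

```shell
# Hypothetical helper: succeed (exit 0) while the named array is resyncing
# or recovering, according to an mdstat-formatted file.
resync_active() {
    awk -v name="$1" '
        /^md/ { current = $1 }                     # start of an array section
        current == name && /(resync|recovery) *=/ { found = 1 }
        END   { exit found ? 0 : 1 }
    ' "${2:-/proc/mdstat}"
}

# Usage: poll every 30 seconds until md0 has finished.
#   while resync_active md0; do sleep 30; done
```

Alternatively, mdadm has a --wait option that blocks until resync or recovery activity on the given array has finished, which is usually simpler than polling.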

3.2. Speeding Up a Resync

If we want the resync to finish sooner, we can raise the kernel’s resync speed limits:

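$ echo 100000 | sudo tee /proc/sys/dev/raid/speed_limit_min
$ echo 200000 | sudo tee /proc/sys/dev/raid/speed_limit_max

Here, we set the minimum speed limit to 100000 KB/s and the maximum to 200000 KB/s. We pipe the value into sudo tee because, with sudo echo and a redirection, our unprivileged shell performs the redirection and the write fails. Suitable values vary from one environment to another, depending on the capabilities of the hardware.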
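Values written under /proc/sys are lost on reboot. One way to keep them is a sysctl drop-in file, since the same limits are exposed as the dev.raid.speed_limit_min and dev.raid.speed_limit_max sysctl keys. The sketch below only generates the file content; the function name raid_speed_sysctl and the file name 90-raid-speed.conf are arbitrary choices for illustration:

```shell
# Hypothetical helper: emit sysctl settings for the md resync speed limits.
# Arguments are KB/s values; refuses a minimum that exceeds the maximum.
raid_speed_sysctl() {
    min=$1
    max=$2
    if [ "$min" -gt "$max" ]; then
        echo "minimum ($min) exceeds maximum ($max)" >&2
        return 1
    fi
    printf 'dev.raid.speed_limit_min = %s\n' "$min"
    printf 'dev.raid.speed_limit_max = %s\n' "$max"
}

# Usage (the file name is an arbitrary choice):
#   raid_speed_sysctl 100000 200000 | sudo tee /etc/sysctl.d/90-raid-speed.conf
#   sudo sysctl --system
```

Checking the two values before writing them avoids leaving the system with a minimum above the maximum.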

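3.3. Disabling the Write-Intent Bitmap Through mdadm

Furthermore, we can use mdadm to remove the array’s write-intent bitmap by setting it to none:

$ sudo mdadm --grow --bitmap=none /dev/md0

The write-intent bitmap tracks the regions of the array that may need resyncing, which lets the kernel resync only those regions instead of the whole array. This command removes that tracking, but it doesn’t prevent resyncs: without a bitmap, a resync has to cover the entire array, so we should remove the bitmap only when we have a specific reason to.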
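Before changing the bitmap, it can help to confirm whether the array currently has one. The sketch below looks for the bitmap line that /proc/mdstat prints for such arrays, as in the earlier output; the function name has_bitmap is hypothetical:

```shell
# Hypothetical helper: succeed if the named array shows a bitmap line
# in an mdstat-formatted file (defaults to /proc/mdstat).
has_bitmap() {
    awk -v name="$1" '
        /^md/ { current = $1 }                     # start of an array section
        current == name && /^ *bitmap:/ { found = 1 }
        END   { exit found ? 0 : 1 }
    ' "${2:-/proc/mdstat}"
}

# Usage:
#   has_bitmap md0 && echo "md0 has a write-intent bitmap"
#   sudo mdadm --grow --bitmap=internal /dev/md0   # add an internal bitmap back
```

The second usage line shows how an internal bitmap can be added back later with the same --grow mode.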

3.4. Stopping mdcheck

To stop the timer responsible for the automatic checks, we first identify the specific timer unit associated with RAID checks:

$ systemctl list-timers
NEXT                         LEFT                LAST                         PASSED        UNIT                         ACTIVATES
Mon 2023-10-31 02:22:14 UTC  1h 3min left        Sun 2023-10-30 02:22:14 UTC  22h ago       systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
Sun 2023-11-05 22:17:28 PDT  2 weeks 5 days left Sun 2023-06-05 21:31:43 PDT  1 day 10h ago mdcheck_start.timer          mdcheck_start.service

Stopping the timer halts the scheduled RAID checks, including any resync activity they might lead to.

We should do this with caution and only if we have a specific reason for doing so:

$ sudo systemctl stop mdcheck_start.timer
$ systemctl list-timers
NEXT                         LEFT          LAST                         PASSED       UNIT                         ACTIVATES
Mon 2023-10-31 02:22:14 UTC  1h 3min left  Sun 2023-10-30 02:22:14 UTC  22h ago      systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service

In the above snippet, we stopped the mdcheck_start timer. Then, we confirmed that it no longer appears in the list of timers. Note that systemctl stop only lasts until the next boot; to keep an enabled timer from starting again, we can also run systemctl disable on it.

With the timer stopped, the scheduled consistency checks, which ensure data integrity, no longer occur. As a result, data inconsistencies or errors could go undetected and eventually lead to data loss.
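Timer names differ between distributions, so it can be useful to filter the timer list for RAID-related units first. The sketch below assumes that such units start with mdcheck or mdmonitor, which may not hold everywhere; the function name raid_timers is hypothetical:

```shell
# Hypothetical helper: print timer unit names that look RAID-related.
# Reads `systemctl list-timers` output on stdin; the mdcheck/mdmonitor
# prefixes are an assumption about how the distribution names its units.
raid_timers() {
    grep -oE '(mdcheck|mdmonitor)[A-Za-z0-9_@-]*\.timer' | sort -u
}

# Usage:
#   systemctl list-timers --all | raid_timers
```

Passing --all includes inactive timers, so a unit that we already stopped still shows up in the result.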

4. Resync Theory

To ensure issues are identified and addressed on time, mdcheck is usually scheduled to run at least once a month.

While consistency checks are essential for data safety, it’s important to note that they’re different from resync operations. A check reads the array and verifies that its redundant data is consistent, whereas a resync reconstructs and synchronizes data on the RAID array. Resyncs are triggered by hardware changes or system events, and they act as a safety measure to prevent potential data loss.

Finally, regularly checking and monitoring the status of our RAID array is good practice for keeping our data safe.
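To tell which kind of operation is running, we can read the array’s sync_action file in sysfs, which reports values such as idle, check, resync, or recover, together with mismatch_cnt, which counts the mismatches found by the last check. Below is a minimal sketch; the function name md_sync_status is hypothetical, and its second parameter exists only so that the function can be tried against a test directory:

```shell
# Hypothetical helper: report the current sync action and mismatch count.
# Takes the array name and, optionally, an alternate sysfs root for testing.
md_sync_status() {
    dir="${2:-/sys}/block/$1/md"
    action=$(cat "$dir/sync_action") || return 1
    mismatches=$(cat "$dir/mismatch_cnt") || return 1
    echo "$1: action=$action mismatch_cnt=$mismatches"
}

# Usage:
#   md_sync_status md0
```

An action of check points to a scheduled consistency check, while resync or recover indicates that the array is rebuilding its redundancy.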

5. Conclusion

In this article, we got a comprehensive overview of automatic RAID resyncs.

We learned that while it’s possible to manage and even disable resync processes, it’s good to have proper backups in place before making significant changes to the RAID configuration.

Finally, we saw that each method serves a specific purpose, and it’s important to select the most appropriate one for our particular scenario.
