1. Introduction
We can fix or replace hardware and buy or reinstall the software. However, what makes a device unique and valuable is its data. Losing that could be a pain or a nightmare. So it comes down to whether we’re ready.
In this tutorial, we discover ways to deal with accidental information loss. First, we refresh our knowledge of storage, partitions, filesystems, and files. Next, we discuss data loss due to storage and partition damage, as well as how to deal with such cases. After that, we focus on damaged filesystems, their analysis, and recovery. We then explore restoring files depending on how and under what conditions they were lost. Finally, we suggest some combined toolsets for complete storage diagnosis and recovery.
We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments where the respective tools are available.
2. Storage, Partitions, Filesystems, Files, and inodes
Physical storage comes in many forms. Importantly, all of them have some kind of controller. In particular, the controller’s role is to organize and optimize:
- input and output (who and when)
- reads and writes (what and where)
An operating system (OS) orders data on the storage via filesystems. How an OS does that depends on its type.
Native Linux filesystems use inodes to store and index information about objects. Every such object with an inode is a file.
For example, we can have directory files and regular files. Directories in Linux are lists of filename-to-inode mappings. On the other hand, regular files link to an inode, which contains the file information:
- inode number
- permissions
- dates of a file
- content block pointers
- other metadata
Note that the actual file data is in the content blocks (similar to sectors, but logical). However, we won’t know where those are without the pointers in the inode. Furthermore, the actual inode is in an index table.
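We can observe the filename-to-inode relation directly. The following minimal sketch works in a scratch directory (the names original and alias are arbitrary) and shows that a hard link is just a second name for the same inode:

```shell
# Work in a scratch directory so no real data is touched
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/original"

# A hard link adds a second directory entry for the same inode
ln "$workdir/original" "$workdir/alias"

# ls -i prints the inode number before each name
inode_original=$(ls -i "$workdir/original" | awk '{ print $1 }')
inode_alias=$(ls -i "$workdir/alias" | awk '{ print $1 }')
echo "original: $inode_original, alias: $inode_alias"

# Deleting one name does not delete the inode while another link remains
rm "$workdir/original"
content=$(cat "$workdir/alias")
echo "content via alias: $content"

# Clean up the scratch directory
rm -rf "$workdir"
```

Both names report the same inode number, and the content stays reachable through the remaining link.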
In addition, filesystems most often reside on separate partitions, which are described by a partition table. Partitions, in turn, can be created and formatted.
Knowing these relations is vital. To recover data, we need to know its type, how it’s stored, and also how it was lost. So let’s begin with the top level.
3. Losing Storage or Partitions
One way to lose data is a hardware fault. This can be due to bad blocks, a damaged controller, or another bad component. Of course, in such cases, data is either gone or could be lost easily. Once a medium is faulty, storing any information on it becomes risky or impossible.
On a storage device, partitions are:
- described in a table
- identified by headers
- with a given format
In particular, we can lose a partition when its headers or the partition table are damaged. This can happen, e.g., due to a malicious actor, rogue software, hardware issues, or mistakes.
Let’s see what we can do to diagnose and undo such issues.
4. Analyze Partitions and Storage
When dealing with storage in general, we can turn to the famous e2fsprogs (Ext2/3/4 Filesystem Programs) package.
In particular, the badblocks tool is handy for finding bad blocks on a device. Indeed, they are usually the first evidence we have of storage damage:
$ umount /dev/sda
$ badblocks -v -sn /dev/sda
Checking for bad blocks in non-destructive read-write mode
From block 1 to 26510666
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: 1.05% done, 4:34 elapsed. (55/0/0 errors)
[...]
Here, we use the -v (verbose), -s (show progress), and -n (non-destructive read-write) flags. Note how we first unmount the device before scanning. It’s usually best to diagnose when not booting from the target.
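Before scanning, it can help to confirm that neither the device nor any of its partitions is still mounted. Below is a minimal sketch of such a check. The is_mounted helper is our own hypothetical function, not part of e2fsprogs. It only looks at mount sources in /proc/mounts, so it wouldn't notice layers such as LVM or device-mapper volumes on top of the device:

```shell
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds when DEVICE itself or one of its partitions (such as /dev/sda1
# or /dev/nvme0n1p1) is listed as a mount source.
# MOUNTS_FILE defaults to /proc/mounts, the kernel's list of active mounts.
is_mounted() {
    awk -v dev="$1" '
        index($1, dev) == 1 {
            # Accept an exact match or a trailing partition suffix only
            rest = substr($1, length(dev) + 1)
            if (rest ~ /^(p?[0-9]+)?$/) found = 1
        }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

# Hypothetical usage:
#   if is_mounted /dev/sda; then
#       echo "unmount /dev/sda and its partitions first" >&2
#   else
#       badblocks -v -sn /dev/sda
#   fi
```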
Alternatively, we can use S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) in modern storage devices. For example, there are the smartmontools (SMART Monitor Tools):
$ smartctl -l selftest /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.0-kali4-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 6660 -
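When the self-test log grows, we might want a quick summary instead of reading every entry. The sketch below assumes the log layout shown above, where each entry starts with # and a number. The count_failed_selftests name is hypothetical. Note that it counts every entry that isn't marked Completed without error, so aborted or still running tests are included as well:

```shell
# count_failed_selftests
# Reads "smartctl -l selftest" output on stdin and prints how many log
# entries report anything other than "Completed without error".
count_failed_selftests() {
    awk '
        /^# *[0-9]+ / && !/Completed without error/ { failed++ }
        END { print failed + 0 }
    '
}

# Hypothetical usage:
#   smartctl -l selftest /dev/sda | count_failed_selftests
```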
After checking the device status, we can also see the partition table via fdisk (Fixed Disk) and its --list (-l) flag:
$ fdisk --list
Disk /dev/sda: 101 GiB, 108587687936 bytes, 212085328 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0xc0666084
Device Boot Start End Sectors Size Id Type
/dev/sda1 * 2048 210088528 210086480 100G 83 Linux
/dev/sda2 210090576 212087376 1996800 975M 82 Linux swap / Solaris
We need to know whether the layout is still readable. It might be lost (blank) or just faulty.
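One quick hint about whether an MBR-style table is blank is the boot signature: the two bytes 0x55 0xAA at offsets 510 and 511 of the first sector. The following sketch checks for it with od. The has_mbr_signature helper is our own hypothetical function. A present signature doesn't prove the entries are valid, and GPT disks carry it too in their protective MBR:

```shell
# has_mbr_signature TARGET
# Succeeds when bytes 510 and 511 of TARGET (a device or an image file)
# hold the MBR boot signature 0x55 0xAA.
has_mbr_signature() {
    # Dump exactly two bytes as hex, without the address column
    signature=$(od -An -tx1 -j510 -N2 "$1" 2>/dev/null | tr -d ' \n')
    [ "$signature" = "55aa" ]
}

# Hypothetical usage:
#   has_mbr_signature /dev/sda || echo "no MBR signature found" >&2
```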
After analyzing the damage, let’s see what we can do to remedy any issues.
5. Storage Rescue
Physical device repairs are outside the scope of this article. Beyond replacing faulty storage, there isn’t much else we can directly do. However, if the medium is still partially readable, we might be able to restore data from it.
Of course, having a full backup of the device is the easiest way to restore everything. Barring that, we should find other means to cope.
For example, the GNU ddrescue (Disc Dump Rescue) tool can dump a raw image of the problematic medium, block by block:
$ ddrescue --idirect --retry-passes=3 /dev/sda dump.img dump.logfile
GNU ddrescue 1.23
Press Ctrl-C to interrupt
ipos: 1597 MB, non-trimmed: 0 B, current rate: 47972 kB/s
opos: 1597 MB, non-scraped: 0 B, average rate: 228 MB/s
non-tried: 273280 MB, bad-sector: 0 B, error rate: 0 B/s
rescued: 1597 MB, bad areas: 0, run time: 6s
pct rescued: 0.58%, read errors: 0, remaining time: 19m
time since last successful read: n/a
Copying non-tried blocks... Pass 1 (forwards)
We use --idirect (-d) to skip kernel caching. In addition, the --retry-passes (-r) flag sets the number of retries on bad sectors. However, this option might be best left for a second pass to avoid further stressing a failing device. Importantly, ddrescue uses special algorithms to ensure as little further wear on the medium as possible.
Again, /dev/sda should not be mounted, and the dump.img must be on another device. After this, we just restore this image file to a new medium:
$ ddrescue -f dump.img /dev/sda restore.logfile
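The third argument of ddrescue (dump.logfile above), which recent versions call a mapfile, records the state of every block. As a rough sketch, we can sum up the areas still marked as bad. This assumes the usual mapfile layout: # comment lines, one current-status line, and data lines of the form pos size status with hexadecimal numbers, where - marks a bad sector. The bad_bytes_in_mapfile name is hypothetical:

```shell
# bad_bytes_in_mapfile MAPFILE
# Prints the total size, in bytes, of all blocks marked "-" (bad sector).
bad_bytes_in_mapfile() {
    total=0
    while read -r pos size status rest; do
        # Skip empty lines and comments
        case $pos in
            ''|'#'*) continue ;;
        esac
        # The current-status line has no "-" in its third field,
        # so only block entries are summed here
        if [ "$status" = "-" ]; then
            total=$((total + size))
        fi
    done < "$1"
    echo "$total"
}

# Hypothetical usage:
#   bad_bytes_in_mapfile dump.logfile
```

A result of 0 suggests the image is complete, while anything else tells us how much data a further pass would still have to retry.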
Another tool with similar functions is safecopy.
Now that we’ve seen full device rescue, let’s continue one level below.
6. Recover Partitions
Partition tables usually organize storage at the top level. As usual, if we have a backup of a now-defunct partition table, we could and probably should restore it via the scripted version of fdisk, i.e., sfdisk:
$ sfdisk -d /dev/sda > parttable
$ sfdisk /dev/sda < parttable
While the first command dumps the table to parttable, the second restores it from that file.
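In addition to the sfdisk dump, on MBR disks we could keep a raw copy of the first 512-byte sector, which holds the boot code and the primary partition table. Below is a minimal sketch with dd. The backup_first_sector name and the sda-mbr.bin filename are hypothetical. This doesn't cover logical partitions inside an extended partition, nor GPT layouts:

```shell
# backup_first_sector SOURCE DEST
# Copies the first 512 bytes of SOURCE (a device or an image) to DEST.
backup_first_sector() {
    dd if="$1" of="$2" bs=512 count=1 2>/dev/null
}

# Hypothetical usage:
#   backup_first_sector /dev/sda sda-mbr.bin
```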
However, if we don’t have a backup, as is most often the case, there is still hope. The interactive testdisk tool can scan for lost tables:
$ testdisk
[...]
TestDisk 7.1, Data Recovery Utility, July 2019
Christophe GRENIER <[email protected]>
https://www.cgsecurity.org
Disk /dev/sda - 107 GB / 100 GiB - CHS 13054 255 63
Analyse cylinder 5127/13054: 39%
Linux 0 0 1 33418 170 32 536870912
Linux 0 0 1 33418 170 32 536870912
Linux 0 0 1 33418 170 32 536870912
Linux 0 0 1 33418 170 32 536870912
Linux 0 0 1 33418 170 32 536870912
Linux 0 0 1 33418 170 32 536870912
Linux 0 0 1 33418 170 32 536870912
Linux Swap 538 123 34 799 144 41 4194296
Linux 0 0 1 33418 170 32 536870912
Linux 0 0 1 33418 170 32 536870912
Linux 0 0 1 33418 170 32 536870912
Linux 0 0 1 33418 170 32 536870912
Stop
[...]
Here, testdisk found old tables and entries from them. After this, we can also apply any of these to our setup. In addition, there is an option to rewrite the MBR.
Partitions are just isolated “rooms” within a medium. However, they often have a format that identifies them unless it’s corrupted.
7. Filesystem Corruption and File Issues
Since filesystems are basically a format for data, they use ordered metadata. Messing with it can cause problems:
- invalid files, i.e., hard links to lost inodes
- inaccessible valid files, i.e., no hard links to existing inodes
- a system unable to boot
- impossible mounting or recognition of filesystems
- other unexpected behavior
In fact, what issues we see depends strongly on the filesystem type, its structures, and what is lost.
For instance, pointers within inodes refer to what most would call data, e.g., file contents. Losing or damaging an inode can invalidate its content block pointers and leave files as mere shells: the filenames, and possibly the rest of the metadata, survive, but the content is no longer reachable.
On the other hand, data loss within the blocks can be permanent, but how it happens is important. Whether lost data is found depends on many factors, including:
- how the information was lost
- whether we lost data or metadata pointing to it
- whether there’s still metadata pointing to our data
- storage type
- device usage since the loss
- filesystem state
Let’s concentrate on the last point.
8. Filesystem Analysis and Recovery
Of course, a standard way to check filesystems is the fsck (Filesystem Consistency Check) tool:
$ fsck /dev/sda1
fsck from util-linux 2.37.2
e2fsck 1.46.4 (18-Aug-2021)
/dev/sda1: clean, 11/6553600 files, 557848/26510666 blocks
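Note that we specify a partition /dev/sda1 instead of the whole storage device /dev/sda. A partitioned device doesn’t hold a filesystem directly, so fsck wouldn’t find one to check on /dev/sda. On the other hand, when we run it without a device argument, fsck just goes through the filesystems listed in /etc/fstab one by one.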
Just like testdisk, fsck combines both scanning and recovery. To avoid the latter, we can add -n, preventing any changes. While there are tools with similar functions under Linux, e.g., ext4magic, fsck is ubiquitous.
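The exit status of fsck is a sum of bit flags, so a single number can encode several conditions at once. The sketch below decodes it according to the values documented in the fsck manual. The describe_fsck_status helper is our own hypothetical function:

```shell
# describe_fsck_status STATUS
# Prints one line per condition encoded in an fsck exit status.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    # Each entry is "bit:meaning"
    for entry in \
        "1:filesystem errors corrected" \
        "2:system should be rebooted" \
        "4:filesystem errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:checking canceled by user request" \
        "128:shared-library error"
    do
        bit=${entry%%:*}
        if [ $((status & bit)) -ne 0 ]; then
            echo "${entry#*:}"
        fi
    done
}

# Hypothetical usage:
#   fsck -n /dev/sda1
#   describe_fsck_status "$?"
```

With -n, a status containing 4 is the interesting one, since it means that fsck saw problems but left them alone.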
So, if our storage is in order, the partition tables are correct, and the filesystems are intact, we might just be missing a file. Where has it gone, and how do we get it back?
9. Data and File Discovery and Recovery
First, always back up critical data.
Second, minimize the medium usage after data loss. This includes manual activity and automated tasks like defragmentation, scanning, cronjobs, and OS processes.
We should always establish what we are missing, whether it’s a single regular file, pieces of one, many separate files, or a whole directory. We then find out or speculate about how it was lost by checking logs and scanning.
In fact, depending on its type, scanning can be either the most precise or the most general method.
9.1. lost+found
Normally, when all hard links to an object are deleted, its inode is invalidated. However, if this doesn’t occur, the lost+found directory could contain a placeholder file, pointing to the still valid but inaccessible inode.
For this to happen, we first run fsck, which finds such “decapitated” inodes and reconnects them under lost+found.
9.2. Data Recognition
On the other hand, losing an inode might present bigger issues since we may lose sight of where the file contents are. Scanning may be able to partially or fully restore files based on their format.
However, there are many caveats. For example, fragmentation, i.e., how dispersed the file blocks are, plays a significant role. That’s because fragments are linked by the information in the inode. Furthermore, how many and which formats we can identify depends on the software.
For instance, the photorec tool, despite its name, recognizes more than 480 extensions. Conveniently, it’s part of the testdisk package and thus has a similar interactive command-line interface.
Alternatively, there is the foremost tool by the Air Force Office of Special Investigations. However, not many tools are as actively maintained as photorec.
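Format-based recognition generally relies on known byte signatures, such as the bytes ff d8 ff at the start of a JPEG or 89 50 4e 47 at the start of a PNG. The sketch below only illustrates the idea on a single file and isn't how photorec or foremost are implemented. The has_signature helper is our own hypothetical function:

```shell
# has_signature FILE HEX
# Succeeds when FILE starts with the bytes given in HEX,
# written as lowercase hex digits without separators.
has_signature() {
    # Two hex digits describe one byte
    length=$(( ${#2} / 2 ))
    header=$(od -An -tx1 -N"$length" "$1" 2>/dev/null | tr -d ' \n')
    [ "$header" = "$2" ]
}

# Hypothetical usage:
#   has_signature recovered.bin ffd8ff && echo "looks like a JPEG"
#   has_signature recovered.bin 89504e47 && echo "looks like a PNG"
```

Real carving tools apply the same principle to raw blocks instead of files, which is why fragmentation hurts them: a signature marks where a file begins, but not where its later fragments are.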
9.3. Corrupted File Information
Indeed, we could have a combination of factors that affect file consistency:
- bad storage blocks, affecting a file’s hard links, content, or inode
- filesystem issues, leading to missing, partial, or replaced inodes
- accidental file damage, deletion, or overwriting
We then should employ all of what we discussed to deal with the issues one by one. To make this easier, there are free packages available.
In fact, they often include the standard tools we explored but also other ones for specific cases.
10. Bootable Toolsets
As we already mentioned, it’s best to minimize activity with problematic storage. Because of this, we usually want to clone the medium with issues and work with that clone. If that isn’t possible, we should at least unmount the affected filesystems.
However, our OS is often running from the same device we are trying to work on, so we can’t use any tools within our environment. To that end, there are packaged images, which contain rescue toolsets preinstalled in a bootable environment:
- Trinity Rescue Kit, a Linux environment primarily oriented towards Windows recovery
- Hiren’s Boot CD, a universal Windows portable environment with lots of tools
- Ultimate Boot CD
- System Rescue, a Linux system rescue toolkit
Using these is as easy as getting the image, writing it to a medium, and booting from that medium. Indeed, not booting from a device with lost or damaged data drastically increases the chances of restoring it.
11. Summary
In this article, we looked at many ways data can be lost, damaged, or deleted. We also discussed diagnosis, repair, and recovery for each of these cases.
In conclusion, there are many tools to restore lost data, but we must analyze when and what caused the loss before taking steps.