1. Overview
In this tutorial, we’ll look at several ways to flatten a nested directory in Linux.
2. Nested Directory
A nested directory is a directory that is located inside another directory. This type of directory structure is commonly used to organize files in a hierarchical manner. For example, consider a company that organizes its files into folders by department and by function within each department:
$ tree company_data
company_data
|-- engineering
|   |-- backend
|   |   `-- backend-backlog.txt
|   `-- frontend
|       `-- frontend-backlog.txt
`-- finance
    |-- account
    |   `-- account-book.txt
    |-- funding
    |   `-- funder-profile.txt
    `-- risk
        `-- risk-assessment.txt
In the example above, the backend and frontend directories are nested within the engineering directory, which, in turn, sits under the company_data directory.
However, there may be situations where we need to flatten a nested directory. For instance, we may want to remove all the nested directories and move the files into a single directory to make them easier to browse. There are several ways to flatten nested directories in Linux.
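To follow along, the example layout can be recreated with a short script. This is a minimal sketch: the scratch location created by mktemp is our own choice, and only the directory and file names come from the tree output above.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing existing is touched (location is arbitrary).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create the nested department/function directories.
mkdir -p company_data/engineering/backend \
         company_data/engineering/frontend \
         company_data/finance/account \
         company_data/finance/funding \
         company_data/finance/risk

# Create one empty text file in each leaf directory.
touch company_data/engineering/backend/backend-backlog.txt \
      company_data/engineering/frontend/frontend-backlog.txt \
      company_data/finance/account/account-book.txt \
      company_data/finance/funding/funder-profile.txt \
      company_data/finance/risk/risk-assessment.txt

# List every regular file to confirm the layout.
find company_data -type f | sort
```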
3. Use the find Command With mv or cp Command
The find command is a command-line tool in Linux that can recursively walk a directory and search for files. To flatten a nested directory, we’ll use the find command to recursively search for all the files in the directory. Then, we apply the mv or cp command to the matching files. To demonstrate the method, let’s flatten our company_data directory by moving all the text files into a new flattened directory.
To begin the flattening process, we need to ensure that a new flattened directory exists. We can create this directory by running the mkdir command:
$ mkdir flattened
Next, we use the find command to recursively walk the company_data directory and flatten the directory using the mv command:
$ find company_data -mindepth 2 -type f -exec mv -i '{}' flattened/ ';'
The command specifies several find command options to filter the files that we want to move. For instance, we specify -mindepth 2 to process only files at least two levels deep. Furthermore, we pass the -type f option to process only files, not directories.
We then use the -exec option to specify the command we want to run for each matching file, which is the mv command in our example. The find command substitutes each matching file path for the '{}' placeholder in the mv command, and the quoted ';' marks the end of the command passed to -exec. As a safety measure, we pass the -i option to mv so that it prompts for confirmation before overwriting any file.
After running the command above, we’ll see all the files at the same level under the flattened directory:
$ tree flattened/
flattened/
|-- account-book.txt
|-- backend-backlog.txt
|-- frontend-backlog.txt
|-- funder-profile.txt
`-- risk-assessment.txt
0 directories, 5 files
Similarly, if we want to copy the files rather than move them, we can pass the cp command to -exec instead:
$ find company_data -mindepth 2 -type f -exec cp -i '{}' flattened/ ';'
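Because a flattened directory holds files from many subdirectories, two files with the same name will collide, and the -i option handles this only by prompting. The following is a non-interactive sketch of one possible approach: when a name is already taken, the copy is prefixed with its parent directory name. The src tree, the file names, and the parent_name naming scheme are all hypothetical and chosen just for illustration.

```shell
#!/bin/sh
# Hypothetical tree with a deliberate name clash (notes.txt exists twice).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p src/a src/b flattened
echo one   > src/a/notes.txt
echo two   > src/b/notes.txt
echo three > src/b/todo.txt

# find hands batches of paths to a small inline script ('{} +' form).
# The first argument after the script name is the destination directory.
find src -mindepth 2 -type f -exec sh -c '
  dest=$1; shift
  for path in "$@"; do
    name=$(basename "$path")
    if [ -e "$dest/$name" ]; then
      # Name already taken: prefix it with the parent directory name.
      parent=$(basename "$(dirname "$path")")
      name="${parent}_${name}"
    fi
    cp "$path" "$dest/$name"
  done
' sh flattened {} +

ls flattened
```

Which of the two notes.txt files keeps its original name depends on the order in which find visits the directories. The sketch also doesn't guard against the prefixed name itself being taken, so a deeper tree would need a more robust scheme.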
4. Use the Glob With mv or cp Command
In Linux, globbing is a path pattern-matching mechanism that’s built into most shells. Instead of specifying a fixed path or filename, we can use characters like asterisks, question marks, and curly and square brackets to perform pattern matching. To flatten a nested directory, we can use globbing to match the files we want to move and then apply the cp or mv command.
For instance, we can derive a glob pattern to match all the .txt files in our company_data directory and invoke the cp command:
$ cp company_data/*/*/*.txt flattened/
$ ls -l flattened
total 0
-rw-r--r-- 1 root root 0 Mar 4 03:23 account-book.txt
-rw-r--r-- 1 root root 0 Mar 4 03:23 backend-backlog.txt
-rw-r--r-- 1 root root 0 Mar 4 03:23 frontend-backlog.txt
-rw-r--r-- 1 root root 0 Mar 4 03:23 funder-profile.txt
-rw-r--r-- 1 root root 0 Mar 4 03:23 risk-assessment.txt
The shell expands the glob pattern company_data/*/*/*.txt to every .txt file that sits exactly two directory levels below company_data. The cp command then copies each of the matching files into the flattened directory.
On the other hand, if we want to move the files instead of making a copy, we simply run the mv command with the same glob pattern:
$ mv company_data/*/*/*.txt flattened/
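Two properties of this pattern are worth seeing in action: it matches only files at exactly that depth, and in most shells an unmatched pattern is passed to the command as a literal string. The sketch below wraps the copy in a loop that skips non-existent paths. The finance/summary.txt file is a hypothetical addition used only to show a file the pattern doesn't reach.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p company_data/engineering/backend company_data/finance flattened
touch company_data/engineering/backend/backend-backlog.txt
touch company_data/finance/summary.txt   # hypothetical file, one level shallower

for path in company_data/*/*/*.txt; do
  # An unmatched glob stays as the literal pattern, so skip paths that do not exist.
  [ -e "$path" ] || continue
  cp "$path" flattened/
done

# Only backend-backlog.txt is listed; summary.txt is too shallow for the pattern.
ls flattened
```

If files at several depths need to be collected, the find-based approach is usually the simpler choice, since it doesn't depend on a fixed number of path segments.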
5. Using the tar Command
In Linux, archiving tools like tar can strip away leading directories from the paths they extract.
5.1. Using tar Command to Flatten Directory Structure
To flatten a directory structure using the tar command, the first step is to create a tar archive. For instance, to create an archive of the company_data directory, we can use the command:
$ tar -cf company_data.tar company_data
$ ls -l company_data.tar
-rw-r--r-- 1 user user 10240 Mar 4 00:29 company_data.tar
After creating the archive, we can use the --strip option with the tar -xf command to remove the intermediate directories while extracting it. In GNU tar, --strip is an accepted abbreviation of --strip-components, which removes the given number of leading path components from each member name. For our nested directory of company_data, we’ll use --strip=3 to remove the three leading directories from the path of every .txt file:
$ tar -xf company_data.tar --strip=3
$ ls -l
total 12
-rw-r--r-- 1 user user 0 Mar 4 00:21 account-book.txt
-rw-r--r-- 1 user user 0 Mar 4 00:21 backend-backlog.txt
-rw-r--r-- 1 user user 10240 Mar 4 00:29 company_data.tar
-rw-r--r-- 1 user user 0 Mar 4 00:21 frontend-backlog.txt
-rw-r--r-- 1 user user 0 Mar 4 00:21 funder-profile.txt
-rw-r--r-- 1 user user 0 Mar 4 00:21 risk-assessment.txt
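The extraction above places the files next to the archive in the current directory. As a variation, the sketch below spells out the option as --strip-components and adds -C to extract into a separate flattened directory. It assumes GNU tar or a compatible implementation and uses a reduced version of the example tree.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p company_data/engineering/backend company_data/finance/risk flattened
touch company_data/engineering/backend/backend-backlog.txt \
      company_data/finance/risk/risk-assessment.txt

tar -cf company_data.tar company_data

# --strip-components is the full option name; -C selects the extraction directory.
tar -xf company_data.tar --strip-components=3 -C flattened

ls flattened
```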
5.2. Limitations of Flattening Nested Directories with tar Command
One caveat to this method is that if we have files at different levels of the directory tree, the tar command can flatten only the files at the single level specified by the --strip option.
Let’s consider the simple example below:
$ tree multilevel
multilevel
|-- level2
| `-- text2.txt
`-- text1.txt
1 directory, 2 files
From the output of the tree command, we can see that there’s one text file directly underneath the multilevel directory. Then, we have another text file under the level2 subdirectory. Since this structure has files at different nesting levels, flattening it with the tar trick is not ideal. Let’s see why.
Assuming we’ve archived the directory as multilevel.tar, we pass the --strip=1 option to strip the first level:
$ tar xf multilevel.tar --strip=1
$ ls -l
total 16
drwxr-xr-x 2 root root 4096 Mar 4 00:51 level2
-rw-r--r-- 1 root root 10240 Mar 4 00:56 multilevel.tar
-rw-r--r-- 1 root root 0 Mar 4 00:51 text1.txt
Notice how the level2 directory persists after we extract the archive. This is because the --strip=1 option strips away only the first level of the path.
What if we extract it using the --strip 2 option instead?
$ tar xf multilevel.tar --strip 2
$ ls -l
total 12
-rw-r--r-- 1 root root 10240 Mar 4 00:56 multilevel.tar
-rw-r--r-- 1 root root 0 Mar 4 00:51 text2.txt
By stripping away the two leading path components, we manage to flatten text2.txt. However, notice that text1.txt is absent. This is because the full path of text1.txt is multilevel/text1.txt, which has only one leading component. When we ask tar to strip two components, it skips any member whose path has fewer than that, so text1.txt is not extracted.
Although the tar command is more rigid than the glob-based and find-based approaches when flattening directories, it can be easier to recall.
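One way to spot this situation before extracting is to list the archive and count the leading path components of each file. The sketch below recreates the multilevel example and prints the distinct depths it finds. It assumes that tar -tf lists directory members with a trailing slash, as GNU tar does.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p multilevel/level2
touch multilevel/text1.txt multilevel/level2/text2.txt
tar -cf multilevel.tar multilevel

# Drop directory members (they end with "/"), then count the components
# that precede each file name and keep the distinct values.
depths=$(tar -tf multilevel.tar | grep -v '/$' | awk -F/ '{ print NF - 1 }' | sort -u)

# More than one line of output means no single strip value flattens every file.
echo "$depths"
```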
6. Summary
In this tutorial, we explored various methods for flattening a nested directory. Initially, we defined a nested directory as a directory that is located inside another directory.
We then demonstrated how the find command can traverse the directory tree and run the mv or cp command on each file to flatten it. We also saw that globbing provides a way to specify which files to flatten by pattern. Lastly, we learned that the tar command can flatten nested directories too, though it’s a more rigid solution than the previous methods.