1. Overview

In any system, files keep accumulating through manual user activity or automated processes. Unfortunately, disk space is a limited resource, so any system risks running out of it if an appropriate file cleanup process isn’t in place.

In this tutorial, we’ll learn how to perform file cleanup by removing files older than N days, and then how to automate the process.

2. Scenario Setup

Let’s start by creating a temporary directory structure for our scenario:

$ mkdir -p old_files/dir{1,2,3,4,5}
$ directories=(dir1 dir2 dir3 dir4 dir5)
$ for dir in "${directories[@]}"
do
    tmp_file=$(TMPDIR=old_files/${dir}/ mktemp)
    echo "$tmp_file"
done
old_files/dir1/tmp.GxKQcVkGal
old_files/dir2/tmp.urtamGpmAq
old_files/dir3/tmp.CzVSFfbiy6
old_files/dir4/tmp.3mq8yfEwix
old_files/dir5/tmp.SAZwyigT6e

Herein, we used the mktemp command to create a temporary file under each directory.

Next, let’s use the exa command to inspect the files and directories in a tree view:

$ exa --tree old_files/
old_files
├── dir1
│  └── tmp.GxKQcVkGal
├── dir2
│  └── tmp.urtamGpmAq
├── dir3
│  └── tmp.CzVSFfbiy6
├── dir4
│  └── tmp.3mq8yfEwix
└── dir5
   └── tmp.SAZwyigT6e

Further, let’s use the stat command with the --format option to display the timestamps of creation, last modification, and last access for different files:

$ find old_files/ -type f | xargs -n1 stat --format="%n: created:%W, lastModified:%Y, lastAccess:%X"
old_files/dir4/tmp.3mq8yfEwix: created:1682484289, lastModified:1682484289, lastAccess:1682484289
old_files/dir2/tmp.urtamGpmAq: created:1682484289, lastModified:1682484289, lastAccess:1682484289
old_files/dir3/tmp.CzVSFfbiy6: created:1682484289, lastModified:1682484289, lastAccess:1682484289
old_files/dir5/tmp.SAZwyigT6e: created:1682484289, lastModified:1682484289, lastAccess:1682484289
old_files/dir1/tmp.GxKQcVkGal: created:1682484289, lastModified:1682484289, lastAccess:1682484289

Finally, we must also figure out how to tweak the last modified and last accessed timestamps of files: all the existing timestamps are just a few minutes in the past, whereas we need them to be many days old. To solve this puzzle, let’s use the touch command with the -m and -a options to modify the last modified and last accessed timestamps, respectively, for one of the files:

$ touch -m -t 202112150830 old_files/dir1/tmp.GxKQcVkGal
$ touch -a -t 202212150830 old_files/dir1/tmp.GxKQcVkGal
$ stat --format="%n: created:%w, lastModified:%y, lastAccess:%x" old_files/dir1/tmp.GxKQcVkGal
old_files/dir1/tmp.GxKQcVkGal: created:2023-04-25 21:44:49.822628000 -0700, lastModified:2021-12-15 08:30:00.000000000 -0800, lastAccess:2022-12-15 08:30:00.000000000 -0800

We must note that we’re using this tweak to simulate our use case only, and it’s usually not recommended to do this on production systems. Moreover, it’s impossible to change the file’s creation time using this approach.
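As a side note, both timestamps can be set in a single invocation by combining the -a and -m options. Here’s a minimal sketch on a throwaway temporary file:

```shell
# Set the last accessed and last modified timestamps in one touch call
tmp_file=$(mktemp)
touch -a -m -t 202112150830 "$tmp_file"
stat --format="lastModified:%y, lastAccess:%x" "$tmp_file"
rm "$tmp_file"
```

Both timestamps now show 2021-12-15 08:30 in the local timezone.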

In the following sections, we’ll reuse this approach to simulate different strategies for solving our use case of deleting files older than N days.

3. Using find With xargs

First, let’s use the touch command to set the last modified timestamp of files under the dir1/ and dir2/ directories a few days in the past:

$ touch -m -d '15 April 2023' old_files/dir1/tmp.GxKQcVkGal
$ touch -m -d '20 April 2023' old_files/dir2/tmp.urtamGpmAq
$ stat --format="%n: created:%w, lastModified:%y, lastAccess:%x" old_files/dir1/tmp.GxKQcVkGal old_files/dir2/tmp.urtamGpmAq
old_files/dir1/tmp.GxKQcVkGal: created:2023-04-25 21:44:49.822628000 -0700, lastModified:2023-04-15 00:00:00.000000000 -0700, lastAccess:2022-12-15 08:30:00.000000000 -0800
old_files/dir2/tmp.urtamGpmAq: created:2023-04-25 21:44:49.831628000 -0700, lastModified:2023-04-20 00:00:00.000000000 -0700, lastAccess:2023-04-25 21:44:49.831628000 -0700

Now, let’s check the current time of the system and use the find command to verify that we’re able to find the two files based on the number of days that passed since the last modification:

$ date
Tue Apr 25 23:34:04 PDT 2023
$ find old_files/ -type f -mtime 10
old_files/dir1/tmp.GxKQcVkGal
$ find old_files/ -type f -mtime 5
old_files/dir2/tmp.urtamGpmAq

At this point, it’s important to note that we used -mtime with exact values of 10 and 5 days, respectively, to find files whose last modified time is exactly N days in the past. However, when we need to find all files older than N days, we must use the -mtime option with a value of +N.

Next, let’s use this logic to find files older than N=9 and N=4 days:

$ find old_files/ -type f -mtime +9
old_files/dir1/tmp.GxKQcVkGal
$ find old_files/ -type f -mtime +4
old_files/dir2/tmp.urtamGpmAq
old_files/dir1/tmp.GxKQcVkGal

The result looks as expected: there are two files older than four days, but only one is older than nine days.
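For completeness, -mtime also accepts -N to match files modified less than N days ago. A quick sketch in a throwaway directory (the file names are arbitrary for this demo) illustrates both forms:

```shell
# Throwaway directory with one recent and one stale file
demo=$(mktemp -d)
touch -m -d '2 days ago' "$demo/recent_file"
touch -m -d '10 days ago' "$demo/old_file"

find "$demo" -type f -mtime -4    # less than 4 days old -> recent_file
find "$demo" -type f -mtime +4    # more than 4 days old -> old_file

rm -r "$demo"
```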

Additionally, we can use the xargs command to execute the rm command for each file older than N=4 days:

$ find old_files/ -type f -mtime +4 | xargs -n1 rm
$ find old_files/ -type f -mtime +4
# no output: zero files older than 4 days
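One caveat: the plain pipe splits xargs input on whitespace, so filenames containing spaces would be mishandled. A NUL-delimited variant (GNU find and xargs) is safer; here’s a sketch using a throwaway directory:

```shell
# Throwaway directory with a deliberately awkward filename
demo=$(mktemp -d)
touch -m -d '10 days ago' "$demo/name with spaces"

# -print0/-0 keep whitespace intact; -r makes xargs skip rm on empty input
find "$demo" -type f -mtime +4 -print0 | xargs -0 -r rm

find "$demo" -type f    # no output: the stale file is gone
rm -r "$demo"
```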

Great! We’ve got this right. However, before moving forward to the next section, let’s restore the deleted files:

$ touch old_files/dir1/tmp.GxKQcVkGal
$ touch old_files/dir2/tmp.urtamGpmAq

4. Using find With -exec

Depending on the use case, we could also remove files that haven’t been accessed for more than N days. Let’s go ahead and use the -a option of the touch command to change the last accessed timestamp of the files under the old_files/dir1 and old_files/dir2 directories:

$ touch -a -d '15 April 2023' old_files/dir1/tmp.GxKQcVkGal
$ touch -a -d '20 April 2023' old_files/dir2/tmp.urtamGpmAq
$ stat --format="%n: created:%w, lastModified:%y, lastAccess:%x" old_files/dir1/tmp.GxKQcVkGal old_files/dir2/tmp.urtamGpmAq
old_files/dir1/tmp.GxKQcVkGal: created:2023-04-26 00:06:07.021100006 -0700, lastModified:2023-04-26 00:06:07.022100006 -0700, lastAccess:2023-04-15 00:00:00.000000000 -0700
old_files/dir2/tmp.urtamGpmAq: created:2023-04-26 00:05:32.334353003 -0700, lastModified:2023-04-26 00:05:32.334353003 -0700, lastAccess:2023-04-20 00:00:00.000000000 -0700

Next, we can use the find command with the -atime option to list the files that haven’t been accessed for more than N=4 days:

$ date
Wed Apr 26 00:12:10 PDT 2023
$ find old_files/ -type f -atime +4
old_files/dir2/tmp.urtamGpmAq
old_files/dir1/tmp.GxKQcVkGal

The output looks satisfactory, confirming our understanding of the concept. So, let’s go ahead and use the -exec action of the find command to delete these files as part of the search itself:

$ find old_files/ -type f -atime +4 -exec rm {} \;
$ find old_files/ -type f -atime +4
# no output -> zero files that aren't accessed for longer than 4 days

Perfect! We’ve got one more approach to solving our use case. Like earlier, let’s restore the deleted files before moving forward.
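One caveat worth checking before relying on -atime: many modern systems mount filesystems with relatime or noatime, which limits how promptly access times are updated. We can inspect the active mount options, for example, for the root filesystem:

```shell
# Print the mount options of the root filesystem (an example path);
# 'noatime' there would mean access times are never updated
awk '$2 == "/" { print $4 }' /proc/mounts
```

If noatime appears in the output, cleanup based on last accessed time won’t behave as expected.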

5. Using find With -delete

Let’s start by modifying the last modified timestamp of files to simulate the scenario of files older than N days:

$ touch -m -d '15 April 2023' old_files/dir1/tmp.GxKQcVkGal
$ touch -m -d '20 April 2023' old_files/dir2/tmp.urtamGpmAq
$ stat --format="%n: created:%w, lastModified:%y, lastAccess:%x" old_files/dir1/tmp.GxKQcVkGal old_files/dir2/tmp.urtamGpmAq
old_files/dir1/tmp.GxKQcVkGal: created:2023-04-26 00:18:23.499571013 -0700, lastModified:2023-04-15 00:00:00.000000000 -0700, lastAccess:2023-04-26 00:19:23.853487013 -0700
old_files/dir2/tmp.urtamGpmAq: created:2023-04-26 00:18:28.089949001 -0700, lastModified:2023-04-20 00:00:00.000000000 -0700, lastAccess:2023-04-26 00:19:28.966449002 -0700
$ find old_files/ -type f -mtime +4
old_files/dir2/tmp.urtamGpmAq
old_files/dir1/tmp.GxKQcVkGal

Now, we can use the -delete action with the find command to execute the deletion of files that are older than N=4 days:

$ find old_files/ -type f -mtime +4 -delete
$ find old_files/ -type f -mtime +4
# no output: zero files older than 4 days

We must remember to position the -delete action at the end because it needs to execute after all search criteria are matched.
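To see why the position matters, here’s a small sketch in a throwaway directory; with the tests placed first, only the backdated file is removed:

```shell
# Throwaway directory: one fresh file, one backdated file
demo=$(mktemp -d)
touch "$demo/fresh_file"
touch -m -d '10 days ago' "$demo/old_file"

# Tests first, -delete last: only old_file matches and is removed
find "$demo" -type f -mtime +4 -delete
ls "$demo"    # fresh_file survives

# With -delete first (DON'T do this), find would try to delete every
# entry before the -type and -mtime tests are ever evaluated:
#   find "$demo" -delete -type f -mtime +4

rm -r "$demo"
```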

6. Using logrotate

Another technique to delete files older than N days is using the logrotate utility. Although logrotate serves a broader purpose and supports many options, for our use case, we’re particularly interested in the age of the files. So, let’s use the maxage directive to define a minimal configuration in the delete-old-files.conf file:

$ cat delete-old-files.conf
/root/old_files/*/* {
    maxage 4
}

Next, let’s modify the last modified timestamp of a few files within the old_files directory and find out which files are older than N=4 days:

$ touch -m -d '20 April 2023' old_files/dir2/tmp.urtamGpmAq
$ touch -m -d '15 April 2023' old_files/dir1/tmp.GxKQcVkGal
$ find old_files/ -type f -mtime +4
old_files/dir2/tmp.urtamGpmAq
old_files/dir1/tmp.GxKQcVkGal

Finally, let’s run the logrotate command to delete the eligible files:

$ logrotate -f delete-old-files.conf
$ find old_files/ -type f -mtime +4
# no output -> zero files older than N=4 days

It looks like we’ve figured this out correctly. Further, we must note that we used the -f (force) option so that logrotate processes the files unconditionally; otherwise, it could skip our empty files based on its usual rotation criteria, such as size and schedule.

7. Using tmpreaper

We can use the tmpreaper utility to clean up files based on their last accessed time. However, the utility doesn’t come preinstalled with most Linux distros, so we need to install it first:

$ apt-get install tmpreaper

Now, let’s set the last accessed time of a few files under the old_files/ directory in the past:

$ touch -a -d '20 April 2023' old_files/dir2/tmp.urtamGpmAq
$ touch -a -d '15 April 2023' old_files/dir1/tmp.GxKQcVkGal

Next, let’s run the tmpreaper command with the --test option to perform a dry run of cleaning up the files under the old_files directory:

$ tmpreaper --test 4d  /root/old_files/
(PID 35526) Pretending to clean up directory `/root/old_files/'.
(PID 35527) Pretending to clean up directory `dir4'.
(PID 35527) Back from recursing down `dir4'.
(PID 35527) Pretending to clean up directory `dir2'.
Pretending to remove file `dir2/tmp.urtamGpmAq'.
(PID 35527) Back from recursing down `dir2'.
(PID 35527) Pretending to clean up directory `dir3'.
(PID 35527) Back from recursing down `dir3'.
(PID 35527) Pretending to clean up directory `dir5'.
(PID 35527) Back from recursing down `dir5'.
(PID 35527) Pretending to clean up directory `dir1'.
Pretending to remove file `dir1/tmp.GxKQcVkGal'.
(PID 35527) Back from recursing down `dir1'.

From the output, we can observe that tmpreaper attempts to remove the dir1/tmp.GxKQcVkGal and dir2/tmp.urtamGpmAq files. This matches our expectations.

Finally, let’s see this in action by calling the tmpreaper command without the --test option:

$ tmpreaper 4d /root/old_files/
$ find old_files/ -type f -atime +4
# no output -> zero files with last accessed time more than 4d

It looks like we’ve nailed this!

8. Using agedu

Another approach to deleting stale files is using the agedu utility to scan a directory conditionally based on the last modification timestamp of files. After that, we can use the rm command to delete the files that match the criteria.

First, we need to install the agedu utility as it doesn’t come preinstalled:

$ apt-get install agedu

Next, let’s scan the old_files/ directory using the --age and --mtime criteria and create an index for only those files that are older than N=4 days:

$ agedu --scan old_files/ --age +4d --mtime --depth 5
Built pathname index, 8 entries, 767 bytes of index
Faking directory atimes
Building index
Final index file size = 1432 bytes

On completion of the scan, agedu generates an index file, agedu.dat, that records the matching files. Let’s use the agedu utility with the --dump and --file options to print the content of the index file to stdout:

$ agedu --dump --file agedu.dat
agedu dump file. pathsep=2f
4096 1682508129 /root/old_files
4096 1682503578 /root/old_files/dir1
0 1681542000 /root/old_files/dir1/tmp.GxKQcVkGal
4096 1682503585 /root/old_files/dir2
0 1681974000 /root/old_files/dir2/tmp.urtamGpmAq
4096 1682499231 /root/old_files/dir3
4096 1682499231 /root/old_files/dir4
4096 1682499231 /root/old_files/dir5

We must note that the agedu command groups the indexed files under their parent directories. As a result, we also see directory names in the output.

Lastly, we can use the awk command to extract the file paths from the third column and then use the xargs command to delete them:

$ agedu --dump --file agedu.dat \
| awk 'NR>1{print $3}' \
| xargs -I{} sh -c 'test -f {} && echo {} && rm {}'
/root/old_files/dir1/tmp.GxKQcVkGal
/root/old_files/dir2/tmp.urtamGpmAq

That’s it! We removed the stale files successfully. Further, we must note that we used {} as a placeholder for each path produced by the awk command.

9. Scheduling Auto-deletion

By now, we’ve learned multiple approaches to deleting files older than N days and verified that they work as expected. So, it’s time to automate the process to ensure the good health of the filesystem. To automate, we can use the crontab -e command to add a cron job with a daily execution frequency.

Let’s see the cron expressions for the different approaches that we’ve explored so far:

@daily find ~/old_files/ -type f -mtime +4 | xargs -n1 rm
@daily find ~/old_files/ -type f -atime +4 -exec rm {} \;
@daily find ~/old_files/ -type f -mtime +4 -delete
@daily logrotate -f ~/delete-old-files.conf
@daily tmpreaper 4d ~/old_files/
@daily agedu --scan ~/old_files/ --age +4d --mtime --depth 5 && agedu --dump --file agedu.dat | awk 'NR>1{print $3}' | xargs -I{} sh -c 'test -f {} && rm {}'

Lastly, we must remember that, for smooth execution and to avoid possible conflicts or race conditions, we must choose only one approach.
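Additionally, if we want extra protection against overlapping runs, each entry can be wrapped with flock so that a new run is skipped while a previous one still holds the lock. The lock file path below is an arbitrary choice:

```shell
# Hypothetical crontab entry: flock -n skips the run if the lock is taken
@daily flock -n /tmp/old-files-cleanup.lock find ~/old_files/ -type f -mtime +4 -delete
```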

10. Conclusion

In this article, we learned how to delete files older than N days based on the file’s last accessed and modified timestamps.

Among these, the most standard approach involved using the find command with one of its actions, such as -exec or -delete, or executing the rm command for each file via xargs. Furthermore, we explored a few interesting utilities, such as logrotate, tmpreaper, and agedu, for cleaning up stale files.

Lastly, we learned about the cron expressions to automate the cleanup process by setting up cron jobs for all the approaches.