1. Overview
In this tutorial, we’ll look at the “argument list too long” problem, often encountered while working with a large number of files. First, we’ll discuss what’s causing it. Then, we’ll discuss a few solutions that will help us to solve this issue.
2. What Causes the Error
Let’s consider a case where we have a large number of files residing within a directory:
$ ls -lrt | wc -l
230086
$ ls -lrt | tail -5
-rw-r--r-- 1 shubh shubh 0 Apr 30 14:02 events2120038.log
-rw-r--r-- 1 shubh shubh 0 Apr 30 14:02 events2120040.log
-rw-r--r-- 1 shubh shubh 0 Apr 30 14:02 events2120039.log
-rw-r--r-- 1 shubh shubh 0 Apr 30 14:02 events2120042.log
-rw-r--r-- 1 shubh shubh 0 Apr 30 14:02 events2120041.log
Here, we have over 230K log files in our directory (the count above also includes the “total” line that ls -l prints first). Let’s try to get the count of all filenames that start with the string ‘events’:
$ ls -lrt events* | wc -l
-bash: /usr/bin/ls: Argument list too long
0
Notably, the command fails, citing “Argument list too long” as the reason. Let’s try the rm command to get rid of these files:
$ rm -rf events*.log
-bash: /usr/bin/rm: Argument list too long
Again, the command fails for the same reason.
While performing filename expansion, Bash replaces the asterisk (*) pattern with the names of all matching files. In effect, this produces a very long list of command-line arguments. Bash itself can build this list, but the kernel rejects it when Bash tries to start the external command with it.
The kernel limits the combined size of the arguments and the environment variables that can be passed to a new program. When the expanded file names need more than this space, the exec call fails and Bash reports the error. Note that this space is shared with the environment, so the room actually available for arguments is smaller than the limit itself.
The rm command in the previous example expands to:
$ rm -rf events2120038.log events2120040.log ... events0000001.log
Here, the number of arguments equals the number of matching files. In our case, that’s over 230K file names on a single command line. We can use the getconf command to get the current system limit:
$ getconf ARG_MAX
2097152
The ARG_MAX value is the maximum combined size, in bytes, of the arguments and the environment that the exec family of functions accepts. These limits can also be verified using the xargs command:
$ xargs --show-limits
Your environment variables take up 2504 bytes
POSIX upper limit on argument length (this system): 2092600
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2090096
Size of command buffer we are actually using: 131072
Maximum parallelism (--max-procs must be no greater): 2147483647
The figure of prime interest here is the “POSIX upper limit on argument length”, which is the ARG_MAX value reduced by the size of the environment and some headroom. It may vary from system to system.
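To get a rough idea of how close a pattern comes to the limit, we can add up the bytes it expands to. The following is a minimal sketch, assuming a shell where printf is a builtin (as in Bash), so the expansion itself never reaches the kernel limit. The temporary directory and the three file names are hypothetical stand-ins, and the result is only an estimate, since the kernel also counts the environment and some per-argument overhead:

```shell
# Work in a throwaway directory so the sketch is self-contained
# (hypothetical file names, not the ones from the listing above).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch events1.log events2.log events3.log

# printf is a shell builtin, so this expansion is not passed to execve()
# and cannot fail with "Argument list too long".
# Each name is followed by one NUL byte, which is how arguments are stored.
glob_bytes=$(printf '%s\0' events* | wc -c)
arg_max=$(getconf ARG_MAX)

echo "events* expands to $glob_bytes bytes; ARG_MAX is $arg_max bytes"
if [ "$glob_bytes" -ge "$arg_max" ]; then
    echo "a single command line would not fit"
fi

# Clean up the throwaway directory.
cd / && rm -rf "$workdir"
```

If the first number approaches the second, a single command line with that pattern is likely to fail.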
3. Overcoming the Limitation
Let’s dive into various approaches we can utilize to solve this problem. What all the proposed solutions have in common is that they avoid handing the entire expanded file list to a single external command.
3.1. Using the find Command
We can let the find command generate the list of files and then pass it on using either the -exec option or the xargs command:
$ find . -iname "events*" | xargs ls -lrt | wc -l
230085
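First, we fetch the list of all files whose names start with the word “events” using the find command (-iname matches case-insensitively, and find also descends into subdirectories). Then, xargs reads the list from stdin and runs ls as many times as needed, each time with a batch of names that fits within the limit. Finally, wc counts the output lines. Since ls runs once per batch, the -rt sort order applies within each batch only, which doesn’t matter for counting. Note that plain xargs splits its input on whitespace, so this pipeline assumes file names without spaces or newlines.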
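The -exec route and a whitespace-safe xargs variant can be sketched as follows. This is an illustration under assumptions, not the only way to do it: the temporary directory and file names are hypothetical, and -print0 and -0 are extensions available in GNU and BSD tools rather than part of POSIX:

```shell
# Throwaway directory with hypothetical file names, one containing a space.
workdir=$(mktemp -d)
touch "$workdir/events1.log" "$workdir/events 2.log" "$workdir/other.txt"

# -exec ... {} + makes find itself group the names into batches that fit the limit.
count_exec=$(find "$workdir" -type f -name 'events*' -exec ls -l {} + | wc -l)

# -print0 / -0 separate names with NUL bytes, so spaces in names are preserved.
count_xargs=$(find "$workdir" -type f -name 'events*' -print0 | xargs -0 ls -l | wc -l)

echo "counted via -exec: $count_exec, via xargs -0: $count_xargs"

# Deleting works the same way: rm runs once per batch, not once per file.
find "$workdir" -type f -name 'events*' -exec rm -f -- {} +
remaining=$(find "$workdir" -type f | wc -l)
echo "files left: $remaining"

# Clean up the throwaway directory.
rm -rf "$workdir"
```

Both counts cover the two matching files, and only other.txt survives the deletion.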
3.2. Using the for Loop Approach
Another interesting approach is to iterate on the files using the for loop:
$ for f in events*; do echo "$f"; done | wc -l
230085
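This is one of the simplest techniques to solve the issue. It works because for and echo are shell builtins, so the expanded list is never passed to an external program. Note that this solution can be a bit slower, though, especially if the loop body starts an external command for each file.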
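The loop can also do the counting itself, without a pipeline. Here is a minimal sketch that uses only shell builtins inside the loop; the temporary directory and file names are hypothetical:

```shell
# Throwaway directory with three hypothetical files.
workdir=$(mktemp -d)
touch "$workdir/events1.log" "$workdir/events2.log" "$workdir/events3.log"

count=0
for f in "$workdir"/events*; do
    # If nothing matches, the pattern is left unexpanded; skip that literal string.
    [ -e "$f" ] || continue
    count=$((count + 1))   # shell arithmetic, no external process is started
done
echo "$count"

# Clean up the throwaway directory.
rm -rf "$workdir"
```

Replacing the arithmetic with a command such as rm -f -- "$f" would start one process per file, which is where the slowdown comes from.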
3.3. Manual Split
We can split the files into smaller bunches and execute the commands (such as rm, cp, mv, wc, ls) repeatedly with a different set of strings as arguments each time:
$ ls -lrt events1*.log | wc -l
31154
$ ls -lrt events2*.log | wc -l
15941
Here, we first list only the file names starting with “events1”. Then, we do the same with those starting with “events2”, and so on.
This works only as long as every subset stays within the space defined by the ARG_MAX value, as is the case in this particular example.
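Typing one command per prefix gets tedious, so the split can be driven by a loop. The sketch below assumes that a digit always follows “events” in the file names and that each per-digit subset fits within the limit; the temporary directory and file names are hypothetical:

```shell
# Throwaway directory with hypothetical file names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch events1001.log events1002.log events2001.log events9001.log

total=0
for digit in 0 1 2 3 4 5 6 7 8 9; do
    # Expand the pattern into the positional parameters and skip empty buckets,
    # so ls is never handed an unexpanded pattern.
    set -- events"$digit"*.log
    [ -e "$1" ] || continue
    n=$(ls -l "$@" | wc -l)
    echo "events$digit*: $n"
    total=$((total + n))
done
echo "total: $total"

# Clean up the throwaway directory.
cd / && rm -rf "$workdir"
```

If a single bucket is still too large, the prefix has to be made longer (for example, two digits instead of one).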
3.4. When We Just Need to Remove the Content of a Directory
Consider a case where we are trying to get rid of all files in a directory and it fails:
$ rm -rf *
-bash: /usr/bin/rm: Argument list too long
To tackle this problem, we can alternatively just delete the directory and create it again:
$ rm -rf /home/shubh/tempdir/logs_archive
$ cd /home/shubh/tempdir && mkdir logs_archive
In this case, the logs_archive directory contained the files we wanted to delete.
Note that since we’re deleting the directory and creating it again, this approach won’t preserve the original permissions or ownership of the directory.
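If the permissions and ownership of the directory matter, one alternative is to empty it in place rather than recreate it. This is a sketch of a different technique from the one above, assuming a find implementation that supports the -mindepth and -delete extensions (GNU and BSD find do; POSIX doesn’t require them). The temporary directory stands in for logs_archive:

```shell
# Hypothetical stand-in for the logs_archive directory, with a non-default mode.
target=$(mktemp -d)
chmod 750 "$target"
touch "$target/events1.log" "$target/events2.log"
mkdir "$target/sub" && touch "$target/sub/events3.log"

# -mindepth 1 excludes the directory itself; -delete removes entries depth-first,
# so subdirectories are emptied before they are removed.
find "$target" -mindepth 1 -delete

left=$(find "$target" -mindepth 1 | wc -l)
mode=$(ls -ld "$target" | cut -c1-10)
echo "$left entries left, mode $mode"

# Clean up the now-empty stand-in directory.
rmdir "$target"
```

Since find deletes the entries itself, no argument list is built at all, and the directory keeps its original mode and owner.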
4. Conclusion
In this tutorial, we looked at multiple techniques to address the “argument list too long” issue.
First, we discussed what’s causing this error. Then, we learned various solutions that can be utilized to solve the problem both in general and in particular cases.