1. Introduction
Generating a large number of files can be helpful when testing functions, performance, and limits. Having a fresh batch ready quickly can be important for streamlining continuous integration or simply not wasting time.
In this tutorial, we explore several fast ways to create lots of files. First, we implement a solution with classic shell loops. After that, we turn to common interpreters. Finally, we consider an approach built around standard POSIX utilities.
To compare the methods, we clear all buffers and create one hundred thousand (100000) files with every method. In each case, we compare the real (wall-clock) value reported by the time command.
We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments, although the constructs marked as Bash-specific require Bash or a compatible shell.
2. Using Shell Loops
Like other major languages, Bash has for and while loops. In fact, we can use either to create many files using Bash alone.
2.1. Bash for Loop
The Bash-specific arithmetic for loop sets the number of iterations via a control expression. We use its $i counter variable as the filename:
$ for ((i=1; i<=100000; i++)); do : >> "$i"; done
Within the loop body, we create a file by redirecting the (empty) output of the : (colon) null utility. Because we append with >>, an existing file with the same name keeps its contents.
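To see why the append redirection is a safe choice here, the following minimal sketch compares >> and > on a scratch file. The tmpdir, f, kept, and size_after names are made up for illustration and aren't part of the original commands:

```shell
# Sketch: compare the two redirections on a throwaway file.
tmpdir=$(mktemp -d)            # scratch directory (hypothetical name)
f="$tmpdir/demo"

: >> "$f"                      # file doesn't exist yet, so it's created empty
printf 'data\n' > "$f"         # give the file some content
: >> "$f"                      # appending nothing leaves the content intact
kept=$(cat "$f")

: > "$f"                       # a plain > truncates the file instead
size_after=$(wc -c < "$f")

echo "content after >>: $kept"
echo "bytes after >: $size_after"
rm -r "$tmpdir"
```

So, rerunning the loop in a directory that already holds some of the files doesn't wipe them, which wouldn't be the case with a plain > redirection.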
Let’s check the performance:
$ time { for ((i=1; i<=100000; i++)); do : >> "$i"; done; }
real 0m3.405s
user 0m1.113s
sys 0m2.244s
At 3.405 s, we can take this as a baseline.
2.2. POSIX while
POSIX defines the while loop construct and behavior. By introducing a control variable and incrementing it, we can make a solution similar to the one with for:
$ i=1; while [ "$i" -le 100000 ]; do : >> "$i"; i=$(($i + 1)); done
Naturally, the times are comparable:
$ time { i=1; while [ "$i" -le 100000 ]; do : >> "$i"; i=$(($i + 1)); done; }
real 0m3.516s
user 0m1.403s
sys 0m2.084s
Considering the minor fluctuations between runs, the results of both shell loops are effectively identical. However, there is room for improvement.
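For repeated experiments, it can be handy to wrap the loop in a function and point it at a scratch directory. The snippet below is only a sketch under a few assumptions: make_files, scratch, and count are hypothetical names, and the count is reduced to 1000 to keep the run short:

```shell
# Sketch: the POSIX while loop as a reusable function.
make_files() {
  # $1 = number of files to create, $2 = target directory
  n=1
  while [ "$n" -le "$1" ]; do
    : >> "$2/$n"               # same null-utility redirection as before
    n=$((n + 1))
  done
}

scratch=$(mktemp -d)           # keeps the current directory clean
make_files 1000 "$scratch"
count=$(ls "$scratch" | wc -l) # verify how many files we got
echo "created $count files in $scratch"
rm -r "$scratch"
```

Since the function only relies on POSIX constructs, it should also run under shells like dash, not just Bash.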
3. Using Interpreted Programming Languages
Most general-purpose interpreters can help with our task. Although compiled languages like C might be marginally more efficient, their added complexity, compilation step, and much lower flexibility exclude them from our methods.
3.1. Perl
The perl interpreter is itself written in C, so using it in the correct way provides many performance benefits without the same drawbacks.
Using perl, we can create the same type of for loop we had with Bash earlier:
$ perl -e 'for ($i=1;$i<=100000;$i++) { open($f, ">", "$i"); }'
Here, we use the Perl open() built-in function with the ">" write mode, which creates each file or truncates it if it already exists.
Let’s time this solution:
$ time perl -e 'for ($i=1;$i<=100000;$i++) { open($f, ">", "$i"); }'
real 0m2.260s
user 0m0.354s
sys 0m1.874s
At 2.260 s, we shaved more than a second off our previous best. Of this time, launching the interpreter takes up around 0.003 s.
3.2. Python
In Python, for loops look different from those in Bash:
$ python -c 'for i in range(100000): open(str(i), "w");'
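Here, we use the built-in range() function to get all values for our filenames. Notably, range(100000) yields the numbers 0 through 99999, so the names are shifted by one compared to the other methods, but the file count stays the same. For each value, we call the built-in Python open() function to create the file.
Timing this, we get a result comparable to that of Perl: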
$ time python -c 'for i in range(100000): open(str(i), "w");'
real 0m2.154s
user 0m0.274s
sys 0m1.850s
Launching Python takes around 0.010 s. One other overhead we might consider is the conversion of i to a string with str(). However, the "$i" interpolation in Perl implicitly does the same due to loose variable types.
3.3. Ruby
Finally, Ruby has its own range syntax for iteration, which resembles the Bash {a..b} brace expansion:
$ ruby -e '(1..100000).each { |i| File.open(i.to_s, "w") }'
Each number in the range is passed to the block as i, which we convert to a string with to_s. This makes the code similar to that of Perl and Python:
$ time ruby -e '(1..100000).each { |i| File.open(i.to_s, "w") }'
real 0m2.207s
user 0m0.457s
sys 0m1.912s
So, the similar execution time is not a surprise. Still, Ruby takes around 0.060 s to launch, making it the heaviest of the three interpreters.
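One way to approximate the launch cost of each interpreter is to time a program that does nothing. The sketch below assumes Bash (for its time keyword) and uses a hypothetical startup_time helper that skips any interpreter that isn't installed. On systems that only ship python3, that name can be substituted for python:

```shell
# Sketch: time a no-op program in each interpreter, if it's available.
startup_time() {
  # $1 = interpreter name, remaining arguments = a no-op program for it
  if command -v "$1" >/dev/null 2>&1; then
    echo "== $1"
    time "$@"                  # the real value approximates the launch cost
    ran=$((ran + 1))
  else
    echo "== $1 not found, skipped"
    skipped=$((skipped + 1))
  fi
}

ran=0
skipped=0
startup_time perl -e 1
startup_time python -c pass
startup_time ruby -e 1
echo "timed: $ran, skipped: $skipped"
```

The exact figures depend on the machine and on whether the binaries are already cached, so they're best read as rough orders of magnitude.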
4. Using POSIX Commands
In fact, the shell can compete with the times above by combining standard POSIX utilities:
$ printf '%s ' {1..100000} | xargs touch
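In this case, we generate an argument list with printf and brace expansion. Notably, the {1..100000} brace expansion is a Bash feature rather than part of POSIX, while printf, xargs, and touch are all standard utilities. After that, xargs passes the list to touch, packing as many names as fit into each invocation instead of calling touch once per file.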
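To make the way xargs groups its input visible, we can force tiny batches with the standard -n option and print the resulting commands with echo instead of running them. This is just an illustrative sketch with five names, and the batches and batch_count variables are made up for the example:

```shell
# Sketch: show how xargs splits its input into separate command invocations.
batches=$(printf '%s ' 1 2 3 4 5 | xargs -n 2 echo touch)
echo "$batches"
# prints:
# touch 1 2
# touch 3 4
# touch 5

batch_count=$(echo "$batches" | wc -l)   # number of commands xargs would run
echo "invocations: $batch_count"
```

Without -n, xargs fills each command line up to a system-dependent size limit, so the number of touch processes stays far below the number of files.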
Because these minimalistic utilities adhere to the UNIX philosophy of one tool for one job, we get a very efficient outcome despite the multiple processes involved:
$ time printf '%s ' {1..100000} | xargs touch
real 0m2.241s
user 0m0.287s
sys 0m1.898s
At 2.241 s, the time is on par with that of the interpreters we looked at earlier.
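The pipeline above still relies on Bash brace expansion to produce the numbers. As a sketch of a variant that sticks to POSIX utilities, awk can generate the sequence instead. The scratch directory, the total variable, and the reduced count of 1000 are assumptions for illustration, and this variant isn't part of the timings above:

```shell
# Sketch: generate the names with awk instead of Bash brace expansion.
scratch=$(mktemp -d)
(
  cd "$scratch" || exit 1      # subshell, so the caller's directory is unchanged
  awk 'BEGIN { for (i = 1; i <= 1000; i++) print i }' | xargs touch
)
total=$(ls "$scratch" | wc -l) # verify how many files we got
echo "created $total files in $scratch"
rm -r "$scratch"
```

Since the names come through a pipe rather than the command line of a single process, the argument list never has to fit into one invocation, which xargs takes care of by batching.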
5. Summary
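In this article, we looked at different ways to generate many files: shell loops, the Perl, Python, and Ruby interpreters, and a pipeline of standard utilities.
In conclusion, even without additional interpreters, we can generate lots of files quickly and efficiently with standard POSIX tooling driven by the shell.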