1. Overview
In Bash, functions usually execute within the same shell process from which they’re called. Nonetheless, some situations cause a function to run as a subprocess.
In this tutorial, we’ll learn how functions behave within the context of the calling process. We’ll also understand the cases in which a function runs as a subprocess.
2. Functions and Subprocesses
By default, a function in Bash runs in the same context as the shell process that invoked it. So, if we call a function from the current shell or from within a script, it shares the same environment variables and resources as the current shell or script, respectively.
This is very different from how a subprocess behaves. Generally, functions don’t spawn a subshell and they don’t run as standalone processes.
To understand this point, let’s create a script named function_scope.sh:
$ cat function_scope.sh
#!/usr/bin/env bash
my_function() {
  var="Function Variable"
  echo "Inside the function: var = $var"
  echo "Inside the function: PID = $BASHPID"
}
my_function
echo "Outside the function: var = $var"
echo "Outside the function: PID = $BASHPID"
The script begins with a shebang directive and implements several steps:
- define a function named my_function()
- set a variable named var inside the function and print its value in addition to the current process ID via the BASHPID shell variable
- call the function from within the script
- print the value of the var variable and the current process ID outside the function
Notably, we don’t set var as a local variable. This means that var should remain accessible after the function call since we don’t expect the function to run as a subprocess.
Let’s grant the script execute permissions using chmod:
$ chmod u+x function_scope.sh
Finally, we run the script:
$ ./function_scope.sh
Inside the function: var = Function Variable
Inside the function: PID = 2476
Outside the function: var = Function Variable
Outside the function: PID = 2476
Importantly, the value of the var variable remains the same after the function call. Likewise, BASHPID reports the same process ID, 2476, both inside and outside the function. This shows that no subshell is spawned: the function runs in the same process as the script.
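One detail worth noting is why these scripts print BASHPID rather than the more familiar $$ parameter. In Bash, $$ keeps expanding to the process ID of the main shell even inside a subshell, whereas BASHPID reflects the process that's actually running. The following is a minimal sketch of that difference; the variable names are chosen only for this illustration:

```shell
#!/usr/bin/env bash
# Sketch: compare $$ and $BASHPID in the main shell and in a subshell.

main_pid=$$
main_bashpid=$BASHPID

# Command substitution runs in a subshell, so BASHPID changes there,
# while $$ still expands to the process ID of the main shell.
sub_pid=$(echo "$$")
sub_bashpid=$(echo "$BASHPID")

echo "Main shell: \$\$ = $main_pid, BASHPID = $main_bashpid"
echo "Subshell:   \$\$ = $sub_pid, BASHPID = $sub_bashpid"
```

When we run this sketch, the two $$ values match, while the BASHPID values differ. This is why BASHPID is the more reliable indicator when we check whether a function runs as a subprocess.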
3. Running a Function as a Subprocess
If a function were to run as a subprocess, the changes made inside it, such as setting the var variable in our previous example, wouldn’t be accessible by the parent process. Also, the process ID of the subshell would be different from that of the parent process.
Although functions don’t normally run as subprocesses, there are several situations where a function requires a subprocess to run:
- the function’s body is defined within a subshell
- the function is defined entirely inside a subshell
- the function runs in the background
- the function runs in a pipeline
In the first two cases, it’s the subshell, not the function, which spawns a new process. Likewise, running a function in the background using the ampersand (&) sign causes it to run asynchronously as a subprocess. Moreover, functions and commands that run in a pipeline execute as subprocesses as this is the expected behavior of a pipeline.
Let’s explore each of these cases.
3.1. Defining the Function Body Within a Subshell
We can run a function as a subprocess by introducing a slight modification to its definition. In particular, by enclosing the function’s body within parentheses instead of curly braces, we invoke a subshell. This way, the function’s content runs within the subshell as a separate process.
Let’s create a new script named subshell_function.sh that implements this change:
$ cat subshell_function.sh
#!/usr/bin/env bash
my_function() (
  var="Function Variable"
  echo "Inside the function: var = $var"
  echo "Inside the function: PID = $BASHPID"
)
my_function
echo "Outside the function: var = $var"
echo "Outside the function: PID = $BASHPID"
Notably, the only difference in this script compared to function_scope.sh is that the body of my_function() is defined within a subshell. Therefore, the var variable appearing inside the function won’t be accessible outside of it.
Again, we grant the script execute permissions:
$ chmod u+x subshell_function.sh
Now, let’s run the script:
$ ./subshell_function.sh
Inside the function: var = Function Variable
Inside the function: PID = 2842
Outside the function: var =
Outside the function: PID = 2841
In this case, var isn’t defined outside the scope of the function. Also, the process ID of the spawned shell is 2842, whereas that of the parent process or script is 2841.
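The same isolation applies to other side effects, not only to variables. As a minimal sketch, assuming a hypothetical function named work_in_root(), we can change the working directory inside a parenthesized body and confirm that the calling script stays where it was:

```shell
#!/usr/bin/env bash
# Sketch: a parenthesized body confines side effects such as cd to the subshell.
# work_in_root is a hypothetical name used only for this illustration.

work_in_root() (
  cd / || exit 1   # exit here would end only the subshell, not the script
  echo "Inside the function: PWD = $PWD"
)

start_dir=$PWD
work_in_root
end_dir=$PWD

echo "Outside the function: PWD = $end_dir"
```

With curly braces instead of parentheses, the cd command would change the working directory of the calling script as well.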
3.2. Running a Function Inside a Subshell
Another situation where the function’s context differs from that of the calling script is when we define the function entirely inside a subshell. The subshell creates a separation between the function and the rest of the script.
To illustrate this point, let’s set up a script named function_in_subshell.sh:
$ cat function_in_subshell.sh
#!/usr/bin/env bash
(
  my_function() {
    var="Function Variable"
    echo "Inside the function: var = $var"
    echo "Inside the function: PID = $BASHPID"
  }
  my_function
  echo "Inside the subshell: var = $var"
  echo "Inside the subshell: PID = $BASHPID"
)
echo "Outside the subshell: var = $var"
echo "Outside the subshell: PID = $BASHPID"
In this script, we wrap my_function() within a subshell indicated by parentheses. Additionally, we print the value of var and the current process ID at three locations sequentially:
- inside the function
- inside the subshell after calling the function
- outside the subshell
Next, we grant the script execute permissions:
$ chmod u+x function_in_subshell.sh
Finally, we run the script:
$ ./function_in_subshell.sh
Inside the function: var = Function Variable
Inside the function: PID = 2883
Inside the subshell: var = Function Variable
Inside the subshell: PID = 2883
Outside the subshell: var =
Outside the subshell: PID = 2882
In this case, var and the process ID keep the same values inside the function and in the rest of the subshell. However, outside the subshell, the var variable isn’t accessible, and the process ID of the parent process, 2882, differs from that of the subshell, which is 2883.
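The separation covers the function definition itself, not only the variables it sets. The sketch below uses a hypothetical function named helper_in_subshell() and the declare -F builtin, which returns a non-zero status when the given name isn't a defined function, to check where the definition is visible:

```shell
#!/usr/bin/env bash
# Sketch: a function defined inside a subshell doesn't exist in the parent shell.
# helper_in_subshell is a hypothetical name used only for this illustration.

(
  helper_in_subshell() { echo "Hello from the subshell"; }
  if declare -F helper_in_subshell > /dev/null; then
    echo "Inside the subshell: helper_in_subshell is defined"
  fi
)

# Back in the parent shell, the definition is gone.
if declare -F helper_in_subshell > /dev/null; then
  visible_outside=yes
else
  visible_outside=no
fi

echo "Outside the subshell: helper_in_subshell defined? $visible_outside"
```

So, calling the function after the closing parenthesis would fail with a command-not-found error, since the parent shell never saw the definition.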
3.3. Running a Function in the Background
Generally, commands that run in the background execute as separate, standalone processes. Therefore, by placing a function in the background, we make sure it runs as a subprocess.
Let’s create a script named function_in_background.sh which defines a function, runs it in the background, and waits for it to complete:
$ cat function_in_background.sh
#!/usr/bin/env bash
my_function() {
  var="Function Variable"
  echo "Inside the function: var = $var"
  echo "Inside the function: PID = $BASHPID"
}
var="Initial Value"
echo "Before function call: var = $var"
echo "Before function call: PID = $BASHPID"
my_function &
wait
echo "After function call: var = $var"
echo "After function call: PID = $BASHPID"
The script carries out several steps:
- define a function named my_function()
- declare the var variable with an initial value that’s different from that set within my_function()
- print the value of var and the current process ID
- invoke the function with & appended to it, forcing the function to run as a separate process in the background
- wait for the function to complete
- print the value of var and the current process ID after the function call
Next, we grant the script execute permissions:
$ chmod u+x function_in_background.sh
Then, let’s run the script:
$ ./function_in_background.sh
Before function call: var = Initial Value
Before function call: PID = 2986
Inside the function: var = Function Variable
Inside the function: PID = 2987
After function call: var = Initial Value
After function call: PID = 2986
Here, var and the process ID differ only inside the function call. The changes made by the background function aren’t reflected in the rest of the script. The function runs in a separate process with a context that’s different from that of the main script.
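Since variables set by a background function don't reach the script, the parent has to rely on other channels. As a minimal sketch, we can compare the $! parameter, which holds the process ID of the most recent background job, with the BASHPID value seen inside the function, and collect the function's exit status through wait. The temporary file and the return value 3 are assumptions made only for this illustration:

```shell
#!/usr/bin/env bash
# Sketch: link a background function to its process ID and exit status.
# The temporary file and the status value 3 are illustrative assumptions.

pid_file=$(mktemp)

my_function() {
  echo "$BASHPID" > "$pid_file"   # record the PID of the background process
  return 3
}

my_function &
bg_pid=$!                         # PID of the background job, as seen by the parent

# wait with a PID returns the exit status of that process
if wait "$bg_pid"; then
  bg_status=0
else
  bg_status=$?
fi

func_pid=$(cat "$pid_file")
rm -f "$pid_file"

echo "PID reported by \$!: $bg_pid"
echo "PID seen inside the function: $func_pid"
echo "Exit status collected by wait: $bg_status"
```

The two process IDs match, which confirms that the subprocess created by & is the one executing the function body.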
3.4. Running a Function in a Pipeline
Commands that run as part of a pipeline execute as subprocesses. Therefore, when we call a function within a pipeline, it runs in a subshell, and any variables it sets aren’t visible to the rest of the script.
Let’s show an example in a script named function_in_pipeline.sh:
$ cat function_in_pipeline.sh
#!/usr/bin/env bash
read_lines() {
  var="Start"
  echo "Inside the function: var = $var"
  # IFS= and -r keep leading whitespace and backslashes intact
  while IFS= read -r line; do
    echo "Line Processed: $line"
  done
  var="End"
  echo "Inside the function: var = $var"
  echo "Inside the function: PID = $BASHPID"
}
echo -e "Line 1\nLine 2\nLine 3" > file.txt
cat file.txt | read_lines | tee output.txt
echo "Outside the function: var = $var"
echo "Outside the function: PID = $BASHPID"
The script implements several steps:
- define a function named read_lines() that reads input lines in a while loop and then prints the value of var and the current process ID
- create a file named file.txt with three lines of text
- use the cat command to read data from file.txt and pipe the result to the read_lines() function
- use the tee command to complete the pipeline and save the result to a file named output.txt while showing the output on stdout
- print the value of var and the current process ID outside the function
Let’s grant the script execute permissions:
$ chmod u+x function_in_pipeline.sh
Finally, let’s run the script:
$ ./function_in_pipeline.sh
Inside the function: var = Start
Line Processed: Line 1
Line Processed: Line 2
Line Processed: Line 3
Inside the function: var = End
Inside the function: PID = 3048
Outside the function: var =
Outside the function: PID = 3046
As we can see from the output, var is printed twice with different values while read_lines() executes within the pipeline. However, the variable isn’t set outside the function after the pipeline completes. Therefore, modifications made inside the function don’t persist when the function is part of a pipeline, and they don’t affect variables in the scope of the main script.
Moreover, the process ID differs inside and outside the function. This shows that read_lines() runs in a subshell with process ID 3048, whereas the script has process ID 3046.
In summary, since the read_lines() function is part of a pipeline, it runs in a subshell. By default, Bash runs each command of a pipeline as a subprocess, so this behavior is an attribute of the pipeline itself rather than of the function.
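To see that the pipeline, and not the function, is responsible for the subshell, we can feed the same function through a pipe and then through a plain input redirection. The following is a minimal sketch, assuming a hypothetical count_lines() function and Bash's default settings (the lastpipe shell option turned off):

```shell
#!/usr/bin/env bash
# Sketch: the same function loses its changes in a pipeline but keeps them
# with input redirection. count_lines is a hypothetical name; the default
# Bash settings (lastpipe off) are assumed.

count=0

count_lines() {
  while IFS= read -r _; do
    count=$((count + 1))
  done
}

# In a pipeline, count_lines runs in a subshell, so count stays unchanged here.
printf 'Line 1\nLine 2\nLine 3\n' | count_lines
count_after_pipe=$count

# With a here-document, count_lines runs in the current shell.
count_lines <<'EOF'
Line 1
Line 2
Line 3
EOF
count_after_redirect=$count

echo "After the pipeline: count = $count_after_pipe"
echo "After the redirection: count = $count_after_redirect"
```

Consequently, when a function needs to update variables of the main script, redirecting its input from a file or a here-document is usually a safer choice than placing it in a pipeline.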
4. Conclusion
In this article, we learned that Bash functions normally don’t run as subprocesses. However, there are exceptions where a function requires a subprocess to run.
In particular, when we enclose the function’s body with parentheses instead of curly braces, we invoke the function as a subprocess. Likewise, we can wrap the entire function in a subshell, and this separates the function’s context from its surrounding environment. Moreover, running a function in the background or as part of a pipeline forces the function to run as a subprocess.