#5: Shell Advanced
Hopefully you’ll have built up a decent understanding of the Linux shell, and the underlying operating system, but there are a few extra tools and tricks to learn that can help you maximise your terminal efficiency and knowledge. This lecture will cover advanced redirection, job control, multiplexing, aliases, advanced scripting, and GNU parallel.
Advanced redirection
After playing around with pipes and redirection, you may find that some output isn’t filtered by programs such as `grep`, or is still printed to the terminal even though it is redirected with `>`. Consider the following example:
```bash
cat /non/existent/file > error.txt
# cat: /non/existent/file: No such file or directory
```
Here the error output of `cat` is not redirected to `error.txt` and is printed to the terminal instead. This is because `>` only redirects STDOUT by default.
The default file descriptors and their data streams are as follows:
| File descriptor | Name | Standard stream |
|---|---|---|
| 0 | STDIN | Program input |
| 1 | STDOUT | Program output |
| 2 | STDERR | Program errors |
Standard streams 1 and 2 can be redirected by putting their ID before `>`. Consider the same example as earlier, but with the error stream redirected: `cat /non/existent/file 2> error.txt`. This time the error output is redirected to `error.txt`, and nothing is printed to the terminal. STDIN (fd 0) is an input stream and is redirected with `<` instead, so with `>` you’ll mostly be redirecting STDOUT and STDERR.
If a command produces output that you don’t want to save or read, you can redirect that stream to the device file `/dev/null`, which discards any data written to it.
As an example, run `echo test`:
```bash
$ echo test
test
```
then run `echo test >/dev/null`:
```bash
$ echo test >/dev/null
```
and notice that nothing is printed in the latter example. This also works with STDERR: `cat /non/existent/file 2>/dev/null` does not print an error.
Using this, data streams can be split, for example with `python3 -c "print('stdout goes here'); raise Exception('stderr goes here')" >stdout 2>stderr`.
The command might look complicated, but all you need to know is that the `python3` command writes data to both STDOUT and STDERR. `>stdout` redirects the STDOUT of the program to a file called `stdout`, and `2>stderr` redirects the STDERR of the program to a file called `stderr`.
Example output:
```bash
$ python3 -c "print('stdout goes here'); raise Exception('stderr goes here')" >stdout 2>stderr
$ cat stdout
stdout goes here
$ cat stderr
Traceback (most recent call last):
  File "<string>", line 1, in <module>
Exception: stderr goes here
```
Note: the syntax `&>` can be used to redirect both STDOUT and STDERR. For example, `python3 -c "print('stdout goes here'); raise Exception('stderr goes here')" &>output.txt` redirects all data to `output.txt`.
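Example output:
```bash
$ python3 -c "print('stdout goes here'); raise Exception('stderr goes here')" &>output.txt
$ cat output.txt
Traceback (most recent call last):
  File "<string>", line 1, in <module>
Exception: stderr goes here
stdout goes here
```
The traceback appears before `stdout goes here` because Python buffers STDOUT when it writes to a file and only flushes that buffer when the program exits. By then the traceback has already been written to STDERR.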
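If Python isn’t to hand, you can reproduce the same experiments with a small shell function. The sketch below uses a hypothetical function `both` that writes one line to each stream. It writes its files into a fresh temporary directory, so nothing in the current directory is touched.

```bash
#!/usr/bin/env bash
# both is a hypothetical helper: one line to STDOUT, one line to STDERR
both() {
    echo "stdout goes here"
    echo "stderr goes here" >&2
}

dir=$(mktemp -d)    # scratch directory for the output files

both 2>/dev/null                       # only the STDOUT line reaches the terminal
both >"$dir/stdout" 2>"$dir/stderr"    # each stream goes to its own file
both &>"$dir/output.txt"               # both streams go to the same file

cat "$dir/stdout"        # stdout goes here
cat "$dir/stderr"        # stderr goes here
cat "$dir/output.txt"    # both lines, in the order they were written
```

Because `echo` is a shell builtin that writes immediately, there is no buffering here. The two lines in `output.txt` therefore appear in the order the function wrote them.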
You might have noticed that piping to `grep` doesn’t filter error messages. This is because a pipe (`|`) only connects the STDOUT of the previous command to the STDIN of `grep`. Error messages sent through STDERR bypass the pipe and go straight to the terminal, so we need to redirect our standard streams to fix this.
If we send the output from STDERR to STDOUT, `grep` will be able to filter it. We can do this with the syntax `2>&1`, which redirects STDERR (fd `2`) to wherever STDOUT (fd `1`) currently points. The `&` before `1` tells the shell that `1` is a file descriptor rather than a file named `1`.
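`2>&1` copies the target that fd 1 has *at that moment*, so the order of redirections matters. A minimal sketch, again using a hypothetical `both` function and a temporary file:

```bash
#!/usr/bin/env bash
# both is a hypothetical helper writing one line to each stream
both() {
    echo "to stdout"
    echo "to stderr" >&2
}
log=$(mktemp)

# STDOUT is pointed at the file first, then STDERR copies that target:
# both lines end up in the file
both >"$log" 2>&1

# STDERR copies STDOUT while it still points at the terminal, and only
# afterwards is STDOUT moved to the file: "to stderr" is still printed
both 2>&1 >"$log"

# Inside a pipeline, 2>&1 sends STDERR into the pipe as well
both 2>&1 | grep stderr
```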
Let’s practise this by running `python3 -c "raise Exception('This goes to STDERR')"`:
```bash
$ python3 -c "raise Exception('This goes to STDERR')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
Exception: This goes to STDERR
```
This runs the Python code `raise Exception('This goes to STDERR')`, which throws an error and prints a traceback showing where it happened. Filtering the output with `python3 -c "raise Exception('This goes to STDERR')" | grep STDERR` doesn’t work, since the Python error output goes to STDERR:
```bash
$ python3 -c "raise Exception('This goes to STDERR')" | grep STDERR
Traceback (most recent call last):
  File "<string>", line 1, in <module>
Exception: This goes to STDERR
```
(Note that the output is unfiltered.)
However, running `python3 -c "raise Exception('This goes to STDERR')" 2>&1 | grep STDERR` does work. It sends the STDERR stream to STDOUT, which `grep` receives through the pipe:
```bash
$ python3 -c "raise Exception('This goes to STDERR')" 2>&1 | grep STDERR
Exception: This goes to STDERR
```
Conversely, STDOUT can be redirected to STDERR so that error messages are reported properly, for example with `echo 'error here!' >&2` (short for `1>&2`).
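A common use of `>&2` is a small helper that sends diagnostics to STDERR, so they don’t mix with output that might be piped elsewhere. The names `log_error` and `divide` below are hypothetical, used only for illustration:

```bash
#!/usr/bin/env bash
# log_error is a hypothetical helper: prefix the message and write it to STDERR
log_error() {
    echo "error: $*" >&2
}

# divide is a hypothetical example: results go to STDOUT, problems to STDERR
divide() {
    if [[ $2 -eq 0 ]]; then
        log_error "cannot divide $1 by zero"
        return 1
    fi
    echo $(( $1 / $2 ))
}

divide 10 2                # result printed on STDOUT
divide 1 0 2>/dev/null || echo "divide failed with status $?"
```

With this split, `divide 10 2 | grep 5` filters only real results, while error messages still reach the terminal.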
Job control
Sometimes you may want a program to run in the background of the shell, so that you can keep running other commands in the foreground. Running `ps` with no other arguments lists the processes running in the current terminal. If you don’t have any processes currently running in the background, your output should look similar to this:
```
  PID TTY          TIME CMD
  344 pts/1    00:00:00 bash
  357 pts/1    00:00:00 ps
```
To run a process in the background, append the `&` (ampersand) character to the command. This turns it into a job. The job number and the system process ID should be printed after running the command, for example `[1] 381`. Try running `sleep 5 &` and then `sleep 5`, and notice that the former lets you keep using the shell while it runs. Running `ps` soon after the backgrounded command should also show the `sleep` process in the process list.
Note: Even when a process runs in the background, its output may still be printed to the terminal. If you don’t want to see the output, consider hiding STDOUT and STDERR by redirecting the program’s output with `&>/dev/null`.
To move a process that is already running into the background, we can suspend it and then resume it in the background. As an example, let’s run `sleep 15` and press `Ctrl+Z` to suspend the program. The job ID should be printed, for example `[1]+`. Then run `bg [JOB ID]` (usually 1). Run `ps` again to see the process running in the background. A better way to monitor jobs in the shell is to run `jobs`, which lists the current jobs alongside their IDs.
To bring a job to the foreground, run `fg [JOB ID]`. Feel free to try the above example again using `fg` instead of `bg`. This also works with processes that are already running in the background.
If you want to kill a job, use `kill %[JOB ID]` or `kill [PROCESS ID]` to terminate it.
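`fg` and `bg` rely on job control, which bash only enables by default in interactive shells. Inside a script, the usual tools are `$!` (the PID of the most recent background process), `kill` and `wait`. A minimal sketch:

```bash
#!/usr/bin/env bash
sleep 30 &              # start a background process
pid=$!                  # $! holds the PID of the most recent background process
echo "started sleep with PID $pid"

jobs                    # the shell still tracks it as a job

kill "$pid"             # send SIGTERM, kill's default signal
status=0
wait "$pid" || status=$?    # wait for it to exit and record its exit status
echo "sleep exited with status $status"   # 128 + 15 (SIGTERM) = 143
```

An exit status above 128 is how bash reports that a process was ended by a signal. The signal number is the status minus 128.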
However, even if a job is backgrounded, closing the terminal sends a hangup signal (SIGHUP) to the shell, which passes it on to its jobs. Applications started from the shell will therefore usually close too. To prevent this, you can use `disown` to remove processes from the shell’s job list. To test this, run `nautilus &` to start the default file manager for Ubuntu in the background (if it is unavailable, any other application with a GUI can be used). Close the shell window and notice that `nautilus` (or your chosen application) closes as well. Next, open your shell window again, run `nautilus &`, then type `disown` to remove the most recent job from the shell’s job list. Close the shell window again and see that `nautilus` stays open.
Note: `disown` can also be given a specific job or process ID, for example `disown 245`, to remove a specific process from the shell’s job list. `disown -a` removes every job.
Multiplexing
Often when using the shell, you might want to use multiple applications in one window. To do this, we can use a terminal multiplexer. Multiplexing allows a user to split a terminal into multiple sections, save and restore sessions, and customise colour themes. One of the most popular terminal multiplexers is `tmux`, and we’ll go through the basics of using it here.
To start, install the `tmux` package.
Note: To start tmux automatically whenever you open your terminal, run `echo -e '\nif command -v tmux &> /dev/null && [ -n "$PS1" ] && [[ ! "$TERM" =~ screen ]] && [[ ! "$TERM" =~ tmux ]] && [ -z "$TMUX" ]; then\n exec tmux\nfi' >> ~/.bashrc`. This appends bash code to the shell startup file, `.bashrc`, which starts a new `tmux` session if one is not already running in the shell. To disable it, delete the 3 lines from `if command -v tmux` to `fi` in `~/.bashrc`.
Operations in `tmux` start with the prefix `Ctrl-B`, written as `C-b` in the `tmux` documentation. Shortcuts can be changed later, but we’ll go over the defaults now. `tmux` commands can be run outside a tmux session by prepending `tmux` to the command, or inside a tmux session after pressing `C-b :` to open the command prompt. The following commands have the `tmux` keyword prepended.
Session Control
To start, run the `tmux` command to create a new `tmux` session. You should now see a bar at the bottom of the screen that says something similar to `[0] 0:bash*`. `tmux` sessions can be listed with `tmux ls`, and you can attach to an existing session with `tmux attach -t [ID]`. To detach from the current session while leaving it running, use `C-b d`. You can create a new session with a chosen name using `tmux new -s [NAME]`, and you can quickly view and switch between sessions with `C-b s` inside a `tmux` session. To kill a `tmux` session, use `tmux kill-session -t [ID/NAME]`, and to kill every tmux session, use `tmux kill-server`.
Window + Pane Management
`tmux` sessions have 1 window by default, shown in the bar at the bottom of the screen. To create more, use `C-b c`. To close the current window, use `C-b &`. You can switch between windows with `tmux select-window -t [ID]`, or by pressing `C-b` followed by the window number.
Each `tmux` window starts with 1 pane, but we can split it up however we like. To split a pane into left and right panes (divided by a vertical line), use `C-b %`. To split it into top and bottom panes, use `C-b "`. You can move between panes with `C-b [ARROW KEY]`. If `set -g mouse on` is enabled in the command prompt or the `tmux` config file, you can also click on a pane to activate it. To close a pane, use `C-b x`.
Note: You can create a config file at `~/.tmux.conf` or, in tmux 3.1 and later, `~/.config/tmux/tmux.conf`. It can contain `tmux` commands such as `set -g mouse on` to save personal preferences and rebind keys.
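As an illustration, a minimal `~/.tmux.conf` might look like the sketch below. The particular key choices (`|`, `-`, `r`) are only personal preferences, not tmux defaults. After editing the file, you can reload it in a running session with `tmux source-file ~/.tmux.conf`.

```
# Enable mouse support: click to select panes, drag borders to resize
set -g mouse on

# Number windows from 1 instead of 0
set -g base-index 1

# Split panes with more memorable keys (still pressed after the C-b prefix)
bind | split-window -h
bind - split-window -v

# Reload this file with C-b r
bind r source-file ~/.tmux.conf \; display-message "Config reloaded"
```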
Here is a useful cheat sheet to refer to while learning `tmux`: https://tmuxcheatsheet.com/
Aliases
If there’s a long command you use often, you can define an alias so it’s faster to type. As the name implies, an alias works as another name for a command. The syntax is `alias [ALIAS_NAME]="[COMMAND]"`. You can place it in a bash startup file to load it in every shell, or run it in a terminal to use it for the current session only.
Let’s alias `aptstall` to `sudo apt install` as an example:
```bash
alias aptstall="sudo apt install"
aptstall cmatrix
# [sudo] password for user:
```
Running `aptstall cmatrix` here will prompt you for your password, since the shell is actually running `sudo apt install cmatrix`. This works for any command and is a quick way to save time in your terminal.
To save your aliases, one recommendation is to create a `~/.bash_aliases` file and add `source ~/.bash_aliases` to your `~/.bashrc` file. Ubuntu’s default `~/.bashrc` already loads `~/.bash_aliases` if the file exists. You can save all of your aliases there, and they will be loaded every time you open a new shell.
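By default, bash expands aliases only in interactive shells, so an alias defined in `~/.bash_aliases` won’t be available inside your scripts. The sketch below turns alias expansion on with `shopt -s expand_aliases` only so that it can run as a script. `ll` is a hypothetical alias name. In a terminal you would type the same `alias`, `type` and `unalias` commands directly.

```bash
#!/usr/bin/env bash
# Scripts ignore aliases unless this option is enabled
shopt -s expand_aliases

# Define a hypothetical alias for a detailed listing including hidden files
alias ll='ls -la'

alias ll     # show one definition: alias ll='ls -la'
alias        # with no arguments, list every alias defined in this shell
type ll      # reports that ll is an alias and what it expands to

ll /         # runs: ls -la /

unalias ll   # remove the alias for the rest of this session
```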
Advanced Shell Scripting
Unlike most other scripting languages, bash uses a variety of special character variables to refer to arguments, error codes, and other relevant variables. Below is a list of some of them. A more comprehensive list can be found here.
- `$0` - Name of the script
- `$1` to `$9` - Arguments to the script. `$1` is the first argument and so on.
- `$@` - All the arguments
- `$#` - Number of arguments
- `$?` - Return code of the previous command
- `$$` - Process identification number (PID) for the current script
- `!!` - Entire last command, including arguments. A common pattern is to execute a command only for it to fail due to missing permissions; you can quickly re-execute the command with sudo by doing `sudo !!`
- `$_` - Last argument from the last command. If you are in an interactive shell, you can also quickly get this value by typing `Esc` followed by `.` or `Alt+.`
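A few of these variables in action. The sketch uses a hypothetical function `show_args`, because a function receives its own `$1`, `$@` and `$#` just like a script does, while `$0` still refers to the script.

```bash
#!/usr/bin/env bash
# show_args is a hypothetical helper. Inside a function, $1..$9, $@ and $#
# refer to the function's own arguments; $0 is still the script's name.
show_args() {
    echo "$0 was called with $# arguments"
    echo "first argument: $1"
    # Quoting "$@" keeps an argument containing spaces as a single item
    for arg in "$@"; do
        echo "argument: $arg"
    done
}

show_args alpha "beta gamma"

false || echo "return code of false: $?"   # $? still holds false's code here
echo "PID of this script: $$"
```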
Also note the builtin `shift`, which shifts all argument variables down by 1 and discards the previous value of `$1`. This moves the value of `$2` to `$1`, `$3` to `$2`, and so on. This is useful when iterating through all arguments in a loop.
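A minimal sketch of that loop pattern. `sum_args` is a hypothetical helper that reads `$1` on every iteration and then calls `shift` until `$#` reaches zero.

```bash
#!/usr/bin/env bash
# sum_args is a hypothetical helper that adds up integer arguments by
# repeatedly reading $1 and then shifting the remaining arguments down.
sum_args() {
    local total=0
    while [[ $# -gt 0 ]]; do
        total=$(( total + $1 ))
        shift    # $2 becomes $1, $3 becomes $2, ...; $# drops by 1
    done
    echo "$total"
}

sum_args 3 4 5    # prints 12
```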
Commands will often return output using `STDOUT`, errors through `STDERR`, and a return code to report errors in a more script-friendly manner.
The return code or exit status is the way scripts/commands have to communicate how execution went.
A value of 0 usually means everything went OK; anything different from 0 means an error occurred.
Exit codes can be used to conditionally execute commands using `&&` (and operator) and `||` (or operator), both of which are short-circuiting operators. Commands can also be separated within the same line using a semicolon `;`.
The `true` program will always have a 0 return code and the `false` command will always have a 1 return code.
Let’s see some examples:
```bash
false || echo "Oops, fail"
# Oops, fail

true || echo "Will not be printed"
#

true && echo "Things went well"
# Things went well

false && echo "Will not be printed"
#

true ; echo "This will always run"
# This will always run

false ; echo "This will always run"
# This will always run
```
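Your own functions report success or failure the same way. A function’s exit status is that of its last command, or the value given to `return`. A sketch with two hypothetical helpers, `is_even` and `check_file`:

```bash
#!/usr/bin/env bash
# is_even is hypothetical: it prints nothing and signals the answer through
# its exit status. (( ... )) returns 0 when the expression is non-zero (true).
is_even() {
    (( $1 % 2 == 0 ))
}

is_even 4 && echo "4 is even"
is_even 7 || echo "7 is odd"

# check_file is hypothetical: an explicit return code signals failure
check_file() {
    [[ -f $1 ]] || return 2
    echo "$1 exists"
}

check_file /etc/passwd
check_file /non/existent/file || echo "check_file failed with code $?"
```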
Another common pattern is wanting to get the output of a command as a variable. This can be done with command substitution. Whenever you place `$( CMD )`, it will execute `CMD`, get the output of the command and substitute it in place. For example, if you do `for file in $(ls)`, the shell will first call `ls` and then iterate over those values.
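A short sketch of command substitution. It works in a scratch directory created with `mktemp -d`, and the file names are made up for illustration. The last two loops show a well-known caveat of `$(ls)`: its output is split on whitespace, so a filename containing a space becomes two items, while a glob keeps each name intact.

```bash
#!/usr/bin/env bash
# Capture a command's output in a variable (trailing newlines are removed)
today=$(date +%F)               # date in YYYY-MM-DD form
echo "Today is $today"

# Substitutions can be used inline and nested inside other commands
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/my notes.txt"
echo "$(ls "$dir" | wc -l) files in $(basename "$dir")"

# Word splitting breaks "my notes.txt" into two items in $(ls) ...
for file in $(ls "$dir"); do echo "from \$(ls): $file"; done
# ... whereas a glob keeps each filename intact
for file in "$dir"/*; do echo "from glob: $(basename "$file")"; done
```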
A lesser known similar feature is process substitution. `<( CMD )` will execute `CMD`, make its output readable through a temporary filename (typically a path such as `/dev/fd/63` connected to a pipe) and substitute the `<()` with that filename. This is useful when commands expect values to be passed by file instead of by STDIN. For example, `diff <(ls foo) <(ls bar)` will show differences between files in dirs `foo` and `bar`.
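A self-contained sketch of process substitution, using `printf` to generate two small lists in place of real directory listings:

```bash
#!/usr/bin/env bash
# Compare the sorted output of two commands without saving either to disk.
# diff exits with status 1 when its inputs differ; the || branch reports it.
diff <(printf '%s\n' apple banana cherry) <(printf '%s\n' apple cherry) \
    || echo "diff exit status: $?"

# The substitution expands to a filename, often something like /dev/fd/63
echo <(true)

# Any command that reads a file can read it
cat <(echo "read from a substituted file")
```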
Since that was a huge information dump, let’s see an example that showcases some of these features. It iterates through the arguments we provide, uses `grep` to search each file for the string `foobar`, and appends it to the file as a comment if it’s not found.
```bash
#!/bin/bash

echo "Starting program at $(date)" # Date will be substituted

echo "Running program $0 with $# arguments with pid $$"

for file in "$@"; do
    grep foobar "$file" > /dev/null 2> /dev/null
    # When pattern is not found, grep has exit status 1
    # We redirect STDOUT and STDERR to /dev/null since we do not care about them
    if [[ $? -ne 0 ]]; then
        echo "File $file does not have any foobar, adding one"
        echo "# foobar" >> "$file"
    fi
done
```
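The same logic can be written without inspecting `$?` directly, because `if` already tests a command’s exit status. Below is a sketch of the loop wrapped in a hypothetical function `ensure_foobar`. It uses `grep -q`, which prints nothing and exits as soon as it finds a match. The two scratch files are made up for the demonstration.

```bash
#!/usr/bin/env bash
ensure_foobar() {
    local file
    for file in "$@"; do
        # grep -q exits with 0 on a match and prints nothing;
        # 2>/dev/null hides errors such as missing files
        if ! grep -q foobar "$file" 2>/dev/null; then
            echo "File $file does not have any foobar, adding one"
            echo "# foobar" >> "$file"
        fi
    done
}

# Try it on two scratch files, only one of which already mentions foobar
dir=$(mktemp -d)
echo "foobar is here" > "$dir/has.txt"
echo "nothing to see" > "$dir/lacks.txt"
ensure_foobar "$dir/has.txt" "$dir/lacks.txt"
```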
In the comparison we tested whether `$?` was not equal to 0. Bash implements many comparisons of this sort; you can find a detailed list in the manpage for `test`.
When performing comparisons in bash, try to use double brackets `[[ ]]` in favor of simple brackets `[ ]`. Chances of making mistakes are lower, although it won’t be portable to `sh`. A more detailed explanation can be found here.
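Some of the mistakes `[[ ]]` avoids: with `[ $name == *.txt ]`, an unquoted variable containing a space is split into several words, and the pattern may be expanded as a filename glob. The sketch below uses a hypothetical `classify` function to show comparisons that rely on `[[ ]]`:

```bash
#!/usr/bin/env bash
# classify is a hypothetical helper showing comparisons supported by [[ ]]
classify() {
    local name=$1
    if [[ $name == *.txt ]]; then          # glob-style pattern on the right
        echo "text file"
    elif [[ $name =~ ^[0-9]+$ ]]; then     # regular expression match
        echo "number"
    elif [[ -z $name ]]; then              # empty string test, as with [ ]
        echo "empty"
    else
        echo "other"
    fi
}

classify "my notes.txt"   # no word splitting despite the space
classify 42
classify ""
```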
When launching scripts, you will often want to provide arguments that are similar. Bash has ways of making this easier, expanding expressions by carrying out filename expansion. These techniques are often referred to as shell globbing.
- Wildcards - Whenever you want to perform some sort of wildcard matching, you can use `?` and `*` to match one or any amount of characters respectively. For instance, given files `foo`, `foo1`, `foo2`, `foo10` and `bar`, the command `rm foo?` will delete `foo1` and `foo2` whereas `rm foo*` will delete all but `bar`.
- Curly braces `{}` - Whenever you have a common substring in a series of commands, you can use curly braces for bash to expand this automatically. This comes in very handy when moving or converting files.
Note: Install the `imagemagick` package to use the `convert` command.
```bash
convert image.{png,jpg}
# Will expand to
convert image.png image.jpg

cp /path/to/project/{foo,bar,baz}.sh /newpath
# Will expand to
cp /path/to/project/foo.sh /path/to/project/bar.sh /path/to/project/baz.sh /newpath

# Globbing techniques can also be combined
mv *{.py,.sh} folder
# Will move all *.py and *.sh files

mkdir foo bar
# This creates files foo/a, foo/b, ... foo/h, bar/a, bar/b, ... bar/h
touch {foo,bar}/{a..h}
touch foo/x bar/y
# Show differences between files in foo and bar
diff <(ls foo) <(ls bar)
# Outputs
# 9c9
# < x
# ---
# > y
```
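Because `rm` cannot be undone, a common habit is to check what a glob or brace expression expands to by passing it to `echo` first. Below is a sketch in a throwaway directory that reuses the file names from the wildcard example:

```bash
#!/usr/bin/env bash
# Work in a scratch directory so nothing real is matched or deleted
cd "$(mktemp -d)" || exit 1
touch foo foo1 foo2 foo10 bar

echo foo?              # ? matches exactly one character
echo foo*              # * matches any number of characters, including none
echo {a..e}            # brace expansion generates sequences
echo file{1..3}.txt    # ... and combines with surrounding text
echo nothing*          # a glob with no matches is left unexpanded by default
```

Once the echoed list looks right, replace `echo` with the real command, for example `rm foo?`.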
If a `bash` script runs into an error, it keeps running. This may cause unintended behaviour if later commands rely on an earlier command that failed. To remedy this, you can put `set -e` at the start of a `bash` script, which makes it exit as soon as a command returns a non-zero exit code. Failures in `if` or `while` conditions, or in any command of an `&&`/`||` list except the last, do not trigger an exit.
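To see the difference without ending your own shell, the sketch below runs small scripts in child `bash -c` processes:

```bash
#!/usr/bin/env bash
# Without set -e the failing cp prints an error but the script keeps going
bash -c 'cp /non/existent/file /tmp/; echo "still running after the error"'

# With set -e the child shell exits at the first failing command, so its
# echo never runs; the || branch reports the status it stopped with
bash -c 'set -e; cp /non/existent/file /tmp/; echo "never printed"' \
    || echo "stopped early with status $?"

# A failure inside an if condition does not trigger an exit
bash -c 'set -e; if cp /non/existent/file /tmp/ 2>/dev/null; then :; fi; echo "if conditions are exempt"'
```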
Writing `bash` scripts can be tricky and unintuitive. There are tools like `shellcheck` that will help you find errors in your sh/bash scripts.
Note that scripts need not necessarily be written in bash to be called from the terminal. For instance, here’s a simple Python script that outputs its arguments in reversed order:
```python
#!/usr/bin/env python3
import sys

# Print every command-line argument (skipping the script name) in reverse order
for arg in reversed(sys.argv[1:]):
    print(arg)
```
The kernel knows to execute this script with a Python interpreter instead of a shell because we included a shebang line at the top of the script. The file also needs to be executable, for example after running `chmod +x` on it.
It is good practice to write shebang lines using the `env` command, which resolves to wherever the command lives in the system, increasing the portability of your scripts. To resolve the location, `env` uses the `PATH` environment variable we introduced in the first lecture.
Some differences between shell functions and scripts that you should keep in mind are:
- Functions have to be in the same language as the shell, while scripts can be written in any language. This is why including a shebang for scripts is important.
- Functions are loaded once when their definition is read. Scripts are loaded every time they are executed. This makes functions slightly faster to load, but whenever you change them you will have to reload their definition.
- Functions are executed in the current shell environment whereas scripts execute in their own process. Thus, functions can modify environment variables, e.g. change your current directory, whereas scripts can’t. Scripts receive copies of the environment variables that have been exported using `export`.
- As with any programming language, functions are a powerful construct to achieve modularity, code reuse, and clarity of shell code. Often shell scripts will include their own function definitions.
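The difference between running in the current shell and running in a child process is easy to observe. The sketch below uses a hypothetical function `go_tmp`, and uses `bash -c` as a stand-in for a separate script:

```bash
#!/usr/bin/env bash
# go_tmp is a hypothetical function: it runs in the current shell, so its cd sticks
go_tmp() {
    cd /tmp || return
}

cd /
bash -c 'cd /tmp'                    # a separate process changes only its own directory
echo "after child process: $PWD"     # still /
go_tmp
echo "after function: $PWD"          # now /tmp

# Child processes only receive variables that have been exported
NOT_SHARED="local value"
export SHARED="exported value"
bash -c 'echo "child sees: [$NOT_SHARED] [$SHARED]"'
```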
GNU Parallel
GNU `parallel` is a very useful, customisable tool that speeds up script execution by running multiple commands at the same time as separate jobs. By default, each job’s output is printed as a whole once that job finishes. With the `-k` (`--keep-order`) option, output is printed in the same order as the input. `parallel` can be used as a quick and intuitive replacement for `for` loops and `xargs`.
The typical syntax for a `parallel` command is `parallel [COMMAND] ::: [INPUT LIST]`, for example `parallel echo ::: {1..100}`. To repeat a command a certain number of times without passing the number to it as an argument, add the flag `-N0` after `parallel`. This passes 0 arguments into the command, so the command runs the same way every time.
As well as using `{1..100}` to expand a list of numbers up to 100, any other list of arguments can be iterated over. For example, `parallel ls -la ::: *.txt` lists file metadata for all `.txt` files in the current directory. `parallel` can also read its inputs from a pipe. For example, `cat /etc/passwd | cut -d ':' -f 1 | parallel id` lists the ID info for all users. Another way of using `parallel` is with the syntax `parallel [COMMAND] :::: [FILE]`. For example, `parallel which > paths.txt :::: commands.txt` reads all command names in `commands.txt` and saves their full paths to `paths.txt`.
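A few of these options side by side. This sketch assumes GNU `parallel` is installed (the `parallel` package on Ubuntu). The input file is a hypothetical list of command names, created on the fly in a temporary file.

```bash
#!/usr/bin/env bash
# Jobs run concurrently, so finishing order can differ from input order;
# -k (--keep-order) prints each job's output in input order instead
parallel -k 'sleep {}; echo "slept {}"' ::: 0.3 0.2 0.1

# -N0 passes no arguments: the command runs once per input, unchanged
parallel -N0 echo hello ::: 1 2 3

# :::: reads the inputs from a file, one per line
list=$(mktemp)
printf '%s\n' ls cat grep > "$list"
parallel -k which :::: "$list"
```

Drop `-k` from the first command and the lines will usually appear in the order the jobs finish, i.e. shortest sleep first.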
One way to step up your `parallel` usage is with replacement strings, which modify the arguments given to the command. A couple of examples are:
- `{}` - The argument
- `{#}` - The index of the argument (starting from 1)
- `{.}` - The file argument with the extension stripped (e.g. `hello.txt` -> `hello`)
- `{/}` - The file argument’s basename (e.g. `dir/hello.txt` -> `hello.txt`)
- `{//}` - The file argument’s directory path (e.g. `dir1/dir2/hello.txt` -> `dir1/dir2`)
These can be used for operations such as converting all jpg images to pngs in parallel, which would be much slower if done one at a time. The command is `parallel convert {} {.}.png ::: *.jpg` (remember that the `imagemagick` package is required to use `convert`). Learning how to use `parallel` effectively can significantly increase your terminal efficiency.
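To check what each replacement string produces before running anything destructive, you can simply echo the arguments. This sketch assumes GNU `parallel` is installed and uses made-up paths. Nothing needs to exist on disk, because the strings are computed from the text of each argument.

```bash
#!/usr/bin/env bash
# Print the sequence number, the argument and its derived forms for each input.
# The replacement strings are quoted so the shell passes them to parallel untouched.
parallel -k echo '{#}' '{}' '{.}' '{/}' '{//}' ::: dir1/dir2/hello.txt photos/cat.jpg

# --dry-run prints the commands parallel would run without executing them,
# a safe way to preview the jpg-to-png conversion
parallel -k --dry-run convert {} {.}.png ::: holiday.jpg photos/cat.jpg
```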
Exercises
To test your learning this session, try the following exercises:
- Create a bash script that runs the first argument given as a background process with no output that will persist past the terminal’s exit, printing the process’ name and ID in the terminal.
- Write a bash function that reads any number of arguments and writes them to a file `args.txt`, clearing the file contents if it exists, with the syntax `[ARG NUM]: [ARG VALUE]`.
- Build a one-line command that reads website URLs from a file `urls.txt`, accesses the websites using multiple threads, and outputs the HTTP status codes to the terminal in the format `[URL] — [CODE]`. Try to hide any progress bars shown when receiving data from websites.
Licensed under CC BY-NC-SA.