Linux 101 - Miscellaneous Commands and Concepts in the Terminal

Introduction

With the content covered in Linux 101 up to this point, a user should have a general idea of how to navigate and perform essential tasks in the Linux terminal. This section discusses various tools and concepts that improve the Linux terminal experience.

Regular expressions

Regular expressions are essentially patterns that are used to match strings that conform to them. Regular expressions are a complex topic; for the purposes of this course, only their basic concepts and use will be discussed.

To aid in this discussion, let's say that we have output from one experimental run that executed on the UFS HPC, and that the following files were produced:

  • foo_stats.txt
  • foo_seq1.fasta
  • foo_seq2.fasta
  • foo_seq3.fasta
  • foo_timings.txt
  • hpc_job.error
  • hpc_job.output
  • hpc_job.nodes
  • hpc_job.txt

Regular expressions can help us select or filter the files on which we need to perform file operations. For most terminal users, this is the most common use of regular expressions.

Let's start by identifying some general patterns in our file list:

  • Some files start with the common prefix foo_
  • Some files start with the common prefix hpc_job
  • Some files end with the common extension .fasta
  • Some files end with the common extension .txt

From the above list we can already select at least 4 groups of files from the patterns identified.

So now that we understand what the patterns are in an abstract sense, how do we actually construct a regular expression? First, a regular expression is itself a string or piece of text, and is composed of the following elements:

Element                | Definition                                  | Example
Character set or class | Characters retaining their literal meaning  | the prefix foo
Modifiers              | Expand or narrow the range of text to match | The asterisk (*) expands the selection to all characters
Anchors                | Indicates the location from where to match  | The caret (^) indicates that the pattern should be at the beginning of a line

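Thus, to construct a pattern for the first group that we identified, that is, to match all file names with the prefix foo, we can use foo*, which means:

  • (foo) = match the literal characters f o o in order
  • (*) = match any characters, any number of times (including none)

Note that when a pattern like this is used for file selection in the terminal, the shell interprets it as a file name pattern (also called a wildcard or glob), which is a simplified relative of regular expressions. In a full regular expression, as used by tools such as grep and sed, the asterisk instead means "zero or more of the preceding character", so the equivalent expression is ^foo.* (the dot matches any single character). Anchors can likewise only be used in the context of tools such as grep and sed, and not for file selection in the terminal.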
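As a minimal sketch of how anchors behave with grep, the example below feeds the file names from the list above to grep, one per line. Supplying the names with printf and storing them in the variables files, starts_with_foo and ends_with_txt are choices made only for this illustration.

```shell
# File names from the example run, one per line
files=$(printf '%s\n' foo_stats.txt foo_seq1.fasta foo_seq2.fasta foo_seq3.fasta \
    foo_timings.txt hpc_job.error hpc_job.output hpc_job.nodes hpc_job.txt)

# ^ anchors the pattern to the beginning of each line
starts_with_foo=$(printf '%s\n' "$files" | grep '^foo_')

# $ anchors the pattern to the end of each line; \. matches a literal dot
ends_with_txt=$(printf '%s\n' "$files" | grep '\.txt$')

printf '%s\n' "$starts_with_foo"
echo "---"
printf '%s\n' "$ends_with_txt"
```

Without the backslash, the dot in .txt would match any single character, which is why it is escaped here.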

Because the shell expands such a pattern into the matching file names before the command runs, most tools already covered in Linux 101 can be used with it. Let's use ls to list all the files that match our constructed pattern as an example:

First, ls the directory to see all its contents:

regex_1

Now use ls with the regular expression:

$ ls foo*

regex_2

We can reuse this regular expression with other tools. For example, we can use cp to copy all the files starting with the foo prefix.
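The cp case mentioned above could look roughly like the following sketch. It recreates the example files as empty placeholders in a temporary directory, so the directory names workdir and backup are assumptions made for this illustration and not part of the course material.

```shell
# Work in a throwaway directory so that no real files are touched
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Recreate the example files as empty placeholders
touch foo_stats.txt foo_seq1.fasta foo_seq2.fasta foo_seq3.fasta foo_timings.txt \
      hpc_job.error hpc_job.output hpc_job.nodes hpc_job.txt

# The shell replaces foo* with the matching file names before cp runs
mkdir backup
cp foo* backup/

# Only the files starting with foo should now be in backup/
ls backup
```

The destination directory is deliberately not named with a foo prefix; otherwise foo* would match the directory itself as well.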

For practice, download the archive with the example files above here, and attempt to construct regular expressions for the other three patterns identified.

Create and Extract Compressed Archives with tar

It is common for files and software to be distributed as compressed archives (such as in the section above ;) ). Thus, it is important to know how to create and extract these archives in the terminal. Fortunately, there is a command which can accomplish this: tar.

Note that tar.gz is the most common archive format in the Linux environment, and thus the archive linked in the section on Regular Expressions above will be used as an example.

Common Use

Basic Syntax

$ tar -*mode* *archive_path* *file_path*

The mode is a combination of command switches to accomplish the desired task, some common switches are:

  • c : create an archive
  • x : extract files from an archive
  • f : use an archive file
  • v : verbosely list the files being processed
  • z : filter the file through gzip (the gz part of the extension)

Note

Always specify the f switch at the end of any combination in which it is used, because tar expects an archive path to follow f directly.

Tip

Use the option -C <path> separately from the mode switches. -C tells tar to change to the given directory before performing an operation. Thus, when creating an archive, the paths to add are looked up relative to this directory, and when extracting an archive, this is the directory into which tar extracts the files.

Basic Syntax - Creating an archive

$ tar -cvzf *archive_path* *path(s)*

Example

Create the archive regex_training.tar.gz from the directory regex_training/ :

tar_1
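For readers who want to try this without the screenshot, a minimal sketch of creating the archive follows. The contents of regex_training/ are hypothetical placeholders, and the t switch (list the contents of an archive) is not covered in the list of switches above; it is used here only to check the result.

```shell
# Work in a throwaway directory so that no real files are touched
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical contents for the regex_training/ directory
mkdir regex_training
touch regex_training/foo_stats.txt regex_training/hpc_job.txt

# c = create, z = gzip, f = the archive path follows
# (v is left out here only to keep the output short)
tar -czf regex_training.tar.gz regex_training/

# t lists the files in the archive without extracting them
tar -tzf regex_training.tar.gz
```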

Basic Syntax - Extracting an archive

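$ tar -xvzf *archive_path* -C *directory_path*

The -C *directory_path* part is optional; without it, tar extracts into the current directory. Any paths placed directly after the archive path name the files inside the archive to extract, not the destination.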

Example

Extract the archive regex_training.tar.gz in the current directory:

tar_2
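The extraction step can be sketched as follows, together with the -C option from the tip above. The example builds a small archive first so that it is self-contained; the directory name extracted and the file inside the archive are assumptions made for this illustration.

```shell
# Work in a throwaway directory so that no real files are touched
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Build a small archive first so the example is self-contained
mkdir regex_training
touch regex_training/foo_seq1.fasta
tar -czf regex_training.tar.gz regex_training/
rm -r regex_training

# x = extract, z = gzip, f = the archive path follows
# Without -C, the files are extracted into the current directory
tar -xzf regex_training.tar.gz

# With -C, the files are extracted into the named directory, which must exist
mkdir extracted
tar -xzf regex_training.tar.gz -C extracted

ls extracted/regex_training
```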

Keep a terminal session alive using screen

Some applications may take a very long time to execute, and thus it isn't feasible to keep an ssh session open for the entire duration. Luckily, the tool screen provides a solution to this situation. The screen tool is a terminal multiplexer, meaning that the user can create multiple sub-sessions within a terminal session. Importantly, these nameable sessions can be detached from the current session (meaning they keep running and do not end when the user ends the parent session) and be reattached at a later time.

Thus, imagine the following scenario: A researcher runs an experiment with an interactive PBS job. However, they connect to the cluster via their laptop and need to take it home every night. The experiment is also time sensitive, so it needs to run uninterrupted for the full 10 days. Here, the researcher can use screen to detach their interactive PBS job as "experiment x", close their current session and go home every night without worry. In the morning they can check the progress of the job by logging in and reattaching the session, and then detach the session again when they need to terminate their main session.

Create a screen session

Basic syntax

$ screen -S *name_of_session*

Detach a screen session

Basic syntax

$ *press Ctrl+A, release, then press D*

Example

Create and detach the LongJob1 screen session:

screen_1

List the available screen sessions

Basic syntax

$ screen -ls

Example

List the screen sessions available:

screen_2

Reattach a detached screen session

Basic syntax

$ screen -r *name of the session to reattach*

Example

Reattach one of the LongJob1 screen sessions:

screen_3

Quit/delete a screen session

Basic syntax

$ screen -X -S *name of the session to delete as printed in screen -ls* quit

Example

Delete the LongJob1 screen session:

screen_4
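The create, list and quit steps above can also be tried without any key presses, as in the rough sketch below. It assumes that screen is installed, and it relies on the -d -m switches (start a session that is already detached), which are not covered in this section. The sleep command only stands in for a long-running job.

```shell
# Skip the example if screen is not installed on this machine
if ! command -v screen >/dev/null 2>&1; then
    echo "screen is not installed"
    exit 0
fi

# -d -m starts the session already detached; sleep stands in for a long job
screen -dmS LongJob1 sleep 300

# List the sessions (the exit status of screen -ls varies, so it is ignored)
screen -ls || true

# Ask the session to quit, then give it a moment to shut down
screen -X -S LongJob1 quit
sleep 1

# LongJob1 should no longer be listed
screen -ls || true
```

In everyday use, the session would instead be created with screen -S, detached with Ctrl+A followed by D, and reattached with screen -r, as described above.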

Record a terminal session with script

There are scenarios where having a recording of your current terminal session may be useful. For example, going through a commonly used workflow to contribute a tutorial to the UFS HPC documentation project ;)

Once again, the Linux terminal provides just such a useful tool: script. The script command produces an ASCII text file that contains the captured commands and their output, which can be viewed in a text editor.

Capture a terminal session with script

Basic syntax

$ script -q *path_of_script_file*

Example

Capture a terminal session in a file. Note, to end the capturing, press Ctrl+D:

script_1

View the contents of the script output file, example_script:

script_2

Append additional terminal session output to a script output file

Basic syntax

$ script -a -q *path_of_script_file*

Example

Append additional terminal output to the example_script output file created in the previous example. Note, to end logging, press Ctrl+D:

script_3

View the contents of the script output file, example_script, again:

script_4
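The capture and append steps can be sketched non-interactively as follows. This assumes the util-linux version of script found on most Linux distributions, whose -c switch (run a single command instead of an interactive shell) is not covered in this section. The variable names workdir and log are chosen only for this illustration.

```shell
# Skip the example if script is not installed on this machine
if ! command -v script >/dev/null 2>&1; then
    echo "script is not installed"
    exit 0
fi

# Write the capture to a throwaway directory
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
log="$workdir/example_script"

# -c runs one command instead of an interactive shell (util-linux script)
script -q -c 'echo first capture' "$log"

# -a appends to the existing file instead of overwriting it
script -a -q -c 'echo second capture' "$log"

# Both captures should now be in the same file
cat "$log"
```

Without -a, the second call would overwrite the file and only the second capture would remain.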

Outcomes for this section

After completing this section, you should be able to do the following tasks in a Linux terminal:

  • Understand the concept of regular expressions and be able to use basic regular expressions with other terminal commands to filter and group input and/or output.
  • Use tar to create and extract compressed archives.
  • Use screen to create, detach and reattach terminal sessions.
  • Use script to capture a terminal session to a log file.