1. Introduction

Although there are many ways to produce input within Linux, the typical text entry device for users is the keyboard. However, the mapping isn’t one-to-one between keyboard keys and actual characters. Thus, we have many character notations and key combinations in the shell and terminal to see and enter our input.

In this tutorial, we discuss modifier keys and their relation to the shell and terminal. First, we go over keyboards, clarifying the process of key and key combination detection, as well as terms like modifier keys. After that, we explore one of the main output formats for special keys and key combinations. Finally, we check key to character conversion on several levels.

We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments unless otherwise specified.

2. Keyboards and Typing

Keyboards are fairly simple devices that require basic drivers. A standard keyboard usually has between 78 and 115 keys, depending on the number of special keys.

2.1. Key Detection

Upon pressing or releasing any single key, Linux goes through several steps:

  1. the pressed or released key produces a scancode
  2. keyboard drivers convert scancodes to keycodes, combining with modifier keys
  3. the kernel converts keycodes to tty input characters via kernel keymaps
  4. keys and key combinations that aren’t characters may need further processing by the kernel or applications
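
To observe these steps, we can use tools from the kbd package. As a minimal sketch, the hypothetical inspect_stage helper below only maps each stage to a command that displays it. It assumes showkey and dumpkeys are installed, and the stage names are our own labels rather than official terms:

```shell
# inspect_stage: hypothetical helper that prints the kbd command
# which shows the data at a given stage (stage names are illustrative)
inspect_stage() {
  case "$1" in
    scancode) echo "showkey --scancodes" ;;  # raw scancodes from the keyboard
    keycode)  echo "showkey --keycodes" ;;   # keycodes after driver translation
    keymap)   echo "dumpkeys" ;;             # kernel keymap used for keycode to character
    char)     echo "showkey --ascii" ;;      # resulting characters as decimal, octal, hex
    *)        echo "unknown stage: $1" >&2; return 1 ;;
  esac
}

inspect_stage keycode   # prints showkey --keycodes
```

For instance, we might run sudo $(inspect_stage keycode) and press a few keys. The scancode and keycode modes of showkey generally expect a Linux virtual console rather than a terminal emulator under a GUI.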

If we press more than one key, the result may be different.

2.2. Key Combinations

Of course, we can also input key combinations by pressing more than one key:

  • one after the other, holding the first
  • one after the other, releasing the first
  • simultaneously

The last case is trivial and simply produces the scancodes of the constituent keys. On the other hand, the second scenario involves so-called dead keys that change or add to the character that the next key produces. Finally, the first option is the typical way to use modifiers.

2.3. Modifiers

Modifiers can be keys or even key combinations that change the behavior of other keys when pressed along with them. There can be many, but there are a number of well-known examples:

+----------+-----------+---------------------------+----------+
| Key      | Name      | Details                   | Platform |
|----------+-----------+---------------------------+----------|
| ^ Ctrl   | Control   | common keyboard shortcuts |    -     |
|----------+-----------+---------------------------+----------|
| ⇧ Shift  | Shift     | change letter case        |    -     |
|----------+-----------+---------------------------+----------|
| ⎇ Alt    | Alternate | alternate shortcuts       |   Win    |
|----------+-----------+---------------------------+----------|
| ⇮ AltGr  | Alternate | similar to Alt            |   Win    |
|          | Graphic   | can produce accents       |          |
|----------+-----------+---------------------------+----------|
| ⌥ Option | Option    | same as Alt               |   Mac    |
|----------+-----------+---------------------------+----------|
| ◆ Meta   | Meta      | original modifier key     | MIT, Sun |
|----------+-----------+---------------------------+----------|
| ✦ Hyper  | Hyper     | MIT-specific meta key     |   MIT    |
|----------+-----------+---------------------------+----------|
| ❖ Super  | Super     | Win and Cmd predecessor   | MIT, BSD |
|----------+-----------+---------------------------+----------|
| ⊞ Win    | Windows   | Windows shortcuts         |   Win    |
|          | logo      | usually like Cmd          |          |
|----------+-----------+---------------------------+----------|
| ⌘ Cmd    | Command   | like Win for Mac          |   Mac    |
|----------+-----------+---------------------------+----------|
| Fn       | Function  | change F-key functions    |    -     |
+----------+-----------+---------------------------+----------+

Even though only one key is Meta, the same term is used as a catch-all for modifiers and escape keys. Let’s see why.

2.4. Meta

In short, the actual Meta key can be considered the original Hyper, Super, Win, or Cmd key. As such, most operating systems process it equivalently, regardless of its name and icon.

However, terminals can ignore all of these keys. Terminals mainly consider Ctrl, Shift, and Alt. In that context, Meta is equivalent to Alt.

Then again, Meta can also be a catch-all term for custom hardware and software-specific keys and combinations.

In fact, some of these keys are a kind of protection against lower-level mechanisms hijacking their shortcuts.

Critically, no Meta is sent directly to the shell or applications but is instead interpreted only in combinations. Alternatively, applications can detect such keys by listening for key press events as detected and sent by the terminal (emulator).

Perhaps the most concrete definition of Meta is a key, key combination, or command character that doesn’t convert to a single normal character directly but augments the following keys or characters. Despite the versatile nature of Meta, it’s often denoted simply as M.

3. Caret Notation and ANSI

One of the most common ways to produce and display non-character keys and key combinations is the caret notation:

$ perl -e 'for(my $c = 0; $c < 256; $c++) {  print(sprintf("%c is %d %x\n", $c, $c, $c)); }' | cat --show-nonprinting
^@ is 0 0
^A is 1 1
^B is 2 2
^C is 3 3
[...]
! is 33 21
" is 34 22
# is 35 23
[...]
a is 97 61
b is 98 62
c is 99 63
[...]
M-^] is 157 9d
M-^^ is 158 9e
[...]

The code above pipes the output of a Perl one-liner to the cat command.

In particular, the for loop within the Perl snippet goes through the characters with codes 0-255 (ASCII and the upper half of the 8-bit range) and shows each as a character, a decimal number, and a hexadecimal number. After that, cat is used to --show-nonprinting (-v) characters in the caret notation.

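In this notation, a leading ^ caret usually stands for the Control key, so ^C means Ctrl+C, and in the doubled ^^ the second caret is the character itself. Similarly, the M- prefix means Meta, which marks a byte with its high bit set. Notably, since keys like the arrows on the keyboard don’t have a character representation, they are not part of the list above.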
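
To make the notation rule concrete, here is a minimal sketch of the conversion for a single byte value. The chr and to_caret helpers are hypothetical names of our own. Unlike cat -v, which leaves Tab and newline as they are, this simplified version converts every control character:

```shell
# chr: print the character for a decimal code via an octal escape
chr() { printf '%b' "\\0$(printf '%03o' "$1")"; }

# to_caret: hypothetical helper that prints the caret notation for a byte value (0-255)
to_caret() (
  code=$1
  prefix=""
  # bytes with the high bit set get the M- prefix and are then treated as code - 128
  if [ "$code" -ge 128 ]; then
    prefix="M-"
    code=$((code - 128))
  fi
  if [ "$code" -lt 32 ]; then
    # control characters: ^ plus the character 64 positions higher
    printf '%s^%s\n' "$prefix" "$(chr $((code + 64)))"
  elif [ "$code" -eq 127 ]; then
    # DEL is conventionally shown as ^?
    printf '%s^?\n' "$prefix"
  else
    printf '%s%s\n' "$prefix" "$(chr "$code")"
  fi
)

to_caret 3     # prints ^C
to_caret 157   # prints M-^]
```

The second call matches the M-^] line in the listing above, since 157 - 128 = 29 and 29 + 64 = 93, the code of the ] character.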

Still, we can produce the effect of such special keys via the Control Sequence Introducer (CSI) prefix, which is the Escape character \e (Ctrl+[, shown as ^[) followed by [:

+--------------------------------------------------------+
| CSI Sequence | Output | Description       | Equivalent |
+--------------+--------+-------------------+------------|
| CSI+A        | ^[[A   | move cursor up    | Up         |
| CSI+B        | ^[[B   | move cursor down  | Down       |
| CSI+C        | ^[[C   | move cursor right | Right      |
| CSI+D        | ^[[D   | move cursor left  | Left       |
+--------------------------------------------------------+

Of course, the table above is just a small subset of the full ANSI capabilities.
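
As a quick check, we can emit such sequences ourselves and pipe them to cat -v, so that we see the bytes instead of letting the terminal act on them. The csi helper here is a hypothetical convenience function, not a standard command:

```shell
# csi: hypothetical helper that emits ESC [ followed by the given final part
csi() { printf '\033[%s' "$1"; }

# show the raw bytes in caret notation
csi A | cat -v; echo    # prints ^[[A
csi D | cat -v; echo    # prints ^[[D
```

Without the cat -v filter, a terminal that understands these sequences would move the cursor instead of showing anything.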

Armed with a way to input and display keys and characters, let’s see how the context influences how they map to each other.

4. Key to Character Conversion Environments

While applications usually expect character sequences, some software implementations, such as the X Window System, request raw scancodes so that they can process key presses on their own.

In the context of a raw terminal, many keys on a keyboard can map directly to characters. However, some single keys, as well as many key combinations, don’t have a direct single-character match. In such cases, the character sequence resulting from a key or key combination depends on the context.

Let’s go through some usual interpretation environments.

4.1. Login Process

The login shell is the first process that runs for a user.

Since login shells don’t require complex functionality, they can be much more rudimentary than regular interactive shells. In fact, having fewer features makes it easier to ensure stability and security.

Thus, the standard but minimalistic sh (Bourne Shell) is often the logical choice for one of the base processes of a user process tree:

baeldung login: user
Password:
$ ps -H
PID TTY   CMD
666 tty1  login
667 tty1   sh

Here, ps shows the process [-H]ierarchy, with sh as the child process of login. In turn, login spawns the shell configured for the user in /etc/passwd. As a basic mediator, login only needs to understand valid username and password constituents such as uppercase (meaning Shift) and lowercase letters, numbers, symbols, and Space, as well as Return for submitting. Anything else it usually outputs in the caret notation.
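
To see which shell login would start, we can look at the last field of the user entry in /etc/passwd. As a small sketch, the hypothetical shell_field helper below extracts that field from one passwd-format line, and the sample entry is made up for illustration:

```shell
# shell_field: hypothetical helper that extracts the 7th (shell) field of a passwd line
shell_field() { printf '%s\n' "$1" | cut -d: -f7; }

# sample entry in /etc/passwd format (not a real account)
shell_field 'user:x:1000:1000:User:/home/user:/bin/sh'   # prints /bin/sh
```

On systems that provide getent, we could pass a real entry with shell_field "$(getent passwd "$USER")".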

4.2. sh Shell

Just like a dumb terminal without extra features, the now-primitive sh shell intercepts very few special keys and combinations.

In fact, if we press some non-character keys in sh, we encounter many caret notation sequences:

+---------------------------------+
| Key      | Output | Translation |
+----------+--------+-------------|
| Up       | ^[[A   | \e [ A      |
| Down     | ^[[B   | \e [ B      |
| Right    | ^[[C   | \e [ C      |
| Left     | ^[[D   | \e [ D      |
|----------+--------+-------------|
| PageUp   | ^[[5~  | \e [ 5 ~    |
| PageDown | ^[[6~  | \e [ 6 ~    |
+---------------------------------+

Still, like with login, we do get the functionality of Backspace, Return, and others. For the rest, we see what we pressed, although the terminal or shell doesn’t perform any other special action apart from printing it.
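
For illustration, we can turn the table above into a small lookup. This is only a sketch: name_key is a hypothetical helper, and it assumes the sequences listed above, whereas some terminals and modes send different ones for the same keys:

```shell
# name_key: hypothetical helper that reads one key sequence from stdin and names it
name_key() (
  esc=$(printf '\033')   # the Escape character, shown as ^[ in caret notation
  seq=$(cat)
  case "$seq" in
    "${esc}[A")  echo "Up" ;;
    "${esc}[B")  echo "Down" ;;
    "${esc}[C")  echo "Right" ;;
    "${esc}[D")  echo "Left" ;;
    "${esc}[5~") echo "PageUp" ;;
    "${esc}[6~") echo "PageDown" ;;
    *)           echo "other" ;;
  esac
)

printf '\033[A' | name_key    # prints Up
printf '\033[6~' | name_key   # prints PageDown
```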

4.3. Bash and Similar Shells

On the other hand, shells with built-in line editing, such as Bash and Zsh, usually map actions to the keys and combinations that sh simply echoes.

For example, pressing the direction keys in a modern shell can activate command history or line editing.

So, we might see far fewer caret symbols when pressing special keys and key combinations in Bash. Still, we can use the simple cat command without any arguments to enter a mode that echoes combinations similar to sh:

$ cat
^[[A^[[B

In this example, we input Up and Down and see their respective notation.

4.4. Terminal Emulator and Graphical User Interface (GUI)

Being close to the kernel, terminals and terminal emulators are commonly among the first to receive and process all low-level keys and key combinations. It’s only after they pass through the terminal (emulator) unprocessed that the underlying shell or application can catch and interpret them.

For example, since a shell like sh doesn’t usually need to move the cursor up and down, it doesn’t treat the Up and Down arrow keys as cursor movement like a terminal might. Instead, we see their caret notation values. These terminal behaviors mainly depend on the stty settings and the relevant shortcuts.
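
The stty -a command lists these shortcuts in the caret notation. As a minimal sketch, the hypothetical stty_char helper below picks one setting out of such output. We feed it a sample fragment in the format that stty -a prints on Linux, since the real values depend on the current terminal:

```shell
# stty_char: hypothetical helper that reads `stty -a` style text on stdin
# and prints the value assigned to the given control setting
stty_char() { sed -n "s/.*$1 = \([^;]*\);.*/\1/p"; }

# sample fragment of stty -a output (values may differ on a real terminal)
sample='intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D;'
printf '%s\n' "$sample" | stty_char intr    # prints ^C
printf '%s\n' "$sample" | stty_char erase   # prints ^?
```

On an actual terminal, we could run stty -a | stty_char intr to see which combination currently sends the interrupt signal.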

On a higher level, a graphical user interface (GUI) like X uses commands such as xmodmap to configure the keymaps similar to those of the kernel.

So, we can continue our earlier chain of actions around a key press or release:

  1. GUI generates a key event
  2. if the key event isn’t handled by the GUI, it might go to our terminal (emulator)
  3. if the key event isn’t handled by our terminal (emulator), it might go to our shell
  4. if the key event isn’t handled by our shell, it might go to our application

Since X usually doesn’t act on Meta itself, it often goes down the chain, usually to the terminal, most often xterm. This way, its interpretation depends on the xterm configuration and possibly some resources with fairly self-explanatory names:

  • metaSendsEscape
  • altIsNotMeta
  • altSendsEscape

Thus, even if we know a key has reached xterm, the result has to be deduced based on the current state.
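
As a rough illustration of what a setting like metaSendsEscape changes, the hypothetical meta_bytes helper below emits the bytes for Meta plus a character in two styles: prefixed with Escape, or with the high bit set. This is a sketch under the assumption of a single-byte encoding, not a reproduction of the exact xterm behavior, which also depends on other resources and on UTF-8 mode:

```shell
# meta_bytes: hypothetical helper that emits Meta+<char> in one of two styles
#   escape   - ESC followed by the character
#   eightbit - the character with its high bit set (code + 128)
meta_bytes() (
  mode=$1
  char=$2
  case "$mode" in
    escape)
      printf '\033%s' "$char" ;;
    eightbit)
      code=$(printf '%d' "'$char")   # numeric code of the character
      printf '%b' "\\0$(printf '%03o' $((code + 128)))" ;;
    *)
      echo "unknown mode: $mode" >&2
      exit 1 ;;
  esac
)

meta_bytes escape x | cat -v; echo               # prints ^[x
meta_bytes eightbit x | LC_ALL=C cat -v; echo    # prints M-x
```

The second form is where the M- prefix of the caret notation comes from, while many shells expect the first form for their Meta shortcuts.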

5. Summary

In this article, we looked at keyboards, pressing a key, the modifier keys and combinations, as well as how we detect, interpret, and output the latter at different levels.

In conclusion, it’s not straightforward to determine where and which of these keys and key combinations are interpreted and output, but understanding the concepts can aid our efforts.