nobe4 / Typing Accents

Typing an accented character involves a surprising amount of processing. For a user, “tapping the key é” and “seeing é on the screen” is so routine that one rarely stops to ask what happens in between.

I am by no means an expert in this domain, but I learned a lot and want to share my understanding.

Note: wherever possible, the output shown is simplified, since some of the tools used are very verbose.

1. Keyboard’s scancode

Pressing a key on a keyboard triggers the keyboard firmware to send a HID scancode. These are not characters, only predefined values that keyboards are expected to send and hosts are expected to receive.

The accepted scancodes are fixed values, listed in section “10 Keyboard/Keypad” on page 53 of the HID Usage Tables. One of the main jobs of a keyboard’s firmware is to map each physical key pressed to the correct scancode.

UsageID(Dec) UsageID(Hex) UsageName
0            00           Reserved (no event indicated)
...
4            04           Keyboard a and A
5            05           Keyboard b and B
6            06           Keyboard c and C
7            07           Keyboard d and D
8            08           Keyboard e and E
...
30           1E           Keyboard 1 and !
31           1F           Keyboard 2 and @
32           20           Keyboard 3 and #
...
228          E4           Keyboard RightControl
229          E5           Keyboard RightShift
230          E6           Keyboard RightAlt
231          E7           Keyboard Right GUI
232-65535    E8-FFFF      Reserved

Notice that this list contains no accented characters.
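To make the point concrete, here is a minimal Python sketch of the letter part of that table. It assumes the letters a to z occupy the usage IDs 0x04 to 0x1D in alphabetical order, as the excerpt suggests; the helper name `letter_to_usage_id` is made up for this illustration.

```python
# Usage ID of "Keyboard a and A", taken from the table above.
USAGE_A = 0x04


def letter_to_usage_id(letter: str) -> int:
    """Return the HID Keyboard/Keypad usage ID for an ASCII letter."""
    lower = letter.lower()
    if len(lower) != 1 or not "a" <= lower <= "z":
        # Accented letters such as 'é' have no usage ID of their own.
        raise ValueError(f"no usage ID for {letter!r} in this sketch")
    return USAGE_A + (ord(lower) - ord("a"))


print(hex(letter_to_usage_id("e")))  # 0x8
print(hex(letter_to_usage_id("g")))  # 0xa
```

Asking for `letter_to_usage_id("é")` raises a `ValueError`: the keyboard has no way to say “é” directly, so it has to be built from other keys further up the stack.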

2. USB’s HID scancode

The OS receives HID scancodes from the keyboard via its USB cable:

$ sudo usbhid-dump -s 1:6 -f -e all
# tapping 'e'
00 00 08 00 00 00 00 00
00 00 00 00 00 00 00 00
...

It registers the 0x08 keycode. This corresponds to the character e as defined by the HID table.

...
# tapping 'é'
40 00 00 00 00 00 00 00
40 00 0A 00 00 00 00 00
40 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00

It gets the modifier 0x40 and the keycode 0x0A. The modifier 0x40 stays pressed while the keycode 0x0A gets pressed and released.

The keycode 0x0A corresponds to the character g as defined by the HID table.

The modifier 0x40 equals 0b01000000, which is the “RIGHT_ALT” modifier (see section “8.3 Report Format for Array Items” on page 66):

Bit Key         Mask
0   LEFT_CTRL   00000001
1   LEFT_SHIFT  00000010
2   LEFT_ALT    00000100
3   LEFT_GUI    00001000
4   RIGHT_CTRL  00010000
5   RIGHT_SHIFT 00100000
6   RIGHT_ALT   01000000
7   RIGHT_GUI   10000000

These reports are handled immediately by the kernel.
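The dumps above can be decoded by hand, but a few lines of Python make the structure explicit. This is only a sketch: it assumes the keyboard sends the common 8-byte report (one modifier byte, one reserved byte, up to six usage IDs), which matches the dumps shown here but is not guaranteed for every keyboard. The function name `decode_report` is hypothetical.

```python
# Modifier names in bit order, copied from the table above (bit 0 first).
MODIFIER_BITS = [
    "LEFT_CTRL", "LEFT_SHIFT", "LEFT_ALT", "LEFT_GUI",
    "RIGHT_CTRL", "RIGHT_SHIFT", "RIGHT_ALT", "RIGHT_GUI",
]


def decode_report(hex_line: str):
    """Split an 8-byte keyboard report into modifier names and usage IDs."""
    report = bytes.fromhex(hex_line)
    if len(report) != 8:
        raise ValueError("expected an 8-byte report")
    modifiers = [
        name for bit, name in enumerate(MODIFIER_BITS) if report[0] & (1 << bit)
    ]
    # Byte 1 is assumed reserved; bytes 2..7 hold usage IDs, 0x00 meaning "no key".
    keys = [code for code in report[2:] if code != 0]
    return modifiers, keys


print(decode_report("00 00 08 00 00 00 00 00"))  # ([], [8])
print(decode_report("40 00 0A 00 00 00 00 00"))  # (['RIGHT_ALT'], [10])
```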

3. Linux kernel’s evdev

Upon receiving the scancodes, the Linux HID subsystem translates them into input device events, exposed through evdev.

The device events are defined in the Linux kernel:

...
#define KEY_W        17
#define KEY_E        18
#define KEY_R        19
...
#define KEY_F        33
#define KEY_G        34
#define KEY_H        35
...
#define KEY_SYSRQ    99
#define KEY_RIGHTALT 100
#define KEY_LINEFEED 101
...

These codes follow the physical key order of a QWERTY keyboard (W, E, R are consecutive), instead of the alphabetical order of the HID table.

$ sudo libinput record -o record /dev/input/event18 --show-keycodes --with-hidraw
Receiving events: [              *      ]^C
devices:
- node: /dev/input/event18
  evdev:
    # Supported Events:
    # ...
    #   Event code 18 (KEY_E)
    # ...
    #   Event code 34 (KEY_G)
    # ...
    #   Event code 100 (KEY_RIGHTALT)
  events:
  # tapping 'e'
  - hid:
      hidraw2: [ 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00 ]
  - evdev:
    - [  1, 348988,   1,  18,       1] # EV_KEY / KEY_E        1
  - hid:
      hidraw2: [ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 ]
  - evdev:
    - [  1, 385986,   1,  18,       0] # EV_KEY / KEY_E        0
  # tapping 'é'
  - hid:
      hidraw2: [ 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 ]
  - evdev:
    - [  2, 989974,   1, 100,       1] # EV_KEY / KEY_RIGHTALT 1
  - hid:
      hidraw2: [ 0x40, 0x00, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00 ]
  - evdev:
    - [  2, 990972,   1,  34,       1] # EV_KEY / KEY_G        1
  - hid:
      hidraw2: [ 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 ]
  - evdev:
    - [  2, 991969,   1,  34,       0] # EV_KEY / KEY_G        0
  - hid:
      hidraw2: [ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 ]
  - evdev:
    - [  2, 992970,   1, 100,       0] # EV_KEY / KEY_RIGHTALT 0

Each HID event is converted into an evdev event whose code matches the Linux kernel definition. The last field is 1 for a press and 0 for a release.

In summary:

HID       evdev  key
0x000008  18     KEY_E
0x400000  100    KEY_RIGHTALT
0x00000a  34     KEY_G
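One way to picture the translation is as a diff between two consecutive HID reports: whatever appears is a press, whatever disappears is a release. The Python sketch below does exactly that for the three keys seen in the recording. It is an illustration only, not how the kernel is actually implemented, and the two lookup tables contain nothing beyond those three entries.

```python
# Only the entries observed in the recording; the kernel's real tables are far larger.
KEY_USAGE_TO_EVDEV = {0x08: 18, 0x0A: 34}  # usage ID -> KEY_E, KEY_G
MODIFIER_MASK_TO_EVDEV = {0x40: 100}       # modifier bit -> KEY_RIGHTALT


def pressed_codes(report: bytes) -> set:
    """Return the evdev codes of every key held down in one HID report."""
    codes = {c for mask, c in MODIFIER_MASK_TO_EVDEV.items() if report[0] & mask}
    codes |= {KEY_USAGE_TO_EVDEV[u] for u in report[2:] if u in KEY_USAGE_TO_EVDEV}
    return codes


def events_between(previous: bytes, current: bytes) -> list:
    """Return (evdev code, value) pairs: value 1 for press, 0 for release."""
    before, after = pressed_codes(previous), pressed_codes(current)
    presses = [(code, 1) for code in sorted(after - before)]
    releases = [(code, 0) for code in sorted(before - after)]
    return presses + releases


# The four reports captured while tapping 'é', preceded by an idle report.
reports = [bytes.fromhex(h) for h in (
    "0000000000000000", "4000000000000000", "40000a0000000000",
    "4000000000000000", "0000000000000000",
)]
events = [e for prev, cur in zip(reports, reports[1:])
          for e in events_between(prev, cur)]
print(events)  # [(100, 1), (34, 1), (34, 0), (100, 0)]
```

The printed sequence matches the evdev lines of the recording: KEY_RIGHTALT down, KEY_G down, KEY_G up, KEY_RIGHTALT up.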

4. X’s keycode

Now that the kernel has generated the evdev events, X (or another display server) can handle them according to its own logic.

Both X and Wayland use XKB to handle keyboard mappings.

X’s keycodes represent logical keys and are left to the server for interpretation.

$ xev -event keyboard
# tapping 'e'
KeyPress event, serial 28, synthetic NO, window 0x600001,
    state 0x0, keycode 26 (keysym 0x65, e), same_screen YES,
    XLookupString gives 1 bytes: (65) "e"
KeyRelease event, serial 28, synthetic NO, window 0x600001,
    state 0x0, keycode 26 (keysym 0x65, e), same_screen YES,
    XLookupString gives 1 bytes: (65) "e"

# tapping 'é'
KeyPress event, serial 28, synthetic NO, window 0x600001,
    state 0x0, keycode 108 (keysym 0xfe03, ISO_Level3_Shift), same_screen YES,
    XKeysymToKeycode returns keycode: 92
KeyPress event, serial 28, synthetic NO, window 0x600001,
    state 0x80, keycode 42 (keysym 0xe9, eacute), same_screen YES,
    XLookupString gives 2 bytes: (c3 a9) "é"
KeyRelease event, serial 28, synthetic NO, window 0x600001,
    state 0x80, keycode 42 (keysym 0xe9, eacute), same_screen YES,
    XLookupString gives 2 bytes: (c3 a9) "é"
KeyRelease event, serial 28, synthetic NO, window 0x600001,
    state 0x80, keycode 108 (keysym 0xfe03, ISO_Level3_Shift), same_screen YES,
    XKeysymToKeycode returns keycode: 92

This is a similar output, with an interesting difference: all the keycodes are shifted by 8.

Indeed, the function that converts evdev into keycodes adds 8 to all codes before storing them:

#define MIN_KEYCODE 8
// ...
void
EvdevQueueKbdEvent(InputInfoPtr pInfo, struct input_event *ev, int value)
{
    EventQueuePtr pQueue;

    // ...

    if ((pQueue = EvdevNextInQueue(pInfo)))
    {
        pQueue->type = EV_QUEUE_KEY;
        pQueue->detail.key = ev->code + MIN_KEYCODE;
        pQueue->val = value;
    }
}
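The offset is easy to check against the numbers seen so far. A minimal Python sketch, reusing the `MIN_KEYCODE` value from the C excerpt (the function name `evdev_to_x_keycode` is made up for this example):

```python
MIN_KEYCODE = 8  # same constant as in the C excerpt above


def evdev_to_x_keycode(evdev_code: int) -> int:
    """Apply the fixed offset that turns an evdev code into an X keycode."""
    return evdev_code + MIN_KEYCODE


for name, code in (("KEY_E", 18), ("KEY_G", 34), ("KEY_RIGHTALT", 100)):
    print(name, code, "->", evdev_to_x_keycode(code))
# KEY_E 18 -> 26
# KEY_G 34 -> 42
# KEY_RIGHTALT 100 -> 108
```

These are the keycodes 26, 42 and 108 reported by xev.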

5. XKB’s key

After X captures the evdev and converts it to a keycode, it is mapped to a keyboard key.

XKB’s keys are names for the positions of the keys traditionally found on a keyboard.

In the basic evdev layout, they have the following setup:

default xkb_keycodes "evdev" {
    <TLDE> = 49;
    <AE01> = 10;
    <AE02> = 11;
    ...
    <AE11> = 20;
    <AE12> = 21;
    <BKSP> = 22;
    ...
}

Those names are layout-agnostic, but can be visualized on a US ANSI keyboard:

TLDE AE01 AE02 AE03 AE04 AE05 AE06 AE07 AE08 AE09 AE10 AE11 AE12 BKSP
TAB  AD01 AD02 AD03 AD04 AD05 AD06 AD07 AD08 AD09 AD10 AD11 AD12 BKSL
CAPS   AC01 AC02 AC03 AC04 AC05 AC06 AC07 AC08 AC09 AC10 AC11    RTRN
LFSH     AB01 AB02 AB03 AB04 AB05 AB06 AB07 AB08 AB09 AB10       RTSH
LCTL  LWIN  LALT              SPCE             RALT  RWIN  COMP  RCTL

US ANSI Keyboard layout

Here are the interesting keycodes, matching the values reported by xev:

    <AD03> = 26;
    <AC05> = 42;
    <RALT> = 108;
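The lookup from X keycode to XKB key name can be sketched as a plain dictionary in Python. The entries TLDE to BKSP come from the excerpt above; AD03, AC05 and RALT are paired with the keycodes 26, 42 and 108 seen in the xev output. The helper names are hypothetical, and the row lettering (E for the number row down to B for the bottom letter row) is read off the US ANSI diagram.

```python
XKB_KEYCODES = {
    "TLDE": 49, "AE01": 10, "AE02": 11, "AE11": 20, "AE12": 21, "BKSP": 22,
    "AD03": 26, "AC05": 42, "RALT": 108,
}
KEYCODE_TO_NAME = {code: name for name, code in XKB_KEYCODES.items()}

ROWS = {"E": "number row", "D": "top letter row",
        "C": "home row", "B": "bottom letter row"}


def key_name(keycode: int):
    """Return the XKB key name for an X keycode, or None if unknown here."""
    return KEYCODE_TO_NAME.get(keycode)


def describe(name: str) -> str:
    """Explain positional names such as AD03; other names are returned as-is."""
    if len(name) == 4 and name[0] == "A" and name[1] in ROWS and name[2:].isdigit():
        return f"{ROWS[name[1]]}, key {int(name[2:])} from the left"
    return name


for keycode in (26, 42, 108):
    name = key_name(keycode)
    print(keycode, name, "->", describe(name))
# 26 AD03 -> top letter row, key 3 from the left
# 42 AC05 -> home row, key 5 from the left
# 108 RALT -> RALT
```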

6. XKB’s symbol

X then uses a symbol mapping list to determine which symbol corresponds to the pressed key and modifier. These mappings are user-configurable, and this is where one changes the locale of the keyboard layout.

The modifier can be none, Shift, AltGr (reported as ISO_Level3_Shift by xev), or Shift and AltGr together.

The eu mapping is a great example, since it contains a lot of commonly used accents for European languages:

xkb_symbols "basic"  {
    // Mod        NONE      SHIFT          ALTGR           SHIFT+ALTGR
    key <TLDE> {[ grave,    asciitilde,    dead_grave,     dead_tilde    ]};
    key <AE01> {[ 1,        exclam,        exclamdown,     onesuperior   ]};
    key <AE02> {[ 2,        at,            ordfeminine,    twosuperior   ]};
    // ...
    key <AD01> {[ q,        Q,             ae,             AE            ]};
    key <AD02> {[ w,        W,             aring,          Aring         ]};
    key <AD03> {[ e,        E,             ediaeresis,     Ediaeresis    ]};
    key <AD04> {[ r,        R,             yacute,         Yacute        ]};
    // ...
    key <AC05> {[ g,        G,             eacute,         Eacute        ]};
    // ...
};

It shows that <AD03> with no modifier produces e, and that <AC05> with AltGr produces eacute (é).
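Picking a symbol then amounts to indexing the row of a key by the active modifiers. The Python sketch below covers only the two keys used in this article and ignores everything real XKB adds on top (key types, Caps Lock, groups). The `KEYSYM_TO_CHAR` table and the function names are assumptions made for the example.

```python
# Columns follow the excerpt above: NONE, SHIFT, ALTGR, SHIFT+ALTGR.
SYMBOLS = {
    "AD03": ["e", "E", "ediaeresis", "Ediaeresis"],
    "AC05": ["g", "G", "eacute", "Eacute"],
}

# Keysym name -> character, limited to the names used above.
KEYSYM_TO_CHAR = {
    "e": "e", "E": "E", "ediaeresis": "\u00eb", "Ediaeresis": "\u00cb",
    "g": "g", "G": "G", "eacute": "\u00e9", "Eacute": "\u00c9",
}


def level(shift: bool, altgr: bool) -> int:
    """Column index: 0 none, 1 Shift, 2 AltGr, 3 Shift+AltGr."""
    return (1 if shift else 0) + (2 if altgr else 0)


def lookup(key: str, shift: bool = False, altgr: bool = False):
    """Return (keysym name, character) for a key and its modifiers."""
    keysym = SYMBOLS[key][level(shift, altgr)]
    return keysym, KEYSYM_TO_CHAR[keysym]


print(lookup("AD03"))  # ('e', 'e')
keysym, char = lookup("AC05", altgr=True)
print(keysym, char, char.encode("utf-8").hex(" "))  # eacute é c3 a9
```

The two bytes c3 a9 are the UTF-8 encoding of é, the same bytes xev reports from XLookupString.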

7. Client applications

Different applications handle the keyboard events similarly:

Summary

Finger on keyboard  press e     release e     press é     release é
HID's scancodes     00 00 08    00 00 00      40 00 00    40 00 00
                                              40 00 0A    00 00 00
Linux's evdev       18 1        18 0          100 1       34 0
                                              34 1        100 0
X's keycode         press 26    release 26    press 108   release 42
                                              press 42    release 108
XKB's key           press AD03  release AD03  press RALT  release AC05
                                              press AC05  release RALT
XKB's symbol        e                         é
