Software Engineer. Photographer. Marathoner. Lifelong Student.

Crafted by Blake Sanie with and

Generating Optimal Keyboard Layouts (with Code)

Blake Sanie | Nov 28, 2020

Today, the ubiquitous QWERTY keyboard is assumed to be the optimal layout for typing in the English language. Because QWERTY was developed in the 1870’s, I challenged myself to use modern computer science to generate a more optimal keyboard.

More generally, this task can be applied to any language or collection of words. More on this later!

Necessary Assumptions

To find the one, true best keyboard layout, one must use brute force: testing every permutation of keys for typing efficiency.

I, for one, do not have infinite computing power.

To accomplish this task in a reasonable timeframe, let’s follow these assumptions:

1. Generate Markov Chain from Provided Text

The method we are using extracts language from a text file. The more expansive the file, the better.

// sample.txt

It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.

However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered the rightful property of some one or other of their daughters.

...

Pride and Prejudice, Jane Austen (13k lines)

Our keyboard models need to understand how characters relate to each other in various contexts. Introducing the Markov Chain. A Markov Chain, in this case, is a statistical model that maps each character to the frequency of each character that follows it. This is easily achieved by traversing the text (cleared of non-alphabetical characters) and updating the Markov Chain at every iteration.

// Markov Chain

{
    i: {
        freq: 104530,
        t: 13167,
        s: 14064,
        v: 2299,
        ...
    },
    t: { ... },
    s: { ... },
    ...
}

2. Assigning Characters to Key Rows

The letters on modern keyboards make up three rows: the home (middle), top, and bottom rows. Let’s start by grouping all 26 characters into these rows.

The home row
The home row

The most frequently read characters are to be placed in the middle row for fast access, followed by the top row, then the bottom.

// Top Row

["m", "f", "l", "d", "u", "w", "y", "g", "c", "b"][
  // Home Row

  ("n", "t", "s", "r", "a", "h", "o", "i", "e")
][
  // Bottom Row

  ("x", "p", "k", "v", "q", "j", "z")
];

3. Determining Optimal Permutations for Rows

Now that the keys are divided into rows, we need to reconfigure each row such that finger travel distance is minimized.

For each row, iterate through all key permutations. Let’s consider the optimal order of keys to evenly spread out each character’s frequency in the text. Performing this action on each row then yields the final product: our new and improved keyboard layout.

// Final Keyboard

["c", "w", "b", "m", "f", "g", "l", "d", "u", "y"][
  ("n", "h", "r", "o", "i", "s", "e", "t", "a")
][("v", "q", "j", "k", "x", "z", "p")];

4. Comparing to QWERTY

To ensure that our own keyboard layout is more efficient than QWERTY for our given text, let’s run each keyboard through a typing simulation by having each keyboard re-type the sampled text.

The simulation tracks a total cost score for each keyboard (lower is better). We track the position of every finger on each hand, changing these positions with finger movement. When a finger must change rows, a full point is added to the cost. When a finger moves laterally, a half-point is added. When typing a character requires the active hand to switch, no point is added, since that hand already had time to move into place.

Results

QWERTY (Original)

1347859.5 points accumulated, 100% efficiency (by default).

CWBMFG (Our creation)

1109799 points accumulated, 121% efficiency

Fun Use Cases and Applications

As mentioned earlier, this method can be used to generate keyboard layouts for any target language, where a language is just a collection of words. For fun, I tried my method on an extensive collection of Shakespearean plays and sonnets.

// sample.txt

From fairest creatures we desire increase,
That thereby beauty's rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou contracted to thine own bright eyes,
Feed'st thy light's flame with self-substantial fuel,

...

All of Shakespeare's works, 124k lines

Still, our keyboard remained more efficient for this language sample than QWERTY, with 121% Efficiency

// Keyboard

['w','c','b','m','f','g','l','d','u','y']
['n','r','h','o','i','s','e','t','a',]
['j','q','z','v','k','x','p',]

Takeaways

Now that we have determined a keyboard layout more efficient than QWERTY, we can start to expand our keyboard empire, right? Wrong. QWERTY is dominant in today’s modern world; it’s not going anywhere. Still, this experiment supports that old, ubiquitous technologies are not necessarily the best; they can always be improved. I encourage you to apply this mindset to other technologies that surround us. Without inquisition, innovation is out of reach.

Source Code (JavaScript)