Generating Optimal Keyboard Layouts (with Code)
Today, the ubiquitous QWERTY keyboard is assumed to be the optimal layout for typing in the English language. Because QWERTY was developed in the 1870’s, I challenged myself to use modern computer science to generate a more optimal keyboard.
More generally, this task can be applied to any language or collection of words. More on this later!
Necessary Assumptions
To find the one, true best keyboard layout, one must use brute force: testing every permutation of keys for typing efficiency.
I, for one, do not have infinite computing power.
To accomplish this task in a reasonable timeframe, let’s follow these assumptions:
Hitting keys on the home row is always faster than the top row, and the top row is always faster than the bottom row
The number, function, and special character keys are not to be repositioned. Their existence is also assumed to be negligible.
The sample text accurately represents the extent of the language.
1. Generate Markov Chain from Provided Text
The method we are using extracts language from a text file. The more expansive the file, the better.
// sample.txt
It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.
However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered the rightful property of some one or other of their daughters.
...
Pride and Prejudice, Jane Austen (13k lines)
Our keyboard models need to understand how characters relate to each other in various contexts. Introducing the Markov Chain. A Markov Chain, in this case, is a statistical model that maps each character to the frequency of each character that follows it. This is easily achieved by traversing the text (cleared of non-alphabetical characters) and updating the Markov Chain at every iteration.
// Markov Chain
{
i: {
freq: 104530,
t: 13167,
s: 14064,
v: 2299,
...
},
t: { ... },
s: { ... },
...
}
2. Assigning Characters to Key Rows
The letters on modern keyboards make up three rows: the home (middle), top, and bottom rows. Let’s start by grouping all 26 characters into these rows.
The most frequently read characters are to be placed in the middle row for fast access, followed by the top row, then the bottom.
// Top Row
["m", "f", "l", "d", "u", "w", "y", "g", "c", "b"][
// Home Row
("n", "t", "s", "r", "a", "h", "o", "i", "e")
][
// Bottom Row
("x", "p", "k", "v", "q", "j", "z")
];
3. Determining Optimal Permutations for Rows
Now that the keys are divided into rows, we need to reconfigure each row such that finger travel distance is minimized.
For each row, iterate through all key permutations. Let’s consider the optimal order of keys to evenly spread out each character’s frequency in the text. Performing this action on each row then yields the final product: our new and improved keyboard layout.
// Final Keyboard
["c", "w", "b", "m", "f", "g", "l", "d", "u", "y"][
("n", "h", "r", "o", "i", "s", "e", "t", "a")
][("v", "q", "j", "k", "x", "z", "p")];
4. Comparing to QWERTY
To ensure that our own keyboard layout is more efficient than QWERTY for our given text, let’s run each keyboard through a typing simulation by having each keyboard re-type the sampled text.
The simulation tracks a total cost score for each keyboard (lower is better). We track the position of every finger on each hand, changing these positions with finger movement. When a finger must change rows, a full point is added to the cost. When a finger moves laterally, a half-point is added. When typing a character requires the active hand to switch, no point is added, since that hand already had time to move into place.
Results
QWERTY (Original)
1347859.5 points accumulated, 100% efficiency (by default).
CWBMFG (Our creation)
1109799 points accumulated, 121% efficiency
Fun Use Cases and Applications
As mentioned earlier, this method can be used to generate keyboard layouts for any target language, where a language is just a collection of words. For fun, I tried my method on an extensive collection of Shakespearean plays and sonnets.
// sample.txt
From fairest creatures we desire increase,
That thereby beauty's rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou contracted to thine own bright eyes,
Feed'st thy light's flame with self-substantial fuel,
...
All of Shakespeare's works, 124k lines
Still, our keyboard remained more efficient for this language sample than QWERTY, with 121% Efficiency
// Keyboard
['w','c','b','m','f','g','l','d','u','y']
['n','r','h','o','i','s','e','t','a',]
['j','q','z','v','k','x','p',]
Takeaways
Now that we have determined a keyboard layout more efficient than QWERTY, we can start to expand our keyboard empire, right? Wrong. QWERTY is dominant in today’s modern world; it’s not going anywhere. Still, this experiment supports that old, ubiquitous technologies are not necessarily the best; they can always be improved. I encourage you to apply this mindset to other technologies that surround us. Without inquisition, innovation is out of reach.