The now-standard Qwerty keyboard layout, which dates from the 19th century, is often assumed to be suboptimal because it was designed to cope with the mechanical limits of typewriters. To keep the key levers from jamming, letters that are often typed in sequence had to be placed apart from each other so that the levers would not interfere. Whether or not this is true, various alternative layouts have been proposed that are claimed to be much better optimized for typing speed and comfort. The first of these was the Dvorak Simplified Keyboard (commonly just Dvorak) in 1932, which puts more emphasis on using the middle (home) row and on alternating hands after each key press. It remains the most prominent alternative layout to this day, but Colemak, designed in 2006, is becoming increasingly popular. Supposedly, Colemak further increases the use of the home row, optimizes the placement of the more frequent letters and improves rolling motions. While it’s not known exactly how computers aided the development of Colemak, the intricate computer-assisted design of Qgmlwb is publicly documented on the Carpalx project website. Qgmlwb is supposed to require the least effort when typing.

This post evaluates different aspects of the aforementioned keyboard layouts for typing in English. The differences between these layouts are known in general terms, but this post also attempts to outline how large they are.

# Data entry

Let’s load the necessary packages and objects. The latter include a theme and color scheme for the plots and some functions for text formatting.

To begin with, we need to input some data. The American National Corpus gives us the most common English words and their relative frequencies. These frequencies serve as the basis for a text sample of 100 000 words, which is used later in the analysis.
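As a minimal sketch of how such a sample could be drawn, the chunk below uses `sample()` with frequency weights. The `corpus` data frame, its five words and their frequencies are made up for illustration and merely stand in for the real corpus list. Collapsing the sample to one row per word with a `count` column is likewise an assumption about the data format.

```r
# Hypothetical stand-in for the corpus data: a few words with relative
# frequencies (the real word list is far longer)
corpus <- data.frame(
  word = c("the", "of", "and", "read", "wear"),
  freq = c(0.50, 0.25, 0.15, 0.06, 0.04),
  stringsAsFactors = FALSE
)

set.seed(1)          # make the draw reproducible
n.words <- 100000    # size of the text sample

# Draw words with replacement, with probability proportional to frequency
sampled <- sample(corpus$word, size = n.words, replace = TRUE,
                  prob = corpus$freq)

# Collapse the sample to one row per distinct word with its count
counts <- table(factor(sampled, levels = corpus$word))
words <- data.frame(
  word = names(counts),
  count = as.integer(counts),
  stringsAsFactors = FALSE
)
words
```

Storing counts instead of 100 000 individual rows keeps the later per-word calculations small, since each distinct word has to be processed only once.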

For evaluating the placement of letters, R’s matrix provides an appropriate data format. A matrix does not perfectly reflect the staggered layout of most keyboards, but for the purposes here it’s close enough.
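One way to set this up, sketched here under the assumption that only the three main rows of ten keys matter, is a 3 x 10 character matrix per layout. The helper `MakeLayout` and the list `layouts` are illustrative names, not necessarily what the original code used, and further layouts can be added the same way.

```r
# Build a 3 x 10 character matrix from three strings of ten keys each
MakeLayout <- function(top, home, bottom) {
  rows <- strsplit(c(top, home, bottom), "")
  matrix(unlist(rows), nrow = 3, byrow = TRUE,
         dimnames = list(c("top", "home", "bottom"), NULL))
}

layouts <- list(
  qwerty  = MakeLayout("qwertyuiop", "asdfghjkl;", "zxcvbnm,./"),
  dvorak  = MakeLayout("',.pyfgcrl", "aoeuidhtns", ";qjkxbmwvz"),
  colemak = MakeLayout("qwfpgjluy;", "arstdhneio", "zxcvbkm,./")
)

layouts$qwerty["home", ]   # keys on the Qwerty home row
```

With this format the row of a key is its matrix row and the finger roughly follows from its column, which is all the later comparisons need.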

# Data preparation

We begin by creating a matrix that has a row for each word and a column for each symbol. The value in each cell is the number of times the symbol occurs in the word.

After simply binding the matrix to the initial data frame, we have another data frame we can use for our evaluation.
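The two steps could look roughly like this. The small `words` data frame is a hypothetical stand-in for the real sample, and only the 26 lowercase letters are counted, which is a simplification.

```r
# Hypothetical sample: one row per word with its count in the text sample
words <- data.frame(word = c("the", "read", "wear", "all"),
                    count = c(50, 6, 4, 10),
                    stringsAsFactors = FALSE)

# One row per word, one column per letter, values are occurrences
letter.counts <- t(vapply(strsplit(words$word, ""), function(chars) {
  tabulate(match(chars, letters), nbins = 26)
}, integer(26)))
colnames(letter.counts) <- letters

# Bind the counts to the original data frame
words.letters <- cbind(words, letter.counts)
words.letters[, c("word", "count", "a", "e", "l")]
```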

# Plot functions

To present the results of our analysis in an intuitive manner, we create a PlotBar function to illustrate results in absolute values and a PlotDensity function to show the distribution of values by words.

# Frequencies of letters on different layouts

First off, let’s compare the frequency of letters in the English corpus. The letter counts of each word in our data frame are multiplied by the number of times the word occurs in the sample. Then we calculate each letter’s share of all letters.
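As a sketch of this calculation, assuming a per-word letter count matrix and a `count` column as described earlier (the four words and their counts are made up):

```r
# Hypothetical sample: one row per word with its count in the text sample
words <- data.frame(word = c("the", "read", "wear", "all"),
                    count = c(50, 6, 4, 10),
                    stringsAsFactors = FALSE)

# Letter occurrences per word (rows are words, columns are letters)
letter.counts <- t(vapply(strsplit(words$word, ""), function(chars) {
  tabulate(match(chars, letters), nbins = 26)
}, integer(26)))
colnames(letter.counts) <- letters

# Weight each word's letter counts by the word count, then sum per letter.
# The count vector is recycled down the columns, so row i is scaled by count i.
letter.totals <- colSums(letter.counts * words$count)

# Share of each letter among all letters typed
letter.share <- letter.totals / sum(letter.totals)
round(sort(letter.share, decreasing = TRUE)[1:5], 3)
```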

Let’s plot the result.

Next, we define a function PlotFreq that plots these calculated frequencies. Again, it’s not an exact picture of a standard keyboard with its staggered layout, but it’s good enough. We will use this function to produce a graph for each layout.

The placement of letters on Qwerty does suggest that the layout is not optimal for touch typing. Most of the more frequent letters are on the upper row, and rather infrequent letters occupy the home position of the strongest fingers. The middle row almost seems to follow the letter ordering of the alphabet. On the other hand, some of the more infrequent keys are on the edges, which should reduce the fatigue of the weaker fingers.

It’s evident from the plot that Dvorak puts more emphasis on the home row, but it still falls short in this regard, especially with its location of R and L. Hand alternation is increased by having vowels on one side and most of the consonants on the other.

Colemak seems to be an improvement, as it places all of the most frequent letters on the home row.

On the graph below, Qgmlwb does seem like the best optimized layout. The most frequent letters are on the home row, under the strongest fingers or near them. Placing consonants on one side and vowels on the other should result in improved alternation between hands.

# Share of letters typed on the home row

The code chunk below calculates, for each layout, the number of letters in each word that can be typed on the home row. There are likely more elegant ways of doing this than several nested apply functions, but this works, too.
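For illustration, here is one compact way the same count could be obtained with `%in%`. This is a sketch with made-up sample words, not the code used for the results in this post, and the names `home.rows` and `HomeCount` are hypothetical.

```r
# Hypothetical sample: one row per word with its count in the text sample
words <- data.frame(word = c("the", "read", "wear", "all"),
                    count = c(50, 6, 4, 10),
                    stringsAsFactors = FALSE)

# Home row letters of three layouts (punctuation keys left out)
home.rows <- list(qwerty  = "asdfghjkl",
                  dvorak  = "aoeuidhtns",
                  colemak = "arstdhneio")

# Number of letters of a word that sit on the given home row
HomeCount <- function(word, home) {
  chars <- strsplit(word, "")[[1]]
  sum(chars %in% strsplit(home, "")[[1]])
}

# Rows are words, columns are layouts
home.counts <- sapply(home.rows, function(home) {
  vapply(words$word, HomeCount, integer(1), home = home)
})

# Share of all keystrokes typed on the home row, weighted by word count
home.share <- colSums(home.counts * words$count) /
  sum(nchar(words$word) * words$count)
round(home.share, 3)
```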

When using any of the alternative layouts, more than twice as many keystrokes are executed on the home row as with Qwerty. There is little difference between the alternative layouts, however.

The density plot below illustrates the distribution of words by the proportion of home row letters for each layout. On all layouts, there are many words in which half of the keystrokes are executed on the home row. However, when using Qwerty many words are also typed without using the home row at all, while the design of the other layouts allows a large share of words to be typed on the home row only. Note that Colemak and Qgmlwb have the same letters on the home row, which is why the line for Colemak is not visible here.

# Hand alternations

Hand alternation here means typing consecutive letters with fingers of different hands. This should speed up typing, as it allows one hand to move to the next position while the other hand is pressing a key. In order to determine the number of hand alternations in each word, we first assign l to each letter typed with a left hand finger and r to each letter typed with a right hand finger. Then we count the positions in each word where two consecutive letters are typed with different hands.
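A minimal sketch of this procedure follows. It assumes that the five leftmost columns of each row belong to the left hand and the five rightmost to the right hand, and the helper names `HandMap` and `Alternations` are made up for this example.

```r
# Three rows of ten keys per layout
layouts <- list(
  qwerty  = c("qwertyuiop", "asdfghjkl;", "zxcvbnm,./"),
  dvorak  = c("',.pyfgcrl", "aoeuidhtns", ";qjkxbmwvz"),
  colemak = c("qwfpgjluy;", "arstdhneio", "zxcvbkm,./")
)

# Map every key of a layout to "l" (columns 1-5) or "r" (columns 6-10)
HandMap <- function(rows) {
  keys <- unlist(strsplit(rows, ""))
  hands <- rep(rep(c("l", "r"), each = 5), times = length(rows))
  setNames(hands, keys)
}

# Count positions where two consecutive letters use different hands
Alternations <- function(word, hand.map) {
  hands <- hand.map[strsplit(word, "")[[1]]]
  sum(hands[-1] != hands[-length(hands)])
}

# Number of hand alternations in the word "data" on each layout
alternations <- sapply(layouts, function(rows) {
  Alternations("data", HandMap(rows))
})
alternations
```

Dividing such counts by the number of characters gives a share that is comparable across words of different length.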

On the bar chart below, the sum of all alternations is divided by the sum of all characters. It appears that Dvorak, which was designed with this feature in mind, is the most successful in this respect, with Qgmlwb being quite similar. When only single words are taken into account, most keystrokes on these layouts are followed by a press with a finger of the other hand. This is not true for Colemak or Qwerty.

The density chart confirms the previous finding. Qwerty has the most words without any alternations, while most words fall on the right side of the graph when typed on Dvorak. However, on all layouts it’s most common to change hands after every two letters, again not taking into account spaces between words.

# Inwards rolling motions

Inwards rolling motions are keystroke sequences that are executed from the little finger towards the index finger. For instance, on a Qwerty layout it’s convenient to type wear, but read is not as natural. In order to evaluate the frequency of such movements, we replace each letter with its column number and then count certain patterns (rolling.motions) in each word. Here we also consider sequences that span multiple columns as inwards rolling sequences (e.g. AS, AD, AF and AG on the Qwerty layout).
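The exact patterns counted are not spelled out here, so the following is only a sketch under a simplified definition: a pair of consecutive keys counts as an inwards roll when both keys are on the same hand and the same row and the second key is closer to the centre of the keyboard. The function name `InwardRolls` and the same-row restriction are assumptions made for this example.

```r
qwerty <- c("qwertyuiop", "asdfghjkl;", "zxcvbnm,./")

# Count consecutive key pairs that roll inwards on a layout given as
# three strings of ten keys each
InwardRolls <- function(word, rows) {
  keys <- unlist(strsplit(rows, ""))
  key.row <- rep(seq_along(rows), each = 10)       # row of every key
  key.col <- rep(1:10, times = length(rows))       # column of every key

  idx <- match(strsplit(word, "")[[1]], keys)
  n <- length(idx)
  if (n < 2) return(0L)
  from <- idx[-n]   # first key of each pair
  to <- idx[-1]     # second key of each pair

  same.row <- key.row[from] == key.row[to]
  # Left hand: columns 1-5, inwards means moving to a higher column
  left <- key.col[from] <= 5 & key.col[to] <= 5 & key.col[to] > key.col[from]
  # Right hand: columns 6-10, inwards means moving to a lower column
  right <- key.col[from] >= 6 & key.col[to] >= 6 & key.col[to] < key.col[from]
  sum(same.row & (left | right))
}

sapply(c("as", "sa", "wear", "po"), InwardRolls, rows = qwerty)
```

Under this definition each pair is counted once, so the number of keystrokes involved in a roll is somewhat higher than the number of rolls.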

Inwards rolling motions seem to be most frequent on the Qwerty and Colemak layouts, where 16-17% of keystrokes are part of such movements. This is expected in the case of Colemak, which was designed with this idea in mind. There seems to be a certain tradeoff between hand alternations and rolling motions, since Dvorak and Qgmlwb perform worse in this area.

The word density suggests that the layouts do not actually differ very much with respect to the typing gestures considered here. In a lot of words inwards rolling sequences do not occur at all on any layout, and there are very few words where most of the keystrokes are part of such a motion.

# Conclusion

This simple evaluation lends some support to what was already known about the advantages of the alternative keyboard layouts. Qwerty is considerably less optimal than the other layouts in terms of letter placement and home row usage. While Dvorak and Qgmlwb favor switching between hands, inwards rolling sequences are slightly more frequent when Qwerty and Colemak are used.