"Hznai" and words as patten recognition

“It dseno’t mtaetr in waht oerdr the ltteres in a wrod are, the olny iproamtnt tihng is taht the frsit and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it whotuit a pboerlm.”

The above passage is taken from an interesting article written by a web-acquaintance of mine. It is an example of typoglycemia. The usual explanation is that we treat written words primarily as patterns, and tend to look at individual letters only when the pattern is unfamiliar, such as in a word we do not know. Unlike Chinese characters, English words are composed of letters, and their arrangement gives us some help in working out how an unfamiliar word is pronounced. Of course, given the many irregularities of English spelling, the word may be pronounced quite differently from what its component letters suggest!
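The rule stated in the passage is simple enough to reproduce mechanically. The following Python sketch keeps the first and last letter of each word and shuffles the rest; the function names are illustrative and not taken from the original article, and a fixed seed makes the output repeatable. It treats any run of ASCII letters as a word, so a contraction such as "doesn't" is scrambled as two pieces, which is consistent with "dseno’t" in the passage.

```python
import random
import re


def scramble_word(word: str, rng: random.Random) -> str:
    """Shuffle a word's interior letters, keeping the first and last letters in place."""
    if len(word) <= 3:
        # Three letters or fewer: at most one interior letter, so nothing to shuffle.
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text: str, seed: int = 0) -> str:
    """Scramble every run of ASCII letters; punctuation and spacing are left untouched."""
    rng = random.Random(seed)
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)


if __name__ == "__main__":
    print(scramble_text("It doesn't matter in what order the letters in a word are."))
```

Because only the interior letters move, every scrambled word keeps its length, its letters and its two end points, which are exactly the cues the passage says readers rely on.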

That words are not read letter by letter has been shown in various experiments. Most of us read much faster than we could if we had to register every letter. Many of us can also correctly understand more words than we can correctly spell. In one test I saw, participants rapidly read a prepared text aloud. Unbeknown to them, the text contained deliberate mistakes such as “bifferent”, which on playback could be heard pronounced as “different”. Context no doubt also played a part in such corrections.

One of the things that strikes me about the above passage is that it seems to suggest there is a fairly limited set of commonly used word endings in English:

“-m”, “-n” and “-ng”

“-d” and “-t”

“-r” and “-l”

“-k” and “-g”

“-s”

“-z”, “-x” and some other letters are rarer but not unknown.
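Whether these groups really cover most words is easy to check against any sample of English text. A rough Python sketch, assuming endings are judged by spelling rather than sound (so a final silent “e” simply falls into the "other" bucket); the group labels and the "other" bucket are illustrative choices:

```python
import re
from collections import Counter

# The groups follow the list above; the labels and the "other" bucket are illustrative.
ENDING_GROUPS = {
    "m": "m/n/ng", "n": "m/n/ng",
    "d": "d/t", "t": "d/t",
    "r": "r/l", "l": "r/l",
    "k": "k/g", "g": "k/g",
    "s": "s",
}


def ending_group(word: str) -> str:
    """Classify a non-empty word by its final letter(s) using the groups listed above."""
    w = word.lower()
    # Check "-ng" first so that "ring" is grouped with the nasals rather than with "-g".
    if w.endswith("ng"):
        return "m/n/ng"
    return ENDING_GROUPS.get(w[-1], "other")


def tally_endings(text: str) -> Counter:
    """Count how many words in the text fall into each ending group."""
    return Counter(ending_group(w) for w in re.findall(r"[A-Za-z]+", text))


if __name__ == "__main__":
    print(tally_endings("the cat sat on a mat").most_common())
```

Running it over a longer text and looking at the size of the "other" bucket gives a quick sense of how limited the common endings really are.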

Vowel endings are also used, and “-y” is phonetically “-i”. Some of the “-e” endings would probably be replaced by the above consonants if the words were first rendered into a more phonetic form: “bole” and “fare”, for example, have the homophones “bowl” and “fair”, which end in “-l” and “-r”.
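The phonetic rendering described here can be roughly approximated with a couple of spelling rules. The sketch below is a hypothetical heuristic, not an established phonetic algorithm: it maps a final “-y” to “i” and skips a final “e” that follows a consonant in words longer than three letters. That is enough to make “bole”/“bowl” and “fare”/“fair” agree, but it will misjudge many words; a pronouncing dictionary would be the more reliable route.

```python
VOWELS = frozenset("aeiou")


def sounded_ending(word: str) -> str:
    """Crude guess at the final sound of a non-empty word, as a letter (heuristic only)."""
    w = word.lower()
    if w.endswith("y"):
        # Treat a final "-y" as the vowel "-i".
        return "i"
    if len(w) > 3 and w.endswith("e") and w[-2] not in VOWELS:
        # Assume a final "e" after a consonant is silent, as in "bole" or "fare";
        # short words such as "the" or "she" keep their vowel ending.
        return w[-2]
    return w[-1]


# Each spelling and its homophone land on the same ending; both lines end in True.
for spelling, homophone in [("bole", "bowl"), ("fare", "fair")]:
    print(spelling, homophone, sounded_ending(spelling) == sounded_ending(homophone))
```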

In a more practical vein, grouping words by “first letter, last letter, approximate length” might greatly improve the capabilities of search engines and similar systems. One can envision a search engine mode in which one enters the first letter, the last letter and the word length. Words of six to nine letters would be grouped together, as would words of eight letters or more.
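A minimal Python sketch of such a mode, assuming the search runs over a plain list of known words. The class name `ShapeIndex` and the `tolerance` parameter are illustrative. Instead of fixed length bands, it stores each word under its exact length and widens the match at query time, which is one simple way to make the length “approximate”; fixed bands could be substituted by changing the key.

```python
from collections import defaultdict


class ShapeIndex:
    """Hypothetical word index keyed by (first letter, last letter, length)."""

    def __init__(self, words):
        self._buckets = defaultdict(set)
        for word in words:
            w = word.lower()
            if w:
                self._buckets[(w[0], w[-1], len(w))].add(w)

    def lookup(self, first: str, last: str, length: int, tolerance: int = 1) -> list[str]:
        """Return indexed words with the given end letters and a length within +/- tolerance."""
        first, last = first.lower(), last.lower()
        hits = set()
        for n in range(max(1, length - tolerance), length + tolerance + 1):
            hits |= self._buckets.get((first, last, n), set())
        return sorted(hits)


if __name__ == "__main__":
    index = ShapeIndex(["different", "distant", "decent", "dent", "difference"])
    # A scrambled query such as "dfifreent" still supplies first letter, last letter and length.
    query = "dfifreent"
    print(index.lookup(query[0], query[-1], len(query)))
```

A scrambled or misspelt query keeps exactly the three features the index uses, so it can still retrieve the intended word; the trade-off is that a short, common shape such as “d…t” of four letters may return many candidates that would need ranking by some other means.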