Intelligent Punctuation, or How Software is 98% Perspiration

Perhaps Thomas Edison’s most enduring legacy after the incandescent bulb (now on its way to join the gas light and the horse-drawn carriage in the annals of human history) is the adage, “Genius is 2% inspiration and 98% perspiration.” While this surely applies in many areas of applied science and engineering, it is especially evident in software. We run into it every day while developing PadKeys Keyboard, our iOS on-screen keyboard app.

The aim of PadKeys is to bring a fuller keyboard to iOS, one where you can punctuate sentences and type numbers without needing to switch to alternate layouts. In general, it’s more efficient for “work-ish” kinds of things, though the smaller keys that the full layout requires mean you give up some speed when banging out simple texts. And of course any on-screen keyboard gives something up compared with a physical keyboard. Nearly all software keyboards, including ours, implement some simple automation to try to make up for this ergonomic deficiency.

Autocorrect is one example of such automation. Others are basic things like capitalizing your sentences without you needing to hit ‘Shift’, and inserting a period after two spaces, since the keyboard can safely assume you wanted to end a sentence. Since PadKeys is about efficiency, and also makes some ergonomic sacrifices for the greater good of avoiding extra layouts or “modes” in the keyboard, we’ve naturally implemented these basics, and even included some extras like autocompleting single letters to the most common word in the current language that starts with them. (In English, ‘t <space>’ will give you “the”, ‘o’ “of”, ‘f’ “for”, and so on.) But one of our users suggested we could do more by adding a bit more “intelligence” to the keyboard.
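To make the two-spaces and single-letter tricks concrete, here is a minimal sketch in Python (chosen only for brevity; it is not the PadKeys code). The function name on_space and the table SINGLE_LETTER_WORDS are made up for illustration, and the table holds only the three English examples mentioned above.

```python
# Hypothetical table, limited to the three English examples in the text.
SINGLE_LETTER_WORDS = {"t": "the", "o": "of", "f": "for"}


def on_space(text: str) -> str:
    """Return the text before the cursor after the user presses space."""
    # A second space right after a word: assume the sentence is over.
    if text.endswith(" ") and len(text) >= 2 and text[-2].isalnum():
        return text[:-1] + ". "

    # A lone letter typed as its own word: expand it if we know a match.
    head, sep, last = text.rpartition(" ")
    if len(last) == 1 and last.lower() in SINGLE_LETTER_WORDS:
        word = SINGLE_LETTER_WORDS[last.lower()]
        if last.isupper():
            word = word.capitalize()  # keep a capital typed by the user
        return head + sep + word + " "

    # Nothing special: just insert the space.
    return text + " "
```

For example, on_space("Hello ") gives "Hello. " and on_space("one of t") gives "one of the ". A real keyboard would also have to leave alone single letters that are words in their own right, which this sketch only does by virtue of its tiny table.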

He had noticed that punctuation is written with particular patterns of spacing around it. We put a space before and after a dash – but only after a comma or a period. And directional punctuation like (parentheses) and “quotation marks” has a space before or after depending on its orientation. All of these could be implemented by the keyboard: automatically inserting a space after a punctuation character, or before it, or even removing a space that shouldn’t be there, such as when we decide to end a clause or sentence after already typing a space (or having one inserted after an autocorrection).
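One way to picture these patterns is as a small lookup table from character to spacing rule. The sketch below is an illustration under assumptions, not the PadKeys data model: the names Spacing, SPACING_RULES, and with_spacing are invented here, and the table covers only the characters named in the paragraph above.

```python
from enum import Enum


class Spacing(Enum):
    BEFORE = "before"  # opening marks
    AFTER = "after"    # commas, periods, and closing marks
    BOTH = "both"      # dashes


# Hypothetical rule table for English; other languages space differently.
SPACING_RULES = {
    ",": Spacing.AFTER,
    ".": Spacing.AFTER,
    "(": Spacing.BEFORE,
    ")": Spacing.AFTER,
    "\u201c": Spacing.BEFORE,  # opening double quotation mark
    "\u201d": Spacing.AFTER,   # closing double quotation mark
    "\u2013": Spacing.BOTH,    # en dash
    "\u2014": Spacing.BOTH,    # em dash
}


def with_spacing(char: str) -> str:
    """Return char padded with the spaces its rule calls for."""
    rule = SPACING_RULES.get(char)
    if rule is None:
        return char  # no rule known: leave the character alone
    before = " " if rule in (Spacing.BEFORE, Spacing.BOTH) else ""
    after = " " if rule in (Spacing.AFTER, Spacing.BOTH) else ""
    return before + char + after
```

So with_spacing(",") yields ", " while with_spacing("(") yields " (". Keeping the rules in data rather than in branches makes it easier to add or adjust characters later.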

It sounded like a great idea, and doable because iOS provides keyboard apps with the text around where the user is typing. It took us a few weeks before we could start on this, but once we got rolling on it, we figured all we’d need to do would be to fire up some logic when the user typed a punctuation character to add or delete spaces appropriately. We started by making a list of the punctuation characters that we wanted to act upon. This included comma, period, semicolon, and colon, and we soon determined that any “clause ending” character fell into the category. Then we deleted space before these characters if it was there, and added a space afterward.
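That first cut fits in a few lines. The following is a minimal sketch of the idea, with the assumption that the text before the cursor is available as a plain string (on iOS it would presumably come from the text document proxy's documentContextBeforeInput); the names CLAUSE_ENDERS and type_punctuation are ours for illustration.

```python
# The "clause ending" characters listed above.
CLAUSE_ENDERS = ",.;:"


def type_punctuation(before: str, char: str) -> str:
    """Return the text before the cursor after the user types char."""
    if char not in CLAUSE_ENDERS:
        return before + char
    # Drop any space already sitting before the cursor, then add one after.
    return before.rstrip(" ") + char + " "
```

Typing a comma after "Hello " gives "Hello, ", and typing a period after "Hello" gives "Hello. ". Typing a period after "3" gives "3. ", which is exactly the sort of case that needs special handling.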

This simple implementation worked fairly well in many cases. The first complication we ran into was typing things like dates, times, and numbers. In these cases the rules change – you don’t want any spaces in 3.14 or 12:15, for instance. We worked around this by checking what kind of character lay before the cursor and skipping any action if it was a digit. Then we ran into problems when the user wanted to type punctuation characters together, such as in … or --. Additional complications came when parentheses or quotation marks were involved, and when the user selected an offered word suggestion while in the middle of other text and punctuation.
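As a rough sketch of how the digit check and the doubled-punctuation case might look, here is one possible arrangement. It rests on several assumptions that go beyond the text: that a plain hyphen is treated as a dash (which would wrongly space out hyphenated words), that a second dash should join the first, and that the names DASHES and type_punctuation are ours. Parentheses, quotation marks, and word suggestions are left out entirely.

```python
CLAUSE_ENDERS = ",.;:"
DASHES = "-\u2013\u2014"  # hyphen, en dash, em dash (assumed set)


def type_punctuation(before: str, char: str) -> str:
    """Return the text before the cursor after the user types char."""
    stripped = before.rstrip(" ")
    last = stripped[-1:]  # "" when there is no text yet

    if char in CLAUSE_ENDERS:
        # 3.14 or 12:15: a digit right before the cursor means hands off.
        if before[-1:].isdigit():
            return before + char
        return stripped + char + " "

    if char in DASHES:
        # A second dash joins the first ("--") instead of being spaced apart.
        if last and last in DASHES:
            return stripped + char + " "
        return stripped + " " + char + " "

    return before + char
```

With this version, typing "." after "3" leaves "3." so that "14" can follow directly, and typing "-" after "wait - " produces "wait -- ". The cost of the digit rule is that a sentence ending in a number no longer gets its automatic space.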

Overall it took several cycles of iteration internally and two external releases until we had it “right”, or at least only rarely causing annoyance. It’s still not perfect and we still don’t have the guts to turn it on by default. But it’s good enough that it leaves us with a feeling of going back to “manual mode” when using a regular computer keyboard and having to insert and delete all those spaces by hand.

The current code involves constants for character classes, and local state variables to track the context while executing multiple detection and conditional statements. It is really a textbook example of software development, where a simple idea and simple implementation become gradually more complex, as new possibilities and wrinkles are discovered in the real-world situations it deals with. The challenge, as always, is not to anticipate all of this from the beginning, but to evolve the design and approach so the code is as simple as possible — but no simpler.