Keyboards: Here there be Dragons

Here at CoScreen, from the very beginning, our goal has been to bring multi-user remote collaboration into the modern age. Multi-user operating systems have been a thing for some time, and collaboration has been possible in various forms for decades via terminal-based tricks and tools. Then Xerox PARC came along and popularized GUIs, and those too, believe it or not, were originally imagined as potentially multi-user and collaborative. Still, somewhere along the way, something was lost. Computers became personal computers, and the world of multi-user collaboration fell to the fringes of time and history.

Windows was originally built on DOS, which by its nature was never multi-user, and was later jury-rigged onto NT (which was) in a way that preserved many of these classic behaviors while adopting a sane security and permissions model. OSX was built in part on FreeBSD (Unix) and inherited all of the multi-user checkboxes, but made little use of them, building a windowing system very much designed for only one user, one window, and one mouse.

Videoconferencing on NLS (1968)

Douglas Engelbart’s “Mother of All Demos” demonstrated many of the aspects of modern computing we have all come to take for granted: windows, hypertext, graphics, video conferencing, a computer mouse, and the somewhat forgotten but not lost real-time collaborative editing. Much of what separated Engelbart from his colleagues who would go on to join Xerox PARC, and define user interfaces for decades to come, was that “Engelbart saw the future in collaborative, networked, timeshare (client-server) computers, which younger programmers rejected in favor of the personal computer.” (https://en.wikipedia.org/wiki/Douglas_Engelbart)

Our ultimate goal is for CoScreen to fade into the background and just become part of the underlying operating system, unobtrusively making your computer remotely collaborative. It turns out modern operating systems have deviated so far from the concepts imagined originally that they resist collaborative use.

On the road to this dream, we have encountered and defeated many exciting and challenging problems. Some of these problems have nearly defeated us or sent some of our engineers to the very edge of their sanity, especially the problems we did not expect to encounter.

Keyboards, international layouts, and their complexities are one of these problems. As an engineer, you would probably expect that most of the problems involving international keyboards and remote control were solved decades ago in VNC, RDP, or commercial remote control solutions like NX, TeamViewer, and the like. Surprisingly, even after decades, technologies like RDP still often require a lot of individual configuration and tweaking of settings, and still produce unexpected issues with international keyboards. All of this individual configuration, these keyboard layout files, and the attendant complexity will immediately burst the bubble of almost any person using an international keyboard with a mismatched remote system.

CoScreen’s intent regarding how a user’s keyboard interacts with a shared window is simple — do what the user wants us to do. At times this means rendering a particular character into the shared window, while other times, this means delivering an exact key sequence to the target application.

This goal is simple to understand and state; however, like all things that should behave intuitively for users in the realm of computation, the complexities of achieving it are not.

Thanks: u/jch2617 → r/ProgrammerHumor

Terms

To make sense of what we are talking about moving forward, a number of technical terms need to be understood:

  • Dead Key: A special key used to attach a diacritic to a character.
  • Input Method Editor: An OS component used to generate characters that are not on the input device [2].
  • Logical Layout: A virtual mapping of characters applied over, and superseding, the physical layout. For example: choosing Dvorak as the Input Source on macOS.
  • Local layout: The logical layout that is active for the local user.
  • Local user: The user who is sharing an application
  • Physical Layout: The physical arrangement of keys on a keyboard. This is technically a composite of the form factor and layout of the keys — e.g. Full / ANSI [3].
  • Remote user: The user with whom the content is shared, and who delivers the keys.
  • Remote layout: The logical layout of the remote user.
  • RemoteWindow: The name of the object that represents a shared window in CoScreen.

The matrix of possibilities regarding input methods from various users in a CoScreen is pretty mind-boggling. To simplify, we have focussed on several key confounding factors, which are as follows:

  • A remote and local user may have a different logical layout applied to a physical layout.
  • A user may be entering a key sequence, symbol, or emoji, that relies on Input Method Editors (https://en.wikipedia.org/wiki/Input_method) and “Dead” keys.
  • An application may override behavior for an OS. Emacs, for example, uses the Option key on macOS differently from other applications: it processes the Option key sequence itself rather than the special character macOS injects for that sequence. Option + f normally renders the character ƒ, while in Emacs it triggers the command to move forward a word.
  • The shifted characters may not align between logical layouts. For example, much of the shifted keyspace in a German layout matches US, but Shift + 7 produces / on a German layout and & on a US layout.

CoScreen’s Solution

We experimented with a wide variety of different solutions. Our original approach used the underlying keycode, and in our native layer, we attempted to translate and map those codes via a more traditional VNC or RDP-like approach. We could write an entire article just on the subject of event simulation and injection on Win32 and OSX, which was our original primary engineering focus.

The result was input that worked pretty well with a standard keyboard, interpreting and transmitting most international characters as Unicode. Still, it fell on its face by misinterpreting some control sequences, as in the Emacs example above, and it didn’t work at all with IMEs.

Now on to what did work: When possible, key processing is kept common across platforms. However, this only works to a point because the operating systems differ in their key injection handling and facilities.

CoScreen renders and composites all shared windows to natively managed hardware-accelerated surfaces using a specialized video stream containing content to display, and metadata about where and how to display it.

But CoScreen is also an Electron-based application and uses Electron Windows in order to leverage the flexibility and speed of modern web-application based programming. It does this by reaching into the native windows and attaching our own native surfaces to them, but underneath our hijacked native rendering surface is an Electron BrowserWindow, and we leverage it for event input. By standing on the shoulders of giants in terms of leveraging the event input work of the Chromium browser, we can generate a much better general input strategy, without compromising performance.

All user key input events are intercepted by a hidden element in our Window. Most actions (click, key, etc) focus the input element to ensure it processes all keys (there is a known issue as of 6/13 on Windows where the window must be re-focused).

The following events are critical for key input processing:

  • keydown
  • keyup
  • beforeinput
  • compositionstart
  • compositionend

To fit into the current architecture, keyup and keydown events are synthesized for compositions and are sent one Unicode character at a time. If an event is the result of a composition, a new flag isComposed is sent with the key payload.

keydown

When a keydown is received, it is not necessarily clear whether this is a keystroke that can be forwarded or whether it starts a composition because the isComposing parameter will generally be false for the first keystroke in a composition. Depending on the value of the key field, one of three actions will be taken:

  • Do nothing: This will be done if the key is one of the reserved IME key strings.
  • Immediately forward to the window: This is done for control keys (detected by a key string with length > 1), keys where repeat is true, and keys where ctrlKey or metaKey is true.
  • Cache: All other keys.
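
The three-way decision above can be sketched as a small pure function. This is a minimal illustration, not CoScreen's actual code: the names KeydownAction, KeydownInfo, and classifyKeydown are hypothetical, and the article does not list its reserved IME key strings, so the set below ("Process", "Dead", "Unidentified", all of which are standard KeyboardEvent.key values) is an assumption.

```typescript
// What to do with a keydown, matching the three cases listed above.
type KeydownAction = "ignore" | "forward" | "cache";

// Minimal subset of the DOM KeyboardEvent fields the decision depends on.
interface KeydownInfo {
  key: string;
  repeat: boolean;
  ctrlKey: boolean;
  metaKey: boolean;
}

// ASSUMPTION: the article does not enumerate its reserved IME key strings.
const RESERVED_IME_KEYS: ReadonlySet<string> = new Set([
  "Process",
  "Dead",
  "Unidentified",
]);

function classifyKeydown(ev: KeydownInfo): KeydownAction {
  // Reserved IME keys are swallowed; the composition events report the result.
  if (RESERVED_IME_KEYS.has(ev.key)) return "ignore";

  // Named keys such as "Enter" or "ArrowLeft" have a key string longer than 1.
  const isNamedKey = ev.key.length > 1;
  if (isNamedKey || ev.repeat || ev.ctrlKey || ev.metaKey) return "forward";

  // Everything else might still turn out to start a composition, so hold it.
  return "cache";
}
```

Keeping the classification free of side effects makes it easy to test separately from the queue that holds the cached keys.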

beforeinput

This event is sent before the input is ready on the input element. If the isComposing flag is true, the last cached keydown event is discarded because the composition has started. If the flag is false, the last keydown event is popped from the queue and sent to the Window.

keyup

This event represents a key being raised. If the isComposing flag is true, no action is taken. If the flag is false and a completed composition is in the queue, a keyup is synthesized and sent to the Window. If no composition is present, the event is sent unaltered.

compositionstart

This event represents the start of IME composition. All existing keydown events are cleared.

compositionend

This event represents the end of composition. A keydown event is synthesized from the result of the composition and sent to the Window. The result of the composition is queued to be sent when the last keyup occurs.
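
Taken together, the beforeinput, keyup, compositionstart, and compositionend rules describe a small state machine around a queue of cached keydowns. The sketch below is one possible reading of those rules, not the actual implementation. The class and method names are hypothetical, and the sent array stands in for "sent to the Window".

```typescript
interface SentEvent {
  type: "keydown" | "keyup";
  key: string;
  isComposed: boolean; // the flag the article adds for composition results
}

class CompositionAwareQueue {
  private pendingKeydowns: string[] = [];
  private completedCompositions: string[] = [];
  // Stand-in for the Window: records everything that would be forwarded.
  readonly sent: SentEvent[] = [];

  // Called for keydowns that were classified as "cache".
  cacheKeydown(key: string): void {
    this.pendingKeydowns.push(key);
  }

  onBeforeInput(isComposing: boolean): void {
    const last = this.pendingKeydowns.pop();
    // Nothing cached, or the key started a composition: discard it.
    if (last === undefined || isComposing) return;
    this.sent.push({ type: "keydown", key: last, isComposed: false });
  }

  onCompositionStart(): void {
    this.pendingKeydowns.length = 0; // clear all cached keydowns
  }

  onCompositionEnd(data: string): void {
    // Array.from splits by Unicode code point, one character at a time.
    for (const ch of Array.from(data)) {
      this.sent.push({ type: "keydown", key: ch, isComposed: true });
      this.completedCompositions.push(ch); // keyup is owed later
    }
  }

  onKeyUp(key: string, isComposing: boolean): void {
    if (isComposing) return;
    if (this.completedCompositions.length > 0) {
      for (const ch of this.completedCompositions) {
        this.sent.push({ type: "keyup", key: ch, isComposed: true });
      }
      this.completedCompositions.length = 0;
      return;
    }
    this.sent.push({ type: "keyup", key, isComposed: false });
  }
}
```

In a real page these methods would be wired to the DOM listeners on the hidden input element. They are kept DOM-free here so the ordering logic can be exercised directly.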

Post Window Processing

The following flow diagram represents the processing common to both operating systems after the above IME processing.

[Figure: flow diagram of post-Window key processing]

Remote user

  1. The resultant key up/down payloads are forwarded to the RemoteWindow instance responsible for the shared window.
  2. If the platform is macOS, and ALT is down, the un-altered key is obtained so that it can be used for translation by the receiver. The reason for this is that macOS exposes a new keymap when Option (Alt) is down. In that case f (on a US keyboard) becomes ƒ. Unfortunately, no keyboard has this natively, which causes our lookups to fail. Normally, we’d deliver the character, but for applications like Emacs, we need to deliver the key sequence with the un-altered character to achieve the desired effect (move forward a word in the case of Alt-f).
  3. Send the key payload to the local user.
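
Step 2 can be sketched as follows. The article does not say how the un-altered key is obtained, so this sketch assumes a lookup table from the event's code to the character that key produces with no modifiers. The names OutgoingKey, UnmodifiedLayout, and attachUnalteredKey are hypothetical. "darwin" is the value Node's process.platform reports on macOS.

```typescript
interface OutgoingKey {
  key: string; // what the OS produced, e.g. "ƒ" for Option + f
  code: string; // physical key identifier, e.g. "KeyF"
  altKey: boolean;
  unalteredKey?: string; // filled in only on macOS when Alt is down
}

// ASSUMPTION: maps a code to the character produced without modifiers.
type UnmodifiedLayout = Record<string, string>;

function attachUnalteredKey(
  ev: OutgoingKey,
  platformName: string,
  layout: UnmodifiedLayout,
): OutgoingKey {
  // Only macOS swaps in a different keymap while Option (Alt) is held.
  if (platformName !== "darwin" || !ev.altKey) return ev;
  const plain = layout[ev.code];
  // Leave the payload untouched if the layout has no entry for this key.
  return plain === undefined ? ev : { ...ev, unalteredKey: plain };
}
```

The receiver can then choose between key and unalteredKey depending on the target application.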

Local user

  1. Determine if the incoming key is a dropped key. The set of dropped keys can vary by operating system; today we drop keys for which we don’t have a translation strategy.
  2. Determine if the target is Emacs and apply the app-specific policy (ignore the modified key if Alt is down and use the unaltered key).
  3. Forward to platform-specific native processing.
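
A minimal sketch of the first two steps on the receiving side. How CoScreen detects Emacs and which keys it drops are not stated in the article, so the application-name check and the droppedKeys set are assumptions, as are the names IncomingKey and applyLocalPolicy.

```typescript
interface IncomingKey {
  key: string;
  code: string;
  altKey: boolean;
  unalteredKey?: string; // supplied by a macOS sender when Alt was down
}

// Returns the payload to hand to native processing, or null to drop it.
function applyLocalPolicy(
  ev: IncomingKey,
  targetAppName: string,
  droppedKeys: ReadonlySet<string>,
): IncomingKey | null {
  // Step 1: drop keys that have no translation strategy.
  if (droppedKeys.has(ev.key)) return null;

  // Step 2: ASSUMPTION: Emacs is recognized by the target application's name.
  const isEmacs = targetAppName.toLowerCase().includes("emacs");
  if (isEmacs && ev.altKey && ev.unalteredKey !== undefined) {
    // Deliver Alt + f rather than the character "ƒ".
    return { ...ev, key: ev.unalteredKey };
  }

  // Step 3 (native injection) happens outside this function.
  return ev;
}
```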

Windows

All Windows processing takes place in the native layer. With Windows, one can either inject a virtual key up/down or submit a Unicode character. The latter does not register as a keystroke for applications that expect key sequences (e.g. vim’s gg), so it cannot be used generically, though this was desired and tried.

The flow is:

  1. If the isComposed flag is true, inject a Unicode character.
  2. If the key is a non-character, inject the virtual key based on the code field.
  3. If the key can be mapped to the local key layout, inject this key.
  4. If no character in the local keymap exists, but Ctrl or Alt are down, inject the virtual key from the key code.
  5. Inject a Unicode character.
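
The five-step flow is an ordered fallback chain, which the sketch below expresses as a pure decision function. The real processing lives in CoScreen's native layer. The types, the function name, and the shape of localLayout (character to local key identifier) are all hypothetical. On Win32 the two primitives presumably correspond to SendInput with and without KEYEVENTF_UNICODE, though the article does not say so.

```typescript
interface WinKey {
  key: string;
  code: string;
  isComposed: boolean;
  ctrlKey: boolean;
  altKey: boolean;
}

type WinInjection =
  | { kind: "unicode"; text: string }
  | { kind: "virtualKey"; source: "code" | "localLayout"; value: string };

// ASSUMPTION: localLayout maps a character to the local key that produces it.
function chooseWindowsInjection(
  ev: WinKey,
  localLayout: ReadonlyMap<string, string>,
): WinInjection {
  // 1. Composition results are always delivered as characters.
  if (ev.isComposed) return { kind: "unicode", text: ev.key };

  // 2. Non-character keys ("Enter", "F5", ...) go by physical code.
  if (ev.key.length > 1) {
    return { kind: "virtualKey", source: "code", value: ev.code };
  }

  // 3. Prefer the local key that types the same character.
  const localKey = localLayout.get(ev.key);
  if (localKey !== undefined) {
    return { kind: "virtualKey", source: "localLayout", value: localKey };
  }

  // 4. Chords must remain keystrokes even without a local mapping.
  if (ev.ctrlKey || ev.altKey) {
    return { kind: "virtualKey", source: "code", value: ev.code };
  }

  // 5. Last resort: submit the character itself.
  return { kind: "unicode", text: ev.key };
}
```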

In each case, because Windows is stateful, a cleanup task is queued for 500ms later. This task will inject a keyup for the previous keydown if none was sent. Failure to do this will result in “stuck” keys.
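
One way to picture the cleanup task is a small tracker that remembers which keys are down and reports the ones that have gone 500ms without a matching keyup. This is a sketch under that reading, with hypothetical names. Time is passed in explicitly so the logic stays deterministic; a real implementation would call sweep from a timer and inject a keyup for each returned key.

```typescript
const STUCK_KEY_TIMEOUT_MS = 500;

class StuckKeyGuard {
  // code -> timestamp (ms) of the most recent injected keydown
  private downSince = new Map<string, number>();

  noteKeyDown(code: string, nowMs: number): void {
    this.downSince.set(code, nowMs);
  }

  noteKeyUp(code: string): void {
    this.downSince.delete(code);
  }

  // Returns the keys that need a synthetic keyup and forgets them.
  sweep(nowMs: number): string[] {
    const stuck: string[] = [];
    this.downSince.forEach((since, code) => {
      if (nowMs - since >= STUCK_KEY_TIMEOUT_MS) stuck.push(code);
    });
    for (const code of stuck) this.downSince.delete(code);
    return stuck;
  }
}
```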

macOS

macOS-specific processing has components in both the JS and native layers.

In the JS layer, the native-keymap module is used to look up the virtual key code in the local keymap corresponding to the code sent by the remote user. This is how we properly handle translations from Dvorak to non-Dvorak. If found, this code value is used instead.

The native layer looks up the generic key code from the macOS headers and sends an event to the input system. If the isComposed flag is true, -1 is used for the code.

  • Note: This area is a bit strange, and was the result of some trial and error. In particular, failure to send -1 with a Chinese character (for example) will cause the input to render the character equivalent to the specified code.
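
The two macOS stages can be sketched together. This is one interpretation of the description above, not the actual implementation: the LocalKeymap shape is loosely modeled on a code-to-value map such as the one native-keymap exposes, the translation rule (find the local key that produces the same character) is an assumption, and the function names are hypothetical. The three numeric key codes are the values of kVK_ANSI_A, kVK_ANSI_S, and kVK_ANSI_F from the macOS headers.

```typescript
const COMPOSED_KEY_CODE = -1; // sent when the payload is a composition result

// Small excerpt of the generic macOS virtual key codes.
const GENERIC_MAC_KEY_CODES: Record<string, number> = {
  KeyA: 0x00,
  KeyS: 0x01,
  KeyF: 0x03,
};

interface MacKey {
  key: string;
  code: string;
  isComposed: boolean;
}

// ASSUMPTION: code -> character produced on the local user's layout.
type LocalKeymap = Record<string, { value: string }>;

// JS layer: prefer the local key that types the character the sender typed.
function translateToLocalCode(ev: MacKey, localKeymap: LocalKeymap): string {
  for (const localCode of Object.keys(localKeymap)) {
    if (localKeymap[localCode].value === ev.key) return localCode;
  }
  return ev.code; // no match: fall back to the sender's code
}

// Native layer stand-in: resolve the numeric key code to inject.
function resolveMacKeyCode(
  ev: MacKey,
  localKeymap: LocalKeymap,
): number | undefined {
  if (ev.isComposed) return COMPOSED_KEY_CODE;
  return GENERIC_MAC_KEY_CODES[translateToLocalCode(ev, localKeymap)];
}
```

For example, a remote Dvorak user typing s sends the code Semicolon; against a local QWERTY keymap the sketch resolves that to the local KeyS instead.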

Keyboard Events

Keyboard events have several fields, the following of which are used by CoScreen:

  • key - The character string sent by this keypress. This will reflect the logical keymap.
  • code - A string representing the key on a physical QWERTY layout keyboard. This means that any other layouts will not be taken into account. In general reliance on this field is problematic for us, but we do use it to look up key codes in some capacity.
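
To make the difference between the two fields concrete, the sketch below models the same physical key pressed under two logical layouts. The values reflect standard browser behavior (on Dvorak, the key in the QWERTY S position types o), while the interface and helper names are hypothetical.

```typescript
// The two KeyboardEvent fields discussed above.
interface KeyFields {
  key: string; // follows the logical layout
  code: string; // follows the physical QWERTY position
}

// The same physical key under two logical layouts.
const pressedOnQwerty: KeyFields = { key: "s", code: "KeyS" };
const pressedOnDvorak: KeyFields = { key: "o", code: "KeyS" };

function samePhysicalKey(a: KeyFields, b: KeyFields): boolean {
  return a.code === b.code;
}

function sameCharacter(a: KeyFields, b: KeyFields): boolean {
  return a.key === b.key;
}
```

Here samePhysicalKey is true while sameCharacter is false, which is why relying on code alone mistranslates input between users with different layouts.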

In summary, making use of the amazing work done by the Chromium team is an enormous shortcut, but it is by no means simple, as this article has demonstrated. We still have much to learn regarding remote control and input. With international customers onboarding and combined international teams becoming the norm, we hope to become a tool that makes the experience of working with your CoWorkers remotely just work, regardless of where they come from or what tools they use. We hope one day to come as close as possible to the collaborative, networked computing environment envisioned by early pioneers in the field.

References
[1] https://en.wikipedia.org/wiki/Dead_key
[2] https://en.wikipedia.org/wiki/Input_method
[3] https://blog.wooting.nl/the-ultimate-guide-to-keyboard-layouts-and-form-factors/
[4] https://www.w3.org/TR/ime-api/#introduction