Visual signals and interfaces have information density and permanence, which makes them gleanable. You can place many things in your visual field (e.g. multiple application windows, multiple computing devices, multiple inanimate objects), and they'll stay put without further input. Our innate cognitive abilities even let us track multiple moving objects across a scene and keep them coherent -- the skill that enables driving. Visual interfaces are well suited to how humans absorb information and how they context-switch among a large number of potential tasks.

A world where we begin to move off of visual interfaces will be awkward. While humans are good at absorbing conversational audio, they mentally filter most of it out, distilling it down to its essential elements, and what counts as essential may not be known ahead of time. We'll often have to ask voice-output interfaces to repeat things, and they'll need to be smart enough to accurately determine the context of our inquiry.

Voice output is often paired with voice input, but voice propagates well in public, leaking information to everything in range. Devices that capture speech-like input privately are not yet widespread. Meanwhile, structured command input through voice is awkward, and natural language processing doesn't sound natural yet. It's complex to implement, and the computer frequently encounters situations it doesn't understand, which is the most discouraging kind of interaction one can have with a computing platform. Factors like these highlight that audio interfaces are rarely designed to be discoverable, and even if they were, conveying that information over audio is less efficient than doing so visually.

New research into interface design is needed to address many of the shortcomings of current attempts to de-emphasize screens.