One of the most challenging things about user interaction/experience design is learning how to maximize creativity and push for the best user experience within the bounds of your current technical limitations. This is perhaps especially true in speech recognition, where potentially the entirety of human speech is possible input. Whether your core technology can actually make sense of a given utterance, determine its intent, and provide the correct response is a complex problem, and every company handles it differently on one level or another.
It’s easy to shoot the moon and design for the ultimate, ideal functionality. What a good user experience designer does is build the best experience that is currently possible. It’s an art form in itself to learn how to design within the technical constraints of a particular project, which is usually on a tight timeline, while also advocating for the technical developments that will let you continue to push the bounds of your company’s user experience. User experience professionals are involved not just in the active development of products, but also in research and advocacy for the development of future interfaces and the technology that supports them.
This essay was written circa November 2011.
The field of user interface design has been firmly established and built on a foundation of research and inquiry into determining what the user wants and how they want to accomplish it with computers – so long as the interface is graphical. As voice-controlled interfaces become more prevalent and more widely accepted, it is becoming clear that the same methodologies that apply to a visual computer medium are not as relevant for a verbal one. In conversational interfaces, we rely heavily on our own intuitions as to how conversations unfold with fellow humans. While this intuition has brought us far, there is a point where intuition alone is not enough.
Introductory linguistics classes often discuss the fact that everyone thinks they are an expert on language. And why wouldn’t we? With our native languages, we feel in an instant if something someone has said is wrong or off by the slightest of shades; we sense the contours and tastes with exceeding strength and immediacy. But the average person is a mere enthusiast, and much more goes on beneath the surface of the spoken word. Some of that hidden structure we have addressed on the technological level: where there are clear rules to be found (in the sub-fields of syntax, phonetics, phonology and morphology), we have found and applied them. However, even in academia, the more social elements of language have received less serious inquiry (with the possible exception of sociolinguistics).
Given that even in academia the sub-fields of pragmatics and discourse analysis receive little attention, it is not surprising that in computational linguistics they receive none at all. It is time, however, that these questions be explored with more tenacity and their answers applied to voice user interaction design; it is time that we stop taking the fundamentals of our approach for granted.
In researching fundamental questions and ideas, we open ourselves to truly defining the future of this field, and to doing so with a measure of certainty that we have thus far lacked. Technology lives and dies by how its users relate to the experience. We have the opportunity to address an issue that, until now, has not been pursued with nearly enough rigor. The essential question is this: Do users ultimately want a human-like interaction with their computer, or something else entirely?