
(Copyright 1998 ACM ASSETS '98. Appears in Proceedings of The Third International ACM SIGCAPH Conference on Assistive Technologies, April 15-17, 1998, Marina del Rey, CA, USA, pp. 108-115)


SUITEKeys: A Speech Understanding Interface
for the Motor-Control Challenged

Bill Manaris and Alan Harkreader

Computer Science Department
University of Southwestern Louisiana
Lafayette, LA 70504-1771
(318)-482-6638

{ manaris | ahark } @usl.edu

 

ABSTRACT

This paper presents SUITEKeys, a continuous speech understanding interface for motor-control challenged computer users. This interface provides access to all available functionality of a computer by modeling interaction at the physical keyboard and mouse level. The paper briefly discusses the advantages and disadvantages of using speech at the user interface; it outlines the user-centered approach employed in developing the system; it introduces the formal model of the user interface in terms of its conceptual, semantic, syntactic, lexical and acoustic levels; it describes the SUITEKeys system architecture which consists of symbolic, statistical, and connectionist components; it presents a pilot study for assessing the effectiveness of speech as an alternate input modality for motor-control challenged users; and it closes with directions for future research.

Keywords

Accessibility, input devices, intelligent user interfaces, keyboard, mouse, motor-disabilities, natural language, selectable modalities, speech recognition.

1. INTRODUCTION

Personal computing platforms typically include an operating system with a graphical user interface. The user interacts with the computer through the interface using a conventional keyboard and mouse (input devices), and a display (output device). From a human-computer interaction perspective, graphical user interfaces are very effective for visually-abled users because they facilitate developing and maintaining a conceptual model of:

  • the available computer functionality (by providing a graphical representation of it on the display), and
  • the means to activate this functionality (by accepting combinations of keystrokes, mouse movements, and mouse button clicks).

However, such platforms make implicit assumptions about the user’s visual and motor-control abilities: The user is presumed to be fully able to interact within the prescribed modalities as defined by the conventional input and output devices; specifically, the user needs to be able to type on the keyboard, move the mouse, click the mouse buttons, and view the display. In fact, the effectiveness of the graphical user interface depends on the user conforming to this model of interaction. Able-bodied users have difficulty imagining how ineffective a personal computer would be if one or more of the supplied input/output modalities became unavailable.(footnote 1) Nevertheless, this is the case for anyone who by task or by disability is unable to utilize the keyboard, mouse, or display.

As computers increasingly become an integral part of people’s educational, working, and private lives, it is essential to develop interfaces that exclude as few users as possible. In fact, federal legislation motivates the development of technologies that provide computer accessibility to users with disabilities [14; 19, p. 24]. In the context of physical or virtual motor-control impairments,(footnote 2) numerous products are available which address this issue; these products enable users to augment, or even replace the available input modalities, thereby extending their computer’s usability.

Other products aim to provide greater comfort, such as ergonomically designed chairs, arm and wrist pads, and mechanisms to customize the position of the keyboard and mouse – especially useful for repetitive-stress injuries. Finally, some customizations are standard features of the operating system; for example, Microsoft’s Windows®95 and NT 4.0 provide accessibility options that allow keyboard-layout customization, facilitate sequential input of concurrent keystroke sequences such as <ALT-CTRL-DEL> (StickyKeys), and provide for control of the mouse through the keyboard (MouseKeys) [13].

These products can be classified as follows:

  • Systems which augment motor-skills within existing input modalities; examples include various trackballs, touchpads, extended keyboards, split keyboards, chord keyboards, and other tactile input devices, such as the Keybowl [12].
  • Systems which replace existing input modalities with alternate ones; examples include various on-screen keyboards, such as WiVik2® [17].

Additionally, systems may be further subdivided with respect to whether they provide access to all potential computer functionality, or sacrifice completeness for usability. Examples of the latter include StickyKeys [13] and state-of-the-art speech recognition applications (see Section 1.2).

1.1 Selective Modalities

Ideally, the user should have a choice of input and output modalities along with the ability to use multiple modalities within the context of a single application process [15]. Kawai, et al. proposed the concept of abstract widgets as a means of developing interfaces with selectable modalities [5]. This model focuses on separating the graphical user interface from an application’s functionality. Abstract widgets allow the developer to disassociate the semantics of human-computer interaction from a particular interface modality. Thus, the user is potentially offered equivalent application functionality regardless of the modality of interaction. A difficulty with this model is that, although potentially very useful as an application development framework, it cannot easily be applied to existent applications. Current applications must be redesigned and re-implemented to benefit from this approach.

A more effective alternative is an interface model that introduces an abstraction barrier between the graphical user interface and the standard modalities defined by the keyboard, mouse, and display. By abstracting the data communicated through a standard modality from the physical device which produces/consumes the data, the physical device need no longer be considered essential. Alternate modalities may then be implemented that produce the abstracted input, thereby providing expressive power equivalent to the standard modalities, but without the need to modify existent applications.
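As an illustration of this abstraction barrier, the following sketch (ours, not part of SUITEKeys) shows how an alternate modality might inject an abstracted keystroke on the prototype's Win32 platform; keybd_event is the standard Win32 call for synthesizing keyboard events, while injectKeystroke is an illustrative name.

    // Sketch only: an alternate modality producing abstracted keyboard input on
    // Win32, so that running applications receive ordinary keystroke events.
    // keybd_event is the standard Win32 API; injectKeystroke is an assumed name.
    #include <windows.h>

    void injectKeystroke(BYTE virtualKey) {
        keybd_event(virtualKey, 0, 0, 0);                // synthesize key-down
        keybd_event(virtualKey, 0, KEYEVENTF_KEYUP, 0);  // synthesize key-up
    }

    int main() {
        injectKeystroke('A');  // whichever application has focus receives 'A'
        return 0;
    }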

1.2 Speech as Input Modality

Speech is a natural means of communication among people. Since speech does not rely on touch or vision, it is very appealing as an alternate means for humans to communicate with computers. Natural language has been recognized as offering many benefits in human-computer interaction [9, 11, 16]. Text-to-speech and screen-reader utilities have proven to be useful general-purpose output media for users who cannot view a computer display [6, 18].

Although speech provides a convenient, hands-free medium of human-computer interaction, it allows for ambiguous input. A speech recognition system may fail to recognize that which was spoken and falsely recognize that which was not spoken. Additionally, spoken input may not be included in the speech system’s language model. Thus the design of an effective speech input system needs to include robust error handling and ambiguity resolution. Furthermore, the user interface should provide a simple, intuitive means of recovering from speech recognition errors.

There exist a number of systems that provide speech input interfaces. However, they only provide interfaces to specific computer applications (or collections of applications), possibly with limited operating system functionality. For example, Dragon Systems, Inc.’s DragonDictate® only allows a user to verbally control a set of standard applications. Dragon Systems’ more recent product, NaturallySpeaking®, is even more restrictive in range – all input is directed to a proprietary text-editor window from which the user must cut and paste dictated text to other running applications [2]. Command Corp. Inc.’s IN CUBE facilitates Microsoft Windows® operating system navigation by mapping spoken phrases with predefined keystroke sequences [1].

While technologies such as these are particularly well-suited to a specific domain (e.g., dictation), they do not afford the user general expressive capabilities equivalent to use of the keyboard and mouse, and thus do not in themselves represent completely viable input modality alternatives.

In this paper we describe SUITEKeys, a speech interface that provides full access to the communicative power of the standard keyboard and mouse input modalities, and thus makes speech an effective and completely viable input modality for computers. SUITEKeys is being developed as an application in the context of the SUITE project at University of Southwestern Louisiana [10]. This project focuses on integration of speech recognition and natural language processing techniques, in terms of algorithms and software tools which support rapid prototyping, construction, evaluation, revision, and maintenance of speech understanding interfaces. These interfaces incorporate symbolic, statistical, and connectionist components which model acoustic, lexical, syntactic, semantic, and pragmatic aspects of a given application domain.

2. DESIGN METHODOLOGY

SUITEKeys is being developed following a user-centered approach in an attempt to produce an effective interface for the target user group. Thus, user feedback is integral to the development process. Specifically, we are employing the Star Model for user interface development which emphasizes rapid prototyping, analytic (top-down) and synthetic (bottom-up) approaches, and user evaluation throughout development [4]. Representatives of the target user population are being consulted to provide formative and summative evaluation during various stages of development. Up to this point, user consultants have been interviewed to refine requirements and to evaluate preliminary SUITEKeys design options; users have evaluated a functional prototype of SUITEKeys; and, finally, user subjects have participated in a pilot study (see Section 6).

3. FUNCTIONAL SPECIFICATION

SUITEKeys is a speech understanding interface which incorporates a continuous, speaker-independent model of speech to allow users to manipulate a virtual keyboard and mouse. It allows users to input sequences of keystrokes and mouse actions using natural language, much as if they were describing to someone how to perform the same actions with a physical keyboard and mouse. SUITEKeys assumes that the user speaks English and has no speech impediments. To maximize effectiveness and usability, the system incorporates a complete model of the physical keyboard/mouse functionality. The operating system and its applications are unaware that the keyboard and mouse actions originated as speech events. Similarly to the physical keyboard/mouse devices, SUITEKeys attaches minimal meaning (semantics) to the sequences of keyboard and mouse actions (see Section 4); any other possible meaning is left to be derived by the operating system and its applications.

Due to the nature of the application, the system models natural language at the level of keyboard/mouse objects and actions; at this level, we are dealing with a well-defined and relatively small subset of natural language – on the order of two hundred words. This subset can be effectively modeled given the state-of-the-art in speech recognition and natural language processing. In fact, this low-level approach makes SUITEKeys the only speech interface (as far as we know) that can provide complete access to all available computer functionality. Other speech-enabled applications attempt to model interaction at the word- or sentence-level by focusing on existing application menu keywords or providing limited subsets of "free"-form phrases.(footnote 3) Consequently, they only work within their predefined domains.

Users interact with SUITEKeys by uttering the names of keyboard keys. Considering the difficulty of recognizing alphabetic characters, the system recognizes both regular and military pronunciations of such characters. It also recognizes words such as "Press" and "Release" and employs syntactic and semantic rules to structure interaction (see Section 4). SUITEKeys maintains a list of frequent words which can be selected to complete a given keystroke sequence. When the confidence level of the speech engine is low, the system presents a list of recognized input alternatives (e.g., select between the keystrokes B and P). As interaction is real-time, errors are handled similarly to regular keyboard-entry errors, i.e., by "pressing" the BACKSPACE key. Since other speech-enabled applications frequently have language models and user interfaces that are optimized for a particular task, such as dictation, SUITEKeys can be switched to a non-active state in which audio input is not interpreted as SUITEKeys input. This way, a user may switch to another application, such as DragonDictate, to perform a particular task, and then switch back to SUITEKeys.

4. SUITEKeys FORMAL MODEL

SUITEKeys models the keyboard and mouse as virtual devices. Due to space limitations, this section briefly describes the conceptual, semantic, syntactic, lexical, and acoustic levels of the interface model.

4.1 Conceptual Level

The conceptual level consists of the System object, which is subdivided into InputDevice and OutputDevice objects. InputDevice consists of Keyboard and Mouse objects, which stand for the virtual keyboard and mouse devices, respectively. These are further subdivided into Key and Button, respectively, corresponding to the available keyboard keys (e.g., TAB, ENTER, BACKSPACE, Q, W, E, R, T, Y, etc.) and mouse buttons (i.e., LEFT, RIGHT, and MIDDLE).

OutputDevice currently consists of Status and WordList objects. Status corresponds to various status indicators identifying the system status (i.e., ACTIVE, PASSIVE, and ERROR), the type of the last action recognized (i.e., KEYBOARD, MOUSE), and the action itself (e.g., <ALT-TAB> – assuming that ALT was "held" down, TAB was "pressed" and then both were "released"). WordList corresponds to instances of selectable word lists displayed by the system. Examples include a list of recognized input alternatives (presented to the user when the confidence level of the speech engine is low) used to recover from input ambiguities; and a list of frequent words(footnote 4) which can be selected to complete a given keystroke sequence.

4.1.1 System Object

The SUITEKeys system object maintains the state of the interface, coordinates interaction among the various subsystem objects, and handles errors. SUITEKeys operates in one of three major modes: Active, Passive, and Error. While in active mode, all audio input is analyzed as verbal keyboard, mouse, or system commands. Misunderstood audio input results in error handling during which SUITEKeys interacts directly with the user in order to resolve the error. This error handling results in updating the user phonetic model to adapt to the user’s speech patterns.(footnote 5)

Successfully understood input results in the operating system being notified of a keyboard or mouse event. While in passive mode, SUITEKeys monitors the audio input for a specific attention word, i.e., the command to wake up. All other audio input is passed on to any application that may require it – if no such application exists, the input is simply ignored.

System actions are: wake-up (sets the system mode to active), go-to-sleep (sets the system mode to passive), quit (shuts down SUITEKeys), set-attention-word (updates the word corresponding to the wake-up action), and select-word (selects a word from a WordList object).
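The mode logic described above can be pictured with a small sketch (illustrative C++ types and names, not the SUITEKeys source):

    // Illustrative sketch of the system modes: in passive mode only the
    // attention word is acted upon; in active mode utterances are interpreted
    // as keyboard, mouse, or system commands.
    #include <string>

    enum class Mode { Active, Passive, Error };

    struct SystemState {
        Mode mode = Mode::Passive;
        std::string attentionWord = "wake up";   // changed by set-attention-word
    };

    void handleUtterance(SystemState& s, const std::string& utterance) {
        if (s.mode == Mode::Passive) {
            if (utterance == s.attentionWord) s.mode = Mode::Active;  // wake-up
            return;                      // all other input is passed on or ignored
        }
        if (utterance == "go to sleep") { s.mode = Mode::Passive; return; }  // go-to-sleep
        // ... otherwise interpret as a keyboard, mouse, or other system command ...
    }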

4.1.2 Keyboard Object

The keyboard object consists of 101 keys, corresponding to those of the 101-key enhanced US-layout keyboard. The keyboard object’s state is derived from that of the operating system. In other words, use of SUITEKeys does not prohibit concurrent use of the physical keyboard. This provides for enhanced flexibility in interaction – that is, the user selects whichever device instance is best, given the circumstances, without having to spend considerable set-up and set-down time switching between instances.(footnote 6)

4.1.3 Keyboard-Key Objects

Each key object has a name that corresponds to the symbol depicted on the associated physical key.(footnote 7) The lowest-level actions are: press <key name> (depresses a key), and release <key name> (releases a key). These are used to define the following higher-level actions: <key name> (produces a complete keystroke), and repeat <key name> <number of repetitions> (produces the specified number of keystroke repetitions).
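The reduction of the higher-level actions to the two primitives can be sketched as follows (function names are ours, for illustration only):

    // Sketch: higher-level key actions expressed in terms of press/release.
    void press(int key)   { /* depress the virtual key */ }
    void release(int key) { /* release the virtual key */ }

    void tap(int key) {            // <key name>: one complete keystroke
        press(key);
        release(key);
    }

    void repeat(int key, int n) {  // repeat <key name> <n>: n keystrokes
        for (int i = 0; i < n; ++i) tap(key);
    }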

4.1.4 Mouse Object

The mouse object corresponds to the actual mouse pointing device. Conceptually, this object can be asked to move, resulting in corresponding movement of the displayed cursor. This object contains three buttons. Similarly to the keyboard, the mouse state is derived from that of the operating system in order to allow concurrent use of both the physical and virtual mice.

The mouse actions are: move <direction> (moves cursor at a gradual speed in specified direction, until another command is entered), move <direction> <distance> (moves cursor a specific distance in given direction), stop (cease moving cursor), and position <area> (places cursor at the center of five predefined areas: screen, upper left, upper right, lower left, and lower right).
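For instance, the position <area> action could be realized on the prototype's Win32 platform roughly as follows (a sketch with assumed names; SetCursorPos and GetSystemMetrics are standard Win32 calls):

    // Sketch: place the cursor at the center of one of the five predefined areas.
    #include <windows.h>

    enum class Area { Screen, UpperLeft, UpperRight, LowerLeft, LowerRight };

    void positionCursor(Area area) {
        int w = GetSystemMetrics(SM_CXSCREEN), h = GetSystemMetrics(SM_CYSCREEN);
        int x = w / 2, y = h / 2;                          // center of the screen
        if (area == Area::UpperLeft)  { x = w / 4;     y = h / 4; }
        if (area == Area::UpperRight) { x = 3 * w / 4; y = h / 4; }
        if (area == Area::LowerLeft)  { x = w / 4;     y = 3 * h / 4; }
        if (area == Area::LowerRight) { x = 3 * w / 4; y = 3 * h / 4; }
        SetCursorPos(x, y);
    }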

4.1.5 Mouse-Button Objects

A mouse-button object corresponds to a button on an actual mouse. SUITEKeys defines three buttons, namely LEFT, MIDDLE, and RIGHT. The lowest-level actions are: press <button name> (depress specified button) and release <button name> (release button). These are used to define the following higher-level actions: click <button name> (quickly press and release a button), double-click <button name> (quickly perform two consecutive clicks).
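Again as a sketch (assumed wrapper names; mouse_event is the standard Win32 call), the click and double-click actions reduce to the press/release primitives:

    // Sketch: left-button click and double-click built from press/release.
    #include <windows.h>

    void pressLeft()   { mouse_event(MOUSEEVENTF_LEFTDOWN, 0, 0, 0, 0); }
    void releaseLeft() { mouse_event(MOUSEEVENTF_LEFTUP,   0, 0, 0, 0); }

    void clickLeft()       { pressLeft(); releaseLeft(); }     // click
    void doubleClickLeft() { clickLeft(); clickLeft(); }       // double-click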

4.2 Semantic Level

This section briefly discusses the types of semantic constraints that SUITEKeys imposes on the user and the way it implements the above conceptual actions.

As mentioned earlier, SUITEKeys attaches minimal semantics to the sequences of keyboard and mouse actions. Therefore there are only two types of general semantic constraints that it imposes on the user:

  • It is not possible to press a virtual key (or mouse button) when the operating system’s state indicates that the key is already pressed.
  • It is not possible to release a virtual key (or mouse button) when the operating system’s state indicates that the key is not pressed.

In all other cases, each of the virtual hardware events generated by the actions mentioned above results in SUITEKeys sending an appropriate message to the operating system’s event queue.
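A sketch of how these constraints might be checked against the operating system's state on the prototype's Win32 platform (assumed wrapper names; GetAsyncKeyState and keybd_event are standard Win32 calls):

    // Sketch: refuse a virtual press when the key is already down, and a
    // virtual release when it is not; otherwise post the event to the OS.
    #include <windows.h>

    bool isDown(int vk) { return (GetAsyncKeyState(vk) & 0x8000) != 0; }

    bool virtualPress(int vk) {
        if (isDown(vk)) return false;               // constraint 1
        keybd_event(vk, 0, 0, 0);                   // key-down event
        return true;
    }

    bool virtualRelease(int vk) {
        if (!isDown(vk)) return false;              // constraint 2
        keybd_event(vk, 0, KEYEVENTF_KEYUP, 0);     // key-up event
        return true;
    }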

4.3 Syntactic Level

This section discusses the structure of valid interaction. Figure 1 shows a subset of what constitutes a valid sequence of input tokens in BNF notation (non-terminals are enclosed in <angle-brackets>, and terminals are enclosed in [square-brackets]). Due to space limitations, it does not show how the system derives the meaning (semantics) of valid input token sequences.

<Input> ::= <Action> | <Action> <Input>
<Action> ::= <KeyAction> | <MouseAction> | <MetaAction>
<KeyAction> ::= <Press> | <Release> | <Tap> | <Repeat>
<Press> ::= [PRESS] <Key>
<Release> ::= [RELEASE] <Key>
<Tap> ::= <Key>
<Repeat> ::= [REPEAT] <Key> <Number> [TIMES]
<Number> ::= [DIGIT] <Number> | [DIGIT]
<Key> ::= [LETTER] | [DIGIT] | [SYMBOL] | [SPECIAL] | <Function> | <Modifier>
<Function> ::= [FUNCTION] [DIGIT] | [FUNCTION] [EX_DIGIT]
<Modifier> ::= [LEFT] [MODKEY] | [RIGHT] [MODKEY] | [MODKEY]
<MouseAction> ::= <PressButton> | <ReleaseButton> | <Click> | <DoubleClick> |
<MoveMouse> | <StopMouse> | <SetMousePosition>
<PressButton> ::= [PRESS] <MouseButton>
<ReleaseButton> ::= [RELEASE] <MouseButton>
<Click> ::= [CLICK] <MouseButton>
<DoubleClick> ::= [DOUBLE] [CLICK] <MouseButton>
<MouseButton> ::= <Button> [MOUSE] [BUTTON] |
<Button> [BUTTON]
<Button> ::= [LEFT] | [RIGHT] | [MIDDLE]

Figure 1. Structure of SUITEKeys Interaction (subset).
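As an illustration of how token sequences conforming to this grammar might be parsed, here is a minimal recursive-descent sketch (ours, not the SUITEKeys parser) for the <KeyAction> fragment:

    // Sketch: parse a recognized utterance such as {"press", "control"},
    // {"release", "control"}, or {"bravo"} into a key action.
    #include <string>
    #include <vector>

    struct KeyAction { enum Kind { Press, Release, Tap } kind; std::string key; };

    KeyAction parseKeyAction(const std::vector<std::string>& tokens) {
        if (tokens.size() == 2 && tokens[0] == "press")
            return { KeyAction::Press, tokens[1] };     // <Press> ::= [PRESS] <Key>
        if (tokens.size() == 2 && tokens[0] == "release")
            return { KeyAction::Release, tokens[1] };   // <Release> ::= [RELEASE] <Key>
        return { KeyAction::Tap, tokens[0] };           // <Tap> ::= <Key>
    }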

4.4 Lexical Level

This section describes the natural language vocabulary "understood" by SUITEKeys. This consists of words (lexemes) that correspond to terminals in Figure 1. The lexicon includes one or more words for each terminal; for example [PRESS] corresponds to "press", "hold" and "lock", whereas [RELEASE] corresponds to "release", "unlock", and "let go". It also includes both alphabetic and military pronunciations of keys to assist in disambiguating easily confusable letters, such as voiced and voiceless variations of phonemes. For example, the user may use either "b" or "bravo", "p" or "papa", "d" or "delta", and "t" or "tango" to enter the corresponding keys.
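The examples above amount to a lexeme-to-key mapping; a sketch of such a table follows (the data structure is assumed, the entries are those given in the text):

    // Sketch: alphabetic and military pronunciations mapping to the same key.
    #include <map>
    #include <string>

    const std::map<std::string, char> keyLexicon = {
        {"b", 'B'}, {"bravo", 'B'},
        {"p", 'P'}, {"papa",  'P'},
        {"d", 'D'}, {"delta", 'D'},
        {"t", 'T'}, {"tango", 'T'},
    };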

4.5 Acoustic Level

Words included in the SUITEKeys vocabulary can be associated with various pronunciations. These pronunciations are specified using the Worldbet phonetic alphabet. For example, "a" is associated with (E i:), and (& l f ^).(footnote 8)

5. SUITEKeys ARCHITECTURE

As mentioned earlier, SUITEKeys is built on SUITE, a framework for developing speech understanding interfaces to interactive computer systems [10]. The interface architecture integrates speech recognition and natural language processing components (see Figure 2). The speech recognition portion of SUITEKeys’ architecture features an extension of the CSLU Toolkit, a software development environment for creating pipelined speech processing systems [3]. The natural language processing portion is based on the interface architecture incorporated in NALIGE, a user interface management system for developing natural language interfaces [8].

Figure 2. SUITEKeys Architecture

5.1 Knowledge Base Components

The Acoustic/Phonetic (AP), Lexical, Augmented Semantic Grammar (ASG), and Semantic Domain (SD) knowledge base (KB) components of SUITEKeys contain its acoustic/phonetic, lexical, syntactic, semantic, and pragmatic knowledge. The nature of this knowledge has been described in Section 4.

5.2 Processing Components

The Feature Extractor forms a parametric representation of the input speech signal. The Phoneme Probability Estimator employs a connectionist model to estimate probabilities of the parameterized input belonging to various sound categories. These estimations are processed by the Lexical Analyzer using stochastic methods to determine the N-best lexical interpretations of the user’s spoken input. The ASG Parser processes the lexical representation to derive a semantic interpretation in terms of the functionality of the modeled keyboard and mouse (subject to the pragmatic constraints of the operating system’s input state) and the SUITEKeys system commands. The Code Generator calls the operating system to generate the events that correspond to the user’s input. The Error Handler communicates with the user in an attempt to resolve errors, such as semantic ambiguity, which cannot be handled given SUITEKeys’ knowledge base. Finally, the KB Manager provides an abstraction barrier between the processing components and the SUITEKeys linguistic model.
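The data flow through these components can be summarized with a stubbed sketch (types and names are ours, chosen only to mirror the description above):

    // Sketch: signal -> features -> phoneme probabilities -> N-best lexical
    // hypotheses -> parsed action -> operating-system events.
    #include <vector>

    struct Features {};       // parametric representation of the speech signal
    struct PhonemeProbs {};   // sound-category probability estimates
    struct Hypothesis {};     // one lexical interpretation of the input
    struct Action {};         // keyboard, mouse, or system action

    Features extractFeatures(const std::vector<short>&) { return {}; }
    PhonemeProbs estimatePhonemes(const Features&) { return {}; }        // connectionist model
    std::vector<Hypothesis> nBest(const PhonemeProbs&) { return {}; }    // stochastic lexical analysis
    Action parse(const std::vector<Hypothesis>&) { return {}; }          // ASG parser
    void generateEvents(const Action&) {}                                // notify the operating system

    void processUtterance(const std::vector<short>& signal) {
        generateEvents(parse(nBest(estimatePhonemes(extractFeatures(signal)))));
    }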

5.3 SUITEKeys Prototype

The SUITEKeys prototype is being developed as an interface to Microsoft’s Windows®95 and NT 4.0 operating systems. This platform is easily obtainable, relatively inexpensive, and in many cases already on hand; this choice of platform should help maximize the prototype’s potential user population. The only additional hardware requirement is a suitable dictation-quality microphone. The latest prototype has been implemented using tcl/tk, lex, yacc, and MS Visual C++; in addition to the CSLU speech engine, we are also experimenting with Microsoft’s Voice speech engine.

6. PILOT STUDY

A pilot study was conducted as a preliminary means of studying the effectiveness of speech as an alternate input modality for motor-control challenged users.(footnote 9) Additionally, it provided a better understanding of the factors involved in assessing the effectiveness of various computer input modalities. Feedback was acquired reflecting how users, with little bias towards a particular application, choose to communicate with a "keyboard that understands natural language" – this has directly affected the mindset used in developing SUITEKeys. Understanding the various solutions motor-control challenged users employ to interact with a computer offers insight into how a solution such as SUITEKeys may be incorporated with other modalities. Also, the pilot study facilitated evaluation of a methodology that will be refined for use in future, larger-scale experiments.

6.1 Methodology

Task: We used a simple document creation task, namely to type a short paragraph into a text editor and save the document. Each subject performed the task using two different source paragraphs, each with a different method of input (the user’s preferred method and a Wizard-of-Oz SUITEKeys prototype).

Software: The text editor used in this experiment was Microsoft’s Notepad.

Hardware: The computing environment used to type in the paragraph was Microsoft Windows® NT Workstation Version 4.0 running on a 180MHz Pentium PC. The only additional input equipment was a mouthstick used by one of the subjects.

Subjects: Three volunteers served as subjects of the pilot study. All subjects had upper-body motor-control impairments that interfered with the use of a standard keyboard and mouse. All subjects were university students, experienced with the use of the platform. Two of the subjects were female, one male. None of the subjects had significant speech impediments.

Procedure: The subjects were tested independently. Each subject participated in a first trial involving their preferred means of input, a training session with the SUITEKeys prototype, and a second trial making use of this prototype.

Trial 1 (Preferred Input Method): The first trial consisted of entering a 38-word (238 characters, including punctuation and spacing) paragraph followed by saving the created document. Initially, two windows running the Notepad text editor were displayed on the computer screen, one window above the other. The top window fully displayed the target paragraph. The bottom window displayed a blank document. The bottom window had focus.

The subjects employed their preferred input method (using the keyboard and mouse, without the assistance of speech recognition software) to type the text displayed in the top window into the bottom window and to save the resulting document. The preferred methods of the three subjects were as follows: subject 1 typed single-handedly; subject 2 was unable to directly manipulate the keyboard or mouse—the subject verbally directed an assistant (not a member of the research team) to carry out the desired actions; subject 3 employed a mouthstick to manipulate the keyboard, making use of the StickyKeys functionality available in Microsoft Windows®.

Training: After Trial 1, the subjects were trained in using a Wizard-of-Oz SUITEKeys prototype. This prototype could perform actions such as keystrokes, moving the mouse, and clicking the mouse buttons based on verbal commands from the subject. The prototype beeped or "froze-up" when the subject’s commands were unintelligible or the subject spoke too rapidly. Training consisted of retyping the paragraph from Trial 1.

Trial 2 (SUITEKeys Input Method): The second trial consisted of entering a 41-word (257 characters, including punctuation and spacing) paragraph followed by saving the created document using the SUITEKeys prototype. Other conditions were identical to Trial 1.

Data Collection: Four methods were used to collect data during the pilot study. The computer’s display was recorded by VCR (JVC BR-S822U) through a screen capture device (AITech Pro PC/TV). The room audio was also recorded by the VCR. User keystrokes and mouse button clicks were recorded with a custom-made software logger. The logger recorded the time (to an accuracy of milliseconds), the nature of the hardware event (i.e., key press, key release, mouse button press, or mouse button release), and the key or mouse button that generated the event. Finally, the saved document created by the user served as a source of measurable data.
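For concreteness, a logger of this kind might look roughly as follows (a sketch with assumed names, not the logger used in the study):

    // Sketch: write each hardware event with a millisecond timestamp,
    // the event type, and the key or button that generated it.
    #include <chrono>
    #include <cstdio>

    enum class Event { KeyPress, KeyRelease, ButtonPress, ButtonRelease };

    void logEvent(std::FILE* log, Event e, const char* keyOrButton) {
        using namespace std::chrono;
        long long ms = duration_cast<milliseconds>(
            steady_clock::now().time_since_epoch()).count();
        std::fprintf(log, "%lld\t%d\t%s\n", ms, static_cast<int>(e), keyOrButton);
    }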

6.2 Analysis

The dependent measures were completion rate (percentage of task completed), typing rate, and error rate. The independent variable had two values: subject’s preferred modality and speech (as defined by the SUITEKeys prototype).

Completion Rate: This was calculated as number of correct characters typed per number of characters in the target.

Typing Rate: This was calculated as number of correct characters typed per task completion time (total time).

Error Rate: This was calculated as number of erroneous keystrokes per total number of keystrokes.
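As a worked check, using the Subject 3, Trial 1 values reported in Table 1:

    completion rate = 234 / 238 ≈ 98.3%
    typing rate     = 234 / 205 ≈ 1.14 chars/second
    error rate      =   6 / 256 ≈ 2.3%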

6.3 Results

Table 1 and Table 2 summarize the results of the pilot study. Subjects were able to achieve good performance through speech with minimal training (less than ten minutes). Subject 1’s scores using speech (Trial 2) were considerably lower than those of the other subjects; this is because, after a false start in which she tried to dictate complete words to SUITEKeys, she started over using discrete (as opposed to continuous) speech.

                              Subject 1          Subject 2          Subject 3
                            Trial 1  Trial 2   Trial 1  Trial 2   Trial 1  Trial 2
Target Length (chars)         238      257       238      257       238      257
Result Length (chars)         238      257       238      257       234      255
Error Count (chars)             0        0         1        0         0        4
Completion Rate             100.0%   100.0%     99.6%   100.0%     98.3%    97.7%
Total Time (seconds)           98      332       317      293       205      212
Typing Rate (chars/second)   2.43     0.77      0.75     0.88      1.14     1.18
Total Keystrokes              280      297       282      287       256      271
Error Keystrokes                8       16        13       11         6        5
Error Rate                    2.9%     5.4%      4.6%     3.8%      2.3%     1.8%

Table 1. Pilot Study Measurements and Measures

 

                              Trial 1   Trial 2
Completion Rate
    sample mean                 99.3%     99.2%
    std deviation               0.87      1.35
Typing Rate (chars/second)
    sample mean                 1.44      0.95
    std deviation               0.88      0.21
Error Rate
    sample mean                 3.3%      3.7%
    std deviation               0.01      0.02

Table 2. Mean and Standard Deviation

Considering that these subjects had considerable training on their preferred input modality, these results (although not statistically significant) suggest that speech is very effective as an input modality for motor-control challenged users. One would expect this effect to be more pronounced for users who have not yet established a preferred input modality; considering the effort associated with rehabilitation and computer training for motor-control challenged users, systems such as SUITEKeys may prove to be a valuable assistive technology. We plan to examine this issue in a follow-up, larger-scale study.

7. CONCLUSIONS

This paper presents on-going work on alternate human-computer interface modalities. Specifically, the SUITEKeys interface provides users who cannot access a computer through the traditional input modality of keyboard and mouse with an equivalent alternate modality—namely, speech. SUITEKeys represents keyboard and mouse actions at a sufficiently low level as to associate no more meaning with the events than would the operating system’s device drivers. We are currently investigating ways to broaden the target user group by accommodating significant speech impediments, as well as integrating a speech-based output modality into this application.

ACKNOWLEDGMENTS

This research is partially supported by the Louisiana Board of Regents grant LEQSF-(1997-00)-RD-A-31. The authors would like to thank the following individuals: Carl Huval and Mike LeBlanc for their contribution to the implementation efforts and input to system design; Larry Duplantis, James Wright, and the subjects of the pilot study for their feedback and suggestions; and James Jackson, Jay Jackson, Ursula Jackson, Michail Lagoudakis, and Nola Navarre for their invaluable assistance with the pilot study.

REFERENCES

  1. Command Corp., Inc., IN CUBE. http://www.commandcorp.com/incube_welcome.html
  2. Dragon Systems, Inc., http://www.dragontalk.com/
  3. Fanty, M. Overview of the CSLU-C Toolkit. Center for Spoken Language Understanding, Oregon Graduate Institute of Science and Technology, December 1996. http://www.cse.ogi.edu/CSLU/toolkit/documentation/overview/overview.html
  4. Hix D., and Hartson, H.R. Developing User Interfaces – Ensuring Usability Through Product & Process. John Wiley & Sons, New York, 1993.
  5. Kawai, S., et al. Designing interface toolkit with dynamic selectable modality. In Proceedings of ASSETS ’96 (Vancouver, BC, Canada, April 1996), pp. 72-79.
  6. Kennel, A.R. AudioGraf: a diagram-reader for the blind. In Proceedings of ASSETS ’96 (Vancouver, BC, Canada, April 1996), pp. 51-56.
  7. Ledgard, H., et al. The natural language of interactive systems. Communications of the ACM 23(10): 556-563, October 1980.
  8. Manaris, B. and Dominick, W. NALIGE: a user interface management system for the development of natural language interfaces. International Journal of Man-Machine Studies, 38(6): 891-921, June 1993.
  9. Manaris, B. and Slator, B. (eds.). Interactive natural language processing. IEEE Computer, 29(7): (entire issue), July 1996.
  10. Manaris, B. and Harkreader, A. SUITE: speech understanding interface tools and environments. In Proceedings of FLAIRS ’97 (Daytona Beach, FL, May 1997), pp. 247-252.
  11. Manaris, B. Natural language processing: a human-computer interaction perspective. In Advances in Computers 47: (to appear), Academic Press, New York, 1998.
  12. McAlindon, P.J., and Stanney, K.M. The Keybowl: an ergonomically designed document processing device. In Proceedings of ASSETS ’96 (Vancouver, BC, Canada, April 1996), pp. 86-93.
  13. Microsoft Corporation, Customizing Microsoft Windows NT 4.0 for individuals with disabilities. Microsoft Knowledge Base Article Ww1279, Microsoft Product Support Services. http://www.microsoft.com/kb/articles/q167/7/61.htm
  14. Muller, Michael J., et al. Toward an HCI research and practice agenda based on human needs and social responsibility. In Proceedings of CHI ’97 (Atlanta, GA, April 1997), pp. 155-161
  15. Nigay, L. and Coutaz, J. A design space for multimodal systems: concurrent processing and data fusion. In Proceedings of INTERCHI ’93 (Amsterdam, The Netherlands, April 1993), pp. 172-178.
  16. Napier, H.A., et al. Impact of a restricted natural language interface on ease of learning and productivity. Communications of the ACM 32(10): 1190-1198, October 1989.
  17. Prentke Romich, Co. WiVik2 On-screen keyboard programs for Windows. http://www.prentrom.com/access/wivik.html
  18. Raman, T.V. Emacspeak – direct speech access. In Proceedings of ASSETS ’96 (Vancouver, BC, Canada, April 1996), pp. 72-79.
  19. Shneiderman, B. Designing the User Interface: Strategies for Effective Human-Computer Interaction, 3rd ed., Addison-Wesley, Menlo Park, CA, 1998.
  20. Smith, A. et al. Multimodal input for computer access and augmentative communication. In Proceedings of ASSETS ’96 (Vancouver, BC, Canada, April 1996), pp. 80-85.
  21. Trewin, S. A study of input device manipulation difficulties. In Proceedings of ASSETS ’96 (Vancouver, BC, Canada, April 1996), pp. 15-22.

ENDNOTES

  1. The interested able-bodied reader could experience this ineffectiveness by attempting to interact with a computer whose display, keyboard, or mouse have been disconnected.
  2. We use the term virtual impairments to refer to impairments which do not stem from internal physical disabilities, but are caused by external factors (which are usually temporary in nature). For example, in hands-busy environments, such as driving an automobile or servicing a jet engine, the task may require interaction with a computing device while, at the same time, inhibit/preclude physical access to a standard keyboard or mouse.
  3. Such as "print document" and "change font style to courier".
  4. A word is defined as a sequence of printable characters separated by whitespace.
  5. Although, theoretically, this user-model updating process could be used to handle user speech impediments, it would be ineffective in practice, especially if the user’s speech patterns were far removed from the average speech patterns in the system’s phonetic model. We are currently working on an extension to the SUITEKeys architecture to provide an effective solution to this problem.
  6. Of course, this is of most benefit to users who have some motor-control, and to able-bodied users in hands-busy environments.
  7. In the case of more than one symbol appearing on a physical key, the name is that of the symbol that would result from an unshifted press of the key, typically the lower of the two symbols.
  8. E as in bet, i: as in beet, & as in above, l as in lent, f as in fine, and ^ as in above.
  9. This study was influenced by [7, 20, 21].

Copyright Note: Permission to make digital/hard copies of all or part of this material for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copyright is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires specific permission and/or fee.
