Auditory Cues for Browsing, Surfing, and Navigating

Michael C. Albers, Sun Microsystems, Inc.-JavaSoft

Abstract: The use of the World Wide Web (WWW) exploded with the advent of graphical WWW browsers such as NCSA's Mosaic and Netscape's Navigator. In spite of the popularity of these graphical browsers, studies have uncovered areas where the traditional graphical interfaces do not provide correct, sufficient, or intuitive information to their users (Ede & Roshak, 1994; Groff & Descombes, 1994). The Audible Web (non-speech auditory feedback cues embedded within Mosaic[1] to aid users in monitoring data transfer progress and in navigating the WWW) is one approach to enhancing a WWW browser (Albers & Bergman, 1995). To test these claims, a usability study was conducted that identified The Audible Web's strengths and weaknesses and recommended possible solutions for these weaknesses (Sinclair, Catledge, Brown & His, 1995).

Introduction

Mosaic was chosen as a test platform for auditory enhancements because it exhibited known human-computer interaction problems. In Mosaic, users receive little or no feedback about the results of ongoing processes, the identification and resolution of user actions, the size and content of information referenced by links, and the time to obtain that information (Ede & Roshak, 1994; Groff & Descombes, 1994). The highly visual task of scanning WWW pages filled with hypertext links, graphics, forms, and animation suggested that users could benefit from feedback in a non-visual modality. Using auditory rather than visual enhancements to give users more information shifts part of the information display from the overloaded visual modality to the auditory channel.

Related Work

A number of auditory displays have been developed that provide auditory functionality similar to The Audible Web. Two notable examples that provide enhancements to existing general-purpose desktop graphical user interfaces are The SonicFinder and MoveIt!. The SonicFinder uses auditory icons to provide feedback concerning relevant user actions such as selecting, dragging, and copying files; selecting and resizing windows; opening and closing folders; and emptying and placing files into the Macintosh Finder trash can (Gaver, 1989). Some basic notions concerning other aspects of selected files and directories were conveyed by modifying the auditory icons. MoveIt!, an extension of the olvwm Open Look virtual window manager for Sun workstations, has a number of visual augmentations and auditory cues similar to The SonicFinder (VanSteenbergen, 1996).
Also of note are three systems that provide information about background activities: EAR, ShareMon, and OutToLunch. The Environmental Audio Reminders (EAR) system transmits short auditory cues to people's offices in order to inform them of upcoming or ongoing events (Gaver, 1993). This system was designed to support casual awareness of colleagues, indicate informal communication opportunities, and signal formal events. EAR employs auditory icons to unobtrusively yet effectively announce events in the workplace without interrupting normal workplace activities.
ShareMon utilizes non-speech auditory cues to notify users of file sharing (a MacOS-specific background activity in which one user accesses files located on another user's machine) (Cohen, 1994a). For example, when a user accesses a file on your machine, you might hear the sound of a drawer opening or the sound of the Star Trek[2] transporter energizing (Cohen, 1993). In ShareMon, users found non-speech auditory cues less disruptive than other modalities of feedback. These users often preferred non-speech audio cues even if speech or text would have been more informative (Cohen, 1993).
The OutToLunch system attempted to recreate an atmosphere of "group awareness" in which individuals felt that their coworkers were nearby even if these coworkers were physically dispersed or isolated (Cohen, 1994b). By playing prerecorded keystrokes and mouse movements, OutToLunch used the sounds of keys clicking and of mice rolling to create the sensation of group activity.

Auditory Enhancements Provided

The Audible Web uses auditory cues to enhance interactions between users and Mosaic in three ways: by aiding users' monitoring of data transfer progress, by providing feedback for users' actions, and by providing content feedback to aid users in navigation.

Monitor Data Transfer Progress

When users select a hypertext link in Mosaic, the globe icon spins and a series of technical messages are displayed in the status area at the bottom of the window. These messages are not only cryptic, but they are also of no use if the user's visual attention is focused elsewhere or if Mosaic is represented iconically.
Because sound is well suited for monitoring time-varying processes, The Audible Web uses auditory cues to provide feedback for normal data transfer, for the execution of external "helper" applications, and for error conditions (Albers, 1994; Gaver, 1993; Kramer, 1994). The normal transfer of data is indicated through small clicks and pops, and a sliding sound means that a helper application is being started to handle incoming data. If an error occurs, the sound of breaking glass is played (see Table 1 below).

Feedback for Users' Actions

Ede and Roshak (1994) found that many WWW browser users were unsure whether the menu items they selected had performed the requested action. The Audible Web provides sound cues to reinforce such user actions as button presses, menu selections, scrollbar movement, and hypertext link selections (see Table 1 below). The aim of the auditory feedback is twofold. First, the auditory feedback confirms user actions without further overloading the visual channel. Second, low-volume background auditory cues permeate the application and elevate the role of audio to an expected part of the web-based interaction rather than a surprise or a salient special event.

Aid Navigation through Content Feedback

Whether a hypermedia link points to an image, another document, an audio file, or to nowhere, every link has the same visual appearance on a web page. Users who have little information about a link can only select the link and ponder the questions, "What am I going to get?" and "How long will this take?"
The Audible Web uses sound cues to indicate the probable type of information at the other end of a link before the link is selected. As the pointer is moved over a hypermedia link, an auditory cue conveys the probable file type (See Table 1 below). The aim of this feedback is to help users quickly decide if hypertext links point to information of interest.
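The paper does not say how the "probable" file type is determined. One plausible approach, shown here only as a minimal sketch, is to guess from the extension in the link's URL. The cue names come from Table 1; the function name `probable_cue`, the extension lists, and the fallback to a text cue are assumptions made for illustration.

```python
import posixpath
from urllib.parse import urlparse

# Cue names follow Table 1 of the paper.
CUE_BY_KIND = {
    "text": "typewriter",
    "graphics": "camera shutter",
    "video": "16mm movie projector",
    "audio": "piano flourish",
    "application": "modem/line noise",
}

# Assumption: an illustrative, incomplete extension list.
KIND_BY_EXTENSION = {
    ".html": "text", ".htm": "text", ".txt": "text",
    ".gif": "graphics", ".jpg": "graphics", ".jpeg": "graphics",
    ".mpg": "video", ".mpeg": "video", ".mov": "video",
    ".au": "audio", ".wav": "audio", ".aiff": "audio",
    ".ps": "application", ".tar": "application", ".zip": "application",
}


def probable_cue(url: str) -> str:
    """Guess the pointer-over-link cue from the URL alone (hypothetical helper)."""
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    # Assumption: links without a recognized extension are treated as documents.
    kind = KIND_BY_EXTENSION.get(extension, "text")
    return CUE_BY_KIND[kind]
```

Because the guess uses only the URL, it can be made as the pointer passes over a link, without contacting the server. That is also why the result is only "probable".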
Additionally, using the right mouse button to select a hypertext link provides auditory feedback about proportional transfer time, file type, file size, and errors. The length of time that a tick-tock sound is produced indicates the proportional data transfer time; an auditory icon indicates the file type; a piano note conveys the size of the file. If the WWW server is down or the file is not found, the user hears the sound of breaking glass. For example, suppose a user in Palo Alto queries a hypertext link that points to a large text file in Japan. The user hears a few seconds of a tick-tock sound (indicating proportional transfer time) followed by the sound of a typewriter (indicating a text file) and a deep piano note (indicating a large file).
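The right-button query described above can be sketched as a function that turns what is known about a link into an ordered list of cues. This is an illustration under stated assumptions, not the actual implementation: the `LinkInfo` fields, the scaling constants, the logarithmic size scale, and the use of MIDI note numbers for the piano note are all hypothetical.

```python
import math
from dataclasses import dataclass

# File-type cues from Table 1 of the paper.
FILE_TYPE_CUE = {
    "text": "typewriter",
    "graphics": "camera shutter",
    "video": "16mm movie projector",
    "audio": "piano flourish",
    "application": "modem/line noise",
}


@dataclass
class LinkInfo:
    kind: str             # key into FILE_TYPE_CUE
    size_bytes: int       # size of the linked file
    est_seconds: float    # estimated transfer time
    reachable: bool = True


def tick_tock_seconds(est_seconds, max_cue=5.0, max_transfer=120.0):
    """Scale the estimated transfer time into a short, proportional cue length."""
    clamped = min(max(est_seconds, 0.0), max_transfer)
    return round(clamped / max_transfer * max_cue, 2)


def piano_note(size_bytes, low=36, high=84):
    """Map file size to a MIDI note number; larger files give deeper notes."""
    # Assumption: logarithmic scale clamped between 1 KB (10^3) and 10 MB (10^7).
    exponent = min(max(math.log10(max(size_bytes, 1)), 3.0), 7.0)
    fraction = (exponent - 3.0) / 4.0
    return high - round(fraction * (high - low))


def query_cues(info: LinkInfo):
    """Return (cue name, parameter) pairs in the order they would be played."""
    if not info.reachable:
        return [("breaking glass", None)]
    return [
        ("tick-tock", tick_tock_seconds(info.est_seconds)),
        (FILE_TYPE_CUE[info.kind], None),
        ("piano note", piano_note(info.size_bytes)),
    ]
```

For the Palo Alto example, `query_cues(LinkInfo("text", 10_000_000, 60.0))` yields a 2.5-second tick-tock, the typewriter cue, and the lowest note in the assumed range.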

Category         | Event                        | Auditory Cue
-----------------|------------------------------|---------------------
Data Transfer    | Data Transfer                | Pops & Clicks
Data Transfer    | Open "Helper" Application    | Sliding/Opening
Data Transfer    | Error Condition              | Breaking Glass
User Actions     | Button Press                 | "Ca-Chick"
User Actions     | Select Menu Item             | "Click"
User Actions     | Scroll Up/Down               | "Pop"
User Actions     | Select a Hypertext Link      | "Bop-Bop"
Content Feedback | Relative Transfer Time       | "Tick-Tock"
Content Feedback | Error Condition              | Breaking Glass
Content Feedback | Pointer on Text Link         | Typewriter
Content Feedback | Pointer on Graphics Link     | Camera Shutter
Content Feedback | Pointer on Video Link        | 16mm Movie Projector
Content Feedback | Pointer on Audio Link        | Piano Flourish
Content Feedback | Pointer on Application Link  | Modem/Line Noise
Content Feedback | File Size                    | Piano Notes

Table 1. Event to Auditory Cue Mappings for The Audible Web

Usability Study

A usability study was conducted by Sinclair et al. to test The Audible Web's claims, identify its strengths and weaknesses, offer recommendations to resolve these weaknesses, and test the mental mappings the subjects forged between auditory cues and system events (Sinclair, Catledge, Brown, & His, 1995). Six subjects with varying levels of technical expertise were asked to complete a series of tasks both with and without auditory cues. Subjects were instructed to "think out loud" during these tasks, and both their task completion times and their error rates were collected.

Findings

The study reported that users immediately deduced the meanings of the data transfer cues. Being able to work on subsidiary tasks and to identify data transfer completion was considered useful by subjects. While the Sliding/Opening cue was so subtle that subjects rarely commented on it, the sound of breaking glass was immediately identified and understood as signaling an error condition. The subjects believed that the button press auditory cues were useful, rarely mentioned the menu selection cues, and found the auditory cues for the scrollbars unnecessary.
The subjects quickly grasped most of the meanings of auditory cues that conveyed probable file type. However, the movie projector and typewriter cues were not readily identifiable, and their meanings had to be inferred. The functionality of the right mouse button was confusing to most users; the more experienced/technical WWW subjects had more success trying to use this feature than did less experienced subjects, and only one of the six subjects deduced that the piano note conveyed the size of a file.
Overall, the study concluded that The Audible Web achieved its goals. Subjects seemed able to (1) monitor the data transfer process while engaged in another task, (2) understand auditory feedback for user activities, and (3) get a feel for data at the other end of a hypertext link. While the ability to match sounds with system events was good, it correlated closely with the user's technical expertise.

Strengths and Weaknesses

The results identified several strengths and weaknesses of The Audible Web. The strengths included the ability of the user to (1) perform tasks concurrently, (2) determine successful achievement of goals without a great deal of technical support, (3) determine the effect of future actions based on past interactions, and (4) verify user activities and system status. The weaknesses were that some features of The Audible Web require a technical background to fully understand, that the same cues are reused for different system states, and that the feedback becomes annoying when there is "too much" of it for a particular task. To ease these weaknesses, on-line documentation should be provided to guide users through the auditory cues and their meanings. More verbose auditory cues should be provided for the types of error conditions encountered, and the auditory cues for system states and file sizes should be revised to be less intrusive and more obvious.

Conclusions and Future Directions

Through the use of auditory cues within a WWW browser, The Audible Web aids the user in monitoring data transfer progress, offers feedback for user actions, and provides content feedback to aid navigation of the WWW. A usability study conducted on The Audible Web verifies and strengthens these claims, and the study also helps by suggesting future directions for The Audible Web. The Audible Web could benefit from the use of parameterized or synthesized auditory cues to provide users with more information using fewer cues (such as encoding the file size into the file type cue) (Gaver, 1989; Gaver, 1993).
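One way to picture the parameterized cue suggested above is to fold file size into the playback rate of the file type cue, so that a single sound carries both pieces of information. The sketch below is purely illustrative: the `ParameterizedCue` type, the one-octave rate range, and the 1 KB to 10 MB size bounds are assumptions, and no audio is actually played.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterizedCue:
    sound: str            # file type cue, e.g. "typewriter"
    playback_rate: float  # 1.0 = as recorded; lower values sound deeper


def combined_cue(sound, size_bytes, small=1_000, large=10_000_000):
    """Encode file size into the file type cue by shifting its playback rate."""
    clamped = min(max(size_bytes, small), large)
    # Position of the size on a log scale: 0.0 at `small`, 1.0 at `large`.
    fraction = math.log(clamped / small) / math.log(large / small)
    # Assumption: span one octave, from half an octave up to half an octave down.
    rate = 2.0 ** (0.5 - fraction)
    return ParameterizedCue(sound, round(rate, 3))
```

Under these assumptions a small text file would sound like a brisk, high typewriter and a large one like a slow, deep typewriter, replacing the separate piano note that only one of the six subjects decoded.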

Acknowledgments

I would like to thank the members of the HCI and the Collaborative Computing groups at SunSoft/JavaSoft for their aid and support during my internship, and I would especially like to thank my supervisor, Eric Bergman. I would also like to thank the Media Technologies/Java Media and Audio groups at SunSoft/JavaSoft for their support.

References

Albers, M. C. (1994). The varese system, hybrid auditory interfaces, and satellite-ground control: Using auditory icons and sonification in a complex, supervisory control system. Proceedings of the International Conference on Auditory Display for 1994 (pp. 3-14). Santa Fe, NM: Santa Fe Institute.
Albers, M. C. & Bergman, E. (1995). The audible web: Auditory enhancements for mosaic. Proceedings of the CHI '95 Conference on Human Factors in Computing Systems Conference Companion (pp. 318-319). Denver, CO: ACM.
Cohen, J. (1993). "Kirk here:" Using genre sounds to monitor background activity. Proceedings of INTERCHI '93 - Conference Companion (pp. 63-64). Amsterdam: ACM.
Cohen, J. (1994a). Monitoring background activities. In G. Kramer (Ed.), Auditory display: Sonification, audification, and auditory interfaces (pp. 499-522). Reading, MA: Addison-Wesley.
Cohen, J. (1994b). Out to lunch: Further adventures monitoring background activity. Proceedings of the International Conference on Auditory Display for 1994 (pp. 15-20). Santa Fe, NM: Santa Fe Institute.
Ede, M. & Roshak, L. (1994). Quick findings for mosaic usability test. SunSoft usability engineering report (pp. 94-12). Mountain View, CA: SunSoft, Inc.
Gaver, W. (1989). The SonicFinder: An interface that uses auditory icons. Human-Computer Interaction, 4(1), 67-94.
Gaver, W. (1993). Synthesizing auditory icons. Proceedings of INTERCHI '93 (pp. 228-235). Amsterdam: ACM.
Gaver, W. (1993). Sound support for collaboration. In R. Baecker (Ed.), Readings in groupware and computer-supported cooperative work (pp. 355-362). San Mateo, CA: Morgan-Kaufmann.
Groff, J. F. & Descombes, J. (1994). Untangling the web. Proceedings of the 1st International Conference on the World-Wide Web available at: http://www1.cern.ch/WWW94/PrelimProcs.html.
Kramer, G. (1994). An introduction to auditory display. In G. Kramer (Ed.), Auditory display: Sonification, audification, and auditory interfaces (pp. 1-77). Reading, MA: Addison-Wesley.
Sinclair, K., Catledge, L., Brown, R., & His, I. (1995). Final report on usability test on the audible web. Unpublished Final Paper for The Georgia Institute of Technology's CS6752: Human-Computer Interaction II taught by Dr. James Foley.
VanSteenbergen, M. (1996). Personal Communication.

Michael C. Albers
Sun Microsystems, Inc. - JavaSoft
2550 Garcia Avenue, UCUP01-102
Mountain View, CA 94043-1100, USA
michael.albers@sun.com
[1] The Audible Web is a modified version of the NCSA's X Mosaic version 2.4.
[2] "Star Trek" and accompanying auditory cues are trademarked by the Paramount Corporation.