Data Sonification from the Desktop: Should Sound be part of Standard Data Analysis Software?
Abstract: The design of auditory formats for data display is presently focused on applications for blind or visually impaired users, specialized displays for use when visual attention must be devoted to other tasks, and some innovative work in revealing properties of complex data that may not be effectively rendered by traditional visual means. With the availability of high quality and flexible sound production hardware in standard desktop computers, the potential exists for using sound to represent characteristics of typical "small and simple" samples of data in routine data inspection and analysis. Our research has shown that basic properties of simple functions, distribution properties of data samples, and patterns of covariation between two variables can be effectively displayed by simple auditory graphs involving patterns of pitch variation over time. While such developments have implications for specialized applications and populations of users, these displays are easily comprehended by normal users with minimal practice. Providing further software enhancement to encourage exploration of data representation by sound may lead to a variety of useful creative developments in data display technology.
Data sonification is a useful technique for presenting information to visually impaired individuals, for displaying data to users whose visual attention must be devoted elsewhere, and for revealing data properties not easily rendered by visual graphics. Although auditory formats for describing data are not common at present, it is our view that the use of sound for revealing characteristics of small and relatively simple data samples, even by normally sighted users performing routine data inspection and manipulation activities, holds considerable promise (Flowers & Hauer, 1992). Improvements in hardware capabilities of current desktop and portable computers (multimedia readiness) have the potential to vastly increase the availability of auditory data display formats for general users. Providing software enhancement to encourage exploration of symbolic data representation by sound may lead to useful creative developments for a variety of applications.
Hardware and Software Limitations
The personal computers of the mid-1980s lacked sophisticated sound production hardware, but common programming languages included sound output functions that could control frequency and duration of simple tones through speaker output (e.g., the "Sound" and "Play" commands of various Basic and Pascal dialects). Previous research from our laboratory (Flowers & Hauer, 1992, 1993, 1995; Turnage, Bonebright, Buhman & Flowers, 1996) used simple PC speaker output to demonstrate how frequency distributions and time series functions could be represented by simple time and pitch manipulations. While effective, such crude squarewave displays were not very "ear-catching" and the sound quality varied across computer models.
With the programmable audio hardware that is installed on nearly all current personal computers, manipulation of loudness, timbre, and spatial channels is possible, as well as control of pitch and timing. Unfortunately, typically installed software does not provide a straightforward means of manipulating these properties of auditory signals. Such control can be obtained by adapting computer music software, an approach we are currently using in our work on auditory displays. This requires some effort to implement, since musicians have different objectives than scientists and engineers who are interested in data display. However, once computer music software has been configured to work in conjunction with a spreadsheet program, auditory data displays can be created in about the same amount of time as a traditional visual chart or graph. For example, Csound (copyright MIT, 1988, 1994) is music freeware available in versions for several computer platforms. We have found it useful for translating numeric data into transportable sound objects, using simple spreadsheet computations to compute the processing code or "scorefiles" necessary to generate the sound output (Flowers, Buhman & Turnage, in press). Csound allows control of pitch, duration, timbre, and relative loudness, and also has the capability of creating more than a single stream of sound output for stereo or multichannel display.
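To make the scorefile idea concrete, the following is a minimal sketch of the kind of computation a spreadsheet could perform, written here in Python. It is not the authors' procedure. The function name `score_lines`, the frequency range, and the note length are illustrative assumptions, and the sketch assumes a Csound orchestra in which instrument 1 reads amplitude from p4 and frequency in Hz from p5.

```python
def score_lines(values, low_hz=220.0, high_hz=880.0, note_dur=0.2, amp=0.5):
    """Return Csound score statements, one note per data value.

    Assumption: the orchestra's instrument 1 takes p4 = amplitude
    and p5 = frequency in Hz. All default values are illustrative.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant data
    lines = []
    for k, v in enumerate(values):
        frac = (v - lo) / span
        # Exponential interpolation: equal data steps give equal pitch intervals.
        freq = low_hz * (high_hz / low_hz) ** frac
        start = k * note_dur
        # Score "i" statement: instrument, start time, duration, then p4, p5.
        lines.append(f"i1 {start:.3f} {note_dur:.3f} {amp:.2f} {freq:.2f}")
    lines.append("e")  # "e" marks the end of the score
    return lines


if __name__ == "__main__":
    print("\n".join(score_lines([3, 7, 5, 9, 12])))
```

The resulting lines could be saved as a score file and rendered with whatever orchestra file defines instrument 1.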
Despite the ability of Csound to control many sound attributes, our sonification efforts have, so far, continued to use pitch and timing as the primary information-carrying dimensions, with timbre being a "synesthetically directed" aesthetic choice. For example, data "points" on a visual scatterplot seem to be more conceptually similar to short guitarlike "plucks" than to sustained "toots" (e.g., Gaver, 1993; Marks, 1982). However, the use of both timbre manipulations and spatial channel manipulations to convey separate data streams is technically possible.
What We Mean by "Small and Relatively Simple" Data Samples
Desktop inspection and analysis of data in the social and behavioral sciences commonly involves three types of activity: describing and depicting simple functions, examining the distribution properties of one or more "samples", and examining the covariation between two variables. While such activities may be performed on very large samples, typical data from our perception laboratory involve 100 observations or less. Thus, our previous work on auditory perception of data characteristics has focused on description of univariate and bivariate characteristics of small data samples. Examples of each of the types of auditory displays (and their visual equivalents) described below may be examined at http://psynext.unl.edu/~jflowers/icad96.html.
Simple Functions and Time Series
One of the most straightforward ways of using sound to describe changes in magnitude of a single dependent variable is to code it by pitch. For example, Mansur, Blattner & Joy (1985) showed the potential utility of this approach as a display format for blind or visually impaired users, and demonstrated that only minimal training was necessary to assess basic function characteristics such as linearity and relative slope differences. Flowers & Hauer (1995) required participants to rate the similarity of graphs that supposedly depicted "sales volumes" of different products from a fictitious company. Results from multidimensional scaling procedures suggested that slope, slope change and monotonic differences were perceived virtually identically in visual and pitch-coded auditory formats. Moreover, the "dimensions" of the multidimensional scaling solution meaningfully depicted the important data characteristics upon which the graphs differed. Subjective reports from participants indicated that pitch coded auditory function graphs were relatively effortless to perceive. We conclude that pitch variation over time can offer a compelling description of function shape that may be nearly as effective as variation in curve height is on a visual graph.
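As a rough illustration of pitch coding, the sketch below maps a data series onto equal-tempered semitones, so that a steeper function produces larger pitch steps between successive notes. The note range and the names `pitch_code` and `midi_to_hz` are assumptions made for this example. The studies cited above do not specify this particular scaling.

```python
def midi_to_hz(note):
    """Convert a MIDI note number to frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def pitch_code(values, low_note=48, high_note=84):
    """Map each data value linearly onto a MIDI note in [low_note, high_note].

    The three-octave default range is an illustrative assumption.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # constant data maps to the lowest note
    return [round(low_note + (v - lo) / span * (high_note - low_note))
            for v in values]


if __name__ == "__main__":
    sales = [10, 12, 15, 21, 30]  # hypothetical accelerating series
    notes = pitch_code(sales)
    # Successive differences in semitones grow as the slope increases.
    print(notes, [b - a for a, b in zip(notes, notes[1:])])
```

Playing the notes in order at a fixed tempo gives the time axis. The listener hears slope as the size of the pitch step and slope change as a change in step size.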
Additionally, some very recent findings from our laboratory indicate that auditory function displays may be more efficient than tactile displays, the usual alternative chosen for visually impaired users. Turnage, Buhman & Flowers (unpublished manuscript) studied discrimination accuracy between highly similar plots of periodic waveforms, displayed as either visual function charts, pitch-coded auditory functions, or tactile raised line displays. With unpracticed subjects, discrimination accuracy for the auditory and tactile plots was 81.3% and 83.6% respectively, compared with 95.5% for the same data samples presented visually. After only 20 trials with accuracy feedback, however, auditory performance increased to 87.2% while tactile performance remained essentially unchanged (81.9%). Furthermore, the time required for participants to make the comparisons between graphs was up to twice as long for the tactile as for the auditory displays, and the tactile displays were subjectively evaluated as extremely effortful to judge. Examples of auditory display of functions and time series are among those available at http://psynext.unl.edu/~jflowers/icad96.html.
Previous research from our laboratory (Flowers & Hauer, 1992, 1993) involved constructing auditory histograms, in which a pitch represents a numeric value range (e.g., C represents test scores from 50-54, C# 55-59, D 60-64, etc.) and the temporal length of a note (or number of repeated notes) indicates the frequency of observations in each category. This format is an auditory analogy to the visual histogram (for which bar position indicates value and bar height represents frequency). As in our research with auditory function graphs, multidimensional scaling of perceived similarities among pairs of graphs of different distributions showed very similar visual and auditory perceptual structure. In addition, this structure was closely tied to important data characteristics, namely central tendency, variation, and distribution shape.
However, despite the apparent equivalence in perceptual structure for auditory and visual histograms and frequency polygons, participants indicated that the auditory displays were quite effortful to evaluate.
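The auditory histogram mapping described above can be sketched as follows, using the same example bins (C for scores 50-54, C# for 55-59, D for 60-64). This is only an illustration under stated assumptions: MIDI note 60 stands for C, note length is the count times an arbitrary unit duration, empty categories are simply skipped, and the name `auditory_histogram` is hypothetical.

```python
from collections import Counter


def auditory_histogram(scores, first_bin=50, bin_width=5,
                       base_note=60, unit_dur=0.25):
    """Return (midi_note, duration) pairs, one per non-empty category.

    Pitch encodes the value range (one semitone per bin, starting at
    base_note); duration encodes how many observations fall in the bin.
    Scores below first_bin are ignored in this sketch.
    """
    counts = Counter(int((s - first_bin) // bin_width)
                     for s in scores if s >= first_bin)
    return [(base_note + b, counts[b] * unit_dur) for b in sorted(counts)]


if __name__ == "__main__":
    test_scores = [50, 52, 57, 61, 63, 64]  # hypothetical test scores
    for note, dur in auditory_histogram(test_scores):
        print(note, dur)
```

Replacing each long note with `counts[b]` repeated short notes would give the alternative coding mentioned in the text, in which the number of repetitions indicates frequency.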
Visual histograms, however, have also been criticized for being both effortful to perceive and not optimally efficient for revealing key characteristics of a data distribution. Tukey (1977) and McGill, Tukey & Larsen (1978) have argued that various forms of the boxplot display are a more efficient form of distribution display, especially for exploratory data analysis. Boxplot displays, of which several versions exist, illustrate the relative positions of the median, quartiles and extremes of distributions by enclosing the middle two quartiles (the interquartile range) with a "box", locating the median with a dot or similar marker, and specifying the distribution range by extending lines above and below the box. A major advantage of such a display is that multiple boxplots can be placed on one set of axes for comparison purposes. Wainer & Thissen (1993) provide a review of these and related graphical alternatives for depicting distributions.
Research from our laboratory (Flowers & Hauer, 1992, 1993) has developed an auditory analogy to the visual boxplot, which could be termed an "arpeggio plot." Musical notes represent a distribution's quartiles and extreme points, and are played as an ascending arpeggio. For purposes of emphasizing central tendency, we adopted the procedure of preceding the arpeggio with a slightly longer sustained note representing the median. While simultaneous comparisons of such auditory displays are not practical, several arpeggio plots can be played within the span of auditory sensory memory for comparative purposes. Subjectively, comparisons of these displays were far less effortful than for auditory histograms. Multidimensional scaling analysis of these comparisons showed that the arpeggios represented the characteristics of the numeric distributions from which they were constructed. Of particular interest were the results from one experiment in which naïve participants (introductory psychology students) compared arpeggios without any instruction about their symbolic purpose (i.e., they were simply told to rate the similarity of pairs of short musical passages). Multidimensional scaling solutions from these participants were virtually identical to those of advanced graduate students and faculty participants who were fully informed about the purpose and design characteristics of the displays. At a basic perceptual level, such displays appear to effectively carry information about distribution position, spread and shape. Examples of both auditory histogram displays and auditory arpeggios can be found at http://psynext.unl.edu/~jflowers/dist.html.
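A possible construction of an arpeggio plot is sketched below: a sustained median note followed by ascending notes for the minimum, first quartile, median, third quartile, and maximum. The quartile method, note range, durations, and the name `arpeggio_plot` are assumptions chosen for illustration. The fixed `scale_min` and `scale_max` arguments are there so that several distributions share one pitch scale and can be compared by ear.

```python
import statistics


def arpeggio_plot(data, scale_min, scale_max, low_note=48, high_note=84,
                  lead_dur=0.6, note_dur=0.3):
    """Return (midi_note, duration) events for one distribution.

    The first event is a longer note on the median; the remaining five
    ascend through min, Q1, median, Q3, max. Quartiles use Python's
    "inclusive" method, which is one of several common definitions.
    """
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")

    def to_note(v):
        frac = (v - scale_min) / (scale_max - scale_min)
        return round(low_note + frac * (high_note - low_note))

    summary = [min(data), q1, med, q3, max(data)]
    return [(to_note(med), lead_dur)] + [(to_note(v), note_dur) for v in summary]


if __name__ == "__main__":
    group_a = [52, 58, 61, 63, 66, 70, 74, 79, 85]  # hypothetical scores
    group_b = [40, 45, 47, 52, 55, 57, 60, 62, 90]  # hypothetical scores
    for group in (group_a, group_b):
        print(arpeggio_plot(group, scale_min=0, scale_max=100))
```

Playing the two event lists one after the other corresponds to the sequential comparison described in the text. A wider spread is heard as a wider arpeggio, and skew as unequal intervals around the median.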
The visual scatterplot is the traditional format for graphically depicting bivariate covariation. Scatterplots provide an indication of correlation magnitude and direction that corresponds well with statistical indices of correlation (Meyer & Shinar, 1992). They may also reveal other important features of a bivariate distribution, such as nonlinearity, heteroscedasticity, and the presence of outliers. Since a scatterplot is simply a variant of a function plot, an auditory analogy can be constructed by letting one variable be represented by pitch and the other by time.
Recently, Flowers, Buhman & Turnage (in press) required participants to make magnitude and direction estimates of correlation for bivariate data displayed as either auditory or visual scatterplots. The auditory scatterplots consisted of a five-second presentation of short (0.1 sec) guitarlike notes for which pitch represented the y-value (scaled to a three-octave range) and time of onset within the five-second interval represented the x-value. Visual plots of the same data were constructed by spreadsheet charting software. For this experiment, the mean correlations between estimates of correlation magnitude and actual Pearson r were nearly identical for visual and auditory graphs (r = 0.92 for visual scatterplots and 0.91 for auditory scatterplots). Mean correlation estimates across participants for auditory and visual representations of each data set were also in very strong agreement (r = 0.98). Auditory scatterplots were thus as effective as the traditional visual scatterplots in conveying correlation magnitude. Further analyses indicated that correlation judgments for auditory and visual scatterplots were similarly affected by the presence of single outliers.
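The scatterplot mapping just described (x to onset time within five seconds, y to pitch over three octaves, 0.1 sec notes) might be computed along the following lines. This is a sketch, not the procedure used in the experiment: the lowest note, the semitone quantization, the choice to place the last onset at 4.9 sec so that its note ends within the interval, and the name `auditory_scatterplot` are all assumptions.

```python
def auditory_scatterplot(xs, ys, total_dur=5.0, note_dur=0.1,
                         low_note=48, octaves=3):
    """Return (onset_sec, midi_note, duration) events sorted by onset.

    x is scaled to onset times in [0, total_dur - note_dur];
    y is scaled to whole semitones spanning the given number of octaves.
    """
    x_lo, y_lo = min(xs), min(ys)
    x_span = (max(xs) - x_lo) or 1.0  # guard against constant x
    y_span = (max(ys) - y_lo) or 1.0  # guard against constant y
    events = []
    for x, y in zip(xs, ys):
        onset = (x - x_lo) / x_span * (total_dur - note_dur)
        note = low_note + round((y - y_lo) / y_span * 12 * octaves)
        events.append((onset, note, note_dur))
    return sorted(events)


if __name__ == "__main__":
    # Hypothetical bivariate sample with a positive relationship.
    xs = [1, 2, 3, 4, 5, 6]
    ys = [2.0, 2.9, 4.2, 4.8, 6.1, 7.0]
    for event in auditory_scatterplot(xs, ys):
        print(event)
```

A strong positive correlation is then heard as a tight rising run of plucks, a negative one as a falling run, and a weak one as pitches scattered widely at every moment in time.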
We have not yet formally studied perception of scatterplots involving distributions that radically depart from bivariate normality (e.g., nonlinear relationships and separate clusters of points), but our subjective evaluations of auditory graphs of such distributions indicate that those features are as vividly presented by sound as by sight. This is not surprising given the perceptual equivalence of auditory and visual function graphs mentioned earlier (of which scatterplots are a variant) and given perceptual literature showing that Gestalt grouping principles operate in the pitch and time domains much as they do in visual space (e.g., see Goldstein, 1996, pp. 396-401). Examples of auditory scatterplot displays can be found at http://psynext.unl.edu/~jflowers/newscat.html.
Evaluating Display Effectiveness
Although important, subjective judgments are not sufficient to evaluate the potential merit of a novel design of data display. One must determine whether the perceptually salient properties of the display match important characteristics of the data, and/or whether the display design might inadvertently "create" unwanted perceptual dimensions.
As described above, these issues of perceptual structure have often been addressed in our research by applying multidimensional scaling procedures to ratings of similarity between pairs of graphs. We have also supplemented these scaling procedures with multiple regression techniques to see if the similarity judgments themselves, as well as the derived dimensions from the multidimensional scaling solution, are predicted by actual differences in key properties of the displayed data. For some types of displays, such as our scatterplots, one may be interested in the ability of users to estimate one particular data property, such as correlation.
In those cases, a direct magnitude rating procedure can be employed, and comparisons made between rated and actual correlation measures. In all these cases we have additionally been interested in making comparisons between display modalities (auditory, visual, and sometimes tactile), to see if the perceptual structures are similar, and whether the transmission of information about a key attribute (such as correlation) is comparably effective.
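As an illustration of comparing rated with actual correlation, the sketch below computes Pearson r between a participant's magnitude ratings and the true correlations of the displayed data sets. The ratings shown are invented placeholder numbers, not data from our experiments, and `pearson_r` is a plain textbook implementation written for this example.

```python
import math


def pearson_r(a, b):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Hypothetical values: true r of five displayed data sets, and one
    # participant's ratings after hearing or viewing each display.
    actual = [0.10, 0.30, 0.50, 0.70, 0.90]
    rated_auditory = [0.20, 0.25, 0.45, 0.80, 0.85]
    rated_visual = [0.15, 0.35, 0.40, 0.75, 0.95]
    print("auditory:", round(pearson_r(actual, rated_auditory), 2))
    print("visual:  ", round(pearson_r(actual, rated_visual), 2))
```

Computing this value per participant and per modality, then averaging, yields the kind of modality comparison discussed above.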
For some display applications, one may be more interested in measures of direct performance, such as accuracy of discriminating between highly similar displays, as in our present work comparing tactile, auditory, and visual displays of periodic functions (Turnage, Buhman & Flowers, in preparation). In attempting to evaluate the potential utility of a novel display using such measures, one must remember that direct comparisons of performance differences obtained with traditional and novel displays may be less informative than looking at performance change with practice. In summary, one must employ a variety of assessment approaches in determining utility of novel data displays.
Future Directions - Why "Simple" Audio Plotting Should be Encouraged
The preceding examples have shown how simple auditory plots can effectively convey information about the basic characteristics of small samples of data. Such displays are potentially easy to construct and present, although the supposedly multimedia-ready software of current computer configurations does not provide a straightforward means of doing so. Making data sonification tools more accessible from within standard software packages could encourage experimentation with the use of sound in a number of data-related domains that have implications for business, education, and scientific inquiry, in addition to developments for the visually impaired and other special populations of users.
One such application, which we have not addressed in our work, is the use of sonification to supplement traditional presentation graphics. Human memory research has provided examples of how presentation of equivalent verbal information in more than one modality (e.g., both hearing and reading a word) can improve recall and recognition. If such effects generalize to nonverbal information, the addition of data-descriptive sound to presentation graphics in instructional settings may be of considerable benefit.
A second offshoot of encouraging sonification of data may be the discovery of more classes of data, or data features, that are more easily perceived in auditory than in visual displays. One category of such material might include time series observations having both trend and periodic components (such as being able to notice the presence of the eleven-year solar cycle in a plot showing an otherwise increasing incidence of malignant melanoma cases over a 70-year period).
A third potential consequence of making the mapping of data into sound easier for larger numbers of computer users is the use of sound for symbolic description of abstract information other than numeric quantities. For example, the apparent ease with which our participants seemed to translate information presented in the pitch-time domain into a spatial coordinate domain (in the function and scatterplot experiments) suggests that sound can play a role in the symbolic representation of geometric elements.
While there is much current excitement about the construction of auditory virtual reality, in which virtual sound sources are spatially localized by the same acoustic features used in localizing real world sounds, the relative computational simplicity of a pitch versus time display for symbolically representing spatial relationships may prove to be a better approach for the design of some forms of nonvisual spatial displays.
Flowers, J. H., and Hauer, T. A. (1992). The ear's versus the eye's potential to assess characteristics of numeric data: Are we too visuocentric? Behavior Research Methods, Instruments, & Computers, 24, 258-264.
Turnage, K. D., Bonebright, T. L., Buhman, D. C., and Flowers, J. H. (1996). The effects of task demands on the equivalence of visual and auditory representations of periodic numerical data. Behavior Research Methods, Instruments, & Computers, 28(2), 270-274.
Wainer, H., and Thissen, D. (1993). Graphical data analysis. In G. Keren and C. Lewis (Eds.), Data analysis in the Behavioral Sciences (pp. 391-457). Hillsdale, New Jersey: Lawrence Erlbaum Associates.
Author Information
John H. Flowers, Dion C. Buhman, & Kimberly D. Turnage
Department of Psychology
University of Nebraska - Lincoln
Requests for reprints and additional information may be sent to John H. Flowers, Department of Psychology, University of Nebraska - Lincoln, PO Box 880303, Lincoln, NE 68588-0308, or by email at email@example.com. Examples of auditory displays described in this paper may be observed at http://psynext.unl.edu/~jflowers/icad96.html, and cassette tape versions of these examples may be obtained by submitting a blank tape to John H. Flowers.