Beyond the Phoneme: A Juncture-Accent Model of Spoken Language

Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang
International Computer Science Institute, 1947 Center Street, Berkeley, CA 94704 USA
{steveng, hmcarvey, leahh, shawnc}@icsi.berkeley.edu

Proceedings of the Human Language Technology Conference (HLT-2002), San Diego, California, March 24-27, 2002

ABSTRACT

Phonemic models of spoken language are incapable of accommodating the patterns of pronunciation variation observed in spontaneous speech (as exemplified by a corpus of American English telephone dialogues, a.k.a. SWITCHBOARD). Variation in pronunciation with respect to segmental identity and duration can be […] position of the segment within the syllable (i.e., onset, nucleus, coda), in tandem with knowledge of the associated stress-accent pattern, […] information contained in the acoustic signal. Many properties of pronunciation variation can be accounted for in terms of such a model, including: (1) the prevalence of coda deletion, (2) the mutability […]

In traditional models of spoken language, words are represented strictly as phonemic sequences, strung together like "beads on a string" […] dictionary, with little (if any) provision made for prosodic and other extra-phonetic features.

Do such linear, phonemic models provide an accurate characterization of spoken language? Probably not, for if they did, current […] spoken language relative to a strictly phonemic representation. It is a canonical […] (i.e., deleted) in SWITCHBOARD [6]. Moreover, the syllable […] speakers articulate in terms of syllables, rather than phonemes [14]. There is also increasing evidence that the syllable is a perceptually important unit for decoding spoken language, perhaps […]

One of the more interesting properties of the syllable is its capacity for absorbing certain extra-phonetic properties pertinent to the prosody of an utterance. A syllable can be characterized not only as a sequence of phonemes, but also in terms of its "prominence" relative to the surrounding syllabic context [1]. The linguistic manifestation of prominence is "accent." Accent is an integral component of a language's prosodic representation and is often relied on for lexical, syntactic and semantic disambiguation [1][13]. It also provides important information concerning the emotional tone of the speaker.

In English, accent appears to function essentially as a two-level system (i.e., accented vs. unaccented); however, syllables with accent can assume a graded quantity (such as "heavy" and "light"). For this reason, the analyses presented in the current study partition the stress-accent "space" into either two levels (heavily accented vs. unaccented) or three (heavy, light and none), even though the annotation material from which they are derived labeled accent with a finer degree of granularity (cf. Section 2 for […]).

Traditionally, accent has been thought of as a linguistic parameter largely independent of the phonetic tier, whose realization is […] through which accent is imparted (e.g., [2]). However, the current study calls this assumption into question; many phonetic properties encapsulated in pronunciation variation observed in spontaneous […] stress accent and its impact on syllable structure. Many phonetic properties of a segment are governed by its position within the […] concert with accent (i.e., accent differentially affects the phonetic realization of syllabic constituents). Together, accent and syllable position provide the structural framework required to predict (and understand) the pattern of pronunciation variation observed in the SWITCHBOARD corpus. A syllable-based, juncture-accent model of spoken language, incorporating these insights, is […]

2. CORPUS MATERIAL AND METHODS

The Switchboard corpus [3] contains many hundreds of brief (5-10 minute) telephone dialogues of a casual nature, spoken by native speakers of American English (from most major dialect regions). A subset of this material (45.43 minutes, consisting of 9,922 words, 13,446 syllables and 33,370 phonetic segments, comprising 674 utterances spoken by 581 different speakers) was hand-labeled with respect to phonetic-segment identity and level of stress accent (for each vocalic nucleus). The labeling was performed by students in Linguistics from the University of California, Berkeley, using Entropics software to concurrently display the pressure waveform, the spectrogram, and word- and syllable-level transcripts. The mean duration of each utterance transcribed was 4.76 seconds (range: […] seconds), and the average number of words per utterance was 18.5 (range: 2 to 64 words). The average number of syllables per utterance was 23.25 (range: 5 to 81 syllables). Filled pauses (e.g., "um" and "uh") were excluded from analysis because of the high proportion of non-linguistic attributes associated with such […]

Three transcribers phonetically labeled and segmented the material. The phonetic inventory used is a variant of Arpabet, originally applied to labeling the TIMIT corpus but adapted to the exigencies of spontaneous material (cf. [6] for details of the transcription orthography). The interlabeler agreement was 74%. An analysis of the pattern of interlabeler disagreement for vocalic segments indicates that, in such instances, labelers typically disagreed only slightly, usually in terms of one level of height or whether a segment is a monophthong or diphthong.

Figure 1. The impact of stress accent on pronunciation variation in the Switchboard corpus, partitioned by syllable position and the type of pronunciation deviation from the canonical form. The height of the bars indicates the percentage of segments associated with onset, nucleus and coda components that deviate from the canonical phonetic realization. The magnitude of the deviation is also shown in terms of percentage figures for each bar. Note that the magnitude scale differs for each panel. The sum of the "Deletions" (upper right panel), "Substitutions" (lower left) and "Insertions" (lower right) equals the total "Deviation from Canonical" shown in the upper left panel. Canonical onsets = 10,241, nuclei = 12,185, codas = 7,965. Adapted from [7].

Two individuals (distinct from those involved with the phonetic labeling) marked the same material with respect to stress accent. Three levels of stress were distinguished: (1) fully accented ("heavy"), (2) completely unaccented ("no accent") and (3) an intermediate level of accent ("light"). The transcribers were […] perceptually based accent rather than using knowledge of a word's canonical stress pattern derived from a dictionary. All of the stress-accent material was labeled by both transcribers and the accent labels averaged. In the vast majority of instances the transcribers agreed as to the stress-accent level associated with each nucleus: interlabeler agreement was 85% for unaccented nuclei and 78% for fully accented nuclei (and 95% for any level of accent […]). […] complete accord, the difference in their labeling was usually a half- (rather than a whole-) level step of accent. Moreover, disagreement was typically associated with circumstances where there was some genuine ambiguity in accent level (as determined by an independent, third observer).

3. STRESS ACCENT'S IMPACT ON PRONUNCIATION

3.1 Pronunciation Variation at the Level of the Syllable

We first examine stress accent's differential impact at the level of […] pronunciation patterns observed at the segmental level.
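The partition in Figure 1 (deletions, substitutions and insertions, which together make up the total deviation from the canonical form) can be illustrated with a short Python sketch. It is not the authors' procedure: it assumes that each canonical segment has already been aligned with its transcribed realization, with None standing for a missing segment, and both the helper tally_deviations and the toy alignment for "and" are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# One aligned position: (syllable position, canonical phone, transcribed phone).
# None on the canonical side marks an insertion; None on the transcribed side, a deletion.
Aligned = Tuple[str, Optional[str], Optional[str]]


def tally_deviations(pairs: Iterable[Aligned]) -> Counter:
    """Count deviation types per syllable position (hypothetical helper)."""
    counts: Counter = Counter()
    for position, canonical, transcribed in pairs:
        if canonical is None and transcribed is None:
            continue  # empty alignment slot: nothing to count
        if canonical is None:
            kind = "insertion"
        elif transcribed is None:
            kind = "deletion"
        elif canonical != transcribed:
            kind = "substitution"
        else:
            continue  # canonical realization: no deviation
        counts[(position, kind)] += 1
        counts[(position, "total")] += 1
    return counts


# Toy alignment for "and": canonical [ae n d], realized as [q eh n].
toy_and = [
    ("onset", None, "q"),      # inserted glottal-stop juncture
    ("nucleus", "ae", "eh"),   # vowel substitution
    ("coda", "n", "n"),        # canonical realization
    ("coda", "d", None),       # coda deletion
]

for key, n in sorted(tally_deviations(toy_and).items()):
    print(key, n)
```

Because every deviating position is counted once under its type and once under "total", each position's total equals the sum of its deletions, substitutions and insertions, mirroring the relationship stated in the Figure 1 caption.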
In general, heavily accented syllables are far more likely to be realized (… Dictionary of American English) than their unaccented (or lightly …) […] pronunciation variation associated with stress accent. Accented onsets, in particular, are extremely likely to be pronounced canonically, consistent with models of spoken language that highlight the importance of onsets for lexical access [4][15]. The nuclei and codas are far less likely to be canonically realized, and the likelihood of deviation from the canonical rises dramatically as the magnitude of accent diminishes.

Further insight is gained when the pronunciation patterns are partitioned by the type of deviation observed […] substitution deviations (lower left panel) are to be found in the nucleus and are inherently vocalic in nature (cf. Section 3.2). Substitutions are rarely encountered in either the onset or the coda. Segmental deletion, on the other hand, is rarely observed in either the nucleus or the onset, but is quite common in the coda. Insertions occur infrequently and are concentrated in the onset. The absence of stress […] coda constituent will deviate from the canonical. However, accent's impact is highly selective. Its influence is most apparent for substitutions in the nucleus and deletions in the coda. The […] unaffected by accent level (cf. Figure 1).

3.2 Pronunciation Variation in the Vocalic Nucleus

Much of stress accent's impact on phonetic identity is found in the […] accent on the phonetic composition and structure of the vocalic system. In heavily accented syllables there is a relatively even distribution of vocalic segments across the articulatory space, particularly with respect to front vowels. Back vowels are mainly represented by the diphthongs [ow] and [uw]. The articulatory distribution of vowels differs markedly in unaccented syllables. Within this context the overwhelming majority of segments lie in the high-front ([ih], [iy]) and high-central ([ax]) regions of the articulatory space.
Moreover, the proportion of low- and mid-height vowels is considerably lower than in accented syllables. Among unaccented syllables there is a decided skew in the distribution towards high vowels for both canonical and non-canonical forms (cf. Fig. 4 in [7]). Changes in vowel height are heavily skewed towards raising in unaccented syllables (cf. Fig. 5 in [7]). Overall, there is a tendency for lax, high vowels to occur primarily in unaccented syllables and for low vowels to be present […]

Figure 2. The impact of stress accent ("Heavy" and "None") on the number of instances of each vocalic segment type in the corpus. The vowels are partitioned by their articulatory configuration in terms of horizontal tongue position ("Front," "Central" and "Back") as well as tongue height ("High," "Mid" and "Low"). Note the concentration of vocalic instances among the "Front" and "Central" vowels associated with "Heavy" accent and the association of high-front and high-central vowels with unaccented syllables. The data shown pertain solely to canonical forms realized as such in the corpus. The skew in the distributions would be even greater if non-canonical forms were included. Adapted from [7].

3.3 Pronunciation Variation in the Syllable Onset

[…] canonically, particularly in accented syllables (cf. Figure 1). Only in unaccented syllables is there a significant tendency for a certain proportion of onsets to be non-canonically pronounced. The overwhelming majority of deviations within this context are in the form of segmental deletions (cf. Figure 4). Most of these deletions […] "them," "they," "him" and "her," the definite article "the," and the demonstratives "these" and "those." The deleted segment is usually either [dh] or [h] (cf. Table 1). Both classes of segment occur in high-frequency words that are therefore highly predictable from context.
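The accent dependence of onset [dh] and [h] loss can be read off Table 1 (where [h] is transcribed as hh) by comparing transcribed with canonical counts. The sketch below does this for the onset columns. Treating the transcribed/canonical ratio as a realization rate is an assumption, since transcribed tokens can also arise from substitutions of other segments.

```python
# Onset counts from Table 1 as (canonical, transcribed) pairs per accent level.
ONSET_COUNTS = {
    "dh": {"heavy": (95, 80), "light": (311, 257), "none": (625, 451)},
    "hh": {"heavy": (158, 156), "light": (169, 157), "none": (67, 37)},
}


def transcribed_to_canonical(counts):
    """Ratio of transcribed to canonical tokens per segment and accent level.

    Only a rough proxy for canonical realization: transcribed tokens may include
    substitutions from other segments, so 1 - ratio is not exactly a deletion rate.
    """
    return {
        seg: {accent: tran / can for accent, (can, tran) in levels.items()}
        for seg, levels in counts.items()
    }


for seg, levels in transcribed_to_canonical(ONSET_COUNTS).items():
    print(seg, {accent: round(ratio, 2) for accent, ratio in levels.items()})
```

On these counts the ratios fall as accent diminishes: 0.84, 0.83 and 0.72 for [dh], and 0.99, 0.93 and 0.55 for [hh] (heavy, light, none).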
The other common forms of deviation among onsets pertain to either the insertion or substitution of junctures, of which the alveolar […] glides [w] and [y] are the most common variety. Such junctures […] (or lightly accented) one preceded by a more heavily accented precursor. The flaps and glottal stop, in particular, are examples of "pure" junctures in that they serve primarily as syllable separators rather than as phonetic segments (an issue addressed […]).

There are two other contexts in which onsets are likely to be non-canonically realized. The centrally articulated segments, [t], [d] […] unaccented syllables. In many instances such segments are transformed into pure junctures (i.e., the flaps [dx] and [nx]). The other context pertains to the place "chameleons," whose specific articulatory locus depends on the surrounding vocalic context. These liquids, approximants and syllabics have many articulatory and acoustic properties in common with vowels, and under many circumstances behave more like vocalic than consonantal segments. In unaccented syllables many of these segments either become reduced (e.g., reduced liquids) or disappear altogether. Under such circumstances segmental duration may serve as a more sensitive indicator of stress accent's impact on phonetic realization than segmental identity (cf. Section 4 and [8]).

3.4 Pronunciation Variation in the Syllable Coda

The coda is far less likely to be canonically pronounced than the onset (cf. Section 3.1 and Figure 1). Most of the deviations observed are segmental deletions, whose frequency is extremely sensitive to stress accent (cf. Figure 1).

These coda deletions are of a highly selective nature. Virtually none of the anterior or posterior segments are deleted in any great measure.
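The contrast between central and non-central codas can be checked against the Total columns of Table 1 (all accent levels combined). In this sketch the place grouping (alveolar [t], [d], [n] as central; labial [p] and velar [k] as non-central) is assigned by hand as an assumption, and the transcribed/canonical ratio is again only a rough proxy for canonical realization.

```python
# Coda counts from the Total columns of Table 1: (canonical, transcribed).
CODA_TOTALS = {
    "p": (89, 77),
    "k": (417, 351),
    "t": (1459, 489),
    "d": (865, 342),
    "n": (1582, 1160),
}

# Assumed place grouping (alveolars central; labial and velar stops non-central).
PLACE = {"p": "non-central", "k": "non-central", "t": "central", "d": "central", "n": "central"}


def pooled_ratio_by_place(totals, place):
    """Pool canonical and transcribed counts within each place group."""
    sums = {}
    for seg, (can, tran) in totals.items():
        group = place[seg]
        can_sum, tran_sum = sums.get(group, (0, 0))
        sums[group] = (can_sum + can, tran_sum + tran)
    return {group: tran_sum / can_sum for group, (can_sum, tran_sum) in sums.items()}


print({g: round(r, 2) for g, r in pooled_ratio_by_place(CODA_TOTALS, PLACE).items()})
```

With these counts the pooled ratio is about 0.85 for the non-central codas and 0.51 for the central ones.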
The exceptions are [v], [m] and [ng] in unaccented syllables, all of which behave in a manner similar to flaps (and pure junctures) in this context (the case of [v] is discussed in more […]). […] Centrally articulated segments, particularly [t], [d] and [n], are extremely likely to be non-canonically realized, even in heavily accented syllables (the level of accent exerts a significant impact on the probability of non-canonical pronunciation). In many contexts the default pronunciation of such segments is non-canonical (usually […]) […] more forward (and backward) counterparts (besides place of articulation […]). […] there is a relatively even numerical distribution among anterior, central and posterior segments (cf. Table 1), the codas manifest a decided frequency skew towards the central phones. Fully 75% of coda segments are centrally articulated (in canonical form). In other words, the default place of articulation for coda segments is central. Anterior and posterior segments are relatively rare, and in this sense are more "informative" in terms of lexical and syllabic differentiation. It is perhaps not coincidental that the non-central segments most likely to be non-canonically pronounced ([v], [m], [ng]) occur far more frequently than […]

Place chameleons in the coda behave, in many respects, like vocalic segments, not only in terms of their segmental mutability as a function of accent, but also in terms of duration (cf. Section 4 and [8]). They are likely to either delete or reduce in unaccented syllables. Such segments behave, in certain respects, like the glide portions of diphthongs. In accented syllables such segments are […] In the latter instance, the net result is typically just a slight change in the quality of the preceding vowel. In this sense, deletion of a coda chameleon can often be interpreted as a vocalic transformation (i.e., substitution) rather than as a true segmental deletion.

Figure 3. Spatial representation of the mean proportion of nuclei associated with syllables that are heavily stressed or completely unstressed as a function of vocalic identity. Vowels are segregated into diphthongs and monophthongs for illustrative clarity. Note that the polarity of the y-axis scale for the unaccented syllables is the reverse of that for the heavily accented syllables (performed in order to highlight the spatial organization of the data). The x-axis refers to the hypothetical position of the tongue in the horizontal plane and is intended purely for illustrative purposes. From [11].

Figure 4. The effect of stress accent on the type of pronunciation deviation from the canonical for syllable onset segments. The three deviation forms shown ("Segment Deletion," "Flap Juncture" and "Juncture Insertion") account for 76% of the non-canonical segments in onset position. Adapted from [7].

                 |             Syllable Onset              |             Syllable Coda
                 |     Heavy     Light      None     Total |     Heavy     Light      None     Total
Manner  Voi  Seg |  Can Tran  Can Tran  Can Tran  Can Tran |  Can Tran  Can Tran  Can Tran  Can Tran
Stop    -    p   |  203  205  153  153   94   94  450  452 |   33   32   39   32   17   13   89   77
Stop    +    b   |  126  127  227  225  214  190  567  542 |    9    6    4    4    1    1   14   11
Nasal/J +    m   |  137  137  211  211  116  110  464  458 |  108   96  148  148  112   83  368  327
Fric    -    f   |  136  136  104  104  113  103  353  343 |   37   36   40   40   36   48  113  124
Fric/J  +    v   |   35   33   58   58  108   93  201  184 |   63   55  102   87  172   94  337  236
Fric    -    th  |   62   61  102  100   28   26  192  187 |   11   10   24   16   34   20   69   46
Fric/J  +    dh  |   95   80  311  257  625  451 1031  788 |    …    …    …    …    …    …    …    …
Glide   +    y   |   63   72  135  136  193  145  391  353 |    -    -    -    -    -    -    -    -
Stop    -    t   |  241  245  276  230  513  276 1030  751 |  322  126  575  191  562  172 1459  489
Stop    +    d   |  141  143  149  134  173  128  463  405 |  200  119  295  127  370   96  865  342
Flap/J  +    dx  |    0   62    0  179    …    …    0  244 |    -    -    -    -    -    -    -    -
Nasal   +    n   |  133  135  237  196  194  130  564  461 |  311  237  498  381  773  542 1582 1160
Flap/J  +    nx  |    0   40    0   73    …    …    0  115 |    -    -    -    -    -    -    -    -
Fric    -    s   |  289  290  284  287  187  186  760  763 |  142  135  202  214  151  155  495  504
Fric    +    z   |   14   13   16   16   43   45   73   74 |  179  149  258  208  271  221  708  578
Stop    -    k   |  185  186  189  187  170  168  544  541 |  170  150  196  162   51   39  417  351
Stop    +    g   |  115  116  138  137   54   51  307  304 |   10   10    8   10    4    5   22   25
Nasal/J +    ng  |    0    0    2    3    1    1    3    4 |   63   60  139  126  203  129  405  315
Fric    -    sh  |   26   26   40   40   73   80  139  146 |    9    9    2    2    4    6   15   17
Fric    +    zh  |    0    1    2    9   11   17   13   27 |    1    0    0    4    0    2    1    6
Affric  -    ch  |   32   34   19   27   22   23   73   84 |   26   25   27   25   12   12   65   62
Affric  +    jh  |   31   30   52   43   58   48  141  121 |   10   10   11   10   15   12   36   32
Glide   +    w   |  201  209  310  330  276  287  787  826 |    0    4    0    2    0    6    0   12
Junct   +    q   |    0   33    0   64    0   38    0  135 |    0   42    0   71    0   54    0  167
Liquid  +    r   |  272  269  233  215  233  162  738  646 |  205  183  260  216  181   68  646  467
Liquid  +    l   |  184  180  226  212  220  162  630  554 |  183   90  169   62  120   34  472  186
Aprox   +(-) hh  |  158  156  169  157   67   37  394  350 |    -    -    -    -    -    -    -    -
Syllab  +    er  |    0    0    0    2    0    0    0    2 |    0    8    0    2    0    1    0   11
Red liq +    lg  |    …    …    …    …    …    …    …    … |    …    …    …    …    …    …    …    …
Syllab  +    el  |    …    …    …    …    …    …    …    … |    …    …    …    …    …    …    …    …

Table 1. The impact of stress accent and syllable position (onset vs. coda) on the likelihood of canonical pronunciation as a function of phonetic identity, organized by place and manner of articulation, and by voicing. Numbers refer to the canonical and transcribed (i.e., actual) instances of each segment. Segments for which there is a significant discrepancy between the number of canonical and transcribed occurrences (indicative of non-canonical pronunciation) are marked in BOLD. Voicing is indicated as either present (+) or absent (-). Abbreviations: Affric = Affricate; Aprox = Approximant; Can = Canonical; Fric = Fricative; J = Juncture; Red liq = Reduced liquid; Syllab = Syllabic; Tran = Transcribed. "/J" = segment can be a pure juncture.

4. STRESS ACCENT'S IMPACT ON DURATION

Durational variation provides a means, separate from segmental identity, with which to gauge stress accent's influence on the phonetic properties of the syllable; the patterns observed complement and extend those described in Section 3.

4.1 Durational Variation of the Syllable

The range of durations associated with syllables of variable structure and stress-accent magnitude is shown in Figure 5 (upper left panel). Heavily accented syllables are generally 60-100% longer than their unaccented counterparts. Overall, syllable length is largely dependent on the number of phonetic constituents, but stress accent also plays a decisive role. Syllables of brief duration […] are likely to be unaccented (unless they contain only a single segment), while those longer than 300 ms are likely to be heavily accented.
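The duration thresholds quoted in Section 4.1 (virtually all syllables shorter than 110 ms are unaccented; those longer than 300 ms are likely to be heavily accented) can be phrased as a simple rule of thumb. The function accent_hint is hypothetical and deliberately coarse: it ignores syllable structure and segment count, both of which the text identifies as major influences on syllable duration, so it is an illustration rather than an accent classifier.

```python
def accent_hint(syllable_ms: float, short_ms: float = 110.0, long_ms: float = 300.0) -> str:
    """Rough accent hint from syllable duration alone (illustrative heuristic).

    Below short_ms: "unaccented"; above long_ms: "likely heavy";
    everything in between (including both boundaries) is left undecided.
    """
    if syllable_ms < short_ms:
        return "unaccented"
    if syllable_ms > long_ms:
        return "likely heavy"
    return "undecided"


print([accent_hint(ms) for ms in (90, 200, 340)])
```

For example, durations of 90, 200 and 340 ms yield "unaccented", "undecided" and "likely heavy" respectively; the wide undecided band reflects how much of the duration range the two thresholds leave open.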
The average duration of a segment (irrespective …) […] 100-150 ms in their heavily accented counterparts. Virtually all syllables shorter than 110 ms are unaccented. The largest disparity between heavily accented and unaccented forms is found in syllables with one or no consonants (i.e., V, CV and VC forms). Such data imply that the vocalic nucleus absorbs much of stress accent's impact on duration (cf. Figure 5, lower left panel).

4.2 Durational Variation of the Vocalic Nucleus

Vocalic segments associated with heavily accented syllables are, on average, more than twice as long as their unaccented counterparts, irrespective of syllable structure (Figure 5, lower left panel). The average duration of vowels in unaccented syllables is exceedingly short (55-75 ms), particularly for nuclei surrounded (… forms). The duration of vocalic segments in heavily accented syllables is far longer, ranging between 126 and 172 ms on average. In this sense, the durational properties of vocalic segments depend largely on the stress-accent level of the syllable. However, the detailed relationship between vowel duration and stress […]

The disparity in duration between vocalic segments in heavily […] diphthongs, as well as low, tense monophthongs, exhibit a relatively large disparity between heavily accented and unaccented instances of the same vocalic segment, while there is relatively little difference in duration as a function of stress-accent magnitude (… [uh]). These data are interpretable within the framework illustrated […] stress-accent level and vowel height. The low and mid vowels, be they diphthongs ([ay], [aw], [ey], [oy], [ow]) or monophthongs ([ae], [aa], [ao], [eh], [ah]), are more likely to exhibit full stress accent than their high vocalic counterparts (and conversely, the high vowels are far more likely to lack accent entirely). In a sense, such high, lax monophthongs as [ih], [ix], [ax] and [ux] are inherently unaccented.
Therefore, duration, as reflected in stress accent, is unlikely to fully manifest its impact in such segments. The significance of this relationship between vowel height and stress accent is perhaps most easily understood in light of the correlation between vowel height and duration (Figure 7). The high vowels, whether they be diphthongs ([iy], [uw]) or monophthongs […] than their mid- and low-height counterparts. Moreover, the difference is largely proportional to vowel height: the lower the vocalic segment, the longer it tends to be, all other factors (such as stress-accent level) being equal. The low monophthongs (i.e., [ae], [aa], [ao]) behave more similarly to their low diphthongal counterparts (i.e., [ay], [aw]) than to other monophthongs, suggesting that vowel height is a primary factor underlying vocalic duration (and vice versa).

Figure 5. The impact of stress accent on the duration of the syllable (upper left panel) as well as its segmental constituents (onset: upper right panel; nucleus: lower left panel; coda: lower right panel) for a variety of syllable structures. Adapted from [8].

4.3 Durational Variation of the Syllable Onset

[…] structure and stress-accent level is shown in the upper right panel of Figure 5. The average duration of unaccented onsets is similar across syllable types, while those pertaining to heavily accented syllables vary relatively little. The disparity associated with onset duration in heavily accented and unaccented syllables is […] (… unaccented counterparts), although not quite as large as that associated […]

Most onset segments exhibit a modest (but significant) difference in duration between the heavily accented and unaccented varieties, comparable to the averages shown in Figure 5. However, certain segments, such as [dh] (as in "the") and [dx] (as in "rider"), exhibit little difference in duration as a function of accent level. These are the same segments that tend to be non-canonically realized in onset position.
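The comparison in Section 4.3, between segments such as [dh] and [dx] whose duration barely changes with accent and segments that lengthen under accent, can be expressed as a ratio of mean heavily accented to mean unaccented duration. In the sketch below the durations are invented placeholders, not corpus measurements, and the 10% tolerance used by the hypothetical helper little_accent_effect is an arbitrary assumption.

```python
from statistics import mean

# Hypothetical onset durations in ms, invented for illustration (not SWITCHBOARD data).
TOY_TOKENS = [
    ("dh", "heavy", 42), ("dh", "none", 40),
    ("dx", "heavy", 30), ("dx", "none", 29),
    ("s", "heavy", 120), ("s", "none", 95),
]


def duration_ratios(tokens):
    """Mean heavily accented duration divided by mean unaccented duration, per segment.

    Segments observed at only one of the two accent levels are skipped.
    """
    by_seg = {}
    for seg, accent, ms in tokens:
        by_seg.setdefault(seg, {}).setdefault(accent, []).append(ms)
    return {
        seg: mean(levels["heavy"]) / mean(levels["none"])
        for seg, levels in by_seg.items()
        if "heavy" in levels and "none" in levels
    }


def little_accent_effect(ratios, tolerance=0.10):
    """Segments whose heavy/unaccented ratio lies within `tolerance` of 1."""
    return sorted(seg for seg, r in ratios.items() if abs(r - 1.0) <= tolerance)


print(little_accent_effect(duration_ratios(TOY_TOKENS)))
```

With these placeholder values, little_accent_effect returns ['dh', 'dx'], while [s] (ratio about 1.26) is excluded.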
Thus, some relation between duration and segmental identity is likely to exist as they relate to stress accent. Although the durational disparity between onset segments associated with heavily accented and unaccented syllables is not nearly as great as that observed among vocalic nuclei, the general patterns observed are broadly consistent. In both constituent forms, segments that rarely occur in heavily accented syllables exhibit relatively little difference in duration as a function of stress-accent level (consistent with the durational properties of [dh] and [dx]).

4.4 Durational Variation of the Syllable Coda

The mean duration of coda segments is shown in the lower right panel of Figure 5 for a variety of syllable structures. The durational patterns observed are rather stable across syllable form. Coda segments in heavily accented syllables are only 23 to 31% longer (on average) than their unaccented counterparts. The duration of coda constituents appears far less sensitive to stress accent than that of nuclei or onsets. A closer examination of the durational disparities between codas in heavily accented and unaccented syllables reveals a variety of interesting patterns. At the low end of the durational spectrum are the "pure" junctures ([dx], [nx] and [q]), comprising the alveolar and nasal flaps along with the glottal stop. These segments are uniformly short (40-50 ms) and exhibit virtually no distinction in duration as a function of stress-accent level. As discussed in Section 3.3, such segments function largely as syllable dividers and are largely devoid of a distinctive segmental identity. The approximants ([r] and [l]) exhibit a very different durational pattern. The duration of both segments is 67-93% longer in accented syllables relative to their unaccented counterparts. This durational disparity is more typical of vocalic nuclei than of consonantal codas. A more variable pattern is observed among the remaining coda segments […]

Figure 6. Mean duration of vocalic nuclei in the annotated SWITCHBOARD corpus as a function of stress-accent magnitude. The duration of vowels in heavily stressed syllables is shown in black, while the duration of vowels in unstressed syllables is shown in grey. Data shown are associated with canonical realizations of the vowels only. Data associated with the intermediate level of stress accent are omitted for illustrative clarity. From [8].

Figure 7. Spatial representation of the mean durational properties of vocalic nuclei organized by stress-accent magnitude and dynamic status of the vowel. The x-axis refers to the hypothetical position of the tongue in the horizontal plane and is intended purely for illustrative purposes. Note that the durational scale on the y-axis differs for each of the six plots. From [11].

5. A JUNCTURE-ACCENT MODEL

[…] traditional segmental models of spoken language. In particular, the concept of the phoneme is difficult to reconcile with the pronunciation patterns observed in the SWITCHBOARD corpus. The variation observed suggests that the […] is the basic organizational unit of spoken language at the sub-word level. Moreover, prosody, in the guise of stress accent, […] of pronunciation; accent's influence is differentially distributed across the syllable, both in terms of its magnitude and […]

A qualitative model, consistent with the pattern of pronunciation variation observed, is illustrated in Figure 8. At the heart of this model are the concepts of juncture and accent, which can be likened to a mountain range containing peaks and valleys. A peak's height is associated with the stress-accent level of the syllable. Heavily accented syllables have tall peaks, while unaccented forms are associated with small peaks (as illustrated for the word "seven" in Figure 8). The foothills, ascending to the peak, are associated with the syllable's onset, while the steep crevasse following the peak is linked to the coda.
Within this perspective all constituents of the syllable are inextricably linked together. A peak's height affects not only that particular constituent of the topography (i.e., the vocalic nucleus), but also the onset, whose length is directly related to the syllable's magnitude, and, to a lesser extent, the coda. Peaks and valleys are separated by junctures of various types. A "pure" juncture is associated with the [v] in "seven" (Figure 8). Its acoustic signature is a substantial depression of energy across the topography's entire bandwidth, and it serves primarily to demarcate one (heavily accented) syllable from another (less accented) one. The onsets convey far more lexically distinctive information than the codas by virtue of the distribution of phonetically contrastive features (cf. Sections 3.3 and 3.4). In this sense codas contain relatively little information and are therefore more readily "expendable" than onsets. The nuclei set the "register" for decoding the nuclei and the onsets, providing crucial information for interpreting the acoustic signal […] segmental representation; but this interpretative machinery requires an accurate estimate of stress accent to perform at an optimal level.

Figure 8. An illustration of a spectro-temporal profile (STeP) for a single, disyllabic word, "seven," taken from the OGI Numbers95 corpus. The STeP is derived from the energy contour across time and frequency associated with many hundreds of instances of "seven" spoken by many different speakers. Each instance of the word was aligned with the other instances at its arithmetic center. The mean duration of all instances of "seven" is shown by the red rectangle. The STeP has been labeled with respect to its segmental and syllabic components in order to indicate the relationship among the onset, nucleus and coda realizations within the syllable and their durational properties. From [8].

6. ACKNOWLEDGEMENTS

This research was supported by the U.S. Department of Defense and the National Science Foundation. We thank Candace Cardinal, Rachel Coulston, Jeff Good and Colleen Richey for transcribing portions of the SWITCHBOARD corpus.

REFERENCES

[1] Beckman, M. Stress and Non-Stress Accent. Foris, 1986.
[2] Clark, J. and Yallop, C. Introduction to Phonology and […]
[3] Godfrey, J.J., Holliman, E.C. and McDaniel, J. SWITCHBOARD: Telephone speech corpus for research and development. Proc. IEEE Int. Conf. Acoust. Speech Sig. Proc. […]
[4] Gow, D., Melvold, J. and Manuel, S. How word onsets drive lexical access and segmentation: Evidence from […] Proc. Int. Conf. Spoken Lang. Process. (ICSLP) […]
[5] Greenberg, S. Understanding speech understanding: Towards a unified theory of speech perception. Proc. ESCA Tutorial and Advanced Research Workshop on the Auditory Basis of Speech Perception, Keele, England, […]
[6] Greenberg, S. Speaking in shorthand: A syllable-centric perspective for understanding pronunciation variation. Speech Communication, […]
[7] Greenberg, S., Carvey, H.M. and Hitchcock, L. The relation of stress accent to pronunciation variation in spontaneous American English discourse. Proc. Int. Conf. Speech Prosody, Aix-en-Provence, 2002.
[8] Greenberg, S., Carvey, H.M., Hitchcock, L. and Chang, S. Temporal properties of spontaneous speech: A syllable-centric perspective. Submitted to Journal of Phonetics, 2002 (available at: www.icsi.berkeley.edu/~steveng).
[9] Greenberg, S. and Chang, S. Linguistic dissection of […] Proc. ISCA Workshop on Automatic Speech Recognition: Challenges for the New Millennium […]
[10] Greenberg, S., Chang, S. and Hitchcock, L. The relation between stress accent and vocalic identity in spontaneous American English discourse. Proc. ISCA Workshop on Prosody in Speech Recognition and Understanding, […]
[11] Hitchcock, L. and Greenberg, S. Vowel height is intimately associated with stress accent in spontaneous American English discourse. Proc. 7th Int. Conf. Speech Tech. Comm. (Eurospeech), […]
[12] Hockett, C.F. The origin of speech.
[13] Lehiste, I. Suprasegmentals.
[14] Levelt, W. […]
[15] Marslen-Wilson, W.D. and Zwitserlood, P. Accessing spoken words: The importance of word onsets. Journal of Experimental Psychology: Human Perception and Performance.
[16] Sampson, G. […] University Press, 1985.