Uploaded On 2016-03-20


111111111
1. If she listens, she will understand. Score=4
1011111
2. Why had they liked peas so much? Score=3
1101111
3. Big ships will always make noise. Score=3
010101101
4. We should have eaten breakfast by now. Score=0 (They) (eat) (right)
11100111101000111
5. If her heart were to stop beating we might not be able to help her! Score=0 (will) (be) (will) (being)
Figure 2: Scoring some sample EI sentences.

[Figure 3, panels (a) and (b): Persons-MAP-Items charts showing person ability against item difficulty.]
Figure 3: IRT analyses for Form A (a) and Form C (b).

2.3. Analysis of results
Table 2 presents the reliability statistics for the item analysis performed on the outcomes of Form D. One item was repeated correctly by all of the participants and two items were not repeated correctly by anyone, so these were removed, hence the 57 items analyzed. Two subjects had incomplete data and were removed from the analysis.
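The paper does not say how its reliability statistics were computed, so the following is only a minimal sketch of the textbook Cronbach's alpha formula, preceded by the kind of filtering described above (dropping items on which every participant received the same score). The function names and the toy score matrix are illustrative assumptions, not the authors' code or data.

```python
from statistics import variance


def drop_constant_items(scores):
    """Remove item columns on which every person received the same score.

    `scores` is a list of rows, one row per person, one column per item.
    This mirrors the removal of items that everyone (or no one) repeated
    correctly; the exact criterion used in the study is an assumption here.
    """
    n_items = len(scores[0])
    keep = [i for i in range(n_items) if len({row[i] for row in scores}) > 1]
    return [[row[i] for i in keep] for row in scores]


def cronbach_alpha(scores):
    """Textbook Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / total variance)."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two persons")
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have no variance")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical 0-4 item scores for four persons on four items;
    # the first item is constant and is dropped before computing alpha.
    toy = [
        [4, 4, 3, 4],
        [4, 2, 2, 3],
        [4, 1, 0, 1],
        [4, 3, 3, 2],
    ]
    filtered = drop_constant_items(toy)
    print(len(filtered[0]), "items kept; alpha =", round(cronbach_alpha(filtered), 3))
```

The Rasch-based statistics reported in the tables (for example the raw-score-to-measure values) would need a dedicated IRT package and are not reproduced by this sketch.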
Form  Items  Persons  Reliability coefficients
D     57     154      .96, .96
Table 2: Reliability for items in Form D, the refined test (RSM = Raw-Score-to-Measure).

It was clear from the pilot study and from Form D that some EI sentences perform well in assessing students' abilities, and others don't. Whether an item will in fact perform well or not is not a priori obvious. To illustrate, the items listed in Figure 4(a) performed well whereas those in Figure 4(b) did not.

Table 3 shows how well the EI scoring compares with that of the various human-administered testing instruments. The 0.92 correlation between two different methods of scoring the EI test suggests that either method works about equally well. Notice also that correlations between the EI test scores are on the same order as intercorrelations among the other four more conventional methods of measuring oral language proficiency. In particular, the EI correlates with the OPI as well or better than the informal placement interview and the two computerized speaking tests, which require much more training and time to administer and score.

Our work in developing and administering the EI instrument involved presenting large numbers of EI items to almost 400 ESL students, whose responses were very consistent. Furthermore, overall comparisons between EI scores and scores on other measures of oral language proficiency were very promising. In our work EI was shown to be a
• Automating the task involves a nontrivial integration with already complex systems.
• The speakers in EI tests are non-native speakers with (sometimes heavy) L1 accents, whereas ASR models are tuned and trained for recognition of native speakers.
• There is a granularity mismatch in the data since EI scores are done at the syllable level whereas ASR scores are computed at the word level.

On the other hand, several considerations make the EI/ASR nexus compelling:
• Since humans can score EI following strict scoring criteria, it is reasonable to expect that one could automate the task.
• The expected input for any given test sentence is already known, so the ASR task is much more constrained.
• The ASR task can be developed with open-source technology.
• There is a sizable potential economic benefit if the test can be delivered on a large scale, short turnaround time, and at low cost when compared to human scoring.
• The procedure can be applied to score learners of other languages, provided ASR models are available for those languages.

In an effort to explore the tradeoffs just mentioned, we initiated research in developing an ASR capability for scoring the EI sessions. To summarize, it involved:
1. converting the files to an appropriate format;
2. testing how well Sphinx scores first the native model utterances and then iteratively refining this capability;
3. testing how accurately the ASR engine scores on non-native subjects;
4. iteratively refining the ASR engine on non-native subject recordings; and then
5. trying the system on unseen data and comparing the results to human evaluation scores.

The iterative refinement process involved trying out different recognizer parameters, grammar and lexical specifications, and language models. We discuss each iteration in improving the system's performance on (first) native model utterances and (then) non-native subject utterances. Word recognition rates ranged first from the low 70%s for native models to eventually the high 80%s for non-native subjects as a result of our improvements to the ASR system.

3.1.1. Processing native utterances
Our testing of the ASR performance on the native model speakers proceeded incrementally. We briefly summarize the stages of development undertaken in this phase.

First, to minimize grammar engineering at the onset, our grammars consisted of simply all words used in the EI form in any order, thus assuming word-level independence. No other constraints or adaptations were made to the ASR engine or the knowledge sources. This allowed for rapid system development and a conservative baseline to compare future work on. The inputs were scored at the sentence level on a binary accept/reject basis. The result was a 71% recognition accuracy rate. Of course this left room for improvement, but we were somewhat surprised that the first attempts were this promising. Because of the unconstrained nature of the grammar, the perplexity was high and thus we had reason to believe that the results could be improved on.

We next proceeded to develop a full grammar where all sentences forced the system to recognize the words in the correct order. This reduced perplexity considerably and hence the scoring was much faster. One drawback with this type of analysis was that the sentence had to occur in toto, so that in:
i i saw her saw her run
i i saw her i saw her run
the former utterance would be rejected but the latter would be accepted since the whole sentence is uttered in one chunk. Given this setup the system achieved an 81% recognition accuracy.

We also developed visualization tools to help analyze the scoring data and thus help find problematic items and difficult areas within them. It also became clear that some of the files were clipped prematurely at the beginning, resulting in lower scores until we padded the sound files with leading silence, which helped noticeably.

The next level of effort involved forcing the system to use a fully specified grammar, but only for the sentence in question, when processing an item file. Note that this is only possible when the model utterance is known a priori, which is the case for EI tests (but not for typical ASR applications, e.g. speech transcription). This yielded an accuracy of just above 90%.

Up to this point we had been using the Hub4 acoustic model, which is trained on broadcast news. Replacing the model with the Wall Street Journal (WSJ) model boosted the word-level accuracy score to 99.7% for men and women, with a 93% accuracy rate at the sentence level.

3.1.2. Processing non-native utterances
Encouraged by the results for the native model speakers, we proceeded to evaluate how well the system worked on the non-native subject utterance files, which were scored by human judges. This process also involved several iterations.

For the first iteration we just computed the match between sentence-level ASR on the pilot test data we had been working with. This attempt, on Forms A, B, and C, resulted in a 0.88 correlation with the human scores.

However, we knew that we would eventually need to develop a scoring system that would take into consideration the granularity mismatch in scoring. Recall that ASR is

identifying items and passages where ASR scoring did not perform well.

Finally, we intend to pursue the development of EI instruments and ASR models for testing L2 learner abilities for other languages besides English.

Ultimately our goal is to develop a run-time adaptive speaking test that can be deployed for EI-based proficiency scoring, similar to those that are currently in use for evaluating reading and listening comprehension. If adjustments could be made in real time, the system could adjust selection of EI items based on the subject's performance, thus calibrating the test for a more exact evaluation.

5. Acknowledgements
We would like to thank Meghan Eckerson, Dan Rasband, Ben Millard, Ross Hendrickson, and Kevin Cook for linguistic and programming support on this project. We also appreciate the BYU English Language Center for its support in carrying out the various language testing activities.

6. References
R. Bley-Vroman and C. Chaudron. 1994. Elicited imitation as a measure of second-language competence. In E. E. Tarone, S. Gass, and A. D. Cohen, editors, Research methodology in second language acquisition, pages 245–261. Lawrence Erlbaum, Hillsdale.
C. Chaudron, M. Prior, and U. Kozok. 2005. Elicited imitation as an oral proficiency measure. Paper presented at the 14th World Congress of Applied Linguistics, Madison, Wisconsin.
A. Devescovi and M. C. Caselli. 2007. Sentence repetition as a measure of early grammatical development in Italian. International Journal of Language and Communication Disorders, 42(2):187–208.
S. Ervin-Tripp. 1964. Imitation and structural change in children's language. In E. H. Lenneberg, editor, New directions in the study of language, pages 163–189. M.I.T Press, Cambridge, MA.
M. Fujiki and B. Brinton. 1987. Elicited imitation revisited: A comparison with spontaneous language production. Language, Speech, and Hearing Services in the Schools, 18(4):301–311.
C. R. Graham. 2006. An analysis of elicited imitation as a technique for measuring oral language proficiency. In Yiju Chen and Yiunam Leung, editors, Selected Papers from the Fifteenth International Symposium on English Teaching, pages 57–67, Taipei, Taiwan. English Teachers' Association.
G. Henning. 1983. Oral proficiency testing: comparative validities of interview, imitation, and completion methods. Language Learning, 33(3):315–332.
K-F. Lee. 1989. Automatic Speech Recognition: The Development of the SPHINX System. Kluwer Academic Publishers, Boston, MA.
S. W. Li, H. T. Lin, and H. Y. Chen. 2005. How speech/text alignment benefits web-based learning. In Proceedings of the 13th Annual ACM International Conference on Multimedia, pages 259–260, New York, NY. ACM Press.
D. Lonsdale, C. R. Graham, and R. Madsen. 2005. Learner centered language programs: Integrating disparate resources for task-based interaction. In Panayiotis Zaphiris and Giorgos Zacharis, editors, User Centered Computer Aided Language Learning, pages 116–132. Information Science Publishing, Hershey, PA.
T. Vinther. 2002. Elicited imitation: a brief overview. International Journal of Applied Linguistics, 12(1):54–73.
ability to understand the sentence and then reconstruct them through their interlanguage system, will vary according to their overall speaking proficiency. The more proficient the speaker the longer and more complex will be the sentences which he or she can repeat accurately. Thus the elicited imitation technique promises to provide an efficient and reliable method, albeit somewhat indirect, of measuring second language speaking proficiency. Figure 1 shows some sample EI sentences of varying complexity.

She speaks English.
Perhaps he works there.
Does that woman help her students?
When I was a teenager I would go to town every day.
I hope that she likes the play because if she does we'll have a party.
Hesitating before she spoke her next line, the actress reached the pinnacle of her nervousness.
Figure 1: Some sample EI sentences.

2. The Elicited Imitation Study
The first part of this research involved development and refinement of an elicited imitation instrument, which proceeded in two phases. In this section we sketch the process for both phases.

2.1. The pilot study
The first phase involved a pilot study where three separate EI tests (Forms A, B, and C) were developed in parallel. Each form had sixty items (i.e. sentences), each chosen in accord with the criteria established in previous literature (Chaudron et al., 2005). These included a wide variety of morphological and syntactic structures involving variables such as sentence length, sentence complexity, vocabulary levels, and breadth of sampling structures. For example, the items in each form ranged in length from three syllables to twenty-four syllables. 13 items were repeated on all forms, and 47 sentences were unique to each form.

High-quality recordings of these stimulus sentences in the three forms were made in a studio with both male and female voices, and the forms were tested on adult native speakers. Subsequently the three forms were presented in parallel to 232 ESL learners in an intensive English program (IEP) in the U.S. The students represented 13 widely varying first-language (L1) backgrounds, and proficiency levels from novice to advanced. Their ages ranged from 18 to 53 years (mean=24, s.d. …).

Subjects listened to the stimulus sentences via computers with microphone headsets and recorded their responses, saving their sound files to a server. These were retrieved and each sentence was scored for accuracy independently by two separate human raters. Each rater used two systems for scoring each item, one using a four-point scale per standard recommendations (Chaudron et al., 2005) and the other by simply counting the total number of syllables repeated correctly.

Figure 2 shows some sample scored items. Associated with each (pseudo-)syllable is either a 1 (meaning the syllable was pronounced) or a 0 (meaning it wasn't). The final score for each item depends on how many misses there were in that item. Where different words were used, they were also annotated parenthetically.

Item analyses were performed on these scores and reliability coefficients were computed for each form. Table 1 shows the very encouraging results. Figure 3 shows two person/item maps from IRT analyses, one for Form A and one for Form C. On the left side of each, subject scores are presented on a standard scale with more proficient learners at the top and less proficient ones at the bottom. Item scores are presented on the right side of each, with difficult items at the top and easier items at the bottom. Mean scores for persons and items are marked with an "M" on either side of the middle line. Test difficulty can be ascertained by observing the distribution of the points up the scale. More details on the pilot study and a deeper analysis can be found elsewhere (Graham, 2006).
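The two human scoring systems described above can be sketched in a few lines. The text only says that the four-point score "depends on how many misses there were"; the rule used here (4 minus the number of missed syllables, floored at 0) is an assumption that happens to agree with the sample items in Figure 2, and the function names are hypothetical.

```python
def syllables_correct(syllable_marks: str) -> int:
    """Second scoring system: count the syllables marked 1 (pronounced)."""
    _validate(syllable_marks)
    return syllable_marks.count("1")


def four_point_score(syllable_marks: str, max_score: int = 4) -> int:
    """First scoring system, under the ASSUMED rule max_score - misses, floored at 0."""
    _validate(syllable_marks)
    misses = syllable_marks.count("0")
    return max(0, max_score - misses)


def _validate(syllable_marks: str) -> None:
    # Each character stands for one (pseudo-)syllable: "1" = pronounced, "0" = missed.
    if not syllable_marks or set(syllable_marks) - {"0", "1"}:
        raise ValueError("syllable marks must be a non-empty string of 0s and 1s")


if __name__ == "__main__":
    # Mark strings as they appear for the sample items in Figure 2.
    for marks in ["111111111", "1011111", "1101111", "010101101"]:
        print(marks, four_point_score(marks), syllables_correct(marks))
```

Keeping both scores per item makes it straightforward to compare the two systems later, as the study does when it correlates the traditional and syllable-count EI scores.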
Form  Items  Persons  Person RSM  Cronbach Alpha  Item Reliability RSM
A     58     78       .98         .97             .98
B     59     73       .99         .97             .98
C     60     72       .96         .96             .97
Table 1: Reliability for items from the three pilot study forms (RSM = Raw-Score-to-Measure).

2.2. The refined test
From the 60 best-discriminating items in the pilot study we created a new refined test, called Form D. The selected sentences ranged in length from five syllables to twenty-two syllables. Form D was administered to 156 adult ESL learners in the same IEP program. They came from twelve L1 backgrounds, their English proficiency levels ranged from novice to advanced, and their ages ranged from 18 to 55 (mean=24, s.d. …). On average the learners took from seven to ten minutes to complete the test.

On separate occasions within a few days of the elicited imitation test, these subjects were also given additional speaking tests administered by qualified examiners. These included: an informal 15-minute placement interview, a 30-minute simulated computer administered oral proficiency test (ECT), a 30-minute computer elicited oral achievement test (LAT), and an oral proficiency interview (OPI) administered by certified ACTFL testers to a stratified random sample of 40 of the 156 participants.

The utterances from Form D were scored by two humans as described above for the pilot study. In addition, these EI test results were correlated with the outcomes of these other testing modalities, as explained below.

(a)
When she went to Las Vegas, did she like the shows that she saw?
Perhaps he works there.
If her heart were to stop beating, we might not be able to help her.
Had you ever flown that high before?
Good cars will never break down.
(b)
Have you slept?
Maybe she likes cats.
We eat cookies.
He should have walked away before the fight started.
How do good children play baseball?
Chris has yelled louder than ten sheep.
Figure 4: Well-performing EI sentences (a) and poorly-performing ones (b).

                 EI Traditional  EI Syllable  ECT Speaking  OPI   Oral Placemt.  LAT Speaking
EI Traditional   1               .925         .516          .658  .639           .551
EI Syllable      .925            1            .465          .648  .691           .414
ECT Speaking     .516            .465         1             .432  .577           .442
OPI              .658            .648         .432          1     .660           .652
Oral Placemt.    .639            .691         .577          .660  1              —
LAT Speaking     .551            .414         .442          .652  —              1
Table 3: Pearson correlations of the various oral language measures used in the study. Cases where subjects took mutually exclusive tests are indicated with —.

highly reliable way of measuring a single trait of oral language use. However, exactly what that trait is and the degree to which it correlates with other measures of oral language proficiency could be further elucidated with subsequent work. Still, specific items can be shown to consistently detract from or add to the reliability of the measure, and it is these items that interest us most particularly. Finally, our work has shown that the scoring procedure developed by Chaudron et al. (2005) appears to work reasonably well, although other procedures should be experimented with.

Obvious advantages of the EI technique over conventional methods of oral language testing include that:
• the test can be administered to multiple learners at the same time,
• it can be administered in a conventional computer lab without the assistance of a highly trained oral interviewer, and
• it can be scored rather efficiently by a reasonably proficient speaker of the target language.

This last advantage is made even more interesting by our attempts at developing an automatic scoring procedure using speech recognition technology. This research will be described in the following section.

3. Speech recognition
Automatic speech recognition (ASR) involves processing spoken language to extract its content. It is a complex task combining physics, engineering, mathematics, statistics, and linguistics. The current state of the art has produced accuracy ranges from barely tolerable to very good, depending on the particular application. In this regard ASR is just becoming practical and viable in some domains for English, though generally it is less well developed for most other languages. Though there are notable commercial enterprises involved in ASR development, the technology is becoming increasingly more available in open-source repositories. Our prior work (Lonsdale et al., 2005) has focused on several ASR applications from dialogue to language pedagogy.

Conceptually, ASR involves taking an input acoustic signal (pre-digitized if necessary) and sampling it at regular intervals. The samples are then analyzed for features that are salient for downstream processing. The properties of each sample are sent through a classifier to ascertain which language sounds (or phones) best match the sample in question.

For this project we used the Sphinx ASR engine (Lee, 1989) which was developed at CMU. We manipulated three main components: the recognizer, which handles the signal processing; the grammar, which specifies the language model, and the linguist, which manages lexical and phonological properties of the words in the language (in this case English).

3.1. ASR for EI
The applicability of ASR for EI is an interesting question that to our knowledge has not received any attention in the current research literature. There are a priori a few considerations which may lead one to suspect that scoring EI tests with ASR might perhaps be problematic:
• ASR is still an emerging technology.

For more details see http://psst.byu.edu.

scored at the word or sentence level whereas the human judges provided syllable-level scores. We therefore developed a scoring system that maps from the syllable level to the word level and computes correlations accordingly. Even with this indirection in the scoring mechanism, the system achieved a 0.85 correlation. Note that this is in spite of the fact that the ASR system has no syllable-level language model. The slight loss in this rubric for measurement was tolerable since the scoring approach is much more ecologically valid.

Finally, we upgraded the grammar by strategically introducing more wildcards. This resulted in a final correlation for forms A, B, and C of 0.90 with the human scores. Figure 5 shows a scatterplot of the scores obtained during this development cycle for scoring all items used in the EI pilot study. In a perfectly correlated system the points would all lie along the diagonal.
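The paper does not spell out how syllable-level human marks were mapped to the word level, so the following is only one plausible sketch: a word counts as correct when all of its syllables were marked 1, an item's human score is the fraction of correct words, and that score is then correlated with a per-item ASR score using the standard Pearson formula. The function names, the all-syllables rule, and the sample numbers are assumptions for illustration.

```python
from math import sqrt


def word_marks(syllable_marks, syllables_per_word):
    """Collapse per-syllable 0/1 marks into per-word 0/1 marks.

    ASSUMED rule: a word is correct only if every one of its syllables is 1.
    """
    if sum(syllables_per_word) != len(syllable_marks):
        raise ValueError("syllable counts do not match the mark string")
    marks, pos = [], 0
    for n in syllables_per_word:
        chunk = syllable_marks[pos:pos + n]
        marks.append(1 if all(c == "1" for c in chunk) else 0)
        pos += n
    return marks


def word_score(syllable_marks, syllables_per_word):
    """Fraction of words judged correct, a value between 0 and 1."""
    marks = word_marks(syllable_marks, syllables_per_word)
    return sum(marks) / len(marks)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("one of the score lists has no variance")
    return cov / (sx * sy)


if __name__ == "__main__":
    # "Big ships will always make noise." with the third syllable missed.
    human = word_score("1101111", [1, 1, 1, 2, 1, 1])
    print("human word-level score:", human)
    # Hypothetical per-item human and ASR scores, both scaled to 0..1.
    human_scores = [human, 1.0, 0.5, 0.25]
    asr_scores = [0.8, 0.9, 0.6, 0.2]
    print("correlation:", round(pearson(human_scores, asr_scores), 3))
```

Scaling both scores to the 0 to 1 range matches the axes of the scatterplots in Figures 5 and 6, although Pearson correlation itself does not depend on that scaling.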
[Figure 5: three scatterplots, each with axes labeled Human score and ASR score, both running from 0 to 1.]
Figure 5: Correlation of ASR scores with human scores for Forms A, B, and C (development data).

Of course, a valid test of our work would need to be carried out on held-out or unseen data. Having refined the system with the pilot data (i.e. Forms A, B, and C) we proceeded to test it on Form D. Some work was required to reformat the human scores for this form, which was annotated in a slightly different manner. The acoustic data was also recorded using different tools and thus had to undergo another conversion process. Still, the data for Form D was for all intents and purposes unseen.

The scoring obtained by the system on Form D achieved a 0.83 correlation with the human scores for this form. Figure 6 shows a scatterplot of these results.

As a final check on our results we also ran other validation tests. We consolidated Forms A, B, C, and D together and selected random subsets of data from each form, creating new test sets. On all of these (sub)sets we achieved correlations of between 0.85 and 0.88. Though technically the data in these sets was no longer unseen, we view these additional results as encouraging support for the results we have been achieving, since they preclude any possible effects due to temporal or scoring practice factors across the forms.

4. Future work
We anticipate future work in several directions.
[Figure 6: scatterplot with axes labeled Human score and ASR score, both running from 0 to 1.]
Figure 6: Correlation of ASR scores with human scores for Form D (unseen data).

First, we are in the process of refining the EI instrument, culling out the sentences that do not perform well in the ASR scoring. This will help produce future tests that will be even more amenable to ASR scoring.

We intend to carry out further exploration with the interrelationships between student responses and EI variables such as sentence length, complexity, and vocabulary. This includes developing extensive criteria for complexity at all levels: lexical, phonological, morphological, syntactic, and semantic. Using these criteria we will be able to better create a wide range of new EI items covering the spectrum of difficulty. Indeed, it might even be possible to semi-automate this process and produce an interactive tool for EI instrument development.

Further examination of responder variables such as working memory, native language, and age still needs to be carried out. We expect that such features, in combination with the raw EI scores, will be useful in eventually applying machine learning or some other form of classification methodology to better score the students' responses.

Many of the remaining ASR-related issues involve speech performance difficulties such as false starts, the use of contractions, filled pauses, and long hesitations, as well as poor recording quality on some responses.

We also have several hundred more EI tests that have already been administered and that still need to be scored by humans.

We intend to investigate EI with other languages. Of course, this will require the appropriate ASR infrastructure for those languages. The process will involve developing an EI instrument for that language and verifying that it correlates well with human scores, integrating the student recordings with an ASR engine developed for that language, and executing the process as described in this paper.

Another ASR development possibility is to train up one or more non-native acoustic models for the ASR component. This would improve scoring non-native speech. However this task seems unlikely in the near term since there is still a paucity of annotated corpora that could be used to train up a recognizer for this purpose.

We are also working toward using forced alignment (Li et al., 2005) as a diagnostic tool. This will be helpful in better