
The 'communicative' legacy in language testing

Glenn Fulcher*

English Language Institute, University of Surrey, Guildford, Surrey GU2 7XH, UK

Received 15 December 1999; accepted 11 April 2000

System 28 (2000) 483–497
www.elsevier.com/locate/system
0346-251X/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved.
PII: S0346-251X(00)00033-6

* Tel.: +44-1483-259910; fax: +44-1483-259507. E-mail address: g.fulcher@surrey.ac.uk (G. Fulcher).

This article looks at the phenomenon of 'communicative' language testing as it emerged in the late 1970s and early 1980s as a reaction against tests constructed of multiple choice items and the perceived over-emphasis of reliability. Lado in particular became a target for communicative testers. It is argued that many of the concerns of the communicative movement had already been addressed outside the United Kingdom, and that Lado was done an injustice. Nevertheless, the jargon of the communicative testing movement, however imprecise it may have been, has impacted upon the ways in which language testers approach problems today. The legacy of the communicative movement is traced from its first formulation, through present conundrums, to tomorrow's research questions. © 2000 Elsevier Science Ltd. All rights reserved.

Keywords: Language testing; Communicative language testing

1. What is 'communicative' language testing?

Spolsky (1976) suggested that the history of language testing could be divided into three distinctive periods: the pre-scientific, the psychometric-structuralist, and the psycholinguistic-sociolinguistic. Morrow (1979, p. 144) translated these periods into the Garden of Eden, the Vale of Tears and the Promised Land. The Promised Land was the advent of 'communicative' language testing (as Morrow christened it) at the end of the 1970s and the early 1980s.

But what was 'communicative' language testing? The approach was primarily a rejection of the role that reliability and validity had come to play in language testing, mainly in the United States during the 1960s. The key targets to attack for the new
'communicative' language testers were the multiple-choice item as embodied in the Test of English as a Foreign Language (Spolsky, 1995), and the work of Lado.

Morrow (1979, pp. 146–147) characterised reliability (as defined by Lado) as the search for objectivity (in the use of multiple choice items), conveniently ignoring the distinction that Lado made between 'reliability' and 'scorability' (Lado, 1961, p. 31). Morrow further claimed that with the exception of face validity (and possibly predictive validity) the whole concept of validity is circular because it only exists in terms of criteria, all of which are relative and based upon questionable assumptions. It was the task of the communicative language tester to re-define reliability and validity. In order to do this, early communicative language testers latched onto Davies' (1978) argument that there was a "tension" between reliability and validity, and defined validity as the parallelism of real-world activities and test tasks. This meant that validity involved making the test truer to an activity that would take place in the real world. Yet, reliability could only be increased through using objective item types like multiple choice. As Underhill (1982, p. 18) argued: "there is no real-life situation in which we go around asking or answering multiple choice questions." (It should also be remembered that at this time validity was perceived to be a quality of the test, which is no longer the case.) Thus, the more validity a test had, the less reliability it had, and vice versa. This was expressed most clearly in Underhill (1982, p. 17) when he wrote:

If you believe that real language use only occurs in creative communication between two or more parties with genuine reasons for communication, then you may accept that the trade-off between reliability and validity is unavoidable.

Similarly, Morrow (1979, p. 151) had argued that, "Reliability, while clearly important, will be subordinate to face validity." That is, the key criterion in identifying a good test is that it looks like a good one, the input appears to be "authentic", and the task or item type mirrors an act of communication in the real world (Morrow, 1982, pp. 56–57). Carroll (1982, p. 1) went as far as to say that, "The communicative approach stands or falls by the degree of real life, or at least life-like, communication that is achieved."

It was argued that communicative tests would involve performance (speaking), and the performance would be judged subjectively, qualitatively and impressionistically, by a sympathetic interlocutor/assessor. It is not insignificant that Carroll's (1980) book was entitled "Testing Communicative Performance", rather than "Testing Communicative Competence", picking up the view first expressed by Morrow (1979, p. 151) that 'communicative tests' will always involve performance. The buzzwords of early communicative language testing soon became:

1. real life tasks;
2. face validity;
3. authenticity; and
4. performance.

In the journey to the promised land, Morrow (1979, p. 156) prophesied that, "there is some blood to be spilt yet." Underhill (1982, p. 18) preached: "As ye teach, so shall ye test." The communicative movement soon developed an antipathy to statistical analysis and language testing research, in which the "cult of the language testing expert" was deplored (Underhill, 1987, p. 1), and "common sense" portrayed as more important than "the statistical sausage machine" (ibid, p. 105). Morrow (1991, p. 116) wrote of language testing research that:

At its worst, it might tempt us to paraphrase Oscar Wilde's epigram about the English upper classes and their fondness for fox hunting. Wilde spoke of 'The English country gentleman galloping after a fox – the unspeakable in full pursuit of the uneatable'. Language testing researchers may not be unspeakable; but they may well be in search of the unmeasurable.

Perhaps popular communicative language testing of this type was neither revolution nor evolution. It was rebellion against a perceived lack of concern for the individual, the human, the "subject" taking the test; it wished to remove the use of so-called "arcane pseudo-scientific jargon", and rejoice in the common sense of the classroom teacher who can tell whether a test is good or not by looking at it; it wished to "re-humanise" assessment. Indeed, Morrow (1991, p. 117) referred to "ethical validity", and asked whether testing would make a difference to the quality of life of the test-taker. He perceived this particularly in terms of washback. However, he also recommended that test-takers should be "genuinely involved in the process" of
assessment, although what this might mean in practice was not explored.

[Footnote: Even though "Common sense might be embarrassed in defending its position against the awkward fact which has been adduced, that even in the best regulated examinations one examiner occasionally differs from another to the extent of 50 per cent." (Edgeworth, 1890, p. 661)]

2. Was this all so new?

The new communicative language testers believed that the promised land offered something different and better from everything that had gone before (Harrison, 1983). But as Spolsky (1995) and Barnwell (1996) remind us, the work of those who have gone before is always in danger of being forgotten (or even misrepresented), which is precisely what happened in the early communicative language testing movement. Indeed, many of the calls for change are reminiscent of debates that had taken place decades before.

Speaking tests had been common in the United States since the late 1800s, and there is early evidence of innovative task types, such as asking a learner to interpret between two raters (Lundeberg, 1929). Nevertheless, it is true to say that throughout the 1920s, the heyday of language test development (Spolsky, 1995, pp. 33–51), and into the 1960s, the focus of language testing was on delivering large numbers of tests each year within a rapidly expanding education system in the
United States. The answer to this problem was the 'new-type' test made up of multiple choice items. The multiple choice item was born out of the need to produce tests on an industrial scale, and their use was perpetuated through the development of new automatic marking machines that were designed especially to process multiple choice items (Fulcher, 2000). These facts were not taken into account by the communicative critics of large-scale tests.

Communicative language testers also seem to have thought that their concerns about multiple choice items were new, but this was far from the case. Mercier (1933, p. 382) was just one of the few early writers to express concern that the item format elicited "passive" rather than "dynamic" knowledge, which might limit the generalisability of scores to language use.

Further, language testers from the 1920s did realise that it was important to have individualised tests of speaking using questions and conversational prompts (Wood, 1927), but that such tests were not practical when delivering tests on an industrial scale (Hayden, 1920). Apart from practicality there was one further reason why speaking tests were not developed for large-scale use at the time. This was the deep concern about reliability, where the 'subjective' judgement of an individual scorer dictated test outcome. Nevertheless, in the United States the College Board's test in English as a Foreign Language used throughout the 1920s and 1930s included a 15-minute oral interview as part of the test battery (Barnwell, 1996, p. 70). However, the language sample was graded for pronunciation, as this was considered more reliable than any other criteria that might have been used.

With the Second World War, however, the ability to communicate in a second language became the explicit goal of teaching programmes. These teaching programmes depended more upon continual assessment, and new tests were not immediately developed to rate the personnel being trained in foreign languages. However, the war experience was undoubtedly the beginning of the modern testing of speaking and 'communicative' testing (Fulcher, 1998a). Kaulfers (1944, p. 137), for example, wrote that tests:

should provide specific, recognizable evidence of the examinee's readiness to perform in a life-situation, where lack of ability to understand and speak extemporaneously
might be a serious handicap to safety and comfort, or to the effective execution of military responsibilities.

In the British tradition too, there had always been speaking tests. However, these may have put test-takers at more of a disadvantage, because they were based on reading aloud and understanding extensive literary texts about which they had to express opinions (Spolsky, 1995, pp. 205–205). Further, there was little of the concern for test reliability that had emerged in the United States. Into the 1950s and 1960s whilst British test developers were arguing over whether multiple choice items should be allowed into their tests, and whether the literature requirement should be brought more up to date, testing agencies in the United States were developing the first true communicative performance tests that were to become the models for future test design and development around the world.

The Foreign Service Institute (FSI) speaking test of 1956 was the first test of speaking ability that required conversation with a trained rater (Sollenberger, 1978). The greater part of the test had been developed as early as 1952, and in 1958 sub-scales of accent, comprehension, fluency, grammar and vocabulary were added (although not used for initial ratings). So successful was the FSI oral proficiency interview that its use spread to the CIA, the Defense Language Institute and the Peace Corps. By 1968 a standard speaking test was produced for all these organisations in the form of the Interagency Language Roundtable speaking test (Lowe, 1983). In the 1970s this approach spread to schools and colleges in the United States (Liskin-Gasparro, 1984), and many innovative programmes, including rater training, were developed and disseminated (Adams and Frith, 1979). The rating scales, as they eventually applied to schools and colleges, were developed by the American Council on the Teaching of Foreign Languages, published in 1982 (draft) and 1986 (final). The format and number of bands were a direct result of research into scale sensitivity conducted two decades earlier (Carroll, 1961). In addition to these developments a great deal of research into the reliability and validity of these tests was undertaken.

In the British context, under the grip of the communicative school, such research was not generally conducted because of faith in face validity. Indeed, as late as 1990 in the first reference to a reliability study using University of Cambridge Local Examination Syndicate tests, Bachman and Davidson (1990, pp. 34–35) report that reliability coefficients were not available for speaking tests because they were never double scored. This echoed fears expressed earlier by researchers such as Hamp-Lyons (1987, p. 19). Despite the publication of the ALTE Code of Practice in 1994, committing European examination boards to the production of appropriate data, such basic statistical information has yet to be published (Chalhoub-Deville and Turner, 2000).

3. Reinstating Lado

The preceding brief review shows that the communicative language testing movement in the late 1970s and early 1980s was an unusual blip in the development of language testing. As a primarily British phenomenon, it has left its mark on the policy and practice of British EFL boards in the focus on face validity, 'authentic' looking tasks, and a dearth of research (if not an actual disdain of research) as demonstrated by Alderson and Buck (1993). This situation was predictable partly because the communicative testing movement was essentially a rebellion particularly against reliability, as perceived in the work of Lado (1961). Yet, this perception was one that did not, and does not, match the content of Lado's work. With hindsight, one must wonder if British critics of the late 1970s had actually read Lado at all.

Lado (p. 239) wrote that: "The ability to speak a foreign language is without doubt the most prized language skill, and rightly so". He went on, however, to explain that the testing of speaking was the least developed area of language testing, and that this "is probably due in part at least to a lack of clear understanding of what constitutes speaking ability or oral production." The construct of "speaking"
was under-defined. Lado was deeply concerned about correctly classifying students into language ability levels, and was thus interested in reliability. He argued that because of the complexity of language production and the non-language factors involved, reliability was difficult to obtain. Validity and reliability were inextricably bound together. He was also keenly aware of other problems, as this quotation (Lado, p. 240) makes clear:

Speaking ability is described as the ability to express oneself in life situations, or the ability to report acts or situations in precise words, or the ability to converse, or to express a sequence of ideas fluently. This approach produces tests that must range over a variety of situations to achieve validity, and then there is no assurance that the language elements of speaking have been adequately sampled. Scoring is done by means of a rating scale describing the type of responses to be expected and the score to assign to [...]

Lado therefore argued that it was better to test speaking through the "language elements" that were needed to communicate. He was aware of the problem of sampling situations in performance tests in such a way that task could be matched to abilities, so that claims about abilities would be valid and generalisable to other situations (within what would now be called a domain behaviour paradigm). However, a speaking test would still involve the learner speaking, as it is necessary in Lado's view to test the ability to produce language at a speed that is comparable to that of a native speaker (Lado's definition of "fluency"). Many of the activity types listed by Lado (1961, pp. 244–245) are not dissimilar to those of Underhill (1987) in sections labelled "picture series", "sustained speech", and "conversation". The sustained speech is essentially a role play conducted between the rater and the test-taker. The difference lies in Lado's awareness of the measurement requirements, the link to careful task and scale design, and construct definition. With sections entitled "Testing the Integrated Skills" and "Beyond Language: How to test cross-cultural understanding," Davies (1978, p. 133) was surely correct when he wrote "there is more to Lado than analytical tests".

The British communicative testing movement of the late 1970s and early 1980s was therefore lobbying for tests and
task types that were already being developed outside the United Kingdom (see also Lowe, 1976), and simultaneously hindering the pursuit of any systematic research agenda in language testing that could address Lado's worries. Particularly in the field of testing speaking, little research has been done within British testing organisations to seek answers to many of the key questions relating to reliability, rating scale design and construction, test-taker characteristics, task variance, test-method facets, generalisability, and construct definition.

Nevertheless, the fact that the jargon of British communicative language testing spread rapidly has impacted upon the direction of language testing research. In what follows, I will look at the criteria of a communicative test set out in 1979, and consider how these criteria have been developed in ways that provide new insights into language testing.

4. What is a 'communicative' test?

Morrow (1979) claimed that there were specific criteria that could be used to tell if a test is communicative. In an early commentary, Alderson (1981, p. 48) argued that communicative testers had failed to show: (1) how traditional language tests (undefined) fail to incorporate these criteria; and (2) how the criteria could be incorporated into communicative language tests. Nevertheless, these criteria (and the associated buzzwords) have left a legacy in language testing research because of the wide acceptance of 'the communicative approach'. This, in turn, has impacted upon language testers, who have developed new ways of looking at language assessment as a result.

4.1. Communicative tests involve performance

Performance: test-takers should actually have to produce language.

Interaction-based: there will be actual "face-to-face oral interaction which involves not only the modification of expression and content but also an amalgam of receptive and productive skills" (Morrow, 1979, p. 149).

Unpredictability: Language use in real-time interaction is unpredictable.

Spolsky (1995) has shown that performance tests, in which learners take part in extended discourse of some form, have been in use for decades, if not centuries. The assumption underlying the performance tests advocated and developed in the 1980s was that the observation of behaviour
that mirrored 'real-world communication' would lead to scores that would indicate whether the learner could perform in the real world. This 'real world' involves interaction, unpredictability, and integration of skills. The test and the criterion (real-world communication) are seen to be essentially the same, which led to the pre-eminent position of sampling and content analysis as the primary approach to test validation in English for Academic Purposes (EAP) testing (Fulcher, 1999).

However, language testers are not usually interested in making a prediction from a test task only to the same task in the 'real world', even though this is in itself an inference in need of validation. From a sample performance, the inference we need to make is usually to a range of potential performances, often as wide as "able to survive in an English medium University". For this, the performance is merely the vehicle for "getting to" underlying abilities, which are hypothesised to enable a range of behaviours relevant to the criterion situation(s) to which the tester wishes to predict. The requirement that we validate inferences drawn from scores, which are abstractions based on sample performances, is now recognised. Research in this area is set to continue well into the new century (Messick, 1994; McNamara, 1996).
[Footnote: Unpredictability is not discussed in this paper. The 'open choice' principle upon which the unpredictability argument rests has been laid to rest, and discourse analysts have demonstrated that much language production is in fact highly predictable (Sinclair, 1987).]

A legacy that is only just beginning to be investigated is the 'amalgam' of skills, or in its modern incarnation, the use of integrative tasks in language tests. Communicative language tests were certainly different in the degree to which integration was achieved, not in the sense of the term used by Oller (1979), but the deliberate thematic linking of test tasks, and the introduction of dependencies such as speaking based on a listening passage, where "a dependency must be accounted for" (Mislevy, 1995, p. 363). Admittedly, the "Communicative Use of English as a Foreign Language" (CUEFL, RSA) went particularly far in this, with no 'pure' scores for skills of language, but for task fulfilment (Morrow, 1977). Hargreaves (1987) reported that the main problems associated with this test (and similar tests) were "standardisation" (usually referred to as equating forms), and scoring. To these problems, Lewkowicz (1998) adds the fact that fewer tasks can be used in tests if they are not to become impractical, and tasks are more difficult and expensive to construct. Furthermore, reliability is associated with test length, and tests with integrated tasks may therefore need to be analysed in new and innovative ways.

These problems currently seem insurmountable, but continued research into the reliable use of integrated tasks in performance tests will be one of the most important positive legacies of the communicative movement for the next few decades.

[Footnote: As in everything, integrated testing as currently being discussed is not new. Carroll (1961) first introduced the term, and Perren (1968) discussed the technical problems of integrated testing and dealing with item and task dependency in the "modern" sense.]

4.2. Communicative tests are authentic

Purpose: the test-taker must be able to recognise communicative purpose and be able to respond appropriately.

Authenticity: input and prompts in the language test should not be simplified for the learner.

Context: language will vary from context to context; some talk will be appropriate in one context and not another. The test-taker must therefore be tested on his/her ability to cope with the context of situation (physical environment, status of participants, the degree of formality, and expressing different attitudes), as well as the linguistic context.

Language for Specific Purpose (LSP), and more specifically EAP, benefitted immediately and directly from the communicative movement. Carroll (1980, 1981, 1982) and Carroll and Hall (1985) took the principles of Munby (1978) and Wilkins (1976) to develop a framework for EAP test design, meeting the requirement of Morrow (1979) that test content should be tailored to learning needs, or purpose of communication. However, LSP testing has been plagued by the seeming impossibility of defining what is 'specific' to a particular communicative setting or purpose. Fulcher (1999) summarises the main findings of the field to date:

1. language knowledge (in its most general sense) accounts for most score variance in EAP tests;
2. grammar modules are better predictors of test total score than subject specific modules; and
3. expert (content) judges cannot say what makes a text specific to their field.

The look and feel of EAP tests as they have evolved to date is likely to remain primarily because they 'look good' and encourage teachers and learners to focus on what is considered to be important in the classroom (Alderson, 1993, p. 216; Clapham, 1996, p. 201) rather than any empirical argument for their usefulness in testing (Clapham, 2000).

Perhaps more important is the role of context in its broadest sense. It has been recognised for some years that contextual variables in the testing situation impact upon discourse produced, and sometimes also upon test scores. This awareness has led language testers to conduct research into these variables, usually under the assumption that score variance which can be attributed to them constitutes systematic construct irrelevant variance. However, the view has been expressed that constructs as traditionally conceived by language testers cannot exist, because there is no such thing as competence, only variable performance and variable capacity (Tarone, 1998). Every change in context results in different performance. Accepting this extreme view from SLA research, Skehan (1987, p. 200) concluded that language testing
was a matter of sampling, hence reducing the generalisability of test scores to the relationship between specific test task and its real-world counterpart. Tarone (p. 83) sees this as increasing validity while reducing reliability and generalisability.

While the problems for language testing are clear (Fulcher, 1995), it is equally true that one of the most important tasks of the coming decade will be to identify the level of generalisability that is permitted in making inferences from scores across variable features of tasks. The most useful way of conceptualising this debate has been provided by Chapelle (1998), in which she distinguishes between (the new) behaviourism, trait theory, and interactionalism. While language testers are generally reluctant to abandon trait theory (and hence the treatment of contextual variance as error), some aspects of context may need to be defined as construct rather than error if they are found to be part of the meaning of test scores.

The question that remains is how far down this road, towards the extreme variationist position, language testing will move in theory. However, in practice, it is likely that the position adopted on the cline in the development of any specific test will be related to test purpose and the needs of score users, with the end of the cline in testing for extremely specific purposes (Douglas, 1998, p. 152, 2000a,b).

[Footnote: Davies (1978, p. 151) wrote: "What remains a convincing argument in favour of linguistic competence tests (both discrete point and integrative) is that grammar is at the core of language learning ... is far more powerful in terms of generalisability than any other language feature. Therefore grammar may still be the most salient feature to test." Language testers like Clapham (2000), as well as second language acquisition researchers, are now returning to the study of grammar for precisely the reasons suggested by Davies. Bernhardt (1999, p. 4), for example, argues that, "... second language reading is principally dependent on grammatical ability in the second language".]

In early communicative texts, 'authenticity' meant the use of material in prompts that had not been written for non-native speakers of English, and a test could not be
communicative unless it was authentic. This was termed a "sterile argument" by Alderson (1981, p. 48). Modern performance tests that attempt to mirror some criterion situation in the external world are no more than role-plays or simulations, in which the learner is asked to 'imagine' that they are actually taking a patient's details in a hospital, giving students a mini-lecture, or engaging in a business negotiation. Language tests by their very nature are not mirrors of real life, but instruments constructed on the basis of a theory of the nature of language, of language use, and of language learning.

Widdowson (1983, p. 30) drew a distinction between the simple notion of 'authenticity' as conceived in early communicative writings, and:

the communicative activity of the language user, to the engagement of interpretative procedures for making sense, even if these procedures are operating on and with textual data which are not authentic in the first [language produced by native speakers for a normal communicative purpose] sense.

In other words, the relationship between the learner and the task, how the learner deals with the task, and what we can learn about the learner as a result of doing the task, is what makes a task communicative.

In the most recent formulation of authenticity, Bachman and Palmer (1996) distinguish between authenticity (correspondence between test task characteristics and the characteristics of a target language use task) and interactiveness (abilities engaged by the task that are comparable to those engaged by the target language use situation). The degree to which the task characteristics match is the degree of task authenticity, and relates directly to construct validity through the ability to generalise from test task to target language use task.

This formulation combines previous definitions, relating the degree of authenticity achieved to the "perceptions" of authenticity by learners in specific situations (Lewkowicz, 2000, p. 49). In conjunction with a method for describing test and learner characteristics, the reformulation may aid research into construct validity in terms of how researchers hypothesise the construct operates over a range of different contexts in an interactionalist framework that expects significant score variation across contexts (Chapelle, 1998, pp. 41–42; Douglas, 1998).

However, Lewkowicz (1997, 2000) questions whether it is possible (or practical) to carry out comparisons of test tasks and target language use tasks using the new taxonomy of the Bachman and Palmer model, and also demonstrates that the perception of authenticity varies widely among test takers. Perhaps Chapelle (2000, p. 161) is right when she suggests that authenticity is a "folk concept", which may just be shorthand for 'all those bits of reality that we can't list but are simplified in models of test method/task facets'.

It is here where models of communicative language ability and test method (Canale and Swain, 1980, 1981; Bachman, 1990; Bachman and Palmer, 1996) are most useful in language testing research, whether the purpose is to isolate error variance or attempt to build task facets into construct definition. Such models may provide a framework for empirical research that will help define the constructs with which language test developers work in the future. However, it should be remembered that models remain abstractions of perceptions of reality (Mislevy, 1995, p. 344), and as such are themselves constructs. Inferences are drawn from test scores to models of reality, and so the models remain in need of validation, or the estimation of the usefulness of the model for the research for which it is being used (Fulcher, 1998b).

4.3. Communicative tests are scored on real-life outcomes

Behaviour-based: the only real criterion of success in a language test is the behavioural outcome, or whether the learner was able to achieve the intended communicative effect.

When students are asked to perform a task, it is essential to state what it is that the raters are measuring. The "behavioural outcomes" of Morrow (1979) are difficult to isolate, and even more difficult to specify (Fulcher, 1999, pp. 224–225). And it is also possible for learners to successfully complete acts of communication with little or no knowledge of a language. In performance tests it is through the rating scale that what is being tested is defined. In fact, more research has been carried out into rating scales than any other aspect of tests of speaking (Fulcher, 1998a), and it is perhaps here where the early British communicative movement has had no lasting legacy for modern testing practice.

Until recently, the template for rating scale design was the original FSI, and rating scales designed for British tests have not fared well from critics (Fulcher, 1987; Wood, 1990). Most rating scale design has been a-theoretical, relying purely on armchair notions of language development and structure. Only recently have empirical methods been employed in the construction and validation of rating scales (Fulcher, 1993, 1996; North, 1996; North and Schneider, 1998; Upshur and Turner, 1995). Research in this area is now growing (Turner, 1998), and as rating scales of more components of models of communicative language ability are designed, operationalised and used in research, language testers and SLA researchers will learn more about language and how it is used across a range of tasks and target language use situations by speakers with different characteristics.

5. Conclusion

Much current research in language testing may have developed despite the British communicative testing movement. The concerns with language use that have generated the questions that are now being addressed already existed in the United States. However, the ideology associated with the early British communicative movement has had a pervasive influence on the ethos of teaching, learning and testing around the world. It would be difficult to market a new large-scale test that did not claim to be 'communicative', whatever the term may mean for different users. The positive legacy of the movement as a whole can be seen in all the research that concerns itself
with more careful definitions of task, of context, and the relationship between test task and the target language use situation. This is especially the case in testing language for specific purposes. Yet, this research is being done with a rigour that was rejected by the British communicative testing movement, involving the use of new statistical tools and the development of sophisticated conceptual paradigms.

We may even see the development of high quality tests conforming to the standards expected in the United States (APA, 1999) with the attractive and creative content of British tests. Whatever specific benefits there may be, the unified concept of validity, a generally accepted theoretical framework for research in educational assessment, and growing interest in the ethics of language testing should lead to a decade of cooperative research that will bring significant advances in language testing theory and practice.

[Footnote: The TOEFL 2000 project may lead to the first language test that achieves the integration of high technical quality, innovative task types, and construct frameworks that guide test design (Jamieson et al., 2000).]

[Footnote: See Messick (1981, 1984, 1989a,b, 1994).]

Acknowledgements

My thanks are due to Bob Hill, whose observations on the vagueness of popular terminology in language testing led me to consider the legacy of 'communicative testing' at the start of a new decade. And to Fred Davidson who provided constructive criticism on the first draft of this paper. Responsibility for any errors, and the views expressed, remains mine.

References

Adams, M.L., Frith, J.R., 1979. Testing Kit: French and Spanish. Department of State, Foreign Service Institute, Washington D.C.
Alderson, J.C., 1981. Reaction to the Morrow Paper (3). In: Alderson, J.C., Hughes, A. (Eds.), Issues in Language Testing. The British Council, London, pp. 45–54.
Alderson, J.C., 1993. The relationship between grammar and reading in an English for academic purposes test battery. In: Douglas, D., Chapelle, C. (Eds.), A New Decade in Language Testing Research. TESOL Publications, Washington DC, pp. 203–219.
Alderson, J.C., Buck, G., 1993. Standards in testing: a study of the practice of UK examination boards in EFL/ESL testing. Language Testing 10 (1), 1–26.
ALTE, 1994. The ALTE Code of Practice. Cambridge Local Examinations Syndicate, Cambridge.
APA (American Educational Research Association, American Psychological Association, National Council on Measurement in Education), 1999. Standards for Educational and Psychological Testing. AERA, Washington DC.
Bachman, L.F., 1990. Fundamental Considerations in Language Testing. Oxford University Press, Oxford.
Bachman, L.F., Davidson, F., 1990. The Cambridge-TOEFL comparability study: an example of the cross-national comparison of language tests. In: de Jong, H.A.L. (Ed.), Standardization in Language Testing. AILA Review, Amsterdam, pp. 24–45.
Bachman, L.F., Palmer, A.S., 1996. Language Testing in Practice. Oxford University Press, Oxford.
Barnwell, D.P., 1996. A History of Foreign Language Testing in the United States: From its Beginnings to the Present Day. Bilingual Press, Arizona.
Bernhardt, E., 1999. If reading is reader-based, can there be a computer adaptive test of reading? In: Chalhoub-Deville, M. (Ed.), Issues in Computer-adaptive Testing of Reading Proficiency. Cambridge University Press, Cambridge, pp. 1–10.
Canale, M., Swain, M., 1980. Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics 1 (1), 1–47.
Canale, M., Swain, M., 1981. A theoretical framework for communicative competence. In: Palmer, A.S., Groot, P.J.M., Trosper, G.A. (Eds.), The Construct Validation of Tests of Communicative Competence. TESOL Publications, Washington DC, pp. 31–35.
Carroll, J.B., 1961. Fundamental considerations in testing for English language proficiency of foreign students. In: Allen, H.B. (Ed.), Teaching English as a Second Language. McGraw Hill, New York, pp. 364–372.
Carroll, J.B., 1967. The foreign language attainments of language majors in the senior year: a survey conducted in U.S. colleges and universities. Foreign Language Annals 1 (2), 131–151.
Carroll, B.J., 1980. Testing Communicative Performance: An Interim Study. Pergamon, Exeter.
Carroll, B.J., 1981. Specifications for an English Language Testing Service. In: Alderson, J.C., Hughes, A. (Eds.), Issues in Language Testing. The British Council, London, pp. 68–110.
Carroll, B.J., 1982. Language testing: is there another way? In: Heaton, J.B. (Ed.), Language Testing. Modern English Publications, London, pp. 1–10.
Carroll, B.J., Hall, P.J., 1985. Ma
keYourOwnLanguageTests:APracticalGuidetoWritingLanguagePerformanceTests.Pergamon,Oxford.Chapelle,C.,1998.Constructde®nitionandvalidityinquiryinSLAresearch.In:Bachman,L.F.,Cohen,A.D.(Eds.),InterfacesBetweenSecondLanguageAcquisitionandLanguageTestingResearch.Cam-bridgeUniversityPress,Cambridge,pp.32±70.Chapelle,C.,2000.Fromreadingtheorytotestingpractice.In:Chalhoub-Deville,M.(Ed.),IssuesinComputer-adaptiveTestingofReadingPro®ciency.StudiesinLanguageTesting,Vol.10.CambridgeUniversityPress,Cambridge,pp.150±166.Clapham,C.,1996.TheDevelopmentofIELTS.AStudyoftheE€ectofBackgroundKnowledgeonReadingComprehension.CambridgeUniversityPress,Cambridge.Davies,A.,1978.Languagetesting.In:LanguageTeachingandLinguisticsAbstracts.Vol.11,3&4,reprintedinKinsella,V.(Ed.)(1982)Surveys1:EightState-of-the-artArticlesonKeyAreasinLan-guageTeaching.CambridgeUniversityPress,Cambridge,pp.127±159.Douglas,D.,1998.Testingmethodsincontext-basedSLresearch.In:Bachman,L.F.,Cohen,A.D.(Eds.),InterfacesBetweenSecondLanguageAcquisitionandLanguageTestingResearch.CambridgeUniversityPress,Cambridge,pp.141±155.Douglas,D.,2000a.AssessingLanguagesforSpeci®cPurposes.CambridgeUniversityPress,Cambridge.Douglas,D.,2000b.Testingforspeci®cpurposes.In:Fulcher,G.,Thrasher,R.VideoFAQs:IntroducingTopicsinLanguageTesting.Availableat:http://www.surrey.ac.uk/ELI/ilta/faqs/main.htmlEdgeworth,F.Y.,1890.Theelementofchanceincompetitiveexaminations.JournaloftheRoyalStatis-ticalSociety49(1),644±663.Fulcher,G.,1987.Testsoforalperformance:theneedfordata-basedcriteria.EnglishLanguageTeachingJournal14(4),287±291.Fulcher,G.,1993.TheconstructionandvalidationofratingscalesfororaltestsinEnglishasaForeignLanguage.UnpublishedPhDthesis,UniversityofLancaster,UK.Fulcher,G.,1995.Variablecompetenceinsecondlanguageacquisition:aproblemforresearchmetho-dology.System23(1),25±33.Fulcher,G.,1996.Doesthickdescriptionleadtosmarttests?Adata-basedapproachtoratingscaleconstruction.LanguageTesting13(2),208±238.Fulcher,G.,1998a.Thetestingofspeakinginasecondlan
guage.In:Clapham,C.,Corson,D.(Eds.),LanguageTestingandAssessment.EncyclopediaofLanguageandEducation.Vol.7.KluwerAca-demicPublishers,Dordrecht,pp.75±86.G.Fulcher/System28(2000)483±497 Fulcher,G.,1998b.Widdowson'smodelofcommunicativecompetenceandthetestingofreading:anexploratorystudy.System26,281±302.Fulcher,G.,1999.AssessmentinEnglishforAcademicPurposes:puttingcontentvalidityinitsplace.AppliedLinguistics20(2),221±236.Fulcher,G.,2000.Computersinlanguagetesting.In:Brett,P.,Motteram,G.(Eds.),ComputersinLan-guageTeaching.IATEFLPublications,Manchester,pp.97±111.Hamp-Lyons,L.,1987.CambridgeFirstCerti®cateinEnglish.In:Alderson,J.C.,Krahnke,K.J.,Stans-®eld,C.W.(Eds.),ReviewsofEnglishLanguagePro®ciencyTests.TESOLPublications,WashingtonDC,pp.18±19.Hargreaves,P.,1987.RoyalSocietyofArts:examinationsinthecommunicativeuseofEnglishasaForeignLanguage.In:Alderson,J.C.,Krahnke,K.J.,Stans®eld,C.W.(Eds.),ReviewsofEnglishLanguagePro®ciencyTests.TESOLPublications,WashingtonDC,pp.32±34.Harrison,A.,1983.Communicativetesting:jamtomorrow?In:Hughes,A.,Porter,D.(Eds.),CurrentDevelopmentsinLanguageTesting.AcademicPress,London,pp.77±85.Hayden,P.M.,1920.Experiencewithoralexaminationsinmodernlanguages.ModernLanguageJournal5,87±92.Jamieson,J.,Jones,S.,Kirsch,I.,Mosthenthal,P.,Taylor,C.,2000.TOEFL2000framework:aworkingpaper.ETS:TOEFLMonographSeries16,PrincetonNJ.Kaulfers,W.,1944.War-timedevelopmentsinmodernlanguageachievementtests.ModernLanguageJournal28,136±150.Lado,R.,1961.LanguageTesting:TheConstructionandUseofForeignLanguageTests.Longman,Lewkowicz,J.A.,1997.Authenticityforwhom?Doesauthenticityreallymatter?In:Huhta,A.,Kohonen,V.,Lurki-Suonio,L.,Luoma,S.(Eds.),CurrentDevelopmentsandAlternativesinLanguageAssess-ment.JyvaskylaUniversity,Finland,pp.165±184.Lewkowicz,J.A.,1998.Integratedtesting.In:Clapham,C.,Corson,D.(Eds.),LanguageTestingandAssessment.EncyclopediaofLanguageandEducation,Dordrecht:KluwerAcademicPublishers,Vol.7,pp.121±130.Lewkowicz,J.A.,2000.Authenticityinlanguagetesting:someoutstanding
questions.LanguageTesting17(1),43±64.Liskin-Gasparro,J.E.,1984.TheACTFLpro®ciencyguidelines:gatewaytotestingandcurriculum.ForeignLanguageAnnals17(5),475±489.Lowe,P.,1976.HandbookonQuestionTypesandtheUseintheLSOralPro®ciencyTests.CIALan-guageSchool,WashingtonDC.Lowe,P.,1983.TheILRoralinterview:origins,applications,pitfalls,andimplications.DieUnterrich-spraxis16,230±244.Lundeberg,O.K.,1929.Recentdevelopmentsinaudition-speechtests.ModernLanguageJournal14,193±202.McNamara,T.,1996.MeasuringSecondLanguagePerformance.Longman,London.Mercier,L.,1933.Divergingtrendsinmodernforeignlanguageteachingandtheirpossiblereconciliation.FrenchReview6(3),370±386.Messick,S.,1981.Evidenceandethicsintheevaluationoftests.EducationalResearcher10(9),9±20.Messick,S.,1984.Assessmentincontext:appraisingstudentperformanceinrelationtoinstructionalquality.EducationalResearcher13(3),3±8.Messick,S.,1989a.Validity.In:Linn,R.L.(Ed.),EducationalMeasurement.Macmillan,NewYork,pp.Messick,S.,1989b.Meaningandvaluesintestvalidation:thescienceandethicsofassessment.Educa-tionalResearcher18(2),5±11.Messick,S.,1994.Theinterplayofevidenceandconsequencesinthevalidationofperformanceassess-ments.EducationalResearcher23(2),13±23.Mislevy,R.J.,1995.Testtheoryandlanguagelearningassessment.LanguageTesting12(3),341±369.Morrow,K.,1977.TechniquesofEvaluationforaNationalSyllabus.RoyalSocietyofArts,G.Fulcher/System28(2000)483±497 
Morrow, K., 1979. Communicative language testing: revolution or evolution? In: Brumfit, C.K., Johnson, K. (Eds.), The Communicative Approach to Language Teaching. Oxford University Press, Oxford, pp. 143-159.
Morrow, K., 1982. Testing spoken language. In: Heaton, J.B. (Ed.), Language Testing. Modern English Publications, London, pp. 56-58.
Morrow, K., 1991. Evaluating communicative tests. In: Anivan, S. (Ed.), Current Developments in Language Testing. RELC, Singapore, pp. 111-118.
Munby, J.L., 1978. Communicative Syllabus Design. Cambridge University Press, Cambridge.
North, B., 1996. The development of a common framework scale of descriptors of language proficiency based on a theory of measurement. Unpublished PhD thesis, Thames Valley University.
North, B., Schneider, G., 1998. Scaling descriptors for language proficiency scales. Language Testing 15 (2), 217-262.
Oller, J., 1979. Language Tests at School. Longman, London.
Perren, G.E., 1968. Testing spoken language: some unsolved problems. In: Davies, A. (Ed.), Language Testing Symposium: A Psycholinguistic Approach. Oxford University Press, Oxford, pp. 107-116.
Sinclair, J.McH, 1987. Collocation: a progress report. In: Steele, R., Treadgold, T. (Eds.), Essays in Honour of Michael Halliday. John Benjamins, Amsterdam, pp. 319-331.
Skehan, P., 1987. Variability and language testing. In: Ellis, R. (Ed.), Second Language Acquisition in Context. Prentice Hall, Hemel Hempstead, pp. 195-206.
Sollenberger, H.E., 1978. Development and current use of the FSI oral interview test. In: Clark, J.L.D. (Ed.), Direct Testing of Speaking Proficiency: Theory and Application. Educational Testing Service, Princeton, NJ, pp. 1-12.
Spolsky, B., 1976. Language Testing: Art or Science? Paper read at the 4th International Congress of Applied Linguistics. Stuttgart, Germany.
Spolsky, B., 1995. Measured Words. Oxford University Press, Oxford.
Tarone, E., 1998. Research on interlanguage variation: implications for language testing. In: Bachman, L.F., Cohen, A.D. (Eds.), Interfaces Between Second Language Acquisition and Language Testing Research. Cambridge University Press, Cambridge, pp. 71-89.
Turner, J., 1998. Assessing Speaking. Annual Review of Applied Linguistics. Cambridge University Press, Cambridge, pp. 192-207.
Underhill, N., 1982. The great reliability validity trade-off: problems in assessing the productive skills. In: Heaton, J.B. (Ed.), Language Testing. Modern English Publications, London, pp. 17-23.
Underhill, N., 1987. Testing Spoken Language. Cambridge University Press, Cambridge.
Upshur, J.A., Turner, C.E., 1995. Constructing rating scales for second language tests. English Language Teaching Journal 49, 3-12.
Widdowson, H.G., 1983. Learning Purpose and Language Use. Oxford University Press, Oxford.
Wilkins, D.A., 1976. Notional Syllabuses. Oxford University Press, Oxford.
Wood, B.D., 1927. New York Experiments with New-Type Modern Language Tests. Macmillan, New
Wood, R., 1990. Assessment and Testing: A Survey of Research. Cambridge University Press, Cambridge.