OE Form 6000: ERIC Document Resume, U.S. Department of Health, Education, and Welfare, Office of Education.

ERIC Accession No.: ED 040307. Publication date: 1970. Author: Richards, James M., Jr. Title: "Assessing Student Performance in College." Institution (source): ERIC Clearinghouse on Higher Education, Washington, D.C. Sponsoring agency: Office of Education (DHEW), Washington, D.C. Report No.: R-2. Descriptors: Higher Education; Research; Measurement Techniques; Evaluation Methods; Tests; Achievement; Performance. Identifiers: Criterion Referenced Tests.

Abstract: This report discusses major areas of research dealing with the evaluation of college student performance. The types of measurement that have been systematically examined are examinations for which academic credit is awarded, criterion-referenced tests, and assessment of extracurricular achievement. The report is divided into an "Overview," which presents the main conclusions and implications for practices in assessment, and a "Technical Review," which contains a more detailed summary of the research. References follow the text.

ASSESSING STUDENT PERFORMANCE IN COLLEGE

James M. Richards, Jr.

Report 2

ERIC Clearinghouse on Higher Education
The George Washington University
1 Dupont Circle, Suite 630
Washington, D.C. 20036

May 1970

FOREWORD

The ERIC Clearinghouse on Higher Education, one of a network of clearinghouses established by the U.S. Office of Education, is concerned with undergraduate, graduate, and professional education. As well as abstracting and indexing significant, current documents in its field, the Clearinghouse prepares its own and commissions outside works on various aspects of higher education.

Because of widespread interest in developing new methods of evaluating the performance of college students, we asked James M. Richards, Jr., to discuss the major areas of research in this area. Dr. Richards, a Principal Research Scientist in the American Institutes for Research, is currently engaged in psychological and educational research on Project TALENT and has taught and/or conducted research at the University of Utah, the Educational Testing Service, the American College Testing Program, and the University of California, Los Angeles. He has published widely, with recent emphasis on such topics as the description of college environments, student growth and development, student achievement, and the conservation of talent.

Carl J. Lange, Director
ERIC Clearinghouse on Higher Education
May 1970

This publication was prepared pursuant to a contract with the Office of Education, U.S. Department of Health, Education and Welfare. Contractors undertaking such projects under Government sponsorship are encouraged to express freely their judgment in professional and technical matters. Points of view or opinions do not, therefore, necessarily represent official Office of Education position or policy.
Perhaps no aspect of college has more potential significance for college graduates than do their grades. Numerous psychological studies have shown that students forget much of the content of the course, and almost everything the professor said in his lectures, within a short time after completing the final exam. Yet, these same students are often asked about their college grade point average twenty years or more after their graduation. Moreover, the grades they received may be the only information about the accomplishments of these students that their college keeps in its permanent records. Grades are treated by students, by colleges, and by society as the most significant assessment of student accomplishment and potential.

In view of the importance of the assessment of students for their lives, one might expect that any improvement in such assessment would be welcome, and that enterprising, responsible professors and researchers would be continually making innovations in assessing student accomplishment and trying to determine whether they were, in fact, real improvements. Such expectations are not borne out by the literature.

This review will attempt to summarize research on the assessment of student accomplishment at a particular point in time, the beginning of the 1970s. Any good review has a point of view. It is important to spell out the point of view of this review, for some aspects were difficult for me to write. My research over the past five years has concerned assessment of student potential and accomplishment, and I have already publicly taken the position that grades and typical multiple choice tests involve only academic achievement (in the pejorative sense of academic) and have little or no relationship to accomplishment in other important areas of human endeavor.

These considerations necessitated making two important decisions: whether to include my own work (I am hardly an objective judge of it) and whether to try to assume a disinterested point of view in spite of my commitment to a particular position. I decided to include my own work when it appeared relevant, and to strive to be objective but not disinterested. In other words, this review definitely assumes that all is not well with current methods of assessing student accomplishment.

The review is divided into an Overview, which attempts to present the main conclusions and implications for practices in assessment, and a more detailed Technical Review of the literature.

OVERVIEW

The overall impression gained from perusing the research literature on the assessment of college student accomplishment is that such research is very sparse. The majority of research continues to concern the prediction of college grades from high school grades and admission tests. Other than these studies, the College Entrance Examination Board (1967a) has introduced examinations with which to earn college credit, and the recent introduction of the notion of "criterion-referenced" tests has potentially revolutionary implications for college examinations and grades (Ebel, 1962; Guttman and Schlesinger, 1967a; Osburn, 1968). Dr. John Holland (1966) and his associates (including the author) have conducted programmatic research on college student accomplishment outside as well as inside the classroom. These three areas, however, pretty much exhaust the systematic research leading to a cumulative body of knowledge about the assessment of student accomplishment.

There are, of course, numerous scattered studies conducted pretty much in isolation. Some of these studies involve good ideas that are not pursued by the investigator beyond his one study. Such studies will be considered in this review only when they relate to other studies of assessment of accomplishment in some rather clear way. As always, there is no dearth of opinions about assessing student accomplishment expressed by college professors in their various professional journals. Since this is a review of research, such expressions of opinion without supporting evidence beyond anecdotes are ignored.

It is particularly disappointing that the public record does not yet contain any systematic research evidence about two important innovations in higher education and related innovations in the assessment of students. These innovations are the widespread adoption of pass-fail grading and the development of new curricula for special cultural groups, notably Afro-Americans and Mexican-Americans. The effect of pass-fail grading is such an obvious and easy area for investigation that, surely, the current lack of research information will not long endure. The future course of research on special cultural curricula is not so certain; there is widespread resistance to investigation by outsiders who very likely are unsympathetic, and insiders appear to regard traditional research and assessment as premature, at best, and destructive of the purposes of their programs, at worst. Their major criterion for success seems to be getting and keeping students in college. There is unquestionably much merit in this position. Nevertheless, higher education is the common enterprise of many kinds of men, and the rest of us can hope that those responsible for assessing student accomplishment in these curricula will soon be free to tell us what does and does not work.
What does not work fortheir students probably does not really work for any students.Credit by examinationThe first of the three areas in which there has been syste-matic investigation involves development of examinations forwhich academic credit is awarded. These testsgrew out of therecognition that people can and do learn college level materialin a variety of ways, and that there should besome way for aperson to obtain college credit for material he has alreadymastered other than having to repeat it ina college course.Therefore, the purpose of these tests is to provide recognitionfor learning obtained from reading, independent study,corres-pondence courses, private instruction, lectures, TVcourses,on -the job training, etc.Accordingly, the College EntranceExamination Board (1967a, 1967c, 1968), in cooperationwithEducational Testing Service, developed Comprehensive CollegeTests Program and its successor the College Level ExaminationProgram or CLEP.The test battery consists of five generalexaminationsin English composition, humanities, mathematics,natural science, and social scienceandan increasing number ofspecific subject examinations for "widely taught undergraduatecourses."The publishers provide standard kinds ofevidenceabout reliability, norms, etc., and this evidence confirmsthat thetests are soundly constructed examples of thebest in conven- tional multiple choice testing.Moreover, it appears that agrowing number of colleges will give credit by these examina-tions.Both U.S.A.F.1. and the Commission on Accreditationof Service Experience have endorsed CLEP, and, as a conse-quence, of 40,000 servicemen testedwith

these examina-tions in 1966, a substantial number received credit.There is no question that the goal of these tests, providingflexibility through credit by examination, is important. Never-theless, it seems possible that multiple choice tests similar tocollege admissions tests are not the proper tool.Beanblossom(1967a, 1967b) has shown that the correlations among thefive general examinations are exceptionally high. When specificsubjects were considered, Beanblossom found that repeatedexposure to courses definitely increased genera/ examinationscores for the natural sciences, moderately increased scoresforthe humanities, and minimally increased scores for the socialstudies.Beanblossom also compared scores on CLEP testsgiven to students who had completed two years of collegewith their scores on a college admission test administered priorto entrance.His overall conclusion was that the CLEP tests donot measure anything different from what is measured by thetraditional battery of pre-college aptitude examinations, and thatthe CLEP general examinations should be used with caution inevaluating liberal arts curricula.Although little research hasbeen done, on rational grounds one would expect the criticismsto be less applicable to the subject examinations.Overall,therefore, the most obvious conclusion might be that the CLEPsubject examinations should be used to grant credit by examina-tion, at least until more suitable measures are available.Criterion- referenced testsThe second area of research on the assessment of studentaccomplishment provides some leads as to what these moresuitable measures might beThis area of research involves whatiscalled "criterion-referenced" tests, as opposed to "norm-referenced" tests.The basic notion of criterion-referencedtests is that the purpose of a test is to determine whether or nota student has mastered a particular skill or subject matterrather than how he stands relative to other students.Thus,the basic criterion for selecting test items is that the responseto the item discriminates students who have mastered thematerial from those who have not mastered it, rather thandiscriminates students with the highest scores on the overalltest from those with the lowest scores.These notions are so simple and straightforward thatitwould be easy to underestimate their importance.Takenseriously, however, they have profound significance for bothmeasurement theory and the practice of assessment.Becauseall of the implications of their use have not been examined,criterion-referenced tests represent a theoretical possibility ratherthan an immediately usable procedure. 
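This difference in item-selection logic can be made concrete. The short sketch below is only an illustration with invented 0/1 response data and an invented "mastered" classification; nothing in it comes from the studies cited here. For each item it computes the difference in percent correct between students judged to have mastered the material and those who have not, alongside the conventional index based on the top and bottom scorers on the total test. The two indices need not rank the items the same way.

# Two ways of judging an item, computed on invented 0/1 response data.
# "mastered" flags which students are considered to have mastered the material
# (however that judgment was reached); totals are ordinary number-right scores.

responses = [            # rows: students, columns: items
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]
mastered = [True, True, True, False, False, False]

n_items = len(responses[0])

def pct(rows, item):
    return 100.0 * sum(r[item] for r in rows) / len(rows)

masters     = [r for r, m in zip(responses, mastered) if m]
non_masters = [r for r, m in zip(responses, mastered) if not m]

# Norm-referenced style index: top half vs. bottom half on total score.
order = sorted(range(len(responses)), key=lambda i: sum(responses[i]), reverse=True)
half = len(responses) // 2
top    = [responses[i] for i in order[:half]]
bottom = [responses[i] for i in order[half:]]

for j in range(n_items):
    mastery_index = pct(masters, j) - pct(non_masters, j)
    high_low_index = pct(top, j) - pct(bottom, j)
    print(f"item {j + 1}: mastery index = {mastery_index:6.1f}   "
          f"high-low index = {high_low_index:6.1f}")

With these invented responses, one item separates masters from non-masters perfectly while showing only modest high-low discrimination, and another does the reverse, which is the kind of divergence the text describes.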
The principles of suchtests arefairly well understood, but much must be learnedbefore they can be routinely constructed.There are a numberof promising first steps in the development of practical criterion-referenced tests (Guttman and Schlesinger, 1966, 1967a; Osburnand Shoemaker, 1968); and similar notions are used in moni-toring performance on some programed instruction materials(Wendt, Rust, and Alexander, 1965).Moreover, investigation2of criterion-referenced tests is a very active area of research.Although few, if any, are available now, such tests for manycollege courses could be available reasonably soon.In experimenting with criterion-referenced tests, more successhas been obtained it mathematics and science than in thehumanities or thesocialsciences, etc.No doubt, because ofdifferences in the subject matter and nature of learning in theseareas, one can more easily pose questions havingonly oneright answer in science and mathematics.Itis possible thatreally good criterion-referenced tests can never be written inthe humanities and social sciences. The technique is so promis-ing, however, that we should not concede this until we havetries: to develop such tests.The significance of criterion-referenced tests for measurementtheory is, primarily, that they repudiate traditional notions ofreliability and of a student's "true" score. If we had a "perfect"criterion-referenced test and a "perfect" course, we would findthat no students got any of the questions right before takingthe course and all students got every question right after takingthe course.In such a case, the internal consistency reliabilityof thetestgiven either before or after the course wouldbe zero.Similarly, the reliability coefficient obtained by cor-relating before and after scores for the same students wouldbe zero.Yet, the test would discriminate perfectly betweenstudents who have and have not mastered the material andtherefore would be an ideal measure for the rigorous awardingof pass-fail grades.Similarly, the specification of a "perfect" criterion-referencedtestfor a particular course would demand development ofrules for writing all possible appropriate items about the contentof that course. In order to do this, the objectives of the coursewould have to be detailed.For multiple choice tests, rules forwriting "distractor" alternatives as well as the correct alternative(Guttman and Schlesinger, 1966) could then be developed andthe rules for writing distractors would lead, in turn, to particularkinds of wrong answers determined by the subject m

atter.Thus, the sorts of errors the individual student made wouldprovide diagnostic information.Once a comprehensive set ofrules was developed for writing items about a particular subjectmatter, they would define a pool of items.Parallel test formswould then be defined in terms of samples of items drawn bythe same procedures from this pool.A person's true score,then, would be measured by the proportion of items in thepool he could answer correctly. In estimating the probability ofhis making a correct response, quite elaborate decizion functionsbased on Bayesianstatistics could be used (Wood, 1970;Ferguson, 1970).The basic significance of criterion -referenced tests for assess-ment is that, in theory, we now have a technique for developingan end-of-course examination that will provide informationabout the specific content mastered by each student withoutreference to the performance of other pupils.In other words,because it would no longer be possible for a student taking thesame examination to receive grades ranging from A to Fdepending on how bright the other students in his class were,competition for grades would be eliminated. This advantage ofcriterion-referencedtestsis not minor, for current gradingpractices almost universally treat courses as "races" in which the winners snatch the As, the runner-ups win theBs, and thealso-rans receive Cs or worse (Palmer, 1962). To treat coursesas competitive races seems quitedestructive of the values andgoals of higher education.In spite of these advantages, many people, and perhapsespecially those inthe humanities, may react negatively tocriterion-referenced tests, believing that the use of specific rulesin writing examinations is mechanistic and anti-human.Sucha negative reaction is likely to be exacerbatedif it is realizedthat at least some criterion-referenced tests can be written bya computer (Osburn and Shoemaker,1968).It would beunfortunate if these and other misgivings (Ebel, 1970) shouldlead to a rejection of criterion-referenced tests without a fullconsideration of the issues, for such tests do promise to be amajor improvement over current ways of assessing studentaccomplishment in the classroom.Of course, instructors arenot really required to be mechanistic to write such tests.Rather, they are required to be explicit about the purposes oftheir coursesa requirement that should be damaging tofew courses.Moreover, if an instructor thought he could notspecify any skills or knowledge that students should have as aconsequence of taking his course, it is difficult to see how hecould justify assigning grades on any basis.Extracurricular achievementThe final area of research to be discussed is the assessmentof accomplishment outside the classroom.Although thereare many studies, the major sustained program of research onassessment of nonacademic accomplishment has been that con-ducted by Dr. 
John Holland (1966) and his associates, first atthe National Merit Scholarship Corporation, later at the Ameri-can College Testing Program, and now at Johns HopkinsUniversity.These investigations grew out of an initial interestin the whole area of originality, creativity, or creative per-formance.As a first step, creative performance was defined as"a performance which is awarded public recognition throughawards, prizes or publication, and which may therefore beassumed to have exceptional cultural value."Using this defi-nition as a guide, a self-report checklist of achievements at thehigh school level was developed by reviewing the secondaryschool achievements of National Merit Finalists.The checklistwas divided into 'Creative Science" and "Creative Arts," andcontained items such as:Won a literary award or prize for creative writing.Won a prize or award in an art competition (sculpture,ceramics, painting, etc.).Received the highest rating in a state music contest.Had a scientific paper published in a science journal.Through a series of studies, the investigators moved fromthe initial measures of scientific and artistic accomplishment tosix criteriascience, leadership, art, music, writing, and dramaticartsfor assessing notable extracurricular accomplishment atboth the high school and college levels.More recently, scaleswere developed to assess accomplishment in such additionalareasas:socialparticipation (i.e.,activism), social service,business, humanistic-cultural, religious service, social science,and interperscinal competency.A control scale measuringrecognition of academic accomplishment .was also developed.Although these scales are highly skewed (the modal numberof accomplishments is zero), they have moderate reliability.The evidence for their validity rests primarily on two bases.First, their content represents outstanding achievement tothejudges and experts who either contributed or approvedtheitems in the scales.Second, the validity of the scales dependson the honesty with which students reporttheir accomplish-ments, and there is considerable evidence from themeaningfulpatterns of results (Holland and Richvds, 1967b) thatstudents,for the most part, have been making rational discriminationsamong accomplishments and appropriate responses.Othertechniques tha

t provide some additional control for studenthonesty (Skager, Schultz, and Klein, 1965) have been developedfor obtaining information about student accomplishment out-side the, classroom.To summarize, the college achievement scales appear to bereliable and valid.They provide a brief set of personallyrelevant measures which can serveas fairly comprehensivecriteria of college success.Coupled with grades and tests, theycan be used in studying such problems as: the effects ofvariouskinds of colleges upon a variety of student outcomes, theconservation of talent, and the relationship between collegeand adult achievement.These scales represent only a sampleof student accomplishments, however, and itis quite likelythat important areas of achievement are ignored.But even ifthisis the case, they can be used as guides in developingsimilar scales to increase our ability to assess student attainments.These scales have been used in several longitudinal studies ofclassroom and nonclassroom accomplishment in high schooland college.In general, the results indicate that nonacademicaccomplishment can be assessed with moderate reliability, thatboth academic and nonacademic accomplishment can be pre-dicted to a useful degree, and that nonacademic accomplishmentis largely independent of academic aptitude and achievement.Similarly, selecting college students on measures of academicaptitude and achievement yields a student body that achievesin the classroom, while selecting college students on measuresof nonacademic achievement yields a student body that doesimportant things outside the classroom (Baird and Richards,1968).Some of the practical implications of these results seemclear.The emphasis in colleges on academic aptitude andachievement leads to neglect of other equally important talents.There should be continuing efforts, therefore, to develop andimprove measures of originality and of many kinds of achieve-ment.Further, such measures should be considered importantin their own right, and not just as supplements to grades andtests.The results also indicate a need for a broader definitionof the nature of human talent and of higher education. 
Thereare many kinds of human talent, and each would be likely tobenefit from some type of higher education.In other words,the results indicate a need for a highly diversified collegesystem in which institutions would be selective only in specific,and different, areas.3TECHNICAL REVIEWThe purpose of this section is to provide a more detailedsummary of the research underlying the material presented intheOverview.Because theOverviewstressed interpretation,this section will emphasize factual presentation with only the additional interpretation that is necessary to maintain continuity.The three main areas of research described in the Overview willbe reviewed separately.CLEPThe CollegeEntrance Examination Board developed theCollege-Level Examination Program (CLEP) to enable individualswho have acquired their education in nontraditional ways todemonstrate their academic achievement.In its manuals forCLEP, the College Entrance Examination Board (1967a, 1967b;1967c, 1968) presents a detailed description of the rationale,history, contents, and psychometric properties of these tests.Most colleges expect their graduates to be familiar with, andknowledgeable about, ideas and methods from several broadareas of intellectual inquiry.Similarly, the college graduate isexpected to be able to express himself competently and clearly,and to be able to practice and understand the conventions ofgood English.Accordingly, the general examinations of theCollege-Level Examination Program consist of a battery of fivetestsEnglish composition, humanities, mathematics, naturalsciences, and socialsciences-history.The examinations aredesigned to be appropriate for assessing the kinds of intellectualskills students can be expected to have acquired by the end oftwo years in college.The manual describing the general exam-inations (College Entrance Examination Board, 1968) summarizestheir comprehensive nature as follows:1. The examinations are not based on a particular cur-riculum or course of study.2. The examinations sample widely the content of themajor disciplines with which each is concerned.3. The factual materials with which the examinations dealcan be found in many different courses in colleges anduniversities.4. The [general] examinations do not attempt to measurethe outcomes of specialized courses that students mightpursue when majoring in a particular field.5. The examinations stressunderstanding, not merelyretention, of facts, the ability to perceive relationships,and the grasp of basic principles and concepts.6. The examinations are constructed in such a way thatan individual does not need to be able to answer allthe questions on them to demonstrate competence.7. The examination questions cover a range of difficulty,both in the depth understanding required and the skillsand abilities measured.In addition to the general examinations, CLEP also offerssubject examinations designed to measure achievement in specificcollege courses.At the time the Score Interpretation Guidewas published (College Entra

nce Examination Board, 1967c),subject examinations were availablein13 fields: Americangovernment, analysis and interpretation of literature, Englishcomposition, general chemistry, general psychology, geology,introductorycalculus, introductory economics, introductorysociology, money and banking, statistics, tests and measure-ments, and Western civilization.Recently (College EntranceExamination Guide, 1969), seven new subject examinationsweredeveloped:college algebra, college algebra-trigonometry, com-puters and data processing, educational psychology, history ofAmerican education, introductory marketing, and trigonometry.4The CLEP examinations are typical products of the CollegeEntrance Examination Board and Educational Testing Service(ETS).The basic preparation of the testsis done for theCollege Board by test development specialists at ETS in coopera-tion with committees of examiners.These committees consistof outstanding teachers who are faculty members at colleges,universities, or two-year colleges.Their job is to specify theskills and content to be measured, assist with the preparationand tryout of items, and review and approve the final forms ofthe tests before they are made available.For the generalexaminations, scoresarereported on the standard CollegeBoard scale from 200 to 800 with the intention that an appro-priate norm group will have a mean of approximately 500 anda standard deviation of approximately 100.For the subjectexaminations, scores are reported on a scale from 20 to 80with the intention that an appropriate norm group will have amean of approximately 50 and a standard deviation of approx-imately 10.The publisher's manuals present norms on the general exam-inations for college freshmen, sophomores, and seniors. Normsfor the subject examinations are based on groups of studentsnear the end of a course believed to be appropriate for theexamination.Samples were obtained from diverse collegescoast to coast, and, in the case of sophomore norms, a repre-sentative sample was obtained of sophomores in two-year andfour-year American colleges.When both sexes are combined.the means and standard deviations for sophomores are veryclose to their intended values.In general, means increase fromthe freshman year to the sophomore year to the senior year,but there are exceptions.Most notably, mathematics decreasesslightly from the freshman to the sophomore to the senioryear.Re liabilities (K-R 20) are generally satisfactory, ranging from.91 to .95 with a median of .92 for the general examinationsand from .76 to .92 with a median of approximately .87 forthe subject examinations.Validity data are minimal. For thegeneral examinations, means are shown for students intendingto major in various fields and for students who have had varyingnumbers of courses in the area covered by the examination.Both of these comparisons generally support the constructvalidity of the tests, but neither the magnitude nor the con-sistency of the differences is overwhelming.For the subjectexaminations, validitydata involve the correlation betweenscores on the exam and grades in the relevant course. 
Correla-tions with final course grade ranged from .37 to .66 with amedian of approximately .52.Several important sorts of evidence are conspicuous by theirabsence.No test intercorrelations are presented in support ofdiscriminant validity.No correlations are presented betweenthe CLEP exams and the SAT, although it is important to showthat these tests are not merely duplicating information obtainedfrom the SAT.Finally, no longitudinal data are presenteddemonstrating growth as a function of exposure to relevantcourses. Overall, therefore, it would be appropriate to concludefrom the information presented in the manuals thatthesetests definitely discriminate reliably between bright and not-so-bright students, but that it is an open question whether thetests make valid discriminations for their intended purposes. In addition to the basic data presented inthe manuals, asubstantial research literature about these tests is beginning toaccumulate(Beanblossom,1969a,1969b;Burnette,1970;French,1969; Goolsby,1966; Harris,'968, 1970; Heath,1967; Hodgson, 1970; Sharon, 1970; von Kolnitz,1969). Someof these studies merely report the experiences of aparticularcollege in using CLEP.For example, Heath (1967) describesexperiments at San Jose State and von Kolnitz(1969) of theUniversity of South Carolina.Burnette (1970) has presented a detailed account of hisexperiences at Florida Southern College. His work grew out ofa concern with the problems of hiscollege in evaluating bothtranscripts of students transferring from two-year colleges andthe military service experience of returning servicemen.Anobvious answer to this problem is administration of a nationallystandardized test, and accordingly Burnette turned to CLEP.It was not easy to persuade the faculty at Florida Southern togrant credit by examination, however, and most of Burnette'sreport concerns how he went about overcoming resistance tothis innovation.His report, therefore, is a most interestingcase history, and one that could be very useful to facultymembers, administrators, students, o

r others trying to introduce innovations in the assessment of student accomplishment. The evidence he used to persuade the faculty is similar to the evidence presented in the manual, and shows that students who received high grades in courses at Florida Southern also tended to get high scores on the CLEP exams. Burnette also presents evidence showing a fairly strong tendency for students with high SAT scores to get high CLEP scores. Such a high correlation could be interpreted either as evidence of the validity of the CLEP examinations or as evidence of the lack of independence of the CLEP tests. Burnette makes the former interpretation.

Perhaps the most extensive, sustained research on the CLEP tests was done at the University of Washington (Beanblossom, 1969a, 1969b; Hodgson, 1970). These studies are especially valuable because virtually all students at the University of Washington undergo the Washington Pre-College Testing Program before entering college. This makes it possible to compare CLEP scores with scores on a college admissions test developed according to a different rationale. Beanblossom (1969a, 1969b) has published two reports of a study in which the CLEP general examinations were administered in the fall of 1968 to 333 students who had entered the University of Washington as freshmen in the fall of 1966, and who had completed 80-100 credits by the spring of 1968. All but two of these students had also taken the Washington Pre-College Tests. The CLEP tests were administered in order to measure proficiency in lower division studies, particularly in the natural sciences, social sciences, and humanities. Very high correlations were obtained between scores in the different areas of the CLEP general examinations. Such correlations are evidence against the discriminant validity of these tests. Beanblossom also found that students who had taken relatively many courses in the natural sciences definitely obtained higher scores on the CLEP tests. However, repeated exposure to courses increased CLEP test scores only slightly for the humanities and hardly at all for the social studies. Finally, Beanblossom found that GPAs are only mildly correlated with scores on the CLEP tests.

In his second study, Beanblossom used data from these same students to explore the extent to which the CLEP tests measure something different from what is measured by college admissions tests. Specifically, three CLEP general examination scores (in the areas of social science-history, natural science, and humanities) and 11 scores from the Washington Pre-College Tests were intercorrelated and factor analyzed to determine whether the CLEP scores increased the factorial complexity of the battery. In general, the results indicated that the CLEP general examinations administered to students who have completed two years of college do not measure anything different from what is measured by traditional college admissions tests administered during high school.
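One way to make Beanblossom's question concrete, without reproducing his factor analysis, is to ask how much of the variance in a CLEP-style score can already be accounted for by an existing admissions battery. The sketch below uses invented data and ordinary least squares; it is a simplified stand-in for the factor-analytic approach actually used in the Washington studies, and every variable name and number in it is an assumption made for illustration. A large R-squared corresponds to the finding that the new test adds little that is unique.

import numpy as np

# Invented data: rows are students, columns are admissions-test subscores.
rng = np.random.default_rng(0)
n = 200
admissions = rng.normal(size=(n, 4))                       # e.g., verbal, quantitative, etc.
# A "CLEP-like" score built mostly from the same abilities, plus a little noise.
clep = admissions @ np.array([0.6, 0.5, 0.3, 0.2]) + 0.3 * rng.normal(size=n)

# Least-squares regression of the CLEP-like score on the admissions battery.
X = np.column_stack([np.ones(n), admissions])
beta, *_ = np.linalg.lstsq(X, clep, rcond=None)
predicted = X @ beta
r_squared = 1 - np.sum((clep - predicted) ** 2) / np.sum((clep - clep.mean()) ** 2)

print(f"Variance shared with the admissions battery (R^2): {r_squared:.2f}")
print(f"Variance not accounted for by the battery:         {1 - r_squared:.2f}")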
It must be recognized that thefactor analysis procedures used in this study do not emphasizespecific variance, and that the CLEP tests are almost certainlyadding some unique variance.Nevertheless, this study makes itclear that the absolute amount of unique variance must besmall.Beanblossom, therefore, seems justified in his conclusionthat these tests should be used with caution in evaluatingliberal arts curricula.Hodgson (1970) reports similar results,indicating that the number of credits earned in related courseshad low to moderate correlations with CLEP scores, and thatCLEP scores in the second year of college can be successfullypredictedsubstantially more so than is typical for predictingGPAsfrom scores on college admissions tests.These resultsalso indicate that little in the CLEP is unique.In general, these studies have provoked skepticism amongthe Washington investigators about the validity and value of theCLEP tests in attaining their intended purposes. Sharon (1970),of the Educational Testing Service, reached contrary conclusionswhen he summarized a series of studies involving samples ofcollege students and members of the armed forces.Thesestudies involved a description of the relationships betweenCLEP scores and age, major field, amount of college education,and number of courses in related fields.Sharon interprets thefindings as indicating that the..LEP general examinations arevalid for assessing achievement in general academic fields.Itis clear that the results do, in fact, generally conform toexpectation and, in that sense, support the construct validityof the tests.However, this evidence has little relevance to-theissues raised by the Washington investigators, and does notreally answer criticisms regarding the usefulness of the testsin fulfilling their purposes.'Perhaps the most encouraging evidence for the utility of theCLEP general examinations is presented by Harris (1970) of theUniversity of Georgia.Harris conducted a longitudinal study inwhich the CLEP general examinations were administered tostudents in their first quarter and again in their sixth quarter incollege.Simple gain scores (X2 X1) were computed and aver-aged.For the five tests, average gains ranged from 31 to 60score points, with a median of

approximately 49 score points. In other words, students scored, on the average, about half a standard deviation higher in their sixth quarter than they scored in their first quarter. Harris also relates gains to grades in relevant courses. Specifically, average gain scores are given for students who received grades of B or better, C or C+, and below C. In general, average gain increases as grades improve.

These results do indicate that the CLEP tests, to some degree, measure educational growth as well as aptitude. This evidence would have been more persuasive if more sophisticated gain scores had been used and if the relationship between grades and gains had been presented in terms of correlations. Nevertheless, Harris' study is a valuable first step in providing the kinds of evidence necessary to justify the use of CLEP general examinations in awarding credit.

In another part of his study, Harris explored the relationship of scores on the CLEP tests to scores on the SAT obtained prior to college entrance. These results are consistent with the results obtained by the University of Washington investigators in that the correlations are substantially higher than those typically obtained in studies of grade prediction. Thus, it appears that the characteristics measured by the CLEP general examinations overlap the characteristics measured by college aptitude tests to an undesirable degree.

It should be emphasized that the studies summarized here, and the rather negative conclusions derived from them, pertain almost entirely to the CLEP general examinations. Little systematic work has been done on the subject examinations, but one would expect them to be much more unique and dissimilar from aptitude tests than the general examinations. Because the general examinations were planned to be independent of specific courses and to measure "understanding," it was virtually impossible to construct a measure that was not just another aptitude test. The subject matter examinations, on the other hand, are designed to measure familiarity with factual material covered in courses. Such tests should measure other characteristics than those measured by college admissions tests. It is important that systematic research be carried out on the subject examinations to determine how well they serve their intended function. In the meantime, a reasonable policy might be to grant credit for satisfactory scores on the subject examinations only.
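Harris's gain-score comparison described above is simple enough to sketch directly. The numbers below are invented rather than Harris's data; the CLEP general examinations are scaled with an intended standard deviation of about 100, so an average simple gain of roughly 50 points corresponds to the "about half a standard deviation" summary.

# Simple gain scores (X2 - X1) in the spirit of Harris's comparison.
# Scores are invented; CLEP general examinations are scaled to SD of about 100.

first_quarter = [470, 505, 512, 498, 530, 460, 488, 515]
sixth_quarter = [521, 548, 560, 559, 575, 509, 541, 570]

gains = [x2 - x1 for x1, x2 in zip(first_quarter, sixth_quarter)]
average_gain = sum(gains) / len(gains)

print("Individual gains:", gains)
print("Average gain in score points:", round(average_gain, 1))
print("Average gain in (intended) SD units:", round(average_gain / 100, 2))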
Criterion-referenced tests

Many of the ideas involved in criterion-referenced, or domain-referenced, tests have been available in the published literature for a number of years (Cronbach, 1963; Ebel, 1962; Flanagan, David, Dailey, Shaycoft, Orr, Goldberg, and Neyman, 1964; Lord, 1955; Rajaratnam, Cronbach, and Gleser, 1965). Indeed, one could argue that the ideas have been implicit in psychometric theory from the beginning. Nevertheless, the desirable properties for criterion-referenced tests, the implied procedures for building such tests, and the inferences to be drawn from scores on them are sufficiently different from current testing practice to make it plausible to talk about a revolution in testing.

Although there is some ambiguity, "criterion" in this context is usually used in the sense of a standard of performance rather than an external variable to be predicted from the test. Accordingly, the basic theoretical concept of criterion-referenced achievement testing is that it aims to measure the student's knowledge of a well defined "universe" of subject matter content. A "universe" might be defined as the entire subject matter with which a particular college course deals. A criterion-referenced examination would, then, use a sample of items from this subject matter to determine whether a student has learned the subject matter for the course. The important difference between such a criterion-referenced examination and most current ("norm-referenced") examinations is that performance on criterion-referenced tests is compared to an external standard, not with other students' performances. Thus, a properly constructed criterion-referenced test neither explicitly nor implicitly grades on the curve.

In order to construct a criterion-referenced examination, the instructor must define the objectives of his course in the form of a set of specific tasks that the student should be able to do as a consequence of taking the course. Ordinarily, an individual course will involve a large number of specific tasks. Examples of such specific tasks might be solving systems of 5 linear equations in 5 unknowns, identifying the Greek gods and goddesses alluded to in the works of a particular poet, or rescoring a piano composition for an orchestra. The next step is to determine a way to list all possible questions relevant to each task, setting limits inherent in the subject matter or leading to a manageable number of possible questions. An example of a limit inherent in the subject matter would be confining the list of all allusions to Greek gods and goddesses to the extant works of a particular poet. To keep the number of questions about systems of linear equations manageable, the instructor might limit the known values of the terms to numbers between 0 and 99.
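The idea of listing all possible questions for a task by rule can be illustrated with a small generator. The sketch below is purely illustrative; the report does not describe any particular generator, and the ranges and helper names are assumptions. It produces short-answer items asking for the solution of a system of two linear equations in two unknowns, with every known value kept within 0 to 99 and the keyed answer constructed along with the item.

import random

# Illustrative "item form": generate short-answer items that ask the student to
# solve a system of two linear equations in two unknowns.  Coefficients are 1-5
# and intended solutions 0-9, so every known value (including the right-hand
# sides) stays within the 0-99 limit mentioned above.  A sketch only.

def generate_item(rng):
    x, y = rng.randint(0, 9), rng.randint(0, 9)          # intended solution
    a, b = rng.randint(1, 5), rng.randint(1, 5)          # coefficients, first equation
    c, d = rng.randint(1, 5), rng.randint(1, 5)          # coefficients, second equation
    while a * d - b * c == 0:                            # reject singular systems
        c, d = rng.randint(1, 5), rng.randint(1, 5)
    e, f = a * x + b * y, c * x + d * y                  # right-hand sides (at most 90)
    stem = f"Solve for x and y: {a}x + {b}y = {e} and {c}x + {d}y = {f}"
    return stem, (x, y)

rng = random.Random(7)
for _ in range(3):
    stem, key = generate_item(rng)
    print(stem, "   key:", key)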

The purpose of a criterion-referenced examination then becomes to determine what proportion of the given questions the student can answer correctly. Success in the course might be defined as the ability to answer, say, 90% or more of the questions correctly. One is no longer interested in whether the student can answer more questions correctly than some other student who, for fortuitous reasons, took the class at the same time he did. Ideally, students would be entirely ignorant of the subject matter before taking the course. (If they already know the material, why take the course?) Again, ideally, all students taking the course would master the material and would be able to answer all questions correctly; otherwise the professor has failed.

At least in some mathematical and scientific fields, it appears to be relatively easy to write appropriate short answer questions for criterion-referenced tests (Osburn and Shoemaker, 1968). It is much harder to write satisfactory multiple choice questions because it is difficult to determine what would constitute an appropriate wrong, or distractor, alternative. Some small studies (Richards, 1967) have used reasonable, but essentially arbitrary, procedures for choosing distractor alternatives. Recently, a theoretical basis for more systematic choice of distractors has appeared (Guttman and Schlesinger, 1967). Under this procedure, properly constructed incorrect alternatives yield diagnostic information about what the student misunderstands or has failed to learn.

It should be noted at this point that criterion-referenced tests and norm-referenced tests are not really mutually exclusive (Ebel, 1970). In setting the tasks for his course, the professor will always be tempted to set standards that only a brilliant person with highly specialized training could meet. Thus, he may find that few, if any, sophomores can succeed on his criterion-referenced test at the end of the course.
This "norm-referenced" finding should suggest to him that his standards are unreasonable, not that all sophomores are incompetent.

In determining the proportion of the universe of questions the student can answer correctly, only a sample of those questions will be administered to any individual student. It would be better to use rigorous sampling procedures rather than informal ones, and it appears that stratified sampling of items yields better results than random sampling (Osburn, 1968). Also, no two students would need to take the same items nor even the same number of items. Instead, each student could respond to systematically sampled questions from the universe of content until, on the basis of statistical decision theory, one can tell whether he has mastered that content (Wood, 1970). When a number of tasks are considered, such a testing procedure is likely to yield better data more efficiently than conventional tests (Ferguson, 1970).

Obviously, this ideal case will only be approximated in practice. Nevertheless, certain important implications for test construction emerge from a consideration of the ideal. The proper index for selecting items is the difference between the percentage of students who answer the item correctly before and after taking the course, rather than the difference between the percentage of students with high and low total test scores who answer the question correctly. These two indices are likely to be only moderately correlated (Cox and Vargas, 1966). If the total test discriminates well between students who have and have not taken the course, it may be evidence that it is a good test, if the internal consistency coefficients and the intercorrelation for before and after course administrations of the test are low. Validity in the usual sense has little meaning. If a properly constructed criterion-referenced test fails to correlate with external performance, it means that mastery of the subject matter is not relevant to the performance, not that the test is "invalid."

A fairly extensive body of empirical work on criterion-referenced tests is beginning to emerge. The most extensive use, no doubt, of criterion-referenced tests and items is in programmed instruction. Here, one or more performance frames are inserted at a number of points in the program. The learner is required to perform the task correctly before continuing the program. If he does not perform correctly, the program branches to remedial frames, and, when these frames are completed, readministers the test frames to see if the learner is ready to continue the regular program. For an example of this use of tests, see Wendt, Rust, and Alexander (1965).

Much work is also being done on more conventional tests. For a number of years, Osburn (1967, 1968; Osburn and Shoemaker, 1968; Shoemaker and Osburn, 1968) has been working with criterion-referenced tests for elementary statistics. In addition to presenting detailed discussions of the theory of such tests, Osburn has developed a set of rules for writing short answer statistical items, and has pushed the procedure to its logical conclusion by developing computer procedures for writing such items. (One's response to this achievement should be admiration, not dismay.)

In the first stages of this work, the computer generated randomly selected items. Two university-level elementary statistics classes received a series of examinations composed of both computer-generated and instructor-selected items. While instructor-selected items had greater reliability, the coefficients for computer-generated items were acceptable. The students rated the computer-generated and instructor-selected items as comparable with respect to difficulty and fairness on a post-examination questionnaire.

Theoretically, stratified sampling of items yields better results than random sampling, and the most extreme case of stratified sampling is item matching. These theoretical expectations were verified by Shoemaker and Osburn in their later work (1968). Matched items yielded greater reliability than randomly selected ones, and stratification on item difficulty proved to be a very important factor for unmatched items.

Hills (1970) also worked with a statistics course, specifically a graduate course in measurement. In addition to preparing a criterion-referenced test, Hills, on the first day of class, gave his students a list of tasks they were expected to master by the end of the course. They were expected, for example, to be able to derive the Spearman-Brown formula. Not only did the students display more mastery of the subject matter than did the preceding year's conventionally taught class, but they also appeared to be better motivated and to work harder.

The most extensive work, perhaps, on multiple choice criterion-referenced tests has been carried out by Guttman and Schlesinger (1966, 1967a, 1967b) using what they call "facet design." A facet is a characteristic on which item alternatives can differ. Thus, an item of a test using geometrical figures might have three facets: shape, size, and orientation. Consideration of these three facets leads to systematic choice of distractor alternatives. For example, take all combinations of two sizes, two shapes, and two orientations and let one particular combination be the correct answer. The possible distractors then are the seven other combinations. Three of these distractors differ from the correct answer on one of the three facets; three distractors differ from the correct answer on two of the three facets; and one distractor differs on all three facets. This systematic design of distractors makes it possible to assign a score for each type of error.
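The facet logic just described can be written out directly. In the sketch below, which is an illustration rather than Guttman and Schlesinger's actual materials, the three facets are shape, size, and orientation, each with two values; one combination is designated the keyed answer, the remaining seven become distractors, and each distractor is scored by the number of facets on which it differs from the key.

from itertools import product

# Three facets, two values each, as in the geometrical-figure example above.
facets = {
    "shape":       ["circle", "square"],
    "size":        ["large", "small"],
    "orientation": ["upright", "rotated"],
}

# Every combination of facet values; one of the eight is the keyed answer.
alternatives = list(product(*facets.values()))
key = ("circle", "large", "upright")

def facet_distance(alt, key):
    """Number of facets on which an alternative differs from the keyed answer."""
    return sum(1 for a, k in zip(alt, key) if a != k)

for alt in alternatives:
    d = facet_distance(alt, key)
    role = "correct answer" if d == 0 else f"distractor (differs on {d} facet(s))"
    print(alt, "->", role)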
A student's profile of errors, therefore, will tell not only how much he has achieved in a given area but also what typical kinds of errors he makes. This detailed diagnosis of his errors makes it possible to prescribe an appropriate treatment. Guttman and Schlesinger (1967) have shown that pupils who make certain kinds of errors on one item tend to make the same kind of error on other items.

Another consequence of facet design is that items test the identification of elements belonging to an ordered set. Therefore, the suitability of an item for a given test or subtest is decided upon definitional grounds, instead of by statistical item analysis. Analysis of inter-item correlations is employed only to test an empirical hypothesis about the relationship of the statistical structure to the faceted design.

Guttman and Schlesinger have applied facet analysis to a series of verbal, pictorial, and quantitative tests. In general, intensive analysis of distractors in terms of facets yielded satisfactory results only for quantitative and pictorial material. This finding provides additional evidence that it will be difficult to design criterion-referenced tests, in general, or facet-designed tests, particularly for verbal fields. This is especially true for those fields in which it is hard to set limits on the subject matter. Moreover, it is not always clear that the facet design adequately summarizes the process of responding to the item. Consider Guttman's sample item:

A storekeeper has 475 lbs. of sugar in a bin and sells 48%. How many lbs. did he sell?
1. 475
2. 218
3. 989
4. 228
5. Other

According to the facet design, alternative 3 is an error resulting from use of the wrong formula. It seems obvious, however, that choosing alternative 3 involves not only use of a wrong formula but also gross insensitivity to absurdity.

Another large scale application of the basic ideas of criterion-referenced testing is the Minnesota Minnemast Project using domain-referenced tests. In this project, the tasks to be mastered are defined in terms of "behavioral objectives." In a recent symposium, this team of researchers (Rabehl, 1970; Patterson, 1970; Nitko, 1970; Johnson, 1970; Senison, 1970) summarized their work as follows.

Behavioral objectives must always be operationally defined by sets, or domains, of test items. (A test item is defined as any replicable set of stimulus conditions to which a student may respond, together with a set of specifications for recording his responses.) A useful way to define a domain of items is to draw up rules indicating the dimensions and values over which stimulus conditions and response properties may range. The rules for generating the items constituting domains might be called "item forms." Exact definition of a domain of items makes possible the precise statistical estimation of each student's performance. Such precise knowledge provides a sound basis for adapting instruction to the student's status and needs.

Finally, clear identification of the rules used in generating the items which constitute a domain provides a basis for theoretical prediction outside that domain.

In addition to these rather systematic research programs, a number of individual researchers have reported work on criterion-referenced tests. Popham (1970) discusses the difficulty of, and his struggles with, obtaining adequate item selection indices for criterion-referenced tests for college courses. Such difficulties could be avoided, of course, by using the item construction procedures of Osburn, Guttman and Schlesinger, and the Minnemast investigators. Using these procedures, no item selection is warranted. Crawford (1970) discusses his use of such tests in the area of health, a domain in which it seems clear that we definitely wish to establish a minimum level of performance which all practitioners must exceed. Crawford discusses employing criterion-referenced measurement for simulated clinical situations as well as for multiple choice tests.

In summary, criterion-referenced tests offer a number of theoretical advantages in the assessment of student accomplishment in college. The primary advantage, probably, is that the assessment of a particular student's accomplishment would depend only on his own performance, and not on that of other students who happen to be in his college at the same time. Because of this feature, such tests might be more acceptable than norm-referenced tests to disadvantaged minority students. However, criterion-referenced tests are still in the exploratory experimental stage, and no thoroughly evaluated tests are available for widespread use in college. Therefore, criterion-referenced tests offer promise for the future but little practical help in solving present problems. It also should be noted that some scepticism about the value of criterion-referenced tests (DeCecco, 1970; Ebel, 1970; Mattson, 1970) remains.

Creativity

Several years ago, as part of their search for talented high school students, the National Merit Scholarship Corporation research staff became interested in the whole area of originality, creativity, or creative performance (Holland, 1966). They were immediately confronted with the problems of how to distinguish an original from an unoriginal person, how to define creative behavior, and whether creative behavior can be predicted.

As a first step, Holland (1961) defined creative performance as "a performance which is accorded public recognition through awards, prizes, or publication and which may therefore be assumed to have exceptional cultural value." Under this rubric, a self-report checklist of achievements at the high school level was derived by reviewing the accomplishments reported by National Merit Finalists. Some typical items from this checklist were:

Won a prize or award in a scientific talent search.
Invented a patentable device.
Had a scientific paper published in a science journal.
Won one or more speech contests.
Had poems, stories, or articles published in a public newspaper or magazine or in a state or national high school anthology.
Won a prize or award in an art competition (sculpture, ceramics, painting, etc.).
Received the highest rating in a state music contest.
Composed music which has been given at least one public performance.
Won a literary award or prize for creative writing.

The items were divided into two scales: Creative Science and Creative Art. The initial results for these scales were mixed: the reliabilities were not very encouraging, ranging from .36 to .55.
Creativity

Several years ago, as part of their search for talented high school students, the National Merit Scholarship Corporation research staff became interested in the whole area of originality, creativity, or creative performance (Holland, 1966). They were immediately confronted with the problems of how to distinguish an original from an unoriginal person, how to define creative behavior, and whether creative behavior can be predicted.

As a first step, Holland (1961) defined creative performance as "a performance which is accorded public recognition through awards, prizes, or publication and which may therefore be assumed to have exceptional cultural value." Under this rubric, a self-report checklist of achievements at the high school level was derived by reviewing the accomplishments reported by National Merit Finalists. Some typical items from this checklist were:

Won a prize or award in a scientific talent search.
Invented a patentable device.
Had a scientific paper published in a science journal.
Won one or more speech contests.
Had poems, stories, or articles published in a public newspaper or magazine or in a state or national high school anthology.
Won a prize or award in an art competition (sculpture, ceramics, painting, etc.).
Received the highest rating in a state music contest.
Composed music which has been given at least one public performance.
Won a literary award or prize for creative writing.

The items were divided into two scales: Creative Science and Creative Art. The initial results for these scales were mixed: the reliabilities were not very encouraging, ranging from .36 to .55. The correlates of the scales, however, were consistent with other research on the creative person. Therefore, the research was continued.

The next step was to develop similar scales at the college level (Holland and Astin, 1962). The initial college-level checklists yielded scores for leadership (4 items), scientific achievement (6 items), and artistic achievement (10 items). These scales were administered to college seniors who had been assessed with a special National Merit battery in high school. The predictors from this battery were correlated with the three college-level achievement scales and with college grades. The pattern of correlations indicated that college achievers in each of the four areas resemble stereotypes in our culture of the scientist, artist, leader, and academic achiever. More importantly, achievement in art, science, and leadership was hardly correlated at all with grades. The investigators also learned that using words like "original" or "creative" in their research reports created many difficulties with journal editors. Accordingly, they began to use terms like "nonacademic accomplishment" to refer to the kinds of achievements included in the checklists.

By adding and revising items, both the high school and the college nonacademic achievement checklists were expanded to yield scores in six areas: art, music, drama, science, writing, and leadership. These scales, together with a large number of other variables, were investigated in two longitudinal studies (Holland and Nichols, 1964; Nichols and Holland, 1965). The results of these studies generally show that nonacademic accomplishment can be assessed with moderate reliability; that the nonacademic achievement scales mainly have low positive intercorrelations; that the best predictor of nonacademic achievement in college is similar achievement in high school; and that nonacademic accomplishment is largely independent of grades and scores on college admissions tests.

An obvious criticism of these studies is that grades and test scores are major factors in selecting National Merit Finalists, so one would not expect high correlations with these measures. To answer this criticism, a series of studies at the American College Testing Program (Holland and Richards, 1965, 1967; Richards, Holland, and Lutz, 1967a, 1967b; Richards and Lutz, 1968; Baird, 1969) examined similar relationships using samples showing a full range of talent.

Using the items in the National Merit scales as guides, new items were developed to measure college student accomplishment in the following areas: leadership, social participation, art, social service, science, business, humanities, religious service, music, writing, social science, and speech and drama. Each item was a behavior or event considered to be a sign of notable accomplishment in a special area. Because each behavior or event is also observable, the accomplishments are verifiable, at least in principle.

A large number of items were written for each area of accomplishment. Items were then submitted to experts for review. On the basis of this review, items were shifted and revised to yield final ten-item scales. Each scale is, in a sense, a criterion or standard of accomplishment in an important area of human endeavor. Students with high scores on one or more scales are assumed to have attained a high level of accomplishment which required complex skills, long-term persistence, or originality, and which generally received public recognition.

In earlier studies, such scales had produced highly skewed, almost dichotomous distributions, which might account in part for their low correlations with measures of academic potential and achievement. As a check on this possibility, a five-item "Recognition for Academic Accomplishment" scale was developed. This scale includes such items as: "Participated in an independent study program for outstanding students." Like the other nonacademic accomplishment scales, it involves a self-report of achievement and it shares their statistical defects of extreme skewness and many zero scores. Unlike the other nonacademic accomplishment scales, this scale was designed to be correlated with grades and tests of academic aptitude.

To determine the statistical characteristics of these scales, they were administered to three groups of college students (freshmen, sophomores, and seniors) in the spring of 1965. These students were attending diverse colleges throughout the United States and represented a wide range of academic aptitude. They did not, however, constitute a representative national sample of either colleges or students.

In general, the results showed that seniors have accomplished more than sophomores, and sophomores more than freshmen. This trend supports the validity of the scales. The reliability coefficients (KR-20) indicate that the scales generally possess moderate internal consistency. Perhaps because of its brevity, the reliabilities for the Recognition for Academic Accomplishment scale are somewhat lower. The Business Achievement scale also had relatively low reliabilities. The explanation for these low coefficients is not apparent, but they may be due to greater heterogeneity of content in this scale.
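The reliability coefficients reported here are KR-20 (Kuder-Richardson formula 20) values. For readers unfamiliar with the index, the brief sketch below computes it for an invented 0/1 response matrix; the data are hypothetical and the computation is offered only as a reminder of the formula, not as part of the studies described.

# Sketch of the Kuder-Richardson formula 20 used for the reliability coefficients
# discussed above; the 0/1 response matrix below is invented for illustration.

def kr20(responses):
    """responses: list of examinee rows, each a list of 0/1 item scores."""
    n, k = len(responses), len(responses[0])
    p = [sum(row[i] for row in responses) / n for i in range(k)]   # item difficulties
    pq = sum(pi * (1 - pi) for pi in p)                            # summed item variances
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n           # total-score variance
    return (k / (k - 1)) * (1 - pq / var_total)

if __name__ == "__main__":
    # Ten examinees answering a five-item (0 = no, 1 = yes) checklist.
    data = [
        [1, 1, 1, 0, 1], [1, 0, 1, 0, 0], [0, 0, 1, 0, 0], [1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0], [1, 1, 0, 1, 0], [1, 0, 0, 0, 0], [1, 1, 1, 1, 0],
        [0, 1, 0, 0, 0], [1, 1, 1, 0, 1],
    ]
    print(f"KR-20 = {kr20(data):.2f}")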
In general, the intercorrelations of these nonacademic accomplishment scales support the construct validity of the scales, as do the concurrent correlations between these scales and student ratings of the importance of various life goals. The intercorrelations of the nonacademic accomplishment scales are high enough to suggest that if a student achieves at all, he is likely to achieve in more than one area, but low enough to suggest that response bias did not have a strong effect.

The correlations between the nonacademic accomplishment scales and grades generally conform to what would be expected from early studies, namely, that all of these correlations would be low except for those involving the Recognition for Academic Accomplishment scale. Because this scale correlated moderately with grades, the results provide both convergent and discriminant validity, and make it less plausible that response bias, dissimulation, or similar occurrences invalidate student responses.

In summary, these college achievement scales appear to have useful reliability and validity. They provide a brief set of socially relevant measures which can serve as fairly comprehensive criteria of success in college. Coupled with grades, they can be used in studying such problems as the effects of colleges upon student accomplishment, the conservation of talent, and the relationship between college and adult achievement. These nonacademic accomplishment scales do not, of course, exhaust all of the socially important areas in which a college student might achieve. However, the principles underlying the construction of these scales are simple. Once these principles are grasped, it should be easy to develop other scales to assess student accomplishment in other areas, to estimate student attainment of the broader goals of a college education, or to satisfy a particular college's unique needs. Similar scales to assess student attainment of the goals of a liberal education have been developed independently (Pace, 1969).

Because the investigators who constructed these scales were quite concerned with the transition from high school to college, they used them in a number of longitudinal predictive studies comparing accomplishment in college to earlier accomplishment in high school (Baird, 1969; Richards, Holland, and Lutz, 1967; Richards and Lutz, 1968). In general, the results confirmed earlier National Merit findings for samples with a broad range of talent. Both academic and nonacademic accomplishment can be predicted from similar accomplishment in high school with moderate success. To illustrate, in one study (Richards, Holland, and Lutz, 1967) the median correlation between student nonacademic accomplishment in college and achievement in the same area in high school was about .39, while the median correlation between grades in college and in high school was about .38.

More importantly, these results also confirmed earlier findings that nonacademic accomplishment is largely independent of academic accomplishment and potential, although the college Recognition for Academic Accomplishment scale is moderately correlated with high school grades and scores on college admissions tests. Some critics (Werts, 1967) have suggested that the correlational methodology exaggerates the degree of independence. While some exaggeration may exist, the consistency and meaningfulness of the results make it doubtful that there is more than a low relationship between academic and nonacademic accomplishment (Holland and Richards, 1967b).

Because the nonacademic accomplishment scales rely on student self-report, the extent to which students exaggerate their accomplishments or lie is an important consideration. On the assumption that a student who would exaggerate his accomplishment in one area would also claim exceptional achievement in a number of areas, an infrequency scale was devised and students with high scores were eliminated from the computations. The overall pattern of results remained unchanged.
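The screening step just described might be pictured as follows. The sketch is hypothetical: the report does not give the actual infrequency items or cutoff, so the top-decile criterion, the scale names, and the four-area limit are assumptions made only for illustration.

# Hypothetical sketch of an infrequency screen: a student claiming exceptionally
# high accomplishment in an improbable number of areas is flagged and dropped
# before the correlations are recomputed.  The cutoffs below are assumptions.

def flag_improbable(scale_scores, top_decile_cutoffs, max_plausible_areas=4):
    """Return True if the student claims top-decile accomplishment in more areas
    than seems plausible, suggesting exaggeration or careless responding."""
    exceptional = sum(
        score >= top_decile_cutoffs[area] for area, score in scale_scores.items()
    )
    return exceptional > max_plausible_areas

if __name__ == "__main__":
    cutoffs = {"art": 3, "music": 3, "science": 4, "writing": 3, "leadership": 4}
    student = {"art": 5, "music": 4, "science": 6, "writing": 4, "leadership": 5}
    print("flagged by infrequency screen:", flag_improbable(student, cutoffs))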
The most obvious practical applications of these findings are in the area of college admissions. Accordingly, a simulation study of college admissions was conducted (Baird and Richards, 1968) which showed that the selection of students on the basis of academic accomplishment yields a student body that does well in the classroom, but eliminates many nonacademic achievers. Similarly, the selection of students on the basis of nonacademic accomplishment yields a student body that does important things outside the classroom, but contains more students who fail academically. Supporting evidence has been obtained by Wallach and Wing (1969). The results of both studies indicate that any admissions policy has its costs and that a particular college cannot be fair to everyone unless it admits everyone.
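The logic of such a simulation can be suggested in miniature. The sketch below is not the Baird and Richards procedure; it simply generates applicants whose academic and nonacademic accomplishments are independent (roughly as the studies report) and compares two admission rules. All parameters are assumptions.

# Toy admissions simulation in the spirit of Baird and Richards (1968), not their
# actual procedure: applicants have independent academic and nonacademic records,
# and the top quarter is admitted on one measure or the other.
import random

random.seed(1)
N, ADMIT = 10_000, 2_500
applicants = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]  # (academic, nonacademic)

def admit_top(pool, key_index, n):
    """Admit the n applicants standing highest on one accomplishment measure."""
    return sorted(pool, key=lambda a: a[key_index], reverse=True)[:n]

def mean(values):
    return sum(values) / len(values)

for label, admitted in (("academic", admit_top(applicants, 0, ADMIT)),
                        ("nonacademic", admit_top(applicants, 1, ADMIT))):
    print(f"admit on {label} record: "
          f"academic mean {mean([a for a, _ in admitted]):.2f}, "
          f"nonacademic mean {mean([b for _, b in admitted]):.2f}")
# Each rule raises the selected measure substantially while leaving the other near
# the applicant average, illustrating that every admissions policy has its costs.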

However, the results do suggest that a single-minded pursuit of academic excellence is destructive of other, perhaps more important values, and that there should be a greater diversity of colleges and admissions policies. Their results also imply a need for more diverse, though equally rigorous, ways of evaluating students once they are in college. A student might be forgiven, say, failure to master French verbs if he were composing good music.

The use of these scales, however, is not a panacea for all the ills of higher education. In some ways, it is discouraging to find that the way to choose students who will achieve in college is to find students who have already demonstrated similar achievements in high school. Because the scales are also somewhat correlated with family income (Baird, 1967), they may be of limited use in overcoming what genuine cultural bias may exist in colleges. In spite of their limitations, however, they seem to provide an important means of identifying and assessing a variety of student accomplishments in college. The methodology should improve as more is learned about student accomplishment from systematic, longitudinal programs of research, such as that conducted by the American Council on Education (Astin, Panos, and Creager, 1967).

The common denominator of the three areas of research discussed in this review appears to be that they all involve techniques for treating each student as a unique individual. The CLEP examinations recognize that individuals may learn subject matter in unconventional ways; criterion-referenced tests consider only the individual's own performance in assessing his accomplishment; and the nonacademic accomplishment scales measure each student's special pattern of abilities and achievements. Such recognition and cultivation of unique talents, of course, has always been part of the ideals of higher education. Over the past fifty years, however, most of the techniques developed through research seem to have been more useful for dealing with students en masse than for treating each student in terms of his own needs and abilities. It is encouraging, therefore, that some researchers are beginning to work on techniques that may reduce the discrepancy between ideals and procedures.

REFERENCES

Some of the following works are available in microfiche (MF) or hard/photo copy (HC) from the ERIC Document Reproduction Service, National Cash Register Company, 4936 Fairmont Avenue, Bethesda, Maryland 20014. When ordering, please specify the ERIC document (ED) number. Payment must accompany orders of less than $5.00. Abstracts of the documents appear in Research in Education, a monthly publication available from the Superintendent of Documents, U.S. Government Printing Office, Washington, D.C. 20402. Single copies cost $1.75; annual subscriptions, $21.00.

Astin, A. W., Panos, R. J., and Creager, J. A. Implications of a Program of Research on Student Development in Higher Education. Washington: American Council on Education, 1965. ED 031 127. MF-$0.25, HC-$2.20.

Baird, L. L. Family Income and the Characteristics of College-Bound Students. Iowa City, Iowa: American College Testing Program, 1967. ED 012 969. MF-$0.25, HC-$1.60.

Baird, L. L. "Prediction of Academic and Nonacademic Achievement in Two-Year Colleges from the ACT Assessment," Educational and Psychological Measurement 29, 1969, pp. 421-30.

Baird, L. L. and Richards, J. M., Jr. The Effects of Selecting College Students by Various Kinds of High School Achievement. Iowa City, Iowa: American College Testing Program, 1968. ED 017 966. MF-$0.25, HC-$1.85.

Beanblossom, G. F. "The Use of CLEP Scores in Evaluating Liberal Arts Curriculum." Seattle: University of Washington, 1969a. ED 029 598. MF-$0.25, HC-$1.55.

Beanblossom, G. F. "What Do the CLEP General Examinations Measure?" Seattle: University of Washington, 1969b. ED 031 173. MF-$0.25, HC-$0.65.

Burnette, R. R. "Use of the CLEP-GE's with Returning Servicemen and Junior College Transfers." Minneapolis: Paper presented at American Educational Research Association, 1970.

College-Level Examination Program: Description and Uses, 1967. New York: College Entrance Examination Board, 1967a.

College-Level Examination Program: A Description of the General Examinations. New York: College Entrance Examination Board, 1968.

College-Level Examination Program: A Description of the Subject Examinations. New York: College Entrance Examination Board, 1967c.

College-Level Examination Program: Score Interpretation Guide. New York: College Entrance Examination Board, 1967b.

College-Level Examination Program: Supplement to the Score Interpretation Guide. New York: College Entrance Examination Board, 1969.

Cox, R. C. and Vargas, J. S. "A Comparison of Item Selection Techniques for Norm-Referenced and Criterion-Referenced Tests." Pittsburgh: University of Pittsburgh, 1966. ED 010 517. MF-$0.25, HC-$0.90.

Crawford, W. R. "Assessing Performance When the Stakes Are High." Minneapolis: Paper read at American Educational Research Association, 1970.

Cronbach, L. J. "Course Improvement Through Evaluation," Teachers College Record 64, 1963, pp. 672-83.

De Cecco, J. P. "The Measurement of Student Good Works in a School Without Faith." Minneapolis: Paper read at American Educational Research Association, 1970.

Ebel, R. L. "Content Standard Test Scores," Educational and Psychological Measurement 22, 1962, pp. 15-25.

Ebel, R. L. "Some Limitations of Criterion-Referenced Measurement." Minneapolis: Paper read at American Educational Research Association, 1970.

Ferguson, R. L. "Computer-Assisted Criterion-Referenced Measurement." Minneapolis: Paper read at American Educational Research Association, 1970.

Flanagan, J. C., Davis, F. B., Dailey, J. T., Shaycoft, M. F., Orr, D. B., Goldberg, I., and Neyman, C. A., Jr. The American High School Student. Pittsburgh: University of Pittsburgh and American Institutes for Research, 1964.
French, J. W. "Types of Students Defined by Items in the CLEP General Series of Achievement Tests." Sarasota, Florida: New College, 1969.

Goolsby, T. M., Jr. "The Validity of a Comprehensive College Sophomore Test Battery for Use in Selection, Placement, and Advisement," Educational and Psychological Measurement 26, 1966, pp. 977-83.

Guttman, L. and Schlesinger, I. M. Development of Diagnostic and Mechanical Ability Tests Through Facet Design and Analysis. Jerusalem, Israel: Israel Institute of Applied Social Research, 1966. ED 010 590. MF-$0.50, HC-$4.90.

Guttman, L. and Schlesinger, I. M. "Systematic Construction of Distractors for Ability and Achievement Test Items," Educational and Psychological Measurement 27, 1967a, pp. 569-80.

Guttman, L. and Schlesinger, I. M. The Analysis of Diagnostic Effectiveness of a Facet Design Battery of Achievement and Analytical Ability Tests. Jerusalem, Israel: Israel Institute of Applied Social Research, 1967b. ED 014 773. MF-$0.50, HC-$5.10.

Harris, J. W. "Performance on the College-Level Examination of University of Georgia Juniors Tested in November, 1967." Athens, Georgia: University of Georgia, 1968.

Harris, J. W. "Gain Scores and Summary of Content and Suggested Uses." Minneapolis: Paper read at American Educational Research Association, 1970.

Heath, H. F. "Sophomore Evaluation Project." San Jose, California: San Jose State College, 1967.

Hills, J. R. "Experience in Small Graduate Classes and Approaches to Evaluating Criterion-Referenced Tests." Minneapolis: Paper read at American Educational Research Association, 1970.

Hodgson, T. F. "Norms and Factorial Structure of the GE's." Minneapolis: Paper read at American Educational Research Association, 1970.

Holland, J. L. "Creative and Academic Performance Among Talented Adolescents," Journal of Educational Psychology 52, 1961, pp. 136-47.

Holland, J. L. "The Prediction of Academic and Nonacademic Accomplishment," Proceedings of the 1966 Invitational Conference on Testing Problems. Princeton, New Jersey: Educational Testing Service, 1966.

Holland, J. L. and Astin, A. W. "The Prediction of the Academic, Artistic, Scientific, and Social Achievement of Undergraduates of Superior Scholastic Aptitude," Journal of Educational Psychology 53, 1962, pp. 132-43.

Holland, J. L. and Nichols, R. C. "Prediction of Academic and Extracurricular Achievement in College," Journal of Educational Psychology 55, 1964, pp. 55-65.

Holland, J. L. and Richards, J. M., Jr. "Academic and Nonacademic Accomplishment: Correlated or Uncorrelated?" Journal of Educational Psychology 56, 1965, pp. 165-74.

Holland, J. L. and Richards, J. M., Jr. "Academic and Nonacademic Accomplishment in a Representative Sample of Students Taking the American College Tests," College and University 43, 1967a, pp. 60-71.

Holland, J. L. and Richards, J. M., Jr. "The Many Faces of Talent: A Reply to Werts," Journal of Educational Psychology 58, 1967b, pp. 205-09.

Johnson, P. E. "The Origin of Item Forms." Minneapolis: Paper read at American Educational Research Association, 1970.

Lord, F. M. "Sampling Fluctuations Resulting from the Sampling of Test Items," Psychometrika 20, 1955, pp. 1-23.

Mattson, D. E. "Comparative Performance: The Basis of Criterion-Related Measures." Minneapolis: Paper read at American Educational Research Association, 1970.

Nichols, R. C. and Holland, J. L. "Prediction of the First Year College Performance of High Aptitude Students," Psychological Monographs 77, 1963.

Nitko, A. J. "Some Considerations When Using a DRATS in Instructional Situations." Minneapolis: Paper read at American Educational Research Association, 1970.

Osburn, H. G. "A Note on Design of Test Experiments," Educational and Psychological Measurement 27, 1967, pp. 797-802.

Osburn, H. G. "Item Sampling for Achievement Testing," Educational and Psychological Measurement 28, 1968, pp. 95-104.

Osburn, H. G. and Shoemaker, D. M. Pilot Project on Computer Generated Test Items. Houston, Texas: University of Houston, 1968. ED 026 856. MF-$0.75, HC-$8.65.

Pace, C. R. An Evaluation of Higher Education: Plans and Perspectives. Los Angeles: University of California at Los Angeles, 1969.

Palmer, O. "Seven Classic Ways of Grading Dishonestly," The English Journal 51, 1962, pp. 464-67.

Patterson, H. L. "Applications of DRAT to Job Corps Mathematics Program Development." Minneapolis: Paper read at American Educational Research Association, 1970.

Popham, W. J. "Indices of Adequacy for Criterion-Referenced Test Items." Minneapolis: Paper read at American Educational Research Association, 1970.

Rabehl, G. E. "The Minnesota Experience with DRATS." Minneapolis: Paper read at American Educational Research Association, 1970.

Rajaratnam, N., Cronbach, L. J., and Gleser, G. C. "Generalizability of Stratified-Parallel Tests," Psychometrika 30, 1965, pp. 39-56.

Richards, J. M., Jr. "Can Computers Write College Admissions Tests?" Journal of Applied Psychology 51, 1967, pp. 211-15.

Richards, J. M., Jr. and Lutz, S. W. "Predicting Student Accomplishment in College from the ACT Assessment," Journal of Educational Measurement 5, 1968, pp. 17-29.

Richards, J. M., Jr., Holland, J. L., and Lutz, S. W. "The Assessment of Student Accomplishment in College," Journal of College Student Personnel 8, 1967a, pp. 360-65.
Richards, J. M., Jr., Holland, J. L., and Lutz, S. W. "Prediction of Student Accomplishment in College," Journal of Educational Psychology 58, 1967b, pp. 343-55.

Senison, D. B. "Future Uses for DRATS." Minneapolis: Paper read at American Educational Research Association, 1970.

Sharon, A. T. "Validity of the GE's as Measures of Academic Achievement." Minneapolis: Paper read at American Educational Research Association, 1970.

Shoemaker, D. M. and Osburn, H. G. "An Empirical Study of Generalizability Coefficients for Unmatched Data," British Journal of Mathematical and Statistical Psychology 21, 1968, pp. 239-46.

Skager, R. W., Schultz, C. B., and Klein, S. P. "Quality and Quantity of Accomplishments as Measures of Creativity," Journal of Educational Psychology 56, 1965, pp. 31-39.

von Kolnitz, L. "Experimental Testing with College Level Examination Program." Columbia, South Carolina: University of South Carolina, 1969.

Wallach, M. A. and Wing, C. W., Jr. The Talented Student: A Validation of the Creativity-Intelligence Distinction. New York: Holt, Rinehart, & Winston, 1969.

Wendt, P. R., Rust, G., and Alexander, D. D. "Study to Test Refinements in Intrinsic Programming in Pictorial, Audio, and Performance Frames to Maximize the Probability of Desired Terminal Behavior." Carbondale, Illinois: Southern Illinois University, 1965. ED 033 235. MF-$0.50, HC-$4.50.

Werts, C. E. "The Many Faces of Intelligence," Journal of Educational Psychology 58, 1967, pp. 198-204.

Wood, R. "The Application of Bayesian Sequential Analysis to Educational and Psychological Testing." Minneapolis: Paper read at American Educational Research Association, 1970.