Rob Waring, Notre Dame Seishin University, www.robwaring.org/presentations. Overview: Purpose; What kind of list; List structure; Selection factors; Definitions or translations; Mechanics; Validating.
Slide1
The Ins and Outs of making a Wordlist
Rob Waring
Notre Dame Seishin University
www.robwaring.org/presentations/
Slide2
Overview
Purpose - What kind of list?
List structure
Selection factors
Definitions or translations?
Mechanics
Validating
Slide3
Slide4
Slide5
Android too
Slide6
Black – in level
Red – out of list
Red underline – out of level
Green – ignored words
Slide7
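The colour legend (black, red, red underline, green) amounts to a four-way classification of every token in a text. A minimal sketch of that classification follows, assuming the wordlist is held as plain Python sets; the function name `classify_tokens` and the toy word sets are hypothetical and are not taken from any real profiling tool.

```python
import re

def classify_tokens(text, level_words, list_words, ignored_words):
    """Label each token as in_level, out_of_level, out_of_list or ignored."""
    labels = []
    for token in re.findall(r"[A-Za-z']+", text):
        word = token.lower()
        if word in ignored_words:
            label = "ignored"        # shown in green
        elif word in level_words:
            label = "in_level"       # shown in black
        elif word in list_words:
            label = "out_of_level"   # shown with a red underline
        else:
            label = "out_of_list"    # shown in red
        labels.append((token, label))
    return labels

# Hypothetical toy data, invented for illustration only.
level_1 = {"the", "cat", "sat", "on", "a"}
whole_list = level_1 | {"mat", "purred"}
ignored = {"tom"}  # e.g. character names the analyst chooses to skip

result = classify_tokens("Tom the cat sat on a velvet mat",
                         level_1, whole_list, ignored)
for token, label in result:
    print(f"{token:8} {label}")
```

Checking the ignored set first is a design choice: it lets proper names be skipped even when they coincide with a list word.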
What kind of list - Purpose?
To give to students to learn from (paper or digital)
To analyze texts against e.g. a graded reader
To cover the majority of words in a given field (e.g. top 1000 business words)
Master list to source sublists from?
Multiple level lists, or one list?
For a single class – or general (e.g. all natives, all intermediates)
Spoken, written, mixed?
For a specific audience?
TOEIC, business, academic
A certain age
A certain level (intermediates)
Slide8
What kind of list - Starting point?
Use existing wordlists – GSL, Nation’s BNC lists, NGSL, NAWL ...
Use existing corpus (e.g. BNC, COCA) and dig out what you want
Create your own corpus (business, TOEIC, nursing)
Does it suit your purpose? Will BNC give you an academic list?
Is it structured the way you want? Headwords only? Lemmas? Mixture?
Slide9
BNC raw (by type)
BNC Nation Family list
Slide10
List structure I - List with Levels
How many levels? Why?
What are the breaks between levels? Will learners get from one level to another with ease?
Will the breaks be even (say 560 words each) or vary?
Level by frequency? Utility? Range? Intuition? Learnability?
Slide11
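The "even breaks" option for a levelled list (say 560 words per level) is easy to try out on a frequency-ranked list. The sketch below assumes the list is already sorted from most to least frequent; `split_into_levels` and the placeholder words are hypothetical.

```python
def split_into_levels(ranked_words, level_size):
    """Cut a frequency-ranked list into consecutive levels of equal size.

    The final level may be shorter if the list does not divide evenly.
    """
    if level_size <= 0:
        raise ValueError("level_size must be positive")
    return [ranked_words[i:i + level_size]
            for i in range(0, len(ranked_words), level_size)]

# Placeholder list standing in for 1,200 ranked headwords.
ranked = [f"word{i}" for i in range(1, 1201)]
levels = split_into_levels(ranked, 560)

for number, level in enumerate(levels, start=1):
    print(f"Level {number}: {len(level)} words, starts at {level[0]}")
```

Uneven breaks would need a list of cut-off ranks instead of a single size, which is one way to make the choice of breaks explicit.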
Selection Criteria I
Representativeness: The list should adequately represent the wide range of uses of language
Frequency and range: A word should occur frequently across a wide range of texts.
Word families: Sensible set of criteria regarding what forms and uses are counted as being members of the same family
Utility: how useful will the words be to the target learners
Idioms and set expressions: Some items larger than a word behave like high frequency words
Slide12
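Frequency and range are separate measures: frequency counts every occurrence, while range counts how many texts a word appears in at least once. A minimal sketch, assuming a corpus small enough to hold as a list of strings; the function name, the three toy texts and the thresholds are invented for illustration.

```python
from collections import Counter
import re

def frequency_and_range(texts):
    """Return (frequency, range) counters for a list of text strings."""
    freq = Counter()   # total occurrences across all texts
    rng = Counter()    # number of texts containing the word
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        freq.update(tokens)
        rng.update(set(tokens))  # each text counts once per word
    return freq, rng

# Toy corpus, invented for illustration.
texts = ["the cat sat", "the dog sat on the mat", "a cat"]
freq, rng = frequency_and_range(texts)

# Keep words that are both frequent and widely spread (thresholds are arbitrary).
candidates = sorted(w for w in freq if freq[w] >= 2 and rng[w] >= 2)
print(candidates)
```

A word that is frequent in only one text scores high on frequency but low on range, which is why both thresholds are applied.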
Selection Criteria II
Learnability: how easy to learn? Related words may be easier
Regularity: regular forms are easier than irregular forms, but some derivatives operate differently within a family, e.g. excuse, inexcusable
Coverage: (it is not efficient to be able to express the same idea in different ways. It is more efficient to learn a word that covers a quite different idea)
Stylistic level and emotional words: West saw second language learners as initially needing neutral vocabulary
Intuition: how well does it match the teacher’s sense of what to include
Slide13
Which of these would you put in your list?
out of, per cent, such as, of course, for example, in front of
all right, as soon as, in general, in addition to, next to, on top of
instead of, in charge of, just about, provided that, as good as, with a view to
in between, by and large, at random, per se, old fashioned, grown up
matter of fact, sq m, fait accompli, straight forward, habeas corpus, self-same
haute cuisine, a good deal, laissez faire, persona non grata
Slide14
How frequently do lexical phrases occur (BNC)?
Raw Rank   Word                 Per million words
177        out of               490
222        per cent             382
272        such as              321
285        of course            309
378        for example          238
1538       in front of          65
1725       all right            58
2159       as soon as           47
2491       in general           41
2970       in addition to       34
3307       next to              30
3755       on top of            26
4378       instead of           21
5409       in charge of         17
5987       just about           15
7396       provided that        11
7885       as good as           10
9125       with a view to       8
11459      in between           6
13507      by and large         5
14369      at random            4
16684      per se               4
19505      old fashioned        3
22060      grown up             2
28441      matter of fact       2
43572      sq m                 1
48241      fait accompli        1
51717      straight forward     1
58511      habeas corpus        1
74321      self-same            0
76170      haute cuisine        0
82928      a good deal          0
83882      laissez faire        0
89371      persona non grata    0
Slide15
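A "per million words" figure is a normalised frequency: the number of occurrences divided by the corpus size and scaled to one million. The sketch below shows one way to count a lexical phrase in a tokenised text and normalise the count. The helper names `count_phrase` and `per_million` are hypothetical, and the toy sentence and corpus size are invented; none of the numbers are BNC data.

```python
def count_phrase(tokens, phrase):
    """Count non-case-sensitive occurrences of a multi-word phrase in a token list."""
    target = phrase.lower().split()
    n = len(target)
    lowered = [t.lower() for t in tokens]
    return sum(1 for i in range(len(lowered) - n + 1)
               if lowered[i:i + n] == target)

def per_million(count, corpus_size):
    """Scale a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size

# Invented toy text, tokenised by whitespace.
tokens = "Of course he sat in front of the house of course".split()

print(count_phrase(tokens, "of course"))    # occurrences in the toy text
print(count_phrase(tokens, "in front of"))
# An invented example: 49 hits in a 100,000-word corpus.
print(per_million(49, 100_000))
```

Whether phrases are counted before or after single words are counted affects the single-word totals, so the order is worth deciding up front.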
Selection criteria – a new headword or in the family?
Only mega-headwords (that cover all meaning senses)
Inflections only? - Plurals, verb forms, -er, -est adjectives. Keep them all together? If not, where do low frequency derivatives go?
USE uses using used user users useful useless usefulness usefully usable misused misuse misusing misuses misuser misusers uselessness uselessly unused usability reuse reuses reused reusing unusable
Derivatives in the family or as a new headword?
interest, interesting, interested, disinterested, interestingly
Homographs with different meaning senses – book, bank, bat, bill
Nuances – a brain, to brain someone
Phrasal verbs – bring down, bring back, bring up, bring over
Compound words – handbag, policeman, airflow, birdwatching
Multi-word units? – traffic light, lunch box, all right, by and large
Slide16
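Whether a derivative sits inside a family or becomes its own headword changes every frequency count that follows. A minimal sketch of a family lookup, assuming families are stored as a headword mapped to its member forms; the data structure and function names are hypothetical, and the two small families are abridged from the USE and interest examples.

```python
from collections import Counter

# Abridged, illustrative families; a real list would be far longer.
families = {
    "use": ["uses", "using", "used", "user", "users", "useful", "useless"],
    "interest": ["interesting", "interested"],
}

def build_family_index(families):
    """Map every member form (and the headword itself) to its headword."""
    index = {}
    for headword, members in families.items():
        index[headword] = headword
        for member in members:
            index[member] = headword
    return index

def family_frequencies(tokens, index):
    """Count tokens by family headword; unknown words count as themselves."""
    counts = Counter()
    for token in tokens:
        word = token.lower()
        counts[index.get(word, word)] += 1
    return counts

index = build_family_index(families)
counts = family_frequencies("Users find useful uses interesting".split(), index)
print(counts)
```

Moving a form such as "disinterested" out of the family is then a one-line change to the data, and the counts can be rerun to see the effect.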
Selection – where to put derivatives?
Level 1: A different form is a different word. Capitalization is ignored.
Level 2: Regularly inflected words are part of the same family. The inflectional categories are - plural; third person singular present tense; past tense; past participle; -ing; comparative; superlative; possessive.
Level 3: -able, -er, -ish, -less, -ly, -ness, -th, -y, non-, un-, all with restricted uses.
Level 4: -al, -ation, -ess, -ful, -ism, -ist, -ity, -ize, -ment, -ous, in-, all with restricted uses.
Level 5: -age (leakage), -al (arrival), -ally (idiotically), -an (American), -ance (clearance), -ant (consultant), -ary (revolutionary), -atory (confirmatory), -dom (kingdom; officialdom), -eer (black marketeer), -en (wooden), -en (widen), -ence (emergence), -ent (absorbent), -ery (bakery; trickery), -ese (Japanese; officialese), -esque (picturesque), -ette (usherette; roomette), -hood (childhood), -i (Israeli), -ian (phonetician; Johnsonian), -ite (Paisleyite; also chemical meaning), -let (coverlet), -ling (duckling), -ly (leisurely), -most (topmost), -ory (contradictory), -ship (studentship), -ward (homeward), -ways (crossways), -wise (endwise; discussion-wise), ante- (anteroom), anti- (anti-inflation), arch- (archbishop), bi- (biplane), circum- (circumnavigate), counter- (counter-attack), en- (encage; enslave), ex- (ex-president), fore- (forename), hyper- (hyperactive), inter- (inter-African, interweave), mid- (mid-week), mis- (misfit), neo- (neo-colonialism), post- (post-date), pro- (pro-British), semi- (semi-automatic), sub- (subclassify; subterranean), un- (untie; unburden).
Level 6: -able, -ee, -ic, -ify, -ion, -ist, -ition, -ive, -th, -y, pre-, re-.
Level 7: Classical roots and affixes.
Slide17
Selection Criteria - How will you deal with … I
Proper nouns: SONY, Dave, Jackson, Thomson, Paris, London
Proper nouns that are words - Bell, Sue, Jack, Nation, Mark
Numbers: 1, one, thirty, twenty-seven, thousand, billion
Acronyms – NATO, DNA, UN, NSA, DARPA
Dialectal differences (e.g. US vs UK spelling)
Multi-word units – post office, train station, city hall
Closed lexical sets such as days of the week, months etc.
Typos – mispelings, heros, amatur, arguement, bellweather
Incomplete words – travelin’, roarin’, ‘cept
Slang forms – gonna, wanna, nuffink, wassup
Slide18
Selection Criteria - How will you deal with … II
Offensive words – pooh, shit, crap, bugger, bastard, fart
Culturally loaded words – temple vs. church, hijab, sporran
Non-PC words – stewardess, waitress, negro, retarded, stupid
NCLB words - beer, alcohol, drugs, tobacco, smoking
Archaic words – thou, thee, thine, groovy, gay
Prototypical sets – words often taught in sets
foods - pizza, apple, cake, bread, salt, tomato, zucchini, eggplant, capsicum
drinks – coffee, tea, juice, water, cola, mojito, screwdriver, Bloody Mary
buildings – office, station, hotel, city hall, auditorium, ice rink
shops – supermarket, mall, barber, stationer, grocer
colors – red, blue, green, yellow, pink, violet, scarlet, puce
Slide19
Definitions - What aspects of word knowledge to include?
Definition
POS – how detailed do you want to be?
Translations – how will you deal with translators who disagree?
Example sentence – authentic, contrived?
Usage notes – which ones?
Synonyms
Antonyms
Distractors? (for online test auto-create software)
Slide20
Definitions - style
What style?
e.g. Apple
synonym: fruit
short definition: hard red or green fruit
long definition: the fleshy usually rounded red, yellow or green edible fruit of a usually cultivated tree (genus Malus) of the rose family
Use of a defining vocabulary list? Which one? Which words?
Slide21
Mechanics
Word? Excel?
Specialized database software such as Access or FileMaker?
Versions. Is it important to know which version of your wordlist was given to which users?
Do you have the time and patience?
SERIOUSLY. Do you have the time and patience?
Slide22
Validating your wordlist
How will you evaluate the list’s integrity?
How will you check if you missed words?
How will you check for mis-levelled words?
How will you check consistency of definitions, examples, translations?
Slide23
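Some integrity checks on a wordlist can be automated: duplicate headwords, a headword assigned to two levels, and entries with no definition. A minimal sketch follows, assuming each entry is a dictionary with `headword`, `level` and `definition` keys; that layout, the function name `validate_wordlist` and the sample rows are all hypothetical.

```python
def validate_wordlist(entries):
    """Return a list of human-readable problems found in the entries."""
    problems = []
    seen = {}  # headword -> level of its first occurrence
    for n, entry in enumerate(entries, start=1):
        word = (entry.get("headword") or "").strip().lower()
        if not word:
            problems.append(f"row {n}: missing headword")
            continue
        level = entry.get("level")
        if word in seen:
            if seen[word] != level:
                problems.append(
                    f"row {n}: '{word}' appears at levels {seen[word]} and {level}")
            else:
                problems.append(f"row {n}: duplicate entry '{word}'")
        else:
            seen[word] = level
        if not (entry.get("definition") or "").strip():
            problems.append(f"row {n}: '{word}' has no definition")
    return problems

# Invented sample rows.
entries = [
    {"headword": "apple", "level": 1, "definition": "hard red or green fruit"},
    {"headword": "Apple", "level": 2, "definition": "fruit"},
    {"headword": "bank", "level": 1, "definition": ""},
]
problems = validate_wordlist(entries)
for problem in problems:
    print(problem)
```

Checks like these catch mechanical slips only. Finding missed words, or words placed at the wrong level, still means comparing the list against a corpus or against teacher judgement.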
And soooooo much more!
Questions?
If you want help - waring.rob@gmail.com