The Ins and Outs of making a Wordlist

Uploaded On 2016-07-18



Presentation Transcript

Slide1

The Ins and Outs of making a Wordlist

Rob Waring

Notre Dame Seishin University

www.robwaring.org/presentations/

Slide2

Overview

Purpose - What kind of list?

List structure

Selection factors

Definitions or translations?

Mechanics

Validating

Slide3

Slide4

Slide5

Android too

Slide6

Black – in level

Red – out of list

Red underline – out of level

Green – ignored words

Slide7
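The colour legend on the previous slide describes how a text profiler marks each word against a levelled list. A minimal Python sketch of that classification follows. The function name `profile_text`, the tiny `levels` mapping and the `ignored` set are all hypothetical, and the tokenizer is a deliberately crude regular expression; this is not the tool shown in the slides.

```python
import re

# Hypothetical levelled wordlist: word -> level number (1 = easiest).
levels = {"the": 1, "cat": 1, "sat": 1, "on": 1, "mat": 2, "purred": 3}
# Hypothetical set of words the profiler should skip (e.g. proper nouns).
ignored = {"tom"}

def profile_text(text, levels, ignored, target_level):
    """Label each token as in level, out of level, out of list, or ignored."""
    results = []
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in ignored:
            label = "ignored"        # green in the legend
        elif token not in levels:
            label = "out of list"    # red in the legend
        elif levels[token] > target_level:
            label = "out of level"   # red underline in the legend
        else:
            label = "in level"       # black in the legend
        results.append((token, label))
    return results

report = profile_text("Tom the cat purred on the sofa", levels, ignored, target_level=2)
for token, label in report:
    print(f"{token:10} {label}")
```

A real profiler would also need to decide how inflected forms, multi-word units and proper nouns are matched, which are the questions the later slides raise.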

What kind of list - Purpose?

To give to students to learn from (paper or digital)

To analyze texts against e.g. a graded reader

To cover the majority of words in a given field (e.g. top 1000 business words)

Master list to source sublists from?

Multiple level lists, or one list?

For a single class – or general (e.g. all natives, all intermediates)

Spoken, written, mixed?

For a specific audience?

TOEIC, business, academic

A certain age

A certain level (intermediates)

Slide8

What kind of list - Starting point?

Use existing wordlists – GSL, Nation’s BNC lists, NGSL, NAWL ...

Use existing corpus (e.g. BNC, COCA) and dig out what you want

Create your own corpus (business, TOEIC, nursing)

Does it suit your purpose? Will BNC give you an academic list?

Is it structured the way you want? Headwords only? Lemmas? Mixture?

Slide9

BNC raw (by type)

BNC Nation Family list

Slide10

List structure I - List with Levels

How many levels? Why?

What are the breaks between levels? Will learners get from one level to another with ease?

Will the breaks be even (say 560 words each) or vary?

Level by frequency? Utility? Range? Intuition? Learnability?

Slide11
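The question on the previous slide about even breaks between levels can be made concrete with a short Python sketch. It assumes the list is already ranked (most frequent first) and simply cuts it into equal-sized levels. The name `split_into_levels` and the sample words are hypothetical, and levelling by utility, range or intuition would need a different ordering step first.

```python
def split_into_levels(ranked_words, level_size):
    """Cut a ranked list into consecutive levels of equal size."""
    if level_size <= 0:
        raise ValueError("level_size must be positive")
    return [ranked_words[i:i + level_size]
            for i in range(0, len(ranked_words), level_size)]

# Hypothetical ranked list, most frequent first.
ranked_words = ["the", "of", "and", "to", "a", "in", "is"]
levels = split_into_levels(ranked_words, level_size=3)
for number, words in enumerate(levels, start=1):
    print(f"Level {number}: {', '.join(words)}")
```

With a real list, `level_size=560` would give the even breaks mentioned on the slide; the last level is simply shorter when the list does not divide evenly.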

Selection Criteria I

Representativeness: The list should adequately represent the wide range of uses of language

Frequency and range: A word should occur frequently across a wide range of texts.

Word families: Sensible set of criteria regarding what forms and uses are counted as being members of the same family

Utility: how useful will the words be to the target learners

Idioms and set expressions: Some items larger than a word behave like high frequency words

Slide12
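The "frequency and range" criterion on the previous slide combines two counts: how often a word occurs overall, and in how many different texts it occurs. A minimal Python sketch, assuming a hypothetical three-text mini-corpus and plain whitespace tokenization (the names `texts` and `frequency_and_range` are illustrative only):

```python
from collections import Counter

# Hypothetical mini-corpus: each string stands in for one text.
texts = [
    "the market rose and the bank cut rates",
    "the bank of the river flooded",
    "rates of interest rose again",
]

def frequency_and_range(texts):
    """Return (frequency, range) counters over whitespace-split tokens."""
    frequency = Counter()    # total occurrences across all texts
    text_range = Counter()   # number of texts that contain the word
    for text in texts:
        tokens = text.lower().split()
        frequency.update(tokens)
        text_range.update(set(tokens))
    return frequency, text_range

frequency, text_range = frequency_and_range(texts)
# Rank by range first, then frequency, so widely used words come first.
ranked = sorted(frequency, key=lambda w: (-text_range[w], -frequency[w], w))
for word in ranked[:5]:
    print(word, frequency[word], text_range[word])
```

Sorting on range before frequency is one possible design choice; it keeps a word that is very frequent in a single text from outranking words that appear across the whole corpus.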

Selection Criteria II

Learnability: how easy to learn? Related words may be easier

Regularity: regular forms are easier than irregular forms, but some derivatives operate differently within a family (excuse, inexcusable).

Coverage: it is not efficient to be able to express the same idea in different ways. It is more efficient to learn a word that covers a quite different idea.

Stylistic level and emotional words: West saw second language learners as initially needing neutral vocabulary

Intuition: how well does it match the teacher’s sense of what to include

Slide13

Which of these would you put in your list?

out of
per cent
such as
of course
for example
in front of
all right
as soon as
in general
in addition to
next to
on top of
instead of
in charge of
just about
provided that
as good as
with a view to
in between
by and large
at random
per se
old fashioned
grown up
matter of fact
sq m
fait accompli
straight forward
habeas corpus
self-same
haute cuisine
a good deal
laissez faire
persona non grata

Slide14

How frequently do lexical phrases occur (BNC)?

Raw Rank   Word                Per million words
177        out of              490
222        per cent            382
272        such as             321
285        of course           309
378        for example         238
1538       in front of         65
1725       all right           58
2159       as soon as          47
2491       in general          41
2970       in addition to      34
3307       next to             30
3755       on top of           26
4378       instead of          21
5409       in charge of        17
5987       just about          15
7396       provided that       11
7885       as good as          10
9125       with a view to      8
11459      in between          6
13507      by and large        5
14369      at random           4
16684      per se              4
19505      old fashioned       3
22060      grown up            2
28441      matter of fact      2
43572      sq m                1
48241      fait accompli       1
51717      straight forward    1
58511      habeas corpus       1
74321      self-same           0
76170      haute cuisine       0
82928      a good deal         0
83882      laissez faire       0
89371      persona non grata   0

Slide15
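The "per million words" figures on the previous slide are normalized frequencies: a raw count scaled to a corpus of one million running words, so that corpora of different sizes can be compared. A minimal Python sketch of counting a multi-word phrase and normalizing it follows. The helper names `count_phrase` and `per_million` and all the figures are hypothetical illustrations, not values taken from the BNC.

```python
def count_phrase(tokens, phrase):
    """Count occurrences of a multi-word phrase in a list of tokens."""
    target = phrase.lower().split()
    n = len(target)
    return sum(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))

def per_million(raw_count, corpus_size):
    """Scale a raw count to occurrences per million running words."""
    return raw_count * 1_000_000 / corpus_size

# Hypothetical sentence standing in for a corpus.
tokens = "he ran out of time and out of luck".split()
hits = count_phrase(tokens, "out of")
print(hits)

# Hypothetical figures: 49,000 hits in a 100-million-word corpus.
rate = per_million(49_000, 100_000_000)
print(rate)
```

Treating a phrase as one item means deciding in advance which phrases to look for, which is the selection question the slide is asking.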

Selection criteria – a new headword or in the family?

Only mega-headwords (that cover all meaning senses)

Inflections only? - Plurals, verb forms, -er / -est adjectives. Keep them all together? If not, where do low frequency derivatives go?

USE uses using used user users useful useless usefulness usefully usable misused misuse misusing misuses misuser misusers uselessness uselessly unused usability reuse reuses reused reusing unusable

Derivatives in the family or as a new headword?

interest, interesting, interested, disinterested, interestingly

Homographs with different meaning senses – book, bank, bat, bill

Nuances – a brain, to brain someone

Phrasal verbs – bring down, bring back, bring up, bring over

Compound words – handbag, policeman, airflow, birdwatching

Multi-word units? – traffic light, lunch box, all right, by and large

Slide16

Selection – where to put derivatives?

Level 1: A different form is a different word. Capitalization is ignored.

Level 2: Regularly inflected words are part of the same family. The inflectional categories are - plural; third person singular present tense; past tense; past participle; -ing; comparative; superlative; possessive.

Level 3: -able, -er, -ish, -less, -ly, -ness, -th, -y, non-, un-, all with restricted uses.

Level 4: -al, -ation, -ess, -ful, -ism, -ist, -ity, -ize, -ment, -ous, in-, all with restricted uses.

Level 5: -age (leakage), -al (arrival), -ally (idiotically), -an (American), -ance (clearance), -ant (consultant), -ary (revolutionary), -atory (confirmatory), -dom (kingdom; officialdom), -eer (black marketeer), -en (wooden), -en (widen), -ence (emergence), -ent (absorbent), -ery (bakery; trickery), -ese (Japanese; officialese), -esque (picturesque), -ette (usherette; roomette), -hood (childhood), -i (Israeli), -ian (phonetician; Johnsonian), -ite (Paisleyite; also chemical meaning), -let (coverlet), -ling (duckling), -ly (leisurely), -most (topmost), -ory (contradictory), -ship (studentship), -ward (homeward), -ways (crossways), -wise (endwise; discussion-wise), ante- (anteroom), anti- (anti-inflation), arch- (archbishop), bi- (biplane), circum- (circumnavigate), counter- (counter-attack), en- (encage; enslave), ex- (ex-president), fore- (forename), hyper- (hyperactive), inter- (inter-African, interweave), mid- (mid-week), mis- (misfit), neo- (neo-colonialism), post- (post-date), pro- (pro-British), semi- (semi-automatic), sub- (subclassify; subterranean), un- (untie; unburden).

Level 6: -able, -ee, -ic, -ify, -ion, -ist, -ition, -ive, -th, -y, pre-, re-.

Level 7: Classical roots and affixes.

Slide17
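Level 2 on the previous slide groups regularly inflected forms under one headword. A rough Python sketch of that grouping, using the USE family from the earlier slide, might look like this. The functions `level2_forms` and `group_into_families` are hypothetical, and the spelling rules are simplified assumptions: they cover -s/-es, -ed, -ing and the possessive only, and ignore consonant doubling, words ending in -y or -ee, irregular forms, and comparatives (which would need part-of-speech information).

```python
def level2_forms(headword):
    """Guess the regular noun/verb inflections of a headword."""
    # Drop a final "e" before -ed / -ing (use -> used, using).
    stem = headword[:-1] if headword.endswith("e") else headword
    plural = "es" if headword.endswith(("s", "x", "ch", "sh")) else "s"
    return {
        headword,
        headword + "'s",      # possessive
        headword + plural,    # plural / third person singular
        stem + "ed",          # past tense / past participle
        stem + "ing",         # -ing form
    }

def group_into_families(headwords, observed):
    """Map each headword to the observed forms matching its inflections."""
    seen = set(observed)
    return {head: sorted(level2_forms(head) & seen) for head in headwords}

observed = ["use", "uses", "using", "used", "user", "useful", "misuse", "reused"]
families = group_into_families(["use", "misuse", "reuse"], observed)
for head, members in families.items():
    print(head, members)
```

Under these rules "user" and "useful" stay outside the family, because they are derivatives from the higher levels; where they should go is the open question the slide raises.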

Selection Criteria - How will you deal with … I

Proper nouns: SONY, Dave, Jackson, Thomson, Paris, London

Proper nouns that are words - Bell, Sue, Jack, Nation, Mark

Numbers: 1, one, thirty, twenty-seven, thousand, billion

Acronyms – NATO, DNA, UN, NSA, DARPA

Dialectal differences (e.g. US vs UK spelling)

Multi-word units – post office, train station, city hall

Closed lexical sets such as days of the week, months etc.

Typos – mispelings, heros, amatur, arguement, bellweather

Incomplete words – travelin’, roarin’, ‘cept

Slang forms – gonna, wanna, nuffink, wassup

Slide18

Selection Criteria - How will you deal with … II

Offensive words – pooh, shit, crap, bugger, bastard, fart

Culturally loaded words – temple vs. church, hijab, sporran

Non-PC words – stewardess, waitress, negro, retarded, stupid

NCLB words - beer, alcohol, drugs, tobacco, smoking

Archaic words – thou, thee, thine, groovy, gay

Prototypical sets – words often taught in sets

foods - pizza, apple, cake, bread, salt, tomato, zucchini, eggplant, capsicum

drinks – coffee, tea, juice, water, cola, mojito, screwdriver, bloody Mary

buildings – office, station, hotel, city hall, auditorium, ice rink

shops – supermarket, mall, barber, stationer, grocer

colors – red, blue, green, yellow, pink, violet, scarlet, puce

Slide19

Definitions - What aspects of word knowledge to include?

Definition

POS – how detailed do you want to be?

Translations – how will you deal with translators who disagree?

Example sentence – authentic, contrived?

Usage notes – which ones?

Synonyms

Antonyms

Distractors? (for online test auto-create software)

Slide20

Definitions - style

What style?

e.g. Apple

synonym: fruit

short definition: hard red or green fruit

long definition: the fleshy usually rounded red, yellow or green edible fruit of a usually cultivated tree (genus Malus) of the rose family

Use of a defining vocabulary list? Which one? Which words?

Slide21

Mechanics

Word? Excel?

Specialized database software such as Access or FileMaker?

Versions. Is it important to know which version of your wordlist was given to which users?

Do you have the time and patience?

SERIOUSLY. Do you have the time and patience?

Slide22

Validating your wordlist

How will you evaluate the list’s integrity?

How will you check if you missed words?

How will you check for mis-levelled words?

How will you check consistency of definitions, examples, translations?

Slide23
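Some of the validation questions on the previous slide can be partly automated. The Python sketch below shows three simple checks on a levelled list: duplicate entries, entries with no definition, and words from a reference text that are missing from the list. The row layout `(word, level, definition)`, the function name `validate` and the sample data are all hypothetical. Checking for mis-levelled words is left out because it needs outside evidence such as frequency ranks.

```python
from collections import Counter

# Hypothetical wordlist rows: (word, level, definition).
entries = [
    ("use", 1, "to do something with a tool"),
    ("useful", 1, ""),
    ("bank", 2, "a place that keeps money"),
    ("bank", 3, "the side of a river"),
]

def validate(entries, reference_words):
    """Return simple integrity problems found in a levelled wordlist."""
    counts = Counter(word for word, _, _ in entries)
    return {
        # The same word entered more than once, perhaps at different levels.
        "duplicates": sorted(w for w, n in counts.items() if n > 1),
        # Entries whose definition is empty or only whitespace.
        "missing_definition": sorted(w for w, _, d in entries if not d.strip()),
        # Reference words that the list does not contain.
        "not_in_list": sorted(set(reference_words) - set(counts)),
    }

problems = validate(entries, reference_words=["use", "bank", "river"])
for check, words in problems.items():
    print(check, words)
```

A flagged duplicate such as "bank" is not necessarily an error; it may be a deliberate split of two meaning senses, so the output is a list of items for a person to review.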

And soooooo much more!

Questions?

If you want help - waring.rob@gmail.com