Multi-language CASCOT Margaret Birch and Ritva Ellison

Uploaded On 2018-03-10



Presentation Transcript

Slide1

Multi-language CASCOT

Margaret Birch and Ritva Ellison

Institute for Employment Research

Slide2

Computer Assisted Structured Coding Tool (CASCOT)

Software tool for coding text automatically or manually

Developed at the Institute for Employment Research at Warwick University from 1993 onwards

Used by over 100 organisations in the UK and abroad

Slide3

IER was contracted under the DASISH project to develop a multilingual version of CASCOT to code job titles to ISCO-08

A large task and limited resources, so this is a pilot project

The 8 selected languages:
- Dutch (Netherlands, Flemish-Belgium)
- English
- Finnish
- French (France, Walloon-Belgium, Switzerland)
- German (Germany, Austria, Switzerland)
- Italian
- Slovak
- Spanish

Slide4

Key Tasks

- Translating Cascot user interface texts
- Constructing national language versions of the ISCO-08 structure for Cascot
- Indexing job titles in the selected languages to ISCO-08
  - Some supplied by NSIs or other partners
  - Some found by exploring relevant national websites
- Validating the software using raw data files from the European Social Survey (ESS) Round 6
- Testing Cascot multilingual software
- Developing language-based coding rules
- Using Cascot Performance Tool to fine-tune the software

Slide5

Coding with Cascot

Enter text (could be from a file)

Cascot recommends a code, but the user can change it

Output can be directed to a file

Selected classification

Slide6

Multi-language Cascot

8 languages available: Dutch, English, Finnish, French, German, Italian, Slovak and Spanish

Cascot detects the language automatically, but it can be changed from the menu

An ISCO-08 classification exists for each country (some with national code)

Slide7

Coding in Dutch

Slide8

Finnish

Slide9

French

Slide10

German*

* The index is © Federal Employment Agency

Slide11

Italian

Slide12

Slovak

Slide13

Spanish

Slide14

A test of multi-language Cascot

Comparison of the European Social Survey round 6 code and the automatic Cascot code

Data available from DE, ES, GB and NL

ISCO-08

Slide15

Cascot Performance Tool

Allows the user to analyse the performance of Cascot by comparing manually coded data with the codes produced by Cascot for the same data.

A delimited results file is needed that contains a reference code, a Cascot code and a Cascot score.

The Tool shows a Performance Results Display window with Performance Graph, Summary, Statistics and Key

Slide16
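The comparison that the Performance Tool is described as making can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Tool's actual implementation: the column names `reference`, `cascot` and `score`, the comma delimiter, the helper names and the sample rows are all hypothetical, and the codes are placeholders rather than real ISCO-08 codes.

```python
import csv
import io


def read_results(text, delimiter=","):
    """Parse a delimited results file into a list of dicts.

    Assumed (hypothetical) columns: reference, cascot, score.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [
        {
            "reference": row["reference"],
            "cascot": row["cascot"],
            "score": int(row["score"]),
        }
        for row in reader
    ]


def agreement_by_threshold(rows, thresholds):
    """For each score threshold, return (threshold, coded_share, agreement).

    coded_share: fraction of all records with score >= threshold.
    agreement:   fraction of those records where the Cascot code
                 equals the reference code.
    """
    total = len(rows)
    results = []
    for t in thresholds:
        accepted = [r for r in rows if r["score"] >= t]
        matches = sum(1 for r in accepted if r["reference"] == r["cascot"])
        coded_share = len(accepted) / total if total else 0.0
        agreement = matches / len(accepted) if accepted else 0.0
        results.append((t, coded_share, agreement))
    return results


# Made-up sample data with placeholder codes, for illustration only.
SAMPLE = """reference,cascot,score
C01,C01,85
C02,C02,62
C03,C04,41
C05,C05,39
C06,C07,20
"""

rows = read_results(SAMPLE)
summary = agreement_by_threshold(rows, [0, 40, 60])
for threshold, coded_share, agreement in summary:
    print(f"score >= {threshold}: coded {coded_share:.0%}, agreement {agreement:.0%}")
```

Raising the threshold typically trades coverage for agreement: fewer records are accepted automatically, but a larger share of them match the reference code.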

Opening a results file

Slide17

Performance Results Display

The longer the green line stays high, the better

The further to the right the purple/blue lines are, the better

Slide18

Fine-tuning multi-language Cascot

The versions in different languages could be improved by developing coding rules

Contribution needed from experts who know the language

Rules are developed with Cascot Editor

Slide19

Cascot Editor

Classification files for Cascot are created and modified with the Editor

Each classification has Structure, Index and Rules for coding

Slide20

Cascot Editor Rules

- Downgraded words: words that are considered to be significantly less important than other words, e.g. deputy, junior, person
- Equivalent word ends: wait|er, wait|ress
- Abbreviations: asst → assistant, fe → further education
- Replacement words: taylor → tailor, tesco → supermarket
- Omitting noise words, e.g. replace ‘part-time’ with nothing
- Input modifications: used when the rule absolutely cannot be made elsewhere
- Word alternatives: words and phrases that should also be tried as possible solution candidates
- Conclusions: retired → cannot conclude, agent → ambiguous (score 39)
- Default coding: a set of words and phrases that should be scored as though they were a different word or phrase

Slide21
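Several of the rule types listed for Cascot Editor amount to text normalisation before matching. The sketch below shows one way such rules could be applied to a job title, using only the example words from the slide. It is an assumption-laden illustration, not Cascot's own rule engine: the function names, the order in which rules are applied and the downgrade weight of 0.2 are all hypothetical.

```python
import re

# Example entries taken from the slide; the data structures are hypothetical.
ABBREVIATIONS = {"asst": "assistant", "fe": "further education"}
REPLACEMENTS = {"taylor": "tailor", "tesco": "supermarket"}
NOISE_PHRASES = ["part-time"]
DOWNGRADED = {"deputy", "junior", "person"}


def normalise(text):
    """Lower-case the text, drop noise phrases, then expand abbreviations
    and apply replacement words. Returns a list of words.

    The order of these steps is an assumption made for this sketch.
    """
    text = text.lower()
    for phrase in NOISE_PHRASES:
        text = text.replace(phrase, " ")
    words = re.findall(r"[a-z]+", text)
    out = []
    for word in words:
        word = ABBREVIATIONS.get(word, word)
        word = REPLACEMENTS.get(word, word)
        # An expansion such as "further education" yields two words.
        out.extend(word.split())
    return out


def weight_words(words, downgraded_weight=0.2):
    """Attach a weight to each word; downgraded words count for less.

    The weight value is hypothetical and chosen only for illustration.
    """
    return [(w, downgraded_weight if w in DOWNGRADED else 1.0) for w in words]


print(normalise("Part-time Asst Taylor"))
print(weight_words(normalise("Junior FE lecturer")))
```

Keeping each rule type in its own table mirrors the way the slide lists them separately, so a language expert could extend one table without touching the others.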

Example of a new rule - English

The problem:

Add two new Replacement Words rules:

The result:

Slide22

Potential for rules - German

German occupational titles were coded fully automatically with Cascot and the result was compared with an approved code. Above are some examples where rules would improve Cascot coding performance.

It is helpful to have “gold standard” files with a large number of real-life job titles for which experts have assigned correct codes.

The Cascot coding result can be compared with the “gold standard” to find areas for improvement.
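One possible way to look for areas for improvement in such a comparison is to count which pairs of gold-standard code and Cascot code disagree most often, keeping one example title per pair. The sketch below assumes records of the form (job title, gold code, Cascot code); the function name, the record layout and the sample data are hypothetical, and the titles and codes are placeholders.

```python
from collections import Counter


def disagreement_pairs(records):
    """Return (gold_code, cascot_code, count, example_title) tuples for
    records where the two codes differ, most frequent pair first."""
    pairs = Counter()
    examples = {}
    for title, gold, cascot in records:
        if gold != cascot:
            key = (gold, cascot)
            pairs[key] += 1
            # Keep the first title seen as an example for this pair.
            examples.setdefault(key, title)
    return [
        (gold, cascot, count, examples[(gold, cascot)])
        for (gold, cascot), count in pairs.most_common()
    ]


# Made-up placeholder records, for illustration only.
records = [
    ("title one", "C01", "C01"),
    ("title two", "C02", "C03"),
    ("title three", "C02", "C03"),
    ("title four", "C04", "C05"),
]

report = disagreement_pairs(records)
for gold, cascot, count, example in report:
    print(f"gold {gold} vs Cascot {cascot}: {count} case(s), e.g. '{example}'")
```

Frequent pairs point to the titles where a new coding rule is most likely to pay off.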