Slide1
Multi-language CASCOT
Margaret Birch and Ritva Ellison
Institute for Employment Research
Slide2
Computer Assisted Structured Coding Tool (CASCOT)
Software tool for coding text automatically or manually
Developed at the Institute for Employment Research at Warwick University, 1993-
Used by over 100 organisations in the UK and abroad
Slide3
IER was contracted under the DASISH project to develop a multilingual version of CASCOT to code job titles to ISCO-08
This is a large task with limited resources, so it is a pilot project
The 8 selected languages:
- Dutch (Netherlands, Flemish-Belgium)
- English
- Finnish
- French (France, Walloon-Belgium, Switzerland)
- German (Germany, Austria, Switzerland)
- Italian
- Slovak
- Spanish
Slide4
Key Tasks
- Translating Cascot user interface texts
- Constructing national language versions of the ISCO-08 structure for Cascot
- Indexing job titles in the selected languages to ISCO-08
  - Some supplied by NSIs or other partners
  - Some found by exploring relevant national websites
- Validating the software using raw data files from the European Social Survey (ESS) Round 6
- Testing Cascot multilingual software
- Developing language-based coding rules
- Using Cascot Performance Tool to fine-tune the software
Slide5
Coding with Cascot
Enter text (could be from a file)
Cascot recommends a code, but the user can change it
Output can be directed to a file
Selected classification
Slide6
Multi-language Cascot
8 languages available: Dutch, English, Finnish, French, German, Italian, Slovak and Spanish
Cascot detects the language automatically, but it can be changed from the menu
An ISCO-08 classification exists for each country (some with a national code)
Slide7
Coding in Dutch
Slide8
Finnish
Slide9
French
Slide10
German *
* The index is © Federal Employment Agency
Slide11
Italian
Slide12
Slovak
Slide13
Spanish
Slide14
A test of multi-language Cascot
Comparison of the European Social Survey Round 6 code and the automatic Cascot code
Data available from DE, ES, GB and NL
ISCO-08
Slide15
Cascot Performance Tool
Allows the user to analyse the performance of Cascot by comparing manually coded data with the code produced by Cascot for the same data.
A delimited results file is needed that contains a reference code, the Cascot code and the Cascot score.
The Tool shows a Performance Results Display window with Performance Graph, Summary, Statistics and Key
Slide16
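To make the results file idea concrete, here is a minimal Python sketch of the kind of comparison such a file allows. It is not the Performance Tool's own calculation. The column order (reference code, Cascot code, Cascot score), the tab delimiter, the sample rows and the function names are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical results file: reference code, Cascot code, Cascot score.
# Column order, delimiter and all values are invented for illustration.
SAMPLE = "\n".join([
    "2221\t2221\t95",
    "5131\t5131\t80",
    "2341\t2342\t45",
    "5223\t5223\t60",
    "7531\t7533\t39",
])


def read_results(text, delimiter="\t"):
    """Parse delimited text into dicts with reference, cascot and score."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [
        {"reference": ref.strip(), "cascot": code.strip(), "score": int(score)}
        for ref, code, score in reader
    ]


def agreement_by_threshold(rows, thresholds):
    """For each score threshold, return (records kept, agreement rate).

    A record is kept if its score is at least the threshold; the rate is
    the share of kept records whose Cascot code equals the reference code.
    """
    summary = {}
    for t in thresholds:
        kept = [r for r in rows if r["score"] >= t]
        matches = sum(1 for r in kept if r["reference"] == r["cascot"])
        summary[t] = (len(kept), matches / len(kept) if kept else None)
    return summary


rows = read_results(SAMPLE)
summary = agreement_by_threshold(rows, [0, 40, 60, 80])
for threshold, (kept, rate) in summary.items():
    print(threshold, kept, rate)
```

Raising the threshold keeps fewer records but, in this made-up sample, a larger share of them agree with the reference code. That trade-off is what a comparison of manual and automatic codes is meant to expose.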
Opening a results file
Slide17
Performance Results Display
The longer the green line stays high, the better
The further to the right the purple/blue lines are, the better
Slide18
Fine-tuning multi-language Cascot
The versions in different languages could be improved by developing coding rules
Contribution needed from experts who know the language
Rules are developed with Cascot Editor
Slide19
Cascot Editor
Classification files for Cascot are created and modified with the Editor
Each classification has a Structure, an Index and Rules for coding
Slide20
Cascot Editor Rules
- Downgraded words: words that are considered to be significantly less important than other words, e.g. deputy, junior, person
- Equivalent word ends: wait|er, wait|ress
- Abbreviations: asst -> assistant, fe -> further education
- Replacement words: taylor -> tailor, tesco -> supermarket
- Omitting noise words, e.g. replace ‘part-time’ with nothing
- Input modifications: used when the rule absolutely cannot be made elsewhere
- Word alternatives: words and phrases that should also be tried as possible solution candidates
- Conclusions: retired cannot conclude, agent ambiguous (score 39)
- Default coding: a set of words and phrases that should be scored as though they were a different word or phrase
Slide21
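As a rough illustration of how abbreviation, replacement, noise-word and downgraded-word rules might act on input text, here is a small Python sketch. It is not how Cascot implements its rules. The rule entries come from the examples on the Editor Rules slide, but the function names, the order in which rules are applied and the word-matching logic are assumptions.

```python
import re

# Rule tables built from the slide's examples; the data structures are hypothetical.
ABBREVIATIONS = {"asst": "assistant", "fe": "further education"}
REPLACEMENTS = {"taylor": "tailor", "tesco": "supermarket"}
NOISE_PHRASES = ["part-time"]
DOWNGRADED = {"deputy", "junior", "person"}


def normalise(text):
    """Lower-case the text, drop noise phrases, then expand and replace words."""
    text = text.lower()
    for phrase in NOISE_PHRASES:
        text = text.replace(phrase, " ")  # replace noise with nothing
    words = []
    for word in re.findall(r"[a-z]+", text):
        word = ABBREVIATIONS.get(word, word)
        word = REPLACEMENTS.get(word, word)
        words.extend(word.split())  # an expansion may yield several words
    return words


def split_by_weight(words):
    """Separate ordinary words from downgraded (less important) words."""
    key = [w for w in words if w not in DOWNGRADED]
    minor = [w for w in words if w in DOWNGRADED]
    return key, minor


print(normalise("Part-time Asst at Tesco"))
print(split_by_weight(normalise("Junior FE lecturer")))
```

In this sketch "Part-time Asst at Tesco" reduces to the words assistant, at, supermarket, and "junior" is set aside as a downgraded word, so that a later matching step could give it less weight.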
Example of a new rule - English
The problem:
Add two new Replacement Words rules:
The result:
Slide22
Potential for rules - German
German occupational titles were coded fully automatically with Cascot and the result was compared with an approved code. Above are some examples where rules would improve Cascot coding performance.
It is helpful to have “gold standard” files with a large number of real-life job titles for which experts have assigned correct codes.
The Cascot coding result can be compared with the “gold standard” to find areas for improvement.
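One way to look for areas for improvement is to count which pairs of gold-standard code and automatic code disagree most often, then inspect the job titles behind them. The Python sketch below assumes records of the form (job title, gold-standard code, Cascot code). The titles and codes are placeholders invented for illustration, and the function name is hypothetical.

```python
from collections import Counter

# Placeholder records: (job title, gold-standard code, Cascot code).
# Titles and codes are invented; they are not taken from any real index.
records = [
    ("title a", "2341", "2342"),
    ("title b", "2341", "2342"),
    ("title c", "5131", "5131"),
    ("title d", "5131", "5132"),
    ("title e", "2221", "2221"),
]


def disagreement_report(records, top=3):
    """Return the most frequent (gold, automatic) mismatches with their titles."""
    pair_counts = Counter()
    titles = {}
    for title, gold, auto in records:
        if gold != auto:
            pair_counts[(gold, auto)] += 1
            titles.setdefault((gold, auto), []).append(title)
    return [(pair, n, titles[pair]) for pair, n in pair_counts.most_common(top)]


report = disagreement_report(records)
for (gold, auto), count, examples in report:
    print(f"gold {gold} coded as {auto}: {count} case(s), e.g. {examples}")
```

The job titles listed for the most frequent mismatches are natural candidates for new coding rules, to be reviewed by experts who know the language.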