Slide1
PolyAnalyst Web Report Training
© 2014 Megaputer Intelligence Inc.
PolyAnalyst Dictionaries
Slide2
Outline
Why do we use Dictionaries?
Dictionaries are essential to good Text Mining
Slide3
Changes In PolyAnalyst Dictionary
Old Dictionary
Companies
GeoAdministrative
Human Names
Morphology
Organizations
Phrases
Semantics
Statistics
Synonyms
Word classes
Spell Checks
Stop Lists
Word Lists
Old Dictionary Split into Multiple Parts
Slide4
Dictionaries
New Dictionary
Companies
GeoAdministrative
Human Names
Morphology
Organizations
Phrases
Semantics
Statistics
Synonyms
Word classes
Spell Checks
Entity Extraction
Sentiment Analysis
Multiple Nodes
Stop Lists
Word Lists
Keyword Extraction
Slide5
Statistics Dictionary
Statistics
Slide6
Statistics Dictionary
Keyword Extraction computes Significance from the base word frequencies in the Statistics Dictionary.
Slide7
Improving Keyword Extraction
The default Statistics Dictionary is built from a large corpus of text and estimates word frequency in typical English.
Your data might not be typical.
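As a rough illustration of the idea, a word can be scored by comparing how often it appears in your data with its base frequency in a reference table. The log-ratio score, the `base_freq` numbers, and the `significance` function below are assumptions made for this sketch; the slides do not state the formula PolyAnalyst uses.

```python
import math
from collections import Counter

# Hypothetical base frequencies (relative frequency in "typical English").
# The numbers are made up for illustration.
base_freq = {"the": 0.05, "patient": 0.0002, "placebo": 0.000005}

def significance(tokens, base_freq, floor=1e-7):
    """Score each word by log(observed relative frequency / base frequency)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    scores = {}
    for word, n in counts.items():
        observed = n / total
        expected = base_freq.get(word, floor)  # words missing from the table get a small floor
        scores[word] = math.log(observed / expected)
    return scores

tokens = "the patient the placebo the placebo".split()
scores = significance(tokens, base_freq)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

A word that is rare in the reference table but frequent in your data gets a high score, which is why the choice of reference corpus matters.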
Slide8
Domain Specific Statistics Dictionaries
With the default dictionary, the most significant word in PubMed medical abstracts is “Placebo”.
“Placebo” is a common word in clinical drug trials and is not a helpful keyword in this domain.
Slide9
Domain Specific Statistics Dictionaries
Train the Statistics Dictionary on Domain Data
Statistics Dictionary
Apply on our data
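A minimal sketch of what training on domain data amounts to, assuming base frequencies are simple relative word counts: once the frequencies come from the domain corpus itself, a word that is common throughout the domain no longer stands out. The `train_base_freq` helper and the tiny corpus are hypothetical; in PolyAnalyst the dictionary is generated inside the product.

```python
from collections import Counter

def train_base_freq(documents):
    """Estimate relative word frequencies from a list of domain documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

# Hypothetical domain corpus in which "placebo" appears in every document.
domain_docs = [
    "placebo group showed no change",
    "drug group improved versus placebo",
    "placebo response was low",
]
domain_freq = train_base_freq(domain_docs)
print(domain_freq["placebo"])
```

Because "placebo" now has a high base frequency, any score that divides by the base frequency will rank it lower than before.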
Slide10
Editing a Dictionary
All Dictionaries are edited in the Dictionary Manager.
Go to File -> Manage Dictionaries or press Ctrl+D
Slide11
Setting Default Dictionaries
Go to Settings -> Program options -> Project options
Slide12
Setting Default Dictionaries
Select Default Dictionaries for the project
Slide13
Training a Statistics Dictionary
The Statistics Dictionary is generated in the Index Node
Slide14
Training a Statistics Dictionary
Go to Generate -> Statistic Dictionary
Slide15
Statistics Dictionaries
In the Keyword Extraction Node, select the Statistics Dictionary
Slide16
Statistics Dictionaries
Updated keywords from new dictionaries
Slide17
Multiple Nodes Dictionaries
Synonyms
Spell Checks
Stop Lists
Multiple Nodes
Slide18
Spell Checks Dictionary
Spell Checks
Slide19
Good Spell Check Practices
Editing the default Spell Checks dictionary is not recommended if you work in a group.
Create a project Spell Check dictionary or a personal user dictionary instead.
Slide20
Create New Dictionary
Creating a Spell Checks Dictionary
Dictionary Manager
Slide21
Inherit Default Dictionaries
Creating a Spell Checks Dictionary
Slide22
Editing Spell Checks Dictionary
Improving the Spell Checks Dictionary from within the Spell Check node.
Slide23
Editing Spell Checks Dictionary
Select the Proper Dictionary
Slide24
Spell Checks Dictionary
Green color shows suggested correction.
Slide25
Spell Checks Dictionary Coding
Blue = Known Misspell from Dictionary (Confidence = 100%)
Black = Probable Misspell from Algorithm (Confidence > Threshold)
Grey = Suggested Misspell from Algorithm (Confidence < Threshold)
Empty = Unknown Misspell (Confidence = 0)
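The color coding above can be read as a small decision rule. The sketch below is only an illustration of that rule; the function name `misspell_color` and its parameters are hypothetical, and treating a confidence exactly equal to the threshold as "Black" is an assumption, since the slide does not say which side the boundary falls on.

```python
def misspell_color(confidence, threshold, in_dictionary=False):
    """Map a spell-check result to the display color described on the slide.

    confidence and threshold are fractions between 0 and 1.
    """
    if in_dictionary:
        return "Blue"    # known misspell from the dictionary (confidence = 100%)
    if confidence == 0:
        return "Empty"   # unknown misspell, the algorithm has no suggestion
    if confidence >= threshold:
        return "Black"   # probable misspell from the algorithm (assumption: ties count as Black)
    return "Grey"        # suggested misspell from the algorithm

print(misspell_color(0.9, 0.7))
```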
Slide26
Improving Spell Checks Dictionary
Case 1) Correcting a misspell
The Spell Check algorithm is baffled.
From context we can infer the word is “commitment.”
Slide27
Improving Spell Checks Dictionary
Case 1) Correcting a misspell
Select the word and click the Add button
Slide28
Improving Spell Checks Dictionary
Case 1) Correcting a misspell
Write the corrected word and click OK
Slide29
Improving Spell Checks Dictionary
Case 2) To add a new word to the Spell Check dictionary
Right Click -> Mark as known Word
Slide30
Improving Spell Checks Dictionary
Case 2) To add a new word to the Spell Check dictionary
The new word will turn red and be added to the dictionary.
Slide31
Improving Text Mining through Synonyms
Synonyms
Slide32
Improving Text Mining through Synonyms
Many PDL functions make use of Relationships within the dictionary.
Synonym is the most common relationship.
Slide33
Dictionary Synonyms
Slide34
Edit Dictionary Synonyms Manually
Slide35
Import Dictionary Synonyms List
Synonym List
Import Dialog
Slide36
Dictionary Synonyms PDL
The thesaurus function matches all synonyms of a token.
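The PDL syntax itself is not shown in this transcript, so the following is only a Python sketch of the idea: a query token is expanded to its synonym group, and a text matches if it contains any member of that group. The names `SYNONYM_GROUPS`, `synonyms_of`, and `thesaurus_match` are hypothetical and are not PolyAnalyst functions.

```python
# Hypothetical synonym groups; a real Synonyms dictionary would be much larger.
SYNONYM_GROUPS = [
    {"car", "automobile", "auto"},
    {"aviary", "birdcage"},
]

def synonyms_of(token):
    """Return the token together with all of its listed synonyms."""
    token = token.lower()
    for group in SYNONYM_GROUPS:
        if token in group:
            return set(group)
    return {token}

def thesaurus_match(token, text):
    """True if the text contains the token or any of its synonyms."""
    words = set(text.lower().split())
    return bool(synonyms_of(token) & words)

print(thesaurus_match("car", "my automobile would not start"))
```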
Slide37
Dictionary Synonyms PDL
Slide38
Dictionary Synonyms PDL
Slide39
Stop List Dictionary
The Stop List Dictionary is a list of terms to ignore in Text Analysis.
By default, Keyword Extraction excludes terms that are in the stop list.
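A minimal sketch of stop-list filtering, assuming simple whitespace tokenization. The `STOP_LIST` contents and the `keyword_counts` helper are hypothetical and only illustrate the behavior described above.

```python
from collections import Counter

# Hypothetical stop list; the real Stop Lists dictionary is maintained in the product.
STOP_LIST = {"the", "a", "is", "of", "and", "was"}

def keyword_counts(text, stop_list=STOP_LIST):
    """Count terms in the text, ignoring any term that appears in the stop list."""
    tokens = text.lower().split()
    return Counter(t for t in tokens if t not in stop_list)

counts = keyword_counts("The product was left in the freezer")
print(counts.most_common())
```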
Slide40
Stop List Dictionary
Stop Lists
Slide41
Stop List Dictionary
Import Dialog
Slide42
Morphology Dictionary
Morphology
Slide43
Morphology Dictionary
Lemma: Abdomen
Abdomen = Singular
Abdomen’s = Singular Possessive
Abdomens = Plural
Abdomens’ = Plural Possessive
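The lemma table above can be pictured as a lookup from each surface form back to its lemma and form label. This is a sketch under that assumption; `MORPHOLOGY`, `FORM_INDEX`, and `lemmatize` are hypothetical names and say nothing about how PolyAnalyst stores the dictionary.

```python
# Hypothetical morphology entry based on the "Abdomen" example.
MORPHOLOGY = {
    "abdomen": {
        "abdomen": "Singular",
        "abdomen's": "Singular Possessive",
        "abdomens": "Plural",
        "abdomens'": "Plural Possessive",
    },
}

# Invert the table so each surface form points to (lemma, form label).
FORM_INDEX = {
    form: (lemma, label)
    for lemma, forms in MORPHOLOGY.items()
    for form, label in forms.items()
}

def lemmatize(word):
    """Return (lemma, form label) for a known form, or (word, None) otherwise."""
    key = word.lower().replace("’", "'")  # normalize curly apostrophes
    return FORM_INDEX.get(key, (key, None))

print(lemmatize("Abdomens"))
```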
Slide44
Semantics Dictionary
Slide45
Semantics Dictionary
Dictionary Relationships
Hyponyms
Hypernyms
Meronyms
Holonyms
Synonyms
Antonyms
Slide46
Hyponyms and Hypernyms
“Cardinal”, “Eagle”, and “Ostrich” are all hyponyms of “Bird”
“Bird” is a hypernym of “Cardinal”
Slide47
Meronyms and Holonyms
“Feather” is a meronym of “Cardinal”
“Cardinal” is a holonym of “Feather”
Meronym = Is Part Of
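These relationships come in inverse pairs, which a small lookup table makes concrete. The sketch below uses only the bird examples from the slides; the table names and helper functions are hypothetical.

```python
# Hypothetical relationship tables built from the bird examples.
HYPERNYM_OF = {"cardinal": "bird", "eagle": "bird", "ostrich": "bird"}  # hyponym -> hypernym
PART_OF = {"feather": "cardinal"}  # meronym -> holonym

def hyponyms(word):
    """All words whose hypernym is the given word."""
    return sorted(child for child, parent in HYPERNYM_OF.items() if parent == word)

def holonym(part):
    """The whole that the given part belongs to, if one is listed."""
    return PART_OF.get(part)

def meronyms(whole):
    """All listed parts of the given whole."""
    return sorted(part for part, w in PART_OF.items() if w == whole)

print(hyponyms("bird"))
print(holonym("feather"))
```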
Slide48
Synonyms and Antonyms
“Birdcage” is a synonym of “Aviary”
“Heat” is an antonym of “Cold”
Slide49
PolyAnalyst Dictionaries
Companies
GeoAdministrative
Human Names
Organizations
Word classes
Entity Extraction
Sentiment Analysis
Slide50
Adding Word Classes
Step 1) Create a CSV File
Vertical Entry
Horizontal Entry
Slide51
Adding Word Classes
Step 2) Create a New Dictionary
Slide52
Dictionary Import Screen
Step 3) Name the Dictionary
The inherit option clones the inherited dictionaries
Slide53
Dictionary Import Screen
Step 4) Import CSV as Word class
Slide54
New Word Class
Slide55
Use in a Lingua Mark Expression
{<,P(1)> <Temperatures,PL(SP)>:@}:Temp
Slide56
The high for Wednesday is 105 degrees
Room temperature is about 25 C
The product was left in the freezer at 11 F
75 Fahrenheit is a comfortable temperature
{<,P(1)> <Temperatures,PL(SP)>:@}:Temp
Extracted Temperature
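The meaning of `P(1)` and `PL(SP)` is not explained in this transcript. Judging only from the four example sentences, the expression appears to capture a number followed by a member of the Temperatures word class. Under that assumption, a rough Python analogue using a regular expression might look like this; the `TEMPERATURES` list and `extract_temperatures` function are hypothetical and are not Lingua Mark.

```python
import re

# Hypothetical contents of the "Temperatures" word class.
TEMPERATURES = ["degrees", "degree", "Fahrenheit", "Celsius", "C", "F"]

# A number, one space, then a word-class member matched as a whole word.
# Longer alternatives are listed first so "degrees" is tried before "degree".
PATTERN = re.compile(
    r"\b(\d+(?:\.\d+)?) (" + "|".join(map(re.escape, TEMPERATURES)) + r")\b"
)

def extract_temperatures(text):
    """Return (value, unit) pairs found in the text."""
    return [(float(value), unit) for value, unit in PATTERN.findall(text)]

sentences = [
    "The high for Wednesday is 105 degrees",
    "Room temperature is about 25 C",
    "The product was left in the freezer at 11 F",
    "75 Fahrenheit is a comfortable temperature",
]
for sentence in sentences:
    print(extract_temperatures(sentence))
```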
Slide57
Word Classes that Convey Sentiment
Sentiment analysis relies heavily on word classes that convey sentiment.
Slide58
Word Classes that Convey Sentiment
Default Word class Dictionary
absbadadj: Accursed, Awful, Terrible
badadv: Badly, Immorally, Irresponsibly
goodadj: Accommodating, Accurate, Adequate
Sentiment Word Classes convey Polarity, Part of Speech, Degree
Slide59
Sentiment Word Classes
Sentiment Word Classes are customizable with domain-specific additions such as slang and emoticons.
:D ;( ;)
Slide60
Word Lists Dictionary
Word Lists are an older form of word classes
Lists of associated words
Default Word Lists are “Positive” and “Negative” and are used for Sentiment Analysis
Slide61
Word Lists Dictionary
Positive Word List
Slide62
Using Word Lists
In the Taxonomy Node, use the Term Function
Slide63
Phrases Dictionary
The Phrases Dictionary is similar to Word Lists but its entries are multiple words, or “Phrases”
Slide64
Other Dictionaries
Companies
GeoAdministrative
Human Names
Organizations
Entity Extraction
Sentiment Analysis
Slide65
Default Entity Extraction
People: “Leader Alvaro Hernandez”, “Bill Martin”
Companies: “Blue Shield of California”, “Global Systems Inc.”
GeoAdministrative: “Tucson Arizona”, “Ecuador”
Units: “Second”, “Meter”, “Degree”
Slide66
Dictionaries are essential to good Text Mining
Slide67
Contacting Megaputer
Questions?