Presentation Transcript

1. Supervised, Semi-supervised and Unsupervised Approaches for Word Sense Disambiguation
Slides by Arindam Chatterjee & Salil Joshi
Under the guidance of Prof. Pushpak Bhattacharyya
May 01, 2010

2. Roadmap
- Bird's Eye View
- Supervised Approaches
- Semi-supervised Approaches
- Unsupervised Approaches
- Summary

3. Bird's Eye View
- [Diagram: supervised, semi-supervised, unsupervised and hybrid approaches to WSD]

4. Supervised Approaches

5. Supervised Approaches
- Training phase: a model is trained from sense-annotated training instances (words), where the classes are the senses, e.g. Class 1 = sense 1 (water, river), Class 2 = sense 2 (money, finance), Class 3 = sense 3 (blood, plasma).
- Testing phase: a new instance is classified into one of the sense classes based on its feature vector.

6. Feature Vector for WSD
In supervised WSD, the feature vector typically consists of four features:
- Feature 1: Part of speech (POS) of the target word w.
- Feature 2: Semantic and syntactic features of w.
- Feature 3: Collocation vector (set of words around w), typically the next word (+1), the next-to-next word (+2), the words at -1 and -2, and their POS tags.
- Feature 4: Co-occurrence vector (the number of times each word occurs in a bag of words around w).
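A minimal sketch of how such a feature vector might be assembled (assuming NLTK and its tokenizer/tagger data are available; the function name and padding token are illustrative, and Feature 2 is omitted):

```python
# Illustrative sketch: building a WSD feature vector for a target word.
# Assumes nltk plus its 'punkt' and 'averaged_perceptron_tagger' data.
import nltk
from collections import Counter

def wsd_features(sentence, target_index, window=2):
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)

    # Feature 1: POS of the target word w
    target_pos = tagged[target_index][1]

    # Feature 3: collocation vector = words at offsets -2, -1, +1, +2 and their POS tags
    colloc = []
    for off in (-2, -1, +1, +2):
        i = target_index + off
        if 0 <= i < len(tagged):
            colloc.extend(tagged[i])           # word and its POS tag
        else:
            colloc.extend(("<pad>", "<pad>"))

    # Feature 4: co-occurrence vector = bag-of-words counts around the target
    lo, hi = max(0, target_index - window), target_index + window + 1
    cooc = Counter(w.lower() for j, w in enumerate(tokens[lo:hi], lo) if j != target_index)

    return {"pos": target_pos, "collocation": colloc, "cooccurrence": cooc}

print(wsd_features("I usually have grilled bass on Sunday", target_index=4))
```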

7. Supervised Approaches
Unifying thread of operation:
- Use of annotated corpora.
- They are all target-word WSD approaches.
- Representation of words as feature vectors.
Algorithms:
- Decision List
- Decision Tree
- Naïve Bayes
- Exemplar Based Approach
- Support Vector Machines
- Neural Networks
- Ensemble Methods

8. 1. Decision Lists
- Based on the 'one sense per collocation' property: nearby words provide strong and consistent clues to the sense of a target word.
- A decision list is an ordered set of if-then-else rules: if (feature X) then sense S_i.
- Each rule is weighted by a score.
- In the training phase the decision list is built from evidence in the corpus.
- In the testing phase, the sense with the highest score wins.

9. 1. Decision Lists (contd.)
Training phase, for a particular word:
- Features are extracted from the corpus.
- An ordered decision list of the form {feature-value, sense, score} is created.
- The score of a feature f is the log-likelihood ratio of the sense given the feature, i.e. Score(S_i, f) = log( Pr(S_i | f) / Σ_{j≠i} Pr(S_j | f) ).

10. 1. Decision Lists (contd.)
The decision list for the word bank (courtesy Navigli, 2009):

  Feature               Prediction      Score
  account with bank     bank/FINANCE     4.83
  standing in bank      bank/FINANCE     3.35
  bank of blood         bank/SUPPLY      2.48
  work in bank          bank/FINANCE     2.33
  the left river bank   bank/RIVER       1.12
  of the bank                           -0.01

Test sentence: "I went for a walk along the river bank"
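A toy sketch of how such a decision list might be built from per-feature sense counts and then applied to the test sentence; the counts, smoothing constant, and substring matching are invented for illustration and are not from the slides:

```python
import math
from collections import defaultdict

# feature -> {sense: count}; counts are invented for illustration
counts = {
    "account with bank": {"FINANCE": 60, "RIVER": 1, "SUPPLY": 1},
    "river bank":        {"RIVER": 40, "FINANCE": 2, "SUPPLY": 1},
    "bank of blood":     {"SUPPLY": 12, "FINANCE": 1, "RIVER": 1},
}

def build_decision_list(counts, alpha=0.1):
    rules = []
    for feature, by_sense in counts.items():
        total = sum(by_sense.values())
        best = max(by_sense, key=by_sense.get)
        p_best = (by_sense[best] + alpha) / (total + alpha * len(by_sense))
        score = math.log(p_best / (1.0 - p_best))   # log-likelihood ratio
        rules.append((score, feature, best))
    return sorted(rules, reverse=True)              # highest-scoring rule first

def classify(sentence, rules, default="FINANCE"):
    for score, feature, sense in rules:
        if feature in sentence:                     # first matching rule wins
            return sense, score
    return default, 0.0

rules = build_decision_list(counts)
print(classify("I went for a walk along the river bank", rules))  # -> ('RIVER', ...)
```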

11. 2. Support Vector Machines (contd.)
E.g., if a word has 4 senses, one binary SVM is trained per sense (one-vs-rest):

  SVM    A (positive class)    B (rest)
  1      S1                    S2, S3, S4
  2      S2                    S1, S3, S4
  3      S3                    S1, S2, S4
  4      S4                    S1, S2, S3

- The distance from the separating hyperplane gives the confidence score for each SVM.
- The SVM with the highest confidence score determines the winner sense.
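A hedged sketch of this one-vs-rest setup using scikit-learn (assumed available); the training contexts, sense labels, and the bag-of-words features below are invented placeholders, not the features described in the slides:

```python
# One-vs-rest SVMs for a 4-sense word: one binary SVM per sense,
# the sense whose SVM gives the highest decision score wins.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

train_contexts = [
    "opened an account with the bank",      # S1
    "deposited money in the bank",          # S1
    "walked along the river bank",          # S2
    "the muddy bank of the stream",         # S2
    "the blood bank needs donors",          # S3
    "a bank of fog rolled in",              # S4
]
train_senses = ["S1", "S1", "S2", "S2", "S3", "S4"]

vec = CountVectorizer()
X = vec.fit_transform(train_contexts)

clf = OneVsRestClassifier(LinearSVC())      # one binary SVM per sense
clf.fit(X, train_senses)

test = vec.transform(["fishing from the river bank"])
scores = clf.decision_function(test)[0]     # confidence score per sense
print(dict(zip(clf.classes_, scores)))
print("winner sense:", clf.classes_[scores.argmax()])
```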

12. 3. Ensemble Methods
- A collection of classifiers (C1, C2, ..., Cn) is combined to improve the overall accuracy of the WSD system.
- The ensemble components (classifiers) each score the candidate senses (S1, S2, ...), and a score function combines them into Total_Score(S1), Total_Score(S2), ...
- The score function varies from approach to approach.

13. A. Majority Voting
- Each ensemble component votes for one sense of the target word.
- Here the score function is a vote function: the sense with the largest number of votes is selected as the winner sense (see the sketch below).
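A minimal sketch of the vote function; the sense labels are placeholders:

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: list of senses, one vote per ensemble component."""
    return Counter(predictions).most_common(1)[0][0]

# Each classifier votes for one sense of the target word.
print(majority_vote(["S1", "S1", "S2"]))   # -> 'S1'
```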

14. B. Probability Mixture
- The scoring function is a confidence score.
- Each classifier's confidence scores are normalized (here, by that classifier's maximum score).
- The normalized scores are summed up and the sense with the maximum sum is selected as the winner sense.

  Classifier   Sense   Confidence score   Normalized score
  C1           S1      0.6                0.6/0.6 = 1.0
               S2      0.4                0.4/0.6 = 0.7
  C2           S1      0.7                0.7/0.7 = 1.0
               S2      0.3                0.3/0.7 = 0.4
  C3           S1      0.8                0.8/0.8 = 1.0
               S2      0.2                0.2/0.8 = 0.3

  Total_Score(S1) = 1.0 + 1.0 + 1.0 = 3.0
  Total_Score(S2) = 0.7 + 0.4 + 0.3 = 1.4

15. B. Probability Mixture (contd.)
- [Diagram: classifiers C1, C2, C3 assign confidence/normalized scores to S1 and S2 (0.6/1.0, 0.4/0.7; 0.7/1.0, 0.3/0.4; 0.8/1.0, 0.2/0.3); S1 is the winner sense with score 3.0 vs 1.4.]
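A small sketch of this combination rule; normalizing by each classifier's maximum score is one reading of the table above, and the confidence values are taken from that table for illustration:

```python
def probability_mixture(confidences):
    """confidences: list of dicts {sense: confidence score}, one dict per classifier."""
    totals = {}
    for conf in confidences:
        top = max(conf.values())                 # normalize by the classifier's top score
        for sense, score in conf.items():
            totals[sense] = totals.get(sense, 0.0) + score / top
    return max(totals, key=totals.get), totals

# The three classifiers from the table above:
confs = [{"S1": 0.6, "S2": 0.4}, {"S1": 0.7, "S2": 0.3}, {"S1": 0.8, "S2": 0.2}]
print(probability_mixture(confs))   # -> ('S1', {'S1': 3.0, 'S2': ~1.4})
```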

16. C. Rank-based Combination
- The score function is the rank each classifier assigns to each sense.
- The ranks are negated and summed up; the sense with the highest sum wins.

  Classifier   Sense   Rank   Negated rank
  C1           S1      1      -1
               S2      2      -2
  C2           S1      2      -2
               S2      1      -1
  C3           S1      1      -1
               S2      2      -2

  Total_Score(S1) = (-1) + (-2) + (-1) = -4
  Total_Score(S2) = (-2) + (-1) + (-2) = -5

17. C. Rank-based Combination (contd.)
- [Diagram: classifiers C1, C2, C3 assign ranks/negated ranks to S1 and S2 (1/-1, 2/-2; 2/-2, 1/-1; 1/-1, 2/-2); S1 is the winner sense with score -4 vs -5.]

18. Semi-supervised Approaches

19. Semi-supervised Approaches
- Supervised approaches need large amounts of annotated data; semi-supervised approaches use minimal annotated data, so the data requirement is greatly reduced.

20. Semi-supervised Approaches
Unifying thread of operation:
- Use of minimal annotated corpora.
- Use of unannotated data for tuning.
Algorithms:
- Bootstrapping
- Monosemous Relatives

21. 1. Bootstrapping

22. 1. Bootstrapping (contd.)
- An example of Yarowsky's algorithm: at each iteration, new examples are labeled with class a or b and added to the set A of sense-tagged examples. (Courtesy Navigli, 2009)
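A high-level sketch of this self-training loop. The train/predict interfaces, the confidence threshold, and the stopping rule are assumptions for illustration, not the exact procedure from the slides:

```python
def bootstrap(seed_labeled, unlabeled, train, predict, threshold=0.9, max_iter=10):
    """Yarowsky-style self-training sketch.

    seed_labeled: list of (example, sense) pairs (small annotated seed set)
    unlabeled:    list of unannotated examples
    train(pairs)          -> classifier (e.g. a decision list)
    predict(clf, example) -> (sense, confidence)
    """
    labeled = list(seed_labeled)
    remaining = list(unlabeled)
    for _ in range(max_iter):
        clf = train(labeled)
        newly_labeled, still_unlabeled = [], []
        for ex in remaining:
            sense, conf = predict(clf, ex)
            if conf >= threshold:            # keep only confident predictions
                newly_labeled.append((ex, sense))
            else:
                still_unlabeled.append(ex)
        if not newly_labeled:                # nothing confident left: stop
            break
        labeled.extend(newly_labeled)        # grow the sense-tagged set A
        remaining = still_unlabeled
    return train(labeled)
```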

23. Unsupervised Approaches

24. Unsupervised Approaches
- Input data: balls (circles) of different sizes and colors, with no associated background knowledge.
- The implicit features are the size and color of the balls.
- Unsupervised Approach I: clusters formed by clustering on the size of the balls.
- Unsupervised Approach II: clusters formed by clustering on the color of the balls.

25. Hyperlex (1/2)
- Example: graph for the context of the word वीज (electricity/lightning), with nodes such as धन (positive), मुक्तता (discharge), प्रभार (charge), चमक (shine), वादळ (thunder), ऋण (negative), उर्जा (energy), उष्णता (heat), इंधन (fuel), वाफ (steam), ज्वलन (combustion), जनित्र (turbine), निर्माण (produce).
- For each high-density component, the highest-degree node is selected as a hub.
- The procedure is iterated by removing the hub with its neighbors.
- For this example, the hubs will be ज्वलन (combustion) and चमक (shine).
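A sketch of this hub-selection loop on a word co-occurrence graph, using networkx (assumed available); the toy graph below stands in for the real co-occurrence graph built from the corpus, and English glosses are used as node names:

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("combustion", "fuel"), ("combustion", "heat"), ("combustion", "steam"),
    ("combustion", "turbine"), ("shine", "thunder"), ("shine", "charge"),
    ("shine", "discharge"),
])

hubs = []
while G.number_of_nodes() > 0:
    node, degree = max(G.degree, key=lambda nd: nd[1])   # highest-degree node
    if degree == 0:                                      # no dense component left
        break
    hubs.append(node)
    # remove the hub together with its neighbours, then iterate
    G.remove_nodes_from([node] + list(G.neighbors(node)))

print(hubs)   # e.g. ['combustion', 'shine'], one hub per induced sense
```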

26. Hyperlex (2/2)
Example: जनित्रे वाफ वापरून वीज प्रभार निर्माण करतात. (Turbines use steam to produce electricity.)

Scores of the context words for वीज, found using the earlier graph:

  Context word        ज्वलन (combustion)   चमक (shine)
  जनित्र (turbine)       0.70                 0.00
  वाफ (steam)           1.00                 0.00
  निर्माण (produce)      0.55                 0.00
  प्रभार (charge)        0.00                 0.75
  Total                 2.25                 0.75

ज्वलन (combustion) becomes the winner sense in this case.

27. Summary
Supervised algorithms:
- Based on human supervision, hence the name.
- Use corpus evidence instead of relying on knowledge bases.
- Build classifiers to classify words, where the senses are the classes.
Semi-supervised algorithms:
- Use less information than supervised approaches.
- Create the required information as part of the algorithm.
Unsupervised algorithms:
- Cluster instances based on inherent features.

28. Summary (contd.)
Supervised algorithms:
- Perform better than all other approaches, especially knowledge-based ones; e.g. they can pick up clues from components like proper nouns, unlike knowledge-based approaches.
- Depend heavily on large amounts of tagged data and suffer from data sparsity.
Semi-supervised algorithms:
- Tend to partially eradicate the knowledge acquisition bottleneck.
- Work at par with supervised approaches.
Unsupervised algorithms:
- Performance is good for a limited set of target words.

29. References
- AGIRRE, E., AND MARTINEZ, D. Exploring automatic word sense disambiguation with decision lists and the web. In Proc. of COLING-2000 (2000).
- BOSER, B. E., GUYON, I. M., AND VAPNIK, V. N. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory (1992), pp. 144-152.
- COST, S., AND SALZBERG, S. A weighted nearest neighbor algorithm for learning with symbolic features. Machine Learning 10, 1 (1993), 57-78.
- ESCUDERO, G., MARQUEZ, L., AND RIGAU, G. Naive Bayes and exemplar-based approaches to word sense disambiguation revisited. arXiv preprint cs/0007011 (2000).
- FELLBAUM, C., ET AL. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA, 1998.
- FREUND, Y., SCHAPIRE, R., AND ABE, N. A short introduction to boosting. Journal of the Japanese Society for Artificial Intelligence 14 (1999), 771-780.
- KHAPRA, M. M., BHATTACHARYYA, P., CHAUHAN, S., NAIR, S., AND SHARMA, A. Domain specific iterative word sense disambiguation in a multilingual setting.
- KILGARRIFF, A., AND GREFENSTETTE, G. Introduction to the special issue on the web as corpus. Computational Linguistics 29, 3 (2003), 333-347.

30. References (contd.)
- KILGARRIFF, A., AND YALLOP, C. What's in a thesaurus? In Proceedings of the Second International Conference on Language Resources and Evaluation (2000), pp. 1371-1379.
- LITTLESTONE, N. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning 2, 4 (1988), 285-318.
- MALLERY, J. C. Thinking about foreign policy: Finding an appropriate role for artificially intelligent computers. Master's Thesis, MIT Political Science Department, Cambridge (1988).
- MCCULLOCH, W. S., AND PITTS, W. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biology 5, 4 (1943), 115-133.
- MILLER, G., BECKWITH, R., FELLBAUM, C., GROSS, D., AND MILLER, K. J. WordNet: an on-line lexical database. International Journal of Lexicography 3, 4 (1990), 235-312.
- NAVIGLI, R. Word sense disambiguation: A survey. ACM Computing Surveys 41, 2 (2009).
- NAVIGLI, R., AND VELARDI, P. Learning domain ontologies from document warehouses and dedicated web sites. Computational Linguistics 30, 2 (2004), 151-179.

31. References (contd.)
- NG, H. T., ET AL. Exemplar-based word sense disambiguation: Some recent improvements. In Proceedings of the Second Conference on Empirical Methods in Natural Language Processing (1997), pp. 208-213.
- PEDERSEN, T. A simple approach to building ensembles of naive Bayesian classifiers for word sense disambiguation. In Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference (2000), pp. 63-69.
- QUINLAN, J. R. Induction of decision trees. Machine Learning 1, 1 (1986), 81-106.
- QUINLAN, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
- ROGET, P. M. Roget's International Thesaurus, 1st ed. Cromwell, New York, 1911.
- ROTH, D., YANG, M., AND AHUJA, N. A SNoW-based face detector. In Neural Information Processing (2000), vol. 12.
- SCHAPIRE, R. E., AND SINGER, Y. Improved boosting algorithms using confidence-rated predictions. Machine Learning 37, 3 (1999), 297-336.
- YAROWSKY, D. Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (1994), pp. 88-95.
- YAROWSKY, D. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (1995), pp. 189-196.

32. Thank You

33. Appendix

34. 1. WSD: Variants
- Lexical sample (targeted WSD): the system is required to disambiguate a restricted set of target words, usually occurring one per sentence. Employs supervised techniques using hand-labeled instances as a training set and an unlabeled test set.
- All-words WSD: wide-coverage systems expected to disambiguate all open-class words in a text (i.e. nouns, verbs, adjectives, and adverbs). Suffers from the data sparseness problem, as large knowledge sources are not available.

35. 2. Collocation Vector
- Set of words around the target word; typically consists of the next word (+1), the next-to-next word (+2), the words at -2 and -1, and their POS tags:
  [w_i-2, POS_i-2, w_i-1, POS_i-1, w_i+1, POS_i+1, w_i+2, POS_i+2]
- For example, the sentence "I usually have grilled bass on Sunday" with target word bass would yield the vector: [have, VB, grilled, ADJ, on, PREP, Sunday, NN]

36. 3. Decision Trees
- Feature vectors are represented in the form of a tree, built using the ID3 / C4.5 algorithm.
- For an input sentence, the tree is traversed; the sense at the leaf node reached is the winner sense.

4. Naïve Bayes
- Applying Bayes' rule and the naive independence assumption over the features:
  ŝ = argmax_{s ∈ Senses} Pr(s) · Π_{i=1..n} Pr(V_w^i | s)
  where V_w^1 ... V_w^n are the features of the target word's context (see the sketch below).
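A brief sketch of this Naïve Bayes scoring rule over bag-of-word context features; the add-one smoothing and the toy training counts are assumptions for illustration:

```python
import math
from collections import Counter, defaultdict

# Toy training data: (context words, sense) pairs, invented for illustration.
training = [
    (["money", "account", "deposit"], "FINANCE"),
    (["loan", "interest", "account"], "FINANCE"),
    (["river", "water", "fishing"], "RIVER"),
]

sense_counts = Counter(s for _, s in training)
word_counts = defaultdict(Counter)
vocab = set()
for words, sense in training:
    word_counts[sense].update(words)
    vocab.update(words)

def naive_bayes_sense(context, alpha=1.0):
    # s_hat = argmax_s Pr(s) * prod_i Pr(v_i | s), computed in log space
    best, best_logp = None, float("-inf")
    for sense, n_s in sense_counts.items():
        logp = math.log(n_s / len(training))
        denom = sum(word_counts[sense].values()) + alpha * len(vocab)
        for w in context:
            logp += math.log((word_counts[sense][w] + alpha) / denom)
        if logp > best_logp:
            best, best_logp = sense, logp
    return best

print(naive_bayes_sense(["water", "account"]))
```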

37. 5. Exemplar-Based Approach
- Also known as the memory-based or instance-based learning approach.
- Unlike other supervised approaches, it builds the classification model by keeping all the training instances in memory, represented as points in feature space.
- Typically implemented using the kNN algorithm: new examples are classified by computing the distance to all training examples and finding the k nearest neighbors.
- The class contributing the largest number of neighbors is selected as the winner sense.

38. 5. Exemplar-Based Approach (contd.)
- The weighted Hamming distance between points is calculated as:
  Δ(x, x_i) = Σ_{j=1..m} w_j · δ(x_j, x_ij)
  where:
  - x is the instance to be classified and x_i is the i-th training example,
  - w_j is the weight of the j-th feature, calculated using the gain ratio measure [Quinlan, 1993] or the modified value difference metric [Cost & Salzberg, 1993],
  - δ(x_j, x_ij) is zero if x_j = x_ij and 1 otherwise.
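A small sketch of this weighted Hamming distance and the kNN classification step; the exemplars and the uniform weights below are placeholders (in practice the weights would come from the gain ratio measure):

```python
from collections import Counter

def hamming_distance(x, xi, weights):
    # delta(x_j, x_ij) is 0 when the features match and 1 otherwise
    return sum(w * (a != b) for w, a, b in zip(weights, x, xi))

def knn_sense(x, exemplars, weights, k=3):
    """exemplars: list of (feature_vector, sense) training instances kept in memory."""
    by_dist = sorted(exemplars, key=lambda ex: hamming_distance(x, ex[0], weights))
    return Counter(sense for _, sense in by_dist[:k]).most_common(1)[0][0]

exemplars = [
    (("NN", "money", "account"), "FINANCE"),
    (("NN", "loan", "account"), "FINANCE"),
    (("NN", "river", "water"), "RIVER"),
]
weights = [1.0, 1.0, 1.0]   # placeholder weights; gain ratio would be used in practice
print(knn_sense(("NN", "river", "account"), exemplars, weights, k=1))
```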

39. 6. Neural Networks
- WSD is treated as a sequence labeling task.
- The class space is reduced by using WordNet's supersenses instead of actual senses.
- A discriminative HMM is trained using the following features:
  - POS of w as well as POS of neighboring words.
  - Local collocations.
  - Shape of the word and neighboring words, e.g. for s = "Merrill Lynch & Co", shape(s) = Xx*Xx*&Xx.
- Lends itself well to NER, as labels like "person", "location", "time" etc. are included in the supersense tag set.

40. 7. Monosemous Relatives
- Uses the web as a corpus.
- Selects a minimal seed of data from the web, then bootstraps to build a large annotated data set.

41. 8. An Iterative Approach to WSD
- Uses semantic relations (synonymy and hypernymy) from WordNet.
- Extracts collocational and contextual information from WordNet glosses and a small amount of tagged data.
- Monosemous words in the context serve as a seed set of disambiguated words.
- In each iteration, new words are disambiguated based on their semantic distance from already disambiguated words.
- It would be interesting to exploit other semantic relations available in WordNet.

42. 9. Results: Supervised

43. 10. Results: Semi-supervised

44. 11. Results: Hybrid

45. Introduction
Q: What is Word Sense Disambiguation (WSD)?
- Example: "John has a bank account." Target word: bank; context word: account. Senses of "bank": Domain 1: FINANCE (the winner sense here), Domain 2: GEOGRAPHY, Domain 3: SUPPLY.
WSD: definitions
- Generally: WSD is the ability to identify the sense (meaning) of words in context in a computational manner.
- Formally: WSD is a mapping A from words to senses, such that A(i) ⊆ Senses_D(w_i), where Senses_D(w_i) is the set of senses encoded in a dictionary D for word w_i, and A(i) is the subset of the senses of w_i which are appropriate in the context T.
- As a classification problem: the senses are the classes.

46. Motivation
- WSD is at the heart of NLP, feeding into tasks such as:
  - MT: Machine Translation
  - CLIR: Cross-Lingual Information Retrieval
  - NER: Named Entity Recognition
  - SRL: Semantic Role Labeling
  - TE: Text Entailment
  - SP: Shallow Parsing
  - SA: Sentiment Analysis
- WSD is an AI-complete problem: it is as hard as the hardest problems in AI, like the representation of common sense.

47. D. AdaBoost
Steps:
- Constructs a strong classifier as a linear combination of two or more weak classifiers.
- Each instance is assigned an equal weight initially; in each pass of the iteration, the weights of misclassified instances are increased.
- The method is adaptive because it adjusts the weak classifiers so that they correctly classify previously misclassified instances.
- A value α_j is calculated for each classifier C_j, as a function of that classifier's classification error.
- The algorithm iterates m times if there are m classifiers.

48. D. AdaBoost (contd.)
Steps (contd.):
- The classifiers are then combined by a function H for an instance x.
- H is the strong classifier: the sign of a linear combination of the weak classifiers, i.e. H(x) = sign( Σ_{j=1..m} α_j C_j(x) ) (see the sketch below).
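A compact sketch of this final combination step, assuming each weak classifier returns ±1 for a binary sense decision; the weak classifiers and α values below are placeholders rather than learned quantities:

```python
def adaboost_combine(weak_classifiers, alphas, x):
    """H(x) = sign( sum_j alpha_j * C_j(x) ), with each C_j(x) in {-1, +1}."""
    total = sum(a * c(x) for a, c in zip(alphas, weak_classifiers))
    return 1 if total >= 0 else -1

# Placeholder weak classifiers over a feature dict; alphas would come from training.
weak = [
    lambda x: 1 if "money" in x["context"] else -1,
    lambda x: 1 if x["pos"] == "NN" else -1,
]
alphas = [0.8, 0.3]
print(adaboost_combine(weak, alphas, {"context": ["river", "water"], "pos": "NN"}))
```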

49. Future Directions
- Development of better sense recognition systems.
- Eradication of the knowledge acquisition bottleneck.
- More attention needs to be paid to domain-specific approaches in WSD.
- If larger annotated corpora can be built, the accuracy of supervised approaches will rise further.

50. 2. Support Vector Machines
- An SVM is a binary classifier that finds the hyperplane with the largest margin separating the training examples into 2 classes.
- Since SVMs are binary classifiers, a separate classifier is built for each sense of the word.
- Training phase: using a tagged corpus, an SVM is trained for every sense of the word using the features.
- Testing phase: given a test sentence, a test example is constructed using the features and fed as input to each binary classifier; the correct sense is selected based on the labels returned by the classifiers.
- In case of a clash, the sense whose SVM has the higher confidence score is returned.

51. Hybrid Approaches

52. Hybrid Approaches
- [Diagram: a hybrid approach combines a knowledge base with human supervision (annotated data).]

53. Hybrid Approaches
Unifying thread of operation:
- Combine information obtained from multiple knowledge sources.
- Use a very small amount of tagged data.
Algorithms:
- SenseLearner
- Iterative WSD

54. 1. SenseLearner
- Uses some tagged data to build a semantic language model for words seen in the training corpus.
- Uses WordNet to derive semantic generalizations for words which are not observed in the corpus.
Semantic language model:
- Each training example is represented as a feature vector and a class label, which is a (word, sense) pair.
- In the testing phase, a similar feature vector is constructed for each test sentence, and the trained classifier is used to predict the word and the sense.
- If the predicted word is the same as the observed word, then the predicted sense is selected as the correct sense.

55. 1. SenseLearner (contd.)
Semantic generalizations:
- Uses semantic dependencies from WordNet, labeling a more general concept higher in the WordNet hierarchy, so that more training data can be found.
- E.g. if "drink water" is observed in the corpus, then using the hypernymy tree we can derive the syntactic dependency "take-in liquid".
- "take-in liquid" can then be used to disambiguate an instance of the word tea, as in "take tea", by using the hypernymy-hyponymy relations.

56. 1. Bootstrapping
- Based on Yarowsky's supervised algorithm that uses decision lists.
- Uses two heuristics:
  - 'One sense per discourse': a word is referred to by the same sense within a discourse (document).
  - 'One sense per collocation': nearby words provide strong and consistent clues to the sense of a target word.
- Co-training: the classifiers are alternated between iterations.
- Self-training: only one classifier is used (Yarowsky).