Slide1
Introduction to Information Retrieval
Slide2
What is IR?
“Sit down before fact as a little child, be prepared to give up every preconceived notion, follow humbly wherever and to whatever abysses nature leads, or you shall learn nothing.” -- Thomas Huxley --
Search Engines
[Figure: result pages for the queries “What is IR?” and “What is information retrieval?” on Google, Ask.com, Yahoo!, Google Korea, Naver, and Daum]
Slide3
IR: Key Questions
- What are we looking for?
- How do we find it?
- Why is it difficult?
“A prudent question is one-half of wisdom”
Francis Bacon
Slide4
IR: What are we looking for?
We are:
- Looking for X.
  - Q&A: population of China
  - Known-item search: “Catcher in the Rye”
- Looking for something like/about X.
  - General/background info: Taliban
  - Collection development: IR literature
  - Similar to (known) X: like “Catcher in the Rye”
  - Whatchamacallit X: “the rye-boy story”
- Looking for something.
  - Problem resolution: how can we fight terrorism?
  - Knowledge development: what is IR?
- Looking.
  - Need something, but don’t know what: what’s it all about?
  - Serendipity: Web surfing
Slide5
IR: How do we find it?
- Brute-force search
  - Easy to build, maintain, and use
  - Searcher does all the work; hard to get satisfaction
- Organize/structure the data
  - Intuitive to use
  - Hard to build and maintain
  - Knowledge of the builder’s language & organization structure is crucial
- Use a search tool
  - Easier to build and maintain: less manipulation of data
  - Sometimes works, sometimes not (helps to know the language of the data)
- Ask the experts
  - Easy and satisfying to use (by definition)
  - “Expert” knowledge is transitory, hard to encapsulate
- Go with the crowd
  - Relatively easy to build and maintain
  - Limited utility: doesn’t work with “unpopular” X
- Zen-Fusion search
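The trade-off between brute-force search and using a search tool can be sketched in a few lines. This is a hypothetical illustration and is not from the slides. It reuses the three example documents from the Slide9 index example, assumes lowercase whitespace tokenization with no stemming, and uses made-up helper names. Brute force rescans every document for each query. The search-tool version does its work once by building an inverted index (word to document IDs) and then answers by lookup.

```python
from collections import defaultdict

# Hypothetical toy collection (the three example documents from the Slide9 index example).
DOCS = {
    "D1": "information retrieval seminars",
    "D2": "retrieval models and information retrieval",
    "D3": "information model",
}


def brute_force_search(docs, word):
    """Brute force: nothing is prepared in advance; every document is scanned per query."""
    word = word.lower()
    return [doc_id for doc_id, text in docs.items() if word in text.lower().split()]


def build_inverted_index(docs):
    """Search-tool style: do the work once, mapping each word to the sorted IDs of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def index_search(index, word):
    """Look the word up directly instead of rereading the collection."""
    return index.get(word.lower(), [])


index = build_inverted_index(DOCS)
print(brute_force_search(DOCS, "retrieval"))  # ['D1', 'D2']
print(index_search(index, "retrieval"))       # ['D1', 'D2']
# Exact word matching misses D2's "models": it helps to know the language of the data.
print(index_search(index, "model"))           # ['D3']
```

The last query shows the "sometimes works, sometimes not" point. Without stemming, a search for "model" does not find the document that says "models".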
Slide6
Information Seeking Process: Dynamic, Interactive, Iterative
User:
What am I looking for? - Identification of info. need
What question do I ask? - Query formulation
Intermediary:
What is the searcher looking for? - Discovery of user’s info. need
How should the question be posed? - Query representation
Where is the relevant information? - Query-document matching
Information:
What data to collect? - Collection development
What information to index? - Indexing/representation
How to represent it? - Data structure
Slide7
Information Seeking Models
Berry-picking Model (Bates, 1989)
- Interesting information is scattered like berries among bushes.
- Information seeking is a dynamic, non-linear process in which the information need and queries continually shift.
- Information needs are not satisfied by a single, final retrieved set of documents, but rather by a series of selections and bits of information found along the way.
Traditional Model (Broder, 2002)
- Linear process:
  - Problem identification
  - Identification of information need
  - Query formulation
  - Result evaluation
- Static information need
- The goal is to retrieve a perfect match for the information need.
Slide8
IR Research: Overview
Information Organization:
- Add structure & annotation
Information Retrieval
- Create a searchable index
Information Access
- Retrieve information
Data Mining
- Discover Knowledge
Slide9
IR Research: Information Retrieval
Raw Data
Representation
- indexing, term weighting
Searchable Index
Query Formulation
- “What is information retrieval?”
Search Results
- (ranked) document list

Documents:
D1: information retrieval seminars
D2: retrieval models and information retrieval
D3: information model

Index Term          D1  D2  D3
wd1 (information)    1   1   1
wd2 (model)          0   1   1
wd3 (retrieval)      1   2   0
wd4 (seminar)        1   0   0

Ranked results for the query “What is information retrieval?”:
Rank  docID  score
1     D2     3
2     D1     2
3     D3     1
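The slide does not say how the scores are computed. However, the ranking above is reproduced exactly if each document's score is the sum of the raw counts of the query terms it contains: D2 = 1 + 2, D1 = 1 + 1, D3 = 1 + 0. The sketch below assumes that scoring rule. It also assumes lowercase whitespace tokenization, a crude "drop a plural s" rule standing in for a real stemmer, and that words outside the four index terms (such as "what" and "is") are ignored. All function names are hypothetical.

```python
from collections import Counter

# The three example documents from the slide.
DOCS = {
    "D1": "information retrieval seminars",
    "D2": "retrieval models and information retrieval",
    "D3": "information model",
}

# Index vocabulary from the slide; any other word is ignored (an assumption).
VOCAB = ["information", "model", "retrieval", "seminar"]


def normalize(token: str) -> str:
    """Lowercase, strip punctuation, and crudely drop a plural 's' (not a real stemmer)."""
    token = token.lower().strip("?.,!")
    if token.endswith("s") and token[:-1] in VOCAB:
        token = token[:-1]
    return token


def term_document_matrix(docs):
    """Map each vocabulary term to its raw frequency in every document."""
    counts = {doc_id: Counter(normalize(t) for t in text.split()) for doc_id, text in docs.items()}
    return {term: {doc_id: counts[doc_id][term] for doc_id in docs} for term in VOCAB}


def rank(query, matrix, doc_ids):
    """Score each document by summing the frequencies of the query terms it contains."""
    query_terms = {normalize(t) for t in query.split()} & set(VOCAB)
    scores = {d: sum(matrix[t][d] for t in query_terms) for d in doc_ids}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


matrix = term_document_matrix(DOCS)
ranking = rank("What is information retrieval?", matrix, DOCS)
for position, (doc_id, score) in enumerate(ranking, start=1):
    print(position, doc_id, score)
# 1 D2 3
# 2 D1 2
# 3 D3 1
```

Raw term counts are the simplest possible weighting. The "term weighting" step named on the slide would normally replace them with something more discriminating, so treat this only as a reproduction of the toy example.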
Slide10
IR Research: Information Organization
Representation
- NLP & Machine Learning
Organized Data
Raw Data
Query Formulation
- “What is IR?”
Search Results
- document groups
Slide11
IR Research: Natural Language Processing
- Goal: understanding/effective processing of natural language, not just pattern matching
- Lexical analysis using part-of-speech (POS) tagging
- Sentence parsing
- Research area, technique, and tool for data mining and knowledge discovery
Slide12
IR Research: Machine Learning
- Research area, technique, and tool for information organization, data mining, and knowledge discovery
- Information organization via
  - Supervised learning (automatic classification)
  - Unsupervised learning (clustering)
[Figure: “Classification” and “Clustering” panels, each showing items grouped into Class 1 and Class 2]
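The difference between supervised and unsupervised learning can be shown on a few 2-D points. The slide names no specific algorithms. This sketch swaps in two of the simplest ones: nearest-centroid classification, where class labels are given in advance, and k-means clustering, where no labels are given and groups are discovered. The points, helper names, and the "seed k-means with the first k points" initialization are all hypothetical choices for illustration.

```python
import math

Point = tuple[float, float]


def centroid(points: list[Point]) -> Point:
    """Mean of a non-empty list of 2-D points."""
    return (sum(p[0] for p in points) / len(points), sum(p[1] for p in points) / len(points))


def classify(labeled: dict[str, list[Point]], item: Point) -> str:
    """Supervised: class labels are given; assign the item to the class with the nearest centroid."""
    centroids = {label: centroid(pts) for label, pts in labeled.items()}
    return min(centroids, key=lambda label: math.dist(centroids[label], item))


def cluster(points: list[Point], k: int, iterations: int = 10) -> list[list[Point]]:
    """Unsupervised: no labels; plain k-means seeded with the first k points."""
    centers = list(points[:k])
    groups: list[list[Point]] = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(centers[i], p))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it in place if the group is empty).
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return groups


labeled = {"Class 1": [(0, 0), (0, 1), (1, 0)], "Class 2": [(5, 5), (5, 6), (6, 5)]}
print(classify(labeled, (1, 1)))  # Class 1
print(classify(labeled, (4, 5)))  # Class 2

unlabeled = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
print(cluster(unlabeled, k=2))  # [[(0, 0), (0, 1), (1, 0)], [(5, 5), (5, 6), (6, 5)]]
```

Both functions end up with the same two groups. The classifier is told the groups up front and places new items into them. K-means has to discover the groups, and its result can depend on the initial centers.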