LIS618 lecture 4 before searching + introduction to dialog

Uploaded On 2020-06-23

Presentation Transcript

Slide1

LIS618 lecture 4
before searching + introduction to DIALOG

Thomas Krichel

2011-11-01

Slide2

structure of talk
- some generalities about searching
- Working with DIALOG
  - Overview
  - Search command

Slide3

online information retrieval
- This subject can be thought of as a subset of information retrieval (IR).
- Most IR is online or digital.
- IR concentrates on textual data.
- We can think of online IR as falling under two categories:
  - database IR
  - web IR

Slide4

database / web IR
- Database IR looks at systems that have
  - a controlled set of records
  - low heterogeneity
  - use that requires authentication
  - advanced search features
- Web IR has the opposite characteristics.

Slide5

traditional social model
- User goes to a library.
- Describes the problem to the librarian.
- Librarian does the search
  - without the user present
  - with the user present
- Hands over the result to the user.
- User fetches the full text or asks a librarian to fetch it.

Slide6

economic rationale for traditional model
- In olden days the cost of telecommunication was high.
- Database use costs
  - cost of communication
  - cost of access time to the database
- The traditional model puts an upper limit on the costs.

Slide7

disintermediation
- With the cost of access time gone, the traditional model is under threat.
- There is disintermediation, where the librarian loses her role of doing the search.
- But that may not be good news for information retrieval results:
  - user knows subject matter best
  - librarian knows searching best

Slide8

Web searching
- IR has received a lot of impetus through the web, which poses unprecedented search challenges.
- With more and more data appearing on the web, DS may be a subject in decline.
  - It is primarily concerned with non-web databases.
  - There are more and more web-based methods of searching.

Slide9

Public access vs quality
- Now the public at large is able to do online searching.
- At the same time, the need for quality answers has grown.
- Quality-filtered services will become more important.
- In the current databases, there is a lot that would already be available for free, mixed with quality-controlled stuff.
- Publishers have direct offerings, and intermediated vending is in decline.

Slide10

components of the IR process
- provider
  - define data that is available
  - documents that can be used
  - document operations
  - document structure
  - index
- user
  - user need
  - IR system familiarity

Slide11

the IR process
- Query expresses user need in a query language.
- Processing of the query yields retrieved documents.
- Calculation of relevance ranking.
- Examination of retrieved documents.
- Possible return to the start, with another query.

Slide12

main problem
- User is not an expert at the formulation of a query.
- Garbage in, garbage out: the retrieval yields poor results.
- Ways around that problem:
  - design a very intuitive interface for the query
  - give expert guidance

Slide13

before a search I
- What is the purpose of the query?
  - brief overview
  - comprehensive search
- What perspective on the topic is required?
  - scholarly
  - technical
  - business
  - popular

Slide14

before search II
- What type of information does the patron want?
  - full text
  - bibliographic
  - directory
  - numeric
- Are there any known sources?
  - authors
  - journals
  - papers
  - conferences

Slide15

before search III
- What are the language restrictions?
- What, if any, are the cost restrictions?
- How current does the data need to be?
- How much of each record is required?

Slide16

concept analysis
- This is the art/science of taking the topic to search for and developing facets.
- Example: “Internet filtering in Libraries”
  - Internet filter
  - Libraries
  - Controversy, not technical issues
- We may also need to think about the aim of the search.

Slide17

search aims
- a known needle in a known haystack
- a known needle in an unknown haystack
- an unknown needle in an unknown haystack
- any needle in a haystack
- the sharpest needle in a haystack
- most of the sharpest needles in a haystack

Slide18

search aims
- all the needles in a haystack
- affirmation of no needles in a haystack
- things like needles in a haystack
- is there a new needle in the haystack
- where are the haystacks
- needles, haystacks, anything

Slide19

types of searches
- known-item searches
- negative searches
- selective dissemination of information
- topical or subject searches
- passage searching, where the user is only interested in part of the item

Slide20

search strategies I
- Building block approach
  - Do a number of elementary searches.
  - Combine the resulting sets with Boolean operators.
  - This is what I did in the example in the previous lecture.
  - Works only with the Boolean model.

Slide21

search strategies II
- Snowballing approach
  - Start with a very specific query.
  - Think of other terms that can be added to get more results.
  - Stop when a reasonable number of results is achieved.
  - Not sure this really works well in practice.

Slide22

search strategies III
- The successive fraction approach is the opposite of the snowballing approach.
  - First search for a broad concept.
  - Then repeat the query, adding various limiting factors.
  - Can work well if the IR system allows you to repeat and edit queries.
  - But queries can become unwieldy.

Slide23

search strategies IV
- Most specific facet first
  - Conduct concept analysis.
  - Look for the most specific facet.
  - Search that first, add others later.
  - Presupposes that you have done a decent concept analysis.

Slide24

taxonomy of classic IR models
- Boolean, or set-theoretic
  - fuzzy set models
  - extended Boolean
- Vector, or algebraic
  - generalized vector model
  - latent semantic indexing
  - neural network model
- Probabilistic
  - inference network
  - belief network

Slide25

summary
- There are three basic types of models in classic information retrieval.
- Extensions of these types are a matter of research concern and require good mathematical skills.
- All classic models treat documents as individual pieces.

Slide26

key aid: index
- An index is a list of terms, each with a list of locations where the term is to be found.
- The way to express locations usually depends on the form that the indexed data takes.
  - For a book, it is usually the page number, e.g. "shmoo 34, 75".
  - For computer files, it is usually the name of the file plus the number of the byte where the indexed term starts, e.g. "krichel index.html 34, cv.html 890 1209".
- There is usually more than one location of the term.
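
The file-based index described on this slide can be sketched in Python. This is only an illustration: the function name build_index and the sample file contents are assumptions, words are split on single spaces, and character offsets stand in for byte offsets (the two coincide for plain ASCII text).

```python
from collections import defaultdict


def build_index(files):
    """Map each term to {file name: [offsets where the term starts]}."""
    index = defaultdict(lambda: defaultdict(list))
    for name, text in files.items():
        offset = 0
        for word in text.split(" "):
            if word:
                index[word.lower()][name].append(offset)
            # Move past the word and the single space that follows it.
            offset += len(word) + 1
    return index


# Hypothetical file contents, made up for the example.
files = {
    "index.html": "krichel teaches searching",
    "cv.html": "cv of krichel",
}
index = build_index(files)
krichel_locations = dict(index["krichel"])
```

Here krichel_locations lists one starting offset per file, which mirrors the "file name plus byte number" notation on the slide.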

Slide27

key aid: index terms
- The index term is a part of the document that has a meaning on its own.
- It is usually a noun.
- Retrieval based on index terms raises questions:
  - semantics in the query or document is lost
  - matching is done in the imprecise space of index terms
- One way out is to specify several terms and require that they be close to each other.

Slide28

basic concept: weight of index term
- Given all nouns, not all appear to have the same relevance to the text.
- Sometimes, we can have a simple measure of the importance of a term. Example?
- More generally, for each indexing term and each document, we can associate a weight with the term and the document.
- Usually, if the document does not contain the term, its weight is zero.
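
One possible simple measure, offered here only as an assumption and not as the answer intended by the slide, is the number of times a term occurs in the document. The function name term_weights and the sample sentence are made up for the sketch.

```python
from collections import Counter


def term_weights(document, index_terms):
    """Weight each index term by its raw count in the document."""
    counts = Counter(document.lower().split())
    # Counter returns 0 for terms that do not occur at all.
    return {term: counts[term] for term in index_terms}


weights = term_weights(
    "mate is a drink and mate is a herb",
    ["mate", "drink", "library"],
)
```

A term that is absent from the document gets weight zero, as the last point on the slide requires.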

Slide29

Boolean model
- In the Boolean model, the weight of any index term for any document is 1 if the term appears in the document. It is 0 otherwise.
- This allows query terms to be combined with the Boolean operators AND, OR, and NOT.
- Thus powerful queries can be written.
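
A minimal sketch of the Boolean model in Python, assuming a toy collection of three made-up documents. The names docs, weight and matching are hypothetical; AND, OR and NOT are expressed as set intersection, union and difference.

```python
# Toy collection, made up for the example.
docs = {
    1: "current awareness in digital libraries",
    2: "digital library software",
    3: "current events",
}


def weight(term, text):
    """Binary weight: 1 if the term appears in the text, 0 otherwise."""
    return 1 if term in text.lower().split() else 0


def matching(term):
    """Set of document ids whose weight for the term is 1."""
    return {doc_id for doc_id, text in docs.items() if weight(term, text) == 1}


all_docs = set(docs)
result_and = matching("current") & matching("digital")    # current AND digital
result_or = matching("library") | matching("libraries")   # library OR libraries
result_not = all_docs - matching("digital")               # NOT digital
```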

Slide30

Dialog is a databank of over 500 databases
- These are also known as files and cover
  - references and abstracts for published literature, business information and financial data
  - complete text of articles and news stories
  - statistical tables
  - directories
- DIALOG uses the Boolean model.

Slide31

DIALOG interface
- It is still rooted in "traditional" database systems.
- It has been dismissed as "dial-a-dog".
- It uses a command-driven interface.
- It is very complicated to learn fully.
- It is not suitable for the end-user.
- It therefore offers a valuable skill to the information professional.

Slide32

Accessing DIALOG
- On the web, go to http://www.dialogclassic.com/
- Enter username and password.
- Forget about subaccount.
- Then click on logon.
- You are in the classic interface. Let’s hear three cheers for being old-fashioned.

Slide33

two steps in DIALOG
- Step one: select databases (aka files) to look at.
- Step two: perform searches on the selected databases.
- You may wonder why one does not have one single step like in a search engine. Discuss.

Slide34

sample search
- We want to know something about “current awareness in digital libraries”.
- Let us assume that something on this is in the ERIC database, and we know that ERIC is database number 1.
- We issue the command "b 1" to begin working with ERIC.

Slide35

Boolean search
- Do a number of searches:
  - s current(N)awareness
  - s digital(N)library
  - s digital(N)libraries
- Each search retrieves a set of documents.
- The sets can be combined:
  - s s1 and (s2 or s3)

Slide36

What is the deal?
- There are two stages.
- At stage two we make Boolean queries. Each query splits the records into matching and non-matching records.
- The set of matching records is returned. It can be further searched or combined with other sets using Boolean operators.
- Try this at home.

Slide37

two steps in DIALOG
- Step one: select databases (aka files) to look at.
- Step two: perform searches on the selected databases.
- You may wonder why one does not have one single step like in a search engine. Discuss.
- Today we concentrate on the second step.

Slide38

working on selected files
- We assume that we have selected a database that we know, and we look at the search interface on the selected database.
- The database selection process is a bit more complicated; it is covered next week.
- First, let us log in and look at the command prompt.
- Then we select the first database (file) with the begin command.

Slide39

the ‘begin’ command
- As its name suggests, it is usually the first command.
- "begin number, number, …" selects the files with the given numbers.
- Once they are selected, they can be searched.
- Now select ERIC: "begin 1".
- "begin 1" can be abbreviated as "b 1".

Slide40

substeps in the second step
- Identify search terms.
- Use Dialog basic commands to conduct a search.
- View records online or print the results.

Slide41

the 's' (select) command
- Once we have issued the "begin" command to select a database, we issue the "s" command on the database.
- "s query_expression", where query_expression is a query expression.
- This will search the index of the selected database in full-text view for the query issued.
- It will not find any of the following: "an and by for from of the to with". They are stop words.
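
The effect of the stop words listed on this slide can be sketched as a filter applied before indexing or searching. This is an illustration in Python, not a description of how DIALOG is implemented; the function name searchable_terms and the sample phrase are assumptions.

```python
# Stop words as listed on the slide.
STOP_WORDS = {"an", "and", "by", "for", "from", "of", "the", "to", "with"}


def searchable_terms(text):
    """Return the words of the text that are not stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


terms = searchable_terms("The history of the library")
```

Only the words left in terms could ever be found by a search on this phrase.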

Slide42

query expression
- A query expression contains search terms expressed in special ways.
- You can truncate search terms.
- You can build an elementary expression by putting several keywords together. This is achieved by DIALOG's connectors.
- You can combine several expressions with the use of Boolean operators.
- We will cover these in turn now.

Slide43

truncation of terms I
- Open truncation
  - "select path?" retrieves all words that begin with path: paths, pathos, pathway, pathology
- Controlled-length truncation
  - "select path??" retrieves the root and up to two additional characters: paths, pathos

Slide44

truncation of terms II
- Embedded character truncation can be used for variant spellings:
  - "select organi?ation" -> organization organisation
  - "select fib??board" -> fiberboard fibreboard
- This truncation feature is also useful for searching for unusual plural forms:
  - "select wom?n" -> woman women
- Apparently you can also do prefixes by putting the ? at the beginning.
  - "?mobile" -> automobile metamobile
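
The truncation rules on the last two slides can be mimicked with regular expressions. The Python sketch below follows the slides' description only (one trailing ? is open truncation, several trailing ? allow up to that many extra characters, each embedded ? stands for one character, a leading ? is an open prefix). It is an assumption about the behaviour, not DIALOG's actual implementation, and the names truncation_to_regex, expand and vocabulary are hypothetical.

```python
import re


def truncation_to_regex(term):
    """Translate a truncated term into a compiled regular expression."""
    prefix = ""
    if term.startswith("?"):
        prefix = r"\w*"          # leading ?: any prefix
        term = term[1:]
    stem = term.rstrip("?")
    trailing = len(term) - len(stem)
    if trailing == 0:
        suffix = ""
    elif trailing == 1:
        suffix = r"\w*"          # open truncation
    else:
        suffix = r"\w{0,%d}" % trailing   # controlled-length truncation
    # Each remaining ? is embedded and stands for exactly one character.
    middle = "".join(r"\w" if ch == "?" else re.escape(ch) for ch in stem)
    return re.compile(prefix + middle + suffix, re.IGNORECASE)


def expand(term, vocabulary):
    """List the vocabulary words that the truncated term would retrieve."""
    pattern = truncation_to_regex(term)
    return [word for word in vocabulary if pattern.fullmatch(word)]


# Word list taken from the examples on the slides.
vocabulary = ["path", "paths", "pathos", "pathway", "pathology",
              "organization", "organisation", "fiberboard", "fibreboard",
              "woman", "women", "automobile"]
```

For example, expand("path??", vocabulary) keeps path, paths and pathos but drops pathway and pathology, which is the distinction the first truncation slide draws.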

Slide45

use of connectors
- Connectors are used to put several words together.
- One instance where this is useful is when you have words that on their own mean different things.
- For example, "mate" is a herbal beverage consumed in South America. Looking for mate on the Internet retrieves a lot of singles' pages.

Slide46

example: terms related to "mate"
- What other terms could be used?
  - matear (drink mate)
  - matero (mate drinker)
  - cebar (prepare mate)
  - cebador (mate preparer)
  - yerba (mate herb)
  - bombilla (mate straw)

Slide47

connectors I
- '(W)' requires the terms to appear next to each other in the given order, e.g. 'yerba(W)mate?' matches "yerba mate".
- '(iW)', where i is an integer, means the first term is followed by the second with at most i words in between, e.g. 'ceba?(3W)mate?' matches "cebar un maravilloso mate" but not "cebador guapo mirando un buen mate".

Slide48

connectors II
- '(N)' requires the terms to be next to each other in either order, e.g. 'yerba(N)mate?' matches "yerba mate" or "mate yerba".
- '(iN)', where i is an integer, means proximity within at most i words in either order, e.g. 'ceba?(3N)mate?' matches "cebar mate" or "matear con la cebadora".
- '(S)' searches for the occurrence of the connected terms in the same paragraph.
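
The (W) and (N) connectors can be sketched as a proximity test over the words of a text. The Python below is an illustration under stated assumptions: "at most i words" is read as at most i words between the two terms, only open truncation with a trailing ? is handled, punctuation is ignored, and the names term_matches and within are hypothetical.

```python
def term_matches(pattern, word):
    """Match a word against a term, honouring a trailing ? as open truncation."""
    if pattern.endswith("?"):
        return word.lower().startswith(pattern[:-1].lower())
    return word.lower() == pattern.lower()


def within(text, first, second, max_gap=0, ordered=True):
    """True if the two terms occur with at most max_gap words between them.

    ordered=True mimics (W): first must come before second.
    ordered=False mimics (N): either order is accepted.
    """
    words = text.split()
    for i, word_i in enumerate(words):
        if not term_matches(first, word_i):
            continue
        for j, word_j in enumerate(words):
            if i == j or not term_matches(second, word_j):
                continue
            gap = abs(j - i) - 1
            if gap <= max_gap and (j > i or not ordered):
                return True
    return False


w_adjacent = within("yerba mate", "yerba", "mate?")
w_reversed = within("mate yerba", "yerba", "mate?")
n_reversed = within("mate yerba", "yerba", "mate?", ordered=False)
w_three_ok = within("cebar un maravilloso mate", "ceba?", "mate?", max_gap=3)
w_three_far = within("cebador guapo mirando un buen mate", "ceba?", "mate?", max_gap=3)
n_three_ok = within("matear con la cebadora", "ceba?", "mate?", max_gap=3, ordered=False)
```

The six calls replay the examples from the two connector slides; only w_reversed and w_three_far come out False.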

Slide49

using Boolean operators
- In your query, you can combine several expressions with Boolean operators.
- Example: "S LIBRARY(W)SCHOOL? AND DISTANCE(W)EDUCATION"
- But I usually do not issue such fancy queries.

Slide50

executing several searches
- There can be several searches done sequentially, and the result sets are saved by the system.
- Each time, the system assigns a set number, Si.
- These can be combined in Boolean expressions, e.g. 's S1 or S2 and S3'.
- Remember that Boolean operations are set-theoretic!
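
The way result sets are numbered and later combined can be sketched with a small Python class. The class Session, its methods and the sample records are hypothetical and only imitate the behaviour described on the slide; they are not DIALOG's interface.

```python
class Session:
    """Keeps numbered result sets, in the spirit of S1, S2, ..."""

    def __init__(self, records):
        self.records = records
        self.sets = {}

    def _store(self, hits):
        name = f"S{len(self.sets) + 1}"
        self.sets[name] = hits
        return name

    def select(self, term):
        """Search for a single word and save the hits as a new set."""
        hits = {rid for rid, text in self.records.items()
                if term in text.lower().split()}
        return self._store(hits)

    def combine(self, hits):
        """Save an already combined set of record ids as a new set."""
        return self._store(set(hits))

    def display_sets(self):
        """List every set with its number of records."""
        return [(name, len(hits)) for name, hits in self.sets.items()]


# Made-up records for the example.
session = Session({
    1: "current awareness in digital libraries",
    2: "digital library",
    3: "current awareness services",
})
s1 = session.select("awareness")
s2 = session.select("library")
s3 = session.select("libraries")
s4 = session.combine(session.sets[s1] & (session.sets[s2] | session.sets[s3]))
history = session.display_sets()
```

The combined set is itself stored under the next number, so it can be reused in later expressions.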

Slide51

Boolean operators on sets
- When using Booleans, be aware that "and" has higher precedence than "or". Thus
  a or b and c
  is not the same as
  (a or b) and c
  but it is the same as
  a or (b and c)
- Use parentheses when in doubt.
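
The precedence rule can be checked with Python sets, where & (intersection, playing the role of "and") also binds more tightly than | (union, playing the role of "or"). The three sets are made up for the illustration.

```python
a = {1, 2}
b = {2, 3}
c = {3, 4}

unparenthesized = a | b & c   # read as a | (b & c)
explicit = a | (b & c)
grouped = (a | b) & c

same_as_explicit = unparenthesized == explicit
same_as_grouped = unparenthesized == grouped
```

With these sets the unparenthesized expression gives {1, 2, 3}, while the grouped one gives only {3}.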

Slide52

DS (display sets)
- This command can be executed any time to review the sets that have been formed since the last B (begin) command.
- This can be useful to review your search history.

Slide53

the target command
- "target set", where set is a search result set, creates a subset of the "statistically most relevant results" in the original set.
- I have not seen details about how this subset is computed.
- A new result set is formed.

Slide54

display: the type command
- type set/format/range
  - set is a result set
  - format is a format
  - range can be
    - start – end, where start is the record number to start at and end is the record number to end at
    - all

Slide55

standard delivery formats
- 2 -- full record except abstract
- 3 or medium -- citation
- 5 or long -- full except full text
- 6 or free -- title and dialog number
- 8 or short -- title plus indexing terms
  - useful to find other indexing terms
- 9 or full -- everything
- KWIC or K -- keywords in context

Slide56

options for delivery
- I once tried to email results to myself, to no avail.
- You can save the html of the search results in the browser.
- You can print the results within the browser.

Slide57

http://openlib.org/home/krichel
Thank you for your attention!