Slide1
IBE312: Information Architecture 2013
Ch. 9 – Metadata
Many of the slides in this slideset are reproduced and/or modified content from publicly available slidesets by Paul Jacobs (2012), The iSchool, University of Maryland, http://terpconnect.umd.edu/~psjacobs/s12/INFM700s12.htm.
These materials were made available and licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States license. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.
Slide2
Metadata
“Data about data”: definitional and descriptive documentation/information about data…
From the Free On-line Dictionary of Computing:
Data about data. In data processing, meta-data is definitional data that provides information about or documentation of other data managed within an application or environment.
For example, meta-data would document data about data elements or attributes (name, size, data type, etc.), data about records or data structures (length, fields, columns, etc.), and data about data (where it is located, how it is associated, ownership, etc.). Meta-data may include descriptive information about the context, quality and condition, or characteristics of the data.
(Some other definitions.)
Slide3
Metadata
Why do we need this?
Types of metadata
Descriptive/subjective/content (e.g. author, subject, keywords, …)
Administrative (e.g. owner, rights, cost, creation date, version, …)
Technical (e.g. format, size, dependencies, programs)
…
In practical terms:
Metadata helps users locate, navigate, interpret content
Metadata helps organizations manage content
Metadata helps systems manipulate content
Slide4
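As a rough illustration of the three metadata types listed on the preceding slide (descriptive, administrative, technical), the sketch below groups them into one record and shows how descriptive metadata helps locate content. The class names, field names, sample values, and the `find_by_keyword` helper are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Descriptive:
    # Descriptive/subjective/content metadata
    author: str
    subject: str
    keywords: List[str] = field(default_factory=list)


@dataclass
class Administrative:
    # Administrative metadata
    owner: str
    rights: str
    creation_date: date
    version: str


@dataclass
class Technical:
    # Technical metadata
    file_format: str
    size_bytes: int


@dataclass
class MetadataRecord:
    descriptive: Descriptive
    administrative: Administrative
    technical: Technical


def find_by_keyword(records: List[MetadataRecord], keyword: str) -> List[MetadataRecord]:
    """Return the records whose descriptive keywords include `keyword` (case-insensitive)."""
    wanted = keyword.lower()
    return [r for r in records if wanted in (k.lower() for k in r.descriptive.keywords)]


# Hypothetical sample record, not taken from the slides.
record = MetadataRecord(
    descriptive=Descriptive("A. Author", "Metadata", ["taxonomies", "thesauri"]),
    administrative=Administrative("Course staff", "CC BY-NC-SA", date(2013, 1, 1), "1.0"),
    technical=Technical("pdf", 1_048_576),
)

matches = find_by_keyword([record], "Taxonomies")
```

Keeping the three groups separate mirrors the slide's point that users, organizations, and systems each rely on a different part of the record.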
Data without Metadata…
Who: authored it? to contact about data?
What: are contents of database?
When: was it collected? processed? finalized?
Where: was the study done?
Why: was the data collected?
How: were data collected? processed? verified?
… can be pretty useless!
Slide5
Early Example of Metadata
Slide6
Menagerie of Terms
Classification
Hierarchies
Epistemology
Directories
Controlled vocabularies
Knowledge representation
Let’s focus on significant differences.
Let’s focus on advantages/disadvantages.
Let’s focus on how each is useful.
Slide7
Controlled Vocabulary
Any defined subset of natural language
List of equivalent terms (synonym rings)
Use search logs.
List of preferred terms (authority files)
Commonly also include variant terms
Educating users, enabling browsing
Term rotation (pointers in index), p. 201
Classification scheme / taxonomy
Hierarchical relationships (narrower/broader)
Slide8
Controlled Vocabulary
Queries can be “exploded” to increase recall
Slide9
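The query “explosion” mentioned on the preceding slide can be sketched roughly as follows: a search term is expanded with its synonym ring and with all narrower terms before matching. The vocabulary tables, the sample documents, and the function names `explode` and `search` are assumptions made for illustration; they do not come from the slides or from any particular search engine.

```python
# Hypothetical controlled vocabulary, for illustration only.
SYNONYM_RINGS = [
    {"notebook", "laptop"},
]
NARROWER = {
    "notebook": ["ultraportable", "tablet pc"],
}


def explode(term):
    """Return the term plus its equivalent terms and all narrower terms."""
    expanded = {term}
    # Add every member of any synonym ring that contains the term.
    for ring in SYNONYM_RINGS:
        if term in ring:
            expanded |= ring
    # Walk narrower terms transitively from everything collected so far.
    stack = list(expanded)
    while stack:
        current = stack.pop()
        for child in NARROWER.get(current, []):
            if child not in expanded:
                expanded.add(child)
                stack.append(child)
    return expanded


def search(docs, term):
    """Return documents that mention any term of the exploded query."""
    terms = explode(term)
    return [d for d in docs if any(t in d.lower() for t in terms)]


docs = ["Ultraportable review", "Desktop buying guide", "Best laptop bags"]
hits = search(docs, "notebook")
```

None of the sample documents contains the literal word “notebook”, so an unexpanded query would return nothing; the exploded query retrieves two of the three, which is the recall gain the slide refers to.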
Controlled Vocabulary
Authority file – inclusive; the preferred term can serve as the unique identifier for a collection of terms, and can educate users
Slide10
Related Terms & Techniques
Taxonomies
Anything organized in some sort of hierarchical structure
Tagging
Adding almost any kind of metadata to content, but now often descriptive and user-provided
Thesauri
Focus on relations between terms
Focus on “concepts”
Ontologies
Usually model a specific domain or part of the world
Generally machine-readable
(Increasing complexity and richness, from taxonomies to ontologies)
Metadata | Taxonomies & Thesauri | Practical Uses
Slide11
How are taxonomies, tagging, controlled vocabularies and thesauri used?
The semantic gap: What’s the problem?
Synonymy – roughly, different words or phrases can be used to express similar ideas (e.g. “notebook”, “laptop”)
Polysemy – roughly, the same word can have different meanings (e.g. “line”: fishing, code, queue, . . .)
Taxonomies try to group similar concepts
“Tags” often assign words to concepts, making it easier to find related concepts
Controlled vocabularies avoid ambiguity (like a specific tag set)
Thesauri represent attempts to better organize mappings between words and concepts
Do these present precision or recall problems?
Slide12
Taxonomies
Organization of objects according to some principle
Familiar examples:
Linnaean taxonomy (for living organisms)
Web directories (e.g., Yahoo or ODP)
Corporate directories
Organization charts
Organizational structures previously discussed
Slide13
Tagging: e.g. Flickr – popular tags
Slide14
Flickr – related tags
Slide15
Del.icio.us – related tags
Slide16
Thesauri: Motivation
“Semantic gap” between concepts and words
Online thesauri help map many synonyms or word variants onto one preferred term – improve precision in retrieval (p. 203)
Words are used to evoke concepts
Concrete objects: MacBook Pro, iPhone
Abstract ideas: freedom, peace
[Diagram labels: Concepts, Words, Ideas, Meaning]
Slide17
Thesauri
Book of synonyms, often including related and contrasting words and antonyms.
In this class:
A controlled vocabulary in which equivalence, hierarchical, and associative relationships are identified for purposes of improved retrieval.
Technical lingo …
Thesauri standards: ISO 2788, …
Slide18
Thesauri Types
Slide19
IA Uses of Thesauri
For organization
For navigation
For indexing content
For searching
Slide20
Applying IA Principles
Focus on users and user needs – users are different, and have different models
Focus on content – concepts are different, too – different levels, words, complexity, vagueness
Examples:
What’s the difference between laptop, PDA, phone, and convergence device?
When is “cancer research” “oncology”?
When a user browses a furniture catalog for chairs, do you show them ottomans and footstools?
Slide21
Standard Thesaurus Structure
[Diagram]
Preferred term: Notebook
Synonyms (variants), AKA: Laptop
Broader terms (Notebook IS-A): Computer
Narrower terms (IS-A Notebook): Desktop Replacement, Ultraportable, Tablet PC
Slide22
Semantic relationships in a thesaurus (pp. 204-205)
Abbreviations: PT (preferred term), VT (variant term), BT (broader term), NT (narrower term), RT (related term)
Use (U) – VT use PT
Use For (UF) – full list of VT on the PT record
Scope Note (SN) – meaning of the term, to rule out ambiguity
Slide23
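The relationship abbreviations on the preceding slide map naturally onto a small record structure. The sketch below is one possible in-memory layout, assuming the Notebook/Laptop example from the Standard Thesaurus Structure slide; the `TermRecord` and `Thesaurus` classes, the scope-note wording, and the choice of “Netbook” as a related term are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TermRecord:
    preferred: str                                      # PT
    use_for: List[str] = field(default_factory=list)    # UF: variant terms (VT)
    broader: List[str] = field(default_factory=list)    # BT
    narrower: List[str] = field(default_factory=list)   # NT
    related: List[str] = field(default_factory=list)    # RT
    scope_note: str = ""                                # SN


class Thesaurus:
    def __init__(self) -> None:
        self._records: Dict[str, TermRecord] = {}
        self._use: Dict[str, str] = {}  # "Use" pointers: VT -> PT

    def add(self, record: TermRecord) -> None:
        self._records[record.preferred.lower()] = record
        for variant in record.use_for:
            self._use[variant.lower()] = record.preferred

    def resolve(self, term: str) -> Optional[TermRecord]:
        """Follow a Use pointer if `term` is a variant, then return the PT record."""
        preferred = self._use.get(term.lower(), term)
        return self._records.get(preferred.lower())


thesaurus = Thesaurus()
thesaurus.add(TermRecord(
    preferred="Notebook",
    use_for=["Laptop"],
    broader=["Computer"],
    narrower=["Desktop Replacement", "Ultraportable", "Tablet PC"],
    related=["Netbook"],  # hypothetical RT, not from the slides
    scope_note="Portable personal computer, not a paper notebook.",
))

entry = thesaurus.resolve("laptop")
```

Storing the full variant list on the PT record (Use For) and a pointer from each variant back to the PT (Use) keeps lookups cheap in both directions.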
Semantic relationships of a wine thesaurus, p. 206
Slide24
Some Real Examples
Content tagging and social media (e.g. Flickr, Del.icio.us)
Special-purpose classification schemes and thesauri (e.g. Art & Architecture Thesaurus – AAT, UMLS)
General semantic tools and classification schemes (e.g., Princeton WordNet, Roget’s Thesaurus)
Slide25
Art & Architecture Thesaurus
http://www.getty.edu/research/conducting_research/vocabularies/aat/
Slide26
UMLS (Unified Medical Language System)
Source: National Library of Medicine (NIH)
3 Knowledge Sources, used separately or together:
Metathesaurus: 1 million+ biomedical concepts from over 100 sources
Semantic Network: 135 broad categories and 54 relationships between them
SPECIALIST Lexicon + Tools: lexical information and programs for language processing
Slide27
E.g. UMLS (Unified Medical Language System)
Source: National Library of Medicine (NIH)
Began in 1986 as long-term R&D project
Designed for systems developers
Develop multi-purpose tools to enhance understanding of medical meaning across systems
Overcome barriers to effective retrieval of machine-readable information
Overcome variety of ways the same concepts are expressed in machine-readable and human language
Slide28
UMLS Uses
Source: National Library of Medicine (NIH)
Information retrieval
Thesaurus construction
Natural language processing
Automated indexing
Electronic health records (EHR)
Distribution mechanism for:
HIPAA, CHI, PHIN regulatory standards
SNOMED CT
Slide29
UMLS Metathesaurus
http://www.nlm.nih.gov/research/umls/
Slide30
UMLS Metathesaurus
http://www.nlm.nih.gov/research/umls/
Slide31
UMLS Thesaurus Browser
http://www.nlm.nih.gov/research/umls/
Slide32
Semantic Relationships
Equivalence (PT = VT)
Hierarchical: generic (Bird NT Magpie), whole-part (Foot NT big toe) or instance (Seas NT Mediterranean Sea)
Faceted / multiple hierarchies
Associative: related terms (hammer RT nail)
Preferred terms: form, selection, definition and specificity
Polyhierarchy (Medline cross-lists viral pneumonia under both ...Fig 9-25, p. 220)
Faceted classification – multiple taxonomies that focus on different dimensions of the content (e.g. wine.com, pp. 223-224)
Slide33
Associative Term
Slide34
Poly-Hierarchies
Concepts can have multiple parents
Example (from Shoah Foundation’s thesaurus of Holocaust terms):
Auschwitz II-Birkenau (Poland : Death Camp)
Parents: Cracow (Poland : Voivodship); German death camps
Children: Block 25 (Auschwitz II-Birkenau); Kanada (Auschwitz II-Birkenau)
What are the advantages and disadvantages?
What’s the relationship to polysemy?
Slide35
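A poly-hierarchy like the one on the preceding slide can be stored as a mapping from each term to a list of parents, so that one term is reachable along several paths. This is a minimal sketch that assumes the diagram places Auschwitz II-Birkenau under both Cracow and German death camps, with Block 25 and Kanada beneath it; the `PARENTS` table and the `paths_to_roots` function are illustrative names.

```python
# Each term lists all of its parents; more than one parent makes a poly-hierarchy.
# The parent/child assignments are an assumed reading of the slide's diagram.
PARENTS = {
    "Auschwitz II-Birkenau": ["Cracow", "German death camps"],
    "Block 25": ["Auschwitz II-Birkenau"],
    "Kanada": ["Auschwitz II-Birkenau"],
}


def paths_to_roots(term):
    """Return every path from a top-level term down to `term`."""
    parents = PARENTS.get(term, [])
    if not parents:
        # A term without parents is a root, so the path is the term itself.
        return [[term]]
    paths = []
    for parent in parents:
        for path in paths_to_roots(parent):
            paths.append(path + [term])
    return paths


routes = paths_to_roots("Block 25")
```

Here `routes` holds two paths, one through the geographic parent and one through the categorical parent, which is the navigational advantage of a poly-hierarchy. The cost is that a breadcrumb trail is no longer unique.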
Faceted Hierarchies
Alternative to single and poly-hierarchies
Basic idea:
Describe objects along multiple facets
Each facet has its associated hierarchy
Issues:
What’s a facet?
How do you navigate faceted hierarchies?
Slide36
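One way to picture navigating the facets described on the preceding slide is as a filter over items that each carry one value per facet. The sketch below simplifies by treating every facet as a flat list of values rather than a hierarchy; the wine items, the facet names, and the functions `filter_items` and `facet_counts` are hypothetical.

```python
from collections import Counter

# Hypothetical catalog; each item is described along three facets.
ITEMS = [
    {"name": "Wine A", "type": "red", "region": "France", "price": "under $20"},
    {"name": "Wine B", "type": "red", "region": "Italy", "price": "$20-$50"},
    {"name": "Wine C", "type": "white", "region": "France", "price": "under $20"},
    {"name": "Wine D", "type": "white", "region": "California", "price": "$20-$50"},
]


def filter_items(items, selections):
    """Keep the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(items, facet):
    """Count how many of the given items fall under each value of one facet."""
    return Counter(item[facet] for item in items)


reds = filter_items(ITEMS, {"type": "red"})
remaining_regions = facet_counts(reds, "region")
french_whites = filter_items(ITEMS, {"type": "white", "region": "France"})
```

Recomputing the counts after each selection lets an interface show only the facet values that still lead to results, which makes it easy to narrow, broaden, or shift focus.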
Faceted Browsing Example
Slide37
Faceted Browsing Example
Demo: http://flamenco.berkeley.edu/demos.html
Slide38
Advantages of Facets
Integrates searching and browsing
Easy to build complex queries
Easy to narrow, broaden, shift focus
Helps users avoid getting lost
Helps to prevent “categorization wars”
Slide39
Relationship to IA?
[Architecture diagram: Database, Web Server, Application Server, Network]
Ontologies are implicitly “hidden” here!!!
[Ontology diagram labels: Flight, Trip, Airplane; relations: Part-of, Equipment; attributes: From, To, Departure Time, Arrival Time, Origin, Destination, Type, Capacity]
Rule: Arrival Time is always after Departure Time
Rule: Distance from Origin to Destination is typically > 100 miles
Slide40
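The first ontology rule on the preceding slide (Arrival Time is always after Departure Time) is the kind of constraint that usually ends up buried in application code. A minimal sketch, assuming a `Flight` class whose field names are invented here and whose two times are expressed in the same time zone:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Flight:
    # Field names are assumptions; the slide only lists attribute labels.
    origin: str
    destination: str
    departure_time: datetime
    arrival_time: datetime

    def __post_init__(self) -> None:
        # Rule from the slide: Arrival Time is always after Departure Time.
        # Both values are assumed to be in the same time zone.
        if self.arrival_time <= self.departure_time:
            raise ValueError("Arrival Time must be after Departure Time")


ok = Flight("OSL", "BGO", datetime(2013, 3, 1, 8, 0), datetime(2013, 3, 1, 8, 55))

try:
    Flight("OSL", "BGO", datetime(2013, 3, 1, 8, 0), datetime(2013, 3, 1, 7, 0))
    rule_enforced = False
except ValueError:
    rule_enforced = True
```

Writing the rule down in one place makes the otherwise hidden ontology explicit and testable.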
Putting it all together…
Three-Layer Architecture: Database, Web Server, Application Server, Network
Two-Layer Architecture: Database, Web Server, Network
Apache, MySQL, PHP
Slide41
Popular Implementation
Content and Metadata: SQL Database
Presentation: PHP/HTML
Slide42
Content
Hierarchy(child, parent)
Content(id, attribute1, attribute2, attribute3, …)
[Hierarchy diagram: nodes A, B, C, D, E, F, G, H]
Presentation
You are here: A > C > D
Contents at D
Related
- D
- E
Slide43
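The Hierarchy(child, parent) table from the preceding slide is enough to produce the “You are here: A > C > D” breadcrumb. The sketch below uses Python's built-in `sqlite3` module in place of the MySQL/PHP stack named in the slides; the sample rows (only the A > C > D path is given by the slide) and the function names `breadcrumb` and `children` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per node; a single parent column means a strict (non-poly) hierarchy.
conn.execute("CREATE TABLE Hierarchy (child TEXT PRIMARY KEY, parent TEXT)")
conn.executemany(
    "INSERT INTO Hierarchy (child, parent) VALUES (?, ?)",
    # Hypothetical tree: A is the root; B and C sit under A; D and E sit under C.
    [("A", None), ("B", "A"), ("C", "A"), ("D", "C"), ("E", "C")],
)


def breadcrumb(node):
    """Walk parent links up to the root and format the trail root-first."""
    trail = []
    while node is not None:
        trail.append(node)
        row = conn.execute(
            "SELECT parent FROM Hierarchy WHERE child = ?", (node,)
        ).fetchone()
        node = row[0] if row else None
    return " > ".join(reversed(trail))


def children(node):
    """List the direct children of a node, for a browsing menu."""
    rows = conn.execute(
        "SELECT child FROM Hierarchy WHERE parent = ? ORDER BY child", (node,)
    )
    return [r[0] for r in rows]


trail = breadcrumb("D")
menu = children("C")
```

One query per level is fine for shallow site hierarchies; deeper trees might instead use a recursive SQL query or a stored path column.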
Faceted Browsing
Matching Results
Filter by
- Facet1 (possible values)
- Facet2 (possible values)
Hierarchy(child, parent)
Content(id, attribute1, attribute2, attribute3, …)
Slide44
Summary
Meta-data
General function
Types of meta-data
Taxonomies and Thesauri
Role in organizing, navigating and searching content
General-purpose taxonomies
Special-purpose taxonomies
Practical use & implementation