Scalable Image Annotation with ConceptRank
Petra Budíková, Michal Batko, Pavel Zezula
Outline
Search-based annotation
Motivation
Problem formalization
Challenges
ConceptRank
Idea
Semantic network construction
PageRank and ConceptRank
Image annotation with ConceptRank
MUFIN Image Annotation
Framework description
Current implementation and parameters
Examples
Experimental evaluation
Future work
What and why?
Motivation
What is in the image?
Why do I care?
Keyword-based image retrieval
Impaired users
Data summarization
Scientific data classification…
Yellow flower
Flower, yellow, dandelion, detail, close-up, nature, plant, beautiful
Taraxacum
officinale
The first dandelion that bloomed this year in front of the White House.
nature
dandelion
Problem formalization
The annotation task is defined by a query image I and a vocabulary V of target concepts.
The annotation function fA assigns to each concept c ∈ V a value from [0,1] that expresses the probability of the concept c being relevant for I.
Depending on the application, only a subset of V may be returned to the user:
a fixed number of the most probable concepts
concepts with probability higher than a given threshold
some advanced selection of interesting concepts
V = {flower, animal, person, building}
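The first two output-selection strategies above can be sketched in a few lines; the function names and the example scores below are illustrative, not part of the system:

```python
# Illustrative output selection for the annotation function f_A.
# `scores` maps each concept c in V to its estimated probability in [0, 1];
# the numbers below are made-up examples, not real system output.

def top_k(scores, k):
    """Return the k most probable concepts, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def above_threshold(scores, t):
    """Return all concepts whose probability exceeds the threshold t."""
    return [c for c in scores if scores[c] > t]

scores = {"flower": 0.7, "animal": 0.2, "person": 0.05, "building": 0.05}
```

With these scores, top_k(scores, 1) returns ["flower"], while a threshold of 0.1 keeps both "flower" and "animal".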
How can we describe the image?
Option 1: Classifiers vs. Option 2: Search-based approach
Principles
Classifiers – learning phase: use reliable training data to create classifiers for selected concepts; annotation phase: run the classifiers
Search-based – learning phase: none; annotation phase: similarity search over annotated data + post-processing
Main advantages
Classifiers: mature technologies available (e.g. neural networks); fast; high precision and recall
Search-based: reduced reliance on cleanly labeled data and utilization of web data; no costly learning phase; the annotation phase can be easily adjusted to the user’s preferences; scalability w.r.t. vocabulary size
Use cases
Classifiers: annotations with a fixed vocabulary and reliable training data – identification of people, classification of cancer cells, …
Search-based: annotations with an open/adaptable vocabulary – proposing keyword annotations for web image databases, which need to be rich and adapt to the changing vocabulary of users
Search-based approach: basic scheme
V = {flower, animal, person, building}
Annotated image collection
Content-based
image retrieval
Similar annotated images
Yellow, bloom, pretty
Meadow, outdoors, dandelion
Mary’s garden, summer
Text processing
Semantic resources
Selection of the
final annotation
flower
Candidate keywords with probabilities/scores: Plant 0.3, Flower 0.3, Garden 0.15, Animal 0.05, Human 0.1, Park 0.1
Search-based approach: challenges
Selection and preprocessing of underlying database of annotated images
Size vs. quality
Effective and efficient image search
Descriptors, indexing technique
Image search results processing
Baseline: word cloud
Advanced: semantic analysis, annotation with a hierarchic structure
Selection of output
(User-)selected level of the hierarchic structure
ConceptRank
Baseline word cloud solution
???
What would a person do?
Search for
semantic connections
between candidate keywords
Flowers bloom; dandelion is a flower; there are usually flowers in a garden; …
Based on the connections, estimate probabilities of vocabulary terms: “Flower” is rather likely
Idea
Content-based
image retrieval
?
V = {flower, animal, person, building}
Similar annotated images
Yellow, bloom, pretty
Meadow, outdoors, dandelion
Mary’s garden, summer
What can the computer do?
Search for semantic connections between candidate keywords?
Yes! Ontologies, WordNet, image dataset statistics, the web, …
Based on the connections, estimate probabilities of vocabulary terms?
Yes! Based on the connections, add new candidates and/or adjust the score of existing candidates
So, let’s try it! Tasks:
find a suitable source of semantic information
propose an algorithm that uses the selected resource to discover semantic connections between candidate concepts and performs score recomputation
We want a generic and theoretically sound solution
Idea (cont.)
ConceptRank
ConceptRank overview
Let us assume we have some semantic resource S that contains
Semantic objects
Relationships between semantic objects
Mapping from English words to semantic objects
For ConceptRank, we need to
Transform the input keywords into semantic objects from S
Let’s call the result “initial candidate objects”
Retrieve relationships between candidate objects and, if suitable, add new candidate objects
We need a suitable representation for this: semantic networks
Compute the probability of candidate objects
The actual ConceptRank algorithm
Graph representation of semantic relationships
Nodes: candidate objects
Node probability: current probability of the respective candidate concept
Edges: relationships between candidate objects
Edge weight: “relevance transfer” capacity – the weight of the edge from A to B expresses the fraction of probability that node A contributes to node B
Semantic network for annotations
[Figure: example network with nodes dog, cat, animal, mouse, computer, keyboard – bottom-up edges dog→animal, cat→animal, keyboard→computer with weight 1, mouse→animal and mouse→computer with weight 0.5 each; top-down edges animal→{dog, cat, mouse} with weight 0.33 each and computer→{mouse, keyboard} with weight 0.5 each]
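The example network can be written down directly as a weighted adjacency structure. This is a hand-built illustration reconstructed from the weights shown in the figure (the 0.33 values are 1/3), not the construction algorithm itself:

```python
# Example semantic network: the weight of the edge from A to B is the
# fraction of A's probability passed on to B. Bottom-up edges carry
# weight 1; top-down edges split the weight among the child nodes.
semantic_net = {
    "dog":      {"animal": 1.0},
    "cat":      {"animal": 1.0},
    "mouse":    {"animal": 0.5, "computer": 0.5},   # ambiguous concept
    "keyboard": {"computer": 1.0},
    "animal":   {"dog": 1/3, "cat": 1/3, "mouse": 1/3},
    "computer": {"mouse": 0.5, "keyboard": 0.5},
}

# Sanity check: every node passes on exactly 100% of its probability.
out_sums = {node: sum(targets.values())
            for node, targets in semantic_net.items()}
```

The sanity check reflects the "relevance transfer" interpretation: outgoing edge weights of each node sum to 1.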
Building the semantic network
Input: initObjectsWithProb – set of initial objects with probabilities,
S – semantic resource,
rels – set of interesting relationships
Output: semanticNet – the semantic network

begin
  queue <- initObjectsWithProb.getObjects();
  for (o : queue) do
    semanticNet.addNode(o);
    queue.remove(o);
    for (r : rels) do
      for (o2 : S.getConnectedObjects(o, r)) do
        if (semanticNet.contains(o2)) then
          semanticNet.addEdge(o, o2, r, computeWeight(r, …));
        else if (r.isExpandingRel) then
          queue.add(o2);
          semanticNet.addNode(o2);
          semanticNet.addEdge(o, o2, r, computeWeight(r, …));
        fi
      done
    done
  done
end
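The pseudocode above can be turned into a runnable sketch. The traversal logic follows the slide; the semantic resource S is replaced by a toy hypernym table, and all edge weights are fixed at 1 in place of computeWeight (both are assumptions for illustration):

```python
from collections import deque

def build_semantic_network(init_objects, get_connected, rels, expanding_rels):
    """BFS construction of the semantic network, mirroring the pseudocode.
    get_connected(o, r) stands in for S.getConnectedObjects; membership in
    expanding_rels plays the role of r.isExpandingRel."""
    nodes = set(init_objects)
    edges = {}                       # (source, target, relationship) -> weight
    queue = deque(init_objects)
    while queue:
        o = queue.popleft()
        for r in rels:
            for o2 in get_connected(o, r):
                if o2 in nodes:
                    edges[(o, o2, r)] = 1.0
                elif r in expanding_rels:   # expanding relationship: grow net
                    nodes.add(o2)
                    queue.append(o2)
                    edges[(o, o2, r)] = 1.0
    return nodes, edges

# Toy semantic resource: a two-level hypernym hierarchy.
HYPERNYMS = {"dog": ["canine"], "cat": ["feline"],
             "canine": ["carnivore"], "feline": ["carnivore"]}

def toy_connected(obj, rel):
    return HYPERNYMS.get(obj, []) if rel == "hypernym" else []
```

Calling build_semantic_network(["dog", "cat"], toy_connected, ["hypernym"], {"hypernym"}) expands both initial concepts up to their shared hypernym "carnivore".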
ConceptRank algorithm
Task
: Using the probabilities of initial concepts (which were obtained from previous annotation phases) and the semantic network, compute the probability of each node in the network
Observations:
The nodes in the network mutually influence each other’s probability
The computation of node probabilities needs to be an iterative process
Goal: a theoretically sound algorithm that finds a balanced state of the iterative process
Inspiration: Google’s PageRank algorithm
[Figure: the example network (dog, cat, animal, mouse, computer, keyboard) with the edge weights from before and the computed node probabilities 0.066, 0.066, 0.35, 0.166, 0.1, 0.25]
PageRank
Input: web pages and links represented in a graph
Output: importance score of pages
Algorithm idea: In its simplest form, PageRank is a solution to the recursive equation “a page is important if important pages link to it.” The importance of any node is computed as the probability that this node is reached by a random surfer who starts in an arbitrary node of the network graph and moves for a long time.
Network graph construction: Pages are represented by nodes, hyperlinks by oriented edges.
For each node in the graph, the sum of weights of all outgoing edges is 1.
[Figure: three-page example graph with nodes A, B, C and edge weights 0.5, 1, 0.5]
PageRank (cont.)
Some math behind:
Since the probability of reaching a node depends solely on the probabilities of the referencing nodes, the random surfer model is a Markov process.
For Markov processes, it is known that the distribution of the surfer approaches a limiting distribution, provided two conditions are met:
the graph is strongly connected (it is possible to get from any node to any other node)
there are no dead ends (nodes that have no outgoing edges)
To meet these conditions, the random surfer can perform random restarts – with a probability P_restart, he can restart at any moment in any node
Computation of scores: eigenvector computation over the matrix representation of the adjusted graph
[Figure: the A, B, C example adjusted for restarts with P_restart = 0.3 – the original edge weights are scaled by 0.7 (giving 0.35, 0.7, 0.35) and uniform restart edges of weight 0.1 are added between all nodes]
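The random-surfer model with restarts can be approximated by straightforward power iteration instead of an explicit eigenvector solver. A minimal sketch, assuming an adjacency-list input (the three-node graph in the test is hypothetical):

```python
def pagerank(links, restart=0.3, iters=200):
    """Random-surfer PageRank: with probability `restart` the surfer jumps
    to a uniformly random node, otherwise follows a random outgoing link.
    links[u] lists the pages u links to; dead ends restart uniformly."""
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: restart / n for u in nodes}           # uniform restart mass
        for u in nodes:
            share = (1 - restart) * rank[u]
            targets = links[u] if links[u] else nodes   # dead end -> anywhere
            for v in targets:
                new[v] += share / len(targets)
        rank = new
    return rank
```

For a tiny graph where B and C both link only to A and A links to both, A ends up with the highest score, and the scores always form a probability distribution.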
ConceptRank vs. PageRank
Input:
PageRank: web pages and hyperlinks
ConceptRank: candidate concepts and semantic links
Output:
PageRank: importance score of pages
ConceptRank: importance score of candidate concepts
Similarities:
We have nodes and links that can be used to form a graph/network
The network can be modelled as a Markov process
The random walk intuition makes sense for both problems
Random walk on the web: simulates a randomly surfing user
Random walk over keywords: simulates the user’s thinking while looking for relevant concepts
Differences:
For ConceptRank, we want to consider the initial probabilities associated with nodes
Adaptation of initial probabilities into the model
Random restarts will not be uniformly random
Instead, the probability that the walk will restart in a given node will correspond to the initial probability of that node
The initial probability is determined by previous steps of the annotation process
For concepts found among the keywords of similar images, the initial probability corresponds to the frequency of the concept
For concepts that were added during the semantic network building, the initial probability is 0
[Figure: the example network with initial probabilities – dog 0.4, cat 0.35, mouse 0.2, keyboard 0.05; the nodes animal and computer, added during network construction, start with probability 0]
ConceptRank algorithm
Input: initObjectsWithProb – initial concepts and their probabilities,
semanticNet – the semantic network,
rels – selected relationships and their weights,
restartProb – probability of random surfer restart
Output: nodeProbs – probabilities of network nodes

begin
  // construct the restart vector and matrix
  restartVector <- constant vector of 0 values;
  for (n : semanticNet.getNodes()) do
    if (initObjectsWithProb.contains(n)) then
      restartVector[semanticNet.indexOf(n)] <- initObjectsWithProb.get(n);
    fi
  done
  restartM <- unityVector * restartVector;
  // construct the transition matrix, normalize, solve dead ends
  transitionM <- new Matrix;
  for (r : rels.getRelationshipTypes()) do
    relM <- constructTypeMatrix(semanticNet.getNodes(), semanticNet.getEdges(r));
    transitionM.add(relM * rels.getWeight(r));
  done
  transitionM.normalize();
  for (i = 0; i < transitionM.getColumnDimension(); i++) do
    if (transitionM.getColumn(i).getSum() == 0) then
      transitionM.replaceColumn(i, restartVector);
    fi
  done
  // compute the eigenvector
  completeMatrix <- (1 - restartProb) * transitionM + restartProb * restartM;
  nodeProbs <- completeMatrix.getPrincipalEigenvector();
end
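A runnable power-iteration sketch of ConceptRank using the example network and initial probabilities from the preceding slides. It approximates the principal eigenvector by repeated multiplication, and the restart mass is distributed according to the initial probabilities rather than uniformly; the dict-based graph representation is an assumption of this sketch:

```python
def concept_rank(net, init_probs, restart=0.3, iters=200):
    """Approximate ConceptRank by power iteration. net[u][v] is the weight
    of the edge u -> v (normalized per node below). Restarts land on node n
    with probability init_probs[n] instead of uniformly, so concepts found
    among the similar images' keywords anchor the random walk."""
    nodes = list(net)
    rank = {u: init_probs.get(u, 0.0) for u in nodes}
    for _ in range(iters):
        new = {u: restart * init_probs.get(u, 0.0) for u in nodes}
        for u in nodes:
            share = (1 - restart) * rank[u]
            total = sum(net[u].values())
            if total == 0:              # dead end: restart by initial probs
                for v in nodes:
                    new[v] += share * init_probs.get(v, 0.0)
            else:
                for v, w in net[u].items():
                    new[v] += share * w / total
        rank = new
    return rank

# Example network and initial probabilities from the slides.
net = {
    "dog":      {"animal": 1.0},
    "cat":      {"animal": 1.0},
    "mouse":    {"animal": 0.5, "computer": 0.5},
    "keyboard": {"computer": 1.0},
    "animal":   {"dog": 1/3, "cat": 1/3, "mouse": 1/3},
    "computer": {"mouse": 0.5, "keyboard": 0.5},
}
init = {"dog": 0.4, "cat": 0.35, "mouse": 0.2, "keyboard": 0.05}
```

Even though "animal" starts with probability 0, it ends up more probable than "computer", because most of the restart mass sits on its hyponyms.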
Efficiency issues
For larger sets of similar images, the number of initial keywords, and consequently the number of nodes in the network, may get high (1000+)
Costly construction of the semantic network
Costly computation of the ConceptRank
Therefore, approximations can be used
For semantic network construction: limiting the number of initial nodes
For ConceptRank computation: a limited number of multiplications by the transfer matrix instead of the exact mathematical computation of the eigenvector
This approximation is used by Google and is known to work very well
Putting theory to use
The basic annotation scheme again
V = {flower, animal, person, building}
Annotated image collection
Content-based
image retrieval
Similar annotated images
Yellow, bloom, pretty
Meadow, outdoors, dandelion
Mary’s garden, summer
Text processing
Semantic resources
Selection of the
final annotation
flower
Candidate keywords with probabilities/scores: Plant 0.3, Flower 0.3, Garden 0.15, Animal 0.05, Human 0.1, Park 0.1
ConceptRank
MUFIN Image Annotation Framework
Modular architecture for image annotation
There is an extensible set of modules that implement the same interface
Can be arbitrarily combined into an “annotation pipeline”
There is an “annotation record” object that is passed from one module to another
Carries information about the query and candidate keywords, the current estimate of probabilities, and any other knowledge deemed relevant by individual modules
Clear structure, easy adaptability
Upgrade from MPEG-7 to DeCAF descriptors = replacing one module without disturbing the others
MUFIN Image Annotation application
MUFIN Image Annotation – current version
Objective:
Annotation with semantic relationships evaluated by ConceptRank
Basic decisions:
Reference dataset: 20M Profiset
20M high-quality images with rich and systematic annotation
20 keywords per image on average
Obtained from a commercial website selling stock images
Evaluation of visual similarity: DeCAF descriptors
State-of-the-art for image content description
Indexing: PPP-codes
Source of semantic information: WordNet
Lexical database of English
Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms – synsets
Synsets are interlinked by conceptual-semantic and lexical relations
Hypernyms, hyponyms, …
WordNet ConceptRank details
Basic objects for semantic analysis: synset
Step 1: Transformation of keywords to synsets
For keywords with multiple meanings, there exist multiple synsets (e.g. mouse). How do we decide which synset(s) to pick?
There is an additional resource that, for most English words, lists the possible synsets together with a score that corresponds to the frequency of use of the keyword in the meaning described by the given synset
We take a fixed number of the most probable synsets for each keyword
There may be many synsets retrieved by the previous step, which could lead to costly processing of the semantic network
Therefore, only a fixed number of the most probable synsets are used to build the network
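A toy sketch of this keyword-to-synset step. The synset identifiers and frequency scores below are invented for illustration; in the real system they come from WordNet and its sense-frequency resource:

```python
# Hypothetical sense-frequency table: keyword -> {synset id: usage score}.
SENSE_FREQ = {
    "mouse": {"mouse.n.01": 0.70,    # the rodent
              "mouse.n.04": 0.25,    # the pointing device
              "mouse.v.01": 0.05},   # to hunt mice
}

def top_synsets(keyword, k):
    """Pick a fixed number of the most probable synsets for a keyword."""
    senses = SENSE_FREQ.get(keyword, {})
    return sorted(senses, key=senses.get, reverse=True)[:k]
```

With k = 2, the ambiguous keyword "mouse" contributes both the rodent and the pointing-device synsets to the network, each carrying its frequency-based score.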
WordNet ConceptRank details (cont.)
Step 2: Construction
of WordNet-based semantic network
Which relationships are interesting?
For now: hypernyms, hyponyms, holonyms, meronyms
Which relationships should be used to extend the network, and which should be used only to add edges between existing nodes?
Extending mode: bottom-up relationships (hypernyms, maybe holonyms)
How shall we compute the weights of semantic network edges for each relationship?
Bottom-up relationships: edge weight 1
Top-down relationships: edge weight 1/(number of child nodes)
[Figure: the example network with WordNet-based weights – bottom-up edges (dog→animal, cat→animal, mouse→animal, mouse→computer, keyboard→computer) all have weight 1; top-down edges animal→{dog, cat, mouse} have weight 0.33 each and computer→{mouse, keyboard} have weight 0.5 each]
The complete annotation pipeline
Similarity search
Extraction of the DeCAF descriptor from the query image
Retrieval of k visual nearest neighbors
Semantic analysis
Frequency analysis of keywords + normalization
Transformation of keywords to synsets
Construction of the WordNet-based semantic network
Computation of ConceptRank
Selection of the final annotation
Mapping synsets with probabilities to vocabulary concepts
Overview of annotation parameters
Similarity search
# of similar images
Transformation of keywords to synsets
# of most probable synsets per keyword
# of most probable synsets that enter the network construction
Construction of the WordNet-based semantic network
types of relationships
for extending the network
for adding edges
weights of edges for individual relationships
Computation of ConceptRank
restart probability
weights of individual relationship matrices
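The parameters above could be collected into one configuration record. The field names are illustrative; the default values are taken from the example settings shown later in the talk (70 images, 7 synsets per keyword, 100 initial synsets) and the restart probability 0.3 from the PageRank example:

```python
# Illustrative grouping of the annotation parameters (names are made up).
ANNOTATION_PARAMS = {
    "similar_images": 70,            # k for the kNN similarity search
    "synsets_per_keyword": 7,        # most probable synsets per keyword
    "initial_synsets": 100,          # synsets entering network construction
    "extending_rels": ["hypernym"],                       # grow the network
    "edge_only_rels": ["hyponym", "holonym", "meronym"],  # add edges only
    "relationship_weights": {"hypernym": 1.0},  # per-relationship matrix weights
    "restart_prob": 0.3,             # random-surfer restart probability
}
```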
Annotation query example
Input:
?
Vocabulary: all English words
Example: kNN search and initial synsets
kNN search: k = 5
Keywords to synsets: at most 3 most probable synsets per keyword
Merge synsets: 20 synsets with the highest probability
Keywords of the retrieved images:
beak, cotswolds, flamingoes (2), head, janes (2), pink, site, slimbridge (2), trust, water, wetlands, wildfowl
beak, cotswolds, flamingoes (2), head, janes (2), pink, preening, site, slimbridge (2), trust, water, wetlands, wildfowl
american, birds, darwin, flamingo (2), flap, flapping (2), galapagos, greater (2), islands, markings, phoenicopterus, race, ruber, south, wing, wings (2)
aythya, drake, duck, sv, swimming
aythya, drake, duck, sv, swimming
Initial synsets:
flamingo 0.185, greater 0.062, wildfowl 0.062, Cotswolds 0.062, Aythya 0.062, wetland 0.062, site 0.058, head 0.049, pink 0.047, water 0.046, trust 0.037, wings 0.037, duck 0.034, Drake 0.031, drake 0.031, swimming 0.031, Galapagos_Islands 0.031, beak 0.025, beak 0.025, American 0.023
Example: semantic network – hypernyms
Example: annotation results
Top 5 keywords – demonstration settings:
Flamingoes (4.15)
Duck (2.44)
Wildfowl (1.74)
Birds (1.48)
Wetlands (1.41)
Top 5 keywords – 70 images, 7 synsets/kw, 100 init. synsets, all relationships:
Animal (2.68)
Bird (2.42)
Travel (2.30)
Vertebrates (2.04)
Swimming (1.42)
Experimental evaluation
ImageCLEF 2014: Scalable Concept Image Annotation
Focus on concept-wise scalability
No reasonable training data
Provided development queries, GT and evaluation scripts
Vocabulary: aerial airplane baby beach bicycle bird boat bridge building car cartoon castle cat chair child church cityscape closeup cloud cloudless coast countryside daytime desert diagram dog drink drum elder embroidery fire firework fish flower fog food footwear furniture garden grass guitar harbor hat helicopter highway horse indoor instrument lake lightning logo monument moon motorcycle mountain nighttime overcast painting park person plant portrait protest rain rainbow reflection river road sand sculpture sea shadow sign silhouette smoke snow soil space spectacles sport sun sunrise/sunset table teenager toy traffic train tricycle truck underwater unpaved wagon water
GT: countryside daytime grass horse plant
Development data results
Processing time:
1500 ms on average for the parameters used in the table
1000 ms for descriptor extraction (can be improved)
300 ms for similarity search
Competition results: a close 2nd place
Experimental evaluation (cont.)

Method                                        MP-c   MR-c   MF-c   MP-s   MR-s   MF-s   MAP-s
Random baseline                                2.79   1.03   1.17   3.15   1.91   2.23   8.78
DISA baseline (freq. analysis, 1 synset/kw)   20.96  34.22  22.87  37.30  43.14  38.07  40.59
DISA baseline (multiple synsets/kw)           31.20  36.76  27.79  44.30  51.00  45.00  50.03
DISA with hyper-hypo                          30.10  36.57  28.75  48.42  58.22  50.26  58.35
DISA with hyper-hypo-holo-mero                30.29  36.63  28.98  49.08  59.11  51.00  59.34
What next?
Summary and Future work
Already done
The ConceptRank algorithm
Working
annotation
system
Good results in the ImageCLEF competition
Near future
More evaluations
Influence of dataset size and quality, approximation params, …
Google ground truth
Publish or perish
More distant future
Other resources of semantic relationships
Ontologies, Word2Vec
Relevance feedback
Combined architecture: search-based approach and modern NN classifiers