Combinatorial Pattern Matching (CPM), June 29, 2015. Rayan Chikhi (CNRS Lille), Sofya Raskhodnikova (Penn State), Paul Medvedev (Penn State), Martin Milanič (University of Primorska)
Slide1
Readability
Combinatorial Pattern Matching (CPM), June 29, 2015
Rayan Chikhi, CNRS Lille
Sofya Raskhodnikova, Penn State
Paul Medvedev, Penn State
Martin Milanič, University of Primorska
Slide2
Overlap Digraph (definition)
A string overlaps another string if some suffix of the first is equal to some prefix of the second. They overlap properly if, in addition, the suffix and the prefix are both proper.
The overlap digraph of a set of strings is a digraph in which each string is a vertex and there is an edge from one string to another if and only if the first properly overlaps the second.
Variants of overlap graphs are used in bioinformatics applications.
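The definition above can be sketched in Python. This is a minimal illustration, not the authors' code: the names `properly_overlaps` and `overlap_digraph` are hypothetical, "proper" is assumed to mean non-empty and shorter than the whole string, and self-loops are skipped for simplicity. It is run on the four example strings shown on this slide.

```python
from itertools import permutations


def properly_overlaps(x: str, y: str) -> bool:
    """True if a proper suffix of x equals a proper prefix of y.

    Assumption: "proper" means non-empty and strictly shorter than the string.
    """
    longest = min(len(x), len(y)) - 1
    return any(x[-k:] == y[:k] for k in range(1, longest + 1))


def overlap_digraph(strings):
    """Edge set of the overlap digraph (self-loops skipped in this sketch)."""
    return {(x, y) for x, y in permutations(strings, 2) if properly_overlaps(x, y)}


edges = overlap_digraph(["ACGTA", "GTAAC", "CCCCT", "GGACT"])
for x, y in sorted(edges):
    print(x, "->", y)
```

On these strings the sketch finds three edges: ACGTA -> GTAAC (overlap GTA), GTAAC -> ACGTA (overlap AC), and GTAAC -> CCCCT (overlap C).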
ACGTA
GTAAC
CCCCT
GGACT
Slide3
Questions
Do overlap digraphs have any properties or structure that can be exploited?
Given a graph, Braga and Meidanis (2002) showed how to label the vertices so that the graph is an overlap graph.
How does the set of graphs generated depend on the string length?
The BM labeling used strings of length
Limiting the string length limits the graphs that can be generated.
Slide4
Readability in the digraph model
A labeling is an assignment of strings to vertices.
Let D be a directed graph.
An overlap labeling is a labeling such that (x, y) is an edge if and only if the string of x properly overlaps the string of y.
The readability of a digraph D, denoted r(D), is the smallest nonnegative integer r such that there exists an injective overlap labeling of D with strings of length r.
ACGTA
GTAAC
CCCCT
GGACT
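The conditions in this definition can be checked mechanically for a given labeling. The sketch below is an illustration under stated assumptions, not the authors' implementation: `is_injective_overlap_labeling` is a hypothetical name, vertices are numbered 1 to 4 arbitrarily, and self-loops are ignored.

```python
from itertools import permutations


def properly_overlaps(x: str, y: str) -> bool:
    # Some non-empty suffix of x, shorter than both strings, equals a prefix of y.
    return any(x[-k:] == y[:k] for k in range(1, min(len(x), len(y))))


def is_injective_overlap_labeling(edges, label, r):
    """Check the conditions of the definition for a digraph.

    edges: set of ordered pairs (x, y); label: dict vertex -> string.
    Assumption of this sketch: self-loops are ignored.
    """
    strings = list(label.values())
    if any(len(s) != r for s in strings):
        return False  # every string must have length r
    if len(set(strings)) != len(strings):
        return False  # the labeling must be injective
    return all(
        ((x, y) in edges) == properly_overlaps(label[x], label[y])
        for x, y in permutations(label, 2)
    )


label = {1: "ACGTA", 2: "GTAAC", 3: "CCCCT", 4: "GGACT"}
edges = {(1, 2), (2, 1), (2, 3)}
print(is_injective_overlap_labeling(edges, label, 5))
```

Under these assumptions the four strings form an injective overlap labeling of this digraph with strings of length 5, so its readability is at most 5. Finding the smallest such length is the hard part and is not attempted here.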
Slide5
Readability in the bipartite graph model
Let G be a bipartite graph.
An overlap labeling is a labeling such that (x, y) is an edge if and only if the string of x properly overlaps the string of y.
The readability of a bipartite graph G, denoted r(G), is the smallest nonnegative integer r such that there exists an injective overlap labeling of G with strings of length r.
Thm: There exists a bijection between the set of bipartite graphs with n nodes in each part and the set of all digraphs with n nodes, such that for all
ACA
CAC
AGA
CAT
Slide6
Examples
Complete bipartite graph on vertices ()
Even cycle on vertices ()
Figure labels: 41, 12, 12, 23, 23, 34, 34, 41
Slide7
Is there a simple and useful string-free formulation of readability?
Slide8
P4-rule and P4 Lemma
A decomposition of size k is a weight function w : E → {1, ..., k}.
Given an overlap labeling ℓ, the ℓ-decomposition is a decomposition assigning each edge the length of the minimum overlap between the labels of its two endpoints.
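Computing the decomposition induced by a labeling is a direct translation of this definition. The following is a minimal sketch; `min_overlap` and `decomposition_from_labeling` are hypothetical names, and the three labels are borrowed from the earlier example strings.

```python
def min_overlap(x: str, y: str):
    """Length of the shortest proper overlap of x with y, or None if there is none."""
    for k in range(1, min(len(x), len(y))):
        if x[-k:] == y[:k]:
            return k
    return None


def decomposition_from_labeling(edges, label):
    """Weight of each edge (x, y): the minimum overlap between label[x] and label[y]."""
    return {(x, y): min_overlap(label[x], label[y]) for (x, y) in edges}


label = {"a": "ACGTA", "b": "GTAAC", "c": "CCCCT"}
weights = decomposition_from_labeling({("a", "b"), ("b", "a"), ("b", "c")}, label)
print(weights)
```

Here the edge from a to b gets weight 3 (overlap GTA), the edge from b to a gets weight 2 (overlap AC), and the edge from b to c gets weight 1 (overlap C), so this decomposition has size 3.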
P4 Lemma: If ℓ is an overlap labeling, then the ℓ-decomposition satisfies the following (called the P4-rule):
For every induced P4, if the middle edge has the maximum weight, then its weight is at least the sum of the weights of the other two edges.
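A decomposition can be tested against the P4-rule by enumerating induced paths on four vertices. The sketch below rests on an assumption about the rule's conclusion: that a middle edge of maximum weight must weigh at least as much as the two outer edges combined. This is what the overlap structure forces, since otherwise the two end strings of the path would themselves overlap. The function name and the edge representation (undirected, as `frozenset` pairs) are hypothetical.

```python
from itertools import permutations


def satisfies_p4_rule(weight):
    """weight: dict mapping frozenset({u, v}) -> positive integer edge weight."""
    vertices = set().union(*weight)
    for a, b, c, d in permutations(vertices, 4):
        path = [frozenset(p) for p in ((a, b), (b, c), (c, d))]
        chords = [frozenset(p) for p in ((a, c), (b, d), (a, d))]
        if not all(e in weight for e in path) or any(e in weight for e in chords):
            continue  # a-b-c-d is not an induced P4
        w1, w2, w3 = (weight[e] for e in path)
        # Assumed form of the rule: a maximal middle edge must weigh at least
        # as much as the two outer edges combined.
        if w2 >= max(w1, w3) and w2 < w1 + w3:
            return False
    return True


def path_weights(w1, w2, w3):
    # Helper for the demo: the path 1-2-3-4 with the given edge weights.
    return {frozenset((1, 2)): w1, frozenset((2, 3)): w2, frozenset((3, 4)): w3}


print(satisfies_p4_rule(path_weights(1, 2, 1)))
print(satisfies_p4_rule(path_weights(1, 1, 1)))
```

The enumeration visits each path in both directions and takes time proportional to the fourth power of the number of vertices, which is acceptable only for small examples.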
Slide9
Trees
Given a decomposition, we say that a labeling achieves it if the labeling is an overlap labeling and the decomposition is its ℓ-decomposition.
Let T be a tree.
Theorem:
P4 Lemma implies
Claim: if a decomposition satisfies the P4-rule, then there exists a labeling achieving it.
Order the edges by non-decreasing weight, and define
Inductively construct the labeling along this order. Let
Note that , because of the P4-rule and is -free
Relabel and with where A has length and is composed of new, non-repeating characters.
Slide10
Proof of claim (key idea)
Figure: two cases of the relabeling with the block A.
Slide11
For cycles, the theorem is not true
Figure labels: 2, 4, 2, 3, 1, 2, 3
Slide12
-free bipartite graphs
The strict P4-rule is: for every induced P4, if the middle edge has the maximum weight, then
Theorem: For a -free bipartite graph
For graphs with , the theorem is not true
Figure labels: 4, 2, 3, 3, 1, 1, 1
Slide13
General bipartite graphs
Let be the subgraph of including only edges with weight .
Define as the size of the smallest decomposition satisfying the HUB-rule: for all
bicliques: is a disjoint union of bicliques
hierarchical: If and have the same neighborhoods in , then they have the same neighborhoods in for .
Thm:
Slide14
How large can readability be?
Theorem: Almost all graphs have readability
via counting argument
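One way such a counting argument could go is sketched below. The setting (bipartite graphs with n labeled vertices in each part), the alphabet bound, and all constants are assumptions made for this illustration; they are not recovered from the slides, and the resulting bound of order n / log n is what this particular sketch yields.

```latex
% Sketch only: the setup and the constants are assumptions, not taken from the slides.
% Setting: bipartite graphs with n labeled vertices in each part, n >= 2.
\begin{align*}
\#\{\text{graphs}\} &= 2^{n^2}, \\
\#\{\text{labelings with strings of length } r\} &\le (2nr)^{2nr} = 2^{\,2nr\log_2(2nr)}.
\end{align*}
% A labeling uses at most 2nr distinct characters, and it determines its graph.
% Hence at most \sum_{r' \le r} (2nr')^{2nr'} \le r\,(2nr)^{2nr} graphs have readability at most r.
% For r = n / (8 \log_2 n), using r \le n and \log_2 n \ge 1:
\begin{align*}
2nr\log_2(2nr) \;\le\; 2nr\,(1 + 2\log_2 n) \;\le\; 6nr\log_2 n \;=\; \tfrac{3}{4}\,n^2 .
\end{align*}
% So the fraction of graphs with readability at most r is at most r \cdot 2^{-n^2/4},
% which tends to 0 as n grows.
```

The key design point is that the number of labelings grows like 2 to the power of roughly nr log(nr), while the number of graphs grows like 2 to the power of n squared, so short strings cannot label most graphs.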
Slide15
Distinctness
The distinctness of two vertices in the same part of the bipartition is the number of vertices that are in one neighborhood and not the other (taking the max of the two values).
The distinctness of G is the minimum distinctness over all pairs.
Thm:
Consider the decomposition of an optimal labeling.
Case 1: every is a matching. Adding a matching can increase the distinctness by at most one.
Case 2: Let be the last one that is not a matching. Using the fact that the decomposition satisfies the HUB-rule
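Distinctness as defined on this slide is straightforward to compute. The sketch below is illustrative only: the function names are hypothetical, the graph is given as a map from left vertices to their sets of right neighbors, and the minimum is assumed to range over pairs within either part. Isolated right vertices are not represented in this input format, and the graph is assumed to contain at least one pair of vertices in the same part.

```python
from itertools import combinations


def pair_distinctness(nu: set, nv: set) -> int:
    """Vertices in one neighborhood and not the other, taking the larger count."""
    return max(len(nu - nv), len(nv - nu))


def graph_distinctness(left_adj: dict) -> int:
    """left_adj maps each left vertex to the set of its right neighbors.

    Assumption: the minimum ranges over pairs within either part.
    """
    right_adj = {}
    for u, nbrs in left_adj.items():
        for v in nbrs:
            right_adj.setdefault(v, set()).add(u)
    return min(
        pair_distinctness(part[u], part[v])
        for part in (left_adj, right_adj)
        for u, v in combinations(part, 2)
    )


print(graph_distinctness({"a": {"x", "y"}, "b": {"y", "z"}}))
```

Two vertices with identical neighborhoods have distinctness 0, so any graph containing such a pair has distinctness 0.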
Slide16
Hadamard Graphs
bipartite graph
vertices assigned -long binary codewords
edge if the inner product of the codewords is odd
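The construction can be sketched in a few lines. This is an illustration under assumptions: `hadamard_graph` is a hypothetical name, the codeword length is called k here, codewords are encoded as integers, and each part is assumed to contain all binary codewords of that length (as the codewords 00, 01, 10, 11 in the figure suggest).

```python
def hadamard_graph(k: int):
    """Edges (u, v) between left codeword u and right codeword v, as integers < 2**k."""
    size = 1 << k
    return {
        (u, v)
        for u in range(size)
        for v in range(size)
        # The inner product over bits is odd iff u & v has an odd number of 1s.
        if bin(u & v).count("1") % 2 == 1
    }


for u, v in sorted(hadamard_graph(2)):
    print(f"{u:02b} - {v:02b}")
```

For codewords of length 2 this gives six edges: the all-zero codeword is isolated, and every other codeword has exactly two neighbors.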
Figure: codewords 00, 01, 10, 11 on each side
Theorem:
Slide17
Trees
Thm: For all trees ,
For the full k-ary tree of height k,
Assume, for the sake of contradiction, that there exists an optimal decomposition of size
A path from root to leaf with distinct edge weights, with values, with edges
Conclusions
Results
A string-free formulation of readability that is
exactly equivalent for trees
asymptotically equivalent for -free bipartite graphs
“weakly” equivalent for general graphs
Existence of a graph family with readability of
Open problems
Find other rules that an ℓ-decomposition must satisfy to close the gap:
Let
We know
Do there exist graphs with
?
Complexity
Understand graphs that have poly-logarithmic readability
Slide19
The end
Combinatorial Pattern Matching (CPM), June 29, 2015
Rayan Chikhi, CNRS Lille
Sofya Raskhodnikova, Penn State
Paul Medvedev, Penn State
Martin Milanič, University of Primorska
Slide20
General graphs
Define for as the subgraph of including only edges with weight at most .
Lem: An ℓ-decomposition satisfies the following (HUB-rule), for all
is a disjoint union of bicliques
If and have the same neighborhoods in , then they have the same neighborhoods in for .
Define as the size of the smallest decomposition satisfying the HUB-rule.
Thm:
Slide21
Questions/Results
Do there exist graphs with readability
Slide22
Almost all graphs have readability
Counting argument
There are bipartite graphs with vertices.
There are at most labelings of length