
Matrix and Graph • - PowerPoint Presentation

tatiana-dople
Uploaded On 2018-03-06

Presentation Transcript

Slide1

Matrix and Graph

• Matrix
• Binary Matrix
• Sparse Matrix
• Operations for Vectors/Matrices
• Graph and Adjacency Matrix
• Adjacency List

Slide2

Matrix and Graph

• A matrix is a 2-dimensional structure
• Matrices are used in a wide range of areas, from physical simulations to customer management
• Graphs are also used in many areas, to represent relations and flows between data
• Several data structures have been developed for handling matrices and graphs: update, storage, search, and other operations

Slide3

2-Dimensional Structure of Matrix

• An n×m matrix has n×m numbers
  → it can be stored in an array of size n×m
• The [i,j] element corresponds to the (i*m+j)-th cell of the array
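As a minimal sketch of this index calculation in C (the names `flat_index` and `make_demo_matrix` are made up for this illustration and do not come from the slides):

```c
#include <stdlib.h>

/* Hypothetical helper: position of element [i,j] in the flat array of
   a matrix stored row by row, where m is the number of columns. */
static size_t flat_index(size_t i, size_t j, size_t m) {
    return i * m + j;
}

/* Allocate a flat n*m array and fill cell [i,j] with 10*i + j,
   so that the layout is easy to inspect. Returns NULL on failure. */
static int *make_demo_matrix(size_t n, size_t m) {
    int *a = malloc(sizeof(int) * n * m);
    if (a == NULL) return NULL;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            a[flat_index(i, j, m)] = (int)(10 * i + j);
    return a;
}
```

Note that only the number of columns m enters the formula; the number of rows n matters only for the allocation size.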

  

• This naïve design works, but there is more to consider

Slide4

2-Dimensional Array

• Instead of the usual 1-dimensional array, we can build a 2-dimensional array
• Prepare an array of pointers of size n
• Prepare n arrays of size m, and write the address of the first cell of the i-th array into the i-th cell of the pointer array
• The [i,j] element of matrix a is accessed by a[i][j] (in C)
• O(nm) memory space
• Simple structure

Slide5

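Allocate a 2-Dimensional Array

#include <stdlib.h>

/* allocate an n-by-m matrix; returns NULL on failure */
int **MATRIX_alloc ( int n, int m ){
  int i, **a;
  a = malloc ( sizeof(int *)*n );
  if ( a == NULL ) return (NULL);        /* compare with "==", not "=" */
  for ( i=0 ; i<n ; i++ ){
    a[i] = malloc ( sizeof(int)*m );
    if ( a[i] == NULL ){                 /* release the rows allocated so far */
      while ( i > 0 ) free ( a[--i] );
      free ( a );
      return (NULL);
    }
  }
  return (a);
}

/* free a matrix of n rows allocated by MATRIX_alloc */
void MATRIX_free ( int **a, int n ){
  int i;
  for ( i=0 ; i<n ; i++ ) free ( a[i] );
  free ( a );
}

Slide6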

Binary Matrix

• A binary matrix is a matrix whose cells are all either 0 or 1
  + each cell is either ○ or ×
  + the adjacency matrix of a graph, shown later
• It wastes space to use one integer for each 0/1 value (1 bit)
  → this motivates compressing the matrix

  0101010001
  1111011000

Slide7

Representation by Bits

• A row composed of 0/1 values can be regarded as one big integer
• By chopping it into integers of 32 bits (or 64 bits), the big integer becomes tractable
• ⌈m/32⌉ integers are sufficient to store a row
  (space efficiency increases, and so does cache efficiency)
• The [i,j] element can be accessed by looking at the (j%32)-th bit of the (j/32)-th integer in the i-th row

Slide8

Handling Bit Access

• The [i,j] element can be accessed by looking at the (j%32)-th bit of the (j/32)-th integer in the i-th row
  … but writing this code every time is bothersome
• Prepare the arrays
  BIT_MASK[] = {1,2,4,8,16,…}
  BIT_MASK_[] = {0xfffffffe, 0xfffffffd, 0xfffffffb, …}
  + read value: a[i][j/32] & BIT_MASK[j%32] (non-zero if the bit is 1)
  + set to 1: a[i][j/32] = a[i][j/32] | BIT_MASK[j%32]
  + set to 0: a[i][j/32] = a[i][j/32] & BIT_MASK_[j%32]

Slide9
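For the bit access described under "Handling Bit Access", a small C sketch might look as follows. It is only one possible arrangement: the type `BitMatrix` and its functions are hypothetical names, the rows are kept in one contiguous block rather than one array per row, and the masks are computed by shifting instead of being read from the BIT_MASK tables.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type: an n-by-m binary matrix, 32 cells per word. */
typedef struct {
    int n, m, words;   /* words = ceil(m/32) words per row */
    uint32_t *bits;    /* n * words words, all rows in one block */
} BitMatrix;

/* Allocate an all-zero matrix. Returns 1 on success, 0 on failure. */
static int bitmatrix_init(BitMatrix *b, int n, int m) {
    b->n = n;
    b->m = m;
    b->words = (m + 31) / 32;
    b->bits = calloc((size_t)n * (size_t)b->words, sizeof(uint32_t));
    return b->bits != NULL;
}

/* Read cell [i,j]; returns 0 or 1. */
static int bitmatrix_get(const BitMatrix *b, int i, int j) {
    uint32_t w = b->bits[(size_t)i * (size_t)b->words + (size_t)(j / 32)];
    return (int)((w >> (j % 32)) & 1u);
}

/* Write cell [i,j]: any non-zero v sets the bit, 0 clears it. */
static void bitmatrix_set(BitMatrix *b, int i, int j, int v) {
    uint32_t mask = UINT32_C(1) << (j % 32);   /* plays the role of BIT_MASK[j%32] */
    uint32_t *w = &b->bits[(size_t)i * (size_t)b->words + (size_t)(j / 32)];
    if (v) *w |= mask;      /* set to 1 */
    else   *w &= ~mask;     /* set to 0; ~mask plays the role of BIT_MASK_[j%32] */
}
```

Unlike the raw `&` expression, `bitmatrix_get` shifts the bit down so that the result is exactly 0 or 1.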

Sparse Matrix

• That is all for the structures for simple matrices
• Their space efficiency is, in some sense, optimal
• But in applications they are often insufficient or inefficient
  → for example, if the matrix is sparse, many parts are redundant
• A sparse matrix has the same value in many cells (usually 0)
• A sparse matrix should be stored by recording only the places with non-zero values

Slide10

Storing Sparse Matrix

• Let's begin with binary matrices, for simplicity
  → almost all cells are 0, and only a few are 1
• A simple idea is to make a list of the places of the cells being 1
• That is, record (x1,y1), (x2,y2), (x3,y3), …: store the row ID and column ID of each cell being 1
• The memory requirement is "twice the number of 1's"
  → this is very efficient if there are few 1's (sparse)
• But accessibility is bad: to read a cell, we have to scan the whole list (a binary tree or hash can be used instead)

Slide11

Store Row-wise

• Let's use a structure that improves the accessibility
• Classify the places of 1's according to their row ID
  → prepare n arrays, and store the column IDs of the 1's of the i-th row in the i-th array
• We need n pointers to the n arrays, but we don't have to store the row IDs, so memory efficiency increases
• The memory requirement is "#1's + #rows×2" (can be "#1's + #rows")
• Accessibility is good: if the IDs in each row array are sorted, binary search works (a linear scan is enough if a row has few column IDs)

Slide12
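The row-wise storage with sorted column IDs ("Store Row-wise") could be sketched in C roughly as below. The struct `SparseBinary` and the function `sparse_binary_get` are hypothetical names chosen for this illustration; the layout assumes each row array is already sorted.

```c
#include <stddef.h>

/* Hypothetical layout: row i of a sparse binary matrix is the sorted
   array cols[i] of length len[i], holding the column IDs of its 1's. */
typedef struct {
    int n;              /* number of rows */
    const int **cols;   /* cols[i]: sorted column IDs of the 1's in row i */
    const int *len;     /* len[i]: number of 1's in row i */
} SparseBinary;

/* Returns 1 if cell [i,j] is 1, else 0, by binary search in row i. */
static int sparse_binary_get(const SparseBinary *s, int i, int j) {
    int lo = 0, hi = s->len[i] - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int c = s->cols[i][mid];
        if (c == j) return 1;
        if (c < j) lo = mid + 1;
        else hi = mid - 1;
    }
    return 0;
}
```

For example, the two rows 0101010001 and 1111011000 shown on the Binary Matrix slide would be stored as the arrays {1,3,5,9} and {0,1,2,3,5,6}.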

Structure in Each Row

• In sparse cases, the efficiency is increased
• However, updates involving insertions/deletions are not efficient
• The situation is the same as with stacks and queues
• So, according to the purpose, we use lists, buckets, hashes, or binary trees as the structure of each row (having n arrays is equivalent to having buckets)

Slide13

Real World Data

The characteristics of sparse matrices in practice are:
  + A matrix representing a mesh network (structural calculation)
    in the geometrical sense, only a few mesh cells are adjacent to each one, so there are not many non-zeros per row
    → an array is sufficient as the structure of each row
  + Road network data (adjacency of crossing points + distances)
    → almost the same, but updates occur occasionally (it would be sufficient to (re-)allocate slightly more memory than needed)

Slide14

Real World Data (2)

Ex) A matrix whose rows are texts and whose columns are words, where a cell is 1 if the word is included in the text, is usually sparse
  (likewise POS data, Web links, Web surfing, etc.)
  + on average, the #1's in a row/column is a constant, but some have very many (texts having many words, words included in many texts)
  + the distribution of 1's follows the so-called power law (Zipf's law), also called scale-free: the # of items of size D is proportional to 1/D^Δ for some constant Δ; this is often seen in real-world data (≠ geometric distribution)
  + Such data needs algorithms designed so that the dense part does not become the bottleneck of the computation

Slide15

Non-binary Sparse Matrix

• Usual matrices are of course non-binary, so it is not sufficient to remember only the places having non-zero values
  → remember (place, value) pairs
• When using arrays, store (place, value), (place, value), (place, value), …, or place1, place2, …, value1, value2, …
• When using lists or binary trees, assign (place, value) to each cell/node, or simply prepare two of them

Slide16

Exercise

Make data representing the following matrix in a sparse way

0,0,1,4,0

0,1,0,0,5

2,0,0,0,0

1,2,5,0,2

0,0,0,0,0

Slide17

Column: Memory Saving for Matrix

• A bucket, or a row of a sparse matrix, needs two pieces of data (the pointer to its first cell, and its size k_i)
• We decrease these from two to one
• First, prepare an array of size equal to the # of non-zero cells. Then,
  + the 0th row uses the cells of the array ranging from 0 to k_0 - 1
  + the 1st row uses from k_0 to k_0 + k_1 - 1
  …
  + the i-th row uses from k_0 + … + k_(i-1) to k_0 + … + k_i - 1
  and we remember only the start positions of the rows
• The size of the i-th row can be obtained by (start position of row i+1) - (start position of row i)

Slide18
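To make the start-position idea from "Column: Memory Saving for Matrix" concrete, here is a hedged C sketch for a binary matrix. The names `PackedRows`, `packed_from_dense`, and `packed_row_size` are hypothetical. One detail is an assumption of this sketch rather than something stated on the slide: the start array gets n+1 entries, so that the size formula also works for the last row.

```c
#include <stdlib.h>

/* Hypothetical compact layout: one shared array `col` of all column IDs,
   plus `start`, where row i occupies col[start[i]] .. col[start[i+1]-1]. */
typedef struct {
    int n;        /* number of rows */
    int *start;   /* n+1 start positions; start[n] = # non-zero cells */
    int *col;     /* column IDs of the non-zero cells, row by row */
} PackedRows;

/* Build from a dense n-by-m 0/1 matrix stored row by row.
   Returns 1 on success, 0 if memory allocation fails. */
static int packed_from_dense(PackedRows *p, const int *dense, int n, int m) {
    int nnz = 0, t = 0;
    for (int k = 0; k < n * m; k++)
        if (dense[k] != 0) nnz++;
    p->n = n;
    p->start = malloc(sizeof(int) * (size_t)(n + 1));
    p->col = malloc(sizeof(int) * (size_t)(nnz > 0 ? nnz : 1));
    if (p->start == NULL || p->col == NULL) {
        free(p->start);
        free(p->col);
        return 0;
    }
    for (int i = 0; i < n; i++) {
        p->start[i] = t;                      /* row i begins here */
        for (int j = 0; j < m; j++)
            if (dense[i * m + j] != 0) p->col[t++] = j;
    }
    p->start[n] = t;                          /* end of the last row */
    return 1;
}

/* Size of row i = (start position of row i+1) - (start position of row i). */
static int packed_row_size(const PackedRows *p, int i) {
    return p->start[i + 1] - p->start[i];
}
```

For a non-binary matrix, a second array of the same length as `col` would hold the values.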

Matrix Operation

• The basic matrix operations are addition and multiplication (the inner product of vectors is a special case)
• Further, AND and OR for binary matrices
• Algorithms for these operations are trivial if the matrices are in the form of 2-dimensional arrays. However, they are not obvious if the matrices are in sparse forms
• Further, there are several structures that have advantages for matrix operations

Slide19

Addition of Matrix

• For addition, it is sufficient to have an algorithm for adding each pair of rows (so operations on vectors are sufficient)
• First, we look at the inner product of sparse vectors

Slide20

Inner Product

• When computing the inner product of two sparse vectors, the difficulty is that for each cell we have to find the corresponding cell in the other vector
• Sort the cells in each vector according to their column ID
• Scan the two vectors simultaneously, from the smaller indices
  "simultaneously" means that we iteratively pick up the smallest column ID remaining in either vector
• When we find a column ID at which both vectors have non-zero values, accumulate the product of the cells

(Figure: two example sparse vectors, 1 5 5 1 7 3 and 1 1 3 3 5 4)

Slide21

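A Code for Sparse Inner Product

/* va, vb: arrays of (column ID, value) pairs sorted by column ID;
   ta, tb: the numbers of pairs in va and vb */
int SVECTOR_innerpro ( int *va, int ta, int *vb, int tb ){
  int ia=0, ib=0, c=0;
  while ( ia<ta && ib<tb ){
    if ( va[ia*2] < vb[ib*2] ) ia++;
    else if ( va[ia*2] > vb[ib*2] ) ib++;
    else { c = c + va[ia*2+1]*vb[ib*2+1]; ia++; ib++; }
  }
  return ( c );
}

(Figure: two example sparse vectors, 1 5 5 1 7 3 and 2 1 3 3 5 4)

Slide22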

Addition of Two Vectors

• Addition can be done in a similar way
• Sort the cells in each vector according to their column ID
• Scan the two vectors simultaneously, from the smaller indices
• The positions of non-zero values in the resulting vector are those having a non-zero value in at least one of the two vectors, so they can be easily identified by the scan

(Figure: two example sparse vectors, 1 5 5 1 7 3 and 2 1 3 3 5 4)

Slide23

A Code for Addition

/* vc receives the sum as (column ID, value) pairs and must have room
   for ta+tb pairs; returns the number of pairs written */
int SVECTOR_add ( int *vc, int *va, int ta, int *vb, int tb ){
  int ia=0, ib=0, ic=0, c, cc;
  while ( ia<ta || ib<tb ){
    if ( ia == ta ){ c = vb[ib*2+1]; cc = vb[ib*2]; ib++; }
    else if ( ib == tb ){ c = va[ia*2+1]; cc = va[ia*2]; ia++; }
    else if ( va[ia*2] > vb[ib*2] ){ c = vb[ib*2+1]; cc = vb[ib*2]; ib++; }
    else if ( va[ia*2] < vb[ib*2] ){ c = va[ia*2+1]; cc = va[ia*2]; ia++; }
    else { c = va[ia*2+1] + vb[ib*2+1]; cc = vb[ib*2]; ia++; ib++; }
    vc[ic*2] = cc; vc[ic*2+1] = c; ic++;
  }
  return ( ic );
}

(Figure: two example sparse vectors, 1 5 5 1 7 3 and 2 1 3 3 5 4)

Slide24

Column: Endmarks do a Good Job!

• Compared to the inner product, the code for addition is relatively long
  → we have exceptions at the ends of the arrays
• So we are motivated to simplify the code by using an "endmark" (an endmark is a symbol that represents the end of the array, or something else representing the end)
• 0, -1, or a very large value is used as an endmark
• We prepare an additional cell next to the end of each array, and put an endmark in that cell

Slide25

Column: Endmarks do a Good Job! (2)

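/* addition with endmarks: ENDMARK must be larger than every column ID
   (e.g. INT_MAX), and the lengths ta, tb are no longer needed */
int SVECTOR_add ( int *vc, int *va, int ta, int *vb, int tb ){
  int ia=0, ib=0, ic=0, c, cc;
  /* "||", not "&&": continue until both vectors have reached their endmark */
  while ( va[ia*2] != ENDMARK || vb[ib*2] != ENDMARK ){
    if ( va[ia*2] > vb[ib*2] ){ c = vb[ib*2+1]; cc = vb[ib*2]; ib++; }
    else if ( va[ia*2] < vb[ib*2] ){ c = va[ia*2+1]; cc = va[ia*2]; ia++; }
    else { c = va[ia*2+1] + vb[ib*2+1]; cc = vb[ib*2]; ia++; ib++; }
    vc[ic*2] = cc; vc[ic*2+1] = c; ic++;
  }
  vc[ic*2] = ENDMARK;
  return ( ic );
}

(Figure: the example vectors 1 5 5 1 7 3 and 2 1 3 3 5 4, with an endmark ■ at the end)

Slide26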

Matrix Multiplication

• For sparse matrix multiplication, compute the inner products of all pairs of a row and a column
• However, a sparse matrix has row representations but not column representations, so getting the column vectors is hard
• A simple solution is to use the transposing algorithm explained in the section on buckets; this gives us a column representation
• On the other hand, some data structures are designed so that columns can also be traced

Slide27
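The section on buckets is not part of this transcript, so the following C sketch for "Matrix Multiplication" is only a guess at what a bucket-style transposition could look like, not necessarily the algorithm the slide refers to. It assumes a binary matrix in a packed row layout (start positions plus column IDs); the function name `sparse_transpose` and all parameter names are hypothetical.

```c
/* Transpose a sparse binary n-by-m matrix.
   Input:  start[0..n] and col[0..nnz-1], where row i owns
           col[start[i]] .. col[start[i+1]-1] and nnz = start[n].
   Output: tstart[0..m] and trow[0..nnz-1] in the same layout, but per
           column. The caller provides both output arrays. */
static void sparse_transpose(int n, int m, const int *start, const int *col,
                             int *tstart, int *trow) {
    int nnz = start[n];
    /* 1. count the cells of each column (the bucket sizes) */
    for (int j = 0; j <= m; j++) tstart[j] = 0;
    for (int k = 0; k < nnz; k++) tstart[col[k] + 1]++;
    /* 2. prefix sums turn the counts into start positions */
    for (int j = 0; j < m; j++) tstart[j + 1] += tstart[j];
    /* 3. distribute the row IDs into the buckets; rows are visited in
          increasing order, so every column list comes out sorted */
    for (int i = 0; i < n; i++)
        for (int k = start[i]; k < start[i + 1]; k++)
            trow[tstart[col[k]]++] = i;
    /* 4. step 3 advanced tstart[j] to the start of column j+1: shift back */
    for (int j = m; j > 0; j--) tstart[j] = tstart[j - 1];
    tstart[0] = 0;
}
```

The whole transposition takes time proportional to n + m + (# non-zero cells), and the sorted column lists can then be fed to a sparse inner product routine.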

Four-Direction List

• Lists are good at storing sparse vectors, for tracing
• However, a collection of lists isn't good at tracing column vectors, because the cells are not connected vertically
• …so, let's have a list connected in both the row direction and the column direction
• Each cell has four arms that point to the neighboring cells in the four directions (←, →, ↑, ↓)

(Figure: linked cells labeled 7, 4, 2)

Slide28

Pointing the Neighbors

• Links in four directions seem to form a mesh network, but they do not
  …since the links can cross
• In other words, this structure can be seen as a superimposition of two kinds of lists, horizontal and vertical, in which identical cells are unified into one

(Figure: linked cells labeled 7, 4, 2, 4, 4, 4)

Slide29

Having Lists of 2-Directions

• If we have both lists of row vectors and lists of column vectors, we get the same accessibility, but insertions/deletions are not the same
• For example, when we want to delete a cell from a row vector, it takes a long time to find the corresponding cell in the column lists
• In four-direction lists, they are already unified
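A hedged C sketch of such a cell, with deletion from both directions at once. The struct `Cell` and the helper names are hypothetical, and the lists here are simply NULL-terminated, which is an assumption of this sketch:

```c
#include <stddef.h>

/* Hypothetical cell of a four-direction list: one non-zero matrix cell,
   linked to its neighbors in its row and in its column. */
typedef struct Cell {
    int row, col, value;
    struct Cell *left, *right;   /* neighbors in the same row */
    struct Cell *up, *down;      /* neighbors in the same column */
} Cell;

/* Make l and r horizontal neighbors (either may be NULL). */
static void link_row(Cell *l, Cell *r) {
    if (l) l->right = r;
    if (r) r->left = l;
}

/* Make u and d vertical neighbors (either may be NULL). */
static void link_col(Cell *u, Cell *d) {
    if (u) u->down = d;
    if (d) d->up = u;
}

/* Remove c from its row list and its column list in O(1):
   no search through the column list is needed. */
static void cell_unlink(Cell *c) {
    link_row(c->left, c->right);
    link_col(c->up, c->down);
    c->left = c->right = c->up = c->down = NULL;
}
```

A fuller implementation would also keep a head pointer (or a header cell) per row and per column, and update it when the first cell of a list is removed.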

Slide30

Graph Structure

• A graph is a structure composed of a set of vertices and a set of edges (an edge is a pair of vertices)
• It is formed by sets, so information such as positions, shapes, and crossing edges does not matter when it is drawn as a picture (a graph with shape/position information is called a "graph visualization" or "embedded graph")
• When edges have directions (from one vertex to another), the graph is called directed
  → a very popular structure

Slide31

Examples of Graph Data

• Adjacency relation
• Hierarchy in an organization
• Similarity relation
• Web network, human network, SNS friend network, …

Slide32

Graph Terminology

• An edge e is said to be incident to u and v, and vice versa, if e = (u,v); also, u and v are said to be adjacent
• The # of edges incident to v is the degree of v
• A graph having an edge between every two vertices is a complete graph
• When there are two or more edges connecting the same two vertices, the edges are called multiple edges
• If there is a partition of the vertices into two groups so that every edge connects a vertex in one group and a vertex in the other, the graph is called a bipartite graph

Slide33

Storing a Graph

• n vertices can be seen as the numbers 0,…,n-1
• Then, an edge is a pair of numbers
  → can be stored by writing the pairs in an array, lists, etc.
• Further, we need something for accessibility
  for example, we often visit a vertex and then go to a neighboring vertex, so we need to scan all edges incident to the vertex

Slide34

Using Matrix

The set of edges can be represented by a matrix in the following ways
• The j-th row/column corresponds to vertex j, and the ij-cell is 1 if there is an edge (i, j) (called the adjacency matrix)
  + efficient for dense graphs having many edges
  + multiplicity of edges can be represented by the value of a cell
• The j-th row corresponds to vertex j, and each column corresponds to an edge; when edge e is incident to vertex i, the ie-cell is 1 (called the incidence matrix)
  + multiple edges are represented easily
• Sparse matrix representations are advantageous for incidence matrices and sparse graphs

Slide35

In Practice

• A 2-dimensional array is sufficient when the matrix is small
  → the cost is small, and the redundancy is small
• For a sparse matrix, such as a 100 by 100 matrix with 10 non-zero elements in each row, a sparse representation will be efficient (approximately, when the density is less than 10%)
  + When we often want to scan the non-zero elements, such as when tracing all vertices adjacent to a vertex, a sparse representation is useful
  + If we want to check whether there is an edge between two specified vertices, a 2-dimensional array has the advantage

Slide36

Incidence Matrix

• An incidence matrix represents the incidence relation between vertices and edges
• Put indices 0,…,n-1 on the vertices, and 0,…,m-1 on the edges
  + storing the edges incident to a vertex in the corresponding row = storing the vertices incident to an edge in the corresponding column

(Figure: an example graph with vertices 0,…,5 and edges 0,…,8)

Vertices adjacent to each vertex:
  0: 1,3
  1: 0,2,4,5
  2: 1,3,4,5
  3: 0,2
  4: 1,2,5
  5: 1,2,4

Edges incident to each vertex (rows):
  0: 0,2
  1: 0,1,3,4
  2: 4,6,7,8
  3: 2,7
  4: 3,5,8
  5: 1,5,6

Vertices incident to each edge (columns):
  0: 0,1
  1: 1,5
  2: 0,3
  3: 1,4
  4: 1,2
  5: 4,5
  6: 2,5
  7: 2,3
  8: 2,4
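The relation between the row view and the column view can be illustrated by deriving one from the other. The following C sketch builds the per-vertex lists of incident edges from the per-edge endpoint pairs; the function name `build_incidence` and the packed output layout (start positions plus one shared array) are assumptions made for this illustration.

```c
#include <stdlib.h>

/* Hypothetical builder: edge e joins vertices ends[2e] and ends[2e+1].
   On return, the edges incident to vertex v are
   inc[start[v]] .. inc[start[v+1]-1], in increasing order of edge ID.
   The caller provides start (n+1 cells) and inc (2m cells).
   Returns 1 on success, 0 if memory allocation fails. */
static int build_incidence(int n, int m, const int *ends, int *start, int *inc) {
    int *pos = malloc(sizeof(int) * (size_t)(n > 0 ? n : 1));
    if (pos == NULL) return 0;
    for (int v = 0; v <= n; v++) start[v] = 0;
    for (int k = 0; k < 2 * m; k++) start[ends[k] + 1]++;   /* degrees */
    for (int v = 0; v < n; v++) start[v + 1] += start[v];   /* prefix sums */
    for (int v = 0; v < n; v++) pos[v] = start[v];          /* next free cell */
    for (int e = 0; e < m; e++) {
        inc[pos[ends[2 * e]]++] = e;
        inc[pos[ends[2 * e + 1]]++] = e;
    }
    free(pos);
    return 1;
}
```

Called with the nine endpoint pairs listed on this slide (0,1), (1,5), (0,3), (1,4), (1,2), (4,5), (2,5), (2,3), (2,4) and n = 6, it produces the vertex-to-edge lists shown on this slide.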

Slide37

Advantage of Incidence Matrix

• In the case of an incidence matrix, each edge has an ID
  → so it is easy to handle the information attached to each edge
  → just allocating an array of size m is sufficient
• In the case of an adjacency matrix, an edge doesn't have an ID, so it is not easy to manage the correspondence between an edge and its data
• Multiple edges are also easy to handle

(Figure: the same example graph and lists as on the Incidence Matrix slide)

Slide38

Allocate Memory for Cells

• An incidence matrix can be realized by list cells having four links, like a sparse matrix (two for the vertices of the edge, and two for the edges at the vertex)
  → the disadvantages of arrays are eliminated
• It can also be realized as two array lists
• Or, prepare an array in which edge i corresponds to cells 2i and 2i+1, to represent the four links

(Figure: the same example graph and lists as on the Incidence Matrix slide)

Slide39

Exercise

Make an adjacency matrix of the following graph, and also represent it as a sparse incidence matrix

(Figure: a graph with vertices 0,…,6)

Slide40

Bipartite Graph

• A bipartite graph is often seen as a representation of a (binary) (sparse) matrix
  → associate the nodes of one group with rows, and the others with columns
  → connect by an edge the two vertices corresponding to a cell with a non-zero value
• A representation in a different style

  0: 4,6
  1: 4,5
  2: 5,6
  3: 5,6

(Figure: a bipartite graph on vertices 0,…,6)

Slide41

Column: Store Huge Graph

• A graph needs two pointers (or integers) per edge
  → weights, etc. need more
  → 64 bits are required on a 32-bit CPU
• However, Web graphs have billions of vertices and 20 billion edges
  → 160GB is necessary in this way
• This is too much. Can we reduce the storage size?

Slide42

Column: Store Huge Graph (2)

• Only a few vertices have large degrees
• Vertices are mainly adjacent to these few vertices
• Put indices so that large-degree vertices have small indices, and represent small indices by a small number of bits and large indices by many bits
Ex.)
• If the bit sequence representing a number begins with "0", the following 7 bits represent [0-127]
• If "10", the following 14 bits represent 128+[0-16383]
• If "11", the following 30 bits represent 16384+128,…

Slide43
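The prefix code from the example in "Column: Store Huge Graph (2)" could be sketched in C as follows. The function names are hypothetical. The slide elides the exact range of the third case, so this sketch assumes it covers 16512 + [0, 2^30 - 1]; the codes are kept right-aligned in a 32-bit word instead of being packed into a real bit stream.

```c
#include <stdint.h>

/* Encode x with the prefix code of the slide. The code is written
   right-aligned into *code; the return value is its length in bits
   (8, 16, or 32), or 0 if x is too large for the assumed third range. */
static int vl_encode(uint32_t x, uint32_t *code) {
    if (x < 128u) {                               /* "0" + 7 bits */
        *code = x;
        return 8;
    }
    if (x < 128u + 16384u) {                      /* "10" + 14 bits */
        *code = (UINT32_C(2) << 14) | (x - 128u);
        return 16;
    }
    if (x - 16512u < (UINT32_C(1) << 30)) {       /* "11" + 30 bits (assumed range) */
        *code = (UINT32_C(3) << 30) | (x - 16512u);
        return 32;
    }
    return 0;
}

/* Length of a code in bits, decided from the leading bits of its first byte. */
static int vl_length(uint8_t first) {
    if ((first & 0x80u) == 0) return 8;
    if ((first & 0x40u) == 0) return 16;
    return 32;
}

/* Decode a right-aligned code of the given length. */
static uint32_t vl_decode(uint32_t code, int nbits) {
    if (nbits == 8)  return code & 0x7Fu;
    if (nbits == 16) return 128u + (code & 0x3FFFu);
    return 16512u + (code & 0x3FFFFFFFu);
}
```

Because the prefix alone determines the code length, codes can be written back to back without separators.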

Column: Store Huge Graph (3)

• Sort the sites in dictionary order of their URLs
  → links usually point to nearby sites, so the differences between IDs become small
• They can be recorded in the same way, to reduce the space
• Using these, one edge needs just 10 bits
  → Further, we can reduce it to 5 bits
  → The storage will be 20GB, which fits in recent computers

Slide44
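One way to read the difference idea of "Column: Store Huge Graph (3)" is sketched below in C: the sorted link targets of a site are replaced by gaps, the first measured from the site's own ID and each later one from the previous target. This reading, and the names `ids_to_gaps` and `gaps_to_ids`, are assumptions of this sketch; the slide does not spell out the exact scheme.

```c
/* Replace the sorted target IDs of vertex src by small differences:
   gaps[0] = ids[0] - src, gaps[t] = ids[t] - ids[t-1] for t > 0.
   The first gap may be negative when the first target precedes src. */
static void ids_to_gaps(int src, const int *ids, int k, int *gaps) {
    int prev = src;
    for (int t = 0; t < k; t++) {
        gaps[t] = ids[t] - prev;
        prev = ids[t];
    }
}

/* Inverse of ids_to_gaps: recover the target IDs from the gaps. */
static void gaps_to_ids(int src, const int *gaps, int k, int *ids) {
    int prev = src;
    for (int t = 0; t < k; t++) {
        prev += gaps[t];
        ids[t] = prev;
    }
}
```

The small gaps are then good candidates for a variable-length code in which small numbers take few bits.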

Summary

• Data structures for matrices
• Structures for sparse matrices, and four-direction lists
• Structures for graphs: adjacency matrix, incidence matrix, and adjacency list