Data Structures and Algorithms: Huffman compression. An Application of Binary Trees and Priority Queues. CS 102.
Data Structures and Algorithms
Huffman compression
www.mif.vu.lt/~algis

Encoding and Compression
Fax machines
ASCII
Variations on ASCII
  min number of bits needed
  cost vs. savings
  patterns
  modifications
Proposed by Dr. David A. Huffman in 1952:
"A Method for the Construction of Minimum-Redundancy Codes"
Applicable to many forms of data transmission
Our example: text files

The Basic Algorithm
Huffman coding is a form of statistical coding
Not all characters occur with the same frequency!
In a standard encoding, all characters are allocated the same amount of space
  1 char = 1 byte, be it "e" or "x"
Any savings in tailoring codes to the frequency of each character?
Code word lengths are no longer fixed, as they are in ASCII.
Code word lengths vary and will be shorter for the more frequently used characters.

The Basic Algorithm
1. Scan the text to be compressed and tally the occurrences of each character.
2. Sort or prioritize the characters based on their number of occurrences in the text.
3. Build the Huffman code tree based on the prioritized list.
4. Perform a traversal of the tree to determine all code words.
5. Scan the text again and create a new file using the Huffman codes.

Huffman Compression
Background:
Huffman works with arbitrary bytes, but the ideas are most easily explained using character data
Consider the extended ASCII character set:
  8 bits per character
  A BLOCK code, since all codewords are the same length
  8 bits yield 256 characters
In general, block codes give:
  For K bits, 2^K characters
  For N characters, log2(N) bits are required
Easy to encode and decode

Huffman Compression
What if we could use variable-length codewords? Could we do better than ASCII?
The idea is that different characters would use different numbers of bits
If all characters occur with the same frequency, we cannot improve over ASCII
What if characters had different frequencies of occurrence?
  Ex: In English text, letters like E, A, I, S appear much more frequently than letters like Q, Z, X
Can we somehow take advantage of these differences in our encoding?

Huffman Compression
First we need to make sure that variable-length coding is feasible
Decoding a block code is easy: take the next 8 bits
Decoding a variable-length code is not so obvious
In order to decode unambiguously, variable-length codes must satisfy the prefix property:
  No codeword is a prefix of any other
Ex: with codes a = 0, b = 01, c = 10, the prefix property fails (0 is a prefix of 01), and the string 010 decodes as either "a c" or "b a"
Ok, so now how do we compress?
Use fewer bits for our more common characters, and more bits for our less common characters

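The prefix property can be checked mechanically. A minimal sketch (the helper name is ours, not the lecture's); after sorting, any prefix relationship must appear between lexicographic neighbors:

```python
def is_prefix_free(codes):
    """Return True if no codeword is a prefix of any other."""
    words = sorted(codes.values())
    # If x is a prefix of y, x also prefixes everything sorted between
    # them, so checking adjacent pairs is sufficient.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

print(is_prefix_free({"a": "0", "b": "10", "c": "11"}))  # True: decodable
print(is_prefix_free({"a": "0", "b": "01", "c": "10"}))  # False: "0" prefixes "01"
```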
Huffman Compression
(Figure slides omitted.)

Huffman Compression
Huffman Algorithm:
Assume we have K characters and that each uncompressed character has some weight associated with it (i.e., its frequency)
Initialize a forest, F, to have K single-node trees in it, one tree per character, each storing the character's weight
while (|F| > 1)
  Find the two trees, T1 and T2, in F with the smallest weights
  Create a new tree, T, whose weight is the sum of the weights of T1 and T2
  Remove T1 and T2 from F, and add them as the left and right children of T
  Add T to F

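The loop above can be sketched with Python's heapq module as the priority queue (function and variable names are ours; a counter breaks weight ties so the heap never compares trees). Trees are nested pairs, and leaves are characters:

```python
import heapq
from itertools import count

def build_huffman(freqs):
    """Huffman's algorithm: repeatedly merge the two lightest trees."""
    tie = count()  # tie-breaker so equal weights never compare the trees themselves
    forest = [(weight, next(tie), char) for char, weight in freqs.items()]
    heapq.heapify(forest)
    while len(forest) > 1:
        w1, _, t1 = heapq.heappop(forest)  # T1: smallest weight
        w2, _, t2 = heapq.heappop(forest)  # T2: next smallest
        # New tree T, with T1 and T2 as children and the summed weight
        heapq.heappush(forest, (w1 + w2, next(tie), (t1, t2)))
    root_weight, _, tree = forest[0]
    return root_weight, tree

# The lecture's example frequencies; the root weight is the character count
weight, tree = build_huffman({'E': 1, 'i': 1, 'y': 1, 'l': 1, 'k': 1, '.': 1,
                              'r': 2, 's': 2, 'n': 2, 'a': 2, ' ': 4, 'e': 8})
print(weight)  # 26
```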
Huffman Compression
Huffman Issues:
Is the code correct?
Does it satisfy the prefix property?
Does it give good compression?
How to decode?
How to encode?
How to determine weights/frequencies?

Huffman Compression
Is the code correct?
  Based on the way the tree is formed, it is clear that the codewords are valid
  The prefix property is assured, since each codeword ends at a leaf
    (all original nodes corresponding to the characters end up as leaves)
Does it give good compression?
  For a block code of N different characters, log2(N) bits are needed per character
  Thus, for a file containing M ASCII characters, 8M bits are needed

Huffman Compression
Given Huffman codes {C0, C1, ..., C(N-1)} for the N characters in the alphabet, each of length |Ci|
Given frequencies {F0, F1, ..., F(N-1)} in the file
  (where the sum of all frequencies = M)
The total bits required for the file is:
  sum over i = 0 to N-1 of |Ci| * Fi
The overall total depends on the differences in frequencies
  The more extreme the differences, the better the compression
  If frequencies are all the same, no compression
See example at the end

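The sum can be evaluated directly. A small illustration with made-up codes and frequencies (not the lecture's example):

```python
# Hypothetical prefix-free codes C_i and frequencies F_i, 3-character alphabet
codes = {'a': '0', 'b': '10', 'c': '11'}
freqs = {'a': 5, 'b': 2, 'c': 1}  # M = 8 characters in the file
total_bits = sum(len(codes[ch]) * freqs[ch] for ch in codes)
print(total_bits)  # 11, i.e. 5*1 + 2*2 + 1*2, versus 8*8 = 64 bits in ASCII
```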
Huffman Compression
How to decode?
This is fairly straightforward, given that we have the Huffman tree available
  start at the root of the tree and the first bit of the file
  while not at end of file
    if the current bit is a 0, go left in the tree; else (a 1) go right
    if we are at a leaf, output its character and return to the root
    read the next bit of the file
Each character is a path from the root to a leaf
If we are not at the root when end of file is reached, there was an error in the file

Huffman Compression
How to encode?
This is trickier, since we are starting with characters and outputting codewords
  Using the tree, we would have to start at a leaf (first finding the correct leaf), then move up to the root, finally reversing the resulting bit pattern
  Instead, let's process the tree once (using a traversal) to build an encoding TABLE
Demonstrate inorder traversal on board

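Building the table with one traversal might look like this (representation and names are ours; the sketch uses a recursive preorder walk, though any traversal that reaches every leaf works):

```python
def code_table(tree, prefix=""):
    """Walk the tree once; going left appends '0', going right appends '1'.
    A codeword is recorded only when a leaf is reached."""
    if not isinstance(tree, tuple):     # leaf: a single character
        return {tree: prefix or "0"}    # "0" covers a one-character alphabet
    left, right = tree
    table = code_table(left, prefix + "0")
    table.update(code_table(right, prefix + "1"))
    return table

# Tiny example tree: 'a' on the left, ('b', 'c') on the right
print(code_table(('a', ('b', 'c'))))  # {'a': '0', 'b': '10', 'c': '11'}
```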
Huffman Compression
How to determine weights/frequencies?
2-pass algorithm
  Process the original file once to count the frequencies, then build the tree/code and process the file again, this time compressing
  Ensures that each Huffman tree will be optimal for each file
  However, to decode, the tree/frequency information must be stored in the file
    Likely at the front of the file, so the decompressor first reads the tree info, then uses that to decompress the rest of the file
    Adds extra space to the file, reducing overall compression quality

Huffman Compression
Overhead especially reduces quality for smaller files, since the tree/frequency info may add a significant percentage to the file size
  Thus larger files have a higher potential for compression with Huffman than smaller ones do
  However, just because a file is large does NOT mean it will compress well
  The most important factor in compression remains the relative frequencies of the characters
Using a static Huffman tree
  Process a lot of "sample" files, and build a single tree that will be used for all files
  Saves the overhead of tree information, but generally is NOT a very good approach

Huffman Compression
There are many different file types that have very different frequency characteristics
  Ex: a .cpp file vs. a .txt file containing an English essay
    The .cpp file will have many ;, {, }, (, ) characters
    The .txt file will have many a, e, i, o, u, ., etc.
  A tree that works well for one file may work poorly for another (perhaps even expanding it)
Adaptive single-pass algorithm
  Builds the tree as it encodes the file, so the tree information does not need to be stored separately
  Processes the file only one time
We will not look at the details of this algorithm, but the LZW algorithm and the self-organizing search algorithm are also adaptive

Building a Tree
Consider the following short text:
  Eerie eyes seen near lake.
Count up the occurrences of all characters in the text "Eerie eyes seen near lake."
What characters are present?
  E e r i space y s n a l k .
Create a binary tree node with the character and frequency of each character
Place the nodes in a priority queue
  The lower the occurrence count, the higher the priority in the queue

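Tallying the example sentence can be done with Python's collections.Counter:

```python
from collections import Counter

text = "Eerie eyes seen near lake."
freqs = Counter(text)

# The most common character is 'e'; spaces come next
print(freqs['e'], freqs[' '], sum(freqs.values()))  # 8 4 26
```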
Building a Tree
The queue after inserting all nodes (null pointers are not shown):
  E:1  i:1  y:1  l:1  k:1  .:1  r:2  s:2  n:2  a:2  sp:4  e:8

Building a Tree
While the priority queue contains two or more nodes:
  Create a new node
  Dequeue the node with the smallest frequency and make it the left subtree
  Dequeue the next node and make it the right subtree
  The frequency of the new node equals the sum of the frequencies of its left and right children
  Enqueue the new node back into the queue

Building a Tree
(The following slides in the original deck step through this loop with tree diagrams; the sequence of merges is summarized here, with the new parent's weight after the arrow.)
  E + i -> 2, then y + l -> 2, then k + . -> 2
  r + s -> 4, then n + a -> 4, then (E,i) + (y,l) -> 4
  (k,.) + sp -> 6, then (r,s) + (n,a) -> 8
  4 + 6 -> 10, then 8 + 8 -> 16, then 10 + 16 -> 26
What is happening to the characters with a low number of occurrences?
  They sink to the bottom of the tree, so they end up with the longest code words.
After enqueueing the final node there is only one node left in the priority queue.

Building a Tree
Dequeue the single node left in the queue.
This tree contains the new code words for each character.
The frequency of the root node should equal the number of characters in the text.
(Final tree diagram; root weight 26.)
"Eerie eyes seen near lake." = 26 characters

Encoding the File - Traverse Tree for Codes
Perform a traversal of the tree to obtain the new code words
Going left is a 0; going right is a 1
A code word is completed only when a leaf node is reached
(Tree diagram omitted.)

Encoding the File - Traverse Tree for Codes
Char    Code
E       0000
i       0001
y       0010
l       0011
k       0100
.       0101
space   011
e       10
r       1100
s       1101
n       1110
a       1111
(Tree diagram omitted.)

Encoding the File
Rescan the text and encode the file using the new code words:
"Eerie eyes seen near lake."
Char    Code
E       0000
i       0001
y       0010
l       0011
k       0100
.       0101
space   011
e       10
r       1100
s       1101
n       1110
a       1111
000010110000011001110001010110101111011010111001111101011111100011001111110100100101
Why is there no need for a separator character?

Encoding the File - Results
Have we made things any better?
  84 bits to encode the text
  ASCII would take 8 * 26 = 208 bits
000010110000011001110001010110101111011010111001111101011111100011001111110100100101
If a modified block code with 4 bits per character were used instead, the total would be 4 * 26 = 104 bits.
The savings over such a code are not as great.

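The three totals can be checked against the code table above (the table-lookup encoder is ours; the tree itself is not needed to encode once the table exists):

```python
codes = {'E': '0000', 'i': '0001', 'y': '0010', 'l': '0011',
         'k': '0100', '.': '0101', ' ': '011', 'e': '10',
         'r': '1100', 's': '1101', 'n': '1110', 'a': '1111'}
text = "Eerie eyes seen near lake."

# Concatenate the codeword of every character; no separators are needed
encoded = "".join(codes[ch] for ch in text)
print(len(encoded))   # 84  (Huffman)
print(8 * len(text))  # 208 (8-bit ASCII)
print(4 * len(text))  # 104 (4-bit block code)
```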
Decoding the File
How does the receiver know what the codes are?
Tree constructed for each text file
  Considers the frequencies of each file
  Big hit on compression, especially for smaller files
Tree predetermined
  Based on statistical analysis of text files or file types
Data transmission is bit-based versus byte-based

Decoding the File
Once the receiver has the tree, it scans the incoming bit stream:
  0: go left
  1: go right
(Tree diagram omitted.)
10100011011110111101111110000110101

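Since the code is prefix-free, the receiver can also decode straight from the table, accumulating bits until they match a codeword (a sketch; the helper name is ours):

```python
codes = {'E': '0000', 'i': '0001', 'y': '0010', 'l': '0011',
         'k': '0100', '.': '0101', ' ': '011', 'e': '10',
         'r': '1100', 's': '1101', 'n': '1110', 'a': '1111'}
decode_map = {bits: ch for ch, bits in codes.items()}

def decode(stream):
    """Accumulate bits until they match a codeword, emit it, and restart."""
    out, word = [], ""
    for bit in stream:
        word += bit
        if word in decode_map:  # prefix property: the first match is the match
            out.append(decode_map[word])
            word = ""
    if word:
        raise ValueError("stream ended mid-codeword")
    return "".join(out)

print(decode("10100011011110111101111110000110101"))  # 'eel snarl.'
```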
Summary
Huffman coding is a technique used to compress files for transmission
Uses statistical coding
more frequently used symbols have shorter code words
Works well for text and fax transmissions
An application that uses several data structures

Huffman Shortcomings
What is Huffman missing?
Although OPTIMAL for single-character (word) compression, Huffman does not take into account patterns / repeated sequences in a file
  Ex: A file with 1000 As followed by 1000 Bs, etc. for every ASCII character will not compress AT ALL with Huffman
  Yet it seems like this file should be compressible
We can use run-length encoding in this case
  However, run-length encoding is very specific and not generally effective for most files (since they do not typically have long runs of each character)
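A minimal run-length encoder (a sketch; the output format is ours) shows why the all-runs file above collapses under RLE even though Huffman cannot touch it:

```python
from itertools import groupby

def rle(text):
    """Collapse each run of a repeated character into (char, run_length)."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

# 2000 characters become just two (char, count) pairs
data = "A" * 1000 + "B" * 1000
print(rle(data))  # [('A', 1000), ('B', 1000)]
```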