Huffman Codes and Data Compression
哈夫曼编码让不同字符之间代码长度不同,保证经常出现的字符的代码要短。
Prefix Code
浅层定义:没有字符代码是别的字符代码的前缀。
定义:for a set of letters, a prefix code is a function that maps each letter to some sequence of zeros and ones, in such way that for distinct , the sequence is not a prefix of the sequence .
Optimal Prefix Codes
假设对于任意一个字母 , 代表字母 出现的频率。
- 假设总共有 个字母,其中 个字母是 .
用 表示 的长度,那么
定义: the average number of bits required per letter
目标:minimize
算法设计
A binary tree with the number of leaves equal to the size of the alphabet , and we label each leaf with a distinct letter in .
定理 (4.27): The encoding of constructed from is a prefix code.
证明:(反证法)
若 是 的 prefix,the path from the root to would have to be a prefix of the path from
the root to
但是,这等同于说 would lie on the path from root to
与 是 leaf 矛盾
定理 (4.28): The binary tree corresponding to the optimal prefix code is full.
证明:(exchange argument)
Let denote the binary tree corresponding to the optimal prefix code, and suppose it contains
a node with exactly one child .
Convert into a tree by replacing node with .
分两类情况讨论:
1. 是 root: 删除 , 并将 变成 root。
2. 不是 root:
令 为 的 parent in .
删除 ,并让 成为 的 child.
这个改变 decrease the number of bits needed to encode any leaf in the subtree rooted at node ,
and it does not affect other leaves
与 optimality of 矛盾
定理 (4.29): 假设已知 optimal binary tree 。Suppose that and are lives of , such that . Further, suppose that in a labeling of corresponding to an optimal prefix code, leaf is labeled with and leaf is labeled with . Then, .
证明:用 exchange argument 证明,交换 的 label。
定理 (4.30): Consider a leaf in with largest depth. Leaf has a parent , and by (4.28) is a full binary tree, so has another child . Then, is a leaf of . 反证法
定理 (4.31): There is an optimal prefix code, with corresponding tree , in which the two lowest-frequency letters are assigned to leaves that are siblings in .
算法伪代码
To construct a prefix code for an alphabet , with given frequencies:
- If has two letters then
- Encode one letter using 0 and the other letter using 1
- Else
- Let and be the two lowest-frequency letters
- Form a new alphabet by deleting and and replacing them with a new letter of frequency
- Recursively construct a prefix code for , with tree
- Define a prefix code for as follows:
- Start with
- Take the leaf labeled and add two children below it, labeled and
-
EndIf