Information Theory - Chapter X+1 Shannon-Fano-Elias Codes and Arithmetic Codeing

3 minute read

Published: March 06, 2026

This chapter we will introduce Shnnon-Fano-Elias codes and arithmetic coding.

Shannon-Fano-Elias Coding

For $P=(p_1, \cdots, p_m), X\in Z_+^m, p_i>0(\forall i)$. Let $\bar{F}(x)=\sum_{a<x}P(a)\frac{1}{2}P(x)$. Take $l(x)=\lceil\log\frac{1}{P(x)}+1\rceil$. Then represent $x$ as the first $l(x)$ bits of $\bar{F}(x)$. Denote the set of $\bar{F}(x)$ as:

\[S_1 = \{\bar{F}(x):x\in X\}\]

Remark:

(1). $S_1$ is nonsingular. Since $P(x)>0$ for all $x$, $\bar{F}(a)\neq\bar{F}(b)$ if $a\neq b$.

$\Rightarrow$ We can determine $x$ if $\bar{F}(x)$ is known $\Rightarrow$ $S$ is non-singular.

In general, $\bar{F}(x)$ is not expressible by a finite number of bits.

(2). Shannon-Fano-Elias: use the first $l(x)$ bits of $\bar{F}(x)$. denoted by $\lfloor\bar{F}(x)\rfloor_{l(x)}$. So the set of $\lfloor\bar{F}(x)\rfloor_{l(x)}$ is denoted by:

\[S_2=\{\lfloor\bar{F}(x)\rfloor_{l(x)}:x\in X\},\]

which is a prefix code.

Recall the cumulative probability distribution of $X$:

$F(x)=Pr(X\le x)=\sum_{a\le x}P(a)$ $\bar{F}(x)-\lfloor\bar{F}(x)\rfloor_{l(x)}<\frac{1}{2^{l(x)}}$

Since $l(x)=\lceil\log\frac{1}{p(x)}\rceil +1$, then

\[\begin{aligned} &\frac{1}{2^{l(x)}}=\frac{1}{2}\frac{1}{2^{\lceil\log\frac{1}{p(x)}\rceil}}\le\frac{P(x)}{2}=\bar{F}(x)-F(x-1)\\ \Rightarrow&\bar{F}(x)-\lfloor\bar{F}(x)\rfloor_{l(x)}<\bar{F}(x)-F(x-1)\\ \Rightarrow&\lfloor\bar{F}(x)\rfloor_{l(x)}\in (F(x-1), \bar{F}(x)]\\ \Rightarrow&\{\lfloor\bar{F}(x)\rfloor_{l(x)}:x\in X\} \text{ is non-singular.} \end{aligned}\]

Now check $\lfloor\bar{F}(x)\rfloor_{l(x)}$ is also prefix-free.

Associate $\lfloor\bar{F}(x)\rfloor_{l(x)}=0.z_1z_2\cdots z_l$ with the interval.

\[I_x=[0.z_1z_2\cdots z_l, 0.z_1z_2\cdots z_l+\frac{1}{2^l}).\]

If we can show $I_x\subset(F(x-1), \bar{F}(x)]$, then the code is prefix-free.

For lower end of $I_x$:

\[\lfloor\bar{F}(x)\rfloor_{l(x)}\in(F(x-1), \bar{F}(x)].\]

For upper end of $I_x$:

\[\begin{aligned} \lfloor\bar{F}(x)\rfloor_{l(x)}+\frac{1}{2^{l(x)}}&\in (F(x-1)+\frac{1}{2^{l(x)}}, \bar{F}(x)+\frac{1}{2^{l(x)}}]\\ &\subseteq (F(x-1)+\frac{1}{2^{l(x)}}, \bar{F}(x)] \\ &= (F(x-1)+\frac{1}{2^{l(x)}}, F(x)] \end{aligned}\] \[\begin{aligned} \Rightarrow& I_x\subset (F(x-1), F(x)]\\ \Rightarrow& \text{all intervals } I_x, x\in X \text{ are disjoint}\\ & \vert \cup_xI_x\vert =\sum_{x}\vert I_x\vert <1. \\ \Rightarrow&\{\lfloor\bar{F}(x)\rfloor_{l(x)}:x\in X\} \text{ is prefix free}. \end{aligned}\]

(3). The encoding procedure does not require ordering the symbols $x\in X$ according to their probabilities. Unlike Huffman codes calculate the probabilities of all length-$n$ sequence. So for $\vert X\vert ^n$ sequences, complexity is exponential in $n$.

Arithmetic coding

An extension of Shannon-Fano-Elias coding, encodes sequences of random variables with a complexity linear in $n$.[used in JPEG].

Idea: represent a sequence of RVs by a subinterval of $[0, 1]$.

The encoding function of arithmetic coding relies on the following lemma.

Lemma: Let $Y$ be a random variable with continuous probability distribution function $F(y)$. Let $U=F(Y)$. Then $U$ is uniformly distributed in $[0, 1]$.

Proof:

For $u\in [0, 1], F_U(u)=P(U\le u)=P(F(Y)\le u)=P(Y\le F^{-1}(u))=F[F^{-1}(u)]=u.$

$X_1, X_2, \cdots$ takes values in alphabet $\mathcal{X}$. Let $X=0.X_1X_2\cdots$ be the real-alued random variable between [0, 1]. Then the codeword of $X$ with realization $x=0.x_1x_2\cdots$ is given by

\[U=F_\mathcal{X}(X)=F_\mathcal{X}(0.X_1X_2\cdots)=0.F_1F_2\cdots.\]

Assuming the distribution on $X^\infty$ has no atoms, by the lemma, we have $U$ is uniformly distributed on $[0, 1]$. In other words $F_i\sim \text{Bornouli}(\frac{1}{2})$.

Remark: Arithmetic codes are non-singular.

The probability distribution function forms an invertible mapping from infinite source seq. to incompressible infinite seq.

Associate the seq. $X_1, X_2, \cdots X_n$ with the interval $[0.x_1x_2\cdots x_n, 0.x_1\cdots x_n+\frac{1}{2^n})$

Under the probability transform, the interval is mapped to

\[I_{x^n}=[F_X(0.x_1x_2\cdots x_n), F_X(0.x_1x_2\cdots x_n+\frac{1}{2^n})).\]

For $u\in I_{x^n}$, its preimage is a sequence that starts with $x_1, x_2, \cdots,x_n$.

So we can reconstruct $x_1, \cdots, x_n$ using $u$.

$\Rightarrow$ We may use any $u\in I_{x^n}$ as a codeword for $x_1, \cdots, x_n$.

Recall: what point does Shannon-Fano-Elias choose? $u=\lfloor\bar{F}(x)\rfloor_{l(x)}, l(x)=\lceil\log\frac{1}{p(x)}\rceil+1$.

So far, we assumed we can represent a real number with arbitrary accuracy. But in practice, we only have finite precision. Instead of infinite precision points, consider the subintervals of $[0, 1]$.

Note that when we observe more and more RVs, the length of $I_{x^n}$ decreases.

Observe that $F_{X}(0.x_1x_2\cdots x_n)=F_{X}(0.x_1, \cdots x_{n-1})+P_X(0.x_1x_2\cdots x_{n-1})\sum_{y\in X}P_{X_n\vert X_1, X_2, \cdots, X_{n-1}}(y\vert 0.x_1x_2\cdots x_{n-1})$.

So $F_{X}(0.x_1\cdots x_n)$ can be computed sequentially with the complexity of $n\vert \mathcal{X}\vert $.

This is useful because it allows to incorporate learned information about the stream. (we may Laplace estimate for $P_{X_n\vert X_1, \cdots, X_{n-1}}$.

Share on

Bluesky Facebook LinkedIn X (formerly Twitter)

Yeqing Qiu

Information Theory - Chapter X+1 Shannon-Fano-Elias Codes and Arithmetic Codeing

Shannon-Fano-Elias Coding

Arithmetic coding

Share on

You May Also Enjoy

我的五年

Information Theory - Chapter 1: Entropy

Information Theory - Chapter 2 Property of Entropy

Information Theory - Chapter 3 Important Inequality