Disclaimer
Copyright (c) Arturo San Emeterio Campos 1999. All rights reserved.
Permission is granted to make verbatim copies of this document for private
use only.
Download
Download the whole article zipped.
Table of contents
Introduction
Move To Front is a transformation algorithm which does not compress
data but can help to reduce redundancy some times, like after a bwt
transformation, where a symbol which has recently seen, appears again.
Mtf
Mtf instead of outputting the symbol (byte), outputs a code which refers
to the position of the symbol in a table with all the symbols, thus the
length of the code is the same as the length of the symbol. (when using
bytes as symbols, byte based output can be performed) Both encoder and
decoder should initialize the table with the same symbols in the same positions.
Once a symbol is being processed the encoder outputs its position in the
table and then put it in the top of the table, position 0, All the codes
from the position till the position of the symbol being coded are moved
to the next position. This simple scheme assign codes with lower values
for more redundant symbols. (symbols which appear more frequently) Let's
say we have an string like the following: "abaacabad" Mtf will process
it in this way:
|
|
"011021213" As you can see this is easier to entropy code. In runs of
the same byte, where mtf can transform "aaaaabbbb" to "000001000" an entropy
encoder (I mean order-0) will assign less bits to 0 which now occurs more
often than "a" or "b" did. Don't try to compress the output of mtf with
a sensitive context compressor, results will be poor, because this transformation,
though it's reversible, completely eliminates the contexts.
Encoder
The first thing to do is having an array of 256 entries (in the case
we have an alphabet with 256 entries, like when coding bytes), unsigned
char list[256];
Once you have it, you have to initialize it, the code is very easy:
for (counter=0;counter!=256;++counter)
list[counter]=counter;
Then you start with the main loop, get a byte, and search it in the
array:
for (index=0;;++index)
scan the whole array, no need of end condition
if (_byte_ == bytes_order[index]) till
we found it
break; //then, break
the for
Once you are here, you output the byte, and then update the array, moving
all the bytes till the position of our match, and then putting in the position
0 the byte:
for ( ; index!=0 ; --index)
scan from the position of the byte to 0
list[index] = list[index-1];
move all of them to the right
list[0]=_byte_
update the order 0
This is the end of the main loop, do that for all the symbols to code.
Decoder
Well, once you have done the encoder, the decoder is very easy. Like
the encoder does, initialize the table, then the main loop, get a code,
this is the position in the table, and then get the symbol from the table:
code = getc(file);
_byte_ = list[code];
get the byte
Now update the table, like the encoder does:
index = position; the
position to start
for (;index!=0;--index)
list[index] = list[index-1];
list[0]=_byte_;
This is everything you need for a mtf codec.
Closing words
If you are a beginner I recommend you to learn
lz77
and static huffman. If you find any
mistake in this text, or have any idea, email me.
Contacting the
author
You can reach me via email at: arturo@arturocampos.com
Also don't forget to visit my home page http://www.arturocampos.com
where you can find more and better info. See you in the next article!
This article comes from Arturo Campos home page at http://www.arturocampos.com Visit again soon for new and updated compression articles and software.