
shcodec is order-0 32-bit canonical static huffman codec. It
encodes alphabet of 256 symbols with minimum-redundancy or
length-restricted codes(basic method: Alistair Moffat and Jyrki
Katajainen, modified by Artur A. Pessoa). shcodec uses efficient
method for tree packing: on text files packed tree size is approx
68 bytes, on binary files this value is about 132 bytes. Memory
requirements are very small: 1280 bytes for encoding and only 574
bytes for decoding! shcodec uses extremely fast and simple
SHIFT-OR method for encoding, and CANONICAL-DECODE with a cache
for small codewords for decoding.

shcodec package contain all routines, needed to implement fast and
efficient static huffman codec. All necessary preprocessing steps
are described in the test program. In order to keep high operating
speed, encode/decode routines are implemented as macro insead of
functions: they encode/decode one symbol at time to/from the
specified buffer The way how encode/decode loops must be organized
is also shown in the test program.

Next table presents experimental results for shcodec version 1.0.1 and
range coder version 1.3
on the calgary corpus set.
In this table we use the following notation:
file | file to encode |
file size | original file size in bytes |
tree | packed tree size in bytes (tree is included into the shc size) |
shc/rng size | encoded file size in bytes (with shcodec/range coder) |
shc/rng bpb | bpb value (for shcodec/range coder) |
ent bpb | order-0 entropy bpb value |
With (#) sign we denote files where range coder is more efficient
than shcodec(only 2 files from 18).
file | file size | tree | shc size | rng size | shc bpb | rng bpb | ent bpb |
bib | 111,261 | 68 | 72,837 | 73,421 | 5,237 | 5,279 | 5,201 |
book1 | 768,771 | 68 | 438,449 | 441,851 | 4,563 | 4,598 | 4,527 |
book2 | 610,856 | 68 | 368,373 | 370,656 | 4,824 | 4,854 | 4,793 |
geo | 102,400 | 132 | 72,693 | 73,078 | 5,679 | 5,709 | 5,646 |
news | 377,109 | 56 | 246,457 | 248,417 | 5,228 | 5,270 | 5,190 |
obj1 | 21,504 | 132 | 16,189 | 16,493 | 6,023 | 6,136 | 5,948 |
obj2 | 246,814 | 132 | 194,233 | 189,329 | 6,296 | 6,137# | 6,260 |
paper1 | 53,161 | 68 | 33,413 | 33,419 | 5,028 | 5,029 | 4,983 |
paper2 | 82,199 | 68 | 47,689 | 48,022 | 4,641 | 4,674 | 4,601 |
paper3 | 46,526 | 56 | 27,337 | 27,645 | 4,701 | 4,753 | 4,665 |
paper4 | 13,286 | 52 | 7,917 | 8,047 | 4,767 | 4,845 | 4,700 |
paper5 | 11,954 | 56 | 7,493 | 7,698 | 5,015 | 5,152 | 4,936 |
paper6 | 38,105 | 68 | 24,097 | 24,225 | 5,059 | 5,086 | 5,010 |
pic | 513,216 | 140 | 106,697 | 79,391 | 1,663 | 1,238# | 1,210 |
progc | 39,611 | 56 | 25,977 | 26,200 | 5,246 | 5,291 | 5,199 |
progl | 71,646 | 56 | 43,045 | 43,215 | 4,806 | 4,825 | 4,770 |
progp | 49,379 | 68 | 30,289 | 30,677 | 4,907 | 4,970 | 4,869 |
trans | 93,695 | 76 | 65,301 | 65,586 | 5,576 | 5,600 | 5,533 |
average | | 79 | | | 4,959 | 4,969 | 4,891 |
As you can see, on most files(16 from 18), shcodec outperformes
range coder a little(on 0,201% in average). On very redundant files,
such as pic, and inhomogenious ones, adaptive entropy coders are
usually better.
So, shcodec does`t suits for high order models and not so efficient,
when symbol distribution is very skewed(e.g. a lot of zeros in pic),
but in practice, under order-0 model, it shows compression performance
comparable with best known entropy coders and runs very fast.

This software may be used freely for any purpose. However, when
distributed, the original source must be clearly stated, and,
when the source code is distributed, the copyright notice must
be retained and any alterations in the code must be clearly marked.
No warranty is given regarding the quality of this software.

shc101.zip(~45 Kbytes) - shcodec source
code, win32 executable and documentation.
sh-sfx.zip(~28 Kbytes) - utility for
creating SFX archives from shcodec files under win32.
HuffmanCode.zip(~29 Kbytes) -
My article about Huffman Code (in Russian). Also in
html.

If you have a question about using shcodec, you can reach me via e-mail:
xander@online.ru or ICQ:
#138740136, nick: xander(use Russian or English). I will be happy to help ;)
Don`t forget to visit my new site:
http://www.entropyware.info
|