The Art
of Lossless
Data Compression
vol. 18t
Here are the results of tests performed in September 2000 to compare
lossless compression of English texts by all sufficiently good programs
known to us that were developed for this purpose, including RK, DC, YBS,
Bzip2, IMP, RAR and 7-zip.
See the Archive Comparison Test by J. Gilchrist for more details: http://act.by.net
If you want to start or continue such tests,
can suggest other sets of texts or other compression programs
(executable programs only, not sources or algorithm descriptions),
or know of something important we have missed
(some fantastic new technology, an algorithm, or even a program capable
of lossless compression of up to 1000:1, etc.),
please let us know immediately: artest@hotmail.ru Thank you!
[[1]] COMPRESSION QUALITY
(see also
[[2]] Speed
[[3]] Details
[[4]] Comments)
In each table below, the fifth line shows results for the sum of the four
Canterbury Corpus Large Set files, and the tenth line shows results for the
sum of all 556 files in the five sets.
original ACE32 BEE BIX BOA BA BZip2 DC ERI IMP
-m5-d4096 -m3 -d3 -m1 -mdg -m15 -k50 -m -k -9 -b16300-mt5 -m5 -2 -s4
581.79% 138.67 108.95 129.00 106.46 109.61 121.55 104.85 112.32 119.84
411.40% 112.54 105.04 105.48 100.56 103.86 110.95 101.39 106.17 109.09
582.55% 139.98 106.19 130.78 106.37 106.98 120.52 102.53 109.57 118.23
657.05% 139.67 112.21 137.08 112.45 110.49 130.05 110.92 112.48 128.20
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
523.75% 128.40 106.29 120.77 104.15 106.01 117.43 102.85 108.43 115.51
485.12% 134.76 105.29 129.30 104.67 106.57 116.69 101.84 110.39 115.42
395.40% 130.54 104.40 124.45 102.72 105.51 112.96 100.90 109.14 112.65
432.31% 133.93 104.01 128.44 103.30 106.38 115.81 101.65 110.52 115.48
723.25% 147.93 112.09 143.07 110.68 118.26 135.44 109.89 118.12 143.21
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
448.68% 133.42 104.23 127.82 103.25 106.48 116.12 101.60 110.13 116.26
Arhangel ppmonstr RAR RK SZip UFA YBS ZZip 7-zip PkZip
-2-mm-mt -o8 -m5-mm-mde -mx3 -o10-b41 -m5-mu32 -m16mu -b20-mx -mx -exx
115.91 103.50 138.73 *100% 111.26 114.79 105.39 110.23 159.77 168.57
100% 102.56 112.46 102.13 103.83 100.50 102.00 103.28 111.08 115.52
115.28 102.02 141.03 *100% 111.22 112.14 102.81 107.61 161.22 169.60
139.25 104.61 141.29 100% 115.21 127.33 109.73 110.31 184.90 191.21
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
111.81 102.06 128.87 100% 108.11 109.42 103.13 106.08 144.02 150.72
113.92 100.67 134.99 *100% 110.86 112.33 104.56 107.44 152.34 158.71
107.53 100% 134.55 100.53 109.22 107.92 103.40 106.38 141.96 148.11
110.38 ^100% 135.25 100.63 109.56 109.02 104.18 107.36 147.41 153.39
137.70 105.98 153.76 100% 117.12 116.00 118.47 119.20 178.32 185.82
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
111.36 100.03 135.46 100% 109.45 108.89 104.45 107.39 147.87 153.99
* result obtained with RK -mx2 (not -mx3)
^ result obtained with PPMonstr -o9 -m56 (not -o8 -m56)
[[2]] Speed
The Canterbury Corpus Large Set ( http://corpus.canterbury.ac.nz/ftp/large.zip )
was used for this test, on an AMD K6-400 machine with 64M RAM running Windows 98.
Columns: Programs,options; Overall score (seconds, then % of the best score);
Average Users' score (seconds, then % of the best score);
Compress time (seconds); Extract time (seconds); Compressed size (bytes).
777 a -m5 -mu32 1354 147% 1171 133% 203 222 3343996
777 a -mg -s 1880 205% 1262 144% 688 139 3793939
7zip a 1307 142% 1232 140% 83 4 4393623
7zip a -mx 1358 148% 1240 141% 131 4 4401160
acb B 2540 276% 1818 207% 803 808 3346915
acb b 2997 326% 2059 235% 1042 1047 3267480
acb u 3802 414% 2496 285% 1452 1456 3221349
arhangel a -2 -mm 1203 131% 1117 127% 96 94 3647060
arhangel a -mt 1173 127% 1069 122% 115 109 3417110
arhangel a -mtf 1177 128% 1071 122% 118 110 3418181
ba -k 1057 115% 988 112% 78 26 3432541
ba -k -1 1170 127% 1122 128% 54 26 3927264
ba -k -50 1046 114% 954 109% 103 17 3337823
boa -m1 1623 176% 1387 158% 263 281 3886856
boa -a 1560 170% 1266 144% 327 340 3217347
boa -m15 1588 173% 1277 145% 346 358 3182732
bzip2 -k -1 1201 130% 1159 132% 47 13 4109767
bzip2 -k -5 1089 118% 1046 119% 48 14 3697142
bzip2 -k -9 1070 116% 1023 116% 53 15 3611558
dc e 950 103% 918 104% 36 22 3214240
dc e -a 950 103% 921 105% 33 23 3223329
dc e -b16300 1098 119% 875 100% 248 64 2829394
eri a -m2 1117 121% 975 111% 158 30 3346586
eri a -m3 1123 122% 971 110% 169 32 3318853
eri a 1136 123% 972 111% 183 33 3313568
imp98 a -2 1043 113% 1002 114% 46 11 3547964
imp98 a -2 -s4 1040 113% 998 114% 48 11 3535351
imp a -2 -s4 1041 113% 1001 114% 45 11 3548156
pkzip -es 1659 180% 1655 189% 5 3 5945608
pkzip -a 1326 144% 1307 149% 22 2 4691477
pkzip -exx 1498 163% 1303 148% 217 2 4605928
ppmd e -o5 -m56 950 103% 932 106% 20 23 3268214
ppmd e -o7 -m56 917 100% 893 102% 28 30 3095512
ppmd e -o9 -m56 985 107% 944 107% 46 46 3215327
ppmonstr e -o5 -m56 989 107% 954 109% 40 42 3268306
ppmonstr e -o7 -m56 965 105% 918 104% 53 56 3083063
ppmonstr e -o9 -m56 1036 112% 967 110% 77 77 3178172
rar a 1226 133% 1134 129% 103 4 4029077
rar a -m1 1247 135% 1205 137% 48 4 4304853
rar a -s -m5 1560 170% 1144 130% 463 4 3937052
rk -mf1 1194 130% 1166 133% 32 21 4110184
rk -mf2 1308 142% 1149 131% 177 76 3798456
rk -mx1 1736 189% 1350 154% 430 449 3089384
rk -mx2 1825 199% 1403 160% 470 502 3074900
rk -mx3 1891 206% 1440 164% 502 535 3076136
szip -v0 -b41 1019 111% 984 112% 39 34 3405120
szip -o8 -b41 1021 111% 974 111% 53 36 3356744
szip -o0 -b41 1055 115% 959 109% 107 24 3326271
ufa a -m5 -mu32 1378 150% 1185 135% 216 234 3343996
ufa a -m5 -mu10 1312 143% 1154 131% 177 195 3387619
ufa a -mg -s 1630 177% 1161 132% 522 28 3889878
uharc a 1381 150% 1183 135% 220 27 4081072
uharc a -m1 1354 147% 1244 142% 122 29 4333271
uharc a -m3 1514 165% 1125 128% 432 26 3801399
zzip a 1085 118% 1030 117% 62 28 3584447
zzip a -a3 1076 117% 1014 115% 69 30 3517619
zzip a -a4 -b12 1029 112% 950 108% 88 31 3277976
The Overall score is calculated by adding compression time, extraction time,
and the time it would take to transfer the compressed file over a 28,800 bps
link: (compressed_size)/3600, because 28,800 bits per second is 3600 bytes
per second.
The Average Users' score is calculated as (compress_time/10) + extract_time +
the time it would take to transfer the compressed file over a 28,800 bps link.
Compression time is divided by 10 here because more than 90% of people will
never compress anything in their lives (with compression programs), yet they
use compressed data almost _every_ time they use computers and/or the Internet.
That is why compression time matters much less to them.
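As a rough illustration of the two formulas above, here is a minimal Python
sketch. The function names are ours and purely hypothetical (they are not part
of the test scripts); only the 3600 bytes-per-second link speed and the
division of compression time by 10 come from the text. The sample values are
the times and size from the "777 a -m5 -mu32" row of the speed table.

```python
# Assumption: a 28,800 bps link carries 28800 / 8 = 3600 bytes per second,
# exactly as stated in the text above.
BYTES_PER_SECOND = 28800 / 8


def transfer_time(compressed_size):
    """Seconds needed to send compressed_size bytes over the 28,800 bps link."""
    return compressed_size / BYTES_PER_SECOND


def overall_score(compress_s, extract_s, compressed_size):
    """Overall score: compress time + extract time + transfer time."""
    return compress_s + extract_s + transfer_time(compressed_size)


def average_users_score(compress_s, extract_s, compressed_size):
    """Average Users' score: compress time counts for only one tenth."""
    return compress_s / 10 + extract_s + transfer_time(compressed_size)


if __name__ == "__main__":
    # Row "777 a -m5 -mu32": 203 s to compress, 222 s to extract, 3343996 bytes.
    print(round(overall_score(203, 222, 3343996)))
    print(round(average_users_score(203, 222, 3343996)))
```

Rounded to whole seconds, this gives 1354 and 1171, the values listed for
that row.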
[[3]] Details
are no longer included in this main text
(738 lines reporting 22796 results on 556 files in 5 sets),
but can be found in the FULL version, together with TEXTS.DAT and *.BAT,
at http://geocities.com/SiliconValley/Bay/1995/artest18.zip
or http://artest1.tripod.com/artest18.zip
[[4]] Comments
Links to download programs:
7-Zip 2.11 :W http://www.7-zip.com/dl/7zip211.exe 493K
BIX 1.00b7 :W http://www.7-zip.com/dl/ufa/bix100b7.zip 89K
777 0.04b1 :W http://www.7-zip.com/dl/ufa/777004b1.zip 72K
UFA 0.04b1 :W http://www.7-zip.com/dl/ufa/ufa004b1.zip 64K
ArHanGeL 1.40 :a http://geocities.com/SiliconValley/Lab/6606/arh140.zip 50K
ERI32 4.7fre :e http://geocities.com/eri32/eri47fre.zip 91K
Imp 1.1 :e http://www.winimp.com/imp110d.zip 266K
Imp-win 1.12 :W http://www.winimp.com/imp112.exe 122K
PkZip 2.50 :a ftp://ftp.simtel.net/pub/simtelnet/msdos/arcers/pk250dos.exe 202K
RK 1.03b1 :e http://malcolmt.tripod.com/downloads/rk103a1d.exe 478K
RK 1.03b1 :W http://malcolmt.tripod.com/downloads/rk103a1w.exe 380K
RAR32 2.71 :e ftp://ftp.netlab.sk/public/rarsoft/rar/rarx271.exe 257K
WinRAR 2.71 :W ftp://ftp.netlab.sk/public/rarsoft/rar/wrar271.exe 588K
PPMD var.F,
PPmonstr v.F :W ftp://ftp.simtel.net/pub/simtelnet/win95/compress/ppmdf.zip 97K
ACB 2.00c :e ftp://ftp.simtel.net/pub/simtelnet/msdos/compress/acb_200c.zip 42K
BOA 0.58b :e ftp://ftp.cdrom.com/.3/sac/pack/boa058.zip 74K
DC 0.98b :W ftp://ftp.cdrom.com/.3/sac/pack/dc124.zip 55K
BA 1.00 beta :e ftp://ftp.cdrom.com/.3/sac/pack/ba100b.zip 60K
Bzip2 1.0.1 :W ftp://sourceware.cygnus.com/pub/bzip2/v100/bzip2-100-x86-win32.exe 68K
SZip 1.12a :W http://www.compressconsult.com/szip/szip_112a_win32.zip 71K
ZZip 0.35e :W http://www.via.ecp.fr/~damien/zzip/zzip-win32.zip 24K
ACE32 2.0b2 :W ftp://ftp.forlangs.net/pub/windows/winace/ace20b2.exe 546K
YBS 0.03d :e http://members.nbci.com/vycct/ybs003dd.zip 48K
YBS 0.03d :W http://members.nbci.com/vycct/ybs003dw.zip 42K
:a - any DOS - DOS programs, will run under pure DOS or in a DOS box
:e - extender - DOS programs using DOS extenders like DOS/4GW or CWSDPMI
:W - windoze - Windows95/98/NT/etc programs
If a direct link doesn't work, most probably a newer version of the program
has appeared at the same site: visit the web page, or read the whole directory
from the FTP server (i.e. try the same URL, but without the filename).
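The "same URL without the filename" trick can be sketched in Python as follows.
This is only an illustration built on the standard urllib.parse and posixpath
modules; the function name parent_directory_url is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def parent_directory_url(url):
    """Return the URL of the directory that contains the linked file."""
    parts = urlsplit(url)
    # Drop the last path component (the filename) and keep the directory.
    directory = posixpath.dirname(parts.path)
    if not directory.endswith("/"):
        directory += "/"
    # Query string and fragment are discarded: only the directory is wanted.
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


if __name__ == "__main__":
    print(parent_directory_url("ftp://ftp.cdrom.com/.3/sac/pack/boa058.zip"))
    print(parent_directory_url("http://www.7-zip.com/dl/7zip211.exe"))
```

For the two links above it yields ftp://ftp.cdrom.com/.3/sac/pack/ and
http://www.7-zip.com/dl/ respectively.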
Homepages:
Arhangel : http://geocities.com/SiliconValley/Lab/6606
Eri32 : http://geocities.com/eri32
mirror : http://artest1.tripod.com
RK : http://malcolmt.tripod.com
Imp,WinImp : http://www.technelysium.com.au
mirror : http://www.winimp.com
ACE32 : http://www.winace.com
PkZip : http://www.pkware.com
RAR,WinRAR : http://www.rarsoft.com
BZip2 : http://sources.redhat.com/bzip2
SZip : http://www.compressconsult.com/szip
ZZip : http://www.via.ecp.fr/~damien/zzip
YBS : http://members.nbci.com/vycct
Ufa,777,
BIX,7-Zip: http://www.7-zip.com
What's new:
8 new programs were tested:
ACE32 2.0c2, BEE 0.4.8, BIX 1.00b7, DC 0.99.158b,
PPMonstr var.Gpre, RK 1.03b1, YBS 0.03d, ZZip 0.35e.
Latest beta versions of BEE, DC, PPMonstr are available
from authors by e-mail request:
BEE: Andrew.Filinsky@p11.f4.n452.z2.fidonet.org
DC: EdgarBinder@t-online.de
PPMonstr: shkarin@arstel.ru , dmitry.shkarin@mtu-net.ru
Section [[2]] Speed
was not updated this time.
The next release will follow soon.
5029 "binary" files were added: see vol.18b, binary.txt.
Results of some programs were not included in the latest release:
see the previous vol.17 for the performance of ACB, UHARC and PPMDF.
WARNINGS:
RK 1.03b1 was unable to correctly decompress 555 files (all except E.TXT)
compressed with "-mx3 -ft-", reporting
ERROR 303: CRC check failed.
BA 1.00beta can't decompress any file compressed with -mf, and gives no
message like "CRC fails".
DC 0.99.158b failed to decompress 1DFRE10.dc, ANDES10.dc, and BTI0110.dc,
saying "Corrupted block" (while the t(est) command reports "Test successful").
UFA and 777 can't handle files with the symbol ` (ASCII code 96) in their
names. It was replaced with _ in nine filenames.
ERI32 4.7fre can't compress files larger than (free DPMI memory)/6, i.e.
about 10Mb on a PC with 64Mb RAM. The largest file (44Mb) was split into
5 chunks of 9000000 bytes each (the last chunk was 8894190 bytes).
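The chunk sizes above can be reproduced with a short Python sketch. It only
illustrates the splitting arithmetic and is not the tool actually used for
these tests. The function names, the ".000"-style part suffix, and the total
size of 44894190 bytes (four chunks of 9000000 bytes plus the 8894190-byte
last chunk) are assumptions derived from the figures above.

```python
from pathlib import Path


def chunk_sizes(total_size, chunk_size):
    """Sizes of the pieces when total_size bytes are cut every chunk_size bytes."""
    full_chunks, remainder = divmod(total_size, chunk_size)
    return [chunk_size] * full_chunks + ([remainder] if remainder else [])


def split_file(path, chunk_size):
    """Write path.000, path.001, ... next to the source file; return the parts."""
    source = Path(path)
    parts = []
    with source.open("rb") as src:
        index = 0
        while True:
            data = src.read(chunk_size)
            if not data:  # end of file reached
                break
            part = source.with_name(f"{source.name}.{index:03d}")
            part.write_bytes(data)
            parts.append(part)
            index += 1
    return parts


if __name__ == "__main__":
    # Assumed total size: 4 * 9000000 + 8894190 = 44894190 bytes.
    print(chunk_sizes(44894190, 9000000))
```

With a 9000000-byte limit this gives four full chunks and a last chunk of
8894190 bytes, i.e. the 5 chunks described above.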
The LATEST RELEASE, and fifteen previous versions of these tests can be found
at http://geocities.com/SiliconValley/Bay/1995/ and http://artest1.tripod.com/
The FINAL PART
> [[5]] PLEASE read THIS before replying to this article
was removed from this text, but can be easily found at
http://geocities.com/SiliconValley/Bay/1995/artest10.html
http://artest1.tripod.com/artest10.html
Send your suggestions and comments to artest@hotmail.ru
With best kind regards,
RAO Inc.