The Art
of Lossless
Data Compression
vol. 19t
Here are the results of tests performed in September 2000 to compare
lossless compression of English texts by all known, sufficiently good programs
developed for this purpose, including RK, DC, YBS, Bzip2, IMP, RAR and 7-zip.
See the Archive Comparison Test by J. Gilchrist for more details: http://act.by.net
If anybody wants to start or continue such tests,
or can suggest other sets of texts or other compression programs
(executable programs only, not sources or algorithm descriptions),
or knows that we have missed something important
(some fantastic new technology, an algorithm, or even a program capable
of lossless compression of up to 1000:1, etc.),
please let us know immediately: artest@hotmail.ru Thank you!
[[1]] COMPRESSION QUALITY
(see also
[[2]] Speed
[[3]] Details
[[4]] Comments)
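In each table below, the fifth line (after the first ~~~ rule) shows the results
for the sum of the four Canterbury Corpus Large Set files, and the tenth line
(after the second ~~~ rule) shows the results for the sum of all 556 files
in the five sets.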
Original ACE32 BEE BIX BOA BA BZip2 DC ERI IMP
length -m5-d4096 -m3-d3 -m1 -mdg -m15 -k50 -m -k -9 -b16300-mt5 (none) -2-s4
581.79% 138.67 108.95 129.00 106.46 109.61 121.55 104.85 112.32 119.84
411.40% 112.54 105.04 105.48 100.56 103.86 110.95 101.39 106.17 109.09
582.55% 139.98 106.19 130.78 106.37 106.98 120.52 102.53 109.57 118.23
657.05% 139.67 112.21 137.08 112.45 110.49 130.05 110.92 112.48 128.20
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
523.75% 128.40 106.29 120.77 104.15 106.01 117.43 102.85 108.43 115.51
485.12% 134.76 105.29 129.30 104.67 106.57 116.69 101.84 110.39 115.42
395.58% 130.60 104.45 124.51 102.76 105.56 113.01 100.95 109.19 112.70
432.57% 134.01 104.07 128.51 103.36 106.45 115.88 101.71 110.58 115.55
723.25% 147.93 112.09 143.07 110.68 118.26 135.44 109.89 118.12 143.21
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
448.75% 133.44 104.25 127.84 103.27 106.50 116.14 101.61 110.15 116.28
ArHanGel PPMonstr SBC RAR RK SZip 777 7-zip YBS ZZip
-2-mm-mt -o8-m56 -b19 -m5-mm-mde -mx3 -o10-b41 -m5-mu32 -mx -m16mu -b20-mx
115.91 103.48 111.74 138.73 *100% 111.26 114.79 159.77 105.39 109.54
100% 102.55 101.83 112.46 102.13 103.83 100.50 111.08 102.00 103.38
115.28 101.98 109.04 141.03 *100% 111.22 112.14 161.22 102.81 106.94
139.25 104.59 112.95 141.29 100% 115.21 127.33 184.90 109.73 110.23
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
111.81 102.04 106.61 128.87 100% 108.11 109.42 144.02 103.13 105.77
113.92 100.61 110.78 134.99 *100% 110.86 112.33 152.34 104.56 107.07
107.58 100% 109.23 134.61 100.57 109.27 107.97 142.02 103.44 106.12
110.45 ^100% 110.75 135.33 100.69 109.62 109.09 147.50 104.24 107.05
137.70 105.95 117.14 153.76 100% 117.12 116.00 178.32 115.11 118.63
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
111.38 100% 110.16 135.48 100.01 109.47 108.91 147.89 104.24 107.04
* RK -mx2 (not -mx3 )
^ PPMonstr -o9 -m56 (not -o8 -m56)
[[2]] Speed
The Canterbury Corpus Large Set, http://corpus.canterbury.ac.nz/ftp/large.zip ,
was used for this test, on an AMD-K6-400 machine with 64M RAM and Windows98.
Columns, left to right:
Programs,options; Overall score (seconds, then %); Average Users' score
(seconds, then %); Compress time (seconds); Extract time (seconds);
Compressed size (bytes)
777 a -m5 -mu32 1354 156% 1171 140% 203 222 3343996
777 a -mg -s 1880 217% 1262 150% 688 139 3793939
7zip a 1307 151% 1232 147% 83 4 4393623
7zip a -mx 1358 156% 1240 148% 131 4 4401160
acb B 2540 293% 1818 217% 803 808 3346915
acb b 2997 346% 2059 246% 1042 1047 3267480
acb u 3802 439% 2496 298% 1452 1456 3221349
ace32 a 1265 146% 1132 135% 148 7 3998222
ace32 a -d4096 1265 146% 1123 134% 158 7 3962314
ace32 a -d4096 -s- 1265 146% 1123 134% 159 7 3962374
ace32 a -d4096 -m1 1221 141% 1150 137% 80 7 4086782
ace32 a -d4096 -m5 1552 179% 1142 136% 456 7 3923686
arhangel a -2 -mm 1203 139% 1117 133% 96 94 3647060
arhangel a -mt 1173 135% 1069 127% 115 109 3417110
arhangel a -mtf 1177 136% 1071 128% 118 110 3418181
ba -k 1057 122% 988 118% 78 26 3432541
ba -k -1 1170 135% 1122 134% 54 26 3927264
ba -k -50 1046 120% 954 114% 103 17 3337823
bee a -m1 1297 149% 1143 136% 171 178 3414048
bee a -m2 1371 158% 1177 140% 215 222 3361009
bee a -m3 1615 186% 1303 155% 347 353 3295506
bee a -m1 -d3 1247 144% 1114 133% 148 168 3353767
bee a -m2 -d3 1312 151% 1143 136% 188 210 3289365
bee a -m3 -d3 1534 177% 1268 151% 296 336 3248025
bee a -m3 -s 1846 213% 1430 171% 463 466 3303624
bee a -d3 -s 1363 157% 1176 140% 209 216 3378513
bix a 1243 143% 1063 127% 201 3 3743319
bix a -mdg 1245 143% 1051 125% 215 4 3690815
bix a -m9 1246 144% 1064 127% 202 4 3743319
bix a -mdg -m9 1249 144% 1052 125% 219 5 3690815
bix a -mdg -s 1274 147% 1054 126% 244 5 3690984
boa -m1 1623 187% 1387 165% 263 281 3886856
boa -a 1560 180% 1266 151% 327 340 3217347
boa -m15 1588 183% 1277 152% 346 358 3182732
bzip2 -k -1 1201 138% 1159 138% 47 13 4109767
bzip2 -k -5 1089 125% 1046 125% 48 14 3697142
bzip2 -k -9 1070 123% 1023 122% 53 15 3611558
dc e 948 109% 917 109% 35 19 3218290
dc e -ft 954 110% 921 110% 37 20 3232273
dc e -b16300 1024 118% 872 104% 170 69 2826931
dc e -b16300 -mt5 995 115% 869 103% 141 70 2826931
dc e -b12000 867 100% 836 100% 35 18 2931168
dc e -b12000 -mt5 865 100% 836 100% 33 18 2931168
eri a -m1 1110 128% 982 117% 143 29 3378440
eri a -m2 1108 128% 975 116% 148 30 3346586
eri a -m3 1114 128% 970 116% 160 32 3318853
eri a 1127 130% 971 116% 175 33 3313568
eri a -m5 1162 134% 975 116% 208 33 3313559
imp98 a -2 1043 120% 1002 119% 46 11 3547964
imp98 a -2 -s4 1040 120% 998 119% 48 11 3535351
imp_d a -2 -s4 1041 120% 1001 119% 45 11 3548156
pkzip -es 1659 191% 1655 197% 5 3 5945608
pkzip -a 1326 153% 1307 156% 22 2 4691477
pkzip -exx 1498 173% 1303 155% 217 2 4605928
ppmd e -o5 953 110% 934 111% 21 22 3276542
ppmd e -o7 967 111% 941 112% 29 32 3260462
ppmd e -o9 1027 118% 990 118% 42 45 3387445
ppmd e -o5 -m56 948 109% 931 111% 20 22 3266132
ppmd e -o6 -m56 927 107% 906 108% 24 26 3159004
ppmd e -o7 -m56 914 105% 890 106% 27 29 3090636
ppmd e -o8 -m56 917 106% 885 105% 36 36 3045769
ppmd e -o9 -m56 956 110% 919 109% 42 42 3142087
ppmonstr e -o5 1025 118% 975 116% 57 59 3276610
ppmonstr e -o7 1038 120% 975 116% 70 75 3214871
ppmonstr e -o9 1106 127% 1022 122% 93 98 3293262
ppmonstr e -o5 -m56 1018 117% 971 116% 53 58 3267452
ppmonstr e -o7 -m56 983 113% 924 110% 65 69 3055431
ppmonstr e -o9 -m56 1048 121% 955 114% 104 96 3051781
rar a 1226 141% 1134 135% 103 4 4029077
rar a -m1 1247 144% 1205 144% 48 4 4304853
rar a -s -m5 1560 180% 1144 136% 463 4 3937052
rk -mf1 1134 131% 1096 131% 43 29 3826096
rk -mf2 1228 141% 1109 132% 133 81 3652520
rk -mf3 1347 155% 1121 134% 252 83 3645264
rk -mx1 1615 186% 1249 149% 407 352 3083632
rk -mx2 1735 200% 1320 157% 461 418 3080372
rk -mx2 -ft+ -fe+ 1737 200% 1321 158% 463 419 3080372
rk -mx3 1768 204% 1336 159% 480 437 3064076
rk -mx3 -ft+ -fe+ 1765 204% 1334 159% 479 435 3064076
sbc c 1058 122% 993 118% 73 24 3459990
sbc c -b9 1052 121% 967 115% 95 26 3352214
sbc c -b19 1103 127% 958 114% 162 43 3233894
sbc c -b19 -e 1033 119% 941 112% 103 26 3257878
szip -v0 -b41 1019 117% 984 117% 39 34 3405120
szip -o8 -b41 1021 118% 974 116% 53 36 3356744
szip -o0 -b41 1055 121% 959 114% 107 24 3326271
ufa a -m5 -mu32 1378 159% 1185 141% 216 234 3343996
ufa a -m5 -mu10 1312 151% 1154 138% 177 195 3387619
ufa a -mg -s 1630 188% 1161 138% 522 28 3889878
uharc a 1381 159% 1183 141% 220 27 4081072
uharc a -m1 1354 156% 1244 148% 122 29 4333271
uharc a -m3 1514 175% 1125 134% 432 26 3801399
ybs_d -y 986 113% 932 111% 61 19 3265494
ybs_d -m2mu 986 113% 932 111% 61 19 3265494
ybs_d -m16mu 988 114% 925 110% 71 19 3236677
ybs_d -m16mu -r 992 114% 930 111% 70 18 3257713
zzip a 1033 119% 975 116% 65 25 3396007
zzip a -mm 1615 186% 1555 186% 68 29 5468735
zzip a -mm -b20 1436 166% 1364 163% 81 28 4780656
zzip a -mm -mx 1030 119% 971 116% 66 26 3376260
The Overall score is calculated by adding the compression time, the extraction
time, and the time it would take to transfer the compressed file over a
28,800bps network: (compressed_size)/3600, because 28800 bits_per_second is
3600 bytes_per_second.
The Average Users' score is calculated as (compress_time/10) + extract_time +
the same transfer time over a 28,800bps network.
Compression time is divided by 10 here because more than 90% of people will
never compress anything in their life (with compression programs), but they
use compressed data almost _every_ time they use computers and/or the Internet.
That's why compression time is not so relevant for them.
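The two scores described above can be reproduced with a few lines of Python.
This is only a sketch: the function names are ours and are not part of the
test scripts, and the rounding used in the table is not stated, so the output
is only compared loosely with the "dc e -b12000" row (35 s to compress,
18 s to extract, 2931168 bytes).

```python
LINK_BPS = 28800                  # network speed given in the text
BYTES_PER_SECOND = LINK_BPS / 8   # 28800 bits/s = 3600 bytes/s


def transfer_time(compressed_size):
    """Seconds needed to send compressed_size bytes over the link."""
    return compressed_size / BYTES_PER_SECOND


def overall_score(compress_s, extract_s, compressed_size):
    """Compression time + extraction time + transfer time, in seconds."""
    return compress_s + extract_s + transfer_time(compressed_size)


def average_users_score(compress_s, extract_s, compressed_size):
    """Same as overall_score, but compression time counts for one tenth."""
    return compress_s / 10 + extract_s + transfer_time(compressed_size)


if __name__ == "__main__":
    # Figures taken from the "dc e -b12000" row of the speed table.
    print(round(overall_score(35, 18, 2931168), 1))        # 867.2
    print(round(average_users_score(35, 18, 2931168), 1))  # 835.7
```

These values are close to the 867 and 836 listed for that row; small
differences in other rows may come from rounding of the measured times.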
[[3]] Details
Detailed results are no longer included in this main text
(738 lines reporting 22796 results on 556 files in 5 sets),
but can be found in the FULL version, with TEXTS.DAT and *.BAT,
at http://geocities.com/SiliconValley/Bay/1995/artest19.zip
or http://artest1.tripod.com/artest19.zip
[[4]] Comments
Links to download programs:
7-Zip 2.11 :W http://www.7-zip.com/dl/7zip211.exe 493K
BIX 1.00b7 :W http://www.7-zip.com/dl/ufa/bix100b7.zip 89K
777 0.04b1 :W http://www.7-zip.com/dl/ufa/777004b1.zip 72K
UFA 0.04b1 :W http://www.7-zip.com/dl/ufa/ufa004b1.zip 64K
ArHanGeL 1.40 :a http://geocities.com/SiliconValley/Lab/6606/arh140.zip 50K
ERI32 4.8fre :e http://geocities.com/eri32/eri48fre.zip 91K
Imp 1.1 :e http://www.winimp.com/imp110d.zip 266K
Imp-win 1.12 :W http://www.winimp.com/imp112.exe 122K
PkZip 2.50 :a ftp://ftp.simtel.net/pub/simtelnet/msdos/arcers/pk250dos.exe 202K
RK 1.03b1 :e http://malcolmt.tripod.com/downloads/rk103a1d.exe 478K
RK 1.03b1 :W http://malcolmt.tripod.com/downloads/rk103a1w.exe 380K
RAR32 2.71 :e ftp://ftp.netlab.sk/public/rarsoft/rar/rarx271.exe 257K
WinRAR 2.71 :W ftp://ftp.netlab.sk/public/rarsoft/rar/wrar271.exe 588K
PPMD var.F,
PPmonstr v.F :W ftp://ftp.simtel.net/pub/simtelnet/win95/compress/ppmdf.zip 97K
ACB 2.00c :e ftp://ftp.simtel.net/pub/simtelnet/msdos/compress/acb_200c.zip 42K
BOA 0.58b :e ftp://ftp.cdrom.com/.3/sac/pack/boa058.zip 74K
DC 0.98b :W ftp://ftp.cdrom.com/.3/sac/pack/dc124.zip 55K
BA 1.00 beta :e ftp://ftp.cdrom.com/.3/sac/pack/ba100b.zip 60K
Bzip2 1.0.1 :W ftp://sourceware.cygnus.com/pub/bzip2/v100/bzip2-100-x86-win32.exe 68K
SZip 1.12a :W http://www.compressconsult.com/szip/szip_112a_win32.zip 71K
UHArc 0.2b :e ftp://ftp.cdrom.com/.3/sac/pack/uharc02.zip 101K
ZZip 0.35g :W http://www.via.ecp.fr/~damien/zzip/zzip-win32.zip 23K
ACE32 2.0b3 :W ftp://ftp.forlangs.net/pub/windows/winace/ace20b3.exe 573K
YBS 0.03e :e http://members.nbci.com/vycct/ybs003ed.zip 55K
YBS 0.03e :W http://members.nbci.com/vycct/ybs003ew.zip 43K
SBC 0.305b :e http://geocities.com/sbcarchiver/sbc0305b.zip 158K
BEE 0.4.8 : mailto:Andrew.Filinsky@p11.f4.n452.z2.fidonet.org
:a - any DOS - DOS programs, will run under pure DOS or in a DOS box
:e - extender - DOS programs using DOS extenders like DOS/4GW or CWSDPMI
:W - windoze - Windows95/98/NT/etc programs
If a direct link doesn't work, most probably a newer version of the program has
appeared at the same site: visit the web page, or read the whole directory from
the ftp server (i.e. try the same URL, but without the filename).
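The "same URL without the filename" tip can be applied mechanically. The
helper below is a minimal sketch; the name directory_url is ours, and it
assumes the link ends in a filename after at least one path slash, as all
the download links above do.

```python
def directory_url(file_url):
    """Return file_url with the trailing filename removed.

    Assumes the URL ends in a filename that follows a path slash.
    """
    head, sep, _filename = file_url.rpartition("/")
    return head + sep


if __name__ == "__main__":
    print(directory_url("http://www.7-zip.com/dl/7zip211.exe"))
    # http://www.7-zip.com/dl/
```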
Homepages:
Arhangel : http://geocities.com/SiliconValley/Lab/6606
Eri32 : http://geocities.com/eri32
mirror : http://artest1.tripod.com
RK : http://malcolmt.tripod.com
Imp,WinImp : http://www.technelysium.com.au
mirror : http://www.winimp.com
ACE32 : http://www.winace.com
PkZip : http://www.pkware.com
RAR,WinRAR : http://www.rarsoft.com
BZip2 : http://sources.redhat.com/bzip2
SZip : http://www.compressconsult.com/szip
ZZip : http://www.via.ecp.fr/~damien/zzip
YBS : http://members.nbci.com/vycct
SBC : http://geocities.com/sbcarchiver
Ufa,777,
BIX,7-Zip: http://www.7-zip.com
PPMD, PPMonstr, ACB, BA, Bee, BOA, DC, UHArc - no homepage.
What's new:
7 new programs were tested:
PPMD var.Gpre Sep29, PPMonstr var.Gpre Oct4, YBS 0.03e -DOS and Win32 versions,
ZZip 0.35f, SBC 0.304b, ERI32 4.8fre.
Newer versions of ZZip, SBC, ACE, UFA are ready, and will be tested next time.
Latest beta versions of BEE, DC, PPMonstr, UFA are available
from authors by e-mail request:
BEE: Andrew.Filinsky@p11.f4.n452.z2.fidonet.org
DC: EdgarBinder@t-online.de
PPMonstr: shkarin@arstel.ru , dmitry.shkarin@mtu-net.ru
UFA: support@7-zip.com
ACB, UHArc and PKzip are no longer tested on all 556 text files;
their results can be found in previous versions:
ACB - ARTest17
UHArc - ARTest17
PKzip - ARTest17,18
Results of PPMD (an open source version of PPMonstr)
are in the full version only, in the TEXTS.DAT file.
UFA 0.04b1 performs on text files exactly as 777 0.04b1 does.
Results of old programs (not updated for more than 3 years, and with no
homepage), of programs with low overall score, and of programs that have been
known to have bugs (in compression/decompression functions) for more than half
a year will not be included in the latest versions of ARTest.
WARNINGS:
BA 1.00beta can't decompress any file compressed with -mf, and gives no warning
such as "CRC fails".
DC 0.99.158b failed to decompress 1DFRE10.dc, ANDES10.dc, and BTI0110.dc,
saying "Corrupted block" (while the t(est) command writes "Test successful").
RK 1.03b1 was unable to correctly decompress 555 files (all except E.TXT)
compressed with "-mx3 -ft-" , reporting
ERROR 303: CRC check failed.
ERI32 4.8fre can't compress files larger than (free DPMI memory)/6, i.e.
about 10Mb on a PC with 64Mb RAM. The largest 44Mb file was split into 5 chunks
of 9000000 bytes (the last chunk was 8894190 bytes).
Bugs were found in the tested versions of SBC and ZZip,
but they have been fixed in the latest versions, ZZip 0.35g and SBC 0.305b.
No problems were found in any of the other compressors.
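The chunk sizes quoted in the ERI32 warning follow from simple division.
The sketch below is ours, not the script actually used for the test; the
total of 44894190 bytes is not stated in the text but is derived from the
figures given (4 x 9000000 + 8894190).

```python
def chunk_sizes(total_bytes, chunk_bytes):
    """Sizes of the pieces when a file is cut into chunk_bytes-long chunks."""
    full_chunks, remainder = divmod(total_bytes, chunk_bytes)
    sizes = [chunk_bytes] * full_chunks
    if remainder:
        sizes.append(remainder)  # the last, shorter chunk
    return sizes


if __name__ == "__main__":
    # Total derived from the warning above: 4 * 9000000 + 8894190.
    print(chunk_sizes(44894190, 9000000))
    # [9000000, 9000000, 9000000, 9000000, 8894190]
```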
The LATEST RELEASE, and all previous versions of these tests can be found
at http://geocities.com/SiliconValley/Bay/1995/ and http://artest1.tripod.com/
The FINAL PART
> [[5]] PLEASE read THIS before replying to this article
was removed from this text, but can be easily found at
http://geocities.com/SiliconValley/Bay/1995/artest10.html
http://artest1.tripod.com/artest10.html
Send your suggestions and comments to artest@hotmail.ru
With best kind regards,
RAO Inc.