The Art
of Lossless
Data Compression
vol. 17
Here are the results of tests performed in July 2000 to compare
lossless compression of English texts by all known sufficiently good programs
developed for this purpose, including RK, DC, PPMDF, Bzip2, IMP, RAR and 7-zip.
See the Archive Comparison Test by J. Gilchrist for more details: http://ACT.BY.net
If anybody wants to start or continue such tests,
or can suggest other sets of texts or other compression programs
(not sources or algorithm descriptions; programs for DOS or Windows only),
or knows of something important we have missed
(some new fantastic technology, an algorithm, or even a program capable
of lossless compression of up to 1000:1, etc.),
please let us know immediately: ratush@srsc-gw.sscc.ru Thank you!
[[1]] COMPRESSION QUALITY
(see also
[[2]] Speed
[[3]] Details
[[4]] Comments)
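All values are compressed sizes relative to the best result in each row (100%),
and the "original" column gives the uncompressed size on the same scale.
In both tables, the fifth row shows results for the sum of the four Canterbury
Corpus Large Set files, and the tenth row for the sum of all 556 files in five sets.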
(modeling and ppm-based, slow-extracting programs)
original RK       ppmonstr PPMDF    BOA      ACB      777      UFA      Arhangel UHARC
         -mx3-ft+ -o7-m56  -o7-m56  -m15     u        -m5-mu32 -m5-mu32 -2-mm-mt -m3-mm
569.47%  100%     103.03   103.94   104.20   105.75   112.36   112.36   113.46   136.80
411.40%  100.03   101.95   101.98   100.56   102.85   100.50   100.50   100%     100.84
572.82%  *100%    103.43   104.45   104.59   104.73   110.27   110.27   113.35   138.89
644.43%  ^100%    106.03   107.28   110.29   109.01   124.88   124.88   136.57   134.41
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
521.41%  *100%    103.06   103.73   103.68   104.76   108.93   108.93   111.32   123.80
486.73%  100%     102.67   104.25   105.02   107.57   112.71   112.71   114.30   133.35
398.62%  ^100%    101.75   103.39   103.55   107.73   108.80   108.80   108.41   128.69
438.62%  100%     102.23   103.88   104.81   108.93   110.61   110.61   111.99   133.70
704.14%  100%     103.10   104.02   107.75   112.77   112.93   112.93   134.06   148.99
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
454.91%  100%     102.18   103.74   104.68   108.70   110.41   110.41   112.91   133.42
(dictionary-based and block-sorting, fast-extracting programs)
DC       BA       ZZip     SZip     ERI      BZip2    IMP      RAR      7-zip    PkZip
-b16300  -k50-m   -a4-b12  -o10b41  -m5      -k -9    -2 -s4   -m5-mm   -mx      -exx
102.63   107.29   110.25   108.91   109.94   118.98   117.30   135.80   156.39   165.00
101.46   103.86   102.41   103.83   106.17   110.95   109.09   112.46   111.08   115.52
100.82   105.20   108.70   109.36   107.74   118.50   116.25   138.68   158.53   166.77
108.77   108.37   109.43   113.00   110.32   127.55   125.74   138.57   181.35   187.54
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
102.42   105.53   106.78   107.62   107.94   116.91   114.99   128.30   143.37   150.05
102.17   106.92   108.93   111.23   110.76   117.07   115.80   135.44   152.85   159.24
101.01   106.37   107.97   110.12   110.03   113.89   113.57   135.65   143.11   149.32
103.00   107.94   110.32   111.16   112.13   117.50   117.17   137.22   149.56   155.62
107.00   115.13   119.33   114.02   115.00   131.86   139.43   149.70   173.61   180.91
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
102.79   107.96   110.27   110.97   111.66   117.74   117.87   137.34   149.92   156.13
* RK -mx2 (instead of -mx3 -ft+)
^ RK -mx3 (without -ft+)
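The percentages above can be reproduced from raw compressed sizes, assuming (as the 100% entries suggest) that each value is a size divided by the best compressed size in its row. This is a minimal sketch; the function name and the byte counts below are hypothetical, not ARTest data.

```python
def relative_sizes(sizes: dict[str, int], original: int) -> dict[str, float]:
    """Express every compressed size, and the original size, as a percentage
    of the smallest compressed size in the row (which becomes 100%)."""
    best = min(sizes.values())
    row = {"original": 100.0 * original / best}
    for program, size in sizes.items():
        row[program] = 100.0 * size / best
    return row


# Hypothetical byte counts, for illustration only (not ARTest data).
example = relative_sizes({"prog_a": 200_000, "prog_b": 206_000}, original=1_000_000)
for name, pct in example.items():
    print(f"{name:10s} {pct:7.2f}%")
```

With these numbers the rows print as 500.00% (original), 100.00% and 103.00%, matching the layout of the tables.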
[[2]] Speed
The Canterbury Corpus Large Set (http://corpus.canterbury.ac.nz/ftp/large.zip)
was used for this test, on an AMD K6-400 machine with 64 MB RAM running Windows 98.
Program, options     Overall score   Average users' score   Compress  Extract  Compressed
                     seconds      %  seconds            %   time, s   time, s  size, bytes
777 a -m5 -mu32 1354 147% 1171 133% 203 222 3343996
777 a -mg -s 1880 205% 1262 144% 688 139 3793939
7zip a 1307 142% 1232 140% 83 4 4393623
7zip a -mx 1358 148% 1240 141% 131 4 4401160
acb B 2540 276% 1818 207% 803 808 3346915
acb b 2997 326% 2059 235% 1042 1047 3267480
acb u 3802 414% 2496 285% 1452 1456 3221349
arhangel a 1205 131% 1117 127% 98 94 3647060
arhangel a -2 -mm 1203 131% 1117 127% 96 94 3647060
arhangel a -2 -1 1514 165% 1148 131% 407 94 3647060
arhangel a -mt 1173 127% 1069 122% 115 109 3417110
arhangel a -mtf 1177 128% 1071 122% 118 110 3418181
ba -k 1057 115% 988 112% 78 26 3432541
ba -k -m 1057 115% 988 112% 78 26 3432541
ba -k -1 1170 127% 1122 128% 54 26 3927264
ba -k -10 1056 115% 986 112% 79 26 3424345
ba -k -50 1046 114% 954 109% 103 17 3337823
boa -m1 1623 176% 1387 158% 263 281 3886856
boa -a 1560 170% 1266 144% 327 340 3217347
boa -m15 1588 173% 1277 145% 346 358 3182732
bzip2 -k 1075 117% 1025 117% 56 16 3611558
bzip2 -k -s 1145 124% 1102 125% 48 14 3902513
bzip2 -k -1 1201 130% 1159 132% 47 13 4109767
bzip2 -k -5 1089 118% 1046 119% 48 14 3697142
bzip2 -k -9 1070 116% 1023 116% 53 15 3611558
dc e 950 103% 918 104% 36 22 3214240
dc e -a 950 103% 921 105% 33 23 3223329
dc e -d 3567 388% 3547 405% 24 2 12751141
dc e -b16300 1098 119% 875 100% 248 64 2829394
eri a -m1 1119 122% 983 112% 153 29 3378440
eri a -m2 1117 121% 975 111% 158 30 3346586
eri a -m3 1123 122% 971 110% 169 32 3318853
eri a 1136 123% 972 111% 183 33 3313568
eri a -m5 1167 127% 975 111% 215 33 3313559
imp98 a -2 1043 113% 1002 114% 46 11 3547964
imp98 a -2 -s4 1040 113% 998 114% 48 11 3535351
imp a -2 -s4 1041 113% 1001 114% 45 11 3548156
pkzip -es 1659 180% 1655 189% 5 3 5945608
pkzip -a 1326 144% 1307 149% 22 2 4691477
pkzip -exx 1498 163% 1303 148% 217 2 4605928
ppmd e -o5 958 104% 937 107% 24 23 3279292
ppmd e -o7 983 107% 953 108% 34 34 3296502
ppmd e -o9 1057 115% 1015 116% 47 48 3464715
ppmd e -o5 -m56 950 103% 932 106% 20 23 3268214
ppmd e -o7 -m56 917 100% 893 102% 28 30 3095512
ppmd e -o9 -m56 985 107% 944 107% 46 46 3215327
ppmonstr e -o5 997 108% 958 109% 43 43 3278191
ppmonstr e -o7 1023 111% 972 111% 57 59 3265897
ppmonstr e -o9 1097 119% 1031 117% 74 78 3406265
ppmonstr e -o5 -m56 989 107% 954 109% 40 42 3268306
ppmonstr e -o7 -m56 965 105% 918 104% 53 56 3083063
ppmonstr e -o9 -m56 1036 112% 967 110% 77 77 3178172
rar a 1226 133% 1134 129% 103 4 4029077
rar a -mm 1227 133% 1134 129% 105 4 4029077
rar a -m1 1247 135% 1205 137% 48 4 4304853
rar a -m5 1555 169% 1144 130% 457 4 3938348
rar a -s 1227 133% 1134 129% 104 4 4028163
rar a -s -mda 1307 142% 1236 141% 79 4 4408220
rar a -s -mdc 1252 136% 1168 133% 93 4 4157251
rar a -s -m5 1560 170% 1144 130% 463 4 3937052
rar32 a -s -m5 1560 170% 1144 130% 463 4 3937052
rk -mf1 1194 130% 1166 133% 32 21 4110184
rk -mf2 1308 142% 1149 131% 177 76 3798456
rk -mf3 1504 164% 1151 131% 392 72 3742232
rk -mx1 1736 189% 1350 154% 430 449 3089384
rk -mx2 1825 199% 1403 160% 470 502 3074900
rk -mx2 -ft+ 1915 208% 1452 165% 514 540 3099400
rk -mx2 -fe+ 1844 201% 1413 161% 480 510 3074904
rk -mx3 1891 206% 1440 164% 502 535 3076136
szip -v0 1040 113% 1003 114% 41 34 3473957
szip -o4 1061 115% 1044 119% 19 29 3646906
szip -o8 1040 113% 993 113% 53 35 3429112
szip -o0 1063 115% 979 111% 94 24 3403202
szip -v0 -b41 1019 111% 984 112% 39 34 3405120
szip -o4 -b41 1045 113% 1029 117% 17 30 3591824
szip -o8 -b41 1021 111% 974 111% 53 36 3356744
szip -o0 -b41 1055 115% 959 109% 107 24 3326271
ufa a -m5 -mu32 1378 150% 1185 135% 216 234 3343996
ufa a -mg -mu32 1381 150% 1185 135% 219 234 3343996
ufa a -m5 -mu16 1323 144% 1156 132% 186 203 3363895
ufa a -m5 -mu10 1312 143% 1154 131% 177 195 3387619
ufa a -m5 -mu4 1342 146% 1187 135% 173 192 3519553
ufa a -mg -s 1630 177% 1161 132% 522 28 3889878
uharc a 1381 150% 1183 135% 220 27 4081072
uharc a -m1 1354 147% 1244 142% 122 29 4333271
uharc a -m3 1514 165% 1125 128% 432 26 3801399
uharc a -m3 -mm 1515 165% 1126 128% 433 26 3801399
uharc a -m3 -md64 1501 163% 1221 139% 311 28 4184881
uharc a -m3 -md2048 1515 165% 1126 128% 433 26 3801399
zzip a 1085 118% 1030 117% 62 28 3584447
zzip a -mm 1085 118% 1030 117% 61 28 3584447
zzip a -lm 1085 118% 1030 117% 61 28 3584447
zzip a -a1 1085 118% 1030 117% 61 28 3584447
zzip a -a2 1080 117% 1021 116% 66 31 3543392
zzip a -a3 1076 117% 1014 115% 69 30 3517619
zzip a -a4 1085 118% 1015 116% 79 30 3517619
zzip a -a4 -b12 1029 112% 950 108% 88 31 3277976
Overall score is the sum of compression time, extraction time, and the time it
would take to transfer the compressed file over a 28,800 bps link, which is
compressed_size/3600 seconds, because 28,800 bits per second is 3600 bytes per second.
Average users' score is computed the same way, but with compression time divided
by 10: compress_time/10 + extract_time + compressed_size/3600.
Compression time is divided by 10 because more than 90% of people never compress
anything themselves (with compression programs) in their whole lives, but they use
compressed data almost _every_ time they use a computer or the Internet.
That's why compression time matters less to them.
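The two scores can be written as a small calculator. This is a minimal sketch: the function names are hypothetical, and the truncation in percent_of_best is inferred from the table (for example, 1098/917 is listed as 119%, not 120%) rather than stated in the text.

```python
import math

# 28,800 bits per second / 8 bits per byte = 3600 bytes per second.
BYTES_PER_SECOND = 28800 // 8


def transfer_time(compressed_size: int) -> float:
    """Seconds needed to send the compressed file over a 28,800 bps link."""
    return compressed_size / BYTES_PER_SECOND


def overall_score(compress_s: float, extract_s: float, size: int) -> float:
    """Compression time + extraction time + transfer time, in seconds."""
    return compress_s + extract_s + transfer_time(size)


def users_score(compress_s: float, extract_s: float, size: int) -> float:
    """Same as overall_score, but compression time is weighted by 1/10."""
    return compress_s / 10 + extract_s + transfer_time(size)


def percent_of_best(score: float, best: float) -> int:
    """Score as a percentage of the best (lowest) score, truncated.
    Truncation is inferred from the table, not stated in the text."""
    return math.floor(100 * score / best)


# "dc e -b16300" row: 248 s to compress, 64 s to extract, 2829394 bytes.
print(round(overall_score(248, 64, 2829394)))  # 1098
print(round(users_score(248, 64, 2829394)))    # 875
# "777 a -m5 -mu32" overall score against the best overall score (917).
print(percent_of_best(1354, 917))              # 147
```

The printed values agree with the "dc e -b16300" and "777 a -m5 -mu32" rows of the Speed table.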
[[3]] Details
(738 lines reporting 22796 results on 556 files in 5 sets) are no longer included
in this main text, but can be found in the FULL version, together with TEXTS.DAT
and the *.BAT files,
at http://geocities.com/SiliconValley/Bay/1995/artest17.zip
or http://artest1.tripod.com/artest17.zip
[[4]] Comments
Links to download programs:
7-Zip 2.11 :W http://www.7-zip.com/dl/7zip211.exe 493K
777 0.04b1 :W http://www.7-zip.com/dl/ufa/777004b1.zip 72K
UFA 0.04b1 :W http://www.7-zip.com/dl/ufa/ufa004b1.zip 64K
ArHanGeL 1.40 :a http://geocities.com/SiliconValley/Lab/6606/arh140.zip 50K
ERI32 4.6fre :e http://geocities.com/eri32/eri46fre.zip 91K
Imp 1.1 :e http://www.winimp.com/imp110d.zip 266K
Imp-win 1.12 :W http://www.winimp.com/imp112.exe 122K
PkZip 2.50 :a ftp://ftp.simtel.net/pub/simtelnet/msdos/arcers/pk250dos.exe 202K
RK 1.02a5 :W http://malcolmt.tripod.com/downloads/rk102a05.exe 191K
RAR32 2.71 :e ftp://ftp.netlab.sk/public/rarsoft/rar/rarx271.exe 257K
WinRAR 2.71 :W ftp://ftp.netlab.sk/public/rarsoft/rar/wrar271.exe 588K
PPMD var.F,
PPMonstr var.F :W ftp://ftp.simtel.net/pub/simtelnet/win95/compress/ppmdf.zip 97K
ACB 2.00c :e ftp://ftp.simtel.net/pub/simtelnet/msdos/compress/acb_200c.zip 42K
BOA 0.58b :e ftp://ftp.cdrom.com/.3/sac/pack/boa058.zip 74K
DC 0.98b :W ftp://ftp.cdrom.com/.3/sac/pack/dc124.zip 55K
BA 1.00 beta :e ftp://ftp.cdrom.com/.3/sac/pack/ba100b.zip 60K
Bzip2 1.0.1 :W ftp://sourceware.cygnus.com/pub/bzip2/v100/bzip2-100-x86-win32.exe 68K
SZip 1.12a :W http://www.compressconsult.com/szip/szip_112a_win32.zip 71K
ZZip 0.35a :W http://www.via.ecp.fr/~damien/zzip/zzip-win32.zip 28K
:a - any DOS - DOS programs that run under pure DOS or in a DOS box
:e - extender - DOS programs using DOS extenders like DOS/4GW or CWSDPMI
:W - windoze - Windows95/98/NT/etc programs
If a direct link doesn't work, most probably a newer version of the program has
appeared at the same site: visit the web page, or list the whole directory on the
FTP server (i.e. try the same URL, but without the filename).
Homepages:
Arhangel : http://geocities.com/SiliconValley/Lab/6606
Eri32 : http://geocities.com/eri32
mirror : http://artest1.tripod.com
RK : http://malcolmt.tripod.com
Imp,WinImp : http://www.technelysium.com.au
mirror : http://www.winimp.com
PkZip : http://www.pkware.com
Ufa,777,7-Zip: http://www.7-zip.com
RAR,WinRAR : http://www.rarsoft.com
BZip2 : http://sources.redhat.com/bzip2
SZip : http://www.compressconsult.com/szip
ZZip : http://www.via.ecp.fr/~damien/zzip
What's new:
All contents of this page.
407 megabytes of plain (English) text in 556 files in 5 sets,
including the four Canterbury Corpus Large Set files.
Non-English texts will probably be added in the future,
but don't expect the results to differ by more than 1%.
One file (pgwht04.txt) is an HTML file, and one (E.TXT, originally E.COLI,
the first file of the Large Set) is pseudo-text.
19 archivers and file-to-file compressors known to be the best at
plain-text compression (plus a few of the most popular tools).
The .BAT files used for the tests are more compact and readable (see TEXT_ALL\*.BAT
inside artest17.zip), and the .BAT files used for calculations are included this time.
A DOS prompt calculator with user-defined functions
(math.exe, used for ARTest) can be found at
ftp://ftp.simtel.net/pub/simtelnet/msdos/calculte/mathfc24.zip (26K)
Ultra Precision Command Timer 1.6 - Freeware (C) 1993 by Erik de Neve
(upct.exe, used for ARTest) can be found at
ftp://ftp.cdrom.com/.3/sac/utilmisc/upct16.zip (7K)
MultiEdit 7.00jP-386 was used for editing files (with macros, blocks, etc.),
and the standard fc.exe from any DOS/Windows package was used for comparing files.
WARNINGS:
RK 1.02a5 was unable to correctly decompress CHNBG10.TXT compressed with any of
-mx1, -mx2, -mx3
("This program has performed an illegal operation and will be shut down"),
and also MISCC10.TXT compressed with -ft+ and any of -mx1, -mx2, -mx3, reporting
ERROR 303: CRC check failed.
BA 1.00 beta can't decompress any file compressed with -mf, and reports no error
such as "CRC fails".
DC 0.98b failed to decompress 1DFRE10.dc, ANDES10.dc and BTI0110.dc,
reporting "Corrupted block" (while the t(est) command reports "Test successful").
UFA and 777 can't handle files with the ` character (ASCII code 96) in their names,
so it was replaced with _ in nine filenames.
ERI32 4.6 can't compress files larger than (free DPMI memory)/6, i.e.
about 10 MB on a PC with 64 MB RAM. The largest (44 MB) file was therefore split
into 5 chunks 9000000 bytes long (the last chunk was 8894190 bytes).
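The last two workarounds (renaming files that contain a backtick, and cutting the largest file into 9000000-byte pieces) are easy to script. This is a minimal sketch assuming Python 3.9+; the function names and the .000/.001 part-naming scheme are hypothetical, not what was actually used for ARTest.

```python
from pathlib import Path

CHUNK_SIZE = 9_000_000  # bytes per chunk, as used for ERI32 in these tests


def chunk_sizes(total: int, chunk: int = CHUNK_SIZE) -> list[int]:
    """Sizes of the pieces a file of `total` bytes is cut into."""
    full, rest = divmod(total, chunk)
    return [chunk] * full + ([rest] if rest else [])


def split_file(path: Path, chunk: int = CHUNK_SIZE) -> list[Path]:
    """Write path.000, path.001, ... (hypothetical naming), each at most
    `chunk` bytes long, and return the list of part files."""
    parts = []
    with path.open("rb") as src:
        index = 0
        while True:
            data = src.read(chunk)
            if not data:
                break
            part = path.with_name(f"{path.name}.{index:03d}")
            part.write_bytes(data)
            parts.append(part)
            index += 1
    return parts


def safe_name(name: str) -> str:
    """Replace the backtick (ASCII 96), which UFA and 777 cannot handle, with _."""
    return name.replace("`", "_")


# The largest test file: 4 * 9000000 + 8894190 = 44894190 bytes.
print(chunk_sizes(44_894_190))  # [9000000, 9000000, 9000000, 9000000, 8894190]
```

Reading one chunk at a time keeps memory use bounded by the chunk size, which matters on a 64 MB machine.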
The LATEST RELEASE and thirteen previous versions of these tests can be found
at http://geocities.com/SiliconValley/Bay/1995/ and http://artest1.tripod.com/
The FINAL PART
> [[5]] PLEASE read THIS before replying to this article
was removed from this text, but can be easily found at
http://geocities.com/SiliconValley/Bay/1995/artest10.html
http://artest1.tripod.com/artest10.html
Send your suggestions and comments to ratush@srsc-gw.sscc.ru
With best kind regards,
RAO Inc.