The Art of Lossless Data Compression vol. 18t

Here are the results of tests performed in September 2000 to compare lossless compression of English texts by all known programs that are good enough for this purpose, including RK, DC, YBS, Bzip2, IMP, RAR and 7-zip. See the Archive Comparison Test by J. Gilchrist for more details: http://act.by.net

If anybody wants to start or continue such tests, can suggest other sets of texts or other compression programs (executable programs only, not sources or algorithm descriptions), or knows that we have missed something important (some fantastic new technology, an algorithm, or even a program capable of lossless compression of up to 1000:1, etc.), please let us know immediately: artest@hotmail.ru

Thank you!

[[1]] COMPRESSION QUALITY

(see also [[2]] Speed [[3]] Details [[4]] Comments)

Fifth line shows results for the sum of four Canterbury Corpus Large Set files, tenth line - for the sum of all 556 files in five sets.

 original    ACE32      BEE      BIX      BOA       BA    BZip2       DC      ERI      IMP
          -m5-d4096 -m3 -d3 -m1 -mdg -m15 -k50 -m -k -9 -b16300-mt5 -m5 -2 -s4

  581.79%   138.67   108.95   129.00   106.46   109.61   121.55   104.85   112.32   119.84
  411.40%   112.54   105.04   105.48   100.56   103.86   110.95   101.39   106.17   109.09
  582.55%   139.98   106.19   130.78   106.37   106.98   120.52   102.53   109.57   118.23
  657.05%   139.67   112.21   137.08   112.45   110.49   130.05   110.92   112.48   128.20
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  523.75%   128.40   106.29   120.77   104.15   106.01   117.43   102.85   108.43   115.51
  485.12%   134.76   105.29   129.30   104.67   106.57   116.69   101.84   110.39   115.42
  395.40%   130.54   104.40   124.45   102.72   105.51   112.96   100.90   109.14   112.65
  432.31%   133.93   104.01   128.44   103.30   106.38   115.81   101.65   110.52   115.48
  723.25%   147.93   112.09   143.07   110.68   118.26   135.44   109.89   118.12   143.21
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  448.68%   133.42   104.23   127.82   103.25   106.48   116.12   101.60   110.13   116.26

 Arhangel ppmonstr      RAR       RK     SZip      UFA      YBS     ZZip    7-zip    PkZip
 -2-mm-mt -o8 -m5-mm-mde -mx3 -o10-b41 -m5-mu32 -m16mu -b20-mx -mx -exx

   115.91   103.50   138.73    *100%   111.26   114.79   105.39   110.23   159.77   168.57
     100%   102.56   112.46   102.13   103.83   100.50   102.00   103.28   111.08   115.52
   115.28   102.02   141.03    *100%   111.22   112.14   102.81   107.61   161.22   169.60
   139.25   104.61   141.29     100%   115.21   127.33   109.73   110.31   184.90   191.21
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   111.81   102.06   128.87     100%   108.11   109.42   103.13   106.08   144.02   150.72
   113.92   100.67   134.99    *100%   110.86   112.33   104.56   107.44   152.34   158.71
   107.53     100%   134.55   100.53   109.22   107.92   103.40   106.38   141.96   148.11
   110.38    ^100%   135.25   100.63   109.56   109.02   104.18   107.36   147.41   153.39
   137.70   105.98   153.76     100%   117.12   116.00   118.47   119.20   178.32   185.82
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   111.36   100.03   135.46     100%   109.45   108.89   104.45   107.39   147.87   153.99

* RK -mx2 (not -mx3)
^ PPMonstr -o9 -m56 (not -o8 -m56)
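The figures in this section read as sizes relative to the smallest result in each row, which is shown as 100%. Assuming that reading (the text does not spell out the formula), a minimal Python sketch of the normalization could look like this. The function name and the byte counts are hypothetical and are not taken from the tests.

```python
def relative_percent(original_bytes, compressed_bytes):
    """Express every size as a percentage of the smallest compressed size.

    Assumption: 100% marks the best (smallest) result in a row, and the
    "original" column is the uncompressed size on the same scale.
    """
    best = min(compressed_bytes.values())
    result = {"original": round(100.0 * original_bytes / best, 2)}
    for name, size in compressed_bytes.items():
        result[name] = round(100.0 * size / best, 2)
    return result


# Hypothetical byte counts, for illustration only.
row = relative_percent(4000000, {"A": 1000000, "B": 1250000})
print(row)  # {'original': 400.0, 'A': 100.0, 'B': 125.0}
```

Under this reading, an "original" value of 400% would mean the best program shrank the data to a quarter of its size.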

[[2]] Speed

The Canterbury Corpus Large Set (http://corpus.canterbury.ac.nz/ftp/large.zip) was used for this test, on an AMD-K6-400 machine with 64M RAM running Windows98.

Programs,options      Overall     Average     Compress  Extract  Compressed
                      score,      Users'      time,     time,    size,
                                  score,      seconds   seconds  bytes
                      seconds  %  seconds  %

777 a -m5 -mu32        1354 147%   1171 133%       203      222     3343996
777 a -mg -s           1880 205%   1262 144%       688      139     3793939
7zip a                 1307 142%   1232 140%        83        4     4393623
7zip a -mx             1358 148%   1240 141%       131        4     4401160
acb B                  2540 276%   1818 207%       803      808     3346915
acb b                  2997 326%   2059 235%      1042     1047     3267480
acb u                  3802 414%   2496 285%      1452     1456     3221349
arhangel a -2 -mm      1203 131%   1117 127%        96       94     3647060
arhangel a -mt         1173 127%   1069 122%       115      109     3417110
arhangel a -mtf        1177 128%   1071 122%       118      110     3418181
ba -k                  1057 115%    988 112%        78       26     3432541
ba -k -1               1170 127%   1122 128%        54       26     3927264
ba -k -50              1046 114%    954 109%       103       17     3337823
boa -m1                1623 176%   1387 158%       263      281     3886856
boa -a                 1560 170%   1266 144%       327      340     3217347
boa -m15               1588 173%   1277 145%       346      358     3182732
bzip2 -k -1            1201 130%   1159 132%        47       13     4109767
bzip2 -k -5            1089 118%   1046 119%        48       14     3697142
bzip2 -k -9            1070 116%   1023 116%        53       15     3611558
dc e                    950 103%    918 104%        36       22     3214240
dc e -a                 950 103%    921 105%        33       23     3223329
dc e -b16300           1098 119%    875 100%       248       64     2829394
eri a -m2              1117 121%    975 111%       158       30     3346586
eri a -m3              1123 122%    971 110%       169       32     3318853
eri a                  1136 123%    972 111%       183       33     3313568
imp98 a -2             1043 113%   1002 114%        46       11     3547964
imp98 a -2 -s4         1040 113%    998 114%        48       11     3535351
imp a -2 -s4           1041 113%   1001 114%        45       11     3548156
pkzip -es              1659 180%   1655 189%         5        3     5945608
pkzip -a               1326 144%   1307 149%        22        2     4691477
pkzip -exx             1498 163%   1303 148%       217        2     4605928
ppmd e -o5 -m56         950 103%    932 106%        20       23     3268214
ppmd e -o7 -m56         917 100%    893 102%        28       30     3095512
ppmd e -o9 -m56         985 107%    944 107%        46       46     3215327
ppmonstr e -o5 -m56     989 107%    954 109%        40       42     3268306
ppmonstr e -o7 -m56     965 105%    918 104%        53       56     3083063
ppmonstr e -o9 -m56    1036 112%    967 110%        77       77     3178172
rar a                  1226 133%   1134 129%       103        4     4029077
rar a -m1              1247 135%   1205 137%        48        4     4304853
rar a -s -m5           1560 170%   1144 130%       463        4     3937052
rk -mf1                1194 130%   1166 133%        32       21     4110184
rk -mf2                1308 142%   1149 131%       177       76     3798456
rk -mx1                1736 189%   1350 154%       430      449     3089384
rk -mx2                1825 199%   1403 160%       470      502     3074900
rk -mx3                1891 206%   1440 164%       502      535     3076136
szip -v0 -b41          1019 111%    984 112%        39       34     3405120
szip -o8 -b41          1021 111%    974 111%        53       36     3356744
szip -o0 -b41          1055 115%    959 109%       107       24     3326271
ufa a -m5 -mu32        1378 150%   1185 135%       216      234     3343996
ufa a -m5 -mu10        1312 143%   1154 131%       177      195     3387619
ufa a -mg -s           1630 177%   1161 132%       522       28     3889878
uharc a                1381 150%   1183 135%       220       27     4081072
uharc a -m1            1354 147%   1244 142%       122       29     4333271
uharc a -m3            1514 165%   1125 128%       432       26     3801399
zzip a                 1085 118%   1030 117%        62       28     3584447
zzip a -a3             1076 117%   1014 115%        69       30     3517619
zzip a -a4 -b12        1029 112%    950 108%        88       31     3277976

Overall score is calculated by adding compression time, extraction time, and the time it would take to transfer the compressed file over a 28,800 bps network: (compressed_size)/3600, because 28800 bits_per_second is 3600 bytes_per_second.

Average Users' score is calculated by adding (compress_time/10) + extract_time + the time it would take to transfer the compressed file over a 28,800 bps network. Compression time is divided by 10 here because more than 90% of people will never compress anything themselves (with compression programs), yet they use compressed data almost _every_ time they use computers and/or the Internet. That is why compression time matters less to them.
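The two score formulas can be written out as a short Python sketch. The function names are illustrative only; the inputs are the compress time, extract time and compressed size listed for "dc e -b16300" in the table. Timings in the table are whole seconds, so recomputed scores for other rows may differ from the listed ones by a second.

```python
# 28,800 bits per second is 3600 bytes per second.
LINK_BYTES_PER_SECOND = 28800 // 8


def transfer_seconds(compressed_bytes):
    """Time to send the compressed file over the 28,800 bps link."""
    return compressed_bytes / LINK_BYTES_PER_SECOND


def overall_score(compress_s, extract_s, compressed_bytes):
    """Compression time + extraction time + transfer time."""
    return compress_s + extract_s + transfer_seconds(compressed_bytes)


def average_user_score(compress_s, extract_s, compressed_bytes):
    """Same as the overall score, with compression time divided by 10."""
    return compress_s / 10 + extract_s + transfer_seconds(compressed_bytes)


# Row "dc e -b16300": 248 s compress, 64 s extract, 2829394 bytes.
print(round(overall_score(248, 64, 2829394)))       # 1098, as in the table
print(round(average_user_score(248, 64, 2829394)))  # 875, as in the table
```

For this row the transfer term (about 786 seconds) dominates both scores, which is why the strongest compressors rank well despite slow compression.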

[[3]] Details

Details are no longer included in this main text (738 lines reporting 22796 results on 556 files in 5 sets), but can be found in the FULL version with TEXTS.DAT and *.BAT at
http://geocities.com/SiliconValley/Bay/1995/artest18.zip
or
http://artest1.tripod.com/artest18.zip

[[4]] Comments

Links to download programs:

7-Zip 2.11                :W http://www.7-zip.com/dl/7zip211.exe  493K
BIX 1.00b7                :W http://www.7-zip.com/dl/ufa/bix100b7.zip  89K
777 0.04b1                :W http://www.7-zip.com/dl/ufa/777004b1.zip  72K
UFA 0.04b1                :W http://www.7-zip.com/dl/ufa/ufa004b1.zip  64K
ArHanGeL 1.40             :a http://geocities.com/SiliconValley/Lab/6606/arh140.zip  50K
ERI32 4.7fre              :e http://geocities.com/eri32/eri47fre.zip  91K
Imp 1.1                   :e http://www.winimp.com/imp110d.zip  266K
Imp-win 1.12              :W http://www.winimp.com/imp112.exe  122K
PkZip 2.50                :a ftp://ftp.simtel.net/pub/simtelnet/msdos/arcers/pk250dos.exe  202K
RK 1.03b1                 :e http://malcolmt.tripod.com/downloads/rk103a1d.exe  478K
RK 1.03b1                 :W http://malcolmt.tripod.com/downloads/rk103a1w.exe  380K
RAR32 2.71                :e ftp://ftp.netlab.sk/public/rarsoft/rar/rarx271.exe  257K
WinRAR 2.71               :W ftp://ftp.netlab.sk/public/rarsoft/rar/wrar271.exe  588K
PPMD var.F, PPmonstr v.F  :W ftp://ftp.simtel.net/pub/simtelnet/win95/compress/ppmdf.zip  97K
ACB 2.00c                 :e ftp://ftp.simtel.net/pub/simtelnet/msdos/compress/acb_200c.zip  42K
BOA 0.58b                 :e ftp://ftp.cdrom.com/.3/sac/pack/boa058.zip  74K
DC 0.98b                  :W ftp://ftp.cdrom.com/.3/sac/pack/dc124.zip  55K
BA 1.00 beta              :e ftp://ftp.cdrom.com/.3/sac/pack/ba100b.zip  60K
Bzip2 1.0.1               :W ftp://sourceware.cygnus.com/pub/bzip2/v100/bzip2-100-x86-win32.exe  68K
SZip 1.12a                :W http://www.compressconsult.com/szip/szip_112a_win32.zip  71K
ZZip 0.35e                :W http://www.via.ecp.fr/~damien/zzip/zzip-win32.zip  24K
ACE32 2.0b2               :W ftp://ftp.forlangs.net/pub/windows/winace/ace20b2.exe  546K
YBS 0.03d                 :e http://members.nbci.com/vycct/ybs003dd.zip  48K
YBS 0.03d                 :W http://members.nbci.com/vycct/ybs003dw.zip  42K

:a - any DOS  - DOS programs, will run under pure DOS or in a DOS box
:e - extender - DOS programs using DOS extenders like DOS/4GW or CWSDPMI
:W - windoze  - Windows95/98/NT/etc programs

If a direct link doesn't work, most probably a newer version of the program has appeared at the same site: visit the web page, or read the whole directory from the ftp server (i.e. try the same URL, but without the filename).

Homepages:

Arhangel          : http://geocities.com/SiliconValley/Lab/6606
Eri32             : http://geocities.com/eri32
           mirror : http://artest1.tripod.com
RK                : http://malcolmt.tripod.com
Imp,WinImp        : http://www.technelysium.com.au
           mirror : http://www.winimp.com
ACE32             : http://www.winace.com
PkZip             : http://www.pkware.com
RAR,WinRAR        : http://www.rarsoft.com
BZip2             : http://sources.redhat.com/bzip2
SZip              : http://www.compressconsult.com/szip
ZZip              : http://www.via.ecp.fr/~damien/zzip
YBS               : http://members.nbci.com/vycct
Ufa,777,BIX,7-Zip : http://www.7-zip.com

What's new:

8 new programs were tested: ACE32 2.0c2, BEE 0.4.8, BIX 1.00b7, DC 0.99.158b, PPMonstr var.Gpre, RK 1.03b1, YBS 0.03d, ZZip 0.35e.

The latest beta versions of BEE, DC and PPMonstr are available from the authors by e-mail request:

BEE      : Andrew.Filinsky@p11.f4.n452.z2.fidonet.org
DC       : EdgarBinder@t-online.de
PPMonstr : shkarin@arstel.ru , dmitry.shkarin@mtu-net.ru

Section [[2]] Speed was not updated this time. The next release will follow soon.

5029 "binary" files were added - see vol.18b, binary.txt .

Results of some programs were not included in the latest release: see the previous vol.17 for the performance of ACB, UHARC and PPMDF.

WARNINGS:

RK 1.03b1 was unable to correctly decompress 555 files (all except E.TXT) compressed with "-mx3 -ft-", reporting ERROR 303: CRC check failed.

BA 1.00beta can't decompress any file compressed with -mf, and gives no message such as "CRC fails".

DC 0.99.158b failed to decompress 1DFRE10.dc, ANDES10.dc, and BTI0110.dc, saying "Corrupted block" (while the t(est) command writes "Test successful").

UFA and 777 can't handle files with the symbol ` (ASCII code 96) in their names. It was replaced with _ in nine filenames.

ERI32 4.7fre can't compress files larger than (free DPMI memory)/6, i.e. about 10Mb on a PC with 64Mb RAM. The largest 44Mb file was split into 5 chunks of 9000000 bytes each (the last chunk was 8894190 bytes).

The LATEST RELEASE, and fifteen previous versions of these tests, can be found at
http://geocities.com/SiliconValley/Bay/1995/
and
http://artest1.tripod.com/
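The ERI32 workaround in the warnings above (splitting the largest file into 9000000-byte chunks) can be sketched in Python as follows. This is only an illustration of the arithmetic and of one possible way to do the split; the function names and the ".000", ".001" naming scheme are assumptions, not the tool that was actually used for the tests.

```python
def chunk_sizes(total_bytes, chunk_bytes):
    """Sizes of consecutive chunks that together cover total_bytes."""
    full, rest = divmod(total_bytes, chunk_bytes)
    return [chunk_bytes] * full + ([rest] if rest else [])


def split_file(path, chunk_bytes):
    """Write path.000, path.001, ... and return the chunk file names.

    The numbering scheme is hypothetical, chosen only for this sketch.
    """
    names = []
    with open(path, "rb") as src:
        index = 0
        while True:
            data = src.read(chunk_bytes)
            if not data:
                break
            name = "%s.%03d" % (path, index)
            with open(name, "wb") as dst:
                dst.write(data)
            names.append(name)
            index += 1
    return names


# Four full chunks plus the reported last chunk of 8894190 bytes.
print(chunk_sizes(4 * 9000000 + 8894190, 9000000))
```

Each 9000000-byte chunk stays below the roughly 10Mb limit quoted for a 64Mb machine.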

The FINAL PART

[[5]] PLEASE read THIS before replying to this article

was removed from this text, but can be easily found at
http://geocities.com/SiliconValley/Bay/1995/artest10.html
http://artest1.tripod.com/artest10.html

Send your suggestions and comments to artest@hotmail.ru

With best kind regards,
RAO Inc.