The Art of Lossless Data Compression vol. 22t

Here are the results of tests performed in May 2001 to compare lossless compression of English texts by all known good-enough programs developed for this purpose, including RK, DC, YBS, Bzip2, IMP, RAR and 7-zip. See the Archive Comparison Test by J. Gilchrist for more details: http://act.by.net

If anybody wants to start or continue such tests, can suggest other sets of texts or other compression programs (executable programs only, not sources or algorithm descriptions), or knows that we have missed something important (some fantastic new technology, an algorithm, or even a program capable of lossless compression of up to 1000:1, etc.), please let us know immediately: artest@inbox.ru Thank you!

[[1]] COMPRESSION QUALITY

(see also [[2]] Speed [[3]] Details [[4]] Comments)

The fifth line shows results for the sum of the four Canterbury Corpus Large Set files; the eleventh line shows results for the sum of all 1231 files in six sets.

 Original PPMonstr     PPMD       RK       DC      BOA      SBC      BEE      YBS    UHArc
  585.61%     100%   105.23   100.70   105.54   107.16   105.98   109.66   106.08   105.36
  414.65%   101.41   105.75   102.94   102.19   101.35   103.73   105.87   102.81     100%
  591.98%     100%   104.97   101.61   104.18   108.09   104.72   107.91   104.47   103.34
  675.45%     100%   108.79   102.80   114.03   115.60   112.90   115.35   112.80   116.17
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  529.35%     100%   104.77   101.06   103.95   105.26   104.63   107.42   104.23   103.13
  492.94%     100%   104.25   101.90   103.73   106.62   105.57   107.26   106.54   105.45
  398.92%     100%   102.76   101.46   101.83   103.66   103.64   105.35   104.36   104.05
  436.99%     100%   102.63   101.61   102.66   104.30   104.13   105.02   105.19   105.50
  733.54%     100%   101.39   101.42   111.46   112.25   108.82   113.68   110.65   113.37
  341.03%     100%   102.39   106.22   107.84   103.75   106.50   104.23   105.70   110.54
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  427.88%     100%   102.48   102.51   104.14   104.43   104.77   105.30   105.33   106.68

       BA     ZZip      ACB      777     SZip      ERI    BZip2      ACE      RAR    7-zip
   110.28   110.48   108.75   115.54   111.99   113.05   122.35   139.58   139.64   160.82
   104.68   104.01   103.67   101.29   104.65   107.02   111.82   113.43   113.35   111.96
   108.65   108.78   108.23   113.96   113.02   111.34   122.46   142.25   143.32   163.83
   113.47   113.41   114.26   130.89   118.43   115.63   133.69   143.59   145.24   190.08
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   107.10   106.92   106.35   110.59   109.26   109.58   118.69   129.78   130.25   145.56
   108.52   109.13   109.26   114.41   112.96   112.49   118.83   137.33   137.55   155.12
   106.41   107.07   107.85   108.93   110.25   110.17   114.01   131.81   135.76   143.27
   107.35   108.10   108.40   110.07   110.61   111.58   116.94   135.22   136.58   148.86
   119.88   114.63   117.48   117.65   118.78   119.80   137.36   150.03   155.95   180.86
   108.68   109.75   109.43   111.02   111.11   111.93   112.85   130.72   130.22   138.02
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   108.05   108.52   108.85   110.52   110.98   111.69   116.54   134.15   135.61   147.05
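
The percentages above appear to be normalised per line, so that the smallest result on a line counts as 100%. The text does not spell this out, so treat that reading as an assumption. Here is a minimal Python sketch of such a normalisation. The function name relative_percent is hypothetical, and the byte counts are compressed sizes taken from the speed test in section [[2]], not the inputs of the table above, so the printed figures will not match it.

```python
def relative_percent(sizes):
    """Map each name to its size as a percentage of the smallest size.

    Assumed reading of the quality table: the best (smallest) result
    on a line is 100%, everything else is expressed relative to it.
    """
    best = min(sizes.values())
    return {name: 100.0 * size / best for name, size in sizes.items()}


# Sizes in bytes, copied from the speed test (Canterbury Corpus Large Set).
sizes = {
    "Original": 16005619,
    "dc e -b16300": 2773427,
    "ppmonstr e -o11 -m184": 2785525,
    "sbc c -os -b59": 2832457,
    "ybs -m16mu": 2857446,
}

if __name__ == "__main__":
    for name, pct in relative_percent(sizes).items():
        print(f"{name:<22} {pct:7.2f}%")
```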

[[2]] Speed

This test used the Canterbury Corpus Large Set http://corpus.canterbury.ac.nz/ftp/large.zip on an AMD-K6-400 machine with 192Mb RAM and Windows98.

Programs, options      Overall      Average    Compress Extract Compressed
                        score     Users' score   time,   time,     size,
                     seconds   %  seconds   %   seconds seconds    bytes

NO COMPRESSION          4446 538%   4446 562%      0      0  16005619
7z a -tufa1             1324 160%   1057 133%    296      8   3672086
7z a -tufa1 -mx         1322 160%   1057 133%    294      8   3672086
7z a -tzip              1283 155%   1231 155%     58      5   4393637
7z a -tzip -mx          1325 160%   1237 156%     97      6   4401174
7zip a                  1278 154%   1229 155%     55      3   4393637
7zip a -mx              1325 160%   1237 156%     98      5   4401174
777 a -mg               1372 166%   1159 146%    237    151   3544038
acb B                   3236 392%   2202 278%   1148   1156   3352388
acb b                   3934 476%   2585 327%   1499   1527   3272388
acb u                   5115 619%   3243 410%   2080   2139   3225662
ace32 a                 1212 146%   1124 142%     98      5   3992645
ace32 a -d4096          1168 141%   1072 135%    106      6   3801917
ace32 a -d4096 -s-      1208 146%   1116 141%    103      5   3962381
ace32 a -d4096 -m1      1160 140%   1112 140%     53      6   3965841
ace32 a -d4096 -m5      1353 164%   1076 136%    309      5   3746553
arh a                   1133 137%   1078 136%     61     59   3647067
arh a -2 -mm            1132 137%   1078 136%     60     59   3647067
arh a -1 -mm            1438 174%   1302 164%    152      8   4605607
arh a -2 -1             1283 155%   1093 138%    212     59   3647067
ba -k                   1024 124%    977 123%     52     23   3421195
ba -k -1                1148 139%   1113 140%     39     22   3914655
ba -k -50               1006 121%    945 119%     68     22   3298943
bee a -m1 -d3           1407 170%   1217 153%    211    198   3593467
bee a -m2 -d3           1460 177%   1229 155%    256    238   3479698
bee a -m3 -d3           1700 206%   1347 170%    392    355   3432029
bix a -mdg -s           1141 138%    995 125%    163      3   3514944
boa -m15                1344 162%   1142 144%    225    236   3182739
boa -m15 -s             1321 160%   1122 141%    221    230   3132810
boa -m7                 1316 159%   1130 142%    207    216   3217354
boa -m1                 1400 169%   1260 159%    155    166   3886863
bzip2 -k                1060 128%   1021 129%     43     13   3616113
bzip2 -k -1             1185 143%   1155 146%     33     12   4106479
bzip2 -k -5             1077 130%   1044 132%     37     13   3700097
bzip2 -k -9             1057 128%   1021 129%     40     13   3616113
dc e                     927 112%    902 114%     27     17   3179173
dc e -ft                 933 113%    906 114%     29     17   3192832
dc e -b16300             826 100%    791 100%     39     17   2773427
dc e -b16300 -mt5        825 100%    790 100%     38     17   2773427
eri a -m1               1057 128%    970 122%     96     22   3378440
eri a -m2               1054 127%    962 121%    102     23   3346586
eri a -m3               1060 128%    958 121%    113     25   3318853
eri a                   1070 129%    958 121%    124     26   3313568
imp98 a -mm -m3         1218 147%   1140 144%     87      4   4059874
imp98 a -mm -2          1024 124%    995 125%     33     10   3533763
imp98 a -2 -s4          1025 124%    995 125%     33     10   3533695
imp a -2 -s4            1021 123%    992 125%     32      9   3530158
pkzip -es               1657 200%   1653 209%      4      2   5945622
pkzip -a                1317 159%   1305 165%     14      1   4691491
pkzip -exx              1390 168%   1291 163%    110      1   4605942
ppmd e -o3 -m184        1093 132%   1083 137%     11     13   3849571
ppmd e -o4 -m184         985 119%    973 123%     13     15   3447452
ppmd e -o5 -m184         938 113%    925 116%     15     17   3263988
ppmd e -o6 -m184         912 110%    897 113%     17     19   3155348
ppmd e -o7 -m184         898 108%    880 111%     20     22   3084162
ppmd e -o8 -m184         890 107%    869 110%     23     25   3032824
ppmd e -o9 -m184         891 108%    867 109%     27     29   3007612
ppmd e -o10 -m184        901 109%    865 109%     40     41   2953155
ppmd e -o11 -m184        915 110%    864 109%     56     56   2891692
ppmd e -o12 -m184       1029 124%    937 118%    102    112   2935640
ppmonstr e -o3 -m184    1136 137%   1107 140%     32     35   3850354
ppmonstr e -o4 -m184    1031 125%    998 126%     37     40   3437676
ppmonstr e -o5 -m184     986 119%    949 120%     42     44   3243866
ppmonstr e -o6 -m184     966 117%    924 116%     47     49   3132547
ppmonstr e -o7 -m184     952 115%    905 114%     52     55   3040773
ppmonstr e -o8 -m184     942 114%    888 112%     60     63   2949530
ppmonstr e -o9 -m184     942 114%    880 111%     68     72   2888367
ppmonstr e -o10 -m184    959 116%    882 111%     85     87   2834425
ppmonstr e -o11 -m184    987 119%    892 112%    106    108   2785525
ppmonstr e -o12 -m184   1100 133%    959 121%    157    158   2831191
rar a                   1191 144%   1130 143%     67      5   4029084
rar a -mm -m1           1233 149%   1204 152%     33      5   4304860
rar a -mm -m5           1438 174%   1132 143%    340      5   3938355
rar a -mm -s            1193 144%   1129 142%     71      5   4023405
rk -mf1                 1147 139%   1127 142%     23     20   3978408
rk -mf2                 1186 143%   1095 138%    101     48   3735704
rk -mf3                 1280 155%   1096 138%    204     50   3693704
rk -mx1                 1385 167%   1163 147%    248    279   3093640
rk -mx2                 1454 176%   1200 151%    282    315   3086308
rk -mx3                 1461 177%   1204 152%    285    319   3087044
sbc c -on -b59           947 114%    897 113%     56     18   3146705
sbc c -oa -b59           966 117%    912 115%     59     19   3195883
sbc c -of -b59           959 116%    907 114%     58     19   3176987
sbc c -os -b59           871 105%    813 102%     65     20   2832457
szip -o6                1021 123%    994 125%     29     27   3475264
szip -o8                1020 123%    985 124%     39     29   3430586
szip -o8 -b41           1001 121%    964 121%     41     30   3348344
ufa a                   1387 168%   1137 143%    277     28   3895425
ufa a -mg -mu32         1335 161%   1159 146%    196    211   3344003
uharc a -m1 -md8192     1262 153%   1184 149%     87     25   4141149
uharc a -m2 -md8192     1275 154%   1139 144%    152     25   3955624
uharc a -m3 -md8192     1434 173%   1108 140%    363     25   3768111
uharc a -mz -md8192     1134 137%   1111 140%     26     30   3884071
uharc a -mx -md8192     1019 123%    932 117%     97     83   3023083
ybs -m16mu               914 110%    820 103%    104     17   2857446
ybs -m16mu -r            933 113%    827 104%    118     16   2878433
ybs -m8m                 935 113%    888 112%     51     16   3123345
zzip a                  1017 123%    972 122%     51     23   3400243
zzip a -mm -mx          1015 123%    969 122%     52     24   3383215
zzip a -mm -a           1017 123%    970 122%     52     25   3383215

Overall score is calculated by adding compression time, extraction time, and the time it would take to transfer the compressed file over a 28,800bps network: (compressed_size)/3600, because 28800 bits_per_second is 3600 bytes_per_second.

Average Users' score is calculated as (compress_time/10) + extract_time + the time it would take to transfer the compressed file over a 28,800bps network. Compression time is divided by 10 here because more than 90% of people will never compress anything themselves with a compression program, yet they use compressed data almost _every_ time they use computers and/or the Internet. That is why compression time matters less to them.
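
The two score formulas can be written out as a short Python sketch. This is only an illustration of the arithmetic described above, not the script that produced the table; the function names and the constant name are hypothetical. The sample figures are those of the "dc e -b16300" line (39 s to compress, 17 s to extract, 2773427 bytes).

```python
# A 28,800 bps link moves 28800 / 8 = 3600 bytes per second.
BYTES_PER_SECOND = 28800 / 8


def transfer_time(compressed_size):
    """Seconds needed to send compressed_size bytes over the link."""
    return compressed_size / BYTES_PER_SECOND


def overall_score(compress_time, extract_time, compressed_size):
    """Compression time + extraction time + transfer time, in seconds."""
    return compress_time + extract_time + transfer_time(compressed_size)


def average_users_score(compress_time, extract_time, compressed_size):
    """Same as overall_score, but compression time counts for one tenth."""
    return compress_time / 10 + extract_time + transfer_time(compressed_size)


if __name__ == "__main__":
    # "dc e -b16300": the table lists 826 and 791 for this line.
    print(round(overall_score(39, 17, 2773427)))        # 826
    print(round(average_users_score(39, 17, 2773427)))  # 791
```

Other lines may come out a second off when recomputed this way, presumably because the listed times are already rounded.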

[[3]] Details

Details are no longer included in this main text (1490 lines reporting 65614 results on 1231 files in 6 sets), but can be found in the FULL version, together with TEXTS.DAT and *.BAT, at http://geocities.com/SiliconValley/Bay/1995/artest22.zip or http://artest1.tripod.com/artest22.zip

[[4]] Comments

Links to download programs:

7-Zip 2.24                :W  http://www.7-zip.com/dl/7zip224.exe  463K
ACE32 2.02                :W  ftp://ftp.forlangs.net/pub/windows/winace/ace202.exe  587K
ERI32 4.16fre             :e  http://geocities.com/eri32/eri416fr.zip  94K
PkzipC 4.00               :W  ftp://ftp.pkware.com/pkzc400s.exe  3470K
RK-dos 1.04.1             :e  http://rksoft.virtualave.net/downloads/rk104a1d.exe  461K
RK 1.04.1                 :W  http://rksoft.virtualave.net/downloads/rk104a1w.exe  380K
RAR32 2.80                :e  ftp://ftp.netlab.sk/public/rarsoft/rar/rarx280.exe  269K
WinRAR 2.80               :W  ftp://ftp.netlab.sk/public/rarsoft/rar/wrar280.exe  621K
BA 1.01b5                 :e  http://hem.spray.se/mikael.lundqvist/ba101br5.zip  61K
SBC 0.860b                :e  http://geocities.com/sbcarchiver/sbc0860b.zip  208K
ZZip 0.36c                :W  http://www.via.ecp.fr/~damien/downloads/zzip-win32.zip  35K
PPMD var.H, PPmonstr v.H  :W  ftp://ftp.cdrom.com/.2/sac/pack/ppmdh.rar  57K
BIX 1.00b7                :W  http://www.7-zip.com/dl/ufa/bix100b7.zip  89K
777 0.04b1                :W  http://www.7-zip.com/dl/ufa/777004b1.zip  72K
UFA 0.04b1                :W  http://www.7-zip.com/dl/ufa/ufa004b1.zip  64K
ArHanGeL 1.40             :a  http://geocities.com/SiliconValley/Lab/6606/arh140.zip  50K
Imp 1.1                   :e  http://www.winimp.com/imp110d.zip  266K
Imp-win 1.12              :W  http://www.winimp.com/imp112.exe  122K
PkZip 2.50                :a  ftp://ftp.simtel.net/pub/simtelnet/msdos/arcers/pk250dos.exe  202K
ACB 2.00c                 :e  ftp://ftp.simtel.net/pub/simtelnet/msdos/compress/acb_200c.zip  42K
BOA 0.58b                 :e  ftp://ftp.cdrom.com/.2/sac/pack/boa058.zip  74K
DC 0.98b                  :W  ftp://ftp.cdrom.com/.2/sac/pack/dc124.zip  55K
Bzip2 1.0.1               :W  ftp://sourceware.cygnus.com/pub/bzip2/v100/bzip2-100-x86-win32.exe  68K
SZip 1.12a                :W  http://www.compressconsult.com/szip/szip_112a_win32.zip  71K
UHArc 0.2b                :e  ftp://ftp.cdrom.com/.2/sac/pack/uharc02.zip  101K
YBS 0.03e                 :e  http://members.nbci.com/vycct/ybs003ed.zip  55K
YBS 0.03e                 :W  http://members.nbci.com/vycct/ybs003ew.zip  43K
BEE 0.4.8                 :W  Andrew.Filinsky@p11.f4.n452.z2.fidonet.org

:a - any DOS  - DOS programs, will run under pure DOS or in a DOS box
:e - extender - DOS programs using DOS extenders like DOS/4GW or CWSDPMI
:W - windows  - Windows95/98/NT/etc programs

If a direct link doesn't work, most probably a newer version of the program has appeared at the same site: visit the web page, or read the whole directory from the ftp server (i.e. try the same URL, but without the filename).

Homepages:

Arhangel           : http://geocities.com/SiliconValley/Lab/6606
BA                 : http://hem.spray.se/mikael.lundqvist
Eri32              : http://geocities.com/eri32
mirror             : http://artest1.tripod.com
RK                 : http://rksoft.virtualave.net
Imp,WinImp         : http://www.technelysium.com.au
mirror             : http://www.winimp.com
ACE,WinACE         : http://www.winace.com
PkZip              : http://www.pkware.com
RAR,WinRAR         : http://www.rarsoft.com
BZip2              : http://sources.redhat.com/bzip2
SZip               : http://www.compressconsult.com/szip
ZZip               : http://www.zzip.f2s.com
YBS                : http://members.nbci.com/vycct
SBC                : http://geocities.com/sbcarchiver
Ufa,777,BIX,7-Zip  : http://www.7-zip.com

PPMD, PPMonstr, ACB, Bee, BOA, DC, UHArc - no homepage.

What's new:

12 new programs tested: RK, SBC, ZZip, ACE, 7-zip, RAR32, WinRAR, ERI32, BA, PPMD, PPMonstr, UHARC.

Test data was updated, and a set of Russian texts was added.

Latest beta versions of BEE, DC, UFA, UHArc are available from the authors by e-mail request:

BEE   : Andrew.Filinsky@p11.f4.n452.z2.fidonet.org
DC    : EdgarBinder@t-online.de
UFA   : support@7-zip.com
UHARC : Uwe.Herklotz@gmx.de

Results of ArHanGeL, IMP, BICOM, BIX, Pkzip appear only in the full version, in the TEXTS.DAT file.

WARNINGS:

BA 1.00beta5 can't correctly decompress shaks12.txt.

DC 0.99.158b failed to decompress 1DFRE10.dc, ANDES10.dc, and BTI0110.dc, saying "Corrupted block" (while the t(est) command writes "Test successful").

ERI32 4.8fre can't compress files larger than (free DPMI memory)/6, i.e. about 10Mb on a PC with 64Mb RAM. The largest 44Mb file was split into 5 chunks of 9000000 bytes each (the last chunk was 8894190 bytes).

No problems were found in any of the other compressors.

The LATEST RELEASE and all previous versions of these tests can be found at http://geocities.com/SiliconValley/Bay/1995/ and http://artest1.tripod.com/

Send your suggestions and comments to artest@hotmail.ru

With best kind regards, A.Ratushnyak, RAO Inc.