Compression results on the Canterbury Corpus
*All files were preprocessed with the Burrows-Wheeler transform (BWT) before compression.
| File | Size (bytes) | B00 (bytes) | B00 rate (bits/byte) | gzip -9 (bytes) | gzip -9 rate (bits/byte) |
|:---|---:|---:|---:|---:|---:|
| xargs.1 | 4,227 | 1,636 | 3.096 | 1,756 | 3.323 |
| sum | 38,240 | 12,100 | 2.531 | 12,772 | 2.672 |
| ptt5 | 513,216 | 30,493 | 0.475 | 52,382 | 0.817 |
| plrabn12.txt | 481,861 | 138,040 | 2.292 | 194,277 | 3.225 |
| lcet10.txt | 426,754 | 101,351 | 1.900 | 144,429 | 2.707 |
| kennedy.xls | 1,029,744 | 24,259 | 0.188 | 209,733 | 1.629 |
| grammar.lsp | 3,721 | 1,184 | 2.546 | 1,246 | 2.679 |
| fields.c | 11,150 | 2,869 | 2.058 | 3,136 | 2.250 |
| asyoulik.txt | 125,179 | 37,767 | 2.414 | 48,829 | 3.121 |
| alice29.txt | 152,089 | 40,988 | 2.156 | 54,191 | 2.850 |
| cp.html | 24,603 | 7,259 | 2.360 | 7,981 | 2.595 |
| E.coli | 4,638,690 | 1,111,234 | 1.916 | 1,299,066 | 2.240 |
| bible.txt | 4,047,392 | 748,607 | 1.480 | 1,176,645 | 2.326 |
| world192.txt | 2,473,400 | 413,511 | 1.337 | 721,413 | 2.333 |
| Totals | 13,970,266 | 2,671,296 | 1.911 (1.530) | 3,927,856 | 2.483 (2.249) |

Rates are given in bits per byte, i.e. 8 × compressed size / original size. In the Totals row, the first rate is the unweighted mean of the 14 per-file rates; the value in parentheses is the aggregate rate, 8 × total compressed size / total original size.
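As a cross-check on the arithmetic, the following sketch recomputes the rate columns from the byte counts in the table, assuming rate = 8 × compressed size / original size (this formula reproduces the listed per-file values). It also computes both summary figures: the unweighted mean of the per-file rates and the aggregate rate over all bytes. The names `RESULTS`, `bits_per_byte` and `summarize` are hypothetical, introduced only for this illustration.

```python
# Sizes in bytes copied from the table: (original, B00, gzip -9).
RESULTS = {
    "xargs.1": (4_227, 1_636, 1_756),
    "sum": (38_240, 12_100, 12_772),
    "ptt5": (513_216, 30_493, 52_382),
    "plrabn12.txt": (481_861, 138_040, 194_277),
    "lcet10.txt": (426_754, 101_351, 144_429),
    "kennedy.xls": (1_029_744, 24_259, 209_733),
    "grammar.lsp": (3_721, 1_184, 1_246),
    "fields.c": (11_150, 2_869, 3_136),
    "asyoulik.txt": (125_179, 37_767, 48_829),
    "alice29.txt": (152_089, 40_988, 54_191),
    "cp.html": (24_603, 7_259, 7_981),
    "E.coli": (4_638_690, 1_111_234, 1_299_066),
    "bible.txt": (4_047_392, 748_607, 1_176_645),
    "world192.txt": (2_473_400, 413_511, 721_413),
}


def bits_per_byte(original: int, compressed: int) -> float:
    """Compression rate: output bits per input byte."""
    return 8 * compressed / original


def summarize(column: int) -> tuple[float, float]:
    """Return (unweighted mean of per-file rates, aggregate rate over all bytes).

    column=1 selects the B00 sizes, column=2 the gzip -9 sizes.
    """
    rates = [bits_per_byte(row[0], row[column]) for row in RESULTS.values()]
    mean_rate = sum(rates) / len(rates)
    total_in = sum(row[0] for row in RESULTS.values())
    total_out = sum(row[column] for row in RESULTS.values())
    return mean_rate, bits_per_byte(total_in, total_out)


if __name__ == "__main__":
    # One line per file: B00 rate, then gzip -9 rate.
    for name, (size, b00, gz) in RESULTS.items():
        print(f"{name:14} {bits_per_byte(size, b00):.3f} {bits_per_byte(size, gz):.3f}")
    # Summary rates for each compressor.
    for label, col in (("B00", 1), ("gzip -9", 2)):
        mean_rate, aggregate = summarize(col)
        print(f"{label}: mean {mean_rate:.3f}, aggregate {aggregate:.3f} bits/byte")
```

The two summary figures answer different questions: the unweighted mean gives every file equal weight, while the aggregate rate weights files by size and is therefore dominated by the three large files (E.coli, bible.txt, world192.txt), which make up about 80% of the input bytes.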