..


Sponsored Links

Compress files. Algorithms and software for comparison.

Article written by Damiano Verda
Page 1 of 2

Among the most common operations that can be useful even for unskilled users identify definitely compression. Through this transaction in fact, any file on your computer can be restored, so as to occupy a portion of memory than hard disk. We observe however that, currently, there are large capacity hard drives available at prices fairly low, so the usefulness of compression programs would appear to shrink. But this is a hasty conclusion.

In fact, for example, may be interesting to compress one or more files before sending via email, to reduce the time of reception and transmission. Interesting and practical addition also the ability to store multiple documents into a single compressed file, more practical to transmit and manipulate.

But what are the main options available to users to compress a file or files? What distinguishes the various software available? Let us examine first the theoretical foundation of software compression, thus illustrating the main features of compression algorithms.

Compression algorithms

A first distinction among the compression algorithms can be identified between lossless algorithms, ie without any loss of quality and lossy algorithms, in which the reduction of disk space is accompanied by a loss of quality. It is often difficult to perceive a deterioration in quality: for example in the case of the mp3 encoding for audio files.

Among the most widely used algorithms identify without doubt the 'Huffman algorithm, the' Shannon-Fano algorithm and 'algorithm Lempel, Ziv and Welch. While not going into theoretical explanations, we examine the main characteristics of 'Huffman algorithm, which has marked the history of compression techniques.

We refer readers interested in further details on the techniques of Shannon-Fano and Lempel-Ziv-Welch links to more specific about this topic:

The Huffman algorithm

'S algorithm belongs to the category Huffman lossless, ie does not introduce any loss of quality. We scomporne operation in five elementary steps:

  • Is analyzed and counted the number of occurrences of the basic elements of the file to be compressed: the individual characters in a text file, the pixels in an image file.
  • The two elements are brought together in a less frequent category that represents them both. So for example if X and Y occurs 8 times 7 times, it creates the category XY, with 15 recurrences. Meanwhile, the components X and Y each receive a different marker that identifies them as elements entered in an 'association.
  • The next two items are identified less frequent in the file and sits as a new category, using the same procedure described in step 2. The group XY can in turn enter into and form new associations, for example, the XYZ category. When this happens, the X and Y are given a new identifier that ends with the Association extend the code that uniquely identifies each of the two letters in the compressed file will be generated.
  • Is then created for the next steps, a tree consists of a series of binary branches, within which appear more frequently and in subsequent combinations rarer elements in the file, while the elements are rarely more frequent. According to the mechanism described, this means that the rare elements in the uncompressed files are associated with an identification code length, which is growing every element of a new association. The elements are repeated more often instead of the original file are not present in the 'tree of associations, so that their identification code will be as short as possible.
  • Compressed file is generated, replacing each element of the original file, the code produced at the end of the chain of associations based on the frequency of that element in the source document.
The gain of space at the end of the compression is due to the fact that the elements that are repeated often are identified by a short code, which takes up less space than they occupy their normal encoding. Conversely rare elements in the original file in the compressed file receive a long code, which may require, for each of them, an area considerably larger than that occupied in the uncompressed file.

From the algebraic sum of the space gained by encoding short of the most frequent and space lost with the encoding of the most rare long you get the compression ratio produced by 'Huffman algorithm. From the above it follows that this type of compression is more effective the wider the frequency differences of the components of the original file, while poor results are obtained when the distribution of the elements is uniform.

In the same category ...
E-Learning
HTML (Course) HTML (Course)
The markup language for the Web from 29 €.
Webmaster Advanced (Course) Webmaster Advanced (Course)
Become a professional Webmaster. From 39 €.
Webmaster Base (First) Webmaster Base (First)
Create a Web site from scratch. Starting from 29 €.
Sponsored Links