The PCX File Format: Very Old and Somewhat Gray
Th.M. Hupkens
The PC Paintbrush file format, or PCX-format, is one of the oldest and most popular graphical file formats. Almost every drawing program can read and write files in this format, and many word processors, including WordPerfect and MS-Word, can import illustrations in PCX-format without problems. The PCX-format supports all popular color resolutions, although older drawing programs may not be able to process 24-bit (true color) PCX-files. Still, the format is poorly suited to modern applications, so the importance of PCX is gradually decreasing. Nevertheless, there are still many compact discs filled with clip art in PCX-format, and some older application programs use PCX-files exclusively.
Symptoms of the Past
Originally the PCX format was used by the drawing program PC Paintbrush from ZSoft. Later on, a Microsoft version of this program was distributed with Windows. Windows 95 includes Paint, which can be seen as the successor of Paintbrush. However, PCX is no longer the standard format of Paint: older versions of Paint can read, but no longer write, PCX-files, and recent versions of Paint cannot even read them.
When the PCX-format was defined, computers were very slow, so it was important that a stored image could be shown on screen quickly. The format was therefore adapted to the graphics adapters that were common at the time. The complete history of graphical applications on computers is mirrored in the PCX-format; for instance, the way data was stored by increasingly sophisticated graphics adapters, up to the VGA, can be found in the format. Originally the whole screen image had to be stored; other sizes were not allowed. It was possible to store only part of the screen image, but then the PCC-format was used. This format is almost identical to the PCX-format. In later versions this limitation was removed and the PCC-format became redundant. If you have any old *.PCC files, just rename them to *.PCX. A consequence of the original limitation was that programs that, for example, converted from GIF to PCX, and scanners that scanned images in a nonstandard size, were forced to deviate from the PCX standard of the time. That is why so many undocumented "dialects" of the PCX-format have been developed. I think no other file format has become such a mess.
Outline of the format
A PCX-file contains a header in which all relevant information about the picture is specified. All dimensions are in pixels, measured from the upper left corner of the device. Note that very many PCX-files contain incorrect information, so a program that reads PCX-files should not rely too heavily on the data in the header.
PCX files always start with an identifier byte containing the decimal value 10. A program that reads PCX-files must check this, because many programs use the extension PCX for other purposes. This byte is followed by the version number of the format, from which, among other things, the corresponding version of Paintbrush can be determined (see Table 1).
Version | Meaning |
0 | PC Paintbrush Version 2.5 |
2 | Version 2.8 with palette information |
3 | Version 2.8 without palette information |
4 | PC Paintbrush for Windows |
5 | PC Paintbrush Version 3 and up, and PC Paintbrush Plus |
Table 1 Version number information.
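Because the identifier byte is the only reliable way to tell a real PCX file from a file that merely uses the extension, a reader would check it first. The following minimal sketch in Python performs that check and looks the version up in Table 1. The names `identify_pcx` and `PCX_VERSIONS` are hypothetical, and reporting an unknown version instead of rejecting the file is an assumption, in the forgiving spirit recommended above.

```python
# Version byte -> Paintbrush version, as listed in Table 1.
PCX_VERSIONS = {
    0: "PC Paintbrush 2.5",
    2: "PC Paintbrush 2.8 with palette information",
    3: "PC Paintbrush 2.8 without palette information",
    4: "PC Paintbrush for Windows",
    5: "PC Paintbrush 3 and up, and PC Paintbrush Plus",
}

PCX_IDENTIFIER = 10  # first byte of every genuine PCX file


def identify_pcx(data: bytes) -> str:
    """Return a description of the Paintbrush version, or raise ValueError.

    Only the first two bytes are inspected. An unknown version number is
    reported rather than rejected, because many writers deviate from the spec.
    """
    if len(data) < 2 or data[0] != PCX_IDENTIFIER:
        raise ValueError("not a PCX file: identifier byte is not 10")
    version = data[1]
    return PCX_VERSIONS.get(version, f"unknown version {version}")
```

For example, a file starting with the bytes 10 and 5 is reported as PC Paintbrush 3 and up.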
The field Compression is always 1 (true); there are no PCX-files without compression.
The field BitsPerPixel specifies how many bits per pixel are used in one color plane. The coordinates (xMin, yMin) and (xMax, yMax) define a window in which the image must fit. In older versions of Paintbrush, the image always had to start at the upper left corner of the screen. In modern applications it should be left to the decoder to determine where the image is placed, so xMin and yMin are usually zero. However, I have found some programs that store xMin = 1 and yMin = 1 by default (meaning the upper left corner). HorizontalResolution and VerticalResolution should give the resolution (in dots per inch) of the device on which the image was originally generated. Often, however, the total number of pixels is given instead (in older versions this corresponds to the resolution, as an absolute number of pixels, of the screen that was used when making the PCX-file), and sometimes these fields are set to zero. Because the stated resolution is so unreliable, it is best not to use it. Many programs that process PCX-files appear to assume that pixels are always square, even when the header information shows that this is certainly not the case; as a result, the image looks somewhat too tall or too wide. Except on high-resolution output devices (such as printers), it is often difficult to correct for the aspect ratio if the output device cannot change its pixel size. Today images with an aspect ratio other than 1 are rare, so it is questionable whether a program should compensate for it at all.
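Both corners of the window belong to the image, so the size follows from the header as in this sketch (the function name `image_size` is hypothetical). It only uses the differences between the coordinates, so a nonzero xMin or yMin does not by itself change the computed size, and, in line with the advice above, it ignores the resolution fields.

```python
from typing import Tuple


def image_size(x_min: int, y_min: int, x_max: int, y_max: int) -> Tuple[int, int]:
    """Return (width, height) in pixels of the window (xMin, yMin)-(xMax, yMax).

    The window coordinates are inclusive, hence the + 1. The resolution
    fields are deliberately not used, because they are too unreliable.
    """
    width = x_max - x_min + 1
    height = y_max - y_min + 1
    if width <= 0 or height <= 0:
        raise ValueError("invalid window in PCX header")
    return width, height
```

A full VGA screen stored as (0, 0)-(639, 479) gives a size of 640 by 480 pixels.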
The array Colors specifies the red, green and blue intensities when at most 16 colors are used. With 256 colors, the most important colors are usually specified here as well. As always, 255 is the maximum intensity of a primary color (red, green or blue). So black is specified as 0/0/0, pure red as 255/0/0 and white as 255/255/255.
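With 16 colors of three bytes each, the Colors array is 48 bytes long. A sketch of turning it into a list of (red, green, blue) values; the function name `header_palette` is hypothetical.

```python
def header_palette(colors: bytes) -> list:
    """Split the 48-byte Colors array into 16 (red, green, blue) triples.

    Each intensity is a byte from 0 (off) to 255 (maximum).
    """
    if len(colors) != 48:
        raise ValueError("the Colors array must be 48 bytes long")
    return [tuple(colors[i:i + 3]) for i in range(0, 48, 3)]
```

With black, pure red and white as the first three entries, the result starts with (0, 0, 0), (255, 0, 0) and (255, 255, 255).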
Originally the number of planes was the number of planes used by the graphics adapter, but at present any value from 1 to 4 is possible. The number of colors of the image can be calculated as NumberOfColors = 2^(Planes × BitsPerPixel). Note that the same number of colors can be obtained in various ways: for instance, 4 colors can be stored in 2 planes using 1 bit per pixel, or in 1 plane using 2 bits per pixel.
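The formula can be checked for the usual combinations of Planes and BitsPerPixel. In this sketch (the function name is hypothetical), 3 planes of 8 bits each is the usual layout of a 24-bit true color PCX-file.

```python
def number_of_colors(planes: int, bits_per_pixel: int) -> int:
    """NumberOfColors = 2 ** (Planes * BitsPerPixel)."""
    return 2 ** (planes * bits_per_pixel)


# (Planes, BitsPerPixel) pairs; 2 x 1 and 1 x 2 both give 4 colors.
COMBINATIONS = [(1, 1), (2, 1), (1, 2), (4, 1), (1, 8), (3, 8)]

for planes, bpp in COMBINATIONS:
    print(f"{planes} plane(s) x {bpp} bit(s) per pixel: "
          f"{number_of_colors(planes, bpp)} colors")
```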
BytesPerLine is the number of bytes per horizontal line, per color plane, in the original image, rounded up to an even number. If you think that BytesPerLine is redundant information, then you're right. However, I have found PCX-files that contain more bytes per line than strictly necessary. This is probably due to a bug in the program that generated those files (or is there a secret message hidden in the superfluous bytes?), but we should nevertheless be able to process such files. To avoid problems, we must use the specified number of bytes when reading the file, but the specified width when displaying the image. Many well-known programs do not do this correctly, among others WordPerfect and the otherwise excellent Paint Shop Pro 6.0 (see Figure 1).
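The rule "read BytesPerLine bytes, display width pixels" fits in two small helpers. This is a sketch under assumptions: the names are hypothetical, `plane_line` is one already decoded plane of one scanline, and pixels are taken to be packed from the most significant bit downward, as is usual in PCX. The first helper gives the smallest BytesPerLine a correct file would contain; a reader should still accept files whose BytesPerLine is larger.

```python
def min_bytes_per_line(width: int, bits_per_pixel: int) -> int:
    """Smallest valid BytesPerLine for one plane: whole bytes, rounded up to even."""
    needed = (width * bits_per_pixel + 7) // 8
    return needed + needed % 2


def visible_part(plane_line: bytes, width: int, bits_per_pixel: int) -> bytes:
    """Keep only the bytes of one plane line that hold the first `width` pixels.

    `plane_line` contains BytesPerLine bytes, exactly as many as were read
    from the file, superfluous bytes included. With fewer than 8 bits per
    pixel, the last byte kept may still contain a few unused trailing bits.
    """
    used = (width * bits_per_pixel + 7) // 8
    return plane_line[:used]
```

For an 8-bit image 101 pixels wide, `min_bytes_per_line(101, 8)` is 102, and `visible_part` trims a hypothetical 104-byte line from a faulty file back to 101 bytes.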
Figure 1 Left: This happens when BytesPerLine is not used properly (Image processed by Paint Shop Pro); Right: correct image.
The field GrayInformation can indicate that the image contains gray values only, but many versions of Paintbrush ignore this field, so you can ignore it as well. The 58 bytes of Filler mainly serve to pad the header to 128 bytes (this was of some importance in the early days of computers). All bytes of Filler should be zero. In some exceptional cases, however, important information seems to be stored in it (for instance, if the drawing was made on a VGA adapter and stored as a CGA-resolution image in 4 colors that are not the standard CGA colors). I have not been able to find precise information on how this array is used in those cases, though.
The compressed image data is stored directly after the header. If the file contains a 256-color image, the end of the file, after the compressed data, holds 768 bytes that specify the red, green and blue intensities of the 256 colors. This 'tail' is preceded by a byte containing the (decimal) value 12.
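Locating the 256-color palette might look like the sketch below (names hypothetical). It only trusts the last 768 bytes if the byte just before them is 12. Since a 12 can also occur by chance in compressed data, a reader would ideally look for the tail only when the header indicates 256 colors (1 plane, 8 bits per pixel); that check is left to the caller here.

```python
PALETTE_MARKER = 12   # value of the byte that precedes the palette tail
HEADER_SIZE = 128


def trailing_palette(data: bytes):
    """Return the 256 (red, green, blue) triples at the end of the file, or None.

    `data` is the whole file. The last 768 bytes are only used if the byte
    right before them has the value 12 and the file can hold a header too.
    """
    if len(data) < HEADER_SIZE + 1 + 768 or data[-769] != PALETTE_MARKER:
        return None
    tail = data[-768:]
    return [tuple(tail[i:i + 3]) for i in range(0, 768, 3)]
```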
Run Length Encoding
The PCX-format always uses the same compression method: run length encoding. The compression algorithm itself is very simple, but it works on the image data as organized in color planes. Depending on the maximum number of colors the image can contain, the header specifies the number of color planes. If the image comes from a drawing program, this number corresponds to the number of planes the graphics adapter actually used when the drawing was made. For instance, EGA and VGA use 4 planes in 16-color mode; every plane contains one bit per pixel, and together the four bits form a number from 0 to 15. The actual color is stored in a palette table. If the image was scanned or converted from another file type, the 16 colors may also be stored in one plane (four consecutive bits form one pixel). A decoder should be able to process any possible combination.
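The two storage schemes described above can be unpacked into color indices as in the following sketch. The function names are hypothetical, and two conventions are assumptions, although they match common practice: within a byte the leftmost pixel occupies the most significant bits, and in the multi-plane case the first plane stored for a scanline supplies the least significant bit of the color index.

```python
def planar_to_indices(planes: list, width: int) -> list:
    """Combine 1-bit-per-pixel plane lines (e.g. 4 for EGA/VGA) into color indices.

    planes[0] supplies bit 0 of each index, planes[1] bit 1, and so on.
    """
    indices = []
    for x in range(width):
        byte_pos, bit = divmod(x, 8)
        value = 0
        for p, plane in enumerate(planes):
            if plane[byte_pos] & (0x80 >> bit):
                value |= 1 << p
        indices.append(value)
    return indices


def packed_to_indices(line: bytes, width: int, bits_per_pixel: int) -> list:
    """Unpack pixels stored in a row within a single plane (1, 2, 4 or 8 bits)."""
    per_byte = 8 // bits_per_pixel
    mask = (1 << bits_per_pixel) - 1
    indices = []
    for x in range(width):
        byte_pos, slot = divmod(x, per_byte)
        shift = 8 - bits_per_pixel * (slot + 1)
        indices.append((line[byte_pos] >> shift) & mask)
    return indices
```

Either function yields the same kind of result, a list of palette indices, so the rest of a viewer need not care which layout the file used.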
When encoding, the screen (or image) is scanned from left to right. For each horizontal line all specified planes are scanned one after the other, and then the next line down is scanned. If a series of identical bytes is found within one scanline, only one byte is stored, preceded by a byte that contains the repeat count. Consequently, identical bytes in the next scanline need a new repeat count.
Note: if identical bytes are found in the same scanline but in successive planes, they may be combined in one repeat count. This makes writing a decoder much harder, but PCX-files that use this 'option' do exist, and a decoder should be able to process them. Again, the famous file conversion program Paint Shop Pro does not handle this correctly.
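One way to cope with runs that continue into the next plane is not to decode plane by plane at all, but to decode the whole image as a single stream of height × Planes × BytesPerLine bytes and to cut it into plane lines afterwards. The sketch below does exactly that; the function names are hypothetical, and stopping quietly on a truncated file is an assumption in the forgiving style of this article. Because decoding stops after the expected number of bytes, a 256-color palette tail is left untouched.

```python
def rle_decode(data: bytes, start: int, total: int) -> bytes:
    """Decode `total` bytes of PCX run-length data beginning at data[start].

    The image is decoded as one continuous stream, so a run that continues
    into the next plane (or even the next line) needs no special treatment.
    """
    out = bytearray()
    pos = start
    while len(out) < total and pos < len(data):
        byte = data[pos]
        pos += 1
        if byte >= 192:                # both high bits set: a repeat count
            if pos >= len(data):
                break                  # truncated file: keep what we have
            out.extend(bytes([data[pos]]) * (byte - 192))
            pos += 1
        else:                          # an ordinary byte, used once
            out.append(byte)
    return bytes(out[:total])          # a final run may overshoot; trim it


def split_planes(pixels: bytes, height: int, planes: int, bytes_per_line: int) -> list:
    """Cut the decoded stream into rows, each a list of `planes` plane lines."""
    rows = []
    for y in range(height):
        row_start = y * planes * bytes_per_line
        rows.append([
            pixels[row_start + p * bytes_per_line:row_start + (p + 1) * bytes_per_line]
            for p in range(planes)
        ])
    return rows
```

In a complete reader, `start` would be 128 (directly after the header) and `total` would be height × Planes × BytesPerLine, taken from the header.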
To indicate that a byte is not a data value but a repeat count, its first two bits (the "high" bits) are set to 1. The remaining six bits form the repeat count, so a run can be at most 63 bytes long. Since the two high bits together have the value 192, a decoder can simply test for a repeat count by checking whether the value is greater than or equal to 192. If so, the repeat count is found by subtracting 192 from the value. If the value is lower than 192, it is used once as a data byte. Of course, the image can contain pixels with a value >= 192. If such a value is not repeated (it corresponds to a single pixel), an extra byte containing a repeat count of 1 is needed. This makes the algorithm slightly less efficient.
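The encoding side can be sketched as follows, one plane line at a time, so that no run crosses a scanline. This is only one possible encoder (the function name is hypothetical): it never produces runs across planes, which the format permits but does not require. Runs are capped at 63, the largest count that fits in six bits.

```python
def rle_encode_line(line: bytes) -> bytes:
    """Run-length encode one plane line the PCX way.

    A run of identical bytes becomes (192 + count, value), with count <= 63.
    A single byte >= 192 also gets a count byte (193) so that it cannot be
    mistaken for a repeat count; other single bytes are stored as they are.
    """
    out = bytearray()
    i = 0
    while i < len(line):
        value = line[i]
        run = 1
        while i + run < len(line) and line[i + run] == value and run < 63:
            run += 1
        if run > 1 or value >= 192:
            out.append(192 + run)
            out.append(value)
        else:
            out.append(value)
        i += run
    return bytes(out)
```

For example, the bytes 5, 5, 5, 7, 200 encode to 195, 5, 7, 193, 200: the single 200 costs two bytes, which is exactly the inefficiency mentioned above.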
Three bytes is a crowd
The algorithm described above was developed long ago to compress drawings, which, particularly with the simple drawing programs of that time, contain many repeats. For such drawings the algorithm is rather efficient. Today, however, the PCX-format is also used for scanned photographs and for images made by sophisticated ray tracing programs, and for this kind of image the algorithm is completely useless. In many cases we end up with more bytes instead of fewer; this is certainly the case for 24-bit images. To compare the efficiency of some popular formats, I compressed a number of 'typical' images (see Figure 2). Obviously this gives only a rough indication; the precise figures strongly depend on the contents of the images. GIF is always better than PCX; only for simple drawings are the results of the PCX algorithm acceptable. To obtain a difficult test image, I used Paint Shop Pro to add randomly colored pixels. The fact that GIF still gives some compression shows that the colors of the pixels were not completely random. As a third image I took a scanned photograph in which large plain areas were visible. I used rather large images so that the headers and other overhead would not influence the results too much. In all cases the compression factor of the BMP-file is 1, because this format does not use compression.
The same comparison cannot be made for 24-bit images, because GIF and RLE do not support this number of colors. To check the efficiency of the PCX compression algorithm for realistic 24-bit images, I scanned many color photographs and compared the size of each BMP-file with the size of the corresponding PCX-file. In all cases the PCX-file was 5 to 15% larger.
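A back-of-the-envelope estimate shows why noisy data grows instead of shrinking. Assume, as an idealization, that the byte values are uniformly distributed and that repeats are rare enough to ignore. A byte below 192 is then stored as it is, while a byte of 192 or more needs an extra count byte:

```latex
E[\text{stored bytes per image byte}]
  = \frac{192}{256} \cdot 1 + \frac{64}{256} \cdot 2
  = \frac{320}{256} = 1.25
```

Under these assumptions the file would grow by about 25%. Real scans are not uniformly random, which presumably explains why the growth measured for photographs was smaller.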
You cannot simply solve this problem by turning compression off: although there is a field in the header for this purpose, uncompressed PCX-files are not allowed. I think it is clear why the PCX-format is threatened with extinction.
Figure 2 Average compression factor for various formats and color resolutions
Figure 3 Filesize versus "quality" (as stated by Paint Shop Pro and Graphic Workshop)
If photographs are scanned in 24 bits, the two or three least significant bits of every byte are usually perfectly random. The same is true for images recorded directly from a video camera. For this reason even a good compression algorithm cannot compress 24-bit images very well, and for the sake of speed it is better not to use compression at all.
Tips concerning 24-bit images:
If some loss of quality is acceptable (so that the decompressed image is not exactly the same as the original), a much higher compression rate is possible. The JPG-format was developed for this purpose. Figure 3 shows the enormous data reduction that can be achieved. I used both Paint Shop Pro and Graphic Workshop to convert from BMP to JPG; both programs gave comparable quality at equal compression rates. Generally, at a stated "quality" of 85% (Paint Shop Pro calls this a compression of 15%) hardly any loss of quality is visible, although the file is reduced in size by about a factor of 5 (!). At 10% "quality" the resulting image looks horrible.
Viewers for everything near PCX
You can obtain the source code of the two viewers I have written, one in Delphi and one in Turbo Pascal, by sending me an e-mail. Both viewers try to be as forgiving as possible when a PCX-file is not correct: all dubious PCX-files I know of give correct results. If you find a file that my viewer cannot display but some other PCX-viewer displays correctly, please let me know so I can improve my viewer.
The Delphi PCX-viewer
The Delphi version always uses a 32-bit bitmap to store the decoded image. This makes the program simpler and the source code easier to adapt for other purposes. The four colors of a CGA image are usually translated into gray values. This is how most professional programs do it, but it is possible that the creator of the picture intended the default CGA colors. In any case, it is very unlikely that you want to see CGA images in those awful 'CGA colors' on your Super VGA screen; if you do, use the DOS version of the viewer described below, or adapt the Delphi version to your needs.
The Turbo Pascal PCX-viewer
The Turbo Pascal version is a DOS application, meant for simple systems: it works with any standard VGA. It can also run under MS Windows in a DOS box; Windows automatically switches back and forth to the correct video mode when needed. The viewer shows true color images as gray images, because implementing an algorithm to reduce the number of colors to 256 would be too difficult. CGA pictures are shown in the correct colors, in the sense that the display is put into CGA mode to give the default CGA colors. The viewer uses some special video modes [3]. If the program becomes extremely slow under MS Windows, try holding down the Alt key; everything may suddenly become very fast. Of course, if you have Windows, you are better off using the Delphi program.
Literature
[1] Steve Rimmer: Bit-Mapped Graphics, 1st Ed, Windcrest Books 1990, and Supercharged Bitmapped Graphics, 1st Ed, Windcrest/McGraw-Hill 1992.
[2] Andreas Mitschein: dem PCX-Format enträtselt (The PCX-format unraveled), Tool 8/91 p 78.
[3] Th.M. Hupkens: Oude VGA's op hun paasbest (Old VGAs at their very best), DOS/WIN special 9 number 1 (1994) p 63.