
PostScript Sins

PostScript is now a common means of exchanging formatted documents. So why so many problems?

Kevin Thompson

The first implementation of Adobe's PostScript page-description language raised the capabilities of laser printers to new heights and led to the growth of the desktop publishing market. Because the PostScript language was designed to be device-independent, many vendors created PostScript printers and imagesetters, and virtually all applications and GUI environments provided PostScript drivers to support these output devices.

The proliferation of PostScript drivers has created an interesting side effect. Because all applications can produce PostScript files and PostScript printers are reasonably common (although less so than PCL printers), PostScript files have become a de facto standard for the distribution of formatted documents. This usage is especially prominent on the Internet, where we recently counted more than 20,000 PostScript files available for public download, and it has driven the development of PostScript viewing and other post-processing applications.

Those of us who routinely download and print these files may be surprised to discover that most of them contain errors. These errors are typically invisible when the files are printed on laser printers, but become painfully visible when the pages in the files are viewed or printed out of sequence. This article identifies common PostScript sins and their perpetrators.

The Document Structure Conventions

The DSC (Document Structure Conventions) defined by Adobe for PostScript files are a set of conventions that are not enforced by the language but to which all drivers should adhere. The DSC divide a PostScript file into three main portions: the header (or prologue), the page area, and the trailer. The header consists of all code from the start of file (denoted by the %!PS-Adobe comment) up to but not including the first page. Each page begins with a %%Page: <label> <ordinal> comment, where <label> is a string containing the page number (e.g., ii or 2), and <ordinal> is the sequence number of the page (first page is page 1). The trailer follows the last page, beginning with the %%Trailer comment and ending with the %%EOF comment. (Note that a percent sign denotes a comment, and two percent signs denote a predefined DSC comment.)
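As a rough illustration of how a post-processor might use these comments, the following Python sketch splits a file into the three portions described above. The function name split_dsc and the sample file are hypothetical, and the sketch assumes every page label is a single token followed by the ordinal; it follows the simplified three-part layout given here, not the full DSC specification.

```python
SAMPLE = """%!PS-Adobe-3.0
%%Pages: 2
%%EndComments
/inch { 72 mul } def
%%Page: ii 1
newpath 1 inch 1 inch moveto 2 inch 2 inch lineto stroke
showpage
%%Page: 1 2
newpath 1 inch 2 inch moveto 2 inch 1 inch lineto stroke
showpage
%%Trailer
%%EOF
"""


def split_dsc(text):
    """Split PostScript source into header lines, pages, and trailer lines.

    Hypothetical helper: relies only on the %%Page: and %%Trailer comments.
    """
    header, pages, trailer = [], [], []
    current = header  # everything before the first %%Page: is header
    for line in text.splitlines():
        if line.startswith("%%Page:"):
            # "%%Page: <label> <ordinal>": the ordinal is the last token.
            label, ordinal = line[len("%%Page:"):].rsplit(None, 1)
            page = {"label": label.strip(), "ordinal": int(ordinal), "lines": []}
            pages.append(page)
            current = page["lines"]
        elif line.startswith("%%Trailer"):
            current = trailer
        current.append(line)
    return header, pages, trailer
```

Calling split_dsc(SAMPLE) returns the header lines, a list of page records (label, ordinal, and the page's own lines), and the trailer lines. A viewer that wants to show only the second page could then send the header followed by that page's lines to the interpreter, which works only if the file honors page independence.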

Encapsulated PostScript files are intended to represent a single image to be pasted into a larger document. Their internal structure is therefore simpler, and they contain no %%Page: comments, because there are no pages. The header and trailer portions remain, but the page area of a standard PostScript file is replaced by the code that draws the single EPS image.

The principal purpose of the DSC is to provide page independence, which allows the pages to be rendered in any sequence. Thus the header should contain all setup information, and the trailer should restore the interpreter state to that which existed before the file was processed. Each page should contain the information required to render that page--meaning any text, graphics, or font data required by that page--that has not been defined in the header.

The following sections describe common errors and the environments that typically make them. Most of these errors correspond to DSC violations.

PostScript Coding Errors

Page-independence violation (Windows). These files contain font or procedure definitions on one page that are used on subsequent pages. If you render the pages out of sequence, the fonts or procedures are undefined and rendering fails. This is perhaps the most egregious violation, and the one with the least excuse, because relocating the definitions to the header where they belong is a simple matter.

Page commands in trailer (OS/2). These files put page commands in the header or trailer. The OS/2 driver, for example, puts the showpage command for the last page in the file trailer, so that rendering the last page by itself yields no image at all. This is another trivial error for which there is no excuse.

Page-boundary clipping (OS/2). These files contain code that clips the image to the physical page size, less a small margin (see the screen). The pages print normally, but when magnified for on-screen viewing the enforced clipping chops off the top and right portions of the image. This clipping should simply be omitted, because it is unnecessary and troublesome. This is a case of going to a lot of effort to do the wrong thing.

Color mapping on host (OS/2). The drivers that produce these files replace colors in the original document by gray-scale values in the PostScript file, guaranteeing gray-scale images even on color-capable devices. Because PostScript interpreters contain sophisticated algorithms to map colors to the properties of the output device (including black-and-white devices), host mapping is unnecessary and degrades the usefulness of the output.

Line-length violation (Windows). These files contain lines that exceed the DSC limit of 255 bytes. You'll often see this problem in font definitions. The font should be broken into lines of conforming length.
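A check for this violation is easy to sketch. The Python function below is a hypothetical helper (long_lines is not part of any existing tool) that reports every line longer than the 255-byte limit cited above; it reads the file as bytes so that the count is not affected by character decoding.

```python
DSC_MAX_LINE = 255  # limit cited in the text, in bytes, excluding the line ending


def long_lines(data, limit=DSC_MAX_LINE):
    """Return (line_number, length) for each line longer than `limit` bytes."""
    offenders = []
    # bytes.splitlines() breaks on LF, CR, and CR LF and drops the line ending.
    for number, line in enumerate(data.splitlines(), start=1):
        if len(line) > limit:
            offenders.append((number, len(line)))
    return offenders
```

For a file on disk, the data would come from opening it in binary mode, for example open(path, "rb").read(), where path is whatever file you want to check.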

Zero-width lines (Windows, TeX). These files assume that visible results are produced by stroking or filling zero-width lines or filling a rectangle with a clipping region of zero width or height. Although the Adobe PostScript interpreter produces results in these circumstances, the documentation on PostScript painting rules indicates otherwise, and other interpreters may behave differently. (Similarly, you should avoid producing PostScript code that draws with single-pixel rectangles.)

Binary image data (Macintosh). These files contain bit-mapped images (usually photos) that are encoded in binary form, violating the DSC requirement that PostScript files contain only printable ASCII characters. Violating this requirement produces files that are damaged by E-mail or ASCII network transfers.
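Binary data of this kind can be detected with a simple byte scan. The sketch below is hypothetical (the name non_ascii_offsets and the max_hits parameter are illustrative), and it assumes that "printable ASCII" means bytes 0x20 through 0x7E plus tab, line feed, and carriage return; a stricter or looser reading would change the ALLOWED set.

```python
# Assumption: printable ASCII plus tab, LF, and CR count as acceptable bytes.
ALLOWED = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def non_ascii_offsets(data, max_hits=10):
    """Return the offsets of the first `max_hits` bytes outside ALLOWED."""
    hits = []
    for offset, byte in enumerate(data):  # iterating bytes yields integers
        if byte not in ALLOWED:
            hits.append(offset)
            if len(hits) == max_hits:
                break
    return hits
```

An empty result suggests the file should survive E-mail or an ASCII-mode transfer; any hit marks a byte that such a transfer may alter.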

Header commands in page (Corel). These files put page-resizing commands such as letter in the page text. These commands are benign for printers, but cause the image bit map to be reallocated in a viewer, thus erasing the image immediately after it has been rendered. Page size commands belong in the header, not the page area.

Hexadecimal strings (Interleaf). These files use hexadecimal-encoded strings instead of literal strings. Although this usage is not strictly in error, it is undesirable, because the hexadecimal encoding requires twice the space of a literal string and impairs search operations.
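The space and search penalties are easy to demonstrate. In the Python sketch below, the helper names as_literal, as_hex, and hex_to_text are hypothetical; the sketch assumes plain ASCII text and escapes every parenthesis and backslash in the literal form, which is always safe even though balanced parentheses do not strictly need it.

```python
def as_literal(text):
    """Write text as a PostScript literal string: (text)."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def as_hex(text):
    """Write text as a PostScript hexadecimal string: <hex digits>."""
    return "<" + text.encode("ascii").hex() + ">"


def hex_to_text(hex_string):
    """Recover the text from a hexadecimal string produced by as_hex."""
    return bytes.fromhex(hex_string.strip("<>")).decode("ascii")


title = "PostScript Sins"
literal = as_literal(title)  # 15 characters of text plus 2 delimiters
encoded = as_hex(title)      # 30 hex digits plus 2 delimiters
```

Here the literal form occupies 17 bytes and the hexadecimal form 32, and a plain text search for the word Sins finds it in the literal form but not in the hexadecimal one. A post-processor that wants to search such files has to decode each string first, as hex_to_text does.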

PostScript Comment Errors

Although a PostScript interpreter ignores comments, other post-processing programs (e.g., file viewers) are dependent on the DSC comments and cannot function without them. Thus, DSC comments are important, and all drivers should provide them. The following comment errors are unfortunately common:

Omission of all comments (DOS applications). These files lack the %%Page:, %%Trailer, %%EOF, and other DSC comments. They cannot be viewed or otherwise post-processed.

Multipage EPS files (Windows). This is a user-interface issue in Windows. The Windows PostScript driver prominently displays an option to save to an EPS file, while not even documenting the obscure mechanism by which you produce a standard PostScript file. The result is that most Windows users select the EPS option to create multipage files. The driver then dumps all page images into the file without any %%Page: comments to denote their position. All post-processing programs, which rely on page-boundary markers, fail to locate the page data in these files.

The solution to this problem is to put the PostScript and EPS file options on equal footing in the user interface. Alternatively, having the driver put the %%Page: comments in the EPS files, although unnecessary for real EPS files, would at least solve the page-boundary problem.

Improper document nesting (Windows). These files lack the %%BeginDocument: <name> and %%EndDocument comments, which are supposed to denote embedded EPS files. The result is that post-processors may incorrectly identify the embedded file as a new, stand-alone PostScript file and fail to render the surrounding page or the rest of the document.
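One way a post-processor might detect this problem is to track nesting depth while scanning the file. The following Python sketch is a hypothetical check (check_document_nesting is an illustrative name); it assumes that a second %!PS-Adobe line appearing outside any %%BeginDocument: and %%EndDocument pair marks an unwrapped embedded file.

```python
def check_document_nesting(text):
    """Return a list of (line_number, message) nesting problems."""
    problems = []
    depth = 0  # number of %%BeginDocument: comments currently open
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith("%%BeginDocument:"):
            depth += 1
        elif line.startswith("%%EndDocument"):
            if depth == 0:
                problems.append((number, "%%EndDocument without %%BeginDocument"))
            else:
                depth -= 1
        elif line.startswith("%!PS-Adobe") and number > 1 and depth == 0:
            # A new file header in the middle of the document, not wrapped.
            problems.append((number, "embedded file lacks %%BeginDocument"))
    if depth > 0:
        problems.append((None, "missing %%EndDocument"))
    return problems
```

A file that wraps its embedded EPS correctly yields an empty list. A file of the kind described above yields one entry for each unwrapped embedded header, with the line number where a naive post-processor would mistake it for the start of a new file.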

Omission of resource comments (OS/2). These files lack the %%BeginResource: <fontname> and %%EndResource comments, which are supposed to denote font definitions. As a result, post-processors cannot find the fonts when needed.

Ignorance Is No Excuse

You should bear in mind that this is not an exhaustive list of PostScript errors; these are simply the most common. The errors we've described have two characteristics: They are due to ignorance, and they are easily rectified. The correct approach is usually no more difficult to implement than the incorrect approach, and sometimes even easier. Let's hope this article will inspire PostScript driver writers to improve their products and thus make articles like this one unnecessary.


Looks Lousy, Prints Great


Page-boundary clipping on OS/2. This document would print fine, but it gets more than margins hacked when it is magnified for viewing.


Kevin Thompson is the president of Magus, a company that produces PostScript viewing software for OS/2 and Microsoft Windows. He has a Ph.D. in physics from Princeton University and never intended to learn so much about PostScript. You can reach him on the Internet at thompson@magus.com or on BIX c/o "editors."
Copyright © 1994-1995