http://www.byte.com/art/9510/sec9/art1.htm (PC Press Internet CD, 03/1996)

Bomb's Away

Here's how BYTE remade a postcard-based reader survey into a Common Gateway Interface-based Web application

Jon Udell

Smuggled onto your PC, in the guise of a multimedia viewer, your World Wide Web browser is actually an all-purpose client that runs applications on a vast global network. Likewise, your Web server, justified to your company's controller as a marketing tool, can act as an applications server that connects you to customers, suppliers, and business partners.

One of the most popular uses of this new client/server model is the on-line survey. If you read BYTE in the 1970s and/or 1980s, you may recall BYTE's Ongoing Monitor Box, more commonly known as the BOMB. It was a page that reported how readers ranked the previous month's articles, together with a postcard that solicited feedback on the current issue.

The BOMB fizzled years ago, and we've missed it. The advent of the BYTE Web site set the stage for a 1990s remake. To try it, surf on into http://www.byte.com and go to any article in our archive. Along with the standard links across the top of each page, you'll find one labeled "Comment." Click on it to invoke a Hypertext Markup Language (HTML) form with multiple-choice questions and space for comments.

Our archive isn't just an electronic publication now; it's also an application. What follows is a look at how we put it together and how you can build one like it.

Step 1: Make the Form Context-Sensitive

The first task was to make the Web BOMB context-sensitive. If you're reading a story on-line and decide to comment on it, the form that pops up when you click on the Comment button will display a title such as "February 1995 / News & Views / Optical Drives." These same three items -- the story's issue, section, and title -- will also appear in the database record that's created when you submit the form.

Since the archive has 1500 documents, must there also be 1500 forms? Actually, there are none. As is common practice, the Web BOMB relies on a Perl script to print the forms on demand. What activates that Perl script? The Comment link on each of the 1500 pages. So, while I didn't have to write 1500 forms, I did need to write 1500 links that look like this:

  <a href="[unarchived-link]"  cmt1.pl+9502+News_&_Views+\
  Optical_Drives>Comment</a>

This code says: "When the Comment link is clicked on, launch the Perl interpreter, feed it the HTML-form-writing Perl script cmt1.pl, and in turn feed cmt1.pl three arguments -- the issue, section, and title of the current article."

These links are far more concise than the forms that cmt1.pl generates, but it still wouldn't have been practical to write 1500 of them by hand. But then, I don't write any of the archive's HTML source by hand. An Epsilon Extension Language (EEL) program (see part A of the figure "Anatomy of the Web BOMB: Part 1") does that for me, and one additional printf statement was all it took to put a context-sensitive comment link on every page in the archive. Part B of the figure shows the EEL program's output -- the HTML source for a page. Part C illustrates how a browser renders that HTML source.
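For readers who'd rather see the idea in Perl than in EEL, here is a minimal sketch of how a generator might build each Comment link. It is not the EEL program from the figure. The function names and the /cgi-bin/cmt1.pl path are hypothetical, and the escaping of "&" for the href is my addition.

```perl
use strict;
use warnings;

# Join issue, section, and title into the "+"-separated argument string,
# turning spaces into underscores so each multiword value stays one argument.
sub comment_args {
    my @parts = @_;
    tr/ /_/ for @parts;    # edits the local copies, not the caller's data
    return join '+', @parts;
}

# Wrap the argument string in an anchor. $script_url is a hypothetical
# placeholder for wherever the server exposes cmt1.pl.
sub comment_link {
    my ($script_url, @parts) = @_;
    (my $query = comment_args(@parts)) =~ s/&/&amp;/g;    # "&" must be escaped inside HTML
    return sprintf '<a href="%s?%s">Comment</a>', $script_url, $query;
}

print comment_link('/cgi-bin/cmt1.pl', '9502', 'News & Views', 'Optical Drives'), "\n";
```

Because the generator emits the link, changing its format later means editing one routine rather than 1500 pages.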

Step 2: Print the Form

When you click on the Comment button, the Web server invokes cmt1.pl and relays its output -- an HTML form -- back to your browser. The cmt1.pl script begins as shown in part D of the figure. The first line prints the standard Common Gateway Interface (CGI) header. The second line saves the uniform resource locator (URL) of the article that prompted the comment, so the form-processing script (cmt2.pl) can offer the user a link back to the article.

HTTP_REFERER is the charmingly misspelled CGI environment variable that contains the URL of the referring document -- in this case, the article on which the Comment link appears. Note how the name HTTP_REFERER keys into the associative array (i.e., the list of name-value pairs) that Perl uses to represent environment variables. Each of the next three lines takes an argument from the Perl command line, saves it in a variable, and turns underscores (which are inserted at the EEL stage to make multiword arguments behave atomically) back into spaces.
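The opening moves of cmt1.pl might look something like the sketch below. It is written for a current Perl 5 rather than the Perl 4 I used, and the routine names are hypothetical. The environment and argument list are passed in as parameters so the code can run without a Web server; under CGI they would be %ENV and @ARGV.

```perl
use strict;
use warnings;

# Turn the underscores added at link-generation time back into spaces.
sub restore_spaces {
    my ($arg) = @_;
    $arg =~ tr/_/ /;
    return $arg;
}

# Collect the referring URL and the three command-line arguments.
# $env and $argv stand in for %ENV and @ARGV.
sub read_context {
    my ($env, $argv) = @_;
    my ($issue, $section, $title) =
        map { restore_spaces($_ // '') } @{$argv}[0 .. 2];
    return {
        referer => $env->{HTTP_REFERER} // '',    # browsers may omit it
        issue   => $issue,
        section => $section,
        title   => $title,
    };
}

my $ctx = read_context(
    { HTTP_REFERER => 'http://www.byte.com/art/9502/sec1/art1.htm' },
    [ '9502', 'News_&_Views', 'Optical_Drives' ],
);
print "$ctx->{issue} / $ctx->{section} / $ctx->{title}\n";
```

One caveat of the underscore trick: a title that contains a real underscore would come back with a space in its place.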

The cmt1.pl script then prints the HTML document that your browser renders as the Comment form. It incorporates the issue, section, and title variables into the document's header. Then it begins writing the form as the body of the document. The opening phrase of the HTML form element says: "On completion of this form, launch the Perl interpreter, feed it the form-processing script cmt2.pl, and in turn feed cmt2.pl four arguments. Transmit the data collected on the form to cmt2.pl by way of a file (method=post) rather than on the command line (method=get)."

The rest of cmt1.pl completes the form. HTML can describe interactive GUI controls, such as check boxes, radio buttons, drop-down listboxes, and single- or multiple-line text boxes. The Web BOMB just uses a series of radio buttons for the multiple-choice questions, and a multiline text box for the free comments. The figure shows some of the HTML source that creates the multiple-choice questions (part D) and the resulting form as displayed in a browser (part E). The cmt1.pl script ends this way:

  print "<input type=submit value=Send>
         <input type=Reset>
         </form></body></html>";

The input tags specify two buttons. The first submits the form to cmt2.pl for processing, and the second resets the form. The final three tags close the form element, the document's body, and the document.
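Pulled together, a form-printing script in the spirit of cmt1.pl could be sketched as follows. This is not the script from the figure. The action URL, the question text, and the field names "interest" and "comments" are all invented for illustration, and the sketch uses radio buttons as part E of the figure describes. In the real thing, the four arguments for cmt2.pl would ride along in the action URL.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text and attributes.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build the whole reply: CGI header, blank line, then the HTML form.
# $action is a hypothetical URL for the form-processing script.
sub comment_form {
    my ($action, $issue, $section, $title) = @_;
    my $heading = html_escape("$issue / $section / $title");
    my $target  = html_escape($action);

    # One radio button per rating, all sharing a single field name.
    my $radios = '';
    $radios .= qq{<input type="radio" name="interest" value="$_"> $_\n} for 1 .. 5;

    return "Content-type: text/html\n\n"
         . "<html><head><title>$heading</title></head><body>\n"
         . "<h1>$heading</h1>\n"
         . qq{<form method="post" action="$target">\n}
         . "<p>How interesting was this article?</p>\n"
         . $radios
         . qq{<p><textarea name="comments" rows="5" cols="60"></textarea></p>\n}
         . qq{<input type="submit" value="Send"><input type="reset">\n}
         . "</form></body></html>\n";
}

print comment_form('/cgi-bin/cmt2.pl', 'February 1995', 'News & Views', 'Optical Drives');
```

Returning the page as a string, rather than printing piece by piece, makes it easy to inspect the output before a server ever sees it.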

Step 3: Process the Form

When you fill out the form and click on the Send button, the Web server launches the back-end CGI program named in the form's action attribute: cmt2.pl. Its first two lines (see part F of the figure) say: "Include CGI support routines; call one of them, ReadParse, to locate the file in which the server placed the completed form's data; and transfer the name-value pairs stored there into a Perl associative array called 'input.'" Using the names of form variables as keys into the array, cmt2.pl then takes the values of the form variables and puts them into Perl variables, as shown in part F of the figure.
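To show what a routine like ReadParse has to accomplish, here is a hand-rolled sketch that turns URL-encoded form data into a hash. It is a stand-in written for illustration, not the code in cgi-lib.pl, and it assumes the raw data has already been read from wherever the server left it. The names url_decode and parse_form, and the sample field names, are mine.

```perl
use strict;
use warnings;

# Decode one URL-encoded component: "+" means space, %XX is a hex byte.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Split "name=value&name=value" into a hash of decoded pairs.
# Values for a repeated name are joined with "\0" in this sketch.
sub parse_form {
    my ($data) = @_;
    my %input;
    for my $pair (split /&/, $data) {
        my ($name, $value) = map { url_decode($_) } split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value //= '';
        $input{$name} = exists $input{$name} ? "$input{$name}\0$value" : $value;
    }
    return %input;
}

my %input = parse_form('interest=4&comments=Nice+piece%21');
print "$_ = $input{$_}\n" for sort keys %input;
```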

With the form data in hand, cmt2.pl can now rule on whether to accept the form. The Web BOMB presents a form in three parts -- questions about the article, questions about the reader, and free comments -- but requires only that you complete the first section. If you don't, at least one of that section's form variables will be null. The cmt2.pl script notifies you if that's the case.

How? A conventional GUI program pops up a dialog box, but CGI programs can communicate with users only by way of HTML, so cmt2.pl must format its "need more info" message as an HTML document that it relays back to you by way of the server.

How does the Web BOMB redisplay the form so you can complete the required fields? That's easy. The "need more info" document (part G of the figure) tells you to use your browser's "go back" function, which reloads the form. When you complete the form to cmt2.pl's satisfaction, it logs the data (part H) and returns a "thanks for your input" document (part I).
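The accept-or-reject decision can be sketched in a few lines. The required field names below are hypothetical, as is the wording of the page; the point is only that the "dialog box" is an ordinary HTML document.

```perl
use strict;
use warnings;

# Return the names of required fields that are missing or blank.
sub missing_fields {
    my ($input, @required) = @_;
    return grep { !defined $input->{$_} || $input->{$_} !~ /\S/ } @required;
}

# Build the "need more info" reply as a complete CGI response.
sub need_more_info {
    my @missing = @_;
    my $list = join ', ', @missing;
    return "Content-type: text/html\n\n"
         . "<html><head><title>More information needed</title></head><body>\n"
         . "<p>Please answer these questions: $list.</p>\n"
         . "<p>Use your browser's Back function to return to the form.</p>\n"
         . "</body></html>\n";
}

# Hypothetical names for the required first section of the form.
my @required = qw(interest depth clarity);
my %input    = (interest => '4', depth => '', clarity => '3');

my @missing = missing_fields(\%input, @required);
print need_more_info(@missing) if @missing;
```

Relying on the browser's Back function means the script never has to rebuild a partly completed form.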

From there, you could unwind the document stack using the "go back" function, but it's more convenient to jump straight to the article that prompted the comment, so cmt2.pl supplies a link to it. This is where the article's URL, which cmt1.pl got from the HTTP_REFERER environment variable and passed to cmt2.pl, finally comes into play.
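A "thanks" page with its return link might be produced along these lines. This is a sketch under my own assumptions, not the code behind part I of the figure: the routine names and page wording are invented, and the check for an empty URL is there because a browser isn't obliged to send HTTP_REFERER.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text and attributes.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# $referer is the article URL that the form-printing script captured.
sub thanks_page {
    my ($referer) = @_;
    my $page = "Content-type: text/html\n\n"
             . "<html><head><title>Thanks</title></head><body>\n"
             . "<p>Thanks for your input.</p>\n";
    # Offer the link only when a URL actually came through.
    if (defined $referer && length $referer) {
        $page .= sprintf qq{<p><a href="%s">Return to the article</a></p>\n},
            html_escape($referer);
    }
    return $page . "</body></html>\n";
}

print thanks_page('http://www.byte.com/art/9502/sec1/art1.htm');
```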

Step 4: Log the Data

When I first envisioned the Web BOMB, I thought it should pump the survey results into a relational database. This client/server scenario, in which clients connect through HTML forms to relational servers, represents the most compelling (and, to vendors of SQL front-end toolkits, the most frightening) aspect of Web/CGI technology. Since a survey isn't a transactional application, though, implementing the Web BOMB this way seemed like overkill. Users would pay an unnecessary performance penalty, and I'd end up with an application that was more complex and less portable than it had to be.

Instead I opted for a simpler, more modular solution. Rather than inserting a record into a database, cmt2.pl logs each comment by writing a line of text to a file of its own, as shown in part H of the figure. As my Perl mentor Ben Smith likes to say, "If every record is a file, record locking isn't a problem."
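Here is one way the file-per-record idea could be sketched in a current Perl 5. It is not the code in part H of the figure. The file-naming scheme (time, process ID, counter) and the quoting rules are assumptions of mine, and the demonstration writes into a temporary directory rather than a real log area.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Temp qw(tempdir);

# Quote one value for a comma-delimited line, doubling embedded quotes
# and flattening line breaks so each record stays on a single line.
sub csv_field {
    my ($value) = @_;
    $value =~ s/\r?\n/ /g;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Write one record to its own new file and return the file's path.
# O_EXCL makes the create fail rather than overwrite an existing file,
# so two simultaneous comments can never share a file.
sub log_comment {
    my ($dir, @fields) = @_;
    for my $n (0 .. 99) {
        my $path = sprintf '%s/%d-%d-%02d.txt', $dir, time, $$, $n;
        sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL) or next;
        print {$fh} join(',', map { csv_field($_) } @fields), "\n";
        close $fh or die "close $path: $!";
        return $path;
    }
    die "could not create a comment file in $dir";
}

my $dir  = tempdir(CLEANUP => 1);
my $path = log_comment($dir, '9502', 'News & Views', 'Optical Drives', 4, 'Said "wow"');
print "logged to $path\n";
```

Since no two writers ever touch the same file, the script needs no locking at all.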

How do you analyze the data? That's easy. Combining all the comment files yields a comma-delimited ASCII import file that FoxPro, dBase, or almost any other database program can read. Then you can answer questions such as "How did the interest in reviews vary by month?" or "What were the top 10 articles?" with straightforward SQL queries.
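The combining step is simple enough to sketch. The code below assumes, hypothetically, that the comment files end in .txt and sit in one directory; the routine name and the import.csv file name are mine. The demonstration fabricates two tiny sample records in a temporary directory just to have something to combine.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Concatenate every *.txt comment file in $dir into one import file.
# Returns the number of records copied.
sub combine_comments {
    my ($dir, $out_path) = @_;
    opendir(my $dh, $dir) or die "opendir $dir: $!";
    my @files = sort grep { /\.txt\z/ } readdir $dh;
    closedir $dh;

    open my $out, '>', $out_path or die "open $out_path: $!";
    my $count = 0;
    for my $file (@files) {
        open my $in, '<', "$dir/$file" or die "open $dir/$file: $!";
        while (my $line = <$in>) {
            print {$out} $line;
            $count++;
        }
        close $in;
    }
    close $out or die "close $out_path: $!";
    return $count;
}

# Demonstration with made-up records.
my $dir = tempdir(CLEANUP => 1);
for my $i (1, 2) {
    open my $fh, '>', "$dir/comment$i.txt" or die "open: $!";
    print {$fh} qq{"9502","News & Views","Optical Drives","$i"\n};
    close $fh or die "close: $!";
}
my $records = combine_comments($dir, "$dir/import.csv");
print "combined $records records\n";
```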

Next-Generation Client/Server?

Like the thousands of CGI programs on the Internet, the Web BOMB exhibits four desirable properties: client/server, because your browser interacts with our server; cross-platform, in that Windows, OS/2, Mac, and Unix clients can connect to any of these flavors of servers; WAN, since clients connect to servers over the Internet; and rapid application development (RAD), because it's a snap to build.

Web technology isn't just a way to publish electronic documents. It's also a way to build networked applications that work within and across corporate boundaries. But it isn't yet a client/server developer's dream. I realized this when, in parallel with the Perl version of the Web BOMB, I prototyped a Visual Basic version using the nifty Windows CGI that's unique to O'Reilly and Associates' WebSite.

In principle, this sounds great: VB, the preeminent RAD tool, has to be the ultimate enabler for Web development, right? Well, maybe not. It's true, as O'Reilly's documentation says, that VB's a great way to get at data that's available through Windows APIs, such as OLE and Open Database Connectivity (ODBC).

But that's not necessarily a VB exclusive. A Win32 version of Perl, for example, could also wield these strategic APIs. On a level playing field, I'll choose Perl: For the text-processing chores typical of CGI applications, Perl's far more capable than VB.

What makes VB great are its plug-in user-interface (UI) components, such as the data-bound controls that have revolutionized client/server database development. However, Web-aware VB custom controls (VBXes) and OLE custom controls (OCXes) don't exist yet, and even if they did, today's browsers couldn't host them.

Browsers and components will evolve in this direction. Meanwhile, Sun Microsystems' Java/HotJava technology has the potential to trigger a revolution like the one that VB touched off. Java's a language that describes Web applets; HotJava's a browser that has the ability to download and run them. One way or another, strategic networked applications will increasingly run on the Web.

BOOKNOTE

Build a Web Site, $34.95
by the staff of net.Genesis and Devra Hall
Prima Online Books, 1995
ISBN 0-7615-0064-2

A fat cookbook full of useful recipes. Shows how to acquire and set up the CERN (the European Laboratory for Particle Physics) and National Center for Supercomputing Applications (NCSA) Web servers. Discusses HTML 2.0 and 3.0, teaches basic CGI programming, and offers an advanced tutorial on rolling your own Web clients and servers.

TOOLWATCH

Submit It
http://www.cen.uiuc.edu/banister/submit-it/

Once you've built a Web site, you'll want to publicize it. Where? On Yahoo, Lycos, WebCrawler, and a dozen other well-known sites. You can visit each of these in turn or you can have Scott Banister's Submit It act as your publicity agent.

Anatomy of the Web BOMB: Part 1

Illustration (119 Kbytes)

Common Gateway Interface (CGI) programming involves peculiar flows of control and data. Follow the solid black arrows to trace the flow of control from programs to documents to programs to documents as the Web BOMB works. Follow the red, numbered circles to trace the flow of the three items of data -- the article's issue (1), section (2), and title (3) -- that make the Web BOMB context-sensitive.

A. Don't even think about publishing a large collection of documents on the Web unless you can automatically generate those documents. I could have used Perl for this job, but Epsilon's EEL seems even better. I like how EEL supports both declarative and navigational text processing. You can do the same kinds of global regular-expression searching and replacing that Perl can do. But you can also programmatically wield all the navigational and interactive powers of a text editor: searching forward and backward, inserting text, and jumping to locations.

B. HTML, like PostScript, is a language that should mostly be written by programs rather than by humans. Unlike PostScript, though, HTML is easy to write. That makes Web development very convenient. When you want to use a program-generated form like this one, first write it out by hand and test it in your browser. When it works the way you want, you've got a specification for the document that your program must write.

C. You launch a CGI program by clicking on a link. Here the link is textual -- it's the word Comment. To create this kind of link, you write HTML source like this:

  <a href="[unarchived-link]">

Alternatively, you can make the link graphical. To do that, you write HTML source like this:

  <a href="[unarchived-link]" src="[unarchived-media]">

Note: With most Web servers, the CGI link actually looks like this:

  <a href="[unarchived-link]">

Invocation of the Perl interpreter is implicit. However, NT servers derived from EMWAC's code require explicit invocation of Perl. Isn't that dangerous, I wondered? What's to keep a user from entering a URL like

  http://www.byte.com/cgi-bin/perl.exe?-e+unlink+*.*

using Perl's "enter a line of script on the command line" feature to trash a bunch of files -- or worse? Process Software agreed this is a problem, and it will have a fix in Purveyor 1.1.

D. Web servers communicate a number of environment variables to back-end CGI programs. Here, Perl gets the referring article's URL from the HTTP_REFERER variable. It will be a string like this:

  http://www.byte.com/art/9502/sec1/art1.htm

Separately, Perl saves the issue, section, and title information passed to it on the command line.

Anatomy of the Web BOMB: Part 2

Illustration (97 Kbytes)

E. Widgets that the HTML forms language supports include radio buttons, check boxes, single-line text boxes, multiline text boxes, drop-down listboxes, and command buttons. Of these, the Web BOMB uses only three: radio buttons for the multiple-choice questions, a multiline text box (not visible here) for the free comments, and command buttons (not visible here) to send or reset the form. Be sure, as always, to check the look of your form in a variety of browsers. This layout, with vertical bars separating the choices, looks fine in Netscape, but not so hot in Mosaic.

F. I found the NT version of Perl 4 that I'm using here on Process Software's ftp site, ftp.process.com. The CGI library, cgi-lib.pl, is there, too. Intergraph originally ported Perl to NT, so you can also find NT Perl stuff at ftp.intergraph.com.

G. In the Web/CGI environment, it's easy to return the user to a form that hasn't been completely or correctly filled in. Just tell the user to go back -- something all browsers can do. No programming required here!

H. I could have fed the results into a live database back end, but that wouldn't have been as easy, as modular, or as portable. This solution creates a file for each comment record. Off-line, I can scoop up the comments, turn them into a database import file, and run SQL queries on them.

I. Here's where the original article's URL, obtained from the HTTP_REFERER variable, finally gets put to use. Clicking on it returns the user directly to the original article.

Jon Udell (judell@bix.com) is BYTE's executive editor for new media.

Copyright © 1994-1995