If you work for a company that employs more than a hundred people, it's probably set up a lot like BYTE: one headquarters and a bunch of satellite offices. Linking far-flung intracorporate LANs is a problem that many companies solve poorly, if at all. Linking those intracorporate sites to extracorporate ones is a problem that most companies don't even try to solve.
World Wide Web technology can tackle both. I'll show how this month with a pair of applications, one private and one public.
The private application, demos, converts reports filed by BYTE staff members into a password-protected Web archive that only fellow BYTE staff members can view and search. The public application, vpr (which stands for virtual press room), empowers vendors to submit product and technology announcements directly to BYTE's Web server; it builds a Web archive that BYTE's staff or any other Web user can navigate and search.
Web vs. Notes
Nearly every day of the week, vendors demonstrate their products and technologies to BYTE editors in Peterborough; San Mateo, California; and New York. We file reports on these demos in our private conference on BIX, where we also hold an ongoing debate about which emerging technologies matter most and how best to cover them.
BIX is a fine discussion forum, but a lousy text database. Once a report scrolls by, it's hard to find it again. The deluxe tool for distributing, managing, and discussing this kind of semistructured data is clearly Lotus Notes. Pervasively deployed throughout a company, Notes enables even ordinary users to build demos-like applications -- a stunning simplification of client/server software development. The problem is getting Notes deployed as pervasively as the Web technology that's becoming the kudzu of the corporate LANscape.
You wouldn't toss out your Notes server and put in a Web server to run the likes of demos and vpr. But if you don't already have a Notes infrastructure in place, these applications might make you stop and think. They're easy to build, they work within and across corporate networks, and they run on nearly every kind of client.
Destination Host Unreachable
The Internet's pervasive, but is it a reliable transport for mission-critical applications? Not by a long shot. Start with a network that's anarchically managed, dismantle and privatize its backbone (NSFnet), and dump in a few million new users (from America Online, CompuServe, MSN, and Prodigy). You'll end up with a mess.
Folks in the Internet business assume -- not unreasonably -- that it'll get cleaned up sooner or later. But right now, slowdowns and partial outages are common. I access the Internet primarily from four points: two local providers and two national providers. More often than I like to admit, a ping command from any one of these access points returns the infuriating reply "Destination host unreachable."
Experiments with traceroute reveal no single culprit; congestion seems to afflict a variety of intermediate networks in a variety of ways. Those on the East Coast of the U.S. know that afternoons, when both Route 128 and Silicon Valley are in full swing, can be especially sluggish. At an Internet seminar I recently attended in New York, not a single afternoon presenter could fetch a complete home page.
Still, a global network that works most of the time is far more powerful than a local one that always works. You wouldn't build a real-time trading system on today's Internet, because such applications require guaranteed low latency and high availability. But applications like demos and vpr can deliver useful benefits even on an imperfect substrate.
The Power of Plain ASCII
Some months ago, I posted a form (see the figure "A Simple Tagged ASCII form") to our staff conference on BIX and asked editors to use it to structure their demo reports. This simple but surprisingly powerful technique is all you need to make an E-mail or conferencing system work as the data-input front end to a database.
Because the demos database that these forms go into is really just a collection of Hypertext Markup Language documents, the form we use could make everyone tag entries in HTML, but that would be overkill. A simpler format, like the one shown in the figure, is easier for users and works just fine.
It matters not at all which tags you use; it matters only that you use them and do so consistently. With Perl or any other text-processing language that supports regular expressions, it's trivial to convert any well-formed ASCII tags into HTML. (Why plain ASCII? All word processors can produce it, Internet mailers can reliably exchange it, and text-processing tools can easily parse it.)
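As a rough illustration of that conversion, here is a minimal sketch in Python (chosen only for illustration; any language with regular expressions would do). The tag names Vendor, Product, Date, and Report are hypothetical stand-ins, since the actual form in the figure may use different ones, and the name="date" meta tag is likewise an assumption.

```python
import html
import re

# Hypothetical tagged-ASCII report; the real form's tag names may differ.
SAMPLE = """\
Vendor: Example Corp
Product: Widget Server 2.0
Date: 1995-10-03
Report: Fast and easy to configure.
It ran well on our test LAN.
"""

# Only these (assumed) tags start a new field; any other line continues the last one.
TAG_LINE = re.compile(r"^(Vendor|Product|Date|Report):\s*(.*)$")


def parse_tagged(text):
    """Return a dict mapping each tag to its value."""
    fields = {}
    current = None
    for line in text.splitlines():
        match = TAG_LINE.match(line)
        if match:
            current = match.group(1)
            fields[current] = match.group(2)
        elif current is not None and line.strip():
            # Untagged, non-blank line: append it to the field in progress.
            fields[current] += " " + line.strip()
    return fields


def to_html(fields):
    """Render the parsed fields as a small HTML page, escaping all values."""
    title = html.escape(f"{fields.get('Vendor', '')}: {fields.get('Product', '')}")
    date = html.escape(fields.get("Date", ""))
    body = html.escape(fields.get("Report", ""))
    return (
        "<html><head>\n"
        f"<title>{title}</title>\n"
        f'<meta name="date" content="{date}">\n'
        "</head><body>\n"
        f"<h1>{title}</h1>\n"
        f"<p>{body}</p>\n"
        "</body></html>\n"
    )


if __name__ == "__main__":
    print(to_html(parse_tagged(SAMPLE)))
```

Restricting the pattern to a known set of tags keeps a stray line such as one beginning with "http:" from being mistaken for a new field.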
For demos, I wrote a little Epsilon Extension Language (EEL) program to do the HTML conversion and built a simple table of contents that lists the reports in reverse chronological order. When a bunch of demo reports had accumulated in the BIX conference, I downloaded them, ran the converter, indexed the resulting HTML files with freeWAIS (see "Web Search," September BYTE), moved the archive to the Web server, and secured it with a password.
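The table-of-contents step might look something like the following Python sketch. It is not the EEL program described above. It assumes, hypothetically, that each converted report carries its date in a name="date" meta tag in YYYY-MM-DD form, and it takes the pages as an in-memory dictionary rather than reading files from disk.

```python
import re

# Assumed convention: each page has <meta name="date" content="YYYY-MM-DD">.
META_DATE = re.compile(r'<meta\s+name="date"\s+content="([^"]*)"', re.IGNORECASE)
TITLE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def build_toc(pages):
    """pages maps a file name to its HTML text; returns an HTML list, newest first."""
    entries = []
    for name, text in pages.items():
        date = META_DATE.search(text)
        title = TITLE.search(text)
        entries.append((
            date.group(1) if date else "",
            title.group(1).strip() if title else name,
            name,
        ))
    # ISO-style dates sort chronologically as plain strings.
    entries.sort(key=lambda entry: entry[0], reverse=True)
    items = [
        f'<li>{date} <a href="{name}">{title}</a></li>'
        for date, title, name in entries
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>\n"


if __name__ == "__main__":
    demo_pages = {
        "a.html": '<title>Alpha</title><meta name="date" content="1995-09-01">',
        "b.html": '<title>Beta</title><meta name="date" content="1995-10-15">',
    }
    print(build_toc(demo_pages))
```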
Securing Private Data
The demos application is an example of a private application that uses the public Internet as its transport. (Another flavor of private application runs only within private IP networks hidden from the public Internet; see the sidebar "RFC 1597 Revisited.") Because demos runs on the Internet, it's accessible from anywhere -- offices, homes, hotel rooms, and phone booths around the world. But since the Internet's one big party line, how can you publish documents intended for only limited distribution on a Web server that the whole world can see?
The first line of defense is to disallow browsing of the directory tree that the Web server controls. If nobody can see the DEMOS directory on the server, and no readily accessible pages contain links to it, then nobody's likely to find and explore it. Many Web archives and applications begin life this way: as prototypes that are public but are advertised only to a small audience.
If you need more than the most casual form of privacy, though, you'll want to guard against unauthorized use of your private archive's hidden uniform resource locators (URLs). Most servers, following the style of the National Center for Supercomputing Applications (NCSA) and European Laboratory for Particle Physics (CERN) servers, support two forms of authorization: by user name/password and by IP address.
User name/password methods rely on a directory of Web users that's distinct from the underlying network OS's (NOS's) user directory. This duplication is something that I normally abhor because I've spent too many hours synchronizing E-mail, fax-server, and other user directories with NetWare binderies. I want LAN-based services to synchronize with NOS user databases, and I'm often disappointed when they don't.
But Web applications are a different breed. They're inherently global. And unless your corporate LAN is in the minority that's running NetWare 4.x or Vines, you probably don't have a global directory to tap into, so it makes sense to create one for this purpose.
Don't be scared off by the grand unified theory of directory services, however. In practice, securing a Web archive is quite simple. If your private application doesn't need to track individual users, you can even have everyone share a single, common account, as we do with demos.
With the other approach -- restriction by IP address -- you can, in principle, form a virtual private network on the Internet. If our Peterborough, San Mateo, and New York subnetworks were 199.125.99, 199.125.100, and 199.125.101, respectively, our Web server could be configured to allow only the nodes on those subnetworks into the DEMOS directory.
In practice, however, this doesn't work so well. Allowing only these subnetworks defeats the worldwide access that makes this kind of application so compelling. But if you also allow access for IP addresses from the likes of BIX, CompuServe, and MSN so that traveling staff can reach the Internet through these services, you can't distinguish between staff and nonstaff users: Both will present the same IP addresses.
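To make the address test concrete, here is a small Python sketch of the check a server would perform. It is an illustration, not any particular server's implementation, and the /24 masks are an assumption: the text gives only the first three octets of each subnetwork.

```python
import ipaddress

# The three example subnetworks from the text, written as /24 networks (assumed mask).
ALLOWED = [
    ipaddress.ip_network(net)
    for net in ("199.125.99.0/24", "199.125.100.0/24", "199.125.101.0/24")
]


def is_allowed(client_ip):
    """Return True if the client's address falls inside an allowed subnetwork."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED)


if __name__ == "__main__":
    for ip in ("199.125.100.17", "199.125.102.1"):
        print(ip, is_allowed(ip))
```

The check sees nothing but the address, which is why a staff member dialing in through a commercial service looks exactly like every other customer of that service.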
User/Password Authentication: Unix and NT
To secure a DEMOS directory on the BYTE Network Project's BSDI 2.0 machine, which was running the NCSA Web server, I first created a password file with this command:
htpasswd -c /pwdir/.passwd DEMOS_USER
This creates the password file .passwd in /pwdir, adds the user name DEMOS_USER to it, and prompts for (and then confirms) that user's password. Then, in the DEMOS directory that I wanted to protect, I put a file called .htaccess, which contains these lines:
AuthUserFile /pwdir/.passwd
AuthGroupFile /dev/null
AuthName ByPassword
AuthType Basic
<Limit GET>
require user DEMOS_USER
</Limit>
This setup limits HTTP GET requests (the kind that fetch documents) to user DEMOS_USER, specifies password authentication, and then refers the server to /pwdir/.passwd for DEMOS_USER's password.
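For readers curious about what "AuthType Basic" means on the wire, the Python sketch below shows how a client packs a user name and password into the HTTP Authorization header and how a server unpacks them. The function names are hypothetical, and this is not the NCSA server's code; it only illustrates the encoding that the Basic scheme uses.

```python
import base64


def basic_auth_header(user, password):
    """Build the value of an Authorization header for HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def parse_basic_auth(header):
    """Recover (user, password) from a Basic Authorization header value."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # The user name ends at the first colon; the password may itself contain colons.
    user, _, password = decoded.partition(":")
    return user, password


if __name__ == "__main__":
    header = basic_auth_header("DEMOS_USER", "example-password")
    print(header)
    print(parse_basic_auth(header))
```

Note that base64 is an encoding, not encryption: anyone who can see the request can decode the password, so this scheme keeps out casual visitors rather than determined eavesdroppers.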
On the pair of NT Web servers I've been using -- O'Reilly and Associates' WebSite and Process Software's Purveyor -- there's none of this Unix-style, text-file-oriented administration. Both offer GUI tools that you use to create users and groups, specify passwords, and authorize access to resources. WebSite provides tabbed dialog boxes that you use to define users, groups, and realms, which contain both users and groups. Purveyor installs a File Manager extension that you use to define and interactively test access controls.
These graphical tools are great when you're sitting at the server's console. But they're no help at all when you need to modify access restrictions on a distant server. NT's lack of support for both telnet- and X Window System-style remote access is its most serious drawback as a platform for Web service. The Unix way is more primitive, but for remote administration it's far more effective.
Searching the Archive
For this project, I used the Simple Web Indexing System for Humans (SWISH) 1.1 (ftp://ftp.eit.com), which compiled easily and ran smoothly under BSDI 2.0. SWISH is just an indexing and search tool. To use it in a Web application, you need to wrap it in a front-end form that sends in search keywords and a back-end filter that formats the SWISH output as HTML -- more trivial chores for Perl scripts.
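The back-end filter could be sketched along these lines in Python (the scripts described above were Perl). The input format is an assumption: each hit is taken to be a line of the form rank, path, quoted title, and size, with every other line treated as header or trailer and ignored. Check the output of your own SWISH version before relying on this pattern.

```python
import html
import re

# Assumed hit-line format: <rank> <path> "<title>" <size>
HIT = re.compile(r'^(\d+)\s+(\S+)\s+"(.*)"\s+(\d+)$')


def hits_to_html(search_output):
    """Turn raw search output into an HTML ordered list of links."""
    items = []
    for line in search_output.splitlines():
        match = HIT.match(line.strip())
        if not match:
            continue  # skip headers, comments, and anything unrecognized
        rank, path, title, _size = match.groups()
        items.append(
            f'<li><a href="{html.escape(path)}">{html.escape(title)}</a>'
            f" (score {rank})</li>"
        )
    if not items:
        return "<p>No matches found.</p>\n"
    return "<ol>\n" + "\n".join(items) + "\n</ol>\n"


if __name__ == "__main__":
    sample = (
        "# header line\n"
        '1000 /vpr/acme.html "Acme Telephony Kit" 2048\n'
        '500 /art/9510.html "Telephony APIs" 9000\n'
        ".\n"
    )
    print(hits_to_html(sample))
```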
Once the vpr archive was searchable, I realized it could merge with the BYTE on-line archive. That way, a user of the archive who enters the search keyword telephony sees a list of hits that includes BYTE articles mentioning telephony, as well as vendor-supplied press releases and white papers on telephony (see the screen). Therefore, users get more information, vendors get more exposure, and BYTE gets better control over vital -- but hitherto intractable -- sources of information.
I'm excited about vpr because I'm swamped with paper press releases. Vendors: I hate to say it, but if you send me paper, I might as well burn it for all the good it does either you or me. So drop on by http://www.byte.com and try vpr. It will ensure that BYTE staff members worldwide can find your information when they need it.
How long will it take to move the stuff from a private editors-only archive to the public BYTE archive? The gating factor is administrative, not technical: We just need someone to verify what we republish to the world. Nothing else prevents this internal corporate application from becoming a global one. That's why the Internet, if it can get past the era of brownouts, just might live up to its hype.
TOOLWATCH
Weblint (http://www.unipress.com/web-lint/)
HTML Validation Service (http://www.halsoft.com/html-val-svc/)
BYTE reader Adrian John Howard recommended these on-line HTML checkers to me, and I recommend them to you. Point either one of them at one of your own URLs to find out which HTML sins Netscape has been allowing you to commit.
BOOKNOTE
Programming Perl, $29.95
by Larry Wall and Randal Schwartz
O'Reilly and Associates, 1995
ISBN 0-937175-64-1
O'Reilly and Associates says that an update to the Perl bible is forthcoming. Until then, this is it: the complete guide to the programming glue that holds the Web together.
You can use any old tags for the front end of a database of HTML documents; just use them consistently. Perl makes conversion to HTML a snap.
The HTML <meta> tag is a convenient place to tuck arbitrary name/value pairs. You can use the values stored here to create ordered views of an HTML document archive.
Use this form to submit a press release or product announcement to BYTE. Copy text into each of the fields from your GUI word processor. If you supply HTML, we'll use it. If not, we'll automatically convert URLs into links.
The hit list includes BYTE articles and vendor-supplied info.