Updated: March 13, 1996
Getting Wired Into the Internet: A Crash Course on FTP, Gopher, Web, and More
"Acceptable Use Policies" for the Internet
The Internet is many things to many people. For some, it's a social gathering place; for others, it is a critical resource for collaboration and communication, an electronic mail backbone, or a rich source of information. The most important aspect of the Internet is that it allows more than 20 million people (a number that's growing rapidly) to exchange ideas and information in new and innovative ways.
As more people connect and the technology advances, the Internet becomes more and more exciting. Five years ago, the coolest applications were electronic mail and file transfer. Today's Internet is a whole new world with advanced information systems and incredible volumes of content. The next wave of applications may well be focused around real-time collaboration such as audio and video conferencing. If you are not already wired into the Internet, now is the time to get connected. Here we'll explore the structure of the Internet and then examine some of the more popular Internet services, including File Transfer Protocol (FTP), Gopher, and the World Wide Web. We'll also show you a sampling of the applications and services currently available on the Internet for users of the Microsoft® Windows® operating system, and how those services are implemented.
An internetwork is simply a collection of networks. The Internet refers to a specific collection of networks around the world linked together using the Transmission Control Protocol/Internet Protocol (TCP/IP) protocol suite. The Internet has become so ubiquitous that sometimes it is simply referred to as the Net.
The Internet began in the late 1960s as a research project sponsored by the U.S. government through DARPA, the Defense Advanced Research Projects Agency. Today the Internet has grown to more than 2 million hosts (computers connected to the Internet) in over 130 countries. In addition to the DARPANET, the U.S. portion of the Internet includes Milnet, NASA Science Internet (NSI), and the largest of these networks, NSFNet.
As recently as the late 1980s, the Internet was considered by many as something for computer geeks to entertain themselves with. Although the Net did provide a foundation for collaborative research among universities for many years, only recently has the Internet become a part of business and commerce, especially in high-tech industries. The Internet is quickly finding its way into the nation's secondary schools with programs such as the NSF-sponsored Global Schoolhouse, which is connecting high school students around the world with Internet technologies.
One of the most useful aspects of the Internet is that it's a great means for the dissemination of information about itself and related software. Since the Internet is so dynamic, much of what is written about it (including parts of this article) is out of date by the time it's printed. For this reason, we recommend that you refer to the referenced sources on the Internet for the latest specifications and data. We'll give you locations and resources where you can find more recent information with Uniform Resource Locators (URLs). URLs are discussed in more detail in the World Wide Web section.
Interconnecting the millions of systems on the Internet today would not be possible without a set of standard protocols. Each Internet standard is described in a document called a Request For Comment (RFC). RFCs date back to 1969 and are the working notes of the Internet research and development community. Generally, an RFC is a description of a protocol, procedure, or service; a status report; or a summary of research. There are approximately 1,600 published RFCs to date.
RFCs are reviewed by the community of Internet users before they become standards. Most of the standard protocols on the Internet got started as RFCs. They are considered public domain documents and, with a few exceptions, are available online from several repositories. RFCs are numbered sequentially. Once given a number they're never revised; new versions of the documents are issued instead. More information on RFCs and the Internet protocol standards process can be found in RFC 1310. Later, I'll show you how to find RFCs on the Internet.
Transmission Control Protocol/Internet Protocol (TCP/IP), one of the protocol suites standardized through the process described in RFC 1310, combines a number of different protocols across several network layers. Traditionally, networks are defined in layers that allow for the separation of functionality. It is easiest to view these layers as the sequence of steps a data packet follows as it makes its way from one computer in the network to another. For this article TCP/IP is best represented as a four-layer system (see Figure 1).
Figure 1: TCP/IP Protocol Stack
Between each layer there is generally an Application Program Interface (API) or convention for interpreting messages, or packets, as they pass between layers. Under Windows, you write to the Windows Sockets API, which is shown in Figure 1 even though it's not part of the TCP/IP protocol suite. Using the Windows Sockets API frees you from having to worry about all the layers beneath you, including how TCP packets are passed through IP and then on to the network card in the link layer. Windows Sockets allows applications written to it to run over different vendors' TCP/IP implementations. For more on Windows Sockets, see "Plug into Serious Network Programming with the Windows Sockets API," Microsoft Systems Journal (July 1993).
The link layer is most often associated with the device driver or the network card interface. The primary role of the link layer is to determine how to get data packets onto the physical network. Sometimes this includes interfacing with a traditional network card, such as Ethernet, and other times this includes interfacing with serial lines, such as a modem. Two important link layer protocols are Serial Line Internet Protocol (SLIP) and Point-to-Point Protocol (PPP). Both of these allow two machines to connect to the Internet using TCP/IP protocols, but over a standard dial-up phone line and modem instead of a permanent network medium like Ethernet.
The network layer has the primary responsibility for moving packets around the network. In the TCP/IP suite, the protocol that defines this behavior is the Internet Protocol, or IP. Since this is the layer that's concerned with routing information packets around the Internet, it's where unique Internet addresses become important. Before we look at the transport layer, let's look at just how network addresses are created on the Internet.
Most of you have probably seen an Internet electronic mail address before. For example, BARNEY@MICROSOFT.COM is an e-mail address for the user "Barney" on the Internet system MICROSOFT.COM. Although friendly system names like MICROSOFT.COM (called domain names in the Internet) are great for people, they do not work so well for network traffic where routers need to move packets efficiently to the appropriate network. For this reason, each host on the Internet must have a unique 32-bit address, commonly referred to as an IP address.
Each host must have a globally unique IP address, though not necessarily a domain name. To make the look-up and routing of packets more efficient, this address is a structured 32-bit field divided into four numeric fields. Each field must be a number less than 256 and is separated by periods. For example, the Microsoft FTP server is FTP.MICROSOFT.COM, which has a unique address of 198.105.232.1.
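To make the structure of an address concrete, here is a minimal sketch in Python that packs the four numeric fields of a dotted-decimal address into a single 32-bit value and unpacks it again. The function names are our own, chosen for illustration.

```python
def dotted_to_int(address: str) -> int:
    """Pack a dotted-decimal address such as '198.105.232.1' into 32 bits."""
    fields = address.split(".")
    if len(fields) != 4:
        raise ValueError("an IP address has exactly four fields")
    value = 0
    for field in fields:
        octet = int(field)
        if not 0 <= octet <= 255:
            raise ValueError("each field must be a number less than 256")
        value = (value << 8) | octet  # shift earlier fields up, append this one
    return value


def int_to_dotted(value: int) -> str:
    """Unpack a 32-bit value back into four period-separated fields."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(hex(dotted_to_int("198.105.232.1")))  # the four fields, one byte each
```

Each field occupies exactly one byte, which is why no field can be 256 or larger.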
To translate between domain names and these unfriendly 32-bit addresses, the Internet uses the Domain Name System (DNS). The DNS is essentially a distributed static database that maintains a mapping of domain names to IP addresses specified by network administrators. The DNS organizes the names of hosts in a hierarchical fashion, much like a file system. The top names in the hierarchy are called top-level domains, and include the classifications that you are familiar with in the U.S. along with two-letter country codes for the rest of the world (see Figure 2). The second-level domain (MICROSOFT in the example) is the portion of the domain name administered by the Internet Network Information Center (InterNIC). The remainder of the domain name is administered by the network administrator of the subnetwork and is often broken up into additional zones to ease administration. The administrator is also responsible for setting up name servers to handle requests for resolving domain names to addresses for that zone. The specification of the DNS protocol can be found in RFCs 1034 and 1035.
Organizational Domains (Most Commonly Used in the U.S.)
Domain | Coverage
com | commercial organizations
edu | educational institutions
gov | governmental agencies (U.S.)
mil | military departments (U.S.)
net | network providers
org | other organizations
Some Geographical Domains
Domain | Coverage
uk | United Kingdom
au | Australia
de | Germany
ch | Switzerland
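The hierarchy is easiest to see by reading a domain name from right to left. The following minimal Python sketch (the function name is ours) lists the levels a name belongs to, starting at the top-level domain.

```python
from typing import List


def domain_hierarchy(name: str) -> List[str]:
    """Return the levels of a domain name, from the top-level domain down."""
    labels = name.strip(".").split(".")
    # Build progressively longer suffixes: COM, MICROSOFT.COM, ...
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


for level in domain_hierarchy("FTP.MICROSOFT.COM"):
    print(level)
```

For FTP.MICROSOFT.COM this prints the top-level domain COM, then the second-level domain MICROSOFT.COM, then the full host name.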
Returning to the protocol layers, the transport layer is responsible for regulating the flow of data between networked machines running IP. In the TCP/IP suite, two different protocols are used in the transport layer. One is TCP, and the other is the User Datagram Protocol, or UDP. TCP is a connection-oriented protocol designed to tolerate an unreliable link between the two machines. Originally, the Internet used a protocol that assumed a reliable connection. As DARPANET began to include remote hosts and use a variety of link layers, it became clear that this could no longer be assumed. TCP basically takes a user message of any length, breaks it up into pieces less than 64KB, and passes them onto the network layer for routing and delivery. To guarantee correct delivery, TCP uses acknowledgment messages, understands time-out delays, and uses checksums. Because TCP does all of this work, the application layer can just assume that the data it wishes to send to another machine will make it through correctly. Most applications use TCP for this reason.
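To give a feel for the checksum part of this work, here is a minimal Python sketch of the 16-bit one's-complement checksum used throughout the TCP/IP suite (its computation is described in RFC 1071). The function name is ours, and the sketch covers only the arithmetic: a real TCP implementation also includes a pseudo-header of addresses in the sum, which is left out here.

```python
def internet_checksum(data: bytes) -> int:
    """Compute the 16-bit one's-complement checksum of `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(payload)

# The receiver sums the data together with the checksum; an undamaged
# message yields zero.
received = payload + checksum.to_bytes(2, "big")
print(hex(checksum), internet_checksum(received) == 0)
```

A receiver that computes a nonzero result knows the segment was damaged in transit and can discard it, leaving the sender's time-out to trigger a retransmission.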
At the other end of the reliability spectrum is UDP. UDP provides a simple interface for applications to send user data, called datagrams, through the network. Unlike TCP, UDP does not guarantee that the information will make it to the other machine on the network. Thus, applications using UDP must include their own code to ensure reliability.
The bulk of this article is concerned with the application layer, which is where all the fun takes place. Here "application" refers to a particular protocol built at the application layer of the TCP/IP suite. For any given application there will likely be dozens of implementations on each platform. Even under Windows there will likely be many different Windows Sockets-compatible applications to choose from, each with its own features and Graphical User Interface (GUI). There are literally dozens of TCP/IP application protocols (see Figure 3).
Figure 3: Sampling of Common Application Layer Protocols in the TCP/IP Protocol Suite
Protocol | Translation | Description
Archie | - | Catalog of the contents of 1000+ anonymous FTP servers disseminating the location of FTP retrievable files.
DHCP | Dynamic Host Configuration Protocol | Auto-configuration service that allows a machine to obtain an address without a priori knowledge at boot time.
DNS | Domain Name System | Distributed database that allows applications to map between computer names and IP addresses.
Finger | - | Returns information on users on a specific Internet machine.
FTP | File Transfer Protocol | Copies files from one Internet machine to another.
Gopher | - | Protocol for distributed document navigation and retrieval. The sum of Gopher resources is sometimes referred to as "gopherspace".
IRC | Internet Relay Chat | Real-time text-based conversation system.
MUD | Multi-User Dungeon | Multi-player adventure game known to consume massive network bandwidth and programmer cycles.
NFS | Network File System | Provides transparent file access across clients on the Internet.
PEM | Privacy Enhanced Mail | Protocol for electronic mail that provides for encryption and authentication using RSA and DES.
POP 2 and 3 | Post Office Protocol | Protocol for the management of electronic mail using store and forward.
SMTP | Simple Mail Transfer Protocol | Governs the exchange of electronic mail between two message transfer agents (MTAs) on the Internet.
SNMP | Simple Network Management Protocol | Protocol for remote administration of TCP/IP machines.
Telnet | Telecommunications Network Protocol | Remote login between hosts running potentially different operating systems.
USENET | Uses NNTP (Network News Transport Protocol) | Huge collection of messages and newsgroups arranged in a hierarchical bulletin board system.
VERONICA | Very Easy Rodent-Oriented Netwide Index to Computerized Archives | Indexed search to gopherspace.
WAIS | Wide Area Information Server | Implementation of Z39.50 content indexing and information retrieval standard.
Whois | Who Is | Returns information on an Internet user voluntarily registered with InterNIC.
WWW (also Web or W3) | World Wide Web | Protocol for distributed hypertext document searching, navigation, and retrieval.
The proliferation of protocols is a by-product of the ease with which new protocols can be developed, disseminated, and established in the Internet community. Of course not all of these protocols are used in equal numbers. To give you some idea of how the current Internet traffic breaks down, let's look at some statistics.
Figure 4 shows how each of the major TCP/IP application services contributes to the NSFNet traffic flow. NSFNet is one of the busiest networks in the Internet. It's the backbone connecting many of the educational and research facilities connected today. As you might suspect, file download using FTP accounts for the largest percentage of traffic. Although most of us use e-mail frequently, the average message size is small enough that networked mail does not account for a large percentage of the traffic. If you were to look at this graph in terms of user interactions, ignoring packet size, you'd see networked mail accounting for a larger percentage.
Figure 4: Major Components of the NSFNet Traffic
It's interesting how much of the traffic is taken up by resolving names on the Internet. There is about a 6 percent overhead associated with name service lookups using the DNS, the distributed database of all of the Internet hosts. New applications like Web viewers and Gopher use DNS extensively (and sometimes poorly), clearly contributing to this statistic.
If you dive down another level and look at a specific month, March 1994, you can see how specific services are being used (see Figure 5). March 1994, an average busy month in the world of the Internet, saw 14,024,028,116,050 bytes travel over just the NSFNet. These bytes were packaged up into 69,552,904,950 packets, giving an average packet size of about 202 bytes.
Figure 5: Service Usage on NSFNet for March 1994
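The average packet size quoted above is simple division of the two March 1994 totals. A few lines of Python (variable names are ours) reproduce it.

```python
total_bytes = 14_024_028_116_050   # bytes carried by NSFNet in March 1994
total_packets = 69_552_904_950     # packets carried in the same month

average = total_bytes / total_packets
print(f"average packet size: about {round(average)} bytes")
```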
The file transfer protocol (FTP) is the most widely used TCP/IP application protocol in terms of traffic. This protocol can copy files from one machine to another. This is not the same as being able to remotely access a file, which is provided by another application layer service. The FTP protocol is analogous to an application like Kermit or XModem, though it also includes some navigational functions. Nearly every commercial TCP/IP offering includes some form of FTP support.
The definitive reference for the FTP protocol is RFC 959, which we show you how to obtain in Figure 6. This example uses the FTP.EXE that comes with Microsoft Windows NT, which is also available with the Microsoft TCP/IP-32 for Windows® for Workgroups product, currently in beta (FTP TO FTP.MICROSOFT.COM/PEROPSYS/WFW/TCPIP/VXDBETA/).
FTP> open nis.nsf.net
Connected to nis.nsf.net.
220 nic.merit.edu FTP server (Version 4.76 Tue Apr 26 02:23:40 EDT 1994) ready.
Name (nis.nsf.net:stevesi): anonymous
331 Guest login ok, send your email address as password.
Password: *********************
230- Guest login ok, access restrictions apply.
230- Local time is: Sun May 1 23:15:31 1994
230 Remote system type is UNIX.
Using binary mode to transfer files.
FTP> help
Commands may be abbreviated. Commands are:
!          debug       mget      pwd      status
$          dir         mkdir     quit     struct
account    disconnect  mls       quote    system
append     form        mode      recv     sunique
ascii      get         modtime   reget    tenex
bell       glob        mput      rstatus  trace
binary     hash        newer     rhelp    type
bye        help        nmap      rename   user
case       idle        nlist     reset    umask
cd         image       ntrans    restart  verbose
cdup       lcd         open      rmdir    ?
chmod      ls          prompt    runique
close      macdef      proxy     send
cr         mdelete     sendport  site
delete     mdir        put       size
FTP> ascii
200 Type set to A.
FTP> cd documents
250 CWD command successful.
FTP> ls
200 PORT command successful.
150 Opening ASCII mode data connection for /bin/ls.
total 61
-rw-r--r--   1 nic  merit  2300 Jul 31 1992 INDEX.documents
drwxr-sr-x   2 nic  merit   512 Mar 19 19:04 fyi
drwxr-sr-x   3 iesg ietf   2048 May 1 07:12 iesg
drwxr-sr-x 154 iesg ietf   3584 May 1 07:13 ietf
drwxr-sr-x   2 iesg ietf  28160 May 1 07:13 internet-drafts
drwxr-sr-x   2 nic  merit 20480 Mar 25 22:27 rfc
drwxr-sr-x   2 nic  merit  1536 Mar 14 15:17 std
226 Transfer complete.
FTP> cd rfc
250 CWD command successful.
FTP> get rfc0959.txt
local: rfc0959.txt remote: rfc0959.txt
200 PORT command successful.
150 Opening ASCII mode data connection for rfc0959.txt (148550 bytes).
226 Transfer complete.
152483 bytes received in 5.6 seconds (26 Kbytes/s)
FTP> bye
221 Goodbye.
In Figure 6, we are using a command-line interface to FTP, which, like all such interfaces, provides full power on one hand and plenty of room for mistakes on the other. Fortunately, there are several menu-driven GUIs available for FTP (see Figure 7). We used the command-line interface because it's more common. In Figure 6 the characters entered by the authors are in italic.
Figure 7: Windows Sockets FTP
To transfer files with FTP you need several things: a client machine running TCP/IP, a client FTP application, a server running the FTP server process that you can connect to, and some idea of where the file you want is located. In this example, we are connecting to a server that we happen to know contains interesting documents for the Internet, perhaps through a URL that was passed in e-mail or in a Microsoft Systems Journal article.
To connect to a server, use the Open command, giving a complete host address as the argument. The host address can be either a string name, if you have a domain name server, or you can use the dotted-decimal address. As a courtesy, common FTP sites often give both their address and string name. In Figure 6 we are connecting to a server at the NSFNet administrative center. The server is called nis at the nsf second-level domain, in the net top-level domain.
Once we are connected to the server, the server responds asking for a login identifier. You can practically always use the login identifier anonymous if you just wish to download software. As a courtesy, you are asked to enter your fully qualified Internet mail address as your password. Although this used to be a convention, some (obnoxious) servers are now enforcing this by parsing your domain name from your e-mail address (stripping all the elements to the left of and including the @) and then attempting to resolve your domain name to the address you're connecting from. This "feature" can be problematic if your e-mail address is on a different system or network from the one you're connecting from, or if you're fortunate enough to have "firewalled" access to the Internet at your organization. If the name cannot be resolved, the server will not allow you to log on; you should ask your local administrator what the appropriate domain name to use is.
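The check these stricter servers perform can be sketched in a few lines of Python. This is only an illustration of the idea described above, not the code of any actual FTP server: the function names are ours, the resolver is passed in as a parameter so the sketch runs without a network, and the sample name and address are made up.

```python
from typing import Callable, Set


def domain_from_password(password: str) -> str:
    """Strip everything up to and including the '@' from a mail address."""
    user, at, domain = password.partition("@")
    if not (user and at and domain):
        raise ValueError("expected an address of the form user@domain")
    return domain.lower()


def login_allowed(password: str, client_address: str,
                  resolve: Callable[[str], Set[str]]) -> bool:
    """Allow the login only if the domain resolves to the caller's address."""
    try:
        domain = domain_from_password(password)
    except ValueError:
        return False
    return client_address in resolve(domain)


# Hypothetical name-to-address table standing in for a real DNS lookup.
table = {"example.com": {"192.0.2.10"}}
lookup = lambda name: table.get(name, set())

print(login_allowed("barney@example.com", "192.0.2.10", lookup))
print(login_allowed("barney@example.com", "192.0.2.99", lookup))
```

The second call models the problem case: the mail address is valid, but the connection arrives from a machine (for example, a firewall) whose address does not match the mail domain.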
Servers use this information merely to maintain a log. According to Internet use policies, administrators are not supposed to use this information for any purpose other than maintaining the security of their servers. You don't need to worry about receiving junk mail or people keeping track of your downloads for competitive reasons. The use of the anonymous login identifier is referred to as anonymous FTP.
Once you are connected to the server, you'll find you've entered a fairly archaic world where your only guidance is a set of conventions that have been established over time. There are three key areas to understand: the standard FTP commands, the organization of files and directories, and the common file formats.
If you are using a GUI front-end you'll have easy access to menu commands to guide you along. If you're not, you'll need a few basic commands for keyboard interaction. In general, FTP commands resemble a subset of the traditional Novell, Inc. UNIX® shell commands (see Figure 8). The first command we entered was help, which, unlike most command shells, actually gets us some assistance. We received a list of all possible commands available on the server.
Figure 8: Common FTP Commands
Command | Description
help | Print a list of available commands. Sometimes you can get further help on a command by trying the help command (this depends on your client).
open server | Initiate a connection with a system named server.
ascii | Sets the current transfer mode to ASCII characters, which includes CR/LF translation.
binary | Sets the current transfer mode to binary, which will transfer a byte image of the file.
cd | Change directory on the server. You can use ".." to go up a directory.
lcd | Change the local directory. This allows you to change the current directory, which is usually where downloaded files are stored.
close | End the current connection and remain in FTP.
bye | End the current connection and exit FTP.
ls filename | List a directory, or the attributes of filename.
get sfile | Download a file named sfile. If you do not specify a dfile file, you will generally be prompted for one. It is sometimes desirable to rename a file during transfer by specifying dfile as well.
mget list | Download a group of files in list, which may include server-specific wildcard matching.
put file | Upload a file. If you do not specify a file, you will be prompted for both a remote name and a local name. Note that with anonymous FTP, you will often have limited or no write permissions.
mput list | Upload a group of files in list, which may include client-specific wildcard matching.
We entered the ASCII command, explained further below. Then we used the cd command to change directories and put us into a new directory on the host without affecting the local directory. The get command copies a file from the host to the local machine. In the command-line FTP client, the file is copied to the current directory. You can use the lcd command to change the local directory prior to downloading a file if you prefer. The get command usually provides some feedback on how large the file is and some approximation of the time of download. In GUI front-ends you might see a more elaborate progress indicator.
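For readers who would rather script the exchange than type it, the same steps can be sketched with Python's standard ftplib module. This is a minimal sketch rather than part of the original session: the helper names are ours, and the host and paths are simply the ones shown in Figure 6, which may no longer be reachable.

```python
from ftplib import FTP


def download_text(ftp, filename, local_path):
    """Fetch `filename` in ASCII mode and save it; return the line count."""
    count = 0
    with open(local_path, "w") as out:
        def save_line(line):
            nonlocal count
            out.write(line + "\n")  # retrlines strips the CR/LF from each line
            count += 1
        # retrlines switches the connection to ASCII mode before the RETR.
        ftp.retrlines("RETR " + filename, save_line)
    return count


def fetch_rfc959():
    """Mirror the session in Figure 6 (requires a live network connection)."""
    ftp = FTP("nis.nsf.net")   # open nis.nsf.net
    ftp.login()                # with no arguments, ftplib logs in as anonymous
    ftp.cwd("documents")       # cd documents
    ftp.cwd("rfc")             # cd rfc
    lines = download_text(ftp, "rfc0959.txt", "rfc0959.txt")
    ftp.quit()                 # bye
    return lines
```

Calling fetch_rfc959() would replay the session; download_text can be reused with any connected FTP object.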
Two important items to note are the ASCII/binary commands and the need to map filenames between different operating systems. The FTP protocol includes a facility to translate text files between different operating systems; for example, to and from systems that require both a CR and LF character to terminate a line. This translation is handled by the client-server interaction, since a standard on-the-wire representation for ASCII was incorporated with the FTP specification. Unfortunately, if you download a compressed file in ASCII mode, you will not be able to decompress it since the file will likely have been corrupted. As per the specification, ASCII mode should be the default mode of the server, so be careful to switch to binary. Some of the more clever GUI front-ends to FTP automatically attempt to switch modes based on the file extension, which is a great help.
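To see why ASCII mode damages binary files, consider this minimal Python sketch. It imitates only the sending half of the translation, in which a UNIX host expands each LF into CR LF for the wire; the function name is ours, and a real FTP implementation does more than this.

```python
def send_as_ascii(data: bytes) -> bytes:
    """Imitate an ASCII-mode sender that expands each LF into CR LF."""
    return data.replace(b"\n", b"\r\n")


text = b"first line\nsecond line\n"
binary = bytes(range(256))  # stand-in for a compressed file's contents

# Text survives: a DOS-style client wants CR LF line endings anyway.
print(send_as_ascii(text))

# Binary data does not: every byte that happens to equal LF (0x0A)
# gains a CR in front of it, so the file no longer matches the original.
damaged = send_as_ascii(binary)
print(len(binary), len(damaged), damaged == binary)
```

Binary mode skips this step entirely and transfers a byte image of the file, which is what a compressed archive needs.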
You might have noticed the feedback from the server in the example that indicated both the remote and local names. In this case they were the same. However, many public FTP sites are UNIX hosts that use long filenames. These filenames must be translated to 8.3 names on a File Allocation Table (FAT) file system; sometimes the client software will do this automatically. As you might suspect, a corollary to this rule is that filenames are usually case sensitive on the server, so be careful.
Files are maintained on public FTP sites in a number of standard formats. The most common is the PKZIP format. Files compressed with PKZIP will have a ZIP extension and can be decompressed with PKUNZIP.EXE. Sometimes the files are self-extracting PKZIP archives; these are .EXE files that you download and run to decompress. Some other formats include ARC files, which require use of ARC.EXE. These utilities can usually be found in the bin directory of any FTP host.
Normally there is a full index or directory listing for each subdirectory. For example, the host mentioned in Figure 6 has the file INDEX.DOCUMENTS, which is a complete description of the contents of the documents subdirectory. If you need descriptions or if the directory listing is very long, you might download that file, open it in a text editor, and then decide which files to download rather than wasting time and bandwidth by randomly downloading files with awkward names.
You might have noticed that in the sample FTP exchange, I knew what server to connect to and where to go to find a file. You'll often see a reference made to a particular file in a message or an article you've read and then pull the file down using FTP. Relying on word-of-mouth can be frustrating, though.
An attempt has been made to catalog files on a number of anonymous FTP sites using a TCP/IP application called Archie. Archie provides a simple mechanism for locating files across more than 1,000 public hosts. But this application provides only limited help: the indexes Archie relies on are built from filenames rather than content, so you need to know at least part of the filename that you're interested in locating. Figure 9 shows an example of WinSock Archie being used to locate the PKZIP utility that we talked about earlier. Even though I neglected to provide a URL reference, Archie was able to find the file. Now we can go to FTP.DEMON.CO.UK to get a copy.
Figure 9: WinSock Archie
For Windows-based developers connected to the Internet, there are a number of anonymous FTP sites to which you might want to connect. These include: FTP.MICROSOFT.COM, FTP.CICA.INDIANA.EDU, SUNSITE.UNC.EDU, and EMWAC.ED.AC.UK. These hosts maintain large libraries of Microsoft Windows files, bitmaps, applications (freeware and shareware), Windows Sockets utilities, and even sample source code.
The FTP protocol has worked well for quite some time, yet it is still a rather limited application. Even with the best GUI front-end, FTP is fairly cumbersome to navigate, and filenames are often cryptic. In order to simplify the process of locating documents distributed over geographically dispersed hosts, the University of Minnesota Microcomputer Center developed the Gopher protocol. Gopher is an application layer client-server protocol for distributed document search and retrieval. Details of the protocol specification can be found in RFC 1436:
ftp ds.internic.net/rfc/rfc1436.txt
A Gopher client (see Figure 10) probably looks rather familiar to Windows File Manager users. The client shown is a public domain application known as HGopher, available on the Internet:
ftp ftp.cica.indiana.edu//pub/pc/win3/winsock/hgoph24.zip
Figure 10: Gopher for Windows NT
HGopher is just one of at least a dozen Gopher front-ends, ranging from free to low-cost to commercial, though most have similar feature sets because the protocol is simple and easy to implement.
A Gopher client basically displays a list of objects, generally documents and directories. This virtual file system is known as gopherspace; the items in it can be physically located in widely dispersed locations not known a priori to the Gopher user. The client, using a standard file system metaphor, shows a directory listing consisting of files and directories, usually with an iconic hint as to the item type. You can imagine how much easier this model is for new users than FTP. For example, the client would show a file folder or an arrow to represent a local subdirectory or a link to another server, as well as specific document icons for various document types. By double-clicking or using another suitable interface action, the user navigates through the directory hierarchy. When a leaf node is reached, the user can open the document, causing the client to download and display the document's contents. Since Gopher's content is in a variety of formats, most clients simply spawn an appropriate document viewer such as Notepad or PaintBrush to display the acquired data or image.
The simplicity of the Gopher protocol between client and server makes it easy to create a server, write client applications, and extend the protocol functionality. The essence of the protocol is shown in the following steps:
1. Client opens a TCP connection to a server using standard port 70.
2. Server accepts the connection and waits silently.
3. Client sends a "selector" or a single carriage return/linefeed to retrieve the root directory.
4. Server responds with a sequence of lines, which the client interprets and displays appropriately (where these responses include selectors).
5. Steps 3 and 4 are repeated, with the client sending selectors based on user actions.
During the conversation, the server maintains no state information about the client. This means the server can handle a large number of clients and can run on very minimal hardware, two key advantages.
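The steps above can be sketched in a few lines of Python using the standard socket module. This is a minimal illustration under our own naming, not a complete client: it opens a fresh connection for each request, sends one selector line, and reads until the server closes the connection.

```python
import socket

GOPHER_PORT = 70  # the standard port named in step 1


def build_request(selector: str = "") -> bytes:
    """Step 3: a selector followed by CR LF; empty means the root directory."""
    return selector.encode("ascii") + b"\r\n"


def read_until_closed(sock) -> bytes:
    """Step 4: collect the response until the server closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # an empty read means the other side has closed
            break
        chunks.append(chunk)
    return b"".join(chunks)


def gopher_fetch(host: str, selector: str = "", port: int = GOPHER_PORT,
                 timeout: float = 10.0) -> bytes:
    """Steps 1 through 4 for a single request."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(selector))
        return read_until_closed(sock)
```

Step 5 is then just another call to gopher_fetch with the selector the user picked. Because each call stands alone, the client carries everything needed for the next request, which matches the stateless server described above.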
Let's look at the above protocol steps in a little more detail, using the data that would be sent to create the display shown in Figure 10. This example assumes a fictional server GOPHER.MICROSOFT.COM. In this example we used the letter F to indicate a tab character, <CR> is a carriage return, and <LF> is a linefeed, to be consistent with the conventions in the RFC.
After the client establishes a connection and sends a blank line to request the contents of the root directory, the server will respond with several lines of data, terminating the message with a line that contains only a period. The data from the server is shown in Figure 11.
IA Roadmap for this Gopher server (bitmap)Froadmap.bmpFgopher.microsoft.comF70<CR><LF>
0A Roadmap for this Gopher server (ASCII Text)Froadmap.txtFgopher.microsoft.comF70<CR><LF>
0Frequently Asked Questions About Windows NT (ASCII Text)Ffaqnt.txtFgopher.microsoft.comF70<CR><LF>
5Frequently Asked Questions About Windows NT (Word Format)Ffaqnt.docFgopher.microsoft.comF70<CR><LF>
1Microsoft Product LiteratureFproductsFgopher.microsoft.comF70<CR><LF>
1Sample Win32 Source CodeFsampcodeFgopher.microsoft.comF70<CR><LF>
1The Microsoft KnowledgeBase CollectionFkbcollectFgopher.microsoft.comF70<CR><LF>
1Windows Sockets UtilitiesFwinsockFgopher.microsoft.comF70<CR><LF>
As you can see, the server responds with basic, albeit difficult to read, information. The most interesting part is the first character of each line, the type code. The example in Figure 11 shows the three most common standard type codes: a single document has the code 0, a directory has the code 1, and a binary file uses code 5. In addition, there are a number of agreed upon type codes. Figure 12 shows the most commonly used codes defined by the Gopher specification.
Figure 12: Selected Type Codes in the Gopher Protocol
Type Character | Item Description | Client Actions
0 | File | File is an ASCII text file that the client should display after receiving, using a text file browser such as Notepad.
1 | Directory | Client displays the title for each item and holds on to the selector for each item, should the user drill-down.
2 | Phone Book server | This exists for mostly historic reasons. The CSO protocol is used to look up an address entry in a university phone book server.
3 | Error | General place holder for error handling.
4 | BinHex Macintosh file | Client prompts user for a location to store file. Client might also display a Binhex icon and/or de-Binhex the file upon download.
5 | MSDOS® binary archive | Client prompts user for a location to store file. Client might also display an archive icon and/or de-archive the file upon download. The client can determine the archive type from the file suffix, such as ZIP.
6 | UUENCODE archive | Client prompts user for a location to store file. Client might also display a UUENCODE icon and/or de-code the file upon download.
7 | Search server | Client displays a query icon (that is, a question mark or search icon) and prompts the user for a query string. The syntax depends upon the server the user connects to.
8 | Text-based TELNET | Connect the user to another server using a TN3270 session or TELNET. Usually requires a user-supplied TELNET/TN3270 client application.
9 | Binary file | Client prompts user for a location to store file. The client might take special action based on the file type (suffix).
g | GIF file | Generally the client will display the GIF (Graphic Interchange Format) using a user-supplied GIF viewer, such as WinGIF.
I | Image file | Generally the client will display/play (sound, video) the image file using a user-supplied viewer (MPLAYER.EXE, for example). The viewer is usually determined from the filetype suffix.
For each code a well-behaved client will give the user a visual indication of the item type. In a character-oriented interface, it is common to preface directories with a /. In a graphical environment like HGopher, each type of item usually has a distinct icon, such as a folder for a directory. In addition, clients can choose to do additional work on behalf of users for some well-established types of documents. For example, a client receiving a binary document (type 9) with a ZIP suffix might choose to unzip the file for the user automatically after prompting for a download location.
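To make the idea concrete, here is a minimal Python sketch of how a character-oriented client might turn a type code into something the user can see. The codes and item names come from Figure 12; the function name display_label and the exact label layout are our own assumptions, not part of the Gopher specification.

```python
# Type codes and item names follow Figure 12. The helper below and its
# label layout are hypothetical; real clients choose their own display.
GOPHER_TYPES = {
    "0": "File",
    "1": "Directory",
    "2": "Phone Book server",
    "3": "Error",
    "4": "BinHex Macintosh file",
    "5": "MSDOS binary archive",
    "6": "UUENCODE archive",
    "7": "Search server",
    "8": "Text-based TELNET",
    "9": "Binary file",
    "g": "GIF file",
    "I": "Image file",
}


def display_label(type_code, title):
    """Return a character-mode label for one menu item."""
    if type_code == "1":
        # Directories are prefaced with a slash, as described above.
        return "/" + title
    # Every other item is followed by its type name in parentheses.
    name = GOPHER_TYPES.get(type_code, "Unknown type")
    return f"{title} ({name})"
```

A graphical client would make the same lookup but select an icon instead of building a string.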
You may have noticed that there has been no information about the size of the file or the number of packets to download. For the document types, the Gopher client will download information until the server closes the connection. Of course this is not optimal, especially over slow links. Some recent enhancements to the Gopher protocol (called Gopher+) include a provision to tell the client the length of an item in advance. Gopher+ also adds support for some richer document types.
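The "download until the server closes the connection" behavior can be sketched in a few lines of Python. This is an illustration only: the function names are ours, and the fetch helper assumes the basic Gopher exchange in which the client sends the selector followed by a carriage return and line feed.

```python
import socket


def read_until_closed(conn, chunk_size=4096):
    """Collect bytes from conn until the peer closes the connection."""
    chunks = []
    while True:
        data = conn.recv(chunk_size)
        if not data:
            # An empty read means the other side has closed its end.
            # This is the only end-of-item signal the client receives.
            break
        chunks.append(data)
    return b"".join(chunks)


def fetch(host, selector, port=70):
    """Send one selector and return everything the server sends back.

    Hypothetical helper: assumes the request is the selector plus CRLF.
    """
    with socket.create_connection((host, port)) as conn:
        conn.sendall(selector.encode("ascii") + b"\r\n")
        return read_until_closed(conn)
```

Because the client has no length to work with, it cannot show a percentage-complete indicator or detect a transfer that was cut short, which is the gap the Gopher+ length provision addresses.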
Following the item type code in the server message is the user display string. This string is what the client should display to the user to indicate the contents of the document or directory; since it is the only information the user gets, it should be very clear. When you cruise GOPHERSPACE you'll see several familiar titles, as the Gopher community follows fairly common conventions. For example, most servers have a document entitled "About this Gopher Server" as the first in the root menu. Also, many servers have a directory "Other Gopher Servers Around the World" that will connect you to another Gopher in another physical location, thus creating a connected web of Gopher servers.
The remaining portion of the line contains the information used by the client to retrieve the item when the user requests it, usually with some action such as a double-click in a Windows-based application. In our example, the first item is a document with the selector ROADMAP.BMP. Although this might look like a filename and may actually be a file on the server, the client must opaquely pass this back to the server, making no assumptions based on the string. It's the responsibility of the server to maintain the selector string-to-item mapping. For example, the information might be maintained in a Microsoft SQL Server client-server database management system rather than a file, and the selector might be a query string. In addition to the selector string, there is a host domain name (say, GOPHER.MICROSOFT.COM) and an optional port number (if a nonstandard port is used, with port 70 used by convention and assignment). A Gopher server application simply responds to requests that come over the agreed-upon port. Although the requests result in directory enumeration, file downloads, and search execution, they are all provided using these opaque selectors. The protocol does not define how the server represents information internally. In fact, since the selector strings are completely determined by the server, any number of representations are possible.
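A short Python sketch shows how a client might pull these pieces out of one line of a server response. It assumes the fields are separated by tab characters, as in the basic Gopher protocol; the names MenuItem and parse_menu_line are ours, and the display string in the usage example is invented. Notice that the selector is stored and never interpreted.

```python
from typing import NamedTuple


class MenuItem(NamedTuple):
    type_code: str   # first character of the line
    display: str     # user display string
    selector: str    # opaque string, returned to the server unchanged
    host: str
    port: int


def parse_menu_line(line):
    """Split one menu line into its parts (assumes tab-separated fields)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 3 or not fields[0]:
        raise ValueError(f"not a Gopher menu line: {line!r}")
    type_code, display = fields[0][0], fields[0][1:]
    selector, host = fields[1], fields[2]
    # Fall back to the conventional port 70 when no port is given.
    port = int(fields[3]) if len(fields) > 3 and fields[3] else 70
    return MenuItem(type_code, display, selector, host, port)
```

For example, parse_menu_line("0Road Map\tROADMAP.BMP\tGOPHER.MICROSOFT.COM\t70") yields an item of type 0 whose selector is ROADMAP.BMP. When the user double-clicks the item, the client connects to the stored host and port and sends the selector back exactly as received.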
The simplest server, and not a very friendly one, will simply enumerate directory entries. Depending on the item type, the server will create an appropriate string with the correct type code to return to the client. The title of the item, in the simplest case, could be just the file system name. Using NTFS this could be a friendly name; using FAT, it could be a less than friendly 8.3 filename. Increasingly elaborate representation schemes are possible depending on the level of control you wish to maintain over the title, item type, and other information.
If you wish to explore Gopher you can obtain an Internet connection and link to one of the well-known Gopher sites, such as GOPHER.BOOMBOX.MICRO.UMN.EDU, using a public domain front-end such as HGopher. If you wish to set up a server on your own local network without an Internet connection, just using TCP/IP, you can obtain the free GopherS (pronounced Gopher-ess) server application for Microsoft Windows NT Server (Windows NT) network operating system. Of course, this would mean only local network clients could use this service. GopherS is available from the European Microsoft Windows NT Academic Centre (EMWAC--not affiliated with the Microsoft Corporation) for both Intel® and Alpha-based systems running on Windows NT via anonymous FTP:
FTP emwac.ed.ac.uk/pub/gophers
The World Wide Web (or simply, Web) project started at CERN (the European Laboratory for Particle Physics) research labs in Switzerland. The Web is a wide-area hypermedia information retrieval initiative aiming to give universal access to a large universe of documents. What characterizes the Web most is that its protocols are a superset of many of the most common Internet application services, and that there has been tremendous growth in the diversity of information published in the Web format. Web servers exist for libraries, corporations, research scientists, and so forth, covering topics from aeronautics to molecular biology to MTV.
Before we talk about the plumbing details of the Web, let's look at an example of a Windows client for the Web (see Figure 13). This front-end, called Mosaic, is running under the beta 1 release of Microsoft Windows 95 operating system (Chicago) using Windows Sockets. Mosaic has become very popular, but it's important to remember that this is merely one implementation of the Web protocols. Mosaic also implements other protocols, so it can be used as a replacement for FTP and Gopher in addition to using it as a viewer of Web information. Mosaic is written using the Microsoft Foundation Classes in Microsoft Visual C++. Mosaic is currently available for a wide variety of platforms, including Microsoft Win32®. You can also run it on Microsoft Win32s® under Windows 3.1.
Figure 13: Mosaic Running on the First Chicago Beta
The creators of the Web developed several application-layer protocols and a document-publishing standard. The three key concepts are URLs, the HyperText Markup Language (HTML), and the HyperText Transfer Protocol (HTTP). This section will explore HTML and URLs. For additional information on the underlying HTTP protocol, refer to the information online in Web format:
info.cern.ch/hypertext/WWW/Protocols/HTTP/HTTP2.html
or
ftp info.cern.ch/pub/www/doc
In a hypertext environment, the key requirement is being able to globally represent any item of information or resource. For the Internet this is an especially hard problem, since the universe potentially consists of all Internet hosts. Out of the Web effort came the first stab at a formalized syntax, called Uniform Resource Locators (URLs), used to refer to globally available information. URLs are essentially an extension of a full pathname. URLs add a prefix that indicates the type of retrieval method to use, along with a host domain name indicating the physical location of the information. The remainder of the URL is the pathname to a document (usually), along with any retrieval options that might be allowed (most often an account name and password). URLs are quickly becoming the standard for representing locations on the Internet. Figure 14 lists some of the most commonly used retrieval schemes in URLs. Note that the URL convention does not define how a client operating system transforms a URL reference into an object, as this is left to the implementer. As you'll see, Web front-ends make extensive use of URLs. The standard reference to URLs can be found in Web format on
info.cern.ch/hypertext/WWW/Addressing/Addressing.html
Figure 14: Common Retrieval Schemes Used in URLs
Retrieval Scheme | Description
FTP | Following the domain name is a full path used to retrieve the document or directory using the FTP protocol.
HTTP | Following the domain name is a full path to an HTML document to be retrieved using the HTTP protocol.
Gopher | Following the domain name is a valid Gopher selector to be used to retrieve a document or directory from a Gopher server.
WAIS | Following a domain name is a path to a WAIS server along with fields used to retrieve a document from a WAIS full-text index.
TELNET | Following the domain name is an optional user name and password to use when establishing an interactive session using TELNET.
MAILTO | Instead of a domain name, the string includes a full Internet mail address. The client should spawn a user's mail application and send mail to the address given.
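To see how a URL breaks down into the retrieval scheme, host, and path described above, here is a small Python sketch built on the standard urllib.parse module. The helper name describe_url is our own, and the sample address and account in the usage note are made up for illustration.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Break a URL into the parts a client needs to retrieve the item."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # retrieval method, lowercased
        "host": parts.hostname,      # None for schemes such as mailto
        "user": parts.username,      # optional account name
        "password": parts.password,  # optional password
        "path": parts.path,          # pathname, selector, or mail address
    }
```

Calling describe_url("HTTP://WWW.CENSUS.GOV") reports the scheme http and the host www.census.gov with an empty path, while describe_url("mailto:someone@example.com") reports no host at all and carries the mail address in the path. What the client does with each scheme once it has been identified is left to the implementer.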
Once you have a mechanism for implementing links, you need a mechanism for incorporating them into online documents. The creators of the Web defined HTML to represent hyperlinked documents on the Web. An HTML document can contain graphics, rich text, sound, video, and links to other HTML documents all around the world. HTML is implemented as a Document Type Definition (DTD) in Standard Generalized Markup Language (SGML), which is a language for specifying grammars in a document. SGML does not define any formatting conventions, but rather lets you tag text structurally; you use it only in concert with a DTD, which essentially defines a style sheet or set of allowable structure elements in a document. Generally, SGML text resembles an off-line formatting language such as Rich Text Format (RTF) or troff, where the user edits a standard text file and later processes it to actually display the formatting. In order to publish rich hypertext information on the Web, you set up a Web server in a manner similar to setting up a Gopher server (in fact, EMWAC offers the HTTP server application needed to do this) and then populate the server with HTML documents linked together.
Looking at an example of an HTML document, you can see that the style sheet for HTML is fairly straightforward. Figure 15 shows the HTML for the Web home page (the root page on a Web server) for CERN. This example includes two graphics, several links, and a URL that lets you send mail to the Web project.
<HEAD>
<TITLE> World-Wide Web Home </TITLE>
<BODY>
<IMG SRC="[unarchived-media]" Web Home </H1>
Welcome to the web about the web, with everything you ever needed to know or pointers to it. This server is supported by the WWW team at <A HREF="[unarchived-link]" the originators of WWW, with the help of collaborators worldwide.
<H2>About The Web Project</H2>
Everything you need to know about the W3 is on the web. A few selected jumping off points:
<UL>
<LI> <A HREF="[unarchived-link]" definitive WWW project page</A>
<LI> <A HREF="[unarchived-link]" list of client software, and documentation</A>
<LI> <A HREF="[unarchived-link]" list of server software, and documentation.</A>
<LI> <A HREF="[unarchived-link]" details, specifications, etc</A>
</UL>
<H2>Places to start browsing</H2>
<DL>
<DT><A HREF="[unarchived-link]" NAME=z0><IMG SRC="[unarchived-media]" The Virtual Library points to network resources, arranged according to subject area by specialists in each field.
<A HREF="[unarchived-link]" Catalogue</A>
<DD> A good searchable catalogue of web resources from CUI
<A HREF="[unarchived-link]" catalogues</A>
<DD> Other organizations of resources on the web.
<A HREF="[unarchived-link]" NAME=z4><DT>List of servers</A>
<DD> All registered HTTP servers by country
<A HREF="[unarchived-link]" NAME=z1><DT>by Service Type</A>
<DD> The Web includes data accessible by many other protocols. The lists by access protocol may help if you know what kind of service you are looking for.
</DL>
Corrections and suggestions to information on this server <A HREF="[unarchived-link]">

In this example it's clear that HTML formatting commands come in pairs; they are basically analogous to codes used in, say, Novell, Inc.'s WordPerfect® or Microsoft's Rich Text Format. The first code turns on a formatting string, the second turns it off. Figure 16 lists the core set of HTML formatting codes.
As with Gopher, HTML is receiving constant new enhancements and tweaks, so it is best to find the most recent online reference if you are implementing an HTML parser. In each case, the HTML DTD does not define how these elements display; this is the role of the client. The HTML specification, however, does provide suggestions for this. Rather than describing the HTML example mentioned above, we'll just outline some of the HTML codes and leave parsing the example as an exercise for you.
Figure 16: Selected HTML Formatting Tags
HTML Element | Meaning
TITLE | Title of a document, which is displayed by the client prominently.
IMG | Inline graphic. The client is responsible for displaying it. The standard is to use GIF files. The SRC attribute indicates the location of the graphic.
A | An anchor marks the beginning of a hyperlink. Following the A tag is usually a qualifier, such as HREF, which is a reference to another document. The NAME attribute allows an anchor to itself serve as the destination of a hyperlink.
H1, H2, H3, H4, H5, H6 | HTML supports six levels of headings. These headings imply a change in font and/or emphasis, as in the sections of a document. For example, H1 is usually a large bold font, centered across a line.
DT, DD | Glossary entries used to indicate a list of definitions. DT is the "term" and DD is the "definition". These are often used just to show lists.
UL | Defines a list, which is a sequence of paragraphs. There are several types of list items, each with a different suggested display. UL is generally a bulleted list. OL is generally a numbered list. DIR is a list of short elements, perhaps used to display the results of an FTP directory listing.
P | Marks a new paragraph.
B | Boldface.
I | Italic.
U | Underline.

Authoring information in this HTML format is tedious. Fortunately there are a number of public domain tools, and even some commercial tools on the way, that make this process a little easier. For example, CU_HTML.DOT, a set of macros for Microsoft Word word processor, automates the insertion of the control codes and the creation of the URLs needed for links. CU_HTML.DOT is available via FTP from a number of sites including FTP.CUHK.HK/PUB/WWW/WINDOWS/UTIL/CU_HTML.ZIP. The set of macros was written by the Computer Services Centre of The Chinese University of Hong Kong. There are also some dedicated authoring environments, such as HTML Writer from Kris Nosack, available on FTP.BYU.EDU/TMP/HTMLWRIT.ZIP.
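If you would rather read HTML than write it, the anchor element from Figure 16 is a good place to start. The sketch below uses Python's standard html.parser module to collect each HREF along with the text between the opening and closing A tags. The class and function names are ours, and this is only one of many ways to walk a document, not how any particular Web front-end does it.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from the A elements in a document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # HREF of the anchor being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lowercase.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return every hyperlink found in the given HTML text."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Anchors that carry only a NAME attribute are skipped, since they mark a destination rather than a link to follow.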
Getting Connected
At this point you may be wondering how you go about getting connected. Many corporations and organizations have direct links to the Internet. Ask your local administrator what you need to do if you have a direct Internet feed at your organization.
If you are a home user or an employee of an unconnected organization, you'll probably have to rely on a low-cost, low-bandwidth connection through a local provider. Over 100 Internet providers in the U.S. offer some form of service to the Internet; most offer 9600 bps or 14.4 Kbps service via modem. Many providers are either evaluating or in the process of extending their service to ISDN, switched 56 Kbps, or other higher-bandwidth technologies.
Peter Kaminski has put together a list of public access Internet providers called PDIAL. To obtain the list, send an e-mail message to
info-deli-server@netcom.com
with the Subject: line reading
Send PDIAL
You can also pick up the file from a number of anonymous FTP servers. Our glance with Archie returned over 50 locations.
Whether you're connecting locally or through an Internet provider, you'll need a TCP/IP stack. Since you're reading Microsoft Systems Journal, you're likely a Windows junkie, so Windows Sockets support is a must. For dial-up access you'll need either SLIP or PPP support with your TCP/IP package. A FAQ (Frequently Asked Questions) document discussing the stacks and applications available, both free and commercial, can be found at:
FTP FTP.cac.psu.edu:/pub/dos/info/tcpip.packages
Top Internet Connections
You probably have a favorite spot at your local library, newsstand, and bookstore. On the Internet, it will be no different. Figure 17 shows just a few interesting places to get you started exploring. Once you find them, you'll no doubt branch out to others that you find interesting. Many locations even maintain lists of their most popular connections.
Figure 17: Some of the More Useful (and Fun) Internet Hosts
URL | Description
HTTP://WWW.NCSA.UIUC.EDU | Web home page for the NCSA. This is a very crowded host and you should probably avoid it, though Mosaic defaults to pointing here for its home page.
HTTP://WWW.CENSUS.GOV | Web home page for the U.S. Bureau of the Census. Lots of government statistics.
HTTP://MTV.COM | Web home page for the Adam Curry (of MTV) Web server about Music.
HTTP://HTTP.UCAR/EDU | Web home page for the National Center for Atmospheric Research. Lots of good information about the weather.
HTTP://FATTY.LAW.CORNELL.EDU | Cornell University Law School home page for World Wide Web. Generally a good starting place to cruise Webspace, and less crowded than the NCSA home page.
GOPHER://FATTY.LAW.CORNELL.EDU | Cornell University Law School Gopher root. A well-known Gopher site with lots of good information from the U.S. Government.
GOPHER://BOOMBOX.MICRO.UMN.EDU | University of Minnesota Gopher site. This is the mother of all Gophers and is often very crowded.
FTP://SUNSITE.UNC.EDU | FTP server that maintains archives of Windows-based applications.
FTP://NIS.NSF.NET | NSFNet administrative server includes information on NSFNet, as well as a repository for standards documents such as RFCs.
FTP://INFO.CERN.CH | FTP site for obtaining information about the Web, including client applications.
FTP://FTP.NCSA.UIUC.EDU | FTP site for obtaining the Mosaic front end. This host is very crowded, but don't worry. Mosaic is available all over the place (including cica).
FTP://FTP.MICROSOFT.COM | Microsoft FTP server including system software (drivers, etc.), Knowledge Base articles, MSJ code listings, and other technical information.
FTP://FTP.CICA.INDIANA.EDU | FTP server that maintains archives of Windows-based applications. This is a very popular server so you might not be able to get through all the time.
FTP://EMWAC.ED.AC.UK/PUB | FTP site for getting Windows NT-based Gopher and Web server services.
Chicago and Daytona on the Internet
Finally, at this point you've got to be wondering what Internet-related software will be part of Chicago and Daytona. Unfortunately we can't divulge a whole lot, but we can say that these releases will include all the plumbing necessary to connect to the Internet. Windows NT 3.1 already includes LAN-based TCP/IP support, including client and server support for FTP.
Both Chicago and Daytona will extend the trend by including robust 32-bit network stacks for TCP/IP. When running under Chicago, the stack will be a 32-bit VxD. The key features of this software will include the following:
What does this mean? Both Chicago and Daytona will be great Internet clients, and with the built-in FTP server and the work done at EMWAC, Windows NT is already a great Internet server. All the plumbing you need to connect to the Internet is built into both operating systems. In addition to TCP/IP, both will also provide SLIP and PPP, or "dial-in support." This means that Chicago and Daytona will be Internet-ready, whether you dial in to a commercial Internet provider or you have access to the Internet via your corporate network over TCP/IP.
The Windows Sockets support will allow both Chicago and Daytona to support a large number of public domain tools like Mosaic, WinWais, and HGopher, both 16- and 32-bit versions. Chicago and Daytona will also include a set of base utilities to help you get started on the Internet, such as FTP and TELNET. Daytona will also run as a server for popular protocols such as Gopher and World Wide Web. Several of these server applications are available today for Windows NT 3.1, on Intel and Alpha microprocessors.
If you are really anxious to get a taste of the new VxD-based TCP/IP support for Chicago, pick up the "Wolverine" beta, an early version of the same stack for Microsoft Windows for Workgroups operating system with integrated networking. Although it does not yet have dial-up support, it is a great LAN version of TCP/IP for Windows for Workgroups 3.11 users. You can find it at
ftp ftp.microsoft.com/peropsys/wfw/tcpip/vxdbeta
The Microsoft Internet FTP server (FTP.MICROSOFT.COM) has been online since the day Windows NT 3.1 shipped last summer. The server features Microsoft Knowledge Base articles, Resource Kits, software updates, and sample code from many different product groups at Microsoft. Figure 18 shows the growing popularity of this service. The present server is an Intel® 486/50 system running Windows NT.
Figure 18: Volume on Microsoft's FTP Server
If you are interested in setting up your own server and you have Windows NT 3.1, all you need to do is install the TCP/IP support and add the FTP server. The FTP server service (FTPSVC.EXE) is included as part of the standard Windows NT product. Installation is facilitated through the Network Control Panel and is described in the online help.
Since the Internet does not have a supreme ruler, over time ad-hoc standards of behavior have developed. The guiding principles for these standards are known as Acceptable Use Policies. Most networks on the Internet have their own policy guides. Here is a reprint of the Acceptable Use Policy for NSFNet, the main backbone network in the U.S. It can also be found via:
FTP NIS.NSF.NET/ACCEPTABLE.USE.POLICIES/NSFNET.TXT
It's always a good idea to know what's acceptable on your network, as well as what's permissible by your network provider or employer.
General Principle
1. NSFNet backbone services are provided to support open research and education in and among U.S. research and instructional institutions, plus research arms of for-profit firms when engaged in open scholarly communication and research. Use for other purposes is not acceptable.
Specifically Acceptable Uses
2. Communication with foreign researchers and educators in connection with research or instruction, as long as any network that the foreign user employs for such communication provides reciprocal access to U.S. researchers and educators.
3. Communication and exchange for professional development, to maintain currency, or to debate issues in a field or sub-field of knowledge.
4. Use for disciplinary-society, university-association, government-advisory, or standards activities related to the user's research and instructional activities.
5. Use in applying for or administering grants or contracts for research or instruction, but not for other fundraising or public relations activities.
6. Any other administrative communications or activities in direct support of research and instruction.
7. Announcements of new products or services for use in research or instruction, but not advertising of any kind.
8. Any traffic originating from a network of another member agency of the Federal Networking Council if the traffic meets the acceptable use policy of that agency.
9. Communication incidental to otherwise acceptable use, except for illegal or specifically unacceptable use.
Unacceptable Uses
10. Use for for-profit activities, unless covered by the General Principle or as a specifically acceptable use.
11. Extensive use for private or personal business.
This statement applies to use of the NSFNet backbone only. NSF expects that connecting networks will formulate their own use policies. The NSF Division of Networking and Communications Research and Infrastructure will resolve any questions about this Policy or its interpretation.
© 1995 Microsoft Corporation.
THESE MATERIALS ARE PROVIDED "AS-IS," FOR INFORMATIONAL
PURPOSES ONLY.
NEITHER MICROSOFT NOR ITS SUPPLIERS MAKES ANY WARRANTY, EXPRESS
OR IMPLIED WITH RESPECT TO THE CONTENT OF THESE MATERIALS OR THE
ACCURACY OF ANY INFORMATION CONTAINED HEREIN, INCLUDING, WITHOUT
LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE. BECAUSE SOME STATES/JURISDICTIONS DO
NOT ALLOW EXCLUSIONS OF IMPLIED WARRANTIES, THE ABOVE LIMITATION
MAY NOT APPLY TO YOU.
NEITHER MICROSOFT NOR ITS SUPPLIERS SHALL HAVE ANY LIABILITY FOR
ANY DAMAGES WHATSOEVER INCLUDING CONSEQUENTIAL INCIDENTAL, DIRECT,
INDIRECT, SPECIAL, AND LOSS PROFITS. BECAUSE SOME STATES/JURISDICTIONS
DO NOT ALLOW THE EXCLUSION OF CONSEQUENTIAL OR INCIDENTAL DAMAGES
THE ABOVE LIMITATION MAY NOT APPLY TO YOU. IN ANY EVENT, MICROSOFT'S
AND ITS SUPPLIERS' ENTIRE LIABILITY IN ANY MANNER ARISING OUT
OF THESE MATERIALS, WHETHER BY TORT, CONTRACT, OR OTHERWISE SHALL
NOT EXCEED THE SUGGESTED RETAIL PRICE OF THESE MATERIALS.