http://www.nlm.nih.gov/images.dir/webinfo-help.html (Einblicke ins Internet, 10/1995)

Webinfo Information

Overview

Webinfo provides graphical and textual summaries of HTTP server usage data, for browsers both with and without forms support.

Browsers without Forms Support

Browsers with Forms Support

Other Features


Context-Sensitive Help

Graphical Usage Summaries

First select the radio button for the type of plot you want, then choose from the menu items for that type. You can currently plot either connections or hosts by day, month, or year over certain preselected intervals. For a different interval, choose the last radio button and continue to another form.

Plots are made using gnuplot and have the MIME type image/gif. Depending on your browser, they may be viewed either inline or with an external viewer.


Time Range

Enter the starting and ending dates for the range you wish to plot. Enter the day of the month in the text box, and choose a month and year from the pop-up menus.

If the starting and ending dates are not specified, they will be set to the first and last days of the months specified, respectively.

The day field is ignored if you choose to plot usage by month, and both the day and month fields are ignored if you choose to plot usage by year.
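The defaulting rules above can be sketched as a small function that turns the form fields into concrete dates. This is a minimal sketch under stated assumptions: the helper name `resolve_range`, the `(year, month, day_or_None)` tuples, and the choice to cover whole years when plotting by year are illustrative, not Webinfo's actual code.

```python
import calendar
from datetime import date


def resolve_range(unit, start, end):
    """Turn (year, month, day_or_None) form fields into concrete start/end dates.

    Hypothetical helper; `unit` is "day", "month" or "year".
    """
    start_year, start_month, start_day = start
    end_year, end_month, end_day = end
    if unit == "year":
        # Plotting by year: day and month fields are ignored, so cover whole years.
        start_month, end_month = 1, 12
        start_day = end_day = None
    elif unit == "month":
        # Plotting by month: day fields are ignored, so cover whole months.
        start_day = end_day = None
    if start_day is None:
        start_day = 1  # default to the first day of the starting month
    if end_day is None:
        # monthrange() returns (weekday of day 1, number of days in the month)
        end_day = calendar.monthrange(end_year, end_month)[1]
    return date(start_year, start_month, start_day), date(end_year, end_month, end_day)
```

For example, `resolve_range("day", (1995, 2, None), (1995, 2, None))` yields February 1 through February 28, 1995.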


Data Type

Choose whether to plot by the number of individual connections or the number of hosts connecting.


Time Axis Units

Choose whether to plot usage by day, month, or year over the time range specified above. You are restricted to plotting a maximum of one year's worth of daily data and ten years' worth of monthly data.
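A range check enforcing these limits might look like the following sketch. The function name `count_points` is hypothetical, and reading "one year" as at most 366 daily points is an assumption; the text states no limit for yearly plots, so none is applied here.

```python
from datetime import date

# Limits from the text; treating "one year" as 366 days is an assumption.
MAX_DAILY_POINTS = 366
MAX_MONTHLY_POINTS = 10 * 12


def count_points(unit, start, end):
    """Return how many plot points a range yields, or raise if it exceeds the limits."""
    if end < start:
        raise ValueError("ending date precedes starting date")
    if unit == "day":
        points = (end - start).days + 1
        limit = MAX_DAILY_POINTS
    elif unit == "month":
        points = (end.year - start.year) * 12 + (end.month - start.month) + 1
        limit = MAX_MONTHLY_POINTS
    elif unit == "year":
        return end.year - start.year + 1  # no yearly limit is stated
    else:
        raise ValueError(f"unknown unit: {unit}")
    if points > limit:
        raise ValueError(f"{points} {unit} points exceed the limit of {limit}")
    return points
```

Counting months as a year/month difference, rather than dividing days by 30, keeps the monthly limit exact regardless of month lengths.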


Domain and Host Restrictions to Apply

You can choose to restrict the data plotted in one of four ways.

No Restrictions
All connections or hosts will be plotted - no restrictions apply.

Restrict by Domain
Select the organizational and geographical domains for which you wish to see usage data. Only connections or hosts from the domains selected will be plotted.

Restrict by Host
Enter the name of a host for which you wish to see usage data. Only connections from the host specified will be plotted, but you may use the wildcards '?' (any one character) and '*' (any sequence of characters) to plot connections for more than one host at a time.

Note: Plots created using this option can take a long time to generate, on the order of minutes if a lot of data is being plotted.

Restrict by URL
Enter the name of a URL for which you wish to see usage data, or make a selection from the menu, which lists the symbolic names assigned by Webinfo to certain URLs. Only accesses to the URL specified will be plotted, but you may use the wildcards '?' (any one character) and '*' (any sequence of characters) to plot accesses to multiple URLs. The match is case-sensitive. Some tips:

Note: Plots created using this option can take a long time to generate, on the order of minutes if a lot of data is being plotted.
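The '?' and '*' wildcards used by the host and URL restrictions can be illustrated by translating a pattern into a regular expression. This is a sketch, not Webinfo's matcher; the function names are hypothetical, and applying case-sensitive matching to host names as well as URLs is an assumption (the text states case sensitivity only for URLs).

```python
import re


def wildcard_to_regex(pattern):
    """Compile a pattern using only '?' (one character) and '*' (any sequence)."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts))


def wildcard_match(pattern, text):
    # fullmatch: the pattern must cover the whole host name or URL.
    # Matching is case-sensitive, as the text states for URLs.
    return wildcard_to_regex(pattern).fullmatch(text) is not None
```

The translation is done by hand rather than with Python's `fnmatch`, because `fnmatch` also treats `[...]` as a character class and `fnmatch.fnmatch` may fold case depending on the operating system.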


Textual Usage Summaries

Use the radio buttons and menus to select whether to generate a textual summary for a specific day, month, or year, or for the entire history of the service.

Select whether to view a specific number of the most commonly accessed URLs, or all URLs whose number of accesses exceeds a threshold value that you specify.

Finally, select whether to view a specific number of the most active hosts, or all hosts whose number of accesses exceeds a threshold value that you specify.
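Both choices, "top N" or "above a threshold", can be sketched as one selection over a frequency table. The function name `select_entries` is hypothetical, and reading "exceeds" as strictly greater than the threshold is an assumption.

```python
from collections import Counter


def select_entries(counts, top_n=None, threshold=None):
    """Return (name, frequency) pairs, most frequent first.

    Give exactly one of top_n (keep the N most frequent) or threshold
    (keep every entry whose frequency exceeds the threshold).
    """
    if (top_n is None) == (threshold is None):
        raise ValueError("specify exactly one of top_n or threshold")
    ranked = counts.most_common()  # sorted by descending frequency
    if top_n is not None:
        return ranked[:top_n]
    return [(name, freq) for name, freq in ranked if freq > threshold]
```

The same function serves both the URL and host selections; only the `Counter` passed in differs.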


Overall Summary

This section shows overall usage for all domains. There are two columns of output, one labelled "Connections" and the other, "Hosts". The former refers to the number of individual requests logged by the HTTP server, while the latter refers to the number of unique hosts (an underestimate of the number of users) connecting to this service.
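The difference between the two columns can be shown with a short sketch: every logged request adds one connection, but each host is counted only once. The `(host, url)` input format and the function name are assumptions for illustration.

```python
def overall_summary(requests):
    """Return (connections, hosts) for an iterable of (host, url) log entries."""
    connections = 0
    hosts = set()
    for host, _url in requests:
        connections += 1   # every logged request is one connection
        hosts.add(host)    # a host counts once, however many requests it makes
    return connections, len(hosts)
```

For three requests, two of them from the same host, this gives 3 connections and 2 hosts.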

The line labelled "Total" shows the usage for all domains. The line labelled "Malformed" shows the number of malformed requests that were logged by this server. These are mainly generated by bugs in Web clients and servers. Currently the following conditions account for a "malformed" request:


Summary by Organizational Domain

This section shows usage broken down by top-level "organizational" domains. There are two columns of output, one labelled "Connections" and the other, "Hosts". The former refers to the number of individual requests logged by the HTTP server, while the latter refers to the number of unique hosts (an underestimate of the number of users) connecting to this service.


Summary by Geographical Domain

This section shows usage broken down by top-level "geographical" domains. There are two columns of output, one labelled "Connections" and the other, "Hosts". The former refers to the number of individual requests logged by the HTTP server, while the latter refers to the number of unique hosts (an underestimate of the number of users) connecting to this service.
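One plausible way to split hosts between the organizational and geographical summaries is by top-level domain: the generic domains of the time (com, edu, gov, mil, net, org, int) versus two-letter country codes. This sketch assumes that grouping; the set of domains and the function name are not taken from Webinfo.

```python
# 1995-era generic top-level domains; treating exactly these as
# "organizational" is an assumption about Webinfo's grouping.
ORGANIZATIONAL = {"com", "edu", "gov", "mil", "net", "org", "int"}


def domain_category(hostname):
    """Classify a host name by its top-level domain."""
    tld = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld in ORGANIZATIONAL:
        return "organizational", tld
    if len(tld) == 2 and tld.isalpha():
        return "geographical", tld   # two-letter country code, e.g. "de", "uk"
    return "other", tld
```

Hosts without a domain name (bare IP addresses) fall into "other" here; Webinfo reports those separately by IP class.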


Summary by IP Class

This section shows usage broken down by class of IP address for hosts in the log that the HTTP server could not reverse-map to a fully qualified domain name. Class A addresses generally represent networks with large numbers of hosts; for example, MILNET has the network address 26. Class B addresses represent medium-sized networks with at most roughly 65,000 hosts. These are generally university or government institutions; for example, NLM (U.S. National Library of Medicine) has the network address 130.14. Class C addresses represent small networks with at most 254 hosts, for example small companies and local libraries. Class D addresses are used for multicasting.

There are two columns of output, one labelled "Connections" and the other, "Hosts". The former refers to the number of individual requests logged by the HTTP server, while the latter refers to the number of unique hosts (an underestimate of the number of users) connecting to this service.
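Under classful addressing, the class is determined by the leading bits of the first octet, which fixes how much of the address is the network number. The sketch below shows that rule; the function names are hypothetical, and it does not claim to be how Webinfo itself groups addresses.

```python
def ip_class(address):
    """Return the classful category ("A" to "E") of a dotted-quad IPv4 address."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {address}")
    first = octets[0]
    if first < 128:
        return "A"   # leading bit 0: 8-bit network number
    if first < 192:
        return "B"   # leading bits 10: 16-bit network number
    if first < 224:
        return "C"   # leading bits 110: 24-bit network number
    if first < 240:
        return "D"   # leading bits 1110: multicast
    return "E"       # leading bits 1111: reserved


def network_number(address):
    """Return the network part of a class A, B or C address, e.g. "130.14"."""
    prefix_octets = {"A": 1, "B": 2, "C": 3}.get(ip_class(address))
    if prefix_octets is None:
        return None   # class D and E addresses have no network/host split
    return ".".join(address.split(".")[:prefix_octets])
```

For example, `network_number("130.14.10.5")` gives `"130.14"`, the NLM class B network mentioned above.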


Frequently Accessed Documents

This section shows the most frequently accessed documents on the server. The column titled "Frequency" lists the number of times each document was accessed in the time period summarized, and the column titled "URL" lists the URL for the document.

Some documents do not have a URL listed, but are referred to by a symbolic name, defined in the Webinfo configuration file. Several documents can be mapped to one symbolic name.

Where a URL is listed, it is anchored to the document it references. Note that there is no assurance that the URL will still be active, since this is essentially historical information. Where a symbolic name is listed, it is anchored to a page describing the mapping between symbolic names and actual URLs.


Active Hosts

This section shows the most active hosts accessing this server. The column titled "Frequency" lists the number of times each host accessed a document on this server during the time period summarized, and the column titled "Host" contains the names of the hosts.


Webinfo URL Equivalents

This page shows how Webinfo maps URLs to symbolic names. Any access to a URL that matches the (Perl-format) regular expression in the "From" column is recorded by Webinfo as an access to the corresponding symbolic name in the "To" column. Multiple URLs can be mapped to the same symbolic name. Each symbolic name is anchored to a document specified in the configuration file, so a symbolic name can stand for a group of documents while linking to the most important one, such as the table of contents.

Although one URL cannot be mapped to multiple symbolic names, you can collect aggregate data across several symbolic names by naming them consistently and using wildcard matching with the "Restrict by URL" option on the user-designed plot page.
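The From/To mapping can be sketched as an ordered list of regular expressions, where the first match decides a URL's symbolic name. The table entries below are hypothetical, first-match-wins is an assumption (it is one way to guarantee a URL gets at most one name), and Python's `re` syntax covers common Perl patterns but is not identical to Perl's.

```python
import re
from collections import Counter

# Hypothetical equivalence table: (From regex, To symbolic name).
EQUIVALENTS = [
    (re.compile(r"^/manual/"), "User Manual"),
    (re.compile(r"\.gif$"), "Inline Images"),
]


def symbolic_name(url, table=EQUIVALENTS):
    """Map a URL to the first symbolic name whose regex matches, else keep the URL."""
    for pattern, name in table:
        if pattern.search(url):  # unanchored search, like Perl's =~ operator
            return name
    return url


def tally(urls):
    """Count accesses per symbolic name (or per URL when no pattern matches)."""
    return Counter(symbolic_name(u) for u in urls)
```

Here `/manual/fig.gif` is counted under "User Manual" because that pattern is listed first, so the order of the table matters when patterns overlap.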


Connection

A connection is an individual logged request to the HTTP server. Currently this can be a GET, POST, or HEAD request. Note that each inline image within a document results in a separate connection. Thus, retrieving a document with two inline images produces three connections (one for the document and one for each image) if the client does no caching. In one sense this count is an underestimate, since clients can cache images and documents: if the user revisits a cached document, the request never reaches the server and is not logged. In another sense, it overestimates how useful the server is, since directory and image accesses are given the same weight as document accesses.
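The sketch below shows how logged lines might be turned into counted connections, assuming the server writes the NCSA Common Log Format. The log format, the regular expression, and the function name are assumptions; lines rejected here are simply not counted and are not meant to reproduce Webinfo's own definition of a "malformed" request.

```python
import re

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]*)\] "([^"]*)" (\d{3}) (\S+)$')
COUNTED_METHODS = {"GET", "POST", "HEAD"}


def parse_connection(line):
    """Return (host, method, path) if the line is a countable request, else None."""
    match = CLF_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    host, _date, request, _status, _size = match.groups()
    fields = request.split()
    # Accept "METHOD path" (HTTP/0.9) or "METHOD path HTTP/1.x".
    if len(fields) not in (2, 3) or fields[0] not in COUNTED_METHODS:
        return None
    return host, fields[0], fields[1]
```

Fed the three log lines for a page and its two inline images, this counts three connections, matching the example above.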


Host

A useful first cut at measuring how many people are accessing your documents is the number of individual hosts accessing your server. While there is not generally a one-to-one correspondence between hosts and users, especially with proxies, firewalls and the like, it is a good start. Webinfo keeps track of this host information by sorting and merging (gigantic) lists of hostnames.
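Sorting and merging matters because host counts do not add up across periods: a host active on several days should count once for the month. The sketch below merges already-sorted hostname lists and drops duplicates in a single pass; the function name is hypothetical, it keeps the result in memory, and it is not Webinfo's actual implementation, which the text does not show.

```python
import heapq


def merge_hosts(*sorted_lists):
    """Merge already-sorted host-name lists into one sorted list without duplicates."""
    merged = []
    for name in heapq.merge(*sorted_lists):  # streams the inputs in sorted order
        if not merged or merged[-1] != name:  # equal names arrive adjacent
            merged.append(name)
    return merged
```

Merging one day with hosts `a.example.edu` and `b.example.com` and another with `b.example.com` and `c.example.gov` yields three hosts for the period, not four.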