Webinfo Information
Overview
Webinfo provides graphical and textual summaries of HTTP server data
for browsers both with and without forms support.
Browsers without Forms Support
- Plot connections by day for the previous three months
- Plot hosts by day for the previous three months
- Provide a textual summary of the previous day's data
Browsers with Forms Support
- Plot connections or hosts over arbitrary time periods by day
(365 days maximum)
- Plot connections or hosts over arbitrary time periods by month
(10 years maximum)
- Plot connections or hosts over arbitrary time periods by year
- Restrict data plotted by host domain, host name (by regular expression),
or URL name (by regular expression).
- Provide a textual summary of data for any day, month, or year, or since
the beginning of service.
Other Features
- Mapping of URLs to logical names by regular expression
- Transparent GIFs
First select the radio button for the type of plot desired,
and then choose from the menu items for that type. You can
currently plot either connections or hosts by day, month, or year
for certain pre-selected intervals. If you want a different interval,
choose the last radio button and proceed to another form.
Plots are made using gnuplot and have the MIME type image/gif.
They may be viewed either externally or inline, depending on your browser.
Enter the starting and ending dates for the range you wish to plot.
Enter the day of the month in the text box, and choose a month and year
from the pop-up menus.
If the starting and ending days are not specified, they will be set
to the first and last days of the months specified, respectively.
The day field is ignored if you choose to plot usage by month, and both
the day and month fields are ignored if you choose to plot usage by year.
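The defaulting rules above can be sketched as follows. This is a minimal illustration, not Webinfo's code: the function `resolve_range`, the `granularity` strings, and the convention that a blank day field arrives as `None` are all hypothetical.

```python
import calendar
from datetime import date
from typing import Optional, Tuple


def resolve_range(
    start_year: int, start_month: int, start_day: Optional[int],
    end_year: int, end_month: int, end_day: Optional[int],
    granularity: str,
) -> Tuple[date, date]:
    """Turn form fields into a concrete, inclusive (start, end) date range.

    Hypothetical helper: a blank day (None) defaults to the first day of the
    starting month and the last day of the ending month. The day is ignored
    for "month" plots; day and month are both ignored for "year" plots.
    """
    if granularity == "year":
        # Only the years matter, so cover whole calendar years.
        return date(start_year, 1, 1), date(end_year, 12, 31)
    # monthrange() returns (weekday of the 1st, number of days in the month).
    last_of_end = calendar.monthrange(end_year, end_month)[1]
    if granularity == "month":
        # Only years and months matter, so cover whole months.
        return date(start_year, start_month, 1), date(end_year, end_month, last_of_end)
    # Daily plot: fill in any blank day fields.
    start = date(start_year, start_month, start_day if start_day is not None else 1)
    end = date(end_year, end_month, end_day if end_day is not None else last_of_end)
    return start, end
```

For example, leaving both day fields blank with February 1995 as the start and end month yields 1 February through 28 February 1995. `calendar.monthrange` handles leap years, so February 1996 would end on the 29th.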
Choose whether to plot by the number of individual connections or the
number of hosts connecting.
Choose whether to plot usage by day, month, or year over the
time range specified above.
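Grouping logged requests into day, month, or year buckets before plotting might look like the sketch below. It assumes each request has already been reduced to a (date, host) pair. The names `bucket_key` and `plot_data` are hypothetical. Connections are counted once per request. A host is counted once per bucket, so a host active on many days of a month still counts once for that month.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, Set, Tuple

Bucket = Tuple[int, ...]


def bucket_key(d: date, granularity: str) -> Bucket:
    """Return the plotting bucket a date falls into (hypothetical helper)."""
    if granularity == "day":
        return (d.year, d.month, d.day)
    if granularity == "month":
        return (d.year, d.month)
    if granularity == "year":
        return (d.year,)
    raise ValueError(f"unknown granularity: {granularity!r}")


def plot_data(entries: Iterable[Tuple[date, str]],
              granularity: str) -> Tuple[Counter, Dict[Bucket, int]]:
    """Return (connections per bucket, unique hosts per bucket)."""
    connections: Counter = Counter()
    hosts: Dict[Bucket, Set[str]] = defaultdict(set)
    for d, host in entries:
        key = bucket_key(d, granularity)
        connections[key] += 1   # every logged request counts
        hosts[key].add(host)    # a host counts once per bucket
    return connections, {k: len(v) for k, v in hosts.items()}
```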
You are restricted to plotting a maximum of one year's worth of daily data and
ten years' worth of monthly data.
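A check for these limits could be written as follows. This is a sketch under two assumptions: "one year" means 365 days (the figure given in the overview), and both endpoints of the range count toward the limit. The function `limit_error` is hypothetical. Yearly plots have no stated maximum, so they always pass.

```python
from datetime import date
from typing import Optional

MAX_DAILY_DAYS = 365       # one year's worth of daily data
MAX_MONTHLY_MONTHS = 120   # ten years' worth of monthly data


def limit_error(start: date, end: date, granularity: str) -> Optional[str]:
    """Return an error message if the request exceeds the limits, else None.

    Hypothetical helper; start and end are inclusive.
    """
    if end < start:
        return "ending date precedes starting date"
    if granularity == "day":
        days = (end - start).days + 1  # both endpoints included
        if days > MAX_DAILY_DAYS:
            return f"{days} days requested; at most {MAX_DAILY_DAYS} allowed"
    elif granularity == "month":
        months = (end.year - start.year) * 12 + (end.month - start.month) + 1
        if months > MAX_MONTHLY_MONTHS:
            return f"{months} months requested; at most {MAX_MONTHLY_MONTHS} allowed"
    # Yearly plots have no stated maximum.
    return None
```

Under these assumptions, 1 January 1995 through 1 January 1996 is rejected because it spans 366 days. The whole of 1995 (365 days) is accepted.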
You can choose to restrict the data plotted in one of four ways.
- No Restrictions
  All connections or hosts will be plotted; no restrictions apply.
- Restrict by Domain
  Select the organizational and geographical domains for which you
  wish to see usage data. Only connections or hosts from the selected
  domains will be plotted.
- Restrict by Host
  Enter the name of a host for which you wish to see usage data. Only
  connections from the specified host will be plotted, but you may use
  the wildcards '?' (any one character) and '*' (any sequence of
  characters) to plot connections for more than one host at a time.
  Note: Plots created using this option can take a long time to
  generate, on the order of minutes if a lot of data is being plotted.
- Restrict by URL
  Enter the name of a URL for which you wish to see usage data, or make
  a selection from the menu, which lists the symbolic names assigned by
  Webinfo to certain URLs. Only accesses to the specified URL will be
  plotted, but you may use the wildcards '?' (any one character) and
  '*' (any sequence of characters) to plot accesses to multiple URLs.
  The match is case sensitive. Some tips:
  Note: Plots created using this option can take a long time to
  generate, on the order of minutes if a lot of data is being plotted.
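The '?' and '*' wildcards described above can be translated into an anchored regular expression, as in this sketch. It assumes three things: the whole host name or URL must match (not just a substring), no other characters are special, and the function names are hypothetical. URLs are matched case sensitively, as stated above. Matching host names case-insensitively is an assumption based on DNS names being case-insensitive.

```python
import re


def wildcard_to_regex(pattern: str, case_sensitive: bool = True) -> "re.Pattern[str]":
    """Compile a '?'/'*' wildcard pattern (hypothetical helper)."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")            # exactly one character
        elif ch == "*":
            parts.append(".*")           # any sequence, possibly empty
        else:
            parts.append(re.escape(ch))  # everything else is literal
    flags = re.DOTALL if case_sensitive else re.DOTALL | re.IGNORECASE
    return re.compile("".join(parts), flags)


def matches(pattern: str, text: str, case_sensitive: bool = True) -> bool:
    """True if the whole of text matches the wildcard pattern."""
    return wildcard_to_regex(pattern, case_sensitive).fullmatch(text) is not None
```

For example, `matches("*.nlm.nih.gov", host, case_sensitive=False)` selects every host in that domain. `matches("/pub/*.html", url)` selects HTML files under /pub but not `/pub/INDEX.HTM`. Because every character except the two wildcards is escaped, a '.' in a pattern matches only a literal dot.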
Use the radio buttons and menus to select whether to generate a
textual summary for a specific day, month, or year, or for the entire
history of the service.
Select whether to view a specific number of the most commonly
accessed URLs, or all of those whose number of accesses exceeds a threshold
value, which you can specify.
Finally, select whether to view a specific number of the most active
hosts, or all of those whose number of accesses exceeds a threshold value,
which you can specify.
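Both choices (a fixed number of top entries, or every entry above a threshold) can be served from one frequency table. The sketch below assumes that "exceeds" means strictly greater than. It also assumes counts have already been tallied per URL or per host. The function `select_rows` is hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def select_rows(counts: Counter, top_n: Optional[int] = None,
                threshold: Optional[int] = None) -> List[Tuple[str, int]]:
    """Return (item, frequency) rows for the summary, most frequent first.

    Hypothetical helper: pass exactly one of top_n or threshold.
    """
    if (top_n is None) == (threshold is None):
        raise ValueError("specify exactly one of top_n or threshold")
    if top_n is not None:
        return counts.most_common(top_n)
    # "exceeds" is read here as strictly greater than the threshold
    return [(item, n) for item, n in counts.most_common() if n > threshold]
```

`Counter.most_common` lists items with equal counts in the order they were first encountered, so ties keep a stable order in the report.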
This section shows overall usage for all domains.
There are two columns of output, one labelled "Connections" and the other
"Hosts". The former is the number of individual requests logged by the
HTTP server, while the latter is the number of unique hosts (an
underestimate of the number of users) connecting to this service.
The line labelled "Total" shows the usage for all domains.
The line labelled "Malformed" shows the number of malformed requests
that were logged by this server.
These are mainly generated by bugs in Web clients and servers.
Currently the following conditions account for a "malformed" request:
- Missing or erroneous time information in the log file entry (date, month, year, etc.)
- Missing or incorrect host name in the client field
- Request URL is malformed (has 8-bit characters or is not fully specified)
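The three conditions above might be checked as in the following sketch. It makes several assumptions. Log entries are taken to be in the Common Log Format (`host ident user [date] "request" status bytes`). "Not fully specified" is read as a URL that is missing or does not begin with '/'. Malformed entries are excluded from the Total line. The helper names `parse` and `summarize` are hypothetical.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple

# Assumed Common Log Format: host ident user [date] "request" status bytes
CLF = re.compile(r'(\S*) \S+ \S+ \[([^\]]*)\] "([^"]*)" \S+ \S+')
# Simplified host name or dotted-quad check (an assumption, not Webinfo's rule)
HOST = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*(\.[A-Za-z0-9][A-Za-z0-9-]*)*")


def parse(line: str) -> Optional[Tuple[str, datetime, str]]:
    """Return (host, time, url), or None if the entry is malformed."""
    m = CLF.fullmatch(line.rstrip("\r\n"))
    if m is None:
        return None
    host, stamp, request = m.groups()
    if not HOST.fullmatch(host):
        return None                      # missing or incorrect client host
    try:
        when = datetime.strptime(stamp.split()[0], "%d/%b/%Y:%H:%M:%S")
    except (ValueError, IndexError):
        return None                      # missing or erroneous time
    fields = request.split()
    if len(fields) < 2:
        return None                      # no URL in the request
    url = fields[1]
    if any(ord(c) > 127 for c in url) or not url.startswith("/"):
        return None                      # 8-bit or not fully specified URL
    return host, when, url


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Tally the Total line (connections, unique hosts) and malformed entries."""
    connections, malformed = 0, 0
    hosts = set()
    for line in lines:
        entry = parse(line)
        if entry is None:
            malformed += 1
            continue
        connections += 1
        hosts.add(entry[0])
    return {"Connections": connections, "Hosts": len(hosts), "Malformed": malformed}
```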
This section shows usage broken down by top-level "organizational" domains.
There are two columns of output, one labelled "Connections" and the other
"Hosts". The former is the number of individual requests logged by the
HTTP server, while the latter is the number of unique hosts (an
underestimate of the number of users) connecting to this service.
This section shows usage broken down by top-level "geographical" domains.
There are two columns of output, one labelled "Connections" and the other
"Hosts". The former is the number of individual requests logged by the
HTTP server, while the latter is the number of unique hosts (an
underestimate of the number of users) connecting to this service.
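Assigning a host to the organizational or geographical table comes down to the last label of its name. In this sketch, three choices are assumptions. The set of organizational domains (com, edu, gov, mil, net, org, int) is one. Treating two-letter labels as country codes is another. Reporting anything else (such as a numeric address with no reverse mapping) as "other" is the third. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

# Assumed set of organizational top-level domains
ORGANIZATIONAL = {"com", "edu", "gov", "mil", "net", "org", "int"}


def classify(host: str) -> Tuple[str, str]:
    """Return (category, top-level domain) for a host name."""
    tld = host.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld in ORGANIZATIONAL:
        return "organizational", tld
    if len(tld) == 2 and tld.isalpha():
        return "geographical", tld   # assumed to be a country code
    return "other", tld


def domain_table(request_hosts: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map each top-level domain to (connections, unique hosts).

    request_hosts holds one host name per logged request.
    """
    connections: Counter = Counter()
    hosts: Dict[str, Set[str]] = defaultdict(set)
    for host in request_hosts:
        _, tld = classify(host)
        connections[tld] += 1
        hosts[tld].add(host.lower())  # DNS names are case-insensitive
    return {tld: (connections[tld], len(hosts[tld])) for tld in connections}
```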
This section shows usage broken down by IP address class for hosts
in the log that had no reverse mapping to a fully qualified domain name,
as determined by the HTTP server.
Class A addresses generally represent networks with large numbers of
hosts; for example, MILNET has the network address 26.
Class B addresses represent medium-sized networks with up to roughly 65,000 hosts.
These are generally university or government institutions,
for example, NLM (U.S. National Library of Medicine)
has the network address 130.14.
Class C addresses represent small networks with at most 254 hosts,
for example, small companies, local libraries, etc.
Class D addresses are used for multicasting.
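Under the classful scheme described above, the class of an address follows from its first octet. Class A covers 0 to 127, B covers 128 to 191, C covers 192 to 223, and D covers 224 to 239. The network address is the first one, two, or three octets respectively. The sketch below illustrates this with the standard `ipaddress` module. It also works out the largest host counts, 2^n - 2 for n host bits, since the all-zeros and all-ones host numbers are reserved. The function names are hypothetical.

```python
import ipaddress

# Largest number of hosts on one network: 2**host_bits - 2
MAX_HOSTS = {"A": 2**24 - 2, "B": 2**16 - 2, "C": 2**8 - 2}


def ip_class(addr: str) -> str:
    """Classify a dotted-quad IPv4 address by its first octet."""
    first = int(ipaddress.IPv4Address(addr)) >> 24
    if first < 128:
        return "A"   # leading bit 0
    if first < 192:
        return "B"   # leading bits 10
    if first < 224:
        return "C"   # leading bits 110
    if first < 240:
        return "D"   # leading bits 1110: multicast
    return "E"       # reserved


def network_part(addr: str) -> str:
    """Return the classful network address, e.g. '130.14' for class B."""
    octets = str(ipaddress.IPv4Address(addr)).split(".")
    prefix = {"A": 1, "B": 2, "C": 3}.get(ip_class(addr))
    return ".".join(octets[:prefix]) if prefix else addr
```

With these rules, MILNET's 26 falls in class A, and an NLM address such as 130.14.1.2 is class B with network address 130.14. The computed maxima are 65,534 hosts for class B and 254 for class C.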
There are two columns of output, one labelled "Connections" and the other
"Hosts". The former is the number of individual requests logged by the
HTTP server, while the latter is the number of unique hosts (an
underestimate of the number of users) connecting to this service.
This section shows the most frequently accessed documents on the
server. The column titled "Frequency" lists the number of times each
document was accessed in the time period summarized, and the column
titled "URL" lists the URL for the document.
Some documents do not have a URL listed, but are referred to by a
symbolic name, defined in the Webinfo configuration file. Several
documents can be mapped to one symbolic name.
Where a URL is listed, it is anchored to the document it references. Note
that there is no assurance that the URL will still be active, since this
is essentially historical information. Where a symbolic name is listed,
it is anchored to a page describing the mapping between symbolic names
and actual URLs.
This section shows the most active hosts accessing this server. The
column titled "Frequency" lists the number of times each host accessed
a document on this server during the time period summarized, and the
column titled "Host" contains the names of the hosts.
This page shows how Webinfo maps URLs to symbolic names. Any access to
a URL that matches the (Perl-format) regular expression in the "From"
column will be recorded by Webinfo as an access to the corresponding
symbolic name in the
"To" column. Multiple URLs can be mapped to the same symbolic name.
The symbolic names are anchored to a document specified in the configuration
file. Thus, a symbolic name can refer to a group of documents, but can
contain a link to the most important one, such as the table of contents.
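The mapping described on this page can be sketched as an ordered list of (pattern, name) rules where the first match wins. That guarantees a URL is never counted under more than one symbolic name. The rules below are invented for illustration, and first-match ordering is an assumption about how Webinfo resolves overlaps. Python's `re` syntax agrees with Perl's for simple patterns like these, but not in every detail.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical mapping table: ("From" regular expression, "To" symbolic name)
MAPPINGS: List[Tuple[str, str]] = [
    (r"^/docs/manual/", "Manual"),
    (r"^/docs/", "Documentation"),
    (r"\.gif$", "Images"),
]
COMPILED = [(re.compile(p), name) for p, name in MAPPINGS]


def symbolic_name(url: str) -> Optional[str]:
    """Return the symbolic name for a URL, or None to keep the raw URL."""
    for pattern, name in COMPILED:
        if pattern.search(url):   # first matching rule wins
            return name
    return None


def tally(urls: Iterable[str]) -> Counter:
    """Count accesses, recording mapped URLs under their symbolic names."""
    return Counter(symbolic_name(u) or u for u in urls)
```

Because `^/docs/manual/` is listed before `^/docs/`, manual pages are recorded as "Manual" rather than "Documentation"; with the rules reversed, every manual page would be absorbed into "Documentation".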
Although one URL cannot be mapped to multiple symbolic names, it is
possible to collect aggregate data from a number of URLs by using
consistent naming and wildcard matching with the "Restrict by URL"
option on the user-designed plot page.
A connection is an individual request to the HTTP server that is logged.
This can currently be a GET, POST, or HEAD request.
Note that each inlined image within a document results in
an independent connection.
Thus, retrieving a document with two inlined images
would produce three connections
(one for the document and one for each image)
if there were no client caching.
In one sense this is an underestimate, since clients can cache
images and documents: if the user reloads a cached document, the request
is not logged on the server.
In another sense, it overestimates how useful the server is,
since directory and image accesses are given the same
weight as document accesses.
A useful first cut at measuring how many people are accessing your documents
is the number of individual hosts accessing your server.
While there is not generally a one-to-one correspondence between
hosts and users,
especially with proxies, firewalls and the like, it is a good start.
Webinfo keeps track of this host information by sorting and merging (gigantic)
lists of hostnames.
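One way to realize a sort-and-merge host count is sketched below. Each day's host names are sorted and de-duplicated once. Longer periods are formed by merging the sorted daily lists and dropping adjacent repeats, so a host seen on several days is counted once. This is an illustration under that assumption, not Webinfo's implementation. It keeps lists in memory, but `heapq.merge` consumes its inputs lazily, so the same merge step could read sorted lists from files.

```python
import heapq
from typing import Iterable, Iterator, List

_NOTHING = object()  # sentinel that never equals a host name


def dedupe_sorted(items: Iterable[str]) -> Iterator[str]:
    """Drop adjacent duplicates from an already sorted sequence."""
    previous = _NOTHING
    for item in items:
        if item != previous:
            yield item
            previous = item


def daily_list(hosts: Iterable[str]) -> List[str]:
    """Sort one day's host names and remove duplicates."""
    return list(dedupe_sorted(sorted(hosts)))


def merge_lists(*sorted_lists: List[str]) -> List[str]:
    """Merge sorted, de-duplicated lists into one sorted list of unique hosts."""
    return list(dedupe_sorted(heapq.merge(*sorted_lists)))
```

Summing daily counts would overstate the number of hosts over a longer period. Two days with two hosts each may involve only three distinct hosts, which is what `len(merge_lists(day1, day2))` reports.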