http://www.eit.com/software/getstats/getstats.html (World Wide Web Directory, 06/1995)
Getstats Documentation
getstats.c, version 1.2
What is getstats?
Getstats (formerly called getsites) is a versatile World-Wide Web server log analyzer. It takes the log file from your CERN, NCSA, Plexus, GN, MacHTTP, or UNIX Gopher server and spits back all sorts of statistics. It was originally written based on suggestions from Tony Sanders (sanders@bsdi.com).
Getstats is not in the public domain. However, you may distribute it or its supporting files and documents in any format freely, as long as it is unchanged from the original distribution. Getstats may not be resold by itself or as part of a package without express permission of its author (me). Enjoy!
-- Kevin Hughes, kevinh@eit.com
- Major report types
- Usage
- Command-Line Options
- Internal Options
- The Getstats Form Interface
- Frequently Asked Questions
- Distribution and Compilation Notes
- Other Statistics Gathering Programs
Major Report Types
Currently there are twelve major types of reports this program can produce. You can use as many options as you like to create combinations of reports.
1. getstats -c (concise report)
HTTP Server General Statistics
Server: http://www.eit.com/ (NCSA)
Local date: Fri Feb 11 18:17:07 PM PST 1994
Covers: 02/09/94 to 02/11/94 (3 days).
All dates are in local time.
Requests last 7 days: 4495
New unique hosts last 7 days: 358
Total unique hosts: 358
Number of HTML requests: 1854
Number of script requests: 472
Number of non-HTML requests: 2169
Number of malformed requests (all dates): 5
Total number of all requests/errors: 4500
Average requests/hour: 90.2, requests/day: 2164.7
Running time: 11 seconds.
This basic set of statistics is always output when getstats runs. Using the -c option will only produce this statistics paragraph.
2. getstats -m (monthly report)
HTTP Server Monthly Statistics
Covers: 10/30/93 to 11/08/93 (9 days).
All dates are in local time.
Each mark (#) represents 1000 requests.
----------------------------------------------
Oct (10/30/93): 569 : #
Nov (11/04/93): 2 :
...
The -m option will produce a monthly report of server use. The dates in the report are the first day of reported activity for that month.
3. getstats -w (weekly report)
HTTP Server Weekly Statistics
Covers: 12/28/93 to 01/27/94 (32 days).
All dates are in local time.
Each mark (#) represents 500 requests.
----------------------------------------------
Week of 12/27/93: 1878 : ###
Week of 01/03/94: 5606 : ###########
Week of 01/10/94: 23287 : ##############################################
...
The -w option will produce a weekly report of server use. The dates in the report are always the Monday of that particular week.
4. getstats -ds (daily summary)
HTTP Server Daily Summary
Covers: 12/28/93 to 01/27/94 (32 days).
All dates are in local time.
Each mark (#) represents 1000 requests.
----------------------------------------------
Mon: 16018 : ################
Tue: 13219 : #############
Wed: 9904 : #########
...
The -ds option produces a daily summary, which shows the aggregate number of requests for a particular day of the week.
5. getstats -d (daily report)
HTTP Server Daily Statistics
Covers: 12/28/93 to 01/27/94 (32 days).
All dates are in local time.
Each mark (#) represents 100 requests.
----------------------------------------------
12/28/93 (Tue): 88 :
12/29/93 (Wed): 258 : ##
12/30/93 (Thu): 591 : #####
12/31/93 (Fri): 775 : #######
...
The -d option produces a daily report, which shows the number of requests per day and the date.
6. getstats -hs (hourly summary)
HTTP Server Hourly Summary
Covers: 12/28/93 to 01/27/94 (32 days).
All dates are in local time.
Each mark (#) represents 200 requests.
----------------------------------------------
midnite: 1266 : ######
1:00 am: 1206 : ######
2:00 am: 1238 : ######
...
The -hs option produces an hourly summary, which shows the aggregate number of requests for a particular hour.
7. getstats -h (hourly report)
HTTP Server Hourly Statistics
Covers: 12/28/93 to 01/27/94 (32 days).
All dates are in local time.
Each mark (#) represents 20 requests.
----------------------------------------------
12/28/93 (Tue)
3:00 pm: 39 : #
4:00 pm: 12 :
5:00 pm: 36 : #
...
The -h option produces an hourly report, which shows the number of requests per hour, the day of the week, and the total number of requests for each day.
8. getstats -f (full report)
HTTP Server Full Statistics
Sorted by number of requests.
Covers: 12/28/93 to 01/27/94 (32 days).
All dates are in local time.
# of Requests : Last Access (M/D/Y) : Hostname
----------------------------------------------
6994 : 01/26/94 : kmac
1751 : 01/26/94 : eitech
1096 : 01/27/94 : jhvh-1
...
The -f option tells getstats to create a full report sorted by host name (and IP address). Use the -fa option to make a full report sorted by the number of accesses, the -fd option to create a full report sorted by the last access date, or the -fb option to create a full report sorted by the number of bytes transferred.
9. getstats -r (request report)
HTTP Server Request Statistics
Sorted by number of requests, 1560 unique requests.
Covers: 12/28/93 to 01/27/94 (32 days).
All dates are in local time.
# of requests : Last Access (M/D/Y) : Request
----------------------------------------------
4260 : 01/27/94 : /eit.home.html
3330 : 01/27/94 : /graphics/stripe.bottom.gif
2831 : 01/27/94 : /graphics/ball.black.gif
...
The -r option tells getstats to create a report of requests sorted by the request name. Use the -ra option to sort by accesses, -rd to sort by the last access time, -rb to sort by the number of bytes transferred, and -rf to sort by individual file sizes.
10. getstats -dn (domain report)
HTTP Server Domain Statistics
1 level, sorted by domain name, 22 unique domains.
Covers: 02/09/94 to 02/10/94 (2 days).
All dates are in local time.
# reqs : # uniq : Last Access (M/D/Y) : Domain
----------------------------------------------
180 : 28 : 02/10/94 : (numerical domains)
27 : 1 : 02/10/94 : .at
28 : 3 : 02/10/94 : .au
22 : 2 : 02/10/94 : .ca
...
The -dn option generates a domain report, sorted by domain name. Use -da to sort by the number of requests, -dd to sort by last access date, -db to sort by the number of bytes transferred, or -du to sort by the number of unique domains. The unique domain number is the total number of unique sites under a domain. In the example above, for instance, a total of 3 unique sites came from the .au domain.
11. getstats -dt (directory tree report)
HTTP Server Tree Report
Covers: 12/28/93 to 01/07/94 (12 days).
All dates are in local time.
# of Requests : Last Access (M/D/Y) : Dir/File
----------------------------------------------
55 : 01/07/94 : /reports
51 : 01/07/94 : /ht93
562 : 01/07/94 : /demos
487 : 01/07/94 : /asiceda
...
The -dt option generates a directory tree report, which cannot be sorted. The number of requests and last request date are displayed for each directory and file. The request count for a directory is the number of requests for that directory plus the sum of all requests for the files and subdirectories under it.
If you find this report is empty, try using the -dr option without specifying a directory. This will tell getstats to make a tree report without verifying that the files and directories reported in the log file actually exist.
12. getstats -e (file) (error report)
HTTP Server Error Report (All Dates)
----------------------------------------------
kmac [Thu Dec 30 23:20:21 1993] get / foo
kmac [Thu Dec 30 23:20:37 1993] get foo /
kmac [Thu Dec 30 23:20:55 1993] get http://www.eit.com/ foo
-e generates a report of all malformed (or ignored) requests for all dates in the order they were encountered in the log file. If a filename is given as the argument to the option, bad requests will be appended to an error file, where they can be analyzed later.
getstats -a (all reports)
The -a option will produce all of the above reports, with list reports sorted by the number of accesses, if possible. If you want a report sorted another way, however, specify the correct option after the -a flag.
example: getstats -a -fb
This will create all reports sorted by number of requests, with the exception of the full report, which is sorted by byte traffic, and the error report, which must be specified on the command line.
Usage
usage: getstats [-C,-N,-P,-G,-A,-O], -M, -c, -m, -w, -ds, -d, -hs, -h,
-e ["file"], -a, -dt, [-f,-fa,-fd,-fb], [-r,-ra,-rd,-rb,-rf],
[-dn,-da,-dd,-db,-du], -dl #, -df "file",
-sa "string", -ss "string", -sr "string", -sp "string",
-sd "string", -sh "string", -sw "string",
-b, -i, -ip, -p, -ht, -t #, -dr ["dir"], -l "file"
options: No option gives the default report.
-C, -N, -P, -G, -A, -O
: use CERN, NCSA, Plexus, GN, MacHTTP,
or UNIX Gopher server log format
-M : use common logfile format
-c, -m, -w, -ds, -d, -hs, -h, -e, -a
: concise, monthly, weekly, daily summary, daily, hourly summary,
hourly, error, and all reports
-f, -fa, -fd, -fb : full report
: sorted by address, accesses, date, or bytes
-r, -ra, -rd, -rb, -rf : file request report
: sorted by request, accesses, date, bytes, or file size
-dn, -da, -dd, -db, -du : domain report
: sorted by domain, accesses, date, bytes, or unique domains
-dl : number of domain levels to report
-df : file to look up domain codes from
-dt : directory tree report
-sa, -ss, -sr, -sp : filter log by "string"
: only addresses, skip addresses, only reqs, skip reqs
-sd : report entries with date "m/d/y"
-sh : report entries with hour "h"
-sw : report entries with day "day"
-b : add byte traffic statistics to all reports
-i : take input from standard input
-ip : look up all IP addresses
-p : display progress meter
-ht : produce HTML output
-t : take top # lines of list reports
-dr : root Web/Gopher directory
-l : logfile to use
docs: http://www.eit.com/software/getstats/getstats.html
To see the usage, run getstats with a -z option.
Command-Line Options
-dr directory, -l file, -C, -N, -P, -G, -A, -O, -M (root directory, logfile and log type)
The -dr option tells getstats what your root Web or Gopher directory is. This information is needed in order to determine byte statistics.
example: getstats -dr "/usr/local/www"
The -l option specifies the log file to use. The -C, -N, -P, -G, -A, and -O options will tell getstats to expect the log file to be in either CERN, NCSA, Plexus, GN, MacHTTP, or UNIX Gopher format.
example: getstats -l my.ncsa.log -N
example: getstats -l my.plexus.log -P
The -M option tells getstats to expect the log to be in the "common" log file format, a standard that was agreed upon by the World-Wide Web community. If your log looks something like this:
www.eit.com - - [01/Jan/1994:10:30:00 +0000] "GET /test.html" 200 123
then your server is using the format. Include the option in the command line:
example: getstats -l cern.common.log -C -M
example: getstats -l ncsa.common.log -N -M
If you use NCSA's httpd 1.2 or later, or CERN's server version 1.16b or later, you are probably using the common log format.
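If you're not sure whether your log is in the common format, the bracketed timestamp is the giveaway. This sketch checks one sample line (the one quoted above) with grep; on a real log you would pipe the log file in instead:

```shell
# A sample line in the "common" log file format (from the example above):
line='www.eit.com - - [01/Jan/1994:10:30:00 +0000] "GET /test.html" 200 123'

# Count lines carrying the common-format timestamp, e.g. [01/Jan/1994:10:30:00].
# If most of your log lines match, add -M to your getstats command line.
printf '%s\n' "$line" | grep -cE '\[[0-9]{2}/[A-Z][a-z]{2}/[0-9]{4}(:[0-9]{2}){3}'
# prints 1
```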
-sa string, -ss string, -sr string, -sp string (address and request masks)
The -sa option will report only (IP or name) addresses matching the conditions in the mask string. The -ss option will skip addresses matching the string conditions. -sr will report only requests matching the string conditions, and -sp will skip requests matching the string conditions.
For these four case-insensitive string masks, the following rules apply:
- You can use asterisks as wildcards in specifying the string, at each or both ends of the string.
- Masks without wildcards must match exactly.
- You can make lists of masks, separated by commas.
example 1: getstats -sa "*.com, *.edu" -ss "*.eit.com"
example 2: getstats -sr "*.html, *.gif" -sp "*secret*"
example 3: getstats -sa "*.*" -sp "/internal/demo.html"
example 4: getstats -ss "dopey, sneezy, grumpy"
- Sites from ".com" and ".edu" domains are reported, but anything coming from "eit.com" is skipped.
- HTML and GIF requests are reported, but any requests with the word "secret" in them are skipped.
- Addresses with a period in them are reported (this is useful for filtering local machines), and any requests exactly matching "/internal/demo.html" are skipped.
- Requests coming from "dopey", "sneezy", and "grumpy" are skipped.
-sd string, -sh string, -sw string (date, hour, and day masks)
The -sd option reports requests matching the date conditions in the string. In a similar way, the -sh option filters by hour, and the -sw option filters by the day of the week.
For these three case-insensitive string masks, the following rules apply:
- The date string must be in the format "m/d/y". Ranges in brackets, such as "[xx-xx]", can be applied, and asterisks can be used as wildcards, but only in place of an entire field.
- Other valid specifications for the date string: today, yesterday, thisweek, lastweek, thismonth, and lastmonth.
- The hour string must contain numbers from 0 to 23, with 0 being the hour from midnight to 1 am, and 23 being 11 pm. Ranges such as "xx-xx" can be specified.
- The day string must be a three-letter day of the week, such as "mon", "tue", and "wed". Ranges such as "xxx-xxx" can be specified.
- Other valid specifications for the day string: weekends and weekdays.
example 1: getstats -sd "*/[4-10]/93" -sh "6" -sw "mon"
example 2: getstats -sd "1/[5-30]/1993" -sh "5-17" -sw "wed-sun"
example 3: getstats -sd "[1-5]/*/[91-94]" -sh "-17" -sw "-thu"
example 4: getstats -sd "lastweek" -sh "15-" -sw "tue-"
- Only accesses that occurred from the 4th to the 10th on any month in 1993 on a Monday at 6 pm will be reported.
- Requests from January 5th to the 30th, 1993 are reported, as are requests from 5 am to 2 pm, on Wednesdays to Sundays.
- Requests from January to May 1991 to 1994 are reported, as are requests from midnight to 2 pm, on Mondays to Thursdays.
- Requests from last week are reported, as are requests from 3 pm to 11 pm, on Tuesdays to Sundays.
-i, -p, -ht (input and output)
The -i option will allow you to take input from standard input, so you can do things such as piping lines into getstats. This option is disabled for VMS platforms.
example: tail -100 my.log | getstats -i
-p displays a progress meter, so you can see where getstats is in its processing. The -ht option will report all reports in a single-page HTML format, with appropriate links to server support URLs and this documentation.
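The -ht option lends itself to automation. As a sketch, this hypothetical crontab entry rebuilds an HTML usage report nightly at 1 am; the paths to getstats, the log, and the output page are assumptions, so substitute your own:

```
# min hour day month weekday  command
0 1 * * * /usr/local/bin/getstats -a -ht -l /usr/local/www/logs/access.log > /usr/local/www/usage.html
```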
-dl number, -df file (domain report options)
The -dl option allows you to specify how many domain levels to report. For instance, with the number of levels set at 1, the domain report would look something like:
27 : 1 : 02/10/94 : .at
28 : 3 : 02/10/94 : .au
22 : 2 : 02/10/94 : .ca
18 : 2 : 02/10/94 : .ch
640 : 37 : 02/10/94 : .com
With the number of levels set at 2:
27 : 1 : 02/10/94 : .at
27 : 1 : 02/10/94 : .at.ac
28 : 3 : 02/10/94 : .au
28 : 3 : 02/10/94 : .au.edu
22 : 2 : 02/10/94 : .ca
18 : 1 : 02/10/94 : .ca.nrc
4 : 1 : 02/10/94 : .ca.uwo
The -df option allows you to specify a file with descriptions for domain codes, to make the domain report a bit easier to understand. A file with descriptions is available by FTP at ftp.eit.com, in the /pub/web.software/getstats directory.
With domain descriptions:
27 : 1 : 02/10/94 : Austria (.at)
28 : 3 : 02/10/94 : Australia (.au)
22 : 2 : 02/10/94 : Canada (.ca)
18 : 2 : 02/10/94 : Switzerland (.ch)
645 : 37 : 02/10/94 : US Commercial (.com)
13 : 2 : 02/10/94 : Germany (.de)
-b, -ip, -t number (byte reporting, IP lookup, and top lines)
The -b option will report byte traffic statistics in all reports, and an extra column for byte traffic will be added to list reports. In addition, the average number of bytes transferred per hour and per day will be added to the statistics paragraph. The byte count for each file is determined by taking the size of the requested file, whose path is derived from the top web directory and the request. Like many log analyzers, however, getstats cannot report byte statistics for:
- Redirected URLs
- The sizes of files in personal HTML directories
- The sizes of scripts
- Old files that have been changed
- Requests that have been rejected (perhaps due to access control)
Counts for these requests, or counts for requests that can't be determined, are reported as zero.
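Since getstats takes byte counts from file sizes on disk, the cases above simply come out as zero. If your log is in the common format, the actual number of bytes transferred is recorded in the last field of each line, so you can cross-check totals with a one-liner like this sketch (the two sample lines are hypothetical):

```shell
# Sum the last field (bytes transferred) across common-format log lines.
printf '%s\n%s\n' \
  'host - - [01/Jan/1994:10:30:00 +0000] "GET /a.html" 200 123' \
  'host - - [01/Jan/1994:10:31:00 +0000] "GET /b.gif" 200 877' |
awk '{ total += $NF } END { print total }'
# prints 1000
```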
The -ip option will make getstats attempt to look up host names from IP addresses. This feature slows processing time, but is useful in analyzing logs from servers that don't look up IP addresses, such as the CERN server.
The -t option allows you to specify how many top lines to report in full, request, and domain reports. Using this, you can easily generate "Top 10" lists and short summaries.
Internal Options
Every option available from the command line can also be hardcoded into getstats, so you don't have to type a gigantic series of options every time!
Getstats also has a number of options that are not available from the command line and must be set in the source code:
- The root directory of your web or gopher tree, the URL for your server, and aliases for empty and slash ("/") requests.
- The mark character for graphs, and the number of requests and bytes per mark for different reports. The default settings have been tuned for light server use, or about 200 to 1,000 requests per day.
- The character length to truncate graph reports and request reports to.
- The option to display all dates in local or GMT time (the default is GMT time), and what time zone the logfile uses.
- Whether or not you want to show numerical domains in the domain report.
- Whether or not you want to report hours, minutes, and seconds of the last access time in non-graph reports.
- Whether or not you wish to show files and their related statistics, as well as directories, in the directory tree report.
- Whether or not to check the actual files to see if they exist in the directory tree report.
- For HTML request reports, whether or not to display requests as selectable URLs.
- If you have a GN server, whether you want to report both Gopher and HTML requests or not.
- Whether or not you want to display an image in your HTML reports (like the one on this page).
- How dates are to be displayed - either as M/D/Y or D/M/Y.
The Getstats Form Interface
Thanks to Brian Behlendorf (brian@gw.wired.com), there's now a pretty nifty HTML form interface to getstats. With it, you can choose to have the results mailed back to you or displayed within your Web browser.
You can get the C program statform.c and the HTML form statform.html by FTP at ftp.eit.com, in the /pub/web.software/getstats directory.
To start using the form:
- Compile getstats with the "CGI" option (you'll need to edit the source code for this).
- Edit statform.c, specifying the default log file to analyze, the root directory of your Web or Gopher tree, and the path to your getstats program.
- Compile statform.c and put the executable where you store your other CGI programs.
- Edit statform.html and specify the URL that will execute the statform program.
- Make sure both statform and getstats are executable by your server.
- Load up statform.html and have fun!
Frequently Asked Questions
- Getstats crashes and burns when I run it.
- First, double-check that you've specified the correct format for your log file. If it's in the "common" format, add the -M option to the command line. If all else fails and you think there may be a bad line in your log file, try to narrow down the section of the log that makes getstats crash, and email it to me so I can test it out.
- How can I use getstats on multiple log files?
- A good idea is to use the -i option on previously archived logs, for instance:
zcat log.*.Z | getstats -i
- Will getstats be rewritten in perl?
- It might be rewritten in perl when I learn perl! However, if someone would like to donate their time towards writing a perl version, please let me know so there's no duplication of effort.
- How do I run getstats as a CGI script?
- First, uncomment the "#define CGI" line in the source code and compile. To call getstats from a URL with options, use a question mark after the program name, plus signs ("+") for spaces, and "%22" for quotes in the command line, such as:
Normal example: getstats -a -fb
URL example: http://www.eit.com/cgi-bin/getstats?-a+-fb
Normal example: getstats -ss "*.com"
URL example: http://www.eit.com/cgi-bin/getstats?-ss+%22*.com%22
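The encoding can be scripted. This hypothetical helper applies only the two substitutions described above (spaces become "+", double quotes become "%22") and makes no attempt at full URL encoding:

```shell
# Turn a getstats command tail into the query-string form shown above.
# Only spaces and double quotes are handled -- a sketch, not a full encoder.
encode() {
    printf '%s' "$1" | sed -e 's/ /+/g' -e 's/"/%22/g'
}

encode '-ss "*.com"'
# prints -ss+%22*.com%22
```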
- Why is my directory tree report empty?
- If an actual root Web directory is specified, nothing will be reported if getstats can't find the files. Use the -dr option with no arguments to generate a report if no physical directory exists.
- Why are some files reported as directories in my tree report?
- Unless you tell getstats where the actual files exist, it can only make a "best guess" as to whether a file is a directory or not. Use the -dr option with your root Web directory to tell it where your files are. See the above documentation for more details.
- The reported errors from my "common" format log file look OK to me!
- The second-to-last field on each line is the status code the server returned. Codes outside the "2xx" range mean some sort of error occurred. You can get a list of these codes at http://info.cern.ch/hypertext/WWW/Protocols/HTTP/HTRESP.html.
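If you'd like to inspect those entries yourself, the status code's fixed position makes this easy to script. This sketch (with two hypothetical sample lines) prints only the common-format lines whose status code falls outside the "2xx" range:

```shell
# Print common-format lines whose status code (the second-to-last field)
# does not start with "2".
printf '%s\n%s\n' \
  'host - - [01/Jan/1994:10:30:00 +0000] "GET /ok.html" 200 123' \
  'host - - [01/Jan/1994:10:31:00 +0000] "GET /gone.html" 404 0' |
awk '$(NF-1) !~ /^2/'
# prints only the 404 line
```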
- Can I run getstats with my Windows NT server?
- It appears that the EMWACS Windows NT server uses the CERN format. For their server, include the -C option in the command line.
Distribution and Compilation Notes
This program compiles fine under gcc and runs under SunOS 4.1.3 (Solaris 1.1) with no problems. Of course, you'll have to define the internal options before you compile it.
You can get the program by FTP at ftp.eit.com, in the /pub/web.software/getstats directory. Some files you'll need for compiling under VMS, as well as the getstats icon (which you can use in your HTML reports), are there as well.
Other environments under which getstats has reportedly compiled and run with no problems are Solaris 2.3, DEC Ultrix, DEC OSF/1 V1.3 on Alpha AXP, SGI Irix 4.0, AIX, NeXTstep, HPUX, A/UX, and VMS (with or without UCX). If you have compiled successfully on another platform or experience problems, please let me know.
Some nice examples of what people have done with getstats (with or without statform) are below:
Wanted!
- Code to generate graphs (via gnuplot and/or pbm utilities) either internally or from completed reports.
- Suggestions for features that are most needed for administrators.
- Ways I can speed up getstats and/or make it more efficient.
- Comments, improvements, patches, and suggestions are greatly appreciated - you can send them all to kevinh@eit.com.
Other Statistics Gathering Programs
Of course, getstats doesn't do everything, and depending on your needs you may want to look into the following programs:
- WWWstat, a full-featured NCSA server log analyzer written in Perl.
- wusage, a C program that can generate graphs and nifty graph icons along with basic CERN and NCSA log statistics.
- WebReport, a Perl NCSA log analyzer that comes with NCSA's httpd.
- gnlog, a Perl GN log analyzing program.
- glog, a Gopher log analyzer that can sort statistics in many ways. This and other Gopher utility software is available on the boombox.micro.umn.edu Gopher.
Last update: 5/18/94