Site-Info

Site-Info: Manual


Contents:

1.   Introduction
1.1. The Structure of Site-Info
2.   Installation
2.1. Server Setup
3.   Usage
3.1. File Information
3.2. The Index Database
3.3. The Author Database
3.4. The Host Database
3.5. The External Index Database
3.6. The Forms-Mail Interface
3.7. Reports

1. Introduction

Site-Info recursively browses the document directory tree of your WWW server. It collects a set of information about each file and stores it in the index database, which is maintained only by the package itself. Information about authors and maintainers of documents that is found along the way is stored in an author database, which can be edited by the site administrator. A summary of index and author data, containing only a subset of the information in the index database, is stored in the external index database. The contents of this database can be transmitted to other hosts running Site-Info, creating a common external index for those hosts. A host database stores information about the hosts involved in the transmission of index data.

1.1. The Structure of Site-Info

Site-Info consists of six Perl scripts: two take care of general tasks (config, report) and four service the individual databases (index, author, external, host). The distribution also contains some HTML pages: an administration page with hyperlinks to the scripts and index pages, and two pages for searching the index or the external index. All other pages are generated by the individual scripts and either stored in the index directory or displayed directly.
Site-Info uses a simple Unix database-manager library (e.g. dbm, ndbm, gdbm) to store all of its information. To store several pieces of information about an author or a host, it joins these pieces together with asterisks (*). If you use asterisks in the titles of your HTML documents or in other information that is stored in the Site-Info databases, the package will probably get confused. Another thing to remember is that Site-Info translates ampersands (&) entered in the "email" field of the author form into at signs (@). I introduced this feature because NCSA Mosaic has problems with the @ character on some X terminals.
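
The asterisk-joined record format can be sketched as follows. This is only a Python illustration of the idea, not the package's Perl code; the helper names pack_record, unpack_record and normalize_email, the field order and the sample values are assumptions made for this example.

```python
SEPARATOR = "*"


def normalize_email(value):
    """Translate ampersands to at signs, as described for the email field."""
    return value.replace("&", "@")


def pack_record(fields):
    """Join the field values into one string, e.g. for use as a dbm value."""
    return SEPARATOR.join(fields)


def unpack_record(record):
    """Split a stored record back into its field values."""
    return record.split(SEPARATOR)


# Hypothetical author record: full name, email, homepage.
record = pack_record(["Gonzo", normalize_email("gonzo&yoyo.org"), "/~gonzo/"])
print(unpack_record(record))

# An asterisk inside a value produces extra fields on unpacking,
# so the record no longer lines up with the expected field order.
broken = pack_record(["Gonzo *the great*", "gonzo@yoyo.org", "/~gonzo/"])
print(len(unpack_record(broken)))
```

The second record unpacks into more than three fields, which is the kind of confusion the warning about asterisks refers to.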

All the scripts (except config.pl) contain a set of functions which can be invoked by passing a command to them within the query-string of the calling url. Some of the scripts will take additional information out of the query string, others get it by interpreting fill-out-forms.
The general format for calling a site-info function thus looks like this:
http://server/site-info-root/script?function[?additional data]
You'd have to insert the correct values for your server, of course: server is to be replaced with the hostname of your www-server, site-info-root is the location of the site-info script directory on your server (probably /cgi-bin/site-info).
To update the index of files you would use this url:
http://www.yoyo.org/cgi-bin/site-info/index.pl?create
To get a report on the author "gonzo", you'd use:
http://www.yoyo.org/cgi-bin/site-info/report.pl?author?gonzo
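
If you want to build or take apart such URLs in a program of your own, the calling convention can be sketched like this. The function names site_info_url and parse_query are hypothetical; only the URL layout itself is taken from the description above.

```python
def site_info_url(server, root, script, function, data=None):
    """Build a URL of the form http://server/root/script?function[?data]."""
    url = "http://{}{}/{}?{}".format(server, root.rstrip("/"), script, function)
    if data is not None:
        url += "?" + data
    return url


def parse_query(query):
    """Split a query string into (function, data); data is None if absent."""
    function, separator, data = query.partition("?")
    return function, (data if separator else None)


root = "/cgi-bin/site-info"
print(site_info_url("www.yoyo.org", root, "index.pl", "create"))
print(site_info_url("www.yoyo.org", root, "report.pl", "author", "gonzo"))
print(parse_query("author?gonzo"))
```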

Normally you don't have to call these functions by hand - everything you need is assembled on the administration and index pages, so using the package is as easy as selecting a hyperlink with your mouse.
If you'd like to experiment, though, you can find a list of all the functions supported by a module at the beginning of its source code.
References to functions in the text below will only contain the function and (if necessary) additional parameters:
function[?data]
You'll have to substitute the leading part (server and site-info-root) yourself.


2. Installation

Site-Info comes with an installation script (written in Perl) that takes care of putting the scripts and pages into the directories you specify. It also creates a common include file which contains the configuration for all the scripts (location of server and document directories, etc.).
To run the installation script you just type perl install.pl or you can alternatively edit the first line of install.pl to reflect the location of perl on your site and run it as an executable.
The script will ask you about the locations of several programs on your system. Default values are printed in [brackets] and can be accepted by typing <return> at the prompt.
Next, the script will prompt you for the location of the server-root directory and documents-root directory. The server-root is the directory where the http-server (and its cgi-bin-directory) reside. The documents-root is the directory which contains the html-pages of your server.
Next, the script will ask where to put the Site-Info scripts and the html-pages that come along with the package. You should place the scripts in a separate subdirectory of cgi-bin. You should also put the index scripts in a separate directory from the administration scripts (as suggested by the default values) to easily set up document protection.
After the configuration is complete, the install script will print a summary of the information you entered and ask if everything is correct. You can interrupt installation at this point by entering anything other than the default (y). Otherwise the script will create the directories, create the config file and copy the individual scripts and pages into their directories.
Afterwards the databases are created and installation is complete.

Problems: The installation script guesses the location of programs on your system by parsing the output of the "whereis" command (the first executable found in its output is taken as the default). If you experience strange behaviour, use the which <program> command to find out which version of the program you actually use, and enter that path instead of the default.
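
The guessing rule ("first executable in the whereis output") can be sketched roughly as below. This is an illustration in Python, not the code of install.pl; the function name guess_program, the sample output line and the stand-in executable test are assumptions.

```python
import os


def guess_program(whereis_output, is_executable=None):
    """Return the first executable path in one line of whereis output, or None."""
    if is_executable is None:
        def is_executable(path):
            return os.path.isfile(path) and os.access(path, os.X_OK)
    # whereis prints "name: path1 path2 ..."; drop the leading "name:" label.
    _, _, paths = whereis_output.partition(":")
    for path in paths.split():
        if is_executable(path):
            return path
    return None


# Hypothetical whereis line; the lambda stands in for a real file check.
sample = "perl: /usr/share/man/man1/perl.1 /usr/local/bin/perl /usr/bin/perl"
print(guess_program(sample, is_executable=lambda p: "/bin/" in p))
```

The guess depends only on the order in which whereis lists the paths, whereas which follows your search path. That is why the two can disagree when several versions of a program are installed.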
Sun-Users should use the following instead of the defaults:

2.1. Server Setup

Before you can run Site-Info you will have to modify your http-Server setup:

3. Usage

3.1. File Information

When updating the index database, Site-Info gathers information on each HTML document it encounters within your document directory tree. The information that is always collected is: title, owner (= maintainer), size and date of last modification.
You can supply additional information to Site-Info by including special tags within your html-documents, which are invisible to html-browsers. The tags are implemented as html-comments and have a common format:
<!--tag: value>
The following tags are implemented: You don't have to use any of these tags; Site-Info will run well without them. It is just a means to provide additional information.
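
To see how tags in the <!--tag: value> format could be picked out of a document, here is a minimal sketch. It is not Site-Info's own parser; the function name extract_tags is hypothetical, and the tag names "expires" and "keywords" in the sample are placeholders chosen for illustration, not necessarily tags the package implements. The pattern also accepts a regular --> comment ending, which is an assumption.

```python
import re

# Matches <!--tag: value> and also <!--tag: value-->.
TAG_PATTERN = re.compile(r"<!--\s*([A-Za-z_]+)\s*:\s*(.*?)\s*(?:--)?>")


def extract_tags(html):
    """Return a dict mapping tag names to values found in comment tags."""
    return {name.lower(): value for name, value in TAG_PATTERN.findall(html)}


page = (
    "<title>Home</title>\n"
    "<!--expires: 31.12.94>\n"
    "<!--keywords: www, index-->\n"
    "<!-- just a note: hello -->\n"
)
print(extract_tags(page))
```

Ordinary comments whose text before the colon is more than a single word, like the last line of the sample, are not treated as tags.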

For the hierarchical index you can provide a "heading" for each directory of your server. To do this, just create a file named .info (with a leading dot) in each directory, containing one line of text describing the directory.
To exclude a complete directory from the index you can create an empty file named .noindex in that directory. To exclude a whole directory tree, create a file named .notree in the topmost directory of that tree.
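
The effect of the three control files on a directory walk can be sketched like this. The function name indexable_dirs is hypothetical, and the sketch assumes that .noindex hides only the directory it is in while its subdirectories are still visited; the text above does not say so explicitly.

```python
import os


def indexable_dirs(root):
    """Yield (directory, heading) for every directory that should be indexed."""
    for dirpath, dirnames, filenames in os.walk(root):
        if ".notree" in filenames:
            dirnames[:] = []  # prune the walk: skip this whole tree
            continue
        dirnames.sort()  # visit subdirectories in a stable order
        if ".noindex" in filenames:
            continue  # skip this directory only (assumption, see above)
        heading = ""
        if ".info" in filenames:
            # .info holds one line of text describing the directory.
            with open(os.path.join(dirpath, ".info")) as handle:
                heading = handle.readline().strip()
        yield dirpath, heading
```

Calling list(indexable_dirs("/path/to/documents-root")) would give the directories for the hierarchical index together with their headings; the path shown is a placeholder.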

3.2. The Index Database

The index database is created automatically after the installation of Site-Info. You can update it by selecting the appropriate hyperlink on the administration page of Site-Info. The index database includes a keyword file which contains only the information necessary to "grep" through it. Whenever you update the index database, a set of html-pages is generated containing the sorted Index (hierarchically, alphabetically, by date) along with an updated list of authors. The local index information is also updated into the external index database.
It is also possible to create partial indices for subdirectories on your server. You can do this by opening the HTML page "index_partial.html" which resides in your index directory. To do it by hand, you have to call index.pl?palpha?dir where dir is the top directory of the tree you want to index. You can replace alpha with hiera or date to get a hierarchical or chronological index (instead of an alphabetical one).
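
The three orderings and the restriction to a subtree can be pictured with a small sketch. The entry layout (title, path, modified), the function name build_index and the sort direction for dates are assumptions; only the names alpha, hiera and date come from the text above.

```python
SORT_KEYS = {
    "alpha": lambda entry: entry["title"].lower(),  # alphabetical by title
    "hiera": lambda entry: entry["path"],           # hierarchical by path
    "date": lambda entry: entry["modified"],        # chronological, oldest first
}


def build_index(entries, order="alpha", topdir=None):
    """Return entries sorted by the given order, optionally below topdir only."""
    if topdir is not None:
        prefix = topdir.rstrip("/") + "/"
        entries = [e for e in entries if e["path"].startswith(prefix)]
    return sorted(entries, key=SORT_KEYS[order])


# Hypothetical index entries; "modified" is a plain number for brevity.
entries = [
    {"title": "Zebra", "path": "/docs/a.html", "modified": 3},
    {"title": "apple", "path": "/docs/sub/b.html", "modified": 1},
    {"title": "Mango", "path": "/other/c.html", "modified": 2},
]
for entry in build_index(entries, "date", topdir="/docs"):
    print(entry["title"], entry["path"])
```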

During installation, a directory is created which contains the local and external index files. Three index files (alphabetical, hierarchical and chronological) reside in this directory and are automatically updated every time you update the index database (this is done using the <!--index>-tag described above). That way you have a complete index of your server right from the start. Of course you can edit the index files - only the index lines within them will be replaced during the next update. You can also create new index pages with full or partial indices using the <!--index>-tag.

3.3. The Author Database

The author database is updated automatically whenever the index database is updated. You can edit it by selecting the appropriate hyperlink on the administration page. The information stored for each author comprises: username, full name, email, homepage, permissions and password. The permissions include the right to edit the author and host databases and to exchange index information with different hosts. The password is only required if any of the permissions are set, which should not be the case for most authors in the database.
You should restrict permission to edit the databases to the administration of your site, if possible only to yourself. The permissions to exchange index data with different hosts will be discussed below.
To change the author database you have to identify yourself with your Site-Info username and password. The package will check if your username has been granted permission to edit the database.
When editing an entry you can leave the password fields empty which will leave the password for that author unchanged. As an administrator you can change an author's password without having to enter his old password. An author can change his own password by using the author_passwd.html page - he will have to identify himself with his username and old password, though.
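
The password rules described in this section can be summarised in a short sketch. It is not the logic of the actual author script; the function name update_author, the dictionary layout and the representation of permissions as a list are assumptions, and the password is kept as plain text only to keep the example short.

```python
def update_author(entry, form):
    """Return an updated copy of an author entry from submitted form values."""
    updated = dict(entry)
    for key in ("fullname", "email", "homepage", "permissions"):
        if key in form:
            updated[key] = form[key]
    # An empty password field leaves the stored password unchanged.
    if form.get("password"):
        updated["password"] = form["password"]
    # A password is only required if any permission is set.
    if updated.get("permissions") and not updated.get("password"):
        raise ValueError("a password is required when permissions are set")
    return updated


admin = {"fullname": "Gonzo", "password": "old", "permissions": ["edit"]}
print(update_author(admin, {"fullname": "Gonzo G.", "password": ""}))
```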

3.4. The Host Database

The host database contains information about all hosts that have transmitted their index data to your server (either directly or indirectly through other hosts). Additionally you can add hosts that you want to send your index data to. To do this or to edit an entry in the host database, select the appropriate links in the administration page (you have to identify yourself with your username and password before you are allowed to change the database).
The host database contains the following information about each host: description, url, location of the Site-Info scripts, location of the index homepage and permission to send to that host.

3.5. The External Index Database

The external index database contains a subset of information about files on your server as well as files from different servers. The local information is updated every time you update your local index database. Whenever the external index database is modified, an updated keyword file and several HTML pages containing the sorted indexes are generated. The external index contains another database which stores the directory structure of all hosts that take part in the external index, along with the directory descriptions (see above).

Transmitting data to a host:
To transmit data to another host you have to enter it into the host database, along with the correct locations (Site-Info scripts and index homepage), and give permission to send to it. You have to give yourself permission to send index data in the author database. The receiving host has to create an entry for you in its author database which allows you to receive index data on that host.
Select the appropriate link from the administration page and identify yourself with your local Site-Info username and password. Choose the host you want to send to from the selection box (if it doesn't appear, you forgot to set the send permission in the host database) and press the submit button. A script will prepare the data for sending and will present it as a set of four textareas (using the forms interface). You should not edit the textareas. To actually send the data, identify yourself with your Site-Info username and password from the host you want to send to and press the submit button. If the action was successful, you will receive a screen indicating the entries that were added to the remote host's database.

Receiving data from another host:
Receiving is analogous to sending. The sending host has to have your server in its host database, along with the send permission set. You have to provide an entry in the author database that is valid for receiving data on your host and sending data on the other host. You can either send the data yourself from the other host (if you are registered on that host), or let someone from the other host send the data to you (as described above).

3.6. The Forms-Mail Interface

You can use the forms-mail interface within your HTML documents by including a direct call to it, e.g. as a comment button. The script author_form.pl in your Site-Info script directory takes its input as a query string out of the calling URL to generate a form. The user then types his name, email and comment into the form and submits it, thus calling the author_mail.pl script which mails the comments to the appropriate author. You can only send mail this way to authors who are registered in your author database.

3.7. Reports

Site-Info keeps track of its use in a separate logfile. You can turn off the logging of events by commenting out the appropriate line in config.pl. The log is a plain-text file that can be viewed either directly from Unix or with an HTML3-compliant browser (e.g. Arena, Mosaic 2.5), using the link on the administration page.
Site-Info can compile a list of documents which have expired according to the expiry tags included in them. To see this list, just select the corresponding link on the administration page.
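
How such a list of expired documents might be compiled is sketched below. The function name expired_documents is hypothetical, and the DD.MM.YY date format is only an assumption for this example; the manual does not specify the format used by the expiry tags.

```python
from datetime import date, datetime

DATE_FORMAT = "%d.%m.%y"  # assumed format, e.g. 27.11.94


def expired_documents(expiry_by_path, today):
    """Return the paths whose expiry date lies before the given day."""
    expired = []
    for path, text in sorted(expiry_by_path.items()):
        try:
            expires = datetime.strptime(text, DATE_FORMAT).date()
        except ValueError:
            continue  # ignore values that do not parse as a date
        if expires < today:
            expired.append(path)
    return expired


# Hypothetical expiry values collected from documents.
collected = {"/a.html": "01.01.94", "/b.html": "31.12.99", "/c.html": "soon"}
print(expired_documents(collected, date(1994, 11, 27)))
```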


This manual is still under construction. If you'd like to comment on it please use
this form or send an email to ganslan@uni-muenster.de.


Site-Info V1.1a, updated Sun, 27.11.94, 18:28 MET.