SiteStats: Documentation & Support

DOCUMENTATION & DEFINITIONS

SiteStats™ runs every day and calculates the reports displayed here. To understand the reports fully, it helps to understand the logic behind the program and the data it reads. First, let's discuss the logs the program reads. A web log is a line-by-line record of every single "hit" to your website. Each line in the log describes one request and contains the following data:

Date and Time of the request
Remote IP Address making the request (the visitor's IP address)
Authenticated User (if the visitor had to enter a username and password, this is the username)
Request Method
URL Path Requested (the web page requested in the form of a file path on the server)
Query String (if any)
Request Protocol (HTTP Version)
Status returned to the visitor
Bytes (the amount of data sent back to the visitor)
Referring Web Page (if any)
User Agent (the visitor's browser and operating system, with versions)
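The document does not say which log format the server writes. As a minimal sketch, assuming the common Apache/NCSA "combined" format (which carries every field listed above, with method, path, query string and protocol packed into the quoted request line), one line could be split into those fields like this. The names `parse_line` and `LogEntry` are hypothetical, not part of SiteStats.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumed format (Apache/NCSA "combined"), e.g.:
# 203.0.113.7 - alice [10/Oct/2023:13:55:36 -0700] "GET /index.html?x=1 HTTP/1.1" 200 2326 "https://example.com/" "Mozilla/5.0"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

@dataclass
class LogEntry:
    time: datetime
    ip: str
    user: Optional[str]
    method: str
    path: str
    query: str
    protocol: str
    status: int
    bytes_sent: int
    referrer: Optional[str]
    agent: str

def parse_line(line: str) -> Optional[LogEntry]:
    """Split one log line into the fields listed above; None if it doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    # The request target holds both the URL path and the query string.
    path, _, query = m["target"].partition("?")
    return LogEntry(
        time=datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        ip=m["ip"],
        user=None if m["user"] == "-" else m["user"],
        method=m["method"],
        path=path,
        query=query,
        protocol=m["protocol"],
        status=int(m["status"]),
        bytes_sent=0 if m["bytes"] == "-" else int(m["bytes"]),  # "-" means no body
        referrer=None if m["referrer"] in ("", "-") else m["referrer"],
        agent=m["agent"],
    )
```

A "-" in the user, bytes, or referrer position is the combined format's placeholder for "not present", which is why those fields are normalized above.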

Each and every file requested from the server is an individual "hit". A request for a web page is one hit, but a web page that contains images generates multiple hits. For example, if a web page has 5 pictures on it, a request to view that page generates 6 hits (5 for the pictures and 1 for the web page, or HTML file, itself). When a visitor looks at a web page, it is called an "Access" or "Page View", regardless of the number of hits it generated. This is why we are mainly concerned with Accesses (Page Views) rather than hits.
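The arithmetic in the 5-picture example can be checked with a small sketch. It assumes, for illustration only, that image files are the only requests that are not page views; the function name `count_hits_and_page_views` is hypothetical.

```python
from typing import Iterable, Tuple

# Assumption for this sketch: only image files are treated as "not a page".
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".bmp", ".tif")

def count_hits_and_page_views(paths: Iterable[str]) -> Tuple[int, int]:
    """Every requested file is a hit; only the page itself is a page view."""
    hits = page_views = 0
    for path in paths:
        hits += 1
        if not path.lower().endswith(IMAGE_SUFFIXES):
            page_views += 1
    return hits, page_views

# One visit to a hypothetical page with five pictures on it.
visit = ["/gallery.html"] + [f"/img/photo{i}.jpg" for i in range(1, 6)]
print(count_hits_and_page_views(visit))  # (6, 1)
```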

SiteStats™ divides Accesses (Page Views) into two distinct groups to better reflect the true traffic to the site:

World Accesses: These are all the visitors (by IP address) around the world other than you (or Desyne).
World Accesses are the only ones graphed with the color bars throughout the reports.
Local Accesses: This group consists of your own IP addresses and those of Desyne staff.
Local Accesses are counted once and excluded from all other reports.
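How SiteStats stores the list of local addresses is not described here. A minimal sketch, assuming the local addresses are kept as IP networks and that the log records IP addresses rather than hostnames, could classify a visitor with Python's standard `ipaddress` module. The ranges below are reserved documentation addresses standing in for hypothetical "yours" and "Desyne staff" entries, and `is_local` is a hypothetical name.

```python
import ipaddress
from typing import Iterable

# Hypothetical local networks (documentation ranges, not real SiteStats settings).
LOCAL_NETWORKS = [
    ipaddress.ip_network("198.51.100.0/24"),  # e.g. your office network
    ipaddress.ip_network("203.0.113.42/32"),  # e.g. a single Desyne staff address
]

def is_local(ip: str, local_networks: Iterable = LOCAL_NETWORKS) -> bool:
    """True if the visitor's IP falls inside any local network (a Local Access)."""
    addr = ipaddress.ip_address(ip)  # assumes an IP, not a resolved hostname
    # A membership test between IPv4 and IPv6 objects simply returns False.
    return any(addr in net for net in local_networks)

print(is_local("198.51.100.17"))  # True  -> Local Access
print(is_local("192.0.2.1"))      # False -> World Access
```

Using network ranges rather than a flat list of addresses lets a whole office subnet be marked local with one entry.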

There are three other variables included in every report.

Unique Hosts: These are the IP addresses (or hostnames, when available) of the visitors making the requests.
Errors: These are requests that were not served successfully, for example because the requested file did not exist (404 error), the username or password was missing or incorrect (401 error), or access was forbidden (403 error).
Data Transfer: This is simply the total number of bytes sent back to visitors for all requests to the site.

Now let's look at the actual programming logic behind the calculations and reports. First, the web log is opened and each line is read and processed in order, from oldest to most recent. As each line is read into the program:

HITS are added up. Every line is a hit: images, files, errors, everything, regardless of the status returned.

BYTES are added up: all bytes sent back to visitors, including images, files, errors, everything, regardless of the status.
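The two running totals above can be sketched as follows, assuming each log line has already been reduced to a hypothetical `(path, status, bytes_sent)` tuple. Note that nothing is filtered at this stage: images and failed requests count toward both totals.

```python
from typing import Iterable, Tuple

def total_hits_and_bytes(entries: Iterable[Tuple[str, int, int]]) -> Tuple[int, int]:
    """entries: (path, status, bytes_sent) per log line; nothing is skipped."""
    hits = total_bytes = 0
    for _path, _status, bytes_sent in entries:
        hits += 1                  # every line is a hit
        total_bytes += bytes_sent  # every byte counts, whatever the status
    return hits, total_bytes

log = [
    ("/index.html", 200, 5120),
    ("/logo.gif", 200, 2048),
    ("/missing.html", 404, 512),
]
print(total_hits_and_bytes(log))  # (3, 7680)
```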

After these two calculations, each request (line) is examined. If the request is NOT for a valid HTML page or directory (i.e., the requested file ends with ".ico", ".css", ".js", ".gif", ".jpg", ".png", ".bmp", ".jpeg", or ".tif"), the request is ignored and the line is skipped. This gives a better representation of true traffic to the site, measured by pages viewed rather than by supporting files such as images, stylesheets, and scripts.
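The rule above is a blocklist: a request is skipped only if it ends with one of the listed extensions, so anything else (including a directory such as "/about/") is treated as a page. A minimal sketch, assuming the query string has already been removed from the path and that the extension match is case-insensitive (neither is stated in the original); `is_page_request` is a hypothetical name.

```python
# Extensions listed in the documentation; requests ending in these are skipped.
SKIPPED_EXTENSIONS = (".ico", ".css", ".js", ".gif", ".jpg", ".png", ".bmp", ".jpeg", ".tif")

def is_page_request(path: str) -> bool:
    """True for HTML pages and directories, False for the skipped file types.

    Assumes `path` has no query string and that matching ignores case.
    """
    return not path.lower().endswith(SKIPPED_EXTENSIONS)

for p in ["/index.html", "/", "/styles/site.css", "/img/Photo.JPG"]:
    print(p, is_page_request(p))
```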

Lines that get this far are page requests, and the status code is checked next. If the request was NOT successful (status code >= 400), it increments the ERRORS count and is excluded from all further consideration (the program skips to the next line).
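Because the extension check happens before the status check, a missing image (for example a 404 on a ".gif") is skipped as a non-page request and never reaches the ERRORS count. The sketch below illustrates that ordering, assuming each line is a hypothetical `(path, status)` pair; `count_errors` is a hypothetical name.

```python
from typing import Iterable, Tuple

SKIPPED_EXTENSIONS = (".ico", ".css", ".js", ".gif", ".jpg", ".png", ".bmp", ".jpeg", ".tif")

def count_errors(entries: Iterable[Tuple[str, int]]) -> int:
    """Count failed page requests, following the order of checks described above."""
    errors = 0
    for path, status in entries:
        if path.lower().endswith(SKIPPED_EXTENSIONS):
            continue  # non-page request: skipped before the status is examined
        if status >= 400:
            errors += 1  # unsuccessful page request
    return errors

log = [
    ("/index.html", 200),
    ("/old-page.html", 404),
    ("/missing.gif", 404),  # skipped as an image, so not counted
    ("/private/", 403),
]
print(count_errors(log))  # 2
```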

What remains are successful, real accesses (page views), which are now divided into the two groups. If the request comes from a local IP address, it increments the LOCAL ACCESSES count and is excluded from all further consideration.

What is left are the WORLD ACCESSES: successful web page requests from visitors other than you and Desyne staff. These are the only data used to calculate all the remaining reports. As mentioned before, this gives a much more accurate representation of real traffic: real visitors requesting real pages from the site.
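Putting the steps together, the per-line logic described above can be sketched as a single pass over the log. This is an illustration, not SiteStats' actual code: it assumes each line is a hypothetical `(ip, path, status, bytes_sent)` tuple, that local addresses are given as network strings, and that Unique Hosts are counted from World Accesses only (the original does not say this explicitly).

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Iterable, Set, Tuple

SKIPPED_EXTENSIONS = (".ico", ".css", ".js", ".gif", ".jpg", ".png", ".bmp", ".jpeg", ".tif")

@dataclass
class DailyStats:
    hits: int = 0
    bytes_sent: int = 0
    errors: int = 0
    local_accesses: int = 0
    world_accesses: int = 0
    unique_hosts: Set[str] = field(default_factory=set)

def process_log(entries: Iterable[Tuple[str, str, int, int]],
                local_networks: Iterable[str]) -> DailyStats:
    stats = DailyStats()
    nets = [ipaddress.ip_network(n) for n in local_networks]
    for ip, path, status, size in entries:  # oldest to most recent
        stats.hits += 1                      # every line is a hit
        stats.bytes_sent += size             # all bytes, regardless of status
        if path.lower().endswith(SKIPPED_EXTENSIONS):
            continue                         # not a page view
        if status >= 400:
            stats.errors += 1                # failed page request
            continue
        if any(ipaddress.ip_address(ip) in net for net in nets):
            stats.local_accesses += 1        # your own or Desyne staff
            continue
        stats.world_accesses += 1            # a real visitor viewing a real page
        stats.unique_hosts.add(ip)           # assumption: hosts from World Accesses only
    return stats

log = [
    ("192.0.2.10", "/index.html", 200, 5000),
    ("192.0.2.10", "/logo.gif", 200, 1500),
    ("192.0.2.20", "/about.html", 200, 3000),
    ("192.0.2.20", "/gone.html", 404, 300),
    ("198.51.100.5", "/index.html", 200, 5000),
]
s = process_log(log, ["198.51.100.0/24"])
print(s.hits, s.bytes_sent, s.errors, s.local_accesses, s.world_accesses, len(s.unique_hosts))
# 5 14800 1 1 2 2
```

Each `continue` mirrors the "excluded from all further consideration" wording: once a line is classified as a non-page request, an error, or a Local Access, none of the later counters see it.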