There's a lot of confusion about web server statistics and exactly what information web servers log. This is an attempt to explain the basics.
When someone using a web browser requests a page from a web site, the web server typically records information about the request. The resulting web server logs (which are accessible to the person running the site) are the basis for most web site statistics. Note: the following section (about the structure of logs) is somewhat technical. You might want to skip to "What does it mean?".
Logs
Here is a sample line from the logs for my book reviews. (It's from an Apache server, but others should be similar.)
192.168.1.1 - - [24/Oct/2000:18:11:12 -0400] "GET /h/The_Great_Human_Diasporas.html HTTP/1.0" 200 5326 "http://www.google.com/search?q=cavalli-sforza&btnG=Google+Search" "Mozilla/4.5 [en] (WinNT; U)"
192.168.1.1
This is the network address (here an IP address) of the machine the request came from. The two "-" fields that follow it in the log line are for the remote logname and the authenticated user name, and are usually empty.
[24/Oct/2000:18:11:12 -0400]
This is the date and time of the request, with the server's offset from UTC.
"GET /h/The_Great_Human_Diasporas.html HTTP/1.0"
This is the request itself: the method (GET), the path of the document asked for, and the protocol version used.
200 5326
These are the status code the server returned (200 means the request succeeded) and the number of bytes sent in response.
"http://www.google.com/search?q=cavalli-sforza&btnG=Google+Search"
This is the referrer, the page the visitor followed a link from - in this case a Google search for "cavalli-sforza".
"Mozilla/4.5 [en] (WinNT; U)"
This is the user agent: the browser (and platform) that made the request - here Netscape 4.5 on Windows NT.
What does it mean?
There are many analysis packages that will produce statistics from a web server log.

Each line in the log file is a hit or request. Because every image (and stylesheet) is fetched separately, "hits" is totally useless as a marketing/impact measure (it is, however, useful as a technical indicator of how much stress the web server might be under).

A request for an HTML document is a page access. Counting these gives a rough approximation to the number of times pages on the site have been viewed (page views), with some provisos. A site with frames will produce several (three or more) "page" accesses for each actual page view, since the frameset and each frame are separate requests. A proxy server may fetch the page once and then serve it to multiple clients (creating undercounting). And search engines and other automated spiders will often fetch every page on a site - without any of them being viewed by actual people. This can drastically inflate the (effective) page access counts, especially for low-traffic sites with a large number of pages.

The number of unique hosts accessing a site is the number of unique network addresses making requests. This provides a rough approximation to the number of people viewing the site. Again, proxy servers cause undercounting, while someone with a dynamic address connecting at different times will be counted multiple times.

Some analysis software analyses the intervals between series of requests to estimate the number of visits. I don't know much about this, but except for analysis over really long periods, I suspect it won't vary that much from the unique hosts figure.

Figures produced using the same analysis package, on the one site, can be used to track changes over time. Trying to use absolute numbers, however, or comparing statistics from different sites is another matter - it's largely "smoke, mirrors, and spiders" and should be treated accordingly.
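To make the difference between these measures concrete, here is a rough sketch in Python that counts hits, page accesses and unique hosts from a log in the format above. The file name and the test for HTML documents are assumptions for illustration; a real analysis package does far more (filtering out spiders, coping with odd requests, and so on).

# Rough counts of hits, page accesses and unique hosts from an
# Apache-style access log. "access.log" and the ".html" test are
# assumptions for illustration only.
hits = 0
page_accesses = 0
hosts = set()

with open("access.log") as log:
    for line in log:
        parts = line.split()
        if len(parts) < 7:
            continue            # skip malformed lines
        hits += 1               # every request counts as a hit
        hosts.add(parts[0])     # first field is the requesting host
        path = parts[6]         # the path inside the quoted request
        if path.endswith(".html") or path.endswith("/"):
            page_accesses += 1  # only HTML documents count as page accesses

print(hits, "hits")
print(page_accesses, "page accesses")
print(len(hosts), "unique hosts")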
Seasonal variation
Changes in the number of page accesses from month to month may not indicate anything unusual. Traffic on all the sites I run drops off drastically mid-year, during the northern hemisphere holidays, and has done so consistently over the last five years.
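Seasonal patterns like this are easy to see by grouping page accesses by month. A minimal sketch, under the same assumptions as the example above, might look like this:

from collections import Counter

# Count page accesses per month to look for seasonal patterns.
# "access.log" and the ".html" test are illustrative assumptions.
per_month = Counter()

with open("access.log") as log:
    for line in log:
        parts = line.split()
        if len(parts) < 7 or not parts[6].endswith(".html"):
            continue
        # The timestamp field looks like [24/Oct/2000:18:11:12
        month = parts[3].lstrip("[").split("/", 1)[1].split(":", 1)[0]
        per_month[month] += 1   # keyed as e.g. "Oct/2000"

for month, count in per_month.items():
    print(month, count)   # a real report would sort these chronologically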