WOAnalysis Tool v1.0.1
======================

GOAL
====
The tool generates various statistics about web applications, like:
- average request time per specific page (component) name,
- user tracking, i.e. common user paths through the application,
- info on error messages that occurred, and how many users managed to overcome them and proceed to the next page.

ARCHITECTURE
============
The tool is split into several modules:
* Miner - the entry point that ties all the modules together. Run Miner with no parameters to print usage information.
* conf/ - configuration files. The files are written in Ruby (a common practice). They contain info on:
	- what reports to generate and how to configure them,
	- application parameters,
	- output parameters (formats, destinations, etc.).
	Example config files are attached.
* Recording
	- LogManager - parses the Apache log file and builds the data repository in the CacheMachine.
	- CacheMachine - maintains the graph of log entries (HTTP requests) as a group of doubly linked lists. The requests are linked by every available criterion: for example, requests from the same session are linked together, as are requests from the same IP, requests to the same application, etc. This way one can easily reach each request and traverse the data in any order.
* StatModules
	They represent statistical modules. Each statistical module aggregates the data in CacheMachine for its purpose.
	Each statistical module is capable of generating several reports. 
	For example PageStatModule is capable of generating reports specific to pages like:
	- average request time per page,
	- total requests to each page,
	whereas SessionTrackStatModule is capable of generating session-related reports like:
	- individual user tracking (page-by-page),
	- conversion rates,
	- validation fail/success ratio.
	StatModules are hierarchical. SessionTrackStatModule and PageStatModule inherit from StatModule.
* ReportTypes
	They represent the report types generated by StatModules.
	The class hierarchy mirrors the statistical module hierarchy, i.e.
	- Report
	  \__ PageReport
	      \__ PageVisitsReport
	      \__ AvgReqTimeReport
	      \__ ... add custom reports here
	  \__ SessionTrackReport
	      \__ ConversionReport
	      \__ IndividualTracksReport
	      \__ ... add custom reports here
	See the example configuration file conf/ifirma2006full.rb for how this structure is reflected in the config files. Note that the inner hash entries are actual Class objects (e.g. PageVisitsReport).
* ReportGenerator
	It is responsible for generating specific report output files, like HTML or CSV. The reports provide only raw data (2D arrays) and the report generator interprets them.
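
To illustrate the CacheMachine idea described above (requests linked by several criteria at once), here is a minimal Ruby sketch. It is not the tool's actual code: the names Request, LinkedRequestStore, add and each_linked are hypothetical, and only three criteria (IP, session, application) are assumed.

```ruby
# Hypothetical sketch: none of these names come from the WOAnalysis source.
# Each request carries one prev/next link per criterion, so it is a member
# of several doubly linked lists at the same time.
Request = Struct.new(:ip, :session, :app, :page, :prev_links, :next_links)

class LinkedRequestStore
  # Assumed criteria; the real tool may link by more attributes.
  CRITERIA = [:ip, :session, :app].freeze

  def initialize
    @heads = {} # [criterion, value] => first request in that list
    @tails = {} # [criterion, value] => last request in that list
  end

  # Appends a request to the list of every criterion value it matches.
  def add(ip:, session:, app:, page:)
    req = Request.new(ip, session, app, page, {}, {})
    CRITERIA.each do |criterion|
      key = [criterion, req[criterion]]
      tail = @tails[key]
      if tail
        tail.next_links[criterion] = req
        req.prev_links[criterion] = tail
      else
        @heads[key] = req
      end
      @tails[key] = req
    end
    req
  end

  # Walks one list in insertion order, e.g. all requests of one session.
  def each_linked(criterion, value)
    return enum_for(:each_linked, criterion, value) unless block_given?
    node = @heads[[criterion, value]]
    while node
      yield node
      node = node.next_links[criterion]
    end
  end
end
```

With such a structure, a session-oriented module could call store.each_linked(:session, "YVJNJDA") while a page-oriented one walks the same requests by application, without copying any data.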

HINTS
=====
* Before processing a logfile I recommend that you remove all the irrelevant requests (i.e. requests for images, CSS files and the like). It will speed up the processing.
* The WOLogging tool was tested with logfiles of up to 850,000 lines (wc -l). As this tool stores all the requests and their bindings in RAM, it uses a lot of memory (tracing and optimizations may come in the future).
* For faster processing it is recommended to split the logfiles into smaller chunks; the optimum size seems to be around 100-150K lines (e.g. split -l100000 file.log).
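
The first hint (dropping irrelevant requests before processing) could be done with a small Ruby filter such as the sketch below. This script is not part of the tool; the extension list in STATIC_ASSET and the helper names are assumptions and should be adjusted to your application.

```ruby
# Assumption: these extensions mark "irrelevant" requests; adjust as needed.
STATIC_ASSET = /\.(?:gif|jpe?g|png|ico|css|js)(?:\?\S*)?\z/i

# Extracts the request path from the quoted "%r" field of a log line,
# e.g. "GET /img/logo.gif HTTP/1.1" gives "/img/logo.gif".
def request_path(line)
  line[/"(?:[A-Z]+) (\S+)/, 1]
end

# A line is kept when it has a parsable request that is not a static asset.
# Lines without a parsable request are dropped as well.
def relevant?(line)
  path = request_path(line)
  !path.nil? && path !~ STATIC_ASSET
end

# Copies the relevant lines from input to output; returns how many were kept.
def filter_log(input, output)
  kept = 0
  input.each_line do |line|
    next unless relevant?(line)
    output.write(line)
    kept += 1
  end
  kept
end
```

A possible invocation, with placeholder file names:
File.open("file.log") { |i| File.open("filtered.log", "w") { |o| filter_log(i, o) } }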

	 
CONVENTIONS
===========

The current version of the scripts supports the following log format:
  LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D \"%{x-webobjects-customenv}e\"" combined

The x-webobjects-customenv format is:
  <AppName>;<SessionId>;<ComponentName>;[<arg>=<val>;]*

Example:
  Store;YVJNJDA;org.pm.store.ReceiptPage;customer=foo@bar.com;qty=1;payment=cc;
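
A field in this format can be split into its parts with a few lines of Ruby. The sketch below only illustrates the format; CustomEnv and parse_custom_env are hypothetical names, not the parser the tool uses.

```ruby
# Hypothetical helper illustrating the x-webobjects-customenv format.
CustomEnv = Struct.new(:app_name, :session_id, :component_name, :args)

# Splits "<AppName>;<SessionId>;<ComponentName>;[<arg>=<val>;]*" into parts.
# The optional arg=val pairs end up in a Hash of strings.
def parse_custom_env(field)
  app, session, component, *pairs = field.split(";")
  args = pairs.each_with_object({}) do |pair, result|
    next if pair.empty?
    key, value = pair.split("=", 2) # split on the first "=" only
    result[key] = value
  end
  CustomEnv.new(app, session, component, args)
end

env = parse_custom_env("Store;YVJNJDA;org.pm.store.ReceiptPage;customer=foo@bar.com;qty=1;payment=cc;")
puts env.component_name   # org.pm.store.ReceiptPage
puts env.args["payment"]  # cc
```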

Notes:
  - There is an older version 0.8.1 available.


In case of questions / problems, please contact:
  jacek@power.com.pl 
  wojtek@power.com.pl




