The RapidMiner Web Mining Extension provides the Read Server Log operator to read web server log files. Unfortunately, it is not straightforward to bring the operator to life based solely on the help text. Moreover, the operator does not work in the current version (5.1.4) of the extension.
First, you need some data to try the operator on. Download the following ZIP file from the KDnuggets website: http://www.kdnuggets.com/web_mining_course/kdlog.zip. The archive contains an anonymized Apache HTTP Server log file (da-11-16.ipntld.log). More information about the data can be found here. Place the log file in a separate directory and provide that directory's path in the log dir parameter.
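If you prefer to script this step, the following is a minimal sketch that unpacks the log file from an already downloaded archive into a directory of its own. The function name `extract_log` and the paths used in the usage note are illustrative assumptions, not part of RapidMiner or the extension.

```python
import zipfile
from pathlib import Path


def extract_log(zip_path, log_dir, member_suffix=".log"):
    """Extract every member ending in member_suffix into log_dir.

    Returns the list of extracted file paths. The name and signature of
    this helper are hypothetical; only the standard library is used.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.endswith(member_suffix):
                continue
            # Drop any folder structure inside the archive so that the
            # log file sits directly in log_dir.
            target = log_dir / Path(info.filename).name
            target.write_bytes(archive.read(info))
            extracted.append(target)
    return extracted
```

For example, `extract_log("kdlog.zip", "weblogs")` would place the log file under `weblogs/`, and that directory is what you would then enter in the log dir parameter.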
Here comes the tricky part. You must provide a configuration file that describes the log file's format in the config file parameter. The operator uses the polliwog Java library to process web server log files. Download the file polliwog-bin-stable-0.7.tar.gz from the project's website. The file apache-combined-log-entry-format.xml in the polliwog-0.7/data/ directory describes the Apache Combined Log Format, in which our log file is stored. Place this file somewhere in your file system and provide its path in the config file parameter.
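To make the Apache Combined Log Format more concrete, here is a small sketch that parses one such line with a regular expression. It is independent of polliwog and its XML file and only illustrates the fields the format contains. The sample line is invented for illustration and is not taken from the KDnuggets log.

```python
import re

# Apache Combined Log Format:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# Invented sample entry (hypothetical host, timestamp, and user agent).
SAMPLE = (
    '192.0.2.1 - - [16/Nov/2001:00:00:01 -0500] '
    '"GET /index.html HTTP/1.0" 200 1024 "-" "Mozilla/4.0"'
)

fields = parse_line(SAMPLE)
print(fields["host"], fields["request"], fields["status"])
```

Running a few lines of your own log through `parse_line` is a quick way to check that the file really is in combined format before pointing the operator at it.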
Now you can run the process, which results in the following error:
Yes, it is a bug that I have already reported to the developers. The error is caused by some missing classes. Fortunately, the problem can be fixed quite easily. Copy jdom-1.0.jar from polliwog's 3rd-party-jars/ directory to the lib/ directory of your RapidMiner installation. (That may require administrator privileges.) The operator will work fine after restarting RapidMiner. (I have tested it on Linux.)
For a permanent fix, the developers must add the contents of these two JAR files to the Web Mining Extension.
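Until then, the manual copy step can be scripted. The sketch below copies a JAR into the lib/ directory of a RapidMiner installation and skips the copy if a file of the same name is already there. The helper name `install_jar` and the paths in the usage note are assumptions for illustration; adjust them to your own setup, and run the script with sufficient privileges if lib/ is not writable by your user.

```python
import shutil
from pathlib import Path


def install_jar(jar_path, rapidminer_home):
    """Copy jar_path into <rapidminer_home>/lib and return the target path.

    Hypothetical helper: it does nothing RapidMiner-specific beyond
    assuming that the installation has a lib/ subdirectory.
    """
    jar_path = Path(jar_path)
    lib_dir = Path(rapidminer_home) / "lib"
    if not lib_dir.is_dir():
        raise FileNotFoundError(f"no lib/ directory under {rapidminer_home}")
    target = lib_dir / jar_path.name
    if target.exists():
        # Already installed; leave the existing file untouched.
        return target
    shutil.copy2(jar_path, target)
    return target
```

A call might look like `install_jar("polliwog-0.7/3rd-party-jars/jdom-1.0.jar", "/opt/rapidminer")`, where the second argument is a made-up installation path. Restart RapidMiner afterwards so that the new JAR is picked up.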