Analyzing Web Server Log Files with RapidMiner (Part 1): Quick Start Guide and Bugfix

The RapidMiner Web Mining Extension provides the Read Server Log operator to read web server log files. Unfortunately, it is not straightforward to bring the operator to life based solely on the help text. Moreover, the operator does not work in the current version (5.1.4) of the extension.

Parameters of the Read Server Log operator

First, you need some data to try the operator on. Download the ZIP file available from the KDnuggets website. The archive contains an anonymized Apache HTTP Server log file (da-11-16.ipntld.log). More information about the data can be found here. Place the log file in a separate directory and provide its path in the log dir parameter.

Here comes the tricky part. You must provide a configuration file that describes the log file's format in the config file parameter. The operator uses the polliwog Java library to process web server log files. Download the file polliwog-bin-stable-0.7.tar.gz from the project's website. The file apache-combined-log-entry-format.xml in the polliwog-0.7/data/ directory describes the Apache Combined Log Format, in which our log file is stored. Save this file somewhere on your file system and provide its path in the config file parameter.
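Before pointing the operator at a log file, it can help to confirm that the lines really are in the Apache Combined Log Format. The following is a minimal Python sketch, independent of RapidMiner and polliwog. The sample line is made up for illustration and is not taken from da-11-16.ipntld.log, and the regular expression is a simplification that does not handle escaped quotes inside the quoted fields.

```python
import re

# Apache Combined Log Format:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
# Simplified pattern: quoted fields must not contain escaped quotes.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# Made-up sample line for illustration only.
sample = (
    'host1.example.org - - [16/Nov/2001:00:00:01 -0500] '
    '"GET /index.html HTTP/1.0" 200 1043 '
    '"http://www.example.org/" "Mozilla/4.0 (compatible)"'
)

fields = parse_line(sample)
print(fields["host"], fields["status"], fields["request"])
```

If parse_line returns None for most lines of your file, the log is probably stored in a different format and needs a different configuration file.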

Now you can run the process, which results in the following error:

Yes, it is a bug, which I have already reported to the developers. The error is caused by some missing classes. Fortunately, the problem can be fixed quite easily. Copy gentlyWEB.jar and jdom-1.0.jar from polliwog’s 3rd-party-jars/ directory to the lib/ directory of your RapidMiner installation. (That may require administrator privileges.) The operator will work fine after restarting RapidMiner. (I have tested this on Linux.)
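The copy step can also be scripted. Below is a minimal Python sketch, assuming that the 3rd-party-jars/ directory sits directly under the unpacked polliwog directory. The function name copy_missing_jars and the paths in the usage comment are hypothetical, so adjust them to your system.

```python
import shutil
from pathlib import Path

# The two JAR files named in the text above.
REQUIRED_JARS = ("gentlyWEB.jar", "jdom-1.0.jar")


def copy_missing_jars(polliwog_home, rapidminer_home):
    """Copy the required JARs from polliwog's 3rd-party-jars/ to RapidMiner's lib/.

    Assumes 3rd-party-jars/ is directly under polliwog_home and that
    lib/ already exists under rapidminer_home.
    """
    source_dir = Path(polliwog_home) / "3rd-party-jars"
    target_dir = Path(rapidminer_home) / "lib"
    copied = []
    for name in REQUIRED_JARS:
        source = source_dir / name
        if not source.is_file():
            raise FileNotFoundError(f"expected JAR not found: {source}")
        target = target_dir / name
        shutil.copy2(source, target)  # copies the file and its metadata
        copied.append(target)
    return copied


# Example call with hypothetical paths:
# copy_missing_jars("/home/me/polliwog-0.7", "/opt/rapidminer")
```

The script needs write access to the lib/ directory, so it may have to be run with administrator privileges. Restart RapidMiner afterwards.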

For a permanent fix, the developers must add the contents of these two JAR files to the Web Mining Extension.


7 thoughts on “Analyzing Web Server Log Files with RapidMiner (Part 1): Quick Start Guide and Bugfix”

  1. F says:

    First! 😀

    BTW congrat for the start of your blog!

  2. ravi,india says:

    I have the same problem after copying the 2 files to the lib/ directory of my RapidMiner installation on Windows 7. Kindly help me solve this problem.


    Tamil Nadu

  3. jeszy75 says:

    The following trick solves the problem on Windows. Edit the RapidMinerGUI.bat file under the scripts\ directory as follows: change

    -jar "%RAPIDMINER_HOME%\lib\launcher.jar"



    in lines 169 and 174. If the -jar option is present on the command line, the classpath from the manifest of launcher.jar is used, which does not contain the two extra JAR files. Run the RapidMinerGUI.bat script to start RapidMiner.

  4. Jan says:

    Thank you. It helps me.

  5. Firstly, thank you a lot!

    Do you know if RapidMiner can read IIS logs? I can’t find any documentation on this…

    Thank you!!

    • jeszy75 says:

      I think so:) Try to use the file w3c-extended-log-entry-format.xml from the polliwog distribution instead of the file apache-combined-log-entry-format.xml.

  6. Hello again,

    I can’t thank you enough!

    I had to change the config file in order to have the right order of fields and in order to remove some fields and add some new ones but for the most important ones (uri, methods..) all was there!!
