Monthly Archives: February 2012

Auto-generation of UML Class Diagrams in Maven Projects

The yWorks UML Doclet is a handy Javadoc extension that automatically creates good-looking UML diagrams from Java classes and embeds them into the generated API documentation.

Although the tool is not free software, its Community Edition is available free of charge under the following conditions:

The Community Edition of the Software is licensed to you free of charge. It comes without support and warranties of any kind. The Community Edition of the Software inserts a web link into your output files that points to the yWorks website. You may not change that link or prevent either display of the link or the intended use as means of navigation to get to the yWorks website in any way.

These terms are quite reasonable and you don’t have to pay a high price for such a great tool.

The functionality offered by the product is also available in Apache Maven projects via the Maven Javadoc Plugin. The following is a minimal POM that demonstrates how to use the doclet in Maven projects:

<project xmlns="" xmlns:xsi="" xsi:schemaLocation="">

Can’t wait to try it? Download and unpack the archive of the Community Edition, then run Maven with mvn site -DyDoc.path=PATH where PATH is the path of the directory that contains your yDoc installation. Alternatively, the path can also be given in the properties element of the POM:


The screenshot below shows the appearance of the enhanced API documentation.

API documentation with auto-generated UML class diagram

Thanks to László Aszalós for recommending this excellent tool.


Analyzing Web Server Log Files with RapidMiner (Part 3): from Sessions to Transactions

The Transform Log to Session operator of the Web Mining Extension transforms an example set returned by the Read Server Log operator to a set of transactions suitable for performing association analysis.

The Transform Log to Session operator in a process

The mandatory parameters of the operator are named session attribute and resource attribute. The former determines the attribute used for identifying sessions while the latter determines the attribute used for identifying resources.

Parameters of the Transform Log to Session operator

The result of the operator is an example set in which each session is represented by a single example. The examples have many integer-valued attributes, each of which corresponds to a resource. The value of such an attribute is the number of times the resource was requested during the session.
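The shape of this result can be illustrated with a small Python sketch. It is not the operator's actual implementation, only a rough model of the described output; the function name log_to_sessions, the sample rows, and the default attribute names "session" and "resource" are made up for the illustration.

```python
from collections import Counter, defaultdict


def log_to_sessions(rows, session_attribute="session", resource_attribute="resource"):
    """Return one record per session that maps every resource to its request count."""
    counts = defaultdict(Counter)
    resources = set()
    for row in rows:
        # Count each request under the session it belongs to.
        counts[row[session_attribute]][row[resource_attribute]] += 1
        resources.add(row[resource_attribute])
    columns = sorted(resources)
    # Every session gets a value for every resource; unrequested ones are 0.
    return {
        session: {resource: per_session[resource] for resource in columns}
        for session, per_session in counts.items()
    }


# Hypothetical log rows: one dict per log file entry.
rows = [
    {"session": 1, "resource": "/index.html"},
    {"session": 1, "resource": "/about.html"},
    {"session": 1, "resource": "/index.html"},
    {"session": 2, "resource": "/about.html"},
]
table = log_to_sessions(rows)
```

Here session 1 requested /index.html twice, so its count is 2, while session 2 never requested it and gets 0.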

Result of the Transform Log to Session operator

Note that performing association analysis may require further processing. For example, integer attributes must be transformed to binomial ones before the FP-Growth operator can be applied.
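Outside RapidMiner, the idea behind this transformation can be sketched in a few lines of Python: a request count becomes true if the resource was requested at least once and false otherwise. The function name to_binomial and the sample counts are hypothetical, and the sketch does not reproduce any particular RapidMiner operator.

```python
def to_binomial(session_counts):
    """Replace each request count with True (requested) or False (not requested)."""
    return {
        session: {resource: count > 0 for resource, count in counts.items()}
        for session, counts in session_counts.items()
    }


# Hypothetical per-session request counts.
session_counts = {
    1: {"/about.html": 1, "/index.html": 2},
    2: {"/about.html": 1, "/index.html": 0},
}
binomial = to_binomial(session_counts)
```

After the conversion each session reads as a transaction, that is, the set of resources whose value is true.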


Analyzing Web Server Log Files with RapidMiner (Part 2): Sessions

The Read Server Log operator returns an example set. Each entry of the example set corresponds to an entry in the log file. The operator automatically associates a session attribute value with each of the examples. Many examples may share the same session attribute value. But hey, what exactly is a session?

Example set returned by the Read Server Log operator

The operator has an important parameter named session timeout whose value must be a non-negative integer. The value of this parameter is measured in milliseconds. A session is a series of log file entries that correspond to HTTP requests initiated by the same user agent from the same host, such that the time difference between the first and the last log file entry in the session is always less than or equal to session timeout.

When the first hit by a specific user agent from a specific host is found in the log file, a new session attribute value is assigned to the corresponding example. All subsequent hits by the same user agent from the same host within session timeout are associated with the same session attribute value. Once session timeout is reached, a new session attribute value is generated for the same user agent from the same host.
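The rule described above can be written down as a short Python sketch. It follows the description in this post, not the source code of the extension, so details may differ from what the operator really does. The function name assign_sessions and the sample entries are made up, and the sketch assumes that entries are sorted by time and that timestamps use the same unit as the timeout.

```python
def assign_sessions(entries, session_timeout):
    """Assign a session id to each (timestamp, host, user_agent) entry.

    Assumes entries are sorted by timestamp and that the timeout is
    measured from the first hit of the current session.
    """
    open_sessions = {}  # (host, user_agent) -> (session id, time of first hit)
    next_id = 0
    session_ids = []
    for timestamp, host, user_agent in entries:
        key = (host, user_agent)
        current = open_sessions.get(key)
        # Start a new session on the first hit or once the timeout has passed.
        if current is None or timestamp - current[1] > session_timeout:
            current = (next_id, timestamp)
            open_sessions[key] = current
            next_id += 1
        session_ids.append(current[0])
    return session_ids


# Hypothetical entries with timestamps in milliseconds.
entries = [
    (0, "host-a", "Firefox"),
    (1000, "host-a", "Firefox"),
    (1500, "host-b", "Firefox"),
    (40000, "host-a", "Firefox"),
]
session_ids = assign_sessions(entries, session_timeout=30000)
```

The first two hits from host-a share a session, the hit from host-b opens a second one, and the last hit from host-a arrives after the timeout and therefore opens a third.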

The operator automatically converts date and time values to integers. The time attribute of the resulting example set stores the number of minutes since January 1, 1970, 00:00:00 GMT. (See the source code of the class for implementation details.) Note that some information is lost during date and time conversion, since seconds in time values are discarded.
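The loss of information is easy to see in a small Python sketch. It only mirrors the conversion as described here (whole minutes since the epoch) and is not taken from the extension; the function name minutes_since_epoch and the sample timestamps are hypothetical.

```python
from datetime import datetime, timezone


def minutes_since_epoch(moment):
    """Return whole minutes since January 1, 1970, 00:00:00 GMT, dropping seconds."""
    return int(moment.timestamp()) // 60


# Two hypothetical hits within the same minute, 50 seconds apart.
first = datetime(2012, 2, 1, 10, 15, 5, tzinfo=timezone.utc)
second = datetime(2012, 2, 1, 10, 15, 55, tzinfo=timezone.utc)
same_minute = minutes_since_epoch(first) == minutes_since_epoch(second)
```

Both hits map to the same integer, so their order and the 50 seconds between them can no longer be recovered from the time attribute.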


Analyzing Web Server Log Files with RapidMiner (Part 1): Quick Start Guide and Bugfix

The RapidMiner Web Mining Extension provides the Read Server Log operator to read web server log files. Unfortunately, it is not straightforward to bring the operator to life based solely on the help text. Moreover, the operator does not work in the current version (5.1.4) of the extension.

Parameters of the Read Server Log operator

First, you need some data files to play with the operator. Download the following ZIP file available from the KDnuggets website: The archive contains an anonymized Apache HTTP Server log file (da-11-16.ipntld.log). More information about the data can be found here. Place the log file in a separate directory and provide its path in the log dir parameter.

Here comes the tricky part. You must provide a configuration file that describes the log file’s format in the config file parameter. The operator uses the polliwog Java library to process web server log files. Download the file polliwog-bin-stable-0.7.tar.gz from the project’s website. The file apache-combined-log-entry-format.xml under the polliwog-0.7/data/ directory describes the Apache Combined Log Format in which our log file is stored. Place this file in your file system and provide its path in the config file parameter.

Now you can run the process, which results in the following error:

Yes, it is a bug that I have already reported to the developers. The error is caused by some missing classes. Fortunately, the problem can be fixed quite easily. Copy gentlyWEB.jar and jdom-1.0.jar from polliwog’s 3rd-party-jars/ directory to the lib/ directory of your RapidMiner installation. (That may require administrator privileges.) The operator will work fine after restarting RapidMiner. (I have tested it on Linux.)

To fix the problem properly, the developers must add the contents of these two JAR files to the Web Mining Extension.


We are Starting Soon

I have just created this blog; my first post is due tomorrow morning (CET).