The Read Server Log operator returns an example set in which each example corresponds to one entry in the log file. The operator automatically assigns a session attribute value to each example, and many examples may share the same value. But what exactly is a session?
The operator has an important parameter named session timeout, whose value must be a non-negative integer measured in milliseconds. A session is a series of log file entries that correspond to HTTP requests initiated by the same user agent from the same host, such that the time difference between the first and the last log file entry in the session is always less than or equal to session timeout.
When the first hit by a specific user agent from a specific host is found in the log file, a new session attribute value is assigned to the corresponding example. All subsequent hits by the same user agent from the same host within session timeout are associated with the same session attribute value. Once session timeout has elapsed, the next hit by the same user agent from the same host is assigned a new session attribute value.
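The rule described above can be sketched in a few lines of Python. This is only an illustration of the session definition, not the operator's actual implementation. The function name assign_sessions, the (host, user_agent, timestamp_ms) tuple layout, and the use of consecutive integers as session values are all assumptions made for the example, and the hits are assumed to be sorted by time.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, int]  # (host, user_agent, timestamp_ms) -- hypothetical layout


def assign_sessions(hits: Iterable[Hit], session_timeout_ms: int) -> List[int]:
    """Return one session value per hit, in input order.

    Assumes hits are sorted by timestamp. A hit stays in the current
    session of its (host, user_agent) pair as long as it lies within
    session_timeout_ms of the FIRST hit of that session.
    """
    if session_timeout_ms < 0:
        raise ValueError("session timeout must be non-negative")

    # (host, user_agent) -> (session value, timestamp of the session's first hit)
    current: Dict[Tuple[str, str], Tuple[int, int]] = {}
    next_session = 0
    result: List[int] = []

    for host, agent, ts in hits:
        key = (host, agent)
        state = current.get(key)
        # Start a new session on the first hit for this pair,
        # or when the timeout since the session's first hit has elapsed.
        if state is None or ts - state[1] > session_timeout_ms:
            state = (next_session, ts)
            current[key] = state
            next_session += 1
        result.append(state[0])
    return result


example_hits = [
    ("10.0.0.1", "AgentA", 0),
    ("10.0.0.1", "AgentA", 1000),   # within 2000 ms of the session start
    ("10.0.0.2", "AgentA", 1500),   # different host, so a different session
    ("10.0.0.1", "AgentA", 5000),   # timeout elapsed, so a new session
]
example_sessions = assign_sessions(example_hits, session_timeout_ms=2000)
```

With a session timeout of 2000 milliseconds, the first two hits from 10.0.0.1 share a session value, the hit from 10.0.0.2 gets its own, and the last hit from 10.0.0.1 starts a new session because more than 2000 milliseconds have passed since that session's first hit.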
The operator automatically converts date and time values to integers. The time attribute of the resulting example set stores the number of minutes since January 1, 1970, 00:00:00 GMT. (See the source code of the com.rapidminer.operator.io.loganalysis.LogFileSourceOperator class for implementation details.) Note that some information is lost during this conversion, because the seconds in time values are discarded.
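To make the information loss concrete, here is a minimal Python sketch of a minutes-since-epoch conversion. It is not the operator's code. The helper name to_epoch_minutes is hypothetical, and the sketch assumes that seconds are dropped by integer division, which matches the description above but has not been checked against the operator's source.

```python
from datetime import datetime, timezone


def to_epoch_minutes(dt: datetime) -> int:
    """Whole minutes elapsed since 1970-01-01 00:00:00 GMT (seconds discarded)."""
    if dt.tzinfo is None:
        # A naive datetime would be interpreted in local time; require an explicit zone.
        raise ValueError("expected a timezone-aware datetime")
    return int(dt.timestamp()) // 60


# Two hits 49 seconds apart within the same minute become indistinguishable.
first = datetime(2000, 1, 1, 0, 0, 10, tzinfo=timezone.utc)
second = datetime(2000, 1, 1, 0, 0, 59, tzinfo=timezone.utc)
minutes_first = to_epoch_minutes(first)
minutes_second = to_epoch_minutes(second)
```

Both values come out equal, so the order of hits that fall within the same minute cannot be recovered from the time attribute alone.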