The Enrich Data by Webservice operator of the RapidMiner Web Mining Extension allows you to interact with web services in your RapidMiner process.
A web service can be invoked for each example of an example set. (Note that this may be time-consuming, since a separate request is sent for every example.) All strings of the form <%attribute%> in a request are automatically replaced with the value of the corresponding attribute of the current example. The operator provides several methods to parse the response, including regular expressions and XPath location paths. By parsing the response, you can add new attributes to your example set.
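To make the placeholder mechanism concrete, here is a minimal Python sketch of the substitution step. It only mimics the behaviour described above and is not the operator's actual implementation; the function name fill_request and the attribute names latitude and longitude are hypothetical.

```python
import re

# Matches placeholders such as <%latitude%>; the group captures the attribute name.
# A non-greedy match is used so that attribute names containing spaces also work.
PLACEHOLDER = re.compile(r"<%(.+?)%>")


def fill_request(template: str, example: dict) -> str:
    """Replace every <%name%> in the template with example[name].

    Illustrative only: mimics the documented substitution behaviour,
    not RapidMiner's implementation.
    """
    def substitute(match):
        name = match.group(1)
        # Raises KeyError if the example has no attribute with this name.
        return str(example[name])

    return PLACEHOLDER.sub(substitute, template)


example = {"latitude": 47.555214, "longitude": 21.621423}
print(fill_request("latlng=<%latitude%>,<%longitude%>", example))
```

With the example values shown, the printed request fragment is latlng=47.555214,21.621423.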
For demonstration purposes we will use the Google Geocoding API. This web service also offers reverse geocoding functionality, i.e. it returns a human-readable address for a given geographical location. To see how it works, click on the following link: http://maps.googleapis.com/maps/api/geocode/xml?latlng=47.555214,21.621423&sensor=false. Notice that the latitude and longitude values are passed to the service in the latlng query string parameter, separated by a comma.
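The request URL in the link above can also be assembled programmatically. The following Python sketch is an illustration, not part of RapidMiner: it builds the same URL from a latitude/longitude pair using only the standard library. The function name reverse_geocode_url is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://maps.googleapis.com/maps/api/geocode/xml"


def reverse_geocode_url(lat: float, lng: float) -> str:
    """Build a reverse geocoding request URL in the format of the example link."""
    # safe="," keeps the comma between latitude and longitude unescaped
    # instead of turning it into %2C.
    query = urlencode({"latlng": f"{lat},{lng}", "sensor": "false"}, safe=",")
    return f"{BASE_URL}?{query}"


print(reverse_geocode_url(47.555214, 21.621423))
```

Calling reverse_geocode_url(47.555214, 21.621423) yields exactly the URL of the example link.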
We will use this data file for our experiment. The file contains earthquake data that originates from the Earthquake Search service provided by the United States Geological Survey (USGS). Consider the following RapidMiner process that is available from here:
First, the data file is read by the Read CSV operator. Then the Sort and Filter Example Range operators are used to select the 50 earthquakes with the highest magnitude. Finally, the Enrich Data by Webservice operator invokes the web service to retrieve the country names for the geographical locations of these 50 earthquakes. (Only a small subset of the data is used to avoid excessive network traffic.)
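Outside RapidMiner, the Sort and Filter Example Range steps amount to sorting by magnitude in descending order and keeping the first 50 rows. The Python sketch below assumes a CSV file with a header row and a numeric magnitude column named mag; that column name is an assumption, so adjust it to the actual header of the data file. The function name top_by_magnitude is hypothetical.

```python
import csv
import io
from typing import Dict, List


def top_by_magnitude(csv_text: str, n: int = 50, column: str = "mag") -> List[Dict[str, str]]:
    """Return the n rows with the highest value in `column`.

    Mirrors Sort (descending) followed by Filter Example Range (first n rows).
    The default column name "mag" is an assumption about the CSV header.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # float() raises ValueError on an empty or non-numeric magnitude,
    # so such rows would have to be cleaned beforehand.
    rows.sort(key=lambda row: float(row[column]), reverse=True)
    return rows[:n]
```

A typical call would read the file first, e.g. top_by_magnitude(open("earthquakes.csv", newline="").read()), where the file name is hypothetical.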
The parameters of the Enrich Data by Webservice operator should be set as follows (see the figure below):
- Set the value of the query type parameter to
- Set the value of the attribute type parameter to
- Uncheck the checkbox of the assume html parameter
- Set the value of the request method parameter to
- Set the value of the url parameter to
Finally, click on the Edit List button next to the xpath queries parameter, which brings up the Edit Parameter List window. Enter the string Country into the attribute name field and the string //result[type = 'country']/formatted_address/text() into the query expression field. This expression selects the text of the formatted_address element of the result whose type is country.
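To see what the query expression selects, the Python sketch below evaluates an equivalent query with the standard library's xml.etree.ElementTree. The XML is a hand-written, abbreviated response that only imitates the structure of the service's XML output; it is not real service output. ElementTree supports only a subset of XPath (there is no text() function), so the query is slightly adapted and the text is read from the matched element instead. The function name country_from_response is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily abbreviated sample that mimics the structure of a
# reverse geocoding XML reply. Not actual output of the web service.
SAMPLE_RESPONSE = """<GeocodeResponse>
  <status>OK</status>
  <result>
    <type>locality</type>
    <type>political</type>
    <formatted_address>Debrecen, Hungary</formatted_address>
  </result>
  <result>
    <type>country</type>
    <type>political</type>
    <formatted_address>Hungary</formatted_address>
  </result>
</GeocodeResponse>"""


def country_from_response(xml_text):
    """Return the formatted address of the 'country' result, or None."""
    root = ET.fromstring(xml_text)
    # [type='country'] matches a <result> that has any <type> child whose
    # text equals 'country'; ElementTree has no text(), so .text is used below.
    node = root.find(".//result[type='country']/formatted_address")
    return node.text if node is not None else None


print(country_from_response(SAMPLE_RESPONSE))
```

On the sample above the function prints Hungary; if the response contains no country result, it returns None, which would correspond to a missing value in the new attribute.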
That’s all! Unfortunately, running the process results in the following error:
Well, this is a bug that I have already reported to the developers. (See the bug report here.) The following trick solves the problem: set the request method parameter of the Enrich Data by Webservice operator to POST, enter some arbitrary text into the service method parameter, then set the request method parameter to
The figure below shows the enhanced example set that contains country names provided by the web service (see the Country attribute).
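Conceptually, the operator performs the following loop for every example: fill in the placeholders, send the request, evaluate the query on the response and store the result in a new attribute. The Python sketch below is a simplified model of that loop, not RapidMiner's implementation. The function enrich and the injected fetch callable are hypothetical; passing fetch in keeps network access out of the sketch.

```python
import re
import xml.etree.ElementTree as ET
from typing import Any, Callable, Dict, List, Union

# Placeholders of the form <%attribute name%>.
PLACEHOLDER = re.compile(r"<%(.+?)%>")


def enrich(
    examples: List[Dict[str, Any]],
    url_template: str,
    attribute_name: str,
    path: str,
    fetch: Callable[[str], Union[str, bytes]],
) -> List[Dict[str, Any]]:
    """Return copies of `examples`, each with one new attribute taken from the response."""
    enriched = []
    for example in examples:
        # 1. Replace each <%name%> with this example's value for that attribute.
        url = PLACEHOLDER.sub(lambda m: str(example[m.group(1)]), url_template)
        # 2. Send the request (delegated to the injected fetch function).
        root = ET.fromstring(fetch(url))
        # 3. Evaluate the query (ElementTree's XPath subset) on the response.
        node = root.find(path)
        # 4. Store the result as a new attribute; no match becomes None.
        new_example = dict(example)
        new_example[attribute_name] = node.text if node is not None else None
        enriched.append(new_example)
    return enriched
```

For real requests, fetch could be lambda url: urllib.request.urlopen(url).read(). Keep in mind that this sends one request per example, which is why only 50 earthquakes are processed.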