Processing live data feeds with RapidMiner

The Open File operator was introduced in RapidMiner 5.2. It returns a file object for reading content from a local file, a URL, or a repository blob entry. Many data import operators, including Read CSV, Read Excel and Read XML, have been extended to accept a file object as input. With this feature, you can process live data feeds directly in RapidMiner.

Many data import operators provide a wizard to guide users through the process of parameter setting. Unfortunately, wizards cannot use file objects; they always present a file chooser dialog on start. When dealing with data from the web, you can still use the wizards as follows: download the data file and pass your local copy to the wizard. After a successful import you can even delete the local file, because data import operators ignore their file name parameter when they receive a file object as input.
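The precedence rule described above can be pictured with a small Python analogy. This is only a sketch of the idea, not RapidMiner code: the helper read_csv_rows and the file name deleted_local_copy.csv are made up for illustration.

```python
import csv
import io


def read_csv_rows(file_name=None, file_object=None):
    """Return all rows of a CSV source as lists of strings.

    Hypothetical helper: when a file object is supplied it is read and
    file_name is ignored, mirroring the behaviour described in the text.
    """
    if file_object is not None:
        return list(csv.reader(file_object))
    with open(file_name, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


# file_name refers to a file that does not exist any more,
# but the file object takes precedence, so nothing fails.
feed = io.StringIO("Src,Magnitude\nak,2.5\n")
rows = read_csv_rows(file_name="deleted_local_copy.csv", file_object=feed)
print(rows)
```

The file name is consulted only when no file object is connected, which is why the local copy is needed for the wizard but not for later runs.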

A simple use case is presented below for demonstration purposes.

The United States Geological Survey’s (USGS) Earthquake Hazards Program provides real-time earthquake data. Real-time feeds are available here. Data is updated periodically and is available for download in multiple formats. For example, click here to get data in CSV format about all M2.5+ earthquakes of the past 30 days (the feed is updated every fifteen minutes).

Let’s see how to read this feed in a RapidMiner process. First, download the feed to your computer. The local copy is required only to set the parameters of the Read CSV operator by using the Import Configuration Wizard. For this purpose you can use a smaller data file, for example this one.

Import the local copy of the feed using the wizard. Select the following data types for the attributes:

  • Src (source network): polynomial
  • EqId: polynomial
  • Version: integer
  • Datetime: date_time
  • Lat: real
  • Lon: real
  • Magnitude: real
  • NST (number of reporting stations): integer
  • Region: text

Important: the value of the date format parameter must be set to E, MMM d, yyyy H:mm:ss z to ensure correct handling of the Datetime attribute. For details about date and time pattern strings consult the API documentation of the SimpleDateFormat class (see section titled Date and Time Patterns). It is also important to set the value of the locale parameter to one of the English locales.
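To see what the pattern and the attribute types amount to, here is a rough Python counterpart. It is a sketch under assumptions: the strptime formats are my translation of the Java pattern E, MMM d, yyyy H:mm:ss z, the sample row contains made-up values rather than real feed data, and the names parse_feed_datetime, CONVERTERS and convert_row are hypothetical.

```python
from datetime import datetime, timezone

# Python counterparts of the Java pattern "E, MMM d, yyyy H:mm:ss z".
# When parsing, SimpleDateFormat accepts full or abbreviated day and month
# names; strptime needs one format per spelling, so both are tried.
DATE_FORMATS = (
    "%A, %B %d, %Y %H:%M:%S %Z",
    "%a, %b %d, %Y %H:%M:%S %Z",
)


def parse_feed_datetime(text):
    """Parse a timestamp such as 'Thursday, March 15, 2012 10:30:12 UTC'."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        # strptime yields a naive datetime for %Z, so UTC is attached by hand.
        if text.endswith("UTC"):
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed
    raise ValueError(f"unrecognised timestamp: {text!r}")


# The attribute types selected in the wizard, expressed as Python converters.
CONVERTERS = {
    "Src": str,
    "EqId": str,
    "Version": int,
    "Datetime": parse_feed_datetime,
    "Lat": float,
    "Lon": float,
    "Magnitude": float,
    "NST": int,
    "Region": str,
}


def convert_row(raw_row):
    """Convert a row of strings into typed values, attribute by attribute."""
    return {name: CONVERTERS[name](value) for name, value in raw_row.items()}


# Made-up sample row for illustration only.
sample_row = {
    "Src": "ak",
    "EqId": "10000001",
    "Version": "2",
    "Datetime": "Thursday, March 15, 2012 10:30:12 UTC",
    "Lat": "61.5",
    "Lon": "-149.9",
    "Magnitude": "2.5",
    "NST": "12",
    "Region": "Southern Alaska",
}
typed_row = convert_row(sample_row)
print(typed_row["Datetime"].isoformat(), typed_row["Magnitude"])
```

The day and month names are matched as English text here because Python starts with a locale-neutral time locale; this is the same reason the locale parameter has to be an English one in RapidMiner.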

Once the local file is imported successfully, drag the Open File operator into the process and connect its output port to the input port of the Read CSV operator. Set the parameters of the Open File operator as follows: set the value of the resource type parameter to URL, and provide the URL of the feed with the parameter url.

A RapidMiner process that uses the Open File operator to read a data feed from the web

Now you can delete the local data file; the operator will read the feed from the URL when the process is run.
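For readers who think in code, the Open File plus Read CSV combination can be approximated in Python as follows. This is a sketch, not the operators' implementation: open_resource and read_feed are hypothetical names, only the "file" and "URL" resource types are covered, and the demo reads a temporary file through a file:// URL so that it runs without network access.

```python
import csv
import io
import pathlib
import tempfile
import urllib.request


def open_resource(resource_type, location):
    """Return a text file object for a local file or for a URL.

    Hypothetical stand-in for the Open File operator.
    """
    if resource_type == "file":
        return open(location, newline="", encoding="utf-8")
    if resource_type == "URL":
        # Download the content and wrap it so that it behaves like a file.
        with urllib.request.urlopen(location) as response:
            text = response.read().decode("utf-8")
        return io.StringIO(text, newline="")
    raise ValueError(f"unsupported resource type: {resource_type!r}")


def read_feed(resource_type, location):
    """Read CSV rows from whatever file object open_resource returns."""
    with open_resource(resource_type, location) as source:
        return list(csv.DictReader(source))


# Demo: the same temporary file is read once by path and once through
# a file:// URL, which stands in for the real feed URL.
with tempfile.TemporaryDirectory() as folder:
    sample = pathlib.Path(folder) / "sample.csv"
    sample.write_text("Src,Magnitude\nak,2.5\n", encoding="utf-8")
    rows_from_file = read_feed("file", str(sample))
    rows_from_url = read_feed("URL", sample.as_uri())

print(rows_from_file == rows_from_url)
```

The reading code never needs to know where the data comes from; only the arguments of open_resource change, just as only the resource type and url parameters change in the process.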

You can download the complete RapidMiner process here.


6 thoughts on “Processing live data feeds with RapidMiner”

  1. peter says:

    I’m a bit confused by your real time data example. When the site updates, does rapid miner automatically receive the new data? So, if you were monitoring a news feed via RSS, can you make rapid miner run a set of algorithms once it gets a new set of data?

  2. jeszy75 says:

    Note that the Read CSV operator is executed only once in the above example. If you run the process multiple times then the Read CSV operator will always obtain the most current data. Of course you can put the operator in a loop to reload data periodically with a reasonable time delay. (There is an operator named Delay.)

  3. Paul says:

    Hey, u’ve got interesting articles there !

    Just one question about RapidMiner : When you query a database in a process, is it possible to put parameters that user will enter at the moment of calling the service ?

    So far i just see the “preparedstatement” option that just allows me to put ‘?’ anywhere in the query and then define the strings to replace them separately…

    Sorry to post here but Rapidminer forum subscription takes a while ( they did not confirm my account after 2 days ) and Google doesn’t answer nothing about that !



    • jeszy75 says:

      You can use macros in the process to mimic “preparedstatement”-like behaviour. (See the “Set Macro” and “Set Macros” operators.) The value of a macro named “X” can be referenced as %{X}. Such references can appear in operator parameter values and will be substituted properly. Unfortunately, the value of a macro must be set before the process is run. It is not possible to provide macro values interactively during runtime. Hope this helps.

      • Paul says:

        Thanks so much for your reply and sorry again to post here !

        actually no it doesn’t help for my purposes. This macro operator just allow to reference a variable, like a C macro more or less ( according to the documentation ).

        I’m so surprised that SQL queries to extract information from database has to be fixed through the process !!

        The only solution i see so far is to add a standalone process in the server that change the query directly inside the .XML process, according to some parameters given by the end-user…

        If i’m wrong in something correct me 😉

      • Paul says:

        Well i managed to put parameters in my query coming from the url. The end-user enter something like ‘ RA/process/UpdateMining?fixmacro=3& ‘ and the process UpdateMining receive in the background will fix a macro at value = 3 … For the moment I think i can do good things with this !

        I googled again with “macros”, and also looked inside the forum and there’s not so much people talking about doing such data retrieving from the end user…

        Thanks again for your help !

