Thursday, February 14, 2008 8:43 AM
JeffB
Stream processing in Data Mining
Typically in data mining you are looking for patterns in data that has already been collected and saved. In essence, you are looking back in time to discern patterns to either explain what happened or to make future predictions based on that information.
But some applications need continuous online processing of data. This is prevalent in fraud and intrusion detection systems. In these scenarios, you are often acquiring and processing data in real time and making decisions based on both the new data and past information.
To help build systems that support this capability, there has been active research into stream processing systems, also called stream data managers. In these systems a query runs continuously over the data as it arrives, as opposed to once against the data as it stands at a single moment in time. This allows one to build a monitoring system on top of the stream processor.
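To make the idea of a continuously running query concrete, here is a minimal sketch in plain Python. It is not the API of any of the systems listed in this post; the names `Event`, `continuous_sum_alert`, `window_seconds` and `threshold` are all made up for illustration. It assumes events arrive in timestamp order and flags any account whose total over a sliding time window exceeds a limit, which is roughly the shape of a simple fraud monitor.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, NamedTuple, Tuple


class Event(NamedTuple):
    """One record on the stream (hypothetical schema for illustration)."""
    timestamp: float  # seconds since some arbitrary start
    account: str
    amount: float


def continuous_sum_alert(
    events: Iterable[Event],
    window_seconds: float = 60.0,
    threshold: float = 1000.0,
) -> Iterator[Tuple[float, str, float]]:
    """Yield (timestamp, account, window_total) whenever an account's total
    over the last `window_seconds` exceeds `threshold`.

    Assumes events arrive in non-decreasing timestamp order.
    """
    window: Deque[Event] = deque()   # events currently inside the window
    totals: Dict[str, float] = {}    # running per-account sum over the window

    for event in events:
        # Add the new event to the window state.
        window.append(event)
        totals[event.account] = totals.get(event.account, 0.0) + event.amount

        # Evict events that have fallen out of the sliding window.
        cutoff = event.timestamp - window_seconds
        while window and window[0].timestamp <= cutoff:
            old = window.popleft()
            totals[old.account] -= old.amount

        # The "query" is re-evaluated on every arrival, not once.
        if totals[event.account] > threshold:
            yield event.timestamp, event.account, totals[event.account]
```

Because the function is a generator, it can be fed an unbounded source (a socket, a log tail, a message queue) and will emit alerts as they occur. A monitoring application would simply loop over the results, for example `for ts, account, total in continuous_sum_alert(source): ...`. Real stream managers add much more than this, such as declarative query languages, out-of-order handling and load shedding, none of which is attempted here.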
This seems like a fairly new area, and I am not sure what the state of the art is, as we typically don't see these capabilities integrated into commercial DBMSs.
I have found the following academic projects that deal with Stream Processing.
Stanford Stream Data Manager
Stream Mill
MAIDS
In addition, there appear to be some commercial products that are similar to the above. If anyone has used any of these systems or built applications on this type of technology, I would be interested to hear about it.
Update
I have found some additional stream management systems so I wanted to update my list here.
Cougar
Aurora
Borealis
Hancock
Niagara
OpenCQ
Tapestry
Telegraph
Tribeca
Gigascope