rrcf: Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams
Journal of Open Source Software, 4(35), 1336, (2019)
Detecting collusive outliers using One-Class SVM, Isolation Forest and RRCF (left to right). One-class SVM and Isolation Forest fail to detect the group of collusive outliers in the center.
Publication info
Recommended citation:
- Bartos, M., Mullapudi, A., & Troutman, S. (2019). rrcf: Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams. Journal of Open Source Software, 4(35), 1336. doi:10.21105/joss.01336
Available at:
- http://joss.theoj.org/papers/f8c83c0b01a984d0dbf934939b53c96d
Abstract
In this paper, we present the first open-source implementation of the robust random cut forest (RRCF) algorithm—an unsupervised ensemble method for anomaly detection on streaming data (Guha, Mishra, Roy, & Schrijvers, 2016). RRCF offers a number of features that many competing anomaly detection algorithms lack. Specifically, RRCF:
- Is designed to handle large volumes of streaming data.
- Is well-suited to data of high dimension.
- Reduces the influence of irrelevant dimensions in the input data.
- Gracefully handles duplicates and near-duplicates that could otherwise mask the presence of outliers.
- Features an anomaly-scoring metric with a clear underlying statistical meaning.
The RRCF algorithm is currently used for anomaly detection in the Amazon Kinesis real-time analytics engine. The goal of our repository is to provide an open-source implementation of the RRCF algorithm and its core data structures for the purposes of facilitating experimentation and enabling future extensions.