rrcf: Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams

Journal of Open Source Software, 4(35), 1336, (2019)

Detecting collusive outliers using One-Class SVM, Isolation Forest, and RRCF (left to right). One-Class SVM and Isolation Forest fail to detect the group of collusive outliers in the center.

Publication info

Recommended citation:

Bartos, M., Mullapudi, A., & Troutman, S. (2019). rrcf: Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams. Journal of Open Source Software, 4(35), 1336. doi:10.21105/joss.01336

Available at:

http://joss.theoj.org/papers/f8c83c0b01a984d0dbf934939b53c96d

Abstract

In this paper, we present the first open-source implementation of the robust random cut forest (RRCF) algorithm—an unsupervised ensemble method for anomaly detection on streaming data (Guha, Mishra, Roy, & Schrijvers, 2016). RRCF offers a number of features that many competing anomaly detection algorithms lack. Specifically, RRCF:

  • Is designed to handle large volumes of streaming data.
  • Is well-suited to data of high dimension.
  • Reduces the influence of irrelevant dimensions in the input data.
  • Gracefully handles duplicates and near-duplicates that could otherwise mask the presence of outliers.
  • Features an anomaly-scoring metric with a clear underlying statistical meaning.

The RRCF algorithm is currently used for anomaly detection in the Amazon Kinesis real-time analytics engine. The goal of our repository is to provide an open-source implementation of the RRCF algorithm and its core data structures, in order to facilitate experimentation and enable future extensions.
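To make the idea of a random cut concrete, the following is a minimal sketch in plain Python. It is not the API of the rrcf package, and all function names here (`leaf_depths`, `mean_depths`) are hypothetical. It assumes the cut rule described by Guha et al. (2016), in which the cut dimension is chosen with probability proportional to the range of the data along that dimension, which is how dimensions with little variation end up having less influence. As a simplification, it reports each point's average leaf depth across many trees as a rough indicator of how easily the point is isolated. The scoring metric mentioned in the abstract is not implemented here, and streaming insertion and deletion are also left out.

```python
import random


def leaf_depths(points, indices, rng, depth=0, out=None):
    """Recursively cut the points at `indices`; record each point's leaf depth."""
    if out is None:
        out = {}
    dims = range(len(points[0]))
    lows = [min(points[i][d] for i in indices) for d in dims]
    highs = [max(points[i][d] for i in indices) for d in dims]
    spans = [hi - lo for lo, hi in zip(lows, highs)]
    if len(indices) == 1 or sum(spans) == 0:
        # A single point, or only exact duplicates: stop and store a leaf.
        for i in indices:
            out[i] = depth
        return out
    # Pick the cut dimension with probability proportional to its span, so
    # dimensions in which the points barely vary are rarely chosen.
    d = rng.choices(list(dims), weights=spans)[0]
    while True:
        cut = rng.uniform(lows[d], highs[d])
        left = [i for i in indices if points[i][d] <= cut]
        right = [i for i in indices if points[i][d] > cut]
        if left and right:  # retry in the rare case cut == highs[d]
            break
    leaf_depths(points, left, rng, depth + 1, out)
    leaf_depths(points, right, rng, depth + 1, out)
    return out


def mean_depths(points, num_trees=100, seed=0):
    """Average each point's leaf depth over a forest of independent trees."""
    rng = random.Random(seed)
    totals = [0.0] * len(points)
    for _ in range(num_trees):
        depths = leaf_depths(points, list(range(len(points))), rng)
        for i, dep in depths.items():
            totals[i] += dep
    return [t / num_trees for t in totals]


if __name__ == "__main__":
    # Illustrative data: a tight 2-D cluster plus one distant point.
    data_rng = random.Random(1)
    cluster = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(20)]
    points = cluster + [(10.0, 10.0)]
    depths = mean_depths(points)
    print("outlier mean depth:", depths[-1])
    print("inlier mean depth: ", sum(depths[:-1]) / len(cluster))
```

In this sketch the distant point tends to be separated by one of the first cuts, so its average depth is lower than that of the clustered points. Averaging over many trees smooths out the randomness of any single tree, which is the motivation for using an ensemble.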