Building Ensembles of Adaptive Nested Dichotomies with Random-Pair Selection 2016

Tim Leathart, Bernhard Pfahringer and Eibe Frank

ECMLPKDD 2016, Joint European Conference on Machine Learning and Knowledge Discovery in Databases

A system of nested dichotomies is a method of decomposing a multi-class problem into a collection of binary problems. Such a system recursively splits the set of classes into two subsets, and trains a binary classifier to distinguish between each subset. Even though ensembles of nested dichotomies with random structure have been shown to perform well in practice, using a more sophisticated class subset selection method can be used to improve classification accuracy. We investigate an approach to this problem called random-pair selection, and evaluate its effectiveness compared to other published methods of subset selection. We show that our method outperforms other methods in many cases, and is at least on par in almost all other cases.

pdf bibtex

Accelerating Regular Expressions on Many-Core Systems 2014

Tim Leathart

The University of Waikato, Honours Thesis

Regular expression matching is an essential component of critical computer systems. As the amount of data online is constantly growing, and the number of applications for data analysis using regular expressions increases, more powerful and efficient regular expression matching systems are required. This study explores the design space of regular expression matching using two different many-core systems: the Intel Xeon Phi Coprocessor and the Azure System-on-Chip by Wave Semiconductor. A many-core, software based implementation of Thompson’s NFA algorithm and B-FSM has been developed for the Xeon Phi, with generally poor throughput compared to using a standard Xeon CPU. However, a hardwarebased implementation of B-FSM has also been developed and evaluated for the Azure System-on-Chip, with an estimated throughput per device 11.7x greater than current state-of-the-art ASIC-based implementations. The matching systems have been evaluated using rules from the Snort rule set, a popular network intrusion detection suite.

pdf bibtex