Publications

Probability Calibration Trees 2017

Tim Leathart, Eibe Frank, Bernhard Pfahringer and Geoffrey Holmes

Conference
ACML 2017, Asian Conference on Machine Learning

Abstract
Obtaining accurate and well calibrated probability estimates from classifiers is useful in many applications, for example, when minimising the expected cost of classifications. Existing methods of calibrating probability estimates are applied globally, ignoring the potential for improvements by applying a more fine-grained model. We propose probability calibration trees, a modification of logistic model trees that identifies regions of the input space in which different probability calibration models are learned to improve performance. We compare probability calibration trees to two widely used calibration methods -- isotonic regression and Platt scaling -- and show that our method results in lower root mean squared error on average than both methods, for estimates produced by a variety of base learners.

pdf bibtex

Building Ensembles of Adaptive Nested Dichotomies with Random-Pair Selection 2016

Tim Leathart, Bernhard Pfahringer and Eibe Frank

Conference
ECMLPKDD 2016, Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract
A system of nested dichotomies is a method of decomposing a multi-class problem into a collection of binary problems. Such a system recursively splits the set of classes into two subsets, and trains a binary classifier to distinguish between each subset. Even though ensembles of nested dichotomies with random structure have been shown to perform well in practice, using a more sophisticated class subset selection method can be used to improve classification accuracy. We investigate an approach to this problem called random-pair selection, and evaluate its effectiveness compared to other published methods of subset selection. We show that our method outperforms other methods in many cases, and is at least on par in almost all other cases.

pdf bibtex


Accelerating Regular Expressions on Many-Core Systems 2014

Tim Leathart

Published
The University of Waikato, Honours Thesis

Abstract
Regular expression matching is an essential component of critical computer systems. As the amount of data online is constantly growing, and the number of applications for data analysis using regular expressions increases, more powerful and efficient regular expression matching systems are required. This study explores the design space of regular expression matching using two different many-core systems: the Intel Xeon Phi Coprocessor and the Azure System-on-Chip by Wave Semiconductor. A many-core, software based implementation of Thompson’s NFA algorithm and B-FSM has been developed for the Xeon Phi, with generally poor throughput compared to using a standard Xeon CPU. However, a hardwarebased implementation of B-FSM has also been developed and evaluated for the Azure System-on-Chip, with an estimated throughput per device 11.7x greater than current state-of-the-art ASIC-based implementations. The matching systems have been evaluated using rules from the Snort rule set, a popular network intrusion detection suite.

pdf bibtex