ECMLPKDD 2016, Joint European Conference on Machine Learning and Knowledge Discovery in Databases
A system of nested dichotomies is a method of decomposing a multi-class problem into a collection of binary problems. Such a system recursively splits the set of classes into two subsets, and trains a binary classifier to distinguish between each subset. Even though ensembles of nested dichotomies with random structure have been shown to perform well in practice, using a more sophisticated class subset selection method can be used to improve classification accuracy. We investigate an approach to this problem called random-pair selection, and evaluate its effectiveness compared to other published methods of subset selection. We show that our method outperforms other methods in many cases, and is at least on par in almost all other cases.
The University of Waikato, Honours Thesis
Regular expression matching is an essential component of critical computer systems. As the amount of data online is constantly growing, and the number of applications for data analysis using regular expressions increases, more powerful and efficient regular expression matching systems are required. This study explores the design space of regular expression matching using two different many-core systems: the Intel Xeon Phi Coprocessor and the Azure System-on-Chip by Wave Semiconductor. A many-core, software based implementation of Thompson’s NFA algorithm and B-FSM has been developed for the Xeon Phi, with generally poor throughput compared to using a standard Xeon CPU. However, a hardwarebased implementation of B-FSM has also been developed and evaluated for the Azure System-on-Chip, with an estimated throughput per device 11.7x greater than current state-of-the-art ASIC-based implementations. The matching systems have been evaluated using rules from the Snort rule set, a popular network intrusion detection suite.