Differentiable Convolutional Neural Network Architectures for Time Series Classification
Bachelor Thesis at Hasso Plattner Institute
In the last few years, deep learning has revolutionized many areas of machine learning. One important area is time series classification. Time series are produced in many real-world scenarios, for example as industrial or medical sensor data. The success of deep learning is largely based on training through backpropagation. However, backpropagation can only be applied to training the weights, since architecture construction is not differentiable. As a result, state-of-the-art architectures are commonly handcrafted. This thesis addresses the problem of automatically designing architectures for time series classification in an efficient manner. Existing solutions for constructing architectures algorithmically, such as evolutionary or reinforcement learning methods, are much more computationally expensive. We address the problem by introducing a regularization technique for convolutional neural networks (CNNs) that enables joint training of network weights and architecture through backpropagation. Skip connections and a special two-phase training procedure are introduced to enable stable optimization. We evaluate the approach on the UCR archive, yielding results competitive with the state of the art in time series classification, and outperforming it on datasets where handcrafted architectures do not match the complexity of the dataset. [PDF]
Building an end-to-end speech recognizer with TensorFlow
DNN speech recognition
For decades, speech recognition was based on Hidden Markov Models, handcrafted features and very detailed linguistic domain knowledge. Most current speech recognizers are based on Deep Neural Network architectures, such as Deep Speech 2 (Amodei et al. 2015) and Wav2Letter (Collobert et al. 2016). However, as of now, most state-of-the-art speech recognizer implementations are proprietary. Within a course at the University of Potsdam, we aim to change that.
SpeechT is my attempt at a TensorFlow-based implementation of an end-to-end speech recognizer. It is based on Wav2Letter (Collobert et al. 2016) and achieves a Letter Error Rate of 8% and a Word Error Rate of 20% on the LibriSpeech test corpus. During decoding, it uses my customized version of TensorFlow to incorporate a KenLM language model for improved text predictions. [On GitHub]
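For readers unfamiliar with the two metrics, the sketch below shows how Letter Error Rate and Word Error Rate are commonly defined: the edit distance between the reference transcript and the recognizer output, divided by the reference length. It is a minimal illustration, not SpeechT's evaluation code. The function names are hypothetical, and real evaluations usually normalize the text first (casing, punctuation), which is omitted here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def letter_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For the reference "the cat sat" and the hypothesis "the cat sit", one of three words and one of eleven characters differ, so the word error rate is 1/3 while the letter error rate is 1/11. This is why a letter error rate is typically much lower than the word error rate on the same output.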
Transfer Learning for Speech Recognition on a Budget
Published Workshop Paper at ACL 2017 (Kunze and Kirsch et al. 2017)
End-to-end training of Automated Speech Recognition (ASR) systems requires massive data and compute resources. We explore transfer learning based on model adaptation as an approach for training ASR models under constrained GPU memory, throughput and training data. We conduct several systematic experiments adapting a Wav2Letter convolutional neural network originally trained for English ASR to the German language. We show that this technique allows faster training on consumer-grade resources while requiring less training data in order to achieve the same accuracy, thereby lowering the cost of training ASR models in other languages. Model introspection revealed that small adaptations to the network's weights were sufficient for good performance, especially for inner layers. [On ArXiv] [On Github]
Framework for Exploring and Understanding Multivariate Correlations
Published Demo Track Paper at ECML PKDD 2017 (Kirsch et al. 2017)
Feature selection is an essential step to identify relevant and non-redundant features for target class prediction. In this context, the number of feature combinations grows exponentially with the dimension of the feature space. This hinders the user's understanding of the feature-target relevance and feature-feature redundancy. We propose an interactive Framework for Exploring and Understanding Multivariate Correlations (FEXUM) that embeds these correlations using a force-directed graph. In contrast to existing work, our framework allows the user to explore the correlated feature space and guides them in understanding multivariate correlations through interactive visualizations. [ECML PKDD Paper] [On GitHub]
All features in the dataset are drawn using a force-directed graph (right), with the target feature to predict highlighted in green. The closer a feature is to the target, the greater its relevance for predicting the target feature. At the same time, the closer two features are to each other, the greater their redundancy with respect to each other. Classifiers perform best when given many highly relevant features with low inter-feature redundancy. Our force-directed graph drawing of feature correlations results in a soft clustering of features, from which the user can pick relevant features that are far apart from each other, thereby minimizing redundancy. Because the user may want to verify that the algorithmically found correlations are in fact meaningful, each correlation can be analyzed in detail (left).
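The layout idea described above can be sketched with a simple spring model. This is an illustrative assumption, not FEXUM's actual algorithm: the target is pinned at the origin, every feature is attached to it by a spring of ideal length `1 - relevance`, and every pair of features is connected by a spring of ideal length `1 - redundancy`. The function name `spring_layout`, its parameters, and the assumption that scores lie in [0, 1] are all hypothetical.

```python
import math
import random


def spring_layout(relevance, redundancy, iterations=500, step=0.05, seed=0):
    """Place features in 2D around a target fixed at the origin.

    relevance[i]     -- score in [0, 1] of feature i with respect to the target
    redundancy[i][j] -- score in [0, 1] between features i and j
    Higher scores give shorter ideal distances (assumed mapping: 1 - score).
    """
    rng = random.Random(seed)
    n = len(relevance)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    target = (0.0, 0.0)

    for _ in range(iterations):
        new_pos = []
        for i in range(n):
            # Springs to the target and to every other feature.
            springs = [(target, 1.0 - relevance[i])]
            springs += [(pos[j], 1.0 - redundancy[i][j])
                        for j in range(n) if j != i]
            fx = fy = 0.0
            for (ax, ay), ideal in springs:
                dx, dy = pos[i][0] - ax, pos[i][1] - ay
                dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                # Positive when too close (push away), negative when too far.
                pull = (ideal - dist) / dist
                fx += pull * dx
                fy += pull * dy
            new_pos.append([pos[i][0] + step * fx, pos[i][1] + step * fy])
        pos = new_pos
    return pos
```

With `relevance = [0.9, 0.1]` and no redundancy between the two features, the first feature settles much closer to the origin than the second, which mirrors the reading of the visualization: distance to the target encodes relevance, and distance between features encodes redundancy.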
Before moving into the field of AI, I was a freelance software engineer and participated in Human Computer Interaction research. My previous projects