This project originated as a Kaggle competition, with the dataset made available by Bosch.
In this work, we pose the task of fault detection as a binary classification problem. The features include numerical, categorical, and timestamp features, and hence warrant a combination of several techniques to solve the problem efficiently.
First, a biased sampling method is used to reduce the effect of the skewed data distribution. Next, the categorical features are represented as three numerical features using sparse online classification algorithms: stochastic truncated gradient (STG), forward-backward splitting (FOBOS), and enhanced regularized dual averaging (ERDA). Once the features are obtained, we try several classification methods, such as SVMs and feed-forward networks, to perform the fault detection. Finally, the overall objective is optimized using Bayesian optimization.
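
As an illustration of the biased sampling step, the sketch below undersamples the majority class. It assumes that the skew refers to class imbalance (few faulty parts among many good ones) and that label 1 marks a fault. The function name `undersample_majority`, the `ratio` parameter, and the toy labels are hypothetical and are not taken from the project code; the sampling scheme actually used may differ.

```python
import random


def undersample_majority(labels, ratio=1.0, seed=0):
    """Keep every positive (faulty) row and a random subset of negatives.

    `ratio` is the number of negatives kept per positive. This is one
    common form of biased sampling, assumed here for illustration.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more negatives than exist.
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept_neg = rng.sample(neg, n_keep)
    # Return row indices in their original order.
    return sorted(pos + kept_neg)


# Toy labels (hypothetical): 5 faulty parts among 1000.
labels = [1 if i % 200 == 0 else 0 for i in range(1000)]
idx = undersample_majority(labels, ratio=3.0)
balanced = [labels[i] for i in idx]
```

The returned indices can be used to select the matching feature rows before training. A larger `ratio` keeps more of the majority class, at the cost of a more skewed training set.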