Until now, all of my blog posts have been about deep learning methods or their applications to NLP. Over the last couple of weeks, however, I have started learning about Automatic Speech Recognition (ASR), so I will now also be including speech-related articles in this publication.
The core logic of ASR is very simple (it’s just Bayes’ rule, like most other things in machine learning). Essentially, given a speech waveform, the objective is to transcribe it, i.e., to convert it into the most likely sequence of words.
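In symbols (my notation, not anything specific to this post): if $X$ denotes the acoustic observations and $W$ a candidate word sequence, we want

$$ \hat{W} = \arg\max_{W} P(W \mid X) = \arg\max_{W} \frac{p(X \mid W)\, P(W)}{p(X)} = \arg\max_{W} p(X \mid W)\, P(W), $$

where $p(X \mid W)$ is the acoustic model and $P(W)$ is the language model; the evidence $p(X)$ drops out of the maximization since it does not depend on $W$.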
Sparse vectors have become popular recently for 2 reasons:
1. Sparse matrices require much less storage, since they can be stored using various space-saving formats.
2. Sparse vectors are much more interpretable than dense vectors. For instance, the non-zero, non-negative components of a sparse word vector may be taken to denote the weights for certain features. In contrast, there is no interpretation for a value like $-0.1347$.

Sparsity is often induced through the use of L1 (or Lasso) regularization, as in the sketch below.
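As a quick sketch of that last point (illustrative only: the synthetic data, the `alpha` value, and the use of scikit-learn’s `Lasso` are my own choices), an L1 penalty drives most of the learned weights exactly to zero:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))        # 100 samples, 50 features
w_true = np.zeros(50)
w_true[:5] = rng.normal(size=5)       # only the first 5 features actually matter
y = X @ w_true + 0.01 * rng.normal(size=100)

model = Lasso(alpha=0.1).fit(X, y)    # alpha controls the L1 penalty strength
print(np.count_nonzero(model.coef_))  # most of the 50 coefficients are exactly zero
```

The surviving non-zero coefficients can then be mapped back to the handful of features that matter, which is exactly the interpretability argument above.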
I just finished reading Sebastian Ruder’s amazing article providing an overview of the most popular algorithms used for optimizing gradient descent. Here I’ll make very short notes on them primarily for purposes of recall.
**Momentum**: The update vector includes an additional term, the previous update vector weighted by $\gamma$. This helps the parameters move faster downhill, like a ball gathering momentum as it rolls.
$$ v_t = \gamma v_{t-1} + \eta \nabla_{\theta} J(\theta) $$
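The parameters are then updated as $\theta = \theta - v_t$. A minimal NumPy sketch of this rule (the toy quadratic objective, $\eta = 0.01$, and $\gamma = 0.9$ are arbitrary choices for illustration):

```python
import numpy as np

# Toy objective J(theta) = 0.5 * theta^T A theta, whose gradient is A @ theta.
A = np.diag([1.0, 10.0])            # ill-conditioned quadratic (made-up example)
grad = lambda theta: A @ theta

theta = np.array([5.0, 5.0])        # initial parameters
v = np.zeros_like(theta)            # the momentum (velocity) vector
eta, gamma = 0.01, 0.9              # learning rate and momentum coefficient

for _ in range(1000):
    v = gamma * v + eta * grad(theta)   # v_t = gamma * v_{t-1} + eta * grad J(theta)
    theta = theta - v                   # theta_t = theta_{t-1} - v_t

print(theta)  # approaches the minimum at the origin
```

Because past gradients keep contributing through $v$, steps along consistently downhill directions grow, while oscillating directions partly cancel out.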