Nonlinear Dimensionality Reduction

In statistical learning, many problems require initial preprocessing of multi-dimensional data, which often means reducing the dimensionality of the data so as to compress the features without losing information about relevant data properties. Common linear dimensionality reduction methods, such as PCA or MDS, in many cases cannot properly reduce data dimensionality, especially when the data lie near a nonlinear manifold embedded in a high-dimensional space.

[Figure: ml]

There are many nonlinear dimensionality reduction (NLDR) methods for constructing low-dimensional manifold embeddings. The Julia package ManifoldLearning.jl provides implementations of the most common algorithms.
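As a quick illustration, here is a minimal sketch of embedding the classic Swiss-roll dataset with Isomap. The fit/predict interface, the keyword names k and maxoutdim, and the bundled swiss_roll generator reflect the package's documented API, though details may vary between versions:

```julia
using ManifoldLearning

X, _ = ManifoldLearning.swiss_roll()        # 3×n points sampled from a Swiss roll
M = fit(Isomap, X; k = 12, maxoutdim = 2)   # neighborhood graph + geodesic embedding
Y = predict(M)                              # 2×n coordinates of the unrolled manifold
```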

Read More

MLE Optimization for Mixture Models

Introduction

Mixture models are used for many purposes in data science, e.g. to represent feature distributions or spatial relations. Given a fixed data sample, one can fit a mixture model to it using one of a variety of methods. A very common mixture structure is based on Gaussian distributions: the Gaussian Mixture Model (GMM). The expectation-maximization (EM) algorithm finds GMM parameters by maximizing the model's likelihood. This approach usually requires closed-form expressions for the parameter estimates and converges slowly to the optimal solution. These limitations can be overcome by using (quasi-)Newton optimization. In conjunction with automatic differentiation (AD), finding optimal model parameters becomes straightforward.
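As a minimal sketch of this idea (assuming Distributions.jl and Optim.jl with ForwardDiff-based autodiff; the toy data and the unconstrained parameterization below are illustrative, not the post's actual implementation), one can minimize the negative log-likelihood of a two-component GMM directly with BFGS:

```julia
using Distributions, Optim

x = [randn(200) .- 2.0; 0.5 .* randn(300) .+ 3.0]   # synthetic 1-D mixture data

# Unconstrained parameters θ = (μ₁, μ₂, log σ₁, log σ₂, logit w)
function nll(θ)
    μ1, μ2 = θ[1], θ[2]
    σ1, σ2 = exp(θ[3]), exp(θ[4])        # keep scales positive
    w = 1 / (1 + exp(-θ[5]))             # keep the mixing weight in (0, 1)
    -sum(log.(w .* pdf.(Normal(μ1, σ1), x) .+ (1 - w) .* pdf.(Normal(μ2, σ2), x)))
end

res = optimize(nll, [-1.0, 1.0, 0.0, 0.0, 0.0], BFGS(); autodiff = :forward)
θ̂ = Optim.minimizer(res)                # ML estimates of the transformed parameters
```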

[Figure: GMM]

For modeling a collection of time series, another model based on a mixture of ARMA processes, the Mixture of ARMA Models, can be used. Using a similar MLE optimization and AD approach, we will show how to cluster time series.

Read More

MLE Optimization for Regression Models

Introduction

The goal of regression is to predict the value of one or more continuous target variables $t$ given the value of a $D$-dimensional vector $x$ of input variables. The polynomial is a specific example of a broad class of functions called linear regression models. The simplest form of linear regression model is also a linear function of the input variables. However, a much more useful class of functions can be constructed by taking linear combinations of a fixed set of nonlinear functions of the input variables, known as basis functions [1].
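For instance, with polynomial basis functions $\phi_j(x) = x^j$ the model remains linear in the weights, so the maximum-likelihood solution reduces to least squares. A short sketch with purely illustrative toy data:

```julia
x = range(-1, 1; length = 50)
y = sin.(π .* x) .+ 0.1 .* randn(50)     # noisy targets

M = 3                                    # basis order
Φ = [xi^j for xi in x, j in 0:M]         # 50×(M+1) design matrix of basis functions
w = Φ \ y                                # least-squares (maximum-likelihood) weights
ŷ = Φ * w                                # model predictions
```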

[Figure: Regression]

Regression models can be used for time series modeling. Typically, a regression model provides a projection from the baseline status onto some relevant demographic variables, and curve-type time series data are quite common examples of such variables. A typical time series model is the ARMA model, a combination of two types of time series processes: autoregressive and moving average processes.
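Concretely, an ARMA($p$, $q$) process combines the two parts in the standard form

$$x_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i x_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j},$$

where the $\varphi_i$ are the autoregressive coefficients, the $\theta_j$ are the moving-average coefficients, and $\varepsilon_t$ is a white-noise term.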

Read More

Stochastic Gradient Descent in Data Science

Introduction

Stochastic gradient descent (SGD) is a popular stochastic optimization algorithm in the field of machine learning, especially for optimizing deep neural networks. At its core, this iterative algorithm combines two optimization techniques: stochastic approximation and gradient descent.
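Concretely, instead of the gradient of the full objective, each iteration uses the gradient of the loss on a single randomly drawn sample (or small minibatch) $i$:

$$w_{t+1} = w_t - \eta \, \nabla_w \ell_i(w_t),$$

where $\eta$ is the learning rate and $\ell_i$ is the loss contributed by sample $i$.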

[Figure: sgd]

SGD is commonly used to optimize a wide range of models. We are interested in applying this optimization technique to standard data science tasks such as linear regression and clustering. In addition, we'll use differentiable programming techniques to simplify our SGD implementation and make it more versatile.
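A minimal sketch of what this looks like (assuming Zygote.jl for AD; the linear-regression toy problem and hyperparameters are illustrative only): the gradient comes from the AD system rather than a hand-derived formula, so swapping in a different differentiable loss requires no changes to the SGD loop.

```julia
using LinearAlgebra, Random, Zygote

X = randn(500, 3)
y = X * [1.0, -2.0, 0.5] .+ 0.1 .* randn(500)    # synthetic regression data

loss(w, xi, yi) = (dot(w, xi) - yi)^2            # per-sample squared loss

w, η = zeros(3), 0.01
for epoch in 1:20
    for i in shuffle(1:size(X, 1))               # stochastic: random sample order
        g = gradient(v -> loss(v, X[i, :], y[i]), w)[1]
        w .-= η .* g                             # SGD step: w ← w − η ∇ℓᵢ(w)
    end
end
```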

Read More