Isotonic (monotonic) regression is the technique of fitting a piecewise-constant line to a sequence of observations so that the fitted line is monotonically non-decreasing and lies as close to the observations as possible.

Let $(x_1,y_1),\ldots ,(x_n,y_n)$ be a given set of observations, where the $y_i \in \mathbb{R}$.

The isotonic regression problem seeks a weighted least-squares fit ${\hat y}_i \approx y_i$ for all $i$, subject to the constraint that ${\hat y}_i \leq {\hat y}_j$ whenever $x_i \leq x_j$:

$$ \min \sum_{i=1}^{n} w_i ({\hat y}_i - y_i)^2 \quad \text{subject to} \quad {\hat y}_i \leq {\hat y}_j \text{ for all } (i,j)\in E $$

where $E=\{(i,j) : x_i \leq x_j\}$ specifies the partial ordering of the observed inputs $x_i$.
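
For example, take unit weights and observations $y=(1,3,2)$ at increasing $x$. The unconstrained fit $\hat y = y$ violates the constraint because $3 > 2$; pooling the two violating values into their mean gives the solution $\hat y = (1, 2.5, 2.5)$, with objective value $(3-2.5)^2+(2-2.5)^2 = 0.5$.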

Regression in Rust

Isotonic regression is useful for enforcing a monotonic fit to the data, e.g. when your data has a monotonic trend but the output of your model is not monotonic. In that situation you can use isotonic regression as a smoother for the data, or as a post-processing step that forces your model's predictions to be monotonic. This property is exploited by various ordination techniques, e.g. non-metric multidimensional scaling (MDS).

The above isotonic regression problem can be solved with the pool-adjacent-violators (PAV) algorithm [1], which requires linear time and linear memory. Some time ago, I implemented PAV for solving isotonic regression in Julia, which allowed me to add a non-metric extension to the metric MDS implementation in the MultivariateStats.jl package.
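
For illustration, here is a minimal PAV sketch in standalone Rust (my own, not the linfa implementation): scan the observations left to right, keep a stack of blocks, merge adjacent blocks while their means violate the ordering, and finally expand each block's weighted mean back to its members.

// Minimal pool-adjacent-violators sketch; not the linfa implementation.
// Fits non-decreasing values to `y` under weights `w`.
fn pav(y: &[f64], w: &[f64]) -> Vec<f64> {
    // Each block stores (weighted mean, total weight, member count).
    let mut blocks: Vec<(f64, f64, usize)> = Vec::with_capacity(y.len());
    for (&yi, &wi) in y.iter().zip(w) {
        blocks.push((yi, wi, 1));
        // Merge backwards while the last two blocks violate monotonicity.
        while blocks.len() > 1 {
            let (m2, w2, n2) = blocks[blocks.len() - 1];
            let (m1, w1, n1) = blocks[blocks.len() - 2];
            if m1 <= m2 { break; }
            blocks.truncate(blocks.len() - 2);
            // Pool the violators into one block with their weighted mean.
            let wt = w1 + w2;
            blocks.push(((m1 * w1 + m2 * w2) / wt, wt, n1 + n2));
        }
    }
    // Expand each block's mean to all of its member positions.
    blocks.iter()
          .flat_map(|&(m, _, n)| std::iter::repeat(m).take(n))
          .collect()
}

Calling pav(&[1., 3., 2.], &[1., 1., 1.]) on the toy example above returns [1.0, 2.5, 2.5].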

As I had started to play around with Rust, I decided to implement some machine learning (ML) algorithms for linfa, a Rust ML library. This library already has implementations of basic linear regression methods, which I'm going to use for comparison in this post.

Preamble

As I use Jupyter notebooks for most of my posts, I installed the Rust kernel provided by the Evcxr project. I use my fork of the linfa library, to which I also added the well-known cars dataset for testing 1D regression algorithms.

I begin by adding the necessary dependencies to the notebook. I cloned my linfa repository, branch cars-iso, to define local Rust dependencies for this notebook.

:dep ndarray = {version ="^0.15.4"}
:dep linfa = { path="./linfa", features = ["openblas-system", "serde"] }
:dep linfa-datasets = { path="./linfa/datasets",  features = ["diabetes", "cars"] }
:dep linfa-linear  = { path="./linfa/algorithms/linfa-linear" }

Next, I link the necessary crates and import the required structures and traits.

extern crate linfa;
extern crate linfa_linear;
extern crate linfa_datasets;
extern crate ndarray;

use ndarray::{array, Array1, ArrayBase, Dim, OwnedRepr, Ix1, Ix2, Axis, ArrayView};
use linfa::prelude::SingleTargetRegression;
use linfa::traits::{Fit, Predict};
use linfa::dataset::{AsSingleTargets, DatasetBase, DatasetView};
use linfa_linear::{LinearRegression, IsotonicRegression};

Regression

Now, I load the cars dataset from the linfa_datasets crate.

let dataset = linfa_datasets::cars();
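
As a quick sanity check, the dimensions of the loaded data can be printed; nsamples and nfeatures here are assumed from linfa's Records trait:

// Inspect the size of the loaded dataset (assumed Records trait accessors).
println!("{} samples, {} features", dataset.nsamples(), dataset.nfeatures());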

Performing regression on the dataset is a straightforward operation: first, fit the particular regression model, then use this model for prediction on the same data. The following code shows how to fit ordinary least squares (OLS) and isotonic regression models and make predictions.

let lreg = LinearRegression::default().fit(&dataset)?;
let lreg_pred = lreg.predict(&dataset);
let iso = IsotonicRegression::default().fit(&dataset)?;
let iso_pred = iso.predict(&dataset);
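
Before plotting, the two fits can be compared numerically. Here is a minimal sketch using the r2 method of the SingleTargetRegression trait imported earlier (the same pattern as in the linfa-linear documentation):

// Coefficient of determination of each prediction against the dataset targets.
let lreg_r2 = lreg_pred.r2(&dataset)?;
let iso_r2 = iso_pred.r2(&dataset)?;
println!("OLS R^2: {:.3}, isotonic R^2: {:.3}", lreg_r2, iso_r2);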

Visualization

For visualization, I use the plotly crate, which has good support for ndarray and the Jupyter environment.

:dep plotly = {version = ">= 0.7.0", features=["plotly_ndarray"]}
:dep itertools-num = {version = "^0.1.3" }
extern crate plotly;
extern crate itertools_num;

use plotly::common::{Line, LineShape, Marker, Mode, Title};
use plotly::{Plot, Scatter};
use plotly::ndarray::ArrayTraces;
use plotly::layout::{Layout, Legend};
use itertools_num::linspace;

In addition, I generate predictions on a uniform grid of inputs separate from the original data.

let (xx, yy) = {
    let n = 150;
    // Uniform grid of 150 speed values in [0, 30] mph.
    let xx = linspace::<f64>(0., 30., n).collect::<Vec<_>>();
    // View the grid as an n×1 feature matrix for prediction.
    let xv = ArrayView::from_shape((n, 1), &xx).unwrap();
    // The targets are only carried along to build the dataset; prediction ignores them.
    let ds = DatasetBase::new(xv, dataset.targets().to_owned());
    let yy = iso.predict(&ds);
    (xx, yy)
};

Now, I put the original data and all regression results together on one plot.

{
let xs = dataset.records().index_axis(Axis(1), 0).to_owned();
let ys = dataset.targets().to_owned();

let trace1 = Scatter::from_array(xs.to_owned(), ys)
                      .mode(Mode::Markers).name("Data");
let trace2 = Scatter::from_array(xs.to_owned(), lreg_pred.to_owned())
                      .mode(Mode::Lines).name("Linear");
let trace3 = Scatter::from_array(xs.to_owned(), iso_pred.to_owned())
                      .mode(Mode::LinesMarkers).name("Isotonic");
let trace4 = Scatter::from_array(Array1::from(xx), yy)
                      .mode(Mode::Lines).name("Isotonic (Pred)");

let layout = Layout::new()
    .height(600)
    .x_axis(plotly::layout::Axis::new().title(Title::new("Speed (mph)")))
    .y_axis(plotly::layout::Axis::new().title(Title::new("Stopping distance (ft)")));

let mut plot = Plot::new();
plot.add_trace(trace1);
plot.add_trace(trace2);
plot.add_trace(trace3);
plot.add_trace(trace4);
plot.set_layout(layout);
plot.notebook_display();
}

References

[1] Best, M.J., Chakravarti, N. Active set algorithms for isotonic regression; A unifying framework. Mathematical Programming 47, 425–439 (1990).