A Mathematical Translation of LOESS (Local Polynomial Regression)

3 minute read


When conducting exploratory data analysis, scatter plots are commonly used tools to visualise a relationship between two variables.

mtcars %>% 
  ggplot(aes(mpg, wt)) +
  geom_point() +
  labs(title = "A scatterplot")

A common geom_ in ggplot2 is geom_smooth which “Aids the eye in seeing patterns in the presence of overplotting”.

mtcars %>% 
  ggplot(aes(mpg, wt)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  labs(title = "A scatterplot")

## `geom_smooth()` using method = 'loess'

We can see this is a great smoother to cut through the noise and help interpret the data. For less than 1000 observation this is done using a method called loess.

LOESS also has applications in time series decomposition (trying to understand the trend in noisy time series data).

Given how common this is, it would pay to understand how it works…

Mathematical Translation

LOESS (LOcal Polynomial RegrESSion, or LOWESS “Locally Weighted Scatter plot Smoothing”) is a commonly used technique for smoothing a bivariate set of points.

LOESS draws a smooth line through your points.

Given bivariate observations (xi, yi), i = 1, 2, .., m the basic model that can be fitted may be written as yi = g(xi) + ϵi, i = 1, …, n where ϵi = NID(0, σ2) and g(x) is a local polynomial of degree λ ≥ 0, which may be written as g(x) = β0(x) + β1(x)x + … + βλ(x)xλ.

This line is created from a statistical model that can be a simple average of the points, a straight line or a squiggly line.

The parameters β0(x), β1(x), …, βλ(x) are estimated by weighted least squares for each value of x. The weight function weights the data, (xi, yi), so that data values near to x have greater weight than those farther away from x. We can use the tricube weight function:

T(z) = (1 − z3)3z ≤ 1 or 0 z > 1

As we go along the x-axis to draw our squiggle, at any given point, the model tries to draw a small part of that line. The model places stronger emphasis on points nearby that x-value by applying a special weight formula.

with wi(x) = T(Δi(x)/Δ(x, α)) to define the local neighborhood weights for the data at the point x. Here Δi(x) = |x − xi| and Δ(x, α) controls the amount of smoothing (larger values of Δ(x, α) result in more smoothing). As Δ(x, α) → ∞, wi(x) → 1 , for each i = 1, 2, …, n, and the local linear model reduces to the standard parametric polynomial regression.

For 0 < α ≤ 1, Δ(x, α) is the distance to the qth nearest neighbor where q = [αn] ([.] is the integer part function). Hence, Δ(x, α) = Δ(q)(x), where Δ(q)(x) denotes the qth largest value of Δi(x), i = 1, …, n, . For , α > 1, Δ(x, α) = αΔ(n)(x) . It follows that as α → ∞ , the local linear model reduces to a parametric polynomial regression of degree λ .

To define what is meant by ‘nearby’, there is a special window created around any given x-value. If we make this window really small, only the points close to our current position on the x-axis influence where the smoothed line gets drawn, so the line will hug closely to the actual data and produce a noisy, squiggly line. If we relax this and make the window large, more points will fight for influence and as a result we will get a much smoother line.

In action

When we adjust α the smoothing parameter, this controls the percentage of data included in the window around any point on the x-axis when we fit the model.

spans <- c(0.2, 0.3, 0.4, 0.6, 0.8, 1)
p <- list()

for(i in seq_along(spans)){
  loess_fit <- loess(mtcars$wt ~ mtcars$mpg, span = spans[i] , degree = 2)
  out <- data.frame(wt = mtcars$wt, 
                    mpg = mtcars$mpg, 
                    fitted = loess_fit$fitted)
 p[[i]] <- ggplot(out, aes(mpg, wt)) +
    geom_point() +
    geom_line(data = out, aes(mpg, fitted), col = "blue") +
   labs(title = paste("Smoothing Factor:", spans[i]))

gridExtra::grid.arrange(p[[1]], p[[2]], p[[3]], p[[4]], p[[5]], p[[6]], nrow = 2)