---
title: "Introduction to MazamaRollUtils"
author: "Jonathan Callahan, Mazama Science"
date: "September 22, 2021"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to MazamaRollUtils}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE,
  fig.width = 7,
  fig.height = 7
)
```

## Background

Analysis of time series data often involves applying "rolling" functions to calculate,
_e.g._ a "moving average". These functions are straightforward to write in any language
and it makes sense to have C++ versions of common rolling functions available
to R as they dramatically speed up calculations. Several packages exist that 
provide some version of this functionality:

* [zoo](https://cran.r-project.org/package=zoo) -- 
core R package with a specific data model
* [seismicRoll](https://cran.r-project.org/package=seismicRoll) -- rolling functions focused on seismology
* [RcppRoll](https://cran.r-project.org/package=RcppRoll) --
rolling functions for basic statistics

Our goal in creating a new package of C++ rolling functions is to build up a
suite of functions useful in environmental time series analysis. We want these
functions to be available in a neutral environment with no underlying data model. 
The functions are as straightforward to use as is reasonably possible with a 
target audience of data analysts at any level of R expertise.

## Installation

Install from CRAN with:

```
install.packages('MazamaRollUtils')
```

Install the latest version from GitHub with:

```
devtools::install_github("MazamaScience/MazamaRollUtils")
```

## Features

### Predictable Names

Many of the rolling functions in **MazamaRollUtils** have the names of familiar 
**R** functions with `roll_` prepended. These functions calculate rolling versions of
the expected statistic:

* `roll_max()`
* `roll_mean()`
* `roll_median()`
* `roll_min()`
* `roll_prod()`
* `roll_sd()`
* `roll_sum()`
* `roll_var()`

Additional rolling functions with no equivalent in base R include:

* `roll_MAD()` -- Median Absolute Deviation
* `roll_hampel()` -- Hampel filter

Other functions wrap the rolling functions to provide enhanced 
functionality. These are not required to return vectors of the same length as
the input data.

* `findOutliers()` -- returns indices of outlier values identified by `roll_hampel()`.

### Common Arguments

All of the `roll_~()` functions accept the same arguments where appropriate:

* `x` -- Numeric vector input.
* `width` -- Integer width of the rolling window.
* `by` -- Integer shift to use when sliding the window to the next location
align Character position of the return value within the window. One of: `"left" | "center" | "right"`.
* `na.rm` -- Logical specifying whether \code{NA} values should be removed 
before the calculations within each window.

The `roll_mean()` function also accepts:

* `weights` -- Numeric vector of size `width` specifying each window index weight. 
If `NULL`, unit weights are used.

### Predictable Return Length

The output of each `roll_~()` function is guaranteed to have the same length as
the input vector, with varying stretches of `NA` at one or both ends 
depending on arguments `width`, `align` and `na.rm`. This makes it easy to 
align the return values with the input data.

## Examples

The example dataset included in the package contains a tiny amount of data but
suffices to demonstrate usage of package functions.

### Basic Rolling Means

```{r example_1}
library(MazamaRollUtils)

# Extract vectors from our example dataset
t <- example_pm25$datetime
x <- example_pm25$pm25

# Plot with 3- and 24-hr rolling means
layout(matrix(seq(2)))
plot(t, x, pch = 16, cex = 0.5)
lines(t, roll_mean(x, width = 3), col = 'red')
title("3-hour Rolling Mean")
plot(t, x, pch = 16, cex = 0.5)
lines(t, roll_mean(x, width = 24), col = 'red')
title("24-hour Rolling Mean")
layout(1)
```

### Using 'width', 'align', 'by' and 'na.rm'

The next example uses all of the standard arguments to quickly calculate a daily
maximum value and spread it out across all indices.

```{r example_2}
library(MazamaRollUtils)

# Extract vectors from our example dataset
t <- example_pm25$datetime
x <- example_pm25$pm25

# Calculate the left-aligned 24-hr max every hour, ignoring NA values
max_24hr <- roll_max(x, width = 24, align = "left", by = 1, na.rm = TRUE)

# Calculate the left-aligned daily max once every 24 hours, ignoring NA values
max_daily_day <- roll_max(x, width = 24, align = "left", by = 24, na.rm = TRUE)

# Spread the max_daily_day value out to every hour with a right-aligned look "back"
max_daily_hour <- roll_max(max_daily_day, width = 24, align = "right", by = 1, na.rm = TRUE)

# Plot with 3- and 24-hr rolling means
layout(matrix(seq(3)))
plot(t, max_24hr, col = 'red')
points(t, x, pch = 16, cex = 0.5)
title("Rolling 24-hr Max")
plot(t, max_daily_day, col = 'red')
points(t, x, pch = 16, cex = 0.5)
title("Daily 24-hr Max")
plot(t, max_daily_hour, col = 'red')
points(t, x, pch = 16, cex = 0.5)
title("Hourly Daily Max")
layout(1)
```

### Using roll_mean() with 'weights'

The `roll_mean()` function accepts a `weights` argument that can be used to
create a _weighted moving average_. The next example demonstrates creation of
an exponential weighting function to be applied to our data.

```{r example_3}
library(MazamaRollUtils)

# Extract vectors from our example dataset
t <- example_pm25$datetime
x <- example_pm25$pm25

# Create weights for a 9-element exponentially weighted window
#   See:  https://en.wikipedia.org/wiki/Moving_average
N <- 9
alpha <- 2/(N + 1)
w <- (1-alpha)^(0:(N-1))
weights <- rev(w)          # right aligned window

EMA <- roll_mean(x, width = N, align = "right", weights = weights)

# Plot Exponential Moving Average (EMA)
plot(t, x, pch = 16, cex = 0.5)
lines(t, EMA, col = 'red')
title("9-Element Exponential Moving Average")
```