# Trend and Breakpoint analyses in MHW metrics

#### Francois Thoral

#### 2024-08-30

Source:`vignettes/MHW_metric_trends.Rmd`

`MHW_metric_trends.Rmd`

## Overview

One may see in other vignettes how to download and prepare OISST data and how to detect MHWs in gridded data. In this vignette we will use data subsetted around the New Zealand (NZ) Exclusive Economic Zone (EEZ) for our example to calculate trends and breakpoints for bioregions (from Spalding et al. (2007)) and seasons as shown in Thoral et al. (2022).

## MEOW

The Marine Ecosystems of the World (MEOW) shapefile may be downloaded here. Extract the contents of the download to a convenient location before use with the following code chunks. For more information on the MEOW please see Spalding et al. (2007).

```
# Load and process the MEOW shapefile
# NB: Change your file path to where the files were unzipped
MEOW_NZ <- st_read('~/Desktop/meow_ecos.shp') %>%
dplyr:::filter(PROVINCE %in% c('Northern New Zealand', 'Southern New Zealand',
'Subantarctic New Zealand')) %>%
st_make_valid() %>%
st_transform(crs = 4326) %>%
st_shift_longitude() %>%
mutate(ECOREGION = factor(ECOREGION, levels = c("Kermadec Island", "Three Kings-North Cape",
"Northeastern New Zealand", "Central New Zealand",
"Chatham Island", "South New Zealand",
"Bounty and Antipodes Islands",
"Snares Island","Auckland Island","Campbell Island")))
# Find max lon/lat ranges
lon_range <- range(sf::st_coordinates(MEOW_NZ$geometry)[,1])
lat_range <- range(sf::st_coordinates(MEOW_NZ$geometry)[,2])
# Expand a bit to make sure all necessary pixels are downloaded
lon_range <- c(lon_range[1]-0.25, lon_range[2]+0.25)
lat_range <- c(lat_range[1]-0.25, lat_range[2]+0.25)
```

Once we have our MEOW data for NZ ready it’s time to find out which
pixels specifically we will want to analyse. The process of assigning
bioregions to pixels is simplified by using the `sf`

(Pebesma (2018)) package.

```
# Create OISST grid
lon_lat_OISST <- base::expand.grid(seq(0.125, 359.875, by = 0.25),
seq(-89.875, 89.875, by = 0.25)) %>%
dplyr::rename(lon = Var1, lat = Var2) %>%
dplyr::arrange(lon, lat) %>%
st_as_sf(coords = c('lon', 'lat'), crs = 4326)
# Join OISST grid to MEOW spatial and filter out pixels not within NZ EEZ
lon_lat_NZ <- st_join(lon_lat_OISST, MEOW_NZ) %>%
dplyr::select(geometry, ECOREGION, PROVINCE) %>%
dplyr::filter(!is.na(ECOREGION))
# Convert the coordinates back to a tibble for further use
# NB: Should be 6,398 pixels
lon_lat_NZ_df <- lon_lat_NZ %>%
mutate(lon = sf::st_coordinates(.)[,1],
lat = sf::st_coordinates(.)[,2]) %>%
mutate(lon = as.numeric(lon), lat = as.numeric(lat)) %>%
data.frame() %>%
dplyr::select(-geometry)
```

## Download data and calculate climatologies

We begin by downloading daily OISST v2.1 data for all pixels around
New Zealand (lat 25-55°S, lon 160-190°E) and use the functions
`heatwaveR::ts2clim()`

and
`heatwaveR::detect_event()`

to get the climatology from the
time series. Note that this full process will take roughly one hour on a
desktop computer given the number of pixels and days. Even though we
know the exact coordinates of the pixels that we want, it is still
faster to use the `rerddap::griddap()`

function to extract a
bounding box around NZ rather than to use the functions in the
** raster** package to extract the EEZ
polygons.

```
# Get SST for around NZ + Islands and for all years
OISST_sub_dl <- function(time_df){
OISST_dat <- griddap(x = "ncdcOisst21Agg",
url = "https://coastwatch.pfeg.noaa.gov/erddap/",
time = c(time_df$start, time_df$end),
zlev = c(0, 0),
latitude = lat_range,
longitude = lon_range,
fields = c("sst"))$data %>%
mutate(time = as.Date(stringr::str_remove(time, "T00:00:00Z"))) %>%
dplyr::rename(t = time, temp = sst) %>%
dplyr::select(lon, lat, t, temp) %>%
na.omit()
}
# The span of years to download
dl_years <- data.frame(date_index = 1:5,
start = as.Date(c("1982-01-01", "1990-01-01",
"1998-01-01", "2006-01-01", "2014-01-01")),
end = as.Date(c("1989-12-31", "1997-12-31",
"2005-12-31", "2013-12-31", "2021-12-31")))
# Download the data
# NB: Takes ~21 minutes on a desktop computer
# NB: Contains 175,208,352 rows, 4 columns, and uses ~15 GB of RAM
OISST_data <- dl_years %>%
group_by(date_index) %>%
group_modify(~OISST_sub_dl(.x)) %>%
ungroup() %>%
dplyr::select(lon, lat, t, temp)
# Filter out only the NZ pixels
# NB: Contains 87,195,152 rows
OISST_data_NZ <- left_join(lon_lat_NZ_df, OISST_data, by = c("lon", "lat")) %>% drop_na()
# Get the climatologies only
clim_only <- function(df){
# First calculate the climatologies
clim <- ts2clm(data = df, climatologyPeriod = c("1982-01-01", "2011-12-31"))
# Then the events
event <- detect_event(data = clim)
# Return only the climatology metric dataframe of results
return(event$climatology)
}
# Extract the climatology values
# NB: Takes ~21 minutes on a desktop computer
OISST_clim <- OISST_data_NZ %>%
group_by(ECOREGION, PROVINCE, lon, lat) %>%
group_modify(~clim_only(.x)) %>%
drop_na()
```

## Summarise MHW metrics by year, season, and bioregion

```
# Get summary
MHW_summary <- OISST_clim %>%
dplyr::select(-c(doy, thresh, threshCriterion, durationCriterion, event)) %>%
mutate(season_num = month(as.Date(floor_date(t, unit = "season"))),
year = year(t)) %>%
mutate(season = recode_factor(season_num, `12` = "Summer", `3` = "Autumn",
`6` = "Winter", `9` = "Spring")) %>%
mutate(intensity = temp-seas) %>%
group_by(ECOREGION, PROVINCE, lon, lat, year, season) %>%
summarise(nevents = length(unique(event_no)),
nMHWdays = length(t),
int_cumulative = sum(intensity),
int_mean = mean(intensity),
int_max = max(intensity), .groups = "drop") %>%
pivot_longer(-c(ECOREGION, PROVINCE, lon, lat, year, season),
names_to = 'metrics', values_to = 'values')
# Save for future use
# NB: 10.8 MB
saveRDS(MHW_summary,"~/Desktop/MHW_summary_NZ_1982_2021_OISST.Rds")
```

## Trends in MHW metrics at NZ scale + Bioregions

```
# Load data from previous code chunk
MHW_summary_NZ <- readRDS("~/Desktop/MHW_summary_NZ_1982_2021_OISST.Rds")
# Get count of unique pixels
MHW_summary_NZ_npix <- MHW_summary_NZ %>%
dplyr::select(lon, lat) %>%
dplyr::distinct()
# Get annual summaries
MHW_annual_summary_NZ <- MHW_summary_NZ %>%
pivot_wider(names_from = metrics, values_from = values) %>%
group_by(year) %>%
summarize(Number_MHW_days = sum(nMHWdays)/nrow(MHW_summary_NZ_npix),
Nevents = sum(nevents)/nrow(MHW_summary_NZ_npix),
Mean_Intensity = mean(int_mean),
Maximum_Intensity = mean(int_max),
Cumulative_Intensity = sum(int_cumulative)/nrow(MHW_summary_NZ_npix), .groups = "drop") %>%
complete(year = 1982:2021) %>%
pivot_longer(-year, names_to = 'Metrics', values_to = 'values') %>%
mutate(Metrics = factor(Metrics, levels = c("Number_MHW_days", "Nevents", "Mean_Intensity",
"Maximum_Intensity", "Cumulative_Intensity")))
# Prepare prettier labels
metrics_labs <- c(`Number_MHW_days` = "Number of MHW days",
`Nevents` = "Number of Events",
`Mean_Intensity` = "Mean Intensity (°C)",
`Maximum_Intensity` = "Maximum Intensity (°C)",
`Cumulative_Intensity` = "Cumulative Intensity (°C Days)")
# Create figure
p <- ggplot(MHW_annual_summary_NZ, aes(x = year, y = values, col = Metrics)) +
geom_line(size = 0.5) +
geom_smooth(se = T) +
scale_colour_viridis(begin = 0, end = 0.75, option = "viridis", discrete = T) +
facet_wrap(Metrics ~ ., scales = 'free', labeller = as_labeller(metrics_labs), ncol = 2) +
labs(x = "Year", Y = NULL) +
theme_bw() +
theme(legend.position = "none")
p
# Coerce to interactive plotly format
# plotly::ggplotly(p)
```

It seems like some of the metrics are going up, and that there is a
potential acceleration after the years 2010. We can then calculate non
parametric linear trends (Sen’s slope and Mann-Kendall test) and
breakpoints (Pettitt test) using the `trend`

(Pohlert (2020)) package as well as the more
usual and parametric `lm`

function.

```
# Get annual trends
MHW_annual_trends_NZ <- MHW_annual_summary_NZ %>%
group_by(Metrics) %>%
nest() %>%
mutate(ts_out = purrr::map(data, ~ts(.x$values, start = 1982, end = 2021, frequency = 1))) %>%
mutate(sens = purrr::map(ts_out, ~sens.slope(.x, conf.level = 0.95)),
pettitt = purrr::map(ts_out, ~pettitt.test(.x)),
lm = purrr::map(data, ~lm(values ~ year, .x))) %>%
mutate(Sens_Slope = as.numeric(unlist(sens)[1]),
P_Value = as.numeric(unlist(sens)[3]),
Change_Point_Year = time(ts_out[[1]])[as.numeric(unlist(pettitt)[3])],
Change_Point_pvalue = as.numeric(unlist(pettitt)[4]),
lm_slope = unlist(lm)$coefficients.year) %>%
# Add step of cutting time series in 2 using Change_Point_Year
mutate(pre_ts = purrr::map(ts_out, ~window(.x, start = 1982, end = Change_Point_Year)),
post_ts = purrr::map(ts_out, ~window(.x, start = Change_Point_Year, end = 2021))) %>%
# Add step of calculating sen's slope and p-value to pre and post change point year
mutate(sens_pre = purrr::map(pre_ts, ~sens.slope(.x, conf.level = 0.95)),
Sens_Slope_pre = as.numeric(unlist(sens_pre)[1]), P_Value_pre = as.numeric(unlist(sens_pre)[3]),
sens_post = purrr::map(post_ts, ~sens.slope(.x, conf.level = 0.95)),
Sens_Slope_post = as.numeric(unlist(sens_post)[1]),
P_Value_post = as.numeric(unlist(sens_post)[3])) %>%
dplyr::select(Metrics, Sens_Slope, P_Value, Change_Point_Year, Change_Point_pvalue, lm_slope,
Sens_Slope_pre, P_Value_pre, Sens_Slope_post, P_Value_post)
# Create table
kable(MHW_annual_trends_NZ, caption = "Table 1 - Sens's Slope and p-value.") %>%
kable_classic()
```

In fact, we see significant change-point years for the number of MHW days (2012), number of events (1997) and cumulative intensity (2012). The slopes pre and post change-point are documenting the recent change in the metrics which complement the trend on the full time series. What are the reason behind the change-points?

One can be more interested in finding patterns in MHW trends between ecoregions.

```
# Count pixels per realm
MHW_summary_NZ_npix_realms <- MHW_summary_NZ %>%
group_by(lon, lat, ECOREGION) %>%
tally() %>%
group_by(ECOREGION) %>%
tally() %>%
rename(npix = n)
MHW_summary_NZ_npix_realms
# Summarise by realm
MHW_summary_NZ_realms <- left_join(MHW_summary_NZ, MHW_summary_NZ_npix_realms, by = 'ECOREGION') %>%
pivot_wider(names_from = metrics,values_from = values) %>%
group_by(year, ECOREGION, season, npix) %>%
summarize(Number_MHW_days = sum(nMHWdays),
Nevents = sum(nevents),
Mean_Intensity = mean(int_mean),
Maximum_Intensity = mean(int_max),
Cumulative_Intensity = sum(int_cumulative), .groups = "drop") %>%
mutate(Number_MHW_days = Number_MHW_days/npix,
Nevents = Nevents/npix,
Cumulative_Intensity = Cumulative_Intensity/npix) %>%
group_by(ECOREGION, season) %>%
complete(year = 1982:2021) %>%
dplyr::select(-npix) %>%
pivot_longer(-c(year, ECOREGION, season), names_to = 'Metrics', values_to = 'values') %>%
mutate(Metrics = factor(Metrics, levels = c("Number_MHW_days", "Nevents", "Mean_Intensity",
"Maximum_Intensity", "Cumulative_Intensity"))) %>%
replace(is.na(.), 0)
# Plot the results
p <- MHW_summary_NZ_realms %>%
dplyr::filter(Metrics == 'Number_MHW_days') %>%
ggplot(aes(x = year, y = values, colour = season)) +
geom_line(size = 0.5) +
geom_smooth(se = T) +
labs(x = "Year", y = "Number of MHW days") +
scale_colour_viridis(begin = 0, end = 0.75, option = "inferno", discrete = T) +
guides(colour = guide_legend(override.aes = list(shape = 15, size = 2))) +
facet_wrap(ECOREGION~., ncol = 3) +
theme_bw() +
theme(legend.position = c(0.8,0.05),
legend.title = element_blank())
p
# Coerce to plotly format
# plotly::ggplotly(p)
```

We could also show similar graphs for other metrics, like number of events, min, max or cumulative intensity.

```
# Get trends
MHW_trends_NZ_realms <- MHW_summary_NZ_realms %>%
group_by(Metrics, ECOREGION, season) %>%
nest() %>%
mutate(ts_out = purrr::map(data, ~ts(.x$values, start = 1982, end = 2021, frequency = 1))) %>%
mutate(sens = purrr::map(ts_out, ~sens.slope(.x, conf.level = 0.95)),
pettitt = purrr::map(ts_out, ~pettitt.test(.x)),
lm = purrr::map(data, ~lm(values ~ year,.x))) %>%
mutate(Sens_Slope = as.numeric(unlist(sens)[1]),
P_Value = as.numeric(unlist(sens)[3]),
Change_Point_Year = time(ts_out[[1]])[as.numeric(unlist(pettitt)[3])],
Change_Point_pvalue = as.numeric(unlist(pettitt)[4]),
lm_slope = unlist(lm)$coefficients.year) %>%
# Add step of cutting time series in 2 using Change_Point_Year
mutate(pre_ts = purrr::map(ts_out, ~window(.x, start = 1982, end = Change_Point_Year)),
post_ts = purrr::map(ts_out, ~window(.x, start = Change_Point_Year, end = 2021))) %>%
# Add step of calculating sen's slope and p-value to pre and post change point year
mutate(sens_pre = purrr::map(pre_ts, ~sens.slope(.x, conf.level = 0.95)),
Sens_Slope_pre = as.numeric(unlist(sens_pre)[1]),
P_Value_pre = as.numeric(unlist(sens_pre)[3]),
sens_post = purrr::map(post_ts, ~sens.slope(.x, conf.level = 0.95)),
Sens_Slope_post = as.numeric(unlist(sens_post)[1]),
P_Value_post = as.numeric(unlist(sens_post)[3])) %>%
ungroup() %>% # To remove Season column
dplyr::select(ECOREGION, Metrics, season, Sens_Slope, P_Value, Change_Point_Year, Change_Point_pvalue,
lm_slope, Sens_Slope_pre, P_Value_pre, Sens_Slope_post, P_Value_post) %>%
dplyr::filter(Metrics == 'Number_MHW_days')
# Create table
kable(MHW_trends_NZ_realms, caption = "Table 2 - Sens's Slope and p-value.") %>%
kable_classic()
```

According to the previous tables, the trend and break point analyses are useful to asses changes in MHW metrics at these scales. But MHWs are by definition discrete events in space and time. So some can argue that having a pixel and event-based approach is preferable. Can we still apply this methodology on pixel scale?

## Pixel-based case scenario

We use the time series of SST available within the
`heatwaveR`

package (sst_WA) and investigate the trend
analysis in 5 metrics by grouping metric values per year as we have done
above.

```
# Simple event detection
MHW_WA <- detect_event(ts2clm(sst_WA, climatologyPeriod = c("1982-01-01", "2011-12-31")))
# Get annual metrics
MHW_WA_metrics_year <- MHW_WA$climatology %>%
dplyr::filter(!is.na(event_no)) %>%
mutate(year = year(t),
intensity = temp-seas) %>%
group_by(event, year) %>%
summarise(nevents = length(unique(event_no)),
nMHWdays = length(t),
int_cumulative = sum(intensity),
int_mean = mean(intensity),
int_max = max(intensity), .groups = "drop") %>%
# Need to complete time series in case of years with no MHWs in order to get full ts in trend analysis
complete(year = 1982:2018) %>%
dplyr::select(-event) %>%
pivot_longer(-c(year), names_to = 'Metrics', values_to = 'values') %>%
replace(is.na(.), 0)
# Plot the data
p <- ggplot(MHW_WA_metrics_year, aes(x = year, y = values)) +
geom_point() +
geom_line() +
geom_smooth(se = T, method = 'lm') +
labs(x = "Year") +
facet_wrap(~Metrics, scales = 'free') +
theme_bw() +
theme(legend.position = "none")
p
# Coerce to plotly
# plotly::ggplotly(p)
```

The `lm`

regression seems to detect some upward trends in
the metrics. What does Mann-Kendall have to say about these?

```
MHW_WA_trends <- MHW_WA_metrics_year %>%
group_by(Metrics) %>%
nest() %>%
mutate(ts_out = purrr::map(data, ~ts(.x$values, start = 1982, end = 2018, frequency = 1))) %>%
mutate(sens = purrr::map(ts_out, ~sens.slope(.x, conf.level = 0.95)),
pettitt = purrr::map(ts_out, ~pettitt.test(.x)),
lm = purrr::map(data,~lm(values ~ year, .x))) %>%
mutate(Sens_Slope = as.numeric(unlist(sens)[1]),
P_Value = as.numeric(unlist(sens)[3]),
Change_Point_Year = time(ts_out[[1]])[as.numeric(unlist(pettitt)[3])],
Change_Point_pvalue = as.numeric(unlist(pettitt)[4]),
lm_slope = unlist(lm)$coefficients.year) %>%
# Add step of cutting time series in 2 using Change_Point_Year
mutate(pre_ts = purrr::map(ts_out,~window(.x, start = 1982, end = Change_Point_Year)),
post_ts = purrr::map(ts_out,~window(.x, start = Change_Point_Year, end = 2018))) %>%
# Add step of calculating sen's slope and p-value to pre and post change point year
mutate(sens_pre = purrr::map(pre_ts, ~sens.slope(.x, conf.level = 0.95)),
Sens_Slope_pre = as.numeric(unlist(sens_pre)[1]),
P_Value_pre = as.numeric(unlist(sens_pre)[3]),
sens_post = purrr::map(post_ts, ~sens.slope(.x, conf.level = 0.95)),
Sens_Slope_post = as.numeric(unlist(sens_post)[1]),
P_Value_post = as.numeric(unlist(sens_post)[3])) %>%
dplyr::select(Metrics, Sens_Slope, P_Value, Change_Point_Year, Change_Point_pvalue,
lm_slope, Sens_Slope_pre, P_Value_pre, Sens_Slope_post, P_Value_post)
# Create table
kable(MHW_WA_trends, caption = "Table 3 - Sens's Slope and p-value.") %>%
kable_classic()
```

For some reason, the Mann-Kendall and Sen Slope analyses don’t seem
to be useful here as they return suspiciously too low trends (vs
lm_slope, which is the traditional linear regression using
`lm`

) and quite high p-values

Summarizing MHW metrics by year in a given pixel will likely result in some years with no MHWs, hence bringing some 0 values into the time series. I am not too sure how sensitive the MK and Pettitt (breakpoint) analyses are to 0 values, something to look into in the future.

## References

*The R Journal*10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.

*Trend: Non-Parametric Trend Tests and Change-Point Detection*. https://CRAN.R-project.org/package=trend.

*BioScience*57 (7): 573–83.

*Scientific Reports*12 (1): 1–13.