---
title: "Get started with funbiogeo"
output: rmarkdown::html_vignette
bibliography: refs.bib
vignette: >
  %\VignetteIndexEntry{Get started with funbiogeo}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: inline
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse   = TRUE,
  comment    = "#>",
  fig.width  = 7,
  fig.height = 7,
  out.width  = "100%",
  dpi        = 100
)
```

The aim of the `funbiogeo`package is to help users streamline the workflows in **fun**ctional **biogeo**graphy [@Violle_emergence_2014].
It helps filter sites, species, and traits based on their trait coverages.
It also provide default diagnostic plots and standard tables summarizing input
data. This vignette aims to be an introduction to the most commonly used
functions.

The vignette is a worked through real world example of a functional biogeography workflow using the internal dataset on European mammals provided in `funbiogeo`.

```{r setup}
library("funbiogeo")
library("ggplot2")
```

## Provided data

We are interested in mapping the functional traits of mammals of Europe.
We used range maps data
[from the IUCN](https://www.iucnredlist.org/resources/spatial-data-download)
and trait data from the
[PanTHERIA database](https://esapubs.org/archive/ecol/E090/184/)
[@Jones_PanTHERIA_2009].
The initial extraction scripts are available
on [GitHub](https://github.com/FRBCesab/eumammals).

The functions in `funbiogeo` mostly leverage three different objects:

- the **species x traits** `data.frame`, which describes the traits (in columns) of the species (in rows) (`species_traits` dataset in `funbiogeo`);
- the **site x species** `data.frame`, which describes the presence/absence, the abundance, or the cover of species (in columns) across sites (in rows) (`site_species` dataset in `funbiogeo`);
- the **site x locations** object, which describes the physical location of sites through an `sf` object (`site_locations` dataset in `funbiogeo`).

You can load the example data using the `data(..., package = "funbiogeo")` call:

```{r load-datasets}
data("site_species", package = "funbiogeo")
data("site_locations", package = "funbiogeo")
data("species_traits", package = "funbiogeo")
```

In the following sections we'll describe in detail these three provided datasets in the package.


### Species x Traits

This object contains traits values for multiple traits (as columns) for studied
species (as rows). It should be a `data.frame`. The first column should contain
species names and the other columns contain trait values.

Note that we'll be talking about **species** throughout this vignette
and in the arguments of `funbiogeo`, but the package doesn't make any assumption
on the biological level. It can be individuals, populations, strains, species,
genera, families, etc. The important fact is that you should have trait data
for the level at which you want to work.

Let's examine the `species_traits` data included in the package:

```{r preview-species-traits, echo = FALSE}
knitr::kable(
  head(species_traits, 4), 
  caption   = "First lines of species x traits dataset", 
  align     = c("r", "r", "r", "r", "r", "r")
)
```

The first column **`"species"`** contains species names, while the next 6
columns contain different traits for all species. Note that the species names
are anonymized because the IUCN data
[cannot be redistributed](https://www.iucnredlist.org/terms/terms-of-use#4.%20No%20Reposting%20or%20Redistribution).

Let's look at a summary of the trait dataset:

```{r summary-trait-dataset}
summary(species_traits)
```

From there we can see that there are 149 included species and some missing trait
values. For example, we don't have body mass information for 26 species.

Note that to use your own species by traits `data.frame`, it should follow
similar structure with the first column being named **`"species"`** and the
other ones containing traits.


### Site x Species

This object contains species occurrences/abundance/coverage at sites  
of the study area. It is a `data.frame`. The first column, **`"site"`**,
contains site names while the other columns contains the abundance of each
species across sites.

Note that here we are talking about sites in an abstract way.
These can be plots, assemblages, of whatever collections of species you're
interested in.

The package `funbiogeo` comes with the example dataset on European mammals
`site_species`.
Let's look at it:

```{r preview-sites-species, echo = FALSE}
knitr::kable(
  head(site_species[ , 1:4], 10), 
  caption   = "Format of site x species dataset", 
  align     = c("c", "c", "c", "c")
)
```

The example dataset contains the occurrence of the 149 mammal species
across 1,505 sites (grid cells of 0.5° x 0.5° resolution).

Note that to use your own site by species `data.frame`, it should follow similar
structure with the first column being named **`"sites"`** and the other ones
containing presence information of species across sites.


### Site x Locations

This object contains the geographical location of the sites.
It should be an `sf` object
from the [`sf` package](https://cran.r-project.org/package=sf).
These are spatial R objects that describe geographical locations. The sites can
have arbitrary shapes: points, regular polygons, irregular polygons, or even
line transects! To make sure that your data is well plotted you should specify
the Coordinate Reference System (CRS) of this object.

The package `funbiogeo` comes with the example dataset `site_locations` defining
the location of the 1,505 sites (grid cells of 0.5° x 0.5° resolution) as
polygons. It contains the names of the site in its first column **`"site"`**:

```{r preview-sites-locs}
site_locations
```

Note that to use your own site locations object, it should follow similar
structure, being an `sf` object with the first column being named **`"sites"`**.


## Visualizing the data (diagnostic plots)

`funbiogeo` provides many functions to display the data to help the user select
specific traits, species, and/or sites. We are going to detail some of them
in this section (see the full list in the
[diagnostic plots vignette](diagnostic-plots.Rmd).
We call them *diagnostic plots* because they help us to have an overview of our dataset **prior** to the analyses.

### Trait completeness per species

A first way to visualize our `data.frame` is to look at the proportion of species
with non-missing traits using the `fb_plot_number_species_by_trait()` function.
It takes the species by trait `data.frame` as input.

```{r fig-plot-sp-by-trait}
fb_plot_number_species_by_trait(species_traits)
```

This plot shows us the number of species (along the x-axis) in function of 
the trait name (along the y-axis). The number of concerned species is shown at
the bottom of the plot while the corresponding proportion of species
(compared to all the species included in the trait dataset) is indicated as
a secondary x-axis at the top. The proportion of species concerned is shown at
the right of each point. For example, in our example dataset, 82.6% species have
a non-NA adult body mass.

The function also include a way to provide a target proportion of species as
the second argument. It will display the proportion as a dark red dashed line.

For example, if we want to visualize which traits cover more than 75%
of the species:

```{r fig-plot-sp-by-trait-prop}
fb_plot_number_species_by_trait(species_traits, threshold_species_proportion = 0.75)
```

The top number shows the corresponding number of species.


### Number of Traits per Species

Another way to filter the data would be to select certain species that
have at least a certain number of traits. This can be visualized using the
`fb_plot_number_traits_by_species()` function. Similarly to the above-mentioned
function, it takes the species x traits `data.frame` as the first argument:

```{r fig-plot-trait-by-sp}
fb_plot_number_traits_by_species(species_traits)
```

The plot shows the number (bottom x-axis) and the proportion (top x-axis) of
species covered by a specific number of traits (0 to 6 in our example).

  
## Filtering the data

Now that we displayed the diagnostic plots, we can decide thresholds and filter
our data for our following analyses.

### Filter trait by species coverage

We want to select the traits that are available for at least 75% of the species. 
To do so we can use the `fb_filter_traits_by_species_coverage()` function.
The function takes the species by traits `data.frame` and outputs the same dataset but with the traits filtered (so with less columns). The second argument
`threshold_species_proportion` is the threshold proportion of species covered:

```{r filter-traits-by-species}
# Initial dimension of the input data
dim(species_traits)

# Filter traits 
red_sp_traits <- fb_filter_traits_by_species_coverage(
  species_traits, threshold_species_proportion =  0.75
)

dim(red_sp_traits)

# The reduced data set now has fewer trait columns
head(red_sp_traits)
```

The function outputs a filtered species-traits dataset retaining only traits
covering at least 75% of the species. In the end this keep two traits: body mass
and litter size.


### Filter species by trait coverage

Similarly you could filter species by their trait coverage. For example we would
like to make sure that the species we filtered previously so that they show
at least one of the two traits selected above and thus exclude species for which
neither of the traits are available.
We can use the function `fb_filter_species_by_trait_coverage()` with
the species x traits `data.frame` as the first argument and the second argument
the proportion of traits covered by species.

```{r filter-species-by-traits}
# Filter species with at least 50% of included (two traits)
# at least one trait
red_sp_traits_2 <- fb_filter_species_by_trait_coverage(
  red_sp_traits, threshold_traits_proportion = 0.5
)

head(red_sp_traits_2)

dim(red_sp_traits_2)
```

We thus have selected 2 traits that cover at least 75% of the initial species
list and 127 species which have known values for these two traits.


### Filter sites by trait coverage

Now that we have filtered our traits and species of interest we need to filter
the sites, that contain enough species for which the traits are available.
Similarly to above the function is
`fb_filter_sites_by_trait_coverage()` it takes as two first arguments
the site x species `data.frame` and the species x traits `data.frame`. The third
argument is `threshold_traits_proportion` that indicates the percent coverage
of traits to filter each site. Note that this coverage is weighted by
the occurrence, abundance, or cover depending on the content
of the site x species `data.frame`.

Let's say here we're interested in sites for which our species with available
traits represent at least 90% of the species present:

```{r filter-trait-coverage}
# Initial site x species data
dim(site_species)

# Filter sites with at least 90% species covered
filt_sites <- fb_filter_sites_by_trait_coverage(
  site_species, red_sp_traits_2, threshold_traits_proportion = 0.9
)

# Filtered sites
dim(filt_sites)

filt_sites[1:4, 1:4]
```

The output of the function is a site x species `data.frame` with selected sites
and species. Now we selected 1,268 sites out of 1,505, for our 2 traits and 127 species.


## Computing Functional Diversity metrics

The `funbiogeo` functions helped us filter our data appropriately with enough
available trait information for species and sites.

We can use the filtered datasets to proceed with our analyses using other
readily available tools for functional diversity indices.
This where you should use your preferred packages to compute functional
diversity indices like
[`fundiversity`](https://cran.r-project.org/package=fundiversity),
[`betapart`](https://cran.r-project.org/package=betapart), or
[`hypervolume`](https://cran.r-project.org/package=hypervolume).

For the sake of the example we included a function in `funbiogeo`
to compute Community-Weighted Mean [CWM, @Garnier_Plant_2004] named `fb_cwm()`.
The CWM is the abundance-weighted average trait per site. We'll be using it in
the following section we'll then show another example computing functional
diversity indices using the `fundiversity` package.


### Community-Weighted Mean (CWM)

We're interested to look at the spatial distribution of the average body mass
and litter size of European mammals. To do so, we can compute the
community-weighted mean of both traits. We'll use the `fb_cwm()` function to do
so, it takes the site x species `data.frame` and species x traits `data.frame`
as arguments.

```{r compute-cwm}
# Note that we're reusing our filtered data to compute CWM
cwm <- fb_cwm(filt_sites, red_sp_traits_2)

head(cwm)
```

It outputs a `data.frame` with 3 columns: the first one, `site`, shows the site
name as provided in the input site x species `data.frame`, `trait`
which indicates the trait name on which the CWM is computed,
and `cwm` which shows the value of the CWM.


### Compute functional diversity indices

We can also integrate our filtered datasets in other functional diversity computation pipeline. We'll show an example by computing functional richness
with `fundiversity`.

```{r computing-fric, eval = require("fundiversity")}
## To install 'fundiversity' uncomment the following line
# install.packages("fundiversity")

# Functional richness in 'fundiversity' requires all the traits to be known
# so we need to filter the traits
filt_traits <- subset(
  red_sp_traits, !is.na(adult_body_mass) & !is.na(litter_size)
)

# We need to transform species and site names as row names for species-traits
# and site-species data.frames, as required by 'fundiversity'
rownames(filt_traits) <- filt_traits$species
filt_traits <- filt_traits[, -1]

rownames(filt_sites) <- filt_sites$site
filt_sites <- filt_sites[, -1]

# Scale traits
filt_traits <- scale(filt_traits)

# Compute Functional Richness
fric <- fundiversity::fd_fric(filt_traits, filt_sites)

head(fric)
```

We now have a table with Functional Richness computed for all of our sites.


## Putting variables on the map

### Map of environmental raster

If we want to display the environment associated with our sites of interest, we
can leverage environmental raster layers, like the mean annual temperature.
Fortunately, we have access to an example raster of mean annual temperature
through `funbiogeo`. The package provides a helper function display a raster
layer easily (without any assumption about the projection) named
`fb_map_raster()`:

```{r map-tavg, dev = 'png'}
# Read raster
tavg <- system.file(
  "extdata", "annual_mean_temp.tif", package = "funbiogeo"
)
tavg <- terra::rast(tavg)

# Map raster
fb_map_raster(tavg) + 
  scale_fill_distiller("Temperature", palette = "Spectral") +
  theme(legend.position = "bottom") + 
  ggtitle("Mean annual temperature in Europe")
```

We can also combine that with the annual precipitation information available
as an example

```{r composition-map, dev = 'png'}
library("patchwork")

# Read raster ------------------------------------------------------------------
tavg <- system.file("extdata", "annual_mean_temp.tif", package = "funbiogeo")
tavg <- terra::rast(tavg)

prec <- system.file("extdata", "annual_tot_prec.tif", package = "funbiogeo")
prec <- terra::rast(prec)

# Individual Maps --------------------------------------------------------------
map_temperature <- fb_map_raster(tavg, legend.position = "none") + 
  scale_fill_distiller("Temperature", palette = "Spectral")

map_precipitation <- fb_map_raster(prec) + 
  scale_fill_distiller("Precipitation", direction = 1)

# Plot composition -------------------------------------------------------------

(map_temperature / map_precipitation) + 
  plot_annotation(title = "Europe", 
                  theme = theme(plot.title = element_text(face = "bold"))) & 
  theme_classic() &
  theme(text = element_text(family = "mono"))
```

This function allows the visualization of a raster in a simple fashion, but it
doesn't tell us anything about the environmental variable at the sites.
In the next section we will follow the example of mapping an environmental
variable.

### Map of average environmental variable in site

To get the average environmental variable in the site we can use the
`fb_get_environment()` function, it takes as arguments the site-locations object
and an environmental raster from the `terra` package.
By default, it takes the average of the raster values per site.

```{r get-environment}
site_mat <- fb_get_environment(site_locations, tavg)

head(site_mat)
```

The variable names in columns are based on the names of the provided raster.
To put these values on the map we can use the `fb_map_site_data()` function,
which allows mapping arbitrary site-level variables. It takes three needed
arguments: the first of which, `site_locations`, which is the `sf` object
describing sites' geographic locations; the second argument is `site_data`,
which is a `data.frame` giving additional data indexed by site; and
`selected_col`, which is a character giving the name of the column to plot from
the `site_data` argument.

We can thus plot the mean annual temperature of sites through the following
commands:

```{r map-avg-environment}
fb_map_site_data(site_locations, site_mat, "annual_mean_temp") +
  labs(title = "Mean Annual Temperature per Site")
```

### Map of functional diversity indices

We can leverage the same functions to map our functional diversity indices.
For example, with the body mass CWM:

```{r map-cwm-body-mass}
body_mass_cwm <- subset(cwm, trait == "adult_body_mass")

fb_map_site_data(site_locations, body_mass_cwm, "cwm") +
  scale_fill_viridis_c(trans = "log10") +
  labs(title = "Mammals Body Mass CWM")
```

We can do similarly for functional richness:

```{r map-fric, eval = require("fundiversity")}
fb_map_site_data(site_locations, fric, "FRic") +
  scale_fill_viridis_c(option = "magma") +
  labs(title = "Mammals Functional Richness")
```


## Conclusion

This concludes our tutorial to introduce `funbiogeo`. The packages contains many
more features, especially diagnostic plots, which are explained in detail in
[a dedicated vignette](diagnostic-plots.Rmd).
There is also a specific vignette about [transforming raw data from long to wide format](long-format.Rmd).
Finally, if you're interested in learning about up-scaling your sites, you can
refer to [the specific vignette](upscaling.Rmd)


## References