---
title: "Tutorial for bioregion"
author: "Maxime Lenormand, Boris Leroy and Pierre Denelle"
date: "`r Sys.Date()`"
output:
  html_vignette:
    number_sections: false
  html_document:
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
    toc_depth: 2
bibliography: '`r system.file("REFERENCES.bib", package="bioregion")`'
csl: journal-of-biogeography.csl
vignette: >
  %\VignetteIndexEntry{Tutorial for bioregion}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE,
                      fig.width = 8, fig.height = 8)
# Packages --------------------------------------------------------------------
suppressPackageStartupMessages({
  suppressWarnings({
    library(bioregion)
  })
})

options(tinytex.verbose = TRUE)
```

## 0. Brief introduction

This tutorial describes the main features of the R package `bioregion`. The main purpose of the `bioregion` package is to propose a transparent methodological framework to compare bioregionalisation methods. Below is the typical flow chart for identifying bioregions from a site-species bipartite network or co-occurrence matrix with `bioregion` (Figure 1). This workflow can be divided into four main steps:

1. Preprocess the data (matrix or network formats)
2. Compute similarity/dissimilarity metrics between sites based on species composition
3. Run the different algorithms to identify different sets of bioregions
4. Evaluate and visualize the results

Figure 1: Workflow of the `bioregion` package.

## 1. Install binary files

Some functions, or at least parts of them (listed below), require binary files to run.

* [netclu_infomap](https://bioRgeo.github.io/bioregion/reference/netclu_infomap.html)
* [netclu_louvain](https://bioRgeo.github.io/bioregion/reference/netclu_louvain.html) (Cpp version)
* [netclu_oslom](https://bioRgeo.github.io/bioregion/reference/netclu_oslom.html)

Please check this [tutorial page](https://bioRgeo.github.io/bioregion/articles/a1_install_binary_files.html) for instructions on installing the binary files.

## 2. Matrix or Network formats

The `bioregion` package takes as input site-species information stored in a bipartite network or a co-occurrence matrix. Relying on the functions [mat_to_net](https://bioRgeo.github.io/bioregion/reference/mat_to_net.html) and [net_to_mat](https://bioRgeo.github.io/bioregion/reference/net_to_mat.html), it handles both the matrix and network formats throughout the workflow.

Please have a look at this [tutorial page](https://bioRgeo.github.io/bioregion/articles/a2_matrix_and_network_formats.html) to better understand how these two functions work.

## 3. Pairwise similarity/dissimilarity metrics

The functions [similarity](https://bioRgeo.github.io/bioregion/reference/similarity.html) and [dissimilarity](https://bioRgeo.github.io/bioregion/reference/dissimilarity.html) compute, respectively, pairwise similarity and dissimilarity metrics based on a (site-species) co-occurrence matrix. The resulting `data.frame` is stored in a `bioregion.pairwise.metric` object containing all requested metrics between each pair of sites.

The functions [dissimilarity_to_similarity](https://bioRgeo.github.io/bioregion/reference/dissimilarity_to_similarity.html) and [similarity_to_dissimilarity](https://bioRgeo.github.io/bioregion/reference/dissimilarity_to_similarity.html) can be used to transform a dissimilarity object into a similarity object and vice versa.

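
As a rough illustration of the two input formats and of the pairwise metrics, the sketch below builds a small made-up co-occurrence matrix, converts it to a network and back, and computes Jaccard similarity between sites. The toy data and object names are assumptions introduced for this example, and the argument names follow the reference pages linked above, so they may differ between package versions.

```r
library(bioregion)

# Toy site-species co-occurrence matrix (made-up data): 3 sites x 4 species
comat <- matrix(c(1, 1, 0, 0,
                  1, 1, 1, 0,
                  0, 0, 1, 1),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("Site1", "Site2", "Site3"),
                                c("SpA", "SpB", "SpC", "SpD")))

# Matrix -> network (one row per site-species link) and back again
net <- mat_to_net(comat, weight = TRUE)
mat <- net_to_mat(net, weight = TRUE)

# Pairwise Jaccard similarity between sites
sim <- similarity(comat, metric = "Jaccard")

# Convert the similarity object into a dissimilarity object
dissim <- similarity_to_dissimilarity(sim)

sim
dissim
```

Each row of `sim` and `dissim` describes one pair of sites, so the same objects can be passed on to the clustering functions that expect pairwise metrics.
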
Please have a look at this [tutorial page](https://bioRgeo.github.io/bioregion/articles/a3_pairwise_metrics.html) to better understand how these functions work.

## 4. Clustering algorithms

The `bioregion` R package gathers several methods that group sites and species into similar entities called bioregions. All these methods can lead to several partitions of sites and species, i.e. to different bioregionalisations.
Bioregionalisation methods can be based on hierarchical clustering algorithms, non-hierarchical clustering algorithms or network algorithms.
The package provides functions for each of these three families, and their outputs share a specific class, namely the `bioregion.clusters` class.

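
To make the idea of a partition of sites concrete, here is a minimal base R sketch of the first family (hierarchical clustering). It deliberately uses `stats::dist()`, `hclust()` and `cutree()` rather than the package's own functions, so it is not the package's implementation and does not return a `bioregion.clusters` object. The toy matrix is made up for the example.

```r
# Toy presence/absence matrix (made-up data): 5 sites x 6 species
comat <- matrix(c(1, 1, 1, 0, 0, 0,
                  1, 1, 0, 0, 0, 0,
                  1, 1, 1, 1, 0, 0,
                  0, 0, 0, 1, 1, 1,
                  0, 0, 0, 0, 1, 1),
                nrow = 5, byrow = TRUE,
                dimnames = list(paste0("Site", 1:5), paste0("Sp", 1:6)))

# Jaccard dissimilarity between sites ("binary" distance in base R)
d <- dist(comat, method = "binary")

# Average-linkage dendrogram built from the dissimilarities
tree <- hclust(d, method = "average")

# Cutting the tree at different levels gives different partitions of sites
one_region   <- cutree(tree, k = 1)            # top of the tree
two_regions  <- cutree(tree, k = 2)            # intermediate cut
all_separate <- cutree(tree, k = nrow(comat))  # bottom of the tree

two_regions
```

With this toy data, the two-group cut separates Site1 to Site3 from Site4 and Site5, which share no species with the first two sites.
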

### 4.1 Hierarchical clustering

The functions relying on hierarchical clustering start with the prefix `hclu_`. With these algorithms, the bioregions are placed into a dendrogram that ranges between two extremes: all sites belong to the same bioregion (top of the tree) or each site belongs to its own bioregion (bottom of the tree).

See the following [tutorial page](https://biorgeo.github.io/bioregion/articles/a4_1_hierarchical_clustering.html) for more details.

### 4.2 Non-hierarchical clustering

The functions relying on non-hierarchical clustering start with the prefix `nhclu_`. For most of these algorithms, the user needs to predefine the number of clusters, although this number can be determined by estimating the optimal partition.

See this [tutorial page](https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html) for more details.

### 4.3 Network clustering

The functions relying on network clustering start with the prefix `netclu_`. Site-species matrices can be seen as (bipartite) networks where the nodes are either the sites or the species and the links between them are the occurrences of species within sites. With networks, modularity algorithms can be applied, leading to bioregionalisation.

The following [tutorial page](https://bioRgeo.github.io/bioregion/articles/a4_3_network_clustering.html) describes each clustering function that relies on a network algorithm.

### 4.4 Microbenchmark

The different bioregionalisation methods listed in the package rely on more or less computationally intensive algorithms. The following [page](https://biorgeo.github.io/bioregion/articles/a4_4_microbenchmark.html) estimates the time required to run each method on data sets of different sizes.

## 5. Visualization and evaluation of the results

### 5.1 Visualization

If sites have geographic coordinates, then each bioregionalisation can be visualized with the function `map_clusters()`.

This [tutorial page](https://biorgeo.github.io/bioregion/articles/a5_1_visualization.html) details different ways to plot your bioregionalisation.

### 5.2 Compare partitions

In this section, we look at how sites are assigned to bioregions within a single bioregionalisation and also compare this assignment across different bioregionalisations.

The following [page](https://biorgeo.github.io/bioregion/articles/a5_2_compare_partitions.html) illustrates this.
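
One common way to compare two partitions of the same sites is the Rand index: the proportion of site pairs on which both partitions agree, either by grouping the pair together or by keeping it apart. The base R sketch below computes it by hand for two made-up partitions. It is an illustrative assumption only, since the linked page may rely on other indices or on the package's own functions.

```r
# Two made-up partitions of the same five sites (cluster labels per site)
part_a <- c(Site1 = 1, Site2 = 1, Site3 = 1, Site4 = 2, Site5 = 2)
part_b <- c(Site1 = 1, Site2 = 2, Site3 = 1, Site4 = 3, Site5 = 3)

# For every pair of sites, are they in the same cluster?
same_a <- outer(part_a, part_a, "==")
same_b <- outer(part_b, part_b, "==")

# Keep each unordered pair once (upper triangle, diagonal excluded)
pairs <- upper.tri(same_a)

# Rand index: share of pairs on which the two partitions agree
rand_index <- mean(same_a[pairs] == same_b[pairs])
rand_index
```

A value of 1 would mean the two partitions group sites identically (up to relabelling of clusters), while lower values indicate more disagreement.
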