5.3 Contribution

In this vignette, we evaluate the contribution of individual species to each bioregion using the function contribution().

Data

We use the vegetation dataset that comes with bioregion.

# Load the package that provides the datasets and functions used below
library(bioregion)

data("vegedf")
data("vegemat")

# Calculation of (dis)similarity matrices
vegedissim <- dissimilarity(vegemat, metric = c("Simpson"))
vegesim <- dissimilarity_to_similarity(vegedissim)

Bioregionalization

We use the same three bioregionalization algorithms as in the visualization vignette: one non-hierarchical, one hierarchical and one network algorithm.
We request 3 bioregions for the non-hierarchical and hierarchical bioregionalizations.

# Non-hierarchical bioregionalization
vege_nhclu_kmeans <- nhclu_kmeans(vegedissim, n_clust = 3, index = "Simpson")
vege_nhclu_kmeans$cluster_info # 3
##     partition_name n_clust
## K_3            K_3       3

# Hierarchical bioregionalization
set.seed(1)
vege_hclu_hierarclust <- hclu_hierarclust(dissimilarity = vegedissim,
                                          index = names(vegedissim)[3],
                                          method = "average", n_clust = 3)
vege_hclu_hierarclust$cluster_info # 3
##   partition_name n_clust requested_n_clust output_cut_height
## 1            K_3       3                 3            0.5625

# Network bioregionalization
set.seed(1)
vege_netclu_walktrap <- netclu_walktrap(vegesim,
                                        index = names(vegesim)[3])
vege_netclu_walktrap$cluster_info # 3
##     partition_name n_clust
## K_3            K_3       3

Indices

Contribution


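The contribution index $\rho$ is calculated for each species × bioregion combination, following Lenormand et al. (2019).
It is defined as:

$$\rho_{ij} = \frac{n_{ij} - \frac{n_i n_j}{n}}{\sqrt{\frac{n - n_j}{n-1} \left(1-\frac{n_j}{n}\right) \frac{n_i n_j}{n}}}$$

where $n$ is the number of sites, $n_i$ the number of sites in which species $i$ is present, $n_j$ the number of sites belonging to bioregion $j$, and $n_{ij}$ the number of occurrences of species $i$ in sites belonging to bioregion $j$. The numerator compares the observed $n_{ij}$ with $n_i n_j / n$, the number expected if the occurrences of species $i$ were spread across sites independently of the bioregions, so positive values indicate a species that is over-represented in bioregion $j$.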
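To make the formula concrete, here is a minimal sketch that transcribes it into base R and evaluates it on made-up numbers. The helper name rho_index() and the toy values are assumptions for illustration only; this is not how contribution() is implemented internally.

```r
# Hypothetical helper (not part of bioregion): contribution index rho_ij,
# transcribed term by term from the formula above.
rho_index <- function(n_ij, n_i, n_j, n) {
  expected <- n_i * n_j / n  # occurrences expected in bioregion j
  spread <- sqrt((n - n_j) / (n - 1) * (1 - n_j / n) * expected)
  (n_ij - expected) / spread
}

# Toy numbers (made up): 10 sites, a species present in 4 of them,
# and a bioregion made of 5 sites.
rho_all_inside <- rho_index(n_ij = 4, n_i = 4, n_j = 5, n = 10)
rho_as_expected <- rho_index(n_ij = 2, n_i = 4, n_j = 5, n = 10)

print(rho_all_inside)   # positive: more occurrences than expected
print(rho_as_expected)  # zero: observed equals expected
```

In the first call all 4 occurrences fall inside the bioregion while only 2 are expected, so the index is clearly positive; in the second call the observed count equals the expectation and the index is exactly 0.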

Cz statistics

The Cz metrics, borrowed from network analysis, are the participation coefficient $C$ and the within-bioregion degree z-score $z$. The participation coefficient of node $i$ is:

$$C_i = 1 - \sum_{s=1}^{N_M}{\left(\frac{k_{is}}{k_i}\right)^2}$$

where $N_M$ is the number of bioregions, $k_{is}$ is the number of links of node (species or site) $i$ to nodes in bioregion $s$, and $k_i$ is the total degree of node $i$. The participation coefficient of a node is therefore close to 1 if its links are uniformly distributed among all the bioregions and 0 if all its links are within its own bioregion.
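The participation coefficient can be checked by hand on a single node. The sketch below is a base R illustration under the assumption that the total degree of a node is the sum of its links over all bioregions; participation_coef() is a hypothetical helper, not a bioregion function.

```r
# Hypothetical helper (not part of bioregion): participation coefficient of
# one node, given k_is = its number of links to each bioregion.
participation_coef <- function(k_is) {
  k_i <- sum(k_is)  # total degree, assumed to be the sum over bioregions
  1 - sum((k_is / k_i)^2)
}

c_single <- participation_coef(c(4, 0, 0))  # all links in one bioregion
c_even <- participation_coef(c(2, 2, 2))    # links spread over 3 bioregions

print(c_single)
print(c_even)
```

With links spread evenly over $N_M$ bioregions the coefficient equals $1 - 1/N_M$ (here 2/3), so it only approaches 1 when the number of bioregions is large.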

The within-bioregion degree z-score is:

$$z_i = \frac{k_i - \overline{k_{s_i}}}{\sigma_{k_{s_i}}}$$

where $k_i$ here denotes the number of links of node (species or site) $i$ to other nodes in its own bioregion $s_i$ (not its total degree, as in $C_i$), $\overline{k_{s_i}}$ is the average of $k$ over all the nodes in $s_i$, and $\sigma_{k_{s_i}}$ is the standard deviation of $k$ in $s_i$. The z-score measures how well connected node $i$ is to other nodes in its bioregion.
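As an illustration, the z-score can be computed in base R by standardizing within-bioregion degrees inside each bioregion. This is a sketch only: within_degree_z() is a hypothetical helper, the toy degrees are made up, and using the sample standard deviation (sd()) is an assumption, since the text does not say which estimator is used.

```r
# Hypothetical helper (not part of bioregion): within-bioregion degree z-score.
# k_within:  links of each node to other nodes of its own bioregion
# bioregion: bioregion label of each node
within_degree_z <- function(k_within, bioregion) {
  # ave() applies FUN within each bioregion and returns a vector aligned
  # with the input
  centre <- ave(k_within, bioregion, FUN = mean)
  spread <- ave(k_within, bioregion, FUN = sd)  # sample SD: an assumption
  (k_within - centre) / spread
}

# Toy data (made up): six nodes in two bioregions
k_within <- c(1, 2, 3, 5, 5, 8)
bioregion <- c("A", "A", "A", "B", "B", "B")

z <- within_degree_z(k_within, bioregion)
print(z)
```

Nodes with more within-bioregion links than the bioregion average get a positive score. In this sketch a bioregion with a single node, or one in which all nodes have the same degree, yields NA or NaN, because the standard deviation is undefined or zero.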

Contribution

We can now run the function contribution().

contrib_kmeans <- contribution(vege_nhclu_kmeans, vegemat,
                               indices = "contribution")
contrib_hclu <- contribution(vege_hclu_hierarclust, vegemat,
                             indices = "contribution")
contrib_netclu <- contribution(vege_netclu_walktrap, vegemat,
                               indices = "contribution")

# Cz indices
clust_bip <- netclu_greedy(vegedf, bipartite = TRUE)
cz_netclu <- contribution(cluster_object = clust_bip, comat = vegemat, 
                          bipartite_link = vegedf, indices = "Cz")

contribution() returns a data.frame containing the requested contribution metrics at the species level.

References

Lenormand, M., Papuga, G., Argagnon, O., Soubeyrand, M., Alleaume, S., & Luque, S. (2019). Biogeographical network analysis of plant species distribution in the Mediterranean region. Ecology and Evolution, 9, 237–250.