This code performs data quality checks on MERMAID projects with fish belt data, including observer comparisons, outlier detection, family diversity analysis, and biomass composition.
Getting example fishbelt data from MERMAID
This section loads the necessary R packages and downloads an example MERMAID project with fish belt data. Because the code relies on all three levels of MERMAID data (observations, sample units, and sample events), you need either a project whose fish belt data policy is set to public or a project of which you are a member. Here I download a public project so anyone can run the code. The filter keeps public fish belt sample events tagged “Blue Ventures” outside Indonesia and the Philippines, then picks the project with the most sample units. I also provide commented-out example code for getting a project of which you are a member.
Note: This step requires authentication even if you are accessing a public project. This means you need to create a free MERMAID account if you don’t already have one.
rm(list = ls()) # remove past stored objects
options(scipen = 999) # turn off scientific notation

#### Load packages ####
library(mermaidr)
library(tidyverse)
library(plotly)
library(DT)
library(ggplot2)
library(ggpubr)
library(knitr)

#### Get data from an example project with public fishbelt data ####
allPublicFishbeltProjects <- mermaid_get_summary_sampleevents() %>%
  filter(data_policy_beltfish == "public" &
           !is.na(beltfish_biomass_kgha_avg) &
           !country %in% c("Indonesia", "Philippines") &
           grepl(pattern = "Blue Ventures", x = tags)) %>%
  group_by(project_id, project, tags, project_notes) %>%
  dplyr::summarise(NumSites = n_distinct(site), # unique sites, not sample events
                   TotalSampleUnits = sum(beltfish_sample_unit_count),
                   .groups = "drop") %>%
  arrange(desc(TotalSampleUnits))

targetProject <- allPublicFishbeltProjects[1, ] # choose first project as example
targetFishAllData <- mermaid_get_project_data(project = allPublicFishbeltProjects$project_id[1],
                                              method = "fishbelt",
                                              data = "all")

# #### Get data from a project for which you are a member ####
# my_projects <- mermaid_get_my_projects()
#
# # The next line gets the first project - change the number to get a different one
# targetProject <- my_projects[1, ]
#
# # Get all data levels
# targetFishAllData <- targetProject %>%
#   mermaid_get_project_data(method = "fishbelt", data = "all")

# Extract the three levels
observations_data <- targetFishAllData$observations
sampleunits_data <- targetFishAllData$sampleunits
sampleevents_data <- targetFishAllData$sampleevents
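Because every later section uses all three data levels, it can help to fail early if the download did not return them. The sketch below is a minimal base-R check under one assumption: that `mermaid_get_project_data(..., data = "all")` returns a named list of data frames, as the extraction step above uses it. `check_mermaid_levels()` is a hypothetical helper, not part of mermaidr.

```r
# Hypothetical helper (not part of mermaidr): stop early if any of the three
# MERMAID data levels is missing or empty in the object returned by
# mermaid_get_project_data(..., data = "all").
check_mermaid_levels <- function(x,
                                 required = c("observations", "sampleunits", "sampleevents")) {
  missing_levels <- setdiff(required, names(x))
  if (length(missing_levels) > 0) {
    stop("Missing data levels: ", paste(missing_levels, collapse = ", "))
  }
  # Row count per level; zero rows means nothing usable at that level
  n_rows <- vapply(x[required], function(d) as.integer(NROW(d)), integer(1))
  if (any(n_rows == 0)) {
    stop("Empty data levels: ", paste(required[n_rows == 0], collapse = ", "))
  }
  invisible(n_rows)
}

# Usage with a toy list standing in for targetFishAllData
toy <- list(observations = data.frame(count = 1:3),
            sampleunits  = data.frame(id = 1:2),
            sampleevents = data.frame(id = 1))
check_mermaid_levels(toy)
```

In the workflow above, the call would be `check_mermaid_levels(targetFishAllData)` placed just before the three levels are extracted.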
Comparing fish biomass and abundance patterns among observers to identify potential observer effects.
Note: Many transects have multiple observers listed. The analyses below show patterns for each observer across all transects they participated in, but systematic differences could reflect either individual observer effects or the effects of observer pairings.
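One way to credit every listed observer with each transect they worked on is to expand multi-observer sample units to one row per observer before summarising, and to build a separate pairing key for comparing observer combinations. This is a minimal sketch with hypothetical toy values; it assumes the observers are stored as a single comma-separated string in a column named `observers`, so check the actual column name and separator in `sampleunits_data` before adapting it.

```r
# Toy sample-unit table (hypothetical values). In practice this would be
# sampleunits_data; the `observers` column name and the ", " separator are
# assumptions to verify against the real export.
su <- data.frame(
  sample_unit  = c("T1", "T2", "T3"),
  biomass_kgha = c(120, 80, 200),
  observers    = c("Ana, Ben", "Ana", "Ben, Cai"),
  stringsAsFactors = FALSE
)

# One row per (sample unit, observer): each observer is credited with every
# transect they participated in
obs_list <- strsplit(su$observers, ",\\s*")
per_observer <- data.frame(
  sample_unit  = rep(su$sample_unit, lengths(obs_list)),
  biomass_kgha = rep(su$biomass_kgha, lengths(obs_list)),
  observer     = unlist(obs_list),
  stringsAsFactors = FALSE
)

# Mean biomass per observer across all of their transects
aggregate(biomass_kgha ~ observer, data = per_observer, FUN = mean)

# Pairing key (names sorted so "Ben, Ana" and "Ana, Ben" match), for
# comparing observer pairings rather than individual observers
su$pairing <- vapply(obs_list, function(o) paste(sort(o), collapse = " + "),
                     character(1))
aggregate(biomass_kgha ~ pairing, data = su, FUN = mean)
```

If the per-observer means differ but the differences disappear once transects are grouped by pairing, the pattern is more likely a pairing effect than an individual one.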
ggplot(families_by_site, aes(x = n_families)) +
  geom_histogram(binwidth = 2, fill = "#4575b4", color = "black", alpha = 0.7) +
  labs(title = "Distribution of Fish Family Richness Across Sites",
       x = "Number of Fish Families",
       y = "Number of Sites") +
  theme_classic() +
  theme(axis.text = element_text(size = 11, colour = "black"),
        axis.title = element_text(size = 12, colour = "black"),
        plot.title = element_text(size = 14, face = "bold", hjust = 0.5))
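The histogram expects a data frame `families_by_site` with one row per site and an `n_families` column, which is not defined in this excerpt. A minimal sketch of how such a table could be built, assuming it counts distinct non-missing families per site from the observation level and that the observations carry `site` and `fish_family` columns (toy values shown):

```r
# Toy observation table (hypothetical values); in practice this would be
# observations_data, assumed to have `site` and `fish_family` columns.
obs <- data.frame(
  site        = c("A", "A", "A", "B", "B", "C"),
  fish_family = c("Scaridae", "Labridae", "Scaridae", "Labridae", NA, "Acanthuridae"),
  stringsAsFactors = FALSE
)

# Drop rows without a family, keep one row per site-family combination,
# then count the families recorded at each site
site_family <- unique(obs[!is.na(obs$fish_family), c("site", "fish_family")])
families_by_site <- aggregate(fish_family ~ site, data = site_family, FUN = length)
names(families_by_site)[2] <- "n_families"
families_by_site
```

With dplyr, the same idea would be `observations_data %>% group_by(site) %>% summarise(n_families = n_distinct(fish_family, na.rm = TRUE))`.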
Complete Family List
family_list <- observations_data %>%
  group_by(fish_family) %>%
  summarise(n_species = n_distinct(fish_taxon),
            n_observations = n(), # rows in the observation table
            total_count = sum(count, na.rm = TRUE)) %>%
  arrange(fish_family)

# rownames = FALSE so the four header labels line up with the four data columns
datatable(family_list,
          rownames = FALSE,
          colnames = c("Family", "N Species", "N Observations", "Total Count"),
          caption = "Complete List of Fish Families",
          options = list(pageLength = 15, autoWidth = TRUE))
Biomass Composition by Family
Top Families by Biomass
Identifying which fish families contribute most to total biomass.
# Extract all family-specific biomass columns from sample events
family_cols <- grep("^biomass_kgha_fish_family_avg_", names(sampleevents_data), value = TRUE)

# Calculate true mean biomass for each family (replacing NAs with 0)
family_biomass <- sampleevents_data %>%
  select(all_of(family_cols)) %>%
  pivot_longer(cols = everything(),
               names_to = "fish_family",
               values_to = "biomass",
               names_prefix = "biomass_kgha_fish_family_avg_") %>%
  # Replace NAs with 0 to get true mean (including sites where family is absent)
  mutate(fish_family = str_to_title(fish_family),
         biomass = replace_na(biomass, 0)) %>%
  group_by(fish_family) %>%
  summarise(mean_biomass = mean(biomass, na.rm = TRUE),
            sd_biomass = sd(biomass, na.rm = TRUE),
            n_sites_present = sum(biomass > 0),
            n_sites_total = n()) %>%
  mutate(percent_sites = 100 * n_sites_present / n_sites_total,
         # Calculate contribution to overall mean biomass across all families
         percent_total = 100 * mean_biomass / sum(mean_biomass)) %>%
  arrange(desc(mean_biomass))

# Top 15 families
top_families <- head(family_biomass, 15)

kable(top_families,
      digits = 2,
      col.names = c("Family", "Mean Biomass (kg/ha)", "SD", "Sites Present",
                    "Total Sites", "% Sites Present", "% of Total Biomass"),
      caption = "Top 15 Fish Families by Mean Biomass (true mean including zeros)")
Top 15 Fish Families by Mean Biomass (true mean including zeros)

| Family | Mean Biomass (kg/ha) | SD | Sites Present | Total Sites | % Sites Present | % of Total Biomass |
|---|---:|---:|---:|---:|---:|---:|
| Scaridae | 46.75 | 82.86 | 45 | 47 | 95.74 | 27.82 |
| Haemulidae | 41.91 | 67.25 | 40 | 47 | 85.11 | 24.94 |
| Acanthuridae | 31.27 | 155.34 | 39 | 47 | 82.98 | 18.61 |
| Pomacentridae | 14.81 | 14.99 | 47 | 47 | 100.00 | 8.81 |
| Labridae | 9.92 | 10.05 | 47 | 47 | 100.00 | 5.90 |
| Lutjanidae | 8.92 | 15.60 | 23 | 47 | 48.94 | 5.31 |
| Pomacanthidae | 5.66 | 19.74 | 15 | 47 | 31.91 | 3.37 |
| Chaetodontidae | 1.93 | 4.62 | 21 | 47 | 44.68 | 1.15 |
| Carangidae | 1.90 | 7.49 | 7 | 47 | 14.89 | 1.13 |
| Holocentridae | 1.43 | 3.40 | 12 | 47 | 25.53 | 0.85 |
| Diodontidae | 1.20 | 8.23 | 1 | 47 | 2.13 | 0.71 |
| Epinephelidae | 0.84 | 3.80 | 3 | 47 | 6.38 | 0.50 |
| Serranidae | 0.53 | 1.22 | 13 | 47 | 27.66 | 0.31 |
| Aulostomidae | 0.37 | 2.56 | 1 | 47 | 2.13 | 0.22 |
| Balistidae | 0.29 | 1.39 | 2 | 47 | 4.26 | 0.17 |
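The "true mean including zeros" in the table differs from a mean over only the sample events where a family was recorded. The code above treats a missing family column value as absence and replaces it with 0, and `% of Total Biomass` is each family's mean divided by the sum of all family means. A small sketch with made-up numbers shows both calculations:

```r
# Hypothetical biomass (kg/ha) for one family across five sample events;
# NA marks events where the family was not recorded.
biomass <- c(40, NA, 10, NA, 0)

mean(biomass, na.rm = TRUE)            # mean over recorded events only: 50 / 3
biomass_zero <- replace(biomass, is.na(biomass), 0)
mean(biomass_zero)                     # true mean including absences: 50 / 5 = 10
sum(biomass_zero > 0)                  # events where the family is present: 2

# Share of total biomass: each family's mean over the sum of all family means
family_means <- c(Scaridae = 10, Labridae = 5, Lutjanidae = 5)
100 * family_means / sum(family_means) # 50, 25, 25
```

Dropping NAs instead would inflate the means of patchily distributed families, such as those present at only a handful of sites.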
Biomass Proportion - Top Families
# Prepare data for pie chart - top 10 plus "Other"
top10_families <- head(family_biomass, 10)

# Sum every family beyond the top 10 (0 when there are 10 or fewer families;
# indexing with 11:nrow() would run backwards in that case)
other_biomass <- sum(family_biomass$mean_biomass[-seq_len(nrow(top10_families))])

pie_data <- top10_families %>%
  select(fish_family, mean_biomass)
if (other_biomass > 0) {
  pie_data <- bind_rows(pie_data,
                        tibble(fish_family = "Other families", mean_biomass = other_biomass))
}

# Create pie chart
plot_ly(pie_data,
        labels = ~fish_family,
        values = ~mean_biomass,
        type = 'pie',
        textposition = 'inside',
        textinfo = 'label+percent',
        hoverinfo = 'text',
        text = ~paste(fish_family, '<br>', round(mean_biomass, 1), 'kg/ha'),
        marker = list(line = list(color = '#FFFFFF', width = 2))) %>%
  layout(title = "Fish Biomass Composition by Family (Top 10 - Mean Biomass)",
         showlegend = TRUE,
         legend = list(orientation = 'v', x = 1.1, y = 0.5))
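The "top N plus Other" step is general enough to factor out. Below is a minimal base-R sketch of a hypothetical helper, `lump_top_n()` (not from any package), that sorts categories by value, keeps the `n` largest, and adds an "Other" row only when something is left over. The toy values are made up.

```r
# Hypothetical helper: keep the n largest categories and sum the rest into
# one "Other" row (added only when more than n categories exist).
lump_top_n <- function(labels, values, n = 10, other = "Other families") {
  ord <- order(values, decreasing = TRUE)
  labels <- labels[ord]
  values <- values[ord]
  keep <- seq_len(min(n, length(values)))
  out <- data.frame(label = labels[keep], value = values[keep],
                    stringsAsFactors = FALSE)
  if (length(values) > n) {
    out <- rbind(out, data.frame(label = other, value = sum(values[-keep]),
                                 stringsAsFactors = FALSE))
  }
  out
}

# Toy example: four families, keep the top two
lump_top_n(c("Scaridae", "Labridae", "Haemulidae", "Balistidae"),
           c(40, 10, 30, 5), n = 2)
```

Applied to the family table, `lump_top_n(family_biomass$fish_family, family_biomass$mean_biomass)` would return `label` and `value` columns, so the `plot_ly()` call would then use `labels = ~label` and `values = ~value`.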
Biomass Composition by Trophic Group
# Get trophic group biomass from sample events
trophic_cols <- grep("biomass_kgha_trophic_group_avg_", names(sampleevents_data), value = TRUE)

trophic_biomass <- sampleevents_data %>%
  select(site, all_of(trophic_cols)) %>%
  pivot_longer(cols = all_of(trophic_cols),
               names_to = "trophic_group",
               values_to = "biomass",
               names_prefix = "biomass_kgha_trophic_group_avg_") %>%
  # Replace NAs with 0 to get true mean (including sites where group is absent)
  mutate(biomass = replace_na(biomass, 0)) %>%
  group_by(trophic_group) %>%
  summarise(mean_biomass = mean(biomass, na.rm = TRUE),
            q025 = quantile(biomass, 0.025, na.rm = TRUE),
            q975 = quantile(biomass, 0.975, na.rm = TRUE),
            n_sites_present = sum(biomass > 0),
            n_sites_total = n()) %>%
  mutate(trophic_group = case_when(
           trophic_group == "planktivore" ~ "Planktivore",
           trophic_group == "herbivore_macroalgae" ~ "Herbivore (macroalgae)",
           trophic_group == "herbivore_detritivore" ~ "Herbivore (detritivore)",
           trophic_group == "invertivore_sessile" ~ "Invertivore (sessile)",
           trophic_group == "invertivore_mobile" ~ "Invertivore (mobile)",
           trophic_group == "omnivore" ~ "Omnivore",
           trophic_group == "piscivore" ~ "Piscivore",
           TRUE ~ trophic_group),
         percent_sites = 100 * n_sites_present / n_sites_total,
         percent_total = 100 * mean_biomass / sum(mean_biomass)) %>%
  arrange(desc(mean_biomass))

kable(trophic_biomass,
      digits = 2,
      col.names = c("Trophic Group", "Mean Biomass (kg/ha)", "2.5% Quantile", "97.5% Quantile",
                    "Sites Present", "Total Sites", "% Sites Present", "% of Total Biomass"),
      caption = "Fish Biomass by Trophic Group (true mean including zeros)")
Fish Biomass by Trophic Group (true mean including zeros)

| Trophic Group | Mean Biomass (kg/ha) | 2.5% Quantile | 97.5% Quantile | Sites Present | Total Sites | % Sites Present | % of Total Biomass |
|---|---:|---:|---:|---:|---:|---:|---:|
| Herbivore (detritivore) | 58.78 | 3.24 | 264.55 | 47 | 47 | 100.00 | 34.98 |
| Invertivore (mobile) | 45.54 | 0.58 | 168.86 | 47 | 47 | 100.00 | 27.10 |
| Herbivore (macroalgae) | 29.00 | 0.00 | 40.78 | 33 | 47 | 70.21 | 17.26 |
| Piscivore | 16.88 | 0.00 | 61.84 | 43 | 47 | 91.49 | 10.04 |
| Invertivore (sessile) | 7.30 | 0.00 | 46.51 | 27 | 47 | 57.45 | 4.34 |
| Omnivore | 5.35 | 0.00 | 14.27 | 45 | 47 | 95.74 | 3.19 |
| Planktivore | 5.19 | 0.03 | 18.71 | 45 | 47 | 95.74 | 3.09 |
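The quantile columns describe the spread of biomass across sample events, not uncertainty in the mean: they are the 2.5th and 97.5th percentiles of the zero-filled values. Once enough events have zero biomass for a group, the 2.5% quantile is exactly 0, which is why several groups show 0.00. Unlike a mean ± 1.96 SD interval, these bounds cannot go negative for skewed, zero-heavy data. A small sketch with made-up values, using R's default (type 7) quantiles as the code above does:

```r
# Hypothetical zero-filled biomass (kg/ha) for one trophic group across
# 10 sample events; the group is absent (0) from 3 of them.
biomass <- c(0, 0, 0, 5, 8, 12, 20, 35, 60, 150)

mean(biomass)                          # 29
quantile(biomass, c(0.025, 0.975))     # 2.5%: 0, 97.5%: 129.75

# A symmetric normal-style interval would extend below zero here
mean(biomass) - 1.96 * sd(biomass)     # negative, impossible for biomass
```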
Trophic Group Composition Bar Plot
ggplot(trophic_biomass, aes(x = reorder(trophic_group, mean_biomass),
                            y = mean_biomass, fill = trophic_group)) +
  geom_col(alpha = 0.8, color = "black", linewidth = 0.3) +
  # Error bars span the 2.5th-97.5th percentiles computed above
  geom_errorbar(aes(ymin = q025, ymax = q975), width = 0.3, linewidth = 0.5) +
  coord_flip() +
  labs(title = "Mean Fish Biomass by Trophic Group",
       subtitle = "Error bars show the 2.5th-97.5th percentile range across sample events",
       x = "Trophic Group",
       y = "Mean Biomass (kg/ha)") +
  theme_classic() +
  theme(legend.position = "none",
        axis.text = element_text(size = 11, colour = "black"),
        axis.title = element_text(size = 12, colour = "black"),
        plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
        plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray30")) +
  scale_fill_brewer(palette = "Set3")