Fish biomass is estimated in MERMAID for each individual fish observed using the following length-weight relationship:
\[W = a \times L^b\]
where \(W\) is weight (grams), \(L\) is total length (centimetres), and \(a\) and \(b\) are species-specific coefficients sourced from FishBase (see documentation here).
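As a quick illustration of the relationship above, the following sketch computes the weight of a single fish in R. The helper name length_to_weight() is hypothetical (it is not part of mermaidr), and the coefficients are the MERMAID values listed for Scarus iseri in the custom coefficient table later in this document.

```r
# Hypothetical helper (not a mermaidr function):
# weight in grams of one fish from its total length in cm
length_to_weight <- function(length_cm, a, b) {
  a * length_cm^b
}

# MERMAID coefficients for Scarus iseri, as listed later in this document
a <- 0.0158
b <- 3.0515

# Weight (g) of a single 20 cm individual
w_20 <- length_to_weight(20, a, b)

# The function is vectorised, so several lengths can be passed at once
w_range <- length_to_weight(c(5, 10, 20, 30), a, b)
```

Because the function is vectorised over length, the same call works on a whole column of observed sizes.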
This document shows how to retrieve and compare MERMAID’s current biomass conversion coefficients with a previous version, explores the consequences of any differences across fish size ranges, and demonstrates how you can apply your own custom coefficients to recalculate biomass estimates at the transect and sample event level.
Getting fishbelt data and reference information from MERMAID
This loads the necessary R packages and downloads fish belt data from an example public MERMAID project, together with the current MERMAID fish species reference data and a previous version of that reference.
Since the code uses observation-level data, you need to select either a MERMAID project whose fish belt data sharing policy is set to public or one of your own projects.
Note: This step requires authentication even if you are accessing public projects. This means you need to create a free MERMAID account if you don’t already have one.
rm(list = ls())        # remove past stored objects
options(scipen = 999)  # turn off scientific notation

#### Load packages ####
library(mermaidr)
library(tidyverse)
library(plotly)
library(DT)
library(readxl)
library(knitr)
library(scales)

#### Option 1: Get data from a public project ####
# Find a public project with fish belt observation data.
# The code below searches for projects with a public fish belt policy and
# selects the one with the most transects so there is a good spread of species.
# It also excludes some countries from the search (Indonesia, Philippines, India).
public_projects <- mermaid_get_summary_sampleevents() %>%
  filter(data_policy_beltfish == "public" &
           beltfish_sample_unit_count > 0 &
           !country %in% c("Indonesia", "Philippines", "India")) %>%
  group_by(project_id, project, tags, project_notes, country) %>%
  dplyr::summarise(
    NumSites = length(site_id),
    TotalSampleUnits = sum(beltfish_sample_unit_count),
    .groups = "drop"
  ) %>%
  arrange(desc(TotalSampleUnits))

# Select the project with the most sample units
example_project_id <- public_projects$project_id[1]
example_project_name <- public_projects$project[1]

# Download all three data levels for that project
example_data <- mermaid_get_project_data(
  project = example_project_id,
  method = "fishbelt",
  data = "all"
)

# #### Option 2: Get data from a project for which you are a member ####
# my_projects <- mermaid_get_my_projects()
#
# # Choose your project by name or index
# example_project_id <- my_projects$project_id[my_projects$project == "Your Project Name"]
# example_project_name <- "Your Project Name"
#
# example_data <- mermaid_get_project_data(
#   project = example_project_id,
#   method = "fishbelt",
#   data = "all"
# )

# Extract the three data levels
observations_data <- example_data$observations
sampleunits_data <- example_data$sampleunits
sampleevents_data <- example_data$sampleevents
This section loads the current MERMAID reference data, with the \(a\) and \(b\) values that are currently being used to calculate fish biomass from each observation, and a previous version of the data (from Feb. 25).
Note: There is also a third coefficient (\(c\)) that is used to convert other length types to total length (TL). In MERMAID these should all be 1 (i.e. no conversion), since the \(a\) and \(b\) values extracted from FishBase assume lengths are measured as TL.
#### Get current MERMAID fish species reference ####
mermaid_ref_raw <- mermaid_get_reference("fishspecies")

mermaid_ref <- mermaid_ref_raw %>%
  select(
    name, genus,
    current_a = biomass_constant_a,
    current_b = biomass_constant_b,
    current_c = biomass_constant_c,
    max_length
  ) %>%
  mutate(name = paste(genus, name, sep = " "))

#### Get previous MERMAID fish species reference ####
prev_ref_url <- "https://public.datamermaid.org/mermaid_attributes_26-02-25.xlsx"
prev_ref_local <- tempfile(fileext = ".xlsx")
download.file(prev_ref_url, destfile = prev_ref_local, mode = "wb", quiet = TRUE)

# Inspect available sheets (update the sheet name below if it differs)
sheet_names <- readxl::excel_sheets(prev_ref_local)

prev_ref_raw <- readxl::read_excel(prev_ref_local, sheet = "Fish Species")

prev_ref <- prev_ref_raw %>%
  select(
    name = Name,
    family = Family,
    prev_a = `Biomass Constant A`,
    prev_b = `Biomass Constant B`,
    prev_c = `Biomass Constant C`,
    prev_max_length = `Max Length (cm)`
  )
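The note about the \(c\) coefficient can be verified once a reference table is loaded. The sketch below uses a small made-up table in place of mermaid_ref (the species names and values are illustrative only) and lists any rows where \(c\) is missing or differs from 1.

```r
# Toy stand-in for mermaid_ref; names and values are made up for illustration
toy_ref <- data.frame(
  name = c("Species one", "Species two", "Species three"),
  current_c = c(1, 1, 0.95),
  stringsAsFactors = FALSE
)

# Rows where c is missing or differs from 1 (within floating-point tolerance)
is_non_unit <- is.na(toy_ref$current_c) | abs(toy_ref$current_c - 1) > 1e-8
non_unit_c <- toy_ref[is_non_unit, ]

non_unit_c
```

Applied to the real table, an empty result would confirm that no length conversion is being applied.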
Comparing biomass conversion coefficients between versions
This step joins the current and previous reference sets on species name and flags species where the \(a\) or \(b\) coefficient differs by more than 1%.
# Join on species name (most reliable key between versions)
coef_comparison <- mermaid_ref %>%
  inner_join(prev_ref, by = "name") %>%
  mutate(
    diff_a = current_a - prev_a,
    diff_b = current_b - prev_b,
    pct_diff_a = (diff_a / prev_a) * 100,
    pct_diff_b = (diff_b / prev_b) * 100,
    # Flag a species as changed when either coefficient moved by more than 1%
    changed = abs(pct_diff_a) > 1 | abs(pct_diff_b) > 1
  )

changed_species <- coef_comparison %>%
  filter(changed) %>%
  arrange(desc(abs(pct_diff_a) + abs(pct_diff_b)))

unchanged_count <- sum(!coef_comparison$changed)
changed_count <- sum(coef_comparison$changed)
total_matched <- nrow(coef_comparison)
✓ Comparison complete
Species matched between versions: 3524
Species with identical coefficients: 3518
Species with changed coefficients: 6
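The join-and-flag logic can be tried on toy data before running it on the full reference. The sketch below uses made-up species and coefficients. It also uses anti_join() to list species present in only one version, since inner_join() drops them without warning.

```r
library(dplyr)

# Toy reference tables; species names and coefficients are made up
cur <- tibble(
  name = c("Genus alpha", "Genus beta", "Genus gamma"),
  current_a = c(0.0100, 0.0200, 0.0150),
  current_b = c(3.00, 2.90, 3.10)
)
prev <- tibble(
  name = c("Genus alpha", "Genus beta", "Genus delta"),
  prev_a = c(0.0100, 0.0180, 0.0120),
  prev_b = c(3.00, 2.90, 3.00)
)

# Species that appear in only one version (dropped by inner_join)
only_current <- anti_join(cur, prev, by = "name")
only_previous <- anti_join(prev, cur, by = "name")

# Matched species, flagged when either coefficient moved by more than 1%
toy_comparison <- inner_join(cur, prev, by = "name") %>%
  mutate(
    pct_diff_a = (current_a - prev_a) / prev_a * 100,
    pct_diff_b = (current_b - prev_b) / prev_b * 100,
    changed = abs(pct_diff_a) > 1 | abs(pct_diff_b) > 1
  )
```

Checking only_current and only_previous is a cheap way to spot renamed or newly added species that would otherwise be left out of the comparison.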
Species with changed coefficients
The table below shows all species where the \(a\) or \(b\) values differ between the current MERMAID reference and the previous version. Percentage differences are shown to indicate the relative magnitude of each change.
changed_species %>%
  select(
    Species = name,
    Family = family,
    `Current a` = current_a,
    `Previous a` = prev_a,
    `Δa` = diff_a,
    `Δa (%)` = pct_diff_a,
    `Current b` = current_b,
    `Previous b` = prev_b,
    `Δb` = diff_b,
    `Δb (%)` = pct_diff_b
  ) %>%
  mutate(across(where(is.numeric), ~ round(., 5))) %>%
  DT::datatable(
    rownames = FALSE,
    filter = "top",
    options = list(pageLength = 15, scrollX = TRUE),
    caption = "Species with differences in biomass conversion coefficients between the current MERMAID reference and the previous version."
  )
Visualizing coefficient differences
The plot below shows how the \(a\) and \(b\) coefficients have shifted for species where they changed, with each line connecting the previous value (blue) to the current value (red). Species are ordered by the total magnitude of change across both coefficients.
Implications for biomass estimates across the size range
For each species where the coefficients changed, we calculate what the estimated individual weight would be across the full size range of the species — from 1 cm up to its published maximum length — under both the current and previous reference. Subtracting the previous estimate from the current one at each size shows where and how much the two versions diverge.
Biomass is calculated as \(W = a \times L^b\) (grams, for a single fish of length \(L\) cm).
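The plotting code further down relies on a table called biomass_curves whose construction is not shown. The following is a minimal sketch of how such a table could be built, assuming the column names used by that plotting code (name, size_cm, biomass_current, biomass_prev, biomass_diff) and a 1 cm step. The input table here is made up. In practice it would be changed_species.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for changed_species; values are made up for illustration
toy_changed <- tibble(
  name = c("Genus alpha", "Genus beta"),
  current_a = c(0.0110, 0.0200),
  current_b = c(3.05, 2.95),
  prev_a = c(0.0100, 0.0200),
  prev_b = c(3.00, 3.00),
  max_length = c(30, 50)
)

# One row per species and 1 cm size step, from 1 cm up to max_length.
# Assumes max_length is not missing; rows with NA would need filtering first.
toy_curves <- toy_changed %>%
  mutate(size_cm = lapply(max_length, function(m) seq(1, m, by = 1))) %>%
  unnest(size_cm) %>%
  mutate(
    biomass_current = current_a * size_cm^current_b,  # grams
    biomass_prev = prev_a * size_cm^prev_b,           # grams
    biomass_diff = biomass_current - biomass_prev     # current minus previous
  )
```

The 1 cm step is an arbitrary choice for this sketch. A finer step gives smoother curves at the cost of more rows.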
The plot below shows the difference in estimated individual fish weight (grams) between the current and previous reference coefficients at each size. Positive values indicate that the current reference predicts a higher weight; negative values indicate the opposite.
Interpretation: Because weight scales as \(L^b\), a change in the exponent \(b\) is amplified as fish get larger: the ratio between the two estimates grows in proportion to \(L^{\Delta b}\). Even small changes to the exponent can therefore lead to large absolute differences in biomass estimates for large individuals. Positive differences indicate that the current reference produces higher biomass estimates than the previous version for those species and sizes, and negative differences indicate the reverse.
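To make the size dependence concrete, the sketch below compares two hypothetical coefficient sets that share the same \(a\) but differ by 0.05 in \(b\). The values are illustrative and do not correspond to any species in the reference.

```r
# Hypothetical coefficients: identical a, exponent raised by 0.05
a <- 0.01
b_prev <- 3.00
b_current <- 3.05

lengths_cm <- c(10, 50, 100)

w_prev <- a * lengths_cm^b_prev        # grams under the previous exponent
w_current <- a * lengths_cm^b_current  # grams under the current exponent

# With equal a, the ratio of the two estimates is L^(b_current - b_prev)
ratio <- w_current / w_prev

# The absolute difference in grams grows much faster than the ratio
abs_diff_g <- w_current - w_prev

data.frame(lengths_cm, ratio = round(ratio, 3), abs_diff_g = round(abs_diff_g, 1))
```

The ratio increases only slowly with length, but because it multiplies a weight that itself grows roughly with the cube of length, the absolute gap widens quickly.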
Biomass curves by species
The plot below shows both the current and previous biomass curves for each species, to make it easier to see at which sizes the two versions diverge. For clarity, only species with the largest absolute differences at maximum size are shown (up to 12).
# Select species with the largest differences at maximum size
top_species <- biomass_curves %>%
  group_by(name) %>%
  dplyr::summarise(max_abs_diff = max(abs(biomass_diff)), .groups = "drop") %>%
  slice_max(max_abs_diff, n = min(12, nrow(.))) %>%
  pull(name)

curves_long <- biomass_curves %>%
  filter(name %in% top_species) %>%
  select(name, size_cm, biomass_current, biomass_prev) %>%
  pivot_longer(
    cols = c(biomass_current, biomass_prev),
    names_to = "version",
    values_to = "biomass_g"
  ) %>%
  mutate(version = recode(version,
                          "biomass_current" = "Current",
                          "biomass_prev" = "Previous"))

p_facet <- ggplot(curves_long,
                  aes(x = size_cm, y = biomass_g / 1000, colour = version)) +
  geom_line(linewidth = 0.8) +
  facet_wrap(~ name, scales = "free", ncol = 3) +
  scale_colour_manual(values = c("Current" = "#E15759", "Previous" = "#4E79A7")) +
  scale_y_continuous(labels = label_comma()) +
  labs(
    title = "Biomass Curves by Species: Current vs Previous Coefficients",
    subtitle = "Showing up to 12 species with largest differences at maximum size",
    x = "Total length (cm)",
    y = "Estimated weight (kg)",
    colour = "Reference version"
  ) +
  theme_classic() +
  theme(
    legend.position = "top",
    panel.spacing.y = unit(2, "lines"),  # add vertical space between facet rows
    strip.text = element_text(size = 7, face = "italic"),
    strip.background = element_rect(fill = "#f0f3f5"),
    axis.text = element_text(size = 7, colour = "black"),
    axis.title = element_text(size = 11, colour = "black"),
    plot.title = element_text(size = 13, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(size = 10, hjust = 0.5, colour = "gray30")
  )

p_plotly <- ggplotly(p_facet, height = 700) %>%
  config(
    displayModeBar = TRUE,
    displaylogo = FALSE,
    modeBarButtonsToRemove = c("zoom", "pan", "select", "zoomIn", "zoomOut",
                               "autoScale", "resetScale", "lasso2d",
                               "hoverClosestCartesian",
                               "hoverCompareCartesian")
  ) %>%
  layout(
    margin = list(t = 80, b = 130, l = 120),
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.15,
      xanchor = "center",
      yanchor = "top"
    )
  )

# Remove duplicate legend entries from faceting
seen <- c()
for (i in seq_along(p_plotly$x$data)) {
  trace_name <- p_plotly$x$data[[i]]$name
  if (!is.null(trace_name)) {
    if (trace_name %in% seen) {
      p_plotly$x$data[[i]]$showlegend <- FALSE
    } else {
      seen <- c(seen, trace_name)
    }
  }
}

p_plotly
Using custom coefficients
Researchers may have access to coefficients that are more appropriate for their study region — for example, from a regional calibration study, an updated FishBase record, or a publication specific to their survey area. This section demonstrates how to apply custom \(a\) and \(b\) values to recalculate biomass from raw observations and shows how those changes propagate to the transect and sample event level.
Creating a custom coefficient table
To illustrate the workflow, below I create a small custom coefficient set by taking five of the most frequently observed species in the project and applying small random adjustments (within ±10%) to the current MERMAID coefficients. In practice, replace the custom_coefs table with your own data. Note: your own custom coefficients table should have columns named name, custom_a, and custom_b.
set.seed(42)  # for reproducibility

# Identify the most frequently observed species in the project data
species_counts <- observations_data %>%
  count(fish_taxon, sort = TRUE)

# Get current MERMAID reference coefficients for species present in the project
project_ref <- mermaid_ref %>%
  filter(name %in% species_counts$fish_taxon)

# Select the 5 most observed species that have reference coefficients
top5_species <- species_counts %>%
  filter(fish_taxon %in% project_ref$name) %>%
  slice_head(n = 5) %>%
  pull(fish_taxon)

# Create custom coefficients with random shifts within ±10%
custom_coefs <- project_ref %>%
  filter(name %in% top5_species) %>%
  select(name, current_a, current_b) %>%
  mutate(
    adj_a = runif(n(), -0.09, 0.09),
    adj_b = runif(n(), -0.09, 0.09),
    custom_a = current_a * (1 + adj_a),
    custom_b = current_b * (1 + adj_b)
  )

# Display the custom coefficient table
custom_coefs %>%
  select(
    Species = name,
    `MERMAID a` = current_a,
    `Custom a` = custom_a,
    `Change in a (%)` = adj_a,
    `MERMAID b` = current_b,
    `Custom b` = custom_b,
    `Change in b (%)` = adj_b
  ) %>%
  mutate(
    `Change in a (%)` = round(`Change in a (%)` * 100, 2),
    `Change in b (%)` = round(`Change in b (%)` * 100, 2),
    across(c(`MERMAID a`, `Custom a`, `MERMAID b`, `Custom b`), ~ round(., 6))
  ) %>%
  kable(
    caption = "Custom coefficient set for demonstration. Replace with your own values in practice."
  )
Custom coefficient set for demonstration. Replace with your own values in practice.

| Species                | MERMAID a | Custom a | Change in a (%) | MERMAID b | Custom b | Change in b (%) |
|:-----------------------|----------:|---------:|----------------:|----------:|---------:|----------------:|
| Halichoeres bivittatus |  0.010516 | 0.011301 |            7.47 |  3.092435 | 3.103065 |            0.34 |
| Halichoeres garnoti    |  0.005190 | 0.005598 |            7.87 |  2.540000 | 2.648168 |            4.26 |
| Scarus iseri           |  0.015800 | 0.015192 |           -3.85 |  3.051500 | 2.850833 |           -6.58 |
| Stegastes partitus     |  0.012300 | 0.013032 |            5.95 |  3.050000 | 3.136189 |            2.83 |
| Thalassoma bifasciatum |  0.010700 | 0.010973 |            2.55 |  2.916000 | 3.023634 |            3.69 |
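If you are supplying your own values, the table can be written by hand. The sketch below builds one with the three required columns and runs a few basic checks before it is used. The object name my_custom_coefs and the coefficient values are hypothetical and should be replaced with values from your own source.

```r
# Hypothetical custom coefficients; replace with values from your own source
my_custom_coefs <- data.frame(
  name = c("Scarus iseri", "Stegastes partitus"),
  custom_a = c(0.0150, 0.0130),
  custom_b = c(3.00, 3.10),
  stringsAsFactors = FALSE
)

# Basic checks: required columns, one row per species, positive numeric values
required_cols <- c("name", "custom_a", "custom_b")
stopifnot(
  all(required_cols %in% names(my_custom_coefs)),
  !anyDuplicated(my_custom_coefs$name),
  is.numeric(my_custom_coefs$custom_a), all(my_custom_coefs$custom_a > 0),
  is.numeric(my_custom_coefs$custom_b), all(my_custom_coefs$custom_b > 0)
)

# To use it in the rest of the workflow, assign it to custom_coefs:
# custom_coefs <- my_custom_coefs
```

The duplicate check matters because a species listed twice would duplicate its observations when the table is joined to the observation data.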
Recalculating biomass from observations
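MERMAID calculates fish biomass at the observation level as:
\[\text{Biomass (kg/ha)} = \frac{N \times a \times L^b}{A} \times \frac{1}{1000} \times 10\,000\]
where \(N\) is the number of fish counted in the observation, \(A\) is the transect area in m² (transect length × width), and the two conversion factors (grams to kilograms, and m² to hectares) simplify to ×10. The code below recalculates this for each observation using the custom coefficients where available, and falls back to the MERMAID reference for all other species. These observation-level estimates are then summed to the transect level and averaged to the sample event level.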
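The unit conversion described above can be checked with a short worked example. The observation below is hypothetical (three 15 cm fish on a 50 m × 5 m belt transect, with illustrative coefficients). It compares the step-by-step conversion with the ×10 shortcut.

```r
# Hypothetical observation and coefficients, for illustration only
a <- 0.0123
b <- 3.05
size_cm <- 15
count <- 3
transect_area_m2 <- 50 * 5  # transect length (m) x belt width (m)

# Total weight of the observed fish in grams
biomass_g <- a * size_cm^b * count

# Long form: convert g to kg and m2 to ha separately
biomass_kg <- biomass_g / 1000
area_ha <- transect_area_m2 / 10000
biomass_kgha_long <- biomass_kg / area_ha

# Short form: the two conversion factors collapse to x10
biomass_kgha_short <- (biomass_g / transect_area_m2) * 10

# Both routes should give the same value
all.equal(biomass_kgha_long, biomass_kgha_short)
```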
Note: Column names may vary slightly between MERMAID exports (mermaidr vs. xlsx downloads through the website). If the code below produces errors, check names(observations_data) and update the column references accordingly. The key columns used by the code are fish_taxon, size, count, transect_length, and assigned_transect_width_m.
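A quick way to follow the advice in this note is to compare the column names the recalculation code refers to against those actually present. The sketch below uses a toy data frame in place of observations_data, with one column deliberately missing.

```r
# Columns referenced by the recalculation code in this section
required_cols <- c("fish_taxon", "size", "count",
                   "transect_length", "assigned_transect_width_m")

# Toy stand-in for observations_data, missing the transect width column
toy_obs <- data.frame(
  fish_taxon = "Genus alpha",
  size = 12,
  count = 2,
  transect_length = 50,
  stringsAsFactors = FALSE
)

# Report any required columns that are absent
missing_cols <- setdiff(required_cols, names(toy_obs))
if (length(missing_cols) > 0) {
  message("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```

With real data, replace toy_obs with observations_data. Any names reported as missing need to be mapped to their equivalents in your export before running the recalculation.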
# Build coefficient lookup tables
custom_lookup <- custom_coefs %>%
  select(fish_taxon = name, custom_a, custom_b)

ref_lookup <- mermaid_ref %>%
  select(fish_taxon = name, ref_a = current_a, ref_b = current_b)

# Join coefficients to observation data and recalculate biomass
obs_recalc <- observations_data %>%
  left_join(custom_lookup, by = "fish_taxon") %>%
  left_join(ref_lookup, by = "fish_taxon") %>%
  mutate(
    # Use custom coefficients where available, otherwise MERMAID reference
    use_a = if_else(!is.na(custom_a), custom_a, ref_a),
    use_b = if_else(!is.na(custom_b), custom_b, ref_b),
    transect_area_m2 = transect_length * assigned_transect_width_m,
    # Recalculate biomass density (kg/ha)
    biomass_g_custom = use_a * size ^ use_b * count,
    biomass_kgha_custom = (biomass_g_custom / transect_area_m2) * 10
  )
Transect-level summaries
Biomass density estimates from individual observations are summed to the transect (sample unit) level.
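The aggregation code is not shown here, so the following is a minimal sketch of the two steps on toy data: summing observations within each transect, then averaging transects within each sample event. The ID column names (sample_event_id, sample_unit_id) and the biomass_kgha column for the MERMAID estimate are assumptions. Check them against names(obs_recalc) before adapting this.

```r
library(dplyr)

# Toy stand-in for obs_recalc; ID column names are assumptions
toy_obs <- tibble(
  sample_event_id = c("SE1", "SE1", "SE1", "SE2", "SE2"),
  sample_unit_id = c("T1", "T1", "T2", "T3", "T3"),
  biomass_kgha = c(10, 5, 20, 8, 2),         # MERMAID estimate
  biomass_kgha_custom = c(11, 5, 22, 8, 3)   # recalculated estimate
)

# Step 1: sum observation-level biomass within each transect
toy_transects <- toy_obs %>%
  group_by(sample_event_id, sample_unit_id) %>%
  summarise(
    biomass_kgha = sum(biomass_kgha, na.rm = TRUE),
    biomass_kgha_custom = sum(biomass_kgha_custom, na.rm = TRUE),
    .groups = "drop"
  )

# Step 2: average transects within each sample event
toy_events <- toy_transects %>%
  group_by(sample_event_id) %>%
  summarise(
    mean_kgha = mean(biomass_kgha),
    mean_kgha_custom = mean(biomass_kgha_custom),
    .groups = "drop"
  )
```

The order matters: averaging observations directly, without first summing to the transect, would underestimate biomass because each observation is only a fraction of the fish on a transect.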
The grouped bar chart below compares mean site-level biomass between the MERMAID reference and custom coefficients, making it easy to identify which sites are most affected by the change. To prevent crowding in the plot, only the top 15 sample events by absolute biomass difference are shown (unless there are fewer than 15 sample events in the project).
The diverging bar chart below shows the percentage difference in mean site biomass between the two approaches. Positive values (red) indicate sites where the custom coefficients give higher estimates; negative values (blue) indicate the reverse.
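The quantities behind these two charts can be sketched as follows, using a toy sample-event table with made-up site names and values. It computes the absolute and percentage differences and keeps at most 15 rows ranked by absolute difference, as described above.

```r
# Toy sample-event summary; site names and values are made up
toy_events <- data.frame(
  site = c("Site A", "Site B", "Site C"),
  mean_kgha = c(100, 250, 40),         # MERMAID reference
  mean_kgha_custom = c(110, 235, 40),  # custom coefficients
  stringsAsFactors = FALSE
)

# Absolute (kg/ha) and percentage difference, custom minus reference
toy_events$diff_kgha <- toy_events$mean_kgha_custom - toy_events$mean_kgha
toy_events$pct_diff <- toy_events$diff_kgha / toy_events$mean_kgha * 100

# Keep up to 15 sample events with the largest absolute difference
n_show <- min(15, nrow(toy_events))
ranked <- toy_events[order(-abs(toy_events$diff_kgha)), ]
top_sites <- head(ranked, n_show)
```

The percentage is expressed relative to the MERMAID reference estimate, so a site with very low reference biomass can show a large percentage change from a small absolute shift.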
Coefficient comparison: 6 of 3524 matched species had at least one coefficient change between the current and previous MERMAID reference.
Biomass implications: Differences in coefficients compound with fish size, so their practical effect on biomass estimates is most pronounced for large individuals. The species showing the largest divergence at maximum size was Carcharhinus melanopterus, with a maximum difference of 38.3 kg per individual (at its maximum length).
Custom coefficient impact: Applying the example custom coefficients (±10% adjustments to five species) resulted in mean site-level biomass differences of up to 23.7% (19.6 kg/ha). The site most affected was CCMR-F-GUZ-08.
To use your own custom coefficients, replace the custom_coefs table above with a data frame containing columns name (matching the fish_taxon column in your observations), custom_a, and custom_b. The recalculation pipeline will automatically apply your values where available and fall back to the MERMAID reference for all other species.
Data and methods
Fish belt data: accessed via the mermaidr R package using mermaid_get_project_data()
Current MERMAID fish species reference: mermaid_get_reference("fishspecies")