This code performs data quality checks on MERMAID projects with fish belt data, including observer comparisons, outlier detection, and shark observation analysis.
Getting example fishbelt data from MERMAID
Loads the necessary R packages and gets example MERMAID projects with fish belt data. This analysis can be used for data checks on any project. However, in this example I use two separate projects in order to show trends in observers and taxa:
Observer comparison project: A project with multiple observers to show observer effect analysis
Shark data project: A project with shark observations for the shark analysis section
Since the code relies on all three scales of MERMAID data (observations, sample units, and sample events), it is necessary to either select MERMAID projects whose permissions are set to public for fish belt or to use projects for which you are a member.
Note: This step requires authentication even if you are accessing public projects. This means you need to create a free MERMAID account if you don’t already have one.
```r
rm(list = ls())        # remove past stored objects
options(scipen = 999)  # turn off scientific notation

#### Load packages ####
library(mermaidr)
library(tidyverse)
library(plotly)
library(DT)
library(ggplot2)
library(ggpubr)
library(knitr)
library(rfishbase)

#### Option 1: Get data from public projects ####
# Find a project with many observers for observer comparisons
observer_projects <- mermaid_get_summary_sampleevents() %>%
  filter(data_policy_beltfish == "public" &
           !is.na(beltfish_biomass_kgha_avg) &
           !country %in% c("Indonesia", "Philippines")) %>%
  group_by(project_id, project, tags, project_notes, country) %>%
  dplyr::summarise(
    NumSites = n_distinct(site),
    TotalSampleUnits = sum(beltfish_sample_unit_count),
    # Counts distinct observer strings, so each observer pairing counts separately
    NumObservers = n_distinct(observers),
    .groups = "drop"
  ) %>%
  arrange(desc(NumObservers), desc(TotalSampleUnits))

# Select project with most observers
observer_project_id <- observer_projects$project_id[1]
observer_project_name <- observer_projects$project[1]

# Get the observer comparison data
observer_data <- mermaid_get_project_data(
  project = observer_project_id,
  method = "fishbelt",
  data = "all"
)

# Shark families used to identify shark observations
shark_families <- c(
  "Carcharhinidae", "Sphyrnidae", "Alopiidae", "Lamnidae", "Rhincodontidae",
  "Galeocerdonidae", "Ginglymostomatidae", "Hemiscylliidae", "Heterodontidae",
  "Hexanchidae", "Odontaspididae", "Orectolobidae", "Parascylliidae",
  "Scyliorhinidae", "Squalidae", "Stegostomatidae", "Triakidae"
)

# Simplified approach: manually specify a project that you know has shark data
shark_project_id <- "55ac964c-0228-42da-8061-5983339ecb9f"
shark_project_name <- observer_projects$project[
  observer_projects$project_id == shark_project_id
][1]

shark_data <- mermaid_get_project_data(
  project = shark_project_id,
  method = "fishbelt",
  data = "all"
)

# Check if shark data actually has sharks; if not, fall back to observer data
has_sharks <- any(shark_data$observations$fish_family %in% shark_families)
if (!has_sharks) {
  cat("Note: Selected shark project does not contain shark observations.\n")
  cat("Using observer project data for all analyses.\n\n")
  shark_data <- observer_data
  shark_project_id <- observer_project_id
  shark_project_name <- observer_project_name
}

# #### Option 2: Get data from projects for which you are a member ####
# my_projects <- mermaid_get_my_projects()
#
# # Specify which project to use for observer comparisons
# observer_project_id <- my_projects$project_id[1] # Change index as needed
# observer_data <- mermaid_get_project_data(
#   project = observer_project_id,
#   method = "fishbelt",
#   data = "all"
# )
#
# # Specify which project to use for shark data (can be same or different)
# shark_project_id <- my_projects$project_id[2] # Change index as needed
# shark_data <- mermaid_get_project_data(
#   project = shark_project_id,
#   method = "fishbelt",
#   data = "all"
# )

# Extract the three levels for observer comparisons
observations_data <- observer_data$observations
sampleunits_data <- observer_data$sampleunits
sampleevents_data <- observer_data$sampleevents

# Extract data for shark analysis (will be used later)
shark_observations_data <- shark_data$observations
shark_sampleunits_data <- shark_data$sampleunits
shark_sampleevents_data <- shark_data$sampleevents
```
Anonymize Observers
```r
# Build a lookup from real observer names to anonymous labels
create_observer_mapping <- function(observer_string) {
  # Split all observer strings and get unique observers
  all_observers <- observer_string %>%
    str_split(", ") %>%
    unlist() %>%
    unique() %>%
    sort()

  # Map to Observer A, B, C, etc. (LETTERS covers at most 26 observers)
  tibble(
    original = all_observers,
    anonymous = paste0("Observer ", LETTERS[seq_along(all_observers)])
  )
}

# Get all unique observers from sample events
observer_mapping <- create_observer_mapping(sampleevents_data$observers)

# Anonymize one comma-separated observer string
anonymize_observers <- function(observer_string, mapping) {
  # Split the string
  observers <- str_split(observer_string, ", ")[[1]]
  # Map each observer (names missing from the mapping become NA)
  anonymous_observers <- mapping$anonymous[match(observers, mapping$original)]
  # Recombine
  paste(anonymous_observers, collapse = ", ")
}

# Apply anonymization to all data frames
observations_data <- observations_data %>%
  mutate(observers = map_chr(observers, anonymize_observers, mapping = observer_mapping))
sampleunits_data <- sampleunits_data %>%
  mutate(observers = map_chr(observers, anonymize_observers, mapping = observer_mapping))
sampleevents_data <- sampleevents_data %>%
  mutate(observers = map_chr(observers, anonymize_observers, mapping = observer_mapping))

# Display message
cat(paste0(
  "<span style='color: green;'>✓ Observer names anonymized: ",
  nrow(observer_mapping), " unique observers mapped to letters A-",
  LETTERS[nrow(observer_mapping)], "</span>\n"
))
```
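The anonymization step can also be tried on toy data before running it on a real project. The following is a minimal sketch in base R (so it runs without the tidyverse); the observer names are made up for illustration, and the named-vector lookup is an alternative to the tibble mapping used in this document, not a replacement for it.

```r
# Hypothetical observer strings, in the comma-separated form MERMAID exports
observer_strings <- c("Jane Doe, Sam Lee", "Sam Lee", "Jane Doe, Ana Cruz")

# Unique observer names, sorted so that labels are assigned reproducibly
split_names <- strsplit(observer_strings, ", ", fixed = TRUE)
all_observers <- sort(unique(unlist(split_names)))

# Named vector: real name -> anonymous label (assumes at most 26 observers)
lookup <- setNames(
  paste0("Observer ", LETTERS[seq_along(all_observers)]),
  all_observers
)

# Replace every name in every string, then glue the string back together
anonymized <- vapply(
  split_names,
  function(x) paste(lookup[x], collapse = ", "),
  character(1)
)

print(anonymized)
```

Because labels follow alphabetical order of the real names, rerunning the mapping on a project with a new observer can shift existing labels; saving the mapping privately avoids that.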
```r
# Assemble project information message for observer comparison project
project_info <- paste0(
  "### Observer Comparison Project\n\n",
  "**Project:** ", observer_project_name, "\n\n",
  "**Project ID:** ", observer_project_id, "\n\n"
)

# Add data extracted for observer project
# Note: the observer count is the number of distinct observer strings,
# so each combination of observers is counted separately
project_info <- paste0(
  project_info,
  "**Data extracted:**\n\n",
  "- Observations: ", nrow(observations_data), " records\n",
  "- Sample Units: ", nrow(sampleunits_data), " transects\n",
  "- Sample Events: ", nrow(sampleevents_data), " sites\n",
  "- Unique observers: ", length(unique(sampleevents_data$observers)), "\n",
  "\n"
)

# Add shark project info if different
if (shark_project_id != observer_project_id) {
  project_info <- paste0(
    project_info,
    "### Shark Data Project\n\n",
    "**Project:** ", shark_project_name, "\n\n",
    "**Project ID:** ", shark_project_id, "\n\n",
    "**Data extracted:**\n\n",
    "- Observations: ", nrow(shark_observations_data), " records\n",
    "- Sample Units: ", nrow(shark_sampleunits_data), " transects\n",
    "- Sample Events: ", nrow(shark_sampleevents_data), " sites\n",
    "\n"
  )
} else {
  project_info <- paste0(project_info, "*Note: Same project used for shark analysis*\n\n")
}

# Display the assembled message
cat(project_info)
```
Observer Comparison Project
Project: Northern Belize Coastal Complex
Project ID: d2225edc-0dbb-4c10-8cc7-e7d6dfaf149f
Data extracted:
Observations: 2187 records
Sample Units: 176 transects
Sample Events: 47 sites
Unique observers: 7
Shark Data Project
Project: SERF 2.0_Nick Graham_Chagos_Outer
Project ID: 55ac964c-0228-42da-8061-5983339ecb9f
Data extracted:
Observations: 8986 records
Sample Units: 149 transects
Sample Events: 40 sites
Observer Comparisons
Comparing fish biomass and abundance patterns among observers to identify potential observer effects.
Note: Many transects have multiple observers listed. The analyses below show patterns for each observer across all transects they participated in, but systematic differences could reflect either individual observer effects or the effects of observer pairings.
Fish Biomass Summary by Observer (transect-level totals)

| Observer | N Transects | Mean Biomass (kg/ha) | Median Biomass (kg/ha) | 2.5% Quantile | 97.5% Quantile |
|---|---|---|---|---|---|
| Observer C | 31 | 286.46 | 107.42 | 3.95 | 1755.24 |
| Observer B | 47 | 144.63 | 65.78 | 2.84 | 562.14 |
| Observer A | 141 | 144.21 | 72.80 | 3.08 | 799.10 |
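The code that builds `transect_biomass` and the summary above is not shown on this page. The following is a minimal sketch of one way to compute it, assuming one row per observation with a transect identifier, a comma-separated `observers` string, and `biomass_kgha`. The toy values and the column name `sample_unit_id` are assumptions for illustration, and base R is used so the example runs without extra packages.

```r
# Toy observation-level data (hypothetical values, not MERMAID output)
obs <- data.frame(
  sample_unit_id = c("t1", "t1", "t2", "t2", "t3"),
  observers      = c("Observer A, Observer B", "Observer A, Observer B",
                     "Observer A", "Observer A", "Observer B"),
  biomass_kgha   = c(10, 30, 5, 15, 100)
)

# Step 1: total biomass per transect (sum over all observations on it)
transect_biomass <- aggregate(
  biomass_kgha ~ sample_unit_id + observers, data = obs, FUN = sum
)
names(transect_biomass)[3] <- "total_biomass"

# Step 2: one row per observer per transect, so a transect surveyed by
# two observers contributes to both observers' summaries
observer_list <- strsplit(as.character(transect_biomass$observers), ", ", fixed = TRUE)
per_observer <- data.frame(
  observer      = unlist(observer_list),
  total_biomass = rep(transect_biomass$total_biomass, lengths(observer_list))
)

# Step 3: summary statistics per observer
summarise_one <- function(x) {
  c(n_transects = length(x),
    mean        = mean(x),
    median      = median(x),
    q2.5        = unname(quantile(x, 0.025)),
    q97.5       = unname(quantile(x, 0.975)))
}
observer_summary <- t(sapply(
  split(per_observer$total_biomass, per_observer$observer),
  summarise_one
))

print(observer_summary)
```

Step 2 is why the transect counts in the table can add up to more than the number of transects in the project: shared transects are counted once per observer.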
Statistical Test: Biomass Differences Among Observers
```r
# Prepare data for statistical test
plot_data <- transect_biomass %>%
  separate_rows(observers, sep = ", ")

# Filter to observers with at least 5 transects for meaningful comparison
observers_to_test <- plot_data %>%
  count(observers) %>%
  filter(n >= 5) %>%
  pull(observers)

test_data <- plot_data %>%
  filter(observers %in% observers_to_test)

# Perform Kruskal-Wallis test (non-parametric ANOVA)
if (length(observers_to_test) >= 2) {
  kw_test <- kruskal.test(total_biomass ~ observers, data = test_data)

  cat("**Kruskal-Wallis Test for Biomass Differences Among Observers**\n\n")
  cat("Test statistic (chi-squared):", round(kw_test$statistic, 3), "\n\n")
  cat("Degrees of freedom:", kw_test$parameter, "\n\n")
  cat("P-value:", format.pval(kw_test$p.value, digits = 3), "\n\n")

  if (kw_test$p.value < 0.05) {
    cat("**Result:** Significant differences detected among observers (p < 0.05)\n\n")
  } else {
    cat("**Result:** No significant differences detected among observers (p ≥ 0.05)\n\n")
  }
} else {
  cat("Insufficient observers (with ≥5 transects) for statistical testing.\n\n")
}
```
Kruskal-Wallis Test for Biomass Differences Among Observers
Test statistic (chi-squared): 1.076
Degrees of freedom: 2
P-value: 0.584
Result: No significant differences detected among observers (p ≥ 0.05)
Interpretation note: If significant differences are found among observers, this warrants further investigation. However, these results should be interpreted with caution, as observers may survey in different locations and at different times. Therefore, observed differences could reflect spatial or temporal variation in fish communities rather than true observer effects.
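When the Kruskal-Wallis test is significant, a natural follow-up is to ask which observers differ. The sketch below is an addition that is not part of the original analysis: it runs pairwise Wilcoxon rank-sum tests with a Holm correction on simulated data. The toy biomass values are randomly generated for illustration, and the same spatial and temporal caveats apply to the pairwise results.

```r
set.seed(42)

# Simulated transect totals for three observers (hypothetical data);
# Observer C is drawn from a distribution with a higher median
toy <- data.frame(
  observers     = rep(c("Observer A", "Observer B", "Observer C"), each = 12),
  total_biomass = c(rlnorm(12, meanlog = 4.5, sdlog = 0.8),
                    rlnorm(12, meanlog = 4.5, sdlog = 0.8),
                    rlnorm(12, meanlog = 5.5, sdlog = 0.8))
)

# Omnibus test, as used in this document
kw <- kruskal.test(total_biomass ~ observers, data = toy)

# Pairwise follow-up; Holm adjustment controls the family-wise error rate
pw <- pairwise.wilcox.test(
  toy$total_biomass,
  toy$observers,
  p.adjust.method = "holm"
)

print(kw$p.value)
print(pw$p.value)  # lower-triangular matrix of adjusted p-values
```

With real data, the pairwise step would take `test_data$total_biomass` and `test_data$observers` in place of the toy columns.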
Biomass Distribution by Observer
```r
# Create subtitle text based on statistical test results
if (exists("kw_test") && length(observers_to_test) >= 2) {
  # Determine significance
  if (kw_test$p.value < 0.05) {
    sig_text <- "Significant differences detected among observers"
  } else {
    sig_text <- "No significant differences detected among observers"
  }
  subtitle_text <- paste0(
    "Observers with ≥5 transects shown; total biomass per transect\n",
    sig_text,
    " (Kruskal-Wallis: χ² = ", round(kw_test$statistic, 2),
    ", p = ", format.pval(kw_test$p.value, digits = 3, eps = 0.001), ")"
  )
} else {
  subtitle_text <- "Observers with ≥5 transects shown; total biomass per transect"
}

ggplot(test_data, aes(x = reorder(observers, total_biomass, FUN = median),
                      y = total_biomass)) +
  geom_boxplot(fill = "#69b3a2", alpha = 0.7) +
  coord_flip() +
  labs(
    title = "Fish Biomass Distribution by Observer",
    subtitle = subtitle_text,
    x = "Observer",
    y = "Total Biomass (kg/ha)"
  ) +
  theme_classic() +
  theme(
    axis.text = element_text(size = 10, colour = "black"),
    axis.title = element_text(size = 12, colour = "black"),
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(size = 10, hjust = 0.5, color = "gray30")
  )
```
Sites Identified as Statistical Outliers (|z-score| > 2)

| Site | Date | Biomass (kg/ha) | Z-Score | Observer |
|---|---|---|---|---|
| BCMR 33 | 2022-11-17 | 1217.07 | 5.27 | Observer C, Observer B |
| BZCCCB02 | 2021-10-28 | 681.31 | 2.58 | Observer A |
Interpretation note: Sites identified as statistical outliers warrant further investigation. However, these may represent true ecological variation (e.g., protected areas, unique habitat features) rather than data errors. Differences could also reflect spatial or temporal variation in environmental conditions or fish community structure. Field notes and environmental context should be consulted before concluding these are erroneous data points.
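The code that builds `site_biomass_stats` is not shown on this page. The following is a minimal sketch of a z-score check, assuming the z-score is computed from sample-event mean biomass across the whole project and that events with |z| > 2 are flagged. The toy values are hypothetical; the column names follow those used elsewhere in this document.

```r
# Toy sample-event data (hypothetical values, not MERMAID output)
sample_events <- data.frame(
  site             = paste0("S", 1:8),
  sample_date      = as.Date("2022-01-01") + 0:7,
  biomass_kgha_avg = c(100, 120, 90, 110, 105, 95, 115, 900)
)

# Project-wide mean and standard deviation of sample-event biomass
mean_biomass <- mean(sample_events$biomass_kgha_avg, na.rm = TRUE)
sd_biomass   <- sd(sample_events$biomass_kgha_avg, na.rm = TRUE)

# z-score: how many standard deviations each event lies from the mean
site_biomass_stats <- sample_events
site_biomass_stats$z_score <-
  (site_biomass_stats$biomass_kgha_avg - mean_biomass) / sd_biomass

# Flag events more than 2 standard deviations from the mean, either side
site_biomass_stats$is_outlier <- abs(site_biomass_stats$z_score) > 2

print(site_biomass_stats[site_biomass_stats$is_outlier, ])
```

Because the mean and standard deviation are themselves pulled upward by extreme values, a z-score on raw biomass is a coarse screen; for strongly right-skewed data, computing it on log-transformed biomass is one alternative worth considering.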
Interactive Scatter Plot - Biomass vs Abundance
```r
# Calculate mean abundance per sample event
se_with_abundance <- sampleunits_data %>%
  group_by(sample_event_id) %>%
  summarise(mean_abundance = mean(total_abundance, na.rm = TRUE)) %>%
  right_join(sampleevents_data, by = "sample_event_id") %>%
  # Join with outlier detection results from z-score test
  left_join(
    site_biomass_stats %>% select(site, sample_date, is_outlier, z_score),
    by = c("site", "sample_date")
  ) %>%
  mutate(
    # Only flag as outlier if z-score > 2 (high biomass only)
    is_high_outlier = !is.na(z_score) & z_score > 2,
    outlier_status = ifelse(is_high_outlier, "High Outlier", "Normal")
  )

# Calculate upper biomass threshold based on z-scores
mean_biomass <- mean(sampleevents_data$biomass_kgha_avg, na.rm = TRUE)
sd_biomass <- sd(sampleevents_data$biomass_kgha_avg, na.rm = TRUE)
upper_biomass_threshold <- mean_biomass + 2 * sd_biomass

# Get x-axis range for the line
x_range <- range(se_with_abundance$mean_abundance, na.rm = TRUE)

# Create interactive scatter plot
scatter_plot <- plot_ly(
  data = se_with_abundance,
  x = ~mean_abundance,
  y = ~biomass_kgha_avg,
  type = "scatter",
  mode = "markers",
  text = ~paste("Site:", site,
                "<br>Biomass:", round(biomass_kgha_avg, 1), "kg/ha",
                "<br>Abundance:", round(mean_abundance, 1),
                "<br>Observer:", observers,
                "<br>Status:", outlier_status),
  hoverinfo = "text",
  marker = list(
    size = 10,
    # Red for high outliers, green for normal
    color = ~ifelse(is_high_outlier, "#d13823", "#277d1d"),
    line = list(color = "white", width = 1)
  ),
  name = "Sample Events",
  showlegend = TRUE
) %>%
  # Add horizontal reference line for upper biomass threshold only
  add_segments(
    x = x_range[1],
    xend = x_range[2],
    y = upper_biomass_threshold,
    yend = upper_biomass_threshold,
    inherit = FALSE,                       # don't inherit marker mapping from the main plot
    mode = "lines",                        # lines only (no markers)
    marker = list(size = 0, opacity = 0),  # belt-and-suspenders: hide any marker glyphs
    line = list(color = "red", dash = "dash", width = 2),
    name = "Upper threshold (z = 2)",
    showlegend = TRUE,
    hoverinfo = "text",
    text = paste("Upper threshold:", round(upper_biomass_threshold, 1), "kg/ha")
  ) %>%
  layout(
    title = "Fish Biomass vs Abundance (High Outliers Highlighted)",
    xaxis = list(title = "Mean Fish Abundance (count per transect)"),
    yaxis = list(title = "Fish Biomass (kg/ha)"),
    hovermode = "closest"
  )

scatter_plot
```
Note: Red points indicate sites with unusually high biomass (z-score > 2), falling above the dashed red threshold line.
Shark Observations
Shark Presence and Abundance
```r
# Identify shark families (Carcharhinidae, Sphyrnidae, etc.)
shark_families <- c(
  "Carcharhinidae", "Sphyrnidae", "Alopiidae", "Lamnidae", "Rhincodontidae",
  "Galeocerdonidae", "Ginglymostomatidae", "Hemiscylliidae", "Heterodontidae",
  "Hexanchidae", "Odontaspididae", "Orectolobidae", "Parascylliidae",
  "Scyliorhinidae", "Squalidae", "Stegostomatidae", "Triakidae"
)

# Use shark-specific data for this analysis
shark_obs <- shark_observations_data %>%
  filter(fish_family %in% shark_families)

if (nrow(shark_obs) > 0) {
  # Summarize shark observations by site
  shark_summary <- shark_obs %>%
    group_by(site, fish_family, fish_taxon) %>%
    summarise(
      total_count = sum(count, na.rm = TRUE),
      total_biomass = sum(biomass_kgha, na.rm = TRUE),
      n_observations = n(),
      .groups = "drop"
    ) %>%
    arrange(desc(total_count))

  # print() is required here: inside an if block, knitr only auto-prints
  # the value of the last expression, so a bare kable() call is dropped
  print(kable(
    shark_summary,
    digits = 2,
    col.names = c("Site", "Family", "Species", "Total Count",
                  "Total Biomass (kg/ha)", "N Obs"),
    caption = "Shark Observations Summary"
  ))

  # Sites with sharks
  n_sites_total <- length(unique(shark_observations_data$site))
  n_sites_sharks <- length(unique(shark_obs$site))
  cat("\n\nTotal sites surveyed:", n_sites_total, "\n\n")
  cat("Sites with shark observations:", n_sites_sharks, "\n\n")
  cat("Percentage of sites with sharks:",
      round(100 * n_sites_sharks / n_sites_total, 1), "%\n\n")
} else {
  cat("No shark observations in the dataset.")
}
```
Total sites surveyed: 13
Sites with shark observations: 10
Percentage of sites with sharks: 76.9 %
Individual Shark Weight Validation
Note: A flagged observation warrants further investigation but does not necessarily indicate an error. Maximum weights are pulled from FishBase for comparison with the observations, but FishBase has fewer maximum-weight records than maximum-length records. (Maximum length is what the MERMAID Collect app uses to flag larger-than-expected observations.)
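The individual weight used in this check is back-calculated from the per-hectare biomass of each observation: multiply by the transect area in hectares to recover the biomass on the transect, then divide by the number of individuals. A minimal worked sketch, with a hypothetical helper function and made-up numbers:

```r
# Back-calculate the weight of one individual from an observation's
# per-hectare biomass (the function name is hypothetical)
individual_weight_kg <- function(biomass_kgha, transect_length_m,
                                 transect_width_m, count) {
  # 1 hectare = 10,000 square metres
  transect_area_ha <- transect_length_m * transect_width_m / 10000
  # Biomass actually present on the surveyed belt, split across individuals
  biomass_kgha * transect_area_ha / count
}

# Transect width is stored as text such as "5m"; strip the unit first
width_m <- as.numeric(sub("m", "", "5m", fixed = TRUE))

# Two sharks on a 50 m x 5 m belt recorded as 1600 kg/ha:
# area = 250 m2 = 0.025 ha; 1600 * 0.025 = 40 kg on the belt; 40 / 2 = 20 kg each
weight <- individual_weight_kg(1600, 50, width_m, 2)
print(weight)
```

The unit parsing assumes a single fixed belt width; a width description that is not a plain number followed by "m" would become `NA` and needs separate handling.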
```r
if (nrow(shark_obs) > 0) {
  # Get individual shark observations with calculated weights
  individual_sharks <- shark_obs %>%
    mutate(
      # Remove "m" from transect_width and convert to numeric
      transect_width_m = as.numeric(str_replace(transect_width, "m", "")),
      # Transect area in hectares (transect_length and transect_width_m are in meters)
      transect_area_m2 = transect_length * transect_width_m,
      transect_area_ha = transect_area_m2 / 10000,
      # biomass_kgha is this observation's biomass per hectare;
      # scale it back to the biomass on the actual transect area
      total_biomass_kg = biomass_kgha * transect_area_ha,
      # Divide by count to get individual weight
      individual_weight_kg = total_biomass_kg / count
    ) %>%
    select(site, sample_date, fish_taxon, fish_family, size, count, biomass_kgha,
           transect_length, transect_width_m, transect_area_ha, individual_weight_kg)

  # Get unique species names for FishBase lookup
  shark_species <- unique(shark_obs$fish_taxon)

  # Attempt to get max weights from FishBase
  cat("Retrieving maximum weights from FishBase...\n\n")

  # Initialize results list
  max_weights_list <- list()

  # The loop variable is `sp` so it does not shadow rfishbase::species()
  for (sp in shark_species) {
    tryCatch({
      species_data <- rfishbase::species(sp, fields = c("Species", "Weight"))
      if (!is.null(species_data) && nrow(species_data) > 0) {
        # Use the first row so the condition below is always length one
        max_weight_g <- species_data$Weight[1]
        if (!is.na(max_weight_g) && max_weight_g > 0) {
          max_weights_list[[sp]] <- tibble(
            fish_taxon = sp,
            # FishBase Weight is in grams, convert to kg
            max_weight_kg = max_weight_g / 1000
          )
        }
      }
    }, error = function(e) {
      # Silently continue if species not found
      NULL
    })
  }

  # Combine results
  if (length(max_weights_list) > 0) {
    max_weights <- bind_rows(max_weights_list)

    # Join with observations
    shark_weight_check <- individual_sharks %>%
      left_join(max_weights, by = "fish_taxon") %>%
      mutate(
        exceeds_max = !is.na(max_weight_kg) & individual_weight_kg > max_weight_kg,
        weight_ratio = individual_weight_kg / max_weight_kg
      )

    # Check for any that exceed published max
    exceeding_sharks <- shark_weight_check %>%
      filter(exceeds_max) %>%
      arrange(desc(weight_ratio))

    if (nrow(exceeding_sharks) > 0) {
      cat("**Warning:** Some individual shark observations exceed published maximum weights\n\n")
      # print() is required: this kable() is not the last expression in the block
      print(kable(
        exceeding_sharks %>%
          select(site, sample_date, fish_taxon, size, count,
                 individual_weight_kg, max_weight_kg, weight_ratio),
        digits = 2,
        col.names = c("Site", "Date", "Species", "Size (cm)", "Count",
                      "Observed Weight (kg)", "Max Published Weight (kg)", "Ratio (Obs/Max)"),
        caption = "Shark Observations Exceeding Published Maximum Weights"
      ))
    } else {
      cat("<span style='color: green;'>✓ All individual shark weights are within published maximum ranges</span>\n\n")
    }

    # Summary table of all sharks with available max weights
    shark_comparison <- shark_weight_check %>%
      filter(!is.na(max_weight_kg)) %>%
      group_by(fish_taxon) %>%
      summarise(
        n_obs = n(),
        mean_observed_kg = mean(individual_weight_kg, na.rm = TRUE),
        max_observed_kg = max(individual_weight_kg, na.rm = TRUE),
        published_max_kg = first(max_weight_kg),
        n_exceeding = sum(exceeds_max, na.rm = TRUE),
        .groups = "drop"
      ) %>%
      arrange(desc(n_obs))

    kable(
      shark_comparison,
      digits = 2,
      col.names = c("Species", "N Observations", "Mean Weight (kg)",
                    "Max Observed (kg)", "Published Max (kg)", "N Exceeding"),
      caption = "Comparison of Observed vs Published Maximum Shark Weights"
    )
  } else {
    cat("No maximum weight data available from FishBase for the observed shark species.\n")
    cat("This could be due to taxonomic naming differences or missing FishBase data.\n\n")

    # Still show the observations
    kable(
      individual_sharks,
      digits = 2,
      col.names = c("Site", "Date", "Species", "Family", "Size (cm)", "Count",
                    "Biomass (kg/ha)", "Transect Length (m)", "Transect Width (m)",
                    "Transect Area (ha)", "Individual Weight (kg)"),
      caption = "Individual Shark Observations (FishBase validation unavailable)"
    )
  }
} else {
  cat("No shark observations in the dataset.")
}
```
Retrieving maximum weights from FishBase…
Warning: Some individual shark observations exceed published maximum weights
Comparison of Observed vs Published Maximum Shark Weights

| Species | N Observations | Mean Weight (kg) | Max Observed (kg) | Published Max (kg) | N Exceeding |
|---|---|---|---|---|---|
| Carcharhinus amblyrhynchos | 17 | 70.45 | 194.09 | 33.70 | 16 |
| Triaenodon obesus | 10 | 18.45 | 30.05 | 18.25 | 5 |
| Carcharhinus albimarginatus | 1 | 19.66 | 19.66 | 162.20 | 0 |
Visualization of Observed & Published Maximum Shark Weights
```r
if (exists("shark_weight_check") && nrow(shark_weight_check) > 0) {
  # Get the species ordering based on max observed weight
  species_order <- shark_weight_check %>%
    group_by(fish_taxon) %>%
    summarise(max_obs = max(individual_weight_kg, na.rm = TRUE)) %>%
    arrange(max_obs) %>%
    pull(fish_taxon)

  # Apply this ordering to the data
  shark_weight_check <- shark_weight_check %>%
    mutate(fish_taxon = factor(fish_taxon, levels = species_order))

  # One row per species; points are observed weights, "|" marks the published max
  p <- ggplot(shark_weight_check, aes(x = individual_weight_kg, y = fish_taxon)) +
    # Observed individual weights
    geom_point(aes(color = exceeds_max), size = 3, alpha = 0.7) +
    # Published maximum weight per species, drawn as a "|" glyph with geom_point
    geom_point(
      data = shark_weight_check %>%
        filter(!is.na(max_weight_kg)) %>%
        distinct(fish_taxon, max_weight_kg),
      aes(x = max_weight_kg, y = fish_taxon),
      shape = "|", size = 10, color = "red", stroke = 2
    ) +
    scale_color_manual(
      values = c("TRUE" = "#d13823", "FALSE" = "#277d1d"),
      labels = c("TRUE" = "Exceeds max", "FALSE" = "Within range"),
      na.value = "gray50",
      name = "Status"
    ) +
    labs(
      title = "Individual Shark Weights vs Published Maximum Weights",
      subtitle = paste0(
        "Red vertical marks show published maximum weights from FishBase\n",
        "Published weights available for ",
        length(unique(shark_weight_check$fish_taxon[!is.na(shark_weight_check$max_weight_kg)])),
        " of ", length(unique(shark_weight_check$fish_taxon)), " species"
      ),
      x = "Individual Weight (kg)",
      y = "Species"
    ) +
    theme_classic() +
    theme(
      axis.text.y = element_text(size = 9, colour = "black", face = "italic"),
      axis.text.x = element_text(size = 10, colour = "black"),
      axis.title = element_text(size = 12, colour = "black"),
      plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
      plot.subtitle = element_text(size = 10, hjust = 0.5, color = "gray30"),
      legend.position = "bottom"
    )

  print(p)
} else {
  cat("No shark weight comparison data available to plot.")
}
```
Shark Biomass by Site
```r
if (nrow(shark_obs) > 0) {
  # Get all fish family biomass columns from sample events data
  family_prefix <- "^biomass_kgha_fish_family_avg_"
  shark_family_cols <- grep(family_prefix, names(shark_sampleevents_data), value = TRUE)

  # Keep only shark families; exact, case-insensitive match on the family name
  shark_families_lower <- tolower(shark_families)
  shark_cols <- shark_family_cols[
    tolower(sub(family_prefix, "", shark_family_cols)) %in% shark_families_lower
  ]

  # Calculate shark and total biomass per sample event
  shark_biomass_summary <- shark_sampleevents_data %>%
    mutate(
      # Sum all shark family columns
      shark_biomass = rowSums(across(all_of(shark_cols)), na.rm = TRUE),
      # Use total biomass
      total_biomass = biomass_kgha_avg,
      # Calculate other fish biomass
      other_biomass = total_biomass - shark_biomass,
      # Calculate percentage
      shark_percent = 100 * shark_biomass / total_biomass,
      # Create label combining site and date
      site_date = paste0(site, " (", format(as.Date(sample_date), "%Y-%m-%d"), ")")
    ) %>%
    # Filter to only sample events with sharks
    filter(shark_biomass > 0) %>%
    select(site_date, shark_biomass, other_biomass, total_biomass, shark_percent)

  # Prepare data for stacked bars
  shark_biomass_stacked <- shark_biomass_summary %>%
    pivot_longer(
      cols = c(shark_biomass, other_biomass),
      names_to = "biomass_type",
      values_to = "biomass"
    ) %>%
    mutate(biomass_type = factor(biomass_type,
                                 levels = c("other_biomass", "shark_biomass"),
                                 labels = c("Other fish", "Sharks")))

  # Plot
  ggplot(shark_biomass_stacked,
         aes(x = reorder(site_date, total_biomass), y = biomass, fill = biomass_type)) +
    geom_col(position = "stack", alpha = 0.8) +
    # Add percentage labels at the end of bars
    geom_text(
      data = shark_biomass_summary,
      aes(x = site_date, y = total_biomass,
          label = paste0(round(shark_percent, 1), "%"),
          fill = NULL),
      hjust = -0.1, size = 3, color = "black"
    ) +
    scale_fill_manual(values = c("Other fish" = "#7fbc41", "Sharks" = "#2c7bb6")) +
    coord_flip() +
    labs(
      title = "Shark and Total Fish Biomass by Sample Event (Events with Sharks)",
      subtitle = "Percentage shows shark contribution to total fish biomass",
      x = "Site (Date)",
      y = "Biomass (kg/ha)",
      fill = "Category"
    ) +
    theme_classic() +
    theme(
      axis.text = element_text(size = 9, colour = "black"),
      axis.title = element_text(size = 12, colour = "black"),
      plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
      plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray30"),
      legend.position = "bottom"
    ) +
    # Expand the biomass axis (drawn horizontally after coord_flip)
    # to make room for percentage labels
    scale_y_continuous(expand = expansion(mult = c(0, 0.15)))
} else {
  cat("No shark observations in the dataset.")
}
```
Observer Anonymization: Observer names have been anonymized to letters for sharing this document publicly.
Observer Consistency: Statistical tests have been performed to assess systematic differences between observers. Any detected differences warrant further investigation, but should be interpreted cautiously as they may reflect spatial or temporal variation rather than true observer effects.
Outliers: Sites identified as statistical outliers should be investigated to determine if they represent true ecological variation or potential data issues. Consider environmental context and field notes.
Shark Observations: Individual shark weights have been validated against published maximum weights from FishBase where available. Any observations exceeding published maxima should be carefully reviewed for potential data entry errors in size or abundance.
Data Integrity: For any flagged issues, consult field datasheets and consider re-validation of measurements before making corrections to the database.