MERMAID Image Classification Open Data Tutorial

R version

Author

Iain R. Caldwell

Published

September 9, 2025

This short tutorial shows how to access MERMAID images and their associated annotations from the MERMAID S3 bucket using R, and then how to visualize them together.


Setting up the environment

rm(list = ls()) #remove past stored objects
options(scipen = 999) #turn off scientific notation

#install.packages(c("arrow", "dplyr", "ggplot2", "magick", "paws", "tidyr"))

library(arrow)
library(dplyr)
library(tidyr)
library(ggplot2)
library(magick)
#library(paws) #Only needed if using S3 rather than https to retrieve image
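
If any of the library() calls above fail, it can help to check which packages from the commented install.packages() line are missing. The following is a minimal sketch; the variable names required_pkgs, is_installed, and missing_pkgs are illustrative and not part of the original tutorial.

```r
# Packages used in this tutorial (paws is only needed for the S3 fallback)
required_pkgs <- c("arrow", "dplyr", "ggplot2", "magick", "paws", "tidyr")

# requireNamespace() returns TRUE if a package can be loaded, without attaching it
is_installed <- vapply(required_pkgs, requireNamespace,
                       FUN.VALUE = logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them
} else {
  message("All required packages are installed.")
}
```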

Accessing MERMAID annotations

To access and work with MERMAID open data (including images and annotations), you first need to open the mermaid_confirmed_annotations.parquet file with a library such as arrow. The following creates an R data frame from the parquet file.

#annotations_path_s3 <- "s3://coral-reef-training/mermaid/mermaid_confirmed_annotations.parquet" # Location of the annotations file (S3)

annotations_path_https <- "https://coral-reef-training.s3.us-east-1.amazonaws.com/mermaid/mermaid_confirmed_annotations.parquet" # Location of the annotations file (https)

# Read the full annotations table (as S3 or https)
# Each row corresponds to one annotated point for an image (25 per image)
#df_annotations_s3 <- arrow::read_parquet(annotations_path_s3)
df_annotations_https <- arrow::read_parquet(annotations_path_https)

# A per-image table (drop duplicate image rows)
df_images <- df_annotations_https %>%
  select(image_id, region_id, region_name) %>% 
  distinct()

glue::glue("Loaded {nrow(df_annotations_https)} annotations across {nrow(df_images)} images from {length(unique(df_images$region_id))} unique geographic realms.")
Loaded 50000 annotations across 2000 images from 2 unique geographic realms.
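
If only a few columns are needed, arrow::read_parquet() accepts a col_select argument, which can reduce memory use. The sketch below demonstrates this on a small, made-up table written to a temporary file, so it runs without network access. The values in toy_annotations are invented; only the column names come from the code above. The same col_select argument could presumably be passed when reading annotations_path_https.

```r
library(arrow)

# Made-up stand-in for the annotations table (values are invented)
toy_annotations <- data.frame(
  image_id    = c("img-a", "img-a", "img-b"),
  region_id   = c("r1", "r1", "r2"),
  region_name = c("Region one", "Region one", "Region two"),
  row         = c(10L, 20L, 30L),
  col         = c(15L, 25L, 35L)
)

# Write to a temporary parquet file, then read back only selected columns
tmp_path <- tempfile(fileext = ".parquet")
arrow::write_parquet(toy_annotations, tmp_path)

toy_subset <- arrow::read_parquet(
  tmp_path,
  col_select = c("image_id", "region_id", "region_name")
)

# Build the same kind of per-image table as above, here with base R
toy_images <- unique(as.data.frame(toy_subset))
nrow(toy_images)
```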

Fetching an image (function)

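The following function loads an image as a magick image object (the equivalent of get_image_s3 in Python). It first tries the public HTTPS URL and, if that fails, can optionally fall back to fetching the object from S3 with paws.storage, which requires AWS credentials.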

get_image <- function(image_id,
                      bucket = "coral-reef-training",
                      region = "us-east-1",
                      thumbnail = FALSE,
                      use_s3_fallback = TRUE) {
  # 1) Try public HTTPS (works if the object is public)
  key <- if (thumbnail) sprintf("mermaid/%s_thumbnail.png", image_id)
         else           sprintf("mermaid/%s.png", image_id)
  https_url <- sprintf("https://%s.s3.%s.amazonaws.com/%s", bucket, region, key)

  img <- tryCatch(
    magick::image_read(https_url),
    error = function(e) NULL
  )
  if (!is.null(img)) return(img)

  # 2) Optional fallback to paws.storage (needs AWS creds)
  if (use_s3_fallback) {
    if (!requireNamespace("paws.storage", quietly = TRUE)) {
      stop("Public HTTPS failed and {paws.storage} is not installed for S3 fallback.")
    }
    s3 <- paws.storage::s3()
    obj <- tryCatch(s3$get_object(Bucket = bucket, Key = key), error = function(e) NULL)
    if (is.null(obj)) {
      stop("Could not fetch image via HTTPS or S3. If the object isn’t public, configure AWS credentials.")
    }
    return(magick::image_read(obj$Body))
  } else {
    stop("Could not fetch image via HTTPS. If the object isn’t public, enable S3 fallback and configure AWS credentials.")
  }
}
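
Because get_image() suppresses the HTTPS error before trying the fallback, it can be useful to see exactly which URL it requests. The helper below is a hypothetical addition (build_image_url is not part of the original tutorial) that reproduces the same key and URL construction without touching the network; "example-id" is a placeholder rather than a real image ID.

```r
# Hypothetical helper that reproduces the key and URL logic of get_image()
build_image_url <- function(image_id,
                            bucket = "coral-reef-training",
                            region = "us-east-1",
                            thumbnail = FALSE) {
  suffix <- if (thumbnail) "_thumbnail.png" else ".png"
  key <- sprintf("mermaid/%s%s", image_id, suffix)
  sprintf("https://%s.s3.%s.amazonaws.com/%s", bucket, region, key)
}

# "example-id" is a placeholder, not a real MERMAID image ID
url_full  <- build_image_url("example-id")
url_thumb <- build_image_url("example-id", thumbnail = TRUE)
print(url_full)
print(url_thumb)
```

Pasting the printed URL into a browser is a quick way to tell whether a failure comes from a wrong image ID or from access permissions.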

Get an image and its annotations

Once the annotations are loaded, you can fetch an image with the function above and filter the data frame to that image's annotations as follows:

# Choose an index (integer) between 1 and nrow(df_images).
idx <- 1
stopifnot(idx >= 1, idx <= nrow(df_images))

# If you already have an image ID, assign it to image_id_iter directly instead of using the following line:
image_id_iter <- df_images$image_id[idx]
img <- get_image(image_id_iter, thumbnail = FALSE)

annotations <- df_annotations_https %>%
  filter(image_id == image_id_iter)
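
The comment in the loading step states that each image has 25 annotated points. A quick way to confirm this is to count points per image and list any image that deviates. The sketch below uses a made-up data frame (toy_annotations, with invented IDs and coordinates) in place of df_annotations_https so that it runs on its own; the same pipeline should apply to the real table.

```r
library(dplyr)

# Made-up stand-in for df_annotations_https: two images with 25 and 24 points
toy_annotations <- data.frame(
  image_id = c(rep("img-a", 25), rep("img-b", 24)),
  row = seq_len(49),
  col = seq_len(49)
)

expected_points <- 25  # taken from the "25 per image" comment in the loading step

# Count annotated points per image and keep only images that deviate
unexpected <- toy_annotations %>%
  count(image_id, name = "n_points") %>%
  filter(n_points != expected_points)

unexpected
```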

Plot the image with annotations

After getting an example image and its associated annotations, you can plot the annotated points on top of the image as follows:

# --- tidy annotation fields: label missing growth forms as "None" ---
annotations <- annotations %>%
  mutate(growth_form = ifelse(test = is.na(growth_form_name),
                              yes = "None",
                              no = growth_form_name))

# --- PLOT ---
p <- magick::image_ggplot(img) +
  geom_point(data = annotations,
             aes(x = col, y = row,
                 color = benthic_attribute_name,
                 shape = growth_form),
             size = 4) +
  scale_color_discrete(name = "Benthic attribute") +
  scale_shape_discrete(name = "Growth form") +
  theme_void()

p
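
One caveat with the plot above: ggplot2's default shape scale handles at most six discrete values, so an image with more than six distinct growth forms would trigger a warning and leave some points undrawn. A possible workaround, sketched here on made-up data rather than on the real annotations, is to assign shapes explicitly with scale_shape_manual(). The toy data frame and the "Form A" to "Form H" labels are invented for illustration.

```r
library(ggplot2)

# Made-up stand-in for the annotations data frame, with eight growth forms
toy <- data.frame(
  col = seq(10, 80, by = 10),
  row = seq(80, 10, by = -10),
  growth_form = paste("Form", LETTERS[1:8])
)

# Map each growth form to its own point shape (shape codes 0 to 7)
forms <- sort(unique(toy$growth_form))
shape_values <- setNames(seq_along(forms) - 1L, forms)

p_toy <- ggplot(toy, aes(x = col, y = row, shape = growth_form)) +
  geom_point(size = 4) +
  scale_shape_manual(name = "Growth form", values = shape_values) +
  scale_y_reverse()  # image rows count downward from the top

p_toy
```

In the tutorial's plot, the same idea would mean replacing scale_shape_discrete() with a scale_shape_manual() call built from the growth forms present in annotations.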

 
