More ocean data has been collected in the last two years than in all previous years combined, and we are on a path to continue breaking that record. More than ever, we need to establish a solid foundation for processing this ceaseless stream of data, especially visual data, as ocean-going platforms begin to integrate multi-camera feeds for observation and navigation. Machine learning techniques for efficiently processing and utilizing visual datasets exist and continue to be transformative, but they have seen limited success in the oceanographic world due to (1) a lack of dataset standardization, (2) sparse annotation tools for the wider oceanographic community, and (3) insufficient formatting of existing, expertly curated imagery for use by data scientists.
Our efforts will establish a new baseline dataset, optimized to directly accelerate the development of modern, intelligent, automated analysis of underwater visual data. This will enable scientists, explorers, policymakers, storytellers, and the public to learn, understand, and care more about our oceans than ever before.
Building on successes of the machine learning community, we propose to build a public platform (TBDNet) that makes use of existing (and future) expertly curated data to know what’s in the ocean and where it is for effective and responsible marine stewardship (Figure 1). This platform will be modeled after Stanford’s ImageNet, which, along with other datasets, enabled rapid advances in automated visual analysis. Unlike ImageNet, which was created in a field that lacked curated data, TBDNet seeks to organize and index a wealth of existing data. We hope to address items (1) and (2) above, and begin addressing (3), within the duration of this project by utilizing MBARI’s Video Annotation and Reference System (VARS) and MBARI’s annotated deep-sea video database, which will serve as the primary image set for TBDNet. As the project progresses, we plan to incorporate other existing datasets from WHOI, URI, and elsewhere using the standards and workflow we develop.
MBARI uses high-resolution video equipment to record hundreds of remotely and autonomously operated vehicle dives each year. This video library contains detailed footage of the biological, chemical, geological, and physical aspects of each deployment. Since 1988, more than 23,000 hours of videotape have been archived, annotated, and maintained as a centralized MBARI resource. This resource is enabled by the Video Annotation and Reference System (VARS), which is a software interface and database system that provides tools for describing, cataloguing, retrieving, and viewing the visual, descriptive, and quantitative data associated with MBARI’s deep-sea video archives. All of MBARI’s video resources are expertly annotated by members of the Video Lab (VL), and there are currently more than 6 million annotations and 4000 terms in the VARS knowledgebase, with over 2000 of those terms belonging to either genera or species.
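To make the VARS resource concrete, the sketch below shows what a keyword lookup over annotation records might look like. The field names (`concept`, `tape`, `timecode`, `depth_m`, `observer`) and example records are illustrative assumptions for this proposal, not the actual VARS schema or API.

```python
# Hypothetical sketch of VARS-style annotation records and a keyword query.
# All field names and values below are illustrative, not the real VARS schema.
annotations = [
    {"concept": "Nanomia bijuga", "tape": "D1234-02",
     "timecode": "01:23:45:10", "depth_m": 412.0, "observer": "VL"},
    {"concept": "Bathochordaeus mcnutti", "tape": "D1301-05",
     "timecode": "00:07:12:03", "depth_m": 280.5, "observer": "VL"},
]

def query(records, concept):
    """Return all annotation records matching a concept term (genus or species)."""
    return [r for r in records if r["concept"] == concept]

hits = query(annotations, "Nanomia bijuga")
print(len(hits))  # 1
```

In the real system, a query like this would run against the VARS knowledgebase of more than 6 million annotations and return the frame references used to assemble the TBDNet image set.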
The proposed workflow, which will be completed by November, is described in Figure 2. Using the VARS search, images that match a keyword query (e.g., a genus or species) and carry a single annotation will be selected for automated bounding-box localization using an existing computer vision algorithm, then verified by an MBARI VL technician or through crowdsourcing.
In parallel, a state-of-the-art deep learning algorithm will be trained on the expertly verified, labeled, and localized images. This algorithm will be used to augment future data labeling and verification tasks, continuing to improve as more data is added to the system. Once all single-annotation images are verified, multiple-annotation images will be processed iteratively until all annotated images have been labeled and verified. As the workflow is finalized, we will pursue incorporating NGS Pristine Seas annotated image data into the training set, and demonstrate our efforts on unannotated video from the NOAA Office of Ocean Exploration and/or the Ocean Exploration Trust.
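The bootstrap loop above (simple single-annotation images first, harder multi-annotation images later, with verification feeding back into training) can be sketched as follows. The detector and verification stubs are placeholders for the existing computer vision algorithm and the VL-technician/crowdsourced review, respectively; image IDs and labels are made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    image_id: str
    annotations: list                 # expert concept labels from VARS
    boxes: list = field(default_factory=list)
    verified: bool = False

def propose_boxes(frame):
    """Stand-in for an off-the-shelf detector: one placeholder box per label."""
    return [(label, (0, 0, 100, 100)) for label in frame.annotations]

def review(frame):
    """Stand-in for VL-technician or crowdsourced verification."""
    return True

def bootstrap_label(frames):
    """Localize and verify frames iteratively, simplest cases first."""
    verified = []
    # Sort by annotation count so single-annotation images are handled first.
    for frame in sorted(frames, key=lambda f: len(f.annotations)):
        frame.boxes = propose_boxes(frame)
        frame.verified = review(frame)
        if frame.verified:
            verified.append(frame)
        # In the real workflow, the deep learning model would be retrained
        # here on the growing verified set before harder images are attempted.
    return verified

frames = [
    Frame("dive_002_f456", ["Bathochordaeus", "Aegina citrea"]),
    Frame("dive_001_f123", ["Nanomia bijuga"]),
]
done = bootstrap_label(frames)
print(len(done))  # 2
```

The design choice here mirrors the proposal: starting with single-annotation images keeps the box-to-label correspondence unambiguous, so early verification is cheap, and the resulting verified set makes the model good enough to assist with the ambiguous multi-annotation images later.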
Quantity of usable data. The quality of any modern image processing algorithm depends strongly on the quantity of accurately labeled images. It is unknown how large, and in what state, the MBARI dataset is.
Effectiveness of annotation algorithms. We have identified existing, open-source algorithms to augment the laborious image labeling and localization task, but the efficacy of these algorithms on MBARI’s existing dataset is unknown.
Validation. Accurate, crowdsourced validation of image annotations is crucial to the success of any high-quality, large-scale dataset. While there is prior work demonstrating the power of commercial data labeling services (e.g., Mechanical Turk or CrowdFlower), the effectiveness of these same services for a relatively niche dataset like MBARI’s is unknown.
Adoption. A large-scale database such as TBDNet is useless if it is not used to further scientific understanding and ocean stewardship. Adequately addressing the previous unknowns in a public, well-defined manner is crucial to fostering trust between our efforts and the research community.
MBARI annotated video dataset
Project Leader salary support
CVision High Performance Compute cluster
OpenROV GPU machine
CVision in kind salary support for B. Woodward
OpenROV in kind salary support for G. Montague
Travel for on-site (MBARI) meeting
Refreshments and meals for on-site meeting
Research Assistant (6 months, term position at MBARI)
Product manager and CV/ML assist; 2 months at 100% time and 4 months at 40% time
2x CV/ML Experts @ 25% time for 6 months
Grace Young, Oxford University
Gilbert Montague, OpenROV
Ben Woodward, CVision AI
Genevieve Flaspohler, MIT
Joshua Gyllinsky, University of Rhode Island
Adam Soule, Woods Hole Oceanographic Institution
Katy Croff Bell, MIT Media Lab
Kakani Katija, Monterey Bay Aquarium Research Institute