Poster Session 1
Time: 06/Jun/2018: 5:30pm-7:00pm · Location: Roosevelt

CRESCYNT Data Science for Coral Reefs: Data Rescue as Motivation for Data Management Training

Ruth D Gates1, Ouida W Meier1, Megan Donahue1, Judy Lemus1, Gwen Jacobs1, Erik Franklin1, Ilya Zaslavsky2, CRESCYNT Data Science for Coral Reefs Workshop_Participants1

1Hawaii Inst of Marine Biology / Univ of Hawaii Manoa, United States of America; 2San Diego Supercomputer Center, Univ of California San Diego, United States of America

The CRESCYNT Coral Reef Science and Cyberinfrastructure Network held a workshop, Data Science for Coral Reefs: Data Rescue, to teach essential data management skills, which coral reef researchers have consistently identified as a persistent need, and at the same time to offer an opportunity to rescue older coral reef data sets in danger of being lost to science. High-quality observations of reefs taken decades ago become more valuable with time, as those particular intersections of space, time, organism, community, and physical environment will never be repeated. The need to develop a set of data rescue workflows that could be widely shared coincided with this need for improved data management skills. The workshop was designed as two days of training followed by a two-day workathon, and was held at NCEAS in March 2018. Participants undertook basic data management training and metadata creation, and then made progress in salvaging, archiving, and linking collections of related data sets. The opportunity allowed us to develop specific recommendations for suitable repositories; metadata structures consistent with the needs of coral reef researchers; workflows for capturing, preserving, and maximizing future accessibility of valuable reef records; and insights into how to keep data fresh and therefore actively curated. A data discovery exercise produced researcher assessments of several repositories and a metadata aggregator (CINERGI), and made researchers more aware of how to write thoughtful metadata for future discovery by others. Participants included a fortuitous combination of senior scientists, postdocs, graduate students, and skilled technical specialists, all of whom contributed to what we expect will be a long tail of positive workshop outcomes, as participants committed to sharing their new skills with others.
Materials developed for the workshop have been designed to be shared with the coral reef community and other researchers.

Breakout Session 6: Science Engagement and Outreach - Panel
Time: 07/Jun/2018: 1:45pm-3:45pm · Location: Kennedy

CRESCYNT Data Science for Coral Reefs: Data Integration and Team Science

Ruth D Gates1, Ouida W Meier1, Megan Donahue1, Erik C Franklin1, Judy Lemus1, Gwen Jacobs1, Ilya Zaslavsky2, CRESCYNT Data Science for Coral Reefs Workshop_Participants1

1Hawaii Inst of Marine Biology, Univ of Hawaii Manoa, United States of America; 2San Diego Supercomputer Center, Univ of California San Diego, United States of America

Coral reef scientists have consistently reported that data integration is a serious and limiting challenge in addressing questions that span multiple scales and require collaboration across the multiple disciplines involved in coral reef work. At a basic level, the need may be as simple as connecting disparate data sets, such as reef observations with their environmental context. A culminating workshop for CRESCYNT, the Coral Reef Science and Cyberinfrastructure Network, addressed this issue in March 2018 by combining intensive training with a workathon. The first two days focused on training in advanced R techniques, including collaborative effort in data cleaning and analysis. The training and applied work relied on linking all RStudio coding work directly to GitHub, thereby affording version control and collaboration, and combining code, documentation, and visualization of results in a single, very readable document. The second two days focused on broad collaboration on an example use case, drawing on a series of vertically scaled data sets from different disciplines contributed by researchers working in Kaneohe Bay, Hawaii. Several elements came together: collaborative open science tools, intensive and competent training, researchers from a range of disciplines, generous sharing of fortuitous data sets, time spent face to face in discussion and novel work, and a mixture of applying skills and thinking together about new science questions that could be asked with these tools. This confluence resulted in a significant leap forward in skills and practice for participants, a joint scientific endeavor to break new ground, and a commitment to share results and outcomes with the rest of the community. Participants included senior scientists, staff scientists, technical specialists, early career faculty, postdocs, and graduate students.

Poster Session 2
Time: 07/Jun/2018: 4:30pm-5:30pm · Location: Roosevelt

From Data Discovery to Research Workflows in EarthCube: Linking Catalog and Workbench Tools

Ilya Zaslavsky1, Stephen Richard2, David Valentine1, Thomas Whitenack1, Gary Hudman7, Karen Stocks4, Jeffrey Grethe3, Amarnath Gupta3, Ouida Meier5, Bernhard Peucker-Ehrenbrink6, Burak Ozyurt3

1San Diego Supercomputer Center; 2Lamont Doherty Earth Observatory; 3University of California, San Diego; 4Scripps Institution of Oceanography; 5University of Hawaii; 6Woods Hole Oceanographic Institution; 7Arizona Geological Survey

Most discovery tools to date have focused on searching resource metadata and related indexed content to locate data files, services, or software tools of interest for addressing geoscience research problems. A typical scenario involves downloading data files and then preparing them for use in a locally installed or server-based software application. Accessing, exploring, and configuring the data for use in research applications often takes significant time because of inconsistent or incomplete metadata, poorly described data semantics, and a lack of mechanisms for bringing data directly into a workbench environment.

The EarthCube Data Discovery Hub (DDH) is developing approaches and infrastructure to reduce time to science by linking search results in the catalog directly to software tools and environments. Such linkages can be implemented in several ways: 1) generate machine-actionable links that open web-accessible applications and load the data resource, and include those links in metadata; 2) augment metadata descriptions to make the data easier to interpret and incorporate in a research workflow; 3) provide standardized, structured descriptions of data distribution options ('affordances') that client applications could use to match data with applications dynamically; and 4) develop bi-directional interfaces for invoking online applications and workbenches directly from the data discovery environment, and subsequently updating the catalog with information about data usage on the workbench.
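Approaches 1 and 3 can be illustrated with a minimal sketch. This is not the Data Discovery Hub implementation; the class names, the `accepts` field, the `data` query parameter, and all example.org URLs are hypothetical and chosen only for illustration. The sketch assumes that each distribution option declares a media type and that each application declares which media types it can open.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class Distribution:
    """One way a data resource is offered (hypothetical 'affordance' record)."""
    url: str
    media_type: str


@dataclass(frozen=True)
class Application:
    """A web-accessible tool and the media types it can load (hypothetical)."""
    name: str
    launch_url: str
    accepts: frozenset


def matching_apps(dist, apps):
    """Return the applications whose accepted media types include this distribution's."""
    return [app for app in apps if dist.media_type in app.accepts]


def workbench_link(app, dist):
    """Build a link that opens the application with the data URL as a query parameter."""
    return f"{app.launch_url}?{urlencode({'data': dist.url})}"


# Hypothetical registry of applications; names and URLs are placeholders.
APPS = [
    Application("notebook-workbench", "https://workbench.example.org/open",
                frozenset({"text/csv", "application/x-netcdf"})),
    Application("map-viewer", "https://maps.example.org/view",
                frozenset({"application/geo+json"})),
]

dist = Distribution("https://data.example.org/reef/temps.csv", "text/csv")
links = [workbench_link(app, dist) for app in matching_apps(dist, APPS)]
print(links)
```

In this sketch the matching happens at request time, so a newly registered application becomes available for existing catalog records without re-editing their metadata; links generated this way could equally be stored in the metadata, as in approach 1.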

The additional information is systematically included in metadata by the CINERGI metadata augmentation pipeline, with the metadata subsequently indexed and published in the Data Discovery Hub (DDH) catalog, which now provides options for 'Workbench' linkage from search results. Adoption of conventions for such additional metadata, and interfacing search results with one or several research workbenches, can enable EarthCube to streamline the workflow from data discovery to data utilization.

