Data Management Plan

Introduction

The R. J. Cook Agronomy Farm (CAF) is a collaborative research effort with the goal of improving dryland cropping systems in the Inland Pacific Northwest.  CAF is primarily funded by the USDA through the Long-Term Agroecological Research (LTAR) initiative but is also supported by Washington State University, University of Idaho, Oregon State University, and various organizations.  As a member of the USDA LTAR network, CAF follows data policies that are largely pursuant to USDA guidelines.

The CAF Data Policies and Guidelines document provides more technical details and implementations.

The CAF Data Inventory provides details on data types, formats, and sources.

Expected Data Types

The data managed by CAF are of various types.  Non-digital data include manually recorded meteorological data on standard paper forms, intermediary reports from field and lab work on physical paper notebooks or forms, and logistics data from farming operations on standard and non-standard paper records.  Digital data include field collection, lab analysis, and real-time environmental data (meteorological, hydrological, pedological). Many of the data are tabular, spatial, or time-series in nature, but some digital photographic data are collected, binary data are generated from some computer models, and many storage and intermediate data are relational or object-oriented in nature.

All public datasets have corresponding metadata files in ISO 19115 format.
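
One way to check this requirement is to confirm that every dataset file has a metadata file beside it. The following is a minimal sketch in Python, assuming a hypothetical convention in which the metadata for `name.csv` is stored as `name.xml` in the same folder. The function name, the extension list, and the naming convention are illustrative assumptions, not CAF's documented practice, and the sketch does not check that the XML is valid ISO 19115.

```python
from pathlib import Path

# Hypothetical set of extensions treated as dataset files (an assumption).
DATA_EXTENSIONS = {".csv", ".json", ".geojson", ".nc", ".tif"}


def find_datasets_missing_metadata(folder):
    """Return sorted names of dataset files that lack a sidecar .xml file.

    Assumes (hypothetically) that metadata for `name.csv` lives in `name.xml`.
    """
    folder = Path(folder)
    missing = []
    for path in folder.iterdir():
        if path.is_file() and path.suffix.lower() in DATA_EXTENSIONS:
            # The sidecar shares the dataset's base name, with an .xml suffix.
            if not path.with_suffix(".xml").exists():
                missing.append(path.name)
    return sorted(missing)
```

Calling `find_datasets_missing_metadata("some/folder")` returns an empty list when every dataset in that folder has a matching metadata file.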

Data Formats and Standards

Many of the digital data are in formats that are human-readable, machine-readable, or both, using open standards when possible.  Some legacy data are in less accessible formats, such as PDF, but are being converted to open standards or more accessible formats when possible.  Some modeling data are in binary format but are reduced to CSV, XLS, or netCDF. Written documents are in formats including Microsoft Word (.doc, .docx), plain text (.md, .txt, etc.), OneNote, and Evernote.  Outreach documents are in PowerPoint and HTML. Images are in JPEG, GIF, or TIFF format. Most computer code (C#, R, Python, Shell, Bash, PowerShell) is available on GitHub, but legacy code (mostly VBA) is not.  GIS data are in ESRI shapefile, GeoJSON, or GeoTIFF format. Databases include Microsoft Azure Cosmos DB, Microsoft Azure Blob Storage, and Microsoft Azure SQL Server.

Public and mature data have data schemas, including NoSQL data stored in JSON format (Cosmos DB).  Data dictionaries and taxonomies are being developed, with plans to integrate them tightly with the schemas.
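
To illustrate what a schema check on a JSON document can look like, here is a minimal sketch in Python using only the standard library. The schema, its field names, and the sample record are hypothetical and are not taken from CAF's actual schemas; a production system would more likely use a full JSON Schema validator.

```python
import json

# Hypothetical schema: field name -> expected Python type (an assumption,
# not CAF's actual schema).
MEASUREMENT_SCHEMA = {
    "id": str,
    "station": str,
    "timestamp": str,
    "value": float,
}


def validate_document(doc, schema):
    """Return a list of problems; an empty list means the document conforms."""
    problems = []
    for field, expected_type in schema.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    return problems


# A made-up record, parsed from JSON text as it might be read from storage.
record = json.loads(
    '{"id": "m-001", "station": "example", '
    '"timestamp": "2020-01-01T00:00:00Z", "value": 3.2}'
)
print(validate_document(record, MEASUREMENT_SCHEMA))  # prints []
```

The check only covers required fields and basic types; constraints such as value ranges or timestamp formats would need additional rules.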

Data Storage and Preservation of Access

All digital data, including working documents, are planned for long-term preservation.  At the start of a project, a cloud backup system is put in place, primarily Google Drive, although Dropbox, Microsoft Drive, and others are supported.  During the lifespan of the project, the data are continuously backed up to the cloud. Upon project completion, or at major milestones, the data are moved to the Washington State University long-term cloud storage (geo-redundant backups) and backed up on a local network-attached storage device within the unit.  Metadata and project notes are included in archived working documents. Computer code is saved on GitHub (github.com/caf-ltar).
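
When data are moved to long-term storage and copied to a local device, one way to confirm that a copy matches its source is to compare per-file checksums. The following Python sketch shows the idea; it is an illustration under stated assumptions, not a procedure described in this plan, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads each file fully into memory; fine for a sketch,
            # but large files would call for chunked reading.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_copies(source_root, copy_root):
    """Return relative paths that are missing or differ in the copy."""
    source = build_manifest(source_root)
    copy = build_manifest(copy_root)
    return sorted(p for p, d in source.items() if copy.get(p) != d)
```

An empty result from `compare_copies` means every file in the source folder is present in the copy with identical contents; extra files that exist only in the copy are not reported.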

Published data are either pushed to the LTAR Data Repository or hosted on the long-term data repository managed by CAF.  Some data are published to USDA ARS's ArcGIS Online account, Ag Data Commons, AmeriFlux, PhenoCam Network, or other trusted scientific repositories.  Regardless of where the data are hosted, the metadata are registered and maintained in the National Agricultural Library (NAL) GeoData database, which is harvested by NAL's Ag Data Commons and integrated into data.gov.

Data Sharing and Public Access

Refer to the official LTAR document on data sharing and public access (in draft).

Roles and Responsibilities

The CAF Data Manager is responsible for ensuring the implementation of the data management plan and for providing minimal oversight of collaborators to ensure that CAF data policies are followed.  In the absence of the Data Manager, the Director is responsible.