Data and Statistics Support Services

Research Data Management

NIH DMSP

Looking for information on the NIH Data Management and Sharing Plan requirements that took effect in 2023?
Consult our Guide with UVA-focused guidance on preparing your plan.

Electronic Lab Notebooks


As of June 2024, UVA has a university-wide license to LabArchives. LabArchives is a web-based Electronic Lab Notebook (ELN) that allows you to record data in any file format, collaborate within your lab, and retain a complete history of the data.

LabArchives is sponsored by UVA's Office of the Vice President for Research. To get started, visit the UVA LabArchives guide.

Introduction

Research Data Management

Conducting research involves working with data through a series of processes from start to finish, including naming files, preparing and cleaning your data, performing analyses, documenting your work, and more. Below are selected resources to help you improve your workflows through better data management practices.

File Naming and Organization

File Naming

File names should be:
- Human readable - you should understand the content of the file from its name alone
- Machine readable - avoid spaces (use _ or - instead), special characters, and accents
- Sortable by default ordering - use a leading zero (01, 02, etc.) for numbered files and the YYYYMMDD date format
Have a plan for version control, either by including version numbers in the file names or by using an automated system.
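If you generate many files from scripts, a small helper can apply these naming rules for you. The Python sketch below is one possible approach, not a UVA standard: the field order (date, project, description, run, version) and the "run"/"v" labels are a hypothetical convention, so adapt them to whatever you and your collaborators agree on.

```python
import re
import unicodedata
from datetime import date


def slug(text: str) -> str:
    """Make text machine readable: strip accents, lowercase, and join words with '-'."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def make_file_name(collected: date, project: str, description: str,
                   run: int, version: int, ext: str) -> str:
    """Build a name like 20240603_zebra-mussels_density-survey_run02_v01.csv.

    The field order and the run/version labels are a hypothetical convention.
    """
    return (f"{collected:%Y%m%d}_{slug(project)}_{slug(description)}"
            f"_run{run:02d}_v{version:02d}.{ext.lstrip('.')}")


names = [
    make_file_name(date(2024, 6, 3), "Zebra Mussels", "Density survey", 10, 1, "csv"),
    make_file_name(date(2024, 6, 3), "Zebra Mussels", "Density survey", 2, 1, "csv"),
]
print(sorted(names))  # zero-padded run numbers sort correctly: run02 before run10
```

Because the date comes first in YYYYMMDD form and numbers are zero-padded, an ordinary alphabetical sort in a file browser also puts the files in chronological and run order.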

File Organization

Always save raw data in a separate folder
Make sure folder names are descriptive and consistent
You can create a folder structure based on data type, processing stage, or any other system that makes sense to you and your collaborators
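As one illustration, the Python sketch below creates a folder skeleton that keeps raw data separate from processed data and outputs. The folder names are hypothetical examples organized by processing stage; any consistent, descriptive scheme works just as well.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: raw data sits apart from processed data and outputs
FOLDERS = [
    "data/raw",        # original, unmodified files (never edit these)
    "data/processed",  # cleaned or derived versions of the data
    "analysis",        # scripts and notebooks
    "results",         # figures, tables, and reports
    "docs",            # readme, data dictionary, protocols
]


def create_project(root: Path) -> list[Path]:
    """Create the folder skeleton under root and return the folder paths."""
    created = []
    for folder in FOLDERS:
        path = root / folder
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(path)
    return created


# Demo in a temporary directory; point root at your real project folder instead
with tempfile.TemporaryDirectory() as tmp:
    created = create_project(Path(tmp) / "zebra-mussel-survey")
    all_created = all(p.is_dir() for p in created)
    layout = [p.relative_to(tmp).as_posix() for p in created]

print(layout)
```

Creating the structure from a script means every new project in a lab starts with the same, predictable layout.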

File Formats

Be sure to save data in an open format for sharing and long-term preservation
Check the Library of Congress Recommended Formats list to see what is recommended for your file type

Resources:

File Naming and Organization Worksheet

Spreadsheets and Tidy Data

Following a few basic recommendations when working with research data in spreadsheets can save you time when it comes to analyzing your data. Consider these practices adapted from Data Organization in Spreadsheets, Karl W. Broman & Kara H. Woo.

Basic Spreadsheet Practices

Getting Started

Make backups
No calculations in the raw data files
Make it a rectangle and fill in each cell
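To check the "rectangle" and "fill in each cell" rules on a sheet exported as CSV, a short script can report rows with the wrong number of cells and any blank cells. This is a minimal Python sketch that assumes the first row holds the column headers.

```python
import csv
import io


def check_rectangle(csv_text: str) -> list[str]:
    """Return problems found: ragged rows or blank cells (with row/column)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    width = len(rows[0])  # the header row defines the expected width
    problems = []
    for i, row in enumerate(rows[1:], start=2):  # row numbers as a spreadsheet shows them
        if len(row) != width:
            problems.append(f"row {i}: expected {width} cells, found {len(row)}")
        for j, cell in enumerate(row, start=1):
            if cell.strip() == "":
                problems.append(f"row {i}, column {j}: empty cell")
    return problems


sample = "site,date,count\nA,2024-06-03,12\nB,2024-06-03\nC,,7\n"
for problem in check_rectangle(sample):
    print(problem)
```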

Inputting Data

Be consistent
Decide how to handle empty cells; for options, see DataONE - Data Entry and Manipulation
Use data validation to avoid errors. For Excel, see: Excel Data Validation Guide | ExcelJet
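Spreadsheet data validation restricts what can be typed into a cell. The same idea can be applied to data that has already been entered. The Python sketch below checks each value against a per-column rule; the rules themselves (allowed site codes, a YYYY-MM-DD date format, counts from 0 to 500, and "NA" as the missing-value code) are hypothetical examples.

```python
from datetime import datetime


def is_iso_date(value: str) -> bool:
    """True for dates written as YYYY-MM-DD, e.g. 2024-06-03."""
    if len(value) != 10:
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


# Hypothetical rules, mirroring spreadsheet data-validation settings
RULES = {
    "site": lambda v: v in {"A", "B", "C"},  # like a drop-down list of allowed values
    "date": is_iso_date,                     # one consistent date format
    # whole number from 0 to 500, or the missing-value code "NA"
    "count": lambda v: v == "NA" or (v.isdecimal() and int(v) <= 500),
}


def validate(records: list[dict[str, str]]) -> list[str]:
    """Return one message per value that breaks its column's rule."""
    errors = []
    for n, record in enumerate(records, start=1):
        for column, rule in RULES.items():
            value = record.get(column, "")
            if not rule(value):
                errors.append(f"record {n}: invalid {column} {value!r}")
    return errors


records = [
    {"site": "A", "date": "2024-06-03", "count": "12"},
    {"site": "a", "date": "06/03/2024", "count": "NA"},
    {"site": "B", "date": "2024-06-04", "count": "-3"},
]
print(validate(records))
```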

Tidy Data

Data Science for the Biomedical Sciences
- Every column is a variable
- Every row is an observation
- Every cell is a single value
Do not use font color or highlighting as data - instead, consider adding a column with a "flag" value
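A common tidy-data fix is reshaping a "wide" sheet, where a variable such as the week is hidden in the column headers, into a "long" layout with one observation per row. The Python sketch below shows this reshaping (the same operation pandas.melt or tidyr's pivot_longer performs) on a small, hypothetical weight table, and records missing values in a flag column rather than with cell color.

```python
# Hypothetical "wide" sheet: one column per week, so the week is hidden in the headers
wide = [
    {"mouse_id": "m01", "week1": "20.1", "week2": "21.4"},
    {"mouse_id": "m02", "week1": "19.8", "week2": ""},
]


def to_tidy(rows: list[dict[str, str]], id_column: str, value_name: str) -> list[dict]:
    """Reshape wide rows so each row is one observation: (id, week, value, flag)."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            tidy.append({
                id_column: row[id_column],
                "week": int(column.removeprefix("week")),  # variable moved out of the header
                value_name: value or "NA",                  # explicit missing-value code
                "flag": "missing" if value == "" else "ok",  # a column, not a cell color
            })
    return tidy


tidy = to_tidy(wide, "mouse_id", "weight_g")
for row in tidy:
    print(row)
```

In the result every column is a variable (mouse_id, week, weight_g, flag), every row is one measurement, and every cell holds a single value.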

Sharing

Save the data in plain text files
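CSV (comma-separated values) is a widely used plain-text format for tabular data. The Python sketch below writes a small table to CSV with the standard csv module; the file name, columns, and "NA" missing-value code are hypothetical, and the file is written to a temporary folder only for the demo.

```python
import csv
import tempfile
from pathlib import Path

rows = [
    {"site": "A", "date": "2024-06-03", "count": "12"},
    {"site": "B", "date": "2024-06-03", "count": "NA"},  # "NA" = documented missing-value code
]

# Temporary folder for the demo; in practice, write into your project's data folder
out_path = Path(tempfile.mkdtemp()) / "20240603_mussel-counts_v01.csv"

# newline="" lets the csv module manage line endings; UTF-8 keeps the file portable
with open(out_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["site", "date", "count"])
    writer.writeheader()
    writer.writerows(rows)

print(out_path.read_text(encoding="utf-8"))
```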

Additional Resources:

Data Science for the Biomedical Sciences - Spreadsheets
Data Carpentry Spreadsheet Lesson
DataONE Data Entry and Manipulation (creating files, missing values, data validation)
Data Organization in Spreadsheets, Karl W. Broman & Kara H. Woo

Data Documentation and Description

Documentation:

Describing your Project

CESSDA has a useful guide for creating project-level documentation:

For what purpose was the data created?
What does the dataset contain?
How was the data collected?
Who collected the data and when?
How was the data processed?
What possible manipulations were done to the data?
What were the quality assurance procedures?
How can the data be accessed?

Describing your Dataset(s)

Overview of readme files, data dictionaries, and codebooks, with examples (Iowa)

Readme File

A readme is typically a plain-text file that provides information about a data file to facilitate use and re-use of the data. Typical elements of a readme include the following (adapted from Guide to Writing "readme" Style Metadata). Using one of the templates below can help ensure that you create a useful readme file.

Readme Content: General Information

Dataset title
Creator name and contact information
Date(s) of data collection
Location of data collection
Keywords to describe data topic

Readme Content: Data and Files

Descriptive name for each file, with a description of the data it contains
Date the file was created
List of variables for each dataset, including full names and descriptions of each
Definitions of any codes or symbols, including those for missing data

Readme Content: Methods

Methods for data collection or generation
Methods used for data processing
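If you produce datasets regularly, you can keep readme content in a structured form and generate the plain-text file from it. The Python sketch below arranges the readme elements listed above into three sections; the section headings and all field values are hypothetical placeholders, and the templates below remain the better starting point for content.

```python
def build_readme(info: dict) -> str:
    """Assemble a plain-text readme from a dict of (hypothetical) fields."""
    lines = [
        "GENERAL INFORMATION",
        f"Title: {info['title']}",
        f"Creator: {info['creator']} <{info['contact']}>",
        f"Date(s) of data collection: {info['dates']}",
        f"Location of data collection: {info['location']}",
        f"Keywords: {', '.join(info['keywords'])}",
        "",
        "DATA AND FILES",
    ]
    for f in info["files"]:
        lines.append(f"{f['name']} (created {f['created']}): {f['description']}")
    lines.append("Variables:")
    lines += [f"  {name}: {desc}" for name, desc in info["variables"].items()]
    lines.append("Codes:")
    lines += [f"  {code}: {meaning}" for code, meaning in info["codes"].items()]
    lines += ["", "METHODS",
              f"Data collection: {info['collection_method']}",
              f"Data processing: {info['processing_method']}"]
    return "\n".join(lines) + "\n"


info = {
    "title": "Zebra mussel density survey",  # all values here are placeholders
    "creator": "Jane Researcher",
    "contact": "jr@example.edu",
    "dates": "2024-06-03 to 2024-06-07",
    "location": "Example Lake, Virginia",
    "keywords": ["zebra mussels", "invasive species", "survey"],
    "files": [{"name": "20240603_mussel-counts_v01.csv", "created": "2024-06-10",
               "description": "Mussel counts per transect"}],
    "variables": {"site": "Transect identifier", "count": "Number of mussels observed"},
    "codes": {"NA": "Not recorded"},
    "collection_method": "Divers counted mussels along fixed transects",
    "processing_method": "Counts entered in a spreadsheet and checked against field sheets",
}
readme = build_readme(info)
print(readme)
```

Saving the result as README.txt next to the data keeps the documentation in a plain-text, long-lived format.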

Additional Readme Resources

Sample Readme file from deposited dataset "Data and R code to support: Estimating densities of zebra mussels (Dreissena polymorpha) in early invasions using distance sampling"
LibraData (UVA's Research Data Repository) Readme template
Guide to Writing "readme" Style Metadata (including a template) (Cornell)

Data Dictionary

How to Make A Data Dictionary (OSF)
Describing Your Data: Data Dictionaries (Smithsonian)
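A script can draft the mechanical parts of a data dictionary (variable names, a guessed type, and a count of missing values) so you only need to write the descriptions. This Python sketch treats "" and "NA" as missing-value codes, an assumption you should adjust to your own conventions, and its type guess is a rough heuristic that you should review by hand.

```python
import csv
import io

MISSING = {"", "NA"}  # hypothetical missing-value codes


def parses(convert, value: str) -> bool:
    """True if convert(value) succeeds, e.g. parses(int, "12")."""
    try:
        convert(value)
    except ValueError:
        return False
    return True


def infer_type(values: list[str]) -> str:
    """Guess a simple type for a column, ignoring missing-value codes."""
    present = [v for v in values if v not in MISSING]
    if not present:
        return "unknown"
    if all(parses(int, v) for v in present):
        return "integer"
    if all(parses(float, v) for v in present):
        return "decimal"
    return "text"


def data_dictionary(csv_text: str, descriptions: dict[str, str]) -> list[dict]:
    """One entry per column: name, guessed type, missing count, description."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames:
        values = [row[name] for row in rows]
        entries.append({
            "variable": name,
            "type": infer_type(values),
            "missing": sum(v in MISSING for v in values),
            "description": descriptions.get(name, "TODO: describe this variable"),
        })
    return entries


sample = "site,count,length_mm\nA,12,18.5\nB,NA,21\nC,7,NA\n"
entries = data_dictionary(sample, {"count": "Number of mussels counted"})
for entry in entries:
    print(entry)
```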

Codebook

Simple codebook example (Kent State), part of an SPSS tutorial on Creating a Codebook

Metadata and Standards

Disciplinary Metadata (Digital Curation Centre) - links to information about metadata standards by discipline/field

List of Metadata Standards
List of Metadata Use Cases - examples to review, e.g. ICPSR (Inter-university Consortium for Political and Social Research), a large repository of professionally curated datasets whose metadata all conform to the DDI standard.

Choosing a Repository

Selecting a Data Repository

Considerations

An effective way to make your data accessible is to store it in a repository. Here, a data repository is a storage service for managing and storing digital content, where users can upload final datasets to make them accessible and discoverable.

Benefits of digital repositories include:

Raise the impact of your research by allowing you to make data accessible to other researchers and scholars
Keep your data safe and readable in the long-term
Meet funder or publisher requirements

NIH Data Management and Sharing Requirements

Get assistance with writing your plan for the new NIH Data Management and Sharing Policy from our Guide.

Journal Sharing Requirements

Science journals http://www.sciencemag.org/authors/science-journals-editorial-policies
SpringerNature https://www.springernature.com/gp/authors/research-data-policy
Nature journals https://www.nature.com/authors/policies/availability.html
Wiley https://authorservices.wiley.com/author-resources/Journal-Authors/open-access/data-sharing-citation/index.html
Sage https://us.sagepub.com/en-us/nam/journal/big-data-society#submission-guidelines

To make your data or supplements available, first make sure that they are appropriate for sharing (e.g., de-identified if needed) and properly organized and labeled. Uploading datasets or supplements is typically straightforward.

More general considerations when deciding where to deposit your data

Is the repository recommended by the publisher or funder? If you are submitting supplemental data with a journal article, check the journal's data policy for any repositories it specifies.
Is the repository recognized within the research field and/or a discipline-specific repository? Or, if none is available in your field, do you need a generalist repository?
Does the repository provide a Digital Object Identifier (DOI) or other means for your data to be cited?
Will the data be easy for other researchers to find? Does the repository use metadata or other methods to describe your data?

Discipline-Specific Repositories

First, check your funder or journal requirements for recommended or preferred repositories.

Repository Directories and Lists

NIH-supported Scientific Data Repositories provided by NIH
re3data is a searchable global registry of research data repositories
OpenDOAR global directory of Open Access repositories and their policies
Nature's Scientific Data and PLOS journals list discipline-specific repositories and cross-disciplinary repositories

Sample Discipline-Specific Repositories

Protein Data Bank - Worldwide repository of 3D structures of proteins, nucleic acids, and complex assemblies
ArrayExpress - One of the major international repositories for high-throughput functional genomics data from both microarray and high throughput sequencing studies.
Cardiovascular Research Grid - sharing cardiovascular data
Mouse Genome Database
Zebra Fish Model Organism

General and Cross-Disciplinary Repositories

UVA Data Repository

LibraData (UVA’s Dataverse data repository)

NIH-affiliated Repositories

NIH does not endorse any particular repository; instead, it encourages researchers to select the repository most appropriate for their data type and discipline. This list of NIH-supported repositories provides examples of suitable repositories.

General (Multidisciplinary) Data Repositories

The Generalist Repository Ecosystem Initiative (GREI) includes seven established generalist repositories that will work together to establish consistent metadata, develop use cases for data sharing, train and educate researchers on FAIR data and the importance of data sharing, and more:

Dataverse - note that UVA's repository, LibraData, is a Dataverse
Dryad - a well-established repository run by a nonprofit organization (note that deposits incur Data Publishing Charges)
Figshare - accepts scholarly output including figures, datasets, media, papers, posters, presentations and filesets
Mendeley Data - a free cloud-based service run by Elsevier
Open Science Framework - open to many types of output
Vivli - a clinical research data sharing platform
Zenodo - a free cloud-based service built on the data repository platform of CERN (the European Organization for Nuclear Research)

In addition to the above, NIH notes Synapse as an appropriate generalist repository:

Synapse - create a project and share your data to the public when ready

UVA Research Data Resources

University Policies:

UVA-Contracted Cloud Storage:

Additional UVA Storage Resources:

Backing Up Your Data:

Consider the rule of three:
- Here (lab computer, personal computer)
- Near (portable hard drive, flash drive)
- Far (cloud storage, remote backup)
CESSDA Guide to Backing Up Data
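A backup is only useful if the copy is complete and unchanged. One common way to confirm this (a practice added here, not part of the rule of three itself) is to compare checksums of the original and each copy. The Python sketch below uses SHA-256; temporary folders stand in for the "here", "near", and "far" locations, and the file names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(original: Path, copies: list[Path]) -> dict[str, bool]:
    """Map each backup path to True if it exists and matches the original."""
    reference = sha256(original)
    return {str(c): c.exists() and sha256(c) == reference for c in copies}


# Demo: temporary folders stand in for "here", "near", and "far" locations
root = Path(tempfile.mkdtemp())
original = root / "here" / "counts.csv"
original.parent.mkdir()
original.write_text("site,count\nA,12\n", encoding="utf-8")

near = root / "near" / "counts.csv"
near.parent.mkdir()
shutil.copy2(original, near)

far = root / "far" / "counts.csv"  # never copied: simulates a missed backup

result = verify_copies(original, [near, far])
print(result)
```

Running a check like this after each backup round catches copies that failed silently or were only partly written.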

Rigor and Reproducibility

See our guide on how to incorporate rigor and reproducibility practices into your biomedical research.

Other Research Data Platforms at UVA

In addition to UVA's license to LabArchives, other systems for managing or storing your research data include:

iTHRIV Research Data Commons - for health-related research projects, including those involving PHI. It allows research teams from different departments and schools to manage tiered access, controlled by project owners and data administrators, so that only the appropriate users can access the data.
UVA Research Computing's High Performance Computing environments or cloud solutions

Questions?

Need more information on managing your research data? We are here to help:

Health Sciences Library Research & Data Services - contact us at hsl-rdas@virginia.edu
