LibGuides UFV: Research Data Management: Organize

Organization & Documentation

Proper organization will help both you and others understand the data’s relevance, whether it’s one week, one month, one year or 10 years down the road. Organization and documentation also provide tools for evaluating the data you’ve collected, help your teams keep track of the data as the research progresses, and make the entire process more accurate and efficient. Proper organization also includes sound storage and backup plans in case of natural disasters or machine errors that may destroy or jeopardize your data.

Have you ever had problems opening up files and getting error messages regarding versions? Where possible, choose sustainable open file formats, so that your data can be accessed and used in the future.

Open file formats: Files that are open to the public without requiring paid patented proprietary software to read and use. Descriptions of the file format and software are also available to the public. For example, a CSV file is an open file format for spreadsheet applications.

Proprietary file formats: Files that are owned and controlled by companies, whose file specifications and software are patented and not openly available to the public (e.g. Microsoft Excel’s XLSX format). You may need to purchase or renew a user license to continue working with the data. Also, if you have a legacy copy of a program like Adobe Photoshop, you may need to upgrade to the latest version in order to collaborate with others on a file.

Below are a few examples of open sustainable file formats for different categories of data:

Text
XML (.xml)
Open Document Format (.odt)
Plain text (.txt)

Tabular
Character-delimited files such as Comma Separated Value (.csv) or Tab Delimited (.tab)
XML
Plain text (.txt)
JSON

Media
Uncompressed TIFF (.tif)
Motion JPEG 2000 (.mj2)
MPEG-4 (.mp4)
Free Lossless Audio Codec (.flac)

Geospatial
ESRI Shapefiles and supporting files (.shp, .shx, .dbf, .prj, .sbx, .sbn)
KML (.kml)
GML (.gml)
GeoTIFF (.tif, .tfw)
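As a rough illustration of the tabular formats listed above, the sketch below serializes the same records as CSV and as JSON using only the Python standard library. The sample records, field names, and the helper names `to_csv_text` and `to_json_text` are hypothetical and chosen only for this example.

```python
import csv
import io
import json

# Hypothetical sample records; the field names and values are illustrative only.
records = [
    {"site": "Chilliwack", "date": "20200303", "thrips_count": 12},
    {"site": "Abbotsford", "date": "20200304", "thrips_count": 7},
]


def to_csv_text(rows):
    """Serialize a list of dicts as comma-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json_text(rows):
    """Serialize the same rows as indented JSON."""
    return json.dumps(rows, indent=2)


print(to_csv_text(records))
print(to_json_text(records))
```

Either string can then be saved with `pathlib.Path(...).write_text(text, encoding="utf-8")`. Both files can be opened in any text editor, which is the practical benefit of an open format.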

At the beginning of your research project, set up a file-naming convention and make sure everyone on the team adheres to the same rules. Aim for short and descriptive filenames and simple file folder hierarchies. This will help you search and locate your data with less effort.

Here are a few tips and recommendations to consider:

Use international date format

Example:
YYYYMMDD (20220506)

This allows computer programs to sort dates chronologically and avoid confusion between month and day.
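The sorting behaviour described above can be checked with a short Python sketch. The file names are hypothetical, and `date_stamp` is an illustrative helper rather than part of any standard tool.

```python
from datetime import date


def date_stamp(d):
    """Format a date as YYYYMMDD for use in a file name."""
    return d.strftime("%Y%m%d")


# Hypothetical file names for the same three dates, written two ways.
day_first = ["FVPPM_06052022", "FVPPM_01122021", "FVPPM_15012022"]  # DDMMYYYY
year_first = ["FVPPM_20220506", "FVPPM_20211201", "FVPPM_20220115"]  # YYYYMMDD

# Plain alphabetical sorting, as a file browser would do.
print(sorted(day_first))
# ['FVPPM_01122021', 'FVPPM_06052022', 'FVPPM_15012022']  (May 2022 lands before January 2022)
print(sorted(year_first))
# ['FVPPM_20211201', 'FVPPM_20220115', 'FVPPM_20220506']  (chronological)

print(date_stamp(date(2022, 5, 6)))  # 20220506
```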

Shorten your project name to minimize side scrolling

Avoid: Fraser Valley Pest And Pollinator Management
Try: FVPPM

Describe the file content, with relevant chronological and/or version information

Avoid: FVPPM_20200303
Try: FVPPM_Thrips_Potato_20200303
OR: FVPPMThripsPotato20200303

Use an underscore (_) as the delimiter

Avoid using spaces or special characters (e.g. !@#$%^&*(){}[]<>+) in filenames, as different computer programs handle these differently.

Avoid: FVPPM Thrips Potato 20200303
Try: FVPPM_Thrips_Potato_20200303
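One way to apply this rule consistently is to build file names with a small helper instead of typing them by hand. The sketch below is a minimal example under stated assumptions: `make_filename` is a hypothetical function, and keeping only ASCII letters, digits, and hyphens is one possible policy rather than a requirement of this guide.

```python
import re


def make_filename(*parts, delimiter="_"):
    """Join name parts with a delimiter, replacing spaces and special characters.

    Assumption: only ASCII letters, digits and hyphens are kept. Every run of
    other characters (spaces, punctuation, accented letters) becomes one delimiter.
    """
    cleaned = []
    for part in parts:
        text = re.sub(r"[^A-Za-z0-9-]+", delimiter, str(part)).strip(delimiter)
        if text:
            cleaned.append(text)
    return delimiter.join(cleaned)


print(make_filename("FVPPM Thrips Potato 20200303"))
# FVPPM_Thrips_Potato_20200303
print(make_filename("FVPPM", "Thrips!", "Potato (field 2)", "20200303"))
# FVPPM_Thrips_Potato_field_2_20200303
```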

Use date and time, or sequential numbers, to keep track of versions

Avoid: FVPPM_Thrips_Potato_Initial_Investigation
Try: FVPPM_Thrips_Potato_20200303

Keep file hierarchies simple so you can find documents easily

Avoid: G://UFV/Agriculture/BritishColumbia/ChilliwackCampus/FVPPM_Thrips_Potato_20200303

Try: G://Agri/FVPPM_Thrips_Potato_20200303

This page is adapted from UBC data management planning documentation:

UBC data management planning documentation

Version Control

Version control refers to the management of different versions of your information across different formats, including text documents, online web content, and computer software. This process allows you to track the progression of your study over time. The UK Data Service has a brief but comprehensive guide to versioning:

UK Data Service

Here are some items to consider regarding version control, as outlined in the page linked above:

Decide how many versions of a file to keep, which versions to keep, for how long and how to organise versions.
Identify milestone versions to keep, e.g. major versions rather than minor versions (keep version 02-00 but not 02-01).
Uniquely identify different versions of files using a systematic naming convention, such as using version numbers or dates.
Record changes made to a file when a new version is created.

Image: three grey document icons labelled Version 1, 1.1 and 1.2 on a black background.

Here are various ways to implement version control, again taken directly from the above link:

The date recorded in the file name or within the file, for example, HealthTest-2008-04-06.
Version numbering in the file name, for example, HealthTest-00-02 or HealthTest_v2.
A file history, version control table or notes included within a file, where versions, dates, authors and details of changes to the file are recorded.
Version control facilities within the software used.
Using versioning software, e.g. Subversion.
Using file-sharing services, such as Dropbox or Google Docs.
Controlling rights to file-editing.
Manual merging of entries or edits by multiple users.
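For version numbering in the file name, a short script can work out the next name so that team members do not number files by hand. This sketch assumes the two-part suffix in names such as HealthTest-00-02 is read as major-minor, in line with the "keep 02-00 but not 02-01" example above. The functions `bump_version` and `is_milestone` are hypothetical helpers, not part of the UK Data Service guidance.

```python
import re

# Assumption: names end in "-MM-mm" (two-digit major and minor numbers).
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)-(?P<major>\d{2})-(?P<minor>\d{2})$")


def bump_version(name, major=False):
    """Return the next version of a name such as 'HealthTest-00-02'."""
    match = VERSION_PATTERN.match(name)
    if match is None:
        raise ValueError(f"no MM-mm version suffix in {name!r}")
    major_no = int(match["major"])
    minor_no = int(match["minor"])
    if major:
        # A new major version restarts the minor counter.
        major_no, minor_no = major_no + 1, 0
    else:
        minor_no += 1
    return f"{match['stem']}-{major_no:02d}-{minor_no:02d}"


def is_milestone(name):
    """Treat versions whose minor number is 00 as milestones worth keeping."""
    match = VERSION_PATTERN.match(name)
    return match is not None and int(match["minor"]) == 0


print(bump_version("HealthTest-00-02"))              # HealthTest-00-03
print(bump_version("HealthTest-00-02", major=True))  # HealthTest-01-00
print(is_milestone("HealthTest-02-00"))              # True
print(is_milestone("HealthTest-02-01"))              # False
```

Note that a date-stamped name such as HealthTest-2008-04-06 also ends in two two-digit groups and would be misread by this pattern, so a project should pick one scheme (dates or version numbers) and apply it throughout.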

For visual examples of version control, please visit the UK Data Service’s website on versioning:

UK Data Service

Metadata describes and documents your research data and allows your work to be discovered, accessed and reused by the scholarly community. Proper documentation allows your research to be found through data repositories, library catalogues, archives and scholarly citations. It also helps the people who find your data to understand its content.

Below are some of the metadata elements that are included in research documentation:

Creator(s) or Principal investigator(s)
Description (a short abstract)
Funding agencies
Date/versions
Subject
Format
Unique persistent identifier (e.g. doi)
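The elements above can be kept in a small machine-readable record alongside the data. The sketch below is only an illustration: the key names, the sample values, and the `missing_elements` helper are hypothetical and do not follow DDI or any other formal metadata standard.

```python
import json

# Hypothetical key names mirroring the elements listed above.
REQUIRED_ELEMENTS = [
    "creators",
    "description",
    "funding_agencies",
    "date",
    "version",
    "subject",
    "format",
    "identifier",
]

# Placeholder values for illustration only.
record = {
    "creators": ["A. Researcher"],
    "description": "Thrips counts on potato plots (example abstract).",
    "funding_agencies": ["Example Funding Agency"],
    "date": "20200303",
    "version": "1.0",
    "subject": ["pest management"],
    "format": "CSV",
    "identifier": "",  # no persistent identifier (e.g. DOI) assigned yet
}


def missing_elements(metadata):
    """Return the required elements that are absent or empty."""
    return [key for key in REQUIRED_ELEMENTS if not metadata.get(key)]


print(missing_elements(record))  # ['identifier']
print(json.dumps(record, indent=2))
```

A check like this can be run before depositing data, so that gaps are caught while the research team still remembers the details.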

Aside from citations and file-level metadata, various disciplines and data repositories have developed, and may require, specific metadata standards. For example, DDI (Data Documentation Initiative) describes data in the fields of humanities and social sciences:

Data Documentation Initiative (DDI)

If you are looking for a schema specific to your discipline, please consult the UK Digital Curation Centre’s comprehensive list of discipline-specific metadata standards:

Digital Curation Centre - Search by Discipline

A README file is a plain text (.txt) file that provides information about the project. This can include the locations of different datasets, the software used to analyze data, research team members and their roles in the project, and definitions of technical terms. A README file also preserves the context of information and decisions long after a research project has ended.

In general, your README files should be written in a way that is readily understood by peers in your discipline. It is also preferable if your README file is machine readable. Below are a couple of links from Cornell University and the University of British Columbia on creating README files.

Cornell University - Guide to writing "readme" style metadata
UBC - How do I create a README
Here is UBC's guide on writing a README file.

README Template from UBC
This is a README Template created by the University of British Columbia Library.
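If a project has many datasets, a plain text README can also be generated from a script so that its layout stays consistent. The following is a minimal sketch: `render_readme`, the section headings, and the sample entries are hypothetical, and the layout is not the UBC template linked above.

```python
def render_readme(title, sections):
    """Build plain-text README content from a title and a dict of sections.

    `sections` maps a heading to a list of one-line entries.
    """
    lines = [title, "=" * len(title), ""]
    for heading, entries in sections.items():
        lines.append(f"{heading}:")
        for entry in entries:
            lines.append(f"  - {entry}")
        lines.append("")
    return "\n".join(lines)


# Hypothetical project details, for illustration only.
readme_text = render_readme(
    "FVPPM",
    {
        "Datasets": ["data/FVPPM_Thrips_Potato_20200303.csv"],
        "Software": ["Python 3 (analysis scripts)"],
        "Team": ["A. Researcher (principal investigator)"],
        "Definitions": ["FVPPM: Fraser Valley Pest And Pollinator Management"],
    },
)
print(readme_text)
```

The returned string can be saved as README.txt next to the data. The regular "Heading:" and "  - entry" pattern keeps the file easy for both people and simple scripts to read.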

Storage and security are the foundations of good research data management; without them, your research efforts could be lost. Data stored on desktops, laptops and/or personal devices should be transferred to secure storage that is regularly backed up.

Follow the 3-2-1 Rule:

Save three copies of your data, on two different types of storage media, and keep one copy off-site.

Image: three document icons; three storage media types (external hard drive, cloud storage, UFV server); and an outline of the UFV building with a document.
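Part of the 3-2-1 rule can be scripted. The sketch below copies a file to several destination folders and confirms each copy with a SHA-256 checksum. It is an illustration under stated assumptions: `back_up` and `sha256_of` are hypothetical helpers, the folder names stand in for real storage locations, and the script cannot tell whether the destinations really are different media or off-site.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destination_dirs):
    """Copy `source` into each destination folder and verify every copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for directory in destination_dirs:
        target_dir = Path(directory)
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise IOError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies


if __name__ == "__main__":
    # Demonstration with temporary folders standing in for real storage.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "FVPPM_Thrips_Potato_20200303.csv"
        original.write_text("site,thrips_count\nA,12\n", encoding="utf-8")
        copies = back_up(
            original, [root / "external_drive", root / "synced_cloud_folder"]
        )
        print(f"1 original + {len(copies)} verified copies")
```

In practice, the destination folders would be a mounted external drive and a folder synchronized to cloud or institutional storage, giving three copies in total once the original is counted.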