Proper organization will help both you and others understand the data’s relevance, whether it’s one week, one month, one year or 10 years down the road. Organization and documentation also provide tools for evaluating the data you’ve collected, help your teams keep track of the data as the research progresses, and make the entire process more accurate and efficient. Proper organization also includes sound storage and backup plans in case of natural disasters or machine errors that may destroy or jeopardize your data.
Have you ever had problems opening up files and getting error messages regarding versions? Where possible, choose sustainable open file formats, so that your data can be accessed and used in the future.
Open file formats: Files that anyone can read and use without paid, patented or proprietary software. Descriptions of the file format and software are also available to the public. For example, CSV is an open file format that spreadsheet applications can read.
Proprietary file formats: Files that are owned and controlled by companies, where the descriptions of the file format and software are patented and not openly available to the public (e.g. Microsoft Excel’s XLSX format). You may need to purchase or renew a user license to continue working with the data. Also, if you have a legacy copy of a program like Adobe Photoshop, you may need to upgrade to the latest version in order to collaborate with others on a file.
Below are a few examples of open sustainable file formats for different categories of data:
Text
XML (.xml)
Open Document Format (.odt)
Plain text (.txt)
Tabular
Character-delimited files such as Comma-Separated Values (.csv) or Tab Delimited (.tab)
XML
Plain text (.txt)
JSON
Media
Uncompressed TIFF (.tif)
Motion JPEG 2000 (.mj2)
MPEG-4 (.mp4)
Free Lossless Audio Codec (.flac)
Geospatial
ESRI Shapefiles and supporting files (.shp, .shx, .dbf, .prj, .sbx, .sbn)
KML (.kml)
GML (.gml)
GeoTIFF (.tif, .tfw)
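As a rough illustration of the tabular formats listed above, the following Python sketch writes the same records as CSV text and as JSON text using only the standard library. The field names and sample values are invented for this example and do not come from a real dataset.

```python
import csv
import io
import json

# Hypothetical sample records; field names and values are made up for illustration.
records = [
    {"site": "A", "date": "20200303", "thrips_count": 12},
    {"site": "B", "date": "20200303", "thrips_count": 7},
]

def to_csv(rows):
    """Return the rows as comma-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def to_json(rows):
    """Return the rows as indented JSON text."""
    return json.dumps(rows, indent=2)

csv_text = to_csv(records)
json_text = to_json(records)
```

Both outputs are plain text, so they can be opened in any text editor without a licensed program.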
At the beginning of your research project, set up a file-naming convention and make sure everyone on the team adheres to the same rules. Aim for short and descriptive filenames and simple file folder hierarchies. This will help you search and locate your data with less effort.
Here are a few tips and recommendations to consider:
Use international date format
Example:
YYYYMMDD (20220506)
This allows computer programs to sort dates chronologically and avoid confusion between month and day.
Shorten your project name to minimize side scrolling
Avoid: Fraser Valley Pest And Pollinator Management
Try: FVPPM
Describe details about the file content with relevant chronological and/or version information
Avoid: FVPPM_20200303
Try: FVPPM_Thrips_Potato_20200303
OR: FVPPMThripsPotato20200303
Use underscore (_) as delimiter.
Avoid using spaces or special characters (e.g. !@#$%^&*(){}[]<>+) in filenames, as different computer programs handle these differently.
Avoid: FVPPM Thrips Potato 20200303
Try: FVPPM_Thrips_Potato_20200303
Use date and time, or sequential numbers, to keep track of versions
Avoid: FVPPM_Thrips_Potato_Initial_Investigation
Try: FVPPM_Thrips_Potato_20200303
Keep file hierarchies simple so you can find documents easily
Avoid: G://UFV/Agriculture/BritishColumbia/ChilliwackCampus/FVPPM_Thrips_Potato_20200303
Try: G://Agri/FVPPM_Thrips_Potato_20200303
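The naming tips above can be sketched as a small Python helper that builds a filename from a project code, descriptors and a date, and checks whether an existing name follows the convention. The function names and the exact pattern are assumptions made for this example. The check only confirms that the name ends in eight digits, not that those digits form a real calendar date.

```python
import re
from datetime import date

# Names like FVPPM_Thrips_Potato_20200303: letter/digit parts joined by
# underscores, ending in an 8-digit YYYYMMDD block.
FILENAME_PATTERN = re.compile(r"[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*_\d{8}")

def build_filename(project, descriptors, when):
    """Join the project code, descriptors and a YYYYMMDD date with underscores."""
    parts = [project, *descriptors, when.strftime("%Y%m%d")]
    return "_".join(parts)

def is_valid_filename(name):
    """Return True if the whole name matches the convention (no spaces or symbols)."""
    return FILENAME_PATTERN.fullmatch(name) is not None

name = build_filename("FVPPM", ["Thrips", "Potato"], date(2020, 3, 3))

# Because the date is written year first, plain text sorting is also chronological.
ordered = sorted([
    "FVPPM_Thrips_Potato_20210115",
    "FVPPM_Thrips_Potato_20200303",
])
```

Here `name` is `FVPPM_Thrips_Potato_20200303`, and a name containing spaces, such as `FVPPM Thrips Potato 20200303`, fails the check.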
This page is adapted from UBC data management planning documentation:
Version Control
Version control refers to the management of different versions of your information across different formats, including text documents, online web content, and computer software. This process allows you to track the progression of your study over time. The UK Data Service has a brief but comprehensive guide to versioning. Their link is here:
Here are some items to consider regarding version control, as outlined in the page linked above:
Here are various ways to implement version control, again taken directly from the above link:
For visual examples of version control, please visit the UK Data Service’s website on versioning:
Metadata describes and documents your research data and allows your work to be discovered, accessed and reused by the scholarly community. Proper documentation allows your research to be found through data repositories, library catalogues, archives and scholarly citations. It also allows finders to locate and understand the content of your research.
Below are some of the metadata elements that are included in research documentation:
Aside from citations and file-level metadata, various disciplines and data repositories have developed, and may require, specific metadata standards. For example, DDI (Data Documentation Initiative) is used to describe data in the humanities and social sciences:
If you are looking for a schema specific to your discipline, please consult the UK Digital Curation Centre’s comprehensive list of discipline-specific metadata standards:
A README file is a plain text (.txt) file that provides information about the project. This can include locations of different datasets, software used to analyze data, research team members and their roles in the project, and definitions of technical terms. A README file also preserves the context for information and decisions long after a research project has ended.
In general, your README files should be written in a way that is readily understood by peers in your discipline. It is also preferable if your README file is machine readable. Below are a couple of links from Cornell University and the University of British Columbia on creating README files.
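One possible way to keep README files consistent across a project is to generate them from a fixed set of sections. The Python sketch below is only an illustration: the section labels mirror the items mentioned above, while the function name, layout and sample entries are assumptions rather than a recommended standard.

```python
def render_readme(title, sections):
    """Render plain text: an underlined title, then one labelled block per section."""
    lines = [title, "=" * len(title), ""]
    for label, items in sections.items():
        lines.append(label.upper())
        for item in items:
            lines.append(f"  - {item}")
        lines.append("")  # blank line between sections
    return "\n".join(lines)

# Hypothetical sample content, invented for this example.
readme_text = render_readme(
    "FVPPM",
    {
        "Datasets": ["FVPPM_Thrips_Potato_20200303.csv: field counts"],
        "Software": ["Python 3 (standard library only)"],
        "Team": ["Researcher A: data collection", "Researcher B: analysis"],
        "Definitions": ["FVPPM: Fraser Valley Pest And Pollinator Management"],
    },
)
```

The resulting text can be saved as README.txt alongside the data. Keeping the same section labels in every project makes the file easier for both people and scripts to read.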
Storage and security are the foundations of good research data management; without them, your research efforts may be lost. Data stored on desktops, laptops and/or personal devices should be transferred to secure storage that is regularly backed up.
Follow the 3-2-1 Rule:
Save three copies of your data, on two different types of storage media, and keep one copy off-site.
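As a minimal sketch of the copying step, the Python example below copies a file into several backup folders and confirms each copy with a SHA-256 checksum. The function names are assumptions made for this example, and the folder paths you pass in are placeholders. A script like this cannot guarantee that the destinations sit on different storage media or at an off-site location; that still has to be arranged separately.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def back_up(source, destinations):
    """Copy source into each destination folder and verify every copy by checksum."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copies the content and file timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies
```

Calling `back_up` with the original file and two destination folders, for example an external drive and a synced off-site folder, gives three copies in total: the original plus two verified backups.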
© University of the Fraser Valley, 33844 King Road, Abbotsford, B.C., Canada V2S 7M8