Research Data Management

This guide provides information about Research Data Management, data management plans, and training resources.

Context

Researchers and their teams need to be aware of the policies and processes with which their research data must comply. Where sensitive data cannot be made public for ethical, policy, or legal reasons, research teams should consider whether de-identifying the data, i.e. removing direct identifiers, is possible and would allow for safe sharing.

Direct identifiers are those that place study participants at immediate risk of being re-identified. The following list is based on various sources, including guidance from major international funding agencies, the US Health Insurance Portability and Accountability Act (HIPAA), and the British Medical Journal.

Direct identifiers include:

  • Names or initials, as well as names of relatives or household members
  • Addresses, and small area geographic identifiers such as postal codes / zip codes
  • Telephone numbers
  • Electronic identifiers such as web addresses, email addresses, social media handles, or IP addresses of individual computers
  • Unique identifying numbers such as hospital IDs, Social Insurance Numbers, clinical trial record numbers, account numbers, certificate or license numbers
  • Exact dates relating to individually-linked events such as birth or marriage, date of hospital admission or discharge, or date of a medical procedure
  • Multimedia data: unaltered photographs, audio, or videos of individuals
  • Biometric identifiers including finger or voice prints, and iris or retinal images
  • Human genomic data, unless risk was explained and consent to share data or consent for secondary use of data was received from study participants
  • Age information for individuals over 89 years old

For research projects involving human participants and human biological materials, these decisions must align with UFV's Human Research Ethics requirements.
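
As a rough illustration of how the direct identifiers listed above might be handled in practice, the Python sketch below drops identifying columns, groups ages over 89 into a single category, and reduces exact dates to the year. The column names and the sample record are hypothetical, and keeping only the year of a date is one possible choice rather than a rule stated in this guide; the appropriate treatment depends on the ethics requirements that apply to the project.

```python
from datetime import date

# Hypothetical column names; adjust these to match your own dataset.
DIRECT_IDENTIFIER_COLUMNS = {
    "name", "address", "postal_code", "phone", "email", "hospital_id",
}
TOP_CODED_AGE = "90+"  # single category for ages over 89


def strip_direct_identifiers(record: dict) -> dict:
    """Return a copy of one participant record with direct identifiers handled."""
    cleaned = {}
    for column, value in record.items():
        if column in DIRECT_IDENTIFIER_COLUMNS:
            continue  # drop the identifying column entirely
        if column == "age" and value is not None and value > 89:
            cleaned[column] = TOP_CODED_AGE  # group ages over 89 together
        elif isinstance(value, date):
            cleaned[column] = value.year  # keep only the year of an exact date
        else:
            cleaned[column] = value
    return cleaned


# Made-up sample record, for illustration only.
participant = {
    "name": "Jane Example",
    "email": "jane@example.org",
    "age": 93,
    "admission_date": date(2021, 3, 14),
    "diagnosis_code": "J45",
}
cleaned_participant = strip_direct_identifiers(participant)
print(cleaned_participant)
```

Removing direct identifiers in this way does not by itself address indirect identifiers, which may still allow re-identification when combined.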

Data Protection Terms

Method: Anonymization
Description: Direct and indirect identifiers have been removed or manipulated, together with mathematical and technical guarantees, to prevent re-identification.
Example: Meaningless data (noise), calibrated to the dataset, is added to hide whether or not an individual is present.

Method: De-identification
Description: Direct and known indirect identifiers have been removed or manipulated to break the linkage to real-world identities.
Example: Data are suppressed, generalized, perturbed, or swapped; e.g., a GPA of 3.2 is generalized to the range 3.0-3.5, or gender: female is swapped to gender: male.

Method: Pseudonymization
Description: Direct identifiers have been eliminated or transformed, but indirect identifiers remain intact.
Example: Unique, artificial pseudonyms replace direct identifiers; e.g., John Doe becomes 5L7T LX619Z (a unique sequence not used anywhere else).
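
The three methods above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a vetted implementation: the function names are hypothetical, the GPA bin width of 0.5 is chosen only to match the example above, and reading the anonymization example as Laplace noise added to a count (the basic mechanism of differential privacy) is an interpretation that the guide itself does not name. The standard `random` module is used for the noise only to keep the sketch self-contained; it is not suitable for real privacy guarantees.

```python
import random
import secrets


def generalize_gpa(gpa: float, width: float = 0.5) -> str:
    """De-identification by generalization: replace an exact GPA with its range."""
    lower = int(gpa / width) * width
    return f"{lower:.1f}-{lower + width:.1f}"


def pseudonymize(names: list[str]) -> dict[str, str]:
    """Pseudonymization: map each distinct name to a unique random code."""
    mapping: dict[str, str] = {}
    used: set[str] = set()
    for name in names:
        if name in mapping:
            continue  # the same person always keeps the same pseudonym
        code = secrets.token_hex(6).upper()
        while code in used:  # regenerate on the unlikely chance of a collision
            code = secrets.token_hex(6).upper()
        used.add(code)
        mapping[name] = code
    return mapping


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Add Laplace noise with scale 1/epsilon to a count (sensitivity 1).

    The difference of two exponential draws with rate epsilon follows a
    Laplace distribution centred on zero with scale 1/epsilon.
    """
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


print(generalize_gpa(3.2))  # 3.0-3.5
print(pseudonymize(["John Doe", "Jane Roe", "John Doe"]))
print(noisy_count(100, epsilon=1.0, rng=random.Random()))
```

The pseudonym mapping must be stored securely and separately from the shared data, since anyone holding it can reverse the pseudonymization.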

Open Source Tools

ARX
Website: https://arx.deidentifier.org/
Purpose:
  • Anonymization
  • De-identification
System requirements:
  • Windows, macOS or Linux
Notable features:
  • Supports popular models for protecting data, including k-anonymity and its variants ℓ-diversity, t-closeness, and β-likeness
  • Allows end users to categorize, top- and bottom-code, generalize, and transform data in more complex ways
  • Extracts data from CSV, Excel, and DBMS sources

Amnesia
Website: https://amnesia.openaire.eu/
Purpose:
  • Anonymization
System requirements:
  • Windows and Linux
  • Online version
Notable features:
  • General Data Protection Regulation (GDPR) compliant
  • Supports k-anonymity and k^m-anonymity
  • The anonymized data can be saved locally or directly to Zenodo
  • This software may work best for clinical data, or data which are not survey data

Anonimatron
Website: https://realrolfje.github.io/anonimatron/
Purpose:
  • Anonymization
System requirements:
  • Windows, macOS or Linux
Notable features:
  • General Data Protection Regulation (GDPR) compliant
  • Supports popular models for protecting data such as k-anonymity
  • Can generate fake emails, names, or IDs
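
Several of the tools above refer to k-anonymity: a dataset is k-anonymous when every record shares its combination of indirect (quasi-) identifiers with at least k-1 other records. The sketch below checks that property in plain Python. It does not use the API of ARX, Amnesia, or Anonimatron; the function names, column names, and sample records are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter


def smallest_group_size(records: list[dict], quasi_identifiers: list[str]) -> int:
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers) for record in records
    )
    # An empty dataset has no groups; report 0 in that case.
    return min(groups.values(), default=0)


def is_k_anonymous(records: list[dict], quasi_identifiers: list[str], k: int) -> bool:
    """True if every combination of quasi-identifier values occurs at least k times."""
    return smallest_group_size(records, quasi_identifiers) >= k


# Made-up records, already generalized to age ranges and postal code prefixes.
records = [
    {"age_range": "20-29", "postal_prefix": "V2S", "diagnosis": "asthma"},
    {"age_range": "20-29", "postal_prefix": "V2S", "diagnosis": "diabetes"},
    {"age_range": "30-39", "postal_prefix": "V2S", "diagnosis": "asthma"},
    {"age_range": "30-39", "postal_prefix": "V2S", "diagnosis": "asthma"},
    {"age_range": "30-39", "postal_prefix": "V2T", "diagnosis": "flu"},
]
quasi = ["age_range", "postal_prefix"]

print(is_k_anonymous(records, quasi, k=2))       # False: one record is unique
print(is_k_anonymous(records[:4], quasi, k=2))   # True once that record is suppressed
```

In this example the last record is the only one with the prefix V2T in its age range, so it would need to be suppressed or generalized further before the data reach 2-anonymity.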

Acknowledgement

Adapted from: University of Victoria. (2022). Sensitive Data. https://libguides.uvic.ca/researchdata/planning/sensitivedata

Thank you to Monique Grenier for allowing us to use content from the Tools for Sensitive Data page.

The University of the Fraser Valley is situated on the traditional territory of the Stó:lō peoples. The Stó:lō have an intrinsic relationship with what they refer to as S’olh Temexw (Our Sacred Land), therefore we express our gratitude and respect for the honour of living and working in this territory.

© University of the Fraser Valley, 33844 King Road, Abbotsford, B.C., Canada V2S 7M8