Abstract
Data interoperability is crucial for effectively combining data for scientific inquiry. To facilitate interoperability, data standards such as a common definition of variables are often developed. The Open Data Commons for Spinal Cord Injury (odc-sci.org) has established an initial set of community-based data elements (CoDEs)—a minimal set of variables for sharing—to promote data interoperability in SCI research, aligning with FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. We sought to understand the use of CoDEs by the SCI community to inform current standards adherence and future standards development. We systematically analyzed 39 public datasets in relation to 17 required CoDEs and found variations between reported data and the structure specified by the CoDEs. Overall, we found that the enforcement of data standards improved reporting rates of CoDEs variables. Notably, different variables were found to require different levels of curation to ensure semantic equivalence among datasets. We also uncovered specific reporting habits of researchers such as formatting and naming patterns. A need for different data standards based on the nature of the study (e.g., human study, derivative study) was realized alongside a detailed list of issues that should be addressed when implementing such standards. Among the various approaches to developing data standards, ODC-SCI adopted a semi-formal approach by creating standards that are easy to adopt by the user. Our data-driven evaluation of actual reporting behavior shows that this flexibility can lead to subsequent problems in harmonization. This study serves as a baseline analysis of reporting behaviors for shaping and facilitating data standards.
| Original language | English |
|---|---|
| Article number | 115100 |
| Journal | Experimental Neurology |
| Volume | 385 |
| DOI | |
| Status | Published - Mar 2025 |
Bibliographical note
Publisher Copyright: © 2024 The Authors
Funding
The realization of the full value of biomedical data, which depends on individual researchers actively sharing their data, holds major implications for biomedical science, including increased transparency and credibility, meta-analysis based on individual subject data, informed experimental design (e.g., sample size estimation from shared data), and the integration of big data assets for knowledge discovery and hypothesis generation (Bandrowski and Martone, 2016; Begley and Ioannidis, 2015; Chan et al., 2014; Collins and Tabak, 2014; Ferguson et al., 2014; Levesque, 2017; Nielson et al., 2015). The belief that data sharing will benefit scientific research has gained prevalence in recent years, leading many regulatory organizations to require data management best practices and mandate the sharing of research data. In the US, the NIH (National Institutes of Health) recently issued the Data Management and Sharing (DMS) policy (effective January 25, 2023), under which researchers applying for funding must develop a DMS plan, including data sharing strategies (National Institutes of Health, 2023). As data sharing efforts increase, new challenges have emerged, such as the need to establish data standards across members of a research community to facilitate the sharing and integration of data.
| Funders |
|---|
| National Institutes of Health (NIH) |
| Division of Mathematical Sciences |
ASJC Scopus subject areas
- Neurology
- Developmental Neuroscience