Social science data repositories in data deluge: A case study of ICPSR's workflow and practices

Wei Jeng, Daqing He, Yu Chi

Research output: Contribution to journal, Article, peer-review

8 Scopus citations

Abstract

Purpose - Owing to the recent surge of interest in the age of the data deluge, the importance of researching data infrastructures is increasing. The open archival information system (OAIS) model has been widely adopted as a framework for creating and maintaining digital repositories. Considering that OAIS is a reference model that requires customization for actual practice, this paper aims to examine how the current practices in a data repository map to the OAIS environment and functional components.

Design/methodology/approach - The authors conducted two focus-group sessions and one individual interview with eight employees at the world's largest social science data repository, the Inter-university Consortium for Political and Social Research (ICPSR). By examining their current actions (activities regarding their work responsibilities) and IT practices, they studied the barriers and challenges of archiving and curating qualitative data at ICPSR.

Findings - The authors observed that the OAIS model is robust and reliable in actual service processes for data curation and data archives. In addition, a data repository's workflow resembles that of digital archives or even digital libraries. On the other hand, they found that the cost of preventing disclosure risk and a lack of agreement on the standards of text data files are the most apparent obstacles that data curation professionals face in handling qualitative data; the maturation of data metrics seems to be a promising solution to several challenges in social science data sharing.

Originality/value - The authors evaluated the gap between a research data repository's current practices and the adoption of the OAIS model. They also identified answers to questions such as how current technological infrastructure in a leading data repository such as ICPSR supports their daily operations, what the ideal technologies in those data repositories would be and the associated challenges that accompany these ideal technologies. Most importantly, they helped to prioritize challenges and barriers from the data curator's perspective and to contribute implications of data sharing and reuse in social sciences.

Original language: English
Pages (from-to): 626-649
Number of pages: 24
Journal: Electronic Library
Volume: 35
Issue number: 4
State: Published - 2017

Bibliographical note

Funding Information:
Daqing He is an Associate Professor at the School of Computing and Information and the Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, USA. He earned his PhD in Artificial Intelligence from the University of Edinburgh, Scotland. Prior to joining the University of Pittsburgh in 2004, he served on the research faculties of the Robert Gordon University, Scotland, and the University of Maryland at College Park. His main research interests cover information retrieval (monolingual and multilingual), information access on the social Web, adaptive Web systems and user modelling, interactive retrieval interface design, Web log mining and analysis, and research data management. Dr He has been the principal investigator (PI) and co-PI for more than ten research projects, funded by the National Science Foundation (NSF), United States Defense Advanced Research Projects Agency (DARPA), ALISE/OCLC, University of Pittsburgh and other agencies. He has published more than 150 articles in internationally recognized journals and conferences in these areas, which include Journal of the Association for Information Science and Technology, Information Processing and Management, ACM Transactions on Information Systems, Journal of Information Science, Association for Computing Machinery’s Special Interest Group on Information Retrieval (ACM SIGIR), Conference on Information and Knowledge Management (CIKM), World Wide Web Conference (WWW), ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) and so on. Dr He has served as a member on the programme committees for more than 40 major international conferences in the area of information retrieval and Web technologies, and has been called upon to be a reviewer for many top-ranked international journals in the same areas. He serves on the editorial board of the SCI/SSCI indexed journals Information Processing and Management, Internet Research, and Aslib Journal of Information Management. Daqing He is the corresponding author and can be contacted at: dah44@pitt.edu

Yu Chi is a PhD student at the School of Computing and Information at the University of Pittsburgh, Pittsburgh, Pennsylvania, USA. Her research relates to information behaviour and seeking. Her current research project investigates how laypeople obtain information and improve knowledge through online health-related information seeking.

Funding Information:
Paper type: Research paper

The authors thank the iFellowship, guided by the Committee on Coherence at Scale (CoC) for Higher Education, sponsored by the Council on Library and Information Resources (CLIR) and Andrew W. Mellon Foundations, as well as Beta-Phi-Mu Honor Society, which provided research funding for this project. This study is also partially supported by the project titled Research on Knowledge Organization and Service Innovation in the Big Data Environments funded by the National Natural Science Foundation of China (No. 71420107026). The authors also thank Drs Nora Mattern, Liz Lyon, Sheila Corrall, Jian Qin, Jung Sun Oh and Stephen Griffin for their invaluable comments and suggestions on this research project. Last but not least, the authors thank all participants and people who helped facilitate the field study at ICPSR for their valuable input and assistance.

Publisher Copyright:
© Emerald Publishing Limited.

Keywords

  • Data sharing
  • Digital repositories
  • Open archival information system (OAIS)
  • Research data curation

ASJC Scopus subject areas

  • Computer Science Applications
  • Library and Information Sciences
