Exploring learning approaches for ancient Greek character recognition with citizen science data

Matthew I. Swindall, Gregory Croisdale, Chase C. Hunter, Ben Keener, Alex C. Williams, James H. Brusuelas, Nita Krevans, Melissa Sellew, Lucy Fortson, John F. Wallin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

15 Scopus citations

Abstract

The central dogma of handwritten character recognition remains inextricably linked to optical character recognition methods for print media. Alongside their reliance on proprietary data and lack of open-access software, the applicability of these optical character recognition methods to handwritten characters from low-quality documents (e.g., damaged documents) remains unknown. In this paper, we compare and contrast the performance of state-of-the-art optical character recognition tools for print and learning models engineered with state-of-the-art machine learning toolkits trained on handwritten inputs. Using Tesseract OCR as a baseline, we build, optimize, and evaluate three types of convolutional neural networks that are trained on the AL-ALL and AL-PUB datasets, collections of images of handwritten ancient Greek characters that were labeled by volunteers through the Ancient Lives online citizen science project. We find our best-performing machine learning model to be 92.57% accurate compared to Tesseract OCR's 11.15%. Following our analysis, we present a brief examination of our models' shortcomings, introduce the publicly available AL-PUB dataset, and describe Theia, a web-based tool that democratizes our machine learning models for public use. We conclude by discussing the promise of our findings for advancing research at the intersection of machine learning, manuscript transcription, and the digital humanities.

Original language: English
Title of host publication: Proceedings - IEEE 17th International Conference on eScience, eScience 2021
Pages: 128-137
Number of pages: 10
ISBN (Electronic): 9781665403610
State: Published - Sep 2021
Event: 17th IEEE International Conference on eScience, eScience 2021 - Virtual, Online, Austria
Duration: Sep 20 2021 to Sep 23 2021

Publication series

Name: Proceedings - IEEE 17th International Conference on eScience, eScience 2021

Conference

Conference: 17th IEEE International Conference on eScience, eScience 2021
Country/Territory: Austria
City: Virtual, Online
Period: 9/20/21 to 9/23/21

Bibliographical note

Publisher Copyright:
© 2021 IEEE.

Keywords

  • Ancient Greek
  • Character transcription
  • Citizen science
  • Crowdsourcing
  • Dataset
  • Machine learning
  • Papyrology

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
