Supplement to "KRATYLOS: Software for enriching endangered language-annotated databases with crowd-sourced linguistic and cultural input"

  • Finkel, Raphael (PI)

Grants and Contracts Details


Ms. Cruz is a postdoctoral researcher in the Linguistics Program at the University of Kentucky. She has been working with a team of researchers at Indiana University on a corpus of Eastern Chatino of San Juan Quiahije (CTP), a Zapotecan language spoken by some 3,000 speakers in the municipality of San Juan Quiahije, Oaxaca, Mexico. CTP is a highly endangered and under-resourced language. She is finalizing work on the corpus, which is the only available multi-tier, time-aligned corpus of Chatino. The corpus consists of spoken-language recordings with multi-tier transcriptions, part-of-speech tags, and translations, annotated in ELAN and converted to the Praat-compatible TextGrid format as well as a TIMIT-compatible speech corpus format. The recordings are of very high quality (96 kHz, 24-bit, made with professional microphones and recorders). The annotations have been checked and corrected in multiple passes. The speech corpus is valuable not only for documentation and linguistic research, but also for the development of speech technologies to facilitate transcription and annotation of all available archival Chatino recordings, such as those held at the Archive of the Indigenous Languages of Latin America (AILLA) at the University of Texas at Austin or the Archives of Traditional Music (ATM) at Indiana University. Ms. Cruz has been accepted, together with her project partners, to present the Chatino corpus at the LREC 2016 conference. This unique opportunity will allow her to present the results of her work in endangered language documentation using speech and language technologies. The paper has been accepted as a plenary oral presentation, so it will reach a large community. The resources and this publicity will help our project goals in many ways.
Effective start/end date: 4/4/16 → 7/31/19

