Living Virtually: Creating and Interfacing Digital Surrogates of Textual Data Embedded (Hidden) in Cultural Heritage Artifacts

Grants and Contracts Details


Title: Living Virtually (LIV)

Research Question

Damaged cultural heritage objects that are held in museums and libraries and that contain inaccessible written text present a unique dilemma. On the one hand, there is our desire to handle, study, read, and thus understand and engage with human thought and knowledge. On the other, there is our need to preserve and protect these records of our past, and therefore not to dismantle them in order to access the hidden textual data. Although new technologies, in the form of non-invasive digital imaging, can resolve this problem by creating facsimiles that we can use to tease out information that cannot be extracted in any other way, significant challenges remain. Imaging methods are not always sensitive to every kind of material or to the wide variety of inks that appear. Moreover, even when a text is extracted from the digital surrogate, as in the recent virtual unrolling of the Ein Gedi scroll, it is something that will only exist virtually, i.e. physical verification can never be performed on the object to interrogate published results; it is truly a born-digital text. Can a workable set of tools be created to address the variables in terms of material and ink in order to establish a definitive method for reading text embedded in cultural heritage objects? Can these tools also establish a standard for digitally archiving, curating, and editing these virtual texts in a way that both allows for further research and incorporates reproducibility, the ability to verify research results using the data?
Focusing on the carbonised scrolls from Herculaneum (carbon ink written on papyrus), which have proven to be incredibly difficult to read using non-invasive imaging methods, this project proposes to develop: a) new imaging and computational methods for definitively extracting embedded textual data; b) a digital environment for accessing, editing, and annotating that data, which does not require professional coding skills; c) a simplified method for transforming this data, as it derives from cultural heritage artefacts, into formats for digital exhibition for the purposes of tourism and educational outreach. More importantly, we propose an open architecture system, which is thus updatable and adaptable to any kind of material object and writing, whether the characters are written in ink or incised.

Research Context

The remains of Herculaneum and Pompeii, destroyed during the eruption of Mount Vesuvius in 79 CE, include a unique collection of papyrus scrolls that preserve new texts. These texts represent a vast and untapped body of information about the ancient world. The papyri from Herculaneum document a critical period of cultural shift: the absorption and transmission of Greek philosophical and literary culture to Republican Rome. So far, for example, we have found works by the philosopher and poet Philodemus, notes from lectures given by Zeno of Sidon, and the Latin poetry of Lucretius. Moreover, the papyri constitute an actual ancient library, and its collection of books seems to have a dominant theme: Epicurean philosophy.

The Herculaneum papyri pose a distinct challenge. Due to the volcanic eruption these ancient papyrus scrolls were carbonised.
Early efforts to unroll them by hand or using other devices resulted in three general categories of condition: 1) fragments from early attempts to unroll scrolls by hand, which consist of broken chunks with multiple layers stuck together; 2) fragmentary papyrus sheets from scrolls successfully unrolled in the 18th century, but which still retain fragmentary layers from other sheets in the roll stuck to their surface, both above and below; 3) scrolls still intact with contents that are completely unknown. In every case the papyrus surface is as black as burnt newspaper. Until now we have only been able to decipher the contents of these unrolled sheets based on what the human eye can see and, later, from what we could read using multispectral imaging (MSI). Nevertheless, the problem of layers with hidden writing persists, notably in the case of intact scrolls, but also in both the fragmentary chunks and unrolled sheets. How then do we read what we cannot see? And since these fragments and scrolls are visibly deteriorating, how do we handle what we should not touch?

Our assembled team has been working with the Herculaneum papyri and engaged in solving this problem for a long time. In July 2017 we concluded a pilot project at the University of Oxford, working with P.Herc. 118 in the Bodleian Library, which brought years of work to a critical point. The summation is as follows:

1) Fragments. We have applied 3D shape modelling and hyperspectral imaging (HSI) to multi-layer fragments for the purpose of supporting the measurement, analysis, and non-invasive imaging of hidden layers. This approach allows for a framework in which to explore imaging methods that can reveal hidden layers with reference to the dominant top-layer shape (and any visible text captured there). This model can form the basis of a representation for extracting all text below the surface of a multi-layer fragment, and then coordinating multiple fragments into a federated whole (i.e. virtual re-rolling of an unrolled scroll).

2) Fully Unrolled Material. Our work has shown that spectral imaging of fully opened material, registered together with 3D metric shape models, serves as a tremendously valuable building block for a digital edition. The aligned spectral cube allows us to apply advanced enhancement algorithms for detecting and highlighting all manner of features on demand. The combined representation supports digital restoration, plausible visualizations under posited conditions, and meticulous analysis that is impossible to achieve even with reference and access to the real object.

3) Intact Scrolls. We have defined a complete software pipeline based on volumetric imaging methods (such as micro-computed tomography, phase contrast tomography, and 3D x-ray fluorescence) that allows for a non-invasive total "virtual unwrapping" of a rolled-up scroll. The sections of this pipeline are designed to produce a final master image that captures a view of what is located inside the rolled scroll as a flattened image.

4) True Born-Digital Text. In 2016 project team members successfully applied the virtual unrolling technique to a carbonised Hebrew scroll excavated in 1970 at Ein Gedi, the site of an ancient Jewish community (8th century BCE to around 600 CE). Fortunately the ink was visible and the text was quickly identified as Leviticus.

Put simply, we have a method to create a 3D model for fragments and unrolled sheets that effectively repurposes all existing archive image data (natural light, HSI, MSI, etc.) in the registration process (combining), which produces a digital surrogate that can be used for research and editing. We also have a virtual method for unwrapping layers to create another digital surrogate for hidden texts. The first instance of true born-digital text from this pipeline process has occurred. The success of the overall method has now been proven.
We now propose to engage in interdisciplinary research, covering the fields of physics, computer science, classics, papyrology, and digital humanities, to solve definitively a long-standing problem in the application of non-invasive imaging to Herculaneum material. First, ink is not guaranteed to appear in a volumetric scan, and imaging is not sensitive to every kind of substrate material. Thus, while we can read text in Ein Gedi scans, we currently cannot see legible text in Herculaneum scrolls. To remedy this we will offer a reliable method for visualising ink in non-invasive imaging when it is either invisible or only faintly visible, by introducing a new imaging method and a neural network (machine/deep learning) to draw out letter shapes. At the same time, the imaging methods required for studying these cultural heritage objects produce massive amounts of data and heavy image formats. To access and work with the data in a simplified way, our project will create a digital environment dedicated to data management, curation, and digital editing and annotation, as well as create the required data model for producing smart editions of true born-digital text. The digital editions created from the master images and 3D models will integrate all data/metadata for the purposes of transparency and reproducibility of findings. Next, to ensure that our ground-breaking research is not confined to the Herculaneum material alone, the neural network will be designed to work with a reference library that can be continually updated with other languages. Moreover, our dedicated digital environment will also be designed for simplified upgrading and the addition of new object models and languages.
Lastly, for museums and libraries, which receive large numbers of tourists each year, our digital environment will also offer a workflow for transforming the large image data from the 3D models and volumetric scans into the formats required for digital exhibitions and applications via mobile technology, for the purposes of education and public engagement.

Research Methods

The full process we propose consists of two integrated phases: 1) targeting and extraction of characters/letters via DFXI and RACT; 2) data management, editing, and access via ALICE.

Phase 1: DFXI and RACT

Although some inks are readily visible in volumetric imaging (typically those containing heavy elements like metals), many objects contain inks that are not visible in this process. Our preliminary results show that there are two complementary approaches that can work to isolate, extract, and expose inks of any kind written on a substrate of any kind. These two approaches are new methods built to succeed when inks are difficult to detect in tomography.

The first method we propose is "Dark Field X-ray Imaging" (DFXI), which is a volumetric x-ray fluorescence approach. The object is stimulated with x-rays and made to fluoresce (emit light) in the form of x-rays characteristic of the material. This allows measurement of trace elements in the ink that are not present in the substrate. As opposed to bright field imaging (looking through the object towards the incident x-ray source), dark field x-ray imaging is more sensitive to trace elements. In the case of Herculaneum fragments, research has shown that Herculaneum ink is weakly contaminated with very small trace amounts of lead, probably from the process by which the ink was made (lead-lined containers, water from lead pipes, etc.). This very weak contamination is too slight to be imaged directly, but with DFXI we have demonstrated the ability to image it non-invasively, which opens a new avenue for imaging multi-layer or even intact material based on its trace elements.
The method depends on a very bright and controllable x-ray beam to create fluorescence. Typically this kind of beam is available only at a synchrotron facility.

The second new method we propose, which we call "Reference-Amplified Computed Tomography" (RACT), is based on the construction of a convolutional neural network (CNN) from a large-scale reference library that is designed to amplify specific hard-to-identify signals in tomography. This generalized framework has the ability to pinpoint and enhance the visibility of very hard-to-see phenomena in tomographic images. We have applied a preliminary version of RACT in experiments with inks on papyrus and have shown that its ability to enhance the contrast between the ink and the substrate is far beyond what is visible with the human eye. This machine learning method depends on high-resolution computed tomography together with the construction of a labelled reference library, which identifies many variations of the appearance of the phenomenon being detected. In our experiments, we built a reference library of labelled regions where there was no ink and also where ink was present. In this way the library, converted into a CNN, was able to enhance regions in tomographic data where ink was present, making it visible.

While these new proof-of-concept imaging methods show promise in extending the reference library, including the ability to reveal Herculaneum ink, the question of sensitivity is important. Normally a very bright beam is required in order to generate the best fluorescence, which means that an expensive synchrotron facility must be used. Moreover, as important as the beam may be, many collections would prefer their objects to be handled and imaged in situ, without leaving the safety of the archive or library. Finally, the time and cost of high-resolution imaging can result in substantial overhead, especially for large numbers of objects.
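
The labelled reference library that RACT depends on (regions with ink and regions without) can be illustrated with a minimal sketch. This is not the project's code: the function name `build_reference_library`, the rule that a patch counts as "ink" if any masked pixel falls inside it, and the toy values are all assumptions made for illustration, and the step of training a CNN on the resulting library is omitted.

```python
from typing import List, Tuple

Image = List[List[float]]
Patch = List[List[float]]


def build_reference_library(
    slice_values: Image,
    ink_mask: List[List[int]],
    patch_size: int = 3,
) -> List[Tuple[Patch, int]]:
    """Cut one tomographic slice into non-overlapping patches and label each.

    Assumed labelling rule (hypothetical): a patch is labelled 1 (ink) if any
    pixel of the matching mask region is marked as ink, otherwise 0 (no ink).
    """
    height, width = len(slice_values), len(slice_values[0])
    library: List[Tuple[Patch, int]] = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            # Copy the patch_size x patch_size window out of the slice.
            patch = [row[left:left + patch_size]
                     for row in slice_values[top:top + patch_size]]
            # Look up the hand-made ink annotation for the same window.
            has_ink = any(
                ink_mask[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            )
            library.append((patch, int(has_ink)))
    return library


# Toy 4x4 "slice" (made-up intensities) with one annotated ink pixel.
toy_slice = [[0.1, 0.2, 0.1, 0.1],
             [0.2, 0.9, 0.1, 0.1],
             [0.1, 0.1, 0.1, 0.1],
             [0.1, 0.1, 0.1, 0.1]]
toy_mask = [[0, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]

library = build_reference_library(toy_slice, toy_mask, patch_size=2)
labels = [label for _, label in library]
```

In this toy case only the top-left patch contains an annotated pixel, so it is the only one labelled as ink. A real library would draw on many slices and many variations in the appearance of ink, as described above.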
One of the aims of this project is to investigate how we can achieve the desired imaging results using less expensive and even portable equipment. There are three categories of instruments representing increasing sensitivity: desktop sources (which are portable); laboratory settings (which are more sensitive but not portable); and the synchrotron environment (national physics facility). We will use each category of instrumentation to do reference tests in order to refine both DFXI and RACT and to assess sensitivity trade-offs. In particular, we will systematically analyse DFXI and RACT to discover how much spatial resolution is required to generate a desired outcome in terms of readability and visibility. In the case of DFXI, it may be possible under certain conditions to generate good results with a desktop x-ray source as opposed to a synchrotron. In the case of RACT, it may be possible to achieve a sensitive reference library and level of enhancement with far less spatial and spectral resolution than is available from the most sensitive instruments.

Research for Phase 1 will essentially constitute:

1) DFXI and RACT. For DFXI, establish a definitive method of ink extraction by targeting trace elements (factors known either through the method of ink creation or through direct analysis of exposed ink on the object). For RACT, introduce a third level of training focused on a script/alphabet library for character recognition, i.e. make RACT aware of regions of no ink, ink, and character shapes. For incised characters, train RACT to recognise changes in the substrate surface data that correlate to already learned character shapes. Integrate the method(s) into the existing pipeline. Imaging research will be conducted at Diamond Light Source in Oxfordshire.

2) Sensitivity analysis. Investigate the extent to which DFXI and RACT work with desktop and laboratory settings. (This will include a prototype design of a portable and standalone dark field tomography machine).
The goal is to provide the cost/benefit trade-offs so that conservators can assess the likelihood of success on particular collections and the costs required to get desired results; knowing when less expensive methods can and should be used is critical for institutional budgets.

Phase 2: ALICE

After an object undergoes 3D modelling and/or the virtual unrolling pipeline, it must be accessible to a variety of users who may have little to no professional coding experience. Raw CT scans, for example, are typically around 10-15 GB each. Archiving raw data is simple. Working with it is a greater challenge. The Augmented Language Interface for Cultural Engagement (ALICE) is thus a digital environment for managing and curating the data and creating third-generation digital editions of true born-digital text. ALICE has three areas of operation, and the project's P.Herc. 118 and Ein Gedi data will be used as models for creating the ALICE environment:

1. Administration/Curatorial: The virtual unwrapping process, with DFXI and RACT, produces a sequence of images and metadata that essentially tells the story of how the text is virtually born, i.e. how and why we see what we see. We call this the provenance chain, and it is a critical record that must be archived according to a standard. Data resulting from the creation of all 3D models must be archived as well. ALICE will establish the best practice for data curation and storage formats. At the same time, machine-readable formats (JSON, etc.) for data and the creation of stable and citable identifiers for all image data, both pipeline and 3D models, must also be implemented. Finally, an Application Programming Interface (API) for accessing machine-readable formats will be implemented, along with search/query functionality for the archive.

2. Digital Editing: a digital text editor built upon pre-existing software and models that will advance the current state of digital editing of language texts, in this specific case ancient Greek and Latin.
Our goal is to further introduce automation into the semantic markup process required by the Text Encoding Initiative (TEI) for effective digital publication; XML and HTML5 will be automatically rendered in real time as digital editions are created. Moreover, we now propose to bring the actual cultural heritage object into the edition. The following third-generation functionality will be introduced: a) visualisation of both 2D and 3D models while editing; b) full integration of the provenance chain into the edition model (e.g. the ability to know where text is actually located in the physical object when reading the text); c) using the 3D models of fragments and unrolled sheets, the ability to construct or reconstruct a hypothetical 3D model of an intact scroll (based on the editor's judgement), i.e. virtual re-rolling of fragments.

3. Access and Research: ALICE's digital editions are ultimately designed to be accessible via current web browsers, and thus collections, museums, and libraries can publish them on their existing digital infrastructures. For mobile applications or augmented reality applications, ALICE will offer a simplified means to convert image and text data into appropriate formats for app development; this will be done via click events rather than running scripts in a terminal shell in order to make the platform easily accessible to a variety of users.

Technical Summary

The proposed work involves two phases. Table 1 below indicates the key types of data formats and the main code-base utilized (for full details see Tech Plan).

Table 1

Stage: Phase 1
Project Activity: Full pipeline and tool kit
Technology: Imaging (CT, 3D)
Data Types (format): .tif, .png, .obj
Code-base, operating software, hardware: Virtual Unrolling

Stage: Phase 2
Project Activity: ALICE (Curation, Editing, Access)
Technology: Dedicated Work Environment
Data Types (format): .txt, .xml; .arml, .tif, .jpg, .obj, .png; .xml, .csv, .json
Code-base, operating software, hardware: Django, HTML5, Python, CSS3

Project Management

The risk in perfecting our overall system (from a new precision imaging method to a digital environment that makes the data immediately useful to multiple parties) is high. However, our project team, their previous work, and project organisation have mitigated that risk. The project is divided into two teams:

Oxford: Research group led by Principal Investigator Professor Obbink, Co-I Dr Brusuelas, and Co-I Dr Dopke (STFC RAL). PI Obbink will provide general leadership and expertise in both Herculaneum papyri and advanced imaging techniques. Co-I and physicist Dopke will work with a full-time postdoctoral researcher to be hired; their work will pertain to refining the DFXI method and reporting on its performance with desktop, laboratory, and synchrotron environments. Co-I Brusuelas will act as dedicated Project Manager and work with a full-time research developer, to be hired, to build the ALICE platform.

Kentucky: Research group led by International Co-I Professor Seales. Co-I Seales will offer leadership in imaging and computer science as applied to extracting data embedded in cultural heritage objects. He will work closely with Co-I Dopke and the Oxford physics postdoc in DFXI research. The team consists of Christy Chapman (job title/role) and Seth Parker (named/unnamed). Parker will be responsible for working on the neural networks. Chapman will work with the Oxford research team on grant reporting and on scheduling/arranging media events for the project in the USA and UK, and will work with Co-I Brusuelas in monitoring project activity.

Co-I Brusuelas will monitor the full cycle of development and ensure milestones are met and outputs are delivered on time. Brusuelas will also regularly meet with both research teams and organise periodic full project team meetings.
Communication will be maintained via project management applications such as Slack and video chat software.

Essential Timeline, Milestones, and Deliverables (Output):

Dissemination

LIV's dissemination activities are designed to address the widest audience possible, as discussed more broadly in our Pathways to Impact statement: e.g. classics, papyrology, digital humanities, digital cultural heritage, physics, computer science, and museum and cultural heritage studies. Activities are grouped into two categories: publications and conferences. Publications will focus on a combination of scientific and humanities articles authored by project staff. Topics include:

  • P.Herc 118: 3D modeling and the first edition of Bodleian MS. Gr. Class. b. 1 (P)/1-12
  • ALICE: The Augmented Language Interface for Cultural Engagement
  • RACT: Amplifying Computed Tomography
  • Towards a Standalone Dark Field Tomography Machine
  • Sensitivity Report: Cost/benefit analysis for imaging with portable, lab, and synchrotron environment

In order to promote LIV broadly, research staff will present project findings at a variety of conferences, specifically aiming at:

  • Digital Heritage International Conference
  • IEEE International Conferences
  • Electronic Visualisation and the Arts conferences
  • The 29th International Congress of Papyrology 2019
Effective start/end date: 5/1/19 to 1/31/23


  • University of Oxford: $393,095.00

