Hierarchical Abstract Process Modeling for Analyst Workflow

Grants and Contracts Details


In 2020, we developed a training game and data collection tool for the Human-Machine Collaboration team. This game provides analysts with tools to investigate a collection of documents and answer questions about which employee of a company committed a crime. It is similar to the annual VAST Challenge, but more IC-specific and with more detailed instrumentation. The game collects detailed data on how analysts work, including what searches they perform, why they perform them, and how they use the results to make future searches and form conclusions.

We use the Scheherazade system [2] to infer from this data an abstract, bottom-up, graph-based model of analyst workflow. It represents how earlier actions lead to later actions in an analyst's process.

This year's project has revealed two primary ways Scheherazade should be improved to better serve the IC and translate the insights gained from this training game into an application deployable on the high side:

  • To ensure our system can be rapidly applied in different problem domains, we must automatically identify the features likely to indicate an analyst's state in their current workflow. This entails looking at the rich logs currently generated by our system to identify features that are most important for constructing workflow graphs while minimizing analyst logging effort.
  • The automatically generated workflow graph needs to be organized hierarchically based on how high-level analysis questions translate into lower-level tasks. The system must be able to distinguish and separate multiple parallel tasks being performed at once.

Task 1: Automatically Identifying Relevant Workflow Features

The logs from our data collection tool provide specific examples of each user performing the task. To represent a general process, we must abstract these actions (e.g., one user's specific query "Find emails sent by Michelle at 4:38 PM on May 5th" may generalize to "Find emails sent by Michelle" performed by many users).
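To make the abstraction step concrete, the following is a minimal sketch of how specific logged actions might be generalized and then linked into a graph of "earlier action leads to later action" edges. It is not the Scheherazade system and does not reflect the game's real log format: the dictionary-based log records, the feature names (`verb`, `sender`, `doc_type`, and so on), and the helper functions are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical log format (an assumption): each action is a dict of features.
def abstract_action(action, keep):
    """Keep only the features named in `keep`, as a hashable, order-stable tuple."""
    return tuple((k, action[k]) for k in sorted(keep) if k in action)

def transition_counts(sessions, keep):
    """Count how often one abstract action is immediately followed by another."""
    counts = Counter()
    for actions in sessions.values():
        abstracted = [abstract_action(a, keep) for a in actions]
        # Pair each action with the one that directly follows it.
        for earlier, later in zip(abstracted, abstracted[1:]):
            counts[(earlier, later)] += 1
    return counts

# Two invented users whose specific queries differ only in time and date.
sessions = {
    "user_a": [
        {"verb": "find_emails", "sender": "Michelle", "time": "4:38 PM", "date": "May 5"},
        {"verb": "open_document", "doc_type": "email", "doc_id": "e-101"},
    ],
    "user_b": [
        {"verb": "find_emails", "sender": "Michelle", "date": "May 6"},
        {"verb": "open_document", "doc_type": "email", "doc_id": "e-207"},
    ],
}

# Manually chosen features to preserve (an assumption for this example).
KEEP = {"verb", "sender", "doc_type"}

edges = transition_counts(sessions, KEEP)
for (earlier, later), n in sorted(edges.items()):
    print(dict(earlier), "->", dict(later), "x", n)
```

With the time, date, and document identifier dropped, both users' sessions collapse onto a single edge with a count of 2, which is the kind of shared structure a general workflow model needs. Keeping every feature would instead yield two unrelated edges, one per user.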
Abstraction needs to preserve the features that best model analyst workflow. This year, we will create these abstractions manually based on knowledge of the task, but to apply to other domains and on the high side, this must be done automatically. Using the dataset we collected this year as a testbed, we will use statistical analyses such as principal components analysis [3] and machine-learning-based feature selection techniques [1] to identify which elements of actions best predict the original action while still being general enough to occur in the logs of many users. We will evaluate the quality of these downselected features by examining how well they recreate the analyst workflow actions from the original dataset.

This process will automate a previously manual task and will also provide insight into how to combine logs from many tools. Our game provides all tools inside the game, but in reality analysts use many separate tools. To model their process in the real world, our process can determine which events analysts need to log when they work and at what level of detail.
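The trade-off described above, features that predict the original action yet are general enough to recur across users, can be illustrated with a small sketch. This is not the proposed method: it swaps principal components analysis and machine-learning-based feature selection for a plain exhaustive search over feature subsets, and the two scores (`generality`, `recoverability`), their product as an objective, and the toy logs are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

def project(action, features):
    """Abstract an action by keeping only the given features."""
    return tuple((f, action.get(f)) for f in features)

def score(logs, features, all_features):
    """Return (generality, recoverability) for one candidate feature subset."""
    users_per_abstract = defaultdict(set)
    originals_per_abstract = defaultdict(Counter)
    total = 0
    for user, actions in logs.items():
        for action in actions:
            key = project(action, features)
            users_per_abstract[key].add(user)
            originals_per_abstract[key][project(action, all_features)] += 1
            total += 1
    # Generality: share of logged actions whose abstract form occurs for 2+ users.
    shared = sum(
        sum(originals_per_abstract[key].values())
        for key, users in users_per_abstract.items()
        if len(users) > 1
    )
    # Recoverability: accuracy of guessing the most common original action
    # for each abstract action.
    correct = sum(c.most_common(1)[0][1] for c in originals_per_abstract.values())
    return shared / total, correct / total

def best_subset(logs):
    """Exhaustively search feature subsets; objective is an ad hoc product."""
    all_features = tuple(sorted({f for acts in logs.values() for a in acts for f in a}))
    candidates = [
        subset
        for size in range(1, len(all_features) + 1)
        for subset in combinations(all_features, size)
    ]

    def objective(features):
        generality, recoverability = score(logs, features, all_features)
        return generality * recoverability

    return max(candidates, key=objective)

# Invented logs: timestamps are unique per action, verbs and senders recur.
logs = {
    "user_a": [
        {"verb": "search", "sender": "Michelle", "time": "4:38 PM"},
        {"verb": "open", "sender": "Michelle", "time": "4:40 PM"},
        {"verb": "search", "sender": "Raj", "time": "4:45 PM"},
    ],
    "user_b": [
        {"verb": "search", "sender": "Michelle", "time": "9:02 AM"},
        {"verb": "open", "sender": "Michelle", "time": "9:05 AM"},
        {"verb": "search", "sender": "Raj", "time": "9:10 AM"},
    ],
}

best = best_subset(logs)
print(best)
```

On this toy data, any subset that includes the timestamp scores zero for generality because no two users share an abstract action, while the verb and sender together recover more of the original action than either does alone. Exhaustive search is only practical for a handful of features; the statistical and machine-learning techniques named above are what would scale to rich logs.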
Effective start/end date: 1/1/21 to 12/31/21


  • North Carolina State University: $182,453.00

