Grants and Contracts Details
Description
In 2020, we developed a training game and data collection tool for the Human-Machine Collaboration
team. This game provides analysts with tools to investigate a collection of documents and
answer questions about which employee of a company committed a crime. It is similar to the annual
VAST Challenge, but more IC-specific and with more detailed instrumentation. The game collects
detailed data on how analysts work, including what searches they perform, why they perform them,
and how they use the results to make future searches and form conclusions.
We use the Scheherazade system [2] to infer from this data an abstract, bottom-up, graph-based
model of analyst workflow. It represents how earlier actions lead to later actions in an analyst's
process. This year's project has revealed two primary ways Scheherazade should be improved to
better serve the IC and translate the insights gained from this training game into an application
deployable on the high side:
- To ensure our system can be rapidly applied in different problem domains, we must automatically
  identify the features likely to indicate an analyst's state in their current workflow. This
  entails looking at the rich logs currently generated by our system to identify features that are
  most important for constructing workflow graphs while minimizing analyst logging effort.
- The automatically generated workflow graph needs to be organized hierarchically based on
  how high-level analysis questions translate into lower-level tasks. The system must be able
  to distinguish and separate multiple parallel tasks being performed at once.
Task 1: Automatically Identifying Relevant Workflow Features
The logs from our data collection tool provide specific examples of each user performing the task.
To represent a general process, we must abstract these actions (e.g., one user's specific query "Find
emails sent by Michelle at 4:38 PM on May 5th" may generalize to "Find emails sent by Michelle"
performed by many users). Abstraction needs to preserve the features that best model analyst
workflow. This year, we will create these abstractions manually based on knowledge of the task,
but to apply the system to other domains and on the high side, this must be done automatically.
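As an illustration of the abstraction step, the following minimal sketch projects logged actions onto a chosen subset of features and groups users by the resulting abstract action. The record layout, the feature names (`verb`, `sender`, `date`, `time`), and the helper functions are all hypothetical; they are not the project's actual log schema or the Scheherazade representation.

```python
from collections import defaultdict

# Hypothetical logged actions; field names are assumptions for illustration only.
logged_actions = [
    {"user": "u1", "verb": "find_emails", "sender": "Michelle", "date": "May 5", "time": "4:38 PM"},
    {"user": "u2", "verb": "find_emails", "sender": "Michelle", "date": "May 5", "time": None},
    {"user": "u3", "verb": "find_emails", "sender": "Michelle", "date": None, "time": None},
]


def abstract_action(action, kept_features):
    """Project an action onto the kept features, giving a hashable abstract action."""
    return tuple((name, action.get(name)) for name in kept_features)


def users_per_abstract_action(actions, kept_features):
    """Map each abstract action to the set of users who performed it."""
    groups = defaultdict(set)
    for action in actions:
        groups[abstract_action(action, kept_features)].add(action["user"])
    return dict(groups)


# Keeping every feature leaves each user's query distinct.
specific = users_per_abstract_action(logged_actions, ["verb", "sender", "date", "time"])
# Dropping date and time merges them into one action shared by all users.
general = users_per_abstract_action(logged_actions, ["verb", "sender"])
```

In this toy data, keeping all four features yields three separate actions, while keeping only `verb` and `sender` yields a single "find emails sent by Michelle" action performed by all three users.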
Using the dataset we collected this year as a testbed, we will use statistical analyses like principal
components analysis [3] and machine-learning-based feature selection techniques [1] to identify what
elements of actions best predict the original action while still being general enough to occur in the
logs of many users. We will evaluate the quality of these downselected features by examining how
well they recreate the analyst workflow actions from the original dataset.
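The trade-off described here, between predicting the original action and remaining general across users, can be made concrete with a small sketch. It does not use principal components analysis or the cited feature selection techniques; instead it swaps in a plain exhaustive scoring of feature subsets, which is only practical for a handful of features. The log records, the feature names, and both scoring definitions are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical log records: (user, action described as a feature dict).
LOGS = [
    ("u1", {"verb": "find_emails", "sender": "Michelle", "date": "May 5", "time": "4:38 PM"}),
    ("u2", {"verb": "find_emails", "sender": "Michelle", "date": "May 5", "time": "9:10 AM"}),
    ("u3", {"verb": "find_emails", "sender": "Michelle", "date": "May 6", "time": "1:00 PM"}),
    ("u1", {"verb": "open_file", "sender": None, "date": "May 5", "time": "4:40 PM"}),
    ("u2", {"verb": "open_file", "sender": None, "date": "May 6", "time": "9:30 AM"}),
]
FEATURES = ["verb", "sender", "date", "time"]


def project(action, kept):
    """Reduce an action to the values of the kept features."""
    return tuple(action[f] for f in kept)


def score(logs, kept):
    """Return (reconstruction, generality) for one candidate feature subset."""
    originals = defaultdict(Counter)  # abstract action -> counts of full actions
    users = defaultdict(set)          # abstract action -> users who performed it
    for user, action in logs:
        key = project(action, kept)
        originals[key][project(action, FEATURES)] += 1
        users[key].add(user)
    # Reconstruction: guess the most common full action behind each abstract action.
    recovered = sum(c.most_common(1)[0][1] for c in originals.values())
    # Generality: share of records whose abstract action appears for 2+ users.
    shared = sum(sum(originals[k].values()) for k in users if len(users[k]) >= 2)
    return recovered / len(logs), shared / len(logs)


def rank_feature_subsets(logs):
    """Score every non-empty feature subset and sort by a combined score."""
    results = []
    for size in range(1, len(FEATURES) + 1):
        for kept in combinations(FEATURES, size):
            reconstruction, generality = score(logs, kept)
            results.append((kept, reconstruction, generality))
    # The product is one arbitrary way to balance the two criteria.
    results.sort(key=lambda r: r[1] * r[2], reverse=True)
    return results


ranking = rank_feature_subsets(LOGS)
```

On this toy data, keeping every feature reconstructs each action perfectly but produces no action shared between users, while keeping only `verb` is fully shared but recovers few of the original actions. A real evaluation would need a larger dataset and a principled way to weight the two criteria.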
This process will automate a previously manual task and will also provide insight about how to
combine logs from many tools. Our game provides all tools inside the game, but in reality analysts
use many separate tools. To model their process in the real world, our process can determine which
events they need to log when they work and at what level of detail.
Status | Finished |
---|---|
Effective start/end date | 1/1/21 → 12/31/21 |
Funding
- North Carolina State University: $182,453.00