Project details
Description
Competition Sensitive Information
Abstract
Methodological Advancements for Generalizable Insights into Complex Systems (MAGICS)
ARC Solicitation Number: DARPA-EA-25-02-05
Abstract Title: Large Science Models: Foundation Models for Generalizable Insights Into Complex Systems with Psycho-social Application
Proposer Organization: University of Kentucky Research Foundation
Technical Point of Contact (POC):
Name: Ishanu Chattopadhyay
Mailing Address: 760 Press Avenue, Room 367, Lexington, KY 40508
Telephone: 8144411296
Email: ishanu [email protected]
Administrative POC:
Name: Kim C. Carter
Mailing Address: 500 S Limestone, 109 Kinkead Hall, Lexington, KY 40526-0001
Telephone: 859-257-9420
Email: [email protected]
Is the proposed effort scoped for 1 FTE / 12 months? (Yes/No): Yes
Estimated Total Cost: $300,000.00
Is the model OT agreement acceptable without changes (Yes/No): Yes
Identify any other solicitation(s) to which this concept has been proposed: No
Does the proposer work include Human Subjects Research? (Yes/No): No
DARPA-EA-25-02-05 Abstract
Contents
1 Proposed Idea Addressing the ARC Opportunity
2 Scientific Analysis of the Proposed Idea and Its Fundamental Limits
3 Analysis of the current state of the art
4 Proposer Qualifications
5 Estimated Cost & FTE Calculation
6 Bibliography
Quadchart Slide
Technical Papers
1. Proposed Idea Addressing the ARC Opportunity
We propose a general-purpose modeling framework for obtaining foundation models of complex systems (Large Science Models (LSMs)) with potentially hundreds to thousands of observables that evolve, react, and drift over time, especially when no governing equations are known. Existing approaches fail in ultra-high-dimensional settings, break down under missing or corrupted data, and cannot model feedback where systems change in response to being observed. Our framework addresses all three, with the following key innovations: (1) We infer governing rules directly from observational data, constructing models of complex systems with a data-informed intrinsic distance metric (the LSM-metric) connecting emergent system geometry to local drivers of dynamical change. (2) Using this learned internal geometry for non-ergodic, reactive contexts, we derive dynamic equations of change via rigorous variational principles, (3) enabling both local perturbation modeling and global system evolution. (4) Reflexivity is modeled explicitly, allowing us to capture feedback-driven dynamics central to human systems. (5) A formal validation protocol detects new emergent macroscopic structures in time or subgroups by testing deviation from the model-predicted expected distribution of sample perturbations. (6) We also aim to estimate data sufficiency for stable inference using a novel conservation of complexity principle.
The result is a generalizable theory of digital twins that unifies structure inference, dynamics, and model adaptation, leveraging AI-driven discovery of a complex system foundation model. Our framework applies broadly across complex domains, directly addressing core MAGICS challenges in inference boundaries, alignment, adaptation, and reflexivity.
We aim to demonstrate our approach concretely on DoD-relevant questions of information percolation in social systems. Using data from large-scale social surveys (General Social Survey (GSS), Eurobarometer, and NLS), we plan to understand how belief systems form, evolve, and stabilize within populations under incomplete, noisy, and reactive conditions. Specifically, we aim to: (1) predict individual and group-level worldviews from partial observations, validated by the ability to generatively impute missing survey responses; (2) explain/quantify/predict polarization as the emergence of stable clusters in belief space; and (3) explain/quantify/predict how targeted information (e.g., propaganda) propagates through and restructures collective opinion. Ultimately, we seek to deliver a new class of foundation models for societal belief systems, enabling predictive, interpretable, and adaptable simulations of collective cognition, which can inform new enabling technologies for the DoD, ranging from quantitatively predicting/influencing social dynamics in conflict theaters to understanding resilience to shocks. Tools that we aim to deliver will provide new insights into social theory, computationally yield actionable interventions in the future, and offer a new approach to test and validate social theory.
2. Scientific Analysis of the Proposed Idea and Its Fundamental Limits
Large Science Models (LSM): We outline our framework for representing a complex system characterized by a large set of interdependent observable variables $X = \{x_1, \ldots, x_N\}$, where $N$ may be very large ($10^3$ to $10^6$). The system is "complex" in the sense that the cross-dependencies amongst variables are a priori unknown, and not derivable from first principles, i.e., the distribution of a variable $x_i$ is assumed to be an unknown function of the remaining variables $x_{-i} = \{x_j : j \neq i\}$. We assume that each observable $x_i$ takes values from a finite alphabet $\Sigma_i$. A system state is then a feasible set of variable-value pairs: $\{x_1, \ldots, x_N\}$. A state may be partially observable, i.e., contain missing values for some of the variables. To formally express this, we define an observed state as a collection of probability distributions over the respective alphabets, $\psi = \bigotimes_{i=1}^{N} \psi_i$, where $\psi_i \in \mathcal{D}(\Sigma_i)$. Here, $\mathcal{D}(\Sigma_i)$ denotes the simplex of probability distributions over the finite set $\Sigma_i$. A degenerate distribution in this context corresponds to a variable observed to have a specific, known value. As a shorthand, we use: $x_{-i} = \{x_j : j \neq i\}$ and $\psi_{-i} = \bigotimes_{j \neq i} \psi_j$.
A LSM is a generative model for this system comprising marginal predictors $\{\Phi_i\}_{i=1}^{N}$, where $\Phi_i$ estimates the distribution for $\psi_i$, given the remaining $\psi_{-i}$. A LSM is "generative" not in the sense of generating conversation like chatbots, but in its ability to generate valid samples not seen before.
Equilibrium Eigenstates: Equilibria are solutions to:
$$\Phi(\psi) = \psi \qquad (1)$$
where $\Phi(\psi) = \bigotimes_{i=1}^{N} \Phi_i(\psi_{-i})$, i.e., deviation from equilibrium drives system evolution.
Inference Using Conditional Inference Trees: We propose to infer $\Phi_i$ from a sufficiently large number of samples comprising variable-value pairs (with possibly missing entries), as a conditional inference tree (CIT) [1,2]. CIT inference only uses data-splits that pass specific significance tests (unlike CART [3] trees), and is therefore robust against overfitting. A distinct CIT $\Phi_i$ is trained for each variable $x_i$ (using potentially all $x_{-i}$ as features) to estimate $Pr(\psi_i \mid \psi_{-i})$. One such implementation has been demonstrated in modeling the microbiome [4]. Importantly, $\Phi_i$, using potentially all other variables as putative "features", reveals recursive dependencies; each non-leaf node is "hyperlinked" to its own tree (see Fig. 1); this is the recursive forest of the LSM, which reveals the emergent macro-structure of the system.
[Figure 1. Panel a: GSS variable prayer. Panel b: GSS variable fefam. Each panel is a conditional inference tree whose internal nodes are GSS variables (e.g., godmeans, fefam, hubbywrk, wrkwayup, natsoc, teensex, miracles, fepresch, fechld, relgeneq, satfin, fund) and whose leaves report a predicted response with its probability (Prob) and sample fraction (Frac).]
Fig. 1: LSM recursive structure (AI-discovered): non-leaf nodes expand to their own trees, e.g., fefam in the predictor for variable prayer is expanded. This emergent macro structure captures complex dependencies.
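To make the role of significance-gated splits concrete, the following is a minimal sketch of a CIT-style predictor for one variable. It is not the ctree implementation cited in the text: the permutation test on a chi-square statistic, the Bonferroni adjustment, the helper names (`chi2_stat`, `perm_pvalue`, `fit_tree`, `predict_dist`), and the synthetic data reusing the GSS variable names prayer and fefam are all illustrative assumptions.

```python
import random
from collections import Counter

def chi2_stat(xs, ys):
    """Pearson chi-square statistic for two categorical sequences."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for a in cx:
        for b in cy:
            expected = cx[a] * cy[b] / n
            stat += (cxy[(a, b)] - expected) ** 2 / expected
    return stat

def perm_pvalue(xs, ys, rng, n_perm=200):
    """Permutation p-value for independence between xs and ys."""
    observed = chi2_stat(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi2_stat(xs, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def fit_tree(rows, target, features, rng, alpha=0.05, min_size=10):
    """Grow a tree that splits only on features passing a significance test."""
    ys = [r[target] for r in rows]
    counts = Counter(ys)
    leaf = {"dist": {k: v / len(ys) for k, v in counts.items()}}
    if len(rows) < min_size or len(counts) < 2:
        return leaf
    best, best_p = None, 1.0
    for f in features:
        xs = [r[f] for r in rows]
        if len(set(xs)) < 2:
            continue
        # Bonferroni-adjusted p-value (an assumed multiplicity correction)
        p = min(1.0, perm_pvalue(xs, ys, rng) * len(features))
        if p < best_p:
            best, best_p = f, p
    if best is None or best_p >= alpha:
        return leaf                      # no significant split: stop here
    rest = [f for f in features if f != best]
    children = {}
    for value in set(r[best] for r in rows):
        subset = [r for r in rows if r[best] == value]
        children[value] = fit_tree(subset, target, rest, rng, alpha, min_size)
    return {"split": best, "children": children, "dist": leaf["dist"]}

def predict_dist(tree, row):
    """Estimated distribution of the target given the other variables."""
    while "split" in tree:
        child = tree["children"].get(row.get(tree["split"]))
        if child is None:                # missing or unseen value: stay at this node
            break
        tree = child
    return tree["dist"]

# Synthetic data (hypothetical): prayer is fully determined by fefam; noise is irrelevant.
rng = random.Random(0)
rows = []
for _ in range(200):
    fefam = rng.choice(["agree", "disagree"])
    rows.append({"fefam": fefam,
                 "prayer": "approve" if fefam == "agree" else "disapprove",
                 "noise": rng.choice(["a", "b"])})
tree = fit_tree(rows, "prayer", ["fefam", "noise"], rng)
```

In this toy setting the tree splits on fefam and ignores the irrelevant variable; when fefam is missing from a query, `predict_dist` falls back to the distribution stored at the deepest node it can reach, which is one simple way to handle partially observed states.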
The Intrinsic Distance Metric $\theta$: The LSM-distance quantifies similarity between states as the average Jensen-Shannon divergence ($D_{JS}$):
$$\theta(\psi, \psi') = \frac{1}{N} \sum_{i=1}^{N} D_{JS}\left(\psi_i \,\|\, \psi'_i\right) \qquad (2)$$
This is a true distance metric [5], and induces the (Riemannian) metric tensor [6,7]:
$$g_{ij}(\psi) = \frac{1}{2} \left. \frac{\partial^2 \theta(\psi, \psi')}{\partial \psi'_i \, \partial \psi'_j} \right|_{\psi' = \psi}$$
[Figure 2]
Fig. 2: Reflexivity: Until asked, opinions exist as a distribution over possible states (superposition), which collapse to particular opinions when queried (state collapse), which changes future responses.
Large Deviation Bound [8]: Sanov's theorem [9,10] bounds the log of the likelihood ratio of transition $Pr(x \to y)$ to persistence $Pr(y \to y)$ between states $x, y$ via the scaled $\theta(x, y)$ [4]. Thus the LSM-distance is the rate function [8,11] of perturbation:
$$\left| \ln \frac{Pr(x \to y)}{Pr(y \to y)} \right| \leq \beta\, \theta(x, y), \quad \text{where } \beta > 0 \text{ is a constant proportional to } N^2 \qquad (3)$$
The LSM-metric is "special" because of this large deviation bound, connecting "closeness" of samples to the odds of perturbing from one to the other, bridging geometry to dynamics.
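As a small illustration of the distance defined above, the sketch below computes the average Jensen-Shannon divergence between two partially observed states and the corresponding bound on the log-odds of perturbation. The list-of-lists state encoding, the base-2 logarithm, the function names, and the two hypothetical respondents are assumptions made for the example; the constant $\beta$ is not specified here and is passed in as a free parameter. Note that for a single pair of distributions it is the square root of the Jensen-Shannon divergence that is known to satisfy the triangle inequality.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions on one alphabet."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def lsm_distance(state_a, state_b):
    """Average per-variable JS divergence between two states."""
    return sum(js_divergence(p, q) for p, q in zip(state_a, state_b)) / len(state_a)

def log_odds_bound(theta, beta):
    """Upper bound on |ln Pr(x -> y) / Pr(y -> y)| for an assumed constant beta."""
    return beta * theta

# Two hypothetical respondents, three questions with alphabets of size 2, 3 and 2.
# Degenerate rows are observed answers; the uniform row is a question never asked.
alice = [[1.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5]]
bob = [[0.0, 1.0], [0.0, 1.0, 0.0], [0.5, 0.5]]

d = lsm_distance(alice, bob)   # only the first question differs
```

Here the first question contributes one full bit of divergence and the other two contribute nothing, so the states are a third of the maximum distance apart.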
Reflexivity and State Collapse on Observation: LSMs encapsulate reflexivity. If inferred cross-dependencies apply to individuals, then for a question with multiple response options (e.g., "strongly agree" to "strongly disagree"), an individual who has never been asked to respond to this question has their state $\psi_i$ initially as a probability distribution over possibilities. Once the question is posed and a response is given, the state collapses to a single outcome (Fig. 2), updating $\psi_i$ and altering the overall state $\psi$, akin to quantum measurement-induced state collapse [12]. This local update influences the predicted response to an unasked question $j$ because the inputs to $\Phi_j$ are now different:
$$\text{query} \to i \;\Rightarrow\; \psi_i \neq \tilde{\psi}_i \;\Rightarrow\; \Phi_j(\psi_{-j}) \neq \Phi_j(\tilde{\psi}_{-j}) \qquad (4)$$
Emergent Equations of Motion & Change: In Physics, the difference between kinetic ($T$) and potential energy ($U$), or the Lagrangian [13,14] $L = T - U$, is often used to encode system dynamics. The Euler-Lagrange equations [15], applied to $L$, determine the system's evolution via the principle of stationary action. For LSMs, where states are probability distributions $\psi$, we can use the same principle to obtain governing digital twin dynamics: Our "kinetic" term captures velocity over the probability simplex using the projection operator $P$, and our "potential" term arises from the divergence between the current state and its predictors, $U(\psi)$. With Einstein summation (over repeated indices), the LSM Lagrangian is given by:
$$L = \frac{1}{2} \sum_i g_{kl}\, P_{pk}\, \dot{\psi}_{ip}\, P_{nl}\, \dot{\psi}_{in} - U(\psi) \qquad (5)$$
Then the standard Euler-Lagrange condition:
$$\frac{d}{dt} \frac{\partial L}{\partial \dot{\psi}_{im}} - \frac{\partial L}{\partial \psi_{im}} = 0 \qquad (6)$$
yields the overdamped gradient-flow equation of motion:
$$\dot{\psi}_{im} = -\, g^{km}\, \frac{\partial U(\psi)}{\partial \psi_{ik}} \qquad (7)$$
(where $g^{km}$ is the inverse metric tensor, and the gradient of $U$ is expressed through the $D_{JS}$ terms summed over the variables $j$).
[Figure 3. Column I: Influenza C. Column II: GSS 2000. Panel a: emergent clusters with LSM-distance. Panel b: local potential fields for two distinct samples, e.g., C/Japan/OU-31/2014 (stable) and C/Miyagi/6/93 (unstable). Panel c: conservation of complexity, LSM compressed size [KB] against data compressed size [KB], with data-deficient and data-saturation regimes marked, for ASD screening, Eurobarometer survey, gut microbiome, influenza (humans), influenza (animals), and US social survey.]
Fig. 3: LSM modeling of viral evolution (column I) and opinion survey (column II). Panel a illustrates sample clusters induced by the intrinsic LSM-distance (Eq. (2)), which leads us to estimate local potential fields around individual samples (red dots in panel b, two samples shown for each application). We show one stable strain (top of a fitness peak), and one not. Panel c illustrates the conservation of complexity principle, which shows that ideally the compressed size of LSM models scales with the compressed size of the data (across diverse applications).
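The state-collapse mechanism in Eq. (4) can be illustrated with a toy two-question system. The sketch assumes, purely for illustration, that the predictor for question $j$ consumes a distribution-valued input by mixing a conditional table over it; the table values and function names are hypothetical and are not taken from any fitted model.

```python
def predict_j(psi_i, cond_table):
    """Distribution of x_j as a psi_i-weighted mixture of the rows P(x_j | x_i = a)."""
    n_out = len(cond_table[0])
    return [sum(w * row[b] for w, row in zip(psi_i, cond_table)) for b in range(n_out)]

def collapse(psi_i, answer):
    """Replace a distribution by the degenerate one at the observed answer."""
    return [1.0 if a == answer else 0.0 for a in range(len(psi_i))]

# Hypothetical conditional table: row a holds P(x_j | x_i = a).
cond_table = [[0.9, 0.1],
              [0.2, 0.8]]

psi_i = [0.5, 0.5]                       # question i has never been asked
before = predict_j(psi_i, cond_table)    # prediction for the unasked question j
after = predict_j(collapse(psi_i, 0), cond_table)   # after i is answered with option 0
```

Querying question $i$ changes $\psi_i$, and because $\psi_i$ is an input to the predictor for $j$, the predicted response to the still-unasked question shifts as well.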
This equation of motion governs the local evolution of our ultra-high-dimensional, domain-agnostic complex systems without assuming first-principle laws, capturing dynamics through emergent divergence-based local potential fields (see Fig. 3); e.g., the local potential around viral strains provides a visualization of the fitness landscape [16] that drives change.
We point out which parts of the equation are AI-inferred: the metric tensor $g$ is based on the inferred predictors $\Phi_i$, and once these are inferred from data (e.g., as the LSM recursive forest), the rest of the analysis follows from standard approaches in dynamical systems theory. Some key analysis capabilities, enabled by the LSM framework, are discussed next.
LSM-sampling and Valid Perturbations of Observed Samples. The local potential field computed by Eq. (7) enables us to sample the neighborhood of any observed point in the ultra-high-dimensional state space, and to characterize the set of valid perturbations. We call this the LSM-sampling scheme around a point. The structure of the LSM-sampling neighborhood reveals local stability and the potential "direction" of change. For example, in Fig. 3b, we have an example of a viral strain that is locally "stable", and one with a tendency to mutate.
Validation Principle and Model Drift. The large deviation bound in Eq. (3) may be used to evaluate model fit. Eq. (3) specifies the expected distribution of distances of perturbed samples in the LSM-sampling neighborhood (from the observed sample), and deviations from the exponential decay stipulated by the bound may be quantified as a degradation of model fit. This is a general principle that can track model fit as samples and system behavior evolve and deviate over time, and that signals when model re-calibration/retraining is needed.
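One way to turn this validation principle into a number is to measure how far the observed distances of perturbed samples stray from an exponential law. The sketch below uses a Kolmogorov-Smirnov distance against a maximum-likelihood exponential fit; this particular statistic, the function name, and the synthetic distance samples are assumptions for illustration, not the protocol used in the proposal.

```python
import math
import random

def exp_decay_ks(distances):
    """KS distance between empirical distances and a fitted exponential law."""
    xs = sorted(distances)
    n = len(xs)
    rate = 1.0 / (sum(xs) / n)            # maximum-likelihood exponential rate
    ks = 0.0
    for k, x in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * x)
        # compare against the empirical CDF just below and at x
        ks = max(ks, abs(cdf - k / n), abs(cdf - (k + 1) / n))
    return ks

rng = random.Random(1)
# Synthetic distances (hypothetical): one set decays exponentially, one does not.
in_model = [rng.expovariate(5.0) for _ in range(2000)]
drifted = [0.5 + 0.01 * rng.random() for _ in range(2000)]

score_ok = exp_decay_ks(in_model)
score_drift = exp_decay_ks(drifted)
```

A score near zero indicates distances consistent with exponential decay, while a large score flags a neighborhood whose perturbations no longer follow the model-predicted distribution, which could serve as a trigger for retraining.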
Estimating Data Sufficiency via Conservation of Complexity. Do we know if we have enough data for reliable LSM inference? To address this, we introduce a principle we call conservation of complexity (CoC). The core idea: if additional data no longer reveals new structural dependencies, then the model has saturated its descriptive capacity (Fig. 3c demonstrates this across many domains); otherwise more data will refine the model. This principle (no cheating on complexity) applies to any generative model as a general yardstick of data sufficiency.
Operationally, each sample (response vector) is encoded as a finite string, and the total dataset as a concatenation of such strings. We then use data compression as a proxy for Kolmogorov complexity to estimate whether the size of the inferred LSM scales proportionally with the compressed size of the dataset. If this scaling saturates (i.e., the model no longer grows meaningfully with added data), then the digital twin has reached representational sufficiency. A second test evaluates the model's generative quality. If the LSM can produce synthetic belief states that are statistically indistinguishable (in compressed form) from true observed states, then we consider the model faithful. This test is executed by generating perturbed copies of observed individuals and measuring how well they compress jointly with the originals. Together, these two tests, one checking whether the model complexity tracks the data and the other whether generated samples match the real data structure, offer an executable path to assess when our LSM is valid and complete.
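The second, joint-compression test can be sketched with an off-the-shelf compressor. The example below uses zlib as the compression proxy and compares the extra compressed bytes needed to encode a candidate sample once the observed data is known. The encoding of response vectors as 20-character strings, the three response patterns, and the function names are hypothetical; a held-out real sample is used as the reference point for what a faithful generator should achieve.

```python
import random
import zlib

def compressed_size(rows):
    """Compressed size in bytes of a dataset encoded as newline-joined strings."""
    return len(zlib.compress("\n".join(rows).encode("ascii"), 9))

def extra_cost(reference, candidate):
    """Extra compressed bytes needed to encode candidate once reference is known."""
    return compressed_size(reference + candidate) - compressed_size(reference)

rng = random.Random(0)
# Hypothetical response patterns standing in for structured belief states.
patterns = ["AAAAABBBBBCCCCCDDDDD", "EEEEEDDDDDCCCCCBBBBB", "ABCDEABCDEABCDEABCDE"]
observed = [rng.choice(patterns) for _ in range(300)]
held_out = [rng.choice(patterns) for _ in range(300)]
faithful = [rng.choice(patterns) for _ in range(300)]        # stand-in for good model output
unfaithful = ["".join(rng.choice("ABCDE") for _ in range(20)) for _ in range(300)]

baseline = extra_cost(observed, held_out)    # cost of genuinely new real data
good = extra_cost(observed, faithful)        # should be comparable to the baseline
bad = extra_cost(observed, unfaithful)       # structure-free samples cost far more
```

The design choice here is to compare against a held-out real sample rather than against zero, since even perfectly faithful samples carry new information and therefore never compress for free.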
Research Plan and Quantitative Validation: Data Sources. We will use General Social Surveys (GSS) [17], Eurobarometer [18], and National Longitudinal Surveys (NLS) [19], comprising de-identified individual-level responses to opinion, belief, and demographic questions with longitudinal depth. Together these data sources represent over 100K individuals over half a century on socio-political opinions that have defined modern Western society. No PII/CAI will be generated/used. Tasks (1-8) will be complemented by a final report and open-source software.
Task 1: Data Harmonization (0.5–1M). Standardize variable formats, align response types across surveys if necessary, and procure licenses/regulatory authorizations.
Task 2: LSM Inference (2–3M). Learn LSMs using conditional inference trees (CITs), for GSS years and other data sets, and also for relevant subgroups.
Task 3: Lagrangian Validation (2–3M). Compute local potential fields and verify that forward dynamics from the Lagrangian reproduce observed macro-level belief drift.
Task 4: Masked Imputation (1–2M). Mask a subset (progressively 10%, 20%, 50% & 80%) of responses and test reconstruction ability (expected recovery success > 80%); evaluate prediction accuracy and belief-space consistency.
Task 5: Emergent Structure (1M). Analyze potential landscapes for attractors, polarization & drift. Use LSM-sampling to probe rigidity vs. fluidity in belief evolution.
Task 6: Model Drift Detection (1M). Detect and quantify deviations from expected model behavior to guide retraining.
Task 7: Test Data Sufficiency (1M). Apply CoC using restricted samples to show how model quality stabilizes when our CoC conditions are satisfied.
Task 8: LSM Analysis for Social Theory Questions (3M). Demonstrate local potential-based mechanisms by which perturbations (e.g., propaganda) amplify polarization, explaining why filter-bubble interventions can backfire as observed in Bail et al.'s social experiments [20]. Validate the simulation testbed's ability to adjudicate competing social theories.
Quantitative benchmarks: Accuracy of masked prediction, fidelity of forward simulation, structure emergence, and data sufficiency curves (Fig. 3c). We will benchmark our predictions and masking reconstruction performance against standard ML methods that do not infer the underlying foundational structure. No human subject research is planned. We will explore methods for responsible/ethical future model deployment, including potential misuse in influence operations.
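The masked-imputation benchmark in Task 4 can be expressed as a small evaluation harness. The sketch below masks a fraction of entries, calls an imputer, and reports the share of masked entries recovered. The within-row majority-vote imputer is only a stand-in baseline (an LSM predictor would take its place), and the perfectly correlated synthetic rows and function names are assumptions for illustration.

```python
import random
from collections import Counter

def masked_recovery(rows, impute, fraction, rng):
    """Mask a fraction of entries, impute them, and return the share recovered."""
    hits = total = 0
    for row in rows:
        masked = [v if rng.random() >= fraction else None for v in row]
        guess = impute(masked)
        for truth, seen, g in zip(row, masked, guess):
            if seen is None:              # score only the entries that were hidden
                total += 1
                hits += (g == truth)
    return hits / total if total else float("nan")

def majority_imputer(global_mode):
    """Stand-in imputer: fill gaps with the row's most common visible answer."""
    def impute(masked):
        seen = [v for v in masked if v is not None]
        fill = Counter(seen).most_common(1)[0][0] if seen else global_mode
        return [fill if v is None else v for v in masked]
    return impute

rng = random.Random(0)
# Synthetic respondents (hypothetical): five perfectly correlated answers per row.
rows = [[rng.choice("AB")] * 5 for _ in range(2000)]
imputer = majority_imputer("A")
rates = {f: masked_recovery(rows, imputer, f, rng) for f in (0.1, 0.2, 0.5, 0.8)}
```

Running the same harness with a standard ML imputer and with an LSM-based imputer at each masking level would give directly comparable recovery curves.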
3. Analysis of the current state of the art
Models of belief systems have traditionally focused on latent variable estimation (e.g., item response theory [21], structural equation models or symbolic regression [22,23]), effective in low-dimensional, structured settings but limited under sparsity, noise, or reactive dynamics. Predictive models, including deep nets and LLMs, offer statistical power but lack mechanism, fail under drift, and cannot validate internal constructs. Theories of belief formation (e.g., Festinger's cognitive dissonance [24]) and theory-of-mind [25,26] motivate modeling belief as dynamic and reflexive, yet existing tools remain static and largely unvalidated quantitatively. In broader complex system modeling, current approaches (e.g., agent-based models, network simulations) impose hand-crafted rules or coarse dynamics and fail to scale to ultra-high-dimensional systems with emergent feedback and regime shifts, where the "equations of motion" are a priori unknown.
4. Proposer Qualifications
The project will be led by Dr. Ishanu Chattopadhyay, Assistant Professor of Biomedical Informatics and Computer Science at the University of Kentucky. Dr. Chattopadhyay is uniquely positioned to execute this project, having pioneered several high-impact inference algorithms, and having served as PI on four previous DARPA grants, with extensive experience leading interdisciplinary efforts at the interface of AI, mathematical control theory, and cognitive science.
5. Estimated Cost & FTE Calculation
The total estimated cost of the proposed effort is $300,000 (12M period), based on academic
budgeting practices at the University of Kentucky, with costs represented as fully burdened
salaries, inclusive of fringe and overhead. The effort is scoped for 1.0 FTE commitment, primarily
allocated to the PI and a postdoctoral researcher, focusing solely on fully burdened labor costs.
6. Bibliography
[1] Hothorn, T., Hornik, K. & Zeileis, A. ctree: Conditional inference trees.
The comprehensive R archive network 8, 1–34 (2015).
[2] Hothorn, T., Hornik, K. & Zeileis, A. Unbiased recursive partitioning: A conditional inference
framework. Journal of Computational and Graphical statistics 15, 651–674 (2006).
[3] Timofeev, R. Classification and regression trees (CART) theory and applications.
Humboldt University, Berlin 54, 48 (2004).
[4] Sizemore, N. et al. A digital twin of the infant microbiome to predict neurodevelopmental
deficits. Science Advances 10, eadj0400 (2024).
[5] Bryant, V. Metric Spaces: Iteration and Application (Cambridge University Press, 1985).
[6] Chavel, I. Riemannian Geometry: A Modern Introduction (Cambridge University Press,
Cambridge, UK, 2006).
[7] Wald, R. M. General Relativity (University of Chicago Press, Chicago, IL, 1984).
[8] Varadhan, S. S. Asymptotic probabilities and differential equations.
Communications on Pure and Applied Mathematics 19, 261–286 (1966).
[9] Sanov, I. N. On the probability of large deviations of random variables (United States Air
Force, Office of Scientific Research, 1958).
[10] Cover, T. M. & Thomas, J. A. Elements of Information Theory (Wiley-Interscience, Hoboken,
NJ, 2006), 2nd edn.
[11] Dembo, A. & Zeitouni, O. Large deviations techniques and applications, vol. 38 (Springer
Science & Business Media, 2009).
[12] Leggett, A. J. The quantum measurement problem. Science 307, 871–872 (2005).
[13] Goldstein, H., Poole, C. P. & Safko, J. L. Classical Mechanics (Addison Wesley, San
Francisco, CA, 2002), 3rd edn.
[14] Arnold, V. I. Mathematical Methods of Classical Mechanics (Springer, New York, NY, 1989).
[15] Troutman, J. L. The Euler-Lagrange Equations, 145–193 (Springer New York, New York,
NY, 1996). URL https://doi.org/10.1007/978-1-4612-0737-5_7.
[16] Stadler, P. F. Fitness landscapes. In Biological evolution and statistical physics, 183–204
(Springer, 2002).
[17] Davern, M., Bautista, R., Freese, J., Herd, P. & Morgan, S. L. General social survey 1972-
2024 (2024). URL https://gssdataexplorer.norc.org/gsscite.
[18] European Commission. Eurobarometer 94.3 (2021) (2023). URL https://search.gesis.org/research_data/ZA7780. ZA7780 Data file Version 2.0.0.
[19] U.S. Bureau of Labor Statistics. National longitudinal survey of youth 1979 (nlsy79) (2022).
URL https://www.nlsinfo.org/content/cohorts/nlsy79.
[20] Bail, C. A. et al. Exposure to opposing views on social media can increase political
polarization. Proceedings of the National Academy of Sciences 115, 9216–9221 (2018).
[21] Bock, R. D. & Gibbons, R. D. Item response theory (John Wiley & Sons, 2021).
[22] Ullman, J. B. & Bentler, P. M. Structural equation modeling.
Handbook of psychology, second edition 2 (2012).
[23] Schmidt, M. & Lipson, H. Symbolic regression of implicit equations. In
Genetic programming theory and practice VII, 73–85 (Springer, 2009).
[24] Festinger, L. Cognitive dissonance. New York (1959).
[25] Premack, D. & Woodruff, G. Does the chimpanzee have a theory of mind?
Behavioral and Brain Sciences 1, 515–526 (1978).
[26] Baker, C. L., Saxe, R. & Tenenbaum, J. B. Rational quantitative attribution of beliefs, desires
and percepts in human mentalizing. Nature Human Behaviour 1, 1–10 (2017).
Abstract Summary Slide
Advanced Research Concepts (ARC), DARPA-EA-25-02-05
Large Science Models: Foundation Models for Generalizable Insights Into Complex Systems with Psycho-social Application
University of Kentucky; Technical POC: Ishanu Chattopadhyay PhD
PROPOSED IDEA
We propose Large Science Models (LSMs) as a framework for foundation models of complex systems:
- with hundreds to thousands of evolving variables with a priori unknown cross-talk
- where no governing equations are known a priori and systems adapt in response to observation.
- We learn system geometry from data, then derive emergent equations of motion using variational principles.
- LSMs explicitly model reflexivity, enabling simulation of human systems such as opinion maturation dynamics.
- LSMs support inference, forward simulation, and controlled perturbation, enabling predictive, interpretable, and adaptive digital twins of societal systems.
TECHNICAL APPROACH
- System state is modeled as a tensor product of probability distributions ($\psi$)
- Use conditional inference trees to learn predictors and recursive cross-talk.
- Emergent distance metric ($\theta$) bridges geometry to dynamics via large deviation rate function
- A Lagrangian is defined, inducing eqns. of motion via Euler-Lagrange equations.
These steps allow probing local potential fields, recovering/tracking dynamical structure without prior assumptions.
APPLICATION
- Model belief/opinion dynamics using large-scale, multi-national surveys: GSS (US), Eurobarometer (Europe), and NLS (US), spanning 100K+ individuals over 50+ years
- Belief state reconstruction, simulation of information shocks (e.g., propaganda), and detection of emergent polarization and resilience patterns
- Data sufficiency tests with complexity conservation (CoC).
Validation & Benchmarking
- Imputation: >80% recovery under 10–80% masking
- Simulation: Reproduces observed belief drift
- Emergence: Detects stable polarization clusters
- Drift: Deviations from large deviation-predicted decay
- CoC Test: Model saturation tracks data complexity
- Benchmarking: Against SOTA ML
TECHNICAL ABILITY
PI Ishanu Chattopadhyay is an Assistant Professor of Biomedical Informatics and Computer Science at the University of Kentucky.
- Extensive expertise and publications in stochastic processes, large-scale data analysis, and probabilistic modeling.
- PI has developed several high-impact inference algorithms for predictive analytics, including the data smashing algorithm, and event-level prediction for urban crime.
- He has served as PI on three previous DARPA grants (PMs: James Gimlett, Wade Shen) and was the recipient of the Young Faculty Award 2020, advised by the current director of the DSO (Bart Russell)
| Status | Not started |
|---|---|
| Start date/End date | 1/3/26 → 1/2/27 |
Funding
- Defense Advanced Research Projects Agency: US$257,520.00