Large Science Models: Foundation Models for Generalizable Insights Into Complex Systems with Psycho-social Application

Project details

Description

Competition Sensitive Information

Abstract

Methodological Advancements for Generalizable Insights into Complex Systems (MAGICS) ARC
Solicitation Number: DARPA-EA-25-02-05
Abstract Title: Large Science Models: Foundation Models for Generalizable Insights Into Complex Systems with Psycho-social Application
Proposer Organization: University of Kentucky Research Foundation
Technical Point of Contact (POC): Name: Ishanu Chattopadhyay; Mailing Address: 760 Press Avenue, Room 367, Lexington, KY 40508; Telephone: 8144411296; Email: ishanu [email protected]
Administrative POC: Name: Kim C. Carter; Mailing Address: 500 S Limestone, 109 Kinkead Hall, Lexington, KY 40526-0001; Telephone: 859-257-9420; Email: [email protected]
Is the proposed effort scoped for 1 FTE / 12 months? (Yes/No): Yes
Estimated Total Cost: $300,000.00
Is the model OT agreement acceptable without changes? (Yes/No): Yes
Identify any other solicitation(s) to which this concept has been proposed: No
Does the proposer work include Human Subjects Research? (Yes/No): No

Contents
1 Proposed Idea Addressing the ARC Opportunity
2 Scientific Analysis of the Proposed Idea and Its Fundamental Limits
3 Analysis of the current state of the art
4 Proposer Qualifications
5 Estimated Cost & FTE Calculation
6 Bibliography
Quadchart Slide
Technical Papers

1. Proposed Idea Addressing the ARC Opportunity

We propose a general-purpose modeling framework for obtaining foundation models of complex systems (Large Science Models (LSMs)) with potentially hundreds to thousands of observables that evolve, react, and drift over time, especially when no governing equations are known. Existing approaches fail in ultra-high-dimensional settings, break down under missing or corrupted data, and cannot model feedback where systems change in response to being observed.
Our framework addresses all three, with the following key innovations:

1. We infer governing rules directly from observational data, constructing models of complex systems with a data-informed intrinsic distance metric (the LSM-metric) connecting emergent system geometry to local drivers of dynamical change.
2. Using this learned internal geometry for non-ergodic, reactive contexts, we derive dynamic equations of change via rigorous variational principles,
3. enabling both local perturbation modeling and global system evolution.
4. Reflexivity is modeled explicitly, allowing us to capture feedback-driven dynamics central to human systems.
5. A formal validation protocol detects new emergent macroscopic structures in time or subgroups by testing deviation from the model-predicted expected distribution of sample perturbations.
6. We also aim to estimate data sufficiency for stable inference using a novel conservation of complexity principle.

The result is a generalizable theory of digital twins that unifies structure inference, dynamics, and model adaptation, leveraging AI-driven discovery of a complex system foundation model. Our framework applies broadly across complex domains, directly addressing core MAGICS challenges in inference boundaries, alignment, adaptation, and reflexivity.

We aim to demonstrate our approach concretely on DoD-relevant questions of information percolation in social systems. Using data from large-scale social surveys (General Social Survey (GSS), Eurobarometer, and NLS), we plan to address the following question: how do belief systems form, evolve, and stabilize within populations under incomplete, noisy, and reactive conditions?
Specifically, we aim to:

1. predict individual and group-level worldviews from partial observations, validated by the ability to generatively impute missing survey responses;
2. explain/quantify/predict polarization as the emergence of stable clusters in belief space; and
3. explain/quantify/predict how targeted information (e.g., propaganda) propagates through and restructures collective opinion.

Ultimately, we seek to deliver a new class of foundation models for societal belief systems, enabling predictive, interpretable, and adaptable simulations of collective cognition. These can inform new enabling technologies for the DoD, ranging from quantitatively predicting/influencing social dynamics in conflict theaters to understanding resilience to shocks. The tools that we aim to deliver will provide new insights into social theory, computationally yield actionable interventions in the future, and offer a new approach to test and validate social theory.

2. Scientific Analysis of the Proposed Idea and Its Fundamental Limits

Large Science Models (LSM): We outline our framework for representing a complex system characterized by a large set of interdependent observable variables $X = \{x_1, \ldots, x_N\}$, where $N$ may be very large (on the order of $10^3$ to $10^6$). The system is “complex” in the sense that the cross-dependencies amongst variables are a priori unknown and not derivable from first principles, i.e., the distribution of a variable $x_i$ is assumed to be an unknown function of the remaining variables $x_{-i} = X \setminus \{x_i\} = \{x_j : j \neq i\}$. We assume that each observable $x_i$ takes values from a finite alphabet $\Sigma_i$. A system state is then a feasible set of variable-value pairs $\{x_1, \ldots, x_N\}$. A state may be partially observable, i.e., contain missing values for some of the variables. To formally express this, we define an observed state as a collection of probability distributions over the respective alphabets, $\psi = \bigotimes_{i=1}^{N} \psi_i$, where $\psi_i \in D(\Sigma_i)$. Here, $D(\Sigma_i)$ denotes the simplex of probability distributions over the finite alphabet $\Sigma_i$.
A degenerate distribution in this context corresponds to a variable observed to have a specific, known value. As a shorthand, we use $x_{-i} = \{x_j : j \neq i\}$ and $\psi_{-i} = \bigotimes_{j \neq i} \psi_j$.

An LSM is a generative model for this system comprising marginal predictors $\phi_i$, $i = 1, \ldots, N$, where $\phi_i$ estimates the distribution $\psi_i \in D(\Sigma_i)$ for $x_i$, given the remaining $\psi_{-i}$.
An LSM is “generative” not in the sense of generating conversation like chatbots, but in its ability to generate valid samples not seen before.

Equilibrium Eigenstates: Equilibria are solutions to:

$$\phi(\psi) = \psi, \quad \text{where } \phi(\psi) = \bigotimes_{i=1}^{N} \phi_i(\psi_{-i}), \tag{1}$$

i.e., states that every predictor reproduces from the remaining variables. Deviation from equilibrium drives system evolution.
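As a hedged illustration of the fixed-point condition in Eq. (1), the sketch below finds an equilibrium for two binary variables by repeatedly applying toy predictors until the state stops changing. The matrices `M1` and `M2`, the helper names, and the use of plain fixed-point iteration are all assumptions made for illustration: the proposal does not say how equilibria are computed, and real predictors would be inferred from data rather than given as linear maps.

```python
# Hypothetical toy predictors for two binary variables. Each matrix is
# column-stochastic, so its output is again a probability distribution.
M1 = [[0.8, 0.3], [0.2, 0.7]]   # phi_1(psi_2) = M1 @ psi_2
M2 = [[0.6, 0.1], [0.4, 0.9]]   # phi_2(psi_1) = M2 @ psi_1


def matvec(matrix, dist):
    """Matrix-vector product written out for small lists."""
    return [sum(m * d for m, d in zip(row, dist)) for row in matrix]


def phi(psi):
    """Apply each predictor to the rest of the state (the map in Eq. (1))."""
    psi1, psi2 = psi
    return [matvec(M1, psi2), matvec(M2, psi1)]


def residual(psi):
    """Largest componentwise gap between phi(psi) and psi."""
    return max(abs(a - b)
               for new, old in zip(phi(psi), psi)
               for a, b in zip(new, old))


def find_equilibrium(psi, tol=1e-12, max_iter=10_000):
    """Iterate psi <- phi(psi) until the residual falls below tol."""
    for _ in range(max_iter):
        if residual(psi) < tol:
            break
        psi = phi(psi)
    return psi


# Start from two fully observed (degenerate) distributions.
psi_star = find_equilibrium([[1.0, 0.0], [0.0, 1.0]])
print(psi_star, residual(psi_star))
```

Plain iteration converges here only because these toy maps are contractions on the simplex; nothing in this sketch guarantees the same for tree-based predictors.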
Inference Using Conditional Inference Trees: We propose to infer $\phi_i$ from a sufficiently large number of samples comprising variable-value pairs (with possibly missing entries), as a conditional inference tree (CIT) [1,2]. CIT inference only uses data splits that pass specific significance tests (unlike CART [3] trees), and is robust against overfitting. A distinct CIT $\phi_i$ is trained for each variable $x_i$ (using potentially all $x_{-i}$ as features) to estimate $\Pr(x_i \mid x_{-i})$. One such implementation has been demonstrated in modeling the microbiome [4]. Importantly, training each $\phi_i$ with potentially all other variables as putative “features” reveals recursive dependencies; each non-leaf node is “hyperlinked” to its own tree (see Fig. 1); this is the recursive forest of the LSM, which reveals the emergent macro-structure of the system.

[Fig. 1: LSM recursive structure (AI-discovered): non-leaf nodes expand to their own trees, e.g., fefam in the predictor for variable prayer is expanded. This emergent macro-structure captures complex dependencies. Panels: a. GSS variable prayer; b. GSS variable fefam.]

[Fig. 2: Reflexivity: Until asked, opinions exist as a distribution over possible states (superposition), which collapses to a particular opinion when queried (state collapse), which changes future responses.]

The Intrinsic Distance Metric: The LSM-distance quantifies similarity between states as the average Jensen-Shannon divergence ($D_{JS}$):

$$\theta(\psi, \psi') = \frac{1}{N} \sum_{i=1}^{N} D_{JS}\big(\phi_i(\psi_{-i}) \,\|\, \phi_i(\psi'_{-i})\big). \tag{2}$$

This is a true distance metric [5], and induces the (Riemannian) metric tensor [6,7] $g_{ij}(\psi)$, obtained from the second derivatives of $\theta(\psi, \psi')$ evaluated at $\psi' = \psi$.
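The LSM-distance described above (an average Jensen-Shannon divergence between predictor outputs for two states) can be sketched in a few lines. This is a minimal illustration, not the proposal's implementation: the predictor `mean_of_rest`, the function names, and the list-of-distributions state layout are hypothetical. The `root` flag is also an assumption, added because the square root of the Jensen-Shannon divergence is the form known to obey the triangle inequality; which variant the proposal intends is left open here.

```python
import math


def js_divergence(p, q, root=False):
    """Jensen-Shannon divergence in bits (0 for identical, 1 for disjoint support)."""
    m = [0.5 * (a + b) for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with zero probability contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    d = max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0)  # guard against round-off
    return math.sqrt(d) if root else d


def lsm_distance(predictors, state_a, state_b, root=False):
    """Average divergence between predictor outputs for two observed states.

    predictors[i] takes the list of all distributions except the i-th and
    returns a distribution over the i-th alphabet.
    """
    total = 0.0
    for i, predict in enumerate(predictors):
        rest_a = state_a[:i] + state_a[i + 1:]
        rest_b = state_b[:i] + state_b[i + 1:]
        total += js_divergence(predict(rest_a), predict(rest_b), root)
    return total / len(predictors)


def mean_of_rest(rest):
    """Hypothetical stand-in predictor: average the other variables' distributions."""
    return [sum(col) / len(rest) for col in zip(*rest)]


predictors = [mean_of_rest] * 3
agree = [[1.0, 0.0]] * 3        # three binary items, all observed as "agree"
disagree = [[0.0, 1.0]] * 3     # all observed as "disagree"
mixed = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]   # middle item unobserved

print(lsm_distance(predictors, agree, agree))      # identical states
print(lsm_distance(predictors, agree, disagree))   # maximally different states
print(lsm_distance(predictors, agree, mixed))
```

Missing responses need no special handling in this layout: an unobserved item is simply a non-degenerate distribution, as in the `mixed` state.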
Large Deviation Bound [8]: Sanov’s theorem [9,10] bounds the log of the likelihood ratio of transition $\Pr(x \to y)$ to persistence $\Pr(y \to y)$ between states $x, y$ via the scaled $\theta(x, y)$ [4]. Thus the LSM-distance is the rate function [8,11] of perturbation:

$$\left| \ln \frac{\Pr(x \to y)}{\Pr(y \to y)} \right| \leq \beta\, \theta(x, y), \quad \text{where } \beta > 0 \text{ is a constant proportional to } N^2. \tag{3}$$

The LSM-metric is “special” because of this large deviation bound, connecting “closeness” of samples to the odds of perturbing from one to the other, bridging geometry to dynamics.

[Fig. 3, panels a and b: (a) emergent clusters with LSM-distance, for I. Influenza C and II. GSS 2000; (b) local potential fields around two samples per application, e.g., C/Japan/OU-31/2014 (stable) and C/Miyagi/6/93 (unstable).]

Reflexivity and State Collapse on Observation: LSMs encapsulate reflexivity. If inferred cross-dependencies apply to individuals, then for a question with multiple response options (e.g., “strongly agree” to “strongly disagree”), an individual who has never been asked to respond to this question has their state $\psi_i$ initially as a probability distribution over possibilities. Once the question is posed and a response is given, the state collapses to a single outcome (Fig. 2), updating $\psi_i$ and altering the overall state $\psi$, akin to quantum measurement-induced state collapse [12]. This local update influences the predicted response to an unasked question $j$, because the inputs to $\phi_j$ are now different:

$$\text{query} \to \psi_i \;\Rightarrow\; \psi_i \neq \tilde{\psi}_i \;\Rightarrow\; \phi_j(\psi_{-j}) \neq \phi_j(\tilde{\psi}_{-j}). \tag{4}$$

Emergent Equations of Motion & Change: In physics, the difference between kinetic ($T$) and potential energy ($U$), or the Lagrangian [13,14] $L = T - U$, is often used to encode system dynamics. The Euler-Lagrange equations [15], applied to $L$, determine the system’s evolution via the principle of stationary action.
For LSMs, where states are probability distributions $\psi$, we can use the same principle to obtain governing digital twin dynamics. Our “kinetic” term captures velocity over the probability simplex using the projection operator $P$, and our “potential” term arises from the gap between the current state and the predictor output $\phi(\psi)$. With Einstein summation (over repeated indices), the LSM Lagrangian is given by:

$$L = \frac{1}{2} \sum_i g_{kl}\, P_{pk}\, \dot{\psi}_{ip}\, P_{nl}\, \dot{\psi}_{in} - U(\psi). \tag{5}$$

Then the standard Euler-Lagrange condition:

$$\frac{d}{dt} \frac{\partial L}{\partial \dot{\psi}_{im}} - \frac{\partial L}{\partial \psi_{im}} = 0 \tag{6}$$

yields the overdamped gradient-flow equation of motion (7) for $\dot{\psi}_{im}$, in which the inverse metric tensor $g^{km}$ and the projection $P$ act on the gradient of the $D_{JS}$-based potential, summed over the predictors $j$.

[Fig. 3: LSM modeling of viral evolution (column I) and opinion survey (column II). Panel a illustrates sample clusters induced by the intrinsic LSM-distance (Eq. (2)), which leads us to estimate local potential fields around individual samples (red dots in panel b, two samples shown for each application). We show one stable strain (top of a fitness peak), and one not. Panel c illustrates the conservation of complexity principle, which shows that ideally the compressed size of LSM models scales with the compressed size of the data (across diverse applications). Panel c axes: data compressed size [KB] versus LSM compressed size [KB], with regions marked “data deficient” and “data saturation”; series: ASD screening, Eurobarometer survey, gut microbiome, influenza (humans), influenza (animals), US social survey.]

This equation of motion governs local evolution of our ultra-high-dimensional, domain-agnostic complex systems without assuming first-principle laws, capturing dynamics through emergent divergence-based local potential fields (see Fig. 3); e.g., the local potential around viral strains provides a visualization of the fitness landscape [16] that drives change.
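To make the idea of an overdamped gradient flow on a divergence-based potential concrete, here is a deliberately simplified sketch. It is not Eq. (7): it assumes a single variable, a fixed `target` distribution standing in for a predictor output, a Euclidean metric in softmax (logit) coordinates in place of the learned metric tensor, and a finite-difference gradient. All names are hypothetical.

```python
import math


def softmax(z):
    """Map unconstrained logits to a strictly positive probability vector."""
    top = max(z)
    e = [math.exp(v - top) for v in z]
    s = sum(e)
    return [v / s for v in e]


def js_divergence(p, q):
    """Jensen-Shannon divergence in nats; inputs must be strictly positive."""
    m = [0.5 * (a + b) for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def potential(z, target):
    """Toy local potential: divergence between the state and a fixed target."""
    return js_divergence(softmax(z), target)


def numerical_gradient(z, target, eps=1e-6):
    """Central finite differences of the potential with respect to the logits."""
    grad = []
    for k in range(len(z)):
        up = z[:k] + [z[k] + eps] + z[k + 1:]
        down = z[:k] + [z[k] - eps] + z[k + 1:]
        grad.append((potential(up, target) - potential(down, target)) / (2 * eps))
    return grad


def relax(z, target, dt=0.5, steps=2000):
    """Euler steps of the overdamped flow dz/dt = -grad U(z)."""
    history = [potential(z, target)]
    for _ in range(steps):
        g = numerical_gradient(z, target)
        z = [v - dt * gk for v, gk in zip(z, g)]
        history.append(potential(z, target))
    return softmax(z), history


target = [0.7, 0.2, 0.1]   # hypothetical predictor output, held fixed
final, history = relax([0.0, 0.0, 0.0], target)
print(final, history[0], history[-1])
```

Working in logit coordinates keeps the state on the simplex without an explicit projection step, which is a convenience of this sketch rather than a feature of the proposal's formulation.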
We point out which parts of Eq. (7) are AI-inferred: the metric tensor $g$ is based on the inferred predictors $\phi_i$, and once these are inferred from data (e.g., as the LSM recursive forest), the rest of the analysis follows from standard approaches in dynamical systems theory. Some key analysis capabilities enabled by the LSM framework are discussed next.

Δ-sampling and Valid Perturbations of Observed Samples. The local potential field computed by Eq. (7) enables us to sample the neighborhood of any observed point in the ultra-high-dimensional state space, and to characterize the set of valid perturbations. We call this LSM-sampling scheme the Δ-sampling around a point. The structure of the Δ-sampling neighborhood reveals local stability and the potential “direction” of change. For example, Fig. 3b shows one viral strain that is locally “stable”, and one with a tendency to mutate.

Validation Principle and Model Drift. The large deviation bound in Eq. (3) may be used to evaluate model fit. Eq. (3) specifies the expected distribution of distances of perturbed samples in the Δ-sampling neighborhood (from the observed sample), and deviations from the exponential decay stipulated by the bound may be quantified as a degradation of model fit. This is a general principle that can track model fit as samples and system behavior evolve and deviate over time, necessitating model re-calibration/retraining.

Estimating Data Sufficiency via Conservation of Complexity. How do we know if we have enough data for reliable LSM inference? To address this, we introduce a principle we call conservation of complexity (CoC). The core idea: if additional data no longer reveals new structural dependencies, then the model has saturated its descriptive capacity (Fig. 3c demonstrates this across many domains); otherwise more data will refine the model. This principle (no cheating on complexity) applies to any generative model as a general yardstick of data sufficiency.
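A minimal sketch of the saturation check behind this principle follows, using compressed size as a rough proxy for complexity. Everything here is an assumption made for illustration: `zlib` stands in for whatever compressor is actually used, `toy_model` (the set of distinct response vectors) stands in for an inferred LSM, and the 1% `tolerance` is arbitrary.

```python
import zlib


def compressed_bytes(text):
    """Compressed length in bytes, a rough proxy for descriptive complexity."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def toy_model(samples):
    """Hypothetical stand-in for an inferred LSM: the distinct response vectors."""
    return "\n".join(sorted(set(samples)))


def complexity_curve(samples, sizes):
    """Return (compressed data size, compressed model size) for growing subsets."""
    curve = []
    for n in sizes:
        subset = samples[:n]
        curve.append((compressed_bytes("\n".join(subset)),
                      compressed_bytes(toy_model(subset))))
    return curve


def has_saturated(curve, tolerance=0.01):
    """True when the last step grew the model by at most `tolerance` (relative)."""
    (_, previous), (_, latest) = curve[-2], curve[-1]
    return (latest - previous) <= tolerance * previous


# Each sample is a response vector encoded as a finite string, one letter per item.
patterns = ["AABBA", "ABABA", "BBAAB", "BABAB"]
samples = [patterns[i % 4] for i in range(1000)]
curve = complexity_curve(samples, sizes=[2, 4, 100, 1000])
print(curve, has_saturated(curve))
```

In this toy data no new structure appears after the first four samples, so the model size stops changing while the dataset keeps growing, which is the saturation signature the principle looks for.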
Operationally, each sample (response vector) is encoded as a finite string, and the total dataset as a concatenation of such strings. We then use data compression as a proxy for Kolmogorov complexity to estimate whether the size of the inferred LSM scales proportionally with the compressed size of the dataset. If this scaling saturates (i.e., the model no longer grows meaningfully with added data), then the digital twin has reached representational sufficiency. A second test evaluates the model’s generative quality. If the LSM can produce synthetic belief states that are statistically indistinguishable (in compressed form) from true observed states, then we consider the model faithful. This test is executed by generating perturbed copies of observed individuals and measuring how well they compress jointly with the originals. Together, these two tests, one checking whether the model complexity tracks the data and the other whether generated samples match the real data structure, offer an executable path to assess when our LSM is valid and complete.

Research Plan and Quantitative Validation: Data Sources. We will use the General Social Survey (GSS) [17], Eurobarometer [18], and National Longitudinal Surveys (NLS) [19], comprising de-identified individual-level responses to opinion, belief, and demographic questions with longitudinal depth. Together these data sources represent over 100K individuals over half a century on socio-political opinions that have defined modern Western society. No PII/CAI will be generated/used. Tasks 1-8 will be complemented by a final report and open-source software.

Task 1: Data Harmonization (0.5–1M). Standardize variable formats, align response types across surveys if necessary, and procure licenses/regulatory authorizations.
Task 2: LSM Inference (2-3M). Learn LSMs using conditional inference trees (CITs) for GSS years and other data sets, and also for relevant subgroups.
Task 3: Lagrangian Validation (2-3M). Compute local potential fields and verify that forward dynamics from the Lagrangian reproduce observed macro-level belief drift.
Task 4: Masked Imputation (1-2M). Mask a subset (progressively 10%, 20%, 50% & 80%) of responses and test reconstruction ability (expected recovery success > 80%); evaluate prediction accuracy and belief-space consistency.
Task 5: Emergent Structure (1M). Analyze potential landscapes for attractors, polarization & drift. Use Δ-sampling to probe rigidity vs. fluidity in belief evolution.
Task 6: Model Drift Detection (1M). Detect and quantify deviations from expected model behavior to guide retraining.
Task 7: Test Data Sufficiency (1M). Apply CoC using restricted samples to show how model quality stabilizes when our CoC conditions are satisfied.
Task 8: LSM Analysis for Social Theory Questions (3M). Demonstrate local potential-based mechanisms by which perturbations (e.g., propaganda) amplify polarization, explaining why filter-bubble interventions can backfire as observed in Bail et al.’s social experiments [20]. Validate the simulation testbed’s ability to adjudicate competing social theories.

Quantitative benchmarks: Accuracy of masked prediction, fidelity of forward simulation, structure emergence, and data sufficiency curves (Fig. 3c). We will benchmark our predictions and masking reconstruction performance against standard ML methods that do not infer the underlying foundational structure. No human subject research is planned. We will explore methods for responsible/ethical future model deployment, including potential misuse in influence operations.

3. Analysis of the current state of the art

Models of belief systems have traditionally focused on latent variable estimation (e.g., item response theory [21], structural equation models, or symbolic regression [22,23]), effective in low-dimensional, structured settings but limited under sparsity, noise, or reactive dynamics.
Predictive models, including deep nets and LLMs, offer statistical power but lack mechanism, fail under drift, and cannot validate internal constructs. Theories of belief formation (e.g., Festinger’s cognitive dissonance [24]) and theory-of-mind [25,26] motivate modeling belief as dynamic and reflexive, yet existing tools remain static and are largely not validated quantitatively. In broader complex system modeling, current approaches (e.g., agent-based models, network simulations) impose hand-crafted rules or coarse dynamics and fail to scale to ultra-high-dimensional systems with emergent feedback and regime shifts, where the “equations of motion” are a priori unknown.

4. Proposer Qualifications

The project will be led by Dr. Ishanu Chattopadhyay, Assistant Professor of Biomedical Informatics and Computer Science at the University of Kentucky. Dr. Chattopadhyay is uniquely positioned to execute this project, having pioneered several high-impact inference algorithms, and having served as PI on four previous DARPA grants, with extensive experience leading interdisciplinary efforts at the interface of AI, mathematical control theory, and cognitive science.

5. Estimated Cost & FTE Calculation

The total estimated cost of the proposed effort is $300,000 (12M period), based on academic budgeting practices at the University of Kentucky, with costs represented as fully burdened salaries, inclusive of fringe and overhead. The effort is scoped for a 1.0 FTE commitment, primarily allocated to the PI and a postdoctoral researcher, focusing solely on fully burdened labor costs.

6. Bibliography

[1] Hothorn, T., Hornik, K. & Zeileis, A. ctree: Conditional inference trees. The Comprehensive R Archive Network 8, 1–34 (2015).
[2] Hothorn, T., Hornik, K. & Zeileis, A. Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics 15, 651–674 (2006).
[3] Timofeev, R. Classification and regression trees (CART) theory and applications. Humboldt University, Berlin 54, 48 (2004).
[4] Sizemore, N. et al. A digital twin of the infant microbiome to predict neurodevelopmental deficits. Science Advances 10, eadj0400 (2024).
[5] Bryant, V. Metric Spaces: Iteration and Application (Cambridge University Press, 1985).
[6] Chavel, I. Riemannian Geometry: A Modern Introduction (Cambridge University Press, Cambridge, UK, 2006).
[7] Wald, R. M. General Relativity (University of Chicago Press, Chicago, IL, 1984).
[8] Varadhan, S. S. Asymptotic probabilities and differential equations. Communications on Pure and Applied Mathematics 19, 261–286 (1966).
[9] Sanov, I. N. On the probability of large deviations of random variables (United States Air Force, Office of Scientific Research, 1958).
[10] Cover, T. M. & Thomas, J. A. Elements of Information Theory (Wiley-Interscience, Hoboken, NJ, 2006), 2nd edn.
[11] Dembo, A. & Zeitouni, O. Large deviations techniques and applications, vol. 38 (Springer Science & Business Media, 2009).
[12] Leggett, A. J. The quantum measurement problem. Science 307, 871–872 (2005).
[13] Goldstein, H., Poole, C. P. & Safko, J. L. Classical Mechanics (Addison Wesley, San Francisco, CA, 2002), 3rd edn.
[14] Arnold, V. I. Mathematical Methods of Classical Mechanics (Springer, New York, NY, 1989).
[15] Troutman, J. L. The Euler-Lagrange Equations, 145–193 (Springer New York, New York, NY, 1996). URL https://doi.org/10.1007/978-1-4612-0737-5_7.
[16] Stadler, P. F. Fitness landscapes. In Biological evolution and statistical physics, 183–204 (Springer, 2002).
[17] Davern, M., Bautista, R., Freese, J., Herd, P. & Morgan, S. L. General Social Survey 1972-2024 (2024). URL https://gssdataexplorer.norc.org/gsscite.
[18] European Commission. Eurobarometer 94.3 (2021) (2023). URL https://search.gesis.org/research_data/ZA7780. ZA7780 Data file Version 2.0.0.
[19] U.S. Bureau of Labor Statistics. National Longitudinal Survey of Youth 1979 (NLSY79) (2022). URL https://www.nlsinfo.org/content/cohorts/nlsy79.
[20] Bail, C. A. et al. Exposure to opposing views on social media can increase political polarization. Proceedings of the National Academy of Sciences 115, 9216–9221 (2018).
[21] Bock, R. D. & Gibbons, R. D. Item response theory (John Wiley & Sons, 2021).
[22] Ullman, J. B. & Bentler, P. M. Structural equation modeling. Handbook of Psychology, Second Edition 2 (2012).
[23] Schmidt, M. & Lipson, H. Symbolic regression of implicit equations. In Genetic programming theory and practice VII, 73–85 (Springer, 2009).
[24] Festinger, L. Cognitive dissonance. New York (1959).
[25] Premack, D. & Woodruff, G. Does the chimpanzee have a theory of mind? Behavioral and Brain Sciences 1, 515–526 (1978).
[26] Baker, C. L., Saxe, R. & Tenenbaum, J. B. Rational quantitative attribution of beliefs, desires and percepts in human mentalizing. Nature Human Behaviour 1, 1–10 (2017).

Abstract Summary Slide

Large Science Models: Foundation Models for Generalizable Insights Into Complex Systems with Psycho-social Application
Advanced Research Concepts (ARC), DARPA-EA-25-02-05
University of Kentucky; Technical POC: Ishanu Chattopadhyay, PhD

PROPOSED IDEA
We propose Large Science Models (LSMs) as a framework for foundation models of complex systems with hundreds to thousands of evolving variables with a priori unknown cross-talk, where no governing equations are known a priori and systems adapt in response to observation.
  • We learn system geometry from data, then derive emergent equations of motion using variational principles.
  • LSMs explicitly model reflexivity, enabling simulation of human systems such as opinion maturation dynamics.
  • LSMs support inference, forward simulation, and controlled perturbation, enabling predictive, interpretable, and adaptive digital twins of societal systems.

TECHNICAL APPROACH
  • System state is modeled as a tensor product of probability distributions.
  • Use conditional inference trees to learn predictors and recursive cross-talk.
  • Emergent distance metric bridges geometry to dynamics via the large deviation rate function.
  • A Lagrangian is defined, inducing equations of motion via the Euler-Lagrange equations.
These steps allow probing local potential fields and recovering/tracking dynamical structure without prior assumptions.

APPLICATION
  • Model belief/opinion dynamics using large-scale, multinational surveys: GSS (US), Eurobarometer (Europe), and NLS (US), spanning 100K+ individuals over 50+ years.
  • Belief state reconstruction, simulation of information shocks (e.g., propaganda), and detection of emergent polarization and resilience patterns.
  • Data sufficiency tests with complexity conservation (CoC).

Validation & Benchmarking
  • Imputation: >80% recovery under 10–80% masking
  • Simulation: Reproduces observed belief drift
  • Emergence: Detects stable polarization clusters
  • Drift: Deviations from large deviation-predicted decay
  • CoC Test: Model saturation tracks data complexity
  • Benchmarking: Against SOTA ML

TECHNICAL ABILITY
PI Ishanu Chattopadhyay is an Assistant Professor of Biomedical Informatics and Computer Science at the University of Kentucky.
  • Extensive expertise and publications in stochastic processes, large-scale data analysis, and probabilistic modeling.
  • PI has developed several high-impact inference algorithms for predictive analytics, including the data smashing algorithm and event-level prediction for urban crime.
  • He has served as PI on three previous DARPA grants (PMs: James Gimlett, Wade Shen) and was the recipient of the Young Faculty Award 2020, advised by the current director of the DSO (Bart Russell).
Status: Not started
Start date/End date: 1/3/26 to 1/2/27

Funding

  • Defense Advanced Research Projects Agency: US$257,520.00
