From Noisy Data to Better Models: The Case for Weak-Form Scientific Machine Learning

Scientific machine learning is a rapidly growing field that blends machine learning with scientific knowledge, often incorporating governing equations, to solve complex problems in science and engineering. Yet real-world data rarely behaves perfectly. Experimental noise, sparse measurements, incomplete observations, and irregular dynamics can make it difficult not only to discover governing equations, but also to determine what kinds of models the available data can realistically support.

That challenge sits at the center of the upcoming SIAM course From Data to Equations: Weak Form Methods for Discovering Models from Noisy Data, led by David Bortz (University of Colorado Boulder) and Daniel Messenger (Los Alamos National Laboratory). The course explores how weak-form approaches can help researchers build models that remain interpretable, computationally practical, and robust to imperfect real-world data. Learn more about the course and register here.

At its core, this course explores a question that is becoming increasingly important across scientific machine learning: “for the data you actually have, at the scale and resolution it actually has, what model class is appropriate to be fitting in the first place?”

“Classical modeling from first principles works when first principles reach the scale you can observe,” instructors Dr. Bortz and Dr. Messenger shared. “In a lot of modern application areas, they don’t or they don’t exist yet.”

Part of what sets weak-form scientific machine learning (WSciML) methods apart is the underlying mathematical approach itself. “Weak-form methods evaluate the differential equation against compactly supported test functions, which moves derivatives off the data,” the instructors explained. “That’s the headline property.”
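As a sketch of why this works, assume for simplicity a scalar ODE x' = f(x) observed on a time interval (the same idea carries over to PDEs), and a smooth test function φ supported on [a, b] inside that interval, so that φ(a) = φ(b) = 0. Integrating by parts moves the time derivative from the data onto φ:

```latex
\int_a^b \varphi(t)\,\dot{x}(t)\,dt
  = \big[\varphi(t)\,x(t)\big]_a^b - \int_a^b \dot{\varphi}(t)\,x(t)\,dt
  = -\int_a^b \dot{\varphi}(t)\,x(t)\,dt,
\qquad\text{so}\qquad
-\int_a^b \dot{\varphi}(t)\,x(t)\,dt = \int_a^b \varphi(t)\,f\big(x(t)\big)\,dt .
```

Only the measured x(t) and the exactly known derivative of φ appear on the left, so noisy measurements are integrated rather than differentiated. If f is written as a combination of candidate terms, f(x) = Σ_j w_j θ_j(x), each test function contributes one equation that is linear in the unknown weights w_j.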

Just as important, however, is the practical workflow that emerges from that formulation. According to the instructors, “weak formulation reduces equation discovery to a (sparse) linear regression for a wide class of problems.”
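The reduction to regression can be sketched in a few lines of NumPy. This is a minimal illustration in the spirit of WSINDy, not the instructors' implementation: the polynomial bump test function, the noisy harmonic-oscillator toy data, the five-term library, and the helper names `bump`, `weak_system`, and `stlsq` are all hypothetical choices made for this example, and the sparsity step uses sequentially thresholded least squares (STLSQ).

```python
import numpy as np

def bump(t, center, half_width, p=4):
    """Hypothetical test function phi(t) = (1 - s^2)^p, s = (t - center)/half_width,
    and its time derivative; both are zero outside [center - half_width, center + half_width]."""
    s = (t - center) / half_width
    inside = np.abs(s) < 1.0
    phi = np.where(inside, (1.0 - s**2) ** p, 0.0)
    dphi = np.where(inside, -2.0 * p * s * (1.0 - s**2) ** (p - 1) / half_width, 0.0)
    return phi, dphi

def weak_system(t, U, theta, centers, half_width):
    """Build G[k, j] = int phi_k * theta_j(u) dt and B[k, i] = -int phi_k' * u_i dt."""
    dt = t[1] - t[0]
    Theta = theta(U)
    G, B = [], []
    for c in centers:
        phi, dphi = bump(t, c, half_width)
        # phi_k vanishes at both ends of the time grid, so this plain sum
        # coincides with the trapezoid rule.
        G.append(dt * phi @ Theta)
        B.append(-dt * dphi @ U)  # integration by parts: U is never differentiated
    return np.array(G), np.array(B)

def stlsq(G, b, threshold=0.1, iters=10):
    """Sequentially thresholded least squares for G w ~ b."""
    w = np.linalg.lstsq(G, b, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(w) < threshold
        w[small] = 0.0
        if (~small).any():
            w[~small] = np.linalg.lstsq(G[:, ~small], b, rcond=None)[0]
    return w

# Assumed toy data: harmonic oscillator x' = y, y' = -x, with 5% additive noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 12.0, 1201)
U = np.column_stack([np.cos(t), -np.sin(t)]) + 0.05 * rng.standard_normal((t.size, 2))

# Candidate library of monomials up to degree 2. The constant term is left out
# because on this circular orbit 1 = x^2 + y^2, which would make it unidentifiable.
names = ["x", "y", "x^2", "x*y", "y^2"]

def theta(U):
    x, y = U[:, 0], U[:, 1]
    return np.column_stack([x, y, x**2, x * y, y**2])

centers = np.arange(1.0, 11.0 + 1e-9, 0.25)  # supports [c - 1, c + 1] lie inside [0, 12]
G, B = weak_system(t, U, theta, centers, half_width=1.0)
W = np.column_stack([stlsq(G, B[:, i]) for i in range(B.shape[1])])

for i, lhs in enumerate(["x'", "y'"]):
    terms = [f"{W[j, i]:+.3f} {names[j]}" for j in range(len(names)) if W[j, i] != 0.0]
    print(lhs, "=", " ".join(terms))
```

Run as is, the sketch should recover approximately x' = y and y' = -x, with the three quadratic terms pruned. The test-function width, spacing, and threshold are fixed by hand here; varying the noise level or `half_width` is a quick way to see where the fit starts to degrade.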

Real scientific data rarely resembles the clean, idealized structures that show up in theory. It’s closer to a crowded junk drawer than a carefully raked Zen garden: full of irregularities, missing pieces, and structure that only becomes visible at the right resolution. Classical ordinary and partial differential equation frameworks typically assume “the data is a smooth trajectory of a smooth vector field,” but in practice that assumption often breaks down at the scales where measurements are taken.

“For a lot of real measurements — biological assays, sparse satellite data, plasma diagnostics — that assumption isn’t accurate at the scale you can observe, even if something like it is true at a different scale you can’t.”
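One concrete place the smooth-trajectory assumption bites is any method that needs pointwise derivatives of the data. The sketch below, using hypothetical samples of sin(t) with 1% additive noise and an assumed polynomial bump test function φ of half-width 0.5, compares a finite-difference derivative with the weak-form estimate of ∫φ x' dt computed as -∫φ' x dt. Both errors are reported relative to the size of the quantity being estimated.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 2001)
dt = t[1] - t[0]
x = np.sin(t) + 0.01 * rng.standard_normal(t.size)  # noisy samples; true x'(t) = cos(t)

def rel_rms(err, ref):
    """RMS error relative to the RMS of the reference quantity."""
    return np.sqrt(np.mean(err**2)) / np.sqrt(np.mean(ref**2))

# Strong form: pointwise derivative by central differences (np.gradient).
dx_fd = np.gradient(x, dt)
fd_rel = rel_rms(dx_fd - np.cos(t), np.cos(t))

# Weak form: for each bump phi supported on [c - h, c + h], estimate
# int phi * x' dt as -int phi' * x dt, which never differentiates the data.
h = 0.5
weak_est, weak_true = [], []
for c in np.arange(h, t[-1] - h, 0.1):
    s = (t - c) / h
    inside = np.abs(s) < 1.0
    phi = np.where(inside, (1.0 - s**2) ** 4, 0.0)
    dphi = np.where(inside, -8.0 * s * (1.0 - s**2) ** 3 / h, 0.0)
    weak_est.append(-dt * np.sum(dphi * x))
    weak_true.append(dt * np.sum(phi * np.cos(t)))  # same quantity, using the exact x'
weak_est, weak_true = np.array(weak_est), np.array(weak_true)
weak_rel = rel_rms(weak_est - weak_true, weak_true)

print(f"relative error, finite differences: {fd_rel:.3f}")
print(f"relative error, weak form:          {weak_rel:.4f}")
```

With this grid spacing, differencing amplifies the 1% noise into an error larger than the derivative itself, while the integrated weak-form quantity stays accurate to well under a percent.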

Instead of asking “what equation does this data satisfy?”, the instructors suggest asking, “what’s the smallest model class consistent with this data at this resolution?”

On the practical side, participants will leave From Data to Equations able to take a noisy dataset, build a candidate function library, run methods such as Weak-form Sparse Identification of Nonlinear Dynamics (WSINDy) or Weak-form Estimation of Nonlinear Dynamics (WENDy), and interpret the results with a more critical eye.

By the end of the course, participants should find themselves asking questions like these of their own data:

• What model complexity can this dataset support?
• Am I seeing real dynamics or artifacts of my preprocessing?
• If I dropped half of my samples, would my conclusion change — and what does that tell me about identifiability?
• Is the residual telling me my model is wrong, or is my noise model wrong?
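The first and third questions can get a rough numerical answer before any fitting happens: evaluate the candidate library on the data you actually have and check its numerical rank, with and without half of the samples. This is a sketch under our own assumptions rather than a procedure from the course; the toy circular orbit, the six-term library, the helper `numerical_rank`, and the tolerance `rel_tol = 0.05` are all hypothetical.

```python
import numpy as np

def numerical_rank(Theta, rel_tol=0.05):
    """Count singular values above rel_tol * largest, after scaling each
    column to unit norm so that no single term dominates by magnitude."""
    Theta = Theta / np.linalg.norm(Theta, axis=0)
    s = np.linalg.svd(Theta, compute_uv=False)  # sorted in descending order
    return int(np.sum(s > rel_tol * s[0]))

# Hypothetical noisy observations of a circular orbit (1% additive noise).
rng = np.random.default_rng(1)
t = np.linspace(0.0, 12.0, 1201)
x = np.cos(t) + 0.01 * rng.standard_normal(t.size)
y = -np.sin(t) + 0.01 * rng.standard_normal(t.size)

# Six candidate terms. On this orbit 1 = x^2 + y^2, so the data can only
# pin down five independent combinations of them.
names = ["1", "x", "y", "x^2", "x*y", "y^2"]
Theta = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

rank_all = numerical_rank(Theta)
keep = rng.choice(t.size, size=t.size // 2, replace=False)  # drop a random half
rank_half = numerical_rank(Theta[keep])
rank_no_const = numerical_rank(Theta[:, 1:])

print(f"{len(names)} candidate terms; numerical rank {rank_all} "
      f"(half the samples: {rank_half}; without the constant: {rank_no_const})")
```

A rank of 5 for six candidate terms says this dataset cannot separate the constant from x² + y², however the regression is regularized; that limit comes from the data, not from the fitting method. Dropping the constant restores full rank.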

As both instructors noted, these questions are not unique to WSciML methods, but the weak-form workflow provides a particularly direct way to investigate them in practice.

In their view, “What’s the equation?” is better framed as a hierarchy of questions rather than a single one. In real systems, researchers often need several levels of description at once: a fine-scale model, often expensive to simulate and only partially known; a coarser model fitted to observations; and sometimes several intermediate representations in between, each useful for different decisions and predictions. Rather than searching for a single “true” equation, the goal becomes building a structured set of models appropriate to different scales of use.

The applications currently keeping the instructors busiest reflect this, particularly control and forecasting problems where running a full-scale simulation inside a decision loop is not feasible. In that setting, “scale-aware model discovery — picking the right model class for the right decision — feels like a piece of computational science that should grow over the next decade.”

The course From Data to Equations: Weak Form Methods for Discovering Models from Noisy Data is being held as a one-day, in-person workshop on July 5 from 8:30 a.m. – 4:30 p.m., the day before the 2026 SIAM Annual Meeting and the SIAM Conference on the Life Sciences, at the Huntington Convention Center of Cleveland, Ohio.

Register now to save your spot!

Dr. David Bortz and Dr. Daniel Messenger contributed to the SIAM News article “The Weak Form is Stronger Than You Think.”

Related Reading

From Data to Equations: Weak Form Methods for Discovering Models from Noisy Data

2026 SIAM Annual Meeting (AN26)

The Weak Form is Stronger Than You Think
