Source - Lind et al Baseline Methods (2023)

“Baseline methods in the context of modern distributed flexibility: an evaluation considering multi-DER types, markets, and product characteristics”, Lind et al. (2023), Utilities Policy. A systematic evaluation of nine baseline methodologies for DER flexibility market participation, with a proposed decision framework for method selection.

Document metadata

Field	Value
Authors	Leandro Lind, José P. Chaves, Orlando Valarezo, Anibal Sanjab, Luis Olmos
Institutions	IIT-ICAI School of Engineering, Universidad Pontificia Comillas (Madrid); VITO/EnergyVille (Belgium)
Published	Utilities Policy, 2023
DOI	10.1016/j.jup.2023.101688
Funding	CoordiNet project (Horizon 2020, No. 824414); BeFlexible project (No. 101075438)
Type	Peer-reviewed academic article (preprint available)

Summary

The problem addressed: when a DER is activated for explicit flexibility, how do you determine how much flexibility was actually delivered? For large scheduled generators, the answer is easy — compare metered output to the committed schedule. For DERs, no individual schedule exists, so a counterfactual “what would this resource have done without activation?” must be estimated. This counterfactual is the baseline.

The paper evaluates nine established methods against three criteria (accuracy, simplicity, integrity) across four DER types (Load-DR, Controllable DG, Non-controllable DG, Energy Storage Systems), multi-DER aggregation, and two product dimensions (direction: up/down; timing: real-time to weeks ahead). The central finding is that no one-size-fits-all baseline method exists.

Nine baseline methods

Method	How it works	Best for	Key weakness
XofY	Average of X highest/mid/lowest days from the last Y eligible days	Load-DR (upward)	Upward bias (HighXofY); fails for weather-dependent DG/ESS
Rolling average	Average of last X same-type days (weekday/weekend), recency-weighted	Load-DR	Same as XofY; doesn’t capture DG/ESS variability
Comparable day	FSP selects an ex-post non-activation reference day	Non-controllable DG	Low integrity; FSP chooses own baseline
Regression	Statistical model (consumption = f(weather, season, past data))	Load-DR, PV/wind with weather data	Complex; high simplicity cost
Machine learning	Neural network / ML techniques	Non-controllable DG, Load-DR	Very low simplicity; black-box risk
MBMA	Meter reading immediately before activation = baseline	Balancing services (short-duration)	Integrity risk for ESS (see below); inaccurate for long activations
Zero baseline	Baseline = 0; all production during activation = flexibility delivered	Backup generators, batteries providing upward production flexibility	Fails for consumption-side DR
Control group	Average of similar non-activating customers during activation	Multi-DER aggregation	Requires a valid comparison group; low integrity
Capacity limitation	Product defined as a power cap; no energy-delta baseline needed	DSO congestion management	Requires different clearing algorithms; mostly limited to upward products
Self-reported	FSP reports its own baseline	Large industrial FSPs	Low integrity without verification
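
The two historical methods at the top of the table reduce to simple averaging. The sketch below is illustrative only. The function names, the use of one kW value per eligible day (for the same hour), and the linear recency weights are assumptions made for the example, not definitions taken from the paper.

```python
def high_x_of_y(daily_kw, x, y):
    """HighXofY: mean of the x highest values among the last y eligible days.

    daily_kw holds one reading per eligible day for the same hour, oldest first.
    """
    if not 1 <= x <= y <= len(daily_kw):
        raise ValueError("need 1 <= x <= y <= number of eligible days")
    window = daily_kw[-y:]
    top = sorted(window, reverse=True)[:x]
    return sum(top) / x


def rolling_average(daily_kw, x, weights=None):
    """Mean of the last x same-type days, optionally recency-weighted.

    weights are ordered oldest to newest; equal weights are used if omitted.
    """
    if not 1 <= x <= len(daily_kw):
        raise ValueError("need 1 <= x <= number of eligible days")
    window = daily_kw[-x:]
    if weights is None:
        weights = [1.0] * x
    if len(weights) != x:
        raise ValueError("need exactly one weight per day in the window")
    return sum(w * v for w, v in zip(weights, window)) / sum(weights)


# Hypothetical readings (kW) for the same hour on five eligible days.
history = [10.0, 12.0, 8.0, 14.0, 10.0]
print(high_x_of_y(history, x=3, y=5))
print(rolling_average(history, x=3, weights=[1, 2, 3]))
```

Because HighXofY keeps only the highest days, its result can never be below the plain average of the same window, which is the upward bias noted in the table.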

Key analytical findings

MBMA integrity risk for batteries

MBMA (Meter-Before-Meter-After) reads the meter immediately before activation and uses that reading as the baseline. For batteries that provide upward flexibility via increased production (injection), this creates a manipulation opportunity: the operator could charge the battery just before the pre-activation reading, so that the baseline is recorded as negative (net consumption), and then switch to discharging for the activation. The measured swing from charging to discharging then appears larger than the flexibility actually delivered. This “gaming” inflates the measured flexibility.

The recommended method for batteries providing production-side flexibility is zero baseline: any injection during activation counts as delivered flexibility, so the pre-activation reading cannot be manipulated. SWITCH uses this approach for batteries providing increased production. (Source - SWITCH User Documentation (2026))
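
A small numerical sketch makes the difference between the two baselines concrete. All figures are hypothetical, and the sign convention (injection positive, charging negative) is an assumption made for the example.

```python
def delivered_flexibility(metered_kw, baseline_kw):
    """Upward flexibility credited = metered injection minus baseline."""
    return metered_kw - baseline_kw


# Hypothetical battery injecting 50 kW during the activation.
injection_during_activation = 50.0

# MBMA: the baseline is whatever the meter showed just before activation.
mbma_idle = delivered_flexibility(injection_during_activation, 0.0)
# Same activation, but the battery was charging at 40 kW when the
# pre-activation reading was taken, so the baseline is -40 kW.
mbma_charging = delivered_flexibility(injection_during_activation, -40.0)

# Zero baseline: the pre-activation reading plays no role.
zero_baseline = delivered_flexibility(injection_during_activation, 0.0)

print(mbma_idle, mbma_charging, zero_baseline)
```

Under MBMA the credited volume depends on the battery's state at the moment of the reading, whereas under a zero baseline it depends only on what is injected during activation.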

Capacity limitation products and the baseline question

Capacity limitation products (where the DSO sets a power cap and the FSP must stay below it) appear to eliminate the baseline problem: the product is defined by the cap, not by an energy delta. However, the authors note that even capacity-cleared products often still require energy delivery validation post-activation — whether the FSP delivered the agreed energy volume. The Swedish context confirms this: SWITCH’s TO (Tillgänglighetsordrar) and DO (Direktordrar) products use capacity logic for market clearing (MW-based bids), but FSPs are still expected to deliver the awarded energy volume, requiring a baseline for validation. Capacity clearing and energy settlement are separable steps.

Harmonisation across sequential markets

When a DSO local flexibility market (LFM) and a TSO balancing market operate sequentially and draw on the same portfolio, differing baseline methods create distortions. An FSP delivering upward activations in both markets may face conflicting incentives if, for example, the TSO uses MBMA while the DSO uses XofY. The paper recommends harmonising baselines across interacting markets, which is relevant to future TSO-DSO coordination as NC DR matures. (Network Code on Demand Response)

Market timing

Real-time / balancing services: MBMA is the international standard (used in FCR, mFRR); no time for ex-ante calculation.
Day-ahead cleared products: XofY or rolling average, but the data used for the baseline must exclude the hours between gate closure time (GCT) and activation to prevent gaming.
Long-term contracted products (ST/LongFlex): ex-ante calculation methods; regression or ML feasible.
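
For day-ahead products, excluding the hours between gate closure and activation amounts to a filter on the readings that feed the baseline. The sketch below assumes readings arrive as (timestamp, kW) pairs. The function name and the sample data are hypothetical.

```python
from datetime import datetime


def baseline_inputs(readings, gate_closure):
    """Keep only readings taken strictly before gate closure.

    Readings between gate closure and activation are dropped, because the
    FSP already knows it has been cleared and could shape them.
    """
    return [(ts, kw) for ts, kw in readings if ts < gate_closure]


# Hypothetical meter readings around a 12:00 gate closure.
readings = [
    (datetime(2023, 1, 10, 9), 20.0),
    (datetime(2023, 1, 10, 11), 21.0),
    (datetime(2023, 1, 10, 13), 35.0),  # taken after gate closure
]
gate_closure = datetime(2023, 1, 10, 12)

usable = baseline_inputs(readings, gate_closure)
print([kw for _, kw in usable])
```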

DER-type matrix (summary)

Load-DR: historical methods (XofY, rolling average) are adequate — medium accuracy, high simplicity.
Non-controllable DG (wind, solar): regression or ML needed for accuracy (weather-driven output); XofY only works with same-day adjustment.
Controllable DG (backup generators): zero baseline is the most accurate (baseline IS zero when idle).
ESS: MBMA for consumption-side; zero baseline for production-side flexibility.
Multi-DER aggregation: no single method covers mixed portfolios well; submetering per technology type is the most accurate but costly; comparable day or control group are pragmatic alternatives.
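
The summary above can be written as a simple lookup, which is convenient when a note or tool needs to reference it programmatically. This only transcribes the list. The key names and the function are hypothetical, and the paper's actual decision framework also weighs market and product characteristics that are not modelled here.

```python
# Transcription of the DER-type summary; keys are illustrative labels.
DER_SUMMARY = {
    "load_dr": ["XofY", "rolling average"],
    "non_controllable_dg": ["regression", "machine learning"],
    "controllable_dg": ["zero baseline"],
    "ess_consumption": ["MBMA"],
    "ess_production": ["zero baseline"],
    "multi_der": ["submetering per technology", "comparable day", "control group"],
}


def suggest_baselines(der_type):
    """Return the baseline methods listed in the summary for a DER type."""
    try:
        return DER_SUMMARY[der_type]
    except KeyError:
        raise ValueError(f"unknown DER type: {der_type!r}") from None


print(suggest_baselines("ess_production"))
print(suggest_baselines("load_dr"))
```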

Connections to Swedish context

SWITCH MBMA — the default automatic baseline in SWITCH for consumption-side resources. The paper confirms this is the correct approach for short-duration balancing-type products, though accuracy degrades for activations longer than 1–2 hours.
SWITCH zero baseline (noll-referens) — used for battery resources providing production-side upward flexibility. The paper endorses this approach on both accuracy and integrity grounds.
NODES rolling average — sthlmflex used a 5-day rolling average as the standard NODES baseline (Source - sthlmflex säsong 3 (2022-2023)). The paper categorises this as a rolling average variant with medium accuracy and medium integrity for Load-DR.
NC DR Art. 43–44 settlement — future flexibility markets under NC DR will require standardised baseline methods. This paper provides the academic grounding for what ACER/Ei should require.
BeFlexible project — this paper was produced under the same BeFlexible project that funds SWITCH market demonstrations. The academic framework and the operational SWITCH design are therefore closely related.

Relevance to wiki topics

Topic	Relevance
Baseline Methods	Primary source for the concept page
SWITCH	MBMA and zero-baseline methods explained; capacity/energy distinction clarified
NODES	NODES uses rolling average in sthlmflex; no MBMA
Aggregation	Multi-DER baseline challenge is the core aggregation settlement problem
Energy Storage	Battery-specific baseline recommendations; MBMA integrity risk
Flexibility Market	Baseline design is central to market settlement and FSP participation
Network Code on Demand Response	Future regulation will standardise baseline methods; NC DR Art. 43–44
CoordiNet	Paper funded by CoordiNet; SWITCH’s MBMA baseline traces to CoordiNet design