Baseline Methods

A baseline (referensplan) is the estimated level of electricity consumption or production a flexibility service provider (FSP) would have had at a given time, absent any flexibility activation. It is the counterfactual against which delivered flexibility is measured. The baseline problem is central to Flexibility Market settlement: without a credible baseline, it is impossible to verify how much flexibility was actually delivered, or to pay FSPs correctly.

The baseline is, in effect, a forecast with an adversary — which is why “good” forecasting for flex markets is defined by manipulation-resistance and recalculability, not just accuracy. See STLF for Flexibility Markets — What Counts as Good and How to Achieve It.

Why baselines are hard for DERs

In traditional balancing markets, large generators commit to a schedule in the day-ahead market; the schedule serves as the baseline, and flexibility delivery is measured as deviation from it. For distributed energy resources (DERs), no such individual schedule exists. A heat pump, battery, or EV charger has no pre-committed output plan visible to the market. The flexibility buyer (DSO or TSO) must therefore estimate what the resource would have done — a fundamentally uncertain exercise. (Source - Lind et al Baseline Methods (2023))

Baseline design has direct consequences for:

FSP revenue: a systematically high baseline inflates apparent flexibility delivery and over-rewards the FSP
Market integrity: FSPs may be able to game certain methods by changing behaviour before the baseline measurement window (e.g. charging a battery just before an MBMA (Meter-Before-Meter-After) reading)
Market participation: overly complex baseline methods are a barrier to entry for smaller FSPs

Methods

Ten methods are established in the literature and practice:

Method	Mechanism	Best for	Key weakness
XofY	Average of X highest/mid/lowest days from last Y eligible days	Consumption-side load-DR, upward activation	Upward bias (HighXofY); poor for weather-driven DG/storage
Rolling average	Average of last X same-type days, recency-weighted	Load-DR, stable consumption patterns	Same failure modes as XofY for variable DG/ESS
Comparable day	FSP chooses an ex-post reference day from history	Non-controllable DG (wind/solar)	Low integrity — FSP selects own reference
Regression	Statistical model of baseline as function of weather, time, past consumption	Load-DR, PV/wind with weather covariates	Complex; low simplicity
Machine learning	ML/neural network predicts baseline	Non-controllable DG, complex portfolios	Very low simplicity; interpretability issues
MBMA	Meter reading immediately before activation = baseline	Balancing services, short-duration activation	Integrity risk for batteries (see below); degrades over long activations
Zero baseline	Baseline = 0; any output during activation = delivered flexibility	Backup generators; batteries providing production-side upward flexibility	Not applicable to consumption-side DR
Control group	Average profile of similar non-activating customers	Multi-DER aggregation, behavioural DR programs	Requires valid comparison group; low integrity
Capacity limitation	Product defined as a power cap; no energy-delta baseline needed for clearing	DSO congestion management	Different clearing algorithms required; energy delivery validation still needed post-activation
Self-reported	FSP declares its own baseline	Large industrial FSPs	Low integrity without independent verification
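As an illustration of the XofY row, the following Python sketch computes a High 5-of-10 baseline for a single settlement period. The function name, the choice of X=5 and Y=10, and the sample load values are assumptions made for illustration. Eligibility filtering (excluding activation days, holidays and so on) is assumed to have happened upstream.

```python
from statistics import mean


def high_x_of_y(daily_kw, x=5, y=10):
    """Mean of the x highest values among the last y eligible days.

    daily_kw: load (kW) for the same settlement period on eligible days,
    oldest first. Illustrative only; real schemes define eligibility rules.
    """
    if not 0 < x <= y:
        raise ValueError("need 0 < x <= y")
    window = daily_kw[-y:]
    if len(window) < y:
        raise ValueError("not enough eligible days")
    return mean(sorted(window, reverse=True)[:x])


# Hypothetical history for one hour of the day, in kW.
history_kw = [100, 104, 98, 110, 95, 102, 107, 99, 101, 105]

print(high_x_of_y(history_kw))  # 105.6
print(mean(history_kw))         # 102.1
```

On this made-up history the High 5-of-10 value (105.6 kW) sits above the plain 10-day mean (102.1 kW), which is the upward bias noted in the table.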

MBMA and the battery integrity problem

MBMA (Meter-Before-Meter-After) reads the meter immediately before activation. If that reading is used directly as the baseline, a battery operator providing upward flexibility via increased injection can game the method: by charging just before the pre-activation reading, the operator lowers the apparent baseline (net injection falls or goes negative), so the subsequent injection appears larger than the flexibility genuinely delivered, inflating measured delivery and payment.

The correct method for batteries providing production-side upward flexibility is zero baseline: any injection during activation counts as delivered, with no prior-period manipulation possible.

This is the approach used in SWITCH: batteries providing increased production use zero baseline (noll-referens), while MBMA is reserved for consumption-side resources. (Source - SWITCH User Documentation (2026))
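The difference between the two methods can be shown with a small numeric sketch. The sign convention (injection positive, charging negative), the function name and the kW figures are all assumptions for illustration; they are not taken from SWITCH documentation.

```python
def delivered_upward_kw(injection_during_kw, baseline_injection_kw):
    """Upward flexibility from a production-side resource, in kW.

    Assumed sign convention: injection to the grid is positive,
    charging from the grid is negative.
    """
    return injection_during_kw - baseline_injection_kw


injection_during_activation = 50.0  # kW discharged during activation

# MBMA: the baseline is the meter reading just before activation.
mbma_idle = delivered_upward_kw(injection_during_activation, 0.0)     # battery idle beforehand
mbma_gamed = delivered_upward_kw(injection_during_activation, -40.0)  # charging at 40 kW beforehand

# Zero baseline: the pre-activation reading plays no role.
zero_baseline = delivered_upward_kw(injection_during_activation, 0.0)

print(mbma_idle, mbma_gamed, zero_baseline)  # 50.0 90.0 50.0
```

With MBMA, the same 50 kW discharge is credited as 90 kW if the battery was charging at 40 kW when the reference reading was taken. With zero baseline, the credited volume is 50 kW regardless of what happened beforehand.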

Capacity clearing vs. energy settlement

A common misconception is that capacity-based products (where clearing is in MW, not MWh) eliminate the need for a baseline. That holds for the clearing step, where the DSO procures a capacity commitment and pays an availability fee in SEK/MW, but energy delivery validation still requires a baseline. When the DSO activates, the FSP must deliver the agreed energy volume; comparing metered output to the baseline determines whether the 75% delivery threshold was met and what the activation payment should be.

In SWITCH’s TO (Tillgänglighetsordrar) and DO (Direktordrar) products, market clearing is MW-based (capacity bids), but the FSP is still expected to deliver the energy amount awarded. A baseline is therefore required for post-activation settlement, even though clearing used capacity logic.
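A minimal sketch of the post-activation check, for a consumption-side resource. The 75% threshold comes from the text above; the function names, the clamping of negative delivery to zero and the sample volumes are assumptions, and the sketch does not attempt to reproduce how SWITCH computes the activation payment.

```python
DELIVERY_THRESHOLD = 0.75  # the 75% delivery threshold mentioned above


def delivery_ratio(baseline_kwh, metered_kwh, awarded_kwh):
    """Share of the awarded energy delivered as a consumption reduction."""
    if awarded_kwh <= 0:
        raise ValueError("awarded_kwh must be positive")
    # Assumption: consumption above the baseline counts as zero delivery.
    delivered_kwh = max(baseline_kwh - metered_kwh, 0.0)
    return delivered_kwh / awarded_kwh


def meets_threshold(baseline_kwh, metered_kwh, awarded_kwh):
    return delivery_ratio(baseline_kwh, metered_kwh, awarded_kwh) >= DELIVERY_THRESHOLD


# Hypothetical activation: 100 kWh awarded against a 500 kWh baseline.
print(delivery_ratio(500.0, 420.0, 100.0), meets_threshold(500.0, 420.0, 100.0))  # 0.8 True
print(delivery_ratio(500.0, 430.0, 100.0), meets_threshold(500.0, 430.0, 100.0))  # 0.7 False
```

The clearing step never touches `baseline_kwh`; it only enters here, which is why a capacity-cleared product still needs a baseline for settlement.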

Method selection by DER type

DER type	Recommended method	Rationale
Consumption load-DR (upward, stable loads)	XofY or rolling average + same-day adjustment	Consumption relatively predictable from history
Consumption load-DR (balancing, short-duration)	MBMA	No time for ex-ante calculation; MBMA is accurate for short windows
Non-controllable DG (wind, solar)	Regression or ML (with weather data)	Output driven by weather, not history; averaging methods fail
Controllable DG (backup generators, CHP)	Zero baseline	Backup generator has zero output when idle — zero IS the baseline
Battery — consumption side (charging)	MBMA or rolling average	Charging pattern may be predictable
Battery — production side (discharging/injection)	Zero baseline	MBMA integrity risk; zero eliminates gaming incentive
Multi-DER aggregation (same type)	Per-type method applied at portfolio level	Sum of individual baselines works for MBMA and zero
Multi-DER aggregation (mixed types)	Comparable day, control group, or submetering per technology	No single method covers mixed portfolios accurately

Swedish market practice

SWITCH offers three methods (Source - SWITCH User Documentation (2026)):

Egen referensplan — FSP uploads own baseline (UI or API); deadline D-1 09:30 for DA, H-4 for ID
Noll-referens (zero baseline) — used for production resources, especially batteries providing upward flexibility via injection
MBMA — automatic calculation from resource metering; default for consumption-side resources

The SWITCH documentation locks down baseline integrity: baseline values cannot be uploaded or changed after a flexibility trade has occurred on that market. Resource control was moved to the DSO after a 2022/23 risk assessment identified that FSPs could manipulate resource registration data; the control change was a preemptive measure, not a response to confirmed abuse. (Source - BeFlexible D5.1 Demo Planning and Deployment (2024))

NODES (via sthlmflex) used a 5-day rolling average as the standard method: same hour across the 5 preceding working days. FSPs could alternatively upload their own baseline or agree an alternative with the buying DSO. (Source - sthlmflex säsong 3 (2022-2023))
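The 5-day rolling average can be sketched as follows. This is an illustration under stated assumptions, not NODES's implementation: working days are taken to be Monday to Friday with no public-holiday calendar, the data structure (a dict keyed by date and hour) is hypothetical, and the load values are made up.

```python
from datetime import date, timedelta


def preceding_working_days(day, n=5):
    """The n most recent Mon-Fri dates strictly before `day` (holidays ignored)."""
    days = []
    d = day - timedelta(days=1)
    while len(days) < n:
        if d.weekday() < 5:  # 0 = Monday ... 4 = Friday
            days.append(d)
        d -= timedelta(days=1)
    return days


def rolling_baseline_kw(hourly_kw, day, hour, n=5):
    """Mean load for `hour` across the n preceding working days."""
    ref_days = preceding_working_days(day, n)
    return sum(hourly_kw[(d, hour)] for d in ref_days) / n


# Hypothetical loads (kW) at hour 08 for 9-15 January 2023.
loads = [200.0, 210.0, 190.0, 205.0, 195.0, 120.0, 110.0]
hourly_kw = {(date(2023, 1, 9 + i), 8): kw for i, kw in enumerate(loads)}

# Monday 16 January: the weekend (14-15 Jan) is skipped.
print(rolling_baseline_kw(hourly_kw, date(2023, 1, 16), 8))  # 200.0
```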

European LFM baseline practices

Sassone et al. (2025) document the baseline methodologies deployed across 7 operational European LFMs — the most comprehensive comparative dataset available. (Source - Local Flexibility Markets in Europe Critical Review (2025))

Method	Computation	Used by
Historical: Average	Average power flow, same time window, past X days grouped by day type	Enedis, Swedish DSOs (sthlmflex), Areti, Unareti
Historical: Average + correction	As above, adjusted by actual flow H hours pre-activation	E-distribuzione (uses 2-hour window)
Historical: Median	Median (not average) over X similar days	Enedis
Historical: Mean X-in-Y	Average over X best days of last Y, excluding extremes	All British DSOs
Historical: Mean X-in-Y + correction	As above, with H-hour pre-activation correction	British DSOs, E-REDES (Portugal)
Historical: K-Nearest Neighbors	ML-selected closest days among last Y days	Enedis (consumption units only)
Recent data: Average	Average power flow, last H hours	Enedis, E-distribuzione, Elektro Ljubljana
Recent data: Trapezoidal	Linear interpolation between H hours before/after activation	Enedis (consumption only)
Benchmark	Weighted average of similar units not providing flexibility	Enedis (consumption, wind, solar)
Zero	No power exchange assumed; any deviation = delivered flexibility	All British DSOs
User-nominated	BSP’s own forecasting model, subject to DSO approval	British DSOs, Enedis, Swedish DSOs (sthlmflex)
None	N/A — bid is a schedule modification	Netherlands GOPACS

The key limitation of historical baselines: weather-driven demand shifts between the observation period and the activation day can cause large errors. Correction factors (actual flow H hours before activation) partially mitigate this, but no single method suits all DER types. Swedish DSOs (sthlmflex) offered BSPs the user-nominated approach — own forecasting model, DSO-approved — placing the accuracy burden on the FSP while giving them maximum flexibility. The Netherlands GOPACS model avoids baselines entirely by defining each bid as a modification to a commercial schedule; this is conceptually similar to capacity-limitation products (see below).
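The pre-activation correction can be sketched as an additive offset: the historical baseline is shifted by the average gap between actual and historical flow over the H hours before activation. Whether a given market applies the correction additively or as a scaling factor is not stated in the sources above, so the additive form, the function name and the sample values are assumptions.

```python
from statistics import mean


def same_day_adjusted(hist_baseline_kw, actual_kw, activation_start, h=2):
    """Shift the historical baseline by the mean pre-activation error.

    hist_baseline_kw: historical baseline per hour of the day.
    actual_kw: metered values for the hours before activation.
    Returns the adjusted baseline from activation_start onwards.
    """
    if activation_start < h:
        raise ValueError("need at least h hours before activation")
    window = range(activation_start - h, activation_start)
    offset = mean([actual_kw[t] - hist_baseline_kw[t] for t in window])
    return [b + offset for b in hist_baseline_kw[activation_start:]]


hist = [100.0, 100.0, 110.0, 120.0, 130.0, 125.0]  # kW, hours 0-5
actual_so_far = [104.0, 106.0, 118.0, 128.0]       # kW, hours 0-3 (a colder day)

print(same_day_adjusted(hist, actual_so_far, activation_start=4))  # [138.0, 133.0]
```

Here the two hours before activation ran 8 kW above the historical profile, so the baseline for the activation hours is raised by 8 kW. Note that the correction window is itself a pre-activation measurement, so it reintroduces some of the gaming exposure discussed for MBMA.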

Capacity-limit products as a baseline alternative

The European Commission’s 2025 LFM study (VITO) formally endorses capacity-limiting products (operating envelopes) as a structurally different approach that avoids baseline calculation entirely. Instead of measuring a flexibility volume (MWh change vs. counterfactual), the product defines a power cap at the connection point: the FSP commits to staying below (or above) a specified MW threshold. Delivery is verified against the cap, not against a counterfactual baseline.

This approach is particularly suited for:

LV grid congestion where individual sub-metering is impractical
Cases where baseline manipulation risk is high (batteries, weather-driven DG)
Early-stage markets where settlement complexity is a participation barrier

The trade-off: capacity-limit products require different clearing algorithms and cannot be easily stacked with energy-based products in the same market session. They represent a different product architecture, not just a different measurement method. (Source - EC LFM Specification and Design Criteria (VITO, 2025))

MaxUsage™ — the Swedish worked example, and a baseline caveat. NODES MaxUsage™ at Effekthandel Väst is the clearest operational Swedish capacity-limit product: the FSP and DSO jointly set a consumption ceiling for fixed peak hours and the FSP is paid for staying under it — verified by direct metering, with no per-event baseline (Renova capped at 75 kW for 07:00–10:00; GoCo halved workplace EV-charging power 08:00–12:00). But it only appears baseline-free: the value is benchmarked against a historical-consumption reference (Renova’s ~300 kW historical draw in those hours), which is a counterfactual that drifts as efficiency and behaviour change. That drift is precisely why Kinnekulle Energi discontinued the product (GKT’s 450 kW reference shifted between seasons, confounding validation), even as Effekthandel Väst scaled it. The lesson for baseline design: a capacity-limit product removes the baseline from settlement but not from valuation unless the cap is anchored to a firm, non-drifting reference (e.g., contracted capacity) rather than historical consumption. (Source - Effekthandel Väst Produkter och MaxUsage (NODES, 2024))
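Verification against a cap can be sketched in a few lines, which shows how little machinery settlement needs once the counterfactual is removed. The 75 kW ceiling and the 07:00–10:00 window follow the Renova example above; the hourly metering resolution, the function name and the readings themselves are assumptions for illustration.

```python
def cap_breaches(mean_kw_by_hour, cap_kw, start_hour, end_hour):
    """Hours in [start_hour, end_hour) whose mean power exceeds the cap."""
    return {
        hour: kw
        for hour, kw in mean_kw_by_hour.items()
        if start_hour <= hour < end_hour and kw > cap_kw
    }


# Hypothetical hourly mean consumption (kW) on one day.
readings = {6: 210.0, 7: 70.0, 8: 74.0, 9: 81.0, 10: 190.0}

breaches = cap_breaches(readings, cap_kw=75.0, start_hour=7, end_hour=10)
print(breaches)      # {9: 81.0}
print(not breaches)  # False: the cap was not held in every hour
```

No baseline appears anywhere in the check. The historical reference only enters when deciding what holding the cap is worth, which is the valuation caveat described above.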

FSP experience — baseline as a participation barrier

Palm et al. (2023) collected qualitative evidence from CoordiNet Uppland and Skåne FSPs and PFSPs showing that baseline calculation is a genuine day-to-day barrier, not just a theoretical design concern. (Source - Palm et al LFM Drivers and Barriers (2023))

Conceptual difficulty: Several FSPs found the concept of “what would we have consumed” genuinely confusing — “You could say that you had intended to consume something, but it may be untrue what you come up with. You might claim that you reduced the consumption a lot, but it might have happened anyway.” This uncertainty was particularly acute for consumption-side flexibility, where the counterfactual is less obvious than for a generator.

Baseline manipulation — the sports arena case: A story circulated among CoordiNet participants of a sports arena that turned on all its lighting just before the baseline measurement window to artificially inflate its reference load — then received payment for the resulting apparent (but not real) reduction in consumption. This incident damaged trust in the settlement mechanism among other participants and reinforced the information barrier: FSPs were not only confused by how baselines worked, but actively worried about the integrity of a system that could be gamed.

This corroborates the SWITCH response (Source - BeFlexible D5.1 Demo Planning and Deployment (2024)): after a 2022/23 risk assessment, E.ON moved resource control to the DSO (not FSP) as a preemptive anti-manipulation measure — transferring responsibility for the measurement reference from the resource owner to the market operator.

D-1 timing and real-time information loss: FSPs also noted that committing to a baseline a full day ahead means losing all real-time information that becomes available between bid submission and activation. One FSP: “In CoordiNet, you lose all information that is added between the day before and the control occasion … you can have a very good forecast, but it is a dynamic system.” This is less a baseline method problem than a market timing problem, but it compounds baseline uncertainty for FSPs — their baseline was set without access to information that only became available on the activation day.

Submetering as enabling infrastructure for baseline accuracy

A structural solution to the multi-DER baseline problem is dedicated measuring devices (DMDs) — circuit-level meters that measure a specific DER (EV charger, heat pump, battery) independently of total building consumption. See Submetering for the full concept.

DMDs solve the attribution problem for mixed-DER portfolios:

Without a DMD, the whole-building meter contains the activation signal plus all unrelated household load variations — making the baseline inherently noisy
With a DMD, only the specific DER’s pre/post consumption is captured; the baseline is clean and device-specific
Per-device attribution enables applying different methods to different DER types in the same portfolio: zero baseline for the battery’s injection output, MBMA for the heat pump’s consumption reduction
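The per-device attribution described above can be sketched as follows: each submetered device carries its own baseline method, and the portfolio volume is the sum of the per-device results. The class, field names and kW figures are hypothetical, and only the two methods named in the list (zero baseline and MBMA) are covered.

```python
from dataclasses import dataclass


@dataclass
class SubmeteredDevice:
    name: str
    method: str                  # "zero" or "mbma"
    side: str                    # "production" or "consumption"
    pre_activation_kw: float     # DMD reading just before activation
    during_activation_kw: float  # DMD reading during activation


def baseline_kw(dev):
    if dev.method == "zero":
        return 0.0
    if dev.method == "mbma":
        return dev.pre_activation_kw
    raise ValueError(f"unsupported method: {dev.method}")


def delivered_upward_kw(dev):
    """Upward flexibility: more injection, or less consumption, than baseline."""
    base = baseline_kw(dev)
    if dev.side == "production":
        return dev.during_activation_kw - base
    return base - dev.during_activation_kw


portfolio = [
    # Battery values are net injection; the earlier charging (-5 kW) is ignored.
    SubmeteredDevice("battery", "zero", "production", -5.0, 30.0),
    # Heat pump values are consumption.
    SubmeteredDevice("heat pump", "mbma", "consumption", 6.0, 2.0),
]

total_kw = sum(delivered_upward_kw(d) for d in portfolio)
print(total_kw)  # 34.0
```

A whole-building meter would see only the net change at the connection point, mixed with unrelated household load, and could not apply the two methods separately.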

Art. 7b of Regulation 2024/1747 (EMD Reform) creates a customer right to install DMDs; implementation rules at member-state level are pending. The Network Code on Demand Response T&C development will be where Sweden defines how DMD data must be used in baseline calculations and whether settlement-grade or indicative-grade measurement is required for different products. (Source - Submetering for Flexibility Services Comillas (2024))

An EU-level review of IA–DER commercial practices confirms that baseline accuracy directly determines FSP revenue fairness: a systematically high baseline over-rewards the FSP; a low baseline under-rewards real flexibility delivery. Aggregators managing third-party DERs have a fiduciary interest in accurate baselines that cannot be provided by whole-building meters alone. (Source - Aggregators DR Relationships Comillas (2025))

Regulatory horizon

The Network Code on Demand Response will require standardised settlement processes for flexibility markets across the EU. ACER’s NC DR text (Art. 14) calls for baseline methods that must be “recalculable, transparent, precise, accurate, and unbiased” and requires a register of all MS-approved baseline methods. NC DR Art. 35 explicitly allows alternatives that do not require baselines (such as capacity-limit products), providing flexibility to Member States.

The European Commission study (2025) sets a medium-term target of baseline methods coordinated at MS level and informed by an EU-level baseline library containing all approved national approaches — a structured convergence mechanism that avoids forced harmonisation while enabling knowledge sharing.

NC DR Art. 43–44 link settlement requirements to DNDP flexibility procurement, which will bring SWITCH’s and NODES’s current informal methods into the scope of regulatory review. Sweden’s ~6-year operational experience with MBMA, noll-referens, and rolling-average methods will be directly relevant input to the national T&C baseline definition process. (Source - EC LFM Specification and Design Criteria (VITO, 2025))