Baseline Methods
A baseline (referensplan) is the estimated level of electricity consumption or production a flexibility service provider (FSP) would have had at a given time, absent any flexibility activation. It is the counterfactual against which delivered flexibility is measured. The baseline problem is central to Flexibility Market settlement: without a credible baseline, it is impossible to verify how much flexibility was actually delivered, or to pay FSPs correctly.
The baseline is, in effect, a forecast with an adversary — which is why “good” forecasting for flex markets is defined by manipulation-resistance and recalculability, not just accuracy. See STLF for Flexibility Markets — What Counts as Good and How to Achieve It.
Why baselines are hard for DERs
In traditional balancing markets, large generators commit to a schedule in the day-ahead market; the schedule serves as the baseline, and flexibility delivery is measured as deviation from it. For distributed energy resources (DERs), no such individual schedule exists. A heat pump, battery, or EV charger has no pre-committed output plan visible to the market. The flexibility buyer (DSO or TSO) must therefore estimate what the resource would have done — a fundamentally uncertain exercise. (Source - Lind et al Baseline Methods (2023))
Baseline design has direct consequences for:
- FSP revenue: a systematically high baseline inflates apparent flexibility delivery and over-rewards the FSP
- Market integrity: FSPs may be able to game certain methods by changing behaviour before the baseline measurement window (e.g. charging a battery just before an MBMA reading)
- Market participation: overly complex baseline methods are a barrier to entry for smaller FSPs
Methods
Ten methods are established in the literature and practice:
| Method | Mechanism | Best for | Key weakness |
|---|---|---|---|
| XofY | Average of X highest/mid/lowest days from last Y eligible days | Consumption-side load-DR, upward activation | Upward bias (HighXofY); poor for weather-driven DG/storage |
| Rolling average | Average of last X same-type days, recency-weighted | Load-DR, stable consumption patterns | Same failure modes as XofY for variable DG/ESS |
| Comparable day | FSP chooses an ex-post reference day from history | Non-controllable DG (wind/solar) | Low integrity — FSP selects own reference |
| Regression | Statistical model of baseline as function of weather, time, past consumption | Load-DR, PV/wind with weather covariates | Complex; low simplicity |
| Machine learning | ML/neural network predicts baseline | Non-controllable DG, complex portfolios | Very low simplicity; interpretability issues |
| MBMA | Meter reading immediately before activation = baseline | Balancing services, short-duration activation | Integrity risk for batteries (see below); degrades over long activations |
| Zero baseline | Baseline = 0; any output during activation = delivered flexibility | Backup generators; batteries providing production-side upward flexibility | Not applicable to consumption-side DR |
| Control group | Average profile of similar non-activating customers | Multi-DER aggregation, behavioural DR programs | Requires valid comparison group; low integrity |
| Capacity limitation | Product defined as a power cap; no energy-delta baseline needed for clearing | DSO congestion management | Different clearing algorithms required; energy delivery validation still needed post-activation |
| Self-reported | FSP declares its own baseline | Large industrial FSPs | Low integrity without independent verification |
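As a rough illustration of the XofY row, the sketch below averages the X selected days out of the last Y eligible days, interval by interval. The function name `x_of_y_baseline`, the ranking of days by total daily energy, and the sample profiles are assumptions made for this example; real market rules define day eligibility and ranking in their own way.

```python
from statistics import mean


def x_of_y_baseline(daily_profiles, x, y, mode="high"):
    """Average the X selected days out of the last Y eligible days.

    daily_profiles: per-day load profiles, oldest first, each a list of kW values.
    mode: "high" keeps the X days with the highest total energy,
          "low" the X lowest, "mid" the X days in the middle of the ranking.
    """
    if not 0 < x <= y <= len(daily_profiles):
        raise ValueError("need 0 < x <= y <= number of eligible days")
    window = daily_profiles[-y:]        # last Y eligible days
    ranked = sorted(window, key=sum)    # rank days by total energy (assumption)
    if mode == "high":
        chosen = ranked[-x:]
    elif mode == "low":
        chosen = ranked[:x]
    elif mode == "mid":
        start = (y - x) // 2
        chosen = ranked[start:start + x]
    else:
        raise ValueError("mode must be 'high', 'mid' or 'low'")
    # Average the chosen days interval by interval.
    return [mean(values) for values in zip(*chosen)]


# Hypothetical two-interval profiles (kW) for five eligible days.
days = [[10, 20], [12, 22], [8, 18], [14, 24], [11, 21]]
high_3_of_5 = x_of_y_baseline(days, 3, 5, "high")
mid_3_of_5 = x_of_y_baseline(days, 3, 5, "mid")
```

On the same history, the high variant sits above the mid variant in every interval, which is the upward bias the table flags for HighXofY.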
MBMA and the battery integrity problem
MBMA (Meter-Before-Meter-After) reads the meter immediately before activation. If that reading is used directly as the baseline, a battery operator providing upward flexibility via increased injection can game the method: by charging just before the pre-activation reading, the operator pushes the apparent baseline down (net injection goes negative), so the subsequent injection appears larger than it really is, inflating measured delivery and payment.
The correct method for batteries providing production-side upward flexibility is zero baseline: any injection during activation counts as delivered, with no prior-period manipulation possible.
This is the approach used in SWITCH: batteries providing increased production use zero baseline (noll-referens), while MBMA is reserved for consumption-side resources. (Source - SWITCH User Documentation (2026))
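The arithmetic behind the gaming incentive can be sketched with made-up numbers. The sign convention (positive = net injection), the 50 kW figures, and the function name are assumptions for illustration only, not values from SWITCH.

```python
def delivered_upward_kw(metered_injection_kw, baseline_kw):
    """Production-side upward flexibility: injection above the baseline."""
    return metered_injection_kw - baseline_kw


# Assumed sign convention: positive = net injection, negative = net consumption.
activation_injection_kw = 50.0

# Honest battery: idle before activation, so the pre-activation reading is 0 kW.
honest_mbma = delivered_upward_kw(activation_injection_kw, baseline_kw=0.0)

# Gaming battery: charges at 50 kW just before the reading, so MBMA records -50 kW.
gamed_mbma = delivered_upward_kw(activation_injection_kw, baseline_kw=-50.0)

# Zero baseline ignores the pre-activation reading entirely.
zero_baseline = delivered_upward_kw(activation_injection_kw, baseline_kw=0.0)

print(honest_mbma, gamed_mbma, zero_baseline)  # 50.0 100.0 50.0
```

The same 50 kW of real injection is credited as 100 kW under a gamed MBMA reading, while the zero baseline credits 50 kW regardless of what the battery did beforehand.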
Capacity clearing vs. energy settlement
A common misconception is that capacity-based products (where clearing is in MW, not MWh) eliminate the need for a baseline. That holds only for the clearing step, where the DSO procures a capacity commitment and pays an availability fee in SEK/MW. Energy delivery validation still requires a baseline: when the DSO activates, the FSP must deliver the agreed energy volume, and comparing metered output to the baseline determines whether the 75% delivery threshold was met and what the activation payment should be.
In SWITCH’s TO (Tillgänglighetsordrar) and DO (Direktordrar) products, market clearing is MW-based (capacity bids), but the FSP is still expected to deliver the energy amount awarded. A baseline is therefore required for post-activation settlement, even though clearing used capacity logic.
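A minimal sketch of the post-activation check, assuming a consumption-side reduction, hourly intervals, and the 75% threshold applied to the ratio of delivered to awarded energy. The function names and sample values are hypothetical, and the activation payment rule is deliberately not modelled because it is not specified here.

```python
def delivered_energy_mwh(baseline_mw, metered_mw, interval_hours=1.0):
    """Energy delivered by a consumption reduction: baseline minus metered load."""
    if len(baseline_mw) != len(metered_mw):
        raise ValueError("baseline and metered series must have the same length")
    return sum((b - m) * interval_hours for b, m in zip(baseline_mw, metered_mw))


def meets_delivery_threshold(delivered_mwh, awarded_mwh, threshold=0.75):
    """True if delivered energy reaches the given share of the awarded energy."""
    return delivered_mwh >= threshold * awarded_mwh


# Hypothetical activation: 1 MW awarded over two hours = 2 MWh.
baseline = [2.0, 2.0]   # MW, counterfactual consumption
metered = [1.2, 1.0]    # MW, actual consumption during activation
delivered = delivered_energy_mwh(baseline, metered)   # about 1.8 MWh
ok = meets_delivery_threshold(delivered, awarded_mwh=2.0)
```

Clearing only needed the 1 MW capacity bid; the baseline series enters solely in the settlement step shown here.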
Method selection by DER type
| DER type | Recommended method | Rationale |
|---|---|---|
| Consumption load-DR (upward, stable loads) | XofY or rolling average + same-day adjustment | Consumption relatively predictable from history |
| Consumption load-DR (balancing, short-duration) | MBMA | No time for ex-ante calculation; MBMA is accurate for short windows |
| Non-controllable DG (wind, solar) | Regression or ML (with weather data) | Output driven by weather, not history; averaging methods fail |
| Controllable DG (backup generators, CHP) | Zero baseline | Backup generator has zero output when idle — zero IS the baseline |
| Battery — consumption side (charging) | MBMA or rolling average | Charging pattern may be predictable |
| Battery — production side (discharging/injection) | Zero baseline | MBMA integrity risk; zero eliminates gaming incentive |
| Multi-DER aggregation (same type) | Per-type method applied at portfolio level | Sum of individual baselines works for MBMA and zero |
| Multi-DER aggregation (mixed types) | Comparable day, control group, or submetering per technology | No single method covers mixed portfolios accurately |
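For the weather-driven rows, a regression baseline can be sketched as an ordinary least-squares line fitted on historical weather and output. The single covariate (irradiance), the linear form, and the sample data are assumptions for this illustration; a production model would typically use more covariates and validation.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def regression_baseline(slope, intercept, covariate):
    """Predicted output for the covariate observed at activation time."""
    return slope * covariate + intercept


# Hypothetical history: irradiance (W/m2) and PV output (kW) without activation.
irradiance = [0, 200, 400, 800]
pv_output_kw = [0, 10, 20, 40]
slope, intercept = fit_line(irradiance, pv_output_kw)

# Baseline for an activation hour with 600 W/m2 of measured irradiance.
baseline_kw = regression_baseline(slope, intercept, 600)   # about 30 kW
```

Unlike an averaging method, the prediction follows the weather measured on the activation day rather than the output of preceding days.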
Swedish market practice
SWITCH offers three methods (Source - SWITCH User Documentation (2026)):
- Egen referensplan — FSP uploads own baseline (UI or API); deadline D-1 09:30 for DA, H-4 for ID
- Noll-referens (zero baseline) — used for production resources, especially batteries providing upward flexibility via injection
- MBMA — automatic calculation from resource metering; default for consumption-side resources
SWITCH locks down baseline integrity: according to its documentation, baseline values cannot be uploaded or changed after a flexibility trade has occurred on that market. Resource control was moved to the DSO after a 2022/23 risk assessment identified that FSPs could manipulate resource registration data; the change was a preemptive measure, not a response to confirmed abuse. (Source - BeFlexible D5.1 Demo Planning and Deployment (2024))
NODES (via sthlmflex) used a 5-day rolling average as the standard method: same hour across the 5 preceding working days. FSPs could alternatively upload their own baseline or agree an alternative with the buying DSO. (Source - sthlmflex säsong 3 (2022-2023))
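The 5-day rolling average described above can be sketched as follows. Treating Monday to Friday as working days (public holidays are not handled), the `(date, hour)` keyed dictionary, and the sample readings are assumptions for this example, not the NODES implementation.

```python
from datetime import date, timedelta


def preceding_working_days(day, n=5):
    """Return the n working days (Mon-Fri) immediately before `day`, newest first."""
    days = []
    current = day - timedelta(days=1)
    while len(days) < n:
        if current.weekday() < 5:   # 0 = Monday ... 4 = Friday
            days.append(current)
        current -= timedelta(days=1)
    return days


def rolling_average_baseline(readings, day, hour, n=5):
    """Mean of the same hour across the n preceding working days.

    readings: dict mapping (date, hour) -> kW. A missing reading raises KeyError.
    """
    values = [readings[(d, hour)] for d in preceding_working_days(day, n)]
    return sum(values) / n


# Hypothetical hour-08 readings (kW) for Mon 9 Jan to Sun 15 Jan 2023.
readings = {(date(2023, 1, 9 + i), 8): 100 + 10 * i for i in range(5)}
readings[(date(2023, 1, 14), 8)] = 999   # Saturday, ignored
readings[(date(2023, 1, 15), 8)] = 999   # Sunday, ignored

baseline_kw = rolling_average_baseline(readings, date(2023, 1, 16), hour=8)
print(baseline_kw)  # 120.0
```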
European LFM baseline practices
Sassone et al. (2025) document the baseline methodologies deployed across 7 operational European LFMs — the most comprehensive comparative dataset available. (Source - Local Flexibility Markets in Europe Critical Review (2025))
| Method | Computation | Used by |
|---|---|---|
| Historical: Average | Average power flow, same time window, past X days grouped by day type | Enedis, Swedish DSOs (sthlmflex), Areti, Unareti |
| Historical: Average + correction | As above, adjusted by actual flow H hours pre-activation | E-distribuzione (uses 2-hour window) |
| Historical: Median | Median (not average) over X similar days | Enedis |
| Historical: Mean X-in-Y | Average over X best days of last Y, excluding extremes | All British DSOs |
| Historical: Mean X-in-Y + correction | As above, with H-hour pre-activation correction | British DSOs, E-REDES (Portugal) |
| Historical: K-Nearest Neighbors | ML-selected closest days among last Y days | Enedis (consumption units only) |
| Recent data: Average | Average power flow, last H hours | Enedis, E-distribuzione, Elektro Ljubljana |
| Recent data: Trapezoidal | Linear interpolation between H hours before/after activation | Enedis (consumption only) |
| Benchmark | Weighted average of similar units not providing flexibility | Enedis (consumption, wind, solar) |
| Zero | No power exchange assumed; any deviation = delivered flexibility | All British DSOs |
| User-nominated | BSP’s own forecasting model, subject to DSO approval | British DSOs, Enedis, Swedish DSOs (sthlmflex) |
| None | N/A — bid is a schedule modification | Netherlands GOPACS |
The key limitation of historical baselines: weather-driven demand shifts between the observation period and the activation day can cause large errors. Correction factors (actual flow H hours before activation) partially mitigate this, but no single method suits all DER types. Swedish DSOs (sthlmflex) offered BSPs the user-nominated approach — own forecasting model, DSO-approved — placing the accuracy burden on the FSP while giving them maximum flexibility. The Netherlands GOPACS model avoids baselines entirely by defining each bid as a modification to a commercial schedule; this is conceptually similar to capacity-limitation products (see below).
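Two of the computations in the table can be sketched briefly: a pre-activation correction applied to a historical baseline, and the trapezoidal interpolation between levels before and after activation. The additive form of the correction (some schemes use a multiplicative ratio instead), the function names, and the numbers are assumptions for illustration.

```python
from statistics import mean


def corrected_baseline(historical_kw, pre_window_historical_kw, pre_window_actual_kw):
    """Shift a historical baseline by the gap observed in the pre-activation window.

    The offset is the mean actual flow in the H hours before activation minus
    the mean historical baseline for those same hours (additive correction).
    """
    offset = mean(pre_window_actual_kw) - mean(pre_window_historical_kw)
    return [value + offset for value in historical_kw]


def trapezoidal_baseline(level_before_kw, level_after_kw, n_intervals):
    """Linear interpolation between the levels before and after activation."""
    step = (level_after_kw - level_before_kw) / (n_intervals + 1)
    return [level_before_kw + step * i for i in range(1, n_intervals + 1)]


# Hypothetical case: history says 100 kW, but today runs 6 kW above history.
adjusted = corrected_baseline([100, 100], [90, 90], [95, 97])

# Hypothetical case: 100 kW before, 120 kW after, three activation intervals.
interpolated = trapezoidal_baseline(100, 120, 3)
print(adjusted, interpolated)  # [106, 106] [105.0, 110.0, 115.0]
```

The correction absorbs a same-day weather shift that the historical average cannot see, at the cost of exposing the baseline to behaviour in the pre-activation window.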
Capacity-limit products as a baseline alternative
The European Commission’s 2025 LFM study (VITO) formally endorses capacity-limiting products (operating envelopes) as a structurally different approach that avoids baseline calculation entirely. Instead of measuring a flexibility volume (MWh change vs. counterfactual), the product defines a power cap at the connection point: the FSP commits to staying below (or above) a specified MW threshold. Delivery is verified against the cap, not against a counterfactual baseline.
This approach is particularly suited for:
- LV grid congestion where individual sub-metering is impractical
- Cases where baseline manipulation risk is high (batteries, weather-driven DG)
- Early-stage markets where settlement complexity is a participation barrier
The trade-off: capacity-limit products require different clearing algorithms and cannot be easily stacked with energy-based products in the same market session. They represent a different product architecture, not just a different measurement method. (Source - EC LFM Specification and Design Criteria (VITO, 2025))
MaxUsage™ — the Swedish worked example, and a baseline caveat. NODES MaxUsage™ at Effekthandel Väst is the clearest operational Swedish capacity-limit product: the FSP and DSO jointly set a consumption ceiling for fixed peak hours and the FSP is paid for staying under it — verified by direct metering, with no per-event baseline (Renova capped at 75 kW for 07:00–10:00; GoCo halved workplace EV-charging power 08:00–12:00). But it only appears baseline-free: the value is benchmarked against a historical-consumption reference (Renova’s ~300 kW historical draw in those hours), which is a counterfactual that drifts as efficiency and behaviour change. That drift is precisely why Kinnekulle Energi discontinued the product (GKT’s 450 kW reference shifted between seasons, confounding validation), even as Effekthandel Väst scaled it. The lesson for baseline design: a capacity-limit product removes the baseline from settlement but not from valuation unless the cap is anchored to a firm, non-drifting reference (e.g., contracted capacity) rather than historical consumption. (Source - Effekthandel Väst Produkter och MaxUsage (NODES, 2024))
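The distinction between settlement and valuation can be made concrete with the Renova figures quoted above (75 kW cap, roughly 300 kW historical draw). The function names, the metered series, and the drifted 250 kW reference are hypothetical and only illustrate the mechanism.

```python
def cap_respected(metered_kw, cap_kw):
    """Settlement check: every metered interval must stay at or below the cap."""
    return all(power <= cap_kw for power in metered_kw)


def implied_reduction_kw(reference_kw, cap_kw):
    """Valuation: how much relief the cap is assumed to provide vs. the reference."""
    return max(reference_kw - cap_kw, 0)


cap_kw = 75

# Settlement needs only direct metering against the cap (hypothetical readings).
compliant = cap_respected([70, 74.5, 60], cap_kw)

# Valuation depends on the reference, which is a counterfactual.
value_with_historical_reference = implied_reduction_kw(300, cap_kw)   # 225 kW
value_after_drift = implied_reduction_kw(250, cap_kw)                 # 175 kW, hypothetical drift
```

The compliance result is unaffected by the reference, while the implied value of the product changes as soon as the reference drifts.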
FSP experience — baseline as a participation barrier
Palm et al. (2023) collected qualitative evidence from CoordiNet Uppland and Skåne FSPs and PFSPs showing that baseline calculation is a genuine day-to-day barrier, not just a theoretical design concern. (Source - Palm et al LFM Drivers and Barriers (2023))
Conceptual difficulty: Several FSPs found the concept of “what would we have consumed” genuinely confusing — “You could say that you had intended to consume something, but it may be untrue what you come up with. You might claim that you reduced the consumption a lot, but it might have happened anyway.” This uncertainty was particularly acute for consumption-side flexibility, where the counterfactual is less obvious than for a generator.
Baseline manipulation — the sports arena case: A story circulated among CoordiNet participants of a sports arena that turned on all its lighting just before the baseline measurement window to artificially inflate its reference load — then received payment for the resulting apparent (but not real) reduction in consumption. This incident damaged trust in the settlement mechanism among other participants and reinforced the information barrier: FSPs were not only confused by how baselines worked, but actively worried about the integrity of a system that could be gamed.
This corroborates the SWITCH response (Source - BeFlexible D5.1 Demo Planning and Deployment (2024)): after a 2022/23 risk assessment, E.ON moved resource control to the DSO (not FSP) as a preemptive anti-manipulation measure — transferring responsibility for the measurement reference from the resource owner to the market operator.
D-1 timing and real-time information loss: FSPs also noted that committing to a baseline a full day ahead means losing all real-time information that becomes available between bid submission and activation. One FSP: “In CoordiNet, you lose all information that is added between the day before and the control occasion … you can have a very good forecast, but it is a dynamic system.” This is less a baseline method problem than a market timing problem, but it compounds baseline uncertainty for FSPs — their baseline was set without access to information that only became available on the activation day.
Submetering as enabling infrastructure for baseline accuracy
A structural solution to the multi-DER baseline problem is dedicated measuring devices (DMDs) — circuit-level meters that measure a specific DER (EV charger, heat pump, battery) independently of total building consumption. See Submetering for the full concept.
DMDs solve the attribution problem for mixed-DER portfolios:
- Without a DMD, the whole-building meter contains the activation signal plus all unrelated household load variations — making the baseline inherently noisy
- With a DMD, only the specific DER’s pre/post consumption is captured; the baseline is clean and device-specific
- Per-device attribution enables applying different methods to different DER types in the same portfolio: zero baseline for the battery’s injection output, MBMA for the heat pump’s consumption reduction
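The per-device attribution in the last point can be sketched as below, assuming each DMD reports a pre-activation and an activation reading in kW. The dictionary layout, device names, sign conventions, and readings are all hypothetical.

```python
def delivered_kw(method, pre_reading_kw, activation_reading_kw):
    """Delivered flexibility for one sub-metered device.

    "zero": production-side device, readings are injection; baseline is 0,
            so any injection during activation counts as delivered.
    "mbma": consumption-side device, readings are consumption; baseline is the
            pre-activation reading, so delivery is the consumption reduction.
    """
    if method == "zero":
        return activation_reading_kw
    if method == "mbma":
        return pre_reading_kw - activation_reading_kw
    raise ValueError(f"unknown baseline method: {method}")


# Hypothetical mixed portfolio behind one connection point, one DMD per device.
portfolio = [
    {"device": "battery", "method": "zero", "pre": 0.0, "during": 5.0},
    {"device": "heat_pump", "method": "mbma", "pre": 3.0, "during": 1.0},
]

per_device = {
    d["device"]: delivered_kw(d["method"], d["pre"], d["during"]) for d in portfolio
}
total_kw = sum(per_device.values())
print(per_device, total_kw)  # {'battery': 5.0, 'heat_pump': 2.0} 7.0
```

With only a whole-building meter, the same two readings would also contain unrelated household load changes, and the two methods could not be applied separately.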
Art. 7b of Regulation 2024/1747 (EMD Reform) creates a customer right to install DMDs; implementation rules at member-state level are pending. The Network Code on Demand Response T&C development will be where Sweden defines how DMD data must be used in baseline calculations and whether settlement-grade or indicative-grade measurement is required for different products. (Source - Submetering for Flexibility Services Comillas (2024))
An EU-level review of IA–DER commercial practices confirms that baseline accuracy directly determines FSP revenue fairness: a systematically high baseline over-rewards the FSP; a low baseline under-rewards real flexibility delivery. Aggregators managing third-party DERs have a fiduciary interest in accurate baselines that cannot be provided by whole-building meters alone. (Source - Aggregators DR Relationships Comillas (2025))
Regulatory horizon
The Network Code on Demand Response will require standardised settlement processes for flexibility markets across the EU. ACER’s NC DR text (Art. 14) calls for baseline methods that must be “recalculable, transparent, precise, accurate, and unbiased” and requires a register of all MS-approved baseline methods. NC DR Art. 35 explicitly allows alternatives that do not require baselines (such as capacity-limit products), providing flexibility to Member States.
The European Commission study (2025) sets a medium-term target of baseline methods coordinated at MS level and informed by an EU-level baseline library containing all approved national approaches — a structured convergence mechanism that avoids forced harmonisation while enabling knowledge sharing.
NC DR Arts. 43–44 link settlement requirements to DNDP flexibility procurement, which will bring the current informal methods used by SWITCH and NODES into the scope of regulatory review. Sweden’s ~6-year operational experience with MBMA, noll-referens, and rolling-average methods will be directly relevant input to the national T&C baseline definition process. (Source - EC LFM Specification and Design Criteria (VITO, 2025))