Baseline Methods


A baseline (referensplan) is the estimated level of electricity consumption or production a flexibility service provider (FSP) would have had at a given time, absent any flexibility activation. It is the counterfactual against which delivered flexibility is measured. The baseline problem is central to Flexibility Market settlement: without a credible baseline, it is impossible to verify how much flexibility was actually delivered, or to pay FSPs correctly.

The baseline is, in effect, a forecast with an adversary — which is why “good” forecasting for flex markets is defined by manipulation-resistance and recalculability, not just accuracy. See STLF for Flexibility Markets — What Counts as Good and How to Achieve It.

Why baselines are hard for DERs

In traditional balancing markets, large generators commit to a schedule in the day-ahead market; the schedule serves as the baseline, and flexibility delivery is measured as deviation from it. For distributed energy resources (DERs), no such individual schedule exists. A heat pump, battery, or EV charger has no pre-committed output plan visible to the market. The flexibility buyer (DSO or TSO) must therefore estimate what the resource would have done — a fundamentally uncertain exercise. (Source - Lind et al Baseline Methods (2023))

Baseline design has direct consequences for:

  • FSP revenue: a systematically high baseline inflates apparent flexibility delivery and over-rewards the FSP
  • Market integrity: FSPs may be able to game certain methods by changing behaviour before the baseline measurement window (e.g. charging a battery just before an MBMA reading)
  • Market participation: overly complex baseline methods are a barrier to entry for smaller FSPs

Methods

Ten methods are established in the literature and practice:

| Method | Mechanism | Best for | Key weakness |
|---|---|---|---|
| XofY | Average of X highest/mid/lowest days from last Y eligible days | Consumption-side load-DR, upward activation | Upward bias (HighXofY); poor for weather-driven DG/storage |
| Rolling average | Average of last X same-type days, recency-weighted | Load-DR, stable consumption patterns | Same failure modes as XofY for variable DG/ESS |
| Comparable day | FSP chooses an ex-post reference day from history | Non-controllable DG (wind/solar) | Low integrity: FSP selects own reference |
| Regression | Statistical model of baseline as function of weather, time, past consumption | Load-DR, PV/wind with weather covariates | Complex; low simplicity |
| Machine learning | ML/neural network predicts baseline | Non-controllable DG, complex portfolios | Very low simplicity; interpretability issues |
| MBMA | Meter reading immediately before activation = baseline | Balancing services, short-duration activation | Integrity risk for batteries (see below); degrades over long activations |
| Zero baseline | Baseline = 0; any output during activation = delivered flexibility | Backup generators; batteries providing production-side upward flexibility | Not applicable to consumption-side DR |
| Control group | Average profile of similar non-activating customers | Multi-DER aggregation, behavioural DR programs | Requires valid comparison group; low integrity |
| Capacity limitation | Product defined as a power cap; no energy-delta baseline needed for clearing | DSO congestion management | Different clearing algorithms required; energy delivery validation still needed post-activation |
| Self-reported | FSP declares its own baseline | Large industrial FSPs | Low integrity without independent verification |
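
As an illustration of the XofY mechanism in the table, the following is a minimal sketch, assuming the input is one load value (kW) per eligible day for the same hour, oldest first. The function name, the `mode` parameter and the sample values are hypothetical; real implementations also define which days count as eligible, which is not modelled here.

```python
def x_of_y_baseline(daily_values, x, y, mode="high"):
    """Average of X selected days out of the last Y eligible days.

    daily_values: load (kW) for the same hour on each eligible day, oldest first.
    mode: "high", "low" or "mid" selects which X of the Y days are averaged.
    """
    if not 0 < x <= y <= len(daily_values):
        raise ValueError("need 0 < x <= y <= number of eligible days")
    window = sorted(daily_values[-y:])  # last Y eligible days, ascending
    if mode == "high":
        selected = window[-x:]
    elif mode == "low":
        selected = window[:x]
    elif mode == "mid":
        start = (y - x) // 2
        selected = window[start:start + x]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sum(selected) / x


# Illustrative (made-up) history for one hour of the day, in kW.
history = [10, 12, 8, 14, 11, 9, 13, 15, 7, 10]
high_5_of_10 = x_of_y_baseline(history, x=5, y=10, mode="high")
low_5_of_10 = x_of_y_baseline(history, x=5, y=10, mode="low")
```

Comparing `high_5_of_10` with `low_5_of_10` on the same history shows the upward bias of the "high" variant noted in the table.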

MBMA and the battery integrity problem

MBMA (Meter-Before-Meter-After) takes the meter reading immediately before activation as the baseline. A battery operator providing upward flexibility through increased injection can game this: by charging just before the pre-activation reading, the operator lowers the apparent baseline (in injection terms it can go negative), so the subsequent injection appears larger than it really is, inflating measured delivery and payment.

The correct method for batteries providing production-side upward flexibility is zero baseline: any injection during activation counts as delivered, with no prior-period manipulation possible.

This is the approach used in SWITCH: batteries providing increased production use zero baseline (noll-referens), while MBMA is reserved for consumption-side resources. (Source - SWITCH User Documentation (2026))
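
The gaming incentive can be made concrete with a small arithmetic sketch. It assumes a sign convention in which injection to the grid is positive and charging is negative; the kW figures are invented for illustration and the function name is hypothetical.

```python
def delivered_injection(metered_kw, baseline_kw):
    """Delivered upward flexibility = metered injection minus baseline (kW)."""
    return metered_kw - baseline_kw


# Assumed sign convention: injection is positive, charging is negative.
activation_injection_kw = 100.0  # what the battery actually injects when activated

honest_pre_reading_kw = 0.0    # battery idle just before activation
gamed_pre_reading_kw = -50.0   # battery charging at 50 kW just before the reading

# MBMA uses the pre-activation reading as the baseline.
mbma_honest = delivered_injection(activation_injection_kw, honest_pre_reading_kw)
mbma_gamed = delivered_injection(activation_injection_kw, gamed_pre_reading_kw)

# Zero baseline ignores the pre-activation reading entirely.
zero_honest = delivered_injection(activation_injection_kw, 0.0)
zero_gamed = delivered_injection(activation_injection_kw, 0.0)
```

Under these assumptions the same physical injection is measured as 150 kW with a gamed MBMA reading but stays at 100 kW under zero baseline, whatever the battery did beforehand.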

Capacity clearing vs. energy settlement

A common misconception is that capacity-based products (cleared in MW, not MWh) eliminate the need for a baseline. That holds only for the clearing step: the DSO procures a capacity commitment and pays an availability fee in SEK/MW. Energy delivery validation still requires a baseline. When the DSO activates, the FSP must deliver the agreed energy volume; comparing metered output to the baseline determines whether the 75% delivery threshold was met and what the activation payment should be.

In SWITCH’s TO (Tillgänglighetsordrar) and DO (Direktordrar) products, market clearing is MW-based (capacity bids), but the FSP is still expected to deliver the energy amount awarded. A baseline is therefore required for post-activation settlement, even though clearing used capacity logic.
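
The post-activation check can be sketched as below for a consumption reduction. Only the 75% threshold comes from the text above; the function name, the energy figures and the choice to floor delivery at zero are assumptions, and no payment rule is modelled because none is specified here.

```python
def delivery_check(baseline_kwh, metered_kwh, awarded_kwh, threshold=0.75):
    """Compare delivered energy with the awarded volume (consumption reduction).

    Returns (delivered_kwh, delivery_ratio, threshold_met).
    """
    if awarded_kwh <= 0:
        raise ValueError("awarded_kwh must be positive")
    # Delivered reduction is baseline minus metered; floored at zero (assumption).
    delivered_kwh = max(baseline_kwh - metered_kwh, 0.0)
    ratio = delivered_kwh / awarded_kwh
    return delivered_kwh, ratio, ratio >= threshold


# Illustrative numbers: 150 kWh awarded against a 500 kWh baseline.
ok_case = delivery_check(baseline_kwh=500.0, metered_kwh=380.0, awarded_kwh=150.0)
short_case = delivery_check(baseline_kwh=500.0, metered_kwh=400.0, awarded_kwh=150.0)
```

The sketch shows why the baseline cannot be dropped: with the same metered value, a different `baseline_kwh` flips the threshold result.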

Method selection by DER type

| DER type | Recommended method | Rationale |
|---|---|---|
| Consumption load-DR (upward, stable loads) | XofY or rolling average + same-day adjustment | Consumption relatively predictable from history |
| Consumption load-DR (balancing, short-duration) | MBMA | No time for ex-ante calculation; MBMA is accurate for short windows |
| Non-controllable DG (wind, solar) | Regression or ML (with weather data) | Output driven by weather, not history; averaging methods fail |
| Controllable DG (backup generators, CHP) | Zero baseline | Backup generator has zero output when idle, so zero IS the baseline |
| Battery: consumption side (charging) | MBMA or rolling average | Charging pattern may be predictable |
| Battery: production side (discharging/injection) | Zero baseline | MBMA integrity risk; zero eliminates gaming incentive |
| Multi-DER aggregation (same type) | Per-type method applied at portfolio level | Sum of individual baselines works for MBMA and zero |
| Multi-DER aggregation (mixed types) | Comparable day, control group, or submetering per technology | No single method covers mixed portfolios accurately |

Swedish market practice

SWITCH offers three methods (Source - SWITCH User Documentation (2026)):

  1. Egen referensplan — FSP uploads own baseline (UI or API); deadline D-1 09:30 for DA, H-4 for ID
  2. Noll-referens (zero baseline) — used for production resources, especially batteries providing upward flexibility via injection
  3. MBMA — automatic calculation from resource metering; default for consumption-side resources
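
The upload deadlines in item 1 can be expressed as a small helper. This is a sketch under one reading of the notation: "D-1 09:30" is taken as 09:30 on the calendar day before delivery, and "H-4" as four hours before the start of the delivery hour. The function name is hypothetical and time zones are ignored (naive datetimes).

```python
from datetime import datetime, time, timedelta


def baseline_upload_deadline(delivery_start, market):
    """Latest upload time for an own baseline, under the assumptions stated above.

    market: "DA" (day-ahead) or "ID" (intraday).
    """
    if market == "DA":
        day_before = delivery_start.date() - timedelta(days=1)
        return datetime.combine(day_before, time(9, 30))  # D-1 09:30
    if market == "ID":
        return delivery_start - timedelta(hours=4)  # H-4
    raise ValueError(f"unknown market: {market}")


delivery = datetime(2026, 3, 10, 14, 0)
da_deadline = baseline_upload_deadline(delivery, "DA")
id_deadline = baseline_upload_deadline(delivery, "ID")
```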

SWITCH locks down baseline integrity: baseline values cannot be uploaded or changed once a flexibility trade has occurred on that market. Resource control was also moved to the DSO after a 2022/23 risk assessment found that FSPs could manipulate resource registration data; the change was a pre-emptive measure, not a response to confirmed abuse. (Source - BeFlexible D5.1 Demo Planning and Deployment (2024))

NODES (via sthlmflex) used a 5-day rolling average as the standard method: same hour across the 5 preceding working days. FSPs could alternatively upload their own baseline or agree an alternative with the buying DSO. (Source - sthlmflex säsong 3 (2022-2023))
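
A minimal sketch of the 5-day rolling average described above, assuming hourly average power is available as a dictionary keyed by `(date, hour)`. The data layout and function name are hypothetical, working days are approximated as Monday to Friday, and public holidays are not handled.

```python
from datetime import date, timedelta


def rolling_working_day_baseline(readings, target_day, hour, n_days=5):
    """Mean of the same hour over the n_days working days before target_day.

    readings: dict mapping (date, hour) -> average kW in that hour.
    Raises KeyError if a required working-day reading is missing.
    """
    values = []
    day = target_day - timedelta(days=1)
    while len(values) < n_days:
        if day.weekday() < 5:  # Monday..Friday; holidays are not excluded here
            values.append(readings[(day, hour)])
        day -= timedelta(days=1)
    return sum(values) / n_days


# Illustrative data: Mon 9 Jan to Fri 13 Jan 2023 at hour 8, plus a weekend.
readings = {(date(2023, 1, 9 + i), 8): 100.0 + 10.0 * i for i in range(5)}
readings[(date(2023, 1, 14), 8)] = 10.0  # Saturday, skipped
readings[(date(2023, 1, 15), 8)] = 10.0  # Sunday, skipped
target = date(2023, 1, 16)               # Monday
baseline_kw = rolling_working_day_baseline(readings, target, hour=8)
```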

European LFM baseline practices

Sassone et al. (2025) document the baseline methodologies deployed across 7 operational European LFMs — the most comprehensive comparative dataset available. (Source - Local Flexibility Markets in Europe Critical Review (2025))

| Method | Computation | Used by |
|---|---|---|
| Historical: Average | Average power flow, same time window, past X days grouped by day type | Enedis, Swedish DSOs (sthlmflex), Areti, Unareti |
| Historical: Average + correction | As above, adjusted by actual flow H hours pre-activation | E-distribuzione (uses 2-hour window) |
| Historical: Median | Median (not average) over X similar days | Enedis |
| Historical: Mean X-in-Y | Average over X best days of last Y, excluding extremes | All British DSOs |
| Historical: Mean X-in-Y + correction | As above, with H-hour pre-activation correction | British DSOs, E-REDES (Portugal) |
| Historical: K-Nearest Neighbors | ML-selected closest days among last Y days | Enedis (consumption units only) |
| Recent data: Average | Average power flow, last H hours | Enedis, E-distribuzione, Elektro Ljubljana |
| Recent data: Trapezoidal | Linear interpolation between H hours before/after activation | Enedis (consumption only) |
| Benchmark | Weighted average of similar units not providing flexibility | Enedis (consumption, wind, solar) |
| Zero | No power exchange assumed; any deviation = delivered flexibility | All British DSOs |
| User-nominated | BSP’s own forecasting model, subject to DSO approval | British DSOs, Enedis, Swedish DSOs (sthlmflex) |
| None | N/A: bid is a schedule modification | Netherlands GOPACS |
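
The "Recent data: Trapezoidal" row can be illustrated with a simple linear interpolation between the average power before and after the activation. This is a loose sketch of the idea only, not a description of any DSO's actual computation; the function name and the even spacing of intervals are assumptions.

```python
def interpolated_baseline(pre_kw, post_kw, n_intervals):
    """Baseline for each settlement interval inside the activation window.

    pre_kw / post_kw: average power in the H hours before / after activation.
    Values are spaced evenly on the straight line from pre_kw to post_kw.
    """
    if n_intervals < 1:
        raise ValueError("n_intervals must be at least 1")
    step = (post_kw - pre_kw) / (n_intervals + 1)
    return [pre_kw + step * (i + 1) for i in range(n_intervals)]


# Illustrative values: 100 kW before, 140 kW after, three intervals in between.
baseline_profile = interpolated_baseline(100.0, 140.0, 3)
```

Because the post-activation reading is needed, a baseline of this kind can only be computed ex post.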

The key limitation of historical baselines: weather-driven demand shifts between the observation period and the activation day can cause large errors. Correction factors (actual flow H hours before activation) partially mitigate this, but no single method suits all DER types. Swedish DSOs (sthlmflex) offered BSPs the user-nominated approach — own forecasting model, DSO-approved — placing the accuracy burden on the FSP while giving them maximum flexibility. The Netherlands GOPACS model avoids baselines entirely by defining each bid as a modification to a commercial schedule; this is conceptually similar to capacity-limitation products (see below).
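
The correction factor mentioned above can be sketched as follows. The source does not say whether the adjustment is additive or multiplicative, so both variants are shown as assumptions; the function and parameter names are hypothetical.

```python
def corrected_baseline(hist_activation_kw, hist_pre_kw, actual_pre_kw,
                       multiplicative=False):
    """Adjust a historical baseline using same-day pre-activation data.

    hist_activation_kw: historical average for the activation window.
    hist_pre_kw: historical average for the H hours before that window.
    actual_pre_kw: metered average for the H hours before today's activation.
    """
    if multiplicative:
        if hist_pre_kw == 0:
            raise ValueError("hist_pre_kw must be non-zero for scaling")
        return hist_activation_kw * actual_pre_kw / hist_pre_kw
    return hist_activation_kw + (actual_pre_kw - hist_pre_kw)


# Illustrative cold day: pre-activation load is 9 kW above the historical 90 kW.
additive_kw = corrected_baseline(100.0, 90.0, 99.0)
scaled_kw = corrected_baseline(100.0, 90.0, 99.0, multiplicative=True)
```

Note that any correction based on pre-activation metering reintroduces some of the MBMA-style manipulation risk, since the FSP can influence the pre-activation window.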

Capacity-limit products as a baseline alternative

The European Commission’s 2025 LFM study (VITO) formally endorses capacity-limiting products (operating envelopes) as a structurally different approach that avoids baseline calculation entirely. Instead of measuring a flexibility volume (MWh change vs. counterfactual), the product defines a power cap at the connection point: the FSP commits to staying below (or above) a specified MW threshold. Delivery is verified against the cap, not against a counterfactual baseline.

This approach is particularly suited for:

  • LV grid congestion where individual sub-metering is impractical
  • Cases where baseline manipulation risk is high (batteries, weather-driven DG)
  • Early-stage markets where settlement complexity is a participation barrier

The trade-off: capacity-limit products require different clearing algorithms and cannot be easily stacked with energy-based products in the same market session. They represent a different product architecture, not just a different measurement method. (Source - EC LFM Specification and Design Criteria (VITO, 2025))
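
Verification against a cap, as opposed to a counterfactual, reduces to a comparison per metering interval. The sketch below assumes a consumption ceiling and a list of interval-average power values; the function name and sample values are illustrative.

```python
def check_capacity_limit(metered_kw, cap_kw):
    """Return (compliant, breach_indices) for a consumption ceiling.

    metered_kw: average power per metering interval during the agreed hours.
    A breach is any interval whose value exceeds cap_kw.
    """
    breaches = [i for i, power in enumerate(metered_kw) if power > cap_kw]
    return len(breaches) == 0, breaches


within_cap = check_capacity_limit([70.0, 74.9, 75.0, 60.0], cap_kw=75.0)
over_cap = check_capacity_limit([70.0, 80.0, 75.0], cap_kw=75.0)
```

No baseline appears anywhere in the check, which is the structural difference from the energy-delta methods.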

MaxUsage™ — the Swedish worked example, and a baseline caveat. NODES MaxUsage™ at Effekthandel Väst is the clearest operational Swedish capacity-limit product: the FSP and DSO jointly set a consumption ceiling for fixed peak hours and the FSP is paid for staying under it — verified by direct metering, with no per-event baseline (Renova capped at 75 kW for 07:00–10:00; GoCo halved workplace EV-charging power 08:00–12:00). But it only appears baseline-free: the value is benchmarked against a historical-consumption reference (Renova’s ~300 kW historical draw in those hours), which is a counterfactual that drifts as efficiency and behaviour change. That drift is precisely why Kinnekulle Energi discontinued the product (GKT’s 450 kW reference shifted between seasons, confounding validation), even as Effekthandel Väst scaled it. The lesson for baseline design: a capacity-limit product removes the baseline from settlement but not from valuation unless the cap is anchored to a firm, non-drifting reference (e.g., contracted capacity) rather than historical consumption. (Source - Effekthandel Väst Produkter och MaxUsage (NODES, 2024))
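
The valuation point can be shown with simple arithmetic. The 300 kW reference and 75 kW cap are the Renova figures quoted above; the drifted reference of 250 kW is a hypothetical value chosen only to show the effect, and the function name is illustrative.

```python
def implied_reduction_kw(reference_kw, cap_kw):
    """Peak reduction implied by a cap, relative to a reference level."""
    return max(reference_kw - cap_kw, 0.0)


cap_kw = 75.0
original_value_kw = implied_reduction_kw(300.0, cap_kw)  # historical reference
drifted_value_kw = implied_reduction_kw(250.0, cap_kw)   # hypothetical drifted reference
```

The cap and the metering are unchanged in both cases, yet the implied value of the product moves with the reference.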

FSP experience — baseline as a participation barrier

Palm et al. (2023) collected qualitative evidence from CoordiNet Uppland and Skåne FSPs and PFSPs showing that baseline calculation is a genuine day-to-day barrier, not just a theoretical design concern. (Source - Palm et al LFM Drivers and Barriers (2023))

Conceptual difficulty: Several FSPs found the concept of “what would we have consumed” genuinely confusing — “You could say that you had intended to consume something, but it may be untrue what you come up with. You might claim that you reduced the consumption a lot, but it might have happened anyway.” This uncertainty was particularly acute for consumption-side flexibility, where the counterfactual is less obvious than for a generator.

Baseline manipulation — the sports arena case: A story circulated among CoordiNet participants of a sports arena that turned on all its lighting just before the baseline measurement window to artificially inflate its reference load — then received payment for the resulting apparent (but not real) reduction in consumption. This incident damaged trust in the settlement mechanism among other participants and reinforced the information barrier: FSPs were not only confused by how baselines worked, but actively worried about the integrity of a system that could be gamed.

This corroborates the SWITCH response (Source - BeFlexible D5.1 Demo Planning and Deployment (2024)): after a 2022/23 risk assessment, E.ON moved resource control to the DSO (not FSP) as a preemptive anti-manipulation measure — transferring responsibility for the measurement reference from the resource owner to the market operator.

D-1 timing and real-time information loss: FSPs also noted that committing to a baseline a full day ahead means losing all real-time information that becomes available between bid submission and activation. One FSP: “In CoordiNet, you lose all information that is added between the day before and the control occasion … you can have a very good forecast, but it is a dynamic system.” This is less a baseline method problem than a market timing problem, but it compounds baseline uncertainty for FSPs — their baseline was set without access to information that only became available on the activation day.

Submetering as enabling infrastructure for baseline accuracy

A structural solution to the multi-DER baseline problem is dedicated measuring devices (DMDs) — circuit-level meters that measure a specific DER (EV charger, heat pump, battery) independently of total building consumption. See Submetering for the full concept.

DMDs solve the attribution problem for mixed-DER portfolios:

  • Without a DMD, the whole-building meter contains the activation signal plus all unrelated household load variations — making the baseline inherently noisy
  • With a DMD, only the specific DER’s pre/post consumption is captured; the baseline is clean and device-specific
  • Per-device attribution enables applying different methods to different DER types in the same portfolio: zero baseline for the battery’s injection output, MBMA for the heat pump’s consumption reduction
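
Per-device attribution with different methods in one portfolio can be sketched as below. It assumes each DMD reports a non-negative magnitude in the device's own direction (injection for the battery, consumption for the heat pump); the device records, field names and kW values are hypothetical.

```python
def delivered_kw(method, direction, pre_kw, during_kw):
    """Delivered flexibility for one sub-metered device.

    method: "zero" (baseline is 0) or "mbma" (baseline is the pre-activation reading).
    direction: "injection" (more output counts) or "reduction" (less consumption counts).
    """
    if method == "zero":
        baseline = 0.0
    elif method == "mbma":
        baseline = pre_kw
    else:
        raise ValueError(f"unknown method: {method}")
    if direction == "injection":
        return during_kw - baseline
    if direction == "reduction":
        return baseline - during_kw
    raise ValueError(f"unknown direction: {direction}")


devices = [
    # Battery injection, zero baseline: the pre-activation reading is ignored.
    {"method": "zero", "direction": "injection", "pre_kw": 0.0, "during_kw": 5.0},
    # Heat pump consumption reduction, MBMA on its own meter.
    {"method": "mbma", "direction": "reduction", "pre_kw": 3.0, "during_kw": 1.0},
]
portfolio_delivered_kw = sum(delivered_kw(**device) for device in devices)
```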

Art. 7b of Regulation 2024/1747 (EMD Reform) creates a customer right to install DMDs; implementation rules at member-state level are pending. The Network Code on Demand Response T&C development will be where Sweden defines how DMD data must be used in baseline calculations and whether settlement-grade or indicative-grade measurement is required for different products. (Source - Submetering for Flexibility Services Comillas (2024))

An EU-level review of IA–DER commercial practices confirms that baseline accuracy directly determines FSP revenue fairness: a systematically high baseline over-rewards the FSP; a low baseline under-rewards real flexibility delivery. Aggregators managing third-party DERs have a fiduciary interest in accurate baselines that cannot be provided by whole-building meters alone. (Source - Aggregators DR Relationships Comillas (2025))

Regulatory horizon

The Network Code on Demand Response will require standardised settlement processes for flexibility markets across the EU. ACER’s NC DR text (Art. 14) calls for baseline methods that must be “recalculable, transparent, precise, accurate, and unbiased” and requires a register of all MS-approved baseline methods. NC DR Art. 35 explicitly allows alternatives that do not require baselines (such as capacity-limit products), providing flexibility to Member States.

The European Commission study (2025) sets a medium-term target of baseline methods coordinated at MS level and informed by an EU-level baseline library containing all approved national approaches — a structured convergence mechanism that avoids forced harmonisation while enabling knowledge sharing.

NC DR Art. 43–44 link settlement requirements to DNDP flexibility procurement, which will bring the currently informal methods used by SWITCH and NODES into the scope of regulatory review. Sweden’s ~6-year operational experience with MBMA, noll-referens, and rolling-average methods will be directly relevant input to the national T&C baseline definition process. (Source - EC LFM Specification and Design Criteria (VITO, 2025))