Predicting home electricity usage based on historical patterns in Home Assistant

March 21, 2026

I use Home Assistant for automation, so the solution described uses Home Assistant's sensor configuration and database layout, however the overall idea should be more widely applicable.

I've been trying to minimise what we pay for electricity at home, by optimising the times at which e.g. house battery charges or water heats up. To do that it's useful to be able to predict future load. Why does this help ? Imagine a situation where you want to decide if to charge battery from grid, and by how much. If current battery charge will fully cover your predicted load until e.g. sun comes up and gives you solar you're golden, but if not it might make sense to charge from the grid.

As an aside, I am also in the middle of writing optimizer software for solving the whole problem (solar, battery, heat pump setup) using MILP solver but this is definitely a separate post.

As I'm on a tariff where price changes every half hour, I would like the resolution of the prediction also be in half hourly slots so that we can also predict the cost of future electricity usage.

Our house load differs quite a bit based on the time of the day as well a day of the week, weekends being much different than the working days etc. Idea is to use historical data, find what happened in the half hourly slot we're interested in in the past, e.g. if we're interested in load prediction for 9:30-10:00 for a Wednesday morning, let's look at previous 9:30-10:00 slots on previous Wednesdays and average them out, with most recent being the most important. I then blend it with the respective slots from last 2 days, just to catch any new emerging patterns, sometimes we have longer running loads that we switch on very ocasionally.

What follows is what I settled on after a bit of tuning - the weights are what worked for my load patterns but as you can see this week had more load overall, so we're underpredicting in some cases. Here's last week of data, if you squint you can see that the overall shape is correct, with some outliers.

And here's the implementation - I'm using the statistics table in addition to the states table because states typically only have a week of data. statistics are only kept with one data point per hour unfortunately so we need to extrapolate half hourly from that.

Feel free to take this code and play with it - adjust the weights and underlying sensor name.

Not sure how idiomatic the output format is, maybe there's a better way, but I am parsing it in a custom piece of code separately anyway so does not make much difference to my usecase. The output is effectively a lookup table for the predicted load at different days of the week and times, as computed using data from the last 8 weeks; where you can ask it for a day of the week and a slot, so e.g. 0_09:00 is 9:00-9:30 on Sunday.

# Day-of-week-aware half-hourly total load, recency-weighted with recent-day blending.
# Returns a week of entries (336) keyed by "{dow}_{HH:MM}" (e.g. "0_09:00" = Sunday 09:00).
# Uses a hybrid approach:
#   - Last 7 days: States table with trapezoidal integration (high precision)
#   - 7-56 days: Statistics table with hourly data divided by 2
# Blends: 65% DOW weighted average + 25% yesterday same slot + 10% 2-days-ago same slot.
# OUTPUT: kWh per half-hour slot (predicted energy consumed during that 30-minute period).
- name: "predicted half hourly load by day of the week"
  unique_id: "half_hourly_load_by_dow_sql"
  query: >
    WITH
    recent_raw AS (
      SELECT
        strftime('%w', last_updated_ts, 'unixepoch', 'localtime') as dow,
        strftime('%H', last_updated_ts, 'unixepoch', 'localtime') as hour,
        CASE
          WHEN CAST(strftime('%M', last_updated_ts, 'unixepoch', 'localtime') AS INTEGER) < 30
          THEN '00' ELSE '30'
        END as minute,
        DATE(last_updated_ts, 'unixepoch', 'localtime') as date,
        last_updated_ts as ts,
        CAST(state AS FLOAT) as power_w
      FROM states s
      INNER JOIN states_meta sm ON s.metadata_id = sm.metadata_id
      WHERE sm.entity_id = 'sensor.growatt_invpowertolocalload'
        AND s.state NOT IN ('unknown', 'unavailable', '')
        AND CAST(s.state AS FLOAT) BETWEEN 0 AND 15000
        AND last_updated_ts >= unixepoch('now', '-7 days')
    ),
    with_intervals AS (
      SELECT
        dow, hour, minute, date, ts, power_w,
        LAG(ts) OVER (PARTITION BY dow, hour, minute, date ORDER BY ts) as prev_ts,
        LAG(power_w) OVER (PARTITION BY dow, hour, minute, date ORDER BY ts) as prev_power_w
      FROM recent_raw
    ),
    interval_areas AS (
      SELECT
        dow, hour, minute, date,
        ((power_w + prev_power_w) / 2.0) *
        ((ts - prev_ts) / 3600.0) as interval_wh
      FROM with_intervals
      WHERE prev_ts IS NOT NULL
        AND (ts - prev_ts) < 300
    ),
    recent_integrated AS (
      SELECT
        dow,
        hour || ':' || minute as time_slot,
        date,
        SUM(interval_wh) / 1000.0 as kwh_per_slot,
        COUNT(*) as interval_count
      FROM interval_areas
      GROUP BY dow, hour, minute, date
      HAVING COUNT(*) >= 3
    ),
    older_hourly AS (
      SELECT
        strftime('%H:00', start_ts, 'unixepoch', 'localtime') as hour_slot,
        strftime('%w', start_ts, 'unixepoch', 'localtime') as dow,
        DATE(start_ts, 'unixepoch', 'localtime') as date,
        state as energy_kwh,
        LAG(state) OVER (ORDER BY start_ts) as prev_energy_kwh
      FROM statistics s
      INNER JOIN statistics_meta sm ON s.metadata_id = sm.id
      WHERE sm.statistic_id = 'sensor.growatt_localloadenergytotal'
        AND s.state IS NOT NULL
        AND start_ts >= unixepoch('now', '-56 days')
        AND start_ts < unixepoch('now', '-7 days')
    ),
    older_half_hourly AS (
      SELECT
        dow,
        hour_slot,
        date,
        (energy_kwh - prev_energy_kwh) / 2.0 as kwh_per_slot
      FROM older_hourly
      WHERE prev_energy_kwh IS NOT NULL
        AND energy_kwh > prev_energy_kwh
        AND (energy_kwh - prev_energy_kwh) BETWEEN 0.025 AND 5.0
    ),
    older_expanded AS (
      SELECT dow, hour_slot as time_slot, date, kwh_per_slot FROM older_half_hourly
      UNION ALL
      SELECT dow, SUBSTR(hour_slot, 1, 3) || '30' as time_slot, date, kwh_per_slot
      FROM older_half_hourly
    ),
    combined_data AS (
      SELECT dow, time_slot, date, kwh_per_slot, 'states_integrated' as source
      FROM recent_integrated
      UNION ALL
      SELECT dow, time_slot, date, kwh_per_slot, 'statistics' as source
      FROM older_expanded
    ),
    dow_ranked AS (
      SELECT
        dow, time_slot, date, kwh_per_slot,
        ROW_NUMBER() OVER (PARTITION BY dow, time_slot ORDER BY date DESC) as rn
      FROM combined_data
      WHERE kwh_per_slot BETWEEN 0.01 AND 5.0
    ),
    dow_weighted AS (
      SELECT
        dow, time_slot,
        CASE
          WHEN COUNT(*) >= 3 THEN ROUND(
            SUM(kwh_per_slot * CASE rn
              WHEN 1 THEN 0.40 WHEN 2 THEN 0.25 WHEN 3 THEN 0.15
              WHEN 4 THEN 0.10 WHEN 5 THEN 0.06 WHEN 6 THEN 0.04 ELSE 0 END) /
            SUM(CASE rn
              WHEN 1 THEN 0.40 WHEN 2 THEN 0.25 WHEN 3 THEN 0.15
              WHEN 4 THEN 0.10 WHEN 5 THEN 0.06 WHEN 6 THEN 0.04 ELSE 0 END), 4)
          WHEN COUNT(*) >= 1 THEN ROUND(AVG(kwh_per_slot), 4)
          ELSE 0.15
        END as dow_avg_kwh
      FROM dow_ranked
      WHERE rn <= 6
      GROUP BY dow, time_slot
    ),
    yesterday_integrated AS (
      SELECT
        hour || ':' || minute as time_slot,
        SUM(interval_wh) / 1000.0 as kwh
      FROM (
        SELECT
          strftime('%H', last_updated_ts, 'unixepoch', 'localtime') as hour,
          CASE
            WHEN CAST(strftime('%M', last_updated_ts, 'unixepoch', 'localtime') AS INTEGER) < 30
            THEN '00' ELSE '30'
          END as minute,
          ((CAST(state AS FLOAT) + LAG(CAST(state AS FLOAT)) OVER (ORDER BY last_updated_ts)) / 2.0) *
          ((last_updated_ts - LAG(last_updated_ts) OVER (ORDER BY last_updated_ts)) / 3600.0) as interval_wh
        FROM states s
        INNER JOIN states_meta sm ON s.metadata_id = sm.metadata_id
        WHERE sm.entity_id = 'sensor.growatt_invpowertolocalload'
          AND DATE(last_updated_ts, 'unixepoch', 'localtime') = DATE('now', 'localtime', '-1 day')
          AND s.state NOT IN ('unknown', 'unavailable', '')
          AND CAST(s.state AS FLOAT) BETWEEN 0 AND 15000
      ) sub
      WHERE interval_wh IS NOT NULL
      GROUP BY hour, minute
    ),
    twodaysago_integrated AS (
      SELECT
        hour || ':' || minute as time_slot,
        SUM(interval_wh) / 1000.0 as kwh
      FROM (
        SELECT
          strftime('%H', last_updated_ts, 'unixepoch', 'localtime') as hour,
          CASE
            WHEN CAST(strftime('%M', last_updated_ts, 'unixepoch', 'localtime') AS INTEGER) < 30
            THEN '00' ELSE '30'
          END as minute,
          ((CAST(state AS FLOAT) + LAG(CAST(state AS FLOAT)) OVER (ORDER BY last_updated_ts)) / 2.0) *
          ((last_updated_ts - LAG(last_updated_ts) OVER (ORDER BY last_updated_ts)) / 3600.0) as interval_wh
        FROM states s
        INNER JOIN states_meta sm ON s.metadata_id = sm.metadata_id
        WHERE sm.entity_id = 'sensor.growatt_invpowertolocalload'
          AND DATE(last_updated_ts, 'unixepoch', 'localtime') = DATE('now', 'localtime', '-2 days')
          AND s.state NOT IN ('unknown', 'unavailable', '')
          AND CAST(s.state AS FLOAT) BETWEEN 0 AND 15000
      ) sub
      WHERE interval_wh IS NOT NULL
      GROUP BY hour, minute
    ),
    blended AS (
      SELECT
        dw.dow, dw.time_slot,
        CASE
          WHEN yp.kwh IS NOT NULL AND tp.kwh IS NOT NULL THEN
            ROUND(dw.dow_avg_kwh * 0.65 + yp.kwh * 0.25 + tp.kwh * 0.10, 4)
          WHEN yp.kwh IS NOT NULL THEN
            ROUND(dw.dow_avg_kwh * 0.75 + yp.kwh * 0.25, 4)
          WHEN tp.kwh IS NOT NULL THEN
            ROUND(dw.dow_avg_kwh * 0.90 + tp.kwh * 0.10, 4)
          ELSE dw.dow_avg_kwh
        END as avg_kwh_per_slot
      FROM dow_weighted dw
      LEFT JOIN yesterday_integrated yp ON dw.time_slot = yp.time_slot
      LEFT JOIN twodaysago_integrated tp ON dw.time_slot = tp.time_slot
    )
    SELECT
      '{' ||
      GROUP_CONCAT(
        '"' || dow || '_' || time_slot || '":' || ROUND(avg_kwh_per_slot, 4),
        ','
      ) ||
      '}' as result
    FROM blended
    ORDER BY dow, time_slot;
  column: "result"

https://blog.cyplo.dev/feed.xml