Database

Overview

Home Assistant stores history in four different ways:

  1. Real-time state[1]
  2. Short-term statistics[2]
  3. Long-term statistics[2]
  4. Events[3]

States

The real-time state[1] (every reported state change) is stored in the database across three separate tables[4]:

  • states_meta: Stores the entity_id of an entity.
  • state_attributes: Stores the attributes of a state.
  • states: Stores the state itself, with FOREIGN KEY references to state_attributes on attributes_id and to states_meta on metadata_id.

Every state change inserts a new row into the states table; rows in states_meta and state_attributes are shared and referenced by id. A state change that only changes the state of an entity reuses the existing attributes_id, while a state change that introduces new attributes also inserts a new row into state_attributes.
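As a sketch of how the tables fit together, the most recent states for a single entity can be fetched by joining all three tables on their foreign keys (sensor.outdoor_temperature is a hypothetical entity_id; substitute your own):

```sql
-- Fetch the 5 most recent states for one entity, joining all three tables
SELECT states.state,
       states.last_updated_ts,
       state_attributes.shared_attrs
FROM states
INNER JOIN states_meta
    ON states.metadata_id = states_meta.metadata_id
LEFT JOIN state_attributes
    ON states.attributes_id = state_attributes.attributes_id
WHERE states_meta.entity_id = 'sensor.outdoor_temperature'
ORDER BY states.last_updated_ts DESC
LIMIT 5;
```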

Long term statistics

Home Assistant keeps both long-term and short-term statistics[2]: short-term statistics are downsampled to 5-minute intervals, while long-term statistics are downsampled to hourly aggregates.

These statistics are stored in three different tables[4]:

  • statistics_meta: Data source metadata, such as unit_of_measurement and name.
  • statistics_short_term: The 5-minute aggregates of data in the states table.
  • statistics: Hourly aggregates of the data in the statistics_short_term table.

Short-term statistics are purged along with states, but long-term statistics are never purged from the database. Since they only store hourly aggregates, there are only 24 rows per day per statistic.
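To see which statistics accumulate the most long-term rows, the statistics tables can be joined much like the state tables (this assumes the statistics_meta primary key is named id, as in recent schema versions):

```sql
-- Row count per statistic in the long-term statistics table
SELECT statistics_meta.statistic_id,
       COUNT(*) AS cnt
FROM statistics
INNER JOIN statistics_meta
    ON statistics.metadata_id = statistics_meta.id
GROUP BY statistics_meta.statistic_id
ORDER BY cnt DESC
LIMIT 10;
```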

Storing long-term statistics of an entity

For Home Assistant to store long-term statistics for an entity_id, the entity must have state_class set to one of the valid state classes[5]:

  • measurement: A measurement at the current point in time, for example the current power consumption.
  • total: A total amount that can both increase and decrease.
  • total_increasing: A monotonically increasing value that periodically resets to 0, for example monthly energy consumption.

If state_class is set to measurement, then device_class must not be energy, gas or monetary.
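For example, a template sensor can opt into long-term statistics by declaring a state_class. A minimal sketch, where sensor.energy_meter is a hypothetical source entity:

```yaml
template:
  - sensor:
      - name: "Monthly energy"
        unit_of_measurement: "kWh"
        device_class: energy
        # total_increasing: a value that resets to 0 periodically,
        # so long-term statistics are recorded for this sensor
        state_class: total_increasing
        state: "{{ states('sensor.energy_meter') }}"
```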

Using MariaDB

I'm using MariaDB for my database instead of the default SQLite file, configured in the recorder integration[6] as:

recorder:
  db_url: |
    mysql://${db_user}:@localhost/${db_name}?charset=utf8mb4&unix_socket=/run/mysqld/mysqld.sock
  auto_purge: true
  auto_repack: true

This connects as ${db_user} (with no password) via the Unix socket at the standard location /run/mysqld/mysqld.sock, using the 4-byte UTF-8 encoding utf8mb4.

Managing database size

To get the total size of the database (in MiB):

SELECT table_schema AS 'DB',
       ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS 'Size in MiB'
FROM information_schema.tables
WHERE table_schema='${db_name}'
GROUP BY table_schema;

The result is the size of data in the database schema ${db_name}, including the size used for indices.

It's useful to define this query as a sensor in Home Assistant with the sql sensor platform[7], to get the database size as a sensor entity:

sql:
  - name: mariadb_database_size
    query: |
      SELECT table_schema AS 'DB',
             ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS 'value'
      FROM information_schema.tables
      WHERE table_schema='${db_name}'
      GROUP BY table_schema;
    # Column name from query result used as value for the sensor
    column: "value"
    device_class: data_size
    unit_of_measurement: MiB

The column configuration value needs to match the name that the returned column is SELECT'ed AS in the SQL statement.

Analyze usage

Get an overview of event_types stored in your database:

SELECT
  COUNT(*) as cnt,
  COUNT(*) * 100 / (SELECT COUNT(*) FROM events) AS cnt_pct,
  event_types.event_type
FROM events
INNER JOIN event_types
    ON events.event_type_id = event_types.event_type_id
GROUP BY event_types.event_type
ORDER BY cnt DESC;

Since events typically store less data than states, these entries likely consume less disk space. But the counts will give you insight into what fills your database.

Select entity_ids which have a large number of state changes recorded:

SELECT COUNT(*) AS cnt,
       COUNT(*) * 100 / (SELECT COUNT(*) FROM states) AS cnt_pct,
       states_meta.entity_id
FROM states
INNER JOIN states_meta
    ON states.metadata_id = states_meta.metadata_id
GROUP BY states_meta.entity_id
ORDER BY cnt DESC
LIMIT 10;

That will return the number of rows each entity_id uses in the states table as cnt, each row representing a state change. The attributes of an entity are also recorded on each state change, so entities that record a lot of data in attributes can bloat the states table.

To inspect how many bytes are used to store state changes per entity_id:

SELECT COUNT(state_id) AS cnt,
       ROUND(SUM(LENGTH(state_attributes.shared_attrs)) / 1024 / 1024, 1) AS mbytes,
       states_meta.entity_id
FROM states
LEFT JOIN state_attributes
  ON (states.attributes_id=state_attributes.attributes_id)
LEFT JOIN states_meta
  ON (states.metadata_id=states_meta.metadata_id)
GROUP BY states.metadata_id
ORDER BY mbytes DESC
LIMIT 10;

This will return, as mbytes, how much space in MiB each entity_id uses to store its state attributes.

The JOINs in these statements aren't very efficient, so the queries will likely take some time to return. Adjust the LIMIT if you want a longer or shorter list.

Configuring recorder

You can then use these results to configure the recorder integration[6] to exclude entity_ids and event types that are bloating your database.

recorder:
  exclude:
    # Exclude a whole domain
    domains: []
    # Exclude specific entity_ids
    entities:
      - sensor.time
      - sensor.last_boot
    # Use shell-like globs as a wildcard to exclude many
    entity_globs:
      - sensor.uptime_*
      - sensor.esphome_radar_*_still_energy
      - sensor.*_rx
      - sensor.*_tx
    # Exclude an event_type
    event_types:
      - call_service

Note that if you exclude an entity_id from being stored, its history will not be recorded, so you won't have any graphs for that entity_id. If you exclude an event_type, you won't have a history of those events.

By default, Home Assistant keeps 10 days of short-term history in the database, unless you disable auto_purge.

recorder:
  auto_purge: true
  purge_keep_days: 10

Anything older than purge_keep_days will still be available as downsampled long-term statistics[2], and is still visible in the usual charts (as a darker line). This does not apply to excluded entities, though, as those are not recorded at all.[8]

Note that it is no longer recommended to change purge_keep_days, per the release notes for 2023.12.0[9], where the improved history charts (which include the long-term data) were introduced. Though it is not clear why that means the short-term retention period should not be changed.

By default auto_repack is enabled, and it is recommended not to disable it. It runs every second Sunday of the month, after the auto_purge operation.

recorder:
  auto_repack: true

With MariaDB, this will optimize or recreate the events and states tables, which can often greatly help keep their size under control. Note that this is a heavy operation, but a safe one, because MariaDB won't actually commit the changes until it is finished. Instead it writes to temporary tables and swaps them in for the old tables when it's done.
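The rough manual equivalent on MariaDB is an OPTIMIZE TABLE, which for InnoDB tables rebuilds the table and updates index statistics. This is my reading of what auto_repack maps to, not taken from the Home Assistant source, and it is best run during a quiet period:

```sql
-- Rebuild the largest recorder tables; for InnoDB this is implemented
-- as ALTER TABLE ... FORCE, i.e. copy to a temporary table and swap
OPTIMIZE TABLE states, state_attributes, events;
```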

Using recorder services to trim the database

The recorder.purge_entities service can be used to prune specific entities.

service: recorder.purge_entities
data:
  domains: []
  entity_id:
   - sensor.last_boot
  entity_globs:
   - sensor.esphome_radar_*_still_energy
  keep_days: 1

You can set keep_days to retain some days of short-term statistics in the database if you don't want to remove them completely. This can be very useful in a script or automation if you want to keep a few days of short-term history for some entities instead of excluding them entirely.
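As a sketch, such an automation could trim a noisy glob of entities once a week (the trigger time, weekday, and glob are examples, not recommendations):

```yaml
automation:
  - alias: "Weekly purge of noisy sensors"
    trigger:
      - platform: time
        at: "04:00:00"
    condition:
      - condition: time
        weekday:
          - sun
    action:
      # Keep 2 days of short-term history, purge anything older
      - service: recorder.purge_entities
        data:
          entity_globs:
            - sensor.esphome_radar_*_still_energy
          keep_days: 2
```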

Similarly, the recorder.purge service can execute a repack operation:

service: recorder.purge
data:
  repack: true

If you have just pruned a large number of entity_ids from your database, a repack operation can help free up disk space. Keep in mind that this is a heavy operation.

References