We have changed the default table type (or access method) to be columnar. Now that columnar tables support updates, deletes, and vacuuming it is ready to be the default. This also addresses user feedback that many users assumed that any table created using Hydra would be columnar.
If you wish to use a traditional row-based table, also known as a heap table, you will need to use
USING heap at the end of a
CREATE TABLE statement, or convert the table after creation:
-- create a new row-based table CREATE TABLE new_table (...) USING heap; -- convert a columnar table to row format SELECT columnar.alter_table_set_access_method('table_name', 'heap');
This will also affect importing data from an existing Postgres database. For details, see our documentation on importing from Postgres.
Our implementation of updates and deletes still respects our append-only format. As a result, tables will grow in size as they are updated, and deleted data remains on disk. The only option was use
VACUUM FULL in order to compact your columnar tables, which was time consuming and locked the table for the duration of the operation.
In order to address this, we have added incremental vacuuming. To use the incremental vacuum, you call a function as follows:
SELECT columnar.vacuum('table', 25);
In this example,
25 is the maximum number of stripes that will be rewritten during vacuum. The table is locked for the duration of the operation, and returns the number of stripes modified. If the function returns 0, then there are no stripes remaining to vacuum. Note that a stripe must be contain at least 20% updated or deleted rows to be a candidate for vacuum.
To automate this further, we wrote a convenient function that will vacuum all of your columnar tables:
SELECT columnar.vacuum_all('public', 0.1, 25);
This will vacuum all columnar tables in the
public schema, 25 stripes at a time, pausing 0.1 seconds between each vacuum. We recommend scheduling this at a time of low activity of your database with Hydra Scheduler, or after performing large updates and deletes to your tables.
Auto-vacuum automatically calls
VACUUM periodically based on the number of changes to a given table. Hydra now supports the auto-vacuum statistics as well as implementing a basic
VACUUM for columnar tables. Unlike incremental vacuum, this “quick vacuum” looks at the most recent stripes added to the table and rewrites them into a single stripe. You can also call
VACUUM manually to perform this operation.
In our testing, we found that certain queries accessed the same data repeatedly, particularly when performing some types of joins. Since columnar data is compressed, this caused data to be uncompressed repeatedly, greatly slowing the performance of these queries. To solve this, we have added an in-memory column cache.
This cache is enabled by default on Hydra Cloud where we know the specs of the instance, but disabled by default in open source Hydra. By default, the cache is 200 MB per process. You can enable and modify the cache size as follows: