Conference Schedule - PostgresOpen 2017


How Postgres Could Index Itself

Level: Intermediate

In 2006, Jeff Atwood asked "Why Can't Database Tables Index Themselves?". Fast forward 11 years, and we're not much closer. This talk will explore what an automatic indexing approach might look like. It's built around the approach PgHero takes to suggest indexes:

  1. Get the most time-consuming queries from pg_stat_statements.

  2. Parse queries and look for specific patterns (the simplest place to start is a query for a single table with a WHERE clause that consists of only =, IN, IS NULL or IS NOT NULL and/or an ORDER BY clause).

  3. Use the pg_stats view to get estimates about distinct rows and percent of NULL values for each column.

  4. For each column in the WHERE clause, sort by the highest cardinality (most unique values). This allows the database to narrow its search the fastest. Perform row estimation to get the expected number of rows as we add columns to the index.

  5. Continue this process with columns in the ORDER BY clause.

  6. To make sure we don’t add useless columns, stop once we narrow it down to 50 rows in steps 5 or 6. Also, recheck the last columns to make sure they add value.

This talk will also look at shortcomings of this approach and ways to improve it, and will include the trade-offs that must be considered (read performance vs write performance vs space).


Andrew Kane