Conference Schedule - PostgresOpen 2017Back
How Postgres Could Index Itself
In 2006, Jeff Atwood asked "Why Can't Database Tables Index Themselves?". Fast forward 11 years, and we're not much closer. This talk will explore what an automatic indexing approach might look like. It's built around the approach PgHero takes to suggest indexes:
Get the most time-consuming queries from pg_stat_statements.
Parse queries and look for specific patterns (the simplest place to start is a query for a single table with a
WHEREclause that consists of only
IS NOT NULLand/or an
Use the pg_stats view to get estimates about distinct rows and percent of
NULLvalues for each column.
For each column in the
WHEREclause, sort by the highest cardinality (most unique values). This allows the database to narrow its search the fastest. Perform row estimation to get the expected number of rows as we add columns to the index.
Continue this process with columns in the
To make sure we don’t add useless columns, stop once we narrow it down to 50 rows in steps 5 or 6. Also, recheck the last columns to make sure they add value.
This talk will also look at shortcomings of this approach and ways to improve it, and will include the trade-offs that must be considered (read performance vs write performance vs space).