This change deprecates the old LayersIntroducingVulnerability in favor of a new
one that orders its output and contains an Index. This Index is not
guaranteed to be consistent across multiple notifications, even though the
current PostgreSQL implementation uses the primary key of the Layer table.
`searchNotificationAvailable` never effectively uses any indexes because:
- `notified_at < $1`, where $1 is a recent timestamp, matches the majority
  of the table's rows, so it is cheaper for PostgreSQL to use a sequential
  scan than an index scan.
- there is no index for `deleted_at IS NULL`.
However, once Clair has been running for long enough, the vast majority
of rows (99%+) are expected to have a non-NULL `deleted_at` field. This
commit adds a new index on that field so the remaining 1% can be fetched
in the blink of an eye.
In other words, instead of performing a full table scan for each
`searchNotificationAvailable` query, we'll only read the small branch of the
new index, reducing the total cost from over 30k to a mere 150 on a Clair
database that has already handled more than 1,000,000 notifications.
This change lets the query planner filter first and sort the result set of
our query afterwards, rather than attempting to re-use the Layer table's
index for the ORDER BY clause. Because the result set is always small,
queries that previously took tens of seconds now take tens of milliseconds.
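The commit message does not reproduce the query itself, but as a general illustration of
the technique (all names below are hypothetical), one common way to keep PostgreSQL from
driving the plan off an index just to satisfy ORDER BY is to fence the filtering subquery
and sort its small result afterwards:

    -- Sketch only: OFFSET 0 is commonly used as an optimization fence in
    -- PostgreSQL, so the planner filters first (small result set) and then
    -- sorts it, instead of walking a large index on Layer in ORDER BY order.
    SELECT *
    FROM (
        SELECT l.id, l.name
        FROM Layer l
        WHERE l.created_at < $1
        OFFSET 0
    ) AS filtered
    ORDER BY filtered.id;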
This commit aims to limit the bottleneck introduced by the exclusive
database lock on Vulnerability_Affects_FeatureVersion when inserting
FeatureVersions. It slightly slows down FeatureVersion insertion on a
mostly empty database, but should greatly increase throughput and
parallelism on a populated one.
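For context, this is how the serialization shows up at the SQL level: two concurrent
transactions inserting FeatureVersions both need the same table-level lock, so the second
one blocks until the first commits. (The statements below use the table name from this
message purely as an illustration; the actual locking code is not shown here.)

    -- Session A:
    BEGIN;
    LOCK TABLE Vulnerability_Affects_FeatureVersion IN EXCLUSIVE MODE;
    -- ... insert the FeatureVersion and its affected-vulnerability links ...

    -- Session B (concurrently): blocks on the same statement until A commits,
    -- which is the serialization this commit tries to limit.
    BEGIN;
    LOCK TABLE Vulnerability_Affects_FeatureVersion IN EXCLUSIVE MODE;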
Our experiments have shown that PostgreSQL 9.4 makes bad
planning decisions about:
- joining the layer tree to feature versions and features
- joining the feature versions to affected/fixed feature versions and vulnerabilities
For instance, it would do a merge join between affected feature versions (300 rows,
estimated at 3,000 rows) and fixed-in feature versions (100k rows). In this case, a
nested loop is much preferable.
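The message does not show how the plan is corrected; one common way to steer PostgreSQL
away from a bad merge join for a specific query, shown here purely as an illustration, is
to disable that join strategy locally so the planner falls back to a nested loop:

    -- Sketch only: enable_mergejoin and enable_hashjoin are standard PostgreSQL
    -- planner settings; SET LOCAL confines the change to the current transaction.
    BEGIN;
    SET LOCAL enable_mergejoin = off;
    SET LOCAL enable_hashjoin = off;
    -- run the affected join here, e.g. the layer tree -> feature versions ->
    -- affected/fixed feature versions -> vulnerabilities query
    COMMIT;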