What is the “(scanning)” variant of a loose index scan?

A query plan uses loose index scan if “Using index for group-by” appears in the “Extra”  column of the EXPLAIN output. In some plans though, “Using index for group-by (scanning)” appears. What does “(scanning)” mean and how is it different from the regular loose index scan?

Loose index scan is mainly used for the GROUP BY / DISTINCT optimization.

Without an index, processing a GROUP BY query would require a temporary table that stores one row per group.  With an index, which provides sorting order for GROUP BY columns,  all the entries of an index would normally be accessed. Loose index scan avoids accessing all the entries in an index and filters based on the prefix columns.

The optimized approach considers only a fraction of the keys in an index, so it is called a loose index scan. A loose index scan reduces query execution time (aggregate processing), by using the prefix columns in an index.

How Loose Index Scan works

Example

What happens internally

Indexes are ordered based on their keys. Loose index scan effectively jumps from one unique value (or set of values) to the next based on the index’s prefix keys. In the above query column “c1” is the prefix of the index “c1_c2_idx”. Loose index scan jumps to the unique values of the column “c1” (because grouping is on c1) and picks the lowest value of “c2”.

The below diagram is a representation of the index “c1_c2_idx”.  Note that INNODB extends each secondary index by appending the primary key columns to it. Only the first row in “group” of rows is returned to the server. The “jump” is based on the relevant prefix of the index.

To “jump” values in an index, we use the handler call: ha_index_read_map(...). This is effectively a random disk access to read the next unique value of the index based on the prefix. The important details here is that: “Determination of the next unique value is handled by storage engine.”

The below function calls ha_index_read_map(...) when is_index_scan is set to false.

A special type of Loose Index Scan

Normally a query with AGG(DISTINCT …), i.e. “SELECT [SUM|COUNT|AVG](DISTINCT…) …”, would require an index or table scan, a temporary table to filter the distinct values and then return the count of all such distinct values.

The “scanning” feature is an extension of loose index scan for faster execution of queries with AGG(DISTINCT …). For such queries the EXPLAIN output’s extra column will contain “Using index for group-by (scanning)”.

“(scanning)” indicates that loose index scan will perform an index scan instead of random disk access to jump to the next distinct value (using ha_index_read_map(…)) if the cost of using loose index scan is more than that of a regular index scan. Notice in the below example that the cost for the regular index scan (cost of 1.6219) is cheaper than that of the regular loose index scan (cost  of 1.75). Since the cost of filtering based on prefix in the storage engine is higher, all the rows are retrieved and filtered in the server. Hence the “(scanning)” variant.

 

In contrast to the previous type of loose index scan, the index “jump” isn’t done by the storage engine. See the code above, is_index_scan is set to true for this option. Each entry in the index is read (by calling ha_index_next(..)) and the executor determines the unique value of the index based on the prefix. Determination of the next unique value is handled by executor in key_cmp(..)

The above diagram illustrates that all the rows in the index are brought from the storage engine to the executor.

This was added as part of WL#3220 and isn’t a new feature. It has been around since Mysql-5.5.

3 thoughts on “What is the “(scanning)” variant of a loose index scan?

Leave a Reply

Your email address will not be published. Required fields are marked *

Please enter * Time limit is exhausted. Please reload CAPTCHA.