Optimize MySQL/MariaDB Queries with STRAIGHT_JOIN
Recently, I had to deal with an issue on DMOJ, where a page displaying 100 out of around a million elements took over 10 seconds to load. Naturally, I started investigating the issue.
(Note: DMOJ uses MariaDB, but the same problem, as well as the eventual solution, should work the same on MySQL as well.)
The first course of action, of course, was to see what the database was trying
to do, by running an EXPLAIN
query. For those of you who don’t know,
if you have a query of the form SELECT x FROM y WHERE z
, running
EXPLAIN SELECT x FROM y WHERE z
would show what the query is doing, without
actually executing the query.
A quick look at the EXPLAIN
output showed that MariaDB first did a filter on
a 2000 row table, and then joined in the table with a million elements.
Then, the 100 wanted rows were filtered out. This query plan was quite
horrifying.
A much better strategy would be to filter out the 100 rows we want, since all I
was doing was a descending sort on the INTEGER PRIMARY KEY
column, and then
selecting 100 rows from that. This could quickly be very quickly using the
primary key on the million-row table. Afterwards, we can join in the smaller
tables.
It is unclear why MariaDB chose the suboptimal query plan in this case, even though the million-row table was listed first in the list of joins.
Fortunately, there is a way to override MariaDB’s attempted cleverness.
To do this, we simply replace all occurrences of INNER JOIN
with
STRAIGHT_JOIN
. This tells MariaDB to join tables in the order specified,
resulting in the desired query plan.
As DMOJ uses the Django ORM to do most queries, I wrote some code to convince
Django ORM to use STRAIGHT_JOIN
for certain queries:
class CacheDict(dict):
def __init__(self, func):
super(CacheDict, self).__init__()
self.func = func
def __missing__(self, key):
self[key] = value = self.func(key)
return value
def make_straight_join_query(QueryType):
class Query(QueryType):
def join(self, join, reuse=None):
alias = super(Query, self).join(join, reuse)
join = self.alias_map[alias]
if join.join_type == INNER:
join.join_type = 'STRAIGHT_JOIN'
return alias
return Query
straight_join_cache = CacheDict(make_straight_join_query)
def use_straight_join(queryset):
if connections[queryset.db].vendor != 'mysql':
return
queryset.query = queryset.query.clone(straight_join_cache[type(queryset.query)])
# Example:
queryset = Model.objects.select_related('foreign_key_one', 'foreign_key_two')
use_straight_join(queryset)
# Now when using `queryset`, the actual query will use STRAIGHT_JOIN.
# This is preserved even after further filtering and joining on the queryset.
I hope you find my article and code sample useful.