Recently, I had to deal with an issue on DMOJ, where a page displaying 100 out of around a million elements took over 10 seconds to load. Naturally, I started investigating the issue.
(Note: DMOJ uses MariaDB, but the same problem, as well as the eventual solution, should work the same on MySQL as well.)
The first course of action, of course, was to see what the database was trying
to do, by running an
EXPLAIN query. For those of you who don’t know,
if you have a query of the form
SELECT x FROM y WHERE z, running
EXPLAIN SELECT x FROM y WHERE z would show what the query is doing, without
actually executing the query.
A quick look at the
EXPLAIN output showed that MariaDB first did a filter on
a 2000 row table, and then joined in the table with a million elements.
Then, the 100 wanted rows were filtered out. This query plan was quite
A much better strategy would be to filter out the 100 rows we want, since all I
was doing was a descending sort on the
INTEGER PRIMARY KEY column, and then
selecting 100 rows from that. This could quickly be very quickly using the
primary key on the million-row table. Afterwards, we can join in the smaller
It is unclear why MariaDB chose the suboptimal query plan in this case, even though the million-row table was listed first in the list of joins.
Fortunately, there is a way to override MariaDB’s attempted cleverness.
To do this, we simply replace all occurrences of
INNER JOIN with
STRAIGHT_JOIN. This tells MariaDB to join tables in the order specified,
resulting in the desired query plan.
As DMOJ uses the Django ORM to do most queries, I wrote some code to convince
Django ORM to use
STRAIGHT_JOIN for certain queries:
class CacheDict(dict): def __init__(self, func): super(CacheDict, self).__init__() self.func = func def __missing__(self, key): self[key] = value = self.func(key) return value def make_straight_join_query(QueryType): class Query(QueryType): def join(self, join, reuse=None): alias = super(Query, self).join(join, reuse) join = self.alias_map[alias] if join.join_type == INNER: join.join_type = 'STRAIGHT_JOIN' return alias return Query straight_join_cache = CacheDict(make_straight_join_query) def use_straight_join(queryset): if connections[queryset.db].vendor != 'mysql': return queryset.query = queryset.query.clone(straight_join_cache[type(queryset.query)]) # Example: queryset = Model.objects.select_related('foreign_key_one', 'foreign_key_two') use_straight_join(queryset) # Now when using `queryset`, the actual query will use STRAIGHT_JOIN. # This is preserved even after further filtering and joining on the queryset.
I hope you find my article and code sample useful.