MySQL – EXPLAIN Yourself
November 13, 2008 by Matthew Boehm, Sr. Database Administrator in Tech Stuff, The Planet, Tips and Tricks
Let’s face it — we’ve all seen some bad code in our time. During my eight-plus years of using PHP, I’ve inherited several legacy systems, and when reading over the previous programmer’s code, I often sit back and say, “What the heck was this guy thinking?”
Combine a programmer’s bad code with his or her lack of understanding about how databases work, and you’ve got one inefficient, unresponsive Web site.
The best example of this that I’ve encountered was one programmer’s attempt simply to get the number of rows in a table with a particular column value. This was his solution:
$rowCount = 0;
$res = mysql_query("SELECT * FROM users_activity");
while($row = mysql_fetch_row($res))
{
if($row[‘activity] == ‘LOGIN')
{
$rowCount++;
}
}
I hope all of you are cringing as badly as I did when I came across it the first time. For those of you that don’t understand why this is horribly inefficient, I’ll EXPLAIN.
MySQL has an often overlooked but extremely powerful built-in query analyzer that allows you to take any SELECT SQL statement, preface it with the keyword EXPLAIN, and get an analysis of the query with lots of information on how to optimize it.
Let’s use the above SQL statement on a table with over 4 million rows. Here’s the schema:
CREATE TABLE `users_activity` (
`userId` int(11) NOT NULL,
`activity` varchar(30) NOT NULL DEFAULT 'LOGIN',
`notes` varchar(200) DEFAULT NULL,
`timestamp` datetime NOT NULL DEFAULT '1970-01-01 00:00:00',
KEY `userId_2` (`userId`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8
The first concern we have is that the query has no WHERE clause. Not having this clause requires MySQL to scan the entire table (full_table_scan), which usually involves reading the entire table from disk into memory, and if you only need to do this once, you just wasted all that memory. With EXPLAIN, MySQL can show us exactly how many rows it checks:
>EXPLAIN SELECT * FROM users_activity
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: users_activity
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 4060232
Extra:
MySQL will have to examine 4,060,232 rows to satisfy this query. Let’s add a WHERE clause and see how that helps:
>EXPLAIN SELECT * FROM users_activity WHERE activity = 'LOGIN'
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: users_activity
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 4060232
Extra: Using where
Hmm … Well, that didn’t change the number of rows evaluated, did it? The query optimizer still had to look at all the rows, one by one, to determine if the activity column was equal to ‘LOGIN’. Even with our WHERE clause, this query is still almost as bad as the original author’s PHP code. Why did MySQL have to do this? To answer that, take a look at the NULL value returned for possible_keys. This result shows that MySQL was unable to find an index for the column to help narrow down the search. Let’s add an index and see what happens:
>ALTER TABLE `users_activity` ADD INDEX (`activity`);
>EXPLAIN SELECT * FROM users_activity WHERE activity = 'LOGIN'
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: users_activity
type: ref
possible_keys: activity
key: activity
key_len: 92
ref: const
rows: 422436
Extra: Using where
Very nice. We immediately reduced the number of rows that MySQL has to examine from 4,060,232 to 422,436. An 89 percent reduction in virtually no time at all.
This is just one of many scenarios in which using EXPLAIN can help you write better SQL statements. You can turn on the log_queries_not_using_indexes option to find out which SQL statements are not using indexes. By using this log and EXPLAIN, you can enhance the performance of your MySQL server by leaps and bounds.
For further information on EXPLAIN, visit http://dev.mysql.com/doc/refman/5.1/en/using-explain.html
-Matthew













November 15th, 2008 at 9:36 pm
[...] The Planet Blog » Blog Archive » MySQL – EXPLAIN Yourself (tags: MySQL EXPLAIN) [...]
November 18th, 2008 at 11:24 am
Awesome post, Matthew!
November 18th, 2008 at 11:27 am
i had no idea you could run such a simple command to track down such a glaring inefficiency. for a little more context, is the improvement in performance very noticeable on the user side?
December 4th, 2008 at 11:56 am
[...] http://blog.theplanet.com/2008/11/13/mysql-explain-yourself/ Read More Post a Comment [...]
December 4th, 2008 at 1:56 pm
Nice post. I wasn’t aware of the EXPLAIN statement, or (and I’m ashamed to admit this) the benefit of adding non-primary indexes. I’ll be sure to take this into consideration for my future projects. Thanks!
June 10th, 2009 at 5:13 pm
I get that this is a post about using EXPLAIN, but EXPLAIN doesn’t show you that the better query would be “SELECT count(*) FROM user_activity WHERE activity=’Login’”.
Your code is cleaner and you don’t waste all that network bandwidth moving useless bytes from the database to the application.
ALSO, selecting “*” from that table means that the query must load all the table data… if you only selected the ‘activity’ column, you’d find that the index itself can fulfill the entire query: the table data itself doesn’t have to be consulted.
EXPLAIN can only help optimize what MySQL does with the query you give to it. If you’re giving it a dumb query, it can’t really help you, now can it?
June 11th, 2009 at 8:49 am
AaronA:
Yes, the improvement in performance is noticed everywhere. I’ve seen queries that originally took several seconds go down to sub-second once a couple indexes where added. The indexes, usually kept in memory, keep MySQL from having to access the disk which we all know is the slowest piece in a server.
Chris:
I agree with you. If I really just wanted a raw count, I would have done a COUNT(*). However, that wasn’t the point of the article and that was the super-simple query I thought of to illustrate my point of the indexes.
Also agreed, that if you give a dumb query it can help “somewhat.” I hope you’ll agree that an optimized “dumb query” is still better than a “dumb query” that does full-table scans or uses a disk-temp-table.