Mysql – How do MySQL indexes work

indexingMySQL

I am really interested in how MySQL indexes work, more specifically, how can they return the data requested without scanning the entire table?

It's off-topic, I know, but if there is someone who could explain this to me in detail, I would be very, very thankful.

Best Answer

Basically an index on a table works like an index in a book (that's where the name came from):

Let's say you have a book about databases and you want to find some information about, say, storage. Without an index (assuming no other aid, such as a table of contents) you'd have to go through the pages one by one, until you found the topic (that's a full table scan). On the other hand, an index has a list of keywords, so you'd consult the index and see that storage is mentioned on pages 113-120,231 and 354. Then you could flip to those pages directly, without searching (that's a search with an index, somewhat faster).

Of course, how useful the index will be, depends on many things - a few examples, using the simile above:

if you had a book on databases and indexed the word "database", you'd see that it's mentioned on pages 1-59,61-290, and 292 to 400. In such case, the index is not much help and it might be faster to go through the pages one by one (in a database, this is "poor selectivity").
For a 10-page book, it makes no sense to make an index, as you may end up with a 10-page book prefixed by a 5-page index, which is just silly - just scan the 10 pages and be done with it.
The index also needs to be useful - there's generally no point to index e.g. the frequency of the letter "L" per page.

Correctly setting up the connection

Note that when using PDO to access a MySQL database real prepared statements are not used by default. To fix this you have to disable the emulation of prepared statements. An example of creating a connection using PDO is:

$dbConnection = new PDO('mysql:dbname=dbtest;host=127.0.0.1;charset=utf8', 'user', 'password');

$dbConnection->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$dbConnection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

In the above example the error mode isn't strictly necessary, but it is advised to add it. This way the script will not stop with a Fatal Error when something goes wrong. And it gives the developer the chance to catch any error(s) which are thrown as PDOExceptions.

What is mandatory, however, is the first setAttribute() line, which tells PDO to disable emulated prepared statements and use real prepared statements. This makes sure the statement and the values aren't parsed by PHP before sending it to the MySQL server (giving a possible attacker no chance to inject malicious SQL).

Although you can set the charset in the options of the constructor, it's important to note that 'older' versions of PHP (before 5.3.6) silently ignored the charset parameter in the DSN.

Explanation

The SQL statement you pass to prepare is parsed and compiled by the database server. By specifying parameters (either a ? or a named parameter like :name in the example above) you tell the database engine where you want to filter on. Then when you call execute, the prepared statement is combined with the parameter values you specify.

The important thing here is that the parameter values are combined with the compiled statement, not an SQL string. SQL injection works by tricking the script into including malicious strings when it creates SQL to send to the database. So by sending the actual SQL separately from the parameters, you limit the risk of ending up with something you didn't intend.

Any parameters you send when using a prepared statement will just be treated as strings (although the database engine may do some optimization so parameters may end up as numbers too, of course). In the example above, if the $name variable contains 'Sarah'; DELETE FROM employees the result would simply be a search for the string "'Sarah'; DELETE FROM employees", and you will not end up with an empty table.

Another benefit of using prepared statements is that if you execute the same statement many times in the same session it will only be parsed and compiled once, giving you some speed gains.

Oh, and since you asked about how to do it for an insert, here's an example (using PDO):

$preparedStatement = $db->prepare('INSERT INTO table (column) VALUES (:column)');

$preparedStatement->execute([ 'column' => $unsafeValue ]);

Can prepared statements be used for dynamic queries?

While you can still use prepared statements for the query parameters, the structure of the dynamic query itself cannot be parametrized and certain query features cannot be parametrized.

For these specific scenarios, the best thing to do is use a whitelist filter that restricts the possible values.

// Value whitelist
// $dir can only be 'DESC', otherwise it will be 'ASC'
if (empty($dir) || $dir !== 'DESC') {
   $dir = 'ASC';
}

Sql-server – Multiple Indexes vs Multi-Column Indexes

I agree with Cade Roux.

This article should get you on the right track:

One thing to note, clustered indexes should have a unique key (an identity column I would recommend) as the first column. Basically it helps your data insert at the end of the index and not cause lots of disk IO and Page splits.

Secondly, if you are creating other indexes on your data and they are constructed cleverly they will be reused.

e.g. imagine you search a table on three columns

state, county, zip.

you sometimes search by state only.
you sometimes search by state and county.
you frequently search by state, county, zip.

Then an index with state, county, zip. will be used in all three of these searches.

If you search by zip alone quite a lot then the above index will not be used (by SQL Server anyway) as zip is the third part of that index and the query optimiser will not see that index as helpful.

You could then create an index on Zip alone that would be used in this instance.

By the way We can take advantage of the fact that with Multi-Column indexing the first index column is always usable for searching and when you search only by 'state' it is efficient but yet not as efficient as Single-Column index on 'state'

I guess the answer you are looking for is that it depends on your where clauses of your frequently used queries and also your group by's.

The article will help a lot. :-)

Best Answer

Related Solutions

Php – How to prevent SQL injection in PHP

Correctly setting up the connection

Explanation

Can prepared statements be used for dynamic queries?

Sql-server – Multiple Indexes vs Multi-Column Indexes

Related Topic