SQL – many-to-many table primary key

many-to-manyprimary-keysql

This question comes up after reading a comment in this question:

When you create a many-to-many table, should you create a composite primary key on the two foreign key columns, or create a auto-increment surrogate "ID" primary key, and just put indexes on your two FK columns (and maybe a unique constraint)? What are the implications on performance for inserting new records/re-indexing in each case?

Basically, this:

PartDevice
----------
PartID (PK/FK)
DeviceID (PK/FK)

vs. this:

PartDevice
----------
ID (PK/auto-increment)
PartID (FK)
DeviceID (FK)

The commenter says:

making the two IDs the PK means the
table is physically sorted on the disk
in that order. So if we insert
(Part1/Device1), (Part1/Device2),
(Part2/Device3), then (Part 1/Device3)
the database will have to break the
table apart and insert the last one
between entries 2 and 3. For many
records, this becomes very problematic
as it involves shuffling hundreds,
thousands, or millions of records
every time one is added. By contrast,
an autoincrementing PK allows the new
records to be tacked on to the end.

The reason I'm asking is because I've always been inclined to do the composite primary key with no surrogate auto-increment column, but I'm not sure if the surrogate key is actually more performant.

Best Answer

With a simple two-column many-to-many mapping, I see no real advantage to having a surrogate key. Having a primary key on (col1,col2) is guaranteed unique (assuming your col1 and col2 values in the referenced tables are unique) and a separate index on (col2,col1) will catch those cases where the opposite order would execute faster. The surrogate is a waste of space.

You won't need indexes on the individual columns since the table should only ever be used to join the two referenced tables together.

That comment you refer to in the question is not worth the electrons it uses, in my opinion. It sounds like the author thinks the table is stored in an array rather than an extremely high performance balanced multi-way tree structure.

For a start, it's never necessary to store or get at the table sorted, just the index. And the index won't be stored sequentially, it'll be stored in an efficient manner to be able to be retrieved quickly.

In addition, the vast majority of database tables are read far more often than written. That makes anything you do on the select side far more relevant than anything on the insert side.

Correctly setting up the connection

Note that when using PDO to access a MySQL database real prepared statements are not used by default. To fix this you have to disable the emulation of prepared statements. An example of creating a connection using PDO is:

$dbConnection = new PDO('mysql:dbname=dbtest;host=127.0.0.1;charset=utf8', 'user', 'password');

$dbConnection->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$dbConnection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

In the above example the error mode isn't strictly necessary, but it is advised to add it. This way the script will not stop with a Fatal Error when something goes wrong. And it gives the developer the chance to catch any error(s) which are thrown as PDOExceptions.

What is mandatory, however, is the first setAttribute() line, which tells PDO to disable emulated prepared statements and use real prepared statements. This makes sure the statement and the values aren't parsed by PHP before sending it to the MySQL server (giving a possible attacker no chance to inject malicious SQL).

Although you can set the charset in the options of the constructor, it's important to note that 'older' versions of PHP (before 5.3.6) silently ignored the charset parameter in the DSN.

Explanation

The SQL statement you pass to prepare is parsed and compiled by the database server. By specifying parameters (either a ? or a named parameter like :name in the example above) you tell the database engine where you want to filter on. Then when you call execute, the prepared statement is combined with the parameter values you specify.

The important thing here is that the parameter values are combined with the compiled statement, not an SQL string. SQL injection works by tricking the script into including malicious strings when it creates SQL to send to the database. So by sending the actual SQL separately from the parameters, you limit the risk of ending up with something you didn't intend.

Any parameters you send when using a prepared statement will just be treated as strings (although the database engine may do some optimization so parameters may end up as numbers too, of course). In the example above, if the $name variable contains 'Sarah'; DELETE FROM employees the result would simply be a search for the string "'Sarah'; DELETE FROM employees", and you will not end up with an empty table.

Another benefit of using prepared statements is that if you execute the same statement many times in the same session it will only be parsed and compiled once, giving you some speed gains.

Oh, and since you asked about how to do it for an insert, here's an example (using PDO):

$preparedStatement = $db->prepare('INSERT INTO table (column) VALUES (:column)');

$preparedStatement->execute([ 'column' => $unsafeValue ]);

Can prepared statements be used for dynamic queries?

While you can still use prepared statements for the query parameters, the structure of the dynamic query itself cannot be parametrized and certain query features cannot be parametrized.

For these specific scenarios, the best thing to do is use a whitelist filter that restricts the possible values.

// Value whitelist
// $dir can only be 'DESC', otherwise it will be 'ASC'
if (empty($dir) || $dir !== 'DESC') {
   $dir = 'ASC';
}

Sql – Add a column with a default value to an existing table in SQL Server

Syntax:

ALTER TABLE {TABLENAME} 
ADD {COLUMNNAME} {TYPE} {NULL|NOT NULL} 
CONSTRAINT {CONSTRAINT_NAME} DEFAULT {DEFAULT_VALUE}
WITH VALUES

Example:

ALTER TABLE SomeTable
        ADD SomeCol Bit NULL --Or NOT NULL.
 CONSTRAINT D_SomeTable_SomeCol --When Omitted a Default-Constraint Name is autogenerated.
    DEFAULT (0)--Optional Default-Constraint.
WITH VALUES --Add if Column is Nullable and you want the Default Value for Existing Records.

Notes:

Optional Constraint Name:
If you leave out CONSTRAINT D_SomeTable_SomeCol then SQL Server will autogenerate
a Default-Contraint with a funny Name like: DF__SomeTa__SomeC__4FB7FEF6

Optional With-Values Statement:
The WITH VALUES is only needed when your Column is Nullable
and you want the Default Value used for Existing Records.
If your Column is NOT NULL, then it will automatically use the Default Value
for all Existing Records, whether you specify WITH VALUES or not.

How Inserts work with a Default-Constraint:
If you insert a Record into SomeTable and do not Specify SomeCol's value, then it will Default to 0.
If you insert a Record and Specify SomeCol's value as NULL (and your column allows nulls),
then the Default-Constraint will not be used and NULL will be inserted as the Value.

Notes were based on everyone's great feedback below.
Special Thanks to:
@Yatrix, @WalterStabosz, @YahooSerious, and @StackMan for their Comments.