Java – Should I use composite primary keys or not

database-design, java, language-agnostic, sql

There seems to be only second-class support for composite database keys in Java's JPA (via the EmbeddedId or IdClass annotations). And when I read up on composite keys, regardless of language, people keep describing them as a bad thing. But I cannot understand why. Are composite keys still acceptable to use these days? If not, why not?

I've found one person who agrees with me:
http://weblogs.sqlteam.com/jeffs/archive/2007/08/23/composite_primary_keys.aspx

But another who doesn't:
http://weblogs.java.net/blog/bleonard/archive/2006/11/using_composite.html

Is it just me, or are people unable to distinguish where a composite key is appropriate and where it is not? I see composite primary keys as useful when the table doesn't represent an entity, i.e. when it represents a join table.

A simple example:

Actor { Id, Name, Email }
Movie { Id, Name, Year }
Character { Id, Name }
Role { Actor, Movie, Character }

Here Actor, Movie and Character obviously benefit from having an Id column as the primary key.

But Role is a many-to-many join table. I see no point in creating an id just to identify a row in the database. To me it seems obvious that the primary key is { Actor, Movie, Character }. A generated id also seems rather limiting: if the data in the join table changes all the time, you could find yourself with primary key collisions once the primary key sequence wraps around to 0.
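To make the Role key concrete, here is a minimal plain-Java sketch of a composite key class. It leaves out the JPA annotations so that it runs without any library, but it follows what JPA expects of a key class used with IdClass or EmbeddedId: it is Serializable and defines equals and hashCode over all key fields. The names CompositeKeyDemo, RoleId and RoleTable are hypothetical, and the HashMap is only an in-memory stand-in for the join table, not how a database stores rows.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class CompositeKeyDemo {

    // Hypothetical key class: a Role row is identified by the combination
    // of its three foreign keys, not by a generated id.
    static final class RoleId implements Serializable {
        private static final long serialVersionUID = 1L;

        final long actorId;
        final long movieId;
        final long characterId;

        RoleId(long actorId, long movieId, long characterId) {
            this.actorId = actorId;
            this.movieId = movieId;
            this.characterId = characterId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof RoleId)) return false;
            RoleId other = (RoleId) o;
            return actorId == other.actorId
                    && movieId == other.movieId
                    && characterId == other.characterId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(actorId, movieId, characterId);
        }
    }

    // In-memory stand-in for the Role table, keyed by the composite key.
    static final class RoleTable {
        private final Map<RoleId, String> rows = new HashMap<>();

        // Returns false without inserting when the key already exists,
        // mimicking a primary key violation.
        boolean insert(RoleId id, String note) {
            return rows.putIfAbsent(id, note) == null;
        }

        int size() {
            return rows.size();
        }
    }

    public static void main(String[] args) {
        RoleTable roles = new RoleTable();
        System.out.println(roles.insert(new RoleId(1, 10, 100), "first"));     // true
        System.out.println(roles.insert(new RoleId(1, 10, 100), "duplicate")); // false
    }
}
```

Because equality is defined over all three fields, the same actor can appear in the same movie as a different character, while an exact repeat of the triple is rejected.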

So, back to the original question, is it still acceptable practice to use composite primary keys? If not, why not?

Best Answer

In my personal opinion you should avoid composite primary keys, for several reasons:

Future changes: when you design a database you sometimes miss what will become important in the future. A significant example of this is assuming that a combination of two or more fields is unique (and thus can serve as a primary key), when later you want to allow NULLs or other non-unique values in them. A single-column primary key is a solid safeguard against such changes.
Uniformity: if every table has a unique numerical ID, and you also keep a standard for its name (e.g. "ID" or "tablename_id"), the code and SQL referring to it are clearer (in my opinion).

There are other reasons; these are just a few.

The main question I would ask is why not use a separate primary key if you have a unique set of fields? What's the cost? An additional integer index? That's not too bad.

Hope that helps.

Related Solutions

Ruby-on-rails – Composite primary keys versus unique object ID field

Most of the commonly used engines (MS SQL Server, Oracle, DB2, MySQL, etc.) would not experience noticeable issues using a surrogate key system. Some may even experience a performance boost from the use of a surrogate, but performance issues are highly platform-specific.

In general terms, the natural key (and by extension, composite key) versus surrogate key debate has a long history with no likely “right answer” in sight.

The arguments for natural keys (singular or composite) usually include some of the following:

1) They are already available in the data model. Most entities being modeled already include one or more attributes or combinations of attributes that meet the needs of a key for the purposes of creating relations. Adding an additional attribute to each table incorporates an unnecessary redundancy.

2) They eliminate the need for certain joins. For example, if you have customers with customer codes, and invoices with invoice numbers (both of which are "natural" keys), and you want to retrieve all the invoice numbers for a specific customer code, you can simply use "SELECT InvoiceNumber FROM Invoice WHERE CustomerCode = 'XYZ123'". In the classic surrogate key approach, the SQL would look something like this: "SELECT Invoice.InvoiceNumber FROM Invoice INNER JOIN Customer ON Invoice.CustomerID = Customer.CustomerID WHERE Customer.CustomerCode = 'XYZ123'".

3) They contribute to a more universally-applicable approach to data modeling. With natural keys, the same design can be used largely unchanged between different SQL engines. Many surrogate key approaches use specific SQL engine techniques for key generation, thus requiring more specialization of the data model to implement on different platforms.

Arguments for surrogate keys tend to revolve around issues that are SQL engine specific:

1) They enable easier changes to attributes when business requirements/rules change. This is because they allow the data attributes to be isolated to a single table. This is primarily an issue for SQL engines that do not efficiently implement standard SQL constructs such as DOMAINs. When an attribute is defined by a DOMAIN statement, changes to the attribute can be performed schema-wide using an ALTER DOMAIN statement. Different SQL engines have different performance characteristics for altering a domain, and some SQL engines do not implement DOMAINS at all, so data modelers compensate for these situations by adding surrogate keys to improve the ability to make changes to attributes.

2) They enable easier implementations of concurrency than natural keys. In the natural key case, if two users are concurrently working with the same information set, such as a customer row, and one of the users modifies the natural key value, then an update by the second user will fail because the customer code they are updating no longer exists in the database. In the surrogate key case, the update will process successfully because immutable ID values are used to identify the rows in the database, not mutable customer codes. However, it is not always desirable to allow the second update – if the customer code changed it is possible that the second user should not be allowed to proceed with their change because the actual “identity” of the row has changed – the second user may be updating the wrong row. Neither surrogate keys nor natural keys, by themselves, address this issue. Comprehensive concurrency solutions have to be addressed outside of the implementation of the key.

3) They perform better than natural keys. Performance is most directly affected by the SQL engine. The same database schema implemented on the same hardware using different SQL engines will often have dramatically different performance characteristics, due to each SQL engine's data storage and retrieval mechanisms. Some SQL engines closely approximate flat-file systems, where data is actually stored redundantly when the same attribute, such as a Customer Code, appears in multiple places in the database schema. This redundant storage by the SQL engine can cause performance issues when changes need to be made to the data or schema. Other SQL engines provide a better separation between the data model and the storage/retrieval system, allowing for quicker changes of data and schema.

4) Surrogate keys function better with certain data access libraries and GUI frameworks. Due to the homogeneous nature of most surrogate key designs (example: all relational keys are integers), data access libraries, ORMs, and GUI frameworks can work with the information without needing special knowledge of the data. Natural keys, due to their heterogeneous nature (different data types, size etc.), do not work as well with automated or semi-automated toolkits and libraries. For specialized scenarios, such as embedded SQL databases, designing the database with a specific toolkit in mind may be acceptable. In other scenarios, databases are enterprise information resources, accessed concurrently by multiple platforms, applications, report systems, and devices, and therefore do not function as well when designed with a focus on any particular library or framework. In addition, databases designed to work with specific toolkits become a liability when the next great toolkit is introduced.
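The concurrency scenario in point 2 above can be sketched in plain Java. This is only an in-memory analogy under stated assumptions: the two users' actions are replayed one after the other rather than truly concurrently, the maps stand in for tables, and all names (KeyConcurrencyDemo, NaturalKeyTable, SurrogateKeyTable, the customer codes) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class KeyConcurrencyDemo {

    // Customer rows identified by the natural key (the customer code).
    static final class NaturalKeyTable {
        private final Map<String, String> nameByCode = new HashMap<>();

        void insert(String code, String name) {
            nameByCode.put(code, name);
        }

        // Changing the code changes the identity of the row.
        void changeCode(String oldCode, String newCode) {
            String name = nameByCode.remove(oldCode);
            if (name != null) {
                nameByCode.put(newCode, name);
            }
        }

        // Like "UPDATE ... WHERE CustomerCode = ?": returns rows affected.
        int updateName(String code, String newName) {
            return nameByCode.replace(code, newName) != null ? 1 : 0;
        }
    }

    // Customer rows identified by an immutable generated id.
    static final class SurrogateKeyTable {
        private static final class Row {
            String code;
            String name;

            Row(String code, String name) {
                this.code = code;
                this.name = name;
            }
        }

        private final Map<Long, Row> rowById = new HashMap<>();
        private long nextId = 1;

        long insert(String code, String name) {
            long id = nextId++;
            rowById.put(id, new Row(code, name));
            return id;
        }

        // The code is ordinary data here, so the row keeps its identity.
        void changeCode(long id, String newCode) {
            Row row = rowById.get(id);
            if (row != null) {
                row.code = newCode;
            }
        }

        // Like "UPDATE ... WHERE CustomerID = ?": returns rows affected.
        int updateName(long id, String newName) {
            Row row = rowById.get(id);
            if (row == null) {
                return 0;
            }
            row.name = newName;
            return 1;
        }

        String codeOf(long id) {
            return rowById.get(id).code;
        }

        String nameOf(long id) {
            return rowById.get(id).name;
        }
    }
}
```

If user A changes the code from "XYZ123" to "ABC999" and user B then calls updateName with the old code, the natural-key table reports 0 rows affected, while the surrogate-key table still updates the row through its id. As the point above notes, neither outcome is automatically the right one: whether B's update should go through is a concurrency policy decision that the key design alone does not settle.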

I tend to fall on the side of natural keys (obviously), but I am not fanatical about it. Due to the environment I work in, where any given database I help design may be used by a variety of applications, I use natural keys for the majority of the data modeling, and rarely introduce surrogates. However, I don’t go out of my way to try to re-implement existing databases that use surrogates. Surrogate-key systems work just fine – no need to change something that is already functioning well.

There are some excellent resources discussing the merits of each approach:

http://www.google.com/search?q=natural+key+surrogate+key

http://www.agiledata.org/essays/keys.html

http://www.informationweek.com/news/software/bi/201806814

Java – a serialVersionUID and why should I use it

The docs for java.io.Serializable are probably about as good an explanation as you'll get:

The serialization runtime associates with each serializable class a version number, called a serialVersionUID, which is used during deserialization to verify that the sender and receiver of a serialized object have loaded classes for that object that are compatible with respect to serialization. If the receiver has loaded a class for the object that has a different serialVersionUID than that of the corresponding sender's class, then deserialization will result in an InvalidClassException. A serializable class can declare its own serialVersionUID explicitly by declaring a field named serialVersionUID that must be static, final, and of type long:

ANY-ACCESS-MODIFIER static final long serialVersionUID = 42L;

If a serializable class does not explicitly declare a serialVersionUID, then the serialization runtime will calculate a default serialVersionUID value for that class based on various aspects of the class, as described in the Java(TM) Object Serialization Specification. However, it is strongly recommended that all serializable classes explicitly declare serialVersionUID values, since the default serialVersionUID computation is highly sensitive to class details that may vary depending on compiler implementations, and can thus result in unexpected InvalidClassExceptions during deserialization. Therefore, to guarantee a consistent serialVersionUID value across different java compiler implementations, a serializable class must declare an explicit serialVersionUID value. It is also strongly advised that explicit serialVersionUID declarations use the private modifier where possible, since such declarations apply only to the immediately declaring class — serialVersionUID fields are not useful as inherited members.
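As a small illustration of the quoted documentation, the following sketch declares an explicit serialVersionUID, reads it back through java.io.ObjectStreamClass, and round-trips an instance through an in-memory byte array. The class names SerialVersionDemo and Point and the helpers declaredUid and roundTrip are hypothetical; the value 42L simply mirrors the example in the docs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

final class SerialVersionDemo {

    // Hypothetical serializable class with an explicit, private version id.
    static final class Point implements Serializable {
        private static final long serialVersionUID = 42L;

        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Returns the serialVersionUID the serialization runtime will use
    // for the given serializable class.
    static long declaredUid(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    // Serializes the object to bytes and reads it back. Checked exceptions
    // are wrapped so that callers do not need a throws clause.
    static Point roundTrip(Point p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Point) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(declaredUid(Point.class)); // 42
        Point copy = roundTrip(new Point(3, 4));
        System.out.println(copy.x + "," + copy.y);    // 3,4
    }
}
```

Here the sender and receiver are the same loaded class, so deserialization succeeds. The InvalidClassException described above arises when the bytes are read by a version of the class whose serialVersionUID differs.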
