I have been working with relational databases, NoSQL databases and Hadoop HBase for years, mainly for storing data for analysis. A fundamental problem is that people still overlook the importance of the primary key in a table. In this article, I would like to go through the primary key in detail: definition, benefits, drawback, surrogate keys, features, properties, and a case where a primary key is not used.
A table can be created without a primary key. Unfortunately, a missing primary key usually indicates a terrible database design, and such tables are most likely created by developers / programmers. Without a primary key, it is difficult to uniquely identify rows in your table. Also, unless some other unique key exists, other tables have nothing to reference, so foreign key constraints cannot be defined or enforced.
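To make the problem concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table customer_no_pk and its columns are hypothetical, chosen only for illustration; other database engines behave similarly but this is not meant as a definitive test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table created without any primary key.
conn.execute("CREATE TABLE customer_no_pk (email TEXT, name TEXT)")

# Nothing stops the same record from being stored twice.
conn.execute("INSERT INTO customer_no_pk VALUES ('ann@example.com', 'Ann')")
conn.execute("INSERT INTO customer_no_pk VALUES ('ann@example.com', 'Ann')")

duplicate_count = conn.execute(
    "SELECT COUNT(*) FROM customer_no_pk WHERE email = 'ann@example.com'"
).fetchone()[0]

# An UPDATE intended for one row cannot single it out and touches both copies.
updated_rows = conn.execute(
    "UPDATE customer_no_pk SET name = 'Anne' WHERE email = 'ann@example.com'"
).rowcount

print(duplicate_count, updated_rows)
```

Both copies are stored and both are updated, because no column value distinguishes one row from the other.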
What Is a Primary Key?
A primary key is a column, or a set of columns, whose values uniquely identify every row in a table. It is very important for maintaining integrity when records are inserted, updated or deleted.
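The integrity role described above can be sketched with Python's sqlite3 module. The department and employee tables are hypothetical. Note one SQLite-specific assumption: foreign keys are only enforced after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department (dept_id)
    )
""")
conn.execute("INSERT INTO department VALUES (10, 'Analytics')")
conn.execute("INSERT INTO employee VALUES (1, 'Ann', 10)")


def attempt(sql):
    """Run one statement and report whether the engine accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Insertion: a second department with the same key value.
duplicate_key = attempt("INSERT INTO department VALUES (10, 'Finance')")
# Insertion: an employee pointing at a department that does not exist.
orphan_insert = attempt("INSERT INTO employee VALUES (2, 'Bob', 99)")
# Deletion: removing a department that an employee still references.
parent_delete = attempt("DELETE FROM department WHERE dept_id = 10")

for result in (duplicate_key, orphan_insert, parent_delete):
    print(result)
```

All three statements are refused, which is the kind of protection that depends on the parent table having a key to reference.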
Benefits of a Primary Key
There are two major benefits of using a primary key:
- It uniquely identifies each row of a table
- It forms a unique index, which speeds up data access
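The second benefit can be observed by asking the database how it plans to run a lookup. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; the two tables are hypothetical, and the exact wording of the plan text varies between SQLite versions, so no expected output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (email TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE customer_no_pk (email TEXT, name TEXT)")


def plan(table):
    """Return the planner's description of an equality lookup on email."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT name FROM {table} WHERE email = ?",
        ("ann@example.com",),
    ).fetchall()
    # The last field of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


plan_with_pk = plan("customer")
plan_without_pk = plan("customer_no_pk")

print("with primary key:   ", plan_with_pk)
print("without primary key:", plan_without_pk)
```

With the primary key, the plan searches the automatically created index; without it, the plan falls back to scanning the whole table.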
Drawback of a Primary Key
Given the importance of a unique identifier, using a primary key is a must. The only drawback is that the database system spends time and resources checking data integrity whenever records are inserted, updated or deleted.
Surrogate Key
If there is no “value-based” unique identifier, a common solution is to use an automatically generated ID column as the primary key, called a surrogate key.
A surrogate key has several key features:
- automatically generated by the system
- holds an anonymous integer
- contains a unique value for every record in the table
- should never be modified by any user or program
- also called a factless key, as the unique number carries no logical meaning
However, it is not recommended to start a table or database design with a surrogate key, because a surrogate key has no technical or business meaning.
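As a rough illustration of a surrogate key, the sketch below relies on SQLite assigning a value to an INTEGER PRIMARY KEY column when none is supplied. The product table and its sku column are hypothetical; keeping a UNIQUE constraint on the business identifier is one possible way to retain a meaningful key alongside the generated one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,   -- surrogate key, assigned by SQLite
        sku        TEXT NOT NULL UNIQUE,  -- hypothetical business identifier
        name       TEXT NOT NULL
    )
""")

# The INSERT statements never mention product_id; the system fills it in.
first_id = conn.execute(
    "INSERT INTO product (sku, name) VALUES ('SKU-001', 'Kettle')"
).lastrowid
second_id = conn.execute(
    "INSERT INTO product (sku, name) VALUES ('SKU-002', 'Toaster')"
).lastrowid

print(first_id, second_id)
```

The generated integers say nothing about the products themselves, which is why such a key is called factless.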
Properties and Rules of a Primary Key
The properties of a primary key are:
- Uniqueness is enforced; no duplicate values are allowed
- It uniquely identifies every single row in a table
- A table can have only one primary key
- No null (empty) value is allowed in a primary key
- A primary key that includes multiple columns is called a composite primary key
- A primary key can be defined in a “CREATE TABLE” or “ALTER TABLE” DDL statement
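Several of these rules can be checked with a small sketch, again using Python's sqlite3 module and a hypothetical enrollment table with a composite primary key. Two SQLite-specific caveats apply: the key columns are declared NOT NULL explicitly because SQLite, unlike most engines, does not imply it for primary key columns in ordinary tables, and SQLite cannot add a primary key through ALTER TABLE, so only the CREATE TABLE form is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL,
        course_id  INTEGER NOT NULL,
        grade      TEXT,
        PRIMARY KEY (student_id, course_id)  -- composite primary key
    )
""")

conn.execute("INSERT INTO enrollment VALUES (1, 101, 'A')")
# Same student, different course: the combination is still unique.
conn.execute("INSERT INTO enrollment VALUES (1, 102, 'B')")


def try_insert(row):
    """Insert one row and report whether the key rules allowed it."""
    try:
        conn.execute("INSERT INTO enrollment VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


duplicate_result = try_insert((1, 101, "C"))   # repeats an existing key
null_result = try_insert((None, 103, "A"))     # null in a key column

print(duplicate_result)
print(null_result)
```

The first two rows are stored, while the duplicate combination and the null key value are both refused.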
Exception Case for Not Using a Primary Key
In my experience, some data warehouses omit primary keys in order to allow faster insertion and loading of data during ETL (data integration) jobs. However, the records in their tables are still unique, and an index may be created on the columns that could otherwise serve as the primary key. This is a special case: integrity checking is deliberately given up so that it does not slow down the database engine while bulk data is being loaded.
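One way this load-first pattern might look is sketched below with Python's sqlite3 module. The sales_fact table, its columns and the index name ux_sales_fact are hypothetical, and making the index UNIQUE is an assumption on my part; a real warehouse would use its own engine's bulk-loading tools.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical fact table, created without a primary key for the load phase.
conn.execute(
    "CREATE TABLE sales_fact (sale_date TEXT, store_id INTEGER, amount REAL)"
)

# Generated sample rows: 28 days x 3 stores, each (date, store) pair unique.
rows = [
    (f"2024-01-{day:02d}", store, 100.0 * store)
    for day in range(1, 29)
    for store in range(1, 4)
]

with conn:  # load the whole batch in one transaction, with no key checks
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)

# After the load, index the columns that could have been the primary key.
# This statement fails if the loaded data contains duplicates.
conn.execute(
    "CREATE UNIQUE INDEX ux_sales_fact ON sales_fact (sale_date, store_id)"
)

loaded = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]

try:
    conn.execute("INSERT INTO sales_fact VALUES ('2024-01-01', 1, 5.0)")
    late_duplicate = "accepted"
except sqlite3.IntegrityError:
    late_duplicate = "rejected"

print(loaded, late_duplicate)
```

The trade-off is that uniqueness is only verified once, when the index is built, so the ETL job itself has to guarantee that the incoming records are unique.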
Conclusion
This article has concentrated on concepts rather than on a full treatment of SQL statements or other DDL operations. I would like to highlight the importance of the primary key in a table: it is vital for ensuring data integrity.