Note: The examples in this post are written for MSSQL. Syntax may differ for other SQL variants.
I was recently tasked with designing a SQL database for a new app. One of the requirements was to be able to store a category tree of unknown complexity and depth. It wasn't sufficient to have categories and subcategories; subcategories could also have subcategories, which could have subcategories, and so on. Each branch of the tree could have a different structure and continue to a different depth. An example tree would look something like this:
A nested List:
It was clearly not going to be possible to satisfy these requirements with simple 'category' and 'subcategory' tables. I needed to be able to describe all the category relationships in one self-referential table.
Designing the self-referential table
Relationships in a self-referential table are defined by two fields:
- id (primary key, uniqueidentifier, not nullable)
- parentId (foreign key, uniqueidentifier, nullable)
Note that 'parentId' is a foreign key that references the same table. If your table is called 'category', 'category.parentId' has a foreign key constraint to 'category.id'.
(You can also declare any other fields you might need to store the category name and other information. Those aren't relevant to this tip.)
When you insert a row into the table, you can set 'parentId' to 'null' to put the row at the top of the tree, or set 'parentId' to the 'id' of any other category to make the row a subcategory.
Selecting branches of the tree
Creating the tree is simple enough, but how do you select only the rows that are descendants of a given category id? For that, we can use a recursive common-table expression (recursive CTE).
The recursive CTE creates a virtual table ('[tree]') that is the union of the rows that have a given parent, the rows that have that first set of rows as parents, the rows that have that next set of rows as parents, and so on until there are no more descendent rows to fetch. We then select all rows from the virtual table:
We can modify the 'SELECT' statement to fetch only the bottom-most descendants of a given category, omitting the upper levels of the tree:
We can also invert the CTE to fetch categories that are ancestors of a given subcategory:
By modifying the 'SELECT' statement, we can find the top-most ancestor of any subcategory:
This combination of a self-referential table and recursive common-table expressions provided a great solution to my requirements. The model is straightforward, and easy to maintain, and the selection of entire branches can be performed in a single query.