I just added a Closure Table to Publications (posts and comments).

What is a Closure Table?

Closure Tables are one way of managing Trees in SQL. SQL doesn't represent trees well. The ideal 'theory' way is using recursive queries with an adjacency list. We're using PostgreSQL, and it supports recursive queries, but in our testing they were unusably slow. As in 10–30 seconds to query all the comments on a post.

There are many alternative, denormalised ways to represent trees. Closure Tables are one of them. Essentially, a Closure Table has a row for every ancestor and descendant of any entry. For example, if we have a comment chain 1 → 2 → 3, the Closure table has ancestor–descendant entries (1 2), (2 3), (1 3). You can see how this is duplicate data: why do we need (1 3) if we already have the data that 2 is 3's parent, and 1 is 2's parent?

Why Closure Tables?

Because they're fast. Querying the Closure Table for all the comments on a 100-comment post takes about 2.5 milliseconds.

But you keep ranting about duplicate data in the old Arc code being bane of your existence?

There are exceptions to every rule. It turns out, duplicate data, nightmarish as it is, is only a problem if your data is mutable. It will never make sense to reassign a comment to a different parent.

Right now, the Closure Table has been populated with all existing publications, and is being updated with each new comment and post, but it isn't being used yet. But it will be. This is a big step toward making it fast to directly query the SQL, and getting data out of memory. Which is one of our biggest goals, to both reduce hosting costs and make the site more stable.

PS If you were following the livecoding Saturday, this is what I was working on.

bhrgunatha:

I've always cheated and designed tables that hold tree like data to include a level number and the parent in the table itself rather than a separate table. You can cheat and take a sneaky look at the data without joins! Take that Codd!

It adds some space overhead but makes querying so much easier.

And gives easy indexing options to increase performance.


posted 2846 days ago