Data Normalization

First a slight general critique on data normalization. The definitions of normalization tend to be vague and therefore subject to widespread differences as how to interpret them.
In the mid 90’s all the way through until the early 2000’s one had to sheepishly defend why there data base was not at least third normal form. First, before I commit grave sins against the relational establishment I want to say that I am a huge fan and advocate of relational design…

…that is when the problem we are trying to solve is relational.

Some problems require a hierarchical solution, some require an aggregate solution. Even back in the day when one was considered a heretic from the RDBMS community for breaking normal form, I would always counter and pose the following question back to the zealots…

“Please model for me in normal form the balance for my checking account.”

After major hrrrrmph’s and gnashing of teeth and stammering they would usually stomp off in search of their next victim.

I said above that the definition of normal forms were subject to interpretation. They are, and let me explain why. Consider first normal form. C. J. Date, one of the fathers of relational databases and a colleague of Codd describes first normal form as follows:

“a relation is said to be in first normal form (abbreviated 1NF) if and only if it satisfies the condition that it contains scalar values only.”1

For those of you familiar with the term scalar Webster’s defines scalar as “capable of being represented by a point on a scale.”

Not a great definition on first normal form, so I will add the one from Wikipedia.
“A relation is in first normal form if the domain of each attribute contains only atomic values”

Then to my great amusement proceed to offer up the following dataset as being in 1NF.

Customer ID Telephone Number
1 555-861-2025
2 555-403-1659
3 555-776-4100
4 555-808-9633


Hmmmmmm…. Scalar? Atomic? I believe the phone number is not atomic. The first three digits represent the area code. The next three represent the exchange, and the last four the extension.

Now we would have quite an argument about whether a single phone number needs to be spread across three separate fields in a table. I am in the camp that placing it in a single field is just fine.

I do apologize for the rant. I really do love relational design when solving a problem that is relational in nature. One of my favorite computer science quotes is “when all you have is a hammer everything looks like a nail.”

1 An Introduction To Database Systems. C.J. Date 6th Edition Addison-Wesley. page 289