Dealing With De-normalized Data in SQL Server

Introduction

I have spent a fair amount of time building data archival systems. This entails mapping and migrating data from older legacy systems into a more modern and lower cost DBMS like Oracle or SQL Server. Since these are archival systems the data comes over and the integrity and makeup of the data structures largely get preserved as they were on the legacy source system. De-normalized data is rather common in older systems, especially those that were done in Cobol. For good or bad it is what it is.

Typically, my team and I also have to build front end user applications to access this data. If you have every tried to map de-normalized data into a grid control for a user interface you quickly come to realize the processing fun and games you have to contend with. This little ditty is about how to normalize, de-normalized data without having a) to physically modify the table data itself, b) end up performing complex processing in your code to make it grid ready. Keep in mind, this is just one technique. There are many.

A simple example is in order. I have hardcoded the where clause for simplification. One certainly would use this technique using variables. Even though I have demonstrated the technique using a vanilla select statement, one could also accomplish this using a stored procedure.

Legacy System De-normalized Data


JTS                 Acct   Fac   Date1       Descr1    Amt1    Date2       Descr2     Amt2  
-------------------------------------------------------------------------------------------
211381428873996900  95137  PMH   2010-10-05  DED APPL  321.00  2010-10-05  CRED APPL  -21.37      
211381428874000815  95137  PMH   2010-11-14  DED APPL   61.00  2010-11-14  OFFSET     -61.00       
211381428874877321  95137  PMH   2011-08-12  BAL APPL   51.60  2011-08-10  BAL ADJ     00.60   


As you can see there are essentially two discrete rows of data for each row. The following is our query that produced the recordset.

  SELECT 
  jts as 'JTS',
  acctnum as 'Acct',
  facility as 'Fac',
  convert(varchar(10),entrydate1,20) as 'Date1',
  descr1 as 'Descr1',
  amt1 as 'Amt1',
  date2 as 'Date2',
  descr2 as 'Descr2',
  amt2 as 'Amt2'
  FROM CHARGES
  WHERE  acctnum = 95137

We desire for our output is something that looks like this. The recordset below conforms to a grid control much better than the previous recordset depicted.

Acct   Fac  Date        Descr       Amt 
----------- ----------------------------------
95137  PMH  2010-10-05  DED APPL    321.00     
95137  PMH  2010-10-05  CRED APPL   -21.37    
95137  PMH  2010-11-14  DED APPL     61.00      
95137  PMH  2010-11-14  OFFSET      -61.00     
95137  PMH  2011-08-12  BAL APPL     51.60     
95137  PMH  2011-08-10  BAL ADJ      00.60 

But, how do we achieve this type of output?

Well, if we analyze the first query and output we can quickly determine that the Date, Description, and Amount columns are the same and are essentially being replicated. One does not want to assume by the data and column names alone that they are identical. You also need to take a look at the table schema to make sure the data type are the same as well. Never assume anything when working with legacy data.

In this particular example (which by the way is a scaled down, much simplified example of a real problem I had to deal with on a recent project), the best means for dealing with this pattern of de-normalization is through creating two views. Let’s jump straight to the code.

SQL Views as a Means of Dealing with De-normalized Data

SQL View Code

  DROP VIEW
  chg_amt1
  GO
  CREATE VIEW chg_amt1 as
  SELECT
  jts as 'JTS'
  acctnum as 'Acct',
  facility as 'Fac',
  convert(varchar(10),entrydate1,20) as 'Date',
  descr1 as 'Descr',
  amt1 as 'Amt',
  FROM CHARGES 
  GO 
  
  DROP VIEW
  chg_amt1
  GO
  USE VIEW chg_amt2 as
  SELECT
  jts as 'JTS'
  acctnum as 'Acct',
  facility as 'Fac',
  convert(varchar(10),entrydate2,20) as 'Date',
  descr2 as 'Descr',
  amt2 as 'Amt',
  FROM CHARGES 
  GO 
  

Code Discussion

First, I like to add DROP VIEW code at the top of my view. I seldom get my view dialed in on the first try. Since dropping a view has no adverse effect on the data itself it makes my life easier to make and apply changes.

Next, if you analyze the two views you will notice that I have broken out the de-normalized data columns to their respective view. Also notice that for these columns I have used the exact same alias names. This is very important.

Finally, you will note that neither view offers no row restriction. One would not want to run these views by themselves against a large table. Essentially, the two views are the same except for the three referenced de-normalized columns. Also note that both views have the same number of columns and that the column order is the same between the views.

What next? ...

A Union Query to Deal With De-normalized Data

Now that we have created our two views all that remains is to create an SQL query using a UNION that will allow us to normalize our de-normalized data.

The Code

  SELECT  *
  from chg_amt1
  where chg_amt1.acctnum = 95137
  UNION
  SELECT *  
  from chg_amt2
  WHERE chg_amt2.acctnum = 95137
  ORDER BY jts
  

Code Discussion

As you can see in the code all we have are two simple select statements joined together by a SQL UNION. This simple technique will return the desired recordset.

Acct   Fac  Date        Descr       Amt 
----------- ----------------------------------
95137  PMH  2010-10-05  DED APPL    321.00     
95137  PMH  2010-10-05  CRED APPL   -21.37    
95137  PMH  2010-11-14  DED APPL     61.00      
95137  PMH  2010-11-14  OFFSET      -61.00     
95137  PMH  2011-08-12  BAL APPL     51.60     
95137  PMH  2011-08-10  BAL ADJ      00.60 

It should be noted that the ORDER BY clause is required at the end of the second SELECT. Also important to note, you must use * after the select. While it is possible to choose specific columns when referencing a view it is not possible to do so using the UNION. What happens under the hood (at 50,000 feet) is that the query executor runs each query pretty much the same as if you were to run them separately yourself. It then collates the two recordsets into a single recordset based upon the ORDER BY criteria. This is why it is important that when doing a union the two queries have the same number of columns, the alias names are the same, and ordinal between the two queries the data types line up (if the physically do not, it is possible to CAST or CONVERT the column to get it to conform.

Final Words and Parting Gifts

As I said above, this is only one way to deal with de-normalized data. The example presented is appropriate for the technique I just used. If you have a similar problem you now have a solution. There are many scenarios for dealing with de-normalized data and many different approaches and solutions. This technique will serve you if you need to deal with…

This is not the be-all end all example. Just as there are numerous types of saw, so there are numerous means for dealing with de-normalized data.

 Collapse All    Expand All
Custom Search

SQL Server Information and Resources