Making Data Access as Difficult as We Can

We are sometimes made to feel not-so-technical when we provide simple solutions to problems. I think the most brilliant solutions are the ones that take complex problems and encapsulate them in solutions so simple we take them for granted.

When it comes to database access the "great minds" of the time thought it would be a swell idea to marry object-oriented applications with relational databases. I remember this vividly as my company was moving its application infrastructure from Cobol/Tal to Java and its database from Enscribe to SQL.

For those of you that are not familiar with H-P NonStop (formerly Tandem) it is a scalable parallel, “shared-nothing” platform. The Enscribe database is, for the most part, relational (think relational without FK constraints). But what is interesting about Enscribe is that the schema definition allows for lexical scoping. In other words, from a schema perspective, it is hierarchical in the respect that one can create logical, nested groupings (fields) that contain physical fields. It also allows you to create a data-type lens across a physical data block (in bytes within the record).

I remember teaching engineering classes in Java, in our target architecture, and in SQL when I would frequently comment on how odd it was that the world was moving to object-oriented application design but also had chosen the hierarchically flat SQL. Sorry, but the two just do not mate well with one another and everyone in the industry that has to write mappers knows this.

One of the founders of our company even said at an all hands meeting (and I quote) "It is not clear why we are moving to SQL other than the market is dictating us to." At the time I remember being appalled by this message, however, it was a message in hindsight that was "spot-on."

In technology we sometimes chase trends and listen to the soothsayers and gurus of the day as they lead us out of or into the wilderness. When it comes to data and databases we have our share of achilles heels. One in particular is how applications do IO. Pick a language – Java, C++, Perl, Cobol, C#, Php… reading and writing to the database is a coding “soup sandwich.” The general solution to this has been to build what the industry calls connectors which generally take something that is inherently convoluted and wrap it all up in yet another convoluted package.

If you are an application developer or a development DBA I want you to think about what I am about to say. Think about it long and hard. When it comes to IO, at the core, there are really only two things one can do.

  • Read
  • Write

And as far as "write" goes we get granular by a factor of three.

  • Add new
  • Update
  • Delete (physically)

So, at most we have four things to deal with. That is pretty simple. So why is there so much developers have to do in their code before they even get to these four simple things (create, read, update, delete)?

When I created my open source WAMP/LAMP framework named zenyan I set out to encapsulate the IO layer and provide a simple interface to the API user. This contract consists of two methods. One that writes to the database. You tell me which one of three write statements you want to perform by providing the DML (it can be easily extended to support stored procedures) and we will take care of everything else for you. In fact, back in 2002 the first "real" java code I wrote was a lightweight data access API that encapsulated all this complexity and exposed two methods for the user. It was actually the first Java program I wrote when my OOP light bulb finally turned on a little brighter and I thought to myself, "hey, I think I finally get encapsulation.”

In terms of that older Java API and my data access API in zenyan, when it comes to reading from the database you tell us what you want and also tell us how you would like the data returned to you. A hash (key/value pairs... as in json enabled), raw, delimited, array, tabular. This API can even be extended to provide soap, xml, or whatever format you want to add.

The whole point behind this strategy is to marry the resultant data set with what is read from the database. Key/Value pairs line up well with web forms. Tabular lines up with grid controls. Even though SQL does not inherently mate with object-oriented design and typical user interface patterns, it really is not that hard to take a record set, encapsulate the complexity and mate things up in a more natural way.

It always has bothered me that the application developer is the one that is responsible for formatting the data. We have design patterns for everything else. Why not design patterns for formatting data returned from the database? This is what zenyan does. The concept is simple and I would encourage everyone to adopt it. Making software architectures simple should always be our mantra.

Baby and the Bathwater

Today, we have this dichotomy between "Big Data" and RDBMS. Personally, I see a co-existence between the two. Why can't we marry these two technologies into the application space? As far as the end user is concerned, what does it matter where the data is coming from as long as it makes it to/from the database or databases? This may sound like a radical idea, but why can't we encapsulate noSQL data stores like mongoDB with RDBMS's like mySQL, Oracle, DB2, or SQL Server?

I believe we can. I am currently working on v0.98 of zenyan. This build is all about taking our data access API and being able to integrate reading and writing to mongoDB or Hadoop. I am currently a neophtye with mongoDB, but I believe my simpleton IO philosophy/strategy lends itself well to this integration. The devil is always in the details so we shall see.

In any case, I would encourage you to take the ideas I have presented and utilize them in your architectures. being able to encapsulate noSQL and SQL databases within a single application allows us to build applications that offer the best of both worlds and allow us to scale. While noSQL databases are not inherently relational, the data we store there can be setup to allow for foreign keys back to an RDBMS. mongoDB also allows indexing of fields so integrating the two makes for a powerful union!