
How Real is the Data Deluge?


It seems obvious that, given the decreasing cost of storage and computation, there's going to be a significant increase in the volume of data that organizations accumulate over the next 10 years. But the type of data being accumulated may differ from the transactional data where traditional DBMSs have dominated. It's not just about transactions; it's search patterns, online behavior, click-through data, events fired off by smartphones, messages over Twitter and Facebook, and log data of various kinds.

If an organization can figure out a better way to identify prospects, deliver more targeted ads, or optimize pricing decisions by analyzing terabytes of data, it would be crazy not to. Over the long term, companies that don't develop these capabilities will be at a competitive disadvantage.

As for the technological implications, that's a whole different can of worms. I'm starting to see adoption of Big Data technologies like Hadoop, HDFS, Cassandra, MongoDB, XML databases, analysis with R, Pentaho, and loads of others. MySQL continues to play a role here, as do other traditional relational databases. Over the next few months, I'm going to dig deeper with people using these technologies to try to discern the emerging customer patterns.

If you're in this space or using some of these technologies, let me know your thoughts. What volume of data are you dealing with? How many nodes or servers are you using? Are you running on a public cloud, a private cloud, or a hybrid? What technologies did you evaluate? What was it about traditional DBMSs that didn't work for your scenario?


I see very little mental overlap between people who grew up in relational databases and those who are deep into the new technologies. People who truly see the value in both sides are pretty special individuals, and their companies will have a huge competitive advantage in 5 years or so, when the heel-draggers realize they have a lot of catching up to do and that you don't just figure out how to use these new technologies overnight. At the same time, the ones on the bleeding edge are going to burn a lot of energy plowing through technologies that nobody yet realizes are headed for the scrap heap.

On the specific technologies you listed, two things come to mind. MongoDB isn't Big Data yet, and can't be until it gets better at handling data corruption. At large scale, crashes and other problems are guaranteed, and it's too expensive to deal with unless the technology recovers automatically. Hadoop is the other thing -- I think it is seeing remarkable adoption for where it is in its lifecycle, and more maturity will bring a rich set of tools and technologies to support it. When that happens, it'll be unstoppable.

But see my earlier note: I could be wrong about all of this.
