Previous month:
August 2010
Next month:
October 2010

Do We Need a New Programming Language for Big Data?

Data_deluge
 

I'm the boards of two companies (Pentaho, Revolution Analytics) that are starting to see a lot of customer traction around Big Data. More and more companies in media, pharma, retail and finance are doing advanced analysis, reporting, graphing, etc with massive data sets. It made me wonder what other areas of the technology stack might evolve with the trend towards Big Data.  Obviously, there's new middleware layers like Hadoop and Map Reduce, and we're also seeing the emergence of NoSQL data management layers with Cassandra, MongoDB, MemBase and others.  But what about programming languages?  

OpenGamma CEO and resident genius Kirk Wylie wrote a post recently about why he wants a new programming language.

So why don't I have this language yet? Well, partially because programming language craftsmanship is hard. I'm pretty sure I'm not good enough to do it, which is usually my default criteria for saying something is Really Hard.

But I think as well the k3wl languages coming out are coming out of language requirements of the Top 10% crowd. They're the ones good enough to actually write the languages, and they're going to write a language that makes them happy. But then you end up with Scala, and then you end up with this monstrosity, and then you make me cry. A language in which that thing is even possible will never be a candidate as a Journeyman Programming Language.

You know who's going to do it? Someone like Gosling, who set about with the needs of the journeyman programmer in Java. But the state of the art has moved on, and Java just isn't suitable anymore.

Who I would really like to do it is Anders Hejlsberg. I am a very big fan of C#-the-Language. It's just that .Net-the-Ecosystem is so Microsoft-specific and horrific it'll never catch on in the wider world, no matter what Miguel de Icaza thinks.

This got me thinking about the challenge of the current complexity in Big Data systems.  Today, you have to be near genius level to build systems on top of Cassandra, Hadoop and the like today.  These are powerful tools, but very low-level, equivalent to programming client server applications in assembly language.  When it works it's great, but the effort is significant and it's probably beyond the scope of mainstream IT organizations.  (That's one reason that Revolution's R product has appeal, but R is a specialized statistical analysis tool, not a general purpose language.)

Could the Big Data complexity be factored out somehow with a new general purpose programming language?  No doubt. Having worked with Anders on the creation of Delphi many years back, this is right up his alley.  Or maybe we already have a good starting point with Erlang, Scala and Google's Go.  Go is particularly interesting having been designed by Rob Pike and Ken Thompson of Bell Labs / Unix fame.

What's been your experience in programming Big Data systems?  What do you think's needed?  Let me know in the comments below.

Zack Urlocker is an investor, advisor and board member to several startup software companies in SaaS and Open Source. He was previously the EVP of Products at MySQL responsible for Engineering and Marketing. He built the MySQL Enterprise subscription strategy and product line. MySQL was sold to Sun for $1 billion and is now part of Oracle Corporation. He is also a marathon runner, blues guitarist and fan of Interactive Fiction