September 13, 2010


I agree with you that we definitely need a new language for big data systems.

I am not an expert on the matter, but it seems obvious to me that this big data language would need to have monads as first-class citizens.
These monads would then be responsible for:
- distribution
- delayed execution
- partitioning
- storage
- filtering
- priorities
- nil handling
- exception handling
- caching
- eventual consistency
- ....
In short, you would be able to modify the execution behaviour without modifying the actual algorithm implementation.
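As a rough illustration of that idea, here is a minimal Python sketch, assuming two textbook monads chosen for brevity: `Maybe` for nil handling and `Deferred` for delayed execution. The classes, the `deferred` helper and the `total_with_tax` "algorithm" are hypothetical examples, not part of any existing big data system, and they cover only two of the concerns listed above. The algorithm is written once against `unit`/`bind`; the wrapper passed in decides how it executes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

A = TypeVar("A")
B = TypeVar("B")


@dataclass(frozen=True)
class Maybe(Generic[A]):
    """Nil handling: once a step produces None, later steps are skipped."""
    value: Optional[A]

    def bind(self, f: Callable[[A], Maybe[B]]) -> Maybe[B]:
        return self if self.value is None else f(self.value)  # type: ignore[return-value]


@dataclass(frozen=True)
class Deferred(Generic[A]):
    """Delayed execution: bind only records the step; nothing runs until run()."""
    thunk: Callable[[], A]

    def bind(self, f: Callable[[A], Deferred[B]]) -> Deferred[B]:
        return Deferred(lambda: f(self.thunk()).thunk())

    def run(self) -> A:
        return self.thunk()


def deferred(x: A) -> Deferred[A]:
    """unit for Deferred: wrap a value without doing any work yet."""
    return Deferred(lambda: x)


def total_with_tax(unit, prices, rate):
    """Hypothetical 'algorithm', written once against unit/bind only."""
    return (unit(prices)
            .bind(lambda ps: unit(sum(ps)))
            .bind(lambda total: unit(round(total * (1 + rate), 2))))


print(total_with_tax(Maybe, [10, 20], 0.1).value)     # 33.0
print(total_with_tax(Maybe, None, 0.1).value)         # None: sum() never runs
print(total_with_tax(deferred, [10, 20], 0.1).run())  # 33.0, computed only at run()
```

Swapping `Maybe` for `deferred` changes how the same steps execute without touching `total_with_tax`, which is the property the comment is asking for.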

Currently, these big data/distributed systems usually work best through some kind of message bus, where messages are processed by handlers, and the handlers are wrapped in decorators that alter their behaviour (these decorators would be the monads).
I think you would have a pretty good language if you could build such a system with the message bus abstracted away from the developer...
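The handler-decorator arrangement described here can be sketched in-process in Python. This is a minimal sketch, assuming a hypothetical `MessageBus` class standing in for a real bus, plus two illustrative decorators (`retrying` for exception handling, `caching` keyed on an assumed `id` field). The handler itself contains only the algorithm; retries and caching are layered on from outside.

```python
from collections import defaultdict
from functools import wraps
from typing import Any, Callable, Dict, List

Handler = Callable[[dict], Any]


class MessageBus:
    """In-process stand-in for a real message bus (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> List[Any]:
        return [handler(message) for handler in self._handlers[topic]]


def retrying(attempts: int) -> Callable[[Handler], Handler]:
    """Decorator factory: re-run a failing handler, up to `attempts` times in total."""
    def decorate(handler: Handler) -> Handler:
        @wraps(handler)
        def wrapper(message: dict) -> Any:
            for attempt in range(1, attempts + 1):
                try:
                    return handler(message)
                except Exception:
                    if attempt == attempts:
                        raise  # give up and surface the last error
        return wrapper
    return decorate


def caching(handler: Handler) -> Handler:
    """Decorator: memoise results per message id (assumes messages carry an 'id')."""
    cache: Dict[Any, Any] = {}

    @wraps(handler)
    def wrapper(message: dict) -> Any:
        if message["id"] not in cache:
            cache[message["id"]] = handler(message)
        return cache[message["id"]]
    return wrapper


calls = {"n": 0}


@caching
@retrying(attempts=3)
def word_count(message: dict) -> int:
    # The algorithm itself knows nothing about retries or caching.
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("simulated transient failure")
    return len(message["text"].split())


bus = MessageBus()
bus.subscribe("docs", word_count)
print(bus.publish("docs", {"id": 1, "text": "big data needs new tools"}))  # [5]
print(bus.publish("docs", {"id": 1, "text": "big data needs new tools"}))  # [5] (cached)
print(calls["n"])  # 2: one failed attempt, one success, then a cache hit
```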

Just my .02€


The lesson of Java's (in my opinion) failure is that perhaps one size doesn't fit all.

In the past, standardized solutions such as graphics abstractions, data source access, data representation, and interface programming evolved after living with, and coming to understand, the flaws and limitations of earlier approaches.

Too often today I see a rush to standardize on new tools and methodologies before trying to solve problems with what's currently available.

Everyone's looking for the perfect pre-solution. They then act shocked when it doesn't work out as planned.

I believe a large part of the problem is that we fail to distinguish between an "architectural" language, i.e. one that describes how you do things, and an "intentional" language, i.e. one that describes what you want. For the architectural language, you need to be able to describe caching, queuing, latency, locality, that sort of thing (Cassandra is a pretty good start). For the intentional language we already have some great candidates; almost inevitably they have to be declarative (or applicative). There's nothing wrong with relational algebra (just get rid of the SQL mess). Functional query languages have been well understood for decades.
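To make the "intentional" side concrete, here is a minimal Python sketch of three relational algebra operators (selection, projection, natural join) over relations modelled as sets of rows. The function names are illustrative, not from any library. The query at the end says what is wanted; how the join is evaluated is left to the implementation.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

# A row is a set of (attribute, value) pairs; a relation is a set of rows.
Row = FrozenSet[Tuple[str, object]]
Relation = FrozenSet[Row]


def relation(rows: Iterable[Dict[str, object]]) -> Relation:
    return frozenset(frozenset(r.items()) for r in rows)


def select(rel: Relation, pred: Callable[[Dict[str, object]], bool]) -> Relation:
    """sigma: keep the rows satisfying pred."""
    return frozenset(r for r in rel if pred(dict(r)))


def project(rel: Relation, attrs: Iterable[str]) -> Relation:
    """pi: keep only attrs; duplicates collapse because the result is a set."""
    keep = set(attrs)
    return frozenset(frozenset((k, v) for k, v in r if k in keep) for r in rel)


def natural_join(a: Relation, b: Relation) -> Relation:
    """Join rows that agree on every shared attribute (naive nested loops)."""
    out = set()
    for ra in a:
        da = dict(ra)
        for rb in b:
            db = dict(rb)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return frozenset(out)


users = relation([{"uid": 1, "name": "ada"}, {"uid": 2, "name": "bob"}])
orders = relation([{"uid": 1, "item": "disk"}, {"uid": 1, "item": "cpu"},
                   {"uid": 2, "item": "ram"}])

# "Names of users who ordered a disk": states what is wanted, not how to get it.
names = project(select(natural_join(users, orders), lambda r: r["item"] == "disk"), ["name"])
print(names == relation([{"name": "ada"}]))  # True
```

The nested-loop join is exactly the kind of "architectural" decision that could be replaced (a hash join, a partitioned join across nodes) without changing the query expression.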

I don't think you need a new language, I think you need a framework on an existing language.

Existing higher-level languages already handle amounts of data that fit in RAM very well, and that's about as far as in-memory processing reasonably needs to go.

Past that you merely need the ability to acquire subsets from stores (rdbms, nosql, distributed cache systems, etc.), ideally in a general way.

So, build a general purpose big data query/analysis system as described and you should be good to go.
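One way to read "acquire subsets from stores in a general way" is a small interface that every backend implements, so the analysis code never depends on the store type. A minimal sketch, assuming a hypothetical `SubsetSource` interface keyed by integer id ranges; only an in-memory list and SQLite (via Python's built-in `sqlite3`) are shown, and the table name is assumed to be a trusted identifier.

```python
import sqlite3
from typing import Dict, Iterator, List, Protocol


class SubsetSource(Protocol):
    """Hypothetical general interface: stream the rows with lo <= id < hi."""
    def fetch(self, lo: int, hi: int) -> Iterator[Dict[str, object]]: ...


class ListSource:
    def __init__(self, rows: List[Dict[str, object]]) -> None:
        self.rows = rows

    def fetch(self, lo: int, hi: int) -> Iterator[Dict[str, object]]:
        return (r for r in self.rows if lo <= r["id"] < hi)


class SQLiteSource:
    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn, self.table = conn, table  # table name assumed trusted

    def fetch(self, lo: int, hi: int) -> Iterator[Dict[str, object]]:
        # The filter runs inside the database; only matching rows come back.
        cur = self.conn.execute(
            f"SELECT id, value FROM {self.table} WHERE id >= ? AND id < ? ORDER BY id",
            (lo, hi))
        return ({"id": i, "value": v} for i, v in cur)


def total_value(source: SubsetSource, lo: int, hi: int) -> float:
    # Analysis code sees only the iterator, never the store behind it.
    return sum(r["value"] for r in source.fetch(lo, hi))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [(i, i * 0.5) for i in range(10)])
rows = [{"id": i, "value": i * 0.5} for i in range(10)]
print(total_value(ListSource(rows), 2, 5), total_value(SQLiteSource(conn, "readings"), 2, 5))  # 4.5 4.5
```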

I think you are working on the wrong end of it. You don't need a new language yet; what you need is a new paradigm: a new way of thinking about managing and using Big Data. The language, as an expression of the paradigm, can come later. Must come later. Trying to find or fashion the perfect language without first trying to refashion your conception is like putting the cart before the horse. SQL did not just spring into being. What came first was the realization "what if we treated a pile of data as a mathematical set, and then we can use set algebra to operate on it! Now how do we express the algebra of sets without requiring a degree in mathematics to understand the symbols?".

Once you have your new paradigm, you can prototype the means of expressing that paradigm in LISP, and graduate to creating something more formal as things solidify ;)

So what is it that makes Big Data so different from plain old data? Is it really just about size? What new or existing concepts demand language expression that are cumbersome or impossible today? Why do they need expression in the language instead of just figuring out how to improve the technical architecture? Can't we just fix the technical architecture without messing with the language? Not every paradigm demands expression as a programming language, after all. Maybe (as Patrick above said) we need a different language for expressing the architecture of the system that is unrelated to the language used to access the data. Of course, by Codd's rules, such a system may no longer be a Relational Database Management System, but that might not necessarily be a bad thing.

but that was just my off-the-cuff take on it...

SciDB goes beyond the relational world Stonebraker helped pioneer by swapping rows and columns for mathematical arrays that put fewer restrictions on the data and can work in any number of dimensions. Stonebraker claimed arrays are 100 or so times faster than an RDBMS on this class of problem.
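The difference between the two models can be illustrated in plain Python, with the caveat that this is not SciDB's API and says nothing about the performance figure quoted above. In the table model each cell is a row carrying its coordinates as ordinary columns, so a window query filters every row; in the array model the coordinates are implicit in position, so the window is addressed directly. `window_sum_table` and `window_sum_array` are hypothetical names.

```python
from typing import List, Tuple

# Table model: one row per cell, coordinates stored as ordinary columns.
Table = List[Tuple[int, int, float]]
# Array model: coordinates are implicit in the position of each value.
Grid = List[List[float]]


def window_sum_table(rows: Table, x0: int, x1: int, y0: int, y1: int) -> float:
    # Every row is examined and filtered by predicates on its coordinate columns.
    return sum(v for x, y, v in rows if x0 <= x < x1 and y0 <= y < y1)


def window_sum_array(grid: Grid, x0: int, x1: int, y0: int, y1: int) -> float:
    # The window is addressed directly by slicing; there are no coordinates to filter.
    return sum(sum(row[y0:y1]) for row in grid[x0:x1])


grid = [[float(x * 4 + y) for y in range(4)] for x in range(4)]
rows = [(x, y, grid[x][y]) for x in range(4) for y in range(4)]
print(window_sum_table(rows, 1, 3, 0, 2), window_sum_array(grid, 1, 3, 0, 2))  # 26.0 26.0
```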

There's your new paradigm.

I don't know all the programming languages out there; there may already be a few that fit.
Rather than a new language, updating or adding commands to an old language would be quicker for us to learn, and quicker for the company to create and test, than starting from scratch.
A new version would need to use and store larger numbers,
so the program and computers may need to be upgraded from 8-bit to 16-bit, 32-bit, 64-bit, 128-bit, 256-bit...

But you must remember that not all of the users who want to run your software will have a 256-bit processor.
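Wide numbers do not strictly require wide hardware: software can build them from smaller machine words. A minimal sketch of 256-bit addition using four 64-bit words and a carry, written in Python for readability (Python's own integers are already arbitrary precision, so here they are only used to hold 64-bit words). `to_limbs`, `from_limbs` and `add256` are hypothetical helper names.

```python
from typing import List

WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1
LIMBS = 4  # 4 x 64-bit words = 256 bits


def to_limbs(n: int) -> List[int]:
    """Split n into little-endian 64-bit words (bits above 2**256 are dropped)."""
    return [(n >> (WORD_BITS * i)) & WORD_MASK for i in range(LIMBS)]


def from_limbs(limbs: List[int]) -> int:
    return sum(word << (WORD_BITS * i) for i, word in enumerate(limbs))


def add256(a: List[int], b: List[int]) -> List[int]:
    """Add two 256-bit numbers using only 64-bit word additions and a carry bit."""
    out, carry = [], 0
    for wa, wb in zip(a, b):
        s = wa + wb + carry
        out.append(s & WORD_MASK)  # low 64 bits stay in this word
        carry = s >> WORD_BITS     # 0 or 1, carried into the next word
    return out                     # a final carry is dropped: wraps modulo 2**256


x = 2**200 + 12345
y = 2**255 + 2**64 - 1
assert from_limbs(add256(to_limbs(x), to_limbs(y))) == (x + y) % 2**256
assert from_limbs(add256(to_limbs(2**256 - 1), to_limbs(1))) == 0  # overflow wraps
```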

Thanks for the comments. Of course, a new programming language has to be designed with the right conceptual framework. A language built before we understand the model is unlikely to turn out to be genuinely new. But it may also be the case that building one helps us formulate the concepts.


Does anybody remember COBOL? I scored 100% on my COBOL exam, walked out of the exam, and never used it ever again. =)

If the data store and object persistence layer already employs a distributed architecture, and a scalable addressing scheme, then all the current languages should be capable of utilizing distributed, big data and processing it.

Apps built around big data need a single logical view of the data, node elasticity or auto-provisioning (the ability to add or remove compute nodes, which should be provided by the cloud platform host or some middleware solution), and a DBMS that requires little or no administration to spin up the needed resources. With those conditions met (although I'm still looking for the middle-tier auto-provisioning layer that lets the app and the DBMS figure out how many nodes are needed), "big data" could be accessed and used as easily as data of any other size.
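One common technique for a scalable addressing scheme that tolerates adding and removing nodes is consistent hashing; that choice is mine, not something the comment specifies. A minimal Python sketch, assuming a hypothetical `HashRing` class with virtual nodes, MD5 as the ring hash, at least one node present, and no hash collisions between ring points. When a node joins, only the keys it takes over move; every other key keeps its old address.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring; each node owns `vnodes` points on the ring."""

    def __init__(self, vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> node name

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            p = _hash(f"{node}#{i}")
            bisect.insort(self._points, p)
            self._owner[p] = node

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around at the end.
        i = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing()
for n in ("node-a", "node-b", "node-c"):
    ring.add_node(n)
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
# Every key that moved now lives on the new node; nothing else was reshuffled.
print(all(after[k] == "node-d" for k in keys if before[k] != after[k]))  # True
```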

Sure, but the distinction is between being able to utilize distributed data and making it easy to do so. That's where further abstraction may help.

I just don't see the issue here. Python, and I'm sure many other similar systems, are quite capable in this domain. Python is already quite "embeddable", and there are plenty of constructs for dealing with all the algorithms mentioned, such as interfaces to R and C as well as NumPy, SciPy and plenty more. Someone will have to be much clearer than this about what issues are being "solved".


We do not need a new language for big data. What we need is a new standard API on top of which current languages can work, something like Red Hat's Deltacloud. Languages are improved to boost programmer productivity, but what we have here is a new kind of problem whose complexity can probably be hidden in a (good) library, without paying the price of creating a whole new language.
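As a small illustration of hiding that complexity in a library, here is a minimal Python sketch of a map/reduce entry point where the caller supplies only a mapper and a reducer, and partitioning and workers stay inside the library. `partitions` and `map_reduce` are hypothetical names; a thread pool stands in for distributed workers (in CPython, threads will not speed up pure-Python work, so a real library would dispatch to processes or remote nodes). The result is only correct if the reducer is associative and `initial` is its identity.

```python
import operator
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def partitions(data: Sequence[T], n: int) -> List[Sequence[T]]:
    """Split data into at most n contiguous chunks."""
    size = max(1, -(-len(data) // n))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def map_reduce(data: Sequence[T], mapper: Callable[[T], R],
               reducer: Callable[[R, R], R], initial: R, workers: int = 4) -> R:
    """Hypothetical library entry point: the caller never sees partitions or workers.

    Correct only if reducer is associative and initial is its identity element.
    """
    def run_partition(part: Sequence[T]) -> R:
        return reduce(reducer, map(mapper, part), initial)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(run_partition, partitions(data, workers)))
    return reduce(reducer, partials, initial)


print(map_reduce(range(1, 101), lambda x: x * x, operator.add, 0))  # 338350
lines = ["big data", "new language", "big library"]
counts = map_reduce(lines, lambda line: Counter(line.split()), operator.add, Counter())
print(counts["big"])  # 2
```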

Some parts of the article conflict with each other. Big Data is a specific domain, yet one of the article's complaints about R is that it is not a *general-purpose* language. How can the same article call for BOTH a DSL and a general-purpose language?
