
Wikipedia Runs MySQL


It's no secret that MySQL is widely deployed at many leading web sites. Everyone knows how successful the LAMP stack has become and that companies like Google, Yahoo, Sabre, Evite, and Citysearch all use tons of open source software (including MySQL) to scale their operations. As important as all of these companies are, I'm most proud of the fact that MySQL is used at Wikipedia. This is one of the top web sites in the world, and it is built on open source software (Linux, Apache, MySQL, Lucene) and the MediaWiki project.

Wikipedia is famous not just for being a highly scalable open source project but, more importantly, for being an open and collaborative repository of human knowledge. There are more than 1 million articles in the English-language edition of Wikipedia, and there are smaller editions in more than 100 languages. And all of this is done with a staff of 5 employees (2 FTE). The bulk of the work is done by volunteers: the management of the systems, the software, the articles, the editing, you name it. Whatever criticisms there may be, Wikipedia has proven the value of an open, collaborative model and I'm proud that MySQL is a part of it. The Fortune article below discusses the distributed organizations of both MySQL and Wikipedia.


Do you really think Wikipedia scales well? There are so many times during the day when access is slow. I'm not sure whether that is due to the "design" (programming) of MediaWiki or where else the bottleneck is.


It scales very well, but its load has also typically been doubling every 3-4 months. It's doing a few billion queries per day at the moment on the database side.

The bottleneck shifts around between the PHP boxes, the Squid cache servers, the database and various other parts of the system. It'll only be consistently fast when growth slows down. So far, making it fast has just encouraged faster growth until it slows down again. By way of comparison, other sites at its traffic load may typically have a thousand or two thousand servers instead of a hundred or two. Then there's Google and the hundreds of thousands... :)

Today, money is typically the best solution for Wikipedia slowness, with the vast majority of the possible tuning work already done on the software side. If growth stops at levels, Wikipedia will end up needing about 10-20 million dollars more in equipment. Fortunately, the donation rate has always increased with the traffic, and so far the fundraising has been able to roughly keep up with the need for hardware.

And the software... not only zero charge but MySQL donated a support contract.

I was the first Wikipedia DBA, and both I and the person who took over from me now work for MySQL Support. So if you wondered what those support contracts buy, it includes people with the growth and tuning experience of taking a site from one server to billions of queries and a place among the top 20 web sites.

James Day
