Some thoughts on dbShards

I heard about dbShards via two recent blog posts — one by Curt Monash and the other by Todd Hoff. It seemed like an interesting product, so I spent some time digging around on their website.

As the name suggests, dbShards is all about sharding. Sharding, also known as partitioning, is the process of distributing a given dataset into smaller chunks based on some policy. AFAIK, the term “shard” was popularized recently by Google even though the concept of partitioning is at least a few decades old. Most distributed data management systems implement some form of sharding by necessity, since the entire dataset will not fit in memory on a single node (if it would, you should not be using a distributed system). And therein lies the USP of dbShards: it brings sharding (and with it, performance and scalability) to commodity, single-node databases such as MySQL and Postgres.

So how does it work? Well, dbShards acts as a transparent layer sitting in front of multiple nodes running, let’s say, MySQL. Transparent, because they want to work with legacy code, meaning no or minimal client-side modifications. Inserting new data is pretty simple: dbShards uses a “sharding key” to route an incoming tuple to the appropriate destination. Queries are a bit more complex, and here the website is skimpy on details. Monash’s post mentions that join performance is good when sharding keys are the same; this is not a surprise. I’m interested in what other kinds of query optimizations are in place. When data is partitioned, you really need a sophisticated query planner and optimizer that can minimize data movement and aggregation, and push down as much computation as possible to individual nodes.
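To make the routing idea concrete, here is a minimal sketch of routing on a sharding key. It is an illustration only: the shard names, the `hashKey`, `shardFor` and `route` helpers, and the hash-modulo policy are all my assumptions, since the dbShards website does not say which policy it actually uses.

```javascript
// Hypothetical shard list; in a real deployment each entry would be a MySQL node.
const shards = ['shard0', 'shard1', 'shard2', 'shard3'];

// Simple deterministic string hash (a djb2 variant). This is an assumption
// made for the sketch, not the hash dbShards uses.
function hashKey(key) {
  let h = 5381;
  for (const ch of String(key)) {
    h = ((h * 33) ^ ch.charCodeAt(0)) >>> 0; // keep it an unsigned 32-bit integer
  }
  return h;
}

// Pick the destination shard for a given sharding key.
function shardFor(shardingKey) {
  return shards[hashKey(shardingKey) % shards.length];
}

// Group incoming rows by destination shard, using one column as the sharding key.
function route(rows, keyColumn) {
  const byShard = {};
  for (const row of rows) {
    const shard = shardFor(row[keyColumn]);
    (byShard[shard] = byShard[shard] || []).push(row);
  }
  return byShard;
}
```

Because `shardFor` is deterministic, rows from two tables that share a sharding key value always land on the same shard, which is why a join on that key can run locally on each node without moving data.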

I found the page on replication intriguing. I’m guessing when they say “reliable replication”, they mean “consistent replication” in more common parlance (alternatively, that dbShards supports strong consistency, as opposed to eventual or lazy consistency). This particular bit in the first paragraph caught my eye: “deliver high performance yet reliable multi-threaded replication for High Availability (HA)”. I’m not sure how to read this. Are they implying that multi-threaded replication is typically not performant? And usually you do NOT want threading for high availability, because a single thread can still take the entire process down. The actual mechanism for replication seems like a straightforward application of the replicated state machine approach.

But making a replicated state machine based system scale requires very careful engineering, otherwise it is easy to hit performance bottlenecks. I’d be very interested in knowing a bit more about the transaction model in dbShards and how it performs on larger systems (tens to hundreds of nodes).
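For readers unfamiliar with the term, the replicated state machine idea fits in a few lines: every replica starts from the same state and applies the same deterministic commands in the same order, so all replicas end up identical. The `Replica` class and the command format below are made up for illustration and say nothing about how dbShards is built. The sketch also assumes that something else (a primary, or a consensus protocol) has already agreed on the order of the log, which is exactly the part that is hard to scale.

```javascript
class Replica {
  constructor() {
    this.state = new Map(); // key -> value
    this.applied = 0;       // number of log entries applied so far
  }

  // Apply any log entries this replica has not seen yet, strictly in order.
  catchUp(log) {
    while (this.applied < log.length) {
      const cmd = log[this.applied];
      if (cmd.op === 'set') this.state.set(cmd.key, cmd.value);
      else if (cmd.op === 'del') this.state.delete(cmd.key);
      this.applied += 1;
    }
  }
}

// A single agreed-upon, ordered log of deterministic commands.
const commandLog = [
  { op: 'set', key: 'a', value: 1 },
  { op: 'set', key: 'b', value: 2 },
  { op: 'del', key: 'a' },
];

const r1 = new Replica();
const r2 = new Replica();
r1.catchUp(commandLog);
r2.catchUp(commandLog.slice(0, 2)); // r2 lags behind for a while...
r2.catchUp(commandLog);             // ...and converges once it sees the rest
```

After both calls, `r1` and `r2` hold the same single entry (`b` mapped to `2`), even though `r2` received the log in two pieces.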

The pricing model is also quite interesting. I think it is the first vendor I know of that is pricing on CPU and not storage (their pricing is $5,000 per year per server). I think this is indicative of the target customer segment as well — I would imagine dbShards works well with a few TBs of data on machines with a lot of CPU and memory.

Udaan and Whitespace

There are movies, and then there are movies.

Udaan is one of those rare movies where it seems like the crew had an intense clarity about the movie they wanted to make, and that is exactly what they did. They did not make it for the money, they did not make it to please a broad audience, they did not make it to please the critics — they made it, because that was what they wanted to show.

I’m not going to talk about the story or the characters much, just Google those things if you are interested. Instead, I want to talk about an analogy.

Any good designer knows the importance of whitespace, whether in layout or typography. Architects have long understood that negative or empty space is just as (or perhaps more) important as filled space. Watching Udaan was a good reminder that good moments in a movie need their space as well.

I didn’t feel rushed as I saw the movie; it felt a bit slow at times, but there was no hurry to get to the end. There are several scenes that are made poignant by the lack of dialog. The same goes for the music. Amit Trivedi has done an outstanding job with the background score as well as the soundtrack. The lyrics (by Amitabh Bhattacharya) are fabulous and are fittingly given their space in the songs — Amit makes sure that the music recedes and does not overwhelm so you can pay attention to the words. But when the voices take a break, the music that fills in the gaps is just as good.

As my wife observed, “this movie has craft.”

How to buy a new car

A few weeks ago we were in the market for a new car. Now, I like to think of myself as a cautious buyer: I like to do my research, I’m not much of an impulse shopper and I’m generally suspicious of sales people. A new car is a significant investment; naturally I felt extra prudent. Of course, all my friends kept wondering why I was making such a big deal: you go into a dealership, pick up the car, do the paperwork and walk out, as simple as that. I say good for them! But I sleep more peacefully knowing that I had my bases covered.

Courtesy http://www.flickr.com/photos/ericrobinson/

A quick Google search on “how to buy a new car” led me straight to the very comprehensive CarBuyingTips.com. It is probably a great resource for many people. But after spending a few hours clicking through the numerous links on there, I almost felt exhausted. There was way too much (redundant) information, perhaps badly organized and overall just not very easy to consume. As they say, hindsight is 20/20. So, with the experience of having just purchased a new vehicle, here is my attempt at a concise, five-step guide to buying a new car.

  1. Figure out what you want: You should know exactly what make and model you want, down to the last detail: the interior color, upholstery and exterior color, as well as any other options and accessories. The more precise you are in what you want, the better off you will be. My first impression while researching new cars was that I could get whatever configuration I wanted: if the dealership doesn’t have it in stock, they’ll simply order it in. Unfortunately, most dealers will only work with what stock they have. So checking for availability is critical. Go ahead and schedule those test drives, but let the dealers know upfront that you are not looking to buy just yet. The dealers will ask for your contact information though, so be prepared for a barrage of emails and phone calls from them until you’ve made your purchase.
  2. Get the numbers: Once you have identified the configuration you want, find your car on Edmunds.com. Edmunds will give you the invoice price of your car. Go ahead and add all the options and select the colors to get a final estimated invoice price. The more informed you are, the better your chances are when negotiating with dealers and making an informed decision.
  3. Get a quote from CarsDirect: Buying a car online these days is not only possible, but highly recommended. You save the hassle of driving to dealerships, wasting time over the phone etc. Start your hunt for the best price by getting a quote from carsdirect.com. They partner with local dealerships and have very competitive pricing. My experience with CarsDirect was fantastic and I’d have definitely bought a car from them had a local dealer not given me a much better deal.
  4. Get quotes from local dealers: Open a spreadsheet, fire up your browser and start calling your local dealerships. Ask for the new car sales department and let them know exactly what configuration you are looking for. Ask them for price and availability. Always ask for the out-the-door price, including taxes and rebates. This way there will be fewer surprises on the final bill. Make a point to let them know that you are talking to other dealers. Jot down each dealer’s quote in the spreadsheet (add the CarsDirect quote here as well). This process can take some time because you may not be able to reach them on the first attempt and there might be some back and forth while they get back to you with details. I recommend setting aside two slots of two hours each for these phone calls.
  5. Decide and Buy: Once you have all competing quotes, you can make your decision. The final decision will probably depend not just on the price, but other factors such as availability, location of the dealership, your experience with the dealership etc. If you finance your car, most car companies typically have their own financing arm which usually provides great APRs. If not, talk to your bank. For the final paperwork, you should make a visit to the dealership. Be sure to read the fine print and know exactly what service (if any) the dealer will provide above and beyond the warranty and services provided by the manufacturer.

That’s pretty much it! I read a lot of horror stories online about swindling and cheating in dealerships. My personal experience with at least the Toyota dealers in the Bay Area was pretty good. Most of them were very straightforward and to the point. They did not want to waste their time or mine, and did not try to pressure or hoodwink me into a bad deal. You might also find this guide useful.

What is node.js?

If you follow the world of Javascript and/or high-performance networking, you have probably heard of node.js. If you already grok Node, then this post is not for you; move along. If, however, you are a bit confused as to exactly what Node.js is and how it works, then you should read on.

The node.js website doesn’t mince words in describing the software: “Evented I/O for V8 JavaScript.” While that statement is precise and captures the essence of node.js succinctly, at first glance it did not tell me much about node.js. I did what anyone interested in node.js should do: downloaded the source and started playing around with it.

So what exactly is node.js? Well, first and foremost it is a Javascript runtime. Think of your web browser; how does it run Javascript? It implements a Javascript runtime and supports APIs that make sense in the browser such as DOM manipulation etc. Javascript as a language itself is fairly browser agnostic. So node.js is yet another runtime for Javascript, implemented primarily in C++.

Because node.js focuses on networking, it does not support the standard APIs available in a browser. Instead, it provides a different set of APIs (with fantastic documentation). Thus, for instance, HTTP support is built into node.js — it is not an external library.

The other salient feature of node.js is that it is event driven. If you are familiar with event driven programming (à la Python Twisted, Ruby’s Event Machine, the event loop in Qt etc), you know what I’m talking about. The key difference though is that unlike all these systems, you never explicitly invoke a blocking call to start the event loop: node.js automatically enters the event loop as soon as it has finished loading the program. A corollary is that you can only write event driven programs in node.js; no other programming models are supported. Another consequence of this design choice is that node.js is single-threaded. To exploit CPU parallelism, you need to run multiple node.js instances. Of course, there are several node.js modules and projects already available to address this very issue.
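A tiny script makes the implicit event loop visible. Nothing in it starts a loop; node.js evaluates the file top to bottom and only then begins dispatching callbacks. The `order` array is my own addition so that the sequence can be inspected.

```javascript
const order = [];

// Schedule a callback. It cannot run until the main script has finished,
// even with a delay of zero.
setTimeout(() => {
  order.push('timer fired');
  console.log(order.join(' -> '));
  // prints: script start -> script end -> timer fired
}, 0);

order.push('script start');
order.push('script end');

// There is no explicit call that starts the loop (compare reactor.run() in
// Twisted). node.js enters the event loop on its own once this file has been
// evaluated, and exits when no callbacks remain.
```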

To implement a runtime for Javascript, node.js first needs to parse the input Javascript. node.js leverages Google’s V8 Javascript engine to do this. V8 takes care of interpreting the Javascript so node.js need not worry about syntactical issues; it only needs to implement the appropriate hooks and callbacks for V8.

node.js claims to be extremely memory efficient and scalable. This is possible because node.js does not expose any blocking APIs. As a result, the program is completely callback driven. Of course, any kind of I/O (disk or network) will eventually block. node.js does all blocking I/O in an internal thread pool — thus even though the application executes in a single thread, internally there are multiple threads that node.js manages.
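Here is a small sketch of what "completely callback driven" looks like for disk I/O. The `listDirectory` helper and the `events` array are hypothetical names of mine; `fs.readdir` is the real node.js call. The sketch assumes a current node.js release, where file system requests like this one are handed to the internal thread pool while the main thread carries on.

```javascript
const fs = require('fs');

const events = [];

// Ask for a directory listing. The call returns immediately; the callback
// fires later, once the listing is ready.
function listDirectory(dir, done) {
  fs.readdir(dir, (err, names) => {
    events.push('listing ready');
    done(err, names);
  });
  events.push('listing requested');
}

listDirectory('.', (err, names) => {
  if (err) throw err;
  console.log(events.join(' -> '), `(${names.length} entries)`);
});

// This line runs before the listing callback does.
events.push('main script done');
```

The main script never waits for the disk: by the time the callback runs, `events` already contains 'listing requested' and 'main script done'.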

Overall, node.js is very refreshing. The community seems great and there is a lot of buzz around the project right now, with some big companies like Yahoo starting to experiment with node.js. node.js is also driving the “server side Javascript” movement. For instance, Joyent’s Smart platform allows you to write your server code in Javascript, which they can then execute on their hosted platforms.

Finally, no blog post about node.js is complete without an example of node.js code. Here is a simple web server:
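The snippet that originally accompanied this post is not reproduced here, so what follows is a minimal sketch in the same spirit, using only the built-in `http` module. The `handleRequest` and `start` names, the port (8124) and the host are arbitrary choices of mine.

```javascript
const http = require('http');

// Called once per incoming request. It only queues the response and returns;
// it never blocks.
function handleRequest(req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello World\n');
}

const server = http.createServer(handleRequest);

// Port and host are arbitrary choices for this sketch.
function start(port = 8124, host = '127.0.0.1') {
  server.listen(port, host, () => {
    console.log(`Server running at http://${host}:${port}/`);
  });
  return server;
}

// start(); // uncomment to serve requests; stop with Ctrl-C
```

To try it, save the code as server.js, uncomment the last line, run `node server.js` and open http://127.0.0.1:8124/ in a browser. Note that HTTP support comes from node.js itself; nothing needs to be installed.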

SIGCOMM goes to Delhi

For those of you who don’t know, SIGCOMM is one of the most prestigious conferences in the networking community. SIGCOMM is the ACM Special Interest Group on Data Communication. Unlike SOSP and OSDI which alternate every year, SIGCOMM is an annual event showcasing the very best in networking research from around the world.

I was quite thrilled when I found out that SIGCOMM 2010 is going to be in Delhi this year!! While some of these conferences are known to pick “exotic” venues, it is also an encouraging nod from the academic community. This is probably the first top-tier systems/networking conference to be held in India and I hope the local universities in and around Delhi will take advantage of this opportunity.

It is shaping up to be an exciting year for Delhi, with the Commonwealth Games coming up soon after SIGCOMM. A glance at the organizing committee for this year suggests that Geoff Voelker might have been involved in pushing for this venue :)

I personally am not very fond of Delhi, however. Out of curiosity, I looked over the local information page on the SIGCOMM 2010 web site and found myself both agreeing with and a bit disappointed at some of the tips listed on that page:

  • Keep your wallet/passports in an inner pocket of a jacket or shirt. Avoid keeping it in the rear pocket, especially while moving around in crowded places or in public transport like buses or Delhi Metro.
  • Never follow the advice of taxi or cab drivers regarding your stay and travel in the city. Always take assistance from “May I Help You” counter and other assistance cell of the government like Delhi Tourism, DTC, etc.
  • Don’t travel alone late nights, especially female travelers. If you are getting late, take proper private cab or arrange a pick up

I do hope all visitors to SIGCOMM have a fantastic stay and that more and more conferences choose India.