Experiences with Google App Engine


I’ve been playing around with [[http://appengine.google.com|Google AppEngine]] for the past two weeks, and the experience has been mixed so far. First, the good:

* really easy to build something simple and get started.
* no need to worry about scaling, backup, replication etc. Obviously I haven’t verified this myself, but at least that’s the claim.
* the integration with Google accounts is nice.
* good documentation, lots of sample code available.
* dev server really helps with most of the development.
* the rather restrictive resource usage limits (see below) forced us to think carefully about our code and heavily optimize certain operations to make them work on GAE.

{{ http://code.google.com/appengine/images/appengine_lowres.jpg}}

And now, the bad:
* too many limits: 1 million is their favorite number. No files over 1MB, no request should take more than 1 million CPU cycles (whatever that means) and who knows what other limits they impose internally. While developing, this was the biggest barrier for us. Things would randomly fail, and then our application would be disabled for several hours.
* The dev server doesn’t replicate the constraints in production. So everything would run fine and dandy locally, and the minute we uploaded it, it would fail. Since we could only debug in production, and our application exceeded the quota every time we ran it, debugging was extremely slow and painful.
* local data store is excruciatingly slow. But this is not that critical, since it is only for testing anyway.
* even the remote data store is very flaky and slow at times. Any query involving more than a few hundred elements exceeds the quota.
* the bulk uploader is very useful, but again it is really slow. If you want to upload anything in “bulk”, you’ll have a hard time. The parameters have to be chosen carefully as well. Even for very simple data models involving 3-5 fields (mostly strings), we had to reduce the batch size to 2-4 to make it work. And despite that, we got a few HTTP 500 errors while uploading.
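
To make the batch-size point concrete, here is a rough Python sketch of uploading rows in tiny batches and retrying a batch when the server hiccups. This is not the bulk uploader's actual code or API: ''upload_all'', ''batches'' and the ''send_batch'' callable are hypothetical names invented for illustration, and the retry/backoff policy is my own assumption.

```python
import time


def batches(rows, batch_size):
    """Yield consecutive slices of rows, each at most batch_size long."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def upload_all(rows, send_batch, batch_size=3, max_retries=3, backoff_seconds=1.0):
    """Upload rows in small batches, retrying a batch when send_batch fails.

    send_batch is a hypothetical callable supplied by the caller. It is
    assumed to raise an exception on failure (for example on an HTTP 500).
    Returns the number of rows uploaded.
    """
    uploaded = 0
    for batch in batches(rows, batch_size):
        for attempt in range(max_retries):
            try:
                send_batch(batch)
                break  # this batch went through, move on to the next one
            except Exception:
                if attempt == max_retries - 1:
                    raise  # out of retries, let the caller see the error
                # wait a little longer after each failed attempt
                time.sleep(backoff_seconds * (2 ** attempt))
        uploaded += len(batch)
    return uploaded
```

With a batch size of 2-4 this makes a lot of round trips, which is exactly why uploading anything in “bulk” felt so slow.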

But it’s been fun so far. Hopefully most of these issues will get ironed out moving forward. As for what we are building? That will have to wait for another post ;-)

Google Reader auto sort


[[http://reader.google.com|Google Reader]] offers several options for [[http://www.google.com/support/reader/bin/answer.py?answer=69980&topic=12012|sorting feed items]]. After having played around with the “auto-sort” for several months now, I am reverting back to “Sort by newest”.

{{ http://www.google.com/googlereader/images/logo_reader.gif|Google Reader}}

The problem is that the auto-sort mode is a little too simplistic. Here’s what it does in their own words:


//This works by prioritizing subscriptions with fewer items. So, with this setting, your friend’s blog with one item a month will not be drowned out by higher volume sites such as the New York Times because we’ll raise the blog to the top.//

The general idea behind auto-sort is good, but unfortunately the execution hasn’t evolved at all to become smarter. For instance, some blogs I read haven’t been updated in a while. And I’m really not interested in the stuff they wrote some months back. So I never read those few old posts and yet they continue to hang around at the top of my feeds, which gets annoying quickly.

Ideally, the auto-sort should also take into account my reading trends (they obviously collect all this data, so might as well use it). In my case, what I really want the auto-sort to do is this: if there are some old posts and I’m consistently choosing not to read them, then perhaps they don’t need to be raised to the top any more. If I need to find them, I can always do so. In fact, I wouldn’t even mind if the old posts were raised to the top of the list once in a while.

An even smarter auto-sort will also take into account my reading habits. If there’s an infrequently updated blog that I read religiously, then I definitely don’t want to miss even an old post, no matter what. Similarly, old posts from an inactive blog that I have stopped following should be given less weight.
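
Here is a minimal Python sketch of what I have in mind. To be clear, this is not how Google Reader actually works: the ''Feed'' fields, the 30-day staleness threshold and the weighting formula are all assumptions I made up for illustration. ''basic_score'' mimics the documented behavior (fewer items means higher priority), and ''smart_score'' adds the two tweaks described above.

```python
from dataclasses import dataclass


@dataclass
class Feed:
    name: str
    unread: int                # unread items currently in the feed
    oldest_unread_days: float  # age of the oldest unread item, in days
    read_ratio: float          # fraction of past items I actually read (0.0 to 1.0)
    times_skipped: int         # how often I passed over the unread items


def basic_score(feed):
    """Documented behavior: subscriptions with fewer items score higher."""
    return 1.0 / (1 + feed.unread)


def smart_score(feed, stale_after_days=30.0):
    """Hypothetical refinement that also looks at reading habits."""
    score = basic_score(feed)
    # Boost feeds I read religiously, dampen feeds I mostly ignore.
    score *= 0.5 + feed.read_ratio
    # Demote old posts that I keep choosing not to read.
    if feed.oldest_unread_days > stale_after_days:
        score /= 1 + feed.times_skipped
    return score


def auto_sort(feeds, score=smart_score):
    """Return the feeds ordered from highest to lowest score."""
    return sorted(feeds, key=score, reverse=True)
```

With ''score=basic_score'', a neglected blog with one stale unread post stays pinned at the top forever. With ''smart_score'', it sinks once I have skipped it a few times, while a rarely updated blog that I always read keeps its place.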

How do you sort your feeds?

Come on Yahoo! dikha de!!


I’ve always felt a little sorry for [[http://yahoo.com|Yahoo!]] (and I find it ironic that even for such a statement, I need to use the exclamation point). They always seem to be living in the shadow of Google, sometimes through no fault of their own. Sure, they have made their share of mistakes, but I think the tech circles, and particularly the media, give Y! much less credit than it deserves. And thus I’ve been following the Microhoo saga with some interest, and with a feeling of resignation ([[http://news.yahoo.com/fc/Business/Microsoft_Yahoo|full coverage]]).

{{ http://farm3.static.flickr.com/2214/2234037367_2a77f57641_m.jpg|Microsoft’s hostile takeover bid}}

It would be sad if the merger/acquisition does go through (which I think it will, eventually). Meanwhile, while the long-drawn-out battle plays itself out, I can’t help but wonder why Y! failed to leverage some of its assets, which honestly have incredible value in them. To some extent I do blame the media (or Yahoo’s PR). I don’t believe that Google does //all// the innovation, nor that all their products are superior to the competition. But still, even if someone in Google sneezes, it gets Dugg and Slashdotted and everyone just goes hyper. In this post I’ll discuss some of these issues.

First off, some of the good stuff (I’m not going to mention the usual suspects like Y’s traffic numbers or their share in the web-mail and IM markets):

* Yahoo! is a major supporter of and contributor to [[http://hadoop.apache.org|Hadoop]]: an open source implementation of [[http://google.com|Google's]] [[wp>MapReduce|MapReduce]]. Complaints of Yahoo playing catch up and “too little too late” apart (I will address them in another post), I do think this is a timely and much needed development, both for Yahoo and the industry in general. A cursory look at the [[http://wiki.apache.org/hadoop/PoweredBy|list of places using Hadoop]] is enough to give an idea of the kind of enabler this platform is. An entire community and several other projects are mushrooming around Hadoop, including [[http://hadoop.apache.org/hbase/|HBase]], [[http://incubator.apache.org/pig/|Pig]] (bad name if you ask me) and [[http://hypertable.org/|Hypertable]]. Google might have the largest, most efficient MapReduce and BigTable implementations, but their implementations are just that: theirs, and extremely closely coupled to their infrastructure. Opening up such a platform for others and building a healthy community around it is, I think, a Good Thing.
* Yahoo! Developer Network: This crew has churned out some remarkable products (such as [[http://developer.yahoo.com/yui/|YUI]] and [[http://developer.yahoo.com/yslow/|YSlow]]) as well as some really well organized guidelines (such as [[http://developer.yahoo.com/ypatterns/|the Design Pattern library]]).
* [[http://finance.yahoo.com/|Finance]]: the [[http://finance.google.com|competition]] is not even close.
* Flickr and Del.icio.us
* [[http://mobile.yahoo.com|Yahoo! Mobile]]: I have yet to get on the mobile Internet bandwagon, so I really have no first hand experience here. But I’ve heard that Yahoo products have much better support across a wide variety of devices compared to the competition. In fact, until the Java based GMail reader came out, the mobile version of GMail’s web interface was quite lacking.

That said, I feel there are two main areas Y! needs to work on if they want to get back in the game:
* Brand image: Y! needs to work on how they are perceived //externally// as well as //internally//. I feel that people who work at Y! themselves don’t believe in the company, or have the feeling it is somehow not as good as or not as cool as other companies. A lot of Google’s brand image comes from the attitude of its employees, and the work culture. Ditto for Microsoft.
* Streamlined products: Yahoo! Maps and Mail are good applications, but they are far too bloated. Even on my reasonably powerful dual-core desktop, these applications feel sluggish and drive the CPU to saturation, which is just not acceptable. In comparison, offerings from Google feel much leaner, load quicker and are more responsive.

In the end, the company that remains competitive and offers the best value to its customers and shareholders will prevail. And I feel that a combined Microsoft-Yahoo entity will not make the space any more interesting. If anything (as many fear), I think it might kill, and will certainly slow down, innovation that might otherwise have happened. If Yahoo! can somehow manage to stay afloat on its own, it will at least be a little more exciting. So come on Yahoo! dikha de (translates to “show us”)!!

How gTalk pushed jabber


I remember signing up for a Jabber account several years back. Since there were a lot of Jabber servers to choose from, and really no “canonical” choice, I ended up trying out a few different ones, until I finally settled on the jabber.org server. Of course, since hardly anyone I knew was using Jabber at that time, that account was rarely used.

{{ http://floatingsun.net/wordpress/wp-content/uploads/2008/02/screenshot6.png|Jabber}}

I subsequently tried to convince my friends to start using Jabber, even issuing a [[http://floatingsun.net/2005/07/28/call-for-jabber/|call for Jabber]] on my blog. Suffice it to say that in all I added perhaps three friends to my Jabber buddy list. So much for technological merit driving adoption!

Somewhat naively, in that post I said:


//I should point out that Jabber is meant for (and only for) instant messaging. This means that there is no protocol bloat for supporting webcams or voice chats. Use video conferencing or VoIP if you want those. Let’s keep IM simple.//

Oh, how wrong I was. There are now official [[http://xmpp.org|XMPP]] extensions for [[http://www.xmpp.org/extensions/xep-0167.html|audio]], [[http://www.xmpp.org/extensions/xep-0180.html|video]] and [[http://www.xmpp.org/extensions/xep-0096.html|file transfer]]! Nevertheless, the basic premise of Jabber remains the same: open, free standard, distributed implementation and rich functionality.

However, it wasn’t until Google embraced XMPP for Google Talk that Jabber really took off. Even now most end users are not familiar with the technological underpinnings of Google Talk. When Google Talk launched, it was a closed network. That is, though it used Jabber as the communication protocol, Jabber users on other servers could not communicate with Google Talk users. After some initial resistance, Google finally gave in, making Google Talk an open Jabber network.

It is kind of unfortunate that one of the main “features” of Jabber, a distributed implementation much like that of email, has essentially been nullified by Google Talk, since the vast majority of Jabber users //are// Google Talk users. Of course, it has been a boon to Jabber as well, since it piqued the interest of all kinds of commercial players, leading to a significant increase in activity around the XMPP protocol stack. The extensions I mentioned earlier are just a small sampling of the [[http://www.xmpp.org/extensions/|total extensions available]].

It is interesting, as well as a little disappointing, that good ideas often get ignored despite their technical merit, and yet an endorsement by a powerful and recognized brand suddenly lends them credibility.

New MapReduce article in CACM


Several people have already [[http://googlesystem.blogspot.com/2008/01/google-reveals-more-mapreduce-stats.html|noted]] that Google has published updated statistics on [[http://labs.google.com/papers/mapreduce.html|MapReduce]] in a [[http://doi.acm.org/10.1145/1327452.1327492|recent article]] published in the Communications of the ACM.

While numbers from Google are certainly always interesting, what struck me was the **absolutely pathetic** quality of the graphs in the article. To see what I mean, check out the graphs on Page 5 (you need an ACM account to get the PDF I think). They are hardly readable, both in print and on screen (zooming in doesn’t help). Here is a screenshot (I have included some of the surrounding text to give you an idea of the resolution):

{{ http://floatingsun.net/wordpress/wp-content/uploads/2008/01/screenshot15.png|MapReduce graph}}

As a member of the academic community, I’m quite disappointed and surprised that neither the authors nor the editors took note of such an obvious shortcoming. MapReduce is great work, and a publication like CACM reaches out to a much broader audience than the conference proceedings of OSDI (where MapReduce was originally published), so I would expect the presentation to be top-notch (and remember, this is Google we’re talking about). Besides, what irks me most is that these are the //exact// same graphs (or at least some of them are) from the original MapReduce paper ({{http://labs.google.com/papers/mapreduce-osdi04.pdf|pdf here}}). Was it so hard to just copy and paste or import the figures without messing up the resolution so badly?