Some thoughts on iCloud

Sorry, all the sensationalist headlines were taken, so I had to pick something boring.

As we all know by now (read: probably 1% of the world’s population), Apple spilled the beans on the upcoming iCloud, among other things, at WWDC earlier this week. In this post, I want to share some of my thoughts on the much-hyped iCloud (not that there is any dearth of opinions and articles on the subject, thanks to the echo chamber that is the Twitterverse and the blogosphere).

iCloud

First off, some quick bullets summarizing what it is:

  • iCloud aims to make cloud storage painless, the idea being that your data should be available to you from all your devices, all the time.
  • It’s automatic and transparent. Apple is baking iCloud support deep into 9 different applications: iTunes, Photo Stream, Apps, Books, Documents, Backup, Contacts, Calendar and Mail. And that’s just the beginning.
  • It’s free, up to 5GB (excluding purchased music, books, apps and Photo Stream).
  • Sync over the air: iCloud can sync across devices over wireless. As a concrete example, you’ll no longer need a cable to sync and back up your iPhone with your laptop.

Here are some cool things about iCloud:

  • Scan and skip upload (iTunes only): when dealing with large data sets (such as your movie and music collections), one of the main impediments to using cloud storage is the overhead of doing the initial import. With a 1Mbps uplink, a 10GB music collection will take roughly 22 hours, nearly a full day, to upload. Of course, if the file you are trying to upload already exists somewhere in the cloud, you don’t need to upload it, and this is exactly what iCloud does. Because of the iTunes store, Apple already has a library of 18 million songs (and counting), and detecting whether two files are the same song is a lot easier than for many other media types (say images or movies).
  • Storage APIs for developers: APIs are all the rage these days. By exposing the right set of APIs, Apple could attract developers to build iCloud functionality on other platforms (Android, for example). Unfortunately, the API is fairly limited at this point (key-value store or documents).
  • HP, Teradata and maybe EMC are rumored to have supplied the bulk of the hardware in the spanking new datacenter that will be the backbone for iCloud.
  • Despite all the recent hoopla around “cloud”, it has so far stayed firmly within tech circles. Apple has the ability, experience and motivation to take cloud computing truly mainstream with iCloud.
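The upload estimate in the first bullet above is easy to check. This is a minimal sketch, assuming decimal units (1 GB = 1000 MB) and a link that sustains its full rated speed with no protocol overhead; the function name `upload_hours` is made up for illustration.

```python
def upload_hours(size_gb: float, uplink_mbps: float) -> float:
    """Hours needed to push size_gb gigabytes through an uplink_mbps link."""
    megabits = size_gb * 1000 * 8   # 1 GB = 1000 MB (assumed), 1 byte = 8 bits
    seconds = megabits / uplink_mbps
    return seconds / 3600


# A 10GB collection over a 1Mbps uplink
print(f"{upload_hours(10, 1):.1f} hours")  # prints "22.2 hours"
```

Real uplinks rarely sustain their rated speed, so in practice the import would likely take longer still.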

What is NOT so cool:

  • Apple has a habit of exaggerating the novelty and efficacy of their features (remember Spaces?). Scan and skip upload is nothing new: it is just deduplication under the hood, a well-known technique in storage systems. Videos and photos will still have to be uploaded, though; there’s no real shortcut for those. Of course, there are techniques to dedup arbitrary data and I hope Apple is leveraging them.
  • In the same vein, syncing of Mail, Calendar and Contacts is just catch-up. Ever used Google? Likewise for Docs and Books. The delivery model is different: Apple apps work with the local data and sync when there’s connectivity. They haven’t touched upon conflict resolution, disconnected clients etc.
  • Implications for Dropbox: transparent, automatic sync across multiple devices is a phenomenally hard problem. Apple makes it sound like they’ve nailed it. It took Dropbox several years to address all the performance and security concerns. I’d wager Apple will run into its share of snags along the way.
  • Apples all the way: despite their claims, iCloud is designed to lock you in. Sure, you may be able to leverage some of the features by installing additional software on a PC. But unless you are using an Apple device, you won’t get the full experience or service. Want your “reading list” available on Android (or Chrome, for that matter)? Tough luck. Want your music available to other music players (open source players like Banshee and Amarok, god forbid)? How about your photo stream in Picasa?
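To make the deduplication point above concrete, here is a toy sketch of a content-addressed store that skips the upload when identical bytes are already present. It is not Apple's method, which has not been described; the class name `DedupStore` and its fields are invented for illustration. Note that hashing only catches byte-identical files. Recognizing the same song across different encodings would need something like audio fingerprinting, which this sketch does not attempt.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical bytes are kept only once."""

    def __init__(self):
        self.blobs = {}          # SHA-256 hex digest -> stored bytes
        self.bytes_uploaded = 0  # bytes that actually had to be transferred

    def put(self, data: bytes) -> str:
        """Store data unless it is already known; return its digest."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            # First time we see this content: pay for the upload.
            self.blobs[digest] = data
            self.bytes_uploaded += len(data)
        return digest


store = DedupStore()
first = store.put(b"same song bytes")
second = store.put(b"same song bytes")   # already present, nothing uploaded
print(first == second, store.bytes_uploaded)
```

The second `put` returns the same digest as the first and leaves `bytes_uploaded` unchanged, which is the whole trick behind skipping an upload.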

Finally, there’s no doubt that iCloud will drastically alter the cloud landscape. However, Apple is focused mainly on the personal cloud, which is a good thing: they are playing to their strengths. It is also a great opportunity, because the enterprise cloud market is still wide open. The requirements, challenges and “killer apps” in that market are very different from those of the personal/consumer cloud market. Should be fun!

The silent victories of open source

For years, advocates of free/libre/open source software (henceforth referred to as FLOSS) have proclaimed that this is the year of Linux, or the year that open source will become mainstream, or the year that open source will finally take off. But it never has, at least traditionally speaking. Linux-based desktops haven’t penetrated either the enterprise or consumer markets; with a few notable exceptions (Apache httpd, for instance), most FLOSS products, be it office software like OpenOffice or multimedia software such as GIMP or Inkscape, remain popular only within economically insignificant niches. And yet, this year, more than ever before, open source forges ahead with its silent victories.

Consider the following shifts:

  • All the top brands of the day (Apple, Google, Facebook, Twitter, Amazon) stand tall on the shoulders of FLOSS giants. ALL of them.
  • Contributing software back to the open source community is becoming increasingly common, even expected. Take a look at the GitHub repositories of Twitter and Facebook, or the various Google projects. In fact, when screening engineering candidates, I often look for and encourage people to talk about their open source contributions.
  • Most of the activity around “big data” and “cloud computing” is being driven in large part by FLOSS, whether it is the Hadoop-powered ecosystem or the Xen/Linux powered Amazon Web Services.
  • Given the current smartphone landscape, it is highly likely that Android will become ubiquitous on tablet devices and a variety of consumer smartphones. Already, Android has more search mindshare than Linux, despite the fact that Linux is part of the Android stack.
  • If you start a software company today, I would bet that you will find yourself bootstrapping almost entirely with open source software. The entire development process, from the GCC compiler toolchain to the build systems, scripting languages, version control systems, code review systems and continuous integration systems, is dominated by FLOSS products. Good bug trackers and enterprise wikis are the last bastions of proprietary software, but it is just a matter of time.

I’ve had a chance to see the enterprise software market up close and I find more and more open source everywhere I look. FLOSS has not merely arrived; it has taken over.

How do you use Twitter/Buzz/Facebook?

No no, I’m not late to the party and I’m not asking literally how one uses the above-mentioned services. Rather, I’m asking how one puts these various services to use. When do you post something on Twitter but not on Buzz, or on Facebook but not on Twitter? Or do you post everything everywhere (ping.fm style)? I’m not a heavy hitter by any means and my usage of social networks is mediocre at best. Yet I find myself confounded by all of the various services and their accompanying warts and virtues. Don’t you?

To help sort out my thoughts, I drew a picture (don’t you dare judge me for my lack of creativity!):

Twitter/Facebook/Buzz

Below I elaborate more on how I currently use each of the services.

Twitter

  • I tend to use it for technical and/or non-personal content. Things that I would want to publicize.
  • Unlike Buzz/Facebook, I don’t pay too much attention to who is following me. Most tweets are public anyway.
  • The 140 character limit is sometimes amusing, but often irritating. Are people still using regular SMS with Twitter?
  • That multiple startups are devoted to managing Twitter “noise” is not encouraging.
  • @ replies are a band-aid. Twitter is a broadcast-and-forget medium: I can’t have (or follow) a conversation on it.

Facebook

  • Use it for sharing random, personal updates (or things I find interesting :p)
  • Mostly on it because of the network effect (read: don’t want to be left off the social bandwagon).
  • Like that I can “Like” most things and actually follow the conversation via comments.
  • Always worried if my privacy settings are working and if there’s a new “default” I need to worry about.
  • Pay more attention to who I friend. The noise level is still quite high despite that.

Buzz

  • Usage domain similar to that of Facebook. Unlike Facebook, can choose to make posts Public.
  • Love the email integration. Conversely, API/clients still have to catch up to Twitter.
  • Supports likes, comments and “resharing”.
  • Privacy is modeled around my contacts (chat or otherwise), which seems natural.

I’m fine with using Twitter for all of my public posts. The main confusion lies between Buzz and Facebook. Facebook obviously has more social traction. That said, Buzz is just more convenient to use (because of the email integration mostly). Of course, all of the various connectors available (Twitter <-> Buzz, Twitter <-> Facebook, multicast via ping.fm or Chromedeck etc) make the whole thing even more confusing. At the end of the day, I might just go back to not using anything on a regular basis.

How are you using Twitter, Buzz and Facebook?

Startup Infrastructure: Where Linux Fails

It is no secret that I’m an open source evangelist, so when it was time to set up internal infrastructure at work, naturally the first order of business was to evaluate the various OSS projects out there: everything from wikis, bug trackers and source control to code review and project management. Running Ubuntu LTS (10.04) on all of our servers was a no-brainer and there were plenty of excellent options for most everything else as well (a follow-up post on our final choices later). The Linux ecosystem is fabulous for most of the infrastructure needs of a startup, but I learnt the hard way that there are still some areas where Linux needs a lot of work before it can become competitive with proprietary, non-Linux solutions.

Authentication

Centralized account management (users and groups) and authentication is a critical component in any IT deployment, no matter the size. Even for a small startup, repeatedly creating users/groups for each new server and wiring up a separate authentication mechanism for each new service is simply not scalable. That is precisely why Active Directory is so ubiquitous at enterprises.

LDAP was the obvious solution in Linux-land and I figured it would be trivial to set up an OpenLDAP server that could manage user/group information for us. It would also be the single authentication source for all servers and services. I was so wrong.

After struggling with OpenLDAP for several painful hours, I gave up. The documentation is fragmented, Google doesn’t help much and personally I think the LDAP creators had never heard of “usability” when designing it. The seemingly simple task of creating some new users and groups involved several black-magic incantations of the LDAP command line tools. Getting servers to authenticate against the resulting directory was even harder.
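To give a flavor of what creating a single user involves, here is a rough sketch that renders an LDIF entry of the kind the LDAP command line tools (such as `ldapadd`) consume. Everything specific in it is an assumption for illustration, not our actual setup: the helper name `user_ldif`, the `dc=example,dc=com` base DN, the `ou=People` container and the ID numbers. It uses the standard `inetOrgPerson` and `posixAccount` object classes.

```python
def user_ldif(uid, full_name, surname, uid_number, gid_number,
              base_dn="dc=example,dc=com"):
    """Render one user entry as LDIF text (hypothetical directory layout)."""
    lines = [
        f"dn: uid={uid},ou=People,{base_dn}",
        "objectClass: inetOrgPerson",   # requires cn and sn
        "objectClass: posixAccount",    # requires uid, uidNumber, gidNumber, homeDirectory
        f"uid: {uid}",
        f"cn: {full_name}",
        f"sn: {surname}",
        f"uidNumber: {uid_number}",
        f"gidNumber: {gid_number}",
        f"homeDirectory: /home/{uid}",
        "loginShell: /bin/bash",
    ]
    return "\n".join(lines) + "\n"


# Example values are made up
print(user_ldif("alice", "Alice Example", "Example", 10001, 10001))
```

Even this simplified entry shows how much schema knowledge is needed just to add one account, and it does not cover groups or passwords.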

Just as I was about to throw in the towel and set up an AD instance in-house, I stumbled upon the 389 Directory Server (formerly known as the Fedora Directory Server, or FDS). With newfound hope, I set about installing it on Ubuntu and hit another roadblock: there are no up-to-date packages of FDS for Ubuntu. Reluctantly, I set up a Fedora instance (the only one so far) and installed FDS. Thankfully, Red Hat has put together really comprehensive documentation and guides for the Directory Server, which were invaluable.

From there on, it was mostly smooth sailing (only a few minor hiccups). Finally, we have a nice GUI to manage users and groups, and all servers/services authenticate against a single Directory Server. But the journey was unnecessarily painful. Here’s what I’d like to see:

  • Up-to-date packages of FDS for Ubuntu. Sane defaults and functionality out-of-the-box
  • Ready to consume documentation on how to integrate LDAP with various web applications, Linux distros etc (I’ll put together some of this soon)
  • More awareness — I should have found FDS a lot sooner than I did, but it is certainly not very well marketed
  • Single sign on: This is a whole different beast

Remote Access

At my previous company, we had a Cisco VPN solution. There were plenty of Cisco compatible VPN clients on Windows and Mac. In fairness, it was relatively easy to get vpnc working on Ubuntu as well. In fact, with Network Manager, you can manage your VPN connections using a simple and intuitive UI. But the setup was not very reliable and my connections would get dropped relatively frequently. It was impossible to have a long-running VPN session without disruption. I’m not sure if the problem was with the Cisco hardware or the Ubuntu vpnc client; I did see similar issues with the built-in VPN client on Mac OS X.

But at least VPN on Linux works. I can’t say the same about other remote access mechanisms, in particular IPSec and L2TP over IPSec. It took me some time to figure out which package to use (strongSwan, Openswan, iked etc); another couple of hours to get the Openswan configuration just right; and several hours of struggling to automatically set up DNS lookups when using the IPSec connection (I gave up and ended up using entries in /etc/hosts!). There is no UI in Network Manager to manage IPSec connections either. strongSwan does have an NM plugin, but that only works for IKEv2 (certificate based authentication), while I had to use IKEv1 (shared key based authentication).

At the end of the day, I do have a working IPSec tunnel and it is definitely more reliable than the Cisco VPN (been up for more than 2 days without disruption). But all this can and should become a lot more seamless.

These are a few areas where Linux failed me in setting up the infrastructure for a startup; it shines most everywhere else. Hopefully these last few kinks will get ironed out soon.

Observations from The Social Network

The Social Network is rather like a fast-paced documentary. The content, production value and background score were great. I really enjoyed the bit around the Harvard boat race, a nice piece of whitespace in the movie :) But this post is not about these aspects; rather, I wanted to make a few observations about the tidbits of open source sprinkled throughout the movie.

  • wget makes several appearances in a short segment of the movie where Mark is scraping the Harvard intranet for the seed data for various precursors to Facebook. To my relief, everything I saw seemed very real and plausible unlike, say, the hackery mumbo-jumbo in The Matrix or (gasp) Swordfish. Nonetheless, I did not see (and have not seen) any evidence that Mark Zuckerberg is the programming genius that most reviews and synopses claim. Of course, programming genius has no correlation with being successful (read: being the youngest billionaire).
  • The usage of Emacs, Perl and curl was also faithful. The emphasis should be on Zuck’s intuition about the idea and his ability to prototype quickly. The technology itself was something any script kiddie could have come up with.
  • Zuck is shown running KDE 3 on his workstation. Again, the attention to detail is impressive. KDE 3 was around the same time as the early years of Facebook development.

The Social Network

There were a few more things, but I saw the movie several weeks ago and the details are fuzzy in my head. Meanwhile, if you are interested in the veracity of the movie’s substance, I found this Gigaom post useful.