
Over the past few years, Google has open sourced several projects that provide commonly used building blocks for any large software project. Some of them I have known about since they launched (like protobufs), while others I discovered only recently. I couldn’t find any one place where all the projects were listed together, and combing through Google Code looking for them was painful, so I’m putting together a list myself. Hope some of you find it useful.
- protobufs: Platform agnostic messages. Critical for any distributed system. Note that protobufs only provide message serialization/deserialization (for various languages). An important missing piece is an RPC framework built on top of them. There are several projects attempting to build one using protobufs, but none of them are robust or mature enough for production use.
- style guide: The importance of a style guide is probably underestimated. It is not about what the “right” style is; it is about consistency. While people may have different opinions, if everyone follows the same style, the code becomes much more readable and maintainable. Google maintains style guides for C++ and Python.
- config flags: Another important building block for all command line programs.
- logging: Self-evident. Google’s logging library supports various log levels and other useful macros.
- core dumper: A very nifty library — it allows you to dump core from within a running application. Extremely useful for debugging production systems.
- perftools: An extremely useful library for measuring and monitoring performance of programs. By simply linking against perftools, your application gets a much better malloc, heap checking, visual CPU profile of various routines (via graphviz), visualization of memory usage etc.
- googlemock: A framework to quickly build mock objects — useful for testing.
- googletest: Google’s C++ unit testing framework, based on the xUnit architecture. Integrates well with googlemock.
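To give a feel for why mock objects and a unit testing framework go well together, here is a minimal sketch of the pattern. Note that it uses Python's standard `unittest` and `unittest.mock` modules as stand-ins, not googletest or googlemock themselves (those are C++ libraries). The `fetch_title` helper, the client object and the URL are all hypothetical names invented for this illustration.

```python
import unittest
from unittest import mock


def fetch_title(client, url):
    """Hypothetical code under test: fetch a page and return its title in upper case."""
    response = client.get(url)
    return response["title"].upper()


class FetchTitleTest(unittest.TestCase):
    def test_uppercases_title(self):
        # The mock stands in for a real network client, so the test needs no network.
        fake_client = mock.Mock()
        fake_client.get.return_value = {"title": "building blocks"}

        title = fetch_title(fake_client, "http://example.com/post")

        self.assertEqual(title, "BUILDING BLOCKS")
        # The mock also records how it was called, so the interaction can be checked.
        fake_client.get.assert_called_once_with("http://example.com/post")


def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FetchTitleTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The same division of labour applies in C++: the test framework supplies the assertions and the test runner, while the mock replaces an expensive or awkward dependency and verifies how it was used.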
Of course, this is not an exhaustive list. There are numerous other open source projects from Google, some of them probably much bigger and more visible than the ones listed above, such as Wave, Go, GWT etc. If there’s a project that is a software building block that I missed, do chime in in the comments below.



Tools for the savvy grad student
As a grad student, I was always looking out for tools to make my life simple (read: I’m quite lazy). Here are some of the tools I think every savvy grad student must know.
A good plotting library
Don’t even mention gnuplot. Not only is it old school (how many times have you looked at a graph in a paper and just known that it was produced using gnuplot?), but it is extremely limited in its feature set. My biggest gripe with gnuplot, however, is that it forced me to separate my data collection/analysis from the actual plotting of the data. I personally am a huge fan of matplotlib, an uber-plotting library written in Python. It can produce high-quality graphics in dozens of formats and supports interactive plotting, and it has an object-oriented API as well as an imperative API along the lines of Matlab (hence the name). You can create amazingly rich plots and, best of all, you can combine your data collection and analysis (which I was doing in Python anyway) with your plotting.
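As a rough sketch of what "analysis and plotting in the same script" looks like, the snippet below summarizes some measurements and plots them with matplotlib's object-oriented API. The latency numbers, the function names and the axis labels are all made up for illustration; only the matplotlib and `statistics` calls are real.

```python
import statistics

import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt

# Hypothetical raw measurements: request latency (ms) per run, keyed by client count.
runs = {
    1: [10.2, 9.8, 10.5],
    2: [12.1, 11.7, 12.6],
    4: [15.9, 16.4, 15.2],
    8: [24.8, 26.1, 25.3],
}


def summarize(runs):
    """Analysis step: mean and sample standard deviation for each client count."""
    clients = sorted(runs)
    means = [statistics.mean(runs[c]) for c in clients]
    stdevs = [statistics.stdev(runs[c]) for c in clients]
    return clients, means, stdevs


def make_figure(clients, means, stdevs):
    """Plotting step: mean latency with error bars, using the object-oriented API."""
    fig, ax = plt.subplots()
    ax.errorbar(clients, means, yerr=stdevs, marker="o", capsize=3)
    ax.set_xlabel("Number of clients")
    ax.set_ylabel("Mean latency (ms)")
    ax.set_title("Latency vs. load")
    return fig


clients, means, stdevs = summarize(runs)
fig = make_figure(clients, means, stdevs)
```

Calling `fig.savefig("latency.pdf")` (or `.png`, `.svg` and so on) then writes the figure in whichever format the paper needs, with no intermediate data file to keep in sync.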
If you are more of a Ruby person, check out gruff.
Bibliography Management
There are two aspects of bibliography management. The first is the context of a specific paper: you are working on a paper and you want to collect all the relevant bibliographic information for citing in the paper. BibTeX is the tool most commonly used for this, in combination with LaTeX. However, BibTeX is buggy, its syntax is inconsistent across implementations, and it lacks simple features like variables and the ability to “import” other BibTeX files. Enter CrossTeX, a drop-in replacement for BibTeX. CrossTeX is written in Python. It has an object-oriented model for representing citations, so once you define an object for author “Foo Bar” aliased as foobar, you can simply use foobar wherever you would like to cite “Foo Bar”. CrossTeX also makes it trivial to define new formatting styles for your citations, for instance if you want to change the capitalization of the titles or abbreviate “Proceedings” to “Proc.” everywhere. Finally, CrossTeX was built by some nice folks at Cornell, so they know exactly what the pain points of BibTeX are.
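To make the alias and style-rule idea a bit more concrete, here is a toy model of it in plain Python. This is emphatically not CrossTeX's syntax or implementation, just a sketch of the concept: the dictionaries, the `format_citation` function and the paper details are all hypothetical.

```python
# Toy model only; CrossTeX's real syntax and internals are different.

# Define each author once, under a short alias.
AUTHORS = {"foobar": "Foo Bar"}

# A style rule that should apply everywhere a venue is printed.
ABBREVIATIONS = {"Proceedings": "Proc."}


def format_citation(author_alias, title, venue, year, abbreviate=True):
    """Expand the author alias and apply the abbreviation rule to the venue."""
    author = AUTHORS[author_alias]
    if abbreviate:
        for long_form, short_form in ABBREVIATIONS.items():
            venue = venue.replace(long_form, short_form)
    return f"{author}. {title}. {venue}, {year}."


citation = format_citation(
    "foobar",
    "A Hypothetical Paper",
    "Proceedings of a Hypothetical Workshop",
    2009,
)
```

The point is that the author's name and the abbreviation rule each live in exactly one place, so changing either one updates every citation that uses it.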
The second aspect of bibliography management is simply keeping track of all the papers you read and review. These will come in handy when you are writing a paper or a dissertation, preparing for a talk or an interview, or simply trying to recall prior work in a given field. I highly recommend using CiteULike, an online bibliography management portal. Some features I really like: CiteULike has a really nice bookmarklet that lets you add new items to your bibliography with a single click from various sites such as ACM, USENIX, IEEE, PubMed, arXiv and so on; it has some really nice social features as well, such as tagging, groups, watch lists etc.; you can download selected citations in multiple formats; and you can search easily by keyword, tag, author, area, year etc.
A Text Editor
I don’t mean an IDE (like Eclipse) or a word processor (like MS Word). I mean a text editor and only a text editor. AFAIC, that means Vim or Emacs (if that works for you). The bottom line is, learn a text editor and become really, really good at it. You will be amazed at how much time it will save you and how much it can improve your productivity. Some features that are essential: syntax highlighting, regular expression support, spell check, support for snippets etc.
On that note, learn to write in LaTeX. I’m horrified by the fact that so many people are still using Word-like tools to write papers. I don’t have anything against Word, but it is the wrong tool for writing papers. Reference management, formatting, including figures etc. are all so much easier in LaTeX.
Version Control
I can’t stress this enough: you must get in the habit of versioning everything. Not just code, but your notes, write-ups and obviously papers. Having some version control has saved me from disaster many a time. And if you are collaborating on papers, I can’t imagine how people do it without some kind of version control system. Now there are a lot of choices out there. But if you are really savvy, you must use git :) Basically, use any reasonable distributed VCS (Mercurial and Bazaar are also ok), but avoid Subversion and absolutely refuse to use CVS. CVS has lived a good life, but its time is now past and we must let it go.
Information Management
And by that, I mean staying on top of the news and research in your research area and/or academic community. I’ve found it very useful to add all the relevant blogs to a ‘research’ tag in my Google Reader (yes, the blogging bug has bitten academia). Likewise, you can find a lot of current information on Twitter. I’m sure people have already started live-blogging and twittering from academic conferences as well!
Of course, for more conventional searches, DBLP and Google Scholar are invaluable. CiteSeer used to be the go-to website a few years ago, but I personally find Google Scholar much nicer to use and with just as much information, if not more.