Tools for the savvy grad student

As a grad student, I was always looking out for tools to make my life simple (read: I’m quite lazy). Here are some of the tools I think every savvy grad student must know.

A good plotting library

Don’t even mention gnuplot. Not only is it old school (how many times have you looked at a graph in a paper and just known that it was produced using gnuplot?), but it is extremely limited in its feature set. My biggest gripe with gnuplot, however, is that it forced me to separate my data collection/analysis from the actual plotting of the data. I personally am a huge fan of matplotlib — it is an uber-plotting library written in Python. It can produce high-quality graphics in dozens of formats (including interactive plotting), it has an object-oriented API as well as a imperative API along the lines of Matlab (hence the name). You can create amazingly rich plots and best of all, you can combine your data collection and analysis (which I was doing in Python anyways) with your plotting.

If you are more a Ruby person, check out gruff.

Bibliography Management

There are two aspects of bibliography management. First is the context of a specific paper: you are working on a paper and you want to collect all the relevant bibliographic information for citing in the paper. BibTex is the tool that is most commonly used for this, in combination with LaTeX. However, BibTex is buggy, the syntax is inconsistent across implementations, it lacks simple features like variables and the ability to “import” other bibtex files etc. Enter CrossTeX — a drop-in replacement for BibTex. CrossTeX is written in Python. It has an object-oriented model for representing citations. So once you define an object for author “Foo Bar” aliased as foobar, you can simply use foobar wherever you would like to cite “Foo Bar”.  CrossTeX also makes it trivial to define new formatting styles for your citations. For instance, if you want to change the capitalization of the titles or abbreviate “Proceedings” to “Proc.” everywhere. Finally, CrossTeX was built by some nice folks at Cornell, so they know exactly what the pain points of BibTeX were.

The second aspect of bibliography management is simply keeping a track of all the papers you read and review. These will come in handy when you are writing a paper, a dissertation, preparing for a talk or an interview, or simply trying to recall prior work in a given field. I highly recommend using CiteULike — it is an online bibliography management portal. Some features I really like: CiteULike has a really nice bookmarklet that you adding new items to your bibliography using a single click from various sites such as ACM, USENIX, IEEE, PubMed, arXiv and so on; it has some really nice social features as well such as tagging, groups, watch lists etc.; you can download selected citations in multiple formats; you can search easily by keyword, tag, author, area, year etc.

A Text Editor

I don’t mean an IDE (like Eclipse) or a Word processor (like MS Word). I mean a text editor and only a text editor. AFAIC, that means Vim or Emacs (if that works for you). The bottom line is, learn a text editor and become really really good at it. You will be amazed at how much time will save you and how much can it impact your productivity. Some features that are essential: syntax highlighting, regular expression support, spell check, support for snippets etc.

On that note, learn to write in LaTeX. I’m horrified by the fact that so many people are still using Word like tools to write papers. I don’t have anything against Word, but it is the wrong tool for writing papers. Just reference management, formatting, including figures etc are so incredibly easier in LaTeX.

Version Control

I can’t stress this more — you must get in the habit of versioning everything. Not just code, but your notes, write-ups and obviously papers. Having some version control has saved me from disasters many a times. And if you are collaborating on papers, I can’t imagine how people do it without some kind of version control system. Now there are a lot of choices out there. But if you are really savvy, you must use git :) Basically use any reasonable distributed VCS (Mercurial and Bazaar are also ok), but avoid Subversion and absolutely refuse to use CVS at all costs. CVS has lived a good life, but its time is now past and we must let it go.

Information Management

And by that, I mean staying on top of the news and research in your research area and/or academic community. I’ve found it very useful to add all the relevant blogs to a ‘research’ tag in my Google Reader (yes, the blogging bug has bit academia). Likewise, you can find a lot of current information on Twitter. I’m sure people have already started live-blogging and twittering from academic conferences as well!

Of course, for more conventional searches, DBLP and Google Scholar are invaluable. CiteSeer used to be the go-to website a few years ago, but I personally find Google Scholar much nicer to use and with just as much information, if not more.

Enhanced by Zemanta
Posted in Featured, Grad School, Tools | 6 Comments

gooLego: Google’s software building blocks

Google Inc.
Image via Wikipedia

Over the past few years, Google has open sourced several projects that provide some commonly used building blocks in any large software project. Some of them I was aware of since when they were launched (like protobufs), while others I discovered only recently. I couldn’t find any location where all the projects were listed together and combing through Google Code looking for them was painful, so I’m putting together a list myself. Hope some of you find it useful.

  • protobufs: Platform agnostic messages. Critical for any distributed system. Note that protobufs only provide message serialization/deserialization (for various languages). An important missing piece is an RPC framework built on top of them. There are several projects attempting to build one using protobufs, but none of them are robust or mature enough for production use.
  • style guide: The importance of a style guide is probably understated. It is not about what is the “right” style — it is about consistency. While people may have different opinions, if everyone follows the same style, the code becomes much more readable and maintainable. Google maintains style guides for C++ and Python.
  • config flags: Another important building block for all command line programs.
  • logging: Self-evident. Google’s logging library supports various log levels and other useful macros.
  • core dumper: A very nifty library — it allows you to dump core from within a running application. Extremely useful for debugging production systems.
  • perftools: An extremely useful library for measuring and monitoring performance of programs. By simply linking against perftools, your application gets a much better malloc, heap checking, visual CPU profile of various routines (via graphviz), visualization of memory usage etc.
  • googlemock: A framework to quickly build mock objects — useful for testing.
  • googletest: Google’s C++ unit testing framework, built on top of xUnit. Integrates well with googlemock.

Of course, this is not an exhaustive list. There are numerous other open source projects from Google, some of them probably much more bigger and visible than the ones listed above — such as Wave, Go, GWT etc. If there’s a project that is a software building block that I missed out, do chime in the comments below.

Enhanced by Zemanta
Posted in Featured, Software, Tools | Tagged , , , , | 3 Comments

Horse Drawn Carriage

Our wonderful friend Sanya has several fantastic handmade screen prints up at her store on Etsy. And if the prints weren’t already cool enough, she is now offering FREE shipping all through December!! What a way to spread around the holiday cheer :)

Here’s a sample of her works:

Siren

Siren

Smokestacks

Smokestacks

She also has a series of daily drawings up on her blog.

Posted in Art | Leave a comment

Emacs vs. Vim

This is a follow up on my previous post.

Update: Since this topic deserves a little more than a post, I have moved this post to a separate article which I shall keep expanding over time.

Posted in Software, Technology | Tagged , , | Leave a comment

Reconsidering Vim

NOTE: This post is not about the editor war — so please don’t try to start one either.

First, some background. Lets just say that I lost my editor virginity to Vim. It was a brief, but violent introduction — the modal editing was too unfamiliar, the learning curve too steep. After dabbling with a few other conventional editors (such as KWrite), I settled upon Emacs (XEmacs actually, but thats another story).

For the next three years, I tweaked my .emacs file, fiddled around with settings and plugins and modes, played games and browsed the web, checked my email and newsgroups, all within the comfortable confines of Emacs. But I was getting wary of the long startup times and (at that time) the inability to use the same interface and features in console mode (such as over SSH) as in GUI mode. It was time to move on.

I rediscovered Vim around 6 years ago. I started with a clean slate. As the saying goes, Emacs is an operating system that also happens to have an editor in it. The relatively more focused feature set of Vim was refreshing in comparison. I loved that I could work in GUI mode, save my session, go back home and resume my session in a terminal over SSH, which the exact same interface and keybindings. I quickly became very productive with Vim, and over the years have honed my plugins, settings and color themes to just how I like them.

But recently, I’ve been thinking about this again, and I might just reconsider Vim. I highly recommend reading these two blog posts to better understand where I’m coming from:

Don’t get me wrong — I think Vim still has a lot to offer. But, I can not deny that Vim is not what I would call a “forward looking editor.” Here’s why:

  • Development community: the Emacs development community is a lot more open and vibrant right now than the Vim community. Part of this has to do with the BDFL model in Vim. Bram Moolenar has done a tremendous job in bringing Vim to the stage where it is. People can and have forked Vim in the past. But for one reason or another, Vim has stayed Vim, and its development trajectory has been slow and incremental.
  • Source code: Vim’s source code is not clean. At all. I just briefly skimmed over the source tree for Emacs 23, and it looks a lot more understandable and well structured.
  • Architecture: Vim 7 finally got spell check. But the spell check does not use any of the existing tools or formats. Vim has its own scripting language, with its own interpreter, grammer and data structures. Why not just use one of the many wonderful programming languages out there? Yes, there are interfaces to allow writing Vim code in Python, Ruby, Perl etc. But why reinvent the wheel all over again?

When Bram Moolenaar — the lead developer of Vim –  joined Google, I had hoped that Vim would generate a lot more interest and enthusiasm. But so far, it hasn’t changed much.

And so, in the next few weeks, I’m going to take another look at Vim as well as Emacs. I’ll try to do an objective evaluation of where the editors stand today, where I perceive they are headed. I hope to make my decision on whether to move away from Vim or not by the end of this year.

Reblog this post [with Zemanta]
Posted in Software, Tools | Tagged , , | 6 Comments