iCalendar – Quirks From the Trenches

While working on Trello2iCal, an awesome webapp that schedules Trello cards on your calendar by their due dates, I worked a lot with the iCalendar protocol (or iCal). It was actually the biggest source of bugs and misbehaviors, me taking a very close second. It didn’t help that Apple decided to name their calendar app iCal adding noise to search results, which were scarce to begin with.

iCalendar itself has two version defined by two RFCs: RFC 2445 from 1998 and RFC 5545 from 2009. It’s a text-based protocol, which contains some metadata and a list of “components” that represent events, todos, alarms, and other calendar related structures. Each of these components is a list of “content-lines” which are mostly “key: value” lines. When you put these calendars online and constantly update them, like Trello2iCal does, it’s called a feed (much like an RSS feed). Here is an iCal feed with two events:

DESCRIPTION:Card URL -nsome url
DESCRIPTION:Card URL -nsome_url

If I was a smartass, I would really question the decision to invent yet another textual protocol instead of using XML which is quite suitable for this. Maybe if the protocol was XML based it would have been easier to implement, thus making it more popular, which would have resulted in a bigger knowledge base and better tools. One tool I do have to point out that saved me a lot of time is the iCal validator (thanks Ben Fortuna!), but it only checks whether your iCal feed is valid according to the spec, not with the real world quirks below.

Below, I talk about the issues and quirks I ran into and their solutions or workaround, in no particular order.

  • iCalendar clients, especially Google Calendar, are very bad at refreshing their data. I’ve had reports of users waiting for over 24 hours for their calendars to sync up. There is a workaround but if you’re doing active planning, this is unacceptable. It’s a pretty easy issue to solve for Google Calendar – just add a button that a user can click to refresh, but I guess it’s not a priority. Outlook and Apple Calendar are a bit better, syncing a few times an hour, but AFAIK lacks a refresh option as well. I wish the protocol had a way to tell clients the maximal time between syncs.
  • Google Calendar refused to load iCalendar feeds before I put a robots.txt file, allowing Google bot access. I don’t think that makes much sense as this isn’t a bot but a protocol client (which is probably best called an “agent”).
  • While developing, I put version 0.1 as my iCalendar version mistaking it for the version of my app. Every iCalendar client accepted my feed except Apple Calendar who silently rejected it. Apparently, the only value they consider valid is version 2.0.
  • Apple Calendar was also the only one that didn’t parse the feed without a DTSTAMP field.
  • All Day Events – There are many methods that don’t work or are wrong:
    • WRONG: Adding one day to a date time object is not right – just make a 24-hour meeting. It doesn’t matter if you start at midnight and end at midnight either.
    • WRONG: Having the start date time the same as end doesn’t work either as it produces undefined behavior.
    • WRONG: Removing the time component and setting the end day and the start day as the same day – this is an “event on the day”, not a full day event. This should be used for birthdays for example.

    The RIGHT way to do it is to remove the time component and set the end day for a day later. If not for the first point, this would have been easier to test, but when refresh is a 5-step process, it becomes tedious.

    A mess of iCal Calendars and Events.

    A mess of iCal Calendars and Events.

  • New lines in the description – in this case Outlook is the outlier where encoding the line ends as “=0d=0a” (this encoding is probably a relic from Outlook Express in which this was the way to encode ASCII characters in emails) worked. The RIGHT way to do it is using “n”. No, not the ASCII n which is 0x0d, but the actual ASCII slash character “” followed by the letter “n”. Fun.
  • I was using the iCalendar python library. It is well maintained but looking at the code, it is pretty gnarly. It also doesn’t support Unicode in a sane way making me modify it in a few places that I’m now retroactively opening bugs for in their github repo.
    A current issue I’m having – both Apple Calendar and Outlook have no problem parsing the unicode I’m serving, but Google Calendar insists to turn them into question marks. I’m pretty sure the problem is somewhere in my HTTP headers, but I still didn’t solve it.

To be honest, I hope I didn’t come off as too critical of the protocol. Passing complex data like calendar events is a very hard problem – you need to take into account time zones, multiple people, multiple fields, recurring events, whole day events, free-busy times, event updates, event deletion, and a myriad of other scenarios. iCalendar is a solution, which is better than none, and a lot of the quirks can be blamed on calendar clients, but I still find this whole environment very unfriendly and mediocre at best.

Who Needs Code Comments?

A recent Hacker News discussion about source code comments has grown into a debate about whether you need them or not. Apparently it is a contentious issue. The comment that started it all included “Comments are for the weak” and these 5 words incited a hefty discussion and this post. Some people argued code comments are a bad code smell and others said that comments are essential for people to understand code. My theory is that this divide is between people using low level languages like Assembly, C, C++ and to some degree Java and people using high level languages like Python and Ruby.

The biggest reason I think that is that low level languages need a lot more lines to do the same thing and those lines are harder to understand. A simple but poignant example is opening a file and reading its content:
C code (taken from here):
char * buffer = 0;
long length;
FILE * f = fopen (filename, "rb");
if (f)
fseek (f, 0, SEEK_END);
length = ftell (f);
fseek (f, 0, SEEK_SET);
buffer = malloc (length);
if (buffer)
fread (buffer, 1, length, f);
fclose (f);

buffer = open(filename).read()

The python example is one line long and uses descriptive names for the actions – open and read – and doesn’t bother you with implementation details of allocating memory for the data. The C example is about 12 lines long and exposes a lot of implementation details both about memory and about how file systems work (seek etc.). Both methods have their uses and advantages but one thing for sure – anyone not familiar with C will have a hard time understanding the C code and even non-programmers can understand roughly what the python code does.
One commenter specifically caught my attention giving an example of “readable” C code from the Unix source code. Here is the second function from the source file:

* Wake up all processes sleeping on chan.
register struct proc *p;
register c, i;
c = chan;
p = &proc[0];
i = NPROC;
do {
if(p->p_wchan == c) {
} while(--i);

This is part of one the most influential operating systems written, but at least in my book this code wouldn’t pass code review. It will fail because:

  • using one letter variable names is bad – this is the biggest offender by far.
  • wakeup is a really general name for such a specific function. wakeup_on_channel is better.
  • Abbreviating “Number” to N in NPROC.
  • Not declaring input variables type (TiL that the default is int…)

This brings me to the second contributing point to my theory – writing something hard to read will be strongly discouraged by some communities more then others. High level languages are written with the axiom that code must be easy to understand. It’s even in their name – high means farther from machine code and closer to humans while low-level means closer to machines and their language.

Looking back at the HN discussion, you can make some good educated guesses about who in that thread is a high level programmer and who works closer to the metal. These two groups might not get each other’s context and so this discussions goes round and round. Both groups need to acknowledge that languages like C will need more comments and documentation to be understandable and that while commenting is good it might be a strong code smell that you need to refactor in a higher level language. I usually use this Python idiom – if you feel the need to comment something, make it into a function and write a docstring.

Practical Programming: Technical Debt

This is part of the “Practical Programming” series in which I try to say something profound about the craft of programming.

Technical debt is a very important factor in any software project yet you might never even encounter it until you start working in the field and having long-term projects. Even then, technical debt is something that creeps up. It’s not a bug, nor a feature you forgot or a meeting you dozed off in. It is more of a feeling, a state, a cup that slowly fills up until it overflows.

My definition of technical debt is the difference between the code right now, and the code if it was written perfectly by a thousand coders with unlimited time. This debt includes missing documentation, no tests of any kind, no release process, no design, no consistency in the code and other such issues that are usually regarded as secondary to actual working code. The metaphor is aptly named because it also has interest – the more the technical debt grows, the harder it is to scale development, be it adding new features, adding new developers to the team or being efficient in solving bugs. In the end you go broke and declare code bankruptcy – a rewrite.
(Some great resources that explain it in more depth: Martin Fowler’s article, c2 wiki page)

Let’s unwind the stack of metaphors and get back to why is technical debt a problem – to create a functional program all you need is code that compiles, but to create a sustainable development environment you need code, design, documentation, tests, deployment procedures, etc. This difference is the technical debt of your project. If a lot of those basic things are missing from your project it’ll get tougher and tougher to make even the smallest change without ruining something. As with any debt the interest will keep on climbing until it will be obvious that a rewrite is easier than fighting the code. This step always comes. You can ignore it and expect developers to work around these problems but eventually this will stop being cost efficient – bugs will pop up everywhere taking time from developing new features and new features will be buggy and produce unexplained behavior (the old “compile it and let’s see what happens”).

There are several ways to lessen the burden of technical debt. The easiest is just slowing down the development process and allowing time for basic software development practices. This is definitely the preferred way to do your projects (hehe). The other side of that is rewriting the whole code base, adding 1 to the version number and starting a  marketing campaign – “new packaging”.

I’ve recently had to think about it and I’m advocating a hybrid solution – allocating time in each development sprint to pay the technical debt of a specific class or package in your code base. I have summarized my approach to five easy steps:

  1. Documentation – create a page for the package using whatever documentation repository you use. Document the following about the code:
    1. How should it work? This means a general overview of the correct working procedure for the code and comments, if there are any, about how the current implementation differs from this ideal.
    2. Class level documentation – a few words about each classes’ responsibilities and the interactions with other classes. If there are more than 3 classes a diagram might be needed.
    3. Other things that will need to be documented: XML and other data exchange formats, client server interactions, security measures, relevant databases, etc.
  1. Code Documentation – Document the code itself:
    1. Documentation for classes and all public functions. Use your platform’s most popular documentation framework.
    1. Document algorithms and private functions as needed.
    2. Add a header to the file. A header should contain who is responsible for the code (usually the author), a date and a short description of the file’s content. Some places will also want the license added to the code.
  1. Uni-test the code – Write at least one unit test for every public function in the class. Try to hit the major code paths. This will make refactoring easier because you’ll have the safety net of knowing the tests pass, and having one unit-test makes the mental barrier of adding more way smaller.
  2. Make easy and safe refactorings::
    1. Magic values into global constants. Strings to external files for easy L10N
    2. Don’t Repeat Yourself (DRY) problems – if some code is copy pasted make it into a helper function / its own module / whatever. Sometimes this needs to be said… I know.
    1. Run Lint on your code and see what is easy to fix.
  1. Code Review – Review the code and document the major changes you’ll need to do to make it more robust. Create tasks, put them in the backlog and allocate time for them to be fixed along-side bug fixes and features. This might seem like cheating because you’re still deferring the problem, but knowing what needs to be done is half the battle. If you have it among your other tasks it is easy to schedule it and consider it part of development instead of some concept that doesn’t contribute to the push forward.

Is there a way to write software without getting in debt? Probably, but it might not be practical. Let’s not have perfect get in the way of good enough and ask is there a way to write software without getting into much debt? Of course. The best way is identifying those moments where you are deciding between a “quick and dirty” solution and a slower but better solution, and understanding that the “quick” in “quick and dirty” is only short-term and it might be slower in the long run because of the effects of technical debt.