tl;dr - exposed WebSockets are an API, and changing an API once it's even somewhat visible can have unintended consequences.

So imagine this - it's about two hours before our scheduled Big Apps final pitch in front of a panel of star judges, including the CTO of Google and the president of the NYC EDC, to name a few. I should be getting ready for my trek into Brooklyn when I'm informed by Lonique, Culture Island's front-end developer, that the server is down.

What? That doesn't make any sense. There had been no pushes since the night before, and, of course, I had tested everything as thoroughly as usual. After a frantic 15-30 minutes of debugging - which somehow simultaneously went by much too quickly and much too slowly - I narrowed the bug down to two lines of code - one on the server side, one on the client side, both related to WebSockets.

I had updated an existing client-side socket message to pass an extra parameter, and the corresponding server-side message handler to expect and act according to said parameter. This worked fine in testing because the browser was constantly being refreshed, meaning that the latest client code was always interacting with the latest server code.

The server crash was caused by Lonique having a stale, unrefreshed tab open on his mobile phone with the app loaded. Because that tab was still running the old, cached client code, its WebSocket sent the server a message sans the new parameter. The handler tried to use the missing value, a null pointer exception was raised and never handled, which, in the land of Node.js, spells death to the server.
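To make the failure concrete, here is a minimal sketch of the two kinds of handler. Everything in it is hypothetical (the handler names, the `options` parameter, the payload shape) and it is not the app's actual code; it only assumes that the new parameter was an object the server read a field from. In JavaScript, that kind of access on a missing value typically surfaces as a `TypeError`.

```javascript
// Hypothetical names and payload shape, for illustration only.

// Brittle handler: assumes the new `options` parameter is always present.
function handleUpdateBrittle(payload) {
  // Throws a TypeError when an old client omits `options`.
  return payload.options.mode.toUpperCase();
}

// Defensive handler: tolerates payloads from clients that predate `options`.
function handleUpdateSafe(payload) {
  const options = (payload && payload.options) || {};
  const mode = typeof options.mode === "string" ? options.mode : "default";
  return mode.toUpperCase();
}

const oldClientPayload = { id: 1 }; // what a stale, cached tab would send
const newClientPayload = { id: 1, options: { mode: "live" } };

let brittleError = null;
try {
  handleUpdateBrittle(oldClientPayload);
} catch (err) {
  // Caught here for the demo; thrown uncaught inside a socket handler,
  // an error like this would take the whole Node.js process down.
  brittleError = err;
}

console.log(brittleError instanceof TypeError); // true
console.log(handleUpdateSafe(oldClientPayload)); // DEFAULT
console.log(handleUpdateSafe(newClientPayload)); // LIVE
```

The defensive version treats the new parameter as optional and falls back to a default, so old and new clients can coexist for as long as stale tabs are out there.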

I learned two important lessons that day:

  1. WebSockets are a form of API - once in the wild, tread with caution when introducing structural updates
  2. Use Forever.js to keep your Node.js servers up and your blood pressure low
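One way to act on the first lesson is to treat socket messages like versioned API calls. The sketch below is only an illustration under assumptions of my own: the `version` field, the message shapes, and the `dispatch` function are all hypothetical, and messages without a `version` are assumed to come from the oldest client.

```javascript
// Hypothetical message shapes:
//   v1 clients send { type, text }
//   v2 clients send { type, version: 2, text, room }
const handlers = new Map([
  [1, (msg) => ({ text: msg.text, room: "lobby" })], // default for the missing field
  [2, (msg) => ({ text: msg.text, room: msg.room })],
]);

// Parses a raw message and routes it to the handler for its version.
// Any parse or handler error is turned into an error result instead of
// being allowed to escape and crash the process.
function dispatch(rawMessage) {
  try {
    const msg = JSON.parse(rawMessage);
    const version = msg.version === undefined ? 1 : msg.version;
    const handler = handlers.get(version);
    if (!handler) {
      return { ok: false, error: "unsupported version" };
    }
    return { ok: true, result: handler(msg) };
  } catch (err) {
    return { ok: false, error: "malformed message" };
  }
}

console.log(dispatch('{"type":"chat","text":"hi"}'));
console.log(dispatch('{"type":"chat","version":2,"text":"hi","room":"judges"}'));
console.log(dispatch('{"type":"chat","version":9,"text":"hi"}'));
console.log(dispatch("not json"));
```

With something like this in place, a structural change becomes a new version with its own handler, and the old handler stays around until no cached clients are left. A process supervisor is still worth having for the errors nobody anticipated.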