Pulling free and open weather data

When I decided to add realtime weather effects to @choochoobot, I knew there were a few qualities I wanted to find in a data source. Ideally I would find something free and reliable that didn’t require me to agree to many developer terms or sign up for an API token. Google shuttered its undocumented Weather API in 2012, and Yahoo’s offering, which has changed a few times over the years, now requires an account and a consumer key and secret.

It took some poking around, but I was eventually successful, and now @choochoobot should correctly show clouds, rain, snow, or thunderstorms, depending on the weather observed in New York at the moment it’s tweeting.

Currently observed weather conditions seem to me like something that should be provided by the government as open data. Fortunately, that intuition was correct: free realtime weather data is openly available, if you know where to look.

The National Weather Service, an agency within NOAA, provides weather observations from stations all over the US in RSS and parseable XML formats. And while parsing XML in Python isn’t exactly pleasant, it’s straightforward enough, and I am now able to get qualitative descriptions of the current weather that I can translate into emoji. Here’s the relevant code, and here’s how it works:

  • Each time @choochoobot tweets, it checks to see whether it’s daytime or nighttime. If it’s nighttime, I skip the weather check and instead calculate the phase and placement of the moon in the sky.
  • If it’s during the day, I download the XML file provided by the KNYC weather station in Central Park. You can load the same file in your browser at any time, but you’ll probably have to use view-source to see it in a reasonably human-readable format. KNYC is one of about 1,800 locations in US states and territories that the NWS provides information for.
  • @choochoobot then parses the XML file using Python’s built-in ElementTree XML API. The relevant field for my purposes is labeled "weather", which contains a text description of the observed conditions.
  • At least in theory, that phrase will always be one of the 250 or so pre-set descriptions provided by the NWS. These are sort of grouped into categories (there’s a pretty clear thunderstorm grouping, and one for hail), but the grouping seems a bit ad hoc. My use requires classifying the observed weather into just four or five buckets with matching emoji, so I made a big list of terms that I’d take to mean "cloudy," for example, and checked to see whether the observed weather phrase was on that list.
  • Then I pick emoji for the sky, and put the whole tweet together. If the weather is cloudy, I replace the sun emoji with a sun-behind-clouds emoji. Real scientific stuff.
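The daytime steps above can be sketched in Python using only the standard library (`urllib.request` and `xml.etree.ElementTree`). This is a minimal illustration under stated assumptions, not @choochoobot’s actual code. The helper names (`fetch_observation`, `weather_phrase`, `classify`), the bucket names, the short phrase lists, and the emoji choices are all hypothetical. Each phrase list holds only a handful of NWS-style phrases, not the full list of about 250. The sample XML is an abbreviated stand-in, and its root element name is assumed. The station’s feed URL is left as a parameter rather than spelled out.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical buckets, each with a few lowercased NWS-style phrases.
# The real NWS list has roughly 250 phrases; these are illustrative only.
BUCKETS = [
    ("thunderstorm", "\u26c8", {"thunderstorm", "thunderstorm rain", "thunderstorm in vicinity"}),
    ("snow", "\U0001F328", {"snow", "light snow", "heavy snow"}),
    ("rain", "\U0001F327", {"rain", "light rain", "heavy rain"}),
    ("cloudy", "\u26c5", {"mostly cloudy", "partly cloudy", "overcast"}),
]
SUNNY = ("sunny", "\u2600")  # anything unmatched keeps the plain sun emoji

# Abbreviated stand-in for a station's observation file (root name assumed).
SAMPLE_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<current_observation>
  <station_id>KNYC</station_id>
  <weather>Mostly Cloudy</weather>
</current_observation>"""


def fetch_observation(url):
    """Download the raw XML bytes for one station's current observation."""
    req = urllib.request.Request(url, headers={"User-Agent": "weather-sketch"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()


def weather_phrase(xml_bytes):
    """Return the text of the top-level <weather> element, or '' if absent."""
    root = ET.fromstring(xml_bytes)
    return (root.findtext("weather") or "").strip()


def classify(phrase):
    """Map a weather phrase to a (bucket, emoji) pair by exact list lookup."""
    key = phrase.strip().lower()
    for name, emoji, phrases in BUCKETS:
        if key in phrases:
            return name, emoji
    return SUNNY


if __name__ == "__main__":
    # Offline demo; with network access, pass fetch_observation(feed_url) instead.
    bucket, emoji = classify(weather_phrase(SAMPLE_XML))
    print(bucket, emoji)
```

Lowercasing both sides keeps the lookup tolerant of capitalization differences. Any phrase that matches no bucket keeps the plain sun, in line with the replace-the-sun approach described above.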
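For the nighttime branch, @choochoobot’s own moon calculation isn’t shown here. One common approximation counts the days since a reference new moon and takes the remainder modulo the mean synodic month of about 29.53 days. The sketch below follows that approach. The reference date (2000-01-06 18:14 UTC), the constant names, and the helpers `moon_age_days` and `moon_emoji` are assumptions of this sketch. It covers only the phase, not the moon’s placement in the sky. It is also approximate, because real lunations vary in length around the mean.

```python
from datetime import datetime, timezone

SYNODIC_MONTH = 29.530588853  # mean days from one new moon to the next
# Assumed reference new moon: 2000-01-06 18:14 UTC.
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)

# Moon-phase emoji U+1F311..U+1F318: new, waxing crescent, first quarter,
# waxing gibbous, full, waning gibbous, last quarter, waning crescent.
MOON_EMOJI = [chr(0x1F311 + i) for i in range(8)]


def moon_age_days(when):
    """Approximate days since the most recent new moon (mean motion only)."""
    elapsed = (when - REFERENCE_NEW_MOON).total_seconds() / 86400
    return elapsed % SYNODIC_MONTH  # Python's % keeps this non-negative


def moon_emoji(when):
    """Round the moon's age to the nearest of eight phases."""
    fraction = moon_age_days(when) / SYNODIC_MONTH
    return MOON_EMOJI[int(fraction * 8 + 0.5) % 8]


if __name__ == "__main__":
    print(moon_emoji(datetime.now(timezone.utc)))
```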

In case it’s useful, I’ve converted the NWS list of weather conditions to JSON and submitted it to Darius Kazemi’s corpora project. Once that gets merged in, those weather conditions will all be available that way.