Pulling free and open weather data

When I decided to add realtime weather effects to @choochoobot, I knew there were a few qualities I wanted to find in a data source. Ideally I would find something free and reliable that didn’t require me to agree to many developer terms or sign up for an API token. Google shuttered its undocumented Weather API in 2012, and Yahoo’s offering, which has changed a few times over the years, now requires an account and a consumer key and secret.

It took some poking around but I was eventually successful, and now @choochoobot should correctly show clouds, rain, snow, or thunderstorms, depending on the weather observed in New York at the moment it’s tweeting.

Current observed weather conditions seem to me like something that should be provided by the government as open data. Fortunately, that intuition was correct: free realtime weather data is openly available, if you know where to look.

The National Weather Service, an agency within NOAA, provides weather observations from stations all over the US in RSS and parseable XML formats. And while parsing XML in Python isn’t exactly pleasant, it’s straightforward enough and I am now able to get qualitative descriptions of the current weather that I can translate into emoji. Here’s the relevant code, and here’s how it works:

  • Each time @choochoobot tweets, it checks to see whether it’s daytime or nighttime. If it’s nighttime, I skip the weather check and instead calculate the phase and placement of the moon in the sky.
  • If it’s during the day, I download the XML file provided by the KNYC weather station in Central Park. You can load the same file in your browser at any time, but you’ll probably have to view-source to see it in a reasonable human-readable format. That’s one of about 1,800 locations in US states and territories that the NWS provides information for.
  • @choochoobot then parses the XML file using Python’s built-in ElementTree XML API. The relevant field for my purposes is labeled “weather”, which contains a text description of the observed conditions.
  • At least in theory, that phrase will always be one of the 250 or so pre-set descriptions provided by the NWS. These are sort of grouped into categories—there’s a pretty clear thunderstorm grouping, and one for hail—but it seems a bit ad hoc. My use requires classifying the observed weather into just four or five buckets with matching emoji; I just made a big list of terms that I’d take to mean “cloudy,” for example, and checked to see whether the observed weather phrase was on that list. (There’s a rough sketch of that lookup after this list.)
  • Then I pick emoji for the sky, and put the whole tweet together. If the weather is cloudy, I replace the sun emoji with a sun-behind-clouds emoji. Real scientific stuff.
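To give a sense of how the fetch–parse–classify step fits together, here’s a minimal sketch. It isn’t @choochoobot’s actual code: the station URL and the keyword lists are stand-ins I’m assuming for illustration, and the real term list is far longer.

```python
# Minimal sketch of the weather lookup; URL pattern and keyword lists are assumptions.
from urllib.request import urlopen
import xml.etree.ElementTree as ET

OBS_URL = "https://w1.weather.gov/xml/current_obs/KNYC.xml"  # Central Park station

# A few keywords per bucket; the real classification list is much longer.
BUCKETS = [
    ("⛈", ["thunderstorm"]),
    ("🌨", ["snow", "sleet", "ice"]),
    ("🌧", ["rain", "drizzle", "showers"]),
    ("☁", ["cloudy", "overcast", "fog"]),
]

def sky_emoji(default="☀"):
    """Return an emoji matching the current observed conditions at KNYC."""
    with urlopen(OBS_URL) as response:
        root = ET.fromstring(response.read())
    # The <weather> element holds the qualitative description, e.g. "Light Rain".
    weather = (root.findtext("weather") or "").lower()
    for emoji, keywords in BUCKETS:
        if any(word in weather for word in keywords):
            return emoji
    return default
```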

In case it’s useful, I’ve converted the NWS list of weather conditions to JSON and submitted it to Darius Kazemi’s corpora project. Once that gets merged in, those weather conditions will all be available that way.

“A Mind At Play” and Claude Shannon’s grave

Claude Shannon, the “father of information theory,” made contributions to the development of modern digital technology that are hard to overstate. That work puts him in the ranks of the Einsteins, Turings, and Feynmans of the world; somehow, though, he never seemed to get the credit and public recognition that those scientists received.

A great new biography of Shannon, “A Mind At Play,” tells the story of his work and may provide an explanation. In contrast to the stereotype of a single-minded scholar who doggedly pursues a particular theory for an entire lifetime, Shannon seemed to hop around to whatever captured his imagination—and in more than a few cases, advanced the state of the art or provided entirely new frameworks with which to consider its thorniest problems. (I was fortunate enough to be sent a copy of this book by its author because I had previously posted about a long New Yorker profile of Shannon commemorating the centennial of his birth.)

That professional pathway meant he probably never had a dull moment, but also that he wasn’t around to lead the fields he revolutionized. Beyond that, it makes it difficult to distill his work down to an equation, or even a few sentences. If you had to, though, you could do worse than his formulation of information as entropy. One fascinating detail from the end of “A Mind At Play” is that the entropy equation appears on his gravestone at Mount Auburn Cemetery in Cambridge, where he’s buried.
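For reference, that’s the entropy of a source that emits symbols with probabilities p_i:

    H = -\sum_i p_i \log_2 p_i

the average amount of information, measured in bits, conveyed per symbol.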

What’s concealed, however, is a message on the reverse: covered by a bush, the open section of the marble on the back of the tombstone holds Shannon’s entropy formula. Shannon’s children had hoped the formula would grace the front of the stone; their mother thought it more modest to engrave it on the back.

And so Claude Shannon’s resting place is marked by a kind of code: a message hidden from view, invisible except to those looking for it.

This struck me as amazing, and I wanted to see how it was rendered. But, perhaps unsurprisingly, it is difficult to find a picture of the back of a tombstone—even one marking the grave of a person as notable as Claude Shannon. So I turned to Twitter, posting that passage and asking if anybody in or around Cambridge was available to take a picture for me.

Fortunately, it seems many people were as intrigued by that detail as I was, and the tweet was pretty widely circulated. To my surprise and delight, a few groups of people reached out to tell me they were available to go, and a few more simply set out on their own. I had inadvertently prompted a small flashmob at the grave of a scientist who had died some 16 years earlier.

There it is! pic.twitter.com/wjz7PqxVeN

— Space User 583 (@User583) August 12, 2017

The equation is in fact pretty well hidden, but a few folks were able to duck into the bush behind the grave and grab a shot. I love the way it looks, and I was very excited that this new book prompted me and a few nerds around the world to share a moment of appreciation for the great Claude Shannon over a network that his work made possible.

Which states hated Wesley?

One of my goals while at Recurse Center has been to improve my ability to manipulate and visualize data sets. To that end, I’ve been toying around with the Social Security Administration’s baby name dataset, which records the number of babies born with each given name every year, both nationally and at the state level. Because I’ve also been watching Star Trek: The Next Generation along with the Treks And The City podcast, I chose to dig into information about the name “Wesley.”

On my first pass through the data I noticed that the name’s popularity dramatically spiked around 1976, and then tapered off for a few decades after. Honestly, that spike is the most interesting property of the whole graph, and I can’t explain it very well. But a funny secondary effect is that neither TNG‘s premiere nor the release of The Princess Bride—both in 1987—could prop up the name as it declined in popularity. The effect makes it look like it’s tumbling off a cliff, instead of regressing to the mean. This graph, including the label, was generated in Python’s matplotlib.
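For anyone who wants to reproduce something like that plot, here’s a rough sketch of the approach. It assumes the SSA national dataset’s layout of one yobYYYY.txt file per year with name,sex,count rows; the file paths and annotation placement are placeholders.

```python
# Rough sketch of the national-level "Wesley" plot; paths and layout are assumptions.
import glob
import pandas as pd
import matplotlib.pyplot as plt

frames = []
for path in sorted(glob.glob("names/yob*.txt")):
    year = int(path[-8:-4])  # e.g. "yob1976.txt" -> 1976
    df = pd.read_csv(path, names=["name", "sex", "count"])
    df["year"] = year
    frames.append(df)

names = pd.concat(frames)
wesley = names[names["name"] == "Wesley"].groupby("year")["count"].sum()

ax = wesley.plot()
ax.set_ylabel("Babies named Wesley")
ax.axvline(1987, linestyle="--", color="gray")
ax.annotate("TNG premiere / The Princess Bride", xy=(1987, wesley.max() * 0.9))
plt.show()
```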

After looking at the federal data, I decided to dig into the state-level stuff, to give me a (long-anticipated!) opportunity to generate a choropleth map. Again, I cleaned up the data in Python, and then generated a map using a JavaScript library called d3-geomap (https://d3-geomap.github.io/). For a long time I’ve wanted to get more familiar with its parent library, d3, and this has been a nice opportunity to dip my toe into that.
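The data-prep half of that is straightforward. Here’s a sketch of how the state-level totals might be aggregated into a CSV for the map, assuming the SSA state dataset’s layout of one file per state with state,sex,year,name,count rows; the output column names are placeholders to be matched to whatever the choropleth is configured to join on.

```python
# Sketch of the state-level data prep; file layout and column names are assumptions.
import glob
import pandas as pd

frames = [pd.read_csv(path, names=["state", "sex", "year", "name", "count"])
          for path in glob.glob("namesbystate/*.TXT")]
states = pd.concat(frames)

wesley_by_state = (states[states["name"] == "Wesley"]
                   .groupby("state")["count"].sum()
                   .reset_index())

wesley_by_state.to_csv("wesley_by_state.csv", index=False)
```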

New bot: @78_sampler, serving up old records

The Internet Archive hosts an incredible collection of over 25,000 professionally digitized 78rpm records. The great thing about a catalog that large is that, if you know what you want, you’re likely to find it. On the other hand, if you just want to browse, it can be overwhelming and even intimidating. Each item could possibly be a delight, but it’s difficult to even think about individual records in the face of such a huge archive.

In that sense, would-be browsers face similar challenges with the Great 78 Project as they do with the Pomological Watercolor Collection—an archive I’ve worked with a lot. Sensing that similarity, I decided to build a tool like @pomological to help surface individual records.

@78_sampler tweets every two hours with a randomly selected record from the Archive’s collection. It was important to me that the audio fit smoothly and natively into a Twitter timeline, so I decided to render each tune into a video file using the Archive’s still image of the record as the visual. Twitter limits videos to 2:20—exactly 140 seconds, cute—which is shorter than most 78 tunes, so while rendering the video I truncate the clip at that point with a short audio fade at the end.

The code to do all this is a short Python script which I’ve posted online. It relies on ffmpeg to do the video encoding. Crafting ffmpeg commands is famously convoluted, and it’s a little frustrating to format those commands to be called from Python. Maybe that’s something I’ll do differently in the future but, for now, this works and I can dip my cup into the deep Archive well with a little more ease than before.
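For the curious, here’s a minimal sketch of that render step. It isn’t the actual @78_sampler script: the file names are placeholders, and the fade timing assumes the 140-second cap mentioned above.

```python
# Minimal sketch of rendering a still image plus audio into a Twitter-ready video.
import subprocess

def render_clip(image_path, audio_path, out_path, max_seconds=140, fade_seconds=5):
    """Loop a still image over the audio, cut at max_seconds with a short fade-out."""
    fade_start = max_seconds - fade_seconds
    subprocess.run([
        "ffmpeg",
        "-loop", "1", "-i", image_path,   # still image of the record as the video track
        "-i", audio_path,                 # digitized 78 audio
        "-t", str(max_seconds),           # truncate to Twitter's limit
        "-af", f"afade=t=out:st={fade_start}:d={fade_seconds}",
        "-c:v", "libx264", "-tune", "stillimage",
        "-pix_fmt", "yuv420p",            # broad player compatibility
        "-c:a", "aac",
        out_path,
    ], check=True)

render_clip("record.jpg", "record.mp3", "record.mp4")
```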

Hosting change

Just a quick meta note: I’ve moved this site to a new hosting situation, but there shouldn’t be any disruption to its availability. I’ll probably also be looking into different CMS options while I’m here at Recurse Center.