“A Mind At Play” and Claude Shannon’s grave

As the “father of information theory,” Claude Shannon made contributions to the development of modern digital technology that are hard to overstate. That work puts him in the ranks of the Einsteins, Turings, and Feynmans of the world; somehow, though, he never seemed to get the credit and public recognition those scientists received.

A great new biography of Shannon, “A Mind At Play,” tells the story of his work and may provide an explanation. In contrast to the stereotype of a single-minded scholar who doggedly pursues a particular theory for an entire lifetime, Shannon seemed to hop around to whatever captured his imagination, and in more than a few cases he advanced a field’s state of the art or provided an entirely new framework for its thorniest problems. (I was fortunate enough to be sent a copy of this book by its author because I had previously posted about a long New Yorker profile of Shannon commemorating the centennial of his birth.)

That professional pathway meant he probably never had a dull moment, but also that he wasn’t around to lead the fields he revolutionized. Beyond that, it makes it difficult to distill his work down to an equation, or even a few sentences. If you had to, though, you could do worse than his formulation of information as entropy. One fascinating detail from the end of “A Mind At Play” is that that equation appears on his gravestone at the Mount Auburn Cemetery in Cambridge, where he’s buried.

What’s concealed, however, is a message on the reverse: covered by a bush, the open section of the marble on the back of the tombstone holds Shannon’s entropy formula. Shannon’s children had hoped the formula would grace the front of the stone; their mother thought it more modest to engrave it on the back.

And so Claude Shannon’s resting place is marked by a kind of code: a message hidden from view, invisible except to those looking for it.

This struck me as amazing, and I wanted to see how it was rendered. But, perhaps unsurprisingly, it is difficult to find a picture of the back of a tombstone—even one marking the grave of a person as notable as Claude Shannon. So I turned to Twitter, posting that passage and asking if anybody in or around Cambridge was available to take a picture for me.

Fortunately, it seems many people were as intrigued by that detail as I was, and the tweet was pretty widely circulated. To my surprise and delight, a few groups of people reached out to tell me they were available to go, and a few more simply set out. I had inadvertently prompted a small flashmob at the grave of a scientist who had died some 16 years earlier.

The equation is in fact pretty well hidden, but a few folks were able to duck into the bush behind the grave and grab a shot. I love the way it looks, and I was very excited that this new book prompted me and a few nerds around the world to share a moment of appreciation for the great Claude Shannon over a network that his work made possible.

Which states hated Wesley?

One of my goals while at Recurse Center has been to improve my ability to manipulate and visualize data sets. To that end, I’ve been toying around with the Social Security Administration’s baby name dataset, which records the number of babies born with each given name every year, both federally and at the state level. Because I’ve also been watching Star Trek: The Next Generation along with the Treks And The City podcast, I chose to dig into information about the name “Wesley.”

On my first pass through the data I noticed that the name’s popularity spiked dramatically around 1976, then tapered off over the following decades. Honestly, that spike is the most interesting property of the whole graph, and I can’t explain it very well. But a funny secondary effect is that neither TNG’s premiere nor the release of The Princess Bride (both in 1987) could prop up the name as it declined in popularity. The effect makes it look like the name is tumbling off a cliff instead of regressing to the mean. This graph, including the label, was generated with Python’s matplotlib.
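The counting step behind a graph like that can be sketched as follows. The `yobYYYY.txt` file names and the `name,sex,count` line format match the SSA national dataset’s actual layout, but the function names and structure here are my own illustration, not the script that produced the graph.

```python
# Sketch: tallying births for one name per year from the SSA national
# dataset (a directory of yob1880.txt ... yobYYYY.txt files).
import csv
from pathlib import Path

def count_name(year_file, name="Wesley"):
    """Sum births for a given name (both sexes) in one SSA year file."""
    total = 0
    with open(year_file, newline="") as f:
        for row_name, _sex, count in csv.reader(f):
            if row_name == name:
                total += int(count)
    return total

def name_series(data_dir, name="Wesley"):
    """Return {year: births} across every yobYYYY.txt file in data_dir."""
    series = {}
    for path in sorted(Path(data_dir).glob("yob*.txt")):
        year = int(path.stem[3:])  # "yob1976" -> 1976
        series[year] = count_name(path, name)
    return series
```

From there, plotting the series is a one-liner in matplotlib, e.g. `plt.plot(list(series), list(series.values()))`.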

After looking at the federal data, I decided to dig into the state-level numbers, which gave me a (long-anticipated!) opportunity to generate a choropleth map. Again, I cleaned up the data in Python, and then generated a map using a JavaScript library called d3-geomap. For a long time I’ve wanted to get more familiar with its parent library, d3, and this has been a nice opportunity to dip my toe into that.
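The Python cleanup step might look something like this. The `state,sex,year,name,count` column order matches the SSA state files (e.g. `STATE.TX.TXT`), but the function names and the output column headers are my own choices, not taken from the actual script.

```python
# Sketch: collapsing SSA state-file rows into one per-state total, then
# writing a small CSV a choropleth library can ingest.
import csv
from collections import defaultdict

def state_totals(rows, name="Wesley"):
    """Sum births per state for one name. Rows follow the SSA state-file
    column order: state, sex, year, name, count."""
    totals = defaultdict(int)
    for state, _sex, _year, row_name, count in rows:
        if row_name == name:
            totals[state] += int(count)
    return dict(totals)

def write_choropleth_csv(totals, out_path):
    """Write state,value rows; the map library joins on the state column."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["state", "value"])
        for state in sorted(totals):
            writer.writerow([state, totals[state]])
```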


New bot: @78_sampler, serving up old records

The Internet Archive hosts an incredible collection of over 25,000 professionally digitized 78rpm records. The great thing about a catalog that large is that, if you know what you want, you’re likely to find it. On the other hand, if you just want to browse, it can be overwhelming and even intimidating. Each item could be a delight, but it’s difficult to even think about individual records in the face of such a huge archive.

In that sense, would-be browsers face similar challenges with the Great 78 Project as they do with the Pomological Watercolor Collection—an archive I’ve worked with a lot. Sensing that similarity, I decided to build a tool like @pomological to help surface individual records.

@78_sampler tweets every two hours with a randomly selected record from the Archive’s collection. It was important to me that the audio fit smoothly and natively into a Twitter timeline, so I decided to render each tune into a video file using the Archive’s still image of the record as the visual. Twitter limits videos to 2:20—exactly 140 seconds, cute—which is shorter than most 78 tunes, so while rendering the video I truncate the clip at that point with a short audio fade at the end.

The code to do all this is a short Python script, which I’ve posted online. It relies on ffmpeg to do the video encoding. Crafting ffmpeg commands is famously convoluted, and it’s a little frustrating to format those commands to be called from Python. Maybe that’s something I’ll do differently in the future, but for now this works, and I can dip my cup into the deep Archive well with a little more ease than before.
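The ffmpeg call described above can be sketched roughly like this. The flag set is a plausible reconstruction of “still image plus audio, truncated at 140 seconds with a fade,” not the posted script’s exact command; the function names and the `-pix_fmt`/`-tune` choices are my assumptions.

```python
# Sketch: pairing a record's still image with its audio, capped at
# Twitter's 140-second video limit, with a 2-second fade-out.
import subprocess

TWITTER_LIMIT = 140  # seconds (the 2:20 cap)

def build_ffmpeg_cmd(image_path, audio_path, out_path, limit=TWITTER_LIMIT):
    """Assemble the ffmpeg argument list; a list avoids shell quoting pain."""
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", image_path,            # loop the record still as the video track
        "-i", audio_path,                          # the digitized 78rpm audio
        "-t", str(limit),                          # truncate at the Twitter cap
        "-af", f"afade=t=out:st={limit - 2}:d=2",  # short audio fade at the end
        "-c:v", "libx264", "-tune", "stillimage",
        "-pix_fmt", "yuv420p",                     # broad player compatibility
        "-c:a", "aac",
        out_path,
    ]

def render(image_path, audio_path, out_path):
    subprocess.run(build_ffmpeg_cmd(image_path, audio_path, out_path), check=True)
```

Building the command as a list, rather than one big format string, is one way to take some of the sting out of calling ffmpeg from Python.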

Hosting change

Just a quick meta note: I’ve moved this site to a new hosting situation, but there shouldn’t be any disruption to its availability. I’ll probably also be looking into different CMS options while I’m here at Recurse Center.

He did the monster mosh: automated datamoshing with tweaked GOP lengths

After I posted yesterday about my automated datamoshing experiment, I got a nice message on Twitter from developer and botmaker Ryan Bauman who was able to run my script on his own videos. We talked for a bit about how it could be improved, and I had a major realization that I had to test immediately.

In yesterday’s post I mentioned that the relative prevalence of P-Frames precluded my preferred effect of stills from one video and movement from another. Too much image data was being drawn by the P-Frames for it to really get unrecognizable. I didn’t and still don’t know how to reduce the number of P-Frames per se, but the big realization was that I could change the ratio of I-Frames to P-Frames by ratcheting the “Group Of Pictures” size way down. That variable determines how often a video includes an I-Frame, and I could set it easily in ffmpeg while re-encoding the source videos.

A normal default GOP size might be 250 frames, which is to say that no more than 10 seconds of video go by without a full I-Frame render. Poking around, I tried changing that number to a few different values to see what worked best. Experimentally, a GOP size maximum of 48 frames (once every two seconds) seems to do the trick for mixing two video sources. An excerpt of that video looks like this:
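The re-encoding step can be sketched as below. ffmpeg’s `-g` option really does cap the GOP size for libx264; pinning `-keyint_min` to the same value (so I-Frames land on a regular beat rather than only at scene cuts) is my own addition, and the function names are illustrative.

```python
# Sketch: re-encoding a source video with a hard GOP cap, so I-Frames
# occur at least every `gop` frames -- more I-Frames, more bytes to swap.
import subprocess

def gop_reencode_cmd(src, dst, gop=48):
    """ffmpeg arguments to re-encode src with at most `gop` frames per GOP."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libx264",
        "-g", str(gop),            # maximum GOP size: force an I-Frame this often
        "-keyint_min", str(gop),   # keep the interval regular, not scene-cut driven
        "-c:a", "copy",            # pass the audio through untouched
        dst,
    ]

def reencode(src, dst, gop=48):
    subprocess.run(gop_reencode_cmd(src, dst, gop), check=True)
```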

What are the constraints on the GOP size? Here, as in many other moments of this project, my considerations are very different from most people’s. In most cases, you want I-Frames to occur frequently enough that cutting and seeking through the video can be relatively precise, but rarely enough that the file size stays low. Since every I-Frame has to contain a full frame of image data, they can be large.

Because I’m swapping I-Frames at the byte level, and truncating or padding each frame to fit, every I-Frame insertion is a potential source of trouble. You can see in the video above that my encoder struggles in certain places. And given that I’m doing substitution, the closer my GOP size gets to 1, the more my mosh-up starts to look like a slideshow.

For single source videos, where I-Frames are likely to be more similar to each other, I was able to use an even shorter GOP length. Here’s a datamoshed version of the Countdown video with a GOP length of 25 frames.

So, in conclusion: messing with GOP sizes means a different number of I-Frames which means more opportunities for hijinks in mangling the videos.