He did the monster mosh: automated datamoshing with tweaked GOP lengths

After I posted yesterday about my automated datamoshing experiment, I got a nice message on Twitter from developer and botmaker Ryan Bauman who was able to run my script on his own videos. We talked for a bit about how it could be improved, and I had a major realization that I had to test immediately.

In yesterday’s post I mentioned that the relative prevalence of P-Frames precluded my preferred effect of stills from one video and movement from another. Too much image data was being drawn by the P-Frames for it to really get unrecognizable. I didn’t and still don’t know how to reduce the number of P-Frames per se, but the big realization was that I could change the ratio of I-Frames to P-Frames by ratcheting the “Group Of Pictures” size way down. That variable determines how often a video includes an I-Frame, and I could set it easily in ffmpeg while re-encoding the source videos.
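
For reference, a re-encode along these lines pins the GOP size. This is a sketch rather than the exact command from my script, and the libx264 flags shown are just one reasonable way to do it: -g caps the GOP length, while -keyint_min and -sc_threshold 0 keep the encoder from slipping in extra keyframes at scene cuts, so the I-Frames land at predictable intervals.

```python
import subprocess

def reencode_with_gop(src, dst, gop=48):
    """Re-encode a video so an I-Frame appears at least every `gop` frames.

    -g caps the GOP length; -keyint_min and -sc_threshold 0 keep libx264
    from inserting extra keyframes on scene changes, so I-Frames land at
    predictable intervals in both source videos.
    """
    subprocess.run([
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libx264",
        "-g", str(gop),
        "-keyint_min", str(gop),
        "-sc_threshold", "0",
        "-c:a", "copy",
        dst,
    ], check=True)

# Example filenames, not the real sources:
reencode_with_gop("source-a.mp4", "source-a-gop48.mp4")
reencode_with_gop("source-b.mp4", "source-b-gop48.mp4")
```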

A normal default GOP size might be 250 frames — which is to say, no more than 10 seconds of video go by without a full I-Frame render. Poking around, I tried changing that number to a few different values to see what worked best. Experimentally, a GOP size maximum of 48 frames (once every two seconds) seems to do the trick for two video sources. An excerpt of that video looks like this:

What are the constraints on the GOP size? Here, as in many other moments of this project, my considerations are very different from most people’s. In most cases, you want I-Frames to happen frequently enough that cutting and seeking through the video can be relatively precise, but rare enough that the filesize can stay low. Since every I-Frame has to contain a full frame of image data, they can be large.

Because I’m swapping I-Frames at the byte level, and truncating or padding each frame to fit, every I-Frame insertion is a potential source of trouble. You can see in the video above that my encoder struggles in certain places. And given that I’m doing substitution, the closer my GOP size gets to 1, the more my mosh-up starts to just look like a slideshow.

For single source videos, where I-Frames are likely to be more similar to each other, I was able to use an even shorter GOP length. Here’s a datamoshed version of the Countdown video with a GOP length of 25 frames.

So, in conclusion: messing with GOP sizes changes the number of I-Frames, which means more opportunities for hijinks in mangling the videos.

Making mosh-ups: automated datamoshing from multiple video sources

Datamoshing is a glitch art technique applied to videos to intentionally create “pixel bleeding” and other digital motion artifacts. It became popular several years ago when it was used in near-simultaneous music videos by Chairlift and Kanye West. In those cases, and in the tutorials and techniques documented since then, the glitches are typically introduced to a single edited video, and done manually in a visual editing program.

My goal for this project was to use two separate video sources — to make a “mosh-up,” har har — and to completely automate the merger. The holy grail would be to use all the motion from one video over all the stills of another, to make sort of an animated Magic Eye effect, but without the eye focus requirement. (Side note: it’s possible and awesome to actually create animated Magic Eyes, but that’s beside the point.)

As you’ll see, I fell a little short of that stretch goal, but still managed to make something that looks pretty cool.

Moshed-up vids

The script I wrote can create two kinds of datamoshes. The first takes a single video as a source and rearranges some key frames to glitch it out. Here’s an example of one such video, glitching up Beyoncé’s incredible Countdown video with itself. I’ve muted the audio in this upload, but as output from the script it still sounds pretty good.

The second (and more exciting) kind of datamosh takes a certain kind of key frame from one video and replaces the same kind of frame in another video. All of the motion and some of the “re-drawings” of subsequent frames are pulled out of context, creating an effect that is a little surreal and unsettling. It’s not as precise as what the pros do with their manual edits, but it can automatically combine two sources in a way I’ve never seen before. (Here, I used Countdown again, and moshed it together with Formation.)

Again, I’ve muted the audio, but in this case it would normally play back Partition without any noticeable flaws.

How it works

This moshing technique relies on some facts about how the H.264 spec compresses and stores its data. It really is a remarkable standard, and if you’re not familiar, it’s absolutely worth reading the tribute that is “H.264 is Magic”. The gist is that only a very small portion of frames, dubbed I-Frames, contain all the image data necessary to draw a full screen. The other kinds of frames, P- and B-Frames, have partial screens and “motion” data.
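
If you want to see the mix of frame types in a file for yourself, ffprobe will report the picture type of every frame. This helper isn’t part of the moshing script, just a quick way to inspect a video:

```python
import subprocess

def frame_types(path):
    """Return the sequence of picture types ("I", "P", "B") that ffprobe
    reports for the file's first video stream."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "frame=pict_type", "-of", "csv=p=0", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip().strip(",") for line in out.splitlines() if line.strip()]

types = frame_types("input.mp4")  # example filename
print(types.count("I"), "I-Frames out of", len(types), "total frames")
```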

Other popular datamoshing techniques include removing I-Frames altogether or duplicating P-Frames so the same motion is re-applied to an image. In this example, instead, I’m replacing I-Frames with other I-Frames, and I’m doing it simply by copying and pasting the bytes from one into the place of another, truncating or padding out the data so it fits exactly.
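
In code, the truncate-or-pad step boils down to something like this. It’s a simplified illustration rather than the script itself: it assumes the byte offset and length of each I-Frame have already been located, and the function names are made up for the example.

```python
def fit_payload(replacement, target_len, pad_byte=b"\x00"):
    """Truncate or zero-pad a replacement I-Frame payload so it occupies
    exactly as many bytes as the frame it displaces."""
    if len(replacement) >= target_len:
        return replacement[:target_len]
    return replacement + pad_byte * (target_len - len(replacement))

def swap_iframe(video, offset, length, replacement):
    """Overwrite the I-Frame stored at video[offset:offset + length] in place.

    `video` is a bytearray of the target file; `replacement` is the raw
    bytes of an I-Frame taken from the other source.
    """
    video[offset:offset + length] = fit_payload(replacement, length)
```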

The reason I can’t get only the motion from one video and only the “textures” from another is that the P-Frames blend those two types of data together into a single frame. If I could figure out some way to isolate the image data in the frame, or to re-encode a video to consist entirely of I- and B-Frames, I could probably get a wilder effect.

As it stands, the output of this script is a video that plays, but is pretty badly mangled. If you try to play it back in a client that shows you errors, you’ll see a lot of complaints. For compatibility’s sake, I’ve manually transcoded these videos into another format and back.

LinkArchiver, a new bot to back up tweeted links

Twitter users who want to ensure that the Wayback Machine has stored a copy of the pages they link to can now sign up with @LinkArchiver to make it happen automatically. @LinkArchiver is the first project I’ve worked on in my 12-week stay at Recurse Center, where I’m learning to be a better programmer.

The idea for @LinkArchiver was suggested by my friend Jacob. I liked it because it was useful, relatively simple, and combined things I knew (Python wrappers for the Twitter API) with things I didn’t (event-based programming, making a process run constantly in the background, and more). I did not expect it to get as enthusiastic a reaction as it has, but that’s also nice.

The entire bot is one short Python script that uses the Twython library to listen to the Twitter User stream API. This is the first of my Twitter bots that is at all “interactive”—every previous bot used the REST APIs to post, but could not engage with things in its timeline or with tweets directed at it.

That change meant I had to use a slightly different architecture than I’ve used before. Each of my previous bots was a small, self-contained script that produced a tweet or two each time it ran. That design means I can trigger them with a cron job that runs at regular intervals. By contrast, @LinkArchiver runs all the time, listening to its timeline and acting when it needs to. It doesn’t have much interactive behavior—when you tweet at it directly, it can reply with a Wayback link, but that’s it—but learning this kind of structure will let me build much more interactive bots in the future.
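
The always-on part is built around a streamer class. The sketch below is a rough approximation, not @LinkArchiver’s actual source: it assumes Twython’s TwythonStreamer interface and its user-stream call, and it submits pages to the Wayback Machine by fetching web.archive.org/save/ plus the URL.

```python
import requests
from twython import TwythonStreamer

WAYBACK_SAVE = "https://web.archive.org/save/"

# Placeholder credentials; the real ones come from the app's configuration.
APP_KEY, APP_SECRET = "app-key", "app-secret"
OAUTH_TOKEN, OAUTH_TOKEN_SECRET = "oauth-token", "oauth-token-secret"

class LinkArchiverStreamer(TwythonStreamer):
    def on_success(self, data):
        # Items from the stream arrive as dicts; tweets list any links
        # they contain under entities -> urls.
        for url_info in data.get("entities", {}).get("urls", []):
            link = url_info.get("expanded_url")
            if link:
                # Asking the Wayback Machine to save a page is a simple GET.
                requests.get(WAYBACK_SAVE + link)

    def on_error(self, status_code, data, *args):
        self.disconnect()

stream = LinkArchiverStreamer(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
stream.user()  # listen to the authenticated account's timeline
```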

It also required that I figure out how to “daemonize” the script, so that it could run in the background when I wasn’t connected and restart in case it crashed (or when I restart the computer). I found this aspect surprisingly difficult; it seems like a really basic need, but the documentation for how to do this was not especially easy to find. I host my bots on a Digital Ocean box running Ubuntu, so this script is running as a systemd service. The Digital Ocean documentation and this Reddit tutorial were both very helpful in figuring it out.
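
For anyone trying to do the same, the unit file ends up being short. This is a generic example rather than the one I deployed; the description, paths, and user are placeholders.

```ini
[Unit]
Description=LinkArchiver Twitter bot
After=network.target

[Service]
# Placeholder paths; point these at wherever the script and its virtualenv live.
WorkingDirectory=/home/bots/linkarchiver
ExecStart=/home/bots/linkarchiver/venv/bin/python /home/bots/linkarchiver/bot.py
Restart=always
RestartSec=10
User=bots

[Install]
WantedBy=multi-user.target
```

Saved as something like /etc/systemd/system/linkarchiver.service, it can be started with systemctl start linkarchiver and set to launch at boot with systemctl enable linkarchiver.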

Since launching the bot, I’ve gotten in touch with the folks at the Wayback Machine, and at their request added a custom user-agent. I was worried that the bot would get on their nerves, but they seem to really appreciate it—what a relief. After its first four days online, it’s tracking some 3,400 users and has sent about 25,000 links to the Internet Archive.

Building Mastodon to be frozen

As the federated social network Mastodon has surged in popularity over the last month, more than a thousand instances — ranging from a single user to tens of thousands — have been started by the community.

That’s a really great development in terms of decentralization and distribution, which bring a lot of benefits, but it also makes it a near certainty that a currently popular instance will go away. It could happen abruptly, if a sysadmin accidentally drops a database, or gradually, if it becomes too expensive or time-consuming to run, but it will happen.

Mastodon developers can make some choices now that could help preserve those communities — if only in a “frozen” form — after they are no longer active. And if done right, it could open up new possibilities for persistent presentation of ephemeral communities.

Specifically, Mastodon can develop a more robust option to export an entire instance in a format that can be served statically. The Mastodon instance would be frozen, in the sense that nobody could sign up or add new content to it, but its links could be preserved and the interactions could be saved. Serving a static version of the site in a dedicated viewer could be done cheaply, and organizations like the Internet Archive would likely step up to host significant defunct communities.

(Twitter sort of has an option like this on the individual level: users can export their own archive, and get a zip file that looks like Twitter but is all local.)

The historical benefits of that kind of feature are obvious to anybody who’s gone through old forums or mailing list posts. But if it were built out as a feature, I think more communities would find new creative ways of using the software. One that immediately comes to mind: Conferences could throw up an instance and create accounts for all the attendees. Once that instance was “frozen,” it would be a record of the backchannel like we haven’t really had before. Or in cases where they’ve gotten clear consent, researchers could parse the data to learn about the different ways in which individual communities communicate.

Obviously not every instance would want to get the preservation treatment, and instance admins would likely want to make clear what their long term plans are. And of course, this feature would have to be designed very carefully to respect the privacy preferences of people who participate. But for many networks, the present moment gets all the focus while the real value lies in each of those presents that have now become the past. Most social networks don’t stop to consider that fact. Mastodon, with its community focus, could.

There aren’t that many years of Web (or even Internet) history, but already those years haven’t been kind to online communities. Archiveteam heroics only go so far — designing for the long-term preservation of our spaces should be a priority.

Online communities under threat in new copyright decision

A Ninth Circuit copyright decision in Mavrix v. LiveJournal could bring nasty implications for online communities, threatening the copyright “safe harbor” provisions that allow those communities to form.

Specifically, the Ninth Circuit has said that volunteer moderators of online communities may be considered “agents” of the platform they’re on, and that if those moderators learn about copyright infringements (or “red flags” that suggest infringements), it’s like the platform itself learning about them. That’s really important, because platforms can only claim the “safe harbor” provided by the Digital Millennium Copyright Act (DMCA) if they do not have that kind of knowledge.

Being in that safe harbor is generally considered a prerequisite to operating a large platform for user-generated content. So the concern goes: if platforms can’t allow volunteer moderators to curate communities without incurring massive copyright liability, they may decide to disable community moderation altogether.

Two major caveats here. The first is that the Ninth Circuit didn’t say these moderators are “agents” of the platform. It just said that the lower court was too hasty in saying they were not, and that a trial was necessary to decide. That’s still bad news, though. The companies that run major platforms generally will go to great lengths to avoid the expense and uncertainty of a trial. If a volunteer-moderated community is a magnet for litigation, platforms may decide it’s not worth it.

The second is that defendant LiveJournal’s handling of the situation may have exposed it to more risk than other companies or platforms face. In particular, it hired an active moderator to be the “primary leader” of the community in question. That employee relationship muddies the waters when it comes to agency, though it will be up to the lower court to articulate how exactly that works out.

Still, even if the moderator draws a paycheck from the platform, it seems unreasonable to expect them to approach thorny copyright questions with the nuance of a trained professional. That is especially true when you compare this ruling with the Ninth Circuit’s most recent opinion in Lenz v. Universal, the “dancing baby” case, which looks down the other end of the copyright gun at takedown notice senders. Notice senders must consider fair use, but only so far as to form a “subjective good faith belief” about it. If courts don’t require the people sending a takedown notice to form an objectively reasonable interpretation of the law, why should they impose a higher standard on the moderators at platforms handling staggering quantities of user uploads?

But if moderators are a platform’s “agents,” then it runs into trouble if they have actual or “red flag” knowledge of infringements. The Ninth Circuit has instructed the lower court to find out whether the moderators had either. Noting the watermarks on some of the copyrighted images in the case, the court phrased the question of “red flag” knowledge as whether “it would be objectively obvious to a reasonable person that material bearing a generic watermark or a watermark referring to a service provider’s website was infringing.” That’s an important point to watch. Copyright ownership and licensing can be extremely complex — so oversimplifying it to the idea that the presence of a watermark means any use is infringing would have profound negative consequences.

The Ninth Circuit decision kicking it back down to the district court means that these questions are very much in play. And it could already mean, as EFF puts it, that using moderators means you will have to go all the way to trial.

There’s one more troubling aspect of the opinion that drives home the cost of such a trial: anonymous moderators, whom LiveJournal was previously able to protect from deposition, may now be forced to appear.

The chilling effect here is very serious. Mavrix, already a closely watched case, is poised to attract even more attention as a district court grapples with these big questions. The fate of moderated online communities could hang in the balance.

Note: Although I used to work at the Electronic Frontier Foundation, which joined an amicus brief in this case, my views do not represent those of my former employer and also do not constitute legal advice.