LinkArchiver, a new bot to back up tweeted links

Twitter users who want to ensure that the Wayback Machine has stored a copy of the pages they link to can now sign up with @LinkArchiver to make it happen automatically. @LinkArchiver is the first project I’ve worked on during my 12-week stay at the Recurse Center, where I’m learning to be a better programmer.

The idea for @LinkArchiver was suggested by my friend Jacob. I liked it because it was useful, relatively simple, and combined things I knew (Python wrappers for the Twitter API) with things I didn’t (event-based programming, making a process run constantly in the background, and more). I did not expect it to get as enthusiastic a reaction as it has, but that’s also nice.

The entire bot is one short Python script that uses the Twython library to listen to the Twitter user stream API. This is the first of my Twitter bots that is at all “interactive”—every previous bot used the REST APIs to post, but could not engage with things in its timeline or with tweets directed at it.
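The core of that loop is simple: for each tweet that comes down the stream, pull out the expanded links and ask the Wayback Machine to save them. Here’s a minimal stdlib-only sketch of that link-handling step—the function names are my own, and the real bot wires this up to Twython’s streaming classes rather than calling it directly:

```python
import urllib.request

WAYBACK_SAVE = "https://web.archive.org/save/"

def extract_links(tweet):
    """Pull fully expanded URLs out of a tweet's `entities` block.

    Twitter wraps every link in t.co, but the streaming payload also
    carries the original URL under `expanded_url`.
    """
    entities = tweet.get("entities", {})
    return [u["expanded_url"] for u in entities.get("urls", [])]

def archive(url, user_agent="linkarchiver-sketch"):
    """Ask the Wayback Machine to snapshot `url`.

    Requesting https://web.archive.org/save/<url> triggers a crawl;
    the custom User-Agent identifies the bot to the archive.
    """
    req = urllib.request.Request(
        WAYBACK_SAVE + url, headers={"User-Agent": user_agent}
    )
    return urllib.request.urlopen(req)
```

In the running bot, `extract_links` would be called from the stream handler for every status in the timeline, and each result handed to `archive`.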

That change meant I had to use a slightly different architecture than I’ve used before. Each of my previous bots was a small, self-contained script that produced a tweet or two each time it ran. That design means I can trigger them with a cron job that runs at regular intervals. By contrast, @LinkArchiver runs all the time, listening to its timeline and acting when it needs to. It doesn’t have much interactive behavior—when you tweet at it directly, it can reply with a Wayback link, but that’s it—but learning this kind of structure will let me build much more interactive bots in the future.

It also required that I figure out how to “daemonize” the script, so that it could run in the background when I wasn’t connected and restart if it crashed (or when I restart the computer). I found this aspect surprisingly difficult; it seems like a really basic need, but the documentation for how to do this was not especially easy to find. I host my bots on a Digital Ocean box running Ubuntu, so this script is running as a systemd service. The Digital Ocean documentation and this Reddit tutorial were both very helpful in figuring it out.
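For anybody facing the same “really basic need,” a minimal unit file looks something like this—the paths, user, and service name here are placeholders, not the bot’s actual configuration:

```ini
[Unit]
Description=LinkArchiver Twitter bot
After=network-online.target
Wants=network-online.target

[Service]
# Adjust user and paths for your own box.
User=bots
WorkingDirectory=/home/bots/linkarchiver
ExecStart=/usr/bin/python3 /home/bots/linkarchiver/linkarchiver.py
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Dropped into /etc/systemd/system/linkarchiver.service, `systemctl enable linkarchiver` and `systemctl start linkarchiver` take care of running it in the background and restarting it after crashes or reboots.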

Since launching the bot, I’ve gotten in touch with the folks at the Wayback Machine, and at their request added a custom user-agent. I was worried that the bot would get on their nerves, but they seem to really appreciate it—what a relief. After its first four days online, it’s tracking some 3,400 users and has sent about 25,000 links to the Internet Archive.

Building Mastodon to be frozen

As the federated social network Mastodon has surged in popularity over the last month, more than a thousand instances — ranging from a single user to tens of thousands — have been started by the community.

That’s a really great development in terms of decentralization and distribution, which bring a lot of benefits, but it also makes it a near certainty that a currently popular instance will go away. It could happen abruptly, if a sysadmin accidentally drops a database, or gradually, if the instance becomes too expensive or time-consuming to run, but it will happen.

Mastodon developers can make some choices now that could help preserve those communities — if only in a “frozen” form — after they are no longer active. And if done right, it could open up new possibilities for persistent presentation of ephemeral communities.

Specifically, Mastodon can develop a more robust option to export an entire instance in a format that can be served statically. The Mastodon instance would be frozen, in the sense that nobody could sign up or add new content to it, but its links could be preserved and the interactions could be saved. Serving a static version of the site in a dedicated viewer could be done cheaply, and organizations like the Internet Archive would likely step up to host significant defunct communities.

(Twitter sort of has an option like this on the individual level: users can export their own archive, and get a zip file that looks like Twitter but is all local.)

The historical benefits of that kind of feature are obvious to anybody who’s gone through old forums or mailing list posts. But if it were built out as a feature, I think more communities would find new creative ways of using the software. One that immediately comes to mind: conferences could throw up an instance and create accounts for all the attendees. Once that instance was “frozen,” it would be a record of the backchannel like we haven’t really had before. Or, in cases where they’ve gotten clear consent, researchers could parse the data to learn about the different ways in which individual communities communicate.

Obviously not every instance would want to get the preservation treatment, and instance admins would likely want to make clear what their long-term plans are. And of course, this feature would have to be designed very carefully to respect the privacy preferences of people who participate. But for many networks, the present moment gets all the focus while the real value lies in each of those presents that have now become the past. Most social networks don’t stop to consider that fact. Mastodon, with its community focus, could.

There aren’t that many years of Web (or even Internet) history, but already those years haven’t been kind to online communities. Archive Team heroics only go so far—designing for the long-term preservation of our spaces should be a priority.

Online communities under threat in new copyright decision

A Ninth Circuit copyright decision in Mavrix v. LiveJournal could have nasty implications for online communities, threatening the copyright “safe harbor” provisions that allow those communities to form.

Specifically, the Ninth Circuit has said that volunteer moderators of online communities may be considered “agents” of the platform they’re on, and that if those moderators learn about copyright infringements (or “red flags” that suggest infringement), it’s as if the platform itself learned about them. That’s really important, because platforms can only claim the “safe harbor” provided by the Digital Millennium Copyright Act (DMCA) if they do not have that kind of knowledge.

Being in that safe harbor is generally considered a prerequisite to operating a large platform for user-generated content. So the concern goes: if platforms can’t allow volunteer moderators to curate communities without incurring massive copyright liability, they may decide to disable community moderation altogether.

Two major caveats here. The first is that the Ninth Circuit didn’t say these moderators are “agents” of the platform. It just said that the lower court was too hasty in saying they were not, and that a trial was necessary to decide. That’s still bad news, though. The companies that run major platforms will generally go to great lengths to avoid the expense and uncertainty of a trial. If a volunteer-moderated community is a magnet for litigation, platforms may decide it’s not worth it.

The second is that defendant LiveJournal’s handling of the situation may have exposed it to more risk than other companies or platforms face. In particular, it hired an active moderator to be the “primary leader” of the community in question. That employee relationship muddies the waters when it comes to agency, though it will be up to the lower court to articulate how exactly that works out.

Still, even if the moderator draws a paycheck from the platform, it seems unreasonable to expect them to approach thorny copyright questions with the nuance of a trained professional. That is especially true when you compare this ruling with the Ninth Circuit’s most recent opinion in Lenz v. Universal, the “dancing baby” case, which looks down the other end of the copyright gun at takedown notice senders. Notice senders must consider fair use, but only so far as to form a “subjective good faith belief” about it. If courts don’t require the people sending a takedown notice to form an objectively reasonable interpretation of the law, why should they impose a higher standard on the moderators at platforms handling staggering quantities of user uploads?

But if moderators are a platform’s “agents,” then it runs into trouble if they have actual or “red flag” knowledge of infringements. The Ninth Circuit has instructed the lower court to find out whether the moderators had either. Noting the watermarks on some of the copyrighted images in the case, the court phrased the question of “red flag” knowledge as whether “it would be objectively obvious to a reasonable person that material bearing a generic watermark or a watermark referring to a service provider’s website was infringing.” That’s an important point to watch. Copyright ownership and licensing can be extremely complex — so oversimplifying it to the idea that the presence of a watermark means any use is infringing would have profound negative consequences.

The Ninth Circuit decision kicking it back down to the district court means that these questions are very much in play. And it could already mean, as EFF puts it, that using moderators means you will have to go all the way to trial.

There’s one more troubling aspect of the opinion that drives home the cost of such a trial: anonymous moderators, whom LiveJournal was previously able to protect from deposition, may now be forced to appear.

The chilling effect here is very serious. Mavrix, already a closely watched case, is poised to attract even more attention as a district court grapples with these big questions. The fate of moderated online communities could hang in the balance.

Note: Although I used to work at the Electronic Frontier Foundation, which joined an amicus brief in this case, my views do not represent those of my former employer and also do not constitute legal advice.

New Twitter bot after artist Gerhard Richter

This week I went to the SFMOMA for the first time. It’s great! I spent hours there and felt like I had to rush to see even a fraction of the collection. One of the pieces that really struck me was Gerhard Richter’s massive 1974 painting “256 Colors” (or “256 Farben” in the original German). I took a picture of it there:

He made a few “256 Colors” and a bunch of other color charts, some of which are more explicitly generative—that is, where the colors and their arrangement are left to chance.

Anyway, that sounded like fun, so I’ve put together a small script that creates paintings that look a bit like “256 Colors.” Here’s one that it made:
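The post doesn’t reproduce the script itself, but the core idea fits in a few lines: fill a 16 × 16 grid (hence 256) with uniformly random colors. Here’s a stdlib-only sketch that renders the grid as an SVG string—the function name and output format are my own choices, not necessarily what the bot does:

```python
import random

def color_grid_svg(rows=16, cols=16, cell=40, seed=None):
    """Render a rows x cols grid of random flat color swatches as an
    SVG string, loosely after Richter's "256 Colors" (16 x 16 = 256)."""
    rng = random.Random(seed)
    rects = []
    for r in range(rows):
        for c in range(cols):
            # Pick a uniformly random 24-bit RGB color for this swatch.
            fill = "#{:06x}".format(rng.randrange(0x1000000))
            rects.append(
                '<rect x="{}" y="{}" width="{}" height="{}" fill="{}"/>'.format(
                    c * cell, r * cell, cell, cell, fill))
    return ('<svg xmlns="http://www.w3.org/2000/svg" width="{}" height="{}">'
            '{}</svg>'.format(cols * cell, rows * cell, "".join(rects)))
```

Passing a `seed` makes a grid reproducible; leaving it `None` gives you a fresh arrangement every time, which is what makes the twice-daily bot fun.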

And naturally, I’ve made a Twitter bot that posts them twice a day: @256farben. Follow along for two daily reminders of the beauty of randomness.

New bot: @i_remember_txt, tweeting Joe Brainard’s “I Remember” (1975)

Joe Brainard’s 1975 book “I Remember” is an incredible work of poetry. The New Yorker called it “his miniaturist memoir-poem,” and Paul Auster’s blurb for the 2001 edition gives a good sense of it:

I Remember is a masterpiece. One by one, the so-called important books of our time will be forgotten, but Joe Brainard’s modest little gem will endure. In simple, forthright, declarative sentences, he charts the map of the human soul and permanently alters the way we look at the world. I Remember is both uproariously funny and deeply moving. It is also one of the few totally original books I have ever read.

Those simple declarative sentences—almost all of which begin with “I remember…”—would have been a shock as a book, but today they have the strange familiarity of status updates from your most nostalgic friend.

So when Avery Trufelman asked if somebody could make a bot that tweeted his “memories,” I jumped at the chance. And the resulting bot, @i_remember_txt, fits in great between other tweets.

As usual, the code is online and comments are welcome. It’s pretty straightforward. One thing I did this time that was pretty cool: for memories that were longer than a single tweet, it posts threaded “tweetstorms” of up to 4–5 tweets in a row.
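The threading itself boils down to two steps: split the memory into tweet-sized chunks on word boundaries, then post each chunk in reply to the previous one. Here’s a sketch of the splitting step—the function name and greedy approach are my own, not necessarily how the bot does it, and it assumes no single word exceeds the limit:

```python
def split_tweets(text, limit=140):
    """Greedily split `text` into chunks of at most `limit` characters,
    breaking only on word boundaries, for posting as a thread."""
    chunks, current = [], ""
    for word in text.split():
        candidate = (current + " " + word).strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # Current chunk is full; start a new one with this word.
            if current:
                chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

To turn the chunks into a thread, the bot would post the first chunk normally, then pass each status ID as `in_reply_to_status_id` when posting the next.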