Subtitles for Norway’s SlowTV train ride to Oslo
One thing I enjoy is a production by Norway’s public service broadcaster of a train ride from Bergen to Oslo, which was first broadcast in real time, over seven or so hours, in 2009. It’s predictably pretty quiet stuff, but (at least now that it’s on Netflix) there are subtitles for what little dialog there is.
Netflix makes it pretty straightforward to extract the subtitles from a given program, and it stores them in a very fun W3C standard called the Timed Text Markup Language, or TTML. TTML narrowly lost out when the WHATWG picked a captioning format for HTML video; it went with WebVTT, a lighter-weight spec that isn’t XML-based.
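To give a feel for what reading TTML involves, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It assumes a TTML1 document in the `http://www.w3.org/ns/ttml` namespace whose `<p>` elements carry `begin` and `end` attributes, written either as clock times or as tick counts scaled by `ttp:tickRate`. The function names are hypothetical, and real subtitle files may use time formats this sketch doesn't handle.

```python
import xml.etree.ElementTree as ET

# TTML1 namespaces; the tickRate parameter lives in the #parameter namespace.
TT_NS = "http://www.w3.org/ns/ttml"
TTP_NS = "http://www.w3.org/ns/ttml#parameter"


def parse_time(value: str, tick_rate: int) -> float:
    """Convert a TTML time expression to seconds.

    Only handles clock times ("00:17:21.000") and tick offsets ("123t");
    frame-based clock times and other offset metrics ("ms", "s", "f", ...)
    are left out of this sketch.
    """
    if value.endswith("t"):
        return int(value[:-1]) / tick_rate
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def paragraph_lines(p: ET.Element) -> list[str]:
    """Collect a <p>'s text, starting a new line at each <br/>."""
    lines = [""]

    def walk(el: ET.Element) -> None:
        if el.text:
            lines[-1] += el.text
        for child in el:
            if child.tag == f"{{{TT_NS}}}br":
                lines.append("")
            else:
                walk(child)  # e.g. <span> styling wrappers
            if child.tail:
                lines[-1] += child.tail

    walk(p)
    # Collapse whitespace left over from pretty-printed XML; drop empty lines.
    cleaned = (" ".join(line.split()) for line in lines)
    return [line for line in cleaned if line]


def parse_ttml(xml_text: str) -> list[dict]:
    """Return one {"begin", "end", "text"} dict per <p>, times in seconds.

    Assumes every <p> carries explicit begin and end attributes.
    """
    root = ET.fromstring(xml_text)
    # Simplification: fall back to 1 tick per second if tickRate is absent.
    tick_rate = int(root.get(f"{{{TTP_NS}}}tickRate", "1"))
    return [
        {
            "begin": parse_time(p.get("begin"), tick_rate),
            "end": parse_time(p.get("end"), tick_rate),
            "text": paragraph_lines(p),
        }
        for p in root.iter(f"{{{TT_NS}}}p")
    ]
```

Passing the result of `parse_ttml` to `json.dumps` gives an array with one object per subtitle, though with times as seconds rather than clock strings.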
Anyway, I pulled out the (very spare) subtitles in that format and wanted to convert them to something a little more usable. First I converted them to JSON, producing an array with one object per subtitle. Then I processed that a little further to create a version where “adjacent” subtitles are combined into single objects.
The result is nearly as hypnotic as the original video:
```json
{
  "begin": "00:17:21",
  "end": "00:17:23",
  "text": [
    "[metallic clang]"
  ]
},
{
  "begin": "00:19:40",
  "end": "00:19:44",
  "text": [
    "[indistinct conversation / woman laughs]"
  ]
},
{
  "begin": "00:19:52",
  "end": "00:19:54",
  "text": [
    "[woman laughs]"
  ]
},
{
  "begin": "00:19:57",
  "end": "00:20:01",
  "text": [
    "[indistinct conversation]"
  ]
},
{
  "begin": "00:21:47",
  "end": "00:21:48",
  "text": [
    "[metallic clang]"
  ]
},
{
  "begin": "00:22:10",
  "end": "00:22:12",
  "text": [
    "[metallic clang]"
  ]
}
```
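“Adjacent” could be defined a few ways. Here is a minimal Python sketch of the combining step, assuming it means a subtitle that begins within `max_gap` seconds of the previous one ending, with a hypothetical 1-second default. The names `to_seconds` and `merge_adjacent` are illustrative, and this isn't necessarily how the JSON above was produced.

```python
import json


def to_seconds(stamp: str) -> float:
    """Convert an "HH:MM:SS" (optionally "HH:MM:SS.mmm") string to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def merge_adjacent(subtitles: list[dict], max_gap: float = 1.0) -> list[dict]:
    """Combine consecutive subtitles separated by at most max_gap seconds.

    Assumes the input is sorted by begin time. The max_gap default is a
    hypothetical threshold, not necessarily the rule behind the JSON above.
    """
    merged: list[dict] = []
    for sub in subtitles:
        if merged and to_seconds(sub["begin"]) - to_seconds(merged[-1]["end"]) <= max_gap:
            last = merged[-1]
            # Keep whichever end time is later, and append this subtitle's lines.
            if to_seconds(sub["end"]) > to_seconds(last["end"]):
                last["end"] = sub["end"]
            last["text"].extend(sub["text"])
        else:
            # Copy so that merging never mutates the caller's objects.
            merged.append({"begin": sub["begin"], "end": sub["end"], "text": list(sub["text"])})
    return merged


if __name__ == "__main__":
    # Hypothetical input in the one-object-per-subtitle format.
    subs = [
        {"begin": "00:01:00", "end": "00:01:02", "text": ["first"]},
        {"begin": "00:01:02", "end": "00:01:04", "text": ["second"]},
        {"begin": "00:01:10", "end": "00:01:12", "text": ["third"]},
    ]
    print(json.dumps(merge_adjacent(subs), indent=2, ensure_ascii=False))
```

With that hypothetical input, the first two entries touch and come out as a single object running from 00:01:00 to 00:01:04 with the text `["first", "second"]`, while the third stays separate. Keeping `text` as a list means merged objects preserve each original line rather than gluing strings together.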