Simplify Websites for Longevity

created Oct 3, 2017

Sep 28, 2017 NiemanLab.org story titled “The internet isn’t forever. Is there an effective way to preserve great online interactives and news apps?”

Maybe the goal should be to create simple, static HTML pages with some CSS and little to no JavaScript. If at all possible, do not make database-backed web apps. Data journalism projects, however, require database queries nearly every time readers make a request.
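One way to get closer to that goal, even when the data lives in a database, is to run the queries once at publish time and write the results out as plain HTML files. The following is only a minimal sketch of that idea, not a method from the NiemanLab story. The `stories` table, its `slug`, `title`, and `body` columns, and the `freeze_to_static` function name are all hypothetical, and SQLite stands in for whatever database a real project uses.

```python
import html
import sqlite3
from pathlib import Path


def freeze_to_static(conn: sqlite3.Connection, out_dir: Path) -> list[Path]:
    """Render each row of a hypothetical `stories` table to its own HTML file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    rows = conn.execute("SELECT slug, title, body FROM stories ORDER BY slug")
    for slug, title, body in rows:
        # Escape database text so it cannot inject markup into the page.
        safe_title = html.escape(title)
        safe_body = html.escape(body)
        page = (
            "<!DOCTYPE html>\n"
            '<html lang="en">\n'
            f'<head><meta charset="utf-8"><title>{safe_title}</title></head>\n'
            f"<body><h1>{safe_title}</h1><p>{safe_body}</p></body>\n"
            "</html>\n"
        )
        # Assumption: slugs are already safe to use as file names.
        path = out_dir / f"{slug}.html"
        path.write_text(page, encoding="utf-8")
        written.append(path)
    return written
```

The resulting files need no database or application server to display, so they can be copied to any web host or saved by a crawler. The trade-off is that readers can no longer run arbitrary queries; only the views rendered ahead of time survive.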

Broussard and colleague Katherine Boss, the librarian for journalism, media, culture, and communication at NYU, are working on a workflow and on building tools to help organizations effectively and efficiently preserve their big data journalism projects, and putting together a scholarly archive of data journalism projects.

“News apps can’t be preserved the same way you preserve the static webpage,” Broussard said. The Internet Archive’s Wayback Machine is dependable for finding a snapshot in time, but a searcher needs to know the time frame of what they’re looking for, and snapshots don’t really capture a complicated, database-driven project or any site with a lot of dynamic links. “The way to capture these is from the backend. You can grab the whole database — all of the images from the server side, and so forth. We’re looking to build server-side tools that will allow for automated, large-scale, long-term archiving of data journalism projects.”
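To make the "capture from the backend" idea concrete, here is a rough sketch of what a very small server-side capture could look like. This is not the tooling Boss and Broussard describe; it is an illustration under simple assumptions: the project uses a SQLite database, its images and other files sit in one directory, and the `archive_project` function and the `database.sql` and `assets` names inside the archive are made up for this example.

```python
import io
import sqlite3
import tarfile
from pathlib import Path


def archive_project(conn: sqlite3.Connection, assets_dir: Path, out_path: Path) -> None:
    """Write a .tar.gz holding a SQL dump of the database plus the asset files."""
    # iterdump() yields the SQL statements needed to recreate the database.
    dump = "\n".join(conn.iterdump()).encode("utf-8")
    with tarfile.open(out_path, "w:gz") as tar:
        # Store the dump as an in-memory file named database.sql.
        info = tarfile.TarInfo(name="database.sql")
        info.size = len(dump)
        tar.addfile(info, io.BytesIO(dump))
        # Add images, CSS, and other server-side files under assets/.
        tar.add(str(assets_dir), arcname="assets")
```

A bundle like this holds the data and files but not the application code, language runtime, or external APIs the project depends on, which is part of why the problem is harder than it first looks.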

Boss and Broussard’s first move has been surveying developers and journalists on the tech used to make and store their news apps. The preliminary survey returned a range of technologies, frameworks, and platforms: Flask, Django, Ruby, Node.js, d3, AWS, Heroku, and on and on.

“Nobody’s yet collected data on what we’re trying to learn. Where are all projects being stored? How are they built — are they pulling from external APIs? We’re trying to ask the right questions digital archivists will want the answers to, in order to build these sorts of tools,” Boss said. “There are great projects that are really just being lost — there is currently no way to archive or preserve them. It’s not really technically possible right now. We’re trying to develop new workflows, and not just within libraries, that could be used by anyone.”

The data journalist community has been concerned with preserving interactive projects and news apps for years. The Journalism Digital News Archive, part of the Reynolds Journalism Institute, has also been convening these researchers and journalists to tackle this problem.

In librarian/archivist nerd-speak, Boss explained, there’s “migration,” and then there’s “emulation.” “Migration” is the traditional stuff we might associate with libraries: digitizing print materials, digitizing microfilms, moving VHS to DVDs and then DVDs to Blu-ray and then Blu-ray to streaming media. That process doesn’t make sense for digital “objects” like news apps that are dependent on many different types of software, and therefore have too many moving parts to migrate. What if, a hundred years out, we’re not even browsing the internet on computers, or at least not the computers we’re familiar with now? What’s needed is a way to capture a data journalism project from the server side and then “emulate” that whole environment on whatever future device is being used to view the project.