A Manifesto for Digital Preservation

29.08.25 01:29 AM - By Benjamin Langrill

Trust the past?

Think about how you experience the past.

You read books, look at pictures, visit locations. In all of these cases you are looking at a snapshot of the past. Some remnant, but not the living fullness. We catch glimpses of what it was like to live in an era or be somewhere at its prime.

So much of computing today builds on the technology of yesteryear and yet the best way to experience that tech is screenshots and maybe old books. Blogs often have interesting information but whatever environment the author was looking at is not often available to the reader. Depending on how long it has been, it may not be available to the author either.

Go read a review of iOS 1 from 2007. You can't experience that today. Even if you managed to find a phone with that version of the software, the networks it interacted with no longer exist.

The Wayback Machine has the right idea for websites but is scoped to static content. Virtualization lets us preserve isolated systems, but you want more: a window into the past where you can see that Solaris thin client desktop, interact with it, and browse the late '90s web with it. You want to be able to browse a university network in the mid-2000s with Internet Explorer 6 on Windows XP SP2. You want to be able to walk through the original New York Pennsylvania Station.

The cost of inference will drop over time. AI video is already becoming a commodity. The next frontier is fully interactive AI experiences. Google has made great progress on project Genie, which allows for interactivity with virtual worlds. You will be able to extend this by generating code or simulating pages that replicate the experience of using a digital device and/or being in a particular place and time.

Think beyond current tech: future robots, holograms and things we haven't thought of will be used to bring the past alive. This means that history will be experienced through inference. This could be you asking ChatGPT a question about a historical figure or experiencing an interactive simulation.

As amazing as this will be, how do you know it is accurate? Who controls the inference? How was the model trained? What are you seeing that was influenced by someone else?

In the distant past, oral history could be changed with little effort: find the one or two historians and persuade them that something else happened. This even happened organically, as oral traditions tend to drift over generations. Early written language helped, but it was still the victors who got to write the history, so you must keep that in perspective. Widespread book publication opened up both reading and writing books to individuals, which meant that the Truth was more likely to be out there. Now we are in the information age, where digital versions of history are widely available. However, more and more people are beginning to experience history through secondary, tertiary and whatever level LLM inference is.

This means that it is even easier for bad actors to influence what you know. How do you know if the information you are reading is from a primary source or even a trusted source? How do you know if it is even the same as it was last year?

Changing a physical copy of a book at a library - hard

Changing a transcript of a book on a website - easy

How can you trust the past?

The impact of and reasoning behind past actions are subjective, but events happened at specific times and we have the capacity to know them. It is similar to a penetration test, where you initially see just pieces of the network and eventually come to know the entirety of it as you gain more access and collect more data.

This is an integrity problem in computer security. We have lots of data and want to ensure that it remains unchanged going into the future. This becomes important not just for future historians but for anyone who wants to learn about humanity over time.
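
To make the integrity idea concrete, here is a minimal sketch. It assumes SHA-256 as the fingerprint, and the names `fingerprint` and `is_unchanged` are hypothetical helpers invented for illustration, not part of any existing system. The idea: record a digest when an artifact is archived, then recompute it later to see whether the bytes have changed.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of an artifact's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_digest: str) -> bool:
    """Compare the artifact's current digest with the one recorded earlier."""
    return fingerprint(data) == recorded_digest


# Hypothetical example: a transcript archived last year and read again today.
archived = b"It was the best of times, it was the worst of times."
recorded = fingerprint(archived)  # stored at archive time

# A single altered word produces a different digest.
tampered = b"It was the best of times, it was the best of times."

print(is_unchanged(archived, recorded))
print(is_unchanged(tampered, recorded))
```

A digest only tells you that something changed. It has to be kept somewhere that the person editing the transcript cannot also rewrite.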

What do we need?

A fertile ground for AI agents wielding microtransactions to pay for it.
A storage base layer for the actual artifacts - digital reproduction of the object itself or a representative value. The point is that it must be preserved.
A storage layer for the models which are trained on the base layer.
A storage presentation layer for the artifacts of those models.
Cryptographic security of all artifacts on all layers that both keeps them from drifting over time and ensures immutability.
Decentralized control, so that no single entity controls it. This is for humanity, by humanity.
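
One way the layers above could be tied together cryptographically is sketched below. This is an illustration under stated assumptions, not a design for the actual system: the `Record` type, the layer labels, and the `make_record` and `verify` helpers are all hypothetical, and SHA-256 is an assumed choice of hash. Each record stores the hash of its own bytes plus the ids of the records it was derived from, so a presentation can be traced back through its model to the original artifact.

```python
import hashlib
import json
from dataclasses import dataclass


def digest(data: bytes) -> str:
    """SHA-256 digest as a hex string (assumed hash choice)."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Record:
    layer: str         # assumed labels: "artifact", "model", "presentation"
    content_hash: str  # digest of the stored bytes
    parents: tuple     # ids of the records this one was derived from

    @property
    def record_id(self) -> str:
        # The id covers the layer, the content hash, and the parent ids,
        # so changing any of them yields a different id.
        body = json.dumps(
            {
                "layer": self.layer,
                "content": self.content_hash,
                "parents": list(self.parents),
            },
            sort_keys=True,
        )
        return digest(body.encode("utf-8"))


def make_record(layer: str, data: bytes, parents=()) -> Record:
    """Build a record for some bytes and the records they came from."""
    return Record(layer, digest(data), tuple(parents))


def verify(record: Record, data: bytes, known_ids: set) -> bool:
    """True if the bytes match the record and every parent id is known."""
    content_ok = record.content_hash == digest(data)
    lineage_ok = all(parent in known_ids for parent in record.parents)
    return content_ok and lineage_ok


# Placeholder bytes standing in for real stored objects.
scan = make_record("artifact", b"scan of a station photograph")
model = make_record("model", b"model weights", [scan.record_id])
view = make_record("presentation", b"rendered walkthrough", [model.record_id])

known = {scan.record_id, model.record_id}
print(verify(view, b"rendered walkthrough", known))
```

Because each id depends on its parents' ids, swapping out the base artifact changes every id downstream of it. Decentralization is not shown here; the sketch only covers how drift could be detected once many independent parties hold copies of the ids.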

Why? Learn from the past. Comprehend our history. Know the future.
