The Pompeii Challenge: AI has just fully read a scroll charred by Vesuvius 2,000 years ago

Deep Tech 🟢 Beginner ⏱️ 14 min read 📅 2026-06-30

🔎 A carbonized scroll has just found its voice

On June 25, 2026, an international team announced a feat the scientific community had awaited for decades: the complete reading of PHerc. 1667, a Herculaneum scroll sealed since the eruption of Vesuvius in 79 A.D.

This is not a partial translation or a fragment deciphered by chance. It is a complete Greek philosophical treatise, twenty-two columns long, recovered from start to finish without ever unrolling the papyrus.

Why now? Because three techniques have matured at the same time: X-ray tomography at a European synchrotron, physically based rendering with lighting models, and machine learning models for ink detection. Together they have finally crossed the threshold needed to read a whole scroll.


The key points

  • PHerc. 1667 is the first complete Herculaneum scroll to be virtually read from start to finish: approximately 1.4 meters of written surface, twenty-two columns of ancient Greek.
  • The text is a Stoic ethics treatise from the 2nd century B.C., mentioning Aristocreon (the nephew of Chrysippus).
  • The technical pipeline: ESRF synchrotron (BM18 beamline, Grenoble) → 3D volumetric reconstruction → rolled sheet segmentation → virtual flattening → ML ink detection → transcription by papyrologist.
  • Roughly 600 scrolls survive, of which only ~30 have been scanned to date, and ~80% of the Herculaneum site remains unexcavated.

Tool / Resource | Main usage | Price (June 2026, check on scrollprize.org) | Ideal for
Vesuvius Challenge | Open competition, data, code | Free (donation-funded) | Researchers, data scientists
ML Models (Hugging Face, org scrollprize) | Ink detection on CT volumes | Open source, free | Computer vision engineers
AWS Open Data Registry | Large CT datasets | AWS storage/compute cost | Teams needing scalability
Hostinger | Web project hosting around the challenge | Variable pricing (June 2026, check on hostinger.com) | Content creators, science blogs

PHerc. 1667: what exactly was just read?

The object is a carbonized papyrus roll, about 8 cm high in its current state. That is only a fraction of its original size; the rest has disappeared.

The identified text is a treatise on Stoic ethics written in ancient Greek, dated to the 2nd century BC. It explicitly names Aristocreon, the nephew of the philosopher Chrysippus, which could make it an unprecedented document for the history of Stoic philosophy.

Its importance lies not only in the philosophical content. It is also the proof of concept that the method works at the scale of an entire scroll, with all the difficulties that implies: tears, collapsed areas, uneven ink, overlapping layers.

This breakthrough echoes other fields where AI is pushing the boundaries of human knowledge, such as when OpenAI solved an Erdős problem in geometry that had resisted resolution for 80 years. In both cases, the machine does not replace the researcher: it opens a door that traditional methods had left sealed.

A text that existed in no library

No other copy of this treatise is known. The Stoics produced a considerable amount of literature, but almost all of it has been lost. Each recovered scroll is therefore potentially an addition to the Western philosophical corpus.

The transcription was carried out by professional papyrologists working from the ink signal produced by the ML model, without ever seeing the physical papyrus.


The technical pipeline: from the synchrotron to the Greek text

The method is nothing magical. It is a multi-step processing pipeline, each step solving a specific problem.

Step 1: X-ray tomography at the ESRF synchrotron

The scroll is irradiated with X-rays at the BM18 beamline of the European ESRF synchrotron in Grenoble. Unlike a conventional medical scanner, a synchrotron produces an extremely intense and coherent beam.

The goal: to build a complete 3D volume of the object, layer by layer. The largest scrolls generate approximately 260 TB of reconstructed data (in the case of "Paris 3").
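To get a feel for what ~260 TB means, here is a rough back-of-the-envelope calculation. It assumes 16-bit voxels (2 bytes each), decimal terabytes, and a single cubic volume; none of these are stated in the announcement, so read the result as an order of magnitude only.

```python
TB = 10**12  # decimal terabyte in bytes (assumption: not TiB)

def voxel_count(total_bytes: int, bytes_per_voxel: int = 2) -> int:
    """Number of voxels that fit in total_bytes at the given sample size."""
    return total_bytes // bytes_per_voxel

def cube_side(voxels: int) -> int:
    """Side length, in voxels, of a cube holding that many voxels."""
    return round(voxels ** (1 / 3))

voxels = voxel_count(260 * TB)  # assumption: 2 bytes per voxel
print(f"{voxels:.1e} voxels, roughly a {cube_side(voxels):,}-voxel cube")
```

Under these assumptions, the volume is on the order of 10^14 voxels, far beyond what fits in the memory of a single machine, which is why the data is processed in chunks.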

The ink of the Herculaneum papyri is carbon-based — no iron, no lead. This means it is practically invisible to conventional X-rays. This is the fundamental problem that blocked researchers for decades.

Step 2: segmentation and tracing of the rolled sheet

Once the 3D volume is obtained, each turn of papyrus in the roll must be isolated. The carbonized papyrus looks like a black cylinder composed of hundreds of compressed and fused sheets.

Segmentation algorithms identify the surfaces of each layer and trace them in 3D space. This step is critical: a tracing error and all the downstream text is shifted or unreadable.
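The idea of locating layers can be illustrated on a toy case. The sketch below is not the project's segmentation code: it simply looks for density peaks along one invented radial line through the roll, where each peak stands for one turn of papyrus. The profile values and the `min_height` threshold are made up for the example.

```python
def find_layer_peaks(profile, min_height):
    """Return indices of local maxima at or above min_height."""
    peaks = []
    for i in range(1, len(profile) - 1):
        rising = profile[i - 1] < profile[i]
        not_rising_after = profile[i] >= profile[i + 1]
        if profile[i] >= min_height and rising and not_rising_after:
            peaks.append(i)
    return peaks

# Synthetic radial density profile: dense sheets separated by air gaps.
profile = [0.1, 0.2, 0.9, 0.3, 0.1, 0.8, 0.85, 0.2, 0.1, 0.7, 0.1]
print(find_layer_peaks(profile, min_height=0.5))  # [2, 6, 9]
```

On a real scroll the sheets are compressed and fused, so neighbouring peaks merge and a simple rule like this fails. That is exactly why tracing is the critical step.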

Step 3: virtual flattening

The traced sheet is then virtually "unrolled". The result is a 2D surface that shows what the papyrus would have looked like if it had been opened by hand, but without any risk of destruction.
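Flattening can be pictured in a simplified 2D cross-section. In the sketch below, the rolled sheet is modelled as an ideal Archimedean spiral (an assumption; real sheets are crushed and irregular), and each traced point is mapped to its distance along the curve. That distance is the horizontal coordinate on the flattened sheet. All function names and parameters are illustrative.

```python
import math

def spiral_points(turns, step_per_turn, samples_per_turn=360, r0=1.0):
    """Points on an Archimedean spiral whose radius grows by step_per_turn per turn."""
    points = []
    for k in range(int(turns * samples_per_turn) + 1):
        theta = 2 * math.pi * k / samples_per_turn
        r = r0 + step_per_turn * theta / (2 * math.pi)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points

def arc_lengths(points):
    """Cumulative distance along the traced curve: the flattened coordinate."""
    lengths = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        lengths.append(lengths[-1] + math.hypot(x1 - x0, y1 - y0))
    return lengths

points = spiral_points(turns=3, step_per_turn=0.5)
u = arc_lengths(points)
print(f"unrolled length: {u[-1]:.2f} (same units as the radius)")
```

Sampling the CT volume at each traced point and placing the value at position `u` gives one row of the flattened image. Repeating this for every height along the roll gives the full 2D surface.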

A foundational paper on arXiv describes this verifiable recovery method as "non-invasive volumetric imaging followed by a multi-step computational pipeline".

Step 4: ML ink detection + physical rendering

This is where AI plays a decisive role. Even on the flattened surface, the carbon ink is subtle. Machine learning models are trained on human annotations (ground truth) to spot the micro-reliefs caused by the ink on the papyrus.

A recent paper on arXiv showed that "high-resolution surface topography alone contains a usable signal for ink detection on carbonized scrolls". In other words, it is not only the chemical composition that betrays the ink; its physical texture does too.

Physically based rendering then simulates the lighting of the surface to maximize the contrast between the ink and the bare papyrus.
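As a minimal illustration of why lighting helps, the sketch below shades a small height map with a single directional light using the Lambertian (diffuse) model. This is the simplest possible stand-in, not the rendering used by the project: the height values and light direction are invented, and a full physically based renderer models far more than diffuse reflection.

```python
import math

def lambert_shade(height, light=(1.0, 0.0, 1.0)):
    """Shade a 2D height map (list of rows) with one directional light."""
    norm = math.sqrt(sum(c * c for c in light))
    lx, ly, lz = (c / norm for c in light)
    rows, cols = len(height), len(height[0])
    image = []
    for y in range(rows):
        row = []
        for x in range(cols):
            # Central differences, with indices clamped at the borders.
            dzdx = (height[y][min(x + 1, cols - 1)] - height[y][max(x - 1, 0)]) / 2
            dzdy = (height[min(y + 1, rows - 1)][x] - height[max(y - 1, 0)][x]) / 2
            # The normal of the surface z = h(x, y) is (-dz/dx, -dz/dy, 1).
            nx, ny, nz = -dzdx, -dzdy, 1.0
            length = math.sqrt(nx * nx + ny * ny + nz * nz)
            row.append(max(0.0, (nx * lx + ny * ly + nz * lz) / length))
        image.append(row)
    return image

# A flat surface with a one-pixel-wide ridge, standing in for an ink stroke.
ridge = [[0.0, 0.0, 1.0, 0.0, 0.0] for _ in range(3)]
shaded = lambert_shade(ridge)
print([round(v, 2) for v in shaded[1]])
```

The slope facing the light comes out brighter than the flat papyrus, and the slope facing away comes out darker. A tiny relief thus becomes a visible light and dark edge.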

Step 5: papyrological transcription

The ink signal is passed to papyrologists who transcribe the text into ancient Greek. This step remains human. The AI produces an image of the ink; the expert produces the text.

The result for PHerc. 1667: twenty-two columns of Greek, read from end to end, with coherent philosophical meaning.


The Vesuvius Challenge: a competition that changed archaeology

The Vesuvius Challenge is not a traditional academic project. It is an open competition, funded by donations, with over $1.8 million in prizes distributed to date.

According to Scientific American, this competitive approach has dramatically accelerated the development of the pipeline. Students, independent researchers, and corporate teams contributed pieces of the solution that, when assembled, surpassed what any single laboratory could have accomplished.

Students at the heart of the discovery

The NEH (National Endowment for the Humanities) emphasizes that students played a key role in decoding these 2,000-year-old scrolls. This is not merely anecdotal. The competition has democratized access to data and tools.

The models are open source on Hugging Face (scrollprize organization). The data is available on the AWS Open Data Registry. The pipeline code is on GitHub. Everything is under a Creative Commons license.

This openness contrasts with the trend in the private AI sector, where regulations like the Great American AI Act could freeze the legislative landscape for three years. Here, open source triumphs.

The other announcements on June 25

PHerc. 1667 was not the only news. The ink of Scroll 1 has been confirmed in 3D at a higher resolution. And the title of PHerc. 139 has been recovered: it is Philodemus, On the Gods, Book 8.

Reuters reports that this breakthrough "could help recover hundreds of sealed scrolls from the ancient library." DW News speaks of a text on "papyrus rolled and carbonized 2,000 years ago" made readable by AI.


What the text says — and what it doesn't say

The Stoic ethical treatise of PHerc. 1667 is not a recovered ancient best-seller. It is a technical text of moral philosophy, probably intended for an educated audience.

The mention of Aristocreon is significant. This figure is known from other sources as the dedicatee of works by Chrysippus, the third head of the Stoic school. The scroll could therefore be linked to the immediate circle of this major philosopher, whose work is almost entirely lost.

The limits of the current reading

The text is not perfect. Some columns are more legible than others. Areas where the papyrus is particularly degraded produce a fragmented ink signal.

Papyrologists use brackets and ellipses to mark gaps. This is classical philology, not automatic reading. AI did not "read" the Greek — it made the ink visible so that humans could read it.
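The bracket convention can be sketched in a few lines. The example below is a simplified take on the editorial practice of marking a gap with square brackets and one dot per estimated lost letter. The `Span` type is hypothetical, and the Greek sample is an invented placeholder, not text from PHerc. 1667.

```python
from dataclasses import dataclass

@dataclass
class Span:
    text: str = ""    # letters the papyrologist can read
    missing: int = 0  # estimated number of lost letters after them

def render(spans):
    """Join readable text, marking each gap as [...] with one dot per lost letter."""
    parts = []
    for span in spans:
        parts.append(span.text)
        if span.missing:
            parts.append("[" + "." * span.missing + "]")
    return "".join(parts)

# Invented placeholder, NOT a reading from the scroll.
line = [Span("φιλο", missing=3), Span("ια")]
print(render(line))  # φιλο[...]ια
```

The structure makes the division of labor explicit: the gaps and their estimated lengths are editorial judgments recorded by a person, not outputs of the ink detection model.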

One should not exaggerate the role of AI here. The computer vision models in this pipeline are not reasoning systems in the vein of Claude Opus 4.7 (Adaptive) or Gemini 3 Pro Deep Think, and they play no part in linguistic transcription. They detect visual patterns.


Why carbon ink is the real problem

If the Herculaneum papyri had been written with iron-gall ink (like medieval manuscripts), the problem would have been solved a long time ago. Iron absorbs X-rays far more strongly than papyrus does, so the ink would appear directly on the CT scans.

But the ink used at Herculaneum is carbon-black based: essentially soot suspended in water with a binder. Its density is almost identical to that of the charred papyrus surrounding it, so the contrast is close to zero.

The topography solution

This is where the paper on arXiv changes the game. Carbon ink does not significantly modify the density of the papyrus. But it modifies its surface.

When the ink dries, it creates a micro-relief — dips and bumps at the micrometer scale. In high resolution, this relief is detectable in the CT volume, even without chemical contrast.

ML models learn to associate these topographic micro-variations with the presence of ink, by training on regions where unrolled papyrus fragments provide reliable ground truth.
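To make "learning from topography" concrete, here is a deliberately tiny stand-in. It uses a single hand-made feature (surface roughness of a height patch) and logistic regression trained by gradient descent. The real models are deep networks working on 3D data. The patch values, the feature, and the hyperparameters below are all invented for illustration.

```python
import math

def roughness(patch):
    """Mean absolute deviation of the heights from the patch mean."""
    mean = sum(patch) / len(patch)
    return sum(abs(h - mean) for h in patch) / len(patch)

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(ink) = sigmoid(w * x + b) with batch gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys))
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys))
        w -= lr * grad_w / len(xs)
        b -= lr * grad_b / len(xs)
    return w, b

def ink_probability(patch, w, b):
    return sigmoid(w * roughness(patch) + b)

# Synthetic "ground truth" patches (heights in arbitrary units, invented).
inked = [[0.0, 3.0, 0.0, 3.0], [0.5, 2.9, 0.3, 2.7]]
bare = [[0.0, 0.2, 0.0, 0.2], [0.1, 0.4, 0.1, 0.3]]
xs = [roughness(p) for p in inked + bare]
ys = [1] * len(inked) + [0] * len(bare)
w, b = train_logistic(xs, ys)

print(f"rough patch:  {ink_probability([0.0, 2.5, 0.2, 2.8], w, b):.2f}")
print(f"smooth patch: {ink_probability([0.1, 0.2, 0.1, 0.2], w, b):.2f}")
```

The labeled patches play the role of the opened fragments: places where a human can see the ink and tell the model what its relief looks like.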


The numbers: how many scrolls remain to be read?

The most cited estimate: around 600 surviving scrolls from the Villa of the Papyri in Herculaneum. Of this total, about 30 have been scanned by X-ray tomography to date.

But the most staggering number lies elsewhere: it is estimated that 80% of the Herculaneum site remains unexcavated. The Villa of the Papyri could be just one part of a larger complex. Other libraries could be waiting under the ash.

Statistic | Value
Estimated surviving scrolls | ~600
Scanned scrolls to date | ~30
Surface read for PHerc. 1667 | ~1.4 m, ~22 columns
Data per large scroll (CT) | ~260 TB
Portion of Herculaneum excavated | ~20%
Prizes awarded (Vesuvius Challenge) | >$1.8M

ArtNet describes the written surface of PHerc. 1667 as "about 1.4 meters of papyrus and about twenty-two columns of Greek". The University of Kentucky, where Professor Brent Seales pioneered virtual unwrapping within his EduceLab, covers the event as "the day the Herculaneum scrolls started speaking again".


What does this breakthrough mean for the digital humanities?

The decoding of PHerc. 1667 is not a technological gimmick applied to archaeology. It is a paradigm shift in how we access ancient texts.

Today, our knowledge of Greco-Roman literature relies on medieval manuscripts copied from copies of copies. Each step introduces errors, omissions, and modifications. The Herculaneum scrolls are direct witnesses to the ancient text: books physically written in antiquity, not the end product of a long medieval copying chain.

The lost library could contain known and unknown works

The Villa of the Papyri has yielded texts by Philodemus, Epicurus, and Demetrius Lacon. But the Stoics, the Peripatetics, and the poets of the Hellenistic period could be among the unread scrolls.

We know from ancient sources that major works have been lost: the second book of Aristotle's Poetics, other treatises by Aristotle of which we only have summaries, and lost plays by Sophocles and Euripides.

There is no guarantee that they are in Herculaneum. But the probability is not zero. And for the first time, we have a technical means of verifying this without destroying the scrolls.

A reproducible model for other sites

The technology developed for the Vesuvius Challenge is not specific to carbonized papyri. Scientific American points out that "the technology could be adapted to decipher other lost texts beyond the Bay of Naples".

Carbonized scrolls exist in other collections — in Egypt, the Near East, and Central Asia. Folded parchments, palimpsests, and sealed tablets could benefit from variants of this pipeline.


Computer vision as a key discipline

This achievement illustrates a point often underestimated in the public discourse on AI: it is not always the language model doing the remarkable work. Here, it is computer vision that takes center stage.

Detecting ink on CT volumes is a 3D semantic segmentation problem. The models involved are closer to those used in medical imaging or geology than in text generation.

The ML architectures at play

Without going into implementation details, the pipeline uses convolutional networks and segmentation architectures trained on meticulous human annotations. The ground truth comes from papyrologists who mark pixel by pixel where the ink is on the reference fragments.

Performance depends directly on the quality of the annotations. It is laborious, essential work that is almost invisible in mainstream reports.
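One common way to score a segmentation against its annotations is the Dice coefficient, shown below as a generic example. Whether the Vesuvius Challenge uses this exact metric is not stated here, and the masks are invented. A score of 1.0 means the predicted ink mask matches the annotation exactly.

```python
def dice(predicted, truth):
    """Dice coefficient between two binary masks given as flat lists of 0/1."""
    overlap = sum(p and t for p, t in zip(predicted, truth))
    total = sum(predicted) + sum(truth)
    return 1.0 if total == 0 else 2 * overlap / total

truth = [1, 1, 0, 0, 1, 0]
good_prediction = [1, 1, 0, 0, 1, 0]
poor_prediction = [1, 0, 0, 1, 0, 0]

print(dice(good_prediction, truth))            # 1.0
print(round(dice(poor_prediction, truth), 2))  # 0.4
```

The score is only as meaningful as the mask it is compared to: if the annotation itself is wrong, a model can score well while detecting the wrong thing.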

The role of language models

LLMs like GPT-5.5, Claude Opus 4.6 or DeepSeek V4 Pro are not used in the ink detection pipeline. They could, however, intervene downstream — to help with transcription, the identification of verb forms, or preliminary translation.

But for now, the chain remains strictly: visual ML → ink image → human → Greek text.


❌ Common mistakes

Mistake 1: thinking that AI "translates" the scroll

AI does not read Greek. It produces a probability map of ink on the flattened surface of the papyrus. Translation is the work of a human papyrologist. Confusing the two is like saying a microscope translates a cell.
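What the model hands over can be pictured as a grid of probabilities. The sketch below turns such a grid into a crude black and white image with a fixed threshold. The numbers and the 0.5 threshold are invented, and in practice experts look at the continuous image rather than a hard cut.

```python
def binarize(prob_map, threshold=0.5):
    """Mark each cell '#' (likely ink) or '.' (likely bare papyrus)."""
    return ["".join("#" if p >= threshold else "." for p in row) for row in prob_map]

# Invented ink probabilities for a 3 x 5 patch of the flattened surface.
prob_map = [
    [0.1, 0.8, 0.9, 0.7, 0.2],
    [0.1, 0.2, 0.9, 0.1, 0.1],
    [0.0, 0.1, 0.8, 0.2, 0.1],
]
for row in binarize(prob_map):
    print(row)
```

The output is a shape, here something like the letter T. Deciding which letter it is, and what the word means, is where the papyrologist takes over.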

Mistake 2: thinking that the scrolls can be opened physically

The Herculaneum scrolls are carbonized — they look like lumps of charcoal. Attempting to unroll them mechanically would destroy them. This is precisely why the virtual method was developed. Historical attempts (in the 18th century) produced fragments, not complete texts.

Mistake 3: imagining that we are going to read everything in the coming months

Reading PHerc. 1667 took years. Scanning, processing, annotation, model training, transcription — each step is intensive. With ~570 unscanned scrolls and colossal data volumes, the timeline is measured in decades, not months.
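The backlog can be put into numbers with simple arithmetic. The scroll counts and the 260 TB figure come from this article. The scanning rates are purely hypothetical, and treating every scroll as a "large" one gives an upper bound on data volume, not an estimate.

```python
SURVIVING_SCROLLS = 600    # from the article (approximate)
SCANNED_SCROLLS = 30       # from the article (approximate)
TB_PER_LARGE_SCROLL = 260  # from the article, largest scrolls only

unscanned = SURVIVING_SCROLLS - SCANNED_SCROLLS
upper_bound_pb = unscanned * TB_PER_LARGE_SCROLL / 1000  # decimal petabytes

def years_needed(scrolls, scrolls_per_year):
    """Years to process the backlog at a constant yearly rate."""
    return scrolls / scrolls_per_year

print(f"{unscanned} unscanned scrolls, at most ~{upper_bound_pb:.1f} PB of CT data")
for rate in (5, 20, 50):  # hypothetical rates, for illustration only
    print(f"{rate} scrolls/year: {years_needed(unscanned, rate):.1f} years")
```

Even at the most optimistic hypothetical rate, the backlog takes more than a decade, and that is before counting segmentation, model training, and transcription.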

Mistake 4: reducing the Vesuvius Challenge to an AI application

It is a convergence of disciplines: synchrotron physics, computational geometry, computer vision, classical philology, epigraphy. AI is one link, not the entire chain.


❓ Frequently asked questions

What are the exact contents of PHerc. 1667?

A Stoic ethics treatise from the 2nd century BC, mentioning Aristocreon, nephew of the philosopher Chrysippus. Twenty-two columns of ancient Greek on about 1.4 meters of papyrus.

Why can't we simply open the scrolls?

The papyrus is carbonized and extremely fragile. Historical attempts at mechanical unrolling have partially destroyed the scrolls. Virtual unwrapping from tomography data is the only non-destructive method.

What exact role do the ML models play?

They detect the presence of carbon ink on the flattened surface of the papyrus by analyzing the high-resolution surface topography. They do not translate the text.

How many scrolls remain to be read?

About 600 surviving scrolls, of which ~30 have been scanned. And 80% of the Herculaneum site remains unexcavated, which could reveal other libraries.

Are the data and code accessible?

Yes. Models on Hugging Face (scrollprize organization), data on AWS Open Data Registry, code on GitHub, all under a Creative Commons license.

Can this method be applied to other archaeological sites?

That is the goal. Scientific American notes that the technology could be adapted for lost texts beyond the Bay of Naples — palimpsests, carbonized scrolls from other regions, sealed tablets.


✅ Conclusion

On June 25, 2026, a scroll sealed since the year 79 yielded its twenty-two columns of Stoic Greek — not by being opened, but by being seen through. The Vesuvius Challenge has proven that the convergence of synchrotron technology, computer vision, and classical philology could recover what Vesuvius had erased. Hundreds of scrolls remain under the ash, and perhaps entire libraries lie beneath the feet of Herculaneum. The question is no longer whether we can read them, but how long it will take to do so. Follow the project directly on the official Vesuvius Challenge website.