A new research project by Israeli and international scholars has digitally transcribed the texts featured in hundreds of thousands of fragments from the celebrated Cairo Geniza, as well as thousands of additional Hebrew manuscripts, the National Library of Israel announced on Monday.
The project, dubbed MiDRASH (a loose acronym for Migrations of Textual and Scribal Traditions via Large-Scale Computational Analysis of Medieval Manuscripts in Hebrew Script), was launched in 2023 after securing a €10 million ($11.5 million) grant over six years from the EU’s European Research Council (ERC).
Virtually all of the 400,000 fragments from the geniza had already been photographed and their images digitized. Until now, however, fewer than 15 percent of them had been transcribed, and many have never been properly read, let alone studied.
“Our goal is to reconstruct Jewish medieval literary book culture, and we are starting by transcribing the huge collection of virtual manuscripts that has been assembled at the National Library of Israel,” said Daniel Stökl Ben Ezra, professor of Ancient Hebrew and Aramaic at the École Pratique des Hautes Études (PSL) in Paris, one of the principal academics in the project.
According to Jewish law, it is forbidden to throw away or destroy documents featuring God’s name. For about a millennium, the Jews of Cairo deposited manuscripts, letters, old prayerbooks and more in a room in the city’s Ben Ezra Synagogue, whose original building is believed to have existed since before the 9th century CE.
Preserved by Egypt’s dry climate, the trove of documents — the Cairo Geniza — came to the attention of European scholars in 1896. Most of the artifacts were transferred to England in the following years.
“This material is extremely important because 90% of the Jews [in the Middle Ages] lived in Muslim-ruled areas, not in Europe, and yet most of their manuscripts got lost,” Stökl told The Times of Israel in a phone interview. “After the Cairo Geniza was discovered, we got to know lots of new texts, lots of new versions of texts we already knew, and we learned a huge number of things.”

Scholars who are taking part in the MiDRASH project to transcribe medieval Hebrew manuscripts visit the Bodleian Library in Oxford during their annual meeting in December 2024. (Elisha Rosenzweig/MiDRASH Project)
Having a digital transcription of a document means many things: Words can be searched, genres can be grouped, language can be easily compared, and more.
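To illustrate why transcription makes words searchable, here is a minimal sketch of an inverted index, which maps each word to the fragments that contain it. This is not the project's or the NLI's actual search system; the function names, the fragment identifiers and the sample texts are all made up for illustration.

```python
from collections import defaultdict


def build_index(transcriptions):
    """Map every word to the set of fragment IDs whose text contains it."""
    index = defaultdict(set)
    for fragment_id, text in transcriptions.items():
        for word in text.split():
            index[word].add(fragment_id)
    return index


def search(index, query):
    """Return the IDs of fragments that contain every word in the query."""
    words = query.split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample data: invented IDs and short sample strings.
transcriptions = {
    "fragment_a": "בראשית ברא אלהים את השמים ואת הארץ",
    "fragment_b": "שלום רב לאדוני היקר",
    "fragment_c": "ברא אלהים את האדם",
}

index = build_index(transcriptions)
matches = search(index, "ברא אלהים")  # fragments containing both words
```

Once the text exists as characters rather than pixels, a lookup like this takes a fraction of a second, whereas a photograph can only be read by eye.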
Eventually, all the transcriptions will be available on the NLI’s website, each accompanied by the photographs of the original manuscripts.
According to Dr. Tsafra Siew, who manages NLI’s Research-Oriented Projects, MiDRASH represents a new chapter in the library’s core mission to preserve Hebrew books.
“In 1950, David Ben-Gurion, the first prime minister of Israel, initiated the Institute of Microfilmed Hebrew Manuscripts,” Siew recalled. “The idea was to salvage as many Hebrew manuscripts as possible, and because it was not possible to bring them physically to Jerusalem, the institute started photographing all the manuscripts on microfilm. This is how the first photographed collection started.”

Dr. Tsafra Siew, manager of Research-Oriented Projects at the National Library of Israel. (National Library of Israel)
Over the decades, some 1,500 book collections from around the world were added to the project. The Cairo Geniza fragments were digitized under an initiative called the Friedberg Geniza Project starting in 2006.
“In 2014, we started to digitize our photographic collection and build a website to allow users to search and view all the digitized manuscripts,” Siew said. “It was a breakthrough.”
“The MiDRASH project is the next technological leap,” she added.
The NLI is now working to build the website infrastructure and understand how to upload all the materials, a considerable task.
Asked how long it would take for the transcriptions to be available online, Siew said it was difficult to say, but she hoped it would be “under a year.”
“Working with the National Library has given us a huge advantage because it has all this preliminary work of assembling this huge collection,” said Stökl. “If you worked in Arabic, you would have millions of manuscripts, but dispersed across the world, and it would take decades before they are all included in one electronic library.”

Daniel Stökl Ben Ezra, professor of Ancient Hebrew and Aramaic at the École Pratique des Hautes Études (PSL) in Paris. (Courtesy)
The MiDRASH interdisciplinary group of scholars includes paleographers (who study historical writing), computer scientists, linguists and experts in Hebrew literature and Jewish studies. In addition to Stökl, the principal investigators are Prof. Nachum Dershowitz of Tel Aviv University, Dr. Avi Shmidman of Bar-Ilan University and Prof. Judith Olszowy-Schlanger of the University of Oxford. Teams from the University of Haifa and from the NLI are also involved in the project, under the supervision of Dr. Moshe Lavee and of Siew. Additional support is provided by the Princeton Geniza Project led by Prof. Marina Rustow.
The group has used an open-source platform for automatic transcription of prints and manuscripts, called eScriptorium, to develop a model for segmenting and transcribing the material in Hebrew script (the texts are written not only in Hebrew, but also in Aramaic and Judeo-Arabic, which both use Hebrew characters). The model has been trained by feeding it existing transcriptions done manually by scholars.
According to Stökl, the team plans to transcribe an additional 10 million images of Hebrew manuscripts over the next few months.
Yet, the scholar stressed, transcription represents just the first step.
“The transcription is only the beginning of the process,” he said. “We need the transcriptions in order to carry out other analyses. We want to conduct linguistic analysis to identify who is quoting whom, who is paraphrasing whom, and who takes which ideas from whom, so we can trace ideas, motifs, and commentaries throughout the centuries. The main focus of the project comes after the transcription part.”

A fragment of Hebrew text found in the Cairo Geniza and a transcribed version. (MiDRASH Project)
While the transcriptions will likely contain mistakes, especially in the first phase, they will make the texts searchable and comparable. The system will be able to recognize whether a fragment features an excerpt from the Bible, a page of Talmud, another well-known text, a private letter, and so on.
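One way to see how an imperfect transcription can still be matched to a known text is to compare overlapping three-character sequences rather than whole words, so that a single misread letter changes only a few of them. The sketch below uses that idea (Jaccard similarity over character trigrams). It is an assumption chosen for illustration, not the method the MiDRASH team uses, and the reference names, sample strings and the 0.5 threshold are hypothetical.

```python
def char_ngrams(text, n=3):
    """Return the set of overlapping n-character sequences in the text."""
    text = " ".join(text.split())  # normalize whitespace
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def similarity(a, b, n=3):
    """Jaccard similarity of the two texts' character n-gram sets (0 to 1)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def best_match(fragment, references, threshold=0.5):
    """Return (name, score) of the closest reference, or (None, score)
    when even the closest one scores below the threshold."""
    scores = {name: similarity(fragment, text) for name, text in references.items()}
    name = max(scores, key=scores.get)
    if scores[name] >= threshold:
        return name, scores[name]
    return None, scores[name]


# Hypothetical reference texts; a real system would hold whole corpora.
references = {
    "genesis_1_1": "בראשית ברא אלהים את השמים ואת הארץ",
    "letter_sample": "שלום רב לאדוני היקר",
}

# Simulated transcription with one misread letter (final mem read as samekh).
noisy = "בראשית ברא אלהיס את השמים ואת הארץ"
name, score = best_match(noisy, references)
```

Here the noisy line is still identified as the Genesis verse, while a text that resembles none of the references falls below the threshold and is left unlabeled.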
One of Stökl’s goals is to study how midrashim, or rabbinical tales and expositions drawn from the Bible, changed as they were shared in Muslim or Christian environments and evolved over time.
According to Stökl, the project’s results will eventually be published in peer-reviewed academic journals.
Meanwhile, to help scholars improve the transcriptions and the systems behind them, the NLI launched a “Transcribe-a-thon” in Jerusalem and online, running from November 24 to November 27. The event encourages volunteers with relevant skills to review and correct the automatic transcriptions generated by the platform, thereby improving its machine-learning model.
Ultimately, the scholars are confident they will be able to uncover many new secrets the manuscripts hold.
“[This project] enables us to ask new questions, bigger and deeper,” Siew said. “Questions that you can only answer based on the entire view of the Hebrew manuscripts collection.”