A pair of philosophers have developed a new research tool that uses AI to provide comprehensive and reliable philosophical literature reviews, and they’d like you to give it a try.

Just last week I checked out a new AI tool discussed in Nature that is supposed to be able to “synthesize scientific literature”. As good as it may be at that (I’m not in a position to judge), I can tell you that it didn’t seem to have access to much philosophy, and so was not of any use for philosophical inquiries. And general LLMs like ChatGPT may pull from random or odd or even imaginary sources, making them difficult to trust.

Still, for some, the idea of an AI philosophy research assistant has significant appeal, and now, thanks to Johannes Himmelreich (Syracuse) and Marco Meyer (Hamburg), you can see for yourself what one can do and decide what you think about it.

They call their tool PhilLit, and in the following guest post, they explain why they made it, what it does, and how you can try it.

Can AI Write a Useful Philosophical Literature Review?
by Johannes Himmelreich and Marco Meyer

A year ago, the best AI model could complete tasks that take a human expert 56 minutes. Today, this same metric, the task-completion time horizon, is around 6.5 hours.[1] These numbers were derived from tasks used in software development. How much better has AI gotten in the past 12 months at the tasks we perform in philosophy?

Unfortunately, nobody knows. As philosophers, we might want to know whether and how AI can be used for philosophy. Of course, asking “how AI can be used for philosophy” in the abstract is about as fruitful as asking “how the internet can be used for philosophy”—it depends on the philosophical task and the corner of the internet where you look for help.

Recently, this blog hosted a guide on whether AI can help develop research ideas through conversations. Conversations are a general-purpose tool for cognitive work. But research also involves more specific tasks.

AI can help with at least one specific task that we as researchers undertake regularly: orienting ourselves in unfamiliar literature. But simply asking ChatGPT won't do. Even the research agents of the leading AI labs can't reliably get the facts right or limit themselves to academic literature, let alone consider how different debates relate to one another. Generic AI research agents fabricate citations and can't distinguish high-quality philosophical research from other content.

We’ve built a tool that does better. It’s called PhilLit. It’s open source, runs on Claude, and is free to use with a Claude Code subscription. In this post, we explain what the tool does and why it addresses a real need.

You can take a look at example literature reviews about the Extended Mind and Cognitive Offloading, the Metaphilosophy of Literature Reviews, and the Moral Value of DIY. If you are comfortable with the command line or the Terminal app on Mac, you can generate reviews on whatever topic you like. In principle, the tool could run on a website with a friendly and easy-to-use interface. But for now, we concentrate on how well it works before improving how easy it is to use. To assess whether this tool lives up to the standards required for serious philosophical research, we are preparing a research study.

What PhilLit is for

PhilLit is for philosophers who want an up-to-date overview of the philosophical literature on a topic. Maybe you’re an ethicist who needs to understand debates in the epistemology of testimony. Maybe you work on philosophy of mind and want to engage with recent work on AI agency. Maybe you’re writing a grant proposal that crosses subfield boundaries.

What do you do? You ask colleagues. But they may not work on the specific intersection you need. You look for an SEP article. It is excellent when one exists, but the Stanford Encyclopedia doesn’t cover every topic, entries can lag years behind the latest work, and they’re written for a general audience rather than oriented toward your specific question. You browse PhilPapers. It gives you papers but no map of the debate. In desperation, you ask ChatGPT. It is fast, but you can’t trust the citations, and some of the sources it cites are obscure posts on Reddit. None of these give you what you actually need: a reliable, up-to-date overview of the philosophical literature on a topic, organized around the key debates and positions, with a verified bibliography you can start reading from.

What PhilLit does

PhilLit tries to solve this problem. You give it a research topic or question, and it produces two things: an analytical overview of the literature (roughly 3,000–4,000 words) organized around key debates and positions, plus a verified and annotated bibliography in BibTeX format that you can import directly into your reference manager of choice.
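For concreteness, here is the kind of entry such a bibliography might contain. The citation itself is real, but this particular entry and its annotation are our illustration, not output from an actual PhilLit run:

```bibtex
@article{clark1998extended,
  author  = {Clark, Andy and Chalmers, David},
  title   = {The Extended Mind},
  journal = {Analysis},
  volume  = {58},
  number  = {1},
  pages   = {7--19},
  year    = {1998},
  note    = {Illustrative annotation: the seminal statement of the
             extended-mind thesis and a common starting point for the
             debate.}
}
```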

Think of the output as a personalized, up-to-date SEP-like article, tailored to your specific research question. Unlike a static encyclopedia entry, PhilLit can regenerate a current overview anytime—a step toward what a continuously updated SEP might look like.

To be clear about what PhilLit is not: it’s not meant to write the literature review section of your paper, or to produce text for journal submissions or grant applications. It’s a research tool. The aim is not to produce more philosophical text, but to make feasible the kind of thorough engagement with adjacent literatures that good research requires and that time constraints often prevent. The output is a starting point for doing the philosophical work yourself: reading the papers, forming your own views, and identifying where your contribution fits.

As a slogan: the point of using AI to augment research is to put in 100x the research effort, not to publish 100x more.

Is PhilLit better than ChatGPT?

You might wonder why we built a dedicated tool when you could just prompt the Research feature of Claude or ChatGPT with “write me a literature review on X.” PhilLit is built on Anthropic’s Claude, but it is designed to meet the requirements of philosophical research. Three design features matter most:

PhilLit searches relevant databases. Every paper in the output was found by searching actual academic databases: PhilLit searches the Stanford Encyclopedia of Philosophy, PhilPapers, Notre Dame Philosophical Reviews, Semantic Scholar, OpenAlex, arXiv, and CrossRef. The system queries the same sources you’d search yourself, and nothing else.
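To make "the same sources you'd search yourself" concrete, here is a minimal Python sketch of the kind of query involved, using the public Semantic Scholar and CrossRef search APIs. It is our illustration of the approach, not PhilLit's actual code:

```python
# Sketch: keyword search against two of the bibliographic databases
# named above (illustrative only, not PhilLit's implementation).
import requests

TOPIC = "cognitive offloading"

# Semantic Scholar Graph API: paper search with selected metadata fields.
s2 = requests.get(
    "https://api.semanticscholar.org/graph/v1/paper/search",
    params={"query": TOPIC, "fields": "title,year,externalIds", "limit": 10},
    timeout=30,
).json()
for paper in s2.get("data", []):
    print(paper.get("year"), paper["title"])

# CrossRef works API: bibliographic search over registered publications.
cr = requests.get(
    "https://api.crossref.org/works",
    params={"query.bibliographic": TOPIC, "rows": 10},
    timeout=30,
).json()
for work in cr["message"]["items"]:
    print(work.get("title", ["(untitled)"])[0])
```

Because results come only from endpoints like these, non-academic web content never enters the pipeline.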

PhilLit verifies every citation. The system includes a dedicated verification pass: every bibliographic detail (for a journal article: title, author, journal name, volume number, page range, and year) is checked against API data from bibliographic databases. If a detail can't be verified against an authoritative source, it's removed. This means the bibliography may occasionally have gaps (a missing volume number, say), but it won't contain fabrications.
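The verification step might look roughly like the following sketch, which checks a candidate entry against the CrossRef API by DOI. This is our simplified illustration of the idea, not the tool's code, and the `candidate` field names are hypothetical:

```python
# Sketch: drop any bibliographic detail that cannot be confirmed
# against CrossRef's record for the DOI (illustrative only).
import requests

def verify_against_crossref(candidate: dict) -> dict:
    """Return only the fields of `candidate` that match the CrossRef
    record for its DOI; return {} if the DOI cannot be resolved."""
    resp = requests.get(
        f"https://api.crossref.org/works/{candidate['doi']}", timeout=30
    )
    if resp.status_code != 200:
        return {}  # the DOI itself is unverifiable: reject the entry
    record = resp.json()["message"]

    verified = {"doi": candidate["doi"]}
    # CrossRef returns titles as a list of strings.
    titles = record.get("title") or []
    if titles and titles[0].casefold() == candidate.get("title", "").casefold():
        verified["title"] = candidate["title"]
    if candidate.get("volume") and record.get("volume") == candidate["volume"]:
        verified["volume"] = candidate["volume"]
    # The publication year sits in issued.date-parts, e.g. [[1998, 1]].
    parts = record.get("issued", {}).get("date-parts") or [[]]
    year = parts[0][0] if parts[0] else None
    if candidate.get("year") is not None and year == candidate["year"]:
        verified["year"] = candidate["year"]
    return verified  # unmatched details are omitted, never guessed

print(verify_against_crossref({
    "doi": "10.1093/analys/58.1.7",  # Clark & Chalmers 1998
    "title": "The Extended Mind",
    "volume": "58",
    "year": 1998,
}))
```

The design choice this illustrates is the one described above: a detail that can't be confirmed is removed, so gaps are possible but fabrications are not.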

PhilLit is built for philosophy. Most AI research tools, including recently released open-source products, are designed with disciplines like biomedicine or computer science in mind. PhilLit, by contrast, organizes reviews by identifying arguments and positions in philosophical debates. And because its search process is systematic rather than reliant on any individual's scholarly network, it can be directed to seek out work on neglected topics or from underrepresented traditions. Such a system could go some way toward correcting biases that inevitably arise from the way we otherwise discover and disseminate knowledge.

Is PhilLit any good in practice?

At minimum, a literature review should be accurate (about metadata and interpretation), comprehensive, analytically perceptive, and written in a helpful way. To what extent reviews generated by PhilLit possess these qualities is largely an empirical question. The agent architecture that we developed addresses some serious failures of other literature review agents. But how far that gets us—we don’t know.

Anyone can use PhilLit now. We’re excited to hear about what it does and doesn’t do well.

Moreover, to assess PhilLit rigorously, we’re launching two validation studies (pending IRB approval). We’re looking for philosophers willing to test PhilLit on topics they already know well.

How to use it

PhilLit is open source and free to download. The only cost is access to the Claude family of models developed by Anthropic. If you already have a Claude Code subscription, you can use PhilLit at no additional cost. If you use pay-as-you-go API credits, a review should cost 9 to 13 USD on average, depending on whether you choose the cheaper model (Sonnet 4.5) or the more expensive one (Opus 4.6 on high effort). You will be paying Anthropic, not us.

Using PhilLit in Claude Code

You can try PhilLit in two ways.

Run it yourself: If you are comfortable with Python and the command line, you can install the tool directly from GitHub. You will need:

Python installed on your machine.
API keys from Anthropic, Semantic Scholar, and Brave Search.
Familiarity with running scripts in a terminal.

The repository includes detailed setup instructions: PhilLit – Getting Started.

Participate in our Validation Study: We are designing two studies to test whether the literature overviews are genuinely useful to experts. We need philosophers to test PhilLit on topics they already know well.

For our first study, you will use PhilLit yourself and assess the reviews you get. We help you with the technical setup and pay the costs of running the reviews on the topics of your choice. You will provide structured feedback on accuracy, comprehensiveness, and usefulness.

Our second study is for anyone, regardless of whether you are comfortable with the Terminal app, Python, or managing API keys. You will get a chance to provide feedback on the reviews that others generated.

If you are interested in participating in either of these studies: sign up here and we’ll update you once we’re ready to go.

[1] This is the 50% task-completion time horizon, that is, the maximum task duration (measured by how long it takes a human expert) at which an AI agent is predicted to succeed at least half of the time.

Related:

Two Cultures of Philosophy: AI Edition
Shaping the AI Revolution in Philosophy
‘Hey Sophi’, or How Much Philosophy Will Computers Do?
Reviving the Philosophical Dialogue with Large Language Models
Philosophers Develop AI-Based Teaching Tool to Promote Constructive Disagreement
Have Pen, Laptop, and ChatGPT, Will Publish