Summary: A new brain decoding method called mind captioning can generate accurate text descriptions of what a person is seeing or recalling—without relying on the brain’s language system. Instead, it uses semantic features from vision-related brain activity and deep learning models to translate nonverbal thoughts into structured sentences.
The method worked even when participants recalled video content from memory, showing that rich conceptual representations exist outside language regions. This breakthrough offers potential for nonverbal communication tools and redefines how thoughts can be decoded from brain activity.
Key Facts
Language-Free Translation: The method decodes visual and semantic brain activity into text without activating traditional language areas.
Structured Output: The generated sentences preserve relational meaning, not just object labels, reflecting structured thought.
Memory Works Too: The system successfully decoded and captioned remembered videos, opening doors for memory-based brain communication.
Source: Neuroscience News
Imagine watching a silent video clip and having a computer decode what you saw—then generate a description of it, using only your brain activity. Now imagine the same process applied to your memory of that video, or even your imagination. That’s the frontier a new study has just crossed.
A groundbreaking new brain decoding method, called mind captioning, has demonstrated the ability to generate coherent, structured text from human brain activity—describing what a person is watching or recalling without relying on their ability to speak, move, or engage the traditional language network.
Instead of translating thoughts into words via language centers, this system directly decodes semantic information encoded in the brain’s visual and associative areas and uses a deep learning model to turn those representations into meaningful sentences.
The study, which used functional MRI (fMRI) data from participants watching and recalling video clips, bridges neuroscience and natural language processing by using semantic features—a type of intermediate representation that connects brain activity to words without jumping straight to full language output. In doing so, it opens up entirely new possibilities for decoding thought, especially in individuals who cannot communicate using spoken or written language.
Bridging Visual Thought and Language With Semantic Features
Traditional brain-to-text systems rely on decoding linguistic brain activity—either by monitoring speech-related areas during internal dialogue or by training on language tasks. This approach, however, has limitations for people with aphasia, locked-in syndrome, or developmental conditions that impair language.
Mind captioning takes a fundamentally different route. Instead of depending on the brain’s language centers, the new method builds linear decoders that translate whole-brain activity—triggered by viewing or imagining videos—into semantic features extracted from captions of those videos. These semantic features are derived from a deep language model (DeBERTa-large), which captures contextual meaning from word combinations.
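To make that decoding step concrete, here is a minimal sketch in Python: a ridge regression that maps fMRI voxel patterns to caption-embedding features. The mean-pooling of DeBERTa hidden states, the regularization strength, and the placeholder data shapes are illustrative assumptions, not the study's exact pipeline.

```python
import numpy as np
import torch
from sklearn.linear_model import Ridge
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-large")
model = AutoModel.from_pretrained("microsoft/deberta-large")

def caption_features(captions):
    """Embed each caption as the mean of its DeBERTa hidden states."""
    feats = []
    for text in captions:
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state   # (1, n_tokens, dim)
        feats.append(hidden.mean(dim=1).squeeze(0).numpy())
    return np.stack(feats)

# Placeholder training data: fMRI responses (n_videos, n_voxels) and one
# caption per training video. Real data would come from the fMRI experiment.
X_train = np.random.randn(200, 5000)
Y_train = caption_features(["a dog chases a ball in a park"] * 200)

decoder = Ridge(alpha=100.0)       # L2-regularized linear decoder
decoder.fit(X_train, Y_train)

# At test time, brain activity for a new video (or a recalled one) is mapped
# into the same semantic feature space.
X_test = np.random.randn(1, 5000)
decoded_features = decoder.predict(X_test)   # shape (1, feature_dim)
```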
To turn these decoded features into readable text, the researchers used a process of iterative optimization—beginning with a blank slate and gradually refining word choices by aligning their semantic meaning with the brain-decoded features.
Through repeated steps of masking and replacing words using a masked language model (RoBERTa), the system was able to evolve rough sentence fragments into natural-sounding, accurate descriptions of what participants had seen or remembered.
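A rough sketch of that optimization loop, under simplifying assumptions: each word position is masked in turn, a RoBERTa fill-mask model proposes replacements, and a candidate is kept only when its semantic features move closer to the brain-decoded features. The greedy search, the seed sentence, and the reuse of the caption_features helper from the sketch above are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def optimize_caption(decoded_feat, caption_features,
                     seed="a person is doing something",
                     n_iters=20, top_k=5):
    """Greedy word-replacement search guided by similarity to decoded features."""
    best = seed
    best_score = cosine(caption_features([best])[0], decoded_feat)
    for _ in range(n_iters):
        improved = False
        words = best.split()
        for i in range(len(words)):
            # Mask one position and let the masked language model propose words.
            masked = " ".join(words[:i] + [fill_mask.tokenizer.mask_token] + words[i + 1:])
            for proposal in fill_mask(masked, top_k=top_k):
                candidate = proposal["sequence"]
                score = cosine(caption_features([candidate])[0], decoded_feat)
                if score > best_score:
                    best, best_score, improved = candidate, score, True
        if not improved:      # stop when no replacement improves the match
            break
    return best, best_score
```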
Reading the Mind Without Words
One of the most remarkable findings is that the method worked even when participants were simply recalling a video from memory—without seeing it again. The descriptions generated from recalled content were not only intelligible but also matched the original video content closely enough that the system could identify which video was being recalled out of 100 possibilities, achieving nearly 40% accuracy in some individuals (chance would be 1%).
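The identification analysis can be pictured as a simple similarity ranking: compare the decoded features against the caption features of all 100 candidate videos and pick the closest match. The cosine metric, the feature dimensionality, and the synthetic data below are assumptions used only to illustrate the idea.

```python
import numpy as np

def identify(decoded_feat, candidate_feats):
    """Return the index of the candidate whose features best match the decoded ones."""
    sims = candidate_feats @ decoded_feat / (
        np.linalg.norm(candidate_feats, axis=1) * np.linalg.norm(decoded_feat)
    )
    return int(np.argmax(sims))

rng = np.random.default_rng(0)
candidate_feats = rng.standard_normal((100, 1024))   # features of 100 candidate videos
true_index = 42
# Simulate decoded features as a noisy copy of the true video's features.
decoded_feat = candidate_feats[true_index] + 0.5 * rng.standard_normal(1024)

hit = identify(decoded_feat, candidate_feats) == true_index
print(f"correct identification: {hit} (chance level = 1/100)")
```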
Even more compelling, this was achieved without relying on the language network, the brain’s frontal and temporal regions traditionally associated with language production and comprehension.
In fact, when the researchers excluded these areas from analysis, performance dropped only slightly, and the system still generated structured, coherent descriptions. This suggests that the brain encodes complex, linguistically expressible information—about objects, relationships, actions, and context—outside the language system itself.
These findings provide compelling evidence that nonverbal thought can be translated into language, not by reconstructing speech, but by decoding the structured semantics encoded in the brain’s visual and associative areas.
Structured Meaning, Not Word Lists
Critically, the generated descriptions were not mere lists of keywords or object labels. They preserved relational information—for example, distinguishing “a dog chasing a ball” from “a ball chasing a dog.”
When the researchers shuffled the word order of these generated sentences, the system’s ability to match them to the correct brain activity dropped significantly, proving that it wasn’t just the words that mattered—it was their structure.
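A minimal sketch of such a word-order control, assuming the same caption_features embedding helper as in the earlier sketches: shuffle the words of a generated description, re-embed it, and measure how much its similarity to the decoded features falls relative to the intact sentence.

```python
import random
import numpy as np

def shuffle_words(sentence, seed=0):
    """Return the sentence with its words randomly reordered."""
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

def structure_effect(decoded_feat, sentence, caption_features):
    """Similarity of the intact sentence minus similarity of its shuffled version."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    intact = cos(caption_features([sentence])[0], decoded_feat)
    scrambled = cos(caption_features([shuffle_words(sentence)])[0], decoded_feat)
    return intact - scrambled   # positive values mean word order carries information

# e.g. structure_effect(decoded_features[0], "a dog chasing a ball", caption_features)
```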
This structured output mirrors the way humans encode meaning: not as isolated elements but as interconnected representations of objects, actions, and relationships. The success of mind captioning shows that these high-level, structured representations are deeply embedded in brain activity and can be accessed without triggering overt language use.
Toward Nonverbal Brain-to-Text Communication
This research has profound implications for assistive communication technologies. By decoding thoughts without relying on speech or language production, mind captioning could offer new tools for people with severe communication impairments—including those with aphasia, ALS, or brain injuries that affect motor and language function.
Because the system builds from nonlinguistic visual stimuli and generalizes to recalled mental imagery, it could also be adapted for individuals with different native languages—or even for pre-verbal children or non-human animals, offering a window into mental experiences previously inaccessible.
Moreover, it opens exciting doors for brain-machine interfaces (BMIs) in general. Rather than relying on rigid commands or neural triggers, future systems could interpret complex, subjective experiences—turning mental content into text-based input for digital systems, virtual assistants, or even creative writing.
Caution and Promise
While the system currently depends on fMRI and intensive data collection per individual, advances in neural decoding, language models, and alignment techniques may allow future iterations to work with less invasive or more portable systems. Ethical safeguards will be essential, especially regarding mental privacy, as these tools become more powerful.
Still, the core achievement of this research is clear: thoughts can be translated into words—not by mimicking speech, but by mapping meaning. This reframing of brain decoding could fundamentally reshape how we think about communication, cognition, and the boundary between mind and machine.
Funding:
This research was supported by JST PRESTO (grant number JPMJPR185B), Japan, and JSPS KAKENHI (grant number JP21H03536).
Key Questions Answered:
Q: What is mind captioning and how does it work?
A: Mind captioning is a new brain decoding method that translates semantic brain activity—triggered by viewing or remembering video content—into descriptive text using deep learning models, bypassing the need for language network activation.
Q: How is this different from previous brain-to-text approaches?
A: Unlike prior methods that rely on decoding speech or internal dialogue, mind captioning works from nonlinguistic, visual representations and builds sentences through a semantic matching process, even from recalled mental imagery.
Q: Who could benefit from this technology in the future?
A: Individuals with aphasia, locked-in syndrome, or other speech impairments may one day use mind captioning to communicate, since it doesn’t require language production or motor control.
About this neurotech research news
Author: Neuroscience News Communications
Source: Neuroscience News
Contact: Neuroscience News Communications – Neuroscience News
Image: The image is credited to Neuroscience News
Original Research: Open access.
“Mind captioning: Evolving descriptive text of mental content from human brain activity” by Tomoyasu Horikawa. Science Advances
Abstract
Mind captioning: Evolving descriptive text of mental content from human brain activity
A central challenge in neuroscience is decoding brain activity to uncover mental content comprising multiple components and their interactions.
Despite progress in decoding language-related information from human brain activity, generating comprehensive descriptions of complex mental content associated with structured visual semantics remains challenging.
We present a method that generates descriptive text mirroring brain representations via semantic features computed by a deep language model.
Constructing linear decoding models to translate brain activity induced by videos into semantic features of corresponding captions, we optimized candidate descriptions by aligning their features with brain-decoded features through word replacement and interpolation.
This process yielded well-structured descriptions that accurately capture viewed content, even without relying on the canonical language network.
The method also generalized to verbalize recalled content, functioning as an interpretive interface between mental representations and text and simultaneously demonstrating the potential for nonverbal thought–based brain-to-text communication, which could provide an alternative communication pathway for individuals with language expression difficulties, such as aphasia.