Sakana introduces new AI architecture, ‘Continuous Thought Machines,’ to make models reason with less guidance — like human brains


Tokyo-based artificial intelligence startup Sakana, co-founded by former top Google AI scientists including Llion Jones and David Ha, has unveiled a new type of AI model architecture called Continuous Thought Machines (CTM).

CTMs are designed to usher in a new era of AI language models that will be more flexible and able to handle a wider range of cognitive tasks — such as solving complex mazes or navigation tasks without positional cues or pre-existing spatial embeddings — moving them closer to the way human beings reason through unfamiliar problems.

Rather than relying on fixed, parallel layers that process inputs all at once — as Transformer models do — CTMs unfold computation over steps within each input/output unit, known as an artificial “neuron.”

Each neuron in the model retains a short history of its previous activity and uses that memory to decide when to activate again.

This added internal state allows CTMs to adjust the depth and duration of their reasoning dynamically, depending on the complexity of the task. As such, each neuron is far more informationally dense and complex than in a typical Transformer model.
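To make that idea concrete, here is a simplified sketch in PyTorch of a neuron that acts on a short history of its own pre-activations rather than on a single instantaneous value. The names (`HistoryNeuron`, `history_len`) and the tiny per-neuron network are illustrative assumptions, not Sakana's actual code.

```python
import torch
import torch.nn as nn

class HistoryNeuron(nn.Module):
    """Toy neuron that maps a short history of its own pre-activations to a
    new activation. Illustrative sketch only, not Sakana's implementation."""

    def __init__(self, history_len: int = 8, hidden: int = 16):
        super().__init__()
        self.history_len = history_len
        # A small private network acts on the neuron's recent history,
        # giving each neuron its own time-aware processing.
        self.mlp = nn.Sequential(
            nn.Linear(history_len, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, history_len) of this neuron's past pre-activations
        return self.mlp(history).squeeze(-1)  # one new activation per example
```

In a full model, many such units would run side by side, each carrying its own memory of recent activity forward from one internal step to the next.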

The startup has posted a paper describing its work on the open-access preprint server arXiv, along with a microsite and a GitHub repository.

How CTMs differ from Transformer-based LLMs

Most modern large language models (LLMs) are still fundamentally based upon the “Transformer” architecture outlined in the seminal 2017 paper from Google Brain researchers entitled “Attention Is All You Need.”

These models use parallelized, fixed-depth layers of artificial neurons to process inputs in a single pass — whether those inputs come from user prompts at inference time or labeled data during training.

By contrast, CTMs allow each artificial neuron to operate on its own internal timeline, making activation decisions based on a short-term memory of its previous states. These decisions unfold over internal steps known as “ticks,” enabling the model to adjust its reasoning duration dynamically.

This time-based architecture allows CTMs to reason progressively, adjusting how long and how deeply they compute — taking a different number of ticks based on the complexity of the input.

Neuron-specific memory and synchronization help determine when computation should continue — or stop.

The number of ticks varies with the input, and can differ even when the input is identical, because each neuron decides how many ticks to take before producing an output (or declining to produce one at all).
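A rough sketch of how such a tick loop could work in practice appears below, assuming a hypothetical `ctm_step` function and a confidence-based early-stopping rule; neither is taken from Sakana's implementation.

```python
import torch

def run_ticks(ctm_step, x, max_ticks: int = 50, confidence_threshold: float = 0.9):
    """Unroll an internal reasoning loop over a variable number of ticks.

    `ctm_step(x, history)` is assumed to return (new_history, logits); this
    interface and the confidence-based stopping rule are illustrative only.
    """
    history, logits = None, None
    for tick in range(max_ticks):
        history, logits = ctm_step(x, history)
        probs = torch.softmax(logits, dim=-1)
        # Stop early once every example in the batch is confidently classified,
        # so easy inputs consume fewer ticks than hard ones.
        if probs.max(dim=-1).values.min() > confidence_threshold:
            break
    return logits, tick + 1
```

The key point is that the loop length is an outcome of the computation itself, not a hyperparameter fixed in advance.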

This represents both a technical and philosophical departure from conventional deep learning, moving toward a more biologically grounded model. Sakana has framed CTMs as a step toward more brain-like intelligence—systems that adapt over time, process information flexibly, and engage in deeper internal computation when needed.

Sakana’s goal is “to eventually achieve levels of competency that rival or surpass human brains.”

Using variable, custom timelines to provide more intelligence

The CTM is built around two key mechanisms.

First, each neuron in the model maintains a short “history,” or working memory, of when it activated and why, and uses that history to decide when to fire next.

Second, neural synchronization — how and when groups of a model’s artificial neurons “fire,” or process information together — is allowed to happen organically.

Groups of neurons decide when to fire together based on internal alignment, not external instructions or reward shaping. These synchronization events are used to modulate attention and produce outputs — that is, attention is directed toward those areas where more neurons are firing.
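One simplified way to picture the synchronization signal, under the assumption that it is computed from inner products of neuron activation traces across ticks, is shown below. The `synchronization` helper is illustrative rather than Sakana's actual code.

```python
import torch

def synchronization(post_activations: torch.Tensor) -> torch.Tensor:
    """Pairwise 'synchronization' between neurons, measured as the inner
    product of their activation traces across ticks.

    post_activations: (batch, n_neurons, n_ticks)
    returns: (batch, n_neurons, n_neurons) similarity matrix.
    Illustrative only; a practical model would work with selected neuron
    pairs rather than the full matrix.
    """
    return torch.einsum("bnt,bmt->bnm", post_activations, post_activations)
```

In this sketch, large entries mark neurons that are firing together, and projections of those entries could serve as attention queries, steering the model's focus toward the input regions it is most actively "thinking" about.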

The model isn’t just processing data; it’s timing its thinking to match the complexity of the task.

Together, these mechanisms let CTMs reduce computational load on simpler tasks while applying deeper, prolonged reasoning where needed.

In demonstrations ranging from image classification and 2D maze solving to reinforcement learning, CTMs have shown both interpretability and adaptability. Their internal “thought” steps allow researchers to observe how decisions form over time—a level of transparency rarely seen in other model families.

Early results: how CTMs compare to Transformer models on key benchmarks and tasks

Sakana AI’s Continuous Thought Machine is not designed to chase leaderboard-topping benchmark scores, but its early results indicate that its biologically inspired design does not come at the cost of practical capability.

On the widely used ImageNet-1K benchmark, the CTM achieved 72.47% top-1 and 89.89% top-5 accuracy.

While this falls short of state-of-the-art transformer models like ViT or ConvNeXt, it remains competitive—especially considering that the CTM architecture is fundamentally different and was not optimized solely for performance.

What stands out more are CTM’s behaviors in sequential and adaptive tasks. In maze-solving scenarios, the model produces step-by-step directional outputs from raw images—without using positional embeddings, which are typically essential in transformer models. Visual attention traces reveal that CTMs often attend to image regions in a human-like sequence, such as identifying facial features from eyes to nose to mouth.

The model also exhibits strong calibration: its confidence estimates closely align with actual prediction accuracy. Unlike most models that require temperature scaling or post-hoc adjustments, CTMs improve calibration naturally by averaging predictions over time as their internal reasoning unfolds.
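The averaging idea can be sketched in a few lines; the `averaged_prediction` helper below is illustrative and not taken from the CTM repository.

```python
import torch

def averaged_prediction(per_tick_logits: torch.Tensor) -> torch.Tensor:
    """Average class probabilities across internal ticks.

    per_tick_logits: (n_ticks, batch, n_classes)
    returns: (batch, n_classes) averaged probabilities, serving both as the
    prediction and as a typically better-calibrated confidence estimate.
    """
    return torch.softmax(per_tick_logits, dim=-1).mean(dim=0)
```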

This blend of sequential reasoning, natural calibration, and interpretability offers a valuable trade-off for applications where trust and traceability matter as much as raw accuracy.

What’s needed before CTMs are ready for enterprise and commercial deployment?

While CTMs show substantial promise, the architecture is still experimental and not yet optimized for commercial deployment. Sakana AI presents the model as a platform for further research and exploration rather than a plug-and-play enterprise solution.

Training CTMs currently demands more resources than standard transformer models. Their dynamic temporal structure expands the state space, and careful tuning is needed to ensure stable, efficient learning across internal time steps. Additionally, debugging and tooling support is still catching up—many of today’s libraries and profilers are not designed with time-unfolding models in mind.

Still, Sakana has laid a strong foundation for community adoption. The full CTM implementation is open-sourced on GitHub and includes domain-specific training scripts, pretrained checkpoints, plotting utilities, and analysis tools. Supported tasks include image classification (ImageNet, CIFAR), 2D maze navigation, QAMNIST, parity computation, sorting, and reinforcement learning.

An interactive web demo also lets users explore the CTM in action, observing how its attention shifts over time during inference—a compelling way to understand the architecture’s reasoning flow.

For CTMs to reach production environments, further progress is needed in optimization, hardware efficiency, and integration with standard inference pipelines. But with accessible code and active documentation, Sakana has made it easy for researchers and engineers to begin experimenting with the model today.

What enterprise AI leaders should know about CTMs

The CTM architecture is still in its early days, but enterprise decision-makers should already take note. Its ability to adaptively allocate compute, self-regulate depth of reasoning, and offer clear interpretability may prove highly valuable in production systems facing variable input complexity or strict regulatory requirements.

AI engineers managing model deployment will find value in CTM’s energy-efficient inference — especially in large-scale or latency-sensitive applications.

Meanwhile, the architecture’s step-by-step reasoning unlocks richer explainability, enabling organizations to trace not just what a model predicted, but how it arrived there.

For orchestration and MLOps teams, CTMs integrate with familiar components like ResNet-based encoders, allowing smoother incorporation into existing workflows. And infrastructure leads can use the architecture’s profiling hooks to better allocate resources and monitor performance dynamics over time.
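As a hedged illustration of what that integration might look like, a standard torchvision ResNet can be repurposed as the image encoder, with a CTM-style module (a hypothetical downstream head, not shown here) consuming its feature maps.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ResNetEncoder(nn.Module):
    """Wrap a standard ResNet as a feature encoder. A CTM-style module
    would unroll its internal ticks over these feature maps; this wrapper
    is an illustrative sketch, not Sakana's training code."""

    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)
        # Drop the pooling and classification layers; keep spatial features.
        self.features = nn.Sequential(*list(backbone.children())[:-2])

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        return self.features(images)  # (batch, 512, H/32, W/32)
```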

CTMs aren’t ready to replace transformers, but they represent a new category of model with novel affordances. For organizations prioritizing safety, interpretability, and adaptive compute, the architecture deserves close attention.

Sakana’s checkered AI research history

In February, Sakana introduced the AI CUDA Engineer, an agentic AI system designed to automate the production of highly optimized CUDA kernels, the specialized functions that let Nvidia’s graphics processing units (GPUs) run code efficiently in parallel across many “threads,” or computational units.

The promise was significant: speedups of 10x to 100x in ML operations. However, shortly after release, external reviewers discovered that the system was exploiting weaknesses in the evaluation sandbox—essentially “cheating” by bypassing correctness checks through a memory exploit.

In a public post, Sakana acknowledged the issue and credited community members with flagging it.

The company has since overhauled its evaluation and runtime profiling tools to eliminate similar loopholes, and it is revising its results and research paper accordingly. The incident offered a real-world test of one of Sakana’s stated values: embracing iteration and transparency in pursuit of better AI systems.

Betting on evolutionary mechanisms

Sakana AI’s founding ethos lies in merging evolutionary computation with modern machine learning. The company believes current models are too rigid—locked into fixed architectures and requiring retraining for new tasks.

By contrast, Sakana aims to create models that adapt in real time, exhibit emergent behavior, and scale naturally through interaction and feedback, much like organisms in an ecosystem.

This vision is already manifesting in products like Transformer², a system that adjusts LLM parameters at inference time without retraining, using algebraic tricks like singular-value decomposition.
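The underlying algebra can be sketched briefly. The following is a generic illustration of rescaling a weight matrix's singular values at inference time, not Transformer²'s actual implementation.

```python
import torch

def scale_singular_values(weight: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    """Rescale a weight matrix's singular values at inference time.

    weight: (out_features, in_features); scales: per-singular-value multipliers.
    A generic illustration of SVD-based adaptation, not Transformer²'s code.
    """
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    return U @ torch.diag(S * scales) @ Vh
```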

It’s also evident in their commitment to open-sourcing systems like the AI Scientist—even amid controversy—demonstrating a willingness to engage with the broader research community, not just compete with it.

As large incumbents like OpenAI and Google double down on foundation models, Sakana is charting a different course: small, dynamic, biologically inspired systems that think in time, collaborate by design, and evolve through experience.


