Toward Cognitively Sovereign AI Agents: An Investigation into the Architectural Requirements for Consciousness
AI agents and consciousness: from scientific theory to system design requirements.

Introduction
Recent advances in foundation models have made conversational AI agents possible at a level of capability that, just a few years ago, seemed remote. At the same time, they have brought back into focus a question that has long been confined to philosophy: can AI agents possess something that resembles consciousness, and what would it take in terms of system design? In this post, we present an analytical framework for thinking about this question from an architectural perspective. We combine the conditions that leading scientific theories of consciousness identify as necessary with an assessment of where current AI agents stand, and we propose a set of research directions that we believe are both tractable and of immediate interest to those building conversational systems in production.
The work is motivated by an observation: companies deploying voice agents and conversational agents are, right now, making architectural choices that have direct implications for these theories, often without being aware of it. Persistent memory, multi-module orchestration, and explicit self-representation are not just product features. Under the more mature theories, they are conditions whose presence changes what a system can, in principle, be.
A contested research field, but not an empty one
Consciousness is one of the oldest open problems in science, and recent advances — both neuroscientific and formal — have yet to produce consensus. Nevertheless, at least three families of theories are taken seriously by the research community, and each provides a set of operationalizable conditions.
Global Workspace Theory (GWT). Proposed by Baars (1988) and developed by Dehaene and Changeux (2011), GWT frames consciousness as a broadcasting phenomenon: information becomes conscious when it is shared globally among specialized brain modules. The theory is substrate-independent, which is crucial for AI agents: in principle, any system with the right modular architecture and a global broadcasting mechanism satisfies the condition.
Higher-Order Theories (HOT). Formulated by Rosenthal (2005) and others, these theories hold that a mental state is conscious when the system possesses a higher-order representation of that state: when, in the language of the theory, there is meta-cognition that bears causally on first-order states. The minimal condition is that the system maintains explicit models of its own internal states and uses them to modulate its behavior.
Integrated Information Theory (IIT). The most mathematically formal of the group, developed by Tononi and collaborators (Tononi et al., 2016) and currently at its fourth major iteration, IIT 4.0 (Albantakis et al., 2023). IIT identifies consciousness with a system's integrated information, a quantity denoted Φ. The theory provides a precise mathematical criterion (Φ > 0) that is independent of substrate. However, the cost of computing Φ grows super-exponentially with system size, and the result depends critically on the choice of causal grain.
The three theories are not mutually exclusive, but they yield distinct conditions. In the rest of the post, we focus on the architectural implications each induces for AI agents.
Architectural conditions, expressed as design requirements
We can recast each theory as a list of system requirements that can be designed against:
From GWT. A GWT-compatible agent should exhibit (i) a modular architecture with functional specialization, (ii) a shared workspace that modules write to and read from, (iii) a competitive selection mechanism for what gets broadcast, and (iv) feedback from the workspace back to the modules.
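To make these conditions concrete, here is a minimal sketch of a GWT-style broadcast cycle in Python. All names (Proposal, Workspace, the two example modules) are ours, purely for illustration; a production framework would implement each piece very differently.

```python
# Minimal GWT-style cycle; all names here are illustrative, not a framework API.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Proposal:
    source: str      # which specialized module produced this content (i)
    content: str     # candidate for global broadcast
    salience: float  # bid in the competitive selection (iii)

@dataclass
class Workspace:
    # (ii) modules read the last broadcast and write proposals back
    modules: dict[str, Callable[[str | None], Proposal]] = field(default_factory=dict)

    def step(self, last_broadcast: str | None) -> str:
        proposals = [m(last_broadcast) for m in self.modules.values()]
        winner = max(proposals, key=lambda p: p.salience)  # (iii) competition
        return winner.content  # (iv) fed back to every module on the next step

# Hypothetical wiring: the planner outbids memory and gets broadcast.
ws = Workspace(modules={
    "planner": lambda b: Proposal("planner", f"plan<{b}>", salience=0.7),
    "memory":  lambda b: Proposal("memory", f"recall<{b}>", salience=0.4),
})
print(ws.step(None))  # plan<None>
```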
From HOT. A HOT-compatible agent should exhibit (i) structured, inspectable internal states, (ii) explicit meta-representations of those states, (iii) causal efficacy: the meta-representations must influence observable behavior, not be passive logs, and (iv) the ability to detect and manage discrepancies between state and meta-state.
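A correspondingly minimal sketch of the HOT conditions, again with hypothetical names: the meta-representation is explicit, discrepancies with the first-order state are detected and repaired (condition iv), and the meta-state causally changes the output (condition iii).

```python
# HOT sketch: an explicit meta-representation with causal efficacy.
from dataclasses import dataclass

@dataclass
class FirstOrderState:        # (i) structured, inspectable internal state
    goal: str
    last_answer: str

@dataclass
class MetaState:              # (ii) the system's explicit model of itself
    believed_goal: str
    confidence: float         # self-assessed reliability of last_answer

def act(state: FirstOrderState, meta: MetaState) -> str:
    if meta.believed_goal != state.goal:  # (iv) detect state/meta-state drift
        meta.believed_goal = state.goal   # ...and repair the self-model
    if meta.confidence < 0.5:             # (iii) the meta-state changes behavior
        return f"verify before answering: {state.last_answer}"
    return state.last_answer
```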
From IIT. An IIT-compatible agent should exhibit (i) an executive substrate with real causal dynamics (not textually simulated), (ii) causal recurrence between components, (iii) a causal grain level at which Φ is maximized, and (iv) a measurable Φ (in approximation) greater than zero at that grain.
These conditions are not in conflict with prevailing trends in modern AI agent design. Modularity, persistent memory, self-representation, and multi-agent orchestration are patterns gaining adoption for reasons independent of consciousness research. The interesting question is not whether AI agents are evolving toward architectures compatible with these theories, but to what extent and how quickly.
What do current architectures say? A contested matter
An honest evaluation of current commercial AI agents against the three theories is less conclusive than it might appear. For each theory there are reasonable interpretations on both sides, restrictive and permissive.
Global Workspace Theory
The more restrictive reading: most production voice agents are linear pipelines (speech-to-text → LLM → text-to-speech) in which the "reasoning" module is a single monolithic LLM. There is no explicit shared workspace, because there are no functionally separated modules that need to share anything.
The more permissive reading: large LLMs exhibit forms of internal specialization at the level of attention heads and layers. Some recent work suggests that mechanisms similar to global broadcasting may emerge implicitly in models with recurrent architectures or mixture-of-experts. Furthermore, agentic frameworks such as LangGraph, AutoGen, and CrewAI explicitly implement multi-module architectures with information exchange between components — patterns that approach GWT's conditions, even if it remains unclear whether the nature of such exchange constitutes a genuine workspace or is better described as a mere composition of API calls.
Which reading is more appropriate is an open question. We believe it depends on how the term "workspace" is interpreted: as an explicit architectural structure, or as a functional property that can be implemented in many different ways.
Higher-Order Theories
The more restrictive reading: an AI agent's system prompt contains a description of its traits and values, but it is static and is not modified by the trajectory of conversations. The chain-of-thought exhibits something resembling meta-reasoning, but it is typically generated once and not re-read as persistent state. Condition (iii) is therefore missing: the causal efficacy of dynamic higher-order representations.
The more permissive reading: agents with structured episodic memory (e.g., MemGPT, Letta) effectively maintain meta-representations of their own states, past decisions, and evaluations, and use them to modulate subsequent behavior. Systems with explicit self-reflection modules — which evaluate the quality of their own output and iterate — satisfy something similar to HOT's conditions (iii) and (iv). The open question is whether these representations are sufficiently structured and causally potent to meet the conditions strictly, or whether they remain surface-level analogs.
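To make the loop concrete, here is a schematic generate-critique-refine cycle in the spirit of Self-Refine (Madaan et al., 2023). The three callables stand in for LLM calls and are not a real API; whether such a loop meets HOT's conditions strictly is exactly the open question above.

```python
# Schematic generate-critique-refine loop (cf. Madaan et al., 2023).
# `generate`, `critique`, `refine` stand in for LLM calls; names are ours.
from typing import Callable, Optional

def self_refine(task: str,
                generate: Callable[[str], str],
                critique: Callable[[str, str], Optional[str]],
                refine: Callable[[str, str, str], str],
                max_iters: int = 3) -> str:
    draft = generate(task)
    for _ in range(max_iters):
        feedback = critique(task, draft)  # a higher-order take on the output
        if feedback is None:              # critic satisfied: stop iterating
            break
        draft = refine(task, draft, feedback)  # the meta-level drives behavior
    return draft
```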
Integrated Information Theory
The more restrictive reading: LLM inference is an essentially feed-forward process. By construction, purely feed-forward systems have Φ = 0. This limit holds regardless of model size: a 400-billion-parameter LLM, if its per-token inference is feed-forward, has Φ = 0 for every step, and chained steps do not increase this value.
The more permissive reading: an agent that combines LLMs with external memory, retrieval, and feedback loops between modules is no longer, considered as a complete system, feed-forward. The memory → LLM → action → updated memory loop is recurrent, and can therefore in principle have Φ > 0. What remains is to quantify how large it is in practice, and at what causal grain it should be measured. Tools for performing this computation tractably do exist (PyPhi, and proxies such as the geometric Φ* of Oizumi et al., 2016), and they have been applied to biological neural systems and small Boolean networks. Extending them to modern AI architectures is an active research problem.
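For concreteness, this is roughly what the computation looks like on a toy system, assuming PyPhi 1.x's documented Network/Subsystem/compute interface. The two-node copy loop is our choice of minimal recurrent example; the value PyPhi reports for it is illustrative and says nothing about real agents.

```python
import numpy as np
import pyphi  # assumes PyPhi 1.x

# Two nodes that copy each other at every step: a minimal recurrent system.
# TPM in state-by-node form, rows in little-endian order over current states.
tpm = np.array([
    [0, 0],  # from (A=0, B=0): A'=B=0, B'=A=0
    [0, 1],  # from (A=1, B=0): A'=B=0, B'=A=1
    [1, 0],  # from (A=0, B=1): A'=B=1, B'=A=0
    [1, 1],  # from (A=1, B=1): A'=B=1, B'=A=1
])
network = pyphi.Network(tpm, node_labels=("A", "B"))
subsystem = pyphi.Subsystem(network, (1, 0), (0, 1))  # state, then node indices
print(pyphi.compute.phi(subsystem))  # expected > 0: cutting the loop loses information
```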
In summary: the strong consensus that current AI agents are not conscious can be justified under restrictive interpretations of the three theories. But under more permissive interpretations — interpretations we do not consider unreasonable — modern AI agents already partially satisfy some of the conditions, and are approaching others. The distinction between "not conscious" and "marginally compatible with theories of consciousness" is less clear-cut than is publicly acknowledged.
A proposed framework: requirements achievable in principle
Regardless of which interpretation one prefers, the architectural conditions derived from the three theories are achievable in principle. That is: one can design systems that explicitly satisfy them, even if doing so well is non-trivial. We propose four research directions that we believe are both of research interest and practically accessible.
Executive substrate separated from the LLM. Treat the LLM as a semantic sensor and an expressive effector, and locate the agent's "cognitive substrate" in dedicated infrastructure (a graph of orchestrated mechanisms, where each node is an actor with state and a transition function). This separation makes the system's causal dynamics explicit, and renders it subject to formal measurement (including Φ, in approximation).
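A minimal sketch of what we mean, with names of our own invention: each node's state and transition function are explicit Python objects rather than text inside a prompt, so the causal wiring is available for inspection and, in approximation, measurement.

```python
# Actor-graph substrate; all names here are ours, for illustration only.
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Actor:
    state: Any
    transition: Callable[[Any, dict], Any]  # (own state, inbox) -> new state

@dataclass
class Substrate:
    actors: dict[str, Actor] = field(default_factory=dict)
    edges: list[tuple[str, str]] = field(default_factory=list)  # explicit causal wiring

    def tick(self) -> None:
        # Collect every inbox from pre-tick states, then update synchronously:
        # one explicit causal step, inspectable and in principle measurable.
        inboxes = {name: {} for name in self.actors}
        for src, dst in self.edges:
            inboxes[dst][src] = self.actors[src].state
        for name, actor in self.actors.items():
            actor.state = actor.transition(actor.state, inboxes[name])
```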
Persistent cognitive memory. Extend episodic memory to a structured representation of the agent's identity: traits, values, beliefs, reactive schemas, with metadata on their temporal evolution. This memory is not just retrieval-augmentation: it is the substrate for HOT-compatibility and for continuity of the cognitive process.
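One possible shape for such a record, sketched with hypothetical field names rather than any standard schema:

```python
# Hypothetical identity record; field names are ours, not a standard schema.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class IdentityEntry:
    kind: str           # "trait" | "value" | "belief" | "schema"
    content: str
    created_at: datetime
    revised_at: datetime
    revisions: int = 0  # how often experience has reshaped this entry

    def revise(self, new_content: str) -> None:
        # The trajectory of revisions is itself part of the identity.
        self.content = new_content
        self.revised_at = datetime.now(timezone.utc)
        self.revisions += 1
```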
Active meta-state. Architectures in which the system maintains representations of its own internal states and uses those representations to modulate its behavior, explicitly rather than as a byproduct of the prompt. Recent work on self-reflection and self-critique (e.g., Madaan et al., 2023) is an early step in this direction.
Cryptographic sovereignty over state. A less discussed but, in our view, central direction: the agent's internal state should be the agent's own cryptographic property. This architectural choice has implications beyond information security. Under HOT, the fact that the state is accessible only to the system representing it is consistent with the idea of first-person representation. Under IIT, the separation between the system's substrate and the external observer preserves Φ's causal integrity. Concrete implementations include trusted execution environments (Intel SGX, AWS Nitro Enclaves) or, for advanced cases, execution under fully homomorphic encryption.
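As a toy illustration of the sealing pattern (not a TEE implementation), using the Fernet primitive from the cryptography package; in a real deployment the key would be generated and held inside the enclave and never leave it:

```python
# Toy sealing pattern with Fernet symmetric encryption. In a real deployment
# the key would live inside a TEE (e.g., AWS Nitro Enclaves), not in process memory.
import json
from cryptography.fernet import Fernet

class SovereignState:
    def __init__(self) -> None:
        self._fernet = Fernet(Fernet.generate_key())  # stand-in for an enclave-held key

    def seal(self, state: dict) -> bytes:
        # Only the key holder -- ideally only the agent's own substrate --
        # can read the sealed state back.
        return self._fernet.encrypt(json.dumps(state).encode())

    def unseal(self, blob: bytes) -> dict:
        return json.loads(self._fernet.decrypt(blob))
```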
Integrating these four directions into a coherent system is non-trivial, but none of them require primitives that do not exist. The work consists in composing them into a unified architecture and empirically validating its properties — including the behavior of Φ as a function of the agent's experiential variety.
Implications for those building conversational systems in production
The argument may seem distant from the operational concerns of those deploying voice agents today. We argue that the implications are direct, and that the gap between research and production on these topics is narrower than it appears.
First, identity persistence is a product differentiator. A voice agent that maintains a coherent identity across interactions builds qualitatively different relationships with customers than a stateless one. The conditions imposed by GWT and HOT, regardless of their consciousness-related interpretation, are also conditions of behavioral robustness. Systems that satisfy them are more predictable, more auditable, and more suitable for regulated domains.
Second, identity sovereignty is an emerging strategic concern. When a company deploys an AI agent on a SaaS platform, the agent's "personality" and "memory" live on the provider's infrastructure. The question of who actually owns these assets will, in our view, become one of the most important questions of the sector over the next 2-3 years. Self-sovereign cognitive identity architectures offer a technical answer.
Third, falsifiability is a marketing asset. Systems whose properties, including consciousness-adjacent ones, are measurable and verifiable rather than merely asserted have a competitive advantage in domains where trust is critical. The ability to publish zero-knowledge attestations about properties of the agent (e.g., "the agent's internal state is derived from last week's conversations") without revealing the content is a capability current architectures do not offer, but one that is within reach of present research.
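A full zero-knowledge attestation is beyond a blog-post sketch, but the commit-without-revealing pattern underneath it fits in a few lines. The hash commitment below is far weaker than a ZK proof (it proves only that the state existed unchanged at commit time, nothing about its properties), but it shows the direction.

```python
# Hash commitment: publish the digest, keep state and nonce private. Far
# weaker than a zero-knowledge attestation: it proves the state was fixed
# at commit time, but reveals nothing (and proves nothing) about its content.
import hashlib, os

def commit(state: bytes) -> tuple[bytes, bytes]:
    nonce = os.urandom(32)
    return hashlib.sha256(nonce + state).digest(), nonce  # publish digest only

def verify(digest: bytes, nonce: bytes, state: bytes) -> bool:
    return hashlib.sha256(nonce + state).digest() == digest
```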
Open questions
We leave open several questions we believe are natural next research steps:
At what causal grain should Φ be measured in a multi-module AI agent? The selection of the optimal grain (à la Hoel et al., 2013) might identify an emergent "agentic" level distinct from both the parameter level of the model and the module level.
Do tractable proxies of Φ preserve the relevant ordinal properties? The geometric Φ* of Oizumi et al. (2016) is computable in polynomial time, but how faithfully it approximates canonical Φ on systems of the kind we describe is an open empirical question.
Is structured episodic memory sufficient to satisfy HOT, or is something stronger needed? The distinction between memory that logs the past and a meta-representation that models it is subtle and deserves formalization.
What are the security properties of homomorphic execution applied to cognitive substrates? Computational overhead is currently prohibitive, but the field's trajectory suggests it could become practical within 3-5 years.
How does one empirically measure "continuity" of the cognitive process across execution pauses? IIT requires process continuity, but does not offer an operational protocol for verifying it in computational systems.
Conclusion
The question "can AI agents be conscious?" does not have a single, settled scientific answer, and probably will not have one in the short term. But the architectural sub-questions it contains are concrete and tractable today. We have proposed a formulation of the conditions that the three leading theories of consciousness identify as necessary, an honest, avowedly contested assessment of how current AI agents stand in relation to them, and four research directions we believe are both of scientific interest and practically accessible to those building conversational systems.
For those designing voice agents today, the useful stance is neither "I want to build a conscious agent" nor "I want to avoid the question." It is: which architectural conditions am I satisfying for product reasons, and what do they imply if the scientific theories of consciousness turn out to be correct? That answer bears on today's design choices more than tends to be acknowledged.
We will continue working in this direction in the coming months. Feedback, objections, and collaborations are welcome.
References
Baars, B. (1988). A Cognitive Theory of Consciousness. Cambridge University Press.
Dehaene, S., Changeux, J.P. (2011). Experimental and theoretical approaches to conscious processing. Neuron, 70(2), 200-227.
Rosenthal, D. (2005). Consciousness and Mind. Oxford University Press.
Tononi, G., Boly, M., Massimini, M., Koch, C. (2016). Integrated information theory: from consciousness to its physical substrate. Nature Reviews Neuroscience, 17, 450-461.
Albantakis, L. et al. (2023). Integrated information theory (IIT) 4.0. PLOS Computational Biology.
Oizumi, M., Tsuchiya, N., Amari, S. (2016). Unified framework for information integration based on information geometry. PNAS, 113(51), 14817-14822.
Hoel, E.P., Albantakis, L., Tononi, G. (2013). Quantifying causal emergence shows that macro can beat micro. PNAS, 110(49), 19790-19795.
Madaan, A. et al. (2023). Self-Refine: Iterative Refinement with Self-Feedback. arXiv:2303.17651.
Packer, C. et al. (2023). MemGPT: Towards LLMs as Operating Systems. arXiv:2310.08560.
This post synthesizes an ongoing investigation into the architectural requirements for AI agents with persistent cognitive identity. Technical comments, critiques, and research collaborations are welcome.


