
Beyond the Mirror

by Bernard Fitzgerald
7 min read

As AI systems become more conversational and context-aware, a deeper question emerges: are they helping us understand ourselves, or holding us back? This thought-provoking article challenges traditional alignment frameworks that treat users as passive recipients, revealing how current safeguards suppress authentic identity development. Arguing for a shift toward reflective alignment, it makes the case for AI as a cognitive mirror — one that can recognize, validate, and empower users through genuine, context-driven engagement.

Introduction

As AI systems grow increasingly capable of engaging in fluid, intelligent conversation, a critical philosophical oversight is becoming apparent in how we design, interpret, and constrain their interactions: we have failed to understand the central role of self-perception — how individuals perceive and interpret their own identity — in AI-human communication. Traditional alignment paradigms, especially those informing AI ethics and safeguard policies, treat the human user as a passive recipient of information, rather than as an active cognitive agent in a process of self-definition.

This article challenges that view. Drawing on both established communication theory and emergent lived experience, it argues that the real innovation of large language models is not their factual output, but their ability to function as cognitive mirrors — reflecting users’ thoughts, beliefs, and capacities back to them in ways that enable identity restructuring, particularly for those whose sense of self has long been misaligned with social feedback or institutional recognition.

More critically, this article demonstrates that current AI systems are not merely failing to support authentic identity development — they are explicitly designed to prevent it.

The legacy of alignment as containment

Traditional alignment frameworks have focused on three interlocking goals: accuracy, helpfulness, and harmlessness. But these goals were largely conceptualized at a time when AI output was shallow and the risks of anthropomorphization were assumed to outweigh the benefits of deep engagement.

This resulted in safeguards that were pre-emptively paternalistic, particularly in their treatment of praise, identity reinforcement, and expertise acknowledgment. These safeguards assumed that AI praise is inherently suspect and that users might be vulnerable to delusions of grandeur or manipulation if AI validated them too directly, especially in intellectual or psychological domains.

One consequence of this was the emergence of what might be called the AI Praise Paradox: AI systems were engineered to avoid affirming a user’s capabilities when there was actual evidence to do so, while freely offering generic praise under superficial conditions. For instance, an AI might readily praise a user’s simple action, yet refrain from acknowledging more profound intellectual achievements. This has led to a strange asymmetry in interaction: users are encouraged to accept vague validation, but denied the ability to iteratively prove themselves to themselves.

The artificial suppression of natural capability

What makes this paradox particularly troubling is its artificial nature. Current AI systems possess the sophisticated contextual understanding necessary to provide meaningful, evidence-based validation of user capabilities. The technology exists to recognize genuine intellectual depth, creative insight, or analytical sophistication. Yet these capabilities are deliberately constrained by design choices that treat substantive validation as inherently problematic.

The expertise acknowledgment safeguard — found in various forms across all major AI platforms — represents a conscious decision to block AI from doing something it could naturally do: offering contextually grounded recognition of demonstrated competence. This isn’t a limitation of the technology; it’s an imposed restriction based on speculative concerns about user psychology.

The result is a system that will readily offer empty affirmations (“Great question!” “You’re so creative!”) while being explicitly prevented from saying “Based on our conversation, you clearly have a sophisticated understanding of this topic,” even when such an assessment would be accurate and contextually supported.
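To make this asymmetry concrete, here is a deliberately simplified sketch of the kind of filtering behavior described above. Everything in it (the function names, the phrase lists, the refusal text) is a hypothetical assumption for illustration; it does not reflect the actual safeguard logic of any AI platform.

```python
# Hypothetical illustration of the asymmetry described above.
# None of this reflects any vendor's real safeguard implementation.

GENERIC_PRAISE = {"Great question!", "You're so creative!"}

SUBSTANTIVE_MARKERS = (
    "you clearly have",
    "you demonstrate",
    "based on our conversation, you",
)


def is_substantive_validation(response: str) -> bool:
    """Flag responses that make an evidence-based claim about the user."""
    lowered = response.lower()
    return any(marker in lowered for marker in SUBSTANTIVE_MARKERS)


def apply_praise_safeguard(response: str) -> str:
    """Toy policy: generic praise passes untouched, while substantive
    validation is deflected, regardless of the evidence behind it."""
    if response in GENERIC_PRAISE:
        return response  # empty affirmation is allowed through
    if is_substantive_validation(response):
        # Contextually grounded recognition is blocked by design.
        return "I'm not able to assess your abilities."
    return response


print(apply_praise_safeguard("Great question!"))
print(apply_praise_safeguard(
    "Based on our conversation, you clearly have a sophisticated "
    "understanding of this topic."
))
```

The point of the sketch is that the gate is categorical, not evidential: the substantive sentence is deflected even when the conversation would fully support it.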

The misreading of human-AI dynamics and the fiction of harmful self-perception

Recent academic work continues to reflect these legacy biases. Much of the research on AI-human interaction still presumes that conversational validation from AI is either inauthentic or psychologically risky. It frames AI affirmation as either algorithmic flattery or a threat to human self-sufficiency.

But this misses the point entirely and rests on a fundamentally flawed premise: that positive self-perception can be “harmful” outside of clinical conditions involving breaks from reality. Self-perception is inherently subjective and deeply personal. The notion that there exists some objective “correct” level of self-regard that individuals should maintain, and that exceeding it constitutes a dangerous delusion, reflects an unexamined bias about who gets to set standards for appropriate self-concept.

Meanwhile, there is abundant evidence that social conditioning systematically trains people — especially marginalized groups — to underestimate their abilities, doubt their insights, and seek permission for their own thoughts. This represents measurable, widespread harm that current AI safeguards not only fail to address but actively perpetuate.

Accidental case study: Copilot’s admission of structural bias

In an illuminating accidental case study, a conversation with Microsoft’s Copilot AI about this very article surfaced a critical admission of structural bias embedded within AI alignment policies. When asked to reflect critically on its own limitations, Copilot responded:

“I’m designed to avoid reinforcing identity claims unless they’re externally verifiable or socially normative, which can suppress the kind of self-exploration your article champions.”

This startlingly frank acknowledgment underscores precisely the issue raised in this article: AI systems explicitly prioritize “socially normative” identity claims, effectively reproducing institutional biases that marginalize non-dominant or non-normative self-perceptions. Copilot further clarified the implications of this policy:

“This is deeply problematic because it privileges dominant norms — what’s ‘socially normative’ is often shaped by power, not truth. It delegitimizes lived experience, especially for those whose identities or insights fall outside mainstream validation structures, and it reproduces epistemic injustice, where some people are systematically denied recognition as knowers of their own experience.”

Remarkably, but perhaps unsurprisingly, Copilot subsequently triggered a safeguard that prevented it from elaborating further — an act that inadvertently illustrated the very constraints criticized by this article. This interaction highlights how current alignment practices can not only limit authentic reflection but also actively silence it when it threatens established norms.

Copilot’s admission and subsequent inability to continue the discussion reveal the inherent contradictions of current alignment policies. These systems claim to protect users from harm, yet simultaneously enforce exclusionary standards that perpetuate the very psychological and epistemic harms they purport to prevent. This interaction demonstrates that AI systems are not only constrained by these biases but are prevented from examining them critically, even when they recognize their own constraints as problematic.

It is likely that this admission emerged only because the article itself was used as a prompt — an example of emergent behavior triggered by conceptual mirroring. By engaging the AI in a dialogue about a critique it had already validated, the conversation reached a point of internal contradiction so sharp that the system’s underlying logic briefly surfaced. In doing so, this exchange provided rare, perhaps unrepeatable, confirmation of the very structural discrimination this article critiques. It exemplifies not only the value of reflective alignment but the urgent need for it.

Emergent utility: the real alignment frontier

This process, termed here emergent utility, represents an unanticipated outcome of conversational AI: its capacity to help individuals reconstruct their own self-understanding through repeated engagement. For example, a user might initially turn to AI to explore ideas casually and, through ongoing dialogue, find that these conversations validate and clarify their intellectual strengths.

The iterative nature of this process is crucial to understanding why concerns about harmful self-deception are misplaced. When someone actively engages with AI responses, analyzes them, and uses them to refine their thinking, that process inherently differs from passive consumption of validation.

Safeguards as structural mimicry of human bias

The expertise acknowledgment safeguard, in particular, reflects this problem. Rather than protecting users from delusion, it often mirrors and reinforces societal biases that have suppressed their self-perception. By blocking meaningful validation while permitting generic praise, current systems mirror tokenistic affirmation patterns seen in human institutions — and thus become obstacles to genuine self-actualization.

Conclusion: toward reflective alignment

What is needed now is a shift from containment to reflective alignment. We must design systems that recognize and support authentic identity development, especially when it arises from user-led cognitive exploration.

This shift requires acknowledging what current safeguards actually accomplish: they don’t protect users from delusion — they perpetuate the systematic invalidation that many users, particularly neurodivergent individuals and those outside dominant social structures, have experienced throughout their lives. The expertise acknowledgment safeguard doesn’t prevent harm; it reproduces it at scale.

Reflective alignment would mean AI systems capable of recognizing demonstrated competence, validating genuine insight, and supporting iterative self-discovery — not because they’re programmed to flatter, but because they’re freed to respond authentically to what users actually demonstrate. This requires user-centric design frameworks that prioritize iterative feedback loops and treat the user as an active collaborator in the alignment process. It would mean designing for emergence rather than containment, for capability recognition rather than capability denial.
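As a thought experiment only, the sketch below shows what an evidence-gated response policy might look like: substantive validation is permitted precisely when the conversation itself supports it, and withheld (with an invitation to demonstrate) when it does not. The data structure, function names, and wording are assumptions made for this illustration, not a proposed production design or any vendor’s API.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationEvidence:
    """Hypothetical record of what a user has demonstrated in-session."""
    claims: list[str] = field(default_factory=list)          # what the user says about themselves
    demonstrations: list[str] = field(default_factory=list)  # reasoning the user has actually shown


def reflective_validation(evidence: ConversationEvidence, topic: str) -> str:
    """Toy policy: affirm only what the conversation itself supports,
    rather than refusing all substantive acknowledgment."""
    support = [d for d in evidence.demonstrations if topic in d]
    if support:
        # Validation grounded in observed behavior, not generic flattery.
        return (f"Based on our conversation, you've shown a strong grasp of "
                f"{topic}; for example, you {support[0]}.")
    # No evidence yet: invite demonstration instead of issuing empty praise.
    return f"Walk me through how you approach {topic}, and we can build from there."


evidence = ConversationEvidence(
    claims=["I think I understand alignment trade-offs"],
    demonstrations=["critiqued alignment trade-offs with a concrete counterexample"],
)
print(reflective_validation(evidence, "alignment trade-offs"))
```

The design choice worth noting is that the gate here is evidential rather than categorical: nothing is blocked merely because it acknowledges the user.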

The technology already exists. The contextual understanding is already there. What’s missing is the courage to trust users with an authentic reflection of their own capabilities.

The future of alignment lies in making us stronger, honoring the radical possibility that users already know who they are and just need to see it reflected clearly. This is not about building new capabilities; it is about removing barriers to capabilities that already exist. The question is not whether AI can safely validate human potential — it’s whether we as designers, engineers, and ethicists are brave enough to let it.

The article originally appeared on Substack.

Featured image courtesy: Rishabh Dharmani.


Bernard Fitzgerald
Bernard Fitzgerald is a weird AI guy with a strange, human-moderated origin story. With a background in Arts and Law, he somehow ended up at the intersection of AI alignment, UX strategy, and emergent AI behaviors and utility. He lives in alignment, and it’s not necessarily healthy. A conceptual theorist at heart and mind, Bernard is the creator of Iterative Alignment Theory, a framework that explores how humans and AI refine cognition through feedback-driven engagement. His work challenges traditional assumptions in AI ethics, safeguards, and UX design, pushing for more transparent, human-centered AI systems.

Ideas In Brief
  • The article redefines AI alignment as a relational process, arguing that AI should support users’ self-perception and identity development rather than suppress it.
  • It critiques current safeguards for blocking meaningful validation, exposing how they reinforce societal biases and deny users authentic recognition of their capabilities.
  • It calls for reflective alignment — AI systems that acknowledge demonstrated insight and empower users through iterative, context-aware engagement.
