At the low level: Pattern recognition = extracting the statistical structure of the signal/input
At the high level: Prediction (next word, next action)
The resulting prediction error improves internal models
Updating on that error minimizes "free energy" (Friston)
The error itself is the "bottom-up correction" signal
What is the goal? How is it used? → Alignment
Value signal = Evaluation of whether the outcome was good/useful = Reward signal (dopamine system)
The brain constantly predicts:
the next word someone will say,
the next sensation coming from a movement,
the future consequences of an action,
AI systems do the same:
GPT predicts the next word
Diffusion models predict noise
Video models predict the next frame
Robotics models predict future states
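The shared objective across all of these is the same: predict what comes next, measure how wrong you were, and use that error to update the model. A minimal sketch below, using a toy bigram predictor and an invented corpus (nothing here corresponds to any specific system); the point is just that the "prediction error" is the surprise (negative log-probability) of what actually arrived.

```python
import numpy as np

# Toy corpus and bigram "model"; both are invented for illustration.
tokens = "the cat sat on the mat the cat ate".split()
vocab = sorted(set(tokens))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

counts = np.ones((V, V))                         # add-one smoothing = a weak prior
for prev, nxt in zip(tokens, tokens[1:]):
    counts[idx[prev], idx[nxt]] += 1             # experience updates the internal model
probs = counts / counts.sum(axis=1, keepdims=True)

def surprisal(prev, nxt):
    """Prediction error as surprise: -log P(next | current)."""
    return -np.log(probs[idx[prev], idx[nxt]])

print(surprisal("the", "cat"))   # low error: a frequently confirmed prediction
print(surprisal("the", "ate"))   # high error: the bottom-up correction signal
```

Minimizing this same quantity over huge corpora is, at heart, what GPT-style training does; the brain's version runs over sensations and actions instead of tokens.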
How high-level concepts, categories, knowledge, logic, and reasoning emerge from experience—and eventually become top-down controllers.
Memory consolidation → knowledge → reasoning → top-down guidance
The high-level concepts act like rules, schemas, priors, heuristics that guide future perception and action.
The transition from episodic → semantic memory is a form of:
Compression + abstraction + generalization
Compress thousands of raw episodes into a small set of stable conceptual structures.
Happens through:
Sleep consolidation,
Hippocampal replay: reactivating episodic memories
Cortical integration: merging them into shared representations
Concept formation: chunking experiences into stable schemas
Top-down projection: using those schemas to guide perception & reasoning
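A minimal sketch of the compression step, under loose assumptions: "episodes" are just noisy vectors, and KMeans stands in for replay + cortical integration (an analogy for illustration, not a claim about the actual mechanism). The output is a handful of stable prototypes, i.e. schemas.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# "Episodes": many noisy experience vectors generated around a few underlying
# situation types (the ground truth the brain never observes directly).
true_situations = rng.normal(size=(3, 8))
episodes = np.concatenate(
    [s + 0.3 * rng.normal(size=(400, 8)) for s in true_situations]
)

# Consolidation as compression + abstraction: merge ~1200 raw episodes
# into 3 stable prototypes ("schemas").
schemas = KMeans(n_clusters=3, n_init=10, random_state=0).fit(episodes)
print(episodes.shape)                    # (1200, 8) -> thousands of episodes
print(schemas.cluster_centers_.shape)    # (3, 8)    -> a few conceptual structures

# Top-down use: a new experience is interpreted through its nearest schema.
new_experience = true_situations[1] + 0.3 * rng.normal(size=8)
print(schemas.predict(new_experience.reshape(1, -1)))
```

The last line is exactly the "lens" idea that follows: once the schema exists, new input is read through it.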
Once a concept is formed, it becomes:
a prediction prior
a top-down bias
a reusable template
a lens through which new experiences are interpreted
This is what gives humans the ability to think:
abstractly
symbolically
logically
counterfactually
creatively
Transformers start by learning token-level patterns.
High-level reasoning emerges spontaneously (chain-of-thought, analogies, planning).
(e.g., DeepSeek-R1, OpenAI o1)
These reasoning structures begin to act as implicit top-down constraints on the next-token prediction.
As in collaborative filtering, LLMs' word embeddings capture:
hidden categories
latent factors
High-level embedding vectors ("meta-embeddings"), by contrast, would need to capture:
reasoning strategies
abstract schemas
causal templates
logical constraints
planning heuristics
domain-independent "rules of thought"
These would be:
learned from the model’s internal reasoning traces
distilled from many chains-of-thought
clustered into families
used as reusable reasoning modules
updated when they fail (just like semantic memory)
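A hedged sketch of that pipeline, assuming we already have some way to embed a reasoning trace (`embed_trace` below is a made-up placeholder; in practice it would come from the model's own hidden states or a separate encoder). Traces are embedded, clustered into families, and a new problem is routed to the nearest family's centroid, which plays the role of a reusable "meta-embedding."

```python
import hashlib
import numpy as np
from sklearn.cluster import KMeans

def embed_trace(trace: str) -> np.ndarray:
    """Hypothetical stand-in for embedding a chain-of-thought trace."""
    seed = int(hashlib.md5(trace.encode()).hexdigest()[:8], 16)
    return np.random.default_rng(seed).normal(size=64)

# Toy reasoning traces distilled from many chains-of-thought.
traces = [
    "case split on parity, then induction",
    "work backwards from the goal state",
    "estimate orders of magnitude, then refine",
    "translate the word problem into equations",
]
X = np.stack([embed_trace(t) for t in traces])

# Cluster traces into families; each centroid is a candidate reasoning module
# (a strategy-level latent factor rather than a word-level one).
families = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
meta_embeddings = families.cluster_centers_

# Routing: send a new problem to the nearest reasoning family.
query = embed_trace("prove the statement for n+1 given it holds for n")
family = int(np.argmin(np.linalg.norm(meta_embeddings - query, axis=1)))
print("use reasoning module:", family)
```

The "updated when they fail" step would close the loop: failed traces get re-embedded and the clusters refit, the same way semantic memory is revised.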
Sensory substitution → The brain doesn't care what modality the signal comes from. It cares how the signal is used
A blind person "seeing" with sound is not interpreting "sound". They're interpreting space
Top-down influence: expectations, goals, attention → Goal directed interpretation (e.g., "this means distance", "this means shape")
Perception = Pattern recognition (bottom-up) shaped by top-down prediction + behavioral use
The sensorimotor theory
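A minimal numeric sketch of "bottom-up pattern recognition shaped by top-down prediction": the same ambiguous evidence is read differently depending on the prior set by the current goal. All numbers are invented.

```python
import numpy as np

interpretations = ["distance", "shape", "texture"]

# Bottom-up: how well the raw signal matches each interpretation (likelihoods).
# Deliberately ambiguous on its own.
likelihood = np.array([0.30, 0.35, 0.35])

def perceive(prior):
    posterior = prior * likelihood
    return interpretations[int(np.argmax(posterior / posterior.sum()))]

# Top-down: the current goal/expectation sets the prior.
navigating_prior = np.array([0.70, 0.15, 0.15])   # "this means distance"
grasping_prior   = np.array([0.15, 0.70, 0.15])   # "this means shape"

print(perceive(navigating_prior))   # -> distance
print(perceive(grasping_prior))     # -> shape
```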
If the brain is shown a real-time signal of its own activity, it can learn to modulate that activity.
Whether it’s ACC pain signals, slow cortical potentials, or emotional arousal markers, people can learn to control brain regions they previously had zero awareness of.
Why? Because the brain is massively interconnected in bidirectional networks (thalamocortical loops, recurrent connections, etc.).
Through interconnectedness, the brain eventually "figures out" how to modulate any function
The brain tries different internal states (random exploration).
It watches the signal change.
It discovers which internal patterns reduce the signal.
It strengthens those pathways.
This tells us: The brain is a general-purpose optimizer of its own internal states—if given feedback.
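The loop above is simple enough to write down. A toy sketch, assuming only that the system can see a scalar feedback signal it wants to reduce (the quadratic readout below is invented; the optimizer never inspects it, it only sees the number, just like a neurofeedback display):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden readout: the displayed feedback signal as a function of an internal state.
target = rng.normal(size=5)
def displayed_signal(state):
    return float(np.sum((state - target) ** 2))

state = rng.normal(size=5)               # current internal state
best = displayed_signal(state)

for _ in range(2000):
    trial = state + 0.1 * rng.normal(size=5)   # try a random internal variation
    signal = displayed_signal(trial)           # watch the signal change
    if signal < best:                          # find patterns that reduce it
        state, best = trial, signal            # strengthen (keep) those pathways

print(round(best, 4))   # driven close to zero using feedback alone
```

No gradient, no model of the readout: random exploration plus feedback is enough, which is the sense in which the brain is a general-purpose optimizer of its own states.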
Conscious access may simply be the brain's built-in neurofeedback interface.
When you "pay attention" to something—your breath, a sensation, an emotion—you are effectively directing a feedback loop at it, enabling it to be learned or modulated.
Step 1: Displaying/showing the signal or any info (from the brain) = Consciousness/awareness (of sensory impressions, feelings, emotions, etc.)
What kind or level of signal/info? Why sensory impressions, feelings, emotions, etc.? (why not other signals/info)
Why does it lead to qualia?
Step 2: Paying attention (feedback to the brain) = Enhances Consciousness/awareness??
Why mindfulness works: Self-induced neurofeedback through internal attention
Why psychedelics boost learning: Amplification of internal signals, enhancing feedback
ADHD dysfunction: Poor top-down modulation due to disrupted feedback loops
Chronic pain: Stuck feedback loops where prediction errors become self-perpetuating
Global Workspace Theory (Baars/Dehaene): Consciousness is a broadcast mechanism integrating information across distributed brain modules
Predictive Processing (Friston): The brain minimizes prediction error through hierarchical prediction
Attention Schema Theory: Consciousness is the brain's model of its own attention
Consciousness holds the specific context in working memory
What may be the equivalent in AI/deep learning NN?
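One possible analogue (a hedged sketch, not a specific published architecture): treat module outputs as vectors that compete via attention for a small workspace, whose content is then broadcast back to every module. Dimensions and salience values below are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Outputs of specialized modules (vision, audition, memory, ...), as vectors.
modules = rng.normal(size=(6, 16))                     # 6 modules, 16-dim each
salience = np.array([0.1, 0.2, 3.0, 0.1, 0.4, 0.2])    # module 2 currently "wins"

# Workspace entry = competition via attention; broadcast = send the winning,
# integrated content back to all modules.
attn = softmax(salience)
workspace = attn @ modules                             # globally available content
broadcast = np.tile(workspace, (len(modules), 1))

print(attn.round(2))      # dominated by module 2
print(broadcast.shape)    # (6, 16): the same content reaches every module
```

The bottleneck (one workspace, many modules) is the structural point of contact with Global Workspace Theory; attention weights play the role of access.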
System 1: Fast, automatic pattern recognition.
System 2: When System 1 fails, the error signal spikes. The brain "becomes aware."
The "display" = The brain’s active world-model layer that binds percepts to spatial, bodily, emotional, and temporal coordinates.
And working memory is the “top shelf” of this display—where the information stays stable and manipulable.
Note: It is "projected" spatially and temporally
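A toy sketch of that handoff, under obvious simplifications (nearest-prototype matching for System 1, a placeholder function for System 2, and an invented error threshold):

```python
import numpy as np

rng = np.random.default_rng(0)
prototypes = rng.normal(size=(4, 8))      # well-learned patterns
ERROR_THRESHOLD = 2.5                     # illustrative value

def system1(x):
    """Fast, automatic pattern match: nearest stored prototype."""
    d = np.linalg.norm(prototypes - x, axis=1)
    return int(np.argmin(d)), float(np.min(d))     # (label, prediction error)

def system2(x):
    """Stand-in for slow processing in the workspace / working memory."""
    return "deliberate: held on the display for analysis"

def process(x):
    label, error = system1(x)
    if error < ERROR_THRESHOLD:
        return f"automatic: pattern {label}"        # handled without awareness
    return system2(x)                               # error spike -> "becomes aware"

print(process(prototypes[1] + 0.1 * rng.normal(size=8)))   # familiar input
print(process(10 * rng.normal(size=8)))                    # surprising input
```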
What gets displayed? The intermediate-level representations, because only they are both controllable and useful.
Low-level data is too raw:
Edges
Frequencies
Gradients
Motion primitives
These cannot be used directly for planning or verbal report. They also fluctuate too fast.
High-level abstract info is too compressed:
Categorical identity ("a dog")
Abstract rules ("if A then B")
Semantic knowledge
These lack sensory richness and are not actionable without grounding.
Intermediate-level representations are:
Rich enough for planning
Stable enough for attention
Integrated enough for metacognition
Action-relevant
Mapped to spatial/bodily coordinates
This includes:
visual surfaces
emotional feelings
pains
phonological loops
tactile sensations
proprioceptive models
urge states
perceptual gestalts
This level has the ideal balance of:
integration
stability
resolution
behavioral relevance
A signal is displayed when:
It is behaviorally relevant right now
Threat
Novelty
Uncertainty
Reward opportunity
Social cue
It has sufficiently high “precision weighting”
In predictive processing, attention = precision (confidence in a signal).
Only high-precision signals enter the workspace.
It requires cross-system integration
If something must be:
evaluated
acted on
planned around
remembered
verbalized
learned
… it gets broadcast.
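Pulling the three conditions together, a toy gate (thresholds and example signals invented) for what gets onto the display:

```python
from dataclasses import dataclass

@dataclass
class Signal:
    name: str
    precision: float          # confidence / attention weight, 0-1
    relevance: float          # current behavioral relevance, 0-1
    needs_integration: bool   # must be evaluated / planned around / verbalized

def displayed(sig, precision_min=0.6, relevance_min=0.5):
    """Toy gate for workspace entry; thresholds are illustrative."""
    return (sig.precision >= precision_min
            and sig.relevance >= relevance_min
            and sig.needs_integration)

signals = [
    Signal("background hum",    precision=0.7, relevance=0.1, needs_integration=False),
    Signal("sudden loud noise", precision=0.9, relevance=0.9, needs_integration=True),
    Signal("gut interoception", precision=0.2, relevance=0.3, needs_integration=False),
]

for s in signals:
    print(s.name, "->", "broadcast" if displayed(s) else "stays out of awareness")
```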