This episode of the M365.FM Podcast — “The Architecture of Persistent Context: Why Episodic AI Is Slowing You Down” — explains that persistent context is not a convenience feature but a foundational architectural layer that determines whether AI systems can scale reliably and productively in the enterprise. The host argues that most organizations treat AI as a “session”, where each interaction starts from scratch based on immediate prompts. This episodic design prevents agents and Copilots from becoming truly effective because context — identity, goals, constraints, history, and provenance — is never carried forward in a structured, auditable, and bounded way. Without persistent context, systems repeat effort, generate inaccurate decisions, and create inconsistencies that amplify risk and operational debt. The episode prescribes architectural principles for implementing persistent context properly — including explicit context boundary definitions, scoped identity tokens, standardized memory primitives, and lifecycle policies — and shows how treating context as data with governance and contracts transforms transient AI into reasoned, accountable, auditable automation that can safely power enterprise workflows.
🧠 Core Theme
Episodic AI (stateless, session-based interaction) cannot scale in enterprise environments.
Persistent context is the architectural layer that enables consistency, governance, and task continuity.
Without structured, governed memory, AI systems remain conversational helpers rather than operational systems.
Why Episodic AI Fails at Scale
Every session starts from zero context.
No durable awareness of goals, constraints, previous actions, or ownership.
Repeated verification and re-prompting waste time.
Inconsistent outputs increase human review overhead.
Automation gains erode because humans must constantly re-ground the system.
Episodic models create invisible operational debt — every workflow restart compounds inefficiency.
What Persistent Context Actually Means
Persistent context is not “chat history.” It is structured, governed state that survives across interactions.
It includes:
Identity Context
Who is acting
Under what permissions
On whose behalf
Goal Context
What outcome is being pursued
Multi-step objectives
Constraint Context
Policy rules
Compliance requirements
Operational guardrails
Provenance Context
Where information came from
Evidence and source tracking
State Context
What has already happened
Current workflow position
Pending actions
Without these layers, AI cannot function as a production system.
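As a rough illustration, the five layers could be modeled as one structured, typed record rather than free-form text. This is a minimal sketch with hypothetical names, not a product schema:

```typescript
// Hypothetical shape of a persistent context record (illustrative only).
interface PersistentContext {
  identity: { actorId: string; onBehalfOf?: string; permissionScopes: string[] };
  goal: { outcome: string; objectives: string[] };          // multi-step objectives
  constraints: { policyRefs: string[]; guardrails: string[] };
  provenance: { sourceUri: string; retrievedAt: Date }[];   // evidence and source tracking
  state: { completedActions: string[]; position: string; pendingActions: string[] };
}
```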
Why Persistent Context Changes Everything
✅ Consistency
Repeated tasks yield consistent decisions.
✅ Efficiency
The system doesn’t ask for the same inputs repeatedly.
✅ Auditability
Decisions are traceable through context chains.
✅ Governance
Rules travel with the workflow rather than relying on memory of prior prompts.
✅ Automation Depth
Systems can resume tasks without human reset.
Architectural Design Principles
1. Explicit Context Boundaries
Define what persists and what does not.
Bound persistence to business-aligned scopes (e.g., ticket lifecycle, case lifecycle, project phase).
Avoid unbounded memory accumulation that introduces noise and drift.
Clear expiration policies reduce cost and risk.
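A minimal sketch of what a business-aligned boundary could look like, assuming hypothetical names; the point is that persistence is scoped and expires rather than accumulating indefinitely:

```typescript
// Hypothetical: context persists only while a business-aligned scope is open.
interface ContextScope {
  scopeType: "ticket" | "case" | "projectPhase";
  scopeId: string;
  expiresAt: Date; // explicit expiration policy bounds cost and risk
}

// Context outside its scope's lifetime should not be consulted.
function isContextValid(scope: ContextScope, now: Date = new Date()): boolean {
  return now.getTime() < scope.expiresAt.getTime();
}
```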
2. Scoped Identity Integration
Identity must attach to persistent context.
Agents operate under clear, least-privilege, attributable identities.
Context without identity produces ambiguity in accountability.
Persistent identity allows defensible audit trails.
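For example, every context write could be required to carry an attributable, least-privilege identity, so each change in the store can be audited. A minimal sketch with hypothetical field names:

```typescript
// Hypothetical: every context mutation records who acted, on whose behalf,
// and which scopes were exercised, producing a defensible audit trail.
interface ContextMutation {
  actorId: string;      // the agent or user identity
  onBehalfOf?: string;  // delegated principal, if any
  scopesUsed: string[]; // least-privilege permissions exercised
  change: string;       // description of what was written
  timestamp: Date;
}
```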
3. Structured Memory Primitives
Context must be structured, not free-form narrative memory.
Examples of primitives:
Task state markers
Policy references
Decision checkpoints
Evidence pointers
Escalation status
Structured memory enables deterministic reasoning rather than probabilistic guesswork.
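A sketch of how those primitives might be typed as structured records instead of narrative memory (names are illustrative):

```typescript
// Hypothetical discriminated union of structured memory primitives.
type MemoryPrimitive =
  | { kind: "taskState"; taskId: string; status: "open" | "blocked" | "done" }
  | { kind: "policyRef"; policyId: string; version: string }
  | { kind: "decisionCheckpoint"; decision: string; approvedBy: string }
  | { kind: "evidencePointer"; sourceUri: string; contentHash: string }
  | { kind: "escalation"; level: number; reason: string };
```

Because each record carries an explicit kind, downstream logic can branch on it deterministically instead of re-interpreting prose.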
4. Lifecycle Governance
Persistent context requires governance like any other enterprise data asset:
Creation standards
Update controls
Access limitations
Retention policies
Archival and deletion rules
Unmanaged context eventually becomes stale, misleading, and risky.
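As a sketch, the lifecycle rules could be declared once per context store, the same way other data assets are governed (all names hypothetical):

```typescript
// Hypothetical lifecycle policy for a context store.
interface ContextLifecyclePolicy {
  creationRequires: string[];     // creation standards, e.g. owner + label
  allowedWriters: string[];       // update controls
  readAccess: string[];           // access limitations
  retentionDays: number;          // retention policy
  onExpiry: "archive" | "delete"; // archival and deletion rules
}

const ticketContextPolicy: ContextLifecyclePolicy = {
  creationRequires: ["owner", "classificationLabel"],
  allowedWriters: ["ticket-agent"],
  readAccess: ["support-team"],
  retentionDays: 90,
  onExpiry: "archive",
};
```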
Practical Enterprise Scenarios
Ticket Lifecycle Continuity
Episodic Model:
Each follow-up interaction starts from scratch, requiring manual history review.
Persistent Model:
Ticket state, prior actions, and constraints travel with the workflow, enabling automatic escalation, routing, and SLA enforcement.
Multi-Step Approvals
Episodic Model:
Each step depends on manual context recovery.
Persistent Model:
Constraints, risk posture, and decision rationale remain embedded in the workflow state.
Incident Response
Episodic Model:
Investigators restate parameters at each step.
Persistent Model:
Threat indicators, containment decisions, and evidence persist until closure.
Common Misconceptions Addressed
“Better prompts will fix consistency.”
They won’t. Prompt quality cannot replace structured workflow memory.
“Chat history equals memory.”
Chat logs are narrative artifacts, not governed state machines.
“Memory is just convenience.”
Memory is the backbone of decision continuity and auditability.
Governance Implications
Persistent context becomes part of the AI control plane.
Context stores must be classified, retained, and monitored like any data system.
Leaders must treat workflow memory as a governed asset, not a temporary cache.
Audit readiness depends on context provenance and lifecycle discipline.
Leadership Takeaways
AI scalability depends on continuity, not just intelligence.
Persistent context transforms episodic assistants into accountable systems.
Identity, boundaries, and structured memory must be engineered deliberately.
Without context governance, enterprises accumulate decision inconsistency and verification overhead.
The real architectural shift isn’t smarter models — it’s durable, governed state.
1
00:00:00,000 --> 00:00:03,120
Most organizations think Copilot success is a prompting problem.
2
00:00:03,120 --> 00:00:06,240
If users just learn the right magic words, the model will behave.
3
00:00:06,240 --> 00:00:09,040
They're wrong. Your prompts aren't failing because people can't write.
4
00:00:09,040 --> 00:00:12,320
They're failing because the enterprise never built a place where intent can live,
5
00:00:12,320 --> 00:00:15,920
stay current and be governed. So Copilot improvises, confidently.
6
00:00:15,920 --> 00:00:19,760
That's how you get plausible nonsense, governance debt, and decisions that take longer
7
00:00:19,760 --> 00:00:24,080
because nobody trusts the output. If you want fewer Copilot demos and more architectural
8
00:00:24,080 --> 00:00:29,120
receipts, subscribe to the M365.FM podcast. In the next few minutes,
9
00:00:29,120 --> 00:00:34,480
this gets simple. Ephemeral context versus persistent context, where truth must live,
10
00:00:34,480 --> 00:00:40,720
and how control actually attaches. The foundational misunderstanding: Copilot isn't a chatbot.
11
00:00:40,720 --> 00:00:46,080
The core misconception is treating Microsoft 365 Copilot like a chatbot with a nicer suit.
12
00:00:46,080 --> 00:00:50,000
A chatbot is basically: you ask, it answers, the conversation scrolls away,
13
00:00:50,000 --> 00:00:54,080
and the risk stays mostly personal. You got something wrong, you look silly, you move on.
14
00:00:54,080 --> 00:00:58,560
Copilot in the enterprise is not that. In architectural terms, Copilot is an interaction layer
15
00:00:58,560 --> 00:01:03,280
sitting on top of three things you already run. Microsoft Graph as the data surface,
16
00:01:03,280 --> 00:01:07,680
Entra as the identity and authorization engine, and whatever governance you did,
17
00:01:07,680 --> 00:01:11,920
or didn't, build with Purview labels, retention, DLP, and policy.
18
00:01:11,920 --> 00:01:17,280
The model generates language, sure, but the system behavior is shaped by what it can retrieve,
19
00:01:17,280 --> 00:01:21,360
what it's allowed to retrieve, and what signals exist to rank one source above another.
20
00:01:21,360 --> 00:01:26,640
That distinction matters, because when Copilot fails, it usually isn't failing at language.
21
00:01:26,640 --> 00:01:30,720
It's failing at selection. It's pulling the wrong material from the wrong place with the wrong
22
00:01:30,720 --> 00:01:35,520
implied authority, and then writing it in a tone that sounds like it's certain. That's not a prompt
23
00:01:35,520 --> 00:01:39,840
problem. That's a context architecture problem. This is why Copilot looks incredible in demos,
24
00:01:39,840 --> 00:01:44,480
and then becomes mediocre inside your tenant. The demo environment has clean, curated sources,
25
00:01:44,480 --> 00:01:49,920
and tidy permissions. The retrieval universe is small, the content is recent, and there's one obvious
26
00:01:49,920 --> 00:01:54,560
truth document. In your tenant, the retrieval universe is a landfill with an indexing service,
27
00:01:54,560 --> 00:01:59,040
and that means your prompt is operating inside a probabilistic system, even if you keep pretending
28
00:01:59,040 --> 00:02:03,440
it's deterministic. Here's the uncomfortable truth. The prompt doesn't create truth. The prompt
29
00:02:03,440 --> 00:02:08,640
only steers which fragments of your tenant Copilot will consider while it manufactures a response.
30
00:02:08,640 --> 00:02:14,400
When that steering lacks strong constraints, authoritative sources, scoped context, clear intent,
31
00:02:14,400 --> 00:02:18,720
the model compensates with pattern completion. That's where the confident fiction comes from.
32
00:02:18,720 --> 00:02:23,200
It's not malicious. It's just how generative models behave when they don't have an anchor.
33
00:02:23,200 --> 00:02:27,120
So what do people do? They try to fix it with more prompt engineering. They invent frameworks.
34
00:02:27,120 --> 00:02:31,920
They write longer prompts. They create rules that actually work. And yes, sometimes that improves
35
00:02:31,920 --> 00:02:36,720
output because you're adding constraints and context manually, but manual prompting doesn't scale.
36
00:02:36,720 --> 00:02:41,280
It can't. The enterprise is a distributed system. It's a thousand teams, a million files,
37
00:02:41,280 --> 00:02:45,840
10 million permissions, and a governance model that slowly erodes because exceptions feel productive.
38
00:02:45,840 --> 00:02:50,400
Every exception is an entropy generator. Now add Copilot. Copilot doesn't simplify that.
39
00:02:50,400 --> 00:02:55,200
Copilot accelerates it because it can produce polished outputs faster than your organization
40
00:02:55,200 --> 00:03:00,000
can validate them. And the first time a senior leader forwards a Copilot-generated answer as if
41
00:03:00,000 --> 00:03:04,080
it's policy, you've just converted a drafting assistant into an unofficial authority.
42
00:03:04,080 --> 00:03:07,920
You didn't deploy an AI assistant. You deployed a distributed decision engine that speaks in
43
00:03:07,920 --> 00:03:12,080
complete sentences. Let's make this obvious with one micro example. A user asks,
44
00:03:12,080 --> 00:03:17,440
what's our rule for sending customer data to a vendor? In a healthy architecture, there's a single
45
00:03:17,440 --> 00:03:22,560
authoritative policy source. It's owned, labeled, current, and discoverable. Copilot retrieves it,
46
00:03:22,560 --> 00:03:27,680
cites it, and answers with boundaries. In the real enterprise, that policy lives in three places.
47
00:03:27,680 --> 00:03:33,360
A PDF from 2021, a SharePoint page someone edited last month, and a Teams message thread where
48
00:03:33,360 --> 00:03:39,040
legal said "it depends" and everyone ignored the rest. Copilot retrieves all of it, ranks something
49
00:03:39,040 --> 00:03:43,360
because it has more keywords and produces a smooth paragraph that sounds like compliance.
50
00:03:43,360 --> 00:03:47,840
The user didn't get an answer. They got a statistically plausible synthesis of your organizational
51
00:03:47,840 --> 00:03:53,600
confusion. And now governance has a new enemy. Answers that look like decisions but have no receipts.
52
00:03:53,600 --> 00:03:58,720
This is where the deterministic versus probabilistic distinction matters. A deterministic system is
53
00:03:58,720 --> 00:04:03,760
one where the same input reliably produces the same governed outcome because the system has a
54
00:04:03,760 --> 00:04:08,800
stable source of truth and enforced constraints. Identity is consistent. Labels matter. Access
55
00:04:08,800 --> 00:04:15,040
boundaries are real. Content life cycle exists. A probabilistic system is one where outcomes drift
56
00:04:15,040 --> 00:04:20,400
because the retrieval set drifts. Permissions drift, content rots, duplicates multiply, and
57
00:04:20,400 --> 00:04:25,280
Copilot politely pretends it all makes sense. Most organizations are running Copilot as a probabilistic
58
00:04:25,280 --> 00:04:29,680
system and then blaming users for not being deterministic enough with prompts. So no, Copilot
59
00:04:29,680 --> 00:04:34,800
isn't a chatbot. It's closer to an authorization aware retrieval and reasoning pipeline wrapped in
60
00:04:34,800 --> 00:04:40,400
a chat UI. It is a compiler that takes your intent, pulls in whatever context it can legally see,
61
00:04:40,400 --> 00:04:45,360
and outputs a plausible artifact, which means the control plane is not the prompt. The control plane
62
00:04:45,360 --> 00:04:51,040
is context: what exists, where it lives, who owns it, how it's labeled, how it's updated, and how
63
00:04:51,040 --> 00:04:55,200
it's constrained. Once you see Copilot that way, the rest of this episode becomes painfully
64
00:04:55,200 --> 00:04:59,520
straightforward. And it explains why Copilot notebooks exist at all. Because if you don't give
65
00:04:59,520 --> 00:05:04,400
the system a governed container for persistent intent, you'll keep doing what humans always do.
66
00:05:04,880 --> 00:05:09,920
You'll keep trying to solve an architectural problem with better typing. Persistent context:
67
00:05:09,920 --> 00:05:15,520
what it is, and what it is not. Persistent context is not a feature. It's a design decision.
68
00:05:15,520 --> 00:05:19,760
It's the choice to stop treating what the model should know as a side effect of whatever happens
69
00:05:19,760 --> 00:05:25,600
to be open, recent, or popular in the Graph, and instead build a curated, governable context set
70
00:05:25,600 --> 00:05:31,680
that can survive more than one conversation. In simple terms, persistent context is reusable intent
71
00:05:31,680 --> 00:05:36,800
plus reusable sources. Intent means the constraints you keep retyping today: scope, assumptions,
72
00:05:36,800 --> 00:05:42,080
definitions, tone, required output format, and the explicit "do not do this" exclusions.
73
00:05:42,080 --> 00:05:46,960
Sources means the documents, pages, and records you're willing to treat as input to a decision.
74
00:05:46,960 --> 00:05:53,200
Not helpful reading, but inputs. When those two things persist, Copilot stops acting like a slot machine.
75
00:05:53,200 --> 00:05:57,520
It starts acting like a system. Now, what persistent context is not? It is not chat history.
76
00:05:57,520 --> 00:06:04,080
Chat history is an audit trail of what was said, not a stable substrate of truth. It's messy by nature.
77
00:06:04,080 --> 00:06:09,280
Half-formed questions, wrong turns, speculative answers, and good enough for now drafts.
78
00:06:09,280 --> 00:06:15,280
Treating chat history as institutional knowledge is how bad outputs fossilise into future mistakes.
79
00:06:15,280 --> 00:06:20,560
Persistent context is also not personal memory. Some tools offer "remember my preferences".
80
00:06:20,560 --> 00:06:24,800
But that's personalization, not governance. It might make responses feel smoother,
81
00:06:24,800 --> 00:06:28,480
but it doesn't solve enterprise truth, accountability, or policy enforcement.
82
00:06:28,480 --> 00:06:31,520
If Copilot remembers that a user likes concise answers, fine.
83
00:06:31,520 --> 00:06:35,840
If it remembers how your finance policy works, you've just invented an unofficial policy store
84
00:06:35,840 --> 00:06:39,920
with no owner and no change control. That is not architecture. That is entropy with a smile.
85
00:06:39,920 --> 00:06:45,440
And it's definitely not "whatever I had open". This is the most common illusion in Copilot usage,
86
00:06:45,440 --> 00:06:50,320
the belief that proximity equals authority. The deck you are reading, the email you skimmed,
87
00:06:50,320 --> 00:06:53,920
the meeting recap you forgot you joined. None of that is authoritative by default.
88
00:06:53,920 --> 00:06:58,560
It's merely adjacent. Copilot can retrieve adjacency. It cannot infer intent from adjacency.
89
00:06:58,560 --> 00:07:03,920
The system doesn't know if that slide deck is final draft or a dead end someone sent to get you off their back.
90
00:07:03,920 --> 00:07:08,080
So persistent context needs to be treated as an asset class. That distinction matters because
91
00:07:08,080 --> 00:07:13,120
assets have owners. They have life cycles. They have review cadences. They have audit expectations.
92
00:07:13,120 --> 00:07:17,760
If you want co-pilot outputs to be stable enough to trust, the context feeding those outputs
93
00:07:17,760 --> 00:07:21,920
has to be stable enough to defend. You can't defend a pile of links. You defend a curated
94
00:07:21,920 --> 00:07:26,560
corpus with explicit intent. This is also where enterprises accidentally build the opposite. They
95
00:07:26,560 --> 00:07:31,760
build context by accumulation, not by design. Someone creates a team, then a channel, then a SharePoint
96
00:07:31,760 --> 00:07:38,160
site, then a folder, then five copies of the same PowerPoint with different "final V7 real final" names.
97
00:07:38,160 --> 00:07:42,960
Then Loop components start living in chats and meeting notes and pages copied around like confetti.
98
00:07:42,960 --> 00:07:47,840
Then people bookmark things, then people stop updating things. Then everyone assumes the latest is
99
00:07:47,840 --> 00:07:52,640
whatever they touched most recently. That is not persistent context. That's context sprawl pretending
100
00:07:52,640 --> 00:07:57,440
to be knowledge, and it gets worse because Copilot is polite. It will answer anyway. It will synthesize
101
00:07:57,440 --> 00:08:01,760
the sprawl into something coherent sounding even when the underlying material contradicts itself.
102
00:08:01,760 --> 00:08:06,320
The enterprise interprets coherence as correctness. It isn't. Persistent context requires a boundary.
103
00:08:06,320 --> 00:08:10,720
A boundary is a deliberate reduction in the retrieval universe. It's saying, for this domain,
104
00:08:10,720 --> 00:08:15,120
these are the sources that count and these are the instructions that define how to interpret them.
105
00:08:15,120 --> 00:08:19,200
That's why notebooks are interesting. Not because they're OneNote on steroids but because they
106
00:08:19,200 --> 00:08:24,720
represent a container where you can bind sources and intent together repeatedly with traceability.
107
00:08:24,720 --> 00:08:29,200
You can keep the same constraints and the same references and iterate outputs without
108
00:08:29,200 --> 00:08:34,080
relitigating what truth even means. But persistent context also implies governance has somewhere to
109
00:08:34,080 --> 00:08:39,600
attach. If the source set is curated, labels and retention policies matter, access boundaries matter,
110
00:08:39,600 --> 00:08:44,560
ownership matters, review cadence matters. And when something changes, policy updates,
111
00:08:44,560 --> 00:08:49,760
vendor changes, regulatory changes, you can update the context asset and know what downstream
112
00:08:49,760 --> 00:08:56,080
reasoning environments depend on it. In other words, persistent context turns prompting into configuration.
113
00:08:56,080 --> 00:09:01,440
And yes, that scares people because configuration implies responsibility. Good. Because the alternative
114
00:09:01,440 --> 00:09:06,560
is what you already have. A probabilistic system producing confident fiction at enterprise scale.
115
00:09:06,560 --> 00:09:11,760
Next, the first failure mode shows up immediately. Copilot hallucinating policy enforcement because
116
00:09:11,760 --> 00:09:17,360
the enterprise never placed policy where it can be retrieved as truth. Failure mode one: hallucinated
117
00:09:17,360 --> 00:09:21,600
policy enforcement. This failure mode is the one that makes auditors sweat because it doesn't look
118
00:09:21,600 --> 00:09:26,800
like a security incident. It looks like help. Someone asks Copilot a policy-shaped question.
119
00:09:26,800 --> 00:09:31,680
Copilot responds in a policy-shaped tone and the organization treats the answer as policy because
120
00:09:31,680 --> 00:09:37,040
it sounds clean, complete, and confident. The system didn't enforce anything. It narrated something.
121
00:09:37,040 --> 00:09:41,600
That distinction matters. Hallucinated policy enforcement happens when a generative system
122
00:09:41,600 --> 00:09:45,920
gets asked to behave like a rule engine, but you never gave it an authoritative rule source that
123
00:09:45,920 --> 00:09:51,200
the retrieval pipeline can consistently anchor to. So it does what it was built to do. It synthesizes
124
00:09:51,200 --> 00:09:56,160
patterns from whatever it can see. It writes a plausible policy paragraph by stitching together
125
00:09:56,160 --> 00:10:00,880
fragments of old guidance, partial exceptions, and whatever happens to be keyword dense.
126
00:10:00,880 --> 00:10:05,360
And because it's written in adult sentences, people stop questioning it. The most common trigger is
127
00:10:05,360 --> 00:10:12,880
a question that has the shape of governance. Are we allowed to? What's the rule for? Do we need approval
128
00:10:12,880 --> 00:10:18,640
if? What label do we apply to? Can I share this with a vendor? These questions are not about content.
129
00:10:18,640 --> 00:10:22,400
They're about decisions. They're requests for boundaries. In a well-designed enterprise,
130
00:10:22,400 --> 00:10:27,120
those boundaries live in one of two places: an enforced control in the platform or an authoritative
131
00:10:27,120 --> 00:10:33,920
policy artifact with ownership, life cycle, and semantic stability, ideally both. In most enterprises,
132
00:10:33,920 --> 00:10:39,120
those boundaries live in a PDF nobody owns, a SharePoint page everybody edits, a Teams thread
133
00:10:39,120 --> 00:10:44,400
where someone said "fine", and a training deck that's now wrong. Four sources, four levels of authority,
134
00:10:44,400 --> 00:10:50,000
zero enforced hierarchy. So Copilot picks one, or blends them, or worse,
135
00:10:50,000 --> 00:10:54,800
invents the missing glue. Here's what makes this failure mode lethal. Copilot often doesn't
136
00:10:54,800 --> 00:10:59,520
hallucinate random facts. It hallucinates governance. It hallucinates certainty. It hallucinates
137
00:10:59,520 --> 00:11:03,920
the existence of a rule that the enterprise wishes it had. That's how you end up with compliance by
138
00:11:03,920 --> 00:11:09,200
autocomplete. And yes, the model can cite sources. That doesn't save you. Citations often validate that
139
00:11:09,200 --> 00:11:13,760
Copilot read something, not that what it read is current, authoritative, or even internally consistent.
140
00:11:13,760 --> 00:11:18,000
If the citations point to the wrong truth, you've just built a more confident delivery mechanism
141
00:11:18,000 --> 00:11:21,920
for the wrong decision. This is why policy content can't be treated as just more content.
142
00:11:21,920 --> 00:11:26,480
Policy is a control plane artifact. A policy statement without change control is not a policy.
143
00:11:26,480 --> 00:11:30,800
It's a suggestion that rots. A policy statement without ownership is not a policy. It's a rumor with
144
00:11:30,800 --> 00:11:35,840
a URL. A policy statement without enforced semantics is not a policy. It's a paragraph that competes
145
00:11:35,840 --> 00:11:40,240
with every other paragraph in your tenant. And the enterprise loves paragraphs. So this failure
146
00:11:40,240 --> 00:11:45,520
mode shows up in predictable places. HR asks about leave, discipline, or hiring rules. The policy
147
00:11:45,520 --> 00:11:51,280
lives in a handbook PDF from two reorgs ago and a half-updated intranet page. Copilot answers
148
00:11:51,280 --> 00:11:56,800
like it's the HR director and the manager forwards it to an employee. Now the organization has created
149
00:11:56,800 --> 00:12:01,920
a human impact event with no authoritative anchor. Security asks about data classification and
150
00:12:01,920 --> 00:12:07,600
sharing. The real rules live partly in Purview labels and DLP, partly in a standards document,
151
00:12:07,600 --> 00:12:12,880
and partly in "what we've always done". Copilot answers with a blended story. The user follows it.
152
00:12:12,880 --> 00:12:17,680
You get oversharing or you get unnecessary blockage and both outcomes create operational drag.
153
00:12:17,680 --> 00:12:22,560
Procurement asks about vendor onboarding. There's a process doc, a ServiceNow workflow,
154
00:12:22,560 --> 00:12:27,840
and a set of exceptions that were approved last year under pressure. Copilot returns a neat checklist
155
00:12:27,840 --> 00:12:32,880
that omits the exceptions or incorrectly normalizes them. Now teams either bypass the workflow or
156
00:12:32,880 --> 00:12:37,360
assume it's optional. The root cause is boring. You never placed truth where Copilot can retrieve it
157
00:12:37,360 --> 00:12:41,200
with predictable authority. Instead you spread governance across convenient locations,
158
00:12:41,200 --> 00:12:45,920
PDFs in random libraries, email attachments, SharePoint pages with no change control,
159
00:12:45,920 --> 00:12:51,920
and chats that feel authoritative because someone senior typed them. Copilot then behaves exactly
160
00:12:51,920 --> 00:12:57,520
like the retrieval system it is. It ranks, it samples, it synthesizes. It cannot enforce intent
161
00:12:57,520 --> 00:13:02,320
you never encoded. So the fix isn't "train users to prompt better". That's how the enterprise
162
00:13:02,320 --> 00:13:07,440
absolves itself and keeps the same architecture. The fix is to treat policy as something the system
163
00:13:07,440 --> 00:13:12,800
must be able to ground in. A single owned source, a stable publishing model, and clear boundaries
164
00:13:12,800 --> 00:13:19,440
between policy, guidance, and discussion. If you can't separate those, Copilot won't either,
165
00:13:19,440 --> 00:13:23,840
and once policy has an authoritative home, you still have one more job. Make sure Copilot
166
00:13:23,840 --> 00:13:28,240
can't treat everything else as equal. That means the next section, because the real problem isn't
167
00:13:28,240 --> 00:13:32,800
that Copilot can't find information, it's that the enterprise never decided where truth is allowed
168
00:13:32,800 --> 00:13:38,880
to live. Where truth must live: authoritative sources versus convenient sources. The enterprise
169
00:13:38,880 --> 00:13:45,040
keeps pretending "authoritative" is a vibe. It isn't. Authoritative means three boring things that almost
170
00:13:45,040 --> 00:13:51,120
nobody implements. Change control, single ownership, and predictable semantics. Change control means
171
00:13:51,120 --> 00:13:56,240
updates follow an explicit process, not whoever had edit rights and caffeine. Single ownership means
172
00:13:56,240 --> 00:14:01,120
one accountable role can answer "who approved this and when" without a treasure hunt. Predictable
173
00:14:01,120 --> 00:14:06,400
semantics means the content uses stable terms and definitions, so the system can retrieve and
174
00:14:06,400 --> 00:14:11,360
interpret it consistently. Convenient sources fail all three. Convenient sources are whatever is
175
00:14:11,360 --> 00:14:16,480
close at hand: a slide deck, a Teams message, a meeting recap, a SharePoint page that became a
176
00:14:16,480 --> 00:14:22,080
dumping ground, the policy folder that contains 400 files, and the word doc someone attached to an
177
00:14:22,080 --> 00:14:27,680
email five quarters ago. Convenience produces volume. Volume produces ambiguity, ambiguity produces
178
00:14:27,680 --> 00:14:32,400
retrieval drift, and retrieval drift produces Copilot answers that sound decisive while being
179
00:14:32,400 --> 00:14:37,760
structurally untrustworthy. So when people ask where truth should live, the answer isn't SharePoint.
180
00:14:37,760 --> 00:14:42,480
SharePoint is a file and page platform. It's not an authority model. Authority is what you build on top:
181
00:14:42,480 --> 00:14:47,600
permissions, publishing workflows, page ownership, change control, and life cycle. Without those,
182
00:14:47,600 --> 00:14:52,560
SharePoint becomes a sprawl engine with a nice UI. The same is true for Teams. Teams is not a knowledge
183
00:14:52,560 --> 00:14:57,040
system. It's a high velocity conversation system with permanent storage side effects. Treating
184
00:14:57,040 --> 00:15:01,680
Teams messages as policy is like treating hallway gossip as a contractual term. It may reflect
185
00:15:01,680 --> 00:15:06,800
reality. It may also reflect one person's confidence during a bad week. That distinction matters,
186
00:15:06,800 --> 00:15:11,840
because Copilot doesn't know which of these is truth. It knows which is retrievable. It knows which
187
00:15:11,840 --> 00:15:16,000
matches the prompt. It knows which has the right keywords, and it knows which you have access to.
188
00:15:16,000 --> 00:15:21,120
That's it. If you don't encode authority, Copilot will synthesize convenience. So the architecture
189
00:15:21,120 --> 00:15:26,160
decision you need is a placement model. Policy must live where change is controlled and semantics
190
00:15:26,160 --> 00:15:30,880
are stable. Guidance can live where it's consumable and contextual. Discussion can live where it's
191
00:15:30,880 --> 00:15:35,200
fast and disposable. And those three must not compete as equals. Here's a pragmatic split that
192
00:15:35,200 --> 00:15:39,760
actually holds up. If it's an enforceable rule, classification requirements, retention requirements,
193
00:15:39,760 --> 00:15:44,320
external sharing constraints, mandatory approvals, then it needs a home that behaves like a control
194
00:15:44,320 --> 00:15:50,080
plane artifact. That usually means a formally published policy set with versioning, ownership,
195
00:15:50,080 --> 00:15:55,120
and review cadence, plus enforcement in platform controls where possible. Purview doesn't store
196
00:15:55,120 --> 00:15:59,920
all your policy, but it does express policy as label taxonomy, retention, and DLP behaviors.
197
00:15:59,920 --> 00:16:04,320
That's the point. It turns narrative rules into machine-enforceable constraints.
198
00:16:04,320 --> 00:16:08,400
If it's operational guidance, how to do the thing inside your organization, who to contact,
199
00:16:08,400 --> 00:16:12,640
what templates to use, then a curated SharePoint knowledge base can work, but only if it's treated
200
00:16:12,640 --> 00:16:18,640
like a product. Page owners, publishing approvals, and explicit last-reviewed discipline. If pages
201
00:16:18,640 --> 00:16:22,800
don't have owners, they're not guidance. They're content that decays. If it's interpretation,
202
00:16:22,800 --> 00:16:27,760
negotiation, exception handling, or "what we think", that belongs in chat, meetings, and threads,
203
00:16:27,760 --> 00:16:32,960
high velocity, low authority. And ideally, with a path to promote the outcome into a governed artifact
204
00:16:32,960 --> 00:16:37,120
when it becomes real, because the enterprise always does the opposite. It stores policy as PDFs
205
00:16:37,120 --> 00:16:41,920
because that's how legal likes it. It stores guidance as decks because that's how training works.
206
00:16:41,920 --> 00:16:47,360
It stores decisions as chat messages because that's where we were talking. Then it wonders why
207
00:16:47,360 --> 00:16:52,720
Copilot can't tell policy from opinions. The system can't separate what you refuse to separate.
208
00:16:52,720 --> 00:16:56,960
Now connect that back to Copilot notebooks. A notebook is not where truth should originate. It's
209
00:16:56,960 --> 00:17:01,680
not your policy store. It's not your compliance system. It's a context container that points at truth,
210
00:17:01,680 --> 00:17:07,200
binds it to intent, and makes the system behave predictably for a defined domain. That means the
211
00:17:07,200 --> 00:17:11,200
notebook is downstream of truth placement. If you feed it convenient sources, it will produce
212
00:17:11,200 --> 00:17:15,920
convenient answers. If you feed it authoritative sources, you get outputs you can defend.
213
00:17:15,920 --> 00:17:19,840
Not because Copilot got smarter, but because you narrowed the retrieval universe to
214
00:17:19,840 --> 00:17:25,200
sources with actual governance semantics. And yes, this forces a decision leaders hate.
215
00:17:25,600 --> 00:17:30,240
You can't declare "single source of truth" as a slogan. You have to pay for it with ownership,
216
00:17:30,240 --> 00:17:35,120
change control, and removal of duplicates. So the rule is blunt. If the content changes decisions
217
00:17:35,120 --> 00:17:39,600
later, it must have an authoritative home. If it doesn't, Copilot will happily invent the missing
218
00:17:39,600 --> 00:17:44,400
authority for you. Next, the failure mode that follows truth placement is just as predictable.
219
00:17:44,400 --> 00:17:49,760
Once you stop hallucinated policy enforcement, you run headfirst into context sprawl pretending to be knowledge.
220
00:17:49,760 --> 00:17:52,240
Failure mode 2. Context sprawl
221
00:17:53,360 --> 00:17:58,400
Masquerading as knowledge. Once truth is placed, the next failure shows up anyway, because most
222
00:17:58,400 --> 00:18:02,960
tenants don't fail from missing information. They fail from too much information with no authority
223
00:18:02,960 --> 00:18:07,440
gradient. This is the part where leaders say, but we have SharePoint, we have Teams, we have OneDrive,
224
00:18:07,440 --> 00:18:11,920
we have everything in Microsoft 365. Correct, you have everything. That's the problem.
225
00:18:11,920 --> 00:18:17,200
Context sprawl masquerading as knowledge is what happens when the enterprise treats volume as
226
00:18:17,200 --> 00:18:22,160
coverage. The Graph becomes a dumping ground of near duplicates, half-finished drafts,
227
00:18:22,160 --> 00:18:27,520
abandoned project sites and temporary workspaces that never die. Copilot doesn't see a knowledge base,
228
00:18:27,520 --> 00:18:32,400
it sees a retrieval universe, and in a sprawl universe relevance collapses. The system starts
229
00:18:32,400 --> 00:18:37,040
ranking what's popular, recent, keyword dense, or simply easier to parse, not what's correct.
230
00:18:37,040 --> 00:18:41,360
Over time, you aren't just losing accuracy. You're losing semantic stability. The same question
231
00:18:41,360 --> 00:18:45,600
asked two months apart produces two different answers because the underlying document landscape
232
00:18:45,600 --> 00:18:50,480
shifted. That's not intelligence, that's drift. Teams accelerates this because it creates content
233
00:18:50,480 --> 00:18:54,560
faster than governance can classify it. Every new team provisions a SharePoint site, every channel
234
00:18:54,560 --> 00:18:59,760
generates files, meeting artifacts and recordings. Every chat now leaks Loop components into existence,
235
00:18:59,760 --> 00:19:04,080
and Loop components are especially efficient entropy generators because they feel lightweight,
236
00:19:04,080 --> 00:19:09,360
shareable and harmless. They are not harmless. They're fragments of truth that can be copied into
237
00:19:09,360 --> 00:19:14,320
10 places without the discipline of a single owner, a single version, or a life cycle. Copy
238
00:19:14,320 --> 00:19:18,960
becomes the default behavior. Reference becomes the exception. And the moment copy becomes normal,
239
00:19:18,960 --> 00:19:23,680
you've lost deterministic outcomes. SharePoint sprawl works the same way. People treat SharePoint
240
00:19:23,680 --> 00:19:28,720
sites like project rooms, then forget to close the door when the project ends. They keep the permissions,
241
00:19:28,720 --> 00:19:32,960
they keep the content, they keep the final deck that was final for that week, then a new project
242
00:19:32,960 --> 00:19:38,880
spins up and someone copies the old content because it's close enough. Copilot then retrieves both,
243
00:19:38,880 --> 00:19:43,840
because both are true in the sense that they exist. And this is where the enterprise's favorite lie
244
00:19:43,840 --> 00:19:49,920
shows up again. "Search will handle it." Search can rank. Search cannot establish authority. Search
245
00:19:49,920 --> 00:19:54,480
can't tell you which deck represents the current operating model, which page reflects policy,
246
00:19:54,480 --> 00:19:59,520
and which document was a political compromise that nobody implemented. Search returns candidates.
247
00:19:59,520 --> 00:20:04,240
Copilot then reasons over those candidates and generates a smooth answer that hides the ambiguity
248
00:20:04,240 --> 00:20:09,360
you should have seen. So sprawl doesn't just degrade accuracy. It degrades accountability.
249
00:20:09,360 --> 00:20:14,960
When the answer is wrong, nobody can explain why the system chose those sources, or why the real document
250
00:20:14,960 --> 00:20:20,560
didn't win. And the usual response is predictable. Someone creates yet another deck, this time titled
251
00:20:20,560 --> 00:20:25,840
"Copilot guidance", and drops it into yet another site. Entropy responds with gratitude.
252
00:20:25,840 --> 00:20:32,400
Now, the most dangerous part of context sprawl is that it creates false confidence through repetition.
253
00:20:32,400 --> 00:20:37,440
If the wrong document gets copied into enough places, it starts to dominate retrieval. It becomes
254
00:20:37,440 --> 00:20:41,840
the statistically likely answer, people see it more often, they quote it more. It becomes what everyone
255
00:20:41,840 --> 00:20:47,200
knows. That is not consensus. That is document replication. And it creates a nasty feedback loop.
256
00:20:47,200 --> 00:20:52,400
Copilot surfaces the replicated content, users trust it because it looks familiar, and then they
257
00:20:52,400 --> 00:20:57,200
propagate it further by pasting it into new artifacts. You've just built memetic drift into your
258
00:20:57,200 --> 00:21:03,120
operating model. This is also why "just add more sources" is a trap. RAG designs collapse at enterprise
259
00:21:03,120 --> 00:21:08,160
scale when the source set becomes a soup. More sources do not mean more truth. They mean more candidates,
260
00:21:08,160 --> 00:21:13,520
and more candidates means more ranking noise. The retrieval engine must pick something, and it will
261
00:21:13,520 --> 00:21:18,000
pick what the signals reward, not what your governance team intended. If you want reliable answers,
262
00:21:18,000 --> 00:21:22,480
you need to shrink the universe, not expand it, which means you need to treat content pathways
263
00:21:22,480 --> 00:21:27,520
as managed systems, where content is allowed to live, how it gets promoted from discussion to guidance
264
00:21:27,520 --> 00:21:33,680
to policy, and how duplicates get killed on contact. If you don't kill duplicates, you are explicitly
265
00:21:33,680 --> 00:21:38,560
choosing probabilistic outcomes. And this is why the notebook container matters again. Notebooks don't
266
00:21:38,560 --> 00:21:43,120
magically fix sprawl. They don't clean your tenant. What they can do is create an intentional boundary.
267
00:21:43,120 --> 00:21:49,040
For this domain, these sources count. That's a design move against sprawl. It's context narrowing as
268
00:21:49,040 --> 00:21:53,680
a control. But if you don't also manage life cycle, freshness, ownership, review cadence,
269
00:21:53,680 --> 00:21:59,440
then the notebook just becomes a curated landfill. A smaller landfill, still a landfill. So failure mode
270
00:21:59,440 --> 00:22:04,160
number two is not users being messy. It's the platform doing exactly what it was built to do,
271
00:22:04,160 --> 00:22:09,040
enable creation at scale with governance lagging behind. Copilot then retrieves at scale with your
272
00:22:09,040 --> 00:22:14,320
ambiguity baked in. Next, the predictable consequence: context rot. Staleness is not a content problem.
273
00:22:14,320 --> 00:22:19,120
It's a security control problem. Life cycle as a security control: context rot is predictable.
274
00:22:19,120 --> 00:22:23,680
Context rot isn't an accident. It's not "people forgot". It's the predictable outcome of letting
275
00:22:23,680 --> 00:22:28,720
information persist without a life cycle. Enterprises love retention because retention feels like control.
276
00:22:28,720 --> 00:22:35,200
Keep everything. Never delete. Auditors nod. Legal relaxes. Storage is cheap until it isn't.
277
00:22:35,200 --> 00:22:39,600
But retention is not the same thing as usefulness and it's definitely not the same thing as truth.
278
00:22:39,600 --> 00:22:44,800
Retention preserves artifacts. Truth requires maintenance. And the moment Copilot enters the environment,
279
00:22:44,800 --> 00:22:49,200
staleness becomes a first-class risk. Not because all documents exist, but because all documents get
280
00:22:49,200 --> 00:22:54,000
retrieved. They become inputs to new decisions, which turns outdated guidance into an operational
281
00:22:54,000 --> 00:22:58,880
vulnerability. This is where most organizations make a foundational category error. They treat
282
00:22:58,880 --> 00:23:03,760
freshness as a content quality issue when it's actually a control plane issue. If a document can change
283
00:23:03,760 --> 00:23:08,640
a decision then staleness is a security problem because an outdated policy description can produce
284
00:23:08,640 --> 00:23:14,560
an unauthorized action. An old vendor onboarding process can bypass new risk controls. An old
285
00:23:14,560 --> 00:23:19,760
exception can resurrect a closed loophole. And Copilot will do this politely, in perfect grammar, with
286
00:23:19,760 --> 00:23:25,360
citations. So life cycle has to be designed, not hoped for. Life cycle means three things: ownership,
287
00:23:25,360 --> 00:23:31,440
review cadence and deprecation behavior. Ownership is not the site owner. Ownership is the person or role
288
00:23:31,440 --> 00:23:36,320
accountable for correctness over time. Someone who can say: yes, this is still valid; no, that's been
289
00:23:36,320 --> 00:23:41,440
superseded; here's the replacement. Without that, content becomes a permanent maybe. Review cadence is
290
00:23:41,440 --> 00:23:46,400
the second part leaders avoid because it sounds like work. It is work. That's the point. If the
291
00:23:46,400 --> 00:23:51,600
information affects regulatory exposure, security posture or financial decisions then the review cadence
292
00:23:51,600 --> 00:23:56,640
needs to match the risk. Quarterly for high-risk policies. Semi-annual for operational standards.
293
00:23:56,640 --> 00:24:00,800
Annual for low-impact guidance. Not because those intervals are magical but because time kills
294
00:24:00,800 --> 00:24:05,600
accuracy. Deprecation behavior is the part almost nobody implements. Most tenants don't have a
295
00:24:05,600 --> 00:24:10,480
clean "this is obsolete" pattern. They just stop linking to the old thing and hope it dies. It doesn't
296
00:24:10,480 --> 00:24:15,120
die. Search still finds it. Copilot still retrieves it. People still share it. The artifact becomes
297
00:24:15,120 --> 00:24:20,960
undead. No longer maintained. Still influential. That's context rot. And it's predictable because
298
00:24:20,960 --> 00:24:26,880
ownership decays in the same pattern every time. Week one. The project has energy. The notebook or
299
00:24:26,880 --> 00:24:31,520
site looks curated. The links are clean. The instructions are explicit. Everyone agrees this will be
300
00:24:31,520 --> 00:24:36,640
the place. Week three. The owner changes roles or takes PTO or gets pulled into the next fire.
301
00:24:36,640 --> 00:24:41,840
Updates slow down. People add just one more document without pruning. Nobody removes duplicates because
302
00:24:41,840 --> 00:24:47,760
removal feels political. Week six. The notebook of record becomes the notebook of convenience.
303
00:24:47,760 --> 00:24:51,840
It still exists but it no longer represents current truth. It represents the last moment
304
00:24:51,840 --> 00:24:56,320
anyone cared enough to curate it. Then Copilot arrives and treats it as equal to everything else.
305
00:24:56,320 --> 00:25:00,880
That's how you get drift inside the very container you built to stop drift. Now layer retention policies
306
00:25:00,880 --> 00:25:05,520
on top because this is where the enterprise confuses compliance with correctness. Records retention
307
00:25:05,520 --> 00:25:11,360
answers "can we prove we kept it?" It does not answer "should we still use it?" In fact retention often
308
00:25:11,360 --> 00:25:16,240
guarantees that obsolete material stays available longer than it stays accurate. So the platform
309
00:25:16,240 --> 00:25:20,320
faithfully preserves a record and Copilot faithfully retrieves it and you faithfully make a bad
310
00:25:20,320 --> 00:25:25,920
decision faster. That is not governance. That is automated nostalgia. So life cycle needs to be treated
311
00:25:25,920 --> 00:25:31,040
as a security control specifically for AI assisted work. You're not managing documents. You're
312
00:25:31,040 --> 00:25:36,800
managing decision inputs, and that means life cycle has to attach to the context pathway, not to individual
313
00:25:36,800 --> 00:25:41,280
user habits. You can't train a million users to remember which file is stale. You can build a
314
00:25:41,280 --> 00:25:46,080
system where stale sources get flagged, demoted or removed from the reasoning environment entirely.
315
00:25:46,080 --> 00:25:50,720
That's why persistent context without life cycle is just a slower version of chat sprawl. It lasts
316
00:25:50,720 --> 00:25:55,920
longer. It fails later. And it fails with more confidence because everyone assumes persistence implies
317
00:25:55,920 --> 00:26:00,960
validity. So the actual rule is unpleasantly simple. If you want persistent context you need
318
00:26:00,960 --> 00:26:06,000
persistent stewardship: a maintained source set, named owners, review triggers, clear deprecation. And
319
00:26:06,000 --> 00:26:10,640
you need a container that can hold intent alongside sources so the constraints don't rot separately
320
00:26:10,640 --> 00:26:14,800
from the documents because files alone don't carry intent. They just carry words.
321
00:26:14,800 --> 00:26:21,680
That container is why notebooks exist. Why Copilot notebooks exist: the container for managed
322
00:26:21,680 --> 00:26:27,520
context. Copilot notebooks exist because Microsoft finally ran into the same wall every enterprise
323
00:26:27,520 --> 00:26:33,440
hits. Chat is an interface, not a system. Chat is good at one thing: ephemeral interaction.
324
00:26:33,440 --> 00:26:38,160
Ask, answer, move on. But enterprises keep trying to use chat to do persistent work.
325
00:26:38,160 --> 00:26:42,560
Policy interpretation, architectural standards, project delivery, operating procedures,
326
00:26:42,560 --> 00:26:47,120
risk decisions and executive briefings. Those aren't conversations. Those are repeatable reasoning
327
00:26:47,120 --> 00:26:52,000
problems and repeatable reasoning needs a container. A Copilot notebook is that container.
328
00:26:52,000 --> 00:26:56,720
A managed context workspace where sources and intent are bound together long enough to matter.
329
00:26:56,720 --> 00:27:01,680
Not just for a single prompt, but for an entire decision thread that spans days, weeks, and multiple people.
330
00:27:01,680 --> 00:27:06,960
This is the part most people miss. Notebooks are not primarily about better prompting.
331
00:27:06,960 --> 00:27:11,760
They're about shrinking the retrieval universe on purpose. In the M365 ecosystem,
332
00:27:11,760 --> 00:27:17,120
Copilot can potentially retrieve from a huge surface area. Mail, calendars, files,
333
00:27:17,120 --> 00:27:22,560
SharePoint sites, Teams chats, meetings, Loop components, and whatever else the Graph can see.
334
00:27:22,560 --> 00:27:26,960
That scale is the selling point. It's also the failure mode because when the universe is too large,
335
00:27:26,960 --> 00:27:31,600
ranking becomes guesswork, guesswork becomes drift. Drift becomes "why did it answer that?"
336
00:27:31,600 --> 00:27:34,880
A notebook is an architectural hack against that drift. It says,
337
00:27:34,880 --> 00:27:39,440
for this work stream, these are the sources that count. And then it keeps them attached to the interaction
338
00:27:39,440 --> 00:27:43,440
so users don't have to re-ground every prompt like they're pleading with a goldfish.
339
00:27:43,440 --> 00:27:48,160
But notebooks are more than a bucket of references. They're also a place to persist instructions.
340
00:27:48,160 --> 00:27:52,160
What the enterprise keeps calling prompting frameworks, but what architects should
341
00:27:52,160 --> 00:27:57,600
recognize as intent constraints: definitions, exclusions, formatting requirements,
342
00:27:57,600 --> 00:28:03,200
escalation rules and how to behave when sources conflict. This matters because ambiguity is the
343
00:28:03,200 --> 00:28:07,520
default state of enterprise content. So the notebook becomes a kind of authorization aware
344
00:28:07,520 --> 00:28:12,640
reasoning sandbox. You can't make Copilot omniscient, but you can make it predictable inside a scoped
345
00:28:12,640 --> 00:28:16,960
domain. Now compare the roles because this is where tool misuse becomes inevitable. Copilot chat
346
00:28:16,960 --> 00:28:22,080
is for speed. It's the place where people ask: catch me up, summarize a draft, what's the status,
347
00:28:22,080 --> 00:28:27,520
and turn this into something readable. It's disposable. High velocity. Low guarantees.
348
00:28:27,520 --> 00:28:34,000
Pages, Loop pages, are for collaborative synthesis. They're the artifact you publish when you're done
349
00:28:34,000 --> 00:28:39,360
exploring. A page is where a team turns messy thinking into a consumable output. A brief, a plan,
350
00:28:39,360 --> 00:28:44,320
a set of notes, a table, a checklist. It's the thing you share broadly. Agents are for execution.
351
00:28:44,320 --> 00:28:48,800
When the problem stops being "produce an answer" and starts being "perform a workflow", agents matter.
352
00:28:48,800 --> 00:28:53,040
They are the bridge into systems of action where the output becomes a ticket, a task, a record,
353
00:28:53,040 --> 00:28:56,960
or an automation. Notebooks sit in the middle as the reasoning environment: deep work,
354
00:28:56,960 --> 00:29:02,080
source bound, intent bound, iterative. They're where you do analysis that you'll revisit, defend,
355
00:29:02,080 --> 00:29:07,360
and reuse. And this is the key claim. Notebooks reduce randomness by narrowing the retrieval universe
356
00:29:07,360 --> 00:29:12,000
and making intent persistent, not eliminating randomness, reducing it. Because the model is still
357
00:29:12,000 --> 00:29:16,000
generative, it will still produce language. It will still make probabilistic choices, but you're
358
00:29:16,000 --> 00:29:20,240
changing the odds by changing the inputs and you're doing it in a way that can be owned and governed.
359
00:29:20,240 --> 00:29:24,640
This aligns with what Christoph Tuyhaus highlights in his notebook overview. The core idea is
360
00:29:24,640 --> 00:29:29,200
curated context, leading to responses that are more traceable, higher quality, and verifiable.
361
00:29:29,200 --> 00:29:33,760
Not because the model suddenly became trustworthy. Because the context became defensible. And there's
362
00:29:33,760 --> 00:29:38,960
one more subtle design choice that matters. Notebooks work by referencing, not copying. They link to
363
00:29:38,960 --> 00:29:43,600
sources rather than duplicating them into a new shadow repository. That's not a convenience feature.
364
00:29:43,600 --> 00:29:48,000
That's a governance choice. Copy creates version drift. Links preserve a single update path,
365
00:29:48,000 --> 00:29:52,640
assuming the underlying content has life cycle and ownership. Also notice what notebooks don't do.
366
00:29:52,640 --> 00:29:56,400
They don't grant access. Sharing a notebook doesn't magically override
367
00:29:56,400 --> 00:30:00,800
Entra permissions on the underlying sources. That's the platform refusing to let a context container
368
00:30:00,800 --> 00:30:04,480
become a backdoor. Good. The authorization model stays the authorization model. So if you're
369
00:30:04,480 --> 00:30:08,800
expecting notebooks to fix governance, you're going to be disappointed. Notebooks don't fix
370
00:30:08,800 --> 00:30:12,800
governance. They expose it. They make it obvious when your truth placement is broken. When your
371
00:30:12,800 --> 00:30:18,080
source set is stale. And when your permissions model is a disaster. Because the moment you try to
372
00:30:18,080 --> 00:30:22,800
curate context, you discover you don't know what's authoritative. You don't know who owns it.
373
00:30:22,800 --> 00:30:27,360
And you can't explain why one team sees a different answer than another, which is the point.
374
00:30:27,360 --> 00:30:31,040
A notebook is a container for managed context. It's the place where context engineering
375
00:30:31,040 --> 00:30:35,680
becomes real work, not a motivational poster about prompting better. And now the term that
376
00:30:35,680 --> 00:30:41,040
everyone keeps avoiding starts to matter: context engineering. Context engineering, the new work
377
00:30:41,040 --> 00:30:45,440
you keep avoiding. Context engineering is what happens when you stop treating Copilot like a clever
378
00:30:45,440 --> 00:30:50,640
employee and start treating it like a system you own. And yes, it's the work everyone avoids because
379
00:30:50,640 --> 00:30:54,880
it sounds like governance and governance sounds like delay. But the delay is already there. You just
380
00:30:54,880 --> 00:30:59,680
pay it later in rework, escalation and incident reviews. So here's the simple definition. Context
381
00:30:59,680 --> 00:31:04,320
engineering is the deliberate design of what Copilot is allowed to consider, how it should interpret it,
382
00:31:04,320 --> 00:31:09,600
and what must be produced as evidence after it answers. Not "write a better prompt". Design the
383
00:31:09,600 --> 00:31:14,880
environment. There are three layers and if you ignore any of them you get drift. Layer one is sources.
384
00:31:14,880 --> 00:31:19,760
What the notebook can pull from as truth. Not useful docs: inputs that are allowed to influence
385
00:31:19,760 --> 00:31:24,720
decisions. This is where you enforce authority. The policy set, the standard operating procedure,
386
00:31:24,720 --> 00:31:29,760
the approved templates, the canonical architecture decisions, the vendor contracts that actually apply,
387
00:31:29,760 --> 00:31:35,600
you're building a small corpus with high signal, not a library with high volume. Layer two is instructions.
388
00:31:35,600 --> 00:31:40,320
The persistent intent. This is where you encode the rules your organization keeps relying on people
389
00:31:40,320 --> 00:31:45,120
to remember: what to do when sources conflict, what definitions to use, what to refuse to answer,
390
00:31:45,120 --> 00:31:49,200
when to escalate to a human. Which output formats are acceptable. This is the part executives
391
00:31:49,200 --> 00:31:54,320
call tone, but architects should recognize as constraints and guardrails that reduce ambiguity.
392
00:31:54,320 --> 00:31:59,440
Layer three is outputs, the decision record. This is the part almost nobody builds and it's why
393
00:31:59,440 --> 00:32:04,000
Copilot adoption stalls. If the output can't be defended it can't be trusted. And if it can't be
394
00:32:04,000 --> 00:32:10,320
trusted it stays a toy. Outputs need to behave like receipts: citations, assumptions,
395
00:32:10,320 --> 00:32:15,120
and a stable format that can be reviewed. The point isn't to make Copilot verbose. The point is to
396
00:32:15,120 --> 00:32:19,440
make Copilot accountable inside a workflow. Now here's the objection that shows up immediately.
397
00:32:19,440 --> 00:32:24,400
"We already have a prompt framework: goal, context, source, expectations. We trained users." Sure. And
398
00:32:24,400 --> 00:32:28,160
it works about as well as every other program that tries to make humans compensate for missing
399
00:32:28,160 --> 00:32:32,400
system design. Prompt frameworks are manual context engineering. They're just done badly because
400
00:32:32,400 --> 00:32:36,880
they're done one prompt at a time by the least consistent part of the system, people. In a large
401
00:32:36,880 --> 00:32:41,440
tenant you don't need better individual behavior. You need reusable governed configuration that
402
00:32:41,440 --> 00:32:45,680
survives turnover and stress. That's what the notebook gives you: a place to bind the source set
403
00:32:45,680 --> 00:32:50,240
and the instructions so you don't have to recreate the same constraints every day. But it only
404
00:32:50,240 --> 00:32:54,720
works if you treat the notebook like a product, not like a scratch pad. So context engineering has a
405
00:32:54,720 --> 00:33:00,560
few non-negotiable behaviors, sketched as configuration after this list. First, define the question space. What is this notebook allowed to answer?
406
00:33:00,560 --> 00:33:05,920
Vendor onboarding for external data processes. Project status and risks for program X.
407
00:33:05,920 --> 00:33:10,800
Security standards for endpoint configuration. And if you can't define that in one sentence,
408
00:33:10,800 --> 00:33:16,000
you're building a junk drawer. Second, define exclusions. This is where the enterprise stops being naive.
409
00:33:16,000 --> 00:33:20,960
What must the notebook refuse? Legal interpretation beyond policy text, HR advice beyond
410
00:33:20,960 --> 00:33:25,520
published guidance. Anything involving regulated data movement without explicit citations.
411
00:33:25,520 --> 00:33:30,240
The system needs permission to say no, or it will say yes, in fluent English. Third,
412
00:33:30,240 --> 00:33:35,040
define the authoritative source hierarchy. When sources conflict, what wins? A formally published
413
00:33:35,040 --> 00:33:40,240
policy over a slide deck. A labeled standard over a meeting note. A signed contract over an email
414
00:33:40,240 --> 00:33:44,960
summary. If you don't tell co-pilot what authority means, it will treat recency and keyword density
415
00:33:44,960 --> 00:33:51,520
as authority. That's how garbage becomes truth. Fourth, make ownership explicit. Not everyone can edit.
416
00:33:51,520 --> 00:33:56,560
Someone owns the context. Someone curates the source list. Someone reviews the instructions. Someone
417
00:33:56,560 --> 00:34:01,600
sets the cadence. Otherwise, the notebook becomes a museum of good intentions.
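To make that concrete, here is a minimal sketch of those four behaviors as data rather than discipline. Copilot Notebooks expose no API like this today; every class and field name here is invented. The point is that scope, exclusions, authority, and ownership become something you can version and review:

```python
from dataclasses import dataclass

# Hypothetical structure; Copilot Notebooks expose no such API today.
# The point: charter, exclusions, authority, and ownership are reviewable data.
@dataclass
class NotebookCharter:
    charter: str                # the one-sentence question space
    exclusions: list[str]       # refusal rules, written as rules
    authority_order: list[str]  # highest authority first
    owner: str                  # a named function, not a distribution list
    review_cadence_days: int = 90

vendor_onboarding = NotebookCharter(
    charter="Answers questions about third-party vendor onboarding for EU operations.",
    exclusions=[
        "Refuse legal interpretation beyond cited policy text.",
        "Refuse data-handling advice without a labeled policy source.",
    ],
    authority_order=["published-policy", "labeled-standard",
                     "signed-contract", "guidance", "operational-notes"],
    owner="vendor-governance-office",
)
```

If you can't fill the charter field with one sentence, the junk-drawer diagnosis above already applies.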
418
00:34:01,600 --> 00:34:06,160
This is why executives should care, even if they don't care about the mechanics. Because context engineering is how you get
419
00:34:06,160 --> 00:34:12,000
two things leadership actually wants: quality and accountability. Fewer wrong answers that look
420
00:34:12,000 --> 00:34:16,880
right. Faster decisions that don't get relitigated because nobody knows where the answer came from.
421
00:34:16,880 --> 00:34:22,640
And there's a cost angle too, because entropy isn't free. Sprawl drives storage. Sprawl drives indexing.
422
00:34:22,640 --> 00:34:28,720
Sprawl drives compute. Then everyone pretends Copilot is expensive, when the real cost is that you
423
00:34:28,720 --> 00:34:34,560
never controlled the context surface area in the first place. So no, context engineering isn't a
424
00:34:34,560 --> 00:34:38,960
niche practice. It's the new enterprise literacy for AI. And the most important part is the one
425
00:34:38,960 --> 00:34:44,240
that breaks almost every design at scale. Identity. Because the moment you try to engineer context
426
00:34:44,240 --> 00:34:48,720
across real teams, you discover that the retrieval problem is not finding the right document.
427
00:34:48,720 --> 00:34:54,720
It's which documents exist for which user, at which moment, with which permissions, and with which
428
00:34:54,720 --> 00:35:01,200
drift. That's where the next failure mode lives. Failure mode 3: broken RAG at scale. Broken RAG at
429
00:35:01,200 --> 00:35:05,680
scale is what happens when everyone runs a pilot, celebrates the demo, and then collides with the part
430
00:35:05,680 --> 00:35:11,200
they didn't model. The enterprise is not a lab and retrieval is not search with better vibes.
431
00:35:11,200 --> 00:35:15,600
In a pilot, the corpus is small. Permissions are clean because you handpicked the participants. The
432
00:35:15,600 --> 00:35:20,080
content is fresh because you just created it. And the truth documents are obvious because you curated
433
00:35:20,080 --> 00:35:25,360
them. So RAG looks deterministic. Ask a question, get the right source, get a decent answer. Then you
434
00:35:25,360 --> 00:35:29,920
go to production. Now the corpus is millions of items. Duplicates exist by design. Old content
435
00:35:29,920 --> 00:35:34,480
didn't get deprecated because retention kept it alive. Teams spun up hundreds of sites that nobody
436
00:35:34,480 --> 00:35:39,680
owns. And permissions drifted because "share with the vendor" felt urgent at the time. RAG
437
00:35:39,680 --> 00:35:45,360
still works. Technically. But the behavior becomes unreliable because the input universe becomes
438
00:35:45,360 --> 00:35:50,960
adversarial to relevance. The first failure pattern is retrieval mismatch. The correct document exists,
439
00:35:50,960 --> 00:35:55,120
but it doesn't dominate the ranking. The wrong document wins because it has more keywords,
440
00:35:55,120 --> 00:36:00,000
more repetitions, a more generic title, or simply fewer access constraints. In other words,
441
00:36:00,000 --> 00:36:04,720
the system doesn't retrieve the best source. It retrieves the most retrievable source. That distinction
442
00:36:04,720 --> 00:36:10,400
matters. Because in enterprise content, easy to retrieve often correlates with least governed.
443
00:36:10,400 --> 00:36:15,520
Broad access, weak ownership, lots of copies, lots of drafts, the exact opposite of what you want
444
00:36:15,520 --> 00:36:20,880
feeding an AI that speaks with confidence.
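A toy illustration of that distinction, with invented numbers and a deliberately naive scorer, not any real ranking algorithm: a relevance-style score rewards keyword repetition and reach, so the ungoverned copy outranks the governed policy.

```python
# Invented corpus and weights -- not a real ranking algorithm.
docs = [
    {"title": "Vendor Policy v3 (published, owned)", "keyword_hits": 4,
     "audience": 40, "governed": True},
    {"title": "vendor stuff FINAL (copy of a copy)", "keyword_hits": 11,
     "audience": 9000, "governed": False},
]

def naive_score(doc):
    # Rewards repetition and broad access; knows nothing about authority.
    return doc["keyword_hits"] + doc["audience"] * 0.001

winner = max(docs, key=naive_score)
print(winner["title"])  # the ungoverned copy wins: most retrievable, not best
```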
445
00:36:20,880 --> 00:36:25,600
The second failure pattern is permission skew. When a user asks a question, the system can only retrieve what that user is allowed to access.
446
00:36:25,600 --> 00:36:30,480
So two users ask the same question and get different answers, not because the model behaved differently,
447
00:36:30,480 --> 00:36:34,880
but because the retrieval set was different. The organization interprets this as "Copilot is
448
00:36:34,880 --> 00:36:41,760
inconsistent." Copilot isn't inconsistent. Your authorization graph is. This is the part leaders
449
00:36:41,760 --> 00:36:48,400
don't like hearing. RAG in Microsoft 365 is not global enterprise truth. It is authorization-filtered
450
00:36:48,400 --> 00:36:52,640
truth. The answer is always shaped by the caller's identity. That's how it should be, but it means
451
00:36:52,640 --> 00:36:57,600
your context strategy must assume fragmentation. There is no single answer across the enterprise
452
00:36:57,600 --> 00:37:03,600
unless you designed one. The third failure pattern is freshness collapse. People assume retrieval will
453
00:37:03,600 --> 00:37:09,360
find the latest version. But latest is not a stable concept in a tenant with multiple copies,
454
00:37:09,360 --> 00:37:14,400
multiple sites, and multiple publishing pathways. A policy PDF from last year might be latest in
455
00:37:14,400 --> 00:37:19,920
one library. A page updated yesterday might be latest elsewhere. A deck revised this morning might
456
00:37:19,920 --> 00:37:24,480
be latest in someone's OneDrive. The retrieval engine doesn't understand your publishing model because
457
00:37:24,480 --> 00:37:29,120
you never built one, so you get temporal roulette. And this is where the enterprise starts doing
458
00:37:29,120 --> 00:37:34,640
dangerous things, like asking Copilot to only use the most recent document, as if recency implied
459
00:37:34,640 --> 00:37:41,440
authority. Recency often means someone touched it, not that someone governed it.
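Freshness collapse in miniature, again with invented items: "use the most recent" selects whichever copy was touched last, while a publishing model uses recency only inside the governed set.

```python
from datetime import date

# Invented copies of "the" policy scattered across a tenant.
copies = [
    {"where": "policy library", "modified": date(2024, 3, 1), "published": True},
    {"where": "a OneDrive draft", "modified": date(2025, 1, 9), "published": False},
]

print(max(copies, key=lambda c: c["modified"])["where"])  # the OneDrive draft wins

# A publishing model: recency only breaks ties among governed versions.
governed = [c for c in copies if c["published"]]
print(max(governed, key=lambda c: c["modified"])["where"])  # the policy library
```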
460
00:37:41,440 --> 00:37:46,080
The fourth failure pattern is the observability gap. When an answer is wrong, you can't explain why it happened. You might see
461
00:37:46,080 --> 00:37:50,560
a few citations but you can't see the ranking rationale, the excluded candidates, the permission
462
00:37:50,560 --> 00:37:55,120
filtering decisions, or the full context window that shaped the generation. So the platform owners
463
00:37:55,120 --> 00:38:00,320
can't debug, architects can't defend, and leadership can't trust. This is why RAG is not a feature,
464
00:38:00,320 --> 00:38:05,920
it's a system, and systems require observability and control surfaces if you want reliable behavior.
465
00:38:05,920 --> 00:38:11,360
Now to be clear, none of this means RAG is bad. It means the common mental model is wrong.
466
00:38:11,360 --> 00:38:16,800
Most people treat RAG like search: type words, get the right file. But RAG is not search. RAG is
467
00:38:16,800 --> 00:38:22,160
retrieval plus synthesis under constraints. The output is not the document. The output is a generated
468
00:38:22,160 --> 00:38:27,360
artifact that inherits every ambiguity in the retrieval set. So when the retrieval set is messy the
469
00:38:27,360 --> 00:38:32,240
synthesis is confident nonsense. And when the retrieval set is permission fragmented the synthesis
470
00:38:32,240 --> 00:38:37,040
becomes user fragmented. And when the retrieval set is stale, the synthesis becomes operationally
471
00:38:37,040 --> 00:38:41,600
dangerous. This is why notebooks matter again. They narrow the retrieval universe and keep the source
472
00:38:41,600 --> 00:38:46,400
set explicit. They don't solve identity, they don't solve freshness, they don't solve sprawl, but they
473
00:38:46,400 --> 00:38:51,520
let you design a bounded reasoning environment where RAG has a fighting chance to behave predictably.
474
00:38:51,520 --> 00:38:55,600
And here's the uncomfortable truth. If you can't explain why an answer happened you don't have an
475
00:38:55,600 --> 00:38:59,840
AI system. You have a slot machine with citations, which is why the next section is unavoidable.
476
00:38:59,840 --> 00:39:04,800
The real control plane isn't the model. It's Entra. Entra is the real control plane: identity
477
00:39:04,800 --> 00:39:09,840
shapes context. Everyone keeps looking for the Copilot control plane inside Copilot. It isn't there.
478
00:39:09,840 --> 00:39:14,320
The real control plane is Entra, because Entra decides what Copilot is even allowed to consider
479
00:39:14,320 --> 00:39:19,200
as context. Not what it answers, what it can see. And that means identity doesn't just secure
480
00:39:19,200 --> 00:39:23,120
Copilot. Identity shapes Copilot's reality. That distinction matters.
481
00:39:23,120 --> 00:39:30,560
In Microsoft 365, retrieval is authorization-filtered. Copilot doesn't retrieve the best answer.
482
00:39:30,560 --> 00:39:34,720
It retrieves the best answer you're permitted to access at that moment with your current group
483
00:39:34,720 --> 00:39:40,240
memberships, your current link permissions, and whatever inheritance chaos your tenant accumulated
484
00:39:40,240 --> 00:39:44,800
over the last decade. So when leadership asks why Copilot told finance one thing and legal
485
00:39:44,800 --> 00:39:49,280
another? The answer is usually boring. Different users, different graphs, different retrieval sets.
486
00:39:49,280 --> 00:39:53,360
Copilot didn't contradict itself. Your authorization graph did. And the enterprise keeps acting
487
00:39:53,360 --> 00:39:58,720
surprised because it still treats identity like a gate at the front door. Authenticate. Then you're
488
00:39:58,720 --> 00:40:03,760
inside. That mental model died years ago. Entra is a distributed authorization system. It's
489
00:40:03,760 --> 00:40:08,720
continuously evaluated. It's group memberships, conditional access outcomes, app permissions,
490
00:40:08,720 --> 00:40:14,400
share links, external collaboration settings, and the slow erosion of least privilege as urgent
491
00:40:14,400 --> 00:40:19,280
work keeps demanding exceptions. Those exceptions don't stay isolated. They accumulate.
492
00:40:19,280 --> 00:40:25,600
Permission drift is not a one-time mistake. It's a structural behavior. Groups expand,
493
00:40:25,600 --> 00:40:31,040
owners change, sites inherit permissions, nobody remembers, and guest access becomes permanent because
494
00:40:31,040 --> 00:40:35,680
"we might need them again." Then someone creates a sharing link with "anyone with the link" because the
495
00:40:35,680 --> 00:40:40,000
vendor couldn't access the file and the meeting started in two minutes. That single link is an
496
00:40:40,000 --> 00:40:44,400
entropy generator because now the document's audience is no longer defined by a group with owners
497
00:40:44,400 --> 00:40:48,720
and lifecycle. It's defined by the existence of a URL. And Copilot will happily retrieve that
498
00:40:48,720 --> 00:40:53,360
document for anyone who can access it. The link didn't just share a file. It changed the retrieval
499
00:40:53,360 --> 00:40:58,000
landscape. Now take that to scale. SharePoint inheritance breaks in odd places. Teams create sites
500
00:40:58,000 --> 00:41:02,240
automatically. Loop components get shared and reshared across chats. Users drop files into
501
00:41:02,240 --> 00:41:07,040
OneDrive and then share them externally with links. Group-based access and link-based access collide.
502
00:41:07,040 --> 00:41:11,280
And nobody has a clean map of who can see what anymore. So RAG at scale becomes permission chaos
503
00:41:11,280 --> 00:41:16,400
at scale. This is why "Copilot Notebooks will fix it" is the wrong expectation. A shared notebook does
504
00:41:16,400 --> 00:41:22,000
not grant underlying resource access. Notebooks can reference files and pages, but Entra still enforces
505
00:41:22,000 --> 00:41:26,880
access to the targets. If someone opens your notebook and half the sources show as inaccessible,
506
00:41:26,880 --> 00:41:31,680
that isn't a notebook problem. That's the system telling you the truth. Your team doesn't share
507
00:41:31,680 --> 00:41:36,160
a common context boundary. And if your team doesn't share a common boundary, you will never get
508
00:41:36,160 --> 00:41:41,040
consistent answers. You'll get role-shaped answers, which is fine until you pretend they're enterprise
509
00:41:41,040 --> 00:41:46,160
truth. So Entra becomes the real control plane for persistent context, because it defines the audience
510
00:41:46,160 --> 00:41:52,160
for truth. If your authoritative policy lives in a site that everyone can read, you better mean
511
00:41:52,160 --> 00:41:56,640
everyone. If it lives in a restricted library, you better accept that many users will never get
512
00:41:56,640 --> 00:42:01,120
that policy as a grounding source. Therefore, Copilot will fill the gap with whatever else they can
513
00:42:01,120 --> 00:42:05,680
see. This is where architects have to stop being sentimental about collaboration. Open access
514
00:42:05,680 --> 00:42:10,640
increases reuse, but it also increases blast radius. Restricted access reduces blast radius,
515
00:42:10,640 --> 00:42:15,120
but it also fragments truth. You don't avoid that trade-off. You choose it, then you design for it.
516
00:42:15,120 --> 00:42:19,120
And the enterprise-friendly way to design for it is to make the truth layer broadly readable
517
00:42:19,120 --> 00:42:23,280
and tightly writable. Wide read access, narrow edit access, formal publishing,
518
00:42:23,280 --> 00:42:27,600
versioning, ownership, that's not bureaucracy. That's how you create a stable retrieval anchor
519
00:42:27,600 --> 00:42:32,720
across the tenant without letting content mutate through "helpful" edits. Then you use Purview
520
00:42:32,720 --> 00:42:37,680
labeling and DLP to constrain sensitive parts of that truth so it doesn't leak where it shouldn't.
521
00:42:37,680 --> 00:42:42,880
Identity defines who can see. Labels define what can travel. Together they define the context boundary.
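That pairing can be read as a two-gate check. A minimal sketch with invented names; Entra and Purview don't expose this as a single function, and the composition is the point:

```python
# Hypothetical two-gate boundary check: identity gates visibility,
# labels gate movement. Neither gate alone defines the context boundary.
def within_boundary(user_groups: set, doc_acl: set,
                    doc_label: str, labels_allowed_at_destination: set) -> bool:
    can_see = bool(user_groups & doc_acl)                    # identity: who can see
    can_travel = doc_label in labels_allowed_at_destination  # label: what can travel
    return can_see and can_travel

# Visible to the caller, yet barred from the destination: still out of bounds.
print(within_boundary({"finance"}, {"finance"}, "Confidential", {"General"}))  # False
```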
522
00:42:42,880 --> 00:42:46,400
But again, none of this is a Copilot feature. It's authorization architecture.
523
00:42:46,400 --> 00:42:50,560
So if you want persistent context to work, you have to stop treating permissions as an afterthought.
524
00:42:50,560 --> 00:42:55,120
You need to manage group sprawl, kill anonymous links, control guest access, and watch inheritance
525
00:42:55,120 --> 00:42:59,440
like it's a security perimeter because it is. And you need to accept the harsh conclusion.
526
00:42:59,440 --> 00:43:03,120
Copilot outcomes are only as consistent as your identity model.
527
00:43:03,120 --> 00:43:05,920
If your Entra graph is drifting, your answers will drift.
528
00:43:05,920 --> 00:43:08,880
If your groups are unmanaged, your truth will fragment.
529
00:43:08,880 --> 00:43:12,080
If your sharing links are uncontrolled, your context boundary will dissolve.
530
00:43:12,080 --> 00:43:14,880
That's why governance can't attach to user behavior.
531
00:43:14,880 --> 00:43:19,760
It has to attach to the pathways, how content gets created, shared, labeled, and made retrievable.
532
00:43:19,760 --> 00:43:24,480
And that leads directly to the part everyone claims they'll do later right after the rollout,
533
00:43:24,480 --> 00:43:28,240
right after adoption, right after the next incident. Purview. Purview:
534
00:43:28,240 --> 00:43:32,880
the part everyone claims they'll do later. Purview is the part of the story where everyone nods,
535
00:43:32,880 --> 00:43:36,880
agrees, and then quietly changes the subject. Because Purview feels like compliance tooling.
536
00:43:36,880 --> 00:43:41,680
And compliance feels like a tax, something to bolt on after the real work of Copilot adoption,
537
00:43:41,680 --> 00:43:45,520
right after the pilot, right after the excitement, right after the business units,
538
00:43:45,520 --> 00:43:48,080
stop calling it magic. That delay is not neutral.
539
00:43:48,080 --> 00:43:52,160
It's an architectural choice to let Copilot operate without a classification model,
540
00:43:52,160 --> 00:43:56,800
without consistent policy signals, and without defensible handling for the outputs it generates.
541
00:43:56,800 --> 00:44:01,040
In other words, you're asking a probabilistic system to behave responsibly while you postpone
542
00:44:01,040 --> 00:44:05,760
the only machinery you have for expressing responsibility at scale. Purview isn't a sticker machine.
543
00:44:05,760 --> 00:44:09,840
Sensitivity labels aren't decorative; they're context constraints: machine-readable
544
00:44:09,840 --> 00:44:14,320
signals that say this content has a handling requirement, a sharing boundary, and sometimes an
545
00:44:14,320 --> 00:44:19,040
encryption boundary. When labels exist and are applied consistently, Copilot doesn't just
546
00:44:19,040 --> 00:44:24,480
see documents. It sees documents with guardrails attached. That's the point. Without those signals,
547
00:44:24,480 --> 00:44:29,200
Copilot retrieves across a flat content universe. Everything looks the same, so the model ranks by
548
00:44:29,200 --> 00:44:33,840
relevance signals and produces an answer. And if the answer includes sensitive content,
549
00:44:33,840 --> 00:44:38,640
you're now relying on the user to notice and behave. That's not governance, that's wishful thinking.
550
00:44:38,640 --> 00:44:44,160
This is also where people confuse enforcement with awareness. A label taxonomy, without policy,
551
00:44:44,160 --> 00:44:49,680
is just a vocabulary lesson. The enterprise needs labels to drive behaviors: default labeling,
552
00:44:49,680 --> 00:44:55,040
mandatory labeling, restrictions on sharing, and downstream controls, like DLP. Otherwise,
553
00:44:55,040 --> 00:44:59,120
you've built a classification scheme that exists purely for reporting dashboards no one reads.
554
00:44:59,120 --> 00:45:03,920
And DLP is where the "later" excuse becomes expensive. DLP is not there to punish users. It's there
555
00:45:03,920 --> 00:45:10,000
to prevent predictable failure modes: pasting regulated data into the wrong place, sharing a summary
556
00:45:10,000 --> 00:45:15,920
that includes PII with the wrong audience, or taking an AI-generated artifact and distributing it
557
00:45:15,920 --> 00:45:21,920
as if it's clean. Copilot accelerates creation. DLP becomes the seatbelt. You don't install seatbelts
558
00:45:21,920 --> 00:45:27,040
after the crash. Now add insider risk. Organizations pretend insider risk is only about malicious
559
00:45:27,040 --> 00:45:32,800
actors. It's not. It's also about high-velocity accidents: someone pressured to deliver, using Copilot
560
00:45:32,800 --> 00:45:37,680
to synthesize data and then moving it to an unmanaged location because "it's just a draft."
561
00:45:37,680 --> 00:45:42,640
Copilot didn't leak data by itself. Your workflow did. Insider risk management exists to
562
00:45:42,640 --> 00:45:47,360
detect those patterns and put friction where it matters. Then there's lineage and e-discovery.
563
00:45:47,360 --> 00:45:51,680
The part nobody wants to talk about, because it turns AI assistance into a records problem.
564
00:45:51,680 --> 00:45:56,720
Copilot outputs are not ephemeral by default. They get copied into emails. They get pasted into decks.
565
00:45:56,720 --> 00:46:01,120
They get turned into Loop pages. They become briefs, decisions, risk registers, and guidance.
566
00:46:01,120 --> 00:46:05,040
Those artifacts influence outcomes. That means they become discoverable evidence in
567
00:46:05,040 --> 00:46:09,920
audits, investigations, and litigation. If you can't trace what sources shaped them and what labels
568
00:46:09,920 --> 00:46:14,720
govern them, you didn't just lose accountability. You lost defensibility. This is why Purview has
569
00:46:14,720 --> 00:46:18,960
to be treated as part of the Copilot architecture, not as a compliance phase. Purview is how you
570
00:46:18,960 --> 00:46:23,280
attach governance to content pathways. The inputs you retrieve, the outputs you generate,
571
00:46:23,280 --> 00:46:28,000
and the places those outputs travel. And yes, there's a catch. Purview can't compensate for missing
572
00:46:28,000 --> 00:46:32,720
authority. It can constrain data movement. It can apply labels. It can enforce retention. But it
573
00:46:32,720 --> 00:46:37,440
cannot decide which document is true when your tenant has five contradictory versions. That's
574
00:46:37,440 --> 00:46:41,600
still a context engineering problem. So the correct mental model is: Purview constrains the boundary
575
00:46:41,600 --> 00:46:46,880
conditions, not the reasoning quality. It reduces blast radius. It increases auditability.
576
00:46:46,880 --> 00:46:51,760
It makes outputs and sources governable. And it gives leadership something they always ask for,
577
00:46:51,760 --> 00:46:56,800
once the first incident happens. Proof that the system respected handling requirements.
578
00:46:56,800 --> 00:47:01,200
Proof that sensitive outputs didn't travel where they shouldn't. Proof that decisions have
579
00:47:01,200 --> 00:47:05,520
receipts, not just polished prose. So when people say "we'll do Purview later," what they mean is
580
00:47:05,520 --> 00:47:09,520
"we'll accept uncontrolled context now and deal with the consequences when the consequences
581
00:47:09,520 --> 00:47:14,560
become visible." The platform will allow that. The auditors will not. Next, this has to connect back
582
00:47:14,560 --> 00:47:18,560
to patterns, not tools, because the point isn't to memorize Purview features. The point is to
583
00:47:18,560 --> 00:47:23,360
understand the persistent context triad Microsoft is quietly assembling. Personal capture,
584
00:47:23,360 --> 00:47:30,480
collaborative synthesis, and managed reasoning. The persistent context triad: OneNote, Pages,
585
00:47:30,480 --> 00:47:36,000
Notebooks. Microsoft is quietly building a triad for persistent context, and most enterprises will
586
00:47:36,000 --> 00:47:41,280
misuse all three parts because they'll treat them as interchangeable note-taking with AI. They're not:
587
00:47:41,280 --> 00:47:46,000
these containers behave differently, degrade differently, and are governed differently: OneNote, Pages,
588
00:47:46,000 --> 00:47:50,800
and Notebooks. If you don't assign each one a role, your context strategy turns into a scavenger hunt
589
00:47:50,800 --> 00:47:55,920
across apps, and nobody can explain which artifact actually matters. OneNote is personal capture:
590
00:47:55,920 --> 00:48:01,840
fast, low friction, high convenience, and therefore low defensibility. It's where fragments live,
591
00:48:01,840 --> 00:48:07,120
meeting notes, screenshots, half ideas and drafts. It should stay personal and mostly ephemeral.
592
00:48:07,120 --> 00:48:12,480
The moment a team treats OneNote as the system of record, you've outsourced enterprise truth
593
00:48:12,480 --> 00:48:18,000
to an individual's private workspace and a lifecycle you don't control. Pages, meaning Loop pages,
594
00:48:18,000 --> 00:48:23,760
are collaborative synthesis. They turn messy thinking into something humans can consume,
595
00:48:23,760 --> 00:48:29,280
a brief, a checklist, a risk table, a plan. Pages feel official because they're tidy, but tidy is not
596
00:48:29,280 --> 00:48:34,960
approved. If a page becomes decision-driving, it needs ownership, change control, and review cadence.
597
00:48:34,960 --> 00:48:39,600
Otherwise, it's just a nicer wiki, and wikis decay into confident ambiguity. Notebooks are
598
00:48:39,600 --> 00:48:44,960
curated reasoning environments. They bind sources and intent into a repeatable context scope,
599
00:48:44,960 --> 00:48:49,200
so Copilot can produce outputs that are traceable and consistent inside a domain. Notebooks don't
600
00:48:49,200 --> 00:48:54,640
replace Pages. They feed Pages. The notebook is where you constrain the retrieval universe. The
601
00:48:54,640 --> 00:48:58,960
page is what you publish when you want others to consume the result. So the flow is simple: OneNote
602
00:48:58,960 --> 00:49:04,160
captures, Notebooks constrain, Pages communicate. And when something becomes enterprise truth,
603
00:49:04,160 --> 00:49:08,800
policy, standards, operating procedures, it needs to live in an authoritative publishing model
604
00:49:08,800 --> 00:49:12,880
with governance semantics attached. None of these three containers are your policy engine,
605
00:49:12,880 --> 00:49:18,800
they're context surfaces. Next, the practical checklist: how to decide what must be persistent and
606
00:49:18,800 --> 00:49:24,480
what must be allowed to die. The context design checklist: persistent versus ephemeral.
607
00:49:24,480 --> 00:49:29,120
The next mistake is assuming everything should be captured because AI is here now. That's how tenants
608
00:49:29,120 --> 00:49:34,480
turn into landfills with search bars. So the first checklist item is brutally simple. Decide what
609
00:49:34,480 --> 00:49:39,200
is allowed to die. Ephemeral context is the stuff that helps you think, negotiate, and explore,
610
00:49:39,200 --> 00:49:44,320
but shouldn't become part of the enterprise memory: brainstorming, half-formed options, draft
611
00:49:44,320 --> 00:49:51,280
language, "what if we..." conversations, early risk spikes that get resolved, political compromise
612
00:49:51,280 --> 00:49:57,040
notes that were true for one meeting and toxic forever after. Most Teams chats. Most meeting chatter.
613
00:49:57,040 --> 00:50:02,080
Most whiteboard captures. Ephemeral is not worthless. It's just not an asset. It's a consumable
614
00:50:02,080 --> 00:50:07,920
input to get to a decision. And if you persist it by default, Copilot will retrieve it later and
615
00:50:07,920 --> 00:50:12,000
treat it as if it still matters. That's how yesterday's uncertainty becomes today's guidance.
616
00:50:12,000 --> 00:50:16,960
Persistent context is different. Persistent means this content is expected to influence decisions
617
00:50:16,960 --> 00:50:22,400
later and the enterprise is willing to be accountable for it. Policies, standards, architectural
618
00:50:22,400 --> 00:50:28,480
decisions, operating procedures, approved templates, vendor onboarding rules, security baselines,
619
00:50:28,480 --> 00:50:33,280
financial assumptions used in planning, anything that becomes a reference point for "why did we do it
620
00:50:33,280 --> 00:50:39,520
this way?" Here's the litmus test: if someone will ask "who approved this," it needs persistence. Not
621
00:50:39,520 --> 00:50:44,720
"save the file somewhere." Persistence with ownership, lifecycle, and a place in the authority hierarchy.
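That litmus test is mechanical enough to write down. A toy version, with invented field names:

```python
# Toy litmus test; field names are invented. The rule from above:
# if it changes decisions later, or someone will ask "who approved this",
# it needs persistence WITH ownership and lifecycle -- not just storage.
def handling(artifact: dict) -> str:
    if artifact["changes_decisions_later"] or artifact["approval_will_be_asked"]:
        return "persistent: assign owner, lifecycle, authority placement"
    return "ephemeral: allowed to die"

print(handling({"changes_decisions_later": False, "approval_will_be_asked": False}))
print(handling({"changes_decisions_later": True, "approval_will_be_asked": True}))
```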
622
00:50:44,720 --> 00:50:51,120
Now, leaders love to skip this and say, "just put it all in a notebook." No. A notebook is not a garbage
623
00:50:51,120 --> 00:50:55,280
compactor. It's a bounded reasoning environment. If you treat it like a dumping ground, you are
624
00:50:55,280 --> 00:51:00,400
literally curating your own future hallucinations. You're building a retrieval corpus that contains
625
00:51:00,400 --> 00:51:04,960
contradictions, drafts and opinions, then acting surprised when Copilot synthesizes them into
626
00:51:04,960 --> 00:51:11,200
something that sounds official. So apply the split. Ephemeral use cases: ideation sessions,
627
00:51:11,200 --> 00:51:17,760
negotiation prep, exploratory Q&A, meeting catch-up, quick comparisons, draft emails, "give me options,"
628
00:51:17,760 --> 00:51:22,480
"summarize this thread," "what did we decide last week" when the decision isn't actually recorded
629
00:51:22,480 --> 00:51:26,960
anywhere else? This is Copilot chat territory. It's fast, it's disposable, it should not become
630
00:51:26,960 --> 00:51:31,600
the enterprise's memory. Persistent use cases: anything that changes how people operate.
631
00:51:31,600 --> 00:51:36,640
That includes what's the correct label and why? Are we allowed to share this externally?
632
00:51:36,640 --> 00:51:41,840
What's the approved process? What is the standard build? What are the non-negotiable controls?
633
00:51:41,840 --> 00:51:46,800
What does confidential mean here? What is the current vendor stance? What is the architecture
634
00:51:46,800 --> 00:51:52,080
decision and its rationale? These questions aren't about productivity. They're about governance and
635
00:51:52,080 --> 00:51:56,480
repeatability. They deserve a persistent context container, whether that's a notebook bound to
636
00:51:56,480 --> 00:52:01,360
authoritative sources or a published knowledge base or a formal policy artifact. Now the uncomfortable
637
00:52:01,360 --> 00:52:05,760
rule that actually holds: if it changes decisions later, it needs persistence and ownership,
638
00:52:05,760 --> 00:52:10,880
not because it's important, but because it's a decision input. And in an AI-assisted enterprise,
639
00:52:10,880 --> 00:52:15,760
decision inputs are part of the control plane. This is also how you stop wasting effort on pointless
640
00:52:15,760 --> 00:52:20,080
documentation. People document too much when they don't know what counts. If you define the
641
00:52:20,080 --> 00:52:24,640
persistent set, you can let everything else stay ephemeral and stop pretending every meeting
642
00:52:24,640 --> 00:52:29,440
note is corporate memory. Most meeting notes are not knowledge. They are transaction logs for
643
00:52:29,440 --> 00:52:34,000
humans, useful in the moment, dangerous as future truth. So what does this look like in practice?
644
00:52:34,000 --> 00:52:38,880
If you're creating a notebook for a program, don't start by dumping your last 50 files into it.
645
00:52:38,880 --> 00:52:43,600
Start by declaring: what decisions is this notebook allowed to influence?
646
00:52:43,600 --> 00:52:48,320
If the answer is "all of them," you've already failed. Define the decision domain. Then collect the
647
00:52:48,320 --> 00:52:52,320
smallest set of authoritative sources that govern that domain, then capture the outputs that
648
00:52:52,320 --> 00:52:56,160
become decision records. Everything else stays outside. If you're building a page, treat it as a
649
00:52:56,160 --> 00:53:01,200
publication surface, not a scratch pad. Pages can be persistent artifacts, but only if you decide
650
00:53:01,200 --> 00:53:06,400
they are. If the page drives decisions, it needs an owner and a review date. If it doesn't, stop
651
00:53:06,400 --> 00:53:11,440
treating it like a living standard. If you're using OneNote, keep it personal and ephemeral by default.
652
00:53:11,440 --> 00:53:15,600
Promote only what matters. Otherwise, you'll create a parallel knowledge system that nobody can
653
00:53:15,600 --> 00:53:20,720
govern. And yes, this also applies to AI outputs. An AI generated summary is ephemeral until you make
654
00:53:20,720 --> 00:53:25,360
it persistent. The moment you paste it into a standard, a brief, a policy draft or an operating
655
00:53:25,360 --> 00:53:29,680
procedure, it becomes part of the enterprise memory and it must inherit governance. That means
656
00:53:29,680 --> 00:53:34,560
labeling, traceability, and review like any other decision artifact. Persistence is not a storage
657
00:53:34,560 --> 00:53:39,040
decision. It's an accountability decision. And once you get the split right, you unlock the next
658
00:53:39,040 --> 00:53:43,600
constraint. Persistence without boundaries is just a bigger surface area for wrong answers.
659
00:53:43,600 --> 00:53:48,240
The context design checklist: boundaries and constraints. Once you decide something
660
00:53:48,240 --> 00:53:52,880
deserves persistence, the next failure is assuming persistence automatically creates reliability.
661
00:53:52,880 --> 00:53:57,120
It doesn't. Persistence just makes the wrong thing available for longer. So the next checklist item
662
00:53:57,120 --> 00:54:02,640
is boundaries. Not vibes, not "be careful." Boundaries that the system can follow and humans can audit.
663
00:54:02,640 --> 00:54:07,440
A notebook without boundaries becomes a multi-tenant junk drawer. It answers whatever you ask,
664
00:54:07,440 --> 00:54:11,760
from whatever it can reach, in whatever format feels convenient that day. That is just chat sprawl
665
00:54:11,760 --> 00:54:17,200
with a nicer sidebar. So define the question space first. Every notebook needs a one sentence charter.
666
00:54:17,200 --> 00:54:22,160
What it is allowed to answer, for whom, and in what operational domain. This notebook answers
667
00:54:22,160 --> 00:54:26,080
questions about third-party vendor onboarding requirements for our EU operations.
668
00:54:26,080 --> 00:54:31,440
This notebook produces security exception assessments for endpoint configuration controls.
669
00:54:31,440 --> 00:54:36,640
This notebook generates weekly program risk briefs for program X. That sentence is not documentation.
670
00:54:36,640 --> 00:54:41,920
It's scope control because scope is how you stop the notebook becoming the place people ask everything
671
00:54:41,920 --> 00:54:47,840
then blame the platform when answers get fuzzy. Next, define exclusions explicitly. This is where the
672
00:54:47,840 --> 00:54:53,040
enterprise stops pretending AI is a colleague with judgment. It isn't. It's a synthesis engine.
673
00:54:53,040 --> 00:54:57,840
If you don't tell it what not to do, it will happily step into legal advice, HR interpretation,
674
00:54:57,840 --> 00:55:02,080
or "just tell me what we can get away with" territory. And it will do it in confident
675
00:55:02,080 --> 00:55:07,200
prose that looks like authority. So exclusions have to be written like refusal rules. This notebook
676
00:55:07,200 --> 00:55:11,920
must refuse to answer questions that require legal interpretation beyond the cited policy text.
677
00:55:11,920 --> 00:55:15,840
This notebook must refuse to recommend data handling decisions without referencing labeled
678
00:55:15,840 --> 00:55:20,560
policy sources. This notebook must escalate to security when a requested action involves
679
00:55:20,560 --> 00:55:25,600
external sharing of sensitive data. Refusal is not rudeness, it's control. Then define an authority
680
00:55:25,600 --> 00:55:30,320
hierarchy, not in a PowerPoint but in the notebook's persistent instructions: when sources conflict, what
681
00:55:30,320 --> 00:55:35,440
wins? Published policy beats guidance. Guidance beats draft notes. Signed contract beats email
682
00:55:35,440 --> 00:55:40,160
summary. A labeled standard beats an unlabeled deck. If you don't encode this hierarchy,
683
00:55:40,160 --> 00:55:44,160
the retrieval engine will treat the thing that matches the prompt as the winner. That's how
684
00:55:44,160 --> 00:55:48,320
keyword density becomes governance. And yes, this is where some leaders get uncomfortable because
685
00:55:48,320 --> 00:55:53,280
it forces you to admit that "we have multiple truths" is not a cultural nuance. It's operational
686
00:55:53,280 --> 00:55:59,440
debt.
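Encoding the hierarchy is small work once you admit it exists. A sketch with invented tiers: authority ranks first, and recency is demoted to a tie-breaker.

```python
from datetime import date

# Invented authority tiers: lower rank wins; recency only breaks ties.
AUTHORITY = {"published-policy": 0, "labeled-standard": 1,
             "signed-contract": 2, "guidance": 3, "draft-note": 4}

candidates = [
    {"name": "Q&A deck from last week", "kind": "draft-note",
     "modified": date(2025, 1, 6)},
    {"name": "Data handling policy v4", "kind": "published-policy",
     "modified": date(2024, 5, 2)},
]

winner = min(candidates,
             key=lambda c: (AUTHORITY[c["kind"]], -c["modified"].toordinal()))
print(winner["name"])  # the older published policy beats the fresher draft
```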
687
00:55:59,440 --> 00:56:03,840
Now add format constraints, because format is not aesthetics. Format is how outputs become usable artifacts instead of chat sludge. If the notebook exists to produce decisions,
688
00:56:03,840 --> 00:56:09,280
then the output format must be decision-shaped. Not a helpful paragraph. So choose the outputs you
689
00:56:09,280 --> 00:56:15,280
will allow. A decision memo, with sections: question, constraints, sources used, recommendation,
690
00:56:15,280 --> 00:56:20,320
risks, and escalation required. A risk register entry, with fields: risk description,
691
00:56:20,320 --> 00:56:26,160
likelihood, impact, mitigation, owner, and review date. An executive brief, with the top three points, what
692
00:56:26,160 --> 00:56:31,120
changed since last time, open decisions, and next actions. If the output matters, the structure is
693
00:56:31,120 --> 00:56:36,000
part of the control plane. Structure forces the model to expose gaps: missing sources,
694
00:56:36,000 --> 00:56:40,880
missing assumptions, missing owners. Unstructured prose hides those gaps. Then define the constraint
695
00:56:40,880 --> 00:56:45,680
behaviors when the notebook can't find authoritative sources. This is where most implementations fail
696
00:56:45,680 --> 00:56:50,240
because they assume the system will try harder. No, the system will fill gaps. So the instruction
697
00:56:50,240 --> 00:56:54,560
has to be explicit. If authoritative sources are missing, the notebook must say so, list what it
698
00:56:54,560 --> 00:56:59,040
searched within its source set and recommend where the missing truth should live. That turns
699
00:56:59,040 --> 00:57:03,760
failure into a governance signal instead of a hallucination.
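A decision-shaped output plus that explicit missing-source behavior can be sketched as a schema check; the field names here are invented:

```python
# Invented decision-memo schema. Structure exposes gaps; an empty source
# list becomes a refusal-and-escalate signal, not an invitation to guess.
REQUIRED = ("question", "constraints", "sources_used",
            "recommendation", "risks", "escalation_required")

def gaps_in(memo: dict) -> list[str]:
    return [f for f in REQUIRED if f not in memo]

memo = {"question": "Can vendor X process EU PII?", "constraints": ["GDPR"],
        "sources_used": [], "recommendation": "escalate",
        "risks": ["unverified"], "escalation_required": True}

print(gaps_in(memo))          # [] -- structurally complete
if not memo["sources_used"]:  # the constraint behavior, made explicit
    print("No authoritative source found; recommend where the truth should live.")
```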
700
00:57:03,760 --> 00:57:07,680
Now the practical part: boundaries aren't only about what the notebook answers; they're about what it is allowed to touch. A notebook
701
00:57:07,680 --> 00:57:11,680
should not be allowed to reference everything you can find, as that's just a denial-of-service
702
00:57:11,680 --> 00:57:16,320
attack on relevance. It needs a maintained source set with purposeful inclusion and purposeful
703
00:57:16,320 --> 00:57:20,560
exclusion. And the source set should be small enough that someone can review it without a
704
00:57:20,560 --> 00:57:25,600
spreadsheet and a prayer. Because boundaries only work when ownership exists. If the source set can
705
00:57:25,600 --> 00:57:31,520
grow without pruning, your constraint model is temporary. It will erode. Always. And if the
706
00:57:31,520 --> 00:57:36,080
instructions can be edited by anyone, your boundary model becomes political, it will drift toward
707
00:57:36,080 --> 00:57:42,800
convenience. Always. So the checklist item isn't "add constraints." It's "treat constraints as configuration":
708
00:57:42,800 --> 00:57:47,840
version them, review them, own them, test them. Because the moment you rely on informal discipline,
709
00:57:47,840 --> 00:57:52,160
you're back to the original problem, humans compensating for missing architecture. Next,
710
00:57:52,160 --> 00:57:56,960
once boundaries exist, you still need the boring mechanics that make boundaries enforceable over time.
711
00:57:56,960 --> 00:58:02,400
Curated sources, taxonomy, and the elimination of duplicates. The context design checklist:
712
00:58:02,400 --> 00:58:08,720
source curation and taxonomy. Source curation is where most Copilot strategies quietly die,
713
00:58:08,720 --> 00:58:12,560
because it forces the enterprise to answer a question it has avoided for years.
714
00:58:12,560 --> 00:58:18,480
Which artifacts are allowed to be treated as truth? Not useful, not popular, truth.
715
00:58:19,600 --> 00:58:23,440
If you don't curate the source set, you are delegating authority to the ranking algorithm.
716
00:58:23,440 --> 00:58:27,360
And the ranking algorithm doesn't know what your compliance team meant. It knows what it can
717
00:58:27,360 --> 00:58:32,320
retrieve. So start small. Minimum viable source set, high authority, high signal, low volume.
718
00:58:32,320 --> 00:58:37,760
Pick the artifacts that already have controlled change, explicit ownership and predictable semantics.
719
00:58:37,760 --> 00:58:42,720
Published policies, approved standards, canonical decision records, maintained operating procedures.
720
00:58:42,720 --> 00:58:48,080
Then stop, because "just add one more folder" is how you turn a bounded reasoning environment into soup.
721
00:58:48,080 --> 00:58:54,480
Curate by category. Authoritative sources, the decision inputs: few, stable, governed.
722
00:58:54,480 --> 00:58:59,840
Interpretive guidance, how the policy is applied: useful, but it must cite the authoritative layer and
723
00:58:59,840 --> 00:59:05,920
declare scope. Operational artifacts (status decks, meeting notes, tickets, retros): contextual,
724
00:59:05,920 --> 00:59:10,320
but not allowed to override policy. If you let these compete with authoritative sources,
725
00:59:10,320 --> 00:59:15,760
they'll win on recency and volume. Now add minimal taxonomy, so curation survives turnover
726
00:59:15,760 --> 00:59:21,200
and can be audited. Every truth source needs a clear title, a one-sentence purpose
727
00:59:21,200 --> 00:59:27,280
(what decisions does this control?), and lifecycle markers: owner, last reviewed, next review.
728
00:59:27,280 --> 00:59:32,640
Without those, it can exist, but it shouldn't be treated as authoritative in an AI reasoning environment; the record is sketched below.
729
00:59:32,640 --> 00:59:38,720
Finally, prefer links over copies. Copies create version drift. References preserve a single update path,
730
00:59:38,720 --> 00:59:42,800
assuming the underlying source is actually maintained. If people need a short version,
731
00:59:42,800 --> 00:59:47,760
make it a governed derivative with explicit lineage, not an orphaned paraphrase in someone's OneDrive.
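Here is the record promised above: the minimal taxonomy as a data type (names invented), with staleness treated as loss of authority rather than a cosmetic flag.

```python
from dataclasses import dataclass
from datetime import date

# Invented record type for the minimal taxonomy: title, purpose, owner,
# and lifecycle markers. No markers, no authority.
@dataclass
class TruthSource:
    title: str
    purpose: str          # what decisions does this control?
    owner: str
    last_reviewed: date
    next_review: date

    def authoritative_on(self, today: date) -> bool:
        # Past its review date it can still exist; it just stops being truth.
        return today <= self.next_review

baseline = TruthSource("Endpoint baseline v7", "Controls endpoint exceptions",
                       "secops", date(2024, 11, 1), date(2025, 5, 1))
print(baseline.authoritative_on(date(2025, 6, 1)))  # False: stale, so demote it
```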
732
00:59:47,760 --> 00:59:53,760
Next: even a perfectly curated set fails if nobody owns it. The context design checklist:
733
00:59:53,760 --> 00:59:58,800
ownership, change control, and review cadence. Ownership is where most context strategies collapse,
734
00:59:58,800 --> 01:00:04,000
because "everyone can contribute" sounds collaborative, but it usually means nobody is accountable.
735
01:00:04,000 --> 01:00:08,800
A context container that influences decisions needs a product owner, a role responsible for
736
01:00:08,800 --> 01:00:13,120
the correctness of the source set and the instructions that govern how co-pilot uses it.
737
01:00:13,120 --> 01:00:17,760
Not a distribution list, not a community maintained wiki pattern. A named function with authority
738
01:00:17,760 --> 01:00:22,480
to reject additions, remove sources and resolve conflicts. Now add change control,
739
01:00:22,480 --> 01:00:27,520
or your curated context becomes a slow-motion edit war. Treat context like code:
740
01:00:27,520 --> 01:00:32,160
small changes can have disproportionate impact and unreviewed changes accumulate until behavior
741
01:00:32,160 --> 01:00:38,560
becomes unpredictable. Keep it simple. Intake: how new sources and instruction changes
742
01:00:38,560 --> 01:00:43,280
are proposed, with a stated purpose and owner. Review: enforce your authority gradient (policy
743
01:00:43,280 --> 01:00:48,400
versus guidance versus operational artifacts) and reject anything that can't justify why it belongs.
744
01:00:48,400 --> 01:00:53,520
Publication: version the notebook instructions and log source-set changes, so "why did the answer change"
745
01:00:53,520 --> 01:01:00,160
has a real answer. A minimal sketch of that change log follows.
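"Treat context like code" is literal enough to sketch. An invented change record: nothing reaches the notebook's instructions or source set without a proposer, a purpose, and a reviewer, and every publication bumps a version.

```python
from dataclasses import dataclass

# Invented change-log entry: intake, review, and publication as data.
@dataclass
class ContextChange:
    proposer: str
    purpose: str
    kind: str                  # "add-source" | "remove-source" | "edit-instructions"
    approved_by: str | None = None

log: list[tuple[int, ContextChange]] = []

def publish(change: ContextChange) -> int:
    if change.approved_by is None:
        raise PermissionError("unreviewed context change")  # no silent edits
    version = len(log) + 1
    log.append((version, change))
    return version

publish(ContextChange("anna", "Add revised vendor SOP", "add-source",
                      approved_by="vendor-governance-office"))
print(log[-1])  # "why did the answer change" now has a versioned cause
```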
746
01:01:00,160 --> 01:01:05,280
Then the part leaders avoid: review cadence. Set-and-forget becomes set-and-regret. High-risk domains need frequent review, because staleness is a decision risk, not a content quality issue,
747
01:01:05,280 --> 01:01:09,840
and you need event triggers, not just calendar rituals: policy updates, regulatory changes,
748
01:01:09,840 --> 01:01:15,520
reorgs, major incidents, spikes in corrections. Those triggers force revalidation before drift
749
01:01:15,520 --> 01:01:19,280
becomes normal. Persistent context isn't a storage problem, it's a stewardship model.
750
01:01:19,280 --> 01:01:23,520
Next, if you want trust, you need outputs that behave like decision records.
751
01:01:23,520 --> 01:01:28,480
From answers to receipts: traceability as the adoption engine. This is the point where Copilot
752
01:01:28,480 --> 01:01:32,880
adoption stops being a training problem and becomes a trust problem. Executives don't reject
753
01:01:32,880 --> 01:01:37,440
Copilot because it's slow; they reject it because it can't defend itself. The first time an AI
754
01:01:37,440 --> 01:01:42,640
assisted brief goes to a steering committee and someone asks, "Where did that come from?" The room goes
755
01:01:42,640 --> 01:01:47,920
quiet. Not because the answer is impossible, but because nobody built the workflow to produce evidence
756
01:01:47,920 --> 01:01:53,520
alongside the prose. That's the real adoption engine: receipts. A good answer is nice; a traceable answer
757
01:01:53,520 --> 01:01:59,680
is usable, and in an enterprise usable means it can survive review, audit, and blame, so traceability
758
01:01:59,680 --> 01:02:04,240
isn't an enhancement; it's the price of entry. The system needs to produce outputs that behave like
759
01:02:04,240 --> 01:02:09,440
decision records: what sources shaped the answer, what assumptions were made, what constraints were
760
01:02:09,440 --> 01:02:14,000
applied, and what the model could not verify. Not a dissertation, just enough structure that a
761
01:02:14,000 --> 01:02:18,320
human can check the chain of truth without re-running the whole investigation. Here's the uncomfortable
762
01:02:18,320 --> 01:02:22,880
truth: people keep trying to make Copilot sound confident because confidence sells, but confidence
763
01:02:22,880 --> 01:02:27,920
without citations is just a faster way to ship misinformation through the org chart. That distinction
764
01:02:27,920 --> 01:02:33,120
matters. When a notebook is designed correctly, it can produce an answer and show its work,
765
01:02:33,120 --> 01:02:37,440
links to the authoritative sources in the curated set, the relevant sections, and the boundary
766
01:02:37,440 --> 01:02:41,920
conditions that were applied. It's not perfect observability, but it's a defensible artifact,
767
01:02:41,920 --> 01:02:46,880
and defensible artifacts change behavior, because now the executive doesn't have to trust the AI.
768
01:02:46,880 --> 01:02:52,720
They can trust the process. The answer is grounded in a known corpus, produced under known constraints,
769
01:02:52,720 --> 01:02:56,960
and reviewable by the people who already own risk. That's how adoption actually happens, not
770
01:02:56,960 --> 01:03:01,680
by getting users excited, but by making governance comfortable. Now connect that back to notebooks,
771
01:03:01,680 --> 01:03:05,680
because this is where most people miss the whole point. Notebooks aren't about producing
772
01:03:05,680 --> 01:03:10,480
prettier answers. They're about producing repeatable decisions. A notebook with curated sources
773
01:03:10,480 --> 01:03:15,280
and persistent intent can generate the same type of output every week. The same format, the same
774
01:03:15,280 --> 01:03:20,400
authority hierarchy, the same refusal behaviors, the same citation pattern, that creates a stable
775
01:03:20,400 --> 01:03:25,760
operating rhythm, and executives love rhythm, because rhythm is predictability. This is also why outputs
776
01:03:25,760 --> 01:03:30,560
are the third layer of context engineering. If outputs aren't captured as artifacts, the organization
777
01:03:30,560 --> 01:03:35,440
can't learn. Every question gets asked again. Every decision gets relitigated. Every meeting becomes
778
01:03:35,440 --> 01:03:40,480
an archaeological dig through chat logs and half-remembered summaries. Receipts end that cycle.
779
01:03:40,480 --> 01:03:46,000
And this is where the micro-behavior matters: stop letting answers die in chat. If a Copilot output
780
01:03:46,000 --> 01:03:51,600
influenced a decision, it needs to graduate into a persistent artifact: a Loop page, a memo,
781
01:03:51,600 --> 01:03:57,040
a ticket comment, an architecture decision record, a risk entry, something with a life cycle.
782
01:03:57,040 --> 01:04:01,600
Chat is where the thinking happens. The artifact is where the organization remembers. That's also
783
01:04:01,600 --> 01:04:06,160
how you start measuring quality without pretending you can measure AI correctness directly.
784
01:04:06,160 --> 01:04:11,200
If the output is an artifact, it can be reviewed. It can be sampled. It can be corrected. It can be
785
01:04:11,200 --> 01:04:16,720
compared to source changes, and it can be audited when someone inevitably asks, why did we approve this?
786
01:04:16,720 --> 01:04:22,400
Now traceability also solves a problem leaders don't articulate well: decision latency. When people
787
01:04:22,400 --> 01:04:26,720
don't trust the provenance of information, they slow down. They ask for more meetings, they ask for
788
01:04:26,720 --> 01:04:31,120
more approvals, they ask for just one more review. Not because they love process, but because they
789
01:04:31,120 --> 01:04:36,320
can't tell what's real. Receipts shrink that latency. If the output includes citations and an explicit
790
01:04:36,320 --> 01:04:41,760
assumption list, reviewers can focus on the actual disagreement, not on reconstructing the context.
791
01:04:41,760 --> 01:04:46,480
And that's the economic benefit nobody markets properly. Copilot doesn't save time because it writes
792
01:04:46,480 --> 01:04:51,760
faster. It saves time when it reduces rework and revalidation. Now one more constraint.
793
01:04:51,760 --> 01:04:56,720
Receipts have to be shaped like your governance model, not like a generic AI response.
794
01:04:56,720 --> 01:05:00,880
If you need to defend a recommendation, the output should include sources used,
795
01:05:00,880 --> 01:05:05,440
policy implications, risk notes and escalation triggers. If you need an operating procedure,
796
01:05:05,440 --> 01:05:10,560
the output should include step sequence, preconditions, exceptions, and owner. If you need an executive
797
01:05:10,560 --> 01:05:15,840
brief, the output should include what changed, what matters, and what decision is required. When
798
01:05:15,840 --> 01:05:21,120
outputs have stable structure, people stop arguing about formatting and start arguing about substance.
799
01:05:21,120 --> 01:05:25,200
That's when the system becomes a tool instead of a novelty. So the adoption engine isn't prompt
800
01:05:25,200 --> 01:05:29,680
training. It's building a workflow where Copilot outputs leave a paper trail. And once you start
801
01:05:29,680 --> 01:05:34,640
demanding receipts, a lot of the earlier problems become visible immediately. Missing authoritative
802
01:05:34,640 --> 01:05:40,320
sources, stale content, permission fragmentation and unlabeled artifacts that shouldn't be traveling.
803
01:05:40,320 --> 01:05:44,960
Good. Visibility is how you pay down governance debt. Now the story has to move from reasoning to action
804
01:05:44,960 --> 01:05:49,360
because the moment the enterprise trusts the output, it will try to automate the outcome.
805
01:05:49,360 --> 01:05:56,480
When Copilot stops talking and starts doing: Power Platform plus ServiceNow. Now this is where
806
01:05:56,480 --> 01:06:00,640
the enterprise gets reckless. The moment Copilot outputs look credible, someone says, "great,
807
01:06:00,640 --> 01:06:05,280
can we automate that?" That's how you cross the line from drafting words to changing state.
808
01:06:05,280 --> 01:06:11,440
Tickets, approvals, access, vendor onboarding, notifications, workflows. Automation doesn't
809
01:06:11,440 --> 01:06:16,000
make weak context less risky. It makes it executable. So the handoff must be explicit: Copilot
810
01:06:16,000 --> 01:06:21,920
proposes, Power Platform orchestrates, ServiceNow records and governs. If you let a good answer become
811
01:06:21,920 --> 01:06:28,000
an automatic action, you're not adopting AI; you're scaling mistakes. Use a boring pattern on purpose:
812
01:06:28,000 --> 01:06:35,040
event, reasoning, orchestration, audit trail. Event: a request arrives through an existing workflow
813
01:06:35,040 --> 01:06:40,560
entry point. Reasoning: Copilot produces a structured recommendation with receipts (policy source,
814
01:06:40,560 --> 01:06:44,080
constraints, risk notes, what's missing, and escalation flags).
815
01:06:44,080 --> 01:06:49,760
Orchestration: Power Platform or ServiceNow runs only if the output meets a contract (sketched below):
816
01:06:49,760 --> 01:06:54,880
required fields present, citations included, labeling constraints satisfied, escalation respected.
817
01:06:54,880 --> 01:06:59,120
No contract, no action. Audit trail: the ticket or change record includes the references
818
01:06:59,120 --> 01:07:03,360
and the decision rationale. Not for theatre; for post-incident survival.
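The contract gate promised above, sketched with invented field names; this is the shape of the check, not any Power Platform or ServiceNow API.

```python
# Invented contract gate between reasoning and orchestration:
# Copilot proposes; nothing changes state unless the proposal carries receipts.
def meets_contract(proposal: dict) -> bool:
    required_ok = all(proposal.get(f) for f in ("action", "policy_source", "risk_notes"))
    return (required_ok
            and len(proposal.get("citations", [])) > 0
            and proposal.get("label_check_passed") is True
            and proposal.get("escalation_required") is False)

def orchestrate(proposal: dict) -> str:
    if not meets_contract(proposal):
        return "blocked: route to a human"  # no contract, no action
    # Here the orchestrator would change state under a least-privilege
    # identity and write the citations into the audit trail.
    return "executed: " + proposal["action"]

print(orchestrate({"action": "grant vendor access"}))  # blocked
```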
819
01:07:03,360 --> 01:07:07,280
And remember the part that matters more as you automate: identity.
820
01:07:07,280 --> 01:07:11,200
Flows run as someone: a user, a connector, a managed identity, a service principal.
821
01:07:11,200 --> 01:07:15,040
If that identity is broad, you've turned a chat UI into a privileged actuator.
822
01:07:15,040 --> 01:07:19,680
Least privilege stops being a slogan the first time an automated flow moves data where it shouldn't.
823
01:07:19,680 --> 01:07:25,280
So the system law holds: intent must be enforced by design, by contracts,
824
01:07:25,280 --> 01:07:29,120
validations, and permission boundaries. Not by "please be careful."
825
01:07:29,120 --> 01:07:35,920
KPIs that actually matter: quality, cost, and control.
826
01:07:35,920 --> 01:07:38,800
Most Copilot programs measure adoption, not outcomes.
827
01:07:38,800 --> 01:07:43,040
Adoption is a vanity metric: it tells you people clicked the button. It doesn't tell you the answers
828
01:07:43,040 --> 01:07:48,720
were reliable or defensible. So measure three things: quality, cost, and control.
829
01:07:48,720 --> 01:07:53,920
Quality: stop tracking "helpful." Track failure in high-risk domains.
830
01:07:53,920 --> 01:07:58,640
Sample outputs weekly and review them like change requests: correct authoritative grounding,
831
01:07:58,640 --> 01:08:02,000
correct constraints, correct refusal behavior, correct citations.
832
01:08:02,000 --> 01:08:05,360
When failure rises, it's not the model getting worse; it's your context drifting.
833
01:08:05,360 --> 01:08:08,880
Then track rework. If humans routinely rewrite outputs before they can be used,
834
01:08:08,880 --> 01:08:12,080
you didn't save time; you moved effort into AI-assisted editing.
835
01:08:12,080 --> 01:08:18,240
Simple artifact feedback (used as-is, edited, discarded) tells you whether your context design is
836
01:08:18,240 --> 01:08:24,720
improving.
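That feedback signal is cheap to compute. A toy tally over an invented weekly sample:

```python
from collections import Counter

# Invented weekly sample of artifact outcomes.
outcomes = ["used-as-is", "edited", "edited", "discarded", "used-as-is", "edited"]

tally = Counter(outcomes)
rework_rate = (tally["edited"] + tally["discarded"]) / len(outcomes)
print(tally)                              # Counter({'edited': 3, ...})
print(f"rework rate: {rework_rate:.0%}")  # rising rework signals context drift
```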
837
01:08:24,720 --> 01:08:28,800
Cost: Copilot is not just a license. The hidden tax is entropy: storage growth, duplicate artifacts, and the operational drag of people recreating work because they can't
838
01:08:28,800 --> 01:08:34,000
trust what exists. Track growth of unlabeled content in critical domains, growth of policy-like
839
01:08:34,000 --> 01:08:39,680
duplicates, and abandoned workspaces that keep feeding retrieval noise. Control: measure the health
840
01:08:39,680 --> 01:08:45,120
of the control-plane signals that shape retrieval and handling. Labeling coverage and consistency
841
01:08:45,120 --> 01:08:51,600
(Purview): low coverage means flat context and a higher blast radius. Permission drift
842
01:08:51,600 --> 01:08:58,560
(Entra): unmanaged groups, anonymous links, guest sprawl. Drift up means answers fragment and truth
843
01:08:58,560 --> 01:09:05,360
diverges. Freshness: SLAs for curated sources. If truth hasn't been reviewed, it's not truth.
844
01:09:05,360 --> 01:09:09,840
It's historical storage. And the KPI leadership actually cares about is decision cycle time:
845
01:09:09,840 --> 01:09:14,000
not time to prompt, but time to decision. When persistent context and receipts work,
846
01:09:14,000 --> 01:09:18,560
latency drops because teams stop relitigating the same ambiguity. If you can't measure these,
847
01:09:18,560 --> 01:09:23,520
you don't have a Copilot strategy; you have a UI rollout. The only rule that holds:
848
01:09:23,520 --> 01:09:28,240
prompting isn't strategy. Persistent context is the control plane that makes Copilot outcomes
849
01:09:28,240 --> 01:09:32,880
reliable, governable and defensible. If you want the next step, watch the episode on designing
850
01:09:32,880 --> 01:09:38,000
authoritative truth placement in Microsoft 365: policy, guidance, and discussion don't belong
851
01:09:38,000 --> 01:09:42,800
in the same container. Subscribe if you want fewer Copilot demos and more architectural receipts,
852
01:09:42,800 --> 01:09:47,360
and drop the failure mode you're seeing so the next episode targets the real entropy in your tenant.