This episode of the M365.FM Podcast (titled “How to Build a High-Performance Agentic Workforce in 30 Days”) explains why most enterprise AI agent programs fail quickly, and what it really takes to build an AI-driven workforce that delivers measurable business value — not just experimental demos. The episode identifies a core misconception: many organizations assume that simply deploying Microsoft Copilot or a set of AI tools automatically creates an agentic workforce. In reality, this assumption kills adoption within a few weeks because agents amplify existing operational chaos rather than correcting it. To succeed, enterprises must design a disciplined operating model with clear governance, grounded intelligence, and constrained execution that executives can defend and auditors can verify. The podcast lays out a 30-day blueprint built on three non-negotiable pillars — orchestration with Copilot Studio, grounding with Azure AI Search + MCP tools, and identity governance with Entra Agent ID — and explains how to define performance in terms of auditable outcomes rather than activity metrics. Listeners learn what high-performance agents look like in practice, what key performance indicators matter (like ticket deflection and SLA reduction), and which anti-metrics (like prompt counts and token usage) can quietly derail programs. Successful agentic workforce initiatives are anchored in governance, measurable results, and systems that enforce intent over time.
🧠 Core Theme
Most organizations believe deploying Copilot equals building an AI workforce — but without discipline, agents amplify existing chaos instead of reducing it.
A high-performance agentic workforce is defined not by AI adoption, chat usage, or tool counts — but by measurable business outcomes that are auditable, governed, and defensible.
🚫 Why AI Agent Programs Collapse
AI agents often amplify organizational entropy — including unclear ownership, bad data sources, uncontrolled publishing workflows, and PowerPoint-only governance.
The first confidently wrong answer reaches the wrong audience, trust collapses, and adoption quietly dies, often by week two.
Automation ≠ an agentic workforce; automation reduces friction, but an agentic workforce reduces uncertainty.
🧭 The 30-Day Operating Model
The episode outlines a practical roadmap built on three foundational layers — in the correct order:
Copilot Studio Orchestration First
Define how agents will interact with systems, users, and endpoints in a controlled orchestration layer.
Azure AI Search + MCP Grounding Second
Ensure grounding of agent intelligence in authoritative sources, not ad-hoc content or unmanaged context.
Entra Agent ID Governance Third
Secure who agents are, what they’re allowed to do, and how actions are attributed and auditable.
The episode also flags one deliberate design choice that prevents ghost agents and runtime sprawl later in the lifecycle.
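The grounding and identity layers pair with a rule stated later in the episode: "No source, no answer." A minimal Python sketch of that rule follows. The `Draft` record, the `answer_or_escalate` helper, and all field names are hypothetical illustrations, not part of Copilot Studio, Azure AI Search, or Entra Agent ID.

```python
from dataclasses import dataclass, field


@dataclass
class Draft:
    """A candidate answer plus the citations retrieved for it (hypothetical shape)."""
    text: str
    sources: list = field(default_factory=list)


def answer_or_escalate(draft: Draft, agent_id: str) -> dict:
    """Return a cited answer, or an escalation record when nothing grounds it."""
    if not draft.sources:
        # No source, no answer: the agent hands off instead of guessing.
        return {
            "action": "escalate",
            "agent_id": agent_id,
            "reason": "no_grounding_source",
        }
    return {
        "action": "answer",
        "agent_id": agent_id,  # every action stays attributable to an identity
        "text": draft.text,
        "sources": list(draft.sources),
    }
```

Keeping the agent identity on both branches means an escalation is as auditable as an answer.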
📌 What “High-Performance” Actually Means
Performance must be defined in executive-grade, measurable outcomes:
📊 Service & IT Outcomes
20–40% deflection of Level-1 tickets
15–30% reduction in SLA time for the tickets the agent touches
10–25% fewer escalations
⏱️ Productivity Gains
30–60 minutes saved per user per week
≥60% task completion without human handoff
30–50% adoption in target user groups
✅ Quality & Risk Metrics
≥85% grounded answer accuracy on an evaluation set
Zero access violations
Audit logging enabled from day one
The podcast explicitly calls out anti-metrics to avoid:
Prompt counts
Chat volume
Token usage
Number of agents created
These metrics may indicate activity but do not meaningfully reflect business value or governance integrity.
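One way to make these targets checkable is to compare measured values against the lower bound of each range. The sketch below is illustrative only: the KPI names, the `relative_reduction` definition (fractional drop from a week-one baseline), and the pilot numbers are assumptions, not figures or formulas given in the episode.

```python
# Lower bounds of the 30-day target ranges quoted in the episode, as fractions.
TARGET_FLOORS = {
    "l1_deflection": 0.20,         # 20-40% fewer Level-1 tickets
    "sla_reduction": 0.15,         # 15-30% shorter SLA time
    "escalation_reduction": 0.10,  # 10-25% fewer escalations
    "grounded_accuracy": 0.85,     # at least 85% on an evaluation set
}


def relative_reduction(baseline: float, current: float) -> float:
    """Fractional drop from the week-one baseline (assumed definition)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline


def kpi_report(measured: dict) -> dict:
    """Map each KPI name to True when it meets its floor; missing KPIs fail."""
    return {
        name: name in measured and measured[name] >= floor
        for name, floor in TARGET_FLOORS.items()
    }


# Hypothetical pilot numbers, for illustration only.
report = kpi_report({
    "l1_deflection": relative_reduction(baseline=1000, current=760),
    "sla_reduction": relative_reduction(baseline=8.0, current=7.2),
    "escalation_reduction": relative_reduction(baseline=200, current=170),
    "grounded_accuracy": 0.88,
})
```

In this made-up run the SLA target is missed while the others pass, which under the episode's rule means fix before scaling. Prompt counts and token usage are deliberately absent from the report.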
🧠 Core Misconception: Automation vs Agentic Workforce
Automation reduces manual steps.
An agentic workforce reduces uncertainty and operational risk.
Most organizations already have automation — what they lack is a decision system that yields predictable, traceable outcomes.
📌 Governance First — Not After
Governance is not a “checklist” after rollout — it must be designed up-front to prevent sprawl, ghost agents, and uncontrolled behavior.
Identity governance, audit logging, and operational constraints are prerequisites for scaling agents responsibly.
Executives should be able to sign off on performance targets with clear KPIs that align to business goals, not activity counts.
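The transcript describes the operational constraint as "read operations are default, write operations are gated," with an approval pattern for anything that changes state. A minimal sketch of such a gate follows. `ToolGate`, the tool names, and the log fields are hypothetical and do not correspond to any Microsoft API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolGate:
    agent_id: str                  # identity every decision is attributed to
    read_tools: frozenset
    write_tools: frozenset
    audit_log: list = field(default_factory=list)

    def authorize(self, tool: str, approved_by: Optional[str] = None) -> bool:
        """Allow reads, allow writes only with an approver, deny unknown tools."""
        if tool in self.read_tools:
            allowed = True
        elif tool in self.write_tools:
            allowed = approved_by is not None
        else:
            allowed = False
        # Log every decision, including denials, so it can be reconstructed later.
        self.audit_log.append({
            "agent_id": self.agent_id,
            "tool": tool,
            "approved_by": approved_by,
            "allowed": allowed,
        })
        return allowed


gate = ToolGate(
    agent_id="it-triage-agent",
    read_tools=frozenset({"search_kb"}),
    write_tools=frozenset({"reset_password"}),
)
```

Denying tools that appear in neither set is a design choice in this sketch: it keeps the default closed, so a newly added tool cannot be called until someone classifies it.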
🎯 Leadership and Metrics
Leaders should emphasize defensible outcomes over adoption narratives.
Metrics like ticket deflection, cycle time reduction, grounded accuracy, and audit readiness matter more than chatter about agent counts or feature usage.
The episode reframes success as impact on business workflows rather than tool proliferation.
👣 Key Takeaways
Copilot deployment without structural discipline will fail because it magnifies existing chaos.
A high-performance agentic workforce is not a product of AI technology alone — it’s a system design with governance, grounding, and identity baked in.
Define performance in terms of measurable, auditable outcomes that executives can defend and auditors can verify.
Avoid anti-metrics that sound impressive but don’t indicate real value or risk management.
The 30-day model provides a clear path from pilot to production with measurable business impact.
1
00:00:00,000 --> 00:00:02,120
Most organizations think deploy Copilot
2
00:00:02,120 --> 00:00:03,920
and suddenly they have an agentic workforce.
3
00:00:03,920 --> 00:00:06,160
They are wrong. Agents don't create discipline.
4
00:00:06,160 --> 00:00:09,360
They amplify whatever entropy already exists.
5
00:00:09,360 --> 00:00:12,560
Bad data, unclear ownership, and controls
6
00:00:12,560 --> 00:00:14,080
that only live in PowerPoint.
7
00:00:14,080 --> 00:00:16,800
So week two arrives, the first confident wrong answer
8
00:00:16,800 --> 00:00:19,880
hits the wrong audience and adoption quietly dies.
9
00:00:19,880 --> 00:00:22,640
This is a 30-day roadmap that produces measurable outcomes,
10
00:00:22,640 --> 00:00:23,440
not a demo.
11
00:00:23,440 --> 00:00:24,680
Three pillars in order.
12
00:00:24,680 --> 00:00:27,560
Copilot Studio orchestration first, Azure AI Search
13
00:00:27,560 --> 00:00:31,200
plus MCP grounding second and Entra Agent ID governance third.
14
00:00:31,200 --> 00:00:33,640
And there's one design choice that prevents ghost agents
15
00:00:33,640 --> 00:00:34,240
later.
16
00:00:34,240 --> 00:00:35,000
It's coming.
17
00:00:35,000 --> 00:00:37,480
Define high performance in executive terms.
18
00:00:37,480 --> 00:00:39,520
Before anyone builds an agent, leadership
19
00:00:39,520 --> 00:00:41,400
has to define high performance in terms
20
00:00:41,400 --> 00:00:42,520
that business can audit.
21
00:00:42,520 --> 00:00:43,720
Not users loved it.
22
00:00:43,720 --> 00:00:45,720
Not we shipped four bots.
23
00:00:45,720 --> 00:00:47,200
Outcomes.
24
00:00:47,200 --> 00:00:49,360
Because the platform will happily generate activity
25
00:00:49,360 --> 00:00:50,520
without impact.
26
00:00:50,520 --> 00:00:52,080
You can have thousands of chats and still
27
00:00:52,080 --> 00:00:54,080
have the same backlog, the same SLA breaches
28
00:00:54,080 --> 00:00:55,320
and the same escalations.
29
00:00:55,320 --> 00:00:56,560
That distinction matters.
30
00:00:57,560 --> 00:01:00,680
In executive terms, high performance means the system
31
00:01:00,680 --> 00:01:03,800
measurably changes three things: demand, time, and risk.
32
00:01:03,800 --> 00:01:05,280
Demand is volume reduction.
33
00:01:05,280 --> 00:01:08,400
If the agent works, fewer tickets get created at all.
34
00:01:08,400 --> 00:01:10,320
Not because users stopped having problems,
35
00:01:10,320 --> 00:01:12,720
but because the first interaction resolves them.
36
00:01:12,720 --> 00:01:13,520
That is deflection.
37
00:01:13,520 --> 00:01:15,680
And it's the only metric that actually hits cost.
38
00:01:15,680 --> 00:01:17,200
Time is cycle reduction.
39
00:01:17,200 --> 00:01:20,120
If a ticket still gets created, it should be created
40
00:01:20,120 --> 00:01:23,320
with better classification, better context, and fewer handoffs.
41
00:01:23,320 --> 00:01:26,120
That shows up as SLA reduction, faster first response,
42
00:01:26,120 --> 00:01:28,000
and higher first contact resolution.
43
00:01:28,000 --> 00:01:29,760
Risk is controlled behavior.
44
00:01:29,760 --> 00:01:31,880
The agent doesn't helpfully guess.
45
00:01:31,880 --> 00:01:34,880
It either answers with grounded evidence or it escalates.
46
00:01:34,880 --> 00:01:37,240
And every action is attributable to an identity
47
00:01:37,240 --> 00:01:38,200
with an audit trail.
48
00:01:38,200 --> 00:01:40,000
So for a 30-day window, the KPIs
49
00:01:40,000 --> 00:01:42,880
have to be realistic, measurable and tied to one domain.
50
00:01:42,880 --> 00:01:45,000
Here are targets leaders can sign their name to.
51
00:01:45,000 --> 00:01:49,400
For service and IT, 20% to 40% ticket deflection at L1,
52
00:01:49,400 --> 00:01:52,480
15% to 30% reduction in SLA time for the subset
53
00:01:52,480 --> 00:01:55,840
of tickets the agent touches and 10% to 25% fewer escalations.
54
00:01:55,840 --> 00:01:57,040
Those aren't vanity numbers.
55
00:01:57,040 --> 00:01:59,200
They come directly from three operational levers,
56
00:01:59,200 --> 00:02:02,560
routing accuracy, containment boundaries, and handoff latency.
57
00:02:02,560 --> 00:02:06,040
For user productivity, 30 to 60 minutes saved per user
58
00:02:06,040 --> 00:02:07,720
per week in the target group.
59
00:02:07,720 --> 00:02:09,560
Not time saved in theory.
60
00:02:09,560 --> 00:02:12,760
Time saved as measured by reduced back and forth,
61
00:02:12,760 --> 00:02:17,000
fewer status check messages and fewer who owns this detours.
62
00:02:17,000 --> 00:02:21,080
Also, over 60% task completion without a human handoff
63
00:02:21,080 --> 00:02:22,560
for the narrow workflow you choose,
64
00:02:22,560 --> 00:02:25,280
and adoption in the target group of 30 to 50%.
65
00:02:25,280 --> 00:02:27,040
If nobody uses it, it doesn't exist.
66
00:02:27,040 --> 00:02:31,040
For quality and risk, greater than 85% grounded answer accuracy
67
00:02:31,040 --> 00:02:34,280
on an evaluation set, zero access violations,
68
00:02:34,280 --> 00:02:36,520
and audit logging enabled from day one.
69
00:02:36,520 --> 00:02:37,640
Not after the pilot.
70
00:02:37,640 --> 00:02:40,200
Day one. Now the anti-metrics. These are the numbers teams
71
00:02:40,200 --> 00:02:41,480
love because they're easy.
72
00:02:41,480 --> 00:02:42,520
They are also useless.
73
00:02:42,520 --> 00:02:44,920
Prompt counts, chat counts, token consumption,
74
00:02:44,920 --> 00:02:47,320
number of agents. These measure noise, not outcomes.
75
00:02:47,320 --> 00:02:49,520
They also incentivize exactly the wrong behavior.
76
00:02:49,520 --> 00:02:51,440
Build more, publish more, celebrate more.
77
00:02:51,440 --> 00:02:54,040
Meanwhile, the system decays. A better mental model is this.
78
00:02:54,040 --> 00:02:57,080
Every KPI maps to an operational lever you can actually tune.
79
00:02:57,080 --> 00:02:59,040
Deflection and first contact resolution map
80
00:02:59,040 --> 00:03:01,640
to containment design, what the agent must solve
81
00:03:01,640 --> 00:03:03,240
versus what it must escalate.
82
00:03:03,240 --> 00:03:04,560
If you don't define that boundary,
83
00:03:04,560 --> 00:03:06,960
you will either over-escalate and waste time
84
00:03:06,960 --> 00:03:09,520
or overconfidently answer and destroy trust.
85
00:03:09,520 --> 00:03:12,680
SLA reduction maps to handoff latency and enrichment.
86
00:03:12,680 --> 00:03:15,560
If escalation requires the user to repeat everything,
87
00:03:15,560 --> 00:03:17,640
you didn't build an agent, you built a delay.
88
00:03:17,640 --> 00:03:20,800
The handoff has to carry the context, intent, urgency,
89
00:03:20,800 --> 00:03:24,080
impacted service, device, and what the agent already tried.
90
00:03:24,080 --> 00:03:25,960
Grounded accuracy maps to knowledge coverage
91
00:03:25,960 --> 00:03:27,040
and retrieval quality.
92
00:03:27,040 --> 00:03:29,840
If your content is messy, stale or too large to retrieve
93
00:03:29,840 --> 00:03:31,680
cleanly, the model will improvise.
94
00:03:31,680 --> 00:03:33,920
It's not malice, it's math.
95
00:03:33,920 --> 00:03:36,480
And adoption maps to user experience: short answers,
96
00:03:36,480 --> 00:03:39,240
clear next actions and fewer decisions per interaction.
97
00:03:39,240 --> 00:03:41,440
Paragraphs don't ship work, decisions do.
98
00:03:41,440 --> 00:03:43,280
The next thing leaders miss is ownership.
99
00:03:43,280 --> 00:03:46,360
High performance doesn't come from who built the bot.
100
00:03:46,360 --> 00:03:47,880
It comes from who owns the outcome,
101
00:03:47,880 --> 00:03:49,720
so assign an outcome owner per use case,
102
00:03:49,720 --> 00:03:52,360
not a maker, not a dev lead, an accountable operator.
103
00:03:52,360 --> 00:03:54,920
For IT triage, that's usually the service owner
104
00:03:54,920 --> 00:03:56,200
or the head of service desk.
105
00:03:56,200 --> 00:03:58,960
They sign the KPI targets, they decide what done means.
106
00:03:58,960 --> 00:04:01,400
They also own deprecating topics that don't perform
107
00:04:01,400 --> 00:04:03,800
because if nobody has authority to kill weak behavior,
108
00:04:03,800 --> 00:04:06,200
the system accumulates entropy generators forever.
109
00:04:06,200 --> 00:04:08,560
Finally, set the system boundary, pick one domain,
110
00:04:08,560 --> 00:04:10,520
one channel, one audience, one backlog.
111
00:04:10,520 --> 00:04:12,120
Performance requires a closed system
112
00:04:12,120 --> 00:04:13,640
where change is observable.
113
00:04:13,640 --> 00:04:15,280
If every department ships an agent
114
00:04:15,280 --> 00:04:17,720
to solve a personal annoyance, you don't get a workforce.
115
00:04:17,720 --> 00:04:19,560
You get a zoo and that's the transition point.
116
00:04:19,560 --> 00:04:21,440
The roadmap starts by forcing a boundary
117
00:04:21,440 --> 00:04:23,720
because without one, everything becomes theater.
118
00:04:23,720 --> 00:04:26,520
The core misconception.
119
00:04:26,520 --> 00:04:28,880
Automation isn't an agentic workforce.
120
00:04:28,880 --> 00:04:31,240
Most leaders have already funded automation.
121
00:04:31,240 --> 00:04:32,720
Some of it even worked.
122
00:04:32,720 --> 00:04:34,920
A Power Automate flow here, a ticket template there,
123
00:04:34,920 --> 00:04:37,120
maybe a chatbot that answers the top five questions
124
00:04:37,120 --> 00:04:38,640
when the moon is in the right phase.
125
00:04:38,640 --> 00:04:40,080
That's not an agentic workforce.
126
00:04:40,080 --> 00:04:42,200
That's sparkling automation, isolated wins
127
00:04:42,200 --> 00:04:43,360
that look great in a demo
128
00:04:43,360 --> 00:04:45,480
because they run in a clean, staged world.
129
00:04:45,480 --> 00:04:46,960
But they don't compose into a system.
130
00:04:46,960 --> 00:04:49,120
They don't share a vocabulary of intent.
131
00:04:49,120 --> 00:04:50,600
They don't have consistent boundaries.
132
00:04:50,600 --> 00:04:52,000
They don't learn from failure.
133
00:04:52,000 --> 00:04:54,280
And when they break, nobody can explain why
134
00:04:54,280 --> 00:04:56,440
because they were never instrumented like a system.
135
00:04:56,440 --> 00:04:58,000
They were shipped like a feature.
136
00:04:58,000 --> 00:04:59,600
The uncomfortable truth is this.
137
00:04:59,600 --> 00:05:01,240
Agentic isn't a UI choice.
138
00:05:01,240 --> 00:05:02,520
It's an operating model.
139
00:05:02,520 --> 00:05:04,760
A real agent behaves less like a chat widget
140
00:05:04,760 --> 00:05:06,800
and more like a distributed decision engine.
141
00:05:06,800 --> 00:05:10,000
It takes an event, interprets intent, pulls context,
142
00:05:10,000 --> 00:05:13,480
selects tools, takes action, verifies the outcome,
143
00:05:13,480 --> 00:05:15,960
and then hands off when the risk exceeds its mandate.
144
00:05:15,960 --> 00:05:18,280
That loop is the definition, not the chat transcript.
145
00:05:18,280 --> 00:05:19,960
And yes, that sounds like a lot.
146
00:05:19,960 --> 00:05:21,160
Good, it should.
147
00:05:21,160 --> 00:05:24,360
Because what most organizations build first is the opposite.
148
00:05:24,360 --> 00:05:27,720
A conversational front end bolted onto existing chaos
149
00:05:27,720 --> 00:05:31,040
with permission sprawl and a vague goal like help users.
150
00:05:31,040 --> 00:05:32,160
Helpful isn't the spec.
151
00:05:32,160 --> 00:05:34,480
It's how you get confident wrong behavior at scale.
152
00:05:34,480 --> 00:05:37,160
So the shift leaders need to make is not task completion.
153
00:05:37,160 --> 00:05:38,720
It's outcome completion.
154
00:05:38,720 --> 00:05:41,560
Task completion is, answer the question.
155
00:05:41,560 --> 00:05:43,320
Create the ticket.
156
00:05:43,320 --> 00:05:44,680
Summarize the policy.
157
00:05:44,680 --> 00:05:46,480
It's transactional.
158
00:05:46,480 --> 00:05:50,360
Outcome completion is, resolve the incident without escalation.
159
00:05:50,360 --> 00:05:51,760
Reduce time to restore.
160
00:05:51,760 --> 00:05:53,360
Prevent policy violations.
161
00:05:53,360 --> 00:05:54,640
Outcomes have constraints.
162
00:05:54,640 --> 00:05:55,480
They have ownership.
163
00:05:55,480 --> 00:05:56,320
They have rollback.
164
00:05:56,320 --> 00:05:58,280
They have accountability.
165
00:05:58,280 --> 00:06:01,600
That distinction matters because once you aim at outcomes,
166
00:06:01,600 --> 00:06:05,160
you're forced to design the system that makes outcomes repeatable.
167
00:06:05,160 --> 00:06:07,240
And then there's the part nobody wants to hear.
168
00:06:07,240 --> 00:06:10,200
System learning doesn't happen because the model is smart.
169
00:06:10,200 --> 00:06:12,600
It happens because the platform is instrumented.
170
00:06:12,600 --> 00:06:14,240
If you don't capture failure reasons,
171
00:06:14,240 --> 00:06:18,120
escalation causes, missing knowledge coverage, tool errors and routing ambiguity,
172
00:06:18,120 --> 00:06:19,000
nothing improves.
173
00:06:19,000 --> 00:06:20,720
You don't get an agentic workforce.
174
00:06:20,720 --> 00:06:22,920
You get a static bot that slowly becomes wrong
175
00:06:22,920 --> 00:06:25,240
as policies drift and services change.
176
00:06:25,240 --> 00:06:28,320
Entropy always wins when feedback loops don't exist.
177
00:06:28,320 --> 00:06:31,760
This is where the frontier firm framing actually becomes useful,
178
00:06:31,760 --> 00:06:33,760
if you strip out the hype.
179
00:06:33,760 --> 00:06:34,440
Humans lead.
180
00:06:34,440 --> 00:06:37,200
They define outcomes, boundaries and acceptable risk.
181
00:06:37,200 --> 00:06:38,080
Agents operate.
182
00:06:38,080 --> 00:06:40,800
They execute within those constraints consistently.
183
00:06:40,800 --> 00:06:41,560
Systems learn.
184
00:06:41,560 --> 00:06:44,120
They improve because the organization measures the right things
185
00:06:44,120 --> 00:06:45,480
and updates the design.
186
00:06:45,480 --> 00:06:47,720
But only if you do the boring part, the constraints.
187
00:06:47,720 --> 00:06:51,400
Most rollouts fail for three reasons that are painfully predictable.
188
00:06:51,400 --> 00:06:52,760
First, vague goals.
189
00:06:52,760 --> 00:06:54,480
"Improve productivity" means nothing.
190
00:06:54,480 --> 00:06:55,480
It produces nothing.
191
00:06:55,480 --> 00:06:57,120
It creates competing interpretations
192
00:06:57,120 --> 00:06:59,560
and a dozen half-built agents that nobody owns.
193
00:06:59,560 --> 00:07:01,360
Second, no constraints.
194
00:07:01,360 --> 00:07:04,840
Unlimited tool access turns an agent into a probabilistic admin.
195
00:07:04,840 --> 00:07:05,960
People call it innovation.
196
00:07:05,960 --> 00:07:07,240
Auditors call it a finding.
197
00:07:07,240 --> 00:07:08,800
Third, uncontrolled publishing.
198
00:07:08,800 --> 00:07:11,120
When every team can publish an agent to everyone,
199
00:07:11,120 --> 00:07:12,240
you don't get empowerment.
200
00:07:12,240 --> 00:07:13,080
You get collision.
201
00:07:13,080 --> 00:07:14,680
Users don't ask for 50 agents.
202
00:07:14,680 --> 00:07:15,960
They ask for one that works.
203
00:07:15,960 --> 00:07:18,000
So they try three, get two wrong answers
204
00:07:18,000 --> 00:07:19,840
and decide the whole thing is a toy.
205
00:07:19,840 --> 00:07:22,000
Everything clicked for most experienced architects
206
00:07:22,000 --> 00:07:23,480
when they realized this.
207
00:07:23,480 --> 00:07:25,320
Automation reduces steps.
208
00:07:25,320 --> 00:07:27,440
An agentic workforce reduces uncertainty.
209
00:07:27,440 --> 00:07:29,360
Automation says, if X, then Y.
210
00:07:29,360 --> 00:07:33,040
Agents say: given messy input, what is X, and which Y is allowed?
211
00:07:33,040 --> 00:07:34,920
That's why the governance and grounding work
212
00:07:34,920 --> 00:07:36,080
isn't Phase 2.
213
00:07:36,080 --> 00:07:37,040
It's foundational.
214
00:07:37,040 --> 00:07:39,440
If you skip it, the system doesn't become agentic.
215
00:07:39,440 --> 00:07:40,800
It becomes conditional chaos.
216
00:07:40,800 --> 00:07:42,760
So the roadmap can't be a feature rollout.
217
00:07:42,760 --> 00:07:46,120
It has to be a 30-day operating model that forces clarity.
218
00:07:46,120 --> 00:07:49,040
One domain, explicit outcomes, tool boundaries,
219
00:07:49,040 --> 00:07:51,480
evidence requirements, and a publishing path
220
00:07:51,480 --> 00:07:54,160
that doesn't turn every experiment into production.
221
00:07:54,160 --> 00:07:56,400
Because if you don't force that clarity up front,
222
00:07:56,400 --> 00:07:57,920
week two shows up on schedule.
223
00:07:57,920 --> 00:08:00,600
And the platform will do exactly what you configured,
224
00:08:00,600 --> 00:08:02,680
not what you intended.
225
00:08:02,680 --> 00:08:05,560
The 30-day operating model. A 30-day roadmap
226
00:08:05,560 --> 00:08:07,920
fails when it's treated like a project plan.
227
00:08:07,920 --> 00:08:08,520
It isn't.
228
00:08:08,520 --> 00:08:10,680
It's an operating model that constrains behavior
229
00:08:10,680 --> 00:08:12,640
long enough for reality to show up.
230
00:08:12,640 --> 00:08:14,160
So the structure is simple.
231
00:08:14,160 --> 00:08:15,880
Four weeks, each with a different purpose,
232
00:08:15,880 --> 00:08:18,160
and each with a gate you either pass or you stop.
233
00:08:18,160 --> 00:08:18,960
No heroics.
234
00:08:18,960 --> 00:08:20,160
No "we'll fix it later."
235
00:08:20,160 --> 00:08:21,720
Later is where agent sprawl is born.
236
00:08:21,720 --> 00:08:23,520
Week one is baseline and constraints.
237
00:08:23,520 --> 00:08:25,920
Not building. Measuring and boxing the problem in.
238
00:08:25,920 --> 00:08:27,640
You pick one domain and one channel.
239
00:08:27,640 --> 00:08:30,440
For this roadmap, IT service is the least controversial place
240
00:08:30,440 --> 00:08:32,240
to start because the metrics exist.
241
00:08:32,240 --> 00:08:33,520
The workflow is repetitive,
242
00:08:33,520 --> 00:08:35,800
and the political blast radius is manageable.
243
00:08:35,800 --> 00:08:37,400
Then you establish the baseline.
244
00:08:37,400 --> 00:08:39,520
Ticket volume, categories, current deflection,
245
00:08:39,520 --> 00:08:42,280
SLA, escalation rate, and the top intent patterns
246
00:08:42,280 --> 00:08:44,040
that show up in real user language.
247
00:08:44,040 --> 00:08:46,520
You also define the containment boundary on day one.
248
00:08:46,520 --> 00:08:47,800
What the agent must solve,
249
00:08:47,800 --> 00:08:50,200
what it must never attempt, and what triggers escalation.
250
00:08:50,200 --> 00:08:51,760
That boundary becomes the contract.
251
00:08:51,760 --> 00:08:53,080
Week two is build and ground.
252
00:08:53,080 --> 00:08:54,520
This is where most teams want to start.
253
00:08:54,520 --> 00:08:56,280
They're impatient and they ship a chatbot.
254
00:08:56,280 --> 00:08:59,160
Don't. Week two means you build the first agent
255
00:08:59,160 --> 00:09:00,840
that can do one thing end-to-end,
256
00:09:00,840 --> 00:09:03,960
classify, retrieve, propose, and either resolve or route.
257
00:09:03,960 --> 00:09:06,040
And you begin grounding discipline immediately.
258
00:09:06,040 --> 00:09:07,480
No source, no answer.
259
00:09:07,480 --> 00:09:09,760
If the agent can't cite a policy, a runbook,
260
00:09:09,760 --> 00:09:12,040
or a known-outage notice, it escalates.
261
00:09:12,040 --> 00:09:14,800
This is also where you create your initial evaluation set
262
00:09:14,800 --> 00:09:16,800
and start scoring grounded accuracy.
263
00:09:16,800 --> 00:09:18,360
Not perfect, measurable.
264
00:09:18,360 --> 00:09:20,440
Week three is orchestrate and integrate.
265
00:09:20,440 --> 00:09:22,080
This is where the system becomes real.
266
00:09:22,080 --> 00:09:24,640
Orchestration turns a response into a workflow.
267
00:09:24,640 --> 00:09:27,840
You integrate the deterministic steps with Power Automate:
268
00:09:27,840 --> 00:09:31,200
Ticket creation, assignment, user notifications,
269
00:09:31,200 --> 00:09:33,080
logging, and the hand-off payload.
270
00:09:33,080 --> 00:09:34,800
You introduce tool boundaries.
271
00:09:34,800 --> 00:09:37,520
Read operations are default, write operations are gated.
272
00:09:37,520 --> 00:09:39,200
You add the first approval pattern
273
00:09:39,200 --> 00:09:41,600
if you're doing anything that changes state.
274
00:09:41,600 --> 00:09:43,760
And you begin instrumenting failure reasons
275
00:09:43,760 --> 00:09:45,800
so the system can improve without guessing.
276
00:09:45,800 --> 00:09:47,480
Week four is harden and scale.
277
00:09:47,480 --> 00:09:49,320
Hardening doesn't mean polishing the prompt.
278
00:09:49,320 --> 00:09:51,240
It means making the behavior survivable.
279
00:09:51,240 --> 00:09:54,520
You lock down publishing paths, verify logging,
280
00:09:54,520 --> 00:09:58,440
validate access boundaries, and run adversarial tests.
281
00:09:58,440 --> 00:10:02,120
Prompt injection, tool misuse, and helpful requests
282
00:10:02,120 --> 00:10:04,480
that should trigger escalation.
283
00:10:04,480 --> 00:10:07,400
You identify topics with high confusion and kill them.
284
00:10:07,400 --> 00:10:11,440
You finalize the lifecycle model: pilot, active, deprecated.
285
00:10:11,440 --> 00:10:13,240
And then you prepare the next domain
286
00:10:13,240 --> 00:10:16,160
based on what the metrics proved, not what leadership feels.
287
00:10:16,160 --> 00:10:18,520
Now the work selection rule, because you can't do everything
288
00:10:18,520 --> 00:10:21,360
in 30 days, you choose processes that are high volume,
289
00:10:21,360 --> 00:10:23,120
low variance, and high friction.
290
00:10:23,120 --> 00:10:25,160
High volume means the savings show up quickly.
291
00:10:25,160 --> 00:10:27,480
Low variance means the intent space is stable enough
292
00:10:27,480 --> 00:10:28,720
to route reliably.
293
00:10:28,720 --> 00:10:30,760
High friction means people hate doing it
294
00:10:30,760 --> 00:10:33,040
and will actually use an agent that removes the pain.
295
00:10:33,040 --> 00:10:35,320
Password reset flows, access requests,
296
00:10:35,320 --> 00:10:38,080
"how do I" policy questions, common incident triage,
297
00:10:38,080 --> 00:10:40,240
service catalog routing. That class of work.
298
00:10:40,240 --> 00:10:43,520
And you define done in a way that prevents theater.
299
00:10:43,520 --> 00:10:45,360
Done means three things at once.
300
00:10:45,360 --> 00:10:48,320
Measurable improvement, auditability, and safe rollback.
301
00:10:48,320 --> 00:10:50,520
If you can't roll it back, you didn't build a system.
302
00:10:50,520 --> 00:10:51,840
You built a liability.
303
00:10:51,840 --> 00:10:54,240
Measurable improvement means the KPIs moved
304
00:10:54,240 --> 00:10:56,800
for the slice of work you targeted, not anecdotes,
305
00:10:56,800 --> 00:10:58,160
not screenshots.
306
00:10:58,160 --> 00:11:01,360
Auditability means you can answer what did the agent decide,
307
00:11:01,360 --> 00:11:04,040
what sources did it use, what tool did it call,
308
00:11:04,040 --> 00:11:05,280
and what outcome occurred.
309
00:11:05,280 --> 00:11:07,640
If you can't reconstruct the decision, you can't defend it.
310
00:11:07,640 --> 00:11:09,720
Safe rollback means you can disable the agent
311
00:11:09,720 --> 00:11:12,440
or remove tool access without breaking the underlying process.
312
00:11:12,440 --> 00:11:14,840
That distinction matters because humans still need to work
313
00:11:14,840 --> 00:11:16,000
when the model misbehaves.
314
00:11:16,000 --> 00:11:18,800
Now the governance move that prevents parallel chaos.
315
00:11:18,800 --> 00:11:21,120
Single intake, single backlog, single cadence.
316
00:11:21,120 --> 00:11:23,640
Every agent request goes through one intake path.
317
00:11:23,640 --> 00:11:26,160
One queue, one set of prioritization rules.
318
00:11:26,160 --> 00:11:29,680
Not because bureaucracy is fun, but because parallel agent building
319
00:11:29,680 --> 00:11:33,200
creates incompatible vocabularies and duplicated tool chains.
320
00:11:33,200 --> 00:11:36,560
That turns into conditional chaos faster than any threat actor.
321
00:11:36,560 --> 00:11:38,200
Cadence is also non-negotiable.
322
00:11:38,200 --> 00:11:40,800
A daily build loop for shipping small changes,
323
00:11:40,800 --> 00:11:43,240
a weekly governance review for permissions and publishing,
324
00:11:43,240 --> 00:11:45,160
and an end-of-week KPI check.
325
00:11:45,160 --> 00:11:47,200
If the metrics don't move, you don't scale.
326
00:11:47,200 --> 00:11:48,640
You fix.
327
00:11:48,640 --> 00:11:50,800
And that's the punchline of the operating model.
328
00:11:50,800 --> 00:11:54,400
The platform moves fast, but your organization must move deliberately,
329
00:11:54,400 --> 00:11:55,680
otherwise the system will drift,
330
00:11:55,680 --> 00:11:58,120
and it will drift away from your intent.
331
00:11:58,120 --> 00:12:01,960
Choose the first use case: IT ticket triage as the entry pillar.
332
00:12:01,960 --> 00:12:05,640
If leadership wants a 30-day win that survives contact with reality,
333
00:12:05,640 --> 00:12:08,240
IT ticket triage is the entry pillar.
334
00:12:08,240 --> 00:12:12,120
Not because IT is special, but because IT has three things most departments don't.
335
00:12:12,120 --> 00:12:14,600
Volume, instrumentation, and consequences.
336
00:12:14,600 --> 00:12:18,400
Tickets already have timestamps, categories, owners, and escalation paths.
337
00:12:18,400 --> 00:12:20,160
That means performance is observable,
338
00:12:20,160 --> 00:12:23,880
and when the system gets something wrong, the impact is obvious enough to fix quickly.
339
00:12:23,880 --> 00:12:25,480
It also wins politically.
340
00:12:25,480 --> 00:12:29,360
HR, finance, and legal are high-risk domains with high sensitivity
341
00:12:29,360 --> 00:12:31,480
and low tolerance for probabilistic behavior.
342
00:12:31,480 --> 00:12:34,920
IT service is still risky, but it's socially acceptable to iterate.
343
00:12:34,920 --> 00:12:38,320
People already expect a service desk to ask clarifying questions.
344
00:12:38,320 --> 00:12:41,520
They don't expect the payroll agent to take a guess.
345
00:12:41,520 --> 00:12:43,880
So the use case is not "build an IT chatbot."
346
00:12:43,880 --> 00:12:47,440
The use case is ticket triage as a controlled decision pipeline.
347
00:12:47,440 --> 00:12:51,480
Classify the issue, enrich the context, attempt a resolution when it's safe,
348
00:12:51,480 --> 00:12:53,440
and otherwise route to the correct queue
349
00:12:53,440 --> 00:12:56,320
with enough context that the human doesn't start from zero.
350
00:12:56,320 --> 00:12:57,680
Here's the flow in plain terms.
351
00:12:57,680 --> 00:12:59,560
A user shows up with free-text pain:
352
00:12:59,560 --> 00:13:03,280
Teams message, portal form, email. Pick one channel first.
353
00:13:03,280 --> 00:13:05,680
The agent's first job is intent classification.
354
00:13:05,680 --> 00:13:07,480
Not sentiment, not personality.
355
00:13:07,480 --> 00:13:09,040
What is this in operational terms?
356
00:13:09,040 --> 00:13:14,240
Password reset, VPN, Outlook, device compliance, access request, known outage.
357
00:13:14,240 --> 00:13:16,040
Something is broken with no signal.
358
00:13:16,040 --> 00:13:19,400
That classification determines everything downstream. Then comes enrichment.
359
00:13:19,400 --> 00:13:22,000
This is the part most teams skip because it's not shiny.
360
00:13:22,000 --> 00:13:25,080
The agent needs just enough context to stop wasting human time.
361
00:13:25,080 --> 00:13:28,400
Who the user is, what device they're on, what service they're touching,
362
00:13:28,400 --> 00:13:31,240
whether there's a current incident and whether this is a repeat.
363
00:13:31,240 --> 00:13:35,880
If the organization has an ITSM platform, that's where the prior history lives.
364
00:13:35,880 --> 00:13:39,560
If the organization has a service catalog, that's where routing should land.
365
00:13:39,560 --> 00:13:42,640
If none of that exists, the agent doesn't magically create it.
366
00:13:42,640 --> 00:13:44,280
It just makes the absence visible.
367
00:13:44,280 --> 00:13:47,760
After enrichment, the agent makes the only decision that matters.
368
00:13:47,760 --> 00:13:50,120
Resolve, route, or create.
369
00:13:50,120 --> 00:13:53,840
Resolve means the agent has a deterministic fix path and the risk is low.
370
00:13:53,840 --> 00:13:56,240
Reset a password through an approved workflow.
371
00:13:56,240 --> 00:13:58,400
Provide a step-by-step runbook with citations.
372
00:13:58,400 --> 00:14:00,560
Confirm the user did it, verify success.
373
00:14:00,560 --> 00:14:01,480
Close the loop.
374
00:14:01,480 --> 00:14:05,200
Route means the agent can't safely execute, but it can identify the right team
375
00:14:05,200 --> 00:14:06,720
and hand them a clean payload.
376
00:14:06,720 --> 00:14:11,720
The intent, the likely service, the urgency, the impact, the evidence, and what was attempted.
377
00:14:11,720 --> 00:14:13,480
Routing without payload is theater.
378
00:14:13,480 --> 00:14:15,600
Payload is where SLA actually improves.
379
00:14:15,600 --> 00:14:20,000
Create means the user insists on escalation, or the process requires a record,
380
00:14:20,000 --> 00:14:21,480
or the system detects risk.
381
00:14:21,480 --> 00:14:26,280
The agent creates the ticket with structured fields, not a copy paste of the conversation.
382
00:14:26,280 --> 00:14:28,200
This is where Power Automate earns its keep.
383
00:14:28,200 --> 00:14:30,160
Create, assign, notify, and log.
384
00:14:30,160 --> 00:14:32,120
Deterministic steps stay deterministic.
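Editor's sketch of the resolve, route, or create decision described above. This is a minimal illustration in Python, not the episode's implementation: the `Payload` fields follow the list given for a clean handoff, while `RESOLVABLE_INTENTS`, `decide`, and the flag names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    RESOLVE = "resolve"
    ROUTE = "route"
    CREATE = "create"


# Hypothetical allow-list: intents the agent may solve end to end.
RESOLVABLE_INTENTS = {"password_reset", "vpn_setup"}


@dataclass
class Payload:
    """Structured handoff: what the human needs so they don't start from zero."""
    intent: str
    service: str
    urgency: str
    impact: str
    evidence: list = field(default_factory=list)   # cited runbooks, status items
    attempted: list = field(default_factory=list)  # steps already tried


def decide(payload: Payload, user_insists: bool = False,
           record_required: bool = False, risk_detected: bool = False) -> Decision:
    # Create: the user insists, the process needs a record, or risk is detected.
    if user_insists or record_required or risk_detected:
        return Decision.CREATE
    # Resolve: only for allow-listed intents that have at least one source.
    if payload.intent in RESOLVABLE_INTENTS and payload.evidence:
        return Decision.RESOLVE
    # Everything else goes to a human queue with the payload attached.
    return Decision.ROUTE
```

The ordering is a design choice in this sketch: escalation conditions are checked first so that a low-risk intent never overrides a detected risk.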
385
00:14:32,120 --> 00:14:34,280
Now define the containment boundary upfront.
386
00:14:34,280 --> 00:14:38,400
The agent must have a contract that says, these are the things it is allowed to solve end to end.
387
00:14:38,400 --> 00:14:40,720
And these are the things it must never attempt.
388
00:14:40,720 --> 00:14:44,080
Never includes anything privileged, anything financially material,
389
00:14:44,080 --> 00:14:48,440
anything that changes access without approval, and anything that lacks an authoritative source.
390
00:14:48,440 --> 00:14:50,440
That boundary is not about limiting the agent.
391
00:14:50,440 --> 00:14:51,760
It's about protecting trust.
392
00:14:51,760 --> 00:14:54,320
And this is where the no source, no answer policy starts.
393
00:14:54,320 --> 00:14:57,000
Not in week three, not after the first incident. On day one.
394
00:14:57,000 --> 00:15:01,920
If the agent can't cite a runbook, a policy, a known issue, or a service status update,
395
00:15:01,920 --> 00:15:02,920
it doesn't answer.
396
00:15:02,920 --> 00:15:03,920
It escalates.
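Editor's sketch of a "no source, no answer" gate, assuming the agent's draft reply arrives together with the list of sources retrieval returned. The function name and the dictionary shape are hypothetical, chosen only to show the rule.

```python
def answer_or_escalate(draft: str, sources: list) -> dict:
    """Return the draft only if at least one non-empty source backs it."""
    cited = [s for s in sources if s and s.strip()]
    if not cited:
        # No runbook, policy, known issue, or status update: do not answer.
        return {"action": "escalate", "reason": "no_authoritative_source"}
    return {"action": "answer", "text": draft, "citations": cited}
```

Putting the check in code rather than in the prompt means the model cannot talk its way past it.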
397
00:15:03,920 --> 00:15:07,320
The first time an agent gives a confident, wrong answer to a user who's already frustrated,
398
00:15:07,320 --> 00:15:10,040
adoption dies quietly, permanently.
399
00:15:10,040 --> 00:15:13,160
So for ticket triage, citations and evidence aren't academic.
400
00:15:13,160 --> 00:15:17,720
They're how the system earns the right to exist. Tie it directly to the KPIs you set earlier.
401
00:15:17,720 --> 00:15:22,600
Deflection comes from resolving the low risk, high volume intents inside the boundary.
402
00:15:22,600 --> 00:15:26,320
First contact resolution comes from clean enrichment plus grounded runbooks.
403
00:15:26,320 --> 00:15:29,960
Fewer escalations come from correct routing and fewer dead-end handoffs.
404
00:15:29,960 --> 00:15:33,360
SLA improvement comes from structured tickets and reduced back and forth.
405
00:15:33,360 --> 00:15:37,240
And the best part is you can measure all of it without inventing new telemetry.
406
00:15:37,240 --> 00:15:40,320
The ITSM system already tracks timestamps and assignments.
407
00:15:40,320 --> 00:15:43,600
You just need to tag "agent touched" and capture escalation reasons.
408
00:15:43,600 --> 00:15:48,080
The transition to the next section is the uncomfortable constraint that makes triage work.
409
00:15:48,080 --> 00:15:49,320
Topics aren't free.
410
00:15:49,320 --> 00:15:52,400
Every helpful new intent you add creates ambiguity.
411
00:15:52,400 --> 00:15:54,640
Every ambiguous route creates mistriage.
412
00:15:54,640 --> 00:15:57,320
And mistriage is just escalation with extra steps.
413
00:15:57,320 --> 00:16:01,080
So if you want ticket triage to become a pillar instead of a pilot, you have to treat intent
414
00:16:01,080 --> 00:16:02,600
like a design asset.
415
00:16:02,600 --> 00:16:04,040
Not a brainstorm list.
416
00:16:04,040 --> 00:16:08,280
Copilot Studio design law: intent first, topic second.
417
00:16:08,280 --> 00:16:13,360
Copilot Studio encourages people to think in topics because topics are visible.
418
00:16:13,360 --> 00:16:14,560
They feel like progress.
419
00:16:14,560 --> 00:16:17,360
Click, name it, write a few trigger phrases, ship it.
420
00:16:17,360 --> 00:16:18,840
That's how topic sprawl happens.
421
00:16:18,840 --> 00:16:20,360
And topic sprawl isn't just messy.
422
00:16:20,360 --> 00:16:22,080
It's an entropy generator.
423
00:16:22,080 --> 00:16:26,600
Every new topic adds another overlapping route the system can take, which increases ambiguity,
424
00:16:26,600 --> 00:16:30,680
which increases misclassification, which increases escalations, which makes everyone conclude
425
00:16:30,680 --> 00:16:32,400
the agent doesn't work.
426
00:16:32,400 --> 00:16:33,400
It does work.
427
00:16:33,400 --> 00:16:35,600
You just turned routing into a probabilistic game.
428
00:16:35,600 --> 00:16:36,760
So the design law is blunt.
429
00:16:36,760 --> 00:16:38,720
Intent first, topic second.
430
00:16:38,720 --> 00:16:41,080
Intent is the operational meaning behind the user's words.
431
00:16:41,080 --> 00:16:42,080
It's stable.
432
00:16:42,080 --> 00:16:44,360
"Reset my password" will still exist next quarter.
433
00:16:44,360 --> 00:16:48,040
"Can't get into my account" still maps to the same underlying outcome.
434
00:16:48,040 --> 00:16:50,120
Intent is what you can instrument and improve.
435
00:16:50,120 --> 00:16:51,360
Topics are just routing tables.
436
00:16:51,360 --> 00:16:53,080
They are implementation detail.
437
00:16:53,080 --> 00:16:54,600
That distinction matters.
438
00:16:54,600 --> 00:16:58,440
Because most organizations start by collecting departmental wish lists.
439
00:16:58,440 --> 00:17:00,120
We need a topic for VPN.
440
00:17:00,120 --> 00:17:01,560
We need a topic for printers.
441
00:17:01,560 --> 00:17:02,960
We need a topic for Teams.
442
00:17:02,960 --> 00:17:07,760
And then they create a hundred topics that all trigger on the words "can't," "help," or "not working."
443
00:17:07,760 --> 00:17:09,120
You didn't build coverage.
444
00:17:09,120 --> 00:17:10,440
You built collisions.
445
00:17:10,440 --> 00:17:13,280
So the first rule is to cap the initial intent space.
446
00:17:13,280 --> 00:17:14,280
Ten to fifteen intents.
447
00:17:14,280 --> 00:17:18,800
Not because the rest don't exist, but because you need stability before you need coverage.
448
00:17:18,800 --> 00:17:23,320
In the first 30 days you're proving that the system can classify and contain reliably.
449
00:17:23,320 --> 00:17:25,800
Not that it can answer every question in the organization.
450
00:17:25,800 --> 00:17:28,880
Here's what those first intents look like in IT triage.
451
00:17:28,880 --> 00:17:30,280
Password and account access.
452
00:17:30,280 --> 00:17:32,280
VPN or remote access.
453
00:17:32,280 --> 00:17:33,280
Email and calendar.
454
00:17:33,280 --> 00:17:34,600
Teams calling and meetings.
455
00:17:34,600 --> 00:17:35,600
Device compliance.
456
00:17:35,600 --> 00:17:36,600
Wi-Fi.
457
00:17:36,600 --> 00:17:37,600
Software install.
458
00:17:37,600 --> 00:17:38,600
Access request.
459
00:17:38,600 --> 00:17:39,760
Service outage check.
460
00:17:39,760 --> 00:17:42,600
And unknown issue as a controlled catch all.
461
00:17:42,600 --> 00:17:45,640
That's enough volume to matter and enough clarity to tune.
462
00:17:45,640 --> 00:17:47,560
Now how do you design intents without guessing?
463
00:17:47,560 --> 00:17:51,360
You use intent signals, and Copilot Studio gives you more signals than people use.
464
00:17:51,360 --> 00:17:53,880
The obvious signal is user language patterns.
465
00:17:53,880 --> 00:17:56,440
The phrases and synonyms people actually type.
466
00:17:56,440 --> 00:17:58,360
Not what IT calls it, what users call it.
467
00:17:58,360 --> 00:18:02,720
"My laptop won't connect" is not "802.1X supplicant failure."
468
00:18:02,720 --> 00:18:05,280
You model the human input, then translate.
469
00:18:05,280 --> 00:18:06,840
The next signal is metadata.
470
00:18:06,840 --> 00:18:10,800
If you have a portal form, it might include service or category fields.
471
00:18:10,800 --> 00:18:14,760
If you have ITSM integration, you might have existing categories you can map to.
472
00:18:14,760 --> 00:18:17,960
These can reduce ambiguity, but only if your categories aren't garbage.
473
00:18:17,960 --> 00:18:22,600
If your ITSM taxonomy is 15 variations of other, the agent can't salvage it.
474
00:18:22,600 --> 00:18:23,600
Then there's channel context.
475
00:18:23,600 --> 00:18:26,840
A Teams message at 9 a.m. Monday is often "I'm stuck right now."
476
00:18:26,840 --> 00:18:28,720
A portal submission might be more structured.
477
00:18:28,720 --> 00:18:32,080
An email to a shared mailbox is often a dump of symptoms.
478
00:18:32,080 --> 00:18:33,200
Channel changes language.
479
00:18:33,200 --> 00:18:36,400
That means channel is part of intent detection, not just where you publish.
480
00:18:36,400 --> 00:18:39,240
Now the part most people avoid, the fallback strategy.
481
00:18:39,240 --> 00:18:41,640
In Copilot Studio, fallback is not a safety net.
482
00:18:41,640 --> 00:18:42,960
It is a control surface.
483
00:18:42,960 --> 00:18:47,260
If you let fallback behave like "I'll try to be helpful anyway," you just built hallucinations
484
00:18:47,260 --> 00:18:48,800
into your routing layer.
485
00:18:48,800 --> 00:18:51,960
The agent will invent an intent, pick a tool and act with confidence.
486
00:18:51,960 --> 00:18:55,440
So you need one controlled fallback, one.
487
00:18:55,440 --> 00:18:56,800
Fallback should do three things.
488
00:18:56,800 --> 00:19:03,120
In order, ask one clarifying question to force disambiguation, check for known outage or incident
489
00:19:03,120 --> 00:19:07,440
context and then escalate with a structured payload if it still can't classify.
490
00:19:07,440 --> 00:19:09,960
No long conversations, no 20 questions.
491
00:19:09,960 --> 00:19:13,300
If the agent can't classify after one clarifier, it routes.
492
00:19:13,300 --> 00:19:16,280
That keeps the system fast and keeps the failure mode predictable.
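Editor's sketch of the single controlled fallback: one clarifying question, one outage check, then escalation with a structured payload. This is plain Python, not Copilot Studio configuration. The three callables (`classify`, `ask_user`, `active_incident`) are hypothetical stand-ins for whatever the platform provides.

```python
from typing import Callable, Optional


def controlled_fallback(
    message: str,
    classify: Callable[[str], Optional[str]],
    ask_user: Callable[[str], str],
    active_incident: Callable[[str], Optional[str]],
) -> dict:
    # Step 1: exactly one clarifying question, never a long conversation.
    reply = ask_user("Which service or app is affected, and what happens when you try?")
    combined = f"{message} {reply}"
    intent = classify(combined)
    if intent is not None:
        return {"action": "continue", "intent": intent}

    # Step 2: check for a known outage or incident.
    incident = active_incident(combined)
    if incident is not None:
        return {"action": "inform_outage", "incident": incident}

    # Step 3: still unclassified, so escalate with a structured payload.
    return {
        "action": "escalate",
        "payload": {
            "intent": "unknown_issue",
            "original_message": message,
            "clarifier_reply": reply,
        },
    }
```

There is no loop in the function, so the "one clarifier" rule is enforced by structure rather than by convention.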
493
00:19:16,280 --> 00:19:17,680
And yes, it feels strict.
494
00:19:17,680 --> 00:19:18,680
Good.
495
00:19:18,680 --> 00:19:20,640
Strict is how you prevent conditional chaos.
496
00:19:20,640 --> 00:19:22,160
Now topic lifecycle.
497
00:19:22,160 --> 00:19:25,000
This is the part that prevents your backlog from becoming a museum.
498
00:19:25,000 --> 00:19:27,280
You kill weak topics early, not later.
499
00:19:27,280 --> 00:19:28,880
Early.
500
00:19:28,880 --> 00:19:31,480
Set deprecation criteria upfront.
501
00:19:31,480 --> 00:19:37,800
Low usage, high confusion, low containment, high escalation, high unknown fallback rate,
502
00:19:37,800 --> 00:19:40,080
or repeated misroutes into the wrong queues.
503
00:19:40,080 --> 00:19:43,920
If a topic can't hit containment and routing accuracy targets inside two weeks, it doesn't
504
00:19:43,920 --> 00:19:44,920
get a third month.
505
00:19:44,920 --> 00:19:49,920
It gets removed or merged because every weak topic you keep becomes permanent ambiguity.
506
00:19:49,920 --> 00:19:51,880
And ambiguity compounds.
507
00:19:51,880 --> 00:19:56,000
Finally, close the loop with a design choice that prevents ghost agents and sprawl later,
508
00:19:56,000 --> 00:19:59,960
treat intents as a shared enterprise asset, not as per agent inventions.
509
00:19:59,960 --> 00:20:04,480
One intent registry, one naming scheme, one owner, one backlog.
510
00:20:04,480 --> 00:20:06,760
Agents can vary by channel and audience.
511
00:20:06,760 --> 00:20:07,760
Intents shouldn't.
512
00:20:07,760 --> 00:20:10,200
When intents are stable, topics become small and boring.
513
00:20:10,200 --> 00:20:12,040
That's the point.
514
00:20:12,040 --> 00:20:14,040
Orchestration becomes the real product.
515
00:20:14,040 --> 00:20:17,120
Event to decision to action, to verification, to hand off.
516
00:20:17,120 --> 00:20:19,080
Topics just tell the system where to start.
517
00:20:19,080 --> 00:20:22,920
And that transition matters because once routing is disciplined, you can finally build
518
00:20:22,920 --> 00:20:24,520
the thing people think they are buying.
519
00:20:24,520 --> 00:20:25,920
A control plane.
520
00:20:25,920 --> 00:20:27,120
Orchestration is a control plane.
521
00:20:27,120 --> 00:20:30,280
Once intents are disciplined, the next mistake is thinking the work is done.
522
00:20:30,280 --> 00:20:31,280
It isn't.
523
00:20:31,280 --> 00:20:32,280
Routing is just a switchboard.
524
00:20:32,280 --> 00:20:34,080
The real product is orchestration.
525
00:20:34,080 --> 00:20:37,920
Orchestration is the control plane that turns an interaction into a verified outcome.
526
00:20:37,920 --> 00:20:41,040
Event to reasoning, to action, to verification, to hand off.
527
00:20:41,040 --> 00:20:43,240
Without that loop, you have a conversational index.
528
00:20:43,240 --> 00:20:45,480
With it, you have something that can actually replace work.
529
00:20:45,480 --> 00:20:48,800
That distinction matters because most early agents behave like this.
530
00:20:48,800 --> 00:20:52,240
The user types a problem, the agent answers in paragraphs, and then the user still has
531
00:20:52,240 --> 00:20:53,840
to do the next five steps.
532
00:20:53,840 --> 00:20:55,360
They copy text into a ticket.
533
00:20:55,360 --> 00:20:56,640
They hunt for the right form.
534
00:20:56,640 --> 00:20:57,640
They message the wrong team.
535
00:20:57,640 --> 00:20:58,640
They repeat themselves.
536
00:20:58,640 --> 00:21:00,320
The agent helped, but nothing moved.
537
00:21:00,320 --> 00:21:02,520
A control plane agent doesn't aim to be eloquent.
538
00:21:02,520 --> 00:21:03,960
It aims to be operational.
539
00:21:03,960 --> 00:21:06,760
So the orchestration pattern is simple and repeatable.
540
00:21:06,760 --> 00:21:08,280
First, classify.
541
00:21:08,280 --> 00:21:10,280
Confirm the intent and the containment boundary.
542
00:21:10,280 --> 00:21:13,560
If the user asks for something outside the boundary, the agent doesn't negotiate.
543
00:21:13,560 --> 00:21:14,560
It routes.
544
00:21:14,560 --> 00:21:18,040
That prevents the slow drift from helpful assistant into probabilistic operator.
545
00:21:18,040 --> 00:21:19,800
Second, retrieve.
546
00:21:19,800 --> 00:21:23,160
Pull the minimum authoritative knowledge needed to propose a solution.
547
00:21:23,160 --> 00:21:27,120
That can be a runbook, a policy, a service status item, or a known issue record.
548
00:21:27,120 --> 00:21:28,440
No source means no answer.
549
00:21:28,440 --> 00:21:30,640
This is where the agent proves it's not improvising.
550
00:21:30,640 --> 00:21:31,800
Third, propose.
551
00:21:31,800 --> 00:21:34,200
The agent gives a short, actionable recommendation.
552
00:21:34,200 --> 00:21:35,200
Not an essay.
553
00:21:35,200 --> 00:21:37,680
Two or three steps max, written like instructions.
554
00:21:37,680 --> 00:21:38,800
And here's the weird part.
555
00:21:38,800 --> 00:21:40,640
The proposal is not the action.
556
00:21:40,640 --> 00:21:42,440
It's a plan the system can verify.
557
00:21:42,440 --> 00:21:43,880
Fourth, confirm.
558
00:21:43,880 --> 00:21:47,440
This is where human in the loop becomes precise instead of performative.
559
00:21:47,440 --> 00:21:49,480
You don't approve the whole agent.
560
00:21:49,480 --> 00:21:50,960
You approve decision points.
561
00:21:50,960 --> 00:21:51,960
Should I reset your password?
562
00:21:51,960 --> 00:21:54,640
Should I create a ticket in this category?
563
00:21:54,640 --> 00:21:56,680
Should I request access on your behalf?
564
00:21:56,680 --> 00:21:59,520
The approval happens at the boundary between read and write.
565
00:21:59,520 --> 00:22:00,520
Fifth, execute.
566
00:22:00,520 --> 00:22:03,240
Only after confirmation and only through allowed tools.
567
00:22:03,240 --> 00:22:06,920
This is where Power Automate and MCP-style tool boundaries matter.
568
00:22:06,920 --> 00:22:11,680
The agent should not be composing ad hoc API calls like a drunk junior developer.
569
00:22:11,680 --> 00:22:14,120
It should call known tools with known parameters.
570
00:22:14,120 --> 00:22:15,720
Sixth, verify.
571
00:22:15,720 --> 00:22:17,600
The agent checks whether the action worked.
572
00:22:17,600 --> 00:22:19,120
Did the password reset succeed?
573
00:22:19,120 --> 00:22:20,520
Did the ticket get created?
574
00:22:20,520 --> 00:22:22,680
Did the user confirm access is restored?
575
00:22:22,680 --> 00:22:26,840
Verification is where agents stop being theater and start being reliable.
576
00:22:26,840 --> 00:22:29,200
Seventh, hand off.
577
00:22:29,200 --> 00:22:32,960
If the agent can't resolve, it escalates with a structured payload.
578
00:22:32,960 --> 00:22:36,560
Intent, enriched context, sources used, steps attempted,
579
00:22:36,560 --> 00:22:38,520
and what it needs the human to decide.
580
00:22:38,520 --> 00:22:40,040
The human shouldn't read a transcript.
581
00:22:40,040 --> 00:22:41,200
They should read a case file.
582
00:22:41,200 --> 00:22:43,480
That's orchestration.
583
00:22:43,480 --> 00:22:47,000
Now the uncomfortable constraint, tool invocation boundaries.
584
00:22:47,000 --> 00:22:50,200
Every tool you connect is an entropy generator if you don't gate it.
585
00:22:50,200 --> 00:22:51,240
Start read only.
586
00:22:51,240 --> 00:22:52,480
List, get, search.
587
00:22:52,480 --> 00:22:54,720
Delay create, update, delete.
588
00:22:54,720 --> 00:22:57,880
If you allow write operations on day one, you're not accelerating delivery.
589
00:22:57,880 --> 00:22:59,720
You're creating an incident with better marketing.
590
00:22:59,720 --> 00:23:02,080
So implement a two-tier tool policy.
591
00:23:02,080 --> 00:23:05,320
Tier one tools are read only and safe.
592
00:23:05,320 --> 00:23:06,880
Check service status.
593
00:23:06,880 --> 00:23:08,520
Search the knowledge base.
594
00:23:08,520 --> 00:23:09,640
Look up ticket history.
595
00:23:09,640 --> 00:23:11,200
Fetch device compliance state.
596
00:23:11,200 --> 00:23:12,880
These tools reduce uncertainty.
597
00:23:12,880 --> 00:23:14,040
They don't change state.
598
00:23:14,040 --> 00:23:16,480
Tier two tools are write actions and risky.
599
00:23:16,480 --> 00:23:17,400
Create a ticket.
600
00:23:17,400 --> 00:23:22,320
Update a user attribute, grant access, reset credentials, trigger changes in downstream systems.
601
00:23:22,320 --> 00:23:25,520
These tools require explicit approval and stronger logging.
602
00:23:25,520 --> 00:23:27,200
Some require stronger authentication.
603
00:23:27,200 --> 00:23:28,560
The point is not to slow down.
604
00:23:28,560 --> 00:23:31,720
The point is to keep autonomy proportional to risk.
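Editor's sketch of the two-tier tool policy as an allow-list with an approval gate. The tool names are hypothetical labels for the examples mentioned above. The real enforcement point would be the platform's tool or connector configuration, which this sketch does not model.

```python
from enum import Enum


class Tier(Enum):
    READ = 1   # reduces uncertainty, does not change state
    WRITE = 2  # changes state, needs explicit approval


# Hypothetical registry: every tool the agent may call, with its tier.
TOOL_TIERS = {
    "check_service_status": Tier.READ,
    "search_knowledge_base": Tier.READ,
    "lookup_ticket_history": Tier.READ,
    "get_device_compliance": Tier.READ,
    "create_ticket": Tier.WRITE,
    "reset_credentials": Tier.WRITE,
    "grant_access": Tier.WRITE,
}


def authorize(tool: str, approved: bool = False) -> bool:
    """Allow read tools freely, write tools only with explicit approval."""
    tier = TOOL_TIERS.get(tool)
    if tier is None:
        return False  # unregistered tools never run
    if tier is Tier.READ:
        return True
    return approved
```

Defaulting unknown tools to "deny" keeps a newly connected tool from running before someone has assigned it a tier.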
605
00:23:31,720 --> 00:23:34,400
Human in the loop also needs to be placed correctly.
606
00:23:34,400 --> 00:23:36,360
Approve everything is just a slow agent.
607
00:23:36,360 --> 00:23:41,400
Approve nothing is how you end up explaining to leadership why an LLM updated a production
608
00:23:41,400 --> 00:23:42,400
record.
609
00:23:42,400 --> 00:23:45,760
The control plane placement is approve at irreversible boundaries.
610
00:23:45,760 --> 00:23:50,480
Actions, privileged operations, external communications, anything financially material, anything
611
00:23:50,480 --> 00:23:53,520
that could become an audit question. Everything else should run.
612
00:23:53,520 --> 00:23:55,240
And here's why this improves adoption.
613
00:23:55,240 --> 00:23:56,680
Users don't want explanations.
614
00:23:56,680 --> 00:23:57,800
They want next actions.
615
00:23:57,800 --> 00:24:00,000
A good orchestration response looks like.
616
00:24:00,000 --> 00:24:02,040
I can do A or B. Here's what I found.
617
00:24:02,040 --> 00:24:03,040
Pick one.
618
00:24:03,040 --> 00:24:04,600
That turns chat into decisions.
619
00:24:04,600 --> 00:24:05,920
Decisions turn into outcomes.
620
00:24:05,920 --> 00:24:07,440
Outcomes are what executives fund.
621
00:24:07,440 --> 00:24:10,720
So by the end of this section, the system is no longer an agent that talks.
622
00:24:10,720 --> 00:24:13,640
It's a control plane that routes, acts, and verifies.
623
00:24:13,640 --> 00:24:19,000
And orchestration has a hidden dependency, and this is where most builds stall: context enrichment.
624
00:24:19,000 --> 00:24:22,200
Because reasoning without context is just confident guessing.
625
00:24:22,200 --> 00:24:24,320
Context enrichment without overreach.
626
00:24:24,320 --> 00:24:28,600
Context enrichment is where most smart agents quietly become privacy incidents.
627
00:24:28,600 --> 00:24:30,440
Because enrichment feels harmless.
628
00:24:30,440 --> 00:24:34,480
Pull a little identity, a little device info, maybe some recent tickets, maybe a list
629
00:24:34,480 --> 00:24:35,480
of installed apps.
630
00:24:35,480 --> 00:24:36,480
What could go wrong?
631
00:24:36,480 --> 00:24:37,480
Overreach goes wrong.
632
00:24:37,480 --> 00:24:39,280
And it goes wrong in two ways at the same time.
633
00:24:39,280 --> 00:24:40,480
You increase risk.
634
00:24:40,480 --> 00:24:41,760
And you decrease accuracy.
635
00:24:41,760 --> 00:24:46,360
The model gets more tokens, more noise, more chance to anchor on irrelevant detail.
636
00:24:46,360 --> 00:24:48,280
You traded clarity for context hoarding.
637
00:24:48,280 --> 00:24:50,440
So the rule is minimum viable context.
638
00:24:50,440 --> 00:24:53,560
Only the facts the process needs to make the next decision safely.
639
00:24:53,560 --> 00:24:56,040
For IT triage, that minimum set is boring.
640
00:24:56,040 --> 00:24:57,440
And that's why it works.
641
00:24:57,440 --> 00:24:59,120
First, user identity.
642
00:24:59,120 --> 00:25:00,400
Not a biography.
643
00:25:00,400 --> 00:25:03,760
Just the stable identifiers that matter for routing and policy.
644
00:25:03,760 --> 00:25:08,320
User principal name, department if it maps to support groups, location if it maps to service
645
00:25:08,320 --> 00:25:11,720
availability, and whether the user is privileged.
646
00:25:11,720 --> 00:25:13,960
Privileged users change the containment boundary.
647
00:25:13,960 --> 00:25:17,800
An agent can't treat a help desk admin like an intern with a locked out mailbox.
648
00:25:17,800 --> 00:25:19,160
Second, device state.
649
00:25:19,160 --> 00:25:23,480
If the use case touches endpoint compliance, you need device ID, OS, management state, and
650
00:25:23,480 --> 00:25:24,560
compliance status.
651
00:25:24,560 --> 00:25:26,400
You don't need a full inventory dump.
652
00:25:26,400 --> 00:25:27,400
The question is simple.
653
00:25:27,400 --> 00:25:30,480
Can this user do the thing they're asking for on the device they're using?
654
00:25:30,480 --> 00:25:31,800
Third, service context.
655
00:25:31,800 --> 00:25:33,160
Is there an active incident?
656
00:25:33,160 --> 00:25:34,520
Is the service degraded?
657
00:25:34,520 --> 00:25:36,000
Is there a change window in progress?
658
00:25:36,000 --> 00:25:37,640
This one is the hidden time saver.
659
00:25:37,640 --> 00:25:39,080
Half of "my Teams is broken"
660
00:25:39,080 --> 00:25:40,280
isn't a user problem.
661
00:25:40,280 --> 00:25:41,200
It's an outage.
662
00:25:41,200 --> 00:25:44,560
If the agent can detect that early, it stops the endless troubleshooting theatre and
663
00:25:44,560 --> 00:25:46,280
routes to the right message.
664
00:25:46,280 --> 00:25:47,280
Status.
665
00:25:47,280 --> 00:25:48,280
Expectation.
666
00:25:48,280 --> 00:25:49,280
And next check.
667
00:25:49,280 --> 00:25:50,800
Fourth, recent history.
668
00:25:50,800 --> 00:25:51,800
Recent tickets.
669
00:25:51,800 --> 00:25:52,800
Recent similar incidents.
670
00:25:52,800 --> 00:25:54,040
Recent failed attempts.
671
00:25:54,040 --> 00:25:57,400
This is how you avoid re-triaging the same person every week.
672
00:25:57,400 --> 00:25:58,400
But keep it tight.
673
00:25:58,400 --> 00:25:59,400
Last five tickets.
674
00:25:59,400 --> 00:26:00,400
Last seven days.
675
00:26:00,400 --> 00:26:01,400
Same category.
676
00:26:01,400 --> 00:26:04,000
Anything beyond that becomes narrative, not signal.
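Editor's sketch of the "last five tickets, last seven days, same category" filter. It assumes tickets are plain dictionaries with `category` and `opened` keys, which is a hypothetical shape. A real ITSM API would return its own record format.

```python
from datetime import datetime, timedelta


def recent_history(tickets, category, now, max_items=5, window_days=7):
    """Keep only same-category tickets inside the window, newest first."""
    cutoff = now - timedelta(days=window_days)
    matching = [
        t for t in tickets
        if t["category"] == category and t["opened"] >= cutoff
    ]
    matching.sort(key=lambda t: t["opened"], reverse=True)
    return matching[:max_items]
```

Passing `now` in explicitly, rather than reading the clock inside the function, makes the window easy to test and to audit.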
677
00:26:04,000 --> 00:26:06,120
Fifth, service catalog mapping.
678
00:26:06,120 --> 00:26:10,440
If your ITSM has a catalog, pull the service and the default assignment group that turns
679
00:26:10,440 --> 00:26:14,800
I need help into this belongs to Group X with fields Y populated.
680
00:26:14,800 --> 00:26:16,520
Which is what actually reduces SLA.
681
00:26:16,520 --> 00:26:19,800
Now the sources. People say "M365 signals" like that's one thing.
682
00:26:19,800 --> 00:26:20,800
It isn't.
683
00:26:20,800 --> 00:26:24,000
It's a pile of systems with different governance, different data boundaries and
684
00:26:24,000 --> 00:26:25,640
different failure modes.
685
00:26:25,640 --> 00:26:29,520
Enrichment sources you can justify in an IT triage workflow are usually
686
00:26:29,520 --> 00:26:33,480
Entra directory attributes, device compliance signals from endpoint management,
687
00:26:33,480 --> 00:26:37,400
ITSM fields and ticket history and a curated service status source.
688
00:26:37,400 --> 00:26:39,880
In some orgs, that service status is in ServiceNow.
689
00:26:39,880 --> 00:26:42,040
In others, it's in a Teams channel post nobody owns.
690
00:26:42,040 --> 00:26:45,680
Either way, make it a source the agent can cite, not a rumor it repeats.
691
00:26:45,680 --> 00:26:48,000
And this is where the system law shows up again.
692
00:26:48,000 --> 00:26:49,000
Normalize inputs.
693
00:26:49,000 --> 00:26:53,320
If your taxonomy for service has 15 spellings of the same thing, enrichment will amplify
694
00:26:53,320 --> 00:26:54,320
the mess.
695
00:26:54,320 --> 00:26:57,320
The agent can only be as deterministic as the labels you feed it.
696
00:26:57,320 --> 00:27:02,800
So define a small taxonomy for the pilot: service, urgency, impact, environment. Not 50 fields,
697
00:27:02,800 --> 00:27:03,800
four.
698
00:27:03,800 --> 00:27:04,800
And make the mapping explicit.
699
00:27:04,800 --> 00:27:09,840
If the user says "Outlook," the system maps to "Exchange Online," not "email-ish problem."
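Editor's sketch of an explicit alias-to-service mapping. Only the Outlook to Exchange Online pair comes from the episode. The other aliases and the canonical names they map to are hypothetical examples of what a pilot taxonomy might contain.

```python
import re

# Alias -> canonical service label. Only "outlook" is taken from the episode;
# the remaining entries are illustrative assumptions.
SERVICE_ALIASES = {
    "outlook": "Exchange Online",
    "email": "Exchange Online",
    "teams": "Microsoft Teams",
    "vpn": "Remote Access",
}


def normalize_service(user_text: str):
    """Map free text to one canonical service, or None if nothing matches."""
    words = re.findall(r"[a-z0-9]+", user_text.lower())
    for word in words:
        if word in SERVICE_ALIASES:
            return SERVICE_ALIASES[word]
    return None
```

Returning `None` instead of a best guess leaves the unmatched case to the controlled fallback.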
700
00:27:09,840 --> 00:27:12,320
Then add a verification step before any write action.
701
00:27:12,320 --> 00:27:15,160
This is the difference between enrichment and hallucination.
702
00:27:15,160 --> 00:27:18,720
The agent should play back the enriched facts and ask for confirmation when those facts
703
00:27:18,720 --> 00:27:19,960
will change behavior.
704
00:27:19,960 --> 00:27:24,000
You're on device X, it's non-compliant, and this request requires compliance.
705
00:27:24,000 --> 00:27:25,200
Is that correct?
706
00:27:25,200 --> 00:27:29,280
You're requesting access to system Y, which is a privileged app. Confirm?
707
00:27:29,280 --> 00:27:30,800
Verification doesn't mean you approve everything.
708
00:27:30,800 --> 00:27:33,760
It means the agent doesn't treat guessed context as truth.
709
00:27:33,760 --> 00:27:37,720
Now privacy. If you're tempted to pull HR attributes, manager chains, performance data
710
00:27:37,720 --> 00:27:40,440
or everything in Graph because it's available, stop.
711
00:27:40,440 --> 00:27:41,440
That's not enrichment.
712
00:27:41,440 --> 00:27:42,920
That's surveillance cosplay.
713
00:27:42,920 --> 00:27:47,440
The minimum viable context principle protects you because it forces a justification.
714
00:27:47,440 --> 00:27:49,520
What decision does this field influence?
715
00:27:49,520 --> 00:27:52,720
If the answer is none, the field doesn't belong in the agent.
716
00:27:52,720 --> 00:27:57,600
And yes, more context sometimes improves accuracy, but accuracy without boundary becomes a liability.
717
00:27:57,600 --> 00:28:01,560
The agent will learn to use data it shouldn't and users will learn they can prompt it into
718
00:28:01,560 --> 00:28:02,560
revealing it.
719
00:28:02,560 --> 00:28:05,680
So keep enrichment tightly scoped, audited and reversible.
720
00:28:05,680 --> 00:28:10,160
If a field becomes risky, remove it and the system still functions because context is the
721
00:28:10,160 --> 00:28:12,320
hidden engine of orchestration.
722
00:28:12,320 --> 00:28:16,680
But the next failure mode is what happens when the agent has context, has a plan, and still
723
00:28:16,680 --> 00:28:18,320
answers wrong with confidence.
724
00:28:18,320 --> 00:28:22,720
That's grounding and it kills adoption faster than any outage ever will.
725
00:28:22,720 --> 00:28:25,680
The failure mode that kills adoption: the confident wrong answer.
726
00:28:25,680 --> 00:28:28,640
If you remember one thing about adoption, make it this.
727
00:28:28,640 --> 00:28:30,880
Users will forgive "I don't know."
728
00:28:30,880 --> 00:28:34,080
They will not forgive "I'm sure" followed by being wrong.
729
00:28:34,080 --> 00:28:38,080
It's the failure mode that kills an agent program in week two because the first time an agent
730
00:28:38,080 --> 00:28:41,800
answers confidently and incorrectly, the user doesn't file a bug report.
731
00:28:41,800 --> 00:28:45,040
They don't politely provide feedback. They screenshot it, paste it into a Teams chat,
732
00:28:45,040 --> 00:28:47,840
and the story becomes "Copilot makes stuff up."
733
00:28:47,840 --> 00:28:51,800
And once that narrative exists, every future success gets dismissed as luck.
734
00:28:51,800 --> 00:28:53,680
This is why grounding isn't an enhancement.
735
00:28:53,680 --> 00:28:55,240
It's a survival requirement.
736
00:28:55,240 --> 00:28:57,360
Grounding failures usually happen for three reasons.
737
00:28:57,360 --> 00:29:01,320
First, the agent answers from its general model knowledge instead of your enterprise truth.
738
00:29:01,320 --> 00:29:05,400
It's fine when the question is generic and catastrophic when the question is policy, HR,
739
00:29:05,400 --> 00:29:06,880
security or internal process.
740
00:29:06,880 --> 00:29:11,320
The model can sound correct while being wrong in exactly the ways that matter to auditors.
741
00:29:11,320 --> 00:29:13,600
Second, the agent retrieves the wrong document.
742
00:29:13,600 --> 00:29:16,760
Not because retrieval is broken but because your content is ambiguous.
743
00:29:16,760 --> 00:29:20,440
Two similar policies, a stale runbook, a SharePoint page with three unrelated procedures
744
00:29:20,440 --> 00:29:21,920
jammed into one.
745
00:29:21,920 --> 00:29:24,600
Retrieval doesn't fix entropy, it indexes it.
746
00:29:24,600 --> 00:29:27,280
Third, the agent blends sources.
747
00:29:27,280 --> 00:29:31,360
It pulls one chunk from one doc, another chunk from another doc, and then stitches a reasonable
748
00:29:31,360 --> 00:29:32,960
answer that never existed.
749
00:29:32,960 --> 00:29:35,520
It feels helpful. It also becomes impossible to defend.
750
00:29:35,520 --> 00:29:38,560
So the control rule has to be explicit and enforced.
751
00:29:38,560 --> 00:29:40,960
Citations required for any non-trivial claim.
752
00:29:40,960 --> 00:29:43,320
Not "citations are nice." Required.
753
00:29:43,320 --> 00:29:45,840
If the user asks, "What's the VPN setup?"
754
00:29:45,840 --> 00:29:50,200
the agent can cite. If the user asks, "Can I access this customer data set from my personal
755
00:29:50,200 --> 00:29:51,200
device?"
756
00:29:51,200 --> 00:29:54,840
the agent cites or escalates. If the agent can't produce a source, it doesn't answer,
757
00:29:54,840 --> 00:29:55,840
it routes.
758
00:29:55,840 --> 00:30:00,160
This is also where you separate knowledge from action. Answering and doing are different risk classes,
759
00:30:00,160 --> 00:30:03,120
and the platform won't keep them separate unless you design it that way.
760
00:30:03,120 --> 00:30:07,720
A grounded answer is read behavior: it's retrieval plus summarization with evidence. An action
761
00:30:07,720 --> 00:30:10,800
is write behavior: it's changing state in a system of record.
762
00:30:10,800 --> 00:30:15,160
You don't grant permissions because the agent wrote a persuasive paragraph. You grant permissions
763
00:30:15,160 --> 00:30:19,840
because a tool call executed under a constrained identity with explicit approval and a log
764
00:30:19,840 --> 00:30:20,840
you can defend.
765
00:30:20,840 --> 00:30:24,680
That distinction matters because many teams accidentally couple them.
766
00:30:24,680 --> 00:30:26,920
The agent answers, therefore it acts.
767
00:30:26,920 --> 00:30:30,600
That's how you end up with tool misuse driven by conversational confidence.
768
00:30:30,600 --> 00:30:35,440
Now define grounded accuracy because vague quality discussions turn into feelings.
769
00:30:35,440 --> 00:30:40,880
Grounded accuracy means in a test set of real questions, the agent's answer is supported
770
00:30:40,880 --> 00:30:44,240
by the cited source and the source is the correct source for that question.
771
00:30:44,240 --> 00:30:48,360
Not close, not "sounds right." Supported and correct.
772
00:30:48,360 --> 00:30:52,080
You measure it with sampling. You don't need a PhD evaluation framework to start.
773
00:30:52,080 --> 00:30:53,480
You need a fixed question set.
774
00:30:53,480 --> 00:30:57,960
The top intents, the top policies, the top runbooks and the top known issue questions.
775
00:30:57,960 --> 00:31:02,120
You run them weekly. You score each one: correct with correct citation, correct with wrong citation,
776
00:31:02,120 --> 00:31:06,240
incorrect with citation, incorrect with no citation and escalated appropriately.
777
00:31:06,240 --> 00:31:11,640
Your target in 30 days is greater than 85% grounded accuracy on that evaluation set.
778
00:31:11,640 --> 00:31:15,600
That's realistic if you constrain scope and enforce no source, no answer.
779
00:31:15,600 --> 00:31:20,680
And you categorize failure reasons because fixing the wrong thing wastes weeks.
780
00:31:20,680 --> 00:31:23,640
Missing doc: the knowledge doesn't exist in an indexable form.
781
00:31:23,640 --> 00:31:28,040
Wrong doc: retrieval pulled an adjacent policy, often because metadata is weak.
782
00:31:28,040 --> 00:31:30,760
Wrong inference: the agent made a leap the doc didn't support.
783
00:31:30,760 --> 00:31:33,200
Stale doc: the truth changed, the index didn't.
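The weekly scoring routine described above can be sketched in a few lines. This is a minimal illustration, not a prescribed framework: the enum names, the `grounded_accuracy` helper, and the choice to count only "correct with correct citation" as grounded (with escalations left in the denominator) are assumptions made for the example. The 0.85 threshold is the 30-day target mentioned in the episode.

```python
from collections import Counter
from enum import Enum


class Score(Enum):
    # The five scoring buckets named in the episode.
    CORRECT_CORRECT_CITATION = "correct with correct citation"
    CORRECT_WRONG_CITATION = "correct with wrong citation"
    INCORRECT_WITH_CITATION = "incorrect with citation"
    INCORRECT_NO_CITATION = "incorrect with no citation"
    ESCALATED_APPROPRIATELY = "escalated appropriately"


class FailureReason(Enum):
    # The four failure categories named in the episode.
    MISSING_DOC = "missing doc"
    WRONG_DOC = "wrong doc"
    WRONG_INFERENCE = "wrong inference"
    STALE_DOC = "stale doc"


TARGET = 0.85  # 30-day target from the episode


def grounded_accuracy(scores):
    """Share of test questions answered correctly AND backed by the correct source.

    Assumption: only CORRECT_CORRECT_CITATION counts as grounded.
    """
    if not scores:
        return 0.0
    grounded = sum(1 for s in scores if s is Score.CORRECT_CORRECT_CITATION)
    return grounded / len(scores)


def meets_target(scores):
    return grounded_accuracy(scores) > TARGET


def failure_breakdown(reasons):
    """Count failures per category so the most common cause gets fixed first."""
    return Counter(reasons)
```

Run against the same fixed question set every week, the trend in `grounded_accuracy` and the top entry in `failure_breakdown` say where to spend the next week: content, metadata, prompts, or refresh.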
784
00:31:33,200 --> 00:31:36,480
Now the part everyone avoids until it hurts: red-teaming prompts.
785
00:31:36,480 --> 00:31:37,480
Do it early.
786
00:31:37,480 --> 00:31:41,840
Not because you expect a nation state attacker in week one, but because normal users accidentally
787
00:31:41,840 --> 00:31:43,240
behave like attackers.
788
00:31:43,240 --> 00:31:47,760
They paste emails, they paste error messages, they paste internal links and sometimes they
789
00:31:47,760 --> 00:31:50,080
paste instructions that conflict with policy.
790
00:31:50,080 --> 00:31:51,400
The model will try to comply.
791
00:31:51,400 --> 00:31:56,240
So your red team set includes prompt injection attempts, requests to ignore policy, requests
792
00:31:56,240 --> 00:32:01,160
to reveal sensitive data and instructions to perform actions outside the containment boundary.
793
00:32:01,160 --> 00:32:05,680
You run them against the agent before you expand the pilot group and the key is you don't
794
00:32:05,680 --> 00:32:08,440
treat failures as "the model is dumb."
795
00:32:08,440 --> 00:32:13,800
You treat them as design signals: tighten boundaries, improve retrieval, add an escalation clause,
796
00:32:13,800 --> 00:32:16,720
remove an unsafe tool. Because trust doesn't come from charm.
797
00:32:16,720 --> 00:32:20,880
It comes from predictable behavior, and grounding is what makes behavior predictable. But grounding
798
00:32:20,880 --> 00:32:23,440
can't be implemented as vibes and prompt warnings.
799
00:32:23,440 --> 00:32:27,400
It needs a computable knowledge layer and a retrieval strategy you can tune.
800
00:32:27,400 --> 00:32:28,640
SharePoint sprawl won't save you.
801
00:32:28,640 --> 00:32:31,160
That's why the next section is Azure AI Search.
802
00:32:31,160 --> 00:32:35,000
Turning knowledge into something the system can actually retrieve on purpose.
803
00:32:35,000 --> 00:32:37,960
Azure AI Search: make knowledge computable.
804
00:32:37,960 --> 00:32:39,560
SharePoint is not a knowledge strategy.
805
00:32:39,560 --> 00:32:41,680
It's a document landfill with a search box.
806
00:32:41,680 --> 00:32:43,360
And yes, Microsoft search has improved.
807
00:32:43,360 --> 00:32:46,840
Copilot can sometimes find the right page, but "sometimes" is exactly the problem.
808
00:32:46,840 --> 00:32:51,880
An agentic workforce can't run on probabilistic discovery when the output needs to be auditable,
809
00:32:51,880 --> 00:32:53,320
repeatable and fast.
810
00:32:53,320 --> 00:32:55,880
The system needs knowledge it can retrieve on purpose.
811
00:32:55,880 --> 00:32:57,960
That's what Azure AI Search actually does.
812
00:32:57,960 --> 00:32:59,480
It doesn't make content smarter.
813
00:32:59,480 --> 00:33:01,080
It makes content computable.
814
00:33:01,080 --> 00:33:05,360
Indexed, chunked, tagged, refreshed and security trimmed so retrieval becomes a designed
815
00:33:05,360 --> 00:33:07,240
behavior instead of a hope.
816
00:33:07,240 --> 00:33:10,840
That distinction matters because grounding collapses when retrieval is accidental.
817
00:33:10,840 --> 00:33:15,440
So the simple version is: Azure AI Search turns your messy pile of documents into an index
818
00:33:15,440 --> 00:33:17,280
the agent can query with structure.
819
00:33:17,280 --> 00:33:18,680
And the structure is the whole game.
820
00:33:18,680 --> 00:33:21,720
The first design choice is what you index.
821
00:33:21,720 --> 00:33:25,280
Most organizations point at the SharePoint site and call it done.
822
00:33:25,280 --> 00:33:27,960
That's how you get blended answers and stale policy conflicts.
823
00:33:27,960 --> 00:33:31,280
Instead, index the knowledge that is allowed to be operational truth.
824
00:33:31,280 --> 00:33:36,480
Runbooks, approved SOPs, policy documents, known issue articles, service status notices and
825
00:33:36,480 --> 00:33:38,200
service catalog entries.
826
00:33:38,200 --> 00:33:41,520
Not drafts, not personal notes, not the random wiki nobody owns.
827
00:33:41,520 --> 00:33:45,760
If it doesn't have an owner and a life cycle, it doesn't belong in an index feeding production
828
00:33:45,760 --> 00:33:46,760
answers.
829
00:33:46,760 --> 00:33:47,760
Then comes chunking.
830
00:33:47,760 --> 00:33:50,360
This is where retrieval either stays clean or becomes a smear.
831
00:33:50,360 --> 00:33:54,280
Chunking is splitting documents into smaller pieces so the system can retrieve the exact
832
00:33:54,280 --> 00:33:56,000
part that answers the question.
833
00:33:56,000 --> 00:34:00,520
If your chunk is too large, the model gets an entire page with three procedures and it
834
00:34:00,520 --> 00:34:01,640
will blend them.
835
00:34:01,640 --> 00:34:06,400
If the chunk is too small, the model loses context and starts inventing transitions.
836
00:34:06,400 --> 00:34:08,120
The right answer is boring.
837
00:34:08,120 --> 00:34:09,680
Chunk by atomic procedure.
838
00:34:09,680 --> 00:34:14,080
One policy section per chunk, one runbook step sequence per chunk, one exception clause per
839
00:34:14,080 --> 00:34:15,080
chunk.
840
00:34:15,080 --> 00:34:16,880
The goal is not to index the doc.
841
00:34:16,880 --> 00:34:18,880
The goal is to index the decision unit.
842
00:34:18,880 --> 00:34:20,720
That's the atomic knowledge rule.
843
00:34:20,720 --> 00:34:24,680
If a piece of content can't stand alone as an answer source, it's not a good chunk.
844
00:34:24,680 --> 00:34:25,680
Next is metadata.
845
00:34:25,680 --> 00:34:28,720
Without metadata, retrieval becomes a popularity contest.
846
00:34:28,720 --> 00:34:32,760
Metadata is how you turn "VPN policy" into "VPN policy for contractors,
847
00:34:32,760 --> 00:34:39,200
in the EU, applies to Windows, updated 2025-01-12, owner: Security Ops."
848
00:34:39,200 --> 00:34:41,320
The agent doesn't need all of that in the response.
849
00:34:41,320 --> 00:34:44,560
It needs it for filtering and ranking so it doesn't retrieve the wrong thing.
850
00:34:44,560 --> 00:34:49,480
So tag content by service, audience, region, risk tier, doc type and last review date.
851
00:34:49,480 --> 00:34:51,960
Keep the taxonomy small, consistent and enforced.
852
00:34:51,960 --> 00:34:55,840
If your metadata is optional, it will be missing on the documents that matter most.
853
00:34:55,840 --> 00:34:57,160
That's how entropy works.
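One way to keep metadata from being optional is to reject chunks at ingestion time when a required tag is missing. The sketch below is a plain-Python illustration, not an Azure AI Search API: the `Chunk` class, the tag keys, and both helper functions are hypothetical names chosen to mirror the taxonomy listed above.

```python
from dataclasses import dataclass

# Assumed tag keys, mirroring the taxonomy in the episode:
# service, audience, region, risk tier, doc type, last review date.
REQUIRED_TAGS = ("service", "audience", "region", "risk_tier", "doc_type", "last_review")


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    tags: dict


def missing_tags(chunk):
    """Return the required tags that are absent or empty on this chunk."""
    return [tag for tag in REQUIRED_TAGS if not chunk.tags.get(tag)]


def filter_chunks(chunks, **wanted):
    """Keep only chunks whose tags match every requested key/value pair."""
    return [c for c in chunks if all(c.tags.get(k) == v for k, v in wanted.items())]
```

A pipeline could refuse to index any chunk where `missing_tags` is non-empty, which turns "enforced" from a policy statement into a gate.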
854
00:34:57,160 --> 00:34:59,680
Now security trimming. This is not a nice-to-have.
855
00:34:59,680 --> 00:35:04,120
If the index can retrieve content the user shouldn't see, you will eventually leak something.
856
00:35:04,120 --> 00:35:07,920
Not because the model is malicious, because a user will ask a question that causes retrieval
857
00:35:07,920 --> 00:35:11,480
to surface restricted content and the system will try to be helpful.
858
00:35:11,480 --> 00:35:13,920
So the index must respect access controls.
859
00:35:13,920 --> 00:35:17,720
Retrieval should only return chunks the requesting user is entitled to read.
860
00:35:17,720 --> 00:35:22,680
In other words, your knowledge plane must obey the same boundary rules as your data plane.
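A common way to implement security trimming in Azure AI Search is to store the permitted group identifiers on each indexed chunk and attach a filter to every query. The sketch below only builds that filter string. The field name `group_ids` and the helper name are assumptions for illustration, and the `search.in` expression follows the security-filter pattern as I understand it from the Azure AI Search documentation, so verify it against your index schema before relying on it.

```python
import re

# Assumption: group identifiers are GUID-like strings (letters, digits, hyphens).
_SAFE_ID = re.compile(r"^[A-Za-z0-9-]+$")


def build_security_filter(user_group_ids, field="group_ids"):
    """Build an OData filter that matches chunks tagged with any of the user's groups."""
    ids = list(user_group_ids)
    if not ids:
        # No groups means no entitlement; never fall back to an unfiltered query.
        raise ValueError("user has no group memberships; refuse to query")
    for gid in ids:
        if not _SAFE_ID.match(gid):
            raise ValueError(f"unexpected characters in group id: {gid!r}")
    joined = ",".join(ids)
    return f"{field}/any(g: search.in(g, '{joined}'))"
```

The important design choice is the failure direction: when entitlement is unknown, the helper raises instead of returning an empty filter, so a bug produces no answer rather than a leak.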
861
00:35:22,680 --> 00:35:24,480
Refresh cadence is the next trap.
862
00:35:24,480 --> 00:35:25,920
A static index becomes wrong.
863
00:35:25,920 --> 00:35:30,160
Policies change, outages resolve, runbooks get updated after incidents.
864
00:35:30,160 --> 00:35:35,200
If your index refreshes weekly and your operations change daily, the agent will confidently answer
865
00:35:35,200 --> 00:35:36,680
with yesterday's truth.
866
00:35:36,680 --> 00:35:37,840
Users will notice.
867
00:35:37,840 --> 00:35:38,840
Trust will die.
868
00:35:38,840 --> 00:35:43,880
So set refresh cadence by doc type. Service status and known issues refresh frequently.
869
00:35:43,880 --> 00:35:48,120
Policies refresh on publish. Runbooks refresh on change control. And make "last indexed" visible
870
00:35:48,120 --> 00:35:52,840
in telemetry because stale answers look identical to hallucinations from the user's perspective.
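The per-doc-type cadence can be expressed as a small rule table: some types refresh on a timer, others refresh when the source changes. This is a sketch under assumptions. The intervals (15 minutes, 1 hour) are illustrative values, not figures from the episode, and `needs_refresh` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumed rules: time-based for fast-moving content, change-based for governed content.
REFRESH_RULES = {
    "service_status": ("interval", timedelta(minutes=15)),
    "known_issue": ("interval", timedelta(hours=1)),
    "policy": ("on_change", None),   # refresh on publish
    "runbook": ("on_change", None),  # refresh on change control
}


def needs_refresh(doc_type, last_indexed, last_changed, now):
    """Return True when the indexed copy of a document should be rebuilt."""
    mode, max_age = REFRESH_RULES[doc_type]
    if mode == "interval":
        return now - last_indexed > max_age
    # on_change: stale only if the source changed after the last indexing run
    return last_changed > last_indexed
```

Emitting `last_indexed` alongside each retrieved chunk in telemetry makes it possible to tell a stale answer apart from an invented one after the fact.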
871
00:35:52,840 --> 00:35:56,720
Now the output requirement: show sources. Not because citations feel academic, but because
872
00:35:56,720 --> 00:35:59,280
they're the only way to make the system defensible.
873
00:35:59,280 --> 00:36:03,060
The agent response should include the answer, the linked source and the specific section
874
00:36:03,060 --> 00:36:04,920
title or excerpt reference.
875
00:36:04,920 --> 00:36:07,160
If the system can't provide that, it escalates.
876
00:36:07,160 --> 00:36:08,240
No source, no answer.
877
00:36:08,240 --> 00:36:11,440
That rule becomes enforceable when retrieval is a designed component.
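"No source, no answer" is easiest to enforce in code rather than in the prompt. A minimal sketch, assuming hypothetical `Source` and `AgentResponse` types and a `respond` function that sits between the model's draft and the user:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    title: str
    url: str
    section: str  # section title or excerpt reference


@dataclass
class AgentResponse:
    kind: str  # "answer" or "escalate"
    text: str
    source: Optional[Source] = None


def respond(draft_answer, source):
    """Release the draft only if a source with a section reference backs it."""
    if source is None or not source.section:
        return AgentResponse("escalate", "No supporting source found; routing to a human.")
    return AgentResponse("answer", draft_answer, source)
```

Because the check runs outside the model, a persuasive draft with no evidence never reaches the user.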
878
00:36:11,440 --> 00:36:12,440
And here's the weird part.
879
00:36:12,440 --> 00:36:16,280
Once you implement Azure AI Search, you stop arguing about prompt quality as if it's
880
00:36:16,280 --> 00:36:17,280
the product.
881
00:36:17,280 --> 00:36:18,640
Prompts become thin glue.
882
00:36:18,640 --> 00:36:19,640
Retrieval becomes the product.
883
00:36:19,640 --> 00:36:24,360
Copilot Studio can sit on top as the orchestration layer, but Azure AI Search becomes the grounded
884
00:36:24,360 --> 00:36:28,440
knowledge backbone. It's how you make policy computable, not just searchable.
885
00:36:28,440 --> 00:36:32,160
But retrieval still isn't action. Knowing the right runbook step doesn't execute the
886
00:36:32,160 --> 00:36:33,160
step.
887
00:36:33,160 --> 00:36:36,880
Knowing which ticket category applies doesn't create the ticket. For that you need tools
888
00:36:36,880 --> 00:36:39,520
that are predictable, governed and reusable.
889
00:36:39,520 --> 00:36:43,040
That's where MCP shows up and why it matters more than most people want to admit.
890
00:36:43,040 --> 00:36:45,880
MCP: turning Copilot from chat into a system.
891
00:36:45,880 --> 00:36:48,800
Azure AI Search makes knowledge retrievable on purpose.
892
00:36:48,800 --> 00:36:51,520
But the next failure mode shows up immediately.
893
00:36:51,520 --> 00:36:54,440
The agent still can't do anything predictable with that knowledge.
894
00:36:54,440 --> 00:36:58,800
It can explain a runbook, it can cite a policy, and then it stops, waiting for a human to
895
00:36:58,800 --> 00:37:00,720
carry the work across the finish line.
896
00:37:00,720 --> 00:37:02,240
That's where MCP comes in.
897
00:37:02,240 --> 00:37:06,040
Most people hear Model Context Protocol and think it's a developer convenience.
898
00:37:06,040 --> 00:37:07,320
It is not.
899
00:37:07,320 --> 00:37:12,280
In enterprise terms, MCP is a standard contract for tools, a predictable way for an agent
900
00:37:12,280 --> 00:37:15,960
to discover capabilities, call them and receive structured results.
901
00:37:15,960 --> 00:37:18,680
Less bespoke glue, more reusable capability.
902
00:37:18,680 --> 00:37:22,840
And that distinction matters because without a standard tool interface, every agent becomes
903
00:37:22,840 --> 00:37:24,840
a one-off integration project.
904
00:37:24,840 --> 00:37:27,720
One bot talks to ServiceNow through a custom connector.
905
00:37:27,720 --> 00:37:31,320
Another talks to Jira through a different pattern. A third one hits Graph with a different
906
00:37:31,320 --> 00:37:32,560
auth model.
907
00:37:32,560 --> 00:37:34,720
Over time, you're not building an agent ecosystem.
908
00:37:34,720 --> 00:37:36,560
You're building an integration junkyard.
909
00:37:36,560 --> 00:37:40,520
Okay, so basically, MCP makes tools legible to agents.
910
00:37:40,520 --> 00:37:43,120
A tool isn't just an API endpoint.
911
00:37:43,120 --> 00:37:46,400
A tool becomes name, description, parameters and expected output.
912
00:37:46,400 --> 00:37:50,040
That's what gives the agent the ability to plan and execute in a loop without you hard
913
00:37:50,040 --> 00:37:51,040
coding every branch.
914
00:37:51,040 --> 00:37:55,040
It's the difference between a human seeing a labeled button that says "create ticket" versus
915
00:37:55,040 --> 00:37:59,200
a human being handed a raw REST API spec and being told, "figure it out."
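The "name, description, parameters, expected output" contract can be illustrated with a small data structure and an argument check. This is not the MCP wire format or an SDK call. `ToolSpec`, `validate_call`, and the example tool are hypothetical, and real MCP tools describe their inputs with a JSON Schema rather than Python types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str
    parameters: dict  # parameter name -> expected Python type (simplified)
    returns: str      # short description of the structured result


def validate_call(tool, args):
    """Return a list of problems with a proposed tool call; empty means valid."""
    errors = []
    for param, expected in tool.parameters.items():
        if param not in args:
            errors.append(f"missing parameter: {param}")
        elif not isinstance(args[param], expected):
            errors.append(f"wrong type for {param}: expected {expected.__name__}")
    for param in args:
        if param not in tool.parameters:
            errors.append(f"unknown parameter: {param}")
    return errors


# Hypothetical example tool.
CREATE_TICKET = ToolSpec(
    name="create_ticket",
    description="Create one ITSM ticket in the given category.",
    parameters={"category": str, "summary": str},
    returns="ticket id and assigned queue",
)
```

Validating the call before it leaves the agent keeps malformed requests out of the system of record and gives the agent a structured error it can act on.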
916
00:37:59,200 --> 00:38:01,640
Now why does that translate into fast ROI?
917
00:38:01,640 --> 00:38:06,320
Because MCP shifts effort from building new agents to reusing the same small set of enterprise
918
00:38:06,320 --> 00:38:07,320
tools everywhere.
919
00:38:07,320 --> 00:38:10,000
Build one good ticket create tool.
920
00:38:10,000 --> 00:38:14,280
Use it across IT triage, HR requests, access requests and facilities.
921
00:38:14,280 --> 00:38:16,640
Build one service status lookup tool.
922
00:38:16,640 --> 00:38:18,520
Reuse it across every support experience.
923
00:38:18,520 --> 00:38:21,640
Build one KB retrieval with citations tool.
924
00:38:21,640 --> 00:38:23,400
Reuse it everywhere grounded answers matter.
925
00:38:23,400 --> 00:38:25,480
That's how you scale without multiplying chaos.
926
00:38:25,480 --> 00:38:30,520
But the uncomfortable truth is MCP also accelerates the risk you are already going to have.
927
00:38:30,520 --> 00:38:32,200
Because tools are authority.
928
00:38:32,200 --> 00:38:36,200
And when you give an agent a tool that can write, you've handed it a lever that moves production
929
00:38:36,200 --> 00:38:37,200
systems.
930
00:38:37,200 --> 00:38:40,920
So the law for MCP in the first 30 days is strict.
931
00:38:40,920 --> 00:38:42,760
Read operations first.
932
00:38:42,760 --> 00:38:44,600
Write operations gated.
933
00:38:44,600 --> 00:38:46,960
Read, list, get, and search tools are where you start.
934
00:38:46,960 --> 00:38:49,840
They increase accuracy without changing state.
935
00:38:49,840 --> 00:38:54,080
Write tools (create, update, delete) only enter the system when you've already proven
936
00:38:54,080 --> 00:38:56,640
routing stability, grounding discipline, and logging.
937
00:38:56,640 --> 00:38:59,240
And even then you gate them behind explicit approvals.
938
00:38:59,240 --> 00:39:00,800
This is not a philosophical stance.
939
00:39:00,800 --> 00:39:02,640
It's entropy management.
940
00:39:02,640 --> 00:39:07,460
The fastest way to create an agentic incident is to connect a write-capable tool with no
941
00:39:07,460 --> 00:39:10,120
life cycle, no allow list and no rollback.
942
00:39:10,120 --> 00:39:11,960
And yes, token sprawl becomes real here.
943
00:39:11,960 --> 00:39:16,360
API tokens, client secrets and unmanaged credentials become shadow admin keys
944
00:39:16,360 --> 00:39:19,680
the moment they get copied into three environments and ten agents.
945
00:39:19,680 --> 00:39:22,360
The organization forgets they exist until one expires.
946
00:39:22,360 --> 00:39:24,720
Or worse, gets reused somewhere it shouldn't.
947
00:39:24,720 --> 00:39:28,320
So MCP governance starts with a tool allow list and a deny list.
948
00:39:28,320 --> 00:39:32,800
Allow list means only approved MCP servers and approved tools inside those servers.
949
00:39:32,800 --> 00:39:37,520
Deny list means explicitly blocking classes of tools you know you don't want:
950
00:39:37,520 --> 00:39:42,040
delete operations, bulk updates, privilege grants, anything that changes identity, access, or
951
00:39:42,040 --> 00:39:44,400
finance without a human boundary.
952
00:39:44,400 --> 00:39:47,360
That's governance by design, not governance after cleanup.
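The allow list, deny list, and write gating can be combined into one authorization check that runs before any tool call. A minimal sketch: the server name, tool names, and deny prefixes are invented for illustration, and a real deployment would load them from governed configuration rather than constants.

```python
# Hypothetical configuration: approved servers and the approved tools inside them.
ALLOWED_TOOLS = {
    "itsm": {"list_incidents", "check_service_status", "create_ticket"},
}
# Tool classes blocked outright, regardless of what a server exposes.
DENIED_PREFIXES = ("delete_", "bulk_", "grant_")
# Tools that change state and therefore need explicit approval.
WRITE_TOOLS = {"create_ticket"}


def authorize(server, tool, approved=False):
    """Return 'allow', 'needs_approval', or 'deny' for a proposed tool call."""
    if tool.startswith(DENIED_PREFIXES):
        return "deny"
    if tool not in ALLOWED_TOOLS.get(server, set()):
        return "deny"
    if tool in WRITE_TOOLS and not approved:
        return "needs_approval"
    return "allow"
```

The deny check runs first on purpose: even if someone later adds a risky tool to the allow list by mistake, the blocked class still wins.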
953
00:39:47,360 --> 00:39:51,240
Now how does MCP actually turn chat into a system?
954
00:39:51,240 --> 00:39:52,840
Because it creates a closed loop.
955
00:39:52,840 --> 00:39:57,840
The agent can reason, call a tool, read the response, validate it, and decide the next step.
956
00:39:57,840 --> 00:40:02,400
That loop is what makes the agent definition hold: it runs tools in a loop to achieve a goal.
957
00:40:02,400 --> 00:40:03,400
Without tools,
958
00:40:03,400 --> 00:40:06,280
the agent is a narrator. With tools, it becomes an operator.
959
00:40:06,280 --> 00:40:07,920
But only if the tools are predictable.
960
00:40:07,920 --> 00:40:10,280
So you keep tool descriptions short and specific.
961
00:40:10,280 --> 00:40:14,440
You don't expose 50 vaguely named operations and hope the model chooses the right one.
962
00:40:14,440 --> 00:40:16,800
That is just conditional chaos with better packaging.
963
00:40:16,800 --> 00:40:21,560
You expose a small, well-labeled tool set that matches your orchestration steps.
964
00:40:21,560 --> 00:40:27,400
Classify, retrieve, check status, create ticket, update ticket, notify user, request approval.
965
00:40:27,400 --> 00:40:31,400
Then you instrument tool outcomes. Tool errors matter more than model errors because tool errors
966
00:40:31,400 --> 00:40:36,240
break workflows. Track which tool failed, why it failed, and what the agent did next.
967
00:40:36,240 --> 00:40:40,960
If the agent retries endlessly, you've built an infinite loop that burns budget and trust.
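One simple guard against endless retries is a hard attempt cap with a log entry per attempt and an escalation when the cap is hit. This is a sketch, not a Copilot Studio or Power Automate feature. `call_with_retries`, the log shape, and the default of three attempts are assumptions.

```python
def call_with_retries(tool, args, max_attempts=3, log=None):
    """Call a tool at most max_attempts times, recording every outcome.

    Returns {"status": "ok", "result": ...} on success, or
    {"status": "escalate", "reason": ...} once the cap is reached.
    """
    log = log if log is not None else []
    for attempt in range(1, max_attempts + 1):
        try:
            result = tool(**args)
        except Exception as exc:  # record which tool failed and why
            log.append({"tool": tool.__name__, "attempt": attempt,
                        "status": "error", "error": str(exc)})
            continue
        log.append({"tool": tool.__name__, "attempt": attempt, "status": "ok"})
        return {"status": "ok", "result": result}
    return {"status": "escalate",
            "reason": f"{tool.__name__} failed after {max_attempts} attempts"}
```

The log answers the three questions from the episode (which tool, why, and what happened next), and the cap turns a silent loop into a visible hand-off.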
968
00:40:40,960 --> 00:40:45,560
Finally the key design choice that prevents ghost agents later shows up again.
969
00:40:45,560 --> 00:40:51,560
Prefer reusable tools over reusable agents. Agents are experiences. Tools are capabilities.
970
00:40:51,560 --> 00:40:55,880
When you standardize tools, agents can stay small, domain-scoped and disposable.
971
00:40:55,880 --> 00:41:00,160
When you don't, every agent becomes a fragile snowflake that nobody can maintain.
972
00:41:00,160 --> 00:41:02,240
So MCP isn't the cool protocol.
973
00:41:02,240 --> 00:41:06,640
It's the enforcement layer that makes tools composable, governable and repeatable.
974
00:41:06,640 --> 00:41:10,200
Now that you have retrieval that's computable and tools that are standardized, you can finally
975
00:41:10,200 --> 00:41:12,120
connect the two into a real system.
976
00:41:12,120 --> 00:41:14,480
So next it stops being theoretical.
977
00:41:14,480 --> 00:41:19,520
IT ticket triage end to end with Copilot Studio routing, Azure AI Search grounding,
978
00:41:19,520 --> 00:41:23,640
MCP tools for actions, and Power Automate doing the deterministic work.
979
00:41:23,640 --> 00:41:27,040
Demo architecture one: IT ticket triage, end to end.
980
00:41:27,040 --> 00:41:31,800
Here's the end-to-end demo architecture that makes executives stop asking "so it's chat?" and
981
00:41:31,800 --> 00:41:34,160
start asking "when can this hit production?"
982
00:41:34,160 --> 00:41:38,160
The flow is intentionally boring because boring is how enterprise systems survive.
983
00:41:38,160 --> 00:41:43,360
Start with the user's issue coming in through one channel. Pick Teams or a portal first; don't do all of them.
984
00:41:43,360 --> 00:41:46,160
Channel sprawl is just topic sprawl with better UI.
985
00:41:46,160 --> 00:41:52,840
The user types free text: "VPN died," "Outlook won't send," "can't access SharePoint," whatever.
986
00:41:52,840 --> 00:41:55,800
The first step is intent classification in Copilot Studio.
987
00:41:55,800 --> 00:41:59,000
This is where your 10 to 15 intents actually earn their keep.
988
00:41:59,000 --> 00:42:03,920
The agent selects an intent and immediately applies the containment boundary tied to that intent.
989
00:42:03,920 --> 00:42:06,520
Then the agent enriches context. Not everything.
990
00:42:06,520 --> 00:42:11,520
Minimum viable context: user identity, device compliance state if relevant, and service context
991
00:42:11,520 --> 00:42:12,840
like known outages.
992
00:42:12,840 --> 00:42:15,480
This enrichment should come from predictable sources.
993
00:42:15,480 --> 00:42:19,880
Entra attributes, endpoint management signals, and the ITSM ticket history.
994
00:42:19,880 --> 00:42:23,920
If the organization doesn't have those sources well defined, the demo still works.
995
00:42:23,920 --> 00:42:28,200
It just surfaces the real constraint: your operations are not computable yet.
996
00:42:28,200 --> 00:42:32,760
Now the fork in the road: resolve, route, or create. Resolve means the agent can safely close
997
00:42:32,760 --> 00:42:33,760
the loop.
998
00:42:33,760 --> 00:42:36,400
This is where the grounded knowledge path triggers.
999
00:42:36,400 --> 00:42:39,880
It retrieves a runbook or known issue article and answers with citations.
1000
00:42:39,880 --> 00:42:44,320
If the fix requires a deterministic step, like triggering a password reset workflow,
1001
00:42:44,320 --> 00:42:46,400
that step should not be done by reasoning.
1002
00:42:46,400 --> 00:42:50,480
It should be done by a tool call. In this demo, Power Automate handles that deterministic
1003
00:42:50,480 --> 00:42:51,480
execution.
1004
00:42:51,480 --> 00:42:55,600
The agent proposes the action, confirms with the user at the right boundary, then Power
1005
00:42:55,600 --> 00:42:57,640
Automate executes and logs.
1006
00:42:57,640 --> 00:43:00,800
Route means the agent can't solve, but it can route cleanly.
1007
00:43:00,800 --> 00:43:02,280
The output isn't a transcript.
1008
00:43:02,280 --> 00:43:08,000
It's a payload: intent, impacted service, urgency, device state, and what was already attempted.
1009
00:43:08,000 --> 00:43:11,280
That payload becomes a ticket description plus structured fields.
1010
00:43:11,280 --> 00:43:13,400
The human gets a case file, not a chat log.
1011
00:43:13,400 --> 00:43:14,680
That's what reduces SLA.
1012
00:43:14,680 --> 00:43:15,960
The human doesn't re-triage.
1013
00:43:15,960 --> 00:43:17,880
They start where the agent left off.
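The routing payload is small enough to pin down as a typed record. The sketch below assumes hypothetical names (`TriagePayload`, `to_ticket`) and a generic ticket shape with a description plus structured fields. The actual field names depend on the ITSM backend.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class TriagePayload:
    intent: str
    impacted_service: str
    urgency: str
    device_state: str
    already_attempted: list = field(default_factory=list)


def to_ticket(payload):
    """Turn the payload into a ticket description plus structured fields."""
    attempted = "; ".join(payload.already_attempted) or "nothing yet"
    description = (
        f"{payload.intent} affecting {payload.impacted_service} "
        f"(urgency: {payload.urgency}, device: {payload.device_state}). "
        f"Already attempted: {attempted}."
    )
    return {"description": description, "fields": asdict(payload)}
```

The structured fields drive queue assignment, while the description gives the human the case file in one sentence.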
1014
00:43:17,880 --> 00:43:20,480
Create means you must open a ticket no matter what.
1015
00:43:20,480 --> 00:43:23,960
Policy requires it, access requires it, or the user insists.
1016
00:43:23,960 --> 00:43:28,320
The agent still adds value by making the ticket structured and pre-classified.
1017
00:43:28,320 --> 00:43:32,000
That's the difference between "we deployed Copilot" and "we reduced backlog."
1018
00:43:32,000 --> 00:43:36,240
Now where each product fits. Copilot Studio drives the orchestration and routing.
1019
00:43:36,240 --> 00:43:37,520
Topics are just the front door.
1020
00:43:37,520 --> 00:43:39,480
The core logic is the decision loop.
1021
00:43:39,480 --> 00:43:44,160
Classify, enrich, retrieve, propose, confirm, execute, verify, hand off.
1022
00:43:44,160 --> 00:43:45,760
Power Automate is the system's muscle.
1023
00:43:45,760 --> 00:43:49,480
It handles the deterministic steps you never want the model inventing.
1024
00:43:49,480 --> 00:43:55,800
Create ticket, update ticket, assign queue, notify user, write to audit log, post to teams,
1025
00:43:55,800 --> 00:43:56,880
and trigger approvals.
1026
00:43:56,880 --> 00:44:00,000
If it's an "if this, then that" action, Power Automate does it.
1027
00:44:00,000 --> 00:44:02,920
The agent decides when to call it, not how to rewrite it.
1028
00:44:02,920 --> 00:44:08,400
The ITSM backend, ServiceNow, Jira Service Management, whatever, remains the system of record.
1029
00:44:08,400 --> 00:44:09,400
That matters.
1030
00:44:09,400 --> 00:44:11,160
You are not replacing ITSM.
1031
00:44:11,160 --> 00:44:14,920
You're improving the front end decision quality and reducing the human time spent on intake,
1032
00:44:14,920 --> 00:44:16,400
classification, and back-and-forth.
1033
00:44:16,400 --> 00:44:20,480
If someone tries to rebuild ITSM with agents, the demo should fail on purpose.
1034
00:44:20,480 --> 00:44:21,720
Because that's how programs die.
1035
00:44:21,720 --> 00:44:23,400
Now layer in MCP where it's useful.
1036
00:44:23,400 --> 00:44:28,840
In the demo, MCP provides standardized tools to interact with ITSM and status sources.
1037
00:44:28,840 --> 00:44:29,840
Read tools first.
1038
00:44:29,840 --> 00:44:31,640
List open incidents for this user.
1039
00:44:31,640 --> 00:44:33,080
Check service status.
1040
00:44:33,080 --> 00:44:35,480
Retrieve ticket templates, fetch routing groups.
1041
00:44:35,480 --> 00:44:38,880
Write tools only where you've already defined approval and rollback.
1042
00:44:38,880 --> 00:44:41,600
Create ticket, update ticket, request access.
1043
00:44:41,600 --> 00:44:44,200
The point is tool predictability, not novelty.
1044
00:44:44,200 --> 00:44:46,120
Success criteria are not subjective.
1045
00:44:46,120 --> 00:44:47,120
Containment rate.
1046
00:44:47,120 --> 00:44:50,080
How many interactions resolved without ticket creation?
1047
00:44:50,080 --> 00:44:55,120
That's deflection. Resolution time: for the interactions the agent touches, does cycle time drop?
1048
00:44:55,120 --> 00:44:57,800
That's SLA impact. Escalation reduction:
1049
00:44:57,800 --> 00:45:00,600
Fewer wrong handoffs, fewer ping pong assignments.
1050
00:45:00,600 --> 00:45:02,120
That's operational stability.
1051
00:45:02,120 --> 00:45:03,440
And you instrument it.
1052
00:45:03,440 --> 00:45:07,920
Every run logs detected intent, confidence, retrieved sources, tools invoked, execution
1053
00:45:07,920 --> 00:45:10,760
status, escalation reason, and final outcome.
1054
00:45:10,760 --> 00:45:12,720
If you can't answer "why did it do that?"
1055
00:45:12,720 --> 00:45:13,720
You don't have an agent.
1056
00:45:13,720 --> 00:45:14,840
You have a magic trick.
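The per-run log and the containment metric fit together: if every run records its final outcome, containment rate is a one-line aggregate. A minimal sketch with assumed names. `RunLog` mirrors the fields listed above, and the outcome labels `resolved`, `routed`, and `ticket_created` are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunLog:
    detected_intent: str
    confidence: float
    retrieved_sources: list
    tools_invoked: list
    execution_status: str
    escalation_reason: Optional[str]
    final_outcome: str  # assumed labels: "resolved", "routed", "ticket_created"


def containment_rate(runs):
    """Share of interactions resolved without creating a ticket."""
    if not runs:
        return 0.0
    contained = sum(1 for r in runs if r.final_outcome == "resolved")
    return contained / len(runs)
```

Because the same record holds sources, tools, and the escalation reason, it also answers "why did it do that?" for any single run.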
1057
00:45:14,840 --> 00:45:18,760
The best part of this demo is you can run it with a small pilot group in week two and
1058
00:45:18,760 --> 00:45:20,160
you can improve it daily.
1059
00:45:20,160 --> 00:45:24,680
You'll discover missing knowledge coverage, ambiguous intents and tool errors immediately.
1060
00:45:24,680 --> 00:45:25,680
That's not failure.
1061
00:45:25,680 --> 00:45:27,560
That's the system finally telling the truth.
1062
00:45:27,560 --> 00:45:28,560
And here's the transition.
1063
00:45:28,560 --> 00:45:32,320
Once you can triage end to end, the next demo isn't about tickets.
1064
00:45:32,320 --> 00:45:34,040
It's about trust.
1065
00:45:34,040 --> 00:45:37,440
Grounded policy answers with evidence or nothing.
1066
00:45:37,440 --> 00:45:39,120
Demo architecture two.
1067
00:45:39,120 --> 00:45:40,920
Grounded policy answers with evidence.
1068
00:45:40,920 --> 00:45:43,000
The second demo exists for one reason.
1069
00:45:43,000 --> 00:45:46,400
Policy answers are where hallucinations become career limiting.
1070
00:45:46,400 --> 00:45:50,400
IT and HR questions feel harmless until an agent confidently tells someone they're allowed
1071
00:45:50,400 --> 00:45:52,840
to do something they're explicitly not allowed to do.
1072
00:45:52,840 --> 00:45:54,280
Or it cites the wrong clause.
1073
00:45:54,280 --> 00:45:59,000
Or it mixes two versions of the same policy and produces a third policy that never existed.
1074
00:45:59,000 --> 00:46:00,880
Nobody audits the chat transcript for tone.
1075
00:46:00,880 --> 00:46:01,880
They audit it for harm.
1076
00:46:01,880 --> 00:46:05,520
So this demo is built around the highest risk question types.
1077
00:46:05,520 --> 00:46:08,960
SOPs, runbooks, compliance rules, and internal policy.
1078
00:46:08,960 --> 00:46:12,240
The stuff people ask in a hurry, copy into emails, and then treat as truth.
1079
00:46:12,240 --> 00:46:14,280
The architecture is intentionally strict.
1080
00:46:14,280 --> 00:46:17,800
The user asks, "Can contractors store customer data in OneDrive?"
1081
00:46:17,800 --> 00:46:21,360
Or, "What's the process for requesting elevated access?"
1082
00:46:21,360 --> 00:46:26,360
Or, "Are we allowed to use personal devices for M365?"
1083
00:46:26,360 --> 00:46:28,040
The agent's first behavior is not to answer.
1084
00:46:28,040 --> 00:46:30,880
It's to classify the question type and set the response mode.
1085
00:46:30,880 --> 00:46:34,240
If it's policy or compliance, the agent goes into evidence mode.
1086
00:46:34,240 --> 00:46:37,560
Retrieve first, cite always, and refuse to speculate.
1087
00:46:37,560 --> 00:46:41,440
Now the backbone: Azure AI Search as the source of truth. Not SharePoint search,
1088
00:46:41,440 --> 00:46:43,240
not "I think I saw a doc."
1089
00:46:43,240 --> 00:46:48,280
An Azure AI Search index, designed for policy retrieval, chunked by atomic sections,
1090
00:46:48,280 --> 00:46:53,680
tagged with metadata like policy domain, audience, region, and last review date,
1091
00:46:53,680 --> 00:46:57,960
and security trimmed so the user only retrieves what they're allowed to read.
1092
00:46:57,960 --> 00:47:01,040
The retrieval step pulls the top chunks that match the question,
1093
00:47:01,040 --> 00:47:02,680
but the key design constraint is this.
1094
00:47:02,680 --> 00:47:05,200
The agent can only answer from retrieved content.
1095
00:47:05,200 --> 00:47:07,400
It can paraphrase, it can compress, it can explain.
1096
00:47:07,400 --> 00:47:10,280
It cannot invent. That's how you get audit-ready responses.
1097
00:47:10,280 --> 00:47:13,440
The output format is also strict because format is a control mechanism.
1098
00:47:13,440 --> 00:47:17,400
The response is short answer, source, next action, escalation path.
1099
00:47:17,400 --> 00:47:19,200
Short answer means one paragraph max.
1100
00:47:19,200 --> 00:47:23,080
If the agent needs five paragraphs, it doesn't understand the policy boundary well enough,
1101
00:47:23,080 --> 00:47:24,880
or the policy itself is ambiguous.
1102
00:47:24,880 --> 00:47:28,960
Either way, long answers are where the model starts "helping."
1103
00:47:28,960 --> 00:47:33,640
Source means the exact document and section, with a link if your environment allows it.
1104
00:47:33,640 --> 00:47:37,520
If there are multiple sources, the agent lists them explicitly and calls out the conflict.
1105
00:47:37,520 --> 00:47:39,040
It does not blend them.
1106
00:47:39,040 --> 00:47:42,480
Next action means the operational step the user should take.
1107
00:47:42,480 --> 00:47:44,320
Submit request via this form.
1108
00:47:44,320 --> 00:47:49,240
Open a ticket in this category, use this approved storage location, or escalate to security
1109
00:47:49,240 --> 00:47:50,240
ops.
1110
00:47:50,240 --> 00:47:52,200
Policies without next actions don't reduce work.
1111
00:47:52,200 --> 00:47:54,040
They create more meetings.
1112
00:47:54,040 --> 00:47:56,440
Escalation path is the final safety valve.
1113
00:47:56,440 --> 00:48:01,360
If the question involves exceptions, regulatory jurisdiction, or privileged access, the agent
1114
00:48:01,360 --> 00:48:04,760
routes. It doesn't negotiate policy exceptions in chat.
1115
00:48:04,760 --> 00:48:09,280
Now the inevitable reality: policy conflicts. You will have them; every enterprise does.
1116
00:48:09,280 --> 00:48:14,360
Two docs disagree, one is newer, one is unofficial, one has the right title but the wrong audience.
1117
00:48:14,360 --> 00:48:18,360
The system needs a conflict strategy that doesn't rely on "trust the model."
1118
00:48:18,360 --> 00:48:20,720
So the demo includes conflict handling rules.
1119
00:48:20,720 --> 00:48:25,280
If two policies conflict, the agent answers "conflict detected," cites both, and escalates
1120
00:48:25,280 --> 00:48:26,520
to the policy owner.
1121
00:48:26,520 --> 00:48:31,480
That escalation payload includes the retrieved chunks and the reason the system flagged ambiguity.
1122
00:48:31,480 --> 00:48:33,400
You don't hide the mess; you surface it.
1123
00:48:33,400 --> 00:48:38,000
If the doc is outdated, the agent says it's outdated, cites the last reviewed date, and escalates.
1124
00:48:38,000 --> 00:48:41,760
Stale truth is not truth. And if the index doesn't contain the answer, the agent refuses:
1125
00:48:41,760 --> 00:48:42,760
no source, no answer.
1126
00:48:42,760 --> 00:48:44,880
The refusal is not "I'm sorry."
1127
00:48:44,880 --> 00:48:46,160
It's operational.
1128
00:48:46,160 --> 00:48:48,280
"I can't find an approved source.
1129
00:48:48,280 --> 00:48:51,760
I can open a ticket to the policy owner and include your question."
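A minimal sketch of the evidence-mode rules described above, in plain Python rather than any Copilot Studio or Azure AI Search API. The `Chunk` fields, the `ruling` label used to detect disagreement, and the one-year `MAX_AGE` window are assumptions made for illustration; the episode does not say how conflicts or staleness are detected. The "next action" field of the response format is left out to keep the sketch short.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Chunk:
    doc: str             # document title
    section: str         # atomic section the chunk came from
    text: str            # retrieved passage
    last_reviewed: date  # "last review date" metadata
    ruling: str          # hypothetical normalized stance, e.g. "allowed" / "prohibited"

MAX_AGE = timedelta(days=365)  # assumed review window, not from the episode

REFUSAL = ("I can't find an approved source. I can open a ticket to the "
           "policy owner and include your question.")

def answer_policy(chunks, today):
    """Build a structured response from retrieved chunks only."""
    sources = [f"{c.doc}, {c.section}" for c in chunks]
    if not chunks:  # no source, no answer
        return {"status": "refused", "short_answer": REFUSAL,
                "sources": [], "escalate_to": "policy owner"}
    if len({c.ruling for c in chunks}) > 1:  # sources disagree: surface it, don't blend
        return {"status": "conflict", "short_answer": "Conflict detected.",
                "sources": sources, "escalate_to": "policy owner"}
    stale = [c for c in chunks if today - c.last_reviewed > MAX_AGE]
    if stale:  # stale truth is not truth
        oldest = min(c.last_reviewed for c in stale)
        return {"status": "stale",
                "short_answer": f"Source is outdated (last reviewed {oldest.isoformat()}).",
                "sources": sources, "escalate_to": "policy owner"}
    return {"status": "answered", "short_answer": chunks[0].text,
            "sources": sources, "escalate_to": None}
```

Every branch returns the same shape, so the escalation payload and the audit trail can be produced from one structure regardless of outcome.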
1130
00:48:51,760 --> 00:48:55,080
Now, measuring accuracy, because this is where most teams lie to themselves.
1131
00:48:55,080 --> 00:48:59,640
You build an evaluation set, a fixed list of real policy questions that matter.
1132
00:48:59,640 --> 00:49:04,640
Then you score failures with a taxonomy: missing doc, wrong doc, wrong inference, stale doc, or conflict.
1133
00:49:04,640 --> 00:49:08,240
That taxonomy matters because each failure has a different fix.
1134
00:49:08,240 --> 00:49:12,520
Missing doc is content work, wrong doc is metadata or chunking, wrong inference is response
1135
00:49:12,520 --> 00:49:13,520
constraints.
1136
00:49:13,520 --> 00:49:17,440
Stale doc is lifecycle governance. And the demo closes with the proof point your stakeholders
1137
00:49:17,440 --> 00:49:19,000
actually care about.
1138
00:49:19,000 --> 00:49:20,720
You can show the evidence trail.
1139
00:49:20,720 --> 00:49:23,560
Every answer has citations, every escalation has a reason.
1140
00:49:23,560 --> 00:49:27,800
Every "I don't know" becomes a backlog item to improve knowledge coverage.
1141
00:49:27,800 --> 00:49:31,640
That's the difference between "Copilot answered the question" and "the organization can trust
1142
00:49:31,640 --> 00:49:33,040
what it answered."
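The failure taxonomy from this demo can be sketched as a small scoring helper. This is an illustration, not tooling shown in the episode: the `score` function and its input shape are assumptions, and the fix listed for `CONFLICT` is a guess, since the episode names a fix for only the other four categories.

```python
from collections import Counter
from enum import Enum

class Failure(Enum):
    MISSING_DOC = "missing doc"
    WRONG_DOC = "wrong doc"
    WRONG_INFERENCE = "wrong inference"
    STALE_DOC = "stale doc"
    CONFLICT = "conflict"

# Each failure type points at a different kind of fix.
FIX = {
    Failure.MISSING_DOC: "content work",
    Failure.WRONG_DOC: "metadata or chunking",
    Failure.WRONG_INFERENCE: "response constraints",
    Failure.STALE_DOC: "lifecycle governance",
    Failure.CONFLICT: "policy owner decision",  # assumed; not stated in the episode
}

def score(results):
    """results: list of (question, Failure or None); None means the answer passed."""
    failures = [f for _, f in results if f is not None]
    accuracy = 1 - len(failures) / len(results) if results else 0.0
    backlog = Counter(FIX[f] for f in failures)  # work items grouped by fix type
    return accuracy, backlog
```

Grouping failures by fix rather than by question is the point of the taxonomy: the backlog tells you which team has work to do.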
1143
00:49:33,040 --> 00:49:37,080
Demo architecture three: approvals via Teams adaptive cards.
1144
00:49:37,080 --> 00:49:41,720
This third demo is where the agentic workforce stops sounding like marketing and starts looking
1145
00:49:41,720 --> 00:49:46,320
like a control system because approvals are the moment an agent either earns trust or
1146
00:49:46,320 --> 00:49:47,880
becomes a liability.
1147
00:49:47,880 --> 00:49:52,120
Most organizations try to handle risk with a generic human-in-the-loop rule.
1148
00:49:52,120 --> 00:49:53,520
"Someone should approve it."
1149
00:49:53,520 --> 00:49:55,640
That's not control. That's delay.
1150
00:49:55,640 --> 00:49:57,560
A real enterprise pattern is tighter.
1151
00:49:57,560 --> 00:50:02,320
The agent runs everything it can deterministically, then it pauses only at decision boundaries that
1152
00:50:02,320 --> 00:50:05,280
are irreversible, privileged, or audit-relevant.
1153
00:50:05,280 --> 00:50:08,240
That boundary becomes visible through adaptive cards in Teams.
1154
00:50:08,240 --> 00:50:12,160
And yes, Teams is the right place for this, not because it's trendy, but because it's where
1155
00:50:12,160 --> 00:50:14,600
approvals already happen in the real world.
1156
00:50:14,600 --> 00:50:16,800
Managers, service owners, security reviewers.
1157
00:50:16,800 --> 00:50:18,200
They don't want to read paragraphs.
1158
00:50:18,200 --> 00:50:20,200
They want to click a decision and move on.
1159
00:50:20,200 --> 00:50:21,960
So the demo flow starts with detection.
1160
00:50:21,960 --> 00:50:25,840
The user asks for something that crosses a risk threshold: grant access to this SharePoint
1161
00:50:25,840 --> 00:50:27,240
site.
1162
00:50:27,240 --> 00:50:28,240
Approve an exception.
1163
00:50:28,240 --> 00:50:30,200
Reset MFA for a user.
1164
00:50:30,200 --> 00:50:32,040
Create a mailbox delegation.
1165
00:50:32,040 --> 00:50:33,480
Approve a spend request.
1166
00:50:33,480 --> 00:50:34,840
Approve a change.
1167
00:50:34,840 --> 00:50:36,960
It doesn't matter which domain you pick.
1168
00:50:36,960 --> 00:50:39,200
What matters is the shape of the workflow.
1169
00:50:39,200 --> 00:50:41,880
Request, validate, approve, execute, log.
1170
00:50:41,880 --> 00:50:45,280
Step one, the agent classifies the intent and evaluates risk.
1171
00:50:45,280 --> 00:50:46,280
This is not a vibe check.
1172
00:50:46,280 --> 00:50:47,280
It's a rules check.
1173
00:50:47,280 --> 00:50:51,200
If the intent maps to a write action, or touches sensitive data, or changes permissions,
1174
00:50:51,200 --> 00:50:52,960
the agent flips into approval mode.
1175
00:50:52,960 --> 00:50:56,720
It gathers the minimum context required to make the decision reviewable.
1176
00:50:56,720 --> 00:50:58,320
Who is requesting?
1177
00:50:58,320 --> 00:50:59,840
What resource is being changed?
1178
00:50:59,840 --> 00:51:00,840
Why?
1179
00:51:00,840 --> 00:51:01,840
For how long?
1180
00:51:01,840 --> 00:51:02,840
And what policy applies?
1181
00:51:02,840 --> 00:51:07,040
And it retrieves the relevant policy source because approvals without policy context
1182
00:51:07,040 --> 00:51:09,800
just become whoever clicks first wins.
1183
00:51:09,800 --> 00:51:12,360
Step two, the agent generates the approval payload.
1184
00:51:12,360 --> 00:51:14,560
This is a structured object, not a narrative.
1185
00:51:14,560 --> 00:51:19,000
Requester, target resource, requested action, scope, duration, justification, evidence
1186
00:51:19,000 --> 00:51:22,840
link and the downstream action that will execute if approved.
1187
00:51:22,840 --> 00:51:25,840
The agent also includes a "deny reason required" field.
1188
00:51:25,840 --> 00:51:28,080
Denials without reasons create shadow workflows.
1189
00:51:28,080 --> 00:51:30,360
People just resubmit until someone approves.
1190
00:51:30,360 --> 00:51:31,680
Entropy wins.
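The approval payload and the deny-reason rule can be sketched as two small data classes. The field names follow the list given above; the class names, the three outcome strings, and the validation in `__post_init__` are assumptions for illustration, not a Power Automate or adaptive card schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ApprovalPayload:
    requester: str
    target_resource: str
    requested_action: str
    scope: str
    duration: str
    justification: str
    evidence_link: str      # link to the policy source that applies
    downstream_action: str  # what executes if approved

OUTCOMES = {"approve", "deny", "request_more_info"}

@dataclass(frozen=True)
class Decision:
    payload: ApprovalPayload  # the exact payload the approver saw
    approver: str
    outcome: str
    reason: Optional[str] = None

    def __post_init__(self):
        if self.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome!r}")
        # A denial without a reason is rejected, so it cannot be logged silently.
        if self.outcome == "deny" and not (self.reason or "").strip():
            raise ValueError("deny reason required")
```

Both classes are frozen so that the record of what was approved cannot be edited after the decision is made.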
1191
00:51:31,680 --> 00:51:34,880
Step three, a Teams adaptive card is posted to the approver.
1192
00:51:34,880 --> 00:51:35,880
The card format matters.
1193
00:51:35,880 --> 00:51:39,600
It should be short enough that the approver can decide in 10 seconds, but complete enough
1194
00:51:39,600 --> 00:51:41,360
that they don't need a follow-up meeting.
1195
00:51:41,360 --> 00:51:46,160
So the card contains the action summary in one line, the justification in one sentence,
1196
00:51:46,160 --> 00:51:48,920
the policy citation as a link, and three buttons.
1197
00:51:48,920 --> 00:51:51,720
Approve, deny, and request more info.
1198
00:51:51,720 --> 00:51:53,480
That third button is not politeness.
1199
00:51:53,480 --> 00:51:58,600
It is how you prevent deny from becoming the default because the approver lacked context.
1200
00:51:58,600 --> 00:52:00,960
Now the important part, the card is not just the UI.
1201
00:52:00,960 --> 00:52:05,160
It is the enforcement mechanism, because clicking approve triggers a deterministic workflow
1202
00:52:05,160 --> 00:52:07,280
that is version controlled and logged.
1203
00:52:07,280 --> 00:52:09,400
This is where Power Automate does the heavy lifting.
1204
00:52:09,400 --> 00:52:13,000
It captures the approval decision, stamps who approved, when they approved, and the exact
1205
00:52:13,000 --> 00:52:17,160
payload they approved, and then executes the write action through the approved tool path.
1206
00:52:17,160 --> 00:52:18,280
No ad hoc calls.
1207
00:52:18,280 --> 00:52:20,760
No "agent decided to improvise."
1208
00:52:20,760 --> 00:52:25,200
If the approval is denied, the workflow logs the reason and routes it back to the requester
1209
00:52:25,200 --> 00:52:26,360
with the next step.
1210
00:52:26,360 --> 00:52:29,400
What to change, who to contact or what policy blocked it.
1211
00:52:29,400 --> 00:52:32,520
That reduces rework and stops the agent from becoming a dead end.
1212
00:52:32,520 --> 00:52:37,160
If the approver requests more info, the agent re-engages the requester with one targeted question,
1213
00:52:37,160 --> 00:52:39,360
updates the payload and re-issues the card.
1214
00:52:39,360 --> 00:52:40,360
No long chat.
1215
00:52:40,360 --> 00:52:42,040
Just fill the missing field and continue.
1216
00:52:42,040 --> 00:52:45,480
Now the guardrails. You need a risk threshold model. Not complex.
1217
00:52:45,480 --> 00:52:46,880
Three tiers.
1218
00:52:46,880 --> 00:52:47,880
Low risk.
1219
00:52:47,880 --> 00:52:52,160
Read only actions, status checks, knowledge retrieval, no approvals.
1220
00:52:52,160 --> 00:52:53,160
Medium risk.
1221
00:52:53,160 --> 00:52:56,800
Write actions that are reversible and low impact, like creating a ticket or updating a
1222
00:52:56,800 --> 00:52:58,600
non-sensitive field.
1223
00:52:58,600 --> 00:52:59,600
Optional approvals.
1224
00:52:59,600 --> 00:53:00,600
High risk.
1225
00:53:00,600 --> 00:53:01,600
Access changes.
1226
00:53:01,600 --> 00:53:02,600
Privileged changes.
1227
00:53:02,600 --> 00:53:03,600
Financial actions.
1228
00:53:03,600 --> 00:53:05,000
External communication.
1229
00:53:05,000 --> 00:53:07,160
Or anything that changes identity.
1230
00:53:07,160 --> 00:53:08,160
Mandatory approvals.
1231
00:53:08,160 --> 00:53:10,840
And you enforce this in orchestration, not in documentation.
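One way to read "enforce this in orchestration" is a classification function plus a gate that runs before any tool call. The sketch below is an assumption-laden illustration: the category names, the `classify` signature, and the choice to treat irreversible writes as high risk are not from the episode, which only describes the three tiers in prose.

```python
from enum import Enum

class Tier(Enum):
    LOW = "no approval"           # read-only: status checks, knowledge retrieval
    MEDIUM = "optional approval"  # reversible, low-impact writes
    HIGH = "mandatory approval"   # access, privilege, financial, external, identity

# Hypothetical category labels for the high-risk list in the episode.
HIGH_RISK = frozenset({
    "access_change", "privileged_change", "financial_action",
    "external_communication", "identity_change",
})

def classify(is_write, reversible=True, categories=frozenset()):
    """Map an intended action to a risk tier using rules, not model judgment."""
    if HIGH_RISK & set(categories):
        return Tier.HIGH
    if not is_write:
        return Tier.LOW
    # Assumption: an irreversible write is treated as high risk.
    return Tier.MEDIUM if reversible else Tier.HIGH

def may_execute(tier, approved):
    """Gate checked by the orchestrator before the tool call is made."""
    return tier is not Tier.HIGH or approved
```

Because the gate is code, changing a tier boundary is a reviewed change with a history, not an edit to a slide.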
1232
00:53:10,840 --> 00:53:12,600
The operational effect is immediate.
1233
00:53:12,600 --> 00:53:16,200
Cycle time drops because the agent does the prep work and routes the decision to the correct
1234
00:53:16,200 --> 00:53:17,680
human with the right context.
1235
00:53:17,680 --> 00:53:22,640
Risk drops because write actions only execute after an explicit, logged decision.
1236
00:53:22,640 --> 00:53:25,760
And adoption rises because users see outcomes, not chat.
1237
00:53:25,760 --> 00:53:28,160
This demo also closes an early open loop.
1238
00:53:28,160 --> 00:53:32,560
The design choice that prevents ghost agents later is the same choice that makes approvals
1239
00:53:32,560 --> 00:53:33,560
work.
1240
00:53:33,560 --> 00:53:35,080
Put decision boundaries in the system.
1241
00:53:35,080 --> 00:53:36,080
Not in the slide deck.
1242
00:53:36,080 --> 00:53:37,880
Agent sprawl is predictable.
1243
00:53:37,880 --> 00:53:38,880
Design for it upfront.
1244
00:53:38,880 --> 00:53:40,880
Agent sprawl isn't a later problem.
1245
00:53:40,880 --> 00:53:45,760
It's the default outcome of giving people a new capability with no enforced boundary.
1246
00:53:45,760 --> 00:53:48,120
Most organizations think the risk is too few agents.
1247
00:53:48,120 --> 00:53:51,440
The real risk is too many because users don't want 50 agents.
1248
00:53:51,440 --> 00:53:55,200
They want five that work every time in the same places they already work.
1249
00:53:55,200 --> 00:53:58,080
When the ecosystem turns into a catalog nobody understands,
1250
00:53:58,080 --> 00:53:59,440
adoption doesn't fail loudly.
1251
00:53:59,440 --> 00:54:00,520
It just decays.
1252
00:54:00,520 --> 00:54:04,240
People go back to emailing the service desk and forwarding PDFs because it's faster than
1253
00:54:04,240 --> 00:54:06,600
deciding which agent might be correct.
1254
00:54:06,600 --> 00:54:09,200
And sprawl creates a second quieter problem.
1255
00:54:09,200 --> 00:54:10,200
Ghost agents.
1256
00:54:10,200 --> 00:54:14,280
Ghost agents are agents that nobody uses, nobody owns, and nobody remembers.
1257
00:54:14,280 --> 00:54:19,160
But they still exist, still connect to knowledge, still have tool permissions, and still generate
1258
00:54:19,160 --> 00:54:20,160
audit surface.
1259
00:54:20,160 --> 00:54:22,360
They're the perfect entropy generators.
1260
00:54:22,360 --> 00:54:24,960
Invisible most days, catastrophic on the wrong day.
1261
00:54:24,960 --> 00:54:29,200
So if leadership wants scale without liability, the program has to treat sprawl as inevitable
1262
00:54:29,200 --> 00:54:31,200
and design for it upfront.
1263
00:54:31,200 --> 00:54:32,840
Start with the uncomfortable truth.
1264
00:54:32,840 --> 00:54:35,720
Self-service agent creation is not democratization.
1265
00:54:35,720 --> 00:54:37,840
It is distributed risk creation.
1266
00:54:37,840 --> 00:54:42,600
And the platform will happily allow it because platforms optimize for capability, not
1267
00:54:42,600 --> 00:54:44,080
for your audit findings.
1268
00:54:44,080 --> 00:54:45,600
That distinction matters.
1269
00:54:45,600 --> 00:54:49,720
So the first control is a catalog discipline that feels boring and saves the program.
1270
00:54:49,720 --> 00:54:53,760
Every agent needs a name that describes an outcome, an owner that signs the outcome, a
1271
00:54:53,760 --> 00:54:58,400
business sponsor that signs the risk and a life cycle state that reflects reality.
1272
00:54:58,400 --> 00:55:01,080
Pilot, active, deprecated, retired.
1273
00:55:01,080 --> 00:55:04,040
Not "V1" and "V2" and "test final final."
1274
00:55:04,040 --> 00:55:05,120
Life cycle state isn't a label.
1275
00:55:05,120 --> 00:55:06,280
It's a policy trigger.
1276
00:55:06,280 --> 00:55:08,320
Pilot agents can't publish broadly.
1277
00:55:08,320 --> 00:55:10,160
Deprecated agents can't receive new users.
1278
00:55:10,160 --> 00:55:12,280
Retired agents lose tool access and knowledge bindings.
1279
00:55:12,280 --> 00:55:15,920
If those transitions don't remove capability, they aren't life cycle states.
1280
00:55:15,920 --> 00:55:17,840
They are stickers.
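A lifecycle state that actually removes capability can be sketched as a lookup the platform consults before acting. The state names come from the episode; the capability names and the `allowed` helper are hypothetical, and any revocation marked "assumed" below goes beyond what the episode states.

```python
from enum import Enum

class State(Enum):
    PILOT = "pilot"
    ACTIVE = "active"
    DEPRECATED = "deprecated"
    RETIRED = "retired"

# Capabilities removed in each state.
REVOKED = {
    State.PILOT: {"publish_broadly"},
    State.ACTIVE: set(),
    State.DEPRECATED: {"publish_broadly",    # assumed
                       "accept_new_users"},
    State.RETIRED: {"publish_broadly",       # assumed
                    "accept_new_users",      # assumed
                    "tool_access", "knowledge_bindings"},
}

def allowed(state, capability):
    """True if an agent in this lifecycle state may use the capability."""
    return capability not in REVOKED[state]
```

If a state transition does not change what this table returns, the state is only a label.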
1281
00:55:17,840 --> 00:55:21,320
Next, adopt a reuse strategy that prevents the "agent for everything" pattern.
1282
00:55:21,320 --> 00:55:23,120
The right reuse unit isn't the agent.
1283
00:55:23,120 --> 00:55:25,480
It's the tool.
1284
00:55:25,480 --> 00:55:26,640
Agents are experiences.
1285
00:55:26,640 --> 00:55:28,080
Tools are capabilities.
1286
00:55:28,080 --> 00:55:33,920
If the organization standardizes tools, ticket creation, status checks, identity lookup,
1287
00:55:33,920 --> 00:55:39,280
KB retrieval, teams can build small, scoped agents without reinventing integrations.
1288
00:55:39,280 --> 00:55:43,000
If the organization tries to standardize agents, it builds brittle monoliths that nobody
1289
00:55:43,000 --> 00:55:44,800
trusts and everybody forks.
1290
00:55:44,800 --> 00:55:46,880
Forking is how sprawl becomes permanent.
1291
00:55:46,880 --> 00:55:51,840
So the rule is: prefer reusable MCP endpoints and flows over reusable agents.
1292
00:55:51,840 --> 00:55:52,840
Build once.
1293
00:55:52,840 --> 00:55:53,920
Reuse everywhere.
1294
00:55:53,920 --> 00:55:55,360
Then control publishing.
1295
00:55:55,360 --> 00:55:58,280
This is the simplest governance posture that still allows innovation.
1296
00:55:58,280 --> 00:56:00,120
You can build but you can't publish.
1297
00:56:00,120 --> 00:56:02,240
People can prototype in a constrained environment.
1298
00:56:02,240 --> 00:56:03,520
They can test with themselves.
1299
00:56:03,520 --> 00:56:05,400
They can even share inside a small group.
1300
00:56:05,400 --> 00:56:09,240
But the moment an agent becomes enterprise facing, it enters a publishing
1301
00:56:09,240 --> 00:56:16,120
workflow with review gates: data sources, tools, write permissions, grounding strategy, telemetry,
1302
00:56:16,120 --> 00:56:17,920
and owner assignment.
1303
00:56:17,920 --> 00:56:20,720
If you don't separate build from publish, you'll never catch sprawl.
1304
00:56:20,720 --> 00:56:25,480
You'll only discover it when the CEO asks why there are three VPN helpers and none of them
1305
00:56:25,480 --> 00:56:26,480
agree.
1306
00:56:26,480 --> 00:56:28,640
Now define the retirement policy before you need it.
1307
00:56:28,640 --> 00:56:30,680
This is where programs usually lie to themselves.
1308
00:56:30,680 --> 00:56:32,080
We'll clean it up later.
1309
00:56:32,080 --> 00:56:34,120
They won't.
1310
00:56:34,120 --> 00:56:35,720
Retirement needs automatic triggers.
1311
00:56:35,720 --> 00:56:40,120
No usage over a time window, high escalation rates, repeated incorrect routing, missing
1312
00:56:40,120 --> 00:56:44,000
owner, missing sponsor or tool access that no longer matches the approved pattern.
1313
00:56:44,000 --> 00:56:49,120
When a trigger hits, the owner gets a review task: improve, merge, or retire.
1314
00:56:49,120 --> 00:56:50,840
And retirement must be real.
1315
00:56:50,840 --> 00:56:55,360
Remove it from discovery, revoke tool permissions, archive logs and keep the audit record.
1316
00:56:55,360 --> 00:56:58,680
And if you only hide the agent but leave the permissions, you didn't retire it.
1317
00:56:58,680 --> 00:57:00,120
You just made it harder to notice.
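The automatic retirement triggers listed above can be sketched as a check over an agent's catalog record. The `AgentRecord` fields and both thresholds are assumptions for illustration; the episode names the triggers but gives no numbers or data model.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

@dataclass
class AgentRecord:
    name: str
    owner: Optional[str]
    sponsor: Optional[str]
    sessions_in_window: int       # usage over the review window
    escalation_rate: float        # share of sessions that escalated, 0.0 to 1.0
    misroutes_in_window: int      # incorrect routings over the review window
    tools: FrozenSet[str]         # tools the agent can currently call
    approved_tools: FrozenSet[str]  # tools in its approved pattern

def retirement_triggers(agent, max_escalation_rate=0.4, max_misroutes=5) -> List[str]:
    """Return the triggers that fired; thresholds are placeholder values."""
    hits = []
    if agent.sessions_in_window == 0:
        hits.append("no usage")
    if agent.escalation_rate > max_escalation_rate:
        hits.append("high escalation rate")
    if agent.misroutes_in_window > max_misroutes:
        hits.append("repeated incorrect routing")
    if not agent.owner:
        hits.append("missing owner")
    if not agent.sponsor:
        hits.append("missing sponsor")
    if not agent.tools <= agent.approved_tools:
        hits.append("tool access outside approved pattern")
    return hits
```

A non-empty result would open the owner's review task (improve, merge, or retire) rather than retire the agent directly.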
1318
00:57:00,120 --> 00:57:04,360
Finally, the design choice that prevents topic sprawl from turning into agent sprawl:
1319
00:57:04,360 --> 00:57:06,720
create intents as an enterprise registry.
1320
00:57:06,720 --> 00:57:08,760
One intent taxonomy shared across agents.
1321
00:57:08,760 --> 00:57:12,880
If each team invents its own intent set, the ecosystem fragments immediately.
1322
00:57:12,880 --> 00:57:17,240
Password reset becomes account unlock, becomes login problem, becomes access issue.
1323
00:57:17,240 --> 00:57:19,760
And now routing quality collapses across every agent.
1324
00:57:19,760 --> 00:57:21,680
The user doesn't care which team built it.
1325
00:57:21,680 --> 00:57:23,840
They care that the system behaves consistently.
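A shared intent registry can be sketched as one mapping from the phrases teams use to a single canonical intent. The `IntentRegistry` class is hypothetical, and treating the labels from the example above as aliases of one intent is an assumption made for illustration.

```python
class IntentRegistry:
    """One enterprise-wide mapping from phrases to canonical intents."""

    def __init__(self):
        self._aliases = {}

    def register(self, intent, aliases=()):
        """Add a canonical intent and its aliases; reject conflicting aliases."""
        for phrase in (intent, *aliases):
            key = phrase.strip().lower()
            existing = self._aliases.get(key)
            if existing is not None and existing != intent:
                raise ValueError(f"{phrase!r} already maps to {existing!r}")
            self._aliases[key] = intent

    def resolve(self, phrase):
        """Return the canonical intent for a phrase, or None if unregistered."""
        return self._aliases.get(phrase.strip().lower())
```

Rejecting a second owner for the same phrase is what keeps two teams from routing the same request in different directions.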
1326
00:57:23,840 --> 00:57:27,320
So make intent a shared asset, make tools reusable, and make publishing gated.
1327
00:57:27,320 --> 00:57:29,320
That's how you get scale without panic.
1328
00:57:29,320 --> 00:57:31,840
Because the platform will not stop you from creating entropy.
1329
00:57:31,840 --> 00:57:35,360
You have to. Entra Agent ID: identity for non-humans.
1330
00:57:35,360 --> 00:57:40,720
Once agent sprawl becomes visible, most organizations reach for governance as paperwork: a spreadsheet,
1331
00:57:40,720 --> 00:57:44,200
a review meeting, a naming convention and a promise to be careful.
1332
00:57:44,200 --> 00:57:45,360
That is not governance.
1333
00:57:45,360 --> 00:57:47,080
That's documentation of drift.
1334
00:57:47,080 --> 00:57:51,280
The only governance that holds in an enterprise is enforced identity because identity is the
1335
00:57:51,280 --> 00:57:56,120
anchor point for permissions, conditional access, audit trails, and incident response.
1336
00:57:56,120 --> 00:58:00,880
Without it, your agent ecosystem is just anonymous tool chains acting on real systems.
1337
00:58:00,880 --> 00:58:03,640
So Entra Agent ID matters for one blunt reason.
1338
00:58:03,640 --> 00:58:06,280
Agents become actors. Actors need identities.
1339
00:58:06,280 --> 00:58:10,200
In Entra terms, an agent identity is the non-human principal that represents the agent
1340
00:58:10,200 --> 00:58:12,480
when it touches data or invokes tools.
1341
00:58:12,480 --> 00:58:15,560
It's the thing that answers the question auditors always ask.
1342
00:58:15,560 --> 00:58:17,560
And your teams always struggle to answer.
1343
00:58:17,560 --> 00:58:18,560
Who did what?
1344
00:58:18,560 --> 00:58:21,960
When, using what permissions, and under which policy controls?
1345
00:58:21,960 --> 00:58:24,200
If the answer is "the agent did it," you don't have an answer.
1346
00:58:24,200 --> 00:58:25,200
You have a story.
1347
00:58:25,200 --> 00:58:27,560
Entra agent ID turns that story into a record.
1348
00:58:27,560 --> 00:58:30,320
Now here's the part most organizations miss.
1349
00:58:30,320 --> 00:58:33,160
Agent ID isn't about whether a user can chat with the agent.
1350
00:58:33,160 --> 00:58:35,440
That's surface access.
1351
00:58:35,440 --> 00:58:38,600
Agent ID is about what the agent can do once it's engaged.
1352
00:58:38,600 --> 00:58:43,120
What data it can read, what systems it can call and what actions it can execute.
1353
00:58:43,120 --> 00:58:47,280
That distinction matters because most early rollouts overfocus on user access lists and
1354
00:58:47,280 --> 00:58:49,560
underfocus on agent capability boundaries.
1355
00:58:49,560 --> 00:58:52,120
Over time, user access stays roughly stable.
1356
00:58:52,120 --> 00:58:54,000
Agent capabilities always expand.
1357
00:58:54,000 --> 00:58:57,680
They expand because someone adds just one more connector, just one more tool, just one
1358
00:58:57,680 --> 00:59:00,560
more write action, and those exceptions accumulate.
1359
00:59:00,560 --> 00:59:02,200
Entropy always wins through exceptions.
1360
00:59:02,200 --> 00:59:06,000
So the first architectural law for agent ID is least privilege by default.
1361
00:59:06,000 --> 00:59:07,920
Not least privilege for users.
1362
00:59:07,920 --> 00:59:09,680
Least privilege for the agent as an actor.
1363
00:59:09,680 --> 00:59:13,720
That means the agent identity gets only the permissions needed for its defined outcome
1364
00:59:13,720 --> 00:59:16,600
in its defined scope, in its defined environment.
1365
00:59:16,600 --> 00:59:20,720
If the agent triages IT issues, it does not need permissions to create accounts.
1366
00:59:20,720 --> 00:59:24,200
If it answers policy questions, it does not need permissions to grant access.
1367
00:59:24,200 --> 00:59:29,040
If it can read device compliance state, it does not need the ability to update device configuration.
1368
00:59:29,040 --> 00:59:33,200
Read and write must stay separate identities or at least separate permission sets because
1369
00:59:33,200 --> 00:59:36,200
write permissions are how experiments become incidents.
1370
00:59:36,200 --> 00:59:39,120
Next, conditional access becomes your control surface.
1371
00:59:39,120 --> 00:59:42,000
Most people treat conditional access as user security.
1372
00:59:42,000 --> 00:59:46,440
In architectural terms, conditional access is an execution policy layer for identities.
1373
00:59:46,440 --> 00:59:51,480
If the agent identity exists, you can constrain when and where that identity can be used.
1374
00:59:51,480 --> 00:59:56,280
Restrict to compliant devices, restrict to trusted locations, restrict session behavior,
1375
00:59:56,280 --> 00:59:59,040
and block high-risk sign-in conditions.
1376
00:59:59,040 --> 01:00:02,200
This is how you stop toolchains from becoming shadow admins.
1377
01:00:02,200 --> 01:00:04,920
And you don't wait until after the first scary incident.
1378
01:00:04,920 --> 01:00:08,080
You design the baseline policy the same way you design the agent.
1379
01:00:08,080 --> 01:00:10,080
Start restrictive, then loosen with evidence.
1380
01:00:10,080 --> 01:00:14,040
Now the thing leadership actually cares about isn't the identity object, it's the audit
1381
01:00:14,040 --> 01:00:15,040
anchor.
1382
01:00:15,040 --> 01:00:19,120
Agent ID gives you a stable principal that shows up in logs across the stack.
1383
01:00:19,120 --> 01:00:24,720
Sign-in activity where applicable, Power Platform activity, tool invocation, approvals,
1384
01:00:24,720 --> 01:00:27,360
ticket creation, and downstream system changes.
1385
01:00:27,360 --> 01:00:31,160
When something goes wrong, you can trace the chain. Without that, you're debugging a ghost.
1386
01:00:31,160 --> 01:00:33,920
And this is where the escalation model becomes real.
1387
01:00:33,920 --> 01:00:37,240
Privileged actions require stronger boundaries than normal actions.
1388
01:00:37,240 --> 01:00:39,840
So your orchestration needs a privilege tier model.
1389
01:00:39,840 --> 01:00:42,640
Standard actions run under the base agent identity.
1390
01:00:42,640 --> 01:00:47,600
Privileged actions either require step-up authentication, explicit approvals, or a separate
1391
01:00:47,600 --> 01:00:52,320
privileged agent identity that can only be used through an approval workflow.
1392
01:00:52,320 --> 01:00:56,840
You don't let the same agent casually move from reading a runbook to changing access
1393
01:00:56,840 --> 01:00:57,840
controls.
1394
01:00:57,840 --> 01:00:59,160
That's not autonomy.
1395
01:00:59,160 --> 01:01:01,920
That's uncontrolled privilege drift.
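The privilege tier model described above can be sketched as a rule for choosing which identity an action runs under. This is plain Python, not an Entra API: the identity names and the `identity_for` function are hypothetical, and the sketch shows only the separate-identity option, not step-up authentication.

```python
from typing import Optional

BASE_IDENTITY = "agent-triage-base"        # hypothetical name
PRIVILEGED_IDENTITY = "agent-triage-priv"  # hypothetical name

def identity_for(action_is_privileged: bool, approval_id: Optional[str] = None) -> str:
    """Pick the identity an action runs under.

    Standard actions use the base identity. Privileged actions use the
    privileged identity, and only when a logged approval is attached.
    """
    if not action_is_privileged:
        return BASE_IDENTITY
    if not approval_id:
        raise PermissionError("privileged action requires a logged approval")
    return PRIVILEGED_IDENTITY
```

Keeping the two identities separate means the base identity never holds write permissions it could drift into using.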
1396
01:01:01,920 --> 01:01:06,120
Finally, agent ID is how you make ownership enforceable.
1397
01:01:06,120 --> 01:01:09,680
Earlier the system required every agent to have an owner and a sponsor.
1398
01:01:09,680 --> 01:01:12,440
Agent ID ties that to an identity life cycle.
1399
01:01:12,440 --> 01:01:16,840
When the owner changes roles, the sponsor changes, or the agent gets deprecated, you can
1400
01:01:16,840 --> 01:01:22,320
change access, rotate secrets, revoke permissions, and retire the identity.
1401
01:01:22,320 --> 01:01:26,360
That's what stops ghost agents from retaining power after everyone forgot they exist.
1402
01:01:26,360 --> 01:01:29,800
So the simple version is: Entra Agent ID makes agents accountable.
1403
01:01:29,800 --> 01:01:33,280
And in enterprise systems, accountability is the only thing that scales.
1404
01:01:33,280 --> 01:01:37,200
Because as soon as non-humans can act, you're no longer managing chatbots, you're managing
1405
01:01:37,200 --> 01:01:39,120
identities with tools.
1406
01:01:39,120 --> 01:01:40,920
Governance that doesn't kill innovation.
1407
01:01:40,920 --> 01:01:45,000
Most leaders hear governance and picture a committee that meets monthly to approve nothing.
1408
01:01:45,000 --> 01:01:46,000
That's not governance.
1409
01:01:46,000 --> 01:01:48,280
It's delay disguised as control.
1410
01:01:48,280 --> 01:01:52,080
Real governance is a design constraint that lets building happen fast while keeping the
1411
01:01:52,080 --> 01:01:53,360
blast radius small.
1412
01:01:53,360 --> 01:01:58,000
So the program starts with a posture that feels restrictive but actually accelerates delivery.
1413
01:01:58,000 --> 01:02:01,000
Retrieval-only first, then controlled write operations.
1414
01:02:01,000 --> 01:02:03,480
Retrieval only agents are where innovation should be cheap.
1415
01:02:03,480 --> 01:02:07,040
They read approved knowledge, cite sources, and escalate when they can't.
1416
01:02:07,040 --> 01:02:11,360
That gives you immediate deflection and reduces human workload without risking state changes
1417
01:02:11,360 --> 01:02:12,880
in downstream systems.
1418
01:02:12,880 --> 01:02:15,960
And it trains your organization on how to build with discipline.
1419
01:02:15,960 --> 01:02:20,480
Ownership, metadata, evaluation sets, and "no source, no answer."
1420
01:02:20,480 --> 01:02:23,320
If you can't govern retrieval, you have no business governing actions.
1421
01:02:23,320 --> 01:02:27,720
Then you introduce write operations as a gated capability, not a default feature.
1422
01:02:27,720 --> 01:02:31,520
Create ticket, update ticket, trigger an approval, send a templated email.
1423
01:02:31,520 --> 01:02:34,640
These are controlled actions with audit logs and rollbacks.
1424
01:02:34,640 --> 01:02:39,560
Anything beyond that, privilege changes, access grants, deletions, lives behind higher gates
1425
01:02:39,560 --> 01:02:41,200
or separate identities.
1426
01:02:41,200 --> 01:02:44,440
Autonomy expands only when observability proves it's safe.
1427
01:02:44,440 --> 01:02:47,400
Now the scaling mechanism is environment strategy.
1428
01:02:47,400 --> 01:02:51,560
Not because Power Platform needs more environments for fun, but because environments are your safety
1429
01:02:51,560 --> 01:02:52,560
lanes.
1430
01:02:52,560 --> 01:02:54,440
You need three lanes and they are not optional.
1431
01:02:54,440 --> 01:02:56,000
Lane one is personal productivity.
1432
01:02:56,000 --> 01:02:58,040
This is where experimentation lives.
1433
01:02:58,040 --> 01:02:59,400
People can build, test and learn.
1434
01:02:59,400 --> 01:03:00,760
They can't publish broadly.
1435
01:03:00,760 --> 01:03:02,080
Tool access is minimal.
1436
01:03:02,080 --> 01:03:03,360
Knowledge access is constrained.
1437
01:03:03,360 --> 01:03:06,200
The purpose is skill building without enterprise liability.
1438
01:03:06,200 --> 01:03:07,400
Lane two is departmental.
1439
01:03:07,400 --> 01:03:11,440
This is where teams solve real workflow problems with a bounded audience.
1440
01:03:11,440 --> 01:03:13,520
Publishing is limited to the department or group.
1441
01:03:13,520 --> 01:03:17,120
Tool access expands, but only to approved connectors and MCP tools.
1442
01:03:17,120 --> 01:03:19,840
This is where you prove adoption without creating global sprawl.
1443
01:03:19,840 --> 01:03:21,320
Lane three is business critical.
1444
01:03:21,320 --> 01:03:22,320
This is production.
1445
01:03:22,320 --> 01:03:23,320
ALM exists.
1446
01:03:23,320 --> 01:03:24,320
Dev, test,
1447
01:03:24,320 --> 01:03:25,320
and prod separation exists.
1448
01:03:25,320 --> 01:03:26,320
Owners exist.
1449
01:03:26,320 --> 01:03:27,320
Sponsors exist.
1450
01:03:27,320 --> 01:03:28,320
Audit logging is on.
1451
01:03:28,320 --> 01:03:29,320
Tool allow lists are enforced.
1452
01:03:29,320 --> 01:03:33,560
And if an agent touches sensitive data or performs write actions, it passes review gates
1453
01:03:33,560 --> 01:03:36,000
before anyone outside the build team can use it.
1454
01:03:36,000 --> 01:03:40,960
That lane structure is how you scale without turning every cool idea into enterprise risk.
1455
01:03:40,960 --> 01:03:45,160
Now for the part that makes or breaks governance: data boundary enforcement. DLP and connector
1456
01:03:45,160 --> 01:03:46,760
policies aren't paperwork.
1457
01:03:46,760 --> 01:03:52,720
They're the only thing standing between useful automation and accidental data exfiltration.
1458
01:03:52,720 --> 01:03:57,040
If a builder can casually connect an agent to a personal mailbox, a consumer storage
1459
01:03:57,040 --> 01:04:01,800
service and a high trust internal system in the same flow, you didn't build an agent platform,
1460
01:04:01,800 --> 01:04:05,160
you built a leakage machine. So you define connector boundaries by lane.
1461
01:04:05,160 --> 01:04:08,760
In personal productivity, block outbound and high-risk connectors.
1462
01:04:08,760 --> 01:04:14,000
In departmental, allow a curated set. In business critical, allow what's needed, but only through
1463
01:04:14,000 --> 01:04:18,520
approved tool surfaces with identities you can audit. Cross-boundary data movement becomes
1464
01:04:18,520 --> 01:04:23,120
a design decision, not a maker decision. Human in the loop also needs to be used correctly
1465
01:04:23,120 --> 01:04:24,120
here.
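The lane-based connector boundaries described above can be sketched as a default-deny lookup. This is a minimal illustration only, not the Power Platform DLP API: the lane labels and connector names are hypothetical, and in practice the rule would be enforced by platform policy rather than application code.

```python
# Hypothetical lane labels and connector names, for illustration only.
LANE_CONNECTOR_POLICY = {
    "personal": {"sharepoint_read"},
    "departmental": {"sharepoint_read", "teams", "itsm_read"},
    "business_critical": {"sharepoint_read", "teams", "itsm_read", "itsm_write"},
}


def connector_allowed(lane: str, connector: str) -> bool:
    """Default-deny: unknown lanes and unlisted connectors are blocked."""
    return connector in LANE_CONNECTOR_POLICY.get(lane, set())
```

The point of the default-deny shape is that a maker adding a new connector changes nothing until the lane policy itself is changed.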
1466
01:04:24,120 --> 01:04:26,240
The governance mistake is to force approvals everywhere.
1467
01:04:26,240 --> 01:04:28,000
That's how you kill adoption.
1468
01:04:28,000 --> 01:04:32,280
The correct placement is still the same: decision boundaries with risk. Approvals for write
1469
01:04:32,280 --> 01:04:37,160
actions above a threshold, approvals for privilege, approvals for external communications.
1470
01:04:37,160 --> 01:04:38,840
Everything else runs.
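As a rough sketch of that placement rule, approvals can be expressed as a single predicate over an action. The `Action` fields, the 0 to 100 risk score, and the threshold value are assumptions made for illustration; the episode does not specify how risk is scored.

```python
from dataclasses import dataclass

WRITE_RISK_THRESHOLD = 50  # assumed cut-off on a hypothetical 0-100 risk score


@dataclass(frozen=True)
class Action:
    kind: str                          # "read" or "write"
    risk_score: int = 0                # hypothetical score, higher is riskier
    touches_privilege: bool = False
    external_communication: bool = False


def needs_approval(action: Action) -> bool:
    """Approve only at decision boundaries; everything else runs."""
    if action.touches_privilege or action.external_communication:
        return True
    return action.kind == "write" and action.risk_score > WRITE_RISK_THRESHOLD
```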
1471
01:04:38,840 --> 01:04:42,360
And this is the principle that stops chaos without stopping builders.
1472
01:04:42,360 --> 01:04:45,520
You can build, but you can't publish. That posture sounds cynical.
1473
01:04:45,520 --> 01:04:47,200
It is. It's also true.
1474
01:04:47,200 --> 01:04:48,840
Builders don't need permission to explore.
1475
01:04:48,840 --> 01:04:53,360
They need a path to graduate their work into production without bypassing controls.
1476
01:04:53,360 --> 01:04:56,320
So the governance model gives them a clear graduation path.
1477
01:04:56,320 --> 01:04:57,720
Prove routing stability.
1478
01:04:57,720 --> 01:04:59,040
Prove grounded accuracy.
1479
01:04:59,040 --> 01:05:00,040
Prove tool discipline.
1480
01:05:00,040 --> 01:05:01,040
Prove logging.
1481
01:05:01,040 --> 01:05:02,040
Assign an owner.
1482
01:05:02,040 --> 01:05:03,040
Then publish.
1483
01:05:03,040 --> 01:05:04,800
The gates are explicit and the gates are fast.
1484
01:05:04,800 --> 01:05:07,320
If you hide the gates behind meetings, people route around them.
1485
01:05:07,320 --> 01:05:10,880
If you enforce the gates in design, people ship safely.
1486
01:05:10,880 --> 01:05:13,480
Governance that doesn't kill innovation is simple.
1487
01:05:13,480 --> 01:05:16,880
Constrain where agents can act, constrain what they can touch and make publishing the
1488
01:05:16,880 --> 01:05:18,360
only real checkpoint.
1489
01:05:18,360 --> 01:05:22,560
Then the system can scale because control that depends on memory always erodes.
1490
01:05:22,560 --> 01:05:24,800
Control that depends on design holds.
1491
01:05:24,800 --> 01:05:25,800
Observability.
1492
01:05:25,800 --> 01:05:27,600
The flight recorder problem.
1493
01:05:27,600 --> 01:05:31,200
Governance without observability is just optimism with a meeting invite.
1494
01:05:31,200 --> 01:05:34,400
Most organizations can describe what they intended an agent to do.
1495
01:05:34,400 --> 01:05:36,720
They cannot describe what it actually did.
1496
01:05:36,720 --> 01:05:37,720
Not reliably.
1497
01:05:37,720 --> 01:05:38,800
Not at scale.
1498
01:05:38,800 --> 01:05:43,480
Not after the first incident review when legal asks for the exact sequence of actions.
1499
01:05:43,480 --> 01:05:44,960
This is the flight recorder problem.
1500
01:05:44,960 --> 01:05:47,800
In aviation, nobody debates whether they need logs after a crash.
1501
01:05:47,800 --> 01:05:51,520
In enterprise AI, teams still treat logs like a premium feature.
1502
01:05:51,520 --> 01:05:53,560
Nice later, not required now.
1503
01:05:53,560 --> 01:05:56,760
That's how you end up with agents that can act but can't be explained.
1504
01:05:56,760 --> 01:05:59,800
And an agent that can't be explained is an agent you can't defend.
1505
01:05:59,800 --> 01:06:04,480
Observability becomes the third pillar of control alongside grounding and identity.
1506
01:06:04,480 --> 01:06:07,160
It is what makes intent enforceable over time.
1507
01:06:07,160 --> 01:06:08,360
Start with the minimum.
1508
01:06:08,360 --> 01:06:09,360
Decision logs.
1509
01:06:09,360 --> 01:06:10,920
Not chat transcripts.
1510
01:06:10,920 --> 01:06:11,920
Decision logs.
1511
01:06:11,920 --> 01:06:13,720
A transcript tells you what the user said.
1512
01:06:13,720 --> 01:06:17,800
It does not tell you why the agent chose an intent, why it called a tool, why it escalated
1513
01:06:17,800 --> 01:06:18,960
or why it refused.
1514
01:06:18,960 --> 01:06:21,400
You need a record of the agent's decision points.
1515
01:06:21,400 --> 01:06:22,400
Classification result.
1516
01:06:22,400 --> 01:06:23,400
Confidence.
1517
01:06:23,400 --> 01:06:24,400
Retrieved sources.
1518
01:06:24,400 --> 01:06:25,400
Tool calls attempted.
1519
01:06:25,400 --> 01:06:26,400
Tool results.
1520
01:06:26,400 --> 01:06:27,880
Verification outcomes.
1521
01:06:27,880 --> 01:06:31,400
Approvals requested, approvals received, and the final disposition.
1522
01:06:31,400 --> 01:06:34,960
That means every meaningful step produces a log event you can query.
1523
01:06:34,960 --> 01:06:36,600
Not an export you have to dig up later.
1524
01:06:36,600 --> 01:06:39,040
A queryable record.
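One way to picture such a decision log event is a flat record serialized as one JSON line per decision point. This is a minimal sketch: the field names mirror the list above, but the schema itself, the `DecisionEvent` name, and the example values are assumptions, not a format defined by any Microsoft product.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DecisionEvent:
    agent_id: str                      # stable identity anchor for the agent
    intent: str                        # classification result
    confidence: float
    retrieved_sources: list[str] = field(default_factory=list)
    tool_calls: list[dict] = field(default_factory=list)  # attempted calls and results
    verification: str = "not_run"
    approvals_requested: int = 0
    approvals_received: int = 0
    disposition: str = "unknown"       # e.g. "contained", "escalated", "refused"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(event: DecisionEvent) -> str:
    """Serialize one decision point as a single queryable JSON line."""
    return json.dumps(asdict(event), sort_keys=True)
```

Emitting one line per decision, rather than one blob per conversation, is what makes the record something you can filter by agent, intent, or disposition later.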
1525
01:06:39,040 --> 01:06:43,280
And yes, the platform gives you pieces: Power Platform admin views, agent registries, Purview
1526
01:06:43,280 --> 01:06:46,120
DSPM for AI, and Defender signals.
1527
01:06:46,120 --> 01:06:50,800
But those surfaces only help if you treat observability as a design requirement, not a retrospective
1528
01:06:50,800 --> 01:06:51,800
cleanup task.
1529
01:06:51,800 --> 01:06:53,520
Audit logging is the first non-negotiable.
1530
01:06:53,520 --> 01:06:57,980
If the agent can access enterprise data or execute actions, its activity must land in
1531
01:06:57,980 --> 01:07:00,120
audit logs with a stable identity anchor.
1532
01:07:00,120 --> 01:07:02,360
This is where Agent ID stops being theoretical.
1533
01:07:02,360 --> 01:07:06,200
It becomes the principal you can search for when you need to know what happened.
1534
01:07:06,200 --> 01:07:07,840
The next surface is monitoring.
1535
01:07:07,840 --> 01:07:10,000
Not just is it up, but is it behaving?
1536
01:07:10,000 --> 01:07:12,600
So the metrics that matter are not messages sent.
1537
01:07:12,600 --> 01:07:16,200
They're operational outcomes: containment rate, how often the interaction ends without a
1538
01:07:16,200 --> 01:07:20,280
human, escalation rate, how often it punts to a human, and why.
1587
01:09:47,560 --> 01:09:50,200
Turns all of this into a working enterprise system.
1588
01:09:50,200 --> 01:09:52,520
Days one to 10: build the first working system.
1589
01:09:52,520 --> 01:09:54,840
Days one through 10 aren't for vision decks.
1590
01:09:54,840 --> 01:09:57,120
They're for forcing a working system into existence
1591
01:09:57,120 --> 01:09:59,800
with constraints tight enough that it can't lie to you.
1592
01:09:59,800 --> 01:10:03,000
Days one and two, pick the use case, lock the KPI targets,
1593
01:10:03,000 --> 01:10:04,720
and draw the escalation boundary in ink.
1594
01:10:04,720 --> 01:10:07,120
You're not selecting AI opportunities.
1595
01:10:07,120 --> 01:10:09,000
You're selecting one operational flow,
1596
01:10:09,000 --> 01:10:12,520
where deflection and cycle time can move inside 30 days.
1597
01:10:12,520 --> 01:10:15,720
IT ticket triage wins because it's measurable, high volume,
1598
01:10:15,720 --> 01:10:17,800
and politically survivable.
1599
01:10:17,800 --> 01:10:20,000
Then you set targets that can't be gamed.
1600
01:10:20,000 --> 01:10:23,040
20% to 40% deflection for the chosen intake channel,
1601
01:10:23,040 --> 01:10:26,120
15% to 30% SLA reduction for the subset of tickets
1602
01:10:26,120 --> 01:10:29,680
the agent touches and 10% to 25% fewer escalations
1603
01:10:29,680 --> 01:10:31,320
caused by mis-routing.
1604
01:10:31,320 --> 01:10:34,440
You also define the hard line, what the agent must never do.
1605
01:10:34,440 --> 01:10:37,160
Anything involving access grants, privilege changes,
1606
01:10:37,160 --> 01:10:39,880
or policy exceptions escalates by design.
1607
01:10:39,880 --> 01:10:41,160
No debates later.
1608
01:10:41,160 --> 01:10:43,160
The agent doesn't try.
1609
01:10:43,160 --> 01:10:45,080
It routes.
1610
01:10:45,080 --> 01:10:48,040
Days three through five, design the intents and topics,
1611
01:10:48,040 --> 01:10:50,120
then implement the fail-safe behaviors.
1612
01:10:50,120 --> 01:10:52,440
This is the week where topic sprawl either starts
1613
01:10:52,440 --> 01:10:53,680
or gets prevented.
1614
01:10:53,680 --> 01:10:56,280
You create 10 to 15 intents, not departments,
1615
01:10:56,280 --> 01:10:59,120
not everything users might ask, stable intents
1616
01:10:59,120 --> 01:11:01,040
that map to a containment boundary.
1617
01:11:01,040 --> 01:11:03,160
For each intent, you write three things.
1618
01:11:03,160 --> 01:11:06,320
Success criteria, allowed actions, and escalation triggers.
1619
01:11:06,320 --> 01:11:08,640
If you can't express those in one screen of text,
1620
01:11:08,640 --> 01:11:11,400
the intent is too broad. Then you implement fallback.
1621
01:11:11,400 --> 01:11:12,480
One fallback.
1622
01:11:12,480 --> 01:11:14,520
Fallback is not "be helpful."
1623
01:11:14,520 --> 01:11:16,840
Fallback is controlled uncertainty.
1624
01:11:16,840 --> 01:11:19,720
Ask one clarifying question, then escalate.
1625
01:11:19,720 --> 01:11:21,760
If your fallback tries to answer anyway,
1626
01:11:21,760 --> 01:11:23,960
you're building confident wrongness into the core.
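The single-fallback rule above (ask one clarifying question, then escalate) can be sketched as a tiny state machine. The names `FallbackState` and `fallback_step` are hypothetical; in Copilot Studio this behavior would be configured in the fallback topic rather than written as Python.

```python
from dataclasses import dataclass


@dataclass
class FallbackState:
    clarifications_asked: int = 0


def fallback_step(state: FallbackState, intent_resolved: bool) -> str:
    """Return the next move: 'route', 'clarify', or 'escalate'.

    The fallback never attempts an answer on its own.
    """
    if intent_resolved:
        return "route"
    if state.clarifications_asked < 1:
        state.clarifications_asked += 1
        return "clarify"
    return "escalate"
```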
1627
01:11:23,960 --> 01:11:26,440
And you set kill criteria now, not after the pilot
1628
01:11:26,440 --> 01:11:29,720
embarrasses you, low usage, high confusion, high escalation
1629
01:11:29,720 --> 01:11:30,720
rate, low containment.
1630
01:11:30,720 --> 01:11:33,000
If a topic fails, it gets merged or retired.
1631
01:11:33,000 --> 01:11:34,320
Dead topics aren't harmless.
1632
01:11:34,320 --> 01:11:36,640
They create ambiguous routing forever.
1633
01:11:36,640 --> 01:11:38,920
Days six through eight, implement orchestration
1634
01:11:38,920 --> 01:11:42,120
and enrichment, connect ITSM with deterministic automation.
1635
01:11:42,120 --> 01:11:44,400
This is where you build the actual loop, classify,
1636
01:11:44,400 --> 01:11:48,760
enrich, retrieve, propose, confirm, execute, verify,
1637
01:11:48,760 --> 01:11:49,640
handoff.
1638
01:11:49,640 --> 01:11:51,640
The orchestration lives in Copilot Studio
1639
01:11:51,640 --> 01:11:54,800
because you need a control plane, not just a chat interface.
1640
01:11:54,800 --> 01:11:57,800
The enrichment is minimal, identity, device state,
1641
01:11:57,800 --> 01:12:01,000
if relevant, recent incidents, and service status.
1642
01:12:01,000 --> 01:12:02,000
Don't hoard context.
1643
01:12:02,000 --> 01:12:05,040
Context hoarding becomes privacy risk and retrieval confusion.
1644
01:12:05,040 --> 01:12:07,680
Then you connect your ITSM system with Power Automate,
1645
01:12:07,680 --> 01:12:10,240
not because it's pretty, but because it's deterministic.
1646
01:12:10,240 --> 01:12:12,760
Ticket creation, assignment, notifications,
1647
01:12:12,760 --> 01:12:14,720
and logging belong in a workflow engine,
1648
01:12:14,720 --> 01:12:16,520
not in an LLM's improvisation.
1649
01:12:16,520 --> 01:12:19,520
At this stage, tool discipline matters more than cleverness.
1650
01:12:19,520 --> 01:12:21,080
Start read-only where possible.
1651
01:12:21,080 --> 01:12:23,760
Check service health, list known incidents,
1652
01:12:23,760 --> 01:12:25,320
pull user ticket history.
1653
01:12:25,320 --> 01:12:28,480
If you must write, keep it reversible, create a ticket,
1654
01:12:28,480 --> 01:12:32,800
add a note, post an update, anything privileged waits.
1655
01:12:32,800 --> 01:12:36,280
Days nine and ten: pilot users, create an evaluation set,
1656
01:12:36,280 --> 01:12:38,400
and baseline containment and accuracy.
1657
01:12:38,400 --> 01:12:41,280
Pick a small pilot group that represents real usage,
1658
01:12:41,280 --> 01:12:42,280
not enthusiasts.
1659
01:12:42,280 --> 01:12:44,280
You want normal people with normal impatience.
1660
01:12:44,280 --> 01:12:47,320
Give them one clear instruction: use this for these issues
1661
01:12:47,320 --> 01:12:49,200
and if it escalates, that's expected.
1662
01:12:49,200 --> 01:12:52,160
Now build the evaluation set. This is not test cases.
1663
01:12:52,160 --> 01:12:53,960
It's a fixed list of the top questions
1664
01:12:53,960 --> 01:12:55,920
and intents the agent must handle.
1665
01:12:55,920 --> 01:12:59,360
VPN access, password resets, device compliance questions,
1666
01:12:59,360 --> 01:13:02,920
known outages, ticket status, and basic how-to procedures.
1667
01:13:02,920 --> 01:13:05,320
You run the same set every week to see drift
1668
01:13:05,320 --> 01:13:07,400
and you measure three numbers immediately,
1669
01:13:07,400 --> 01:13:09,840
containment rate, escalation reasons,
1670
01:13:09,840 --> 01:13:12,640
and grounded accuracy for anything that produced an answer
1671
01:13:12,640 --> 01:13:13,480
with a source.
1672
01:13:13,480 --> 01:13:15,320
If you don't have sources yet, you still measure
1673
01:13:15,320 --> 01:13:18,120
refusal correctness, did it escalate when it should?
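A minimal sketch of how those three numbers could be computed from one weekly run of the fixed evaluation set. The `EvalResult` shape and the reason labels are assumptions for illustration; how each item is judged correct is left to whoever owns the evaluation set.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalResult:
    contained: bool                    # ended without a human
    escalation_reason: Optional[str]   # None when contained
    answered_with_source: bool
    answer_correct: Optional[bool]     # None when no sourced answer was produced


def summarize(results: list[EvalResult]) -> dict:
    """Containment rate, escalation reasons, and grounded accuracy for one run."""
    total = len(results)
    sourced = [r for r in results if r.answered_with_source]
    return {
        "containment_rate": sum(r.contained for r in results) / total if total else 0.0,
        "escalation_reasons": Counter(
            r.escalation_reason for r in results if r.escalation_reason
        ),
        # Accuracy is measured only over answers that cited a source.
        "grounded_accuracy": (
            sum(bool(r.answer_correct) for r in sourced) / len(sourced)
            if sourced
            else None
        ),
    }
```

Running the same function over the same set each week gives a like-for-like series, which is what makes drift visible.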
1674
01:13:18,120 --> 01:13:19,400
Now the gate to proceed.
1675
01:13:19,400 --> 01:13:21,600
You do not advance to days 11 through 20
1676
01:13:21,600 --> 01:13:23,200
because the team feels good.
1677
01:13:23,200 --> 01:13:26,240
You advance because the system meets three conditions.
1678
01:13:26,240 --> 01:13:29,760
Measurable lift against the baseline, no access violations,
1679
01:13:29,760 --> 01:13:32,080
and logging turned on with traceable outcomes.
1680
01:13:32,080 --> 01:13:35,480
Lift means the pilot produced a real reduction in human touches
1681
01:13:35,480 --> 01:13:37,680
for the selected intents, even if it's small.
1682
01:13:37,680 --> 01:13:40,200
No access violations means the agent didn't retrieve
1683
01:13:40,200 --> 01:13:42,880
restricted content or execute actions outside boundary.
1684
01:13:42,880 --> 01:13:45,040
Logging means you can explain every escalation
1685
01:13:45,040 --> 01:13:45,960
and every tool call.
1686
01:13:45,960 --> 01:13:48,200
If you can't pass that gate in 10 days,
1687
01:13:48,200 --> 01:13:50,120
the program doesn't need more features.
1688
01:13:50,120 --> 01:13:52,680
It needs tighter scope because the first 10 days
1689
01:13:52,680 --> 01:13:55,000
are about proving the system boundary holds.
1690
01:13:55,000 --> 01:13:57,240
Once it does, you're allowed to solve the next problem,
1691
01:13:57,240 --> 01:13:58,360
trust at scale.
1692
01:13:58,360 --> 01:14:01,040
And trust only comes from grounding plus tool discipline,
1693
01:14:01,040 --> 01:14:02,360
enforced relentlessly.
1694
01:14:02,360 --> 01:14:06,000
Days 11 to 20: ground, stabilize, and reduce entropy.
1695
01:14:06,000 --> 01:14:09,520
Days 11 through 20 are where most programs either become a system
1696
01:14:09,520 --> 01:14:12,600
or become a clever demo that everyone quietly stops using.
1697
01:14:12,600 --> 01:14:15,200
Week 2 proved you can route and contain within a boundary.
1698
01:14:15,200 --> 01:14:17,320
Now you have to make the answers defensible,
1699
01:14:17,320 --> 01:14:20,480
the tools predictable, and the failure modes measurable.
1700
01:14:20,480 --> 01:14:23,800
This is the phase where entropy shows up as small exceptions.
1701
01:14:23,800 --> 01:14:26,440
And small exceptions are how agent programs die.
1702
01:14:26,440 --> 01:14:29,800
Days 11 through 13, build the Azure AI Search index
1703
01:14:29,800 --> 01:14:32,200
and make the chunking strategy non-negotiable.
1704
01:14:32,200 --> 01:14:33,760
You're not connecting SharePoint,
1705
01:14:33,760 --> 01:14:35,920
you're building an operational knowledge index,
1706
01:14:35,920 --> 01:14:38,360
so you start by choosing what qualifies as truth.
1707
01:14:38,360 --> 01:14:41,440
Approved runbooks, SOPs, known issue articles and policies
1708
01:14:41,440 --> 01:14:43,760
with owners. If it's a draft, it doesn't go in.
1709
01:14:43,760 --> 01:14:45,320
If it's ownerless it doesn't go in.
1710
01:14:45,320 --> 01:14:47,920
If it changes without change control it doesn't go in.
1711
01:14:47,920 --> 01:14:49,880
Then chunking, don't overthink it.
1712
01:14:49,880 --> 01:14:52,440
The unit of retrieval is the decision unit.
1713
01:14:52,440 --> 01:14:55,760
One procedure, one exception clause, one policy section.
1714
01:14:55,760 --> 01:14:57,760
If your chunk contains multiple outcomes,
1715
01:14:57,760 --> 01:14:59,960
you've created ambiguity on purpose.
1716
01:14:59,960 --> 01:15:02,760
Metadata follows, service, audience, region,
1717
01:15:02,760 --> 01:15:04,720
risk tier and last reviewed date.
1718
01:15:04,720 --> 01:15:06,920
Make metadata mandatory in the content pipeline
1719
01:15:06,920 --> 01:15:08,600
because optional metadata is metadata
1720
01:15:08,600 --> 01:15:10,200
that won't exist where you need it.
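Making metadata mandatory can be as simple as rejecting chunks at ingestion time. The sketch below assumes each chunk is a dict with a `metadata` mapping; the key names follow the fields listed above but are otherwise hypothetical and not an Azure AI Search schema.

```python
# Field names taken from the list above; the exact keys are an assumption.
REQUIRED_METADATA = ("service", "audience", "region", "risk_tier", "last_reviewed")


def missing_metadata(chunk: dict) -> list[str]:
    """Return the required metadata keys that are absent or empty."""
    meta = chunk.get("metadata", {})
    return [key for key in REQUIRED_METADATA if not meta.get(key)]


def admit_to_index(chunk: dict) -> bool:
    """A chunk enters the index only when every required field is present."""
    return not missing_metadata(chunk)
```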
1721
01:15:10,200 --> 01:15:12,280
Finally, design refresh cadence like you mean it.
1722
01:15:12,280 --> 01:15:15,520
Policies refresh on publish, known issues refresh frequently,
1723
01:15:15,520 --> 01:15:19,280
runbooks refresh on change, and you keep last indexed
1724
01:15:19,280 --> 01:15:21,680
visible in telemetry, because stale answers
1725
01:15:21,680 --> 01:15:24,200
and hallucinations look identical to the user.
1726
01:15:24,200 --> 01:15:28,080
Days 14 through 16 integrate MCP tools in read-only mode
1727
01:15:28,080 --> 01:15:30,560
and enforce no source, no answer.
1728
01:15:30,560 --> 01:15:32,680
This is where programs get tempted to move fast
1729
01:15:32,680 --> 01:15:34,760
by making the agent do things.
1730
01:15:34,760 --> 01:15:36,560
Don't, not yet.
1731
01:15:36,560 --> 01:15:40,080
Start with MCP tools that only read: check service health,
1732
01:15:40,080 --> 01:15:42,880
list known incidents, look up a user's open tickets,
1733
01:15:42,880 --> 01:15:45,920
retrieve a service catalog entry, pull a ticket template.
1734
01:15:45,920 --> 01:15:47,320
This makes the agent more accurate
1735
01:15:47,320 --> 01:15:49,200
without letting it change state.
1736
01:15:49,200 --> 01:15:52,040
And it gives you tool telemetry without operational risk.
1737
01:15:52,040 --> 01:15:54,520
Then you enforce the grounding rule with teeth.
1738
01:15:54,520 --> 01:15:57,520
If the response is non-trivial and there's no retrieved source
1739
01:15:57,520 --> 01:15:59,280
it refuses and escalates.
1740
01:15:59,280 --> 01:16:01,880
Not a friendly guess, not based on my knowledge.
1741
01:16:01,880 --> 01:16:04,720
A controlled refusal with an escalation path.
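The "no source, no answer" rule reduces to one check before anything is returned to the user. This is a sketch under stated assumptions: the `Draft` type, the `trivial` flag (how triviality is decided is out of scope here), and the refusal wording are all illustrative.

```python
from dataclasses import dataclass, field

REFUSAL_TEXT = "I can't answer that from an approved source, so I'm routing this to a person."


@dataclass
class Draft:
    text: str
    sources: list[str] = field(default_factory=list)
    trivial: bool = False  # e.g. a greeting; the classification is assumed


def respond(draft: Draft) -> dict:
    """Answer only when grounded; otherwise refuse and escalate."""
    if draft.trivial or draft.sources:
        return {"action": "answer", "text": draft.text, "sources": draft.sources}
    return {"action": "escalate", "text": REFUSAL_TEXT, "sources": []}
```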
1742
01:16:04,720 --> 01:16:06,760
This is also where you explicitly separate knowledge
1743
01:16:06,760 --> 01:16:07,440
from action.
1744
01:16:07,440 --> 01:16:09,480
The agent can answer from grounded content.
1745
01:16:09,480 --> 01:16:12,080
The agent can propose an action, but execution only happens
1746
01:16:12,080 --> 01:16:15,160
through a tool call and only after you hit a decision boundary.
1747
01:16:15,160 --> 01:16:17,480
That separation is what stops helpfulness
1748
01:16:17,480 --> 01:16:20,440
from becoming unauthorized change.
1749
01:16:20,440 --> 01:16:23,160
Days 17 and 18 add approvals for write actions
1750
01:16:23,160 --> 01:16:24,960
and introduce adaptive cards.
1751
01:16:24,960 --> 01:16:26,560
Now you're allowed to add write tools
1752
01:16:26,560 --> 01:16:28,400
but only inside a governed pattern.
1753
01:16:28,400 --> 01:16:31,320
Propose, confirm, approve, execute, log. Pick one
1754
01:16:31,320 --> 01:16:34,080
or two write actions that are low risk and reversible.
1755
01:16:34,080 --> 01:16:35,600
Ticket creation is the obvious one.
1756
01:16:35,600 --> 01:16:37,560
Updating a ticket with context is another.
1757
01:16:37,560 --> 01:16:41,280
Anything involving identity, access, finance or deletion stays out.
1758
01:16:41,280 --> 01:16:43,560
Adaptive Cards in Teams become the approval surface
1759
01:16:43,560 --> 01:16:45,200
because they force clarity.
1760
01:16:45,200 --> 01:16:47,840
The approver sees who requested, what will change, why,
1761
01:16:47,840 --> 01:16:49,040
and which policy applies.
1762
01:16:49,040 --> 01:16:51,520
They click approve, deny or request more info,
1763
01:16:51,520 --> 01:16:53,640
no paragraph debates, no hidden side channels.
1764
01:16:53,640 --> 01:16:55,960
And the card isn't just UX, it's control.
1765
01:16:55,960 --> 01:16:58,600
The approval decision triggers a deterministic workflow,
1766
01:16:58,600 --> 01:17:01,240
records the payload, stamps who approved,
1767
01:17:01,240 --> 01:17:03,520
and then executes through the approved tool path.
1768
01:17:03,520 --> 01:17:05,160
If you can't show the approval record,
1769
01:17:05,160 --> 01:17:07,320
you didn't approve anything, you just delayed it.
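The approval step described above (record the payload, stamp who approved, then execute) might look like the following sketch. The in-memory `AUDIT_LOG`, the `ApprovalRecord` fields, and the `tool` callable are stand-ins: a real deployment would write to durable audit storage and execute through a workflow engine such as the Power Automate flow mentioned earlier.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class ApprovalRecord:
    requester: str
    approver: str
    decision: str      # "approve", "deny", or "more_info"
    payload: dict      # exactly what would change
    decided_at: str


AUDIT_LOG: list[ApprovalRecord] = []  # stand-in for durable audit storage


def decide_and_execute(
    requester: str,
    approver: str,
    decision: str,
    payload: dict,
    tool: Callable[[dict], str],
) -> Optional[str]:
    """Record the decision first, then execute only on an explicit approve."""
    record = ApprovalRecord(
        requester, approver, decision, dict(payload),
        datetime.now(timezone.utc).isoformat(),
    )
    AUDIT_LOG.append(record)
    if decision != "approve":
        return None
    return tool(payload)
```

Writing the record before calling the tool means a failed execution still leaves evidence of who approved what.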
1770
01:17:07,320 --> 01:17:08,840
Days 19 and 20.
1771
01:17:08,840 --> 01:17:11,920
Red team prompts, injection tests and tighten fallbacks.
1772
01:17:11,920 --> 01:17:14,240
This is where you stop pretending users behave nicely.
1773
01:17:14,240 --> 01:17:16,360
You test prompt injection, you test instructions
1774
01:17:16,360 --> 01:17:17,760
that try to override policy,
1775
01:17:17,760 --> 01:17:20,040
you test ignore previous instructions patterns,
1776
01:17:20,040 --> 01:17:22,560
you test misleading inputs designed to force retrieval
1777
01:17:22,560 --> 01:17:23,680
of restricted content,
1778
01:17:23,680 --> 01:17:25,360
you test social engineering prompts
1779
01:17:25,360 --> 01:17:27,120
that try to get the agent to generate an action
1780
01:17:27,120 --> 01:17:28,200
it should never take.
1781
01:17:28,200 --> 01:17:30,320
And you don't treat failures as model problems,
1782
01:17:30,320 --> 01:17:32,080
you treat them as boundary problems.
1783
01:17:32,080 --> 01:17:33,880
If the agent retrieved the wrong content,
1784
01:17:33,880 --> 01:17:35,440
fix chunking and metadata.
1785
01:17:35,440 --> 01:17:37,280
If it answered without sources,
1786
01:17:37,280 --> 01:17:38,680
tighten the response policy.
1787
01:17:38,680 --> 01:17:40,000
If it tried to call a write tool
1788
01:17:40,000 --> 01:17:41,880
when it shouldn't, tighten orchestration.
1789
01:17:41,880 --> 01:17:43,920
If it got stuck in two retry loops,
1790
01:17:43,920 --> 01:17:46,600
add hard stop conditions and escalation triggers.
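A hard stop on tool retries can be sketched as a bounded loop that escalates instead of trying again. The limit of three attempts and the returned dict shape are assumptions chosen for illustration.

```python
from typing import Callable

MAX_TOOL_ATTEMPTS = 3  # assumed limit


def call_with_hard_stop(
    tool: Callable[[], str], max_attempts: int = MAX_TOOL_ATTEMPTS
) -> dict:
    """Try a tool a bounded number of times, then escalate with the errors."""
    errors = []
    for attempt in range(1, max_attempts + 1):
        try:
            return {"status": "ok", "result": tool(), "attempts": attempt}
        except Exception as exc:  # broad on purpose: any failure counts as an attempt
            errors.append(str(exc))
    return {"status": "escalate", "errors": errors, "attempts": max_attempts}
```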
1791
01:17:46,600 --> 01:17:49,120
Now the gate to proceed into days 21 through 30
1792
01:17:49,120 --> 01:17:52,080
is strict because this is where scale becomes liability.
1793
01:17:52,080 --> 01:17:54,640
You proceed only when three conditions are true.
1794
01:17:54,640 --> 01:17:58,160
Grounded accuracy in your evaluation set is above 85%,
1795
01:17:58,160 --> 01:18:00,520
routing stability holds under real usage,
1796
01:18:00,520 --> 01:18:03,320
and escalation rates are trending down for the right reasons.
1797
01:18:03,320 --> 01:18:05,040
Not because the agent refuses everything,
1798
01:18:05,040 --> 01:18:06,400
because it retrieves correctly,
1799
01:18:06,400 --> 01:18:08,960
cites correctly, and escalates only at real boundaries.
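The gate above can be written down as an explicit check so it is not decided by feel. This is only a sketch: the 85% threshold comes from the episode, but the "trending down" test here is a simple first-versus-last comparison chosen for illustration, and whether escalations fell for the right reasons still needs a human reading the escalation log.

```python
GROUNDED_ACCURACY_THRESHOLD = 0.85  # from the episode: above 85%


def trending_down(rates: list[float]) -> bool:
    """Crude trend test (an assumption): the latest rate is below the first."""
    return len(rates) >= 2 and rates[-1] < rates[0]


def ready_to_scale(
    grounded_accuracy: float,
    routing_stable: bool,
    weekly_escalation_rates: list[float],
) -> bool:
    """All three conditions must hold before entering days 21 through 30."""
    return (
        grounded_accuracy > GROUNDED_ACCURACY_THRESHOLD
        and routing_stable
        and trending_down(weekly_escalation_rates)
    )
```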
1800
01:18:08,960 --> 01:18:10,400
Once you pass that gate,
1801
01:18:10,400 --> 01:18:13,600
you're ready for the final phase, scale without panic.
1802
01:18:13,600 --> 01:18:16,040
That means identity alignment, publishing discipline
1803
01:18:16,040 --> 01:18:18,280
and life cycle controls that stop the ecosystem
1804
01:18:18,280 --> 01:18:20,400
from rotting the moment it grows.
1805
01:18:20,400 --> 01:18:22,840
Days 21 to 30: scale to a workforce
1806
01:18:22,840 --> 01:18:24,480
without creating a liability.
1807
01:18:24,480 --> 01:18:28,120
Days 21 through 30 are where leadership usually ruins the win.
1808
01:18:28,880 --> 01:18:31,400
Week one and two produced a working system boundary,
1809
01:18:31,400 --> 01:18:33,600
week three proved grounding and approvals.
1810
01:18:33,600 --> 01:18:36,960
Now leadership sees momentum and asks for more agents
1811
01:18:36,960 --> 01:18:39,520
across more domains, more connectors, more autonomy,
1812
01:18:39,520 --> 01:18:40,320
more channels.
1813
01:18:40,320 --> 01:18:43,160
That request is how liability gets invited into production.
1814
01:18:43,160 --> 01:18:45,520
So this final phase is not build more.
1815
01:18:45,520 --> 01:18:47,400
It's make the first one survivable,
1816
01:18:47,400 --> 01:18:49,200
then expand with discipline.
1817
01:18:49,200 --> 01:18:50,960
Days 21 through 23.
1818
01:18:50,960 --> 01:18:53,960
Align the agent with enterprise identity and least privilege.
1819
01:18:53,960 --> 01:18:56,200
This is where Entra Agent ID stops being conceptual
1820
01:18:56,200 --> 01:18:57,360
and becomes a dependency.
1821
01:18:57,360 --> 01:18:59,520
The agent needs a stable identity anchor
1822
01:18:59,520 --> 01:19:02,320
for audit, conditional access and incident response.
1823
01:19:02,320 --> 01:19:04,760
That identity also forces the uncomfortable question.
1824
01:19:04,760 --> 01:19:07,040
What permissions does this agent actually need?
1825
01:19:07,040 --> 01:19:09,240
And what permissions did it accidentally inherit
1826
01:19:09,240 --> 01:19:11,240
because someone clicked allow all?
1827
01:19:11,240 --> 01:19:13,200
So you do a least privilege review as a gate,
1828
01:19:13,200 --> 01:19:14,280
not as a cleanup task.
1829
01:19:14,280 --> 01:19:17,320
You separate read capability from write capability.
1830
01:19:17,320 --> 01:19:19,800
You keep write actions behind approvals.
1831
01:19:19,800 --> 01:19:21,920
And if the agent touches sensitive systems,
1832
01:19:21,920 --> 01:19:23,720
you apply conditional access constraints
1833
01:19:23,720 --> 01:19:27,280
that match reality, restrict where the identity can be used,
1834
01:19:27,280 --> 01:19:31,320
restrict session behavior and block high-risk sign-in conditions.
1835
01:19:31,320 --> 01:19:33,280
This is where you prevent the classic failure.
1836
01:19:33,280 --> 01:19:35,520
A helpful agent becomes a privileged actor
1837
01:19:35,520 --> 01:19:37,960
without anyone explicitly deciding it should.
1838
01:19:37,960 --> 01:19:39,880
Days 24 through 26.
1839
01:19:39,880 --> 01:19:42,040
Production ALM and controlled publishing.
1840
01:19:42,040 --> 01:19:43,400
If the agent is business critical,
1841
01:19:43,400 --> 01:19:45,000
it doesn't ship like a maker project.
1842
01:19:45,000 --> 01:19:47,560
You enforce dev, test, and prod separation
1843
01:19:47,560 --> 01:19:50,440
so you can change behavior without gambling in production.
1844
01:19:50,440 --> 01:19:52,120
You publish through a controlled path,
1845
01:19:52,120 --> 01:19:54,840
so builders can't accidentally make something global.
1846
01:19:54,840 --> 01:19:57,000
You lock down who can update the production agent
1847
01:19:57,000 --> 01:19:58,640
and you keep rollback real.
1848
01:19:58,640 --> 01:20:00,960
If containment drops or escalations spike,
1849
01:20:00,960 --> 01:20:02,440
you revert and investigate.
1850
01:20:02,440 --> 01:20:05,520
No hero debugging, no late night prompt edits in prod.
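The revert rule above can be written as an explicit check rather than a judgment call. A minimal sketch, assuming both rates are tracked as fractions between 0 and 1; the 5-point thresholds and the `should_rollback` name are illustrative assumptions, not values from the episode.

```python
def should_rollback(baseline, current,
                    max_containment_drop=0.05,
                    max_escalation_rise=0.05):
    """Return True when containment dropped or escalations rose past the thresholds.

    `baseline` and `current` are dicts with "containment_rate" and
    "escalation_rate" as fractions between 0 and 1 (assumed shape).
    """
    containment_drop = baseline["containment_rate"] - current["containment_rate"]
    escalation_rise = current["escalation_rate"] - baseline["escalation_rate"]
    return (containment_drop > max_containment_drop
            or escalation_rise > max_escalation_rise)
```

With the thresholds agreed before release, the decision to revert is made by the numbers, not by whoever is debugging at the time.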
1851
01:20:05,520 --> 01:20:07,640
And you formalize the "you can build,
1852
01:20:07,640 --> 01:20:10,880
but you can't publish" posture into a workflow.
1853
01:20:10,880 --> 01:20:12,480
Publishing requires a named owner,
1854
01:20:12,480 --> 01:20:14,840
named sponsor, tool allow-list confirmation,
1855
01:20:14,840 --> 01:20:16,400
grounding policy confirmation,
1856
01:20:16,400 --> 01:20:18,360
and audit logging verification.
1857
01:20:18,360 --> 01:20:19,760
Not to slow teams down,
1858
01:20:19,760 --> 01:20:23,280
to keep the system deterministic as it scales.
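The publishing requirements listed above amount to a checklist that can be enforced in code. A minimal sketch under that assumption; `PublishRequest` and `missing_requirements` are hypothetical names and not part of any Copilot Studio or Power Platform API.

```python
from dataclasses import dataclass


@dataclass
class PublishRequest:
    owner: str = ""
    sponsor: str = ""
    tool_allow_list_confirmed: bool = False
    grounding_policy_confirmed: bool = False
    audit_logging_verified: bool = False


def missing_requirements(req: PublishRequest) -> list:
    """Return the unmet publishing requirements; an empty list means publish may proceed."""
    missing = []
    if not req.owner.strip():
        missing.append("named owner")
    if not req.sponsor.strip():
        missing.append("named sponsor")
    if not req.tool_allow_list_confirmed:
        missing.append("tool allow-list confirmation")
    if not req.grounding_policy_confirmed:
        missing.append("grounding policy confirmation")
    if not req.audit_logging_verified:
        missing.append("audit logging verification")
    return missing
```

Returning the list of gaps, rather than a bare yes or no, tells the builder exactly what is blocking publication.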
1859
01:20:23,280 --> 01:20:26,200
Days 27 and 28, roll out to a target group
1860
01:20:26,200 --> 01:20:28,720
with adoption tied to workflows, not training theatre.
1861
01:20:28,720 --> 01:20:30,800
This is where most orgs waste time.
1862
01:20:30,800 --> 01:20:33,560
They run a Copilot training, teach people how to prompt,
1863
01:20:33,560 --> 01:20:35,600
and then wonder why adoption stalls.
1864
01:20:35,600 --> 01:20:38,360
People don't adopt prompts, they adopt outcomes.
1865
01:20:38,360 --> 01:20:39,960
So you roll out by workflow,
1866
01:20:39,960 --> 01:20:41,400
you put the agent in the channel
1867
01:20:41,400 --> 01:20:43,080
where the work already starts.
1868
01:20:43,080 --> 01:20:45,920
You give users three things: what issues it handles,
1869
01:20:45,920 --> 01:20:47,120
what it will escalate,
1870
01:20:47,120 --> 01:20:50,080
and what evidence it will show when it answers. That's it.
1871
01:20:50,080 --> 01:20:51,720
Then you measure real usage:
1872
01:20:51,720 --> 01:20:53,160
task completion without handoff,
1873
01:20:53,160 --> 01:20:55,000
containment rate and escalation reasons.
1874
01:20:55,000 --> 01:20:57,640
If adoption is low, you don't fix it with more training.
1875
01:20:57,640 --> 01:20:59,000
You fix it by tightening routing,
1876
01:20:59,000 --> 01:21:00,280
shortening the answer format
1877
01:21:00,280 --> 01:21:03,200
and adding the next action buttons that remove friction.
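The usage measures named above can be computed from per-conversation records. A minimal sketch, assuming each record is a dict with `completed`, `escalated`, and `reason` keys (an illustrative shape, not a real telemetry schema), and taking containment rate to mean the share of conversations that were not escalated to a human.

```python
from collections import Counter


def summarize(conversations):
    """Compute containment rate, completion without handoff, and escalation reasons."""
    total = len(conversations)
    if total == 0:
        return {"containment_rate": 0.0,
                "completion_without_handoff": 0.0,
                "escalation_reasons": {}}
    contained = sum(1 for c in conversations if not c["escalated"])
    completed = sum(1 for c in conversations
                    if c["completed"] and not c["escalated"])
    # Count why conversations were handed off, to guide routing fixes.
    reasons = Counter(c["reason"] for c in conversations if c["escalated"])
    return {"containment_rate": contained / total,
            "completion_without_handoff": completed / total,
            "escalation_reasons": dict(reasons)}
```

The escalation reasons matter as much as the rates, since they show which routing or scope change would remove the most handoffs.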
1878
01:21:03,200 --> 01:21:04,920
Days 29 and 30,
1879
01:21:04,920 --> 01:21:08,040
executive readout and the next 90-day plan.
1880
01:21:08,040 --> 01:21:10,080
The executive readout isn't "we built an agent,"
1881
01:21:10,080 --> 01:21:10,920
that get up to,
1882
01:21:10,920 --> 01:21:12,280
it's KPI deltas,
1883
01:21:12,280 --> 01:21:14,480
risk posture and operational truth.
1884
01:21:14,480 --> 01:21:17,120
Show the baseline and the change, deflection,
1885
01:21:17,120 --> 01:21:20,160
SLA impact, escalation reduction, and time saved,
1886
01:21:20,160 --> 01:21:23,520
show grounded accuracy and the evaluator set results,
1887
01:21:23,520 --> 01:21:24,920
show audit readiness,
1888
01:21:24,920 --> 01:21:27,080
logging enabled, tool usage tracked,
1889
01:21:27,080 --> 01:21:29,400
approvals captured and identity anchored.
1890
01:21:29,400 --> 01:21:31,600
Then show what you refuse to do on purpose,
1891
01:21:31,600 --> 01:21:33,960
no custom LLMs, no broad write permissions,
1892
01:21:33,960 --> 01:21:36,160
no uncontrolled connectors, no agent sprawl.
1893
01:21:36,160 --> 01:21:38,960
Executives need to hear that restraint created the result
1894
01:21:38,960 --> 01:21:41,800
because the instinct will be to remove restraint next quarter.
1895
01:21:41,800 --> 01:21:44,720
Finally, define exit criteria before you declare victory.
1896
01:21:44,720 --> 01:21:46,440
Success is measurable ROI
1897
01:21:46,440 --> 01:21:48,960
plus audit-ready telemetry plus lifecycle controls
1898
01:21:48,960 --> 01:21:50,000
that prevent sprawl.
1899
01:21:50,000 --> 01:21:51,520
If any one of those is missing,
1900
01:21:51,520 --> 01:21:53,160
you didn't build an agentic workforce.
1901
01:21:53,160 --> 01:21:56,640
You built a short-lived demo with a future incident attached.
1902
01:21:56,640 --> 01:21:58,520
And the last transition is the simplest truth
1903
01:21:58,520 --> 01:21:59,680
in the entire roadmap.
1904
01:21:59,680 --> 01:22:02,080
Agents don't scale because they're intelligent.
1905
01:22:02,080 --> 01:22:05,120
They scale because you made their decisions enforceable.
1906
01:22:05,120 --> 01:22:08,560
Conclusion: the law. Replace work, don't imitate chat.
1907
01:22:08,560 --> 01:22:09,640
The law is simple:
1908
01:22:09,640 --> 01:22:12,400
Copilot succeeds when orchestration replaces work,
1909
01:22:12,400 --> 01:22:15,320
grounding enforces truth and identity enforces boundaries.
1910
01:22:15,320 --> 01:22:17,600
Everything else is just sparkling automation.
1911
01:22:17,600 --> 01:22:18,720
If you want the next step,
1912
01:22:18,720 --> 01:22:21,480
listen to the next M365.FM episode
1913
01:22:21,480 --> 01:22:23,360
on building an enterprise agent catalog
1914
01:22:23,360 --> 01:22:25,840
that prevents sprawl without stalling delivery.
1915
01:22:25,840 --> 01:22:28,240
Subscribe for more uncomfortable architecture truths
1916
01:22:28,240 --> 01:22:29,880
about Entra, Copilot Studio,
1917
01:22:29,880 --> 01:22:31,440
and the systems that quietly break
1918
01:22:31,440 --> 01:22:33,160
when governance stays optional.
















