This episode of the M365.FM Podcast — “Why Copilot Agents Fail & How to Make Them Successful” — examines the common reasons enterprise Copilot agent programs collapse and offers a practical framework for avoiding those pitfalls. The core insight is that many teams treat agents as assistive features — fancy UIs and prompt generators — instead of recognizing them as executable authority engines that act on systems, data, and decisions. The result is often “agent sprawl”: programs that fail not because of bad models, but because of identity ambiguity, lack of governance, absence of scoped execution contracts, poor grounding, and a mismatch between metrics and business outcomes. Rather than focusing on vanity metrics like agent counts or prompt volumes, the episode emphasizes measurable outcomes such as ticket deflection, SLA improvement, cost per task, and grounded accuracy. It lays out principles for agent design, governance, identity, and operationalization that help organizations scale Copilot agents with accountability, auditability, and real business value.


🧠 Core Theme

  • Copilot agents fail when organizations build them as features instead of systems with control planes.

  • The episode reframes agent success as measurable outcomes with governance, not adoption or usage metrics.

  • Understanding where and why failures occur is essential to building sustainable, scalable, and safe agent programs.


Common Failure Causes


🚫 1. Agents as Assistants, Not Actors

  • Treating agents as “smart chat” or text generators results in:

    • Shallow usefulness

    • Inconsistent behavior

    • No impact on workflows

  • Real value comes when agents execute actions on systems — and that requires clearly scoped authority and boundaries.


🧩 2. Identity Ambiguity

  • Agents often run under:

    • Shared accounts

    • Generic automation principals

    • Unscoped identities

  • Without unique, non-human identities:

    • Actions lack accountability

    • Audit trails become narrative, not atomic evidence

    • Rollbacks and incident containment become impossible


🔒 3. No Scoped Execution Contracts

  • Execution without contracts means:

    • Agents get broad tool access

    • Uncontrolled variation in triggers and outcomes

    • Risk amplifies with each new use case

  • Contracts define:

    • What actions are allowed

    • Under what conditions

    • What triggers escalation


📡 4. Poor Grounding

  • Agents must be grounded on authoritative data sources, not:

    • unmanaged content silos

    • stale copies

    • unclassified knowledge pools

  • Without proper grounding:

    • Accuracy degrades

    • Misinformation propagates

    • Trust collapses quickly


📉 5. Metrics That Don’t Matter

Vanity metrics that mask real issues:

  • Number of agents created

  • Prompt volume

  • Chat activity

  • Model token usage

These do not reflect productivity or risk management.


🛠️ What Makes Agents Successful


🎯 1. Identity With Accountability

  • Assign every agent a unique identity.

  • Ensure least-privilege access.

  • Tie actions to identifiable principals for audit and rollback.
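The identity principles above can be sketched in code. This is a hypothetical illustration, not a real Copilot or Entra API: each agent runs as its own non-human principal with least-privilege scopes, and every action is emitted as an atomic, attributable audit record rather than a narrative explanation. All names (`AgentPrincipal`, `audit_record`, the scope strings) are assumptions for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import json

@dataclass(frozen=True)
class AgentPrincipal:
    agent_id: str          # unique non-human identity, never shared
    allowed_scopes: tuple  # least-privilege entitlements for this agent only

def audit_record(principal: AgentPrincipal, action: str, target: str) -> str:
    """Emit one audit entry tying a single action to an identifiable principal."""
    if action not in principal.allowed_scopes:
        # Refusal is logged by the caller; the action never executes.
        raise PermissionError(f"{principal.agent_id} lacks scope '{action}'")
    return json.dumps({
        "agent_id": principal.agent_id,
        "action": action,
        "target": target,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

hr_agent = AgentPrincipal("agent-hr-onboarding-01", ("ticket.create",))
print(audit_record(hr_agent, "ticket.create", "TICKET-1042"))
```

Because the record names a unique principal, rollback and incident containment can target that one agent instead of a shared automation account.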


📜 2. Execution Contracts

Contracts assert:

  • What operations an agent can perform

  • Under what constraints

  • What tools it may call

Contracts make behavior predictable and auditable.
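A scoped execution contract can be expressed as data plus a deterministic decision rule. The sketch below is illustrative only — the operation names, the record-count threshold, and the allow/refuse/escalate vocabulary are assumptions, not an actual Copilot Studio construct:

```python
from dataclasses import dataclass, field

@dataclass
class ExecutionContract:
    allowed_ops: set                      # what operations the agent may perform
    max_records_per_call: int = 1         # under what constraints
    escalate_ops: set = field(default_factory=set)  # what triggers escalation

    def decide(self, op: str, record_count: int) -> str:
        if op not in self.allowed_ops:
            return "refuse"               # outside the contract entirely
        if op in self.escalate_ops:
            return "escalate"             # allowed only via a human approver
        if record_count > self.max_records_per_call:
            return "escalate"             # bulk state change needs review
        return "allow"

contract = ExecutionContract(
    allowed_ops={"ticket.create", "record.update", "user.deprovision"},
    escalate_ops={"user.deprovision"},
)
print(contract.decide("ticket.create", 1))     # allow
print(contract.decide("record.delete", 1))     # refuse
print(contract.decide("user.deprovision", 1))  # escalate
```

The point of the design is that the same request always yields the same decision: the contract, not the model's mood, draws the boundary.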


🔍 3. Grounding & Authoritative Knowledge

  • Agents must derive context and evidence from:

    • curated content

    • governed data domains

    • validated sources

  • This prevents hallucination-like inconsistency and misbehavior.
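Grounding discipline can be enforced mechanically by filtering retrieval to an allow-list of governed sources before anything reaches the model. A minimal sketch, assuming a hypothetical chunk schema with `source` and `last_reviewed` fields:

```python
# Allow-list of curated, governed knowledge domains (illustrative names).
GOVERNED_SOURCES = {"policy-library", "hr-knowledge-base"}

def grounded_chunks(retrieved: list[dict]) -> list[dict]:
    """Keep only chunks from authoritative sources with review provenance."""
    return [
        c for c in retrieved
        if c.get("source") in GOVERNED_SOURCES and c.get("last_reviewed")
    ]

retrieved = [
    {"source": "policy-library", "last_reviewed": "2024-11-01", "text": "..."},
    {"source": "personal-onedrive", "last_reviewed": None, "text": "..."},  # unmanaged silo: dropped
]
usable = grounded_chunks(retrieved)
print(len(usable))  # 1
```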


📊 4. Outcome-Driven Metrics

Useful performance indicators include:

  • Ticket deflection rate

  • SLA reduction

  • Task completion rate

  • Cost per completed task

  • Grounded accuracy rate

  • Audit-ready execution logs

These reflect real value and safe operation.
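Several of these indicators can be derived directly from execution logs. The field names below are assumptions for illustration, not a real Copilot telemetry schema:

```python
def outcome_metrics(logs: list[dict], monthly_cost: float) -> dict:
    """Compute outcome-driven KPIs from per-task execution log entries."""
    completed = [e for e in logs if e["status"] == "completed"]
    deflected = [e for e in completed if not e["escalated_to_human"]]
    grounded = [e for e in completed if e["citations_verified"]]
    return {
        "ticket_deflection_rate": len(deflected) / len(logs),
        "task_completion_rate": len(completed) / len(logs),
        "cost_per_completed_task": monthly_cost / len(completed),
        "grounded_accuracy_rate": len(grounded) / len(completed),
    }

logs = [
    {"status": "completed", "escalated_to_human": False, "citations_verified": True},
    {"status": "completed", "escalated_to_human": True,  "citations_verified": True},
    {"status": "failed",    "escalated_to_human": True,  "citations_verified": False},
]
m = outcome_metrics(logs, monthly_cost=300.0)
print(round(m["cost_per_completed_task"], 2))  # 150.0
```

Note that none of these can be gamed by simply creating more agents or generating more prompts, which is exactly why they are the better dashboard.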


🧠 Agent Program Design Principles


🔹 Agents as Products

Treat each agent as a product with:

  • Clear sponsor

  • Defined users

  • Performance KPIs

  • Release and retirement lifecycle

Not as one-off toys built by individual teams.


🔹 Governance Before Scale

  • Governance must be built into agent design from day one.

  • Identity, contracts, grounding, and metrics must be non-optional layers.

  • Governance should enable — not block — innovation with control.


🔹 Role of Copilot Studio

Copilot Studio is positioned as the orchestration fabric:

  • Triggers

  • Tool access

  • Execution flows

But orchestration alone is not governance — it must integrate identity, contracts, and grounding.
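The separation the episode argues for — probabilistic interpretation at the front, deterministic execution behind it — can be sketched as an intent-to-handler dispatch. Here `extract_intent` is a stand-in for the model call, and all intent names and handlers are hypothetical:

```python
# Deterministic, testable handlers (in practice: flows, APIs, approvals).
DISPATCH = {
    "create_ticket": lambda params: f"ticket created for {params['user']}",
    "check_status":  lambda params: f"status of {params['ticket_id']} looked up",
}

def extract_intent(utterance: str) -> tuple[str, dict]:
    """Placeholder for the model: normalize messy language into parameters."""
    if "status" in utterance.lower():
        return "check_status", {"ticket_id": "TICKET-1"}
    return "create_ticket", {"user": "alice"}

def handle(utterance: str) -> str:
    intent, params = extract_intent(utterance)
    if intent not in DISPATCH:
        return "refused: no contract for this intent"  # explicit refusal rule
    return DISPATCH[intent](params)

print(handle("What's the status of my laptop request?"))
```

The model only ever chooses among declared intents; it cannot improvise a new execution path, which is what makes the behavior versionable and auditable.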


🚀 Measuring Success

Rather than adoption metrics, measure:

Operational Impact

  • Reduced human workload

  • Faster decision cycles

  • SLA adherence improvements

Risk Management

  • Traceable audit trails

  • Least-privilege enforcement

  • Containment playbooks for drift

Cost Efficiency

  • Lower cost per task

  • Controlled resource usage

  • Avoidance of runaway token costs


📌 Leadership & Governance Implications


🛡️ Governance Is a Control Plane, Not a Checklist

  • Policies are enforcement mechanisms, not documents.

  • Control planes must enforce:

    • Identity boundaries

    • Tool contracts

    • Data governance

    • Drift detection

Without enforcement, governance is just paper over entropy.
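Drift detection, in its simplest form, is comparing what an agent actually invoked against what its contract declares. A minimal sketch with illustrative tool names:

```python
def detect_drift(declared_tools: set, observed_calls: list[str]) -> list[str]:
    """Return tool invocations that fall outside the declared allow-list."""
    return [call for call in observed_calls if call not in declared_tools]

declared = {"ticket.create", "kb.search"}
observed = ["kb.search", "ticket.create", "record.delete"]  # last call is drift
violations = detect_drift(declared, observed)
print(violations)  # ['record.delete']
```

In an enforcing control plane, a non-empty violation list would trigger the containment playbook rather than a retrospective report.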


👤 People & Org Design Matter

  • Ownership and accountability must be explicit.

  • Agent development should be steered by:

    • Security teams

    • Architecture governance boards

    • Business owners with measurable goals

  • Siloed development leads to fragmentation and risk.


🎯 Key Takeaways

  • Copilot agents fail by scaling chaos, not by technical bugs.

  • Successful agents require:

    • Unique identities

    • Scoped execution contracts

    • Authoritative grounding

    • Outcome-driven metrics

  • Governance enables scalable productivity — it does not block it.

  • Metrics must reflect value and risk, not activity.

Transcript

1
00:00:00,000 --> 00:00:04,280
Most enterprises blame Copilot agent failures on early platform chaos.

2
00:00:04,280 --> 00:00:06,280
That story is comforting, it's also wrong.

3
00:00:06,280 --> 00:00:09,600
Agents fail because teams deploy conversation where they need control,

4
00:00:09,600 --> 00:00:12,080
then act surprised when outcomes aren't repeatable,

5
00:00:12,080 --> 00:00:13,880
auditable or safe to scale.

6
00:00:13,880 --> 00:00:17,680
In the next hour, this goes from provocation to a Monday morning mandate.

7
00:00:17,680 --> 00:00:20,920
How to design agents like governed systems, not chat experiences.

8
00:00:20,920 --> 00:00:25,440
If you're building Copilot Studio agents inside a real tenant, identity, data,

9
00:00:25,440 --> 00:00:27,640
Power Platform, ServiceNow, subscribe.

10
00:00:27,640 --> 00:00:30,360
This channel is where the architecture gets enforced.

11
00:00:30,360 --> 00:00:32,760
Before the sprawl becomes policy.

12
00:00:32,760 --> 00:00:35,560
The foundational misunderstanding: a chat is not a system.

13
00:00:35,560 --> 00:00:37,360
The foundational misunderstanding is simple.

14
00:00:37,360 --> 00:00:39,560
People treat chat like a system interface.

15
00:00:39,560 --> 00:00:42,160
Chat is not a system, it's a user experience layer.

16
00:00:42,160 --> 00:00:45,800
That distinction matters because enterprises don't run on friendly conversations.

17
00:00:45,800 --> 00:00:48,560
They run on inputs, state, outputs, and accountability.

18
00:00:48,560 --> 00:00:49,880
They run on repeatability.

19
00:00:49,880 --> 00:00:53,920
They run on the ability to prove what happened, why it happened and who authorized it.

20
00:00:53,920 --> 00:00:57,440
A chat box gives you none of that by default, it gives you a vibe.

21
00:00:57,440 --> 00:00:59,200
Here's what chat does architecturally.

22
00:00:59,200 --> 00:01:00,560
It hides the boundaries.

23
00:01:00,560 --> 00:01:05,480
It collapses intent capture, decisioning and execution into one continuous stream of text.

24
00:01:05,480 --> 00:01:08,480
And the moment you do that, you lose the ability to say,

25
00:01:08,480 --> 00:01:10,480
this is the point where the system decided,

26
00:01:10,480 --> 00:01:12,920
and this is the point where the system acted.

27
00:01:12,920 --> 00:01:14,800
In other words, you can't draw the audit line.

28
00:01:14,800 --> 00:01:16,600
Enterprise systems run on contracts.

29
00:01:16,600 --> 00:01:18,080
A request has a defined shape.

30
00:01:18,080 --> 00:01:19,400
Inputs get validated.

31
00:01:19,400 --> 00:01:21,720
State changes happen inside boundaries.

32
00:01:21,720 --> 00:01:26,200
Outputs have lineage, failures stop, retry or escalate with evidence.

33
00:01:26,200 --> 00:01:28,080
A chat doesn't do contracts by default.

34
00:01:28,080 --> 00:01:29,240
It does interpretation.

35
00:01:29,240 --> 00:01:33,080
Sometimes useful, sometimes wrong, always harder to reproduce than a workflow.

36
00:01:33,080 --> 00:01:34,840
That's not a moral critique of AI.

37
00:01:34,840 --> 00:01:36,360
That's a system behavior description.

38
00:01:36,360 --> 00:01:40,960
The enterprise problem starts when teams mistake fluent language for bounded decision logic.

39
00:01:40,960 --> 00:01:45,000
They assume that if the agent can talk through a process, it can run the process.

40
00:01:45,000 --> 00:01:46,600
But language is not a control plane.

41
00:01:46,600 --> 00:01:48,160
Language is not policy enforcement.

42
00:01:48,160 --> 00:01:49,720
Language is not a transaction boundary.

43
00:01:49,720 --> 00:01:51,320
Language is just output.

44
00:01:51,320 --> 00:01:54,160
The cost of this shows up as friendly ambiguity.

45
00:01:54,160 --> 00:01:57,280
Everyone loves friendly ambiguity in demos because it feels flexible.

46
00:01:57,280 --> 00:02:00,120
In production, it creates three predictable outcomes.

47
00:02:00,120 --> 00:02:04,040
Inconsistent actions, untraceable rationale, and audit discomfort.

48
00:02:04,040 --> 00:02:08,320
Inconsistent actions happen because the agent doesn't have a deterministic decision boundary.

49
00:02:08,320 --> 00:02:11,200
The same request from two users becomes two different interpretations.

50
00:02:11,200 --> 00:02:14,880
The same request from the same user on Tuesday becomes a slightly different path on Wednesday

51
00:02:14,880 --> 00:02:18,680
because context changed, the model changed, the connector throttled, or a knowledge source

52
00:02:18,680 --> 00:02:20,160
returned different chunks.

53
00:02:20,160 --> 00:02:22,800
And now your process is a probability distribution.

54
00:02:22,800 --> 00:02:26,080
Untraceable rationale happens because chat encourages narrative.

55
00:02:26,080 --> 00:02:27,080
The agent explains.

56
00:02:27,080 --> 00:02:28,080
It justifies.

57
00:02:28,080 --> 00:02:29,080
It sounds reasonable.

58
00:02:29,080 --> 00:02:33,360
But the organization can't verify which data it used, which policy it applied, which tool

59
00:02:33,360 --> 00:02:34,840
contract it invoked.

60
00:02:34,840 --> 00:02:37,640
And what preconditions were true at the moment of action.

61
00:02:37,640 --> 00:02:38,640
You get a story.

62
00:02:38,640 --> 00:02:39,640
You don't get evidence.

63
00:02:39,640 --> 00:02:41,560
Audit discomfort is the inevitable end state.

64
00:02:41,560 --> 00:02:43,240
Not because auditors hate AI.

65
00:02:43,240 --> 00:02:44,720
Auditors hate ambiguity.

66
00:02:44,720 --> 00:02:47,520
Outcomes you can't reconstruct and decisions you can't attribute.

67
00:02:47,520 --> 00:02:48,760
That's not a control story.

68
00:02:48,760 --> 00:02:50,280
That's an incident review waiting to happen.

69
00:02:50,280 --> 00:02:52,440
So what is the real job of an enterprise agent?

70
00:02:52,440 --> 00:02:54,000
It is not answer questions.

71
00:02:54,000 --> 00:02:55,360
That's a small safe subset.

72
00:02:55,360 --> 00:03:00,040
The real job is delegated decisioning plus delegated execution under constraints.

73
00:03:00,040 --> 00:03:04,400
Delegated decisioning means the agent can choose a path, which workflow to trigger, which record

74
00:03:04,400 --> 00:03:08,480
to retrieve, which policy applies, which exception requires escalation.

75
00:03:08,480 --> 00:03:13,360
Delegated execution means the agent can cause state change, create a ticket, update a record,

76
00:03:13,360 --> 00:03:17,200
submit an approval, notify a user, write to a system of record.

77
00:03:17,200 --> 00:03:20,240
And the phrase under constraints is the entire point.

78
00:03:20,240 --> 00:03:24,320
Constraints are what turn AI from a probabilistic assistant into an operational component.

79
00:03:24,320 --> 00:03:27,800
Most organizations skip constraints because constraints don't demo well.

80
00:03:27,800 --> 00:03:29,360
Constraints look like friction.

81
00:03:29,360 --> 00:03:30,760
Constraints look like governance.

82
00:03:30,760 --> 00:03:32,480
Constraints look like someone saying no.

83
00:03:32,480 --> 00:03:33,760
But constraints are the design.

84
00:03:33,760 --> 00:03:34,880
They are the architecture.

85
00:03:34,880 --> 00:03:38,960
When you deploy chat first agents, you're delegating without defining boundaries.

86
00:03:38,960 --> 00:03:43,680
You are handing a conversational interface, a set of tools and saying, be smart.

87
00:03:43,680 --> 00:03:47,240
Then you're surprised that it behaves like a conversational interface with tools.

88
00:03:47,240 --> 00:03:50,920
And once the agent can act, the consequences stop being theoretical.

89
00:03:50,920 --> 00:03:52,800
The failure mode isn't it answered wrong.

90
00:03:52,800 --> 00:03:55,760
The failure mode is it updated the wrong thing.

91
00:03:55,760 --> 00:03:57,360
It approved the wrong thing.

92
00:03:57,360 --> 00:03:59,280
It exposed the wrong thing.

93
00:03:59,280 --> 00:04:01,520
Or the quietest failure of all.

94
00:04:01,520 --> 00:04:04,040
People stop trusting it and routed around it.

95
00:04:04,040 --> 00:04:06,360
That's why more prompts doesn't fix this.

96
00:04:06,360 --> 00:04:07,480
Prompts can shape tone.

97
00:04:07,480 --> 00:04:09,400
Prompts can reduce some ambiguity.

98
00:04:09,400 --> 00:04:13,080
Prompts can nudge behavior, but prompts cannot manufacture a control plane that doesn't

99
00:04:13,080 --> 00:04:14,080
exist.

100
00:04:14,080 --> 00:04:15,080
They can't create missing contracts.

101
00:04:15,080 --> 00:04:17,160
They can't make identity decisions explicit.

102
00:04:17,160 --> 00:04:20,640
They can't enforce tool allow lists with preconditions and refusal rules.

103
00:04:20,640 --> 00:04:25,080
They can't produce an audit trail if the system of record was never part of the design.

104
00:04:25,080 --> 00:04:27,080
So the first mandate is structural.

105
00:04:27,080 --> 00:04:29,080
Stop asking chat to behave like a system.

106
00:04:29,080 --> 00:04:32,080
If you want enterprise outcomes, you need enterprise mechanics.

107
00:04:32,080 --> 00:04:34,000
And that's where this gets uncomfortable.

108
00:04:34,000 --> 00:04:35,760
Because it means the agent isn't the product.

109
00:04:35,760 --> 00:04:38,920
The product is the architecture around it.

110
00:04:38,920 --> 00:04:40,240
Truth one.

111
00:04:40,240 --> 00:04:43,120
Most agents fail because they're too conversational.

112
00:04:43,120 --> 00:04:45,480
Here's the first truth that gets people defensive.

113
00:04:45,480 --> 00:04:48,520
First: Copilot agents fail because they are too conversational.

114
00:04:48,520 --> 00:04:52,480
Not because conversation is bad, because conversation becomes the center of gravity and everything

115
00:04:52,480 --> 00:04:54,000
else becomes optional.

116
00:04:54,000 --> 00:04:58,840
Identity discipline, input validation, transaction boundaries, evidence, the chat first pattern

117
00:04:58,840 --> 00:05:00,360
looks harmless at the start.

118
00:05:00,360 --> 00:05:04,680
A team opens Copilot Studio, writes friendly instructions, connects knowledge, adds tools

119
00:05:04,680 --> 00:05:06,760
and ships. The demo works.

120
00:05:06,760 --> 00:05:08,080
Then it hits the tenant.

121
00:05:08,080 --> 00:05:13,360
Partial permissions, messy data, throttling, edge cases, and a decade of "just this once" exceptions.

122
00:05:13,360 --> 00:05:17,000
In a chat-first agent, intent capture is vague by design.

123
00:05:17,000 --> 00:05:20,240
The user asks, "Can you help me with onboarding?"

124
00:05:20,240 --> 00:05:22,760
And the agent has to decide what help means.

125
00:05:22,760 --> 00:05:27,880
Answer questions, create accounts, request equipment, assign training, open tickets, notify

126
00:05:27,880 --> 00:05:28,880
managers.

127
00:05:28,880 --> 00:05:32,760
The user didn't specify and the agent doesn't have a contract that forces specificity.

128
00:05:32,760 --> 00:05:33,760
So it improvises.

129
00:05:33,760 --> 00:05:35,400
And improvisation is fine for discovery.

130
00:05:35,400 --> 00:05:36,800
It's lethal for execution.

131
00:05:36,800 --> 00:05:38,280
The next failure is tool choice.

132
00:05:38,280 --> 00:05:42,880
In a conversational pattern, tool invocation becomes a kind of live improvisational routing.

133
00:05:42,880 --> 00:05:47,000
The model selects what it thinks is the right connector, the right action, the right flow,

134
00:05:47,000 --> 00:05:49,720
based on whatever context it has in that moment.

135
00:05:49,720 --> 00:05:51,520
If two tools overlap, it guesses.

136
00:05:51,520 --> 00:05:53,840
If a tool fails, it tries something adjacent.

137
00:05:53,840 --> 00:05:57,560
If the knowledge source returns an ambiguous chunk, it fills the gap with language.

138
00:05:57,560 --> 00:06:00,640
This is where more prompts becomes the wrong fix.

139
00:06:00,640 --> 00:06:03,720
Teams respond to bad behavior by adding more instructions.

140
00:06:03,720 --> 00:06:04,720
Always do X.

141
00:06:04,720 --> 00:06:05,720
Never do Y.

142
00:06:05,720 --> 00:06:07,040
Ask clarifying questions.

143
00:06:07,040 --> 00:06:08,600
Confirm before you update.

144
00:06:08,600 --> 00:06:13,400
They stack paragraphs of policy into an 8,000-character text box and call it governance.

145
00:06:13,400 --> 00:06:15,240
But prompts don't create determinism.

146
00:06:15,240 --> 00:06:16,240
Prompts create persuasion.

147
00:06:16,240 --> 00:06:18,640
They're an influence layer on a distributed decision engine.

148
00:06:18,640 --> 00:06:23,040
When the system has conflicting signals, user language, retrieved context, tool schemas,

149
00:06:23,040 --> 00:06:26,320
permission failures, connector timeouts, the prompt isn't a compiler.

150
00:06:26,320 --> 00:06:27,400
It's a suggestion.

151
00:06:27,400 --> 00:06:29,680
And enterprises don't run on suggestions.

152
00:06:29,680 --> 00:06:33,760
This is also why more language doesn't equal more intelligence.

153
00:06:33,760 --> 00:06:34,760
Verbosity is a masking layer.

154
00:06:34,760 --> 00:06:40,000
A chat first agent can produce a long, confident explanation while the underlying decision boundary

155
00:06:40,000 --> 00:06:41,240
remains undefined.

156
00:06:41,240 --> 00:06:42,560
The output sounds like control.

157
00:06:42,560 --> 00:06:45,000
The system behavior is still probabilistic.

158
00:06:45,000 --> 00:06:47,080
Stakeholders notice this in one specific way.

159
00:06:47,080 --> 00:06:49,760
They can't get the same result twice.

160
00:06:49,760 --> 00:06:51,840
Different wording produces different routing.

161
00:06:51,840 --> 00:06:53,360
Different day produces different answers.

162
00:06:53,360 --> 00:06:56,400
The same action request assumes different preconditions.

163
00:06:56,400 --> 00:06:59,640
Then IT gets asked what happens when we roll this out to 30,000 people.

164
00:06:59,640 --> 00:07:02,720
A conversational agent can't answer that reliably.

165
00:07:02,720 --> 00:07:06,320
And when an agent can't predict its own action path, you're not deploying automation.

166
00:07:06,320 --> 00:07:07,920
You're deploying conditional chaos.

167
00:07:07,920 --> 00:07:10,240
Now there is a place where chat works extremely well.

168
00:07:10,240 --> 00:07:11,400
Chat is great at discovery.

169
00:07:11,400 --> 00:07:12,400
Triage.

170
00:07:12,400 --> 00:07:13,400
Clarification.

171
00:07:13,400 --> 00:07:14,400
Summarization.

172
00:07:14,400 --> 00:07:16,360
Help me understand what's going on.

173
00:07:16,360 --> 00:07:17,840
Show me the options.

174
00:07:17,840 --> 00:07:19,200
Explain the policy.

175
00:07:19,200 --> 00:07:20,880
Which team owns this?

176
00:07:20,880 --> 00:07:25,120
That's where probabilistic behavior is acceptable because the output is advisory, not state

177
00:07:25,120 --> 00:07:26,120
changing.

178
00:07:26,120 --> 00:07:29,360
Chat is also great at absorbing messy inputs from humans and turning them into structured

179
00:07:29,360 --> 00:07:30,360
parameters.

180
00:07:30,360 --> 00:07:31,360
That's valuable.

181
00:07:31,360 --> 00:07:32,440
But it's a front end, not the engine.

182
00:07:32,440 --> 00:07:34,120
And the mistake is asking chat to be the engine.

183
00:07:34,120 --> 00:07:38,600
The moment you let conversation carry the workflow, you've made the workflow dependent on

184
00:07:38,600 --> 00:07:39,800
language variance.

185
00:07:39,800 --> 00:07:41,400
And language variance is infinite.

186
00:07:41,400 --> 00:07:43,360
Users will ask the same thing 10 different ways.

187
00:07:43,360 --> 00:07:44,360
They will omit details.

188
00:07:44,360 --> 00:07:45,600
They will paste screenshots.

189
00:07:45,600 --> 00:07:50,040
They will drop half a ticket history into the chat and expect the agent to just know.

190
00:07:50,040 --> 00:07:53,400
So the right design move is not make the agent more conversational.

191
00:07:53,400 --> 00:07:54,400
It's the opposite.

192
00:07:54,400 --> 00:07:57,320
Make the agent less conversational where it matters.

193
00:07:57,320 --> 00:08:00,800
Shrink the conversation surface to parameter collection and confirmation.

194
00:08:00,800 --> 00:08:03,280
Post-decision boundaries into explicit routing.

195
00:08:03,280 --> 00:08:05,680
Push execution into deterministic systems.

196
00:08:05,680 --> 00:08:09,440
Flows, APIs, orchestration that you can test, version and audit.

197
00:08:09,440 --> 00:08:11,040
Which raises the obvious question.

198
00:08:11,040 --> 00:08:13,480
If chat can't be the center of gravity, what can?

199
00:08:13,480 --> 00:08:15,120
A control surface.

200
00:08:15,120 --> 00:08:16,200
Delegation with contracts.

201
00:08:16,200 --> 00:08:19,760
An agent designed like an enterprise system, not a personality.

202
00:08:19,760 --> 00:08:20,760
Truth two.

203
00:08:20,760 --> 00:08:23,440
An agent is a delegated control surface.

204
00:08:23,440 --> 00:08:26,360
Truth number two is the one that breaks most org charts.

205
00:08:26,360 --> 00:08:28,480
An agent isn't an AI employee.

206
00:08:28,480 --> 00:08:30,240
It's a delegated control surface.

207
00:08:30,240 --> 00:08:34,000
The difference matters because employees come with training, supervision, professional

208
00:08:34,000 --> 00:08:36,920
judgment and, critically, personal accountability.

209
00:08:36,920 --> 00:08:37,960
Agents come with none of that.

210
00:08:37,960 --> 00:08:42,440
They come with permissions, tools, data pathways and a model that will happily produce a plausible

211
00:08:42,440 --> 00:08:44,920
plan even when the environment is lying to it.

212
00:08:44,920 --> 00:08:48,800
When someone says we're giving the HR agent access to onboarding systems, what they're

213
00:08:48,800 --> 00:08:53,800
really saying is we're exposing a set of execution pathways to a probabilistic decision engine.

214
00:08:53,800 --> 00:08:54,800
That's not inspirational.

215
00:08:54,800 --> 00:08:56,120
That's a risk statement.

216
00:08:56,120 --> 00:09:01,120
Capability of an agent equals the sum of what you exposed, not what you wrote in instructions,

217
00:09:01,120 --> 00:09:02,680
not what the demo showed.

218
00:09:02,680 --> 00:09:04,000
Exposed pathways.

219
00:09:04,000 --> 00:09:08,000
Identity context, connector permissions, action schemas, knowledge sources and whatever

220
00:09:08,000 --> 00:09:10,000
the platform can reach at runtime.

221
00:09:10,000 --> 00:09:11,600
That's the real surface area.

222
00:09:11,600 --> 00:09:15,960
And in enterprise systems, surface area is destiny because agents don't amplify your best

223
00:09:15,960 --> 00:09:16,960
process.

224
00:09:16,960 --> 00:09:19,320
They amplify your weakest governance path.

225
00:09:19,320 --> 00:09:23,560
If your tenant already has temporary exceptions, over-permissioned service accounts, stale

226
00:09:23,560 --> 00:09:28,480
app registrations, connectors nobody owns, SharePoint sites that grew like mold, an agent

227
00:09:28,480 --> 00:09:29,480
doesn't fix that.

228
00:09:29,480 --> 00:09:32,640
It automates it at speed with a friendly explanation attached.

229
00:09:32,640 --> 00:09:35,680
This is why the AI employee metaphor is so dangerous.

230
00:09:35,680 --> 00:09:39,240
It tricks leadership into thinking the hard part is adoption and change management, but

231
00:09:39,240 --> 00:09:43,960
architecturally, the hard part is delegation mechanics: who authorizes, who executes, and

232
00:09:43,960 --> 00:09:47,560
who owns the outcome when things go wrong. Start with authorization.

233
00:09:47,560 --> 00:09:50,720
In a real system, an action doesn't happen because someone asked nicely.

234
00:09:50,720 --> 00:09:54,560
It happens because an identity with the right entitlements invoked a defined operation under

235
00:09:54,560 --> 00:09:55,640
policy.

236
00:09:55,640 --> 00:09:57,480
With agents, people blur that line.

237
00:09:57,480 --> 00:09:59,880
They assume the user's request is the authorization.

238
00:09:59,880 --> 00:10:00,880
It isn't.

239
00:10:00,880 --> 00:10:02,720
It's input.

240
00:10:02,720 --> 00:10:04,280
Authorization is the control plane decision.

241
00:10:04,280 --> 00:10:08,280
Should this user in this context be allowed to cause the state change using this tool right

242
00:10:08,280 --> 00:10:09,280
now?

243
00:10:09,280 --> 00:10:10,520
Then execution.

244
00:10:10,520 --> 00:10:14,800
Execution is where the damage happens because execution changes state outside the chat.

245
00:10:14,800 --> 00:10:17,440
Create, update, delete, approve, notify, escalate.

246
00:10:17,440 --> 00:10:22,120
If you let a model choose those operations freely, you've moved from assistant to unbounded

247
00:10:22,120 --> 00:10:23,120
operator.

248
00:10:23,120 --> 00:10:25,280
And that's fine in consumer software.

249
00:10:25,280 --> 00:10:28,920
Enterprises don't get to do "fine." And then, ownership.

250
00:10:28,920 --> 00:10:32,200
When an agent updates a record incorrectly, who owns that incident?

251
00:10:32,200 --> 00:10:36,400
The maker, the system owner, the identity team, the business sponsor, everyone will point

252
00:10:36,400 --> 00:10:38,920
at everyone else unless you force it into the design.

253
00:10:38,920 --> 00:10:44,280
That's why agents need an explicit responsibility model, not a vague product owner label in a spreadsheet.

254
00:10:44,280 --> 00:10:46,400
Here's the uncomfortable truth.

255
00:10:46,400 --> 00:10:49,520
Agent execution without explicit ownership becomes political debt fast.

256
00:10:49,520 --> 00:10:51,920
Now connect this back to what people actually build.

257
00:10:51,920 --> 00:10:56,080
A Copilot agent is a UI that fronts a bundle of integrations.

258
00:10:56,080 --> 00:10:58,000
Each integration has a failure mode.

259
00:10:58,000 --> 00:11:02,400
Permission denied, throttling, schema change, timeout, bad data, partial success.

260
00:11:02,400 --> 00:11:04,720
When those happen, the model will try to be helpful.

261
00:11:04,720 --> 00:11:05,800
It will root around.

262
00:11:05,800 --> 00:11:07,480
It will attempt an alternative.

263
00:11:07,480 --> 00:11:09,000
It will summarize a guess.

264
00:11:09,000 --> 00:11:13,440
And if you didn't design a refusal rule or an escalation path, it will keep going.

265
00:11:13,440 --> 00:11:15,880
That means your agent is not helping employees.

266
00:11:15,880 --> 00:11:19,880
It is making runtime decisions in a distributed environment you barely control.

267
00:11:19,880 --> 00:11:22,400
So the mandate is not make the agent smarter.

268
00:11:22,400 --> 00:11:25,640
The mandate is make delegation explicit.

269
00:11:25,640 --> 00:11:27,240
Define the identity model.

270
00:11:27,240 --> 00:11:31,280
Run as user, run as service or hybrid with consequences documented.

271
00:11:31,280 --> 00:11:33,040
Define the tool allow list.

272
00:11:33,040 --> 00:11:34,360
Which actions exist?

273
00:11:34,360 --> 00:11:35,800
What inputs they accept?

274
00:11:35,800 --> 00:11:37,320
What outputs they return?

275
00:11:37,320 --> 00:11:38,800
And what errors look like?

276
00:11:38,800 --> 00:11:40,120
Define the decision boundaries.

277
00:11:40,120 --> 00:11:41,520
What the agent may infer?

278
00:11:41,520 --> 00:11:42,720
What it must verify?

279
00:11:42,720 --> 00:11:43,840
And when it must stop?

280
00:11:43,840 --> 00:11:45,680
And define the audit surface?

281
00:11:45,680 --> 00:11:50,000
What gets written to the system of record, so the business can reconstruct why something happened.

282
00:11:50,000 --> 00:11:54,400
That's the difference between an agent as a novelty and an agent as an enterprise-controlled

283
00:11:54,400 --> 00:11:55,400
surface.

284
00:11:55,400 --> 00:11:58,760
And once you accept that framing, enthusiasm becomes irrelevant.

285
00:11:58,760 --> 00:12:00,760
Architecture becomes the product.

286
00:12:00,760 --> 00:12:02,920
Which is exactly where this goes next.

287
00:12:02,920 --> 00:12:07,960
Deterministic ROI only shows up when the design itself is deterministic.

288
00:12:07,960 --> 00:12:10,080
The architectural mandate.

289
00:12:10,080 --> 00:12:12,720
Deterministic ROI requires deterministic design.

290
00:12:12,720 --> 00:12:14,840
So here's the mandate stated plainly.

291
00:12:14,840 --> 00:12:18,280
The moment an agent can take action, you stop building an agent.

292
00:12:18,280 --> 00:12:19,440
You start building a system.

293
00:12:19,440 --> 00:12:24,120
And systems only produce ROI when they behave predictably under real conditions.

294
00:12:24,120 --> 00:12:28,680
Partial data, partial permissions, outages, policy changes, and users who never read your

295
00:12:28,680 --> 00:12:29,920
documentation.

296
00:12:29,920 --> 00:12:35,360
That distinction matters because most ROI conversations about agents are really demo conversations.

297
00:12:35,360 --> 00:12:39,280
They measure how impressed someone felt, how quickly a pilot team shipped, how many questions

298
00:12:39,280 --> 00:12:41,680
the agent answered without embarrassing itself in a meeting.

299
00:12:41,680 --> 00:12:42,680
That's not ROI.

300
00:12:42,680 --> 00:12:44,280
That's novelty with a budget.

301
00:12:44,280 --> 00:12:46,440
Deterministic ROI is different.

302
00:12:46,440 --> 00:12:50,680
Deterministic ROI means you can name the outcome, measure the delta, and trust it will repeat

303
00:12:50,680 --> 00:12:54,160
tomorrow at 10x scale without turning into an incident backlog.

304
00:12:54,160 --> 00:12:58,280
And that only happens when the design is deterministic where it needs to be.

305
00:12:58,280 --> 00:13:01,040
Enterprises tolerate probabilistic behavior in one place.

306
00:13:01,040 --> 00:13:02,040
Interpretation.

307
00:13:02,040 --> 00:13:06,040
They tolerate variance in how a user asks a question, how an agent summarizes information,

308
00:13:06,040 --> 00:13:07,760
how it suggests next steps.

309
00:13:07,760 --> 00:13:08,760
That's the reasoning layer.

310
00:13:08,760 --> 00:13:09,760
And it's useful.

311
00:13:09,760 --> 00:13:12,120
They do not tolerate probabilistic behavior in execution.

312
00:13:12,120 --> 00:13:16,320
Not in approvals, not in record updates, not in entitlement changes, not in ticket routing,

313
00:13:16,320 --> 00:13:19,560
not in anything that moves state in a system of record.

314
00:13:19,560 --> 00:13:22,360
So the architectural mandate is a separation of concerns.

315
00:13:22,360 --> 00:13:24,600
Let the model do what models are good at.

316
00:13:24,600 --> 00:13:29,520
Language normalization, intent extraction, ambiguity detection, summarization, and classification.

317
00:13:29,520 --> 00:13:33,320
Then hand execution to what enterprises already trust.

318
00:13:33,320 --> 00:13:38,880
Deterministic workflows, explicit APIs, validated inputs, and governed identities.

319
00:13:38,880 --> 00:13:41,400
The core line is still the simplest one.

320
00:13:41,400 --> 00:13:42,640
Chat is for discovery.

321
00:13:42,640 --> 00:13:44,800
Agents are for execution.

322
00:13:44,800 --> 00:13:46,840
Confuse the two and you automate ambiguity.

323
00:13:46,840 --> 00:13:49,520
And yes, some people will argue, "but the model can do tool calling."

324
00:13:49,520 --> 00:13:50,840
It can choose actions.

325
00:13:50,840 --> 00:13:51,840
Of course it can.

326
00:13:51,840 --> 00:13:52,840
That's not the question.

327
00:13:52,840 --> 00:13:57,240
The question is whether you can prove ahead of time what it will do with a given intent.

328
00:13:57,240 --> 00:14:00,000
If you can't predict the action path, you can't govern it.

329
00:14:00,000 --> 00:14:01,680
If you can't govern it, you can't scale it.

330
00:14:01,680 --> 00:14:05,680
If you can't scale it, your ROI is a one-time demo artifact.

331
00:14:05,680 --> 00:14:07,400
Determinism isn't about removing AI.

332
00:14:07,400 --> 00:14:11,240
It's about placing AI inside a design that can survive entropy.

333
00:14:11,240 --> 00:14:13,720
Because entropy is the default state of every tenant.

334
00:14:13,720 --> 00:14:18,120
Connectors drift, permissions drift, knowledge sources drift, people leave.

335
00:14:18,120 --> 00:14:21,720
Temporary exceptions become permanent, platforms update, models update.

336
00:14:21,720 --> 00:14:24,440
Suddenly the agent behaves differently and no one can explain why.

337
00:14:24,440 --> 00:14:26,640
A deterministic design assumes drift.

338
00:14:26,640 --> 00:14:27,880
It constrains it.

339
00:14:27,880 --> 00:14:31,800
And it creates a stable envelope where change can happen without turning behavior into

340
00:14:31,800 --> 00:14:32,800
roulette.

341
00:14:32,800 --> 00:14:34,240
So what does that actually look like in practice?

342
00:14:34,240 --> 00:14:35,640
It looks like contracts.

343
00:14:35,640 --> 00:14:37,760
Instruction contracts.

344
00:14:37,760 --> 00:14:40,960
A contract says: for this intent, these are the required inputs.

345
00:14:40,960 --> 00:14:42,360
These are the allowable tools.

346
00:14:42,360 --> 00:14:45,760
These are the preconditions and these are the outcomes we will write back to the system

347
00:14:45,760 --> 00:14:46,760
of record.

348
00:14:46,760 --> 00:14:51,240
If any of those aren't true, the agent refuses, escalates, or routes to a human.

349
00:14:51,240 --> 00:14:52,240
No improvisation.

350
00:14:52,240 --> 00:14:54,680
No best effort updates to production systems.
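
The instruction contract described above can be sketched in code; this is a minimal illustration, not a Copilot Studio API, and every name in it (IntentContract, admit, the offboarding fields) is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntentContract:
    """Declarative contract for one intent: required inputs, allowed tools,
    preconditions. Illustrative sketch only."""
    intent: str
    required_inputs: tuple[str, ...]
    allowed_tools: tuple[str, ...]
    preconditions: tuple[str, ...] = ()

    def admit(self, inputs: dict) -> bool:
        # The agent may proceed only when every required input is present;
        # otherwise it refuses, escalates, or routes to a human.
        return all(k in inputs for k in self.required_inputs)

offboard = IntentContract(
    intent="offboard_contractor",
    required_inputs=("employee_id", "end_date", "approver"),
    allowed_tools=("disable_account", "revoke_sessions", "update_ticket"),
    preconditions=("approval_recorded",),
)

# Missing the approver -> no improvisation, no "best effort" update.
print(offboard.admit({"employee_id": "E123", "end_date": "2025-01-31"}))  # False
```

The point of the frozen dataclass is that the contract itself is data you can version and review, not branching text buried in a prompt.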

351
00:14:54,680 --> 00:14:58,040
This is also where people misunderstand deterministic as rigid.

352
00:14:58,040 --> 00:14:59,360
It isn't.

353
00:14:59,360 --> 00:15:05,040
You can be flexible in how users express intent and strict in how the system executes.

354
00:15:05,040 --> 00:15:07,960
Exactly how mature enterprise software already works.

355
00:15:07,960 --> 00:15:10,320
Humans type messy things into forms all day.

356
00:15:10,320 --> 00:15:12,480
The system doesn't take the mess literally.

357
00:15:12,480 --> 00:15:15,120
It validates, normalizes, and enforces policy.

358
00:15:15,120 --> 00:15:16,160
Agents need the same shape.

359
00:15:16,160 --> 00:15:20,920
So when someone asks for proven ROI, the real answer isn't "build a better prompt".

360
00:15:20,920 --> 00:15:25,200
The real answer is "design an agent that produces repeatable outcomes".

361
00:15:25,200 --> 00:15:30,200
Repeatable means the same intent and the same state lead to the same execution path.

362
00:15:30,200 --> 00:15:34,560
Legible means you can track throughput, cycle time, escalation rate and error rates without

363
00:15:34,560 --> 00:15:36,520
arguing about definitions.

364
00:15:36,520 --> 00:15:41,600
Stable means you can operate it, version it, test it, roll it back and audit it.

365
00:15:41,600 --> 00:15:43,520
That's deterministic ROI.

366
00:15:43,520 --> 00:15:47,200
And it's the only ROI that survives beyond the pilot phase.

367
00:15:47,200 --> 00:15:49,160
Everything else is the familiar pattern.

368
00:15:49,160 --> 00:15:53,560
Initial excitement, then drift, then exceptions, then manual overrides, then quiet abandonment.

369
00:15:53,560 --> 00:15:55,680
So the mandate is architectural honesty.

370
00:15:55,680 --> 00:15:59,040
Decide where you accept probability and where you demand certainty.

371
00:15:59,040 --> 00:16:03,240
To make this concrete, the next piece is the decision model that replaces chat first agents

372
00:16:03,240 --> 00:16:04,240
entirely.

373
00:16:04,240 --> 00:16:08,280
Because once you see it, you stop designing conversations, you start designing systems that

374
00:16:08,280 --> 00:16:09,680
happen to speak.

375
00:16:09,680 --> 00:16:15,480
The decision model, event, reasoning, orchestration, execution, record.

376
00:16:15,480 --> 00:16:21,120
Here's the replacement model, not a better chat, a decision system that happens to speak.

377
00:16:21,120 --> 00:16:26,160
Event, reasoning, orchestration, execution, record: five stages, clear boundaries, clear ownership,

378
00:16:26,160 --> 00:16:27,360
clear evidence.

379
00:16:27,360 --> 00:16:31,960
Start with the event, because enterprises love to pretend an agent just helps people.

380
00:16:31,960 --> 00:16:32,960
It doesn't.

381
00:16:32,960 --> 00:16:35,240
An enterprise agent should start because something explicit happened.

382
00:16:35,240 --> 00:16:39,680
A user request in Teams, a form submission, a ServiceNow ticket moved to a new state,

383
00:16:39,680 --> 00:16:42,600
a high risk sign-in event, a scheduled daily run.

384
00:16:42,600 --> 00:16:44,760
You don't let the agent wake up based on vibes.

385
00:16:44,760 --> 00:16:46,280
An event is the trigger contract.

386
00:16:46,280 --> 00:16:48,080
It answers, "What starts this?

387
00:16:48,080 --> 00:16:49,080
Who started it?

388
00:16:49,080 --> 00:16:51,640
And what context is guaranteed at the start?"

389
00:16:51,640 --> 00:16:54,720
That matters because most agent drift starts at the beginning.

390
00:16:54,720 --> 00:16:58,600
If the start condition is loose, everything downstream becomes interpretive.

391
00:16:58,600 --> 00:17:00,640
Loose triggers create loose outcomes.

392
00:17:00,640 --> 00:17:02,680
Deterministic systems don't start that way.
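
The trigger contract just described — what starts this, who started it, what context is guaranteed — can be sketched as a small structure. Everything here (TriggerContract, accepts, the field names) is an illustrative assumption, not a platform API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TriggerContract:
    """Explicit start condition: what starts this, who started it,
    and what context is guaranteed before reasoning runs. Sketch only."""
    source: str                          # e.g. "teams_message", "servicenow_state_change"
    initiator_required: bool             # must a known identity be attached?
    guaranteed_context: tuple[str, ...]  # fields that must exist at the start

    def accepts(self, event: dict) -> bool:
        # An agent never wakes up on vibes: reject any event that is
        # missing its initiator or its guaranteed context.
        if self.initiator_required and "initiator" not in event:
            return False
        return all(k in event for k in self.guaranteed_context)

ticket_trigger = TriggerContract(
    source="servicenow_state_change",
    initiator_required=True,
    guaranteed_context=("ticket_id", "new_state"),
)

# new_state is missing, so nothing downstream becomes interpretive.
print(ticket_trigger.accepts({"initiator": "u1", "ticket_id": "T1"}))  # False
```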

393
00:17:02,680 --> 00:17:07,800
Next, reasoning. This is where the model earns its keep, but only inside a bounded context.

394
00:17:07,800 --> 00:17:12,760
Reasoning means interpret messy human language, detect ambiguity, extract parameters, classify

395
00:17:12,760 --> 00:17:15,880
intent and decide what must be verified before anything happens.

396
00:17:15,880 --> 00:17:20,240
It's also where the agent can say, "I can do that, but I need these three inputs."

397
00:17:20,240 --> 00:17:22,640
Or, "I can't do that without approval."

398
00:17:22,640 --> 00:17:27,200
Bounded context means the agent is not allowed to treat the entire tenant as its memory.

399
00:17:27,200 --> 00:17:32,000
It gets a defined set of knowledge sources, a defined set of operational data, and a defined

400
00:17:32,000 --> 00:17:33,880
set of assumptions it may make.

401
00:17:33,880 --> 00:17:36,480
Everything else is either verification or refusal.

402
00:17:36,480 --> 00:17:40,200
This is where a lot of teams get lazy and call retrieval "reasoning."

403
00:17:40,200 --> 00:17:41,520
Retrieval is just scavenging.

404
00:17:41,520 --> 00:17:45,320
Reasoning is deciding what to do with what you found and, more importantly, what you're

405
00:17:45,320 --> 00:17:47,400
not allowed to do without stronger evidence.

406
00:17:47,400 --> 00:17:48,720
Then orchestration.

407
00:17:48,720 --> 00:17:53,120
It is policy, not creativity.

408
00:17:53,120 --> 00:17:55,440
Orchestration decides which tool is allowed to run for this intent under these conditions

409
00:17:55,440 --> 00:17:56,440
with these inputs.

410
00:17:56,440 --> 00:18:01,520
It is the layer that converts the user's "do the thing" into "we are invoking exactly this

411
00:18:01,520 --> 00:18:02,800
contract."

412
00:18:02,800 --> 00:18:05,600
If two tools overlap, orchestration doesn't guess.

413
00:18:05,600 --> 00:18:06,600
It routes.

414
00:18:06,600 --> 00:18:07,800
This is where the allow list lives.

415
00:18:07,800 --> 00:18:09,320
This is where preconditions live.

416
00:18:09,320 --> 00:18:11,040
This is where refusal rules live.

417
00:18:11,040 --> 00:18:14,840
If the user wants an action that isn't in the contract, the agent doesn't improvise.

418
00:18:14,840 --> 00:18:15,840
It stops.

419
00:18:15,840 --> 00:18:19,160
It escalates.
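
As a hedged sketch of the allow list, preconditions, and refusal rules living in one place: orchestration maps a classified intent to exactly one permitted tool contract, and anything outside that mapping stops and escalates. The names below are illustrative, not a real orchestration API.

```python
# Illustrative allow list: intent -> the single tool contract permitted to run.
ALLOW_LIST = {
    "reset_password": "invoke_password_reset_flow",
    "create_ticket": "invoke_ticket_creation_flow",
}

def orchestrate(intent: str, preconditions_met: bool) -> str:
    """Route a classified intent to a permitted tool, or escalate.
    If two tools could overlap, the mapping decides -- never a guess."""
    tool = ALLOW_LIST.get(intent)
    if tool is None or not preconditions_met:
        # Refusal rule: an action outside the contract is never improvised.
        return "escalate_to_human"
    return tool

print(orchestrate("reset_password", True))  # invoke_password_reset_flow
print(orchestrate("delete_tenant", True))   # escalate_to_human
```

This is what "authorization compiler" means in practice: intent either compiles to a permitted operation or it is rejected.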

420
00:18:19,160 --> 00:18:23,240
And yes, Copilot Studio can do orchestration in different ways.

421
00:18:23,240 --> 00:18:24,240
Topics.

422
00:18:24,240 --> 00:18:25,240
Generative orchestration.

423
00:18:25,240 --> 00:18:26,240
Tool definitions.

424
00:18:26,240 --> 00:18:27,240
Agent flows.

425
00:18:27,240 --> 00:18:28,240
The mechanism isn't the point.

426
00:18:28,240 --> 00:18:31,400
The point is that orchestration behaves like an authorization compiler.

427
00:18:31,400 --> 00:18:34,600
It turns intent into permitted operations or it rejects it.

428
00:18:34,600 --> 00:18:36,320
After orchestration comes execution.

429
00:18:36,320 --> 00:18:38,400
And execution should be boring.

430
00:18:38,400 --> 00:18:41,280
Execution is where deterministic systems do deterministic work.

431
00:18:41,280 --> 00:18:42,280
Power Automate.

432
00:18:42,280 --> 00:18:43,280
Logic Apps.

433
00:18:43,280 --> 00:18:44,920
APIs with typed schemas.

434
00:18:44,920 --> 00:18:49,040
Flows with idempotency, so "run it again" doesn't duplicate side effects.

435
00:18:49,040 --> 00:18:50,360
Retries with timeouts.

436
00:18:50,360 --> 00:18:51,600
Known failure states.

437
00:18:51,600 --> 00:18:53,920
The stuff that incident reviews actually understand.

438
00:18:53,920 --> 00:18:58,560
The agent should not execute by narrating and then doing ad hoc tool calls until it feels

439
00:18:58,560 --> 00:18:59,560
done.

440
00:18:59,560 --> 00:19:01,640
That's conditional chaos disguised as helpfulness.

441
00:19:01,640 --> 00:19:05,760
Instead execution runs through components you can version, test and observe.

442
00:19:05,760 --> 00:19:07,000
You can run them in isolation.

443
00:19:07,000 --> 00:19:08,080
You can gate releases.

444
00:19:08,080 --> 00:19:09,080
You can roll back.

445
00:19:09,080 --> 00:19:12,040
You can prove exactly what happened when an action mutated state.
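
The idempotency point can be shown with a minimal sketch: the same request key run twice must not repeat the side effect. The names (execute_once, the in-memory store, disable_account) are hypothetical stand-ins for a real workflow engine's idempotency pattern.

```python
# Illustrative idempotent executor: a request key guards the side effect.
_completed: dict[str, str] = {}  # idempotency key -> recorded outcome

def execute_once(key: str, action) -> str:
    """Run the action at most once per key; replays return the stored outcome."""
    if key in _completed:
        return _completed[key]  # "run it again" does not repeat the mutation
    outcome = action()
    _completed[key] = outcome
    return outcome

calls = []
def disable_account():
    """Stand-in for a state-mutating workflow step."""
    calls.append("disable")
    return "account_disabled"

execute_once("ticket-42:disable", disable_account)
execute_once("ticket-42:disable", disable_account)  # replay, not a second mutation
print(len(calls))  # 1
```

A production version would persist the key store durably, but the shape is the same: retries become safe because the boundary of the side effect is explicit.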

446
00:19:12,040 --> 00:19:13,720
And finally, record.

447
00:19:13,720 --> 00:19:17,560
This is the stage most teams skip, and it's why trust evaporates.

448
00:19:17,560 --> 00:19:21,320
A system that can't write its outcomes and rationale to a system of record isn't an

449
00:19:21,320 --> 00:19:22,320
enterprise system.

450
00:19:22,320 --> 00:19:24,680
It's a suggestion engine with side effects.

451
00:19:24,680 --> 00:19:28,680
Record means capture the event, the intent, the key inputs, the policy decision, the tools

452
00:19:28,680 --> 00:19:32,120
invoked, the outcome and the handoff if it escalated.

453
00:19:32,120 --> 00:19:34,400
ServiceNow is a common center of gravity here.

454
00:19:34,400 --> 00:19:38,560
Not because it's magic, but because it already represents state ownership and auditability

455
00:19:38,560 --> 00:19:39,560
for work.

456
00:19:39,560 --> 00:19:40,880
It's where work becomes accountable.

457
00:19:40,880 --> 00:19:44,120
The record is also how you measure ROI without lying to yourself.

458
00:19:44,120 --> 00:19:46,440
Not "users liked it."

459
00:19:46,440 --> 00:19:52,440
Actual deltas, cycle time, escalation reduction, rework rate, exception volume, manual intervention.

460
00:19:52,440 --> 00:19:54,200
So this model does one crucial thing.

461
00:19:54,200 --> 00:19:58,320
It moves the agent from being a conversational blob to being a controlled pipeline.

462
00:19:58,320 --> 00:20:00,120
Humans can still type messy requests.

463
00:20:00,120 --> 00:20:03,840
The model can still reason, but action happens only through contracts and evidence lands

464
00:20:03,840 --> 00:20:06,000
in systems that governance already understands.

465
00:20:06,000 --> 00:20:08,040
Now the uncomfortable part.

466
00:20:08,040 --> 00:20:12,320
Once you build this way, you'll notice how many popular agent designs are the exact opposite

467
00:20:12,320 --> 00:20:14,520
and they fail in the same three ways every time.

468
00:20:14,520 --> 00:20:18,840
The anti-patterns, three ways enterprises build conditional chaos.

469
00:20:18,840 --> 00:20:22,760
Now you can spot the anti-patterns instantly because they all violate the same law.

470
00:20:22,760 --> 00:20:26,440
They collapse decisioning and execution back into chat, then pretend governance will catch

471
00:20:26,440 --> 00:20:27,440
up later.

472
00:20:27,440 --> 00:20:29,640
There are three versions of this.

473
00:20:29,640 --> 00:20:32,040
Enterprises rotate through them like it's a maturity model.

474
00:20:32,040 --> 00:20:33,040
It isn't.

475
00:20:33,040 --> 00:20:35,440
It's just three different ways to automate ambiguity.

476
00:20:35,440 --> 00:20:36,440
Anti-pattern 1.

477
00:20:36,440 --> 00:20:37,640
Decide while you talk.

478
00:20:37,640 --> 00:20:40,920
This is the agent that narrates a plan and commits actions in the same breath.

479
00:20:40,920 --> 00:20:44,560
The user asks, can you offboard this contractor?

480
00:20:44,560 --> 00:20:49,600
The agent starts explaining the steps, but while it's explaining, it's also calling tools.

481
00:20:49,600 --> 00:20:54,080
Disabling the account, revoking sessions, removing licenses, closing access groups, updating

482
00:20:54,080 --> 00:20:55,240
a ticket.

483
00:20:55,240 --> 00:21:00,720
It feels efficient because it keeps the conversation moving, but architecturally it's catastrophic.

484
00:21:00,720 --> 00:21:04,080
Because the system has no hard boundary between thinking and doing.

485
00:21:04,080 --> 00:21:05,720
The plan becomes execution.

486
00:21:05,720 --> 00:21:09,480
And once the agent can do partial work, you get the worst possible failure mode: a

487
00:21:09,480 --> 00:21:12,800
half-completed state change with a polite summary at the end.

488
00:21:12,800 --> 00:21:17,160
In incident terms, this is how you end up with account disabled but mailbox still accessible,

489
00:21:17,160 --> 00:21:22,040
ticket updated but approvals missing, access removed in one system but not the other.

490
00:21:22,040 --> 00:21:24,840
Then the user asks, wait, did it actually do it?

491
00:21:24,840 --> 00:21:28,800
And no one has a clean answer because the conversation transcript is not an audit log.

492
00:21:28,800 --> 00:21:33,760
The enterprise requires a commit point, a confirmation, a transaction boundary.

493
00:21:33,760 --> 00:21:37,640
If the agent can't separate "I understand what you want" from "I am now changing state,"

494
00:21:37,640 --> 00:21:38,880
You don't have automation.

495
00:21:38,880 --> 00:21:41,200
You have a probabilistic operator.

496
00:21:41,200 --> 00:21:43,440
Anti-pattern 2: retrieval equals reasoning.

497
00:21:43,440 --> 00:21:46,040
This is the agent that gets praised because it knows a lot.

498
00:21:46,040 --> 00:21:50,200
It has SharePoint, it has PDFs, it has a knowledge base, it can search, it can quote internal

499
00:21:50,200 --> 00:21:53,880
docs, and the team assumes that because it can retrieve context, it can make operational

500
00:21:53,880 --> 00:21:54,880
decisions.

501
00:21:54,880 --> 00:21:56,640
But retrieval is not reasoning.

502
00:21:56,640 --> 00:21:58,120
Retrieval is just context scavenging.

503
00:21:58,120 --> 00:22:02,360
In other words, the agent can find text that looks relevant, but it still has to decide

504
00:22:02,360 --> 00:22:06,240
what that text means, whether it applies, whether it's current, whether the user is allowed

505
00:22:06,240 --> 00:22:11,400
to act on it, and whether the text implies an executable workflow or just a guideline.

506
00:22:11,400 --> 00:22:13,080
And here's the weird part.

507
00:22:13,080 --> 00:22:17,600
Retrieval makes agents look more confident while being less safe because now the agent can

508
00:22:17,600 --> 00:22:20,240
anchor a wrong decision to a real paragraph.

509
00:22:20,240 --> 00:22:24,640
It can cite a policy snippet that is outdated, incomplete, or scoped to a different business

510
00:22:24,640 --> 00:22:25,640
unit.

511
00:22:25,640 --> 00:22:28,320
The output feels grounded, the decision is still unbounded.

512
00:22:28,320 --> 00:22:32,160
This is where audit risk quietly grows if the organization can't attribute which source,

513
00:22:32,160 --> 00:22:35,880
which version, which section, and which rule actually govern the action, "we retrieved

514
00:22:35,880 --> 00:22:39,560
something" becomes a liability, not a control.

515
00:22:39,560 --> 00:22:40,720
Retrieval supports reasoning.

516
00:22:40,720 --> 00:22:43,080
It does not replace it.

517
00:22:43,080 --> 00:22:44,560
Anti-pattern 3.

518
00:22:44,560 --> 00:22:48,360
Prompt branching logic that nobody can explain after the third exception.

519
00:22:48,360 --> 00:22:52,880
This is the agent that starts as a clean pilot, a few intents, a few flows, a few prompts.

520
00:22:52,880 --> 00:22:55,240
Then reality shows up, someone needs an exception.

521
00:22:55,240 --> 00:22:59,240
Then another, then a special case for one region, then a temporary bypass because a connector

522
00:22:59,240 --> 00:23:02,120
is flaky, then a workaround because the knowledge source isn't updated yet.

523
00:23:02,120 --> 00:23:05,080
So the team keeps adding conditional text.

524
00:23:05,080 --> 00:23:08,560
More instructions, more if the user says x, do y, unless z.

525
00:23:08,560 --> 00:23:12,240
The logic lives in prompts, topic nodes, and scattered tool descriptions.

526
00:23:12,240 --> 00:23:16,240
It's not versioned like a workflow, it's not tested like software, it's not even readable

527
00:23:16,240 --> 00:23:17,640
as a policy document.

528
00:23:17,640 --> 00:23:20,000
Over time, the agent becomes an entropy museum.

529
00:23:20,000 --> 00:23:21,960
Every workaround preserved forever.

530
00:23:21,960 --> 00:23:25,440
And the symptom pattern is consistent: first, manual overrides.

531
00:23:25,440 --> 00:23:27,880
People rerun the workflow just to be safe.

532
00:23:27,880 --> 00:23:29,800
Second, inconsistent approvals.

533
00:23:29,800 --> 00:23:34,000
The same request goes down different paths because the branching rules depend on phrasing

534
00:23:34,000 --> 00:23:35,320
and context chunks.

535
00:23:35,320 --> 00:23:37,480
Third, silent failure modes.

536
00:23:37,480 --> 00:23:41,800
The agent times out, falls back, or partially completes actions while telling the user it

537
00:23:41,800 --> 00:23:42,960
handled it.

538
00:23:42,960 --> 00:23:45,840
This is how trust dies, not loudly, quietly.

539
00:23:45,840 --> 00:23:49,440
Users stop escalating issues because they stop expecting the agent to behave.

540
00:23:49,440 --> 00:23:53,760
They route around it: they go back to email, Teams messages, and tribal knowledge.

541
00:23:53,760 --> 00:23:55,800
Leadership still thinks there's an agent program.

542
00:23:55,800 --> 00:23:59,760
In reality, there's an abandoned chat surface connected to production systems.

543
00:23:59,760 --> 00:24:04,240
If you want a quick diagnostic when an agent fails, ask where the decision boundary lives.

544
00:24:04,240 --> 00:24:07,160
If it lives in conversation, you build conditional chaos.

545
00:24:07,160 --> 00:24:11,960
If it lives in contracts, orchestration, and a system of record, you build something that

546
00:24:11,960 --> 00:24:13,480
can survive scale.

547
00:24:13,480 --> 00:24:17,200
Now it's time to show what contracts first looks like when a regulated enterprise does

548
00:24:17,200 --> 00:24:18,800
it on purpose.

549
00:24:18,800 --> 00:24:22,400
The success case, regulated enterprise that started with contracts.

550
00:24:22,400 --> 00:24:25,880
A regulated enterprise doesn't get agents right because they're smarter.

551
00:24:25,880 --> 00:24:29,000
They get it right because the environment punishes improvisation.

552
00:24:29,000 --> 00:24:32,720
If you're in financial services, pharma, aerospace, or regulated manufacturing, you don't get

553
00:24:32,720 --> 00:24:35,640
to hide behind "the model did something weird."

554
00:24:35,640 --> 00:24:38,040
You either produce evidence or you produce risk.

555
00:24:38,040 --> 00:24:41,560
So the teams that succeed don't start by asking "what can Copilot do?"

556
00:24:41,560 --> 00:24:45,400
They start by asking, what are we allowed to delegate under what constraints with what

557
00:24:45,400 --> 00:24:46,400
proof?

558
00:24:46,400 --> 00:24:48,920
That distinction matters because it flips the build order.

559
00:24:48,920 --> 00:24:50,880
The success pattern looks boring on purpose.

560
00:24:50,880 --> 00:24:52,920
It starts with contracts and identity.

561
00:24:52,920 --> 00:24:54,480
Only later does it add conversation.

562
00:24:54,480 --> 00:24:58,320
In one program like this, the initial scope wasn't automate everything.

563
00:24:58,320 --> 00:25:02,480
It was one value stream that already had a system of record and a known escalation path.

564
00:25:02,480 --> 00:25:05,840
Think service requests, approvals, or control changes.

565
00:25:05,840 --> 00:25:08,400
The agent didn't own the process.

566
00:25:08,400 --> 00:25:09,920
It owned a narrow slice.

567
00:25:09,920 --> 00:25:15,360
Intake normalization, parameter validation, routing, and execution of a fixed set of actions,

568
00:25:15,360 --> 00:25:18,360
and the first architectural decision was the one most pilots ignore.

569
00:25:18,360 --> 00:25:19,840
Who does this run as?

570
00:25:19,840 --> 00:25:22,040
They made the identity model explicit up front.

571
00:25:22,040 --> 00:25:24,400
User-delegated actions stayed user-delegated.

572
00:25:24,400 --> 00:25:27,880
Service-executed actions ran under dedicated identities with least privilege.

573
00:25:27,880 --> 00:25:32,320
No shared maker credentials, no temporary admin access "because it was faster."

574
00:25:32,320 --> 00:25:36,680
Every permission was treated as a contract, not a convenience, because in regulated environments

575
00:25:36,680 --> 00:25:40,280
identity drift becomes audit drift fast.

576
00:25:40,280 --> 00:25:42,800
The second decision was context architecture.

577
00:25:42,800 --> 00:25:45,520
They didn't attach SharePoint and pray.

578
00:25:45,520 --> 00:25:49,200
They treated knowledge like a product with owners, life cycle and versioning.

579
00:25:49,200 --> 00:25:50,880
Policies had effective dates.

580
00:25:50,880 --> 00:25:52,880
Operating procedures had controlled revisions.

581
00:25:52,880 --> 00:25:56,640
If the agent referenced content, it had to be attributable to a source that governance

582
00:25:56,640 --> 00:25:58,520
already recognized as authoritative.

583
00:25:58,520 --> 00:26:01,440
That one move eliminated a whole class of failures.

584
00:26:01,440 --> 00:26:04,600
The agent quoting plausible but obsolete guidance.

585
00:26:04,600 --> 00:26:05,760
Then came tools.

586
00:26:05,760 --> 00:26:06,760
And here's the subtle win.

587
00:26:06,760 --> 00:26:09,720
They separated reasoning from execution by design.

588
00:26:09,720 --> 00:26:11,440
The model did the reasoning in Copilot.

589
00:26:11,440 --> 00:26:15,800
It extracted intent, detected ambiguity and assembled a structured request.

590
00:26:15,800 --> 00:26:20,080
But the actual work, creating records, updating status, routing approvals, ran through

591
00:26:20,080 --> 00:26:21,920
deterministic workflows.

592
00:26:21,920 --> 00:26:26,360
Power platform handled the execution layer because it already supports the things enterprises

593
00:26:26,360 --> 00:26:28,120
need to stay sane.

594
00:26:28,120 --> 00:26:34,600
Typed inputs, idempotency patterns, retries, timeouts, and predictable failure states.

595
00:26:34,600 --> 00:26:38,600
Copilot didn't wing it with a string of tool calls until it felt finished.

596
00:26:38,600 --> 00:26:40,680
It invoked defined contracts.

597
00:26:40,680 --> 00:26:43,560
And for the system of record, they didn't try to make Copilot the ledger.

598
00:26:43,560 --> 00:26:46,560
They used ServiceNow or an equivalent as the state authority.

599
00:26:46,560 --> 00:26:49,160
Every agent initiated action wrote an entry.

600
00:26:49,160 --> 00:26:53,240
What was requested, what was executed, which workflow ran, what the outcome was, and

601
00:26:53,240 --> 00:26:57,120
what escalation occurred if something didn't meet preconditions.

602
00:26:57,120 --> 00:27:00,040
So when someone asked, "why did this ticket get approved?"

603
00:27:00,040 --> 00:27:02,080
The answer wasn't a chat transcript and a shrug.

604
00:27:02,080 --> 00:27:03,760
It was a reconstructable chain.

605
00:27:03,760 --> 00:27:08,200
Now, the measured outcomes in these environments tend to look unexciting in a keynote and

606
00:27:08,200 --> 00:27:10,840
extremely valuable in operations.

607
00:27:10,840 --> 00:27:14,240
Escalation loops dropped because the agent didn't bounce between teams based on vibe.

608
00:27:14,240 --> 00:27:16,040
It routed based on contract.

609
00:27:16,040 --> 00:27:19,480
Approvals became predictable because the agent didn't invent preconditions.

610
00:27:19,480 --> 00:27:21,040
It validated them.

611
00:27:21,040 --> 00:27:25,680
And audits stopped being theatrical because evidence existed by default, not as a retroactive

612
00:27:25,680 --> 00:27:26,680
scramble.

613
00:27:26,680 --> 00:27:28,720
The most important outcome wasn't time-saved.

614
00:27:28,720 --> 00:27:32,920
It was trust preserved because once users see an agent behave consistently, same intent,

615
00:27:32,920 --> 00:27:35,360
same path, they stop treating it like a novelty.

616
00:27:35,360 --> 00:27:37,240
They start treating it like a service.

617
00:27:37,240 --> 00:27:41,360
And services can be operated, monitored, improved, versioned, and scaled.

618
00:27:41,360 --> 00:27:42,560
Here's what most people miss.

619
00:27:42,560 --> 00:27:45,120
This wasn't a Copilot Studio success story.

620
00:27:45,120 --> 00:27:48,240
It was an architecture success story implemented with Copilot Studio.

621
00:27:48,240 --> 00:27:52,600
Same platform, same connectors, same models, different sequence, different constraints, different

622
00:27:52,600 --> 00:27:53,960
definition of done.

623
00:27:53,960 --> 00:27:57,680
And that's why regulated enterprises often look like they're moving slower in month one

624
00:27:57,680 --> 00:28:01,720
and then suddenly they're the only ones with agents that survive month six.

625
00:28:01,720 --> 00:28:06,680
Because they built the boring parts first, contracts, identity boundaries, and systems of

626
00:28:06,680 --> 00:28:08,280
record.

627
00:28:08,280 --> 00:28:09,560
Conversation came last.

628
00:28:09,560 --> 00:28:12,880
Which is exactly why the opposite build order failed so reliably.

629
00:28:12,880 --> 00:28:17,080
Because when you start with chat, you end with a trust problem you can't patch later.

630
00:28:17,080 --> 00:28:18,560
The failure case.

631
00:28:18,560 --> 00:28:21,120
The chat first agent that lost trust quietly.

632
00:28:21,120 --> 00:28:25,680
Now for the contrast case, same platform, same shiny features, completely different outcome.

633
00:28:25,680 --> 00:28:30,160
This one usually starts with a sentence like, we just need an agent that helps people.

634
00:28:30,160 --> 00:28:34,920
No outcome definition, no bounded scope, no contracts, just a chat surface and optimism.

635
00:28:34,920 --> 00:28:37,800
So a small team spins up an agent in Copilot Studio.

636
00:28:37,800 --> 00:28:40,920
They give it broad instructions, be helpful, be concise, follow policy.

637
00:28:40,920 --> 00:28:45,440
They attach a pile of SharePoint sites because more knowledge is better.

638
00:28:45,440 --> 00:28:49,520
They connect a few tools because the demo needs motion, create a ticket, update a record,

639
00:28:49,520 --> 00:28:50,840
maybe send an email.

640
00:28:50,840 --> 00:28:54,480
And because they want adoption, they deploy it in Teams, where everyone already lives.

641
00:28:54,480 --> 00:28:57,120
The first week looks great, people ask simple questions.

642
00:28:57,120 --> 00:28:58,160
The agent answers.

643
00:28:58,160 --> 00:28:59,880
It finds the right doc more often than not.

644
00:28:59,880 --> 00:29:01,440
It creates a few tickets.

645
00:29:01,440 --> 00:29:02,480
Leadership gets a screenshot.

646
00:29:02,480 --> 00:29:04,160
The project gets declared a win.

647
00:29:04,160 --> 00:29:05,400
Then the environment shows up.

648
00:29:05,400 --> 00:29:07,920
A user asks the same question in a slightly different way.

649
00:29:07,920 --> 00:29:09,760
The agent routes to a different topic.

650
00:29:09,760 --> 00:29:13,480
Or it doesn't route at all and falls back to a generative answer that sounds plausible.

651
00:29:13,480 --> 00:29:17,160
Someone pastes a ticket thread into the chat and the agent quietly times out.

652
00:29:17,160 --> 00:29:19,960
Someone else asks for an action that should be allowed.

653
00:29:19,960 --> 00:29:23,440
But the connector runs under a different identity model than anyone documented.

654
00:29:23,440 --> 00:29:24,520
So it fails.

655
00:29:24,520 --> 00:29:26,640
And the agent masks it with language.

656
00:29:26,640 --> 00:29:28,960
This is where trust starts eroding.

657
00:29:28,960 --> 00:29:33,280
Not from one catastrophic incident, but from tiny inconsistencies that accumulate.

658
00:29:33,280 --> 00:29:36,080
The most damaging moments are the ones that look like success.

659
00:29:36,080 --> 00:29:38,960
The agent says done, but the record didn't change.

660
00:29:38,960 --> 00:29:42,880
Or it created the record, but missed a required field, so downstream automation didn't

661
00:29:42,880 --> 00:29:43,880
run.

662
00:29:43,880 --> 00:29:47,560
Or it updated the wrong object because two systems have similar names and the tool schema

663
00:29:47,560 --> 00:29:49,000
wasn't explicit enough.

664
00:29:49,000 --> 00:29:51,000
In a chat interface, that's just one more message.

665
00:29:51,000 --> 00:29:52,800
In operations, it's rework.

666
00:29:52,800 --> 00:29:54,080
So humans compensate.

667
00:29:54,080 --> 00:29:56,240
They start double checking everything the agent does.

668
00:29:56,240 --> 00:29:57,560
They rerun steps manually.

669
00:29:57,560 --> 00:30:01,640
They keep the agent open as a suggestion box, but they stop letting it touch systems.

670
00:30:01,640 --> 00:30:04,240
The intervention rate climbs, but nobody measures it.

671
00:30:04,240 --> 00:30:08,080
The agent's usage metric looks fine for a while because people still chat with it.

672
00:30:08,080 --> 00:30:09,560
They just don't trust it.

673
00:30:09,560 --> 00:30:11,080
Then comes the second phase.

674
00:30:11,080 --> 00:30:12,080
Drift.

693
00:30:30,080 --> 00:30:33,920
Over a couple months, the agent becomes harder to predict even for its builders.

694
00:30:33,920 --> 00:30:36,280
Now add normal platform reality.

695
00:30:36,280 --> 00:30:41,600
Connector throttling, model updates, content indexing changes, permission adjustments, conditional

696
00:30:41,600 --> 00:30:43,080
access shifts.

697
00:30:43,080 --> 00:30:45,720
Nothing dramatic, just the slow churn of a real tenant.

698
00:30:45,720 --> 00:30:49,720
The agent starts behaving differently on Mondays than it did on Fridays, and the team can't

699
00:30:49,720 --> 00:30:54,320
tell if that's AI being AI or a dependency drifting underneath it.

700
00:30:54,320 --> 00:30:56,120
Users don't file bug reports for that.

701
00:30:56,120 --> 00:30:57,360
They route around it.

702
00:30:57,360 --> 00:31:00,080
And this is the quiet death that most agent programs misread.

703
00:31:00,080 --> 00:31:01,560
The agent doesn't fail loudly.

704
00:31:01,560 --> 00:31:02,720
It becomes optional.

705
00:31:02,720 --> 00:31:04,480
People stop recommending it to new hires.

706
00:31:04,480 --> 00:31:05,880
Managers stop pointing teams to it.

707
00:31:05,880 --> 00:31:07,360
The team's channel gets less traffic.

708
00:31:07,360 --> 00:31:08,600
The agent still exists.

709
00:31:08,600 --> 00:31:11,000
The organization tells itself it's early days.

710
00:31:11,000 --> 00:31:12,560
What actually happened is simpler.

711
00:31:12,560 --> 00:31:17,120
The system never earned the right to be trusted because its design never made outcomes repeatable.

712
00:31:17,120 --> 00:31:21,440
And if you ask the team afterward what went wrong, you'll hear the comforting story again.

713
00:31:21,440 --> 00:31:23,160
The platform is immature.

714
00:31:23,160 --> 00:31:24,800
Or users weren't trained.

715
00:31:24,800 --> 00:31:26,080
Or we need better prompts.

716
00:31:26,080 --> 00:31:27,560
But the pattern is architectural.

717
00:31:27,560 --> 00:31:30,560
They deployed conversation where they needed a control plane.

718
00:31:30,560 --> 00:31:31,920
They let intents stay vague.

719
00:31:31,920 --> 00:31:34,520
They let tool selection be improvisational.

720
00:31:34,520 --> 00:31:36,720
They treated knowledge sprawl as context.

721
00:31:36,720 --> 00:31:39,720
They skipped a system of record for decisions and outcomes.

722
00:31:39,720 --> 00:31:41,440
They built a chat-shaped workflow.

723
00:31:41,440 --> 00:31:43,560
Then acted surprised when it behaved like one.

724
00:31:43,560 --> 00:31:46,800
So the contrast case isn't a warning about Copilot Studio.

725
00:31:46,800 --> 00:31:48,440
It's a warning about build order.

726
00:31:48,440 --> 00:31:51,640
Chat first feels fast because it ships a surface.

727
00:31:51,640 --> 00:31:54,200
Architecture first feels slower because it ships constraints.

728
00:31:54,200 --> 00:31:58,840
But only one of those survives contact with 30,000 users, three business units and a security

729
00:31:58,840 --> 00:32:02,720
team that eventually notices you automated the weakest path in the tenant.

730
00:32:02,720 --> 00:32:03,720
And that's the lesson.

731
00:32:03,720 --> 00:32:04,720
Same tools.

732
00:32:04,720 --> 00:32:05,720
Different architecture.

733
00:32:05,720 --> 00:32:06,520
Different reality.

734
00:32:06,520 --> 00:32:08,200
The question becomes actionable.

735
00:32:08,200 --> 00:32:11,440
If Monday morning is real, where does an enterprise start?

736
00:32:11,440 --> 00:32:16,080
Before the next helpful agent becomes another quietly abandoned Teams tab.

737
00:32:16,080 --> 00:32:17,720
Monday mandate part one.

738
00:32:17,720 --> 00:32:18,720
Start with outcomes.

739
00:32:18,720 --> 00:32:19,720
Not use cases.

740
00:32:19,720 --> 00:32:23,320
So Monday morning, if you want this to stop being theater, you start with outcomes.

741
00:32:23,320 --> 00:32:24,320
Not use cases.

742
00:32:24,320 --> 00:32:25,320
Not ideas.

743
00:32:25,320 --> 00:32:27,320
Not what Copilot Studio can do.

744
00:32:27,320 --> 00:32:28,320
Outcomes.

745
00:32:28,320 --> 00:32:31,320
Because use cases are how organizations hide from accountability.

746
00:32:31,320 --> 00:32:32,320
A use case can be anything.

747
00:32:32,320 --> 00:32:33,320
It can be a demo.

748
00:32:33,320 --> 00:32:34,320
It can be a chatbot.

749
00:32:34,320 --> 00:32:37,320
It can be a SharePoint search box with better manners.

750
00:32:37,320 --> 00:32:41,880
And it can still produce exactly zero operational improvement while everyone nods like progress

751
00:32:41,880 --> 00:32:42,880
happened.

752
00:32:42,880 --> 00:32:43,880
An outcome is different.

753
00:32:43,880 --> 00:32:46,040
An outcome forces a before and after.

754
00:32:46,040 --> 00:32:47,400
Cycle time moves.

755
00:32:47,400 --> 00:32:48,960
Escalation rate drops.

756
00:32:48,960 --> 00:32:49,960
Throughput increases.

757
00:32:49,960 --> 00:32:51,760
Rework decreases.

758
00:32:51,760 --> 00:32:53,960
Compliance evidence becomes easier to produce.

759
00:32:53,960 --> 00:32:54,960
Those are outcomes.

760
00:32:54,960 --> 00:32:58,640
They're measurable deltas that survive outside the agent team's slide deck.

761
00:32:58,640 --> 00:33:00,640
And the first thing leadership has to accept is this.

762
00:33:00,640 --> 00:33:03,200
If the outcome can't be measured, the agent isn't a product.

763
00:33:03,200 --> 00:33:04,600
It's a distraction.

764
00:33:04,600 --> 00:33:08,120
So define the outcome in the language the business already understands.

765
00:33:08,120 --> 00:33:12,920
For service and operations, mean time to resolution, first contact resolution, deflection

766
00:33:12,920 --> 00:33:17,920
rate with strict definitions, ticket reopen rate, approval time, exception volume.

767
00:33:17,920 --> 00:33:24,800
For sales, qualified opportunities, time to quote, proposal cycle time, RFP throughput.

768
00:33:24,800 --> 00:33:30,040
For HR onboarding completion time, case backlog, handoff count, error rate in payroll or benefits

769
00:33:30,040 --> 00:33:35,160
changes, not vanity metrics like number of conversations, and not sentiment metrics like

770
00:33:35,160 --> 00:33:39,640
users said it was helpful, unless helpful is tied to a measured operational delta.

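Spoken as metrics, those outcomes reduce to simple before-and-after arithmetic. A minimal sketch, using hypothetical ticket records (field names are assumptions for illustration, not any product's schema):

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    # Hypothetical fields; real systems of record have their own schema.
    resolved_by_agent: bool     # agent closed it with no human touch (strict definition)
    reopened: bool
    minutes_to_resolution: float

def deflection_rate(tickets):
    # Strict definition: fully resolved by the agent and never reopened.
    deflected = sum(1 for t in tickets if t.resolved_by_agent and not t.reopened)
    return deflected / len(tickets)

def mttr_delta(before, after):
    # Positive delta = mean time to resolution improved after the agent shipped.
    mean = lambda ts: sum(t.minutes_to_resolution for t in ts) / len(ts)
    return mean(before) - mean(after)

baseline = [Ticket(False, False, 240.0), Ticket(False, True, 480.0)]
current  = [Ticket(True, False, 30.0), Ticket(False, False, 200.0)]
print(deflection_rate(current))       # 0.5
print(mttr_delta(baseline, current))  # 245.0
```

The point isn't the arithmetic; it's that every metric here requires a before snapshot and a strict definition, which is exactly what a slide-deck "use case" never forces.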
771
00:33:39,640 --> 00:33:42,800
Next map the value stream, not the tool chain, the value stream.

772
00:33:42,800 --> 00:33:44,440
Where does the decision actually happen?

773
00:33:44,440 --> 00:33:46,160
Where does state actually change?

774
00:33:46,160 --> 00:33:47,560
Where does risk concentrate?

775
00:33:47,560 --> 00:33:49,000
Where do humans get pulled into loops?

776
00:33:49,000 --> 00:33:50,480
Because the system doesn't know what to do.

777
00:33:50,480 --> 00:33:52,680
This is where most agent programs get exposed.

778
00:33:52,680 --> 00:33:56,520
They pick the most visible pain point, not the most architecturally leverageable one.

779
00:33:56,520 --> 00:33:58,800
They automate questions because questions are easy.

780
00:33:58,800 --> 00:34:02,920
They avoid decisions because decisions create accountability, but ROI lives in decisions,

781
00:34:02,920 --> 00:34:05,960
specifically repeatable decisions with bounded variation.

782
00:34:05,960 --> 00:34:08,520
So you're looking for work that has three properties.

783
00:34:08,520 --> 00:34:11,640
First, repeatable intent: the request shows up in recognizable forms.

784
00:34:11,640 --> 00:34:15,800
Users ask it a hundred times a week; it's not a one-off "solve my unique situation."

785
00:34:15,800 --> 00:34:17,400
It's a recurring demand signal.

786
00:34:17,400 --> 00:34:19,000
Second, bounded variation.

787
00:34:19,000 --> 00:34:23,040
There are edge cases, but the majority of requests fall into a small number of shapes.

788
00:34:23,040 --> 00:34:25,920
You can enumerate them, you can define what normal looks like.

789
00:34:25,920 --> 00:34:29,920
If every request is a snowflake, you don't have an agent opportunity, you have a consulting

790
00:34:29,920 --> 00:34:30,920
practice.

791
00:34:30,920 --> 00:34:32,680
Third, clear success criteria.

792
00:34:32,680 --> 00:34:34,800
You can say what done means without a meeting.

793
00:34:34,800 --> 00:34:36,720
The ticket is created with these fields.

794
00:34:36,720 --> 00:34:38,880
The approval is captured with these conditions.

795
00:34:38,880 --> 00:34:41,800
The record is updated and the downstream automation ran.

796
00:34:41,800 --> 00:34:44,840
If success is subjective, you'll get subjective behavior.

797
00:34:44,840 --> 00:34:48,720
And subjective behavior is just another name for non-determinism.

798
00:34:48,720 --> 00:34:50,360
Now once you've done that, you make a decision

799
00:34:50,360 --> 00:34:54,560
most teams avoid: are you building a discovery experience or an execution experience?

800
00:34:54,560 --> 00:34:56,480
Discovery experiences optimize for understanding.

801
00:34:56,480 --> 00:34:57,720
They tolerate ambiguity.

802
00:34:57,720 --> 00:35:01,400
They help users find the right policy, the right owner, the right option, the right next

803
00:35:01,400 --> 00:35:02,400
step.

804
00:35:02,400 --> 00:35:06,800
They can live in chat with relatively low risk because the output is guidance.

805
00:35:06,800 --> 00:35:09,000
Execution experiences optimize for changing state.

806
00:35:09,000 --> 00:35:11,920
They require contracts, validation, and systems of record.

807
00:35:11,920 --> 00:35:15,520
They can still use chat as an interface, but the design is dominated by control.

808
00:35:15,520 --> 00:35:16,840
You don't mix these casually.

809
00:35:16,840 --> 00:35:21,200
If you do, you'll end up with an agent that helps users and occasionally performs actions,

810
00:35:21,200 --> 00:35:22,960
which is the worst possible positioning.

811
00:35:22,960 --> 00:35:26,360
The actions will be blamed when they're wrong and the help will be ignored when it's

812
00:35:26,360 --> 00:35:27,360
right.

813
00:35:27,360 --> 00:35:28,360
So pick one per workflow.

814
00:35:28,360 --> 00:35:32,240
If the goal is execution, design the conversation to collect missing parameters and confirm

815
00:35:32,240 --> 00:35:33,240
intent.

816
00:35:33,240 --> 00:35:37,760
If the goal is discovery, explicitly refuse action and route to the appropriate system.

817
00:35:37,760 --> 00:35:42,120
Then you tie the outcome to an operating rhythm: who owns the metric, who reviews it weekly,

818
00:35:42,120 --> 00:35:45,400
who decides what changes when the metric drifts, because the agent will drift.

819
00:35:45,400 --> 00:35:49,680
The only question is whether drift becomes silent decay or managed iteration.

820
00:35:49,680 --> 00:35:52,160
This is why starting with outcomes is not bureaucratic.

821
00:35:52,160 --> 00:35:54,600
It's a control mechanism: it forces bounded scope.

822
00:35:54,600 --> 00:35:55,600
It forces measurement.

823
00:35:55,600 --> 00:35:57,160
It forces a system of record.

824
00:35:57,160 --> 00:36:00,960
It forces you to admit what you're delegating and once outcomes are clear, the next part

825
00:36:00,960 --> 00:36:02,720
becomes non-negotiable.

826
00:36:02,720 --> 00:36:03,720
Boundaries.

827
00:36:03,720 --> 00:36:05,040
Not best practices.

828
00:36:05,040 --> 00:36:06,880
Not guidelines.

829
00:36:06,880 --> 00:36:10,920
Boundaries that define what the agent is allowed to do, what must be true before it does

830
00:36:10,920 --> 00:36:13,760
it and exactly where it must stop.

831
00:36:13,760 --> 00:36:15,480
Monday mandate part two.

832
00:36:15,480 --> 00:36:17,960
Intent contracts and decision boundaries.

833
00:36:17,960 --> 00:36:20,840
Once you've picked outcomes, you don't brainstorm prompts.

834
00:36:20,840 --> 00:36:24,160
You write contracts because the agent doesn't need permission to talk.

835
00:36:24,160 --> 00:36:25,640
It needs permission to act.

836
00:36:25,640 --> 00:36:29,600
An intent contract is the simplest artifact that forces that discipline.

837
00:36:29,600 --> 00:36:34,440
It's a plain language definition of one intent with a machine enforceable boundary behind

838
00:36:34,440 --> 00:36:35,440
it.

839
00:36:35,440 --> 00:36:36,880
It answers five questions.

840
00:36:36,880 --> 00:36:38,080
What the intent is.

841
00:36:38,080 --> 00:36:39,720
What the agent is allowed to do.

842
00:36:39,720 --> 00:36:41,280
What inputs are required.

843
00:36:41,280 --> 00:36:44,120
What systems it may touch and what evidence it must produce.

844
00:36:44,120 --> 00:36:48,240
But if that sounds like paperwork, good. Paperwork is what keeps state change from becoming

845
00:36:48,240 --> 00:36:49,240
folklore.

846
00:36:49,240 --> 00:36:50,640
Start with the intent itself.

847
00:36:50,640 --> 00:36:55,240
Name it like a system operation, not a marketing feature: create service request,

848
00:36:55,240 --> 00:37:01,720
reset MFA method, generate RFP draft, submit onboarding task, update incident status.

849
00:37:01,720 --> 00:37:03,920
If you can't name it crisply, it's not an intent.

850
00:37:03,920 --> 00:37:05,480
It's a category of vibes.

851
00:37:05,480 --> 00:37:08,600
Then define the allowed actions and write them like an allow list.

852
00:37:08,600 --> 00:37:10,360
Not "help with onboarding."

853
00:37:10,360 --> 00:37:13,760
Instead: create a ServiceNow request using catalog item X.

854
00:37:13,760 --> 00:37:16,920
Assign the request to group Y based on parameter Z.

855
00:37:16,920 --> 00:37:20,240
Send notification to manager via teams using template T.

856
00:37:20,240 --> 00:37:22,440
The agent can only do what's enumerated.

857
00:37:22,440 --> 00:37:24,880
Everything else becomes refusal or escalation by design.

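One way to make that allow list machine-checkable is to encode the contract as data and route every tool call through it. A sketch under stated assumptions — the intent names, tool names, and input fields here are illustrative, not a real catalog:

```python
# An intent contract: what the agent may do, and nothing else.
INTENT_CONTRACTS = {
    "create_service_request": {
        "allowed_tools": {"servicenow.create_request", "teams.notify_manager"},
        "required_inputs": {"user_id", "catalog_item", "cost_center"},
    },
    "reset_mfa_method": {
        "allowed_tools": {"entra.reset_mfa"},
        "required_inputs": {"user_id", "method"},
    },
}

def authorize(intent, tool, inputs):
    """Translation is not authorization: the model may map messy language to an
    intent, but only the contract decides whether a tool call actually runs."""
    contract = INTENT_CONTRACTS.get(intent)
    if contract is None:
        return (False, "unknown intent: escalate")
    if tool not in contract["allowed_tools"]:
        return (False, "tool not enumerated: refuse")
    missing = contract["required_inputs"] - inputs.keys()
    if missing:
        return (False, f"ask for missing inputs: {sorted(missing)}")
    return (True, "execute")

ok, why = authorize("create_service_request",
                    "servicenow.create_request",
                    {"user_id": "u1", "catalog_item": "X", "cost_center": "cc7"})
print(ok, why)  # True execute
```

Everything not enumerated falls through to refusal or escalation, which is the "by design" part: the deny path is the default, not an afterthought.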
858
00:37:24,880 --> 00:37:28,800
This is where teams usually push back with, but users won't know what to ask for.

859
00:37:28,800 --> 00:37:29,800
That's fine.

860
00:37:29,800 --> 00:37:33,800
The model can translate messy language into a known intent, but translation is not authorization.

861
00:37:33,800 --> 00:37:36,120
The intent contract is authorization.

862
00:37:36,120 --> 00:37:37,520
Next required inputs.

863
00:37:37,520 --> 00:37:40,720
Every action-capable intent has a minimum parameter set.

864
00:37:40,720 --> 00:37:43,360
If the parameters aren't present, the agent doesn't guess.

865
00:37:43,360 --> 00:37:48,680
It asks and it asks narrowly, for example, which user, which system, which effective date,

866
00:37:48,680 --> 00:37:49,960
which cost center.

867
00:37:49,960 --> 00:37:52,640
Which approver. These aren't conversational flourishes.

868
00:37:52,640 --> 00:37:55,080
They are preconditions for safe execution.

869
00:37:55,080 --> 00:37:56,920
And the design move here is subtle.

870
00:37:56,920 --> 00:38:00,840
The agent should never request information it can deterministically retrieve.

871
00:38:00,840 --> 00:38:04,680
If Entra can provide the user's ID, the agent shouldn't ask the user to type it.

872
00:38:04,680 --> 00:38:09,560
If ServiceNow can return the ticket number from context, don't make the user restate it.

873
00:38:09,560 --> 00:38:13,760
The only questions the agent asks are the missing fields required to execute the contract.

874
00:38:13,760 --> 00:38:18,200
That keeps the conversation surface minimal and keeps the system boundary explicit.

875
00:38:18,200 --> 00:38:19,520
Now define preconditions.

876
00:38:19,520 --> 00:38:23,120
Preconditions are the must-be-true-before-tool-invocation rules.

877
00:38:23,120 --> 00:38:27,560
This is where most agent programs silently fail because they treat policies as documentation

878
00:38:27,560 --> 00:38:29,920
instead of runtime gates.

879
00:38:29,920 --> 00:38:32,040
Preconditions are executable checks.

880
00:38:32,040 --> 00:38:33,400
User has role X.

881
00:38:33,400 --> 00:38:34,960
Ticket is in state Y.

882
00:38:34,960 --> 00:38:37,640
Record exists and is owned by this business unit.

883
00:38:37,640 --> 00:38:42,200
MFA reset allowed only if identity risk score is below threshold.

884
00:38:42,200 --> 00:38:45,240
Approval must be present before status change.

885
00:38:45,240 --> 00:38:48,760
In a deterministic design, preconditions run before any state change.

886
00:38:48,760 --> 00:38:50,600
If a precondition fails, the agent stops.

887
00:38:50,600 --> 00:38:52,000
It does not attempt a workaround.

888
00:38:52,000 --> 00:38:53,480
It does not pick an adjacent tool.

889
00:38:53,480 --> 00:38:57,160
It does not rewrite the policy in friendly language and proceed anyway.

890
00:38:57,160 --> 00:38:59,560
It escalates, which means you need refusal rules.

891
00:38:59,560 --> 00:39:00,560
Refusal rules aren't

892
00:39:00,560 --> 00:39:02,520
"the agent says no sometimes."

893
00:39:02,520 --> 00:39:04,000
They are explicit boundaries.

894
00:39:04,000 --> 00:39:07,960
If the request touches privileged access, refuse and route to human approval.

895
00:39:07,960 --> 00:39:12,160
If the user lacks entitlement, refuse and provide the self-service path.

896
00:39:12,160 --> 00:39:15,600
If the context is ambiguous, refuse and ask for clarification.

897
00:39:15,600 --> 00:39:20,160
If the system of record is unavailable, refuse and create a fallback ticket.

898
00:39:20,160 --> 00:39:21,160
Refusal is a feature.

899
00:39:21,160 --> 00:39:23,160
It's the control plane proving it exists.

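The pattern described here — run executable checks before any state change, and turn every failure into an explicit route rather than an improvisation — can be sketched like this. Role names, states, and the risk threshold are made up for illustration:

```python
def precondition_gate(request):
    """Runtime gates, not documentation. Each check pairs a condition with a
    routing decision; the first failure stops execution. Names are illustrative."""
    checks = [
        (request["user_roles"] >= {"helpdesk"}, "route_to_human_approval"),
        (request["ticket_state"] == "open",     "refuse_wrong_state"),
        (request["identity_risk"] < 0.3,        "route_to_human_approval"),
        (request["approval_present"],           "refuse_needs_approval"),
    ]
    for passed, on_fail in checks:
        if not passed:
            return on_fail   # refuse or escalate -- never pick an adjacent tool
    return "execute"

req = {
    "user_roles": {"helpdesk", "reader"},
    "ticket_state": "open",
    "identity_risk": 0.1,
    "approval_present": True,
}
print(precondition_gate(req))  # execute
```

Note that every failure branch returns a route, not an apology: that's what makes refusal a feature rather than a dead end.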
900
00:39:23,160 --> 00:39:27,720
Then comes separation of concerns because contracts only work when the architecture respects

901
00:39:27,720 --> 00:39:28,720
them.

902
00:39:28,720 --> 00:39:30,840
Intent capture lives in the conversation layer.

903
00:39:30,840 --> 00:39:32,880
Decisioning lives in the orchestration layer.

904
00:39:32,880 --> 00:39:34,680
Execution lives in deterministic tools.

905
00:39:34,680 --> 00:39:36,120
Recording lives in the system of record.

906
00:39:36,120 --> 00:39:40,360
If you collapse those layers back into a chat transcript, you'll get the same failure again,

907
00:39:40,360 --> 00:39:41,640
just with nicer phrasing.

908
00:39:41,640 --> 00:39:44,480
So on Monday, you don't start by improving the agent.

909
00:39:44,480 --> 00:39:48,680
You start by writing three to five intent contracts for one outcome and you enforce them

910
00:39:48,680 --> 00:39:52,600
with decision boundaries the model cannot talk its way around.

911
00:39:52,600 --> 00:39:54,400
And you'll notice something immediately.

912
00:39:54,400 --> 00:39:58,480
The agent becomes less magical and more useful because it becomes predictable.

913
00:39:58,480 --> 00:40:02,600
Now contracts alone still don't save you because contracts without identity discipline

914
00:40:02,600 --> 00:40:04,280
are just aspirational text.

915
00:40:04,280 --> 00:40:06,680
So the next question isn't what can the agent do.

916
00:40:06,680 --> 00:40:09,360
It's the one nobody wants to answer in a steering committee.

917
00:40:09,360 --> 00:40:11,120
Who does the agent run as?

918
00:40:11,120 --> 00:40:12,440
Identity as control plane.

919
00:40:12,440 --> 00:40:14,080
Who does the agent run as?

920
00:40:14,080 --> 00:40:18,880
If you want one question that exposes whether an agent program is real or just a pilot hobby,

921
00:40:18,880 --> 00:40:19,880
it's this.

922
00:40:19,880 --> 00:40:20,880
Who does the agent run as?

923
00:40:20,880 --> 00:40:21,880
Not who built it.

924
00:40:21,880 --> 00:40:23,840
Not who owns the Teams channel.

925
00:40:23,840 --> 00:40:25,560
Not who pays for the license.

926
00:40:25,560 --> 00:40:30,520
At runtime, what identity actually calls the tool, touches the data, and changes state?

927
00:40:30,520 --> 00:40:32,520
Because the agent doesn't act as a concept.

928
00:40:32,520 --> 00:40:33,520
It acts as a token.

929
00:40:33,520 --> 00:40:38,760
And tokens obey the rules of Entra, conditional access, connectors, or whatever accidental

930
00:40:38,760 --> 00:40:41,800
privilege you left lying around in the tenant.

931
00:40:41,800 --> 00:40:45,680
There are two dominant execution models and both come with consequences you don't get

932
00:40:45,680 --> 00:40:46,920
to negotiate.

933
00:40:46,920 --> 00:40:49,480
Model one, user delegated execution.

934
00:40:49,480 --> 00:40:52,760
That means the agent calls tools using the signed in user's permissions.

935
00:40:52,760 --> 00:40:54,560
It can only retrieve what they can retrieve.

936
00:40:54,560 --> 00:40:56,600
It can only update what they can update.

937
00:40:56,600 --> 00:41:00,520
In enterprise terms, this is the least surprising model because it preserves the existing access

938
00:41:00,520 --> 00:41:01,520
graph.

939
00:41:01,520 --> 00:41:03,480
It also gives you the cleanest accountability story.

940
00:41:03,480 --> 00:41:08,160
The user asked, the user had access, the action happened. But the cost is operational fragility.

941
00:41:08,160 --> 00:41:11,800
User delegated execution inherits every problem in human identity.

942
00:41:11,800 --> 00:41:16,960
MFA prompts, session expiry, conditional access changes, device compliance, role changes

943
00:41:16,960 --> 00:41:17,960
and licensing mismatches.

944
00:41:17,960 --> 00:41:22,080
And it also means two users can get two different outcomes from the same request because

945
00:41:22,080 --> 00:41:23,560
their entitlements differ.

946
00:41:23,560 --> 00:41:25,520
That's not AI inconsistency.

947
00:41:25,520 --> 00:41:26,920
That's identity truth.

948
00:41:26,920 --> 00:41:30,840
And when people complain that an agent is unreliable, half the time they're describing

949
00:41:30,840 --> 00:41:32,960
authorization variability.

950
00:41:32,960 --> 00:41:35,840
Model two, service-executed execution.

951
00:41:35,840 --> 00:41:38,920
That means the agent calls tools under a non-human identity.

952
00:41:38,920 --> 00:41:43,520
A service principal, managed identity, or a dedicated run-as account with fixed permissions.

953
00:41:43,520 --> 00:41:48,000
This is the model teams choose when they want consistent behavior across users or when

954
00:41:48,000 --> 00:41:51,240
the workflow must run without a specific person present.

955
00:41:51,240 --> 00:41:55,560
It is also the model that quietly creates the biggest blast radius in your environment.

956
00:41:55,560 --> 00:41:59,920
Because now the agent has a stable, reusable identity with standing privilege.

957
00:41:59,920 --> 00:42:02,520
If you overscope it, you didn't just create an agent.

958
00:42:02,520 --> 00:42:05,640
You created an automation back door with a chat interface.

959
00:42:05,640 --> 00:42:09,800
And once that exists, the question shifts from "can the user do this" to "can the agent do

960
00:42:09,800 --> 00:42:12,440
this," which is exactly the wrong direction for governance.

961
00:42:12,440 --> 00:42:13,840
So the rule is simple.

962
00:42:13,840 --> 00:42:15,880
Least privilege isn't a policy document.

963
00:42:15,880 --> 00:42:19,000
It's an architectural boundary enforced by identity design.

964
00:42:19,000 --> 00:42:22,240
Every tool you expose has to bind to an identity model intentionally.

965
00:42:22,240 --> 00:42:25,440
If it runs as the user, you accept variance and you design for it.

966
00:42:25,440 --> 00:42:29,080
If it runs as a service, you constrain permissions to the minimum contract set and you treat

967
00:42:29,080 --> 00:42:30,440
changes like code.

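That binding can be written down per intent contract rather than left as tribal knowledge. A sketch of the decision as data — the intent names and scope strings are hypothetical, not real Graph permissions:

```python
# Per-intent identity binding: user-delegated inherits the caller's access
# (variance accepted by design); service-executed gets a fixed, minimal,
# enumerated scope set that is reviewed like code.
IDENTITY_BINDINGS = {
    "create_service_request": {"run_as": "user"},
    "send_weekly_digest": {
        "run_as": "service",
        "scopes": {"Tickets.Read", "Chat.Send"},   # least privilege, enumerated
    },
}

def validate_binding(intent, requested_scopes=frozenset()):
    binding = IDENTITY_BINDINGS[intent]
    if binding["run_as"] == "user":
        # The user's own token decides; two users may legitimately get two
        # different outcomes. That's identity truth, not AI flakiness.
        return "delegated: defer to caller's entitlements"
    extra = requested_scopes - binding["scopes"]
    if extra:
        # Treat any scope beyond the contract as a change request, not a tweak.
        raise PermissionError(f"scope creep detected: {sorted(extra)}")
    return "service: fixed identity, fixed scopes"

print(validate_binding("send_weekly_digest", frozenset({"Tickets.Read"})))
```

The useful property is that scope creep becomes an error you can catch in review, instead of an accidental privilege someone discovers later.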
968
00:42:30,440 --> 00:42:33,840
Now layer Entra on top of this because Entra is not a branding exercise.

969
00:42:33,840 --> 00:42:39,280
It is the policy engine that decides whether your architecture exists at runtime.

970
00:42:39,280 --> 00:42:43,960
Conditional access policies will shape agent behavior in ways builders often don't predict.

971
00:42:43,960 --> 00:42:45,840
Step-up on sign-in risk.

972
00:42:45,840 --> 00:42:47,160
Device posture.

973
00:42:47,160 --> 00:42:48,360
Location constraints.

974
00:42:48,360 --> 00:42:49,360
Session lifetime.

975
00:42:49,360 --> 00:42:50,880
Token issuance controls.

976
00:42:50,880 --> 00:42:53,680
The agent doesn't get exceptions because the maker is excited.

977
00:42:53,680 --> 00:42:58,480
If your action path depends on a token that CA will sometimes deny, your agent is probabilistic

978
00:42:58,480 --> 00:43:00,200
before the model even speaks.

979
00:43:00,200 --> 00:43:02,000
And then there's the slow killer.

980
00:43:02,000 --> 00:43:03,000
Identity drift.

981
00:43:03,000 --> 00:43:05,520
Over time, access reviews don't happen.

982
00:43:05,520 --> 00:43:07,600
People get added to groups temporarily.

983
00:43:07,600 --> 00:43:10,640
App registrations accumulate permissions because it unblocked a project.

984
00:43:10,640 --> 00:43:11,640
Owners leave.

985
00:43:11,640 --> 00:43:13,040
Secrets expire.

986
00:43:13,040 --> 00:43:14,360
Service accounts get reused.

987
00:43:14,360 --> 00:43:16,680
You don't notice because the system still mostly works.

988
00:43:16,680 --> 00:43:20,080
Then you add an agent and it starts traversing those pathways at scale.

989
00:43:20,080 --> 00:43:22,040
The agent doesn't introduce new entropy.

990
00:43:22,040 --> 00:43:23,640
It operationalizes the entropy

991
00:43:23,640 --> 00:43:24,640
you already tolerated.

992
00:43:24,640 --> 00:43:26,960
So Monday morning identity work is brutally specific.

993
00:43:26,960 --> 00:43:29,200
You decide the run-as model per intent contract.

994
00:43:29,200 --> 00:43:32,000
You document it as part of the contract, not as tribal knowledge.

995
00:43:32,000 --> 00:43:36,280
You prove it with testing under different user profiles because "works for me" is not an

996
00:43:36,280 --> 00:43:37,760
identity strategy.

997
00:43:37,760 --> 00:43:41,880
And you establish an access review cadence for the identities that matter because agents

998
00:43:41,880 --> 00:43:44,160
don't stay inside the boundaries you hoped for.

999
00:43:44,160 --> 00:43:46,520
They stay inside the boundaries you enforced.

1000
00:43:46,520 --> 00:43:50,280
Once you answer who does it run as, a second truth becomes obvious.

1001
00:43:50,280 --> 00:43:53,800
Access determines what knowledge is even possible for the agent to retrieve.

1002
00:43:53,800 --> 00:43:57,720
And that's where context architecture stops being a content project and becomes an engineering

1003
00:43:57,720 --> 00:43:59,000
discipline.

1004
00:43:59,000 --> 00:44:04,920
Context architecture. Managed knowledge beats "attach SharePoint and pray." Once identity is explicit,

1005
00:44:04,920 --> 00:44:08,840
context becomes the next point of failure, not because models can't retrieve information.

1006
00:44:08,840 --> 00:44:09,840
They can.

1007
00:44:09,840 --> 00:44:14,120
The failure is that enterprises treat knowledge as an attachment, not as a managed product

1008
00:44:14,120 --> 00:44:17,840
and that mistake scales beautifully because the fastest way to ship an agent is to point

1009
00:44:17,840 --> 00:44:21,280
at SharePoint, dump in a few PDFs and a site or two, and call it done.

1010
00:44:21,280 --> 00:44:22,280
It feels like progress.

1011
00:44:22,280 --> 00:44:26,800
The agent starts answering questions, people see citations, everyone relaxes.

1012
00:44:26,800 --> 00:44:30,880
Then the boring questions arrive: which version of that policy did it use, who owns that page,

1013
00:44:30,880 --> 00:44:32,240
what changed last week.

1014
00:44:32,240 --> 00:44:35,280
Why did the agent quote something that was replaced three months ago?

1015
00:44:35,280 --> 00:44:38,200
Why does it contradict what the service desk tells users?

1016
00:44:38,200 --> 00:44:39,520
This is the uncomfortable truth.

1017
00:44:39,520 --> 00:44:42,520
Retrieval is only as safe as the knowledge lifecycle behind it.

1018
00:44:42,520 --> 00:44:45,200
So context architecture starts with a redefinition.

1019
00:44:45,200 --> 00:44:46,520
Knowledge sources aren't sources.

1020
00:44:46,520 --> 00:44:50,720
They're dependencies, and dependencies require ownership, life cycle, change control and version

1021
00:44:50,720 --> 00:44:51,720
discipline.

1022
00:44:51,720 --> 00:44:55,960
Without those, an agent becomes a high-speed amplifier for outdated guidance.

1023
00:44:56,960 --> 00:44:59,040
The first design rule is separation.

1024
00:44:59,040 --> 00:45:01,280
Static knowledge is not operational data.

1025
00:45:01,280 --> 00:45:04,800
Static knowledge is policy, procedures, FAQs and reference material.

1026
00:45:04,800 --> 00:45:06,760
It changes but it changes relatively slowly.

1027
00:45:06,760 --> 00:45:09,280
It should be curated, reviewed and attributable.

1028
00:45:09,280 --> 00:45:13,080
It should have a defined effective date concept even if it's informal.

1029
00:45:13,080 --> 00:45:17,360
Most importantly, it should be treated as a product with an owner who cares when it's wrong.

1030
00:45:17,360 --> 00:45:22,400
Operational data is tickets, approvals, user records, asset state, entitlements and anything

1031
00:45:22,400 --> 00:45:24,000
that represents current truth.

1032
00:45:24,000 --> 00:45:25,920
You don't want that as documents.

1033
00:45:25,920 --> 00:45:31,320
You want it through deterministic tools with schema, with validation and with authorization

1034
00:45:31,320 --> 00:45:32,320
boundaries.

1035
00:45:32,320 --> 00:45:36,120
Operational data belongs in the execution and record layers, not in a pile of files the

1036
00:45:36,120 --> 00:45:37,320
model rummages through.

1037
00:45:37,320 --> 00:45:41,720
When teams mix these, they get an agent that confidently answers with yesterday's reality

1038
00:45:41,720 --> 00:45:44,040
and then executes against today's systems.

1039
00:45:44,040 --> 00:45:45,280
That gap creates incidents.

1040
00:45:45,280 --> 00:45:46,880
The second rule is bounding.

1041
00:45:46,880 --> 00:45:48,240
Context has to be intentionally narrow.

1042
00:45:48,240 --> 00:45:51,360
The human impulse is to say, "Give it everything so it has the best chance."

1043
00:45:51,360 --> 00:45:56,680
That's exactly backwards. Overbroad context increases ambiguity and latency and it drives

1044
00:45:56,680 --> 00:46:00,520
the model into probabilistic selection: which chunk matters, which policy applies, which

1045
00:46:00,520 --> 00:46:02,320
document is authoritative.

1046
00:46:02,320 --> 00:46:04,160
More context doesn't mean more accuracy.

1047
00:46:04,160 --> 00:46:06,360
It means more opportunities to be wrong with confidence.

1048
00:46:06,360 --> 00:46:12,280
So instead of "the HR SharePoint," context needs to look like: this policy set, these procedures,

1049
00:46:12,280 --> 00:46:17,000
these approved templates, this set of known definitions, this glossary of terms: bounded,

1050
00:46:17,000 --> 00:46:18,320
versioned and attributable.

1051
00:46:18,320 --> 00:46:22,080
That's also how you prevent cross-business unit contamination, where one group's process

1052
00:46:22,080 --> 00:46:26,040
becomes the answer for everyone because the agent found it first.

1053
00:46:26,040 --> 00:46:27,440
The third rule is life cycle.

1054
00:46:27,440 --> 00:46:29,680
If the agent can cite it, someone has to maintain it.

1055
00:46:29,680 --> 00:46:34,040
That means when a policy changes, the knowledge artifact is updated, the old one is archived

1056
00:46:34,040 --> 00:46:38,520
or marked clearly and the agent is re-evaluated against the intents that depend on it, not as

1057
00:46:38,520 --> 00:46:40,280
a nice-to-have, but as a release step.

1058
00:46:40,280 --> 00:46:42,080
If that sounds heavy, good.

1059
00:46:42,080 --> 00:46:43,920
Enterprises already do this for code.

1060
00:46:43,920 --> 00:46:45,560
Context is now part of the runtime.

1061
00:46:45,560 --> 00:46:46,760
Therefore it gets the same discipline.

1062
00:46:46,760 --> 00:46:48,440
Now, a practical warning.

1063
00:46:48,440 --> 00:46:51,520
PDF-heavy approaches are a performance and reliability tax.

1064
00:46:51,520 --> 00:46:56,160
PDFs tend to be long, inconsistent in structure and full of duplicated sections.

1065
00:46:56,160 --> 00:46:58,680
Retrieval becomes slow, chunking becomes sloppy.

1066
00:46:58,680 --> 00:47:02,440
The model retrieves partial paragraphs that lose the conditions and exceptions that made

1067
00:47:02,440 --> 00:47:03,840
the policy safe.

1068
00:47:03,840 --> 00:47:07,960
And when response latency increases, you hit the exact failure modes people are seeing

1069
00:47:07,960 --> 00:47:12,840
in real deployments: timeouts, non-responses, fallback behavior, and users resending the same

1070
00:47:12,840 --> 00:47:15,880
request until the agent does something twice.

1071
00:47:15,880 --> 00:47:20,840
So if the context is important enough to govern behavior, it's important enough to structure:

1072
00:47:20,840 --> 00:47:25,900
convert key guidance into stable pages, structured content, or curated knowledge entries with

1073
00:47:25,900 --> 00:47:27,760
clear titles and scope.

1074
00:47:27,760 --> 00:47:30,920
Treat the source of truth as a design choice, not an accident.

1075
00:47:30,920 --> 00:47:32,480
The last rule is attribution.

1076
00:47:32,480 --> 00:47:36,200
The agent must be able to say where it got the answer and the organization must be able

1077
00:47:36,200 --> 00:47:38,800
to trace that source back to an owner and a version.

1078
00:47:38,800 --> 00:47:40,320
Otherwise, you don't have knowledge.

1079
00:47:40,320 --> 00:47:41,520
You have plausible text.

1080
00:47:41,520 --> 00:47:46,760
This is how context becomes an architectural asset instead of an embarrassment in audit meetings.

1081
00:47:46,760 --> 00:47:48,600
So the Monday move is simple.

1082
00:47:48,600 --> 00:47:53,320
Stop thinking about knowledge as things we attach and start thinking about it as managed

1083
00:47:53,320 --> 00:47:55,120
context with boundaries.

1084
00:47:55,120 --> 00:47:59,920
Curate it, version it, own it, test it. Because once context is bounded and trustworthy, tool

1085
00:47:59,920 --> 00:48:04,600
design becomes the real enforcement layer and tools are where determinism either exists

1086
00:48:04,600 --> 00:48:07,080
or collapses back into improvisation.

1087
00:48:07,080 --> 00:48:08,720
Tool-first routing.

1088
00:48:08,720 --> 00:48:10,880
Tools are contracts, not features.

1089
00:48:10,880 --> 00:48:14,160
Once context is bounded, the next failure point is predictable.

1090
00:48:14,160 --> 00:48:15,160
Tools.

1091
00:48:15,160 --> 00:48:16,760
Most teams treat tools like features.

1092
00:48:16,760 --> 00:48:18,520
Let's connect Outlook.

1093
00:48:18,520 --> 00:48:20,520
Let's add ServiceNow.

1094
00:48:20,520 --> 00:48:21,920
Let's give it Dataverse.

1095
00:48:21,920 --> 00:48:25,120
As if the agent is collecting plugins like a browser.

1096
00:48:25,120 --> 00:48:27,480
That thinking is why agents drift into chaos.

1097
00:48:27,480 --> 00:48:29,800
In an enterprise, a tool is not a capability.

1098
00:48:29,800 --> 00:48:30,800
It's a contract.

1099
00:48:30,800 --> 00:48:35,000
It is an explicitly exposed pathway to change state in a system you already struggle to

1100
00:48:35,000 --> 00:48:36,000
govern.

1101
00:48:36,000 --> 00:48:37,000
So tool-first routing means this.

1102
00:48:37,000 --> 00:48:40,360
You design the tool surface area before you design the conversation.

1103
00:48:40,360 --> 00:48:43,400
Because the conversation is just how users request contracts.

1104
00:48:43,400 --> 00:48:44,960
The contracts are what actually happen.

1105
00:48:44,960 --> 00:48:47,200
A tool contract has four parts.

1106
00:48:47,200 --> 00:48:50,440
Purpose, inputs, outputs and failure modes.

1107
00:48:50,440 --> 00:48:52,600
Purpose is not "integrates with ServiceNow."

1108
00:48:52,600 --> 00:48:56,880
Purpose is create incident with these required fields in these assignment groups under these

1109
00:48:56,880 --> 00:48:57,880
conditions.

1110
00:48:57,880 --> 00:48:59,880
Inputs are typed parameters.

1111
00:48:59,880 --> 00:49:05,120
Ticket category, affected service, urgency, request identity, business unit.

1112
00:49:05,120 --> 00:49:06,440
Outputs are not "success."

1113
00:49:06,440 --> 00:49:10,360
Outputs are record ID, state, timestamps, and any downstream workflow status.

1114
00:49:10,360 --> 00:49:15,000
And failure modes are where adult architecture lives: permission denied, validation error,

1115
00:49:15,000 --> 00:49:20,200
throttling, timeout, duplicate detection, partial success and system unavailable.
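The four-part tool contract described above can be sketched as a typed structure. This is an illustrative sketch, not a Copilot Studio API; all names (`ToolContract`, `create_incident`, the field values) are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolContract:
    """A tool is a contract: purpose, typed inputs, structured outputs, named failure modes."""
    purpose: str        # e.g. "create incident with these required fields under these conditions"
    inputs: dict        # parameter name -> expected type
    outputs: tuple      # structured result fields, never a bare "success"
    failure_modes: tuple  # every way the call can fail, named up front

# Hypothetical contract for the "create incident" example from the episode.
create_incident = ToolContract(
    purpose="Create incident with required fields in approved assignment groups",
    inputs={"ticket_category": str, "affected_service": str,
            "urgency": str, "requester_identity": str, "business_unit": str},
    outputs=("record_id", "state", "timestamps", "workflow_status"),
    failure_modes=("permission_denied", "validation_error", "throttling",
                   "timeout", "duplicate_detected", "partial_success",
                   "system_unavailable"),
)
```

The point of making the contract a value like this is that it can be reviewed, versioned, and diffed in change control, exactly like code.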

1116
00:49:20,200 --> 00:49:23,760
If you can't name the failure modes, you can't operate the agent.

1117
00:49:23,760 --> 00:49:24,760
You're just hoping.

1118
00:49:24,760 --> 00:49:27,880
This is also where deterministic execution actually gets enforced.

1119
00:49:27,880 --> 00:49:29,200
The model can reason all day.

1120
00:49:29,200 --> 00:49:33,000
The only thing that matters is whether the action layer accepts or rejects the request

1121
00:49:33,000 --> 00:49:34,400
based on explicit rules.

1122
00:49:34,400 --> 00:49:38,080
That's why Power Automate, Logic Apps, and API-backed actions belong here.

1123
00:49:38,080 --> 00:49:43,240
They validate inputs, they apply fixed mappings, they return structured errors and they can be

1124
00:49:43,240 --> 00:49:45,200
wrapped in idempotent patterns.

1125
00:49:45,200 --> 00:49:48,160
So retries don't create duplicate side effects.

1126
00:49:48,160 --> 00:49:50,640
An agent calling a tool is not automation.

1127
00:49:50,640 --> 00:49:54,120
It's invoking a contract, so the allow list matters more than the prompt.

1128
00:49:54,120 --> 00:49:57,080
Tool-first routing starts by shrinking the surface area.

1129
00:49:57,080 --> 00:49:59,520
Enterprises love broad tools because they feel flexible.

1130
00:49:59,520 --> 00:50:02,320
Let the agent read and write anything in Dataverse.

1131
00:50:02,320 --> 00:50:04,760
Give it access to all SharePoint sites.

1132
00:50:04,760 --> 00:50:07,040
Allow create, update, delete so it can help.

1133
00:50:07,040 --> 00:50:09,480
That flexibility is just unpriced risk.

1134
00:50:09,480 --> 00:50:11,280
The principle is simple.

1135
00:50:11,280 --> 00:50:15,680
Prohibit create, update and delete unless the business sponsor can explain the exact

1136
00:50:15,680 --> 00:50:19,560
outcome metric it supports and the exact audit evidence it will produce.

1137
00:50:19,560 --> 00:50:21,280
Read only tools are your default.

1138
00:50:21,280 --> 00:50:22,600
Write tools are exceptions.

1139
00:50:22,600 --> 00:50:24,560
Delete tools are almost never defensible.

1140
00:50:24,560 --> 00:50:29,600
And when you do allow write actions, you split them into narrow single purpose operations.

1141
00:50:29,600 --> 00:50:33,440
Not "update ticket"; instead: set ticket state from new to in progress.

1142
00:50:33,440 --> 00:50:35,560
Add work note with a required template.

1143
00:50:35,560 --> 00:50:37,680
Assign to group from an approved list.

1144
00:50:37,680 --> 00:50:38,840
Each one becomes governable.
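One way to sketch splitting a broad write into the narrow, single-purpose operations described above. Function names, the allowed transition, and the approved group list are all hypothetical.

```python
# Instead of one broad "update_ticket" write, expose narrow single-purpose
# operations, each with its own validation. All names are illustrative.
APPROVED_GROUPS = {"ServiceDesk-L1", "ServiceDesk-L2"}

def set_ticket_state(ticket, new_state):
    # Only the one transition the business approved, nothing else.
    if (ticket["state"], new_state) != ("new", "in_progress"):
        raise ValueError("transition not allowed")
    ticket["state"] = new_state
    return ticket

def assign_to_group(ticket, group):
    # Assignment only from an explicit approved list, not free text.
    if group not in APPROVED_GROUPS:
        raise ValueError("group not on approved list")
    ticket["assignment_group"] = group
    return ticket

ticket = set_ticket_state({"state": "new", "assignment_group": None}, "in_progress")
ticket = assign_to_group(ticket, "ServiceDesk-L1")
```

Each operation is small enough to allow-list, audit, and revoke independently, which is what "governable" means in practice.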

1145
00:50:38,840 --> 00:50:40,400
Now tool selection policy.

1146
00:50:40,400 --> 00:50:44,360
This is the part everyone hand-waves with "the model will choose the right action."

1147
00:50:44,360 --> 00:50:45,360
That's not governance.

1148
00:50:45,360 --> 00:50:47,360
That's gambling with a nicer UI.

1149
00:50:47,360 --> 00:50:50,480
Tool selection has to be deterministic from the organization's point of view.

1150
00:50:50,480 --> 00:50:54,280
Which means you either route based on intent contracts and preconditions or you don't

1151
00:50:54,280 --> 00:50:55,280
route at all.

1152
00:50:55,280 --> 00:50:58,000
If two tools can satisfy the same intent you pick one.

1153
00:50:58,000 --> 00:51:01,520
We don't let the agent improvise between them based on wording.

1154
00:51:01,520 --> 00:51:05,400
This is why the orchestration layer should behave like a policy router.

1155
00:51:05,400 --> 00:51:10,540
Given intent X and verified context Y, call tool Z with payload schema S.

1156
00:51:10,540 --> 00:51:14,720
If the context doesn't satisfy the preconditions the router doesn't try something else.

1157
00:51:14,720 --> 00:51:16,040
It refuses or escalates.
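The policy-router rule — given intent X and verified context Y, call tool Z, else refuse or escalate — can be sketched as a lookup table rather than a model choice. The route table and names are illustrative assumptions, not a Copilot Studio feature.

```python
# Deterministic policy router: intent -> (preconditions, tool). The model never
# picks between tools; the table does. All entries are illustrative.
ROUTES = {
    "reset_password": {
        "preconditions": {"identity_verified", "account_active"},
        "tool": "reset_password_tool",
    },
}

def route(intent, verified_context):
    rule = ROUTES.get(intent)
    if rule is None:
        return ("refuse", None)            # unknown intent: no improvisation
    if not rule["preconditions"] <= verified_context:
        return ("escalate", None)          # preconditions unmet: hand off, don't try something else
    return ("invoke", rule["tool"])        # one intent, one tool, one path

decision = route("reset_password", {"identity_verified", "account_active"})
```

If two tools could satisfy the same intent, the table forces you to pick one at design time, which is exactly the discipline the episode calls for.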

1158
00:51:16,040 --> 00:51:19,840
And yes, Copilot Studio gives you multiple ways to implement this.

1159
00:51:19,840 --> 00:51:25,840
Topics, tool definitions, agent flows, and emerging protocols like MCP for controlled tool invocation.

1160
00:51:25,840 --> 00:51:27,440
The mechanism isn't the point.

1161
00:51:27,440 --> 00:51:31,320
The point is that tool invocation is a governed decision, not a creative act.

1162
00:51:31,320 --> 00:51:33,560
Now add guardrails that people pretend are optional.

1163
00:51:33,560 --> 00:51:36,600
First require confirmation for high impact actions.

1164
00:51:36,600 --> 00:51:37,600
Not an "are you sure?"

1165
00:51:37,600 --> 00:51:38,840
in a friendly sentence.

1166
00:51:38,840 --> 00:51:42,600
A structured confirmation that restates the action in precise terms.

1167
00:51:42,600 --> 00:51:43,600
What will change?

1168
00:51:43,600 --> 00:51:44,600
Where?

1169
00:51:44,600 --> 00:51:46,960
Under which identity and what record will be written?
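A structured confirmation that restates the action in precise terms — what changes, where, under which identity, and what record gets written — might look like the sketch below. The function and field names are assumptions for illustration.

```python
def build_confirmation(action, target_system, identity, record):
    # Restate the action in precise terms before commit; the user approves
    # this structure, not a friendly sentence. Field names are illustrative.
    return {
        "what_will_change": action,
        "where": target_system,
        "under_identity": identity,
        "record_to_write": record,
        "requires_explicit_approval": True,
    }

confirmation = build_confirmation(
    action="set ticket INC0012345 state: new -> in_progress",
    target_system="ServiceNow (prod)",
    identity="svc-agent-helpdesk",
    record="incident.state",
)
```

Because the confirmation is structured, it can also be logged verbatim as the approval evidence the audit layer needs.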

1170
00:51:46,960 --> 00:51:49,080
Second, separate preparation from commit.

1171
00:51:49,080 --> 00:51:53,400
Let the model assemble the request, validate the parameters and present the plan.

1172
00:51:53,400 --> 00:51:56,880
Then commit through deterministic execution only after preconditions and confirmations

1173
00:51:56,880 --> 00:51:57,880
are satisfied.

1174
00:51:57,880 --> 00:52:02,560
Third, log every tool call with enough detail to reconstruct reality later.

1175
00:52:02,560 --> 00:52:06,880
If you can't answer "what happened" without rereading a chat transcript, you didn't log.

1176
00:52:06,880 --> 00:52:08,640
You performed theater.

1177
00:52:08,640 --> 00:52:10,960
Tool-first routing is how you keep the agent honest.

1178
00:52:10,960 --> 00:52:15,560
You stop asking what can it do and start asking what contracts are we willing to expose under

1179
00:52:15,560 --> 00:52:17,640
which identities with which evidence.

1180
00:52:17,640 --> 00:52:21,320
That's how you build an agent that can act without becoming a liability.

1181
00:52:21,320 --> 00:52:26,360
And once the tool surface is explicit, orchestration stops being a vague agent flow.

1182
00:52:26,360 --> 00:52:30,960
It becomes a structure you can test, version and keep small on purpose.

1183
00:52:30,960 --> 00:52:31,960
Orchestration design.

1184
00:52:31,960 --> 00:52:35,120
Topics, flows and the minimal conversation surface.

1185
00:52:35,120 --> 00:52:38,760
Once tools are contracts, orchestration is the thing that prevents those contracts from

1186
00:52:38,760 --> 00:52:43,560
turning into a bucket of sharp objects sitting on a desk labeled AI.

1187
00:52:43,560 --> 00:52:47,760
Orchestration is how you decide repeatedly which contract is allowed to run in which order,

1188
00:52:47,760 --> 00:52:49,840
with which inputs and with what commit points.

1189
00:52:49,840 --> 00:52:53,800
It's the difference between a governed service and a chat session that happens to touch production.

1190
00:52:53,800 --> 00:52:57,120
In Copilot Studio terms, people will argue about mechanisms.

1191
00:52:57,120 --> 00:53:01,800
Topics versus generative orchestration, agent flows versus power automate, actions versus

1192
00:53:01,800 --> 00:53:05,040
prompts, MCP versus connectors.

1193
00:53:05,040 --> 00:53:07,040
Mechanisms are implementation details.

1194
00:53:07,040 --> 00:53:11,040
Orchestration is the structure you impose so the mechanism can't drift into improvisation.

1195
00:53:11,040 --> 00:53:15,520
Start with topics because topics are still the cleanest way to make a path repeatable when

1196
00:53:15,520 --> 00:53:16,840
the outcome matters.

1197
00:53:16,840 --> 00:53:18,960
A topic is not old school bot design.

1198
00:53:18,960 --> 00:53:21,880
It's a deterministic route through a known decision boundary.

1199
00:53:21,880 --> 00:53:26,520
If the intent triggers a state change, you want the predictable container: trigger, parameter

1200
00:53:26,520 --> 00:53:32,920
collection, validation, tool invocation, confirmation, record update, and a final response that reflects

1201
00:53:32,920 --> 00:53:34,400
the actual outcome.

1202
00:53:34,400 --> 00:53:38,080
And the design rule is simple: structured topics for execution paths.

1203
00:53:38,080 --> 00:53:42,600
Use freeform conversation where you can tolerate variance: clarifying questions, knowledge lookup,

1204
00:53:42,600 --> 00:53:43,600
and triage.

1205
00:53:43,600 --> 00:53:46,960
But don't let freeform language drive execution branching because that's how you get the

1206
00:53:46,960 --> 00:53:50,680
same request going down different paths depending on phrasing, context chunks, or which

1207
00:53:50,680 --> 00:53:52,440
prompt got edited last week.

1208
00:53:52,440 --> 00:53:55,880
So the conversation surface becomes minimal on purpose.

1209
00:53:55,880 --> 00:54:01,360
The agent should ask only for missing parameters, not for storytelling, not for "help me understand,"

1210
00:54:01,360 --> 00:54:04,760
not for open-ended dialogue that makes the user feel heard while the system collects

1211
00:54:04,760 --> 00:54:06,360
unreliable inputs.

1212
00:54:06,360 --> 00:54:09,360
Minimal conversation means: if a parameter is required, ask for it.

1213
00:54:09,360 --> 00:54:10,560
If it isn't required don't.

1214
00:54:10,560 --> 00:54:13,760
If a parameter can be derived deterministically, derive it.

1215
00:54:13,760 --> 00:54:14,760
Don't ask.

1216
00:54:14,760 --> 00:54:19,040
If ambiguity exists, the agent should say so explicitly and resolve the ambiguity before

1217
00:54:19,040 --> 00:54:20,040
it touches tools.

1218
00:54:20,040 --> 00:54:23,480
That sounds strict because it is. Execution needs strictness.

1219
00:54:23,480 --> 00:54:24,480
Now flows.

1220
00:54:24,480 --> 00:54:28,440
Flows are where long-running work stops being a chat problem and becomes a systems problem,

1221
00:54:28,440 --> 00:54:29,960
which is where it belongs.

1222
00:54:29,960 --> 00:54:34,080
If the action takes more than a few seconds (document generation, multi-system updates, approvals,

1223
00:54:34,080 --> 00:54:36,520
provisioning), don't trap it inside a conversational loop.

1224
00:54:36,520 --> 00:54:40,600
Kick off a flow, return a receipt, provide a status check, and end the turn.

1225
00:54:40,600 --> 00:54:42,480
And the hidden benefit is operational.

1226
00:54:42,480 --> 00:54:46,680
Flows can be built with idempotency patterns, correlation IDs, retries, and timeouts that

1227
00:54:46,680 --> 00:54:47,760
chat cannot give you.

1228
00:54:47,760 --> 00:54:51,880
If the user asks twice, the system should detect same request already in progress and return

1229
00:54:51,880 --> 00:54:53,840
status, not run the action twice.

1230
00:54:53,840 --> 00:54:57,480
That's not an optimization, that's how you avoid double execution incidents.
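The "same request already in progress" behavior described above is typically enforced with an idempotency key derived from the request content. This is a minimal sketch of the pattern; in a real deployment the key and status would live in durable workflow state, not an in-memory dict.

```python
import hashlib
import json

_in_progress = {}  # idempotency key -> status; stands in for durable workflow state

def submit(request):
    # Derive a stable key from the request content so a resubmitted request
    # maps to the same unit of work instead of running twice.
    key = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()
    if key in _in_progress:
        # Duplicate: report status of the existing work, don't re-run the action.
        return {"status": _in_progress[key], "duplicate": True, "receipt": key}
    _in_progress[key] = "executing"
    return {"status": "accepted", "duplicate": False, "receipt": key}

first = submit({"intent": "provision_laptop", "user": "alice"})
second = submit({"intent": "provision_laptop", "user": "alice"})  # user asked twice
```

The second submission returns the in-progress status instead of executing again, which is precisely how double-execution incidents are avoided.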

1231
00:54:57,480 --> 00:55:01,840
So, orchestration should treat long-running execution as asynchronous by default: the agent

1232
00:55:01,840 --> 00:55:05,680
becomes the interface, the workflow becomes the engine, the system of record becomes the

1233
00:55:05,680 --> 00:55:10,120
truth. Which leads to the next design rule: status updates aren't politeness, they're

1234
00:55:10,120 --> 00:55:11,640
control.

1235
00:55:11,640 --> 00:55:17,000
If execution takes time, the agent must report state transitions: accepted, validating,

1236
00:55:17,000 --> 00:55:22,240
awaiting approval, executing, completed, failed, escalated. Those aren't progress messages;

1237
00:55:22,240 --> 00:55:27,920
they are how you prevent users from resubmitting, how you preserve trust, and how you make operations

1238
00:55:27,920 --> 00:55:28,920
debuggable.
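The state transitions listed above can be made explicit as a small validated state machine, so an illegal transition is a bug surfaced immediately rather than silent drift. The transition table is an illustrative assumption.

```python
# Explicit lifecycle states with allowed transitions; anything outside the
# table is rejected, not improvised. The table itself is illustrative.
TRANSITIONS = {
    "accepted": {"validating"},
    "validating": {"awaiting_approval", "failed"},
    "awaiting_approval": {"executing", "escalated"},
    "executing": {"completed", "failed"},
    "failed": {"escalated"},
}

def advance(state, next_state):
    # Refuse any transition the design didn't name.
    if next_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {next_state}")
    return next_state

state = advance(advance("accepted", "validating"), "awaiting_approval")
```

Reporting these named states back to the user is what stops resubmissions and keeps operations debuggable.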

1239
00:55:28,920 --> 00:55:33,120
And yes, you want retries, but you want idempotent retries, not "try again and hope."

1240
00:55:33,120 --> 00:55:36,840
If the workflow can safely retry a step, it should. If it can't, it should stop and

1241
00:55:36,840 --> 00:55:38,560
hand off with the evidence captured.

1242
00:55:38,560 --> 00:55:42,680
Now, escalation. Escalation isn't a fallback topic with a generic apology.

1243
00:55:42,680 --> 00:55:45,120
Escalation is a designed hand off where state is preserved.

1244
00:55:45,120 --> 00:55:49,400
If the agent can't proceed (missing entitlement, failed precondition, ambiguous input, system

1245
00:55:49,400 --> 00:55:54,400
down), it should produce a structured package: what the user asked, what it verified, what

1246
00:55:54,400 --> 00:55:58,600
failed, what it attempted, and what a human needs to complete the work.

1247
00:55:58,600 --> 00:56:01,880
Then route that package into the right queue with ownership and an SLA.
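The structured escalation package described above — state preserved, routed to an owned queue — can be sketched as a simple record. Field and queue names are hypothetical.

```python
def escalation_package(asked, verified, failed, attempted, needed, queue):
    # A designed hand-off preserves state: what the user asked, what was
    # verified, what failed, what was attempted, and what a human must do.
    return {
        "user_request": asked,
        "verified_context": verified,
        "failed_step": failed,
        "attempts": attempted,
        "human_action_needed": needed,
        "queue": queue,  # an owned queue with an SLA, not "try again later"
    }

pkg = escalation_package(
    asked="provision laptop for new hire",
    verified=["identity", "manager approval"],
    failed="asset system unavailable",
    attempted=["retry x2 with backoff"],
    needed="create asset record manually",
    queue="IT-Provisioning-L2",
)
```

Because the package carries everything the agent already established, the human picks up where the agent stopped instead of starting over.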

1248
00:56:01,880 --> 00:56:05,720
That's how you avoid the most common operational dead end.

1249
00:56:05,720 --> 00:56:06,880
Try again later.

1250
00:56:06,880 --> 00:56:10,760
"Try again later" is how you create duplicate tickets, inconsistent records, and user

1251
00:56:10,760 --> 00:56:16,160
workarounds. So the orchestration structure you want is almost boring: a small set of execution

1252
00:56:16,160 --> 00:56:20,400
topics with strict parameter collection, a deterministic tool invocation policy, asynchronous

1253
00:56:20,400 --> 00:56:25,920
flows for real work, explicit status transitions, idempotent retries, designed escalation with

1254
00:56:25,920 --> 00:56:29,720
state, not a shrug. And here's the part that will bother people who want agents to feel

1255
00:56:29,720 --> 00:56:30,800
magical.

1256
00:56:30,800 --> 00:56:34,760
When orchestration is done correctly, the conversation gets smaller, not bigger.

1257
00:56:34,760 --> 00:56:38,560
Because the goal isn't to simulate a helpful colleague, the goal is to run a controlled system

1258
00:56:38,560 --> 00:56:40,680
through a human-friendly interface.

1259
00:56:40,680 --> 00:56:44,600
Once you accept that, the architecture becomes obvious, and you can finally do the thing

1260
00:56:44,600 --> 00:56:47,160
most agent programs avoid until it's too late.

1261
00:56:47,160 --> 00:56:51,840
You can manage agents like a portfolio, not like a craft project.

1262
00:56:51,840 --> 00:56:55,320
Governance: agent portfolio management, not one-off botcraft.

1263
00:56:55,320 --> 00:56:57,600
Here's what happens after the first agent ships.

1264
00:56:57,600 --> 00:56:58,960
Nothing stays "the first agent."

1265
00:56:58,960 --> 00:57:02,200
It becomes a template, a precedent, and an excuse.

1266
00:57:02,200 --> 00:57:06,400
And if governance doesn't exist as an operating system, you don't get an agent program.

1267
00:57:06,400 --> 00:57:08,040
You get agent sprawl.

1268
00:57:08,040 --> 00:57:11,040
Dozens of semi-owned chat surfaces wired into production systems.

1269
00:57:11,040 --> 00:57:12,120
Each one, temporary.

1270
00:57:12,120 --> 00:57:14,120
Each one, carrying a little more security debt.

1271
00:57:14,120 --> 00:57:18,040
So governance, in agent terms, isn't a committee; it's portfolio management.

1272
00:57:18,040 --> 00:57:23,360
A portfolio means an inventory, an owner, a life cycle, and a repeatable way to decide what

1273
00:57:23,360 --> 00:57:27,360
gets built, what gets promoted, what gets retired, and what gets blocked.

1274
00:57:27,360 --> 00:57:28,360
And it needs a loop.

1275
00:57:28,360 --> 00:57:32,320
A real one: plan, implement, manage, improve, extend. Not as a slide, but as an operating rhythm

1276
00:57:32,320 --> 00:57:35,760
that survives maker turnover and leadership attention drift.

1277
00:57:35,760 --> 00:57:38,880
Plan is where the organization forces discipline.

1278
00:57:38,880 --> 00:57:43,160
Outcomes, scoped intents, risk tier, and the non-negotiables.

1279
00:57:43,160 --> 00:57:46,560
Identity model, system of record, audit requirement, and refusal rules.

1280
00:57:46,560 --> 00:57:48,680
This is also where you classify the agent.

1281
00:57:48,680 --> 00:57:50,720
Because not every agent deserves the same controls.

1282
00:57:50,720 --> 00:57:55,560
A discovery agent that only retrieves curated policy guidance isn't governed like an execution

1283
00:57:55,560 --> 00:57:57,320
agent that can mutate records.

1284
00:57:57,320 --> 00:58:00,640
If you govern them the same, you'll either suffocate harmless agents or under-govern

1285
00:58:00,640 --> 00:58:01,800
dangerous ones.

1286
00:58:01,800 --> 00:58:03,040
Both outcomes are common.

1287
00:58:03,040 --> 00:58:04,520
Both are failures.

1288
00:58:04,520 --> 00:58:07,320
Implement is where teams prove they can build within constraints.

1289
00:58:07,320 --> 00:58:10,400
Tool allow lists, environment strategy, and release gates.

1290
00:58:10,400 --> 00:58:14,120
This is where you prevent the most predictable architecture erosion.

1291
00:58:14,120 --> 00:58:17,040
Building in production because it's just a small change.

1292
00:58:17,040 --> 00:58:20,360
Agents degrade that way faster than apps because prompt edits feel harmless.

1293
00:58:20,360 --> 00:58:21,360
They aren't.

1294
00:58:21,360 --> 00:58:22,720
They're behavior changes.

1295
00:58:22,720 --> 00:58:26,200
Manage is where organizations stop pretending ownership is implicit.

1296
00:58:26,200 --> 00:58:29,360
Every agent needs an accountable owner and a technical owner.

1297
00:58:29,360 --> 00:58:33,560
Accountable means they take the heat when the agent causes rework, risk, or reputational

1298
00:58:33,560 --> 00:58:34,560
damage.

1299
00:58:34,560 --> 00:58:37,560
Technical means they can actually change the thing when it drifts.

1300
00:58:37,560 --> 00:58:39,920
If those aren't named, the agent is already orphaned.

1301
00:58:39,920 --> 00:58:41,680
It just hasn't been discovered yet.

1302
00:58:41,680 --> 00:58:43,400
Inventory is the other half of manage.

1303
00:58:43,400 --> 00:58:45,040
You need a tenant-wide list.

1304
00:58:45,040 --> 00:58:46,360
What agents exist.

1305
00:58:46,360 --> 00:58:47,520
Where they're deployed.

1306
00:58:47,520 --> 00:58:48,600
What channels they run in.

1307
00:58:48,600 --> 00:58:49,600
What data they touch.

1308
00:58:49,600 --> 00:58:53,280
What tools they can invoke and which identities they run as.

1309
00:58:53,280 --> 00:58:54,720
Without that you don't have governance.

1310
00:58:54,720 --> 00:58:57,240
You have hope.

1311
00:58:57,240 --> 00:59:00,240
Improve is where the feedback loop gets operational.

1312
00:59:00,240 --> 00:59:02,160
Not "users said it was cool."

1313
00:59:02,160 --> 00:59:03,440
No: signals.

1314
00:59:03,440 --> 00:59:04,440
Escalation patterns.

1315
00:59:04,440 --> 00:59:05,440
Failure modes.

1316
00:59:05,440 --> 00:59:06,440
Intervention rates.

1317
00:59:06,440 --> 00:59:08,320
And which intents produce exceptions.

1318
00:59:08,320 --> 00:59:10,640
And yes, this is where you do the boring work.

1319
00:59:10,640 --> 00:59:11,640
Tighten contracts.

1320
00:59:11,640 --> 00:59:12,800
Prune knowledge.

1321
00:59:12,800 --> 00:59:13,800
Remove tools.

1322
00:59:13,800 --> 00:59:14,800
Split topics.

1323
00:59:14,800 --> 00:59:15,800
Adjust refusal paths.

1324
00:59:15,800 --> 00:59:18,520
And fix the things that keep creating human cleanup.

1325
00:59:18,520 --> 00:59:22,160
Extend is where scale happens without repeating the same mistakes.

1326
00:59:22,160 --> 00:59:24,760
Standardized patterns become reusable assets.

1327
00:59:24,760 --> 00:59:26,520
Intent contract templates.

1328
00:59:26,520 --> 00:59:28,400
Tool contract patterns.

1329
00:59:28,400 --> 00:59:29,400
Environment setups.

1330
00:59:29,400 --> 00:59:30,640
And testing baselines.

1331
00:59:30,640 --> 00:59:35,520
This is also where you decide when to add advanced capability, new connectors, new channels,

1332
00:59:35,520 --> 00:59:40,000
autonomous triggers, without turning the platform into an uncontrolled feature buffet.

1333
00:59:40,000 --> 00:59:43,920
Now the governance mechanics enterprises always underestimate: environments, DLP, and

1334
00:59:43,920 --> 00:59:45,400
release gates.

1335
00:59:45,400 --> 00:59:46,680
Environments aren't bureaucracy.

1336
00:59:46,680 --> 00:59:48,280
They are blast radius control.

1337
00:59:48,280 --> 00:59:51,560
If makers can build anywhere, they will; if they can publish anywhere, they will.

1338
00:59:51,560 --> 00:59:52,560
That's not malice.

1339
00:59:52,560 --> 00:59:54,360
That's entropy.

1340
00:59:54,360 --> 00:59:56,960
So you need a clear environment strategy.

1341
00:59:56,960 --> 00:59:58,160
Development for building.

1342
00:59:58,160 --> 00:59:59,360
Test for validation.

1343
00:59:59,360 --> 01:00:00,360
Production for runtime.

1344
01:00:00,360 --> 01:00:04,760
Promotions happen through solutions and pipelines, not through copy-paste and "I changed one line

1345
01:00:04,760 --> 01:00:05,960
of instructions."

1346
01:00:05,960 --> 01:00:07,520
And the gating rule is simple.

1347
01:00:07,520 --> 01:00:11,960
If the agent can act, it needs a release gate that includes testing and an owner sign-off.

1348
01:00:11,960 --> 01:00:13,440
Then DLP.

1349
01:00:13,440 --> 01:00:17,120
Not as a checkbox but as a constraint on connectors, triggers and knowledge sources.

1350
01:00:17,120 --> 01:00:21,920
You don't allow any connector that works; you allow the ones that match the contracts.

1351
01:00:21,920 --> 01:00:24,560
DLP is where you prevent the classic drift.

1352
01:00:24,560 --> 01:00:28,640
Someone adds a new connector to solve a one-off problem and suddenly the agent can exfiltrate

1353
01:00:28,640 --> 01:00:31,840
data to a place governance doesn't monitor.

1354
01:00:31,840 --> 01:00:37,000
And finally, the part most programs skip until the incident: change control that survives

1355
01:00:37,000 --> 01:00:38,560
people leaving.

1356
01:00:38,560 --> 01:00:39,560
Agents are easy to build.

1357
01:00:39,560 --> 01:00:40,560
That's the point.

1358
01:00:40,560 --> 01:00:41,560
It's also the problem.

1359
01:00:41,560 --> 01:00:45,400
If the only person who understands an agent is the maker who built it at 2am, it isn't

1360
01:00:45,400 --> 01:00:46,560
an enterprise asset.

1361
01:00:46,560 --> 01:00:47,560
It's a pending outage.

1362
01:00:47,560 --> 01:00:49,880
So governance has to enforce three realities.

1363
01:00:49,880 --> 01:00:52,200
Ownership, inventory and release discipline.

1364
01:00:52,200 --> 01:00:55,760
Because once you operate agents as a portfolio, a second truth shows up.

1365
01:00:55,760 --> 01:01:00,280
You can stop arguing about AI quality in abstract terms and start measuring whether the architecture

1366
01:01:00,280 --> 01:01:01,280
is holding.

1367
01:01:01,280 --> 01:01:05,120
And measurement is where governance becomes real or becomes theatre.

1368
01:01:05,120 --> 01:01:07,760
Metrics that matter: measuring architecture, not sentiment.

1369
01:01:07,760 --> 01:01:11,200
If governance is portfolio management, metrics are how you prove the portfolio isn't

1370
01:01:11,200 --> 01:01:12,360
quietly rotting.

1371
01:01:12,360 --> 01:01:15,520
And the first thing to accept is that most agent metrics are comfort metrics.

1372
01:01:15,520 --> 01:01:17,400
They tell you the agent is being talked to.

1373
01:01:17,400 --> 01:01:20,440
They don't tell you the agent is doing the job you delegated.

1374
01:01:20,440 --> 01:01:23,560
Session counts, conversation volume, thumbs up ratios.

1375
01:01:23,560 --> 01:01:24,560
Nice to have.

1376
01:01:24,560 --> 01:01:27,360
They don't answer the only question that matters in an enterprise.

1377
01:01:27,360 --> 01:01:31,640
Did the system behave predictably, under control, and with evidence? So the metric set has to

1378
01:01:31,640 --> 01:01:33,760
measure architecture, not sentiment.

1379
01:01:33,760 --> 01:01:39,040
It has to expose where your design is probabilistic, where it's deterministic, and where humans

1380
01:01:39,040 --> 01:01:41,560
are doing cleanup work you didn't admit existed.

1381
01:01:41,560 --> 01:01:44,760
Start with the most honest metric in the entire program.

1382
01:01:44,760 --> 01:01:45,760
Intervention rate.

1383
01:01:45,760 --> 01:01:50,440
Intervention rate is how often a human overrides, redoes, or corrects what the agent produced.

1384
01:01:50,440 --> 01:01:52,400
Not escalations as a concept.

1385
01:01:52,400 --> 01:01:56,880
Actual interventions: a user says "never mind," reopens the ticket, changes fields after the

1386
01:01:56,880 --> 01:02:01,880
agent created the record, reruns the flow manually, or asks a human to validate before they

1387
01:02:01,880 --> 01:02:02,880
trust it.

1388
01:02:02,880 --> 01:02:07,440
A high intervention rate means your agent is producing work-shaped output that still requires

1389
01:02:07,440 --> 01:02:11,440
human verification. That is not automation; that is outsourced uncertainty.

1390
01:02:11,440 --> 01:02:15,800
And the enterprise tends to fund that for months because the agent looks busy.

1391
01:02:15,800 --> 01:02:18,240
Intervention rate is how you stop funding theatre.
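Computing the intervention rate is straightforward once interventions are logged as events. This is a minimal sketch; the event shape is an assumption made for the example.

```python
def intervention_rate(events):
    # Fraction of agent-produced outcomes a human had to override, redo,
    # or correct. The event shape is illustrative.
    total = len(events)
    intervened = sum(1 for e in events if e["human_intervened"])
    return intervened / total if total else 0.0

rate = intervention_rate([
    {"human_intervened": False},
    {"human_intervened": True},   # user reopened the ticket
    {"human_intervened": False},
    {"human_intervened": True},   # fields changed after the agent created the record
])
```

The hard part isn't the arithmetic; it's instrumenting the downstream systems so reopens, field edits, and manual reruns actually get counted as interventions.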

1392
01:02:18,240 --> 01:02:20,160
Second, determinism score.

1393
01:02:20,160 --> 01:02:24,720
This is the metric most people don't measure because it forces uncomfortable design changes.

1394
01:02:24,720 --> 01:02:26,320
The definition is simple.

1395
01:02:26,320 --> 01:02:30,240
Given the same intent and the same context, does the agent take the same action path?

1396
01:02:30,240 --> 01:02:34,440
Not "does it respond similarly." Does it route to the same topic, call the same tool, validate

1397
01:02:34,440 --> 01:02:37,280
the same preconditions and produce the same state transition?

1398
01:02:37,280 --> 01:02:42,520
If the action path changes based on phrasing, random retrieval chunks or prompt drift,

1399
01:02:42,520 --> 01:02:43,720
determinism is low.

1400
01:02:43,720 --> 01:02:48,280
And low determinism is the root cause of operational distrust because humans can't build stable

1401
01:02:48,280 --> 01:02:51,520
expectations around something that behaves differently every time.

1402
01:02:51,520 --> 01:02:55,680
So the practical way to measure determinism is to maintain a fixed test set of canonical

1403
01:02:55,680 --> 01:03:00,040
requests, 10 to 50 high value intents, and rerun them after every change.

1404
01:03:00,040 --> 01:03:03,440
Same user profile, same data conditions where possible, same environment.

1405
01:03:03,440 --> 01:03:05,520
Measure whether the routing and tool calls remain stable.

1406
01:03:05,520 --> 01:03:09,360
If you can't keep the path stable, the solution isn't training users.

1407
01:03:09,360 --> 01:03:12,240
The solution is tighter contracts and smaller tool surfaces.

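One way to turn that into a number, sketched under the assumption that each rerun of a canonical request is recorded as the ordered list of routing and tool-call steps it took:

```python
# Hypothetical determinism score: rerun one canonical intent N times and
# measure what share of runs follow the modal (most common) action path.
from collections import Counter

def determinism_score(runs):
    """Share of reruns that take the most common action path."""
    if not runs:
        return 0.0
    paths = Counter(tuple(r) for r in runs)
    modal_count = paths.most_common(1)[0][1]
    return modal_count / len(runs)

# Same canonical request, rerun 5 times; 4 runs take the same path.
runs = [
    ["route:reset_password", "tool:get_user", "tool:reset", "state:closed"],
] * 4 + [
    ["route:reset_password", "tool:reset", "state:closed"],  # skipped a check
]
score = determinism_score(runs)
```

A score below whatever threshold you set is a design signal, per the transcript: tighten contracts and shrink the tool surface rather than retrain users.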
1408
01:03:12,240 --> 01:03:14,520
Third, audit completeness.

1409
01:03:14,520 --> 01:03:17,200
Audit completeness is not "we have a transcript."

1410
01:03:17,200 --> 01:03:18,200
Transcripts are narrative.

1411
01:03:18,200 --> 01:03:20,600
Audits require reconstructable evidence.

1412
01:03:20,600 --> 01:03:22,120
Audit completeness asks:

1413
01:03:22,120 --> 01:03:26,800
For every agent initiated action, can the organization show what was requested, what identity

1414
01:03:26,800 --> 01:03:32,480
executed it, what tool or workflow ran, what record changed, what preconditions were checked,

1415
01:03:32,480 --> 01:03:35,040
what approval occurred and what the outcome was?

1416
01:03:35,040 --> 01:03:37,760
If the answer is "kind of," you don't have an agent system.

1417
01:03:37,760 --> 01:03:40,240
You have a conversational interface with side effects.

1418
01:03:40,240 --> 01:03:44,240
This is where a system of record stops being an integration detail and becomes the core

1419
01:03:44,240 --> 01:03:45,240
control.

1420
01:03:45,240 --> 01:03:48,800
If you're not writing structured entries into the system of record for agent actions,

1421
01:03:48,800 --> 01:03:51,840
you are building an unauditable automation layer.

1422
01:03:51,840 --> 01:03:54,480
That will end exactly how you think it will end.

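The evidence fields the transcript lists can be sketched as one structured audit entry; this is an illustrative shape, not any product's actual schema:

```python
# Illustrative audit entry covering the questions listed above. In production
# this would be written to the system of record, not an in-memory list.
from dataclasses import dataclass, asdict

@dataclass
class AuditEntry:
    request: str          # what was requested
    identity: str         # what identity executed it
    tool: str             # what tool or workflow ran
    record_id: str        # what record changed
    preconditions: list   # what preconditions were checked
    approval: str         # what approval occurred (or "none")
    outcome: str          # what the outcome was

def write_audit(entry: AuditEntry, store: list) -> None:
    store.append(asdict(entry))  # stand-in for the system-of-record write

store = []
write_audit(AuditEntry("reset VPN certificate", "svc-agent-net",
                       "flow:reset_cert", "INC0041",
                       ["device managed", "user verified"],
                       "manager approval", "completed"), store)
```

If any of these fields cannot be populated for an action, that is the "kind of" answer the transcript warns about.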
1423
01:03:54,480 --> 01:03:58,560
Now add operational KPIs, but tie them to strict definitions.

1424
01:03:58,560 --> 01:04:02,320
Resolution rate only matters if resolved means the workflow reached a defined terminal

1425
01:04:02,320 --> 01:04:03,320
state.

1426
01:04:03,320 --> 01:04:07,680
Deflection rate only matters if deflection means no human had to intervene later.

1427
01:04:07,680 --> 01:04:11,840
Mean time to resolution only matters if you measure it from request to verified state change,

1428
01:04:11,840 --> 01:04:13,560
not from chat start to chat end.

1429
01:04:13,560 --> 01:04:17,640
Otherwise you can improve the metric by ending sessions faster while pushing work downstream.

1430
01:04:17,640 --> 01:04:22,480
The last metric that separates mature programs from pilot programs is refusal quality.

1431
01:04:22,480 --> 01:04:24,840
If refusal is a feature, it should be measurable.

1432
01:04:24,840 --> 01:04:29,200
How often the agent refuses, why it refuses, and whether refusal routes users into a clean

1433
01:04:29,200 --> 01:04:31,600
escalation path with preserved state.

1434
01:04:31,600 --> 01:04:34,840
A refusal that creates a usable ticket with context is a control.

1435
01:04:34,840 --> 01:04:38,960
A refusal that tells the user to try again later is just a delayed failure.

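The "refusal as a control" idea can be sketched like this; the function names and the ticket helper are hypothetical stand-ins for whatever escalation system you use:

```python
# Hypothetical sketch: a quality refusal preserves state and produces a
# usable escalation artifact instead of a dead end.

def refuse(reason, conversation_state, create_ticket):
    """Refuse, but route into a clean escalation path with context attached."""
    ticket_id = create_ticket(summary=reason, context=conversation_state)
    return {"refused": True, "reason": reason, "escalation_ticket": ticket_id}

def fake_create_ticket(summary, context):
    return "INC0099"  # stand-in for the real system of record

result = refuse("outside approved scope: bulk license change",
                {"intent": "license_change", "collected": {"count": 500}},
                fake_create_ticket)
```

Counting how many refusals produce a record like `result` versus a bare "try again later" message is what makes refusal quality measurable.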
1436
01:04:38,960 --> 01:04:42,360
So the metric set becomes a dashboard of architectural truth.

1437
01:04:42,360 --> 01:04:47,040
Intervention rate, determinism score, audit completeness, operational KPIs with real definitions

1438
01:04:47,040 --> 01:04:48,640
and refusal quality.

1439
01:04:48,640 --> 01:04:50,960
And once you track those, two things happen.

1440
01:04:50,960 --> 01:04:54,160
First, you stop arguing about whether the agent is good.

1441
01:04:54,160 --> 01:04:56,360
You start seeing where the system is uncontrolled.

1442
01:04:56,360 --> 01:05:01,240
Second, you realize metrics without testing discipline are just forensic reports after trust

1443
01:05:01,240 --> 01:05:02,240
is already damaged.

1444
01:05:02,240 --> 01:05:06,360
So the next mandate is evidence: test like software, because you are shipping a system,

1445
01:05:06,360 --> 01:05:08,000
not a personality.

1446
01:05:08,000 --> 01:05:09,000
Testing discipline.

1447
01:05:09,000 --> 01:05:11,320
From "try it out" to evidence.

1448
01:05:11,320 --> 01:05:15,480
Once you start measuring determinism and intervention, you run into a problem that can't be solved

1449
01:05:15,480 --> 01:05:16,640
with optimism.

1450
01:05:16,640 --> 01:05:18,640
You can't manage what you can't reproduce.

1451
01:05:18,640 --> 01:05:23,280
And most Copilot agent programs are built on the least reproducible testing method in enterprise

1452
01:05:23,280 --> 01:05:24,280
IT.

1453
01:05:24,280 --> 01:05:28,840
Someone trying it out in a chat pane, getting a decent answer once and shipping.

1454
01:05:28,840 --> 01:05:29,840
That's not testing.

1455
01:05:29,840 --> 01:05:30,840
That's a vibe check.

1456
01:05:30,840 --> 01:05:31,920
Agents don't need more encouragement.

1457
01:05:31,920 --> 01:05:33,120
They need evidence.

1458
01:05:33,120 --> 01:05:37,160
So the testing discipline has to look like software discipline adapted for a probabilistic

1459
01:05:37,160 --> 01:05:39,680
component, not because the platform is broken.

1460
01:05:39,680 --> 01:05:43,360
Because the platform is doing exactly what probabilistic systems do: they vary.

1461
01:05:43,360 --> 01:05:47,280
Your job is to bound that variance until it becomes operationally acceptable.

1462
01:05:47,280 --> 01:05:48,800
The first shift is simple.

1463
01:05:48,800 --> 01:05:50,120
Move testing left.

1464
01:05:50,120 --> 01:05:53,840
Early, repeatable and tied to the intents you actually care about.

1465
01:05:53,840 --> 01:05:56,960
Before anything goes near production, you define a core scenario set.

1466
01:05:56,960 --> 01:06:00,200
10 to 50 scenarios that represent the real outcomes you're delegating.

1467
01:06:00,200 --> 01:06:03,320
Not edge cases, not fun prompts, not clever demos.

1468
01:06:03,320 --> 01:06:07,400
The boring high frequency paths that will generate tickets, approvals, changes and audit

1469
01:06:07,400 --> 01:06:08,400
questions.

1470
01:06:08,400 --> 01:06:12,040
And you test them every time you change anything that can alter behavior.

1471
01:06:12,040 --> 01:06:17,120
Instructions, topic triggers, tool definitions, knowledge sources, connector auth, models,

1472
01:06:17,120 --> 01:06:18,640
even environment policies.

1473
01:06:18,640 --> 01:06:20,320
Because those are all behavior changes.

1474
01:06:20,320 --> 01:06:22,120
Some are just disguised as configuration.

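The rerun-on-every-change loop described above can be sketched as a tiny regression harness; `run_agent` here is a stub standing in for however you actually drive the agent in test:

```python
# Illustrative regression loop: rerun the fixed canonical scenario set after
# every change and report any scenario whose action path diverged from its
# recorded baseline. run_agent is a hypothetical stand-in.

CANONICAL = {
    "reset password": ["tool:get_user", "tool:reset"],
    "new laptop request": ["tool:create_ticket"],
}

def run_agent(intent):
    # Stub: in real testing this executes the agent and records its path.
    observed = {
        "reset password": ["tool:get_user", "tool:reset"],
        "new laptop request": ["tool:create_ticket"],
    }
    return observed[intent]

def regression(canonical, execute):
    """Return the scenarios whose action path no longer matches baseline."""
    return [intent for intent, path in canonical.items()
            if execute(intent) != path]

failures = regression(CANONICAL, run_agent)  # empty means safe to proceed
```

The discipline is in rerunning this after instruction edits, knowledge updates, and connector changes alike, since all of them are behavior changes.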
1475
01:06:22,120 --> 01:06:24,600
Copilot Studio gives you multiple ways to do this.

1476
01:06:24,600 --> 01:06:26,800
And the point isn't which feature is best.

1477
01:06:26,800 --> 01:06:30,640
The point is you stop relying on memory and start relying on repeatable runs.

1478
01:06:30,640 --> 01:06:33,520
Use the built-in evaluation capability when it fits.

1479
01:06:33,520 --> 01:06:36,440
Single turn checks for quality, groundedness and completeness.

1480
01:06:36,440 --> 01:06:39,800
It's useful for quickly catching regressions in generative answers.

1481
01:06:39,800 --> 01:06:44,120
And it can generate test sets based on your agent metadata, knowledge sources or past

1482
01:06:44,120 --> 01:06:45,120
chats.

1483
01:06:45,120 --> 01:06:46,120
That's fine for breadth.

1484
01:06:46,120 --> 01:06:48,400
But don't confuse that with end-to-end proof.

1485
01:06:48,400 --> 01:06:52,120
Single turn evaluation won't tell you whether a multi-turn execution path collects parameters

1486
01:06:52,120 --> 01:06:57,200
correctly, validates preconditions, calls the right tool and writes to the system of record.

1487
01:06:57,200 --> 01:06:59,040
It won't show you idempotency failures.

1488
01:06:59,040 --> 01:07:04,320
It won't show you what happens when a connector throws an HTTP error halfway through a workflow.

1489
01:07:04,320 --> 01:07:07,840
So for execution scenarios you need multi-turn automated testing.

1490
01:07:07,840 --> 01:07:12,080
That's where bulk testing approaches like the Copilot Studio Kit become the adult option.

1491
01:07:12,080 --> 01:07:16,640
You run test conversations that include the messy user request, the clarification turns,

1492
01:07:16,640 --> 01:07:20,520
the confirmation steps, the tool invocation, the error path and the escalation path.

1493
01:07:20,520 --> 01:07:23,800
And you run them at scale because non-determinism hides in volume.

1494
01:07:23,800 --> 01:07:25,680
The goal isn't to prove it works.

1495
01:07:25,680 --> 01:07:29,080
The goal is to discover where it fails before users do.

1496
01:07:29,080 --> 01:07:31,480
Now add the most ignored part of agent testing.

1497
01:07:31,480 --> 01:07:32,720
Identity profiles.

1498
01:07:32,720 --> 01:07:35,440
An agent that works under a maker account proves nothing.

1499
01:07:35,440 --> 01:07:40,640
You test with designated user profiles that reflect real entitlements: standard employee,

1500
01:07:40,640 --> 01:07:45,520
privileged operator, regional user, contractor, service desk analyst. Same intent, different

1501
01:07:45,520 --> 01:07:47,280
identity, different access graph.

1502
01:07:47,280 --> 01:07:48,440
That's not a corner case.

1503
01:07:48,440 --> 01:07:50,600
That's the system.

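A toy version of that identity matrix, with made-up profile names and a made-up permission model, just to show the shape of the test:

```python
# Hypothetical identity-profile test: same intent, different identity,
# different expected outcome. Profiles and permissions are illustrative.

PROFILES = {
    "standard_employee": {"can_reset_own": True,  "can_reset_others": False},
    "service_desk":      {"can_reset_own": True,  "can_reset_others": True},
    "contractor":        {"can_reset_own": False, "can_reset_others": False},
}

def execute_reset(profile, target="self"):
    perms = PROFILES[profile]
    allowed = perms["can_reset_own"] if target == "self" else perms["can_reset_others"]
    return "executed" if allowed else "refused"

# Run the same "reset someone else's password" intent under every profile.
matrix = {p: execute_reset(p, target="other") for p in PROFILES}
```

Only the service desk profile should execute; everyone else must refuse cleanly, and the test asserts exactly that rather than discovering it in production.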
1504
01:07:50,600 --> 01:07:55,040
Because many agent failures are just authorization reality finally becoming visible.

1505
01:07:55,040 --> 01:07:59,240
The test discipline has to surface that deliberately, not by surprise in a Teams channel.

1506
01:07:59,240 --> 01:08:01,560
Then you treat tests as architecture artifacts.

1507
01:08:01,560 --> 01:08:02,560
They are versioned.

1508
01:08:02,560 --> 01:08:03,560
They are owned.

1509
01:08:03,560 --> 01:08:04,560
They live with the solution.

1510
01:08:04,560 --> 01:08:05,560
They run in pipelines.

1511
01:08:05,560 --> 01:08:06,560
And they gate releases.

1512
01:08:06,560 --> 01:08:10,320
If you can't describe your release gate in one sentence ("we won't promote this agent

1513
01:08:10,320 --> 01:08:15,560
unless core scenarios pass under these identities and tool calls produce these records"),

1514
01:08:15,560 --> 01:08:17,240
you don't have a release process.

1515
01:08:17,240 --> 01:08:19,360
You have hope with deployment permissions.

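That one-sentence gate translates directly into code; this sketch assumes a results map keyed by scenario and identity, which is an illustrative shape rather than any pipeline's real API:

```python
# Hypothetical release gate: block promotion unless core scenarios pass
# under every tested identity profile.

def release_gate(results, threshold=1.0):
    """results: {(scenario, identity): passed?}. True means safe to promote."""
    passed = sum(results.values())
    return passed / len(results) >= threshold

results = {
    ("reset password", "standard_employee"): True,
    ("reset password", "service_desk"): True,
    ("new laptop request", "contractor"): False,  # contractor path regressed
}
promote = release_gate(results)  # the gate holds: one identity path failed
```

Wired into a pipeline, a `False` here blocks promotion; that is the difference between a release process and hope with deployment permissions.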
1516
01:08:19,360 --> 01:08:21,880
And yes, you need regression after every change.

1517
01:08:21,880 --> 01:08:27,280
Agents drift from small edits, knowledge updates, connector changes, model updates, capacity

1518
01:08:27,280 --> 01:08:28,280
limits.

1519
01:08:28,280 --> 01:08:29,280
Those aren't hypothetical.

1520
01:08:29,280 --> 01:08:31,320
They're routine platform reality.

1521
01:08:31,320 --> 01:08:35,200
So the discipline is same test set, same thresholds, rerun continuously.

1522
01:08:35,200 --> 01:08:39,480
Finally, testing has to include failure path design, not just success path validation.

1523
01:08:39,480 --> 01:08:40,960
Force the tool to fail.

1524
01:08:40,960 --> 01:08:45,960
Remove permissions, break the dependency, exceed the token limit with oversized inputs.

1525
01:08:45,960 --> 01:08:46,960
Trigger throttling.

1526
01:08:46,960 --> 01:08:48,320
Watch what the agent does.

1527
01:08:48,320 --> 01:08:51,280
If it masks the failure with language, that's a defect.

1528
01:08:51,280 --> 01:08:55,080
If it escalates with state preserved and evidence attached, that's architecture.

1529
01:08:55,080 --> 01:08:57,320
This is what enterprise grade actually means.

1530
01:08:57,320 --> 01:09:00,920
Not fewer failures, but failures that are predictable, explainable and containable.

1531
01:09:00,920 --> 01:09:04,640
Once you have that evidence loop, you stop debating whether the agent is ready.

1532
01:09:04,640 --> 01:09:05,640
You know.

1533
01:09:05,640 --> 01:09:08,360
And that's when operations becomes the next inevitability.

1534
01:09:08,360 --> 01:09:10,240
Drift doesn't stop because you tested once.

1535
01:09:10,240 --> 01:09:12,080
It just becomes visible sooner.

1536
01:09:12,080 --> 01:09:15,160
Agent ops, drift, incidents and platform reality.

1537
01:09:15,160 --> 01:09:17,520
Now take that testing discipline and assume it worked.

1538
01:09:17,520 --> 01:09:18,800
You shipped.

1539
01:09:18,800 --> 01:09:20,320
Core scenarios pass.

1540
01:09:20,320 --> 01:09:22,000
The agent behaves.

1541
01:09:22,000 --> 01:09:23,920
Everyone celebrates.

1542
01:09:23,920 --> 01:09:25,960
Then the real system starts operating.

1543
01:09:25,960 --> 01:09:30,400
The one made of people, policies, connectors, models and monthly platform updates.

1544
01:09:30,400 --> 01:09:34,280
This is where agent ops stops being optional and becomes the only thing between a successful

1545
01:09:34,280 --> 01:09:36,960
pilot and quiet abandonment.

1546
01:09:36,960 --> 01:09:39,880
Agent ops is just the operational truth of agents.

1547
01:09:39,880 --> 01:09:42,840
These systems drift, and drift creates incidents.

1548
01:09:42,840 --> 01:09:44,960
And drift does not require malice or incompetence.

1549
01:09:44,960 --> 01:09:46,160
It only requires time.

1550
01:09:46,160 --> 01:09:47,640
Drift shows up in three places.

1551
01:09:47,640 --> 01:09:49,520
First you change the agent.

1552
01:09:49,520 --> 01:09:51,640
Instructions, triggers, prompts, knowledge, tools.

1553
01:09:51,640 --> 01:09:54,040
That's a behavior change wearing a friendly UI.

1554
01:09:54,040 --> 01:09:56,320
Second, dependencies changed.

1555
01:09:56,320 --> 01:10:02,040
Access, schemas, catalogs, choice values, conditional access, DLP, tokens, secrets, licensing,

1556
01:10:02,040 --> 01:10:03,040
capacity.

1557
01:10:03,040 --> 01:10:04,840
Third, the platform moved.

1558
01:10:04,840 --> 01:10:09,880
Models update, retrieval indexing shifts, throttling changes, preview features sunset.

1559
01:10:09,880 --> 01:10:11,560
The agent didn't get weird.

1560
01:10:11,560 --> 01:10:13,040
Your runtime changed.

1561
01:10:13,040 --> 01:10:14,280
So the mandate is simple.

1562
01:10:14,280 --> 01:10:15,920
Treat agents like services.

1563
01:10:15,920 --> 01:10:16,920
Services have runbooks.

1564
01:10:16,920 --> 01:10:17,920
They have telemetry.

1565
01:10:17,920 --> 01:10:19,080
They have incident reviews.

1566
01:10:19,080 --> 01:10:20,080
They have backlogs.

1567
01:10:20,080 --> 01:10:23,520
They have owners who get paged when reality disagrees with the demo.

1568
01:10:23,520 --> 01:10:28,320
And build with reliability patterns, because most agent incidents are boring in the same ways.

1569
01:10:28,320 --> 01:10:34,080
Timeouts, partial completions, duplicate execution, silent tool failures, inconsistent routing,

1570
01:10:34,080 --> 01:10:37,280
users resubmitting the same request because the agent went quiet.

1571
01:10:37,280 --> 01:10:38,280
So you design for that.

1572
01:10:38,280 --> 01:10:41,280
Retries exist, but only where the operation is idempotent.

1573
01:10:41,280 --> 01:10:45,720
If a create ticket action can run twice and create two incidents, you don't retry.

1574
01:10:45,720 --> 01:10:50,400
You add a correlation ID and a deduplication check in the execution layer, then retry becomes

1575
01:10:50,400 --> 01:10:52,200
resume, not repeat.

1576
01:10:52,200 --> 01:10:53,200
Timeouts exist.

1577
01:10:53,200 --> 01:10:58,080
So every long-running action needs explicit time budgets and user-visible state transitions,

1578
01:10:58,080 --> 01:11:01,520
accepted, queued, awaiting approval, executing, completed.

1579
01:11:01,520 --> 01:11:07,080
If it fails, the agent returns a receipt with a record ID or escalation artifact, not a paragraph.

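Those user-visible states only help if the transitions between them are explicit; a minimal sketch, with state names taken from the list above and transition rules that are assumptions:

```python
# Illustrative state machine for a long-running agent action. Explicit
# transitions make stalls detectable against a time budget instead of the
# agent just going quiet.
from enum import Enum

class ActionState(Enum):
    ACCEPTED = "accepted"
    QUEUED = "queued"
    AWAITING_APPROVAL = "awaiting approval"
    EXECUTING = "executing"
    COMPLETED = "completed"
    ESCALATED = "escalated"

TRANSITIONS = {
    ActionState.ACCEPTED: {ActionState.QUEUED},
    ActionState.QUEUED: {ActionState.AWAITING_APPROVAL, ActionState.EXECUTING},
    ActionState.AWAITING_APPROVAL: {ActionState.EXECUTING, ActionState.ESCALATED},
    ActionState.EXECUTING: {ActionState.COMPLETED, ActionState.ESCALATED},
}

def advance(current, nxt):
    if nxt not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {nxt}")
    return nxt

state = advance(ActionState.ACCEPTED, ActionState.QUEUED)
```

The receipt the transcript asks for is then just the final state plus the record ID, not a paragraph of prose.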
1580
01:11:07,080 --> 01:11:09,160
Fallbacks exist, but fallbacks must be engineered.

1581
01:11:09,160 --> 01:11:13,480
If the model can't confidently route, it should ask a disambiguation question or escalate

1582
01:11:13,480 --> 01:11:14,480
with context.

1583
01:11:14,480 --> 01:11:17,800
A fallback that produces plausible text is not resilience.

1584
01:11:17,800 --> 01:11:19,160
It's fraud with better grammar.

1585
01:11:19,160 --> 01:11:21,360
And human-in-the-loop checkpoints aren't a concession.

1586
01:11:21,360 --> 01:11:22,360
They are control points.

1587
01:11:22,360 --> 01:11:27,040
High-impact changes require a human approval gate, even if the agent did everything else.

1588
01:11:27,040 --> 01:11:29,680
That's how you keep autonomy from turning into liability.

1589
01:11:29,680 --> 01:11:34,160
Now, observability. This is where most agent programs collapse, because they rely on chat

1590
01:11:34,160 --> 01:11:35,960
transcripts as if they were logs.

1591
01:11:35,960 --> 01:11:36,960
They're not.

1592
01:11:36,960 --> 01:11:38,720
Transcripts are narrative, not telemetry.

1593
01:11:38,720 --> 01:11:43,560
You need instrumentation for which intent fired, which topic ran, which tools were invoked,

1594
01:11:43,560 --> 01:11:48,000
which identity executed them, how long each step took, what errors returned, how often

1595
01:11:48,000 --> 01:11:51,760
users abandoned, and how often escalation happened.

1596
01:11:51,760 --> 01:11:54,640
Tool call telemetry matters more than "the agent said something."

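One structured telemetry event per agent turn, covering the fields listed above; these field names are an illustrative shape, not a Copilot Studio schema:

```python
# Illustrative telemetry event — structured fields, not a transcript.
import json
import time

def telemetry_event(intent, topic, tools, identity, duration_ms,
                    error=None, abandoned=False, escalated=False):
    return {
        "ts": time.time(),
        "intent": intent,            # which intent fired
        "topic": topic,              # which topic ran
        "tools": tools,              # which tools were invoked
        "identity": identity,        # which identity executed them
        "duration_ms": duration_ms,  # how long the turn took
        "error": error,              # what errors returned
        "abandoned": abandoned,      # user gave up mid-flow
        "escalated": escalated,      # handed to a human
    }

event = telemetry_event("reset_password", "account_access",
                        ["get_user", "reset"], "svc-agent-id", 1840)
line = json.dumps(event)  # ship as structured JSON to your log pipeline
```

Latency lives in `duration_ms` for exactly the reason the transcript gives next: slow turns drive resubmissions, and resubmissions drive duplicates.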
1597
01:11:54,640 --> 01:11:57,800
And you need to capture latency because latency creates user behavior.

1598
01:11:57,800 --> 01:12:02,080
When agents take too long, users resubmit, resubmission creates duplicates.

1599
01:12:02,080 --> 01:12:03,600
Duplicates create rework.

1600
01:12:03,600 --> 01:12:05,000
Rework destroys trust.

1601
01:12:05,000 --> 01:12:06,680
This is not user training.

1602
01:12:06,680 --> 01:12:08,000
This is system behavior.

1603
01:12:08,000 --> 01:12:09,920
Then you need an operational posture.

1604
01:12:09,920 --> 01:12:12,680
Weekly review, not quarterly post-mortems.

1605
01:12:12,680 --> 01:12:17,600
Look for drift signals, rising intervention rate, rising fallback rate, increased unrecognized

1606
01:12:17,600 --> 01:12:21,720
utterances, increased tool call failures, rising latency, more escalations in a

1607
01:12:21,720 --> 01:12:24,760
specific intent. Those are leading indicators.

1608
01:12:24,760 --> 01:12:27,160
If you wait for the incident, you've already lost adoption.

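A minimal sketch of that weekly drift check, with made-up signal names, baseline numbers, and a 20% tolerance chosen purely for illustration:

```python
# Hypothetical weekly drift review: flag any leading indicator that rose
# more than `tolerance` against last period's baseline.

SIGNALS = ["intervention_rate", "fallback_rate", "unrecognized_utterances",
           "tool_call_failures", "p95_latency_ms", "escalations"]

def drift_flags(baseline, current, tolerance=0.20):
    """Return the signals that rose more than `tolerance` week over week."""
    return [s for s in SIGNALS
            if baseline[s] > 0
            and (current[s] - baseline[s]) / baseline[s] > tolerance]

baseline = {"intervention_rate": 0.10, "fallback_rate": 0.05,
            "unrecognized_utterances": 40, "tool_call_failures": 12,
            "p95_latency_ms": 2200, "escalations": 30}
current = dict(baseline, intervention_rate=0.16, tool_call_failures=18)

flags = drift_flags(baseline, current)  # the two rising signals get flagged
```

Each flagged signal is a review item this week, not a post-mortem next quarter.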
1609
01:12:27,160 --> 01:12:32,080
When an incident does happen, the review has to be brutally specific. Not "the model hallucinated."

1610
01:12:32,080 --> 01:12:33,080
That's not a root cause.

1611
01:12:33,080 --> 01:12:35,320
The root cause is always architectural.

1612
01:12:35,320 --> 01:12:40,480
Missing precondition checks, ambiguous routing, overbroad tool surface, unowned knowledge, identity

1613
01:12:40,480 --> 01:12:42,640
mismatch, or inadequate logging.

1614
01:12:42,640 --> 01:12:46,200
And every incident produces a backlog item that hardens the system.

1615
01:12:46,200 --> 01:12:50,640
Tighter contracts, better validation, narrower tools, improved escalation artifacts, additional

1616
01:12:50,640 --> 01:12:52,520
tests, and clearer runbooks.

1617
01:12:52,520 --> 01:12:54,160
That's the real promise of agent ops.

1618
01:12:54,160 --> 01:12:55,520
Not that incidents disappear.

1619
01:12:55,520 --> 01:12:59,480
That incidents become containable, explainable, and less frequent over time.

1620
01:12:59,480 --> 01:13:02,480
Because the uncomfortable truth is that platforms will keep moving.

1621
01:13:02,480 --> 01:13:03,720
The tenant will keep changing.

1622
01:13:03,720 --> 01:13:06,200
Your organization will keep accumulating entropy.

1623
01:13:06,200 --> 01:13:11,120
The question is whether your agent program treats that as "AI being AI" or whether you run

1624
01:13:11,120 --> 01:13:12,120
it like production.

1625
01:13:12,120 --> 01:13:15,200
If you choose production, drift becomes a managed cost.

1626
01:13:15,200 --> 01:13:19,800
If you choose vibes, drift becomes your entire story. Which brings us to the executive playbook.

1627
01:13:19,800 --> 01:13:22,280
The five decisions leadership must force.

1628
01:13:22,280 --> 01:13:24,320
At this point, the pattern should be obvious.

1629
01:13:24,320 --> 01:13:26,160
Agent failure isn't a tooling problem.

1630
01:13:26,160 --> 01:13:28,880
It's a leadership problem disguised as a maker problem.

1631
01:13:28,880 --> 01:13:31,920
Because agents sit at the intersection of accountability and automation.

1632
01:13:31,920 --> 01:13:35,640
And if leadership doesn't force a few hard decisions up front, the system will make those

1633
01:13:35,640 --> 01:13:37,040
decisions for you.

1634
01:13:37,040 --> 01:13:41,360
Accidentally, inconsistently, and usually during an incident review.

1635
01:13:41,360 --> 01:13:43,840
So here are the five decisions leadership has to force.

1636
01:13:43,840 --> 01:13:46,640
Not suggest, not encourage, force.

1637
01:13:46,640 --> 01:13:47,640
Decision one.

1638
01:13:47,640 --> 01:13:50,040
What the agent owns end to end, and what stays human.

1639
01:13:50,040 --> 01:13:52,720
This is the boundary between assistance and delegation.

1640
01:13:52,720 --> 01:13:58,000
If the agent owns an outcome, it owns the workflow path, the tool calls, and the evidence.

1641
01:13:58,000 --> 01:14:02,720
If a human owns it, the agent can guide, summarize, and prepare, but it does not commit.

1642
01:14:02,720 --> 01:14:05,920
Leadership has to pick because ambiguity here creates blame later.

1643
01:14:05,920 --> 01:14:09,720
And the fastest way to kill adoption is to ship an agent that sometimes acts and sometimes

1644
01:14:09,720 --> 01:14:12,120
doesn't, with no reliable rule behind it.

1645
01:14:12,120 --> 01:14:13,120
Decision two.

1646
01:14:13,120 --> 01:14:16,360
The identity model and the least-privilege boundary.

1647
01:14:16,360 --> 01:14:22,120
Leadership has to decide whether execution runs as the user or as a service identity per intent.

1648
01:14:22,120 --> 01:14:25,880
Then enforce least privilege as a design constraint, not a security aspiration.

1649
01:14:25,880 --> 01:14:29,200
This is also where leaders stop tolerating temporary access.

1650
01:14:29,200 --> 01:14:32,480
Agents will operationalize whatever access you've accumulated.

1651
01:14:32,480 --> 01:14:37,120
If the business wants action-capable agents, the business funds identity hygiene, access

1652
01:14:37,120 --> 01:14:42,960
reviews, app permission discipline, and a clear run-as pattern that survives staff turnover.

1653
01:14:42,960 --> 01:14:43,960
Decision three.

1654
01:14:43,960 --> 01:14:47,960
The system of record and the audit requirement. An agent that acts without writing to a system of record

1655
01:14:47,960 --> 01:14:50,040
is not an enterprise system.

1656
01:14:50,040 --> 01:14:52,960
It's an opinionated chat session with side effects.

1657
01:14:52,960 --> 01:14:56,720
Leadership has to name the record authority per workflow (ServiceNow, Dynamics, a line-of-business

1658
01:14:56,720 --> 01:15:02,000
system) and make it non-negotiable that agent actions produce reconstructable evidence.

1659
01:15:02,000 --> 01:15:06,520
Request, identity, preconditions, approvals, tool invocation, outcome.

1660
01:15:06,520 --> 01:15:10,000
If that evidence doesn't exist, the agent isn't allowed to execute.

1661
01:15:10,000 --> 01:15:11,000
Simple rule.

1662
01:15:11,000 --> 01:15:12,720
Massive impact.

1663
01:15:12,720 --> 01:15:13,720
Decision four.

1664
01:15:13,720 --> 01:15:14,720
Release gates,

1665
01:15:14,720 --> 01:15:16,600
testing thresholds, and rollback expectations.

1666
01:15:16,600 --> 01:15:21,480
If leadership wants deterministic ROI, leadership forces deterministic release discipline.

1667
01:15:21,480 --> 01:15:27,000
That means pre-production environments, versioned changes, test sets that cover core intents

1668
01:15:27,000 --> 01:15:30,760
under real identity profiles, and a pass threshold that blocks promotion.

1669
01:15:30,760 --> 01:15:34,000
It also means rollback is designed, not improvised.

1670
01:15:34,000 --> 01:15:38,600
When the platform shifts under you, and it will, you need a way to revert behavior quickly

1671
01:15:38,600 --> 01:15:41,360
without a late night prompt edit in production.

1672
01:15:41,360 --> 01:15:42,360
Decision five.

1673
01:15:42,360 --> 01:15:45,320
Metrics tied to outcomes, not demo quality.

1674
01:15:45,320 --> 01:15:47,640
Leadership has to kill vanity metrics early.

1675
01:15:47,640 --> 01:15:49,040
Session volume is not success.

1676
01:15:49,040 --> 01:15:50,720
Positive reactions are not success.

1677
01:15:50,720 --> 01:15:54,960
Success is an operational delta that survives outside the agent team.

1678
01:15:54,960 --> 01:15:59,400
Reduced escalations with strict definitions, reduced rework, faster approvals with preserved

1679
01:15:59,400 --> 01:16:04,060
compliance, measurable cycle time reduction, improved throughput and intervention rates

1680
01:16:04,060 --> 01:16:05,800
trending down over time.

1681
01:16:05,800 --> 01:16:07,920
And leaders have to assign metric ownership.

1682
01:16:07,920 --> 01:16:10,600
If nobody owns the metric, the agent doesn't have a goal.

1683
01:16:10,600 --> 01:16:11,800
It has a launch date.

1684
01:16:11,800 --> 01:16:16,800
When these five decisions are explicit, the rest becomes execution: contracts, tools,

1685
01:16:16,800 --> 01:16:20,280
context, orchestration, governance, measurement and operations.

1686
01:16:20,280 --> 01:16:23,360
And when they're not explicit, what you get is predictable too.

1687
01:16:23,360 --> 01:16:28,680
Chat-first agents with broad permissions, unclear boundaries, and an "AI is unpredictable"

1688
01:16:28,680 --> 01:16:33,200
narrative that conveniently avoids the actual root cause.

1689
01:16:33,200 --> 01:16:36,120
This is what thought leadership should really do in this space.

1690
01:16:36,120 --> 01:16:41,080
Stop arguing about whether agents are ready and start forcing architectural honesty.

1691
01:16:41,080 --> 01:16:43,240
Because the platform will keep improving.

1692
01:16:43,240 --> 01:16:44,440
Models will keep changing.

1693
01:16:44,440 --> 01:16:45,520
Features will keep shipping.

1694
01:16:45,520 --> 01:16:46,520
That's not your constraint.

1695
01:16:46,520 --> 01:16:51,640
Your constraint is whether you can turn probabilistic reasoning into deterministic operations.

1696
01:16:51,640 --> 01:16:55,280
And that only happens when leadership decides what gets enforced.

1697
01:16:55,280 --> 01:16:56,280
Conclusion.

1698
01:16:56,280 --> 01:16:57,760
Agents don't fail because they're dumb.

1699
01:16:57,760 --> 01:17:01,200
They fail because you ask them to act without a control plane.

1700
01:17:01,200 --> 01:17:05,240
And architecture is what turns probabilistic reasoning into governed execution.

1701
01:17:05,240 --> 01:17:09,800
If this helped, leave a review so fewer teams keep funding chat-shaped risk.

1702
01:17:09,800 --> 01:17:13,080
Find us on LinkedIn, tell us what you're building, and send the next

1703
01:17:13,080 --> 01:17:15,560
failure pattern you're seeing so it becomes the next episode.