This episode of the M365.FM Podcast — “The High-Performance Automation Control Plane” — explains why most enterprise automation initiatives stall or fail not because of tooling, but because they lack a control plane that governs automation at scale. Simply building workflows and connectors without governance, identity boundaries, execution constraints, and lifecycle policies leads to sprawl, drift, unpredictable outcomes, and hidden risk. A high-performance automation control plane is a live governance and execution fabric that ensures automation behaves predictably, aligns with business intent, is auditable, and can scale safely. The host outlines the architectural layers, design principles, and metrics that distinguish sustainable automation programs from chaotic ones.

🧠 Core Theme

  • Traditional automation efforts focus on building workflows but neglect governance at runtime.

  • A control plane is the governance mechanism that constrains, measures, and protects automation at scale.

  • Without it, automation programs become unmanageable as they grow.


Why Automation Fails at Scale


🔹 1. Siloed Automation Islands

  • Teams build automations independently with different tools and execution paths.

  • This creates duplication, inconsistent behavior, and hidden dependencies.

  • Risk is fragmented and not visible at the enterprise level.


🔹 2. Identity Debt

  • Automations often run under:

    • Shared accounts

    • Generic service principals

    • Broad group permissions

  • Without scoped, unique identities, actions cannot be attributed or audited.

  • Lack of identity clarity undermines accountability and rollback capability.


🔹 3. Execution Without Contracts

  • Workflow actions run without clearly defined boundaries or constraints:

    • What operations are allowed

    • Under what conditions

    • What escalation paths exist

  • Without contracts, automation behaves unpredictably and side effects proliferate.


🔹 4. Ungoverned Data & Context

  • Automation grounded on unmanaged data sources produces inconsistent and unreliable outcomes.

  • Automation must distinguish readable content from allowed actionable content.


🔹 5. Vanity Metrics

  • Counting runs or workflows triggered does not reflect impact or risk.

  • Without meaningful metrics, programs misallocate focus and ignore hidden problems.


What a Control Plane Actually Is

A high-performance control plane is not a dashboard — it is a live governance fabric that enforces how automation operates, not just how it is monitored.

It consists of:


🛡️ Identity & Authentication

  • Each automation must have a unique, non-human, least-privilege identity.

  • This identity enables:

    • Clear audit trails

    • Responsible ownership

    • Safe rollback when necessary

Identity is the anchor for controlled execution.
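
As a rough illustration of the "unique identity" rule, the sketch below flags identities that more than one automation runs under. The `bindings` mapping and every name in it are hypothetical; a real inventory would come from whatever the platform's admin tooling exports.

```python
from collections import defaultdict


def shared_identities(bindings):
    """Return identities used by more than one automation.

    `bindings` is a hypothetical inventory: automation name -> identity name.
    """
    by_identity = defaultdict(list)
    for automation, identity in bindings.items():
        by_identity[identity].append(automation)
    # Keep only identities that violate the one-identity-per-automation rule.
    return {
        identity: sorted(automations)
        for identity, automations in by_identity.items()
        if len(automations) > 1
    }


# Illustrative data only.
inventory = {
    "invoice-sync": "svc-shared",
    "onboarding": "svc-shared",
    "ticket-router": "svc-ticket-router",
}
violations = shared_identities(inventory)
```

Any identity that appears in `violations` cannot give a clean audit trail, because an action taken under it cannot be attributed to a single automation.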


🔒 Execution Contracts

Contracts define:

  • What actions are permitted

  • Under which conditions

  • With what constraints

They convert permission into predictable behavior.
Without them, automation is “permission without guardrails.”
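
One way to picture an execution contract is as a small data structure that is checked before any side effect runs. This is a minimal sketch, not a platform feature: `ExecutionContract`, `authorize`, and the field names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionContract:
    """Hypothetical contract for one automation identity."""
    identity: str
    allowed_actions: frozenset        # what actions are permitted
    max_records_per_run: int          # constraint that bounds the blast radius
    requires_approval: frozenset = frozenset()  # conditions: approval needed first


def authorize(contract, action, record_count, approved=False):
    """Return (allowed, reason) for a requested action under the contract."""
    if action not in contract.allowed_actions:
        return False, f"action '{action}' is not permitted"
    if record_count > contract.max_records_per_run:
        return False, "record count exceeds contract limit"
    if action in contract.requires_approval and not approved:
        return False, f"action '{action}' requires approval"
    return True, "ok"


# Illustrative contract: updates are routine, deletes need an approval.
contract = ExecutionContract(
    identity="svc-invoice-sync",
    allowed_actions=frozenset({"update_record", "delete_record"}),
    max_records_per_run=100,
    requires_approval=frozenset({"delete_record"}),
)
```

The point of the design is that a denied request returns a reason, so the escalation path has something concrete to act on.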


📍 Authoritative Grounding

  • Automation must use governed, curated data sources as context.

  • Systems must distinguish between accessible data and authorized actionable data.

This ensures outputs are correct and trusted.


Lifecycle Governance

  • Automation must have a defined lifecycle:

    • Creation

    • Versioning

    • Review

    • Retirement

  • Clear governance prevents unwanted persistence and drift.

Retired or outdated workflows should not remain active by accident.
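
A lifecycle policy only matters if something checks it. The sketch below assumes a hypothetical inventory of `Automation` records and a 180-day review interval (both are illustrative choices, not values from the episode) and reports retired automations that are still enabled, overdue reviews, and missing owners.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Automation:
    name: str
    owner: Optional[str]
    state: str            # "active" or "retired" (hypothetical states)
    enabled: bool
    last_reviewed: date


def lifecycle_findings(automations, today, review_interval_days=180):
    """Return (automation name, finding) pairs for lifecycle violations."""
    findings = []
    max_age = timedelta(days=review_interval_days)
    for a in automations:
        if a.state == "retired" and a.enabled:
            findings.append((a.name, "retired but still enabled"))
        if a.state == "active" and today - a.last_reviewed > max_age:
            findings.append((a.name, "review overdue"))
        if a.owner is None:
            findings.append((a.name, "no owner"))
    return findings


# Illustrative inventory only.
estate = [
    Automation("invoice-sync", "finance-ops", "active", True, date(2024, 1, 10)),
    Automation("legacy-export", None, "retired", True, date(2022, 5, 1)),
]
report = lifecycle_findings(estate, today=date(2024, 3, 1))
```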


🔍 Drift Detection & Containment

  • The control plane must continuously monitor for:

    • Identity drift

    • Permission creep

    • Behavioral change

  • When drift is detected, containment should be surgical — not program-wide shutdown.

  • Self-correcting governance reduces risk without blocking innovation.
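
Permission creep can be sketched as a set difference between what an identity was approved to do and what it currently holds. The baseline and current snapshots below are hypothetical inputs; how they are collected is outside this example.

```python
def detect_permission_creep(baseline, current):
    """Map identity -> permissions granted beyond its approved baseline."""
    drift = {}
    for identity, granted in current.items():
        extra = set(granted) - set(baseline.get(identity, ()))
        if extra:
            drift[identity] = extra
    return drift


def containment_plan(drift):
    """Surgical containment: list only the drifted identities to suspend."""
    return sorted(drift)


# Illustrative snapshots only.
approved = {
    "svc-invoice-sync": {"invoices.read", "invoices.write"},
    "svc-ticket-router": {"tickets.read"},
}
observed = {
    "svc-invoice-sync": {"invoices.read", "invoices.write"},
    "svc-ticket-router": {"tickets.read", "tickets.delete"},
}
drift = detect_permission_creep(approved, observed)
to_suspend = containment_plan(drift)
```

Only the drifted identity ends up in `to_suspend`; the rest of the estate keeps running, which is the difference between surgical containment and a program-wide shutdown.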


Practical Enterprise Considerations


🛠 Treat Automation as a Product

Each automation should be treated like a managed product with:

  • A clear sponsor

  • Defined users

  • Measurable KPIs

  • Ownership

  • Maintenance lifecycle

This product mindset prevents ad-hoc scripting chaos and aligns automation with business goals.


Meaningful Metrics That Matter

Move beyond vanity metrics toward metrics that reflect value and risk:

🎯 Operational Impact

  • Rate of task completion

  • Time saved per workflow

  • SLA adherence improvements

🛡 Risk Metrics

  • Traceable audit trails

  • Permission creep rates

  • Drift detection frequency

💰 Cost Efficiency

  • Cost per completed task

  • Resource usage efficiency

Meaningful metrics indicate true performance and risk posture.
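
To make the contrast with run counts concrete, here is a small sketch that derives completion rate, cost per completed task, and time saved from run records. The record fields and the cost figure are assumptions for illustration.

```python
def automation_metrics(runs, platform_cost):
    """Compute outcome metrics from hypothetical run records.

    Each run is a dict with "completed" (bool) and "minutes_saved" (number).
    """
    total = len(runs)
    completed = sum(1 for r in runs if r["completed"])
    return {
        "completion_rate": completed / total if total else 0.0,
        # Undefined when nothing completed, so report None rather than divide by zero.
        "cost_per_completed_task": platform_cost / completed if completed else None,
        "minutes_saved": sum(r["minutes_saved"] for r in runs if r["completed"]),
    }


# Illustrative data only: four runs, one of which failed.
sample_runs = [
    {"completed": True, "minutes_saved": 12},
    {"completed": True, "minutes_saved": 8},
    {"completed": False, "minutes_saved": 0},
    {"completed": True, "minutes_saved": 10},
]
metrics = automation_metrics(sample_runs, platform_cost=30.0)
```

A raw run count would report four runs here; the outcome view shows three completed tasks at a cost of 10.0 each.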


Why a Control Plane Matters

  • It brings predictability to automation at scale.

  • It makes automation auditable and compliant.

  • It reduces risk by enforcing policies at runtime.

  • It enables governance without stifling innovation.

Programs without a control plane tend to be fragile, hidden from visibility, and dangerous when stretched.


Leadership Implications


🪄 Governance Must Be Operational

  • Policies must be enforced continuously, not just documented.

  • The control plane is the mechanism of enforcement.


👤 Identity & Accountability Are Core

  • Without unique identities, automation actions cannot be attributed, audited, or contained.


🚀 Constraints Enable Scale

  • Boundaries, contracts, and guardrails actually enable safe innovation at scale.

  • A controlled environment fosters trust in automation.


Key Takeaways

  • Most automation fails not because tools are inadequate, but because governance is missing.

  • A high-performance automation control plane enforces:

    • Identity boundaries

    • Execution contracts

    • Authoritative context

    • Lifecycle governance

    • Drift detection

  • Meaningful metrics measure outcomes and risk, not activity.

  • Treat automation as a governed product, not a loose collection of workflows.

Transcript

1
00:00:00,000 --> 00:00:03,220
But most organizations talk about Power Automate like it's a workflow tool.

2
00:00:03,220 --> 00:00:04,220
They are wrong.

3
00:00:04,220 --> 00:00:07,860
What they actually run is an automation control plane, a distributed decision and execution

4
00:00:07,860 --> 00:00:12,760
system that can write data, trigger approvals, move files and wake up other systems at machine

5
00:00:12,760 --> 00:00:13,760
speed.

6
00:00:13,760 --> 00:00:16,760
Every flow is policy in motion, whether you admit it or not.

7
00:00:16,760 --> 00:00:18,880
That's why "it works" isn't success.

8
00:00:18,880 --> 00:00:20,440
It's the beginning of entropy.

9
00:00:20,440 --> 00:00:25,640
Outages, audit fights, invisible cost growth, and now AI agent sprawl.

10
00:00:25,640 --> 00:00:30,200
This will give you the executive mental model first, then the enforceable mechanics.

11
00:00:30,200 --> 00:00:31,960
Low code is not low engineering.

12
00:00:31,960 --> 00:00:33,920
The foundational mistake is simple.

13
00:00:33,920 --> 00:00:36,560
Teams treat the flow run as the definition of success.

14
00:00:36,560 --> 00:00:38,960
That definition works for personal productivity.

15
00:00:38,960 --> 00:00:44,280
A flow that renames files, pings a channel or copies rows into a spreadsheet has consequences

16
00:00:44,280 --> 00:00:45,280
but they're local.

17
00:00:45,280 --> 00:00:49,600
The blast radius is you, your team, and maybe a SharePoint library that nobody admits

18
00:00:49,600 --> 00:00:51,280
they own anyway.

19
00:00:51,280 --> 00:00:52,280
Enterprise automation is different.

20
00:00:52,280 --> 00:00:55,320
It creates side effects across systems you don't control,

21
00:00:55,320 --> 00:00:58,960
through identities you didn't design, under policies you haven't revisited since the rollout

22
00:00:58,960 --> 00:00:59,960
deck.

23
00:00:59,960 --> 00:01:03,400
The platform will happily let you do it because the platform enforces configuration, not

24
00:01:03,400 --> 00:01:04,400
intent.

25
00:01:04,400 --> 00:01:05,720
That distinction matters.

26
00:01:05,720 --> 00:01:08,040
Low code isn't less engineering.

27
00:01:08,040 --> 00:01:12,080
It's engineering without the cultural friction that normally forces someone to ask hard questions

28
00:01:12,080 --> 00:01:13,080
early.

29
00:01:13,080 --> 00:01:14,080
Who owns this?

30
00:01:14,080 --> 00:01:15,080
What happens when it fails?

31
00:01:15,080 --> 00:01:16,080
What happens when it runs twice?

32
00:01:16,080 --> 00:01:17,560
What happens when the connector throttles?

33
00:01:17,560 --> 00:01:19,280
What happens when the original maker leaves?

34
00:01:19,280 --> 00:01:22,800
In pro code, the difficulty of building forces design discussions.

35
00:01:22,800 --> 00:01:25,000
In low code, the ease of building removes them.

36
00:01:25,000 --> 00:01:27,000
The organization gets what it optimised for.

37
00:01:27,000 --> 00:01:30,080
Speed of creation, not quality of operation.

38
00:01:30,080 --> 00:01:33,960
Excellence in this context isn't how fast a team can assemble a flow.

39
00:01:33,960 --> 00:01:35,800
Excellence is repeatability under change.

40
00:01:35,800 --> 00:01:39,760
It's the ability to modify the automation when the business changes, when the connector

41
00:01:39,760 --> 00:01:44,000
changes, when the API changes, when identity rules change, when regulators change their

42
00:01:44,000 --> 00:01:47,280
mind, when Microsoft changes a default and they will.

43
00:01:47,280 --> 00:01:48,280
Constantly.

44
00:01:48,280 --> 00:01:51,680
A flow that works today but cannot survive change isn't a solution.

45
00:01:51,680 --> 00:01:53,080
It's deferred incident response.

46
00:01:53,080 --> 00:01:57,240
Now the predictable pushback is, but low code is supposed to be agile.

47
00:01:57,240 --> 00:01:58,240
Governance slows it down.

48
00:01:58,240 --> 00:01:59,680
That's the comfortable story.

49
00:01:59,680 --> 00:02:03,640
It's also how governance becomes a cleanup project instead of a design constraint.

50
00:02:03,640 --> 00:02:07,040
Governance can't be bolted on after adoption because the critical decisions are made at

51
00:02:07,040 --> 00:02:11,640
creation time, which environment, which connectors, which identity, which data path, which

52
00:02:11,640 --> 00:02:15,720
ownership model, which logging surface, which retry behavior, which permissions.

53
00:02:15,720 --> 00:02:17,600
Those aren't decorations you add later.

54
00:02:17,600 --> 00:02:19,040
They are the physics of the system.

55
00:02:19,040 --> 00:02:22,040
Once you let the estate form organically, the estate hardens.

56
00:02:22,040 --> 00:02:26,600
You get thousands of small, convenient decisions that become permanent architecture and every

57
00:02:26,600 --> 00:02:29,440
exception you tolerate becomes an entropy generator.

58
00:02:29,440 --> 00:02:30,760
Shared accounts.

59
00:02:30,760 --> 00:02:31,760
Personal connections.

60
00:02:31,760 --> 00:02:33,920
Flows that nobody can explain.

61
00:02:33,920 --> 00:02:35,280
Retries that hide root cause.

62
00:02:35,280 --> 00:02:38,040
Branches that create non-enumerable behavior.

63
00:02:38,040 --> 00:02:42,160
This is why executives should care, even if they never open the Power Automate portal.

64
00:02:42,160 --> 00:02:44,560
Automation becomes operational dependency quietly.

65
00:02:44,560 --> 00:02:46,720
It doesn't announce itself as a tier one workload.

66
00:02:46,720 --> 00:02:48,880
It just shows up in the business process.

67
00:02:48,880 --> 00:02:53,440
Routes faster, requests get approved, onboarding happens, records sync, notifications go

68
00:02:53,440 --> 00:02:54,440
out.

69
00:02:54,440 --> 00:02:58,200
Then one day it doesn't and suddenly a workflow tool outage is a business outage.

70
00:02:58,200 --> 00:03:02,520
Executives also care because cost becomes unpredictable, not because the platform is malicious,

71
00:03:02,520 --> 00:03:06,320
but because uncontrolled execution pathways create churn.

72
00:03:06,320 --> 00:03:10,160
Flows that rerun, loops that burn calls, retries that multiply load.

73
00:03:10,160 --> 00:03:12,120
The business didn't ask for more transactions.

74
00:03:12,120 --> 00:03:15,360
The automation manufactured them and then there's audit defensibility.

75
00:03:15,360 --> 00:03:18,000
Auditors don't care that the maker had good intentions.

76
00:03:18,000 --> 00:03:22,520
They care that the organization can explain what happened, who approved it, what data moved,

77
00:03:22,520 --> 00:03:27,200
what identity performed the action and what controls prevented the wrong thing from happening.

78
00:03:27,200 --> 00:03:31,320
When an automation estate grows without a model, the audit answer becomes, "We think this

79
00:03:31,320 --> 00:03:35,440
flow did it, but we're not sure." That's not a process gap. That's a control plane gap.

80
00:03:35,440 --> 00:03:37,240
Architects should care for a different reason.

81
00:03:37,240 --> 00:03:41,560
Low code turns architectural debt into security debt faster than most platforms because

82
00:03:41,560 --> 00:03:46,960
flows don't just compute, they execute. They create, delete, update, approve and notify.

83
00:03:46,960 --> 00:03:49,720
They are distributed write access with a friendly UI.

84
00:03:49,720 --> 00:03:53,640
When that access becomes unowned, you are no longer managing apps and workflows, you are

85
00:03:53,640 --> 00:03:55,720
managing orphaned authority.

86
00:03:55,720 --> 00:03:58,840
So the correct posture is not to stop low code, that's not happening.

87
00:03:58,840 --> 00:04:02,760
The correct posture is to treat it like what it is, an enterprise automation layer that

88
00:04:02,760 --> 00:04:07,320
needs the same things every execution system needs, clear intent, bounded blast radius,

89
00:04:07,320 --> 00:04:11,600
deterministic execution, observable behavior and enforceable ownership.

90
00:04:11,600 --> 00:04:15,920
This episode starts with the mental model because without it, patterns sound like personal

91
00:04:15,920 --> 00:04:16,920
preference.

92
00:04:16,920 --> 00:04:19,400
Once the model is clear, the patterns feel inevitable.

93
00:04:19,400 --> 00:04:20,400
And that's the point.

94
00:04:20,400 --> 00:04:24,200
Next, the system needs a name because when you call it a workflow tool, you govern it

95
00:04:24,200 --> 00:04:25,200
like a hobby.

96
00:04:25,200 --> 00:04:29,120
When you recognize it as a control plane, you govern it like infrastructure.

97
00:04:29,120 --> 00:04:31,760
What the automation control plane really is.

98
00:04:31,760 --> 00:04:36,440
Calling Power Automate a workflow tool is like calling Entra ID a login screen.

99
00:04:36,440 --> 00:04:39,320
It's not wrong exactly, it's just the wrong level of abstraction.

100
00:04:39,320 --> 00:04:43,520
The system you're operating is an automation control plane, a set of distributed controls

101
00:04:43,520 --> 00:04:48,240
that decide who can do what, with which data, through which connectors, in which environments

102
00:04:48,240 --> 00:04:52,640
under which policies, with which logs and with which failure behavior.

103
00:04:52,640 --> 00:04:55,920
That definition matters because it forces the right question.

104
00:04:55,920 --> 00:04:57,560
Not "can we build the flow?"

105
00:04:57,560 --> 00:05:00,640
"Should this flow exist, and if it exists, what does it get to touch?"

106
00:05:00,640 --> 00:05:04,520
In platform terms, the control plane is everything that shapes execution without being

107
00:05:04,520 --> 00:05:10,240
the business payload, identity, connectors, environments, data loss prevention policy,

108
00:05:10,240 --> 00:05:15,200
managed environments, solution packaging, connection references, environment variables, audit

109
00:05:15,200 --> 00:05:17,280
logs and admin analytics.

110
00:05:17,280 --> 00:05:19,280
The data plane is the actual work.

111
00:05:19,280 --> 00:05:22,760
The message, the document, the record update, the approval, the ticket, the email, the

112
00:05:22,760 --> 00:05:26,720
API call, people obsess over the data plane because it's visible.

113
00:05:26,720 --> 00:05:29,200
Incidents are caused by the control plane because it's implicit.

114
00:05:29,200 --> 00:05:30,640
Here's what most people miss.

115
00:05:30,640 --> 00:05:32,560
Those pieces don't operate independently.

116
00:05:32,560 --> 00:05:34,920
They compile together into one machine.

117
00:05:34,920 --> 00:05:38,160
Identity is not just authentication. In this system, identity is authority.

118
00:05:38,160 --> 00:05:41,040
A connection is a delegated permission boundary.

119
00:05:41,040 --> 00:05:42,840
A connector isn't a convenience.

120
00:05:42,840 --> 00:05:47,120
It's an integration contract with its own throttles, behaviors and failure modes.

121
00:05:47,120 --> 00:05:48,880
An environment isn't a folder.

122
00:05:48,880 --> 00:05:52,000
It's an isolation boundary with its own policy surfaces.

123
00:05:52,000 --> 00:05:54,400
DLP isn't a compliance checkbox.

124
00:05:54,400 --> 00:05:56,560
It's the connector graph you're allowed to traverse.

125
00:05:56,560 --> 00:05:58,800
ALM isn't bureaucracy.

126
00:05:58,800 --> 00:06:02,720
It's how you prevent "someone clicked save" from becoming a production change.

127
00:06:02,720 --> 00:06:04,040
And audit logs aren't histories.

128
00:06:04,040 --> 00:06:07,440
They are your only defensible explanation when something goes wrong.

129
00:06:07,440 --> 00:06:09,240
The uncomfortable truth is this.

130
00:06:09,240 --> 00:06:11,240
Power platform does not enforce your intent.

131
00:06:11,240 --> 00:06:12,800
It enforces your configuration.

132
00:06:12,800 --> 00:06:15,400
If you didn't encode the boundary, the boundary does not exist.

133
00:06:15,400 --> 00:06:16,880
Your org chart does not exist.

134
00:06:16,880 --> 00:06:18,920
Your risk appetite does not exist.

135
00:06:18,920 --> 00:06:19,920
"That's not how we do it"

136
00:06:19,920 --> 00:06:20,920
does not exist.

137
00:06:20,920 --> 00:06:22,400
Only configuration exists.

138
00:06:22,400 --> 00:06:25,080
That's why policy exceptions are entropy generators.

139
00:06:25,080 --> 00:06:28,600
Every time someone says "just this once, let them use that connector" or "just let the flow

140
00:06:28,600 --> 00:06:30,200
run under my account for now,"

141
00:06:30,200 --> 00:06:32,160
the platform does not record your reluctance.

142
00:06:32,160 --> 00:06:33,400
It records the exception.

143
00:06:33,400 --> 00:06:35,640
Then the exception becomes normal because it works.

144
00:06:35,640 --> 00:06:36,640
Then it gets copied.

145
00:06:36,640 --> 00:06:37,640
Then it becomes a template.

146
00:06:37,640 --> 00:06:38,640
Then it becomes the estate.

147
00:06:38,640 --> 00:06:42,800
Over time, the control plane drifts away from your original assumptions.

148
00:06:42,800 --> 00:06:44,800
Missing policies create obvious gaps.

149
00:06:44,800 --> 00:06:46,440
Drifting policies create ambiguity.

150
00:06:46,440 --> 00:06:48,080
And ambiguity is where incidents live.

151
00:06:48,080 --> 00:06:51,240
So when this episode says control plane, it's not being poetic.

152
00:06:51,240 --> 00:06:53,320
It's describing a real operating model.

153
00:06:53,320 --> 00:06:55,040
Intent, decision, execution.

154
00:06:55,040 --> 00:06:57,080
Intent is what leadership thinks they bought.

155
00:06:57,080 --> 00:06:58,080
It's the business contract.

156
00:06:58,080 --> 00:06:59,080
What must happen?

157
00:06:59,080 --> 00:07:00,080
What must never happen?

158
00:07:00,080 --> 00:07:01,080
What data must not move?

159
00:07:01,080 --> 00:07:03,400
What identity must never approve its own work?

160
00:07:03,400 --> 00:07:04,880
What changes require review?

161
00:07:04,880 --> 00:07:06,760
What actions are considered side effects?

162
00:07:06,760 --> 00:07:09,560
What the blast radius is allowed to be.

163
00:07:09,560 --> 00:07:11,960
Decision is where you interpret intent against reality.

164
00:07:11,960 --> 00:07:16,800
This is classification, routing, triage, exception handling and increasingly AI reasoning.

165
00:07:16,800 --> 00:07:18,080
It's probabilistic by nature.

166
00:07:18,080 --> 00:07:19,080
It makes guesses.

167
00:07:19,080 --> 00:07:20,080
It prioritizes.

168
00:07:20,080 --> 00:07:21,080
It may be wrong.

169
00:07:21,080 --> 00:07:22,080
That is not a failure.

170
00:07:22,080 --> 00:07:23,080
It's the design.

171
00:07:23,080 --> 00:07:25,320
Execution is where you must stop guessing.

172
00:07:25,320 --> 00:07:26,640
Execution is deterministic.

173
00:07:26,640 --> 00:07:32,360
It is writes, approvals, notifications, record creation, file movement, permission changes,

174
00:07:32,360 --> 00:07:33,360
side effects.

175
00:07:33,360 --> 00:07:38,200
Execution must behave like an industrial machine, consistent inputs, consistent outputs,

176
00:07:38,200 --> 00:07:42,480
bounded retries and an audit trail that can survive an incident review. When you mix decision

177
00:07:42,480 --> 00:07:46,080
and execution in the same flow, you create conditional chaos.

178
00:07:46,080 --> 00:07:47,080
Probabilistic reasoning,

179
00:07:47,080 --> 00:07:49,720
glued directly to deterministic side effects.

180
00:07:49,720 --> 00:07:53,280
That's how you get duplicate emails, double-created tickets, conflicting updates and

181
00:07:53,280 --> 00:07:54,640
"we don't know why it ran."

182
00:07:54,640 --> 00:07:57,360
And because the control plane is distributed, the chaos scales.

183
00:07:57,360 --> 00:07:59,640
A single bad retry policy isn't local.

184
00:07:59,640 --> 00:08:01,200
It becomes load on shared services.

185
00:08:01,200 --> 00:08:03,240
A single unowned connection isn't private.

186
00:08:03,240 --> 00:08:04,640
It becomes orphaned authority.

187
00:08:04,640 --> 00:08:06,600
A single DLP exception isn't isolated.

188
00:08:06,600 --> 00:08:08,920
It becomes a new pathway for data movement.

189
00:08:08,920 --> 00:08:10,280
This is the core misalignment.

190
00:08:10,280 --> 00:08:13,640
People design flows as if each flow is a self-contained script.

191
00:08:13,640 --> 00:08:17,960
In reality, each flow is a node in a tenant-wide authorization and execution graph.

192
00:08:17,960 --> 00:08:20,880
The platform compiles that graph from your configurations.

193
00:08:20,880 --> 00:08:22,720
Whether you understand them or not.

194
00:08:22,720 --> 00:08:24,840
So the goal now is not better flows.

195
00:08:24,840 --> 00:08:28,320
The goal is a control plane that produces predictable automation.

196
00:08:28,320 --> 00:08:29,800
Intent expressed clearly.

197
00:08:29,800 --> 00:08:34,760
Decision isolated where uncertainty belongs.

198
00:08:34,760 --> 00:08:38,240
Once you adopt that model, excellence stops being a motivational word.

199
00:08:38,240 --> 00:08:41,400
It becomes a mechanical property you can enforce.

200
00:08:41,400 --> 00:08:44,080
Excellence as an architecture, not a goal.

201
00:08:44,080 --> 00:08:47,080
Excellence is usually where organizations hide vagueness.

202
00:08:47,080 --> 00:08:51,880
They say it when they mean we want fewer incidents or we want to scale or we want the

203
00:08:51,880 --> 00:08:53,920
auditors to stop asking questions.

204
00:08:53,920 --> 00:08:56,560
But in automation, excellence isn't a value statement.

205
00:08:56,560 --> 00:08:58,280
It's an architectural property.

206
00:08:58,280 --> 00:09:02,040
If the automation behaves deterministically under change, you get excellence.

207
00:09:02,040 --> 00:09:03,800
If it doesn't, you get entropy.

208
00:09:03,800 --> 00:09:05,680
And the platform doesn't care what you intended.

209
00:09:05,680 --> 00:09:06,760
It just runs what you built.

210
00:09:06,760 --> 00:09:10,680
So define excellence in terms the control plane understands.

211
00:09:10,680 --> 00:09:15,320
Deterministic behavior, bounded blast radius and explainable failure.

212
00:09:15,320 --> 00:09:18,680
Deterministic behavior means the same input leads to the same side effects.

213
00:09:18,680 --> 00:09:21,720
Not usually. Not unless it retried.

214
00:09:21,720 --> 00:09:25,440
When a run produces different outputs depending on timing, connector mood or how many other

215
00:09:25,440 --> 00:09:28,240
flows are hammering the tenant, you don't have an automation system.

216
00:09:28,240 --> 00:09:30,680
You have a slot machine with admin permissions.

217
00:09:30,680 --> 00:09:34,120
Bounded blast radius means one automation can't take the rest of the estate with it.

218
00:09:34,120 --> 00:09:38,000
It can fail, it can be throttled, it can hit an upstream outage, but it cannot amplify

219
00:09:38,000 --> 00:09:42,120
into a tenant-wide performance incident because someone built an unbounded loop and called

220
00:09:42,120 --> 00:09:43,720
it robust.

221
00:09:43,720 --> 00:09:47,200
Explainable failure means a human can answer quickly what happened and why.

222
00:09:47,200 --> 00:09:48,200
Not with folklore.

223
00:09:48,200 --> 00:09:52,320
With artifacts: a run history that reads like a narrative, a correlation ID that ties

224
00:09:52,320 --> 00:09:54,240
orchestration to execution,

225
00:09:54,240 --> 00:09:58,600
and an audit trail that doesn't require a multi-day archaeological dig through nested scopes

226
00:09:58,600 --> 00:10:02,440
and renamed actions. This is why executives should care.

227
00:10:02,440 --> 00:10:04,800
Automation doesn't stay in the productivity category.

228
00:10:04,800 --> 00:10:08,400
It becomes embedded in revenue processes, customer operations, compliance workflows

229
00:10:08,400 --> 00:10:09,800
and financial approvals.

230
00:10:09,800 --> 00:10:12,440
Quietly, then it becomes politically impossible to turn off.

231
00:10:12,440 --> 00:10:13,440
That's the trap.

232
00:10:13,440 --> 00:10:16,960
Leaders end up running a production system that nobody designed as a production system.

233
00:10:16,960 --> 00:10:19,560
And when an executive asks, why did this break?

234
00:10:19,560 --> 00:10:22,440
The worst possible answer is because someone changed a flow.

235
00:10:22,440 --> 00:10:23,440
That's not an explanation.

236
00:10:23,440 --> 00:10:25,920
That's an admission that change control never existed.

237
00:10:25,920 --> 00:10:30,480
Architects should care because this platform converts complexity into security debt faster

238
00:10:30,480 --> 00:10:32,800
than most engineering teams can track.

239
00:10:32,800 --> 00:10:36,760
Not because makers are reckless, because the system makes it easy to create authority pathways

240
00:10:36,760 --> 00:10:38,320
without understanding them.

241
00:10:38,320 --> 00:10:42,680
A flow with the wrong connector and the wrong connection reference isn't just misconfigured.

242
00:10:42,680 --> 00:10:45,200
It's a new write-capability edge in your authorization graph.

243
00:10:45,200 --> 00:10:46,200
It can move data.

244
00:10:46,200 --> 00:10:47,200
It can approve.

245
00:10:47,200 --> 00:10:48,200
It can delete.

246
00:10:48,200 --> 00:10:49,200
It can exfiltrate.

247
00:10:49,200 --> 00:10:52,000
And if you can't name who owns it, you can't bound it and you can't explain it,

248
00:10:52,000 --> 00:10:53,120
then you can't defend it.

249
00:10:53,120 --> 00:10:55,000
So excellence is not better standards.

250
00:10:55,000 --> 00:10:57,280
It's a model that makes failure boring.

251
00:10:57,280 --> 00:10:59,320
Now here's the part most people miss.

252
00:10:59,320 --> 00:11:01,520
Excellence collapses in three predictable ways.

253
00:11:01,520 --> 00:11:06,480
Not because of bad people, because of system behavior. First, complexity. Flows accrete logic.

254
00:11:06,480 --> 00:11:07,960
Every exception becomes a branch.

255
00:11:07,960 --> 00:11:09,600
Every branch becomes another run shape.

256
00:11:09,600 --> 00:11:13,320
Eventually you can't enumerate behavior. Testing becomes theater because you can't prove

257
00:11:13,320 --> 00:11:14,720
you covered all paths.

258
00:11:14,720 --> 00:11:16,120
The system still runs.

259
00:11:16,120 --> 00:11:17,880
But you no longer know what you're running.

260
00:11:17,880 --> 00:11:19,200
Second, capacity.

261
00:11:19,200 --> 00:11:20,200
Power Automate

262
00:11:20,200 --> 00:11:23,160
lives in multi-tenant services with shared limits and throttles.

263
00:11:23,160 --> 00:11:27,160
Your flow isn't alone and it doesn't execute in a vacuum.

264
00:11:27,160 --> 00:11:31,920
When teams treat retries as reliability or loops as "just how you do it," they manufacture

265
00:11:31,920 --> 00:11:33,800
load, then they get throttled.

266
00:11:33,800 --> 00:11:34,800
Then they retry harder.

267
00:11:34,800 --> 00:11:36,320
Then they call it a platform issue.

268
00:11:36,320 --> 00:11:37,320
It isn't.

269
00:11:37,320 --> 00:11:39,040
It's uncontrolled execution pathways.

270
00:11:39,040 --> 00:11:40,640
Third, ownership.

271
00:11:40,640 --> 00:11:42,440
Automation estates outlive their creators.

272
00:11:42,440 --> 00:11:43,760
People move roles.

273
00:11:43,760 --> 00:11:44,760
Contractors leave.

274
00:11:44,760 --> 00:11:45,760
Teams reorganize.

275
00:11:45,760 --> 00:11:50,000
But flows keep running because the business now depends on them.

276
00:11:50,000 --> 00:11:54,520
When ownership decays, authority becomes orphaned, incidents become untraceable.

277
00:11:54,520 --> 00:11:58,880
And audit questions become existential because the organization can't even identify who

278
00:11:58,880 --> 00:12:01,880
is responsible for the thing that moved the data.

279
00:12:01,880 --> 00:12:05,680
Those three failure modes, complexity, capacity, ownership, aren't independent.

280
00:12:05,680 --> 00:12:08,960
They compound. Complexity hides capacity waste.

281
00:12:08,960 --> 00:12:10,600
Capacity incidents trigger retries.

282
00:12:10,600 --> 00:12:12,080
Retries create more complexity.

283
00:12:12,080 --> 00:12:15,680
And when ownership is unclear, nobody has the mandate to refactor.

284
00:12:15,680 --> 00:12:18,600
So the estate just accumulates more entropy generators.

285
00:12:18,600 --> 00:12:22,880
It clicked for most experienced platform teams when they stopped chasing good flows and

286
00:12:22,880 --> 00:12:26,920
started enforcing one architectural pattern that collapses complexity later.

287
00:12:26,920 --> 00:12:30,240
There is one pattern that turns a chaotic estate into a readable one.

288
00:12:30,240 --> 00:12:31,600
It's not a naming convention.

289
00:12:31,600 --> 00:12:32,880
It's not a training session.

290
00:12:32,880 --> 00:12:34,040
It's structural.

291
00:12:34,040 --> 00:12:38,200
And it starts with admitting that branching everywhere is not flexibility.

292
00:12:38,200 --> 00:12:40,200
It's conditional chaos.

293
00:12:40,200 --> 00:12:44,440
Next, that chaos gets a name, because once you can name it, you can kill it.

294
00:12:44,440 --> 00:12:46,800
Anti-pattern one: Christmas tree flows.

295
00:12:46,800 --> 00:12:50,640
Christmas tree flows are what happen when a flow becomes the place the business goes to negotiate

296
00:12:50,640 --> 00:12:51,720
with reality.

297
00:12:51,720 --> 00:12:55,640
They start innocent, a trigger, a couple of actions and a condition.

298
00:12:55,640 --> 00:12:57,040
Then someone adds an exception.

299
00:12:57,040 --> 00:12:59,760
Then another, then a retry, just to be safe.

300
00:12:59,760 --> 00:13:01,840
Then a switch to handle special customers.

301
00:13:01,840 --> 00:13:03,600
Then a loop because the data shape changed.

302
00:13:03,600 --> 00:13:06,160
Then a scope because the loop failed sometimes.

303
00:13:06,160 --> 00:13:07,960
Then an approval because someone got nervous.

304
00:13:07,960 --> 00:13:10,200
Then another branch because approvals time out.

305
00:13:10,200 --> 00:13:11,960
And suddenly the flow isn't a workflow.

306
00:13:11,960 --> 00:13:14,800
It's a policy argument expressed as nested containers.

307
00:13:14,800 --> 00:13:17,640
The symptoms are obvious when someone is brave enough to zoom out.

308
00:13:17,640 --> 00:13:18,640
One mega flow.

309
00:13:18,640 --> 00:13:19,840
Branches on branches.

310
00:13:19,840 --> 00:13:21,880
Conditions inside loops inside conditions.

311
00:13:21,880 --> 00:13:25,760
Decision logic mixed with execution steps so tightly that you can't tell which part is

312
00:13:25,760 --> 00:13:29,320
figuring out what should happen and which part is actually doing the thing.

313
00:13:29,320 --> 00:13:31,000
And it ends in multiple places.

314
00:13:31,000 --> 00:13:32,520
Multiple terminate actions.

315
00:13:32,520 --> 00:13:34,120
Multiple send email actions.

316
00:13:34,120 --> 00:13:37,560
Multiple create record actions depending on which branch you happen to fall into that

317
00:13:37,560 --> 00:13:38,560
day.

318
00:13:38,560 --> 00:13:39,560
That's not flexibility.

319
00:13:39,560 --> 00:13:41,160
That's non-enumerable behavior.

320
00:13:41,160 --> 00:13:42,160
Here's why it happens.

321
00:13:42,160 --> 00:13:43,680
The UI makes branching cheap.

322
00:13:43,680 --> 00:13:45,800
And the business asks for just one more case.

323
00:13:45,800 --> 00:13:47,440
The maker doesn't need to redesign anything.

324
00:13:47,440 --> 00:13:48,600
They just add a condition.

325
00:13:48,600 --> 00:13:51,000
The platform rewards this with immediate success.

326
00:13:51,000 --> 00:13:52,400
The run turns green.

327
00:13:52,400 --> 00:13:53,400
Everyone celebrates.

328
00:13:53,400 --> 00:13:54,840
No one asks what was traded away.

329
00:13:54,840 --> 00:13:56,920
What was traded away is explainability.

330
00:13:56,920 --> 00:13:58,840
Every new branch creates a new run shape.

331
00:13:58,840 --> 00:14:01,880
Every new run shape creates a new set of possible side effects.

332
00:14:01,880 --> 00:14:06,480
And because decision and execution are interwoven, you can't reason about blast radius locally.

333
00:14:06,480 --> 00:14:10,840
A change to a routing rule now sits next to a delete file action in the same container

334
00:14:10,840 --> 00:14:11,840
stack.

335
00:14:11,840 --> 00:14:14,000
Teams start testing, or what they call testing.

336
00:14:14,000 --> 00:14:15,000
They run the happy path.

337
00:14:15,000 --> 00:14:16,000
They run one exception.

338
00:14:16,000 --> 00:14:17,000
They run one more.

339
00:14:17,000 --> 00:14:18,000
Then they ship.

340
00:14:18,000 --> 00:14:21,200
Because enumerating all paths is impossible once the tree gets wide enough.

341
00:14:21,200 --> 00:14:24,640
At that point, testing is a ritual that makes people feel responsible without producing

342
00:14:24,640 --> 00:14:25,640
certainty.
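A rough way to see why enumeration stops being feasible: assuming, purely for illustration, that every condition in a flow is an independent two-way branch, the number of possible run shapes doubles with each condition added. The function name and the independence assumption are hypothetical simplifications, not something the platform reports.

```python
def run_shapes(independent_conditions: int) -> int:
    """Upper bound on distinct paths through a flow whose conditions
    are independent two-way branches (a simplifying assumption)."""
    if independent_conditions < 0:
        raise ValueError("condition count cannot be negative")
    return 2 ** independent_conditions


# Each added condition doubles the number of paths a test plan would need to cover.
for n in (3, 10, 20):
    print(f"{n} conditions -> up to {run_shapes(n)} run shapes")
```

Real flows share or exclude some paths, so the true count is usually lower, but the growth is still exponential in the number of independent branches.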

343
00:14:25,640 --> 00:14:27,200
That distinction matters.

344
00:14:27,200 --> 00:14:29,880
Christmas tree flows don't fail in clean, diagnosable ways.

345
00:14:29,880 --> 00:14:31,040
They fail as ambiguity.

346
00:14:31,040 --> 00:14:32,280
The wrong branch triggers.

347
00:14:32,280 --> 00:14:35,040
A variable gets set in one path and referenced in another.

348
00:14:35,040 --> 00:14:38,320
A condition doesn't match because the data is null in one scenario.

349
00:14:38,320 --> 00:14:40,160
A connector returns a different error shape.

350
00:14:40,160 --> 00:14:42,400
A scope catches something it shouldn't.

351
00:14:42,400 --> 00:14:46,080
And now the run history looks like a choose your own adventure novel written by five different

352
00:14:46,080 --> 00:14:47,520
people across two years.

353
00:14:47,520 --> 00:14:49,120
So mean time to explain goes vertical.

354
00:14:49,120 --> 00:14:52,360
Not because people are incompetent, because the artifact is not explainable.

355
00:14:52,360 --> 00:14:53,760
The flow is a maze.

356
00:14:53,760 --> 00:14:55,840
Debugging becomes archaeology.

357
00:14:55,840 --> 00:15:00,480
Expand a scope, scroll, expand another scope, scroll again, check which branches ran,

358
00:15:00,480 --> 00:15:05,520
and then try to infer intent from action names like "Condition 12" and "Compose 4".

359
00:15:05,520 --> 00:15:09,160
And yes, the operational metrics get worse in exactly the way you'd expect.

360
00:15:09,160 --> 00:15:13,160
Time to explain moves from minutes to hours to days as the tree grows.

361
00:15:13,160 --> 00:15:16,720
Change failure rate rises because every branch multiplies the number of ways

362
00:15:16,720 --> 00:15:19,560
a small edit can create an unexpected side effect.

363
00:15:19,560 --> 00:15:21,480
And deployment time increases

364
00:15:21,480 --> 00:15:23,920
non-linearly because nobody wants to touch it.

365
00:15:23,920 --> 00:15:27,440
Every change feels like surgery on a live system without imaging.

366
00:15:27,440 --> 00:15:29,440
The punch line is the most uncomfortable part.

367
00:15:29,440 --> 00:15:32,080
If one flow handles everything, nobody owns anything.

368
00:15:32,080 --> 00:15:34,120
Ownership doesn't survive ambiguity.

369
00:15:34,120 --> 00:15:38,560
When the flow contains every business rule, every exception, every connector quirk,

370
00:15:38,560 --> 00:15:43,360
and every workaround, no single team can claim responsibility for the whole thing.

371
00:15:43,360 --> 00:15:45,240
Makers say IT should own it because it's critical.

372
00:15:45,240 --> 00:15:48,400
IT says the business should own it because it's business logic.

373
00:15:48,400 --> 00:15:50,800
Security says everyone should own it because it moves data.

374
00:15:50,800 --> 00:15:52,360
So in practice nobody owns it.

375
00:15:52,360 --> 00:15:53,960
And the control plane behaves accordingly.

376
00:15:53,960 --> 00:15:55,680
Unowned systems don't get refactored.

377
00:15:55,680 --> 00:15:56,680
They accrete.

378
00:15:56,680 --> 00:15:59,040
There's also a hidden governance failure in the Christmas tree.

379
00:15:59,040 --> 00:16:01,800
It collapses the boundary between decision and execution.

380
00:16:01,800 --> 00:16:06,280
That means a change to how we classify can change what we write in the same edit.

381
00:16:06,280 --> 00:16:10,160
The system stops being deterministic at the exact layer that must be deterministic.

382
00:16:10,160 --> 00:16:13,880
So when people say we need standards, what they really mean is we built a tree and now

383
00:16:13,880 --> 00:16:15,280
we can't explain it.

384
00:16:15,280 --> 00:16:16,520
The fix isn't heroic.

385
00:16:16,520 --> 00:16:21,360
It's structural, collapse branching, separate decision from execution, and force a single

386
00:16:21,360 --> 00:16:22,680
readable run shape.

387
00:16:22,680 --> 00:16:25,800
But before the patterns show up, the next cliff appears.

388
00:16:25,800 --> 00:16:30,160
Because even a perfectly readable tree still burns capacity, and the platform always collects

389
00:16:30,160 --> 00:16:31,680
its bill.

390
00:16:31,680 --> 00:16:32,680
Anti-pattern 2.

391
00:16:32,680 --> 00:16:34,800
API exhaustion by convenience.

392
00:16:34,800 --> 00:16:38,520
API exhaustion is what happens when a team treats execution like it's free.

393
00:16:38,520 --> 00:16:39,520
It isn't.

394
00:16:39,520 --> 00:16:41,120
Power Automate looks like a canvas.

395
00:16:41,120 --> 00:16:44,520
Underneath it's an engine making calls into shared services.

396
00:16:44,520 --> 00:16:50,680
Microsoft 365, Dataverse, SharePoint, Teams, Exchange, third party APIs, and whatever else

397
00:16:50,680 --> 00:16:54,240
someone connected at 2 a.m. because it unblocked the business.

398
00:16:54,240 --> 00:16:56,560
The anti-pattern shows up in three behaviors.

399
00:16:56,560 --> 00:17:01,000
Unbounded loops, unnecessary runs, and retries used as a personality trait.

400
00:17:01,000 --> 00:17:02,920
Unbounded loops are the obvious one.

418
00:17:58,920 --> 00:18:01,920
Most retry behavior doesn't make a system more reliable.

419
00:18:01,920 --> 00:18:02,920
It makes it less explainable.

420
00:18:02,920 --> 00:18:04,920
The original failure gets buried under cascaded attempts,

421
00:18:04,920 --> 00:18:07,920
partial success, duplicate actions and timing variance.

422
00:18:07,920 --> 00:18:08,920
You don't get one incident.

423
00:18:08,920 --> 00:18:10,920
You get a swarm of related incidents.

424
00:18:10,920 --> 00:18:11,920
Each one handled.

425
00:18:11,920 --> 00:18:13,920
But none fully understood.

426
00:18:13,920 --> 00:18:16,920
That's how you end up with the worst kind of automation failure.

427
00:18:16,920 --> 00:18:19,920
Not "it broke," but "it kind of worked twice."

428
00:18:19,920 --> 00:18:23,920
Architecturally, API calls are shared capacity and a governance surface.

429
00:18:23,920 --> 00:18:24,920
That's the reframe.

430
00:18:24,920 --> 00:18:27,920
When one team's flows burn through connector calls,

431
00:18:27,920 --> 00:18:29,920
they're not just consuming their own budget.

432
00:18:29,920 --> 00:18:31,920
They're competing inside multi-tenant services.

433
00:18:31,920 --> 00:18:33,920
And multi-tenant services don't negotiate.

434
00:18:33,920 --> 00:18:34,920
They throttle.

435
00:18:34,920 --> 00:18:36,920
Throttling isn't Microsoft punishing you.

436
00:18:36,920 --> 00:18:39,920
It's the platform telling you your execution pathways are unbounded.

437
00:18:39,920 --> 00:18:43,920
But most teams interpret throttling as "the platform is flaky."

438
00:18:43,920 --> 00:18:45,920
Therefore they add more retries.

439
00:18:45,920 --> 00:18:46,920
Therefore they create more load.

440
00:18:46,920 --> 00:18:48,920
Therefore they get throttled harder.

441
00:18:48,920 --> 00:18:52,920
Conditional chaos, but at the capacity layer. This is where causation matters.

442
00:18:52,920 --> 00:18:54,920
The incident isn't caused by scale.

443
00:18:54,920 --> 00:18:56,920
Scale is just volume.

444
00:18:56,920 --> 00:18:59,920
The incident is caused by uncontrolled execution pathways.

445
00:18:59,920 --> 00:19:01,920
Flows that run when they don't need to.

446
00:19:01,920 --> 00:19:04,920
Loops that expand without ceilings, retries that multiply pressure

447
00:19:04,920 --> 00:19:07,920
and designs that treat connector calls like internal function calls.

448
00:19:07,920 --> 00:19:08,920
They aren't.

449
00:19:08,920 --> 00:19:10,920
They are network operations with quotas, latency variance,

450
00:19:10,920 --> 00:19:14,920
and service-side limits you do not control. Executives should care

451
00:19:14,920 --> 00:19:17,920
because this is how automation produces hidden cost growth.

452
00:19:17,920 --> 00:19:19,920
The business didn't ask for 5x more transactions.

453
00:19:19,920 --> 00:19:21,920
The automation manufactured them.

454
00:19:21,920 --> 00:19:25,920
Extra runs, extra calls, extra retries, extra duplicate writes,

455
00:19:25,920 --> 00:19:27,920
extra downstream reconciliation.

456
00:19:27,920 --> 00:19:29,920
And the cost isn't only licensing, it's operational.

457
00:19:29,920 --> 00:19:32,920
Teams burning hours chasing intermittent latency spikes,

458
00:19:32,920 --> 00:19:35,920
explaining why "nothing changed" right before it failed,

459
00:19:35,920 --> 00:19:40,920
and writing compensating flows to clean up the duplicate mess the first flow created.

460
00:19:40,920 --> 00:19:43,920
The most uncomfortable truth is also the simplest.

461
00:19:43,920 --> 00:19:46,920
Most automation capacity incidents are self-inflicted.

462
00:19:46,920 --> 00:19:49,920
Not because people are careless, because nobody put a ceiling on execution

463
00:19:49,920 --> 00:19:51,920
and the platform will not infer one for you.

464
00:19:51,920 --> 00:19:54,920
The platform enforces configuration, not intent.

465
00:19:54,920 --> 00:19:57,920
If you didn't encode "this flow may call this connector at this rate,

466
00:19:57,920 --> 00:20:00,920
with this maximum fan-out, with this retry policy,"

467
00:20:00,920 --> 00:20:02,920
then the system's default behavior becomes your architecture

468
00:20:02,920 --> 00:20:04,920
and defaults are not an architecture.

469
00:20:04,920 --> 00:20:06,920
They're entropy with good marketing.
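One way to picture "encoding" those limits is a small contract object checked before every connector call. This is a minimal sketch under stated assumptions: `ExecutionContract`, `CallBudget`, and every field name are hypothetical illustrations, not Power Automate settings or APIs, and the fixed one-minute window is just the simplest rate model to show.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionContract:
    """Hypothetical per-flow limits; none of these are platform settings."""
    connector: str
    max_calls_per_minute: int
    max_fan_out: int
    max_retries: int


class BudgetExceeded(Exception):
    """Raised when a call would break the declared contract."""


class CallBudget:
    """Counts calls inside a fixed one-minute window against the contract."""

    def __init__(self, contract: ExecutionContract) -> None:
        self.contract = contract
        self.window_start = 0.0
        self.calls_in_window = 0

    def charge(self, now: float) -> None:
        # Start a fresh window once 60 seconds have elapsed.
        if now - self.window_start >= 60.0:
            self.window_start = now
            self.calls_in_window = 0
        if self.calls_in_window >= self.contract.max_calls_per_minute:
            raise BudgetExceeded(f"{self.contract.connector}: rate ceiling reached")
        self.calls_in_window += 1

    def check_fan_out(self, item_count: int) -> None:
        # Refuse to start a loop that is wider than the declared ceiling.
        if item_count > self.contract.max_fan_out:
            raise BudgetExceeded(
                f"{self.contract.connector}: {item_count} items exceeds fan-out ceiling"
            )
```

The point of the sketch is that the ceiling is written down and fails loudly, instead of being whatever the defaults happen to allow.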

470
00:20:06,920 --> 00:20:08,920
The fix later will look boring.

471
00:20:08,920 --> 00:20:11,920
Budgets, trigger discipline, bounded retries, and design patterns

472
00:20:11,920 --> 00:20:14,920
that keep loops from turning into denial of service against your own tenant.

473
00:20:14,920 --> 00:20:17,920
But before that, there's a bigger problem than cost.

474
00:20:17,920 --> 00:20:20,920
Unknown automation doesn't just burn capacity, it burns accountability

475
00:20:20,920 --> 00:20:21,920
and that's the next cliff.

476
00:20:21,920 --> 00:20:22,920
Anti-pattern 3.

477
00:20:22,920 --> 00:20:23,920
Shadow automation.

478
00:20:23,920 --> 00:20:26,920
Shadow automation is the part of the estate nobody wants to admit exists

479
00:20:26,920 --> 00:20:29,920
because admitting it forces a governance conversation

480
00:20:29,920 --> 00:20:31,920
that should have happened two years ago.

481
00:20:31,920 --> 00:20:33,920
And it's not the same thing as Shadow IT.

482
00:20:33,920 --> 00:20:36,920
Shadow IT mostly stores data in the wrong place.

483
00:20:36,920 --> 00:20:38,920
Annoying, risky, usually recoverable.

484
00:20:38,920 --> 00:20:39,920
Shadow automation executes.

485
00:20:39,920 --> 00:20:41,920
It moves data between systems.

486
00:20:41,920 --> 00:20:42,920
It creates and deletes records.

487
00:20:42,920 --> 00:20:44,920
It sends messages that look official.

488
00:20:44,920 --> 00:20:45,920
It approves things.

489
00:20:45,920 --> 00:20:46,920
It opens tickets.

490
00:20:46,920 --> 00:20:47,920
It closes tickets.

491
00:20:47,920 --> 00:20:49,920
It changes permissions.

492
00:20:49,920 --> 00:20:52,920
It does side effects at machine speed, with no human in the loop

493
00:20:52,920 --> 00:20:55,920
and often with no one left who remembers why it was created.

494
00:20:55,920 --> 00:20:57,920
So the core symptom isn't secrecy.

495
00:20:57,920 --> 00:20:59,920
It's the absence of ownership.

496
00:20:59,920 --> 00:21:02,920
It looks like temporary flows that have been running since the last reorg.

497
00:21:02,920 --> 00:21:05,920
Flows built for a project that ended, but the flow stayed alive

498
00:21:05,920 --> 00:21:08,920
because nobody wanted to be the person who turned it off.

499
00:21:08,920 --> 00:21:11,920
Automations that are owned by an individual user account

500
00:21:11,920 --> 00:21:13,920
because it was faster than getting a service account.

501
00:21:13,920 --> 00:21:15,920
Connections that use someone's personal mailbox,

502
00:21:15,920 --> 00:21:17,920
someone's personal SharePoint permissions,

503
00:21:17,920 --> 00:21:21,920
or a connector authenticated by a contractor who left six months ago.

504
00:21:21,920 --> 00:21:24,920
And because it still runs, everyone pretends it's fine.

505
00:21:24,920 --> 00:21:25,920
Until it isn't.

506
00:21:25,920 --> 00:21:28,920
Here's what most people miss.

507
00:21:28,920 --> 00:21:31,920
In Power Automate, the connection is authority.

508
00:21:31,920 --> 00:21:34,920
If the connection still works, the flow still has permission.

509
00:21:34,920 --> 00:21:37,920
Even if the human behind that permission has moved roles,

510
00:21:37,920 --> 00:21:39,920
lost context, or left the company.

511
00:21:39,920 --> 00:21:40,920
That is orphaned authority.

512
00:21:40,920 --> 00:21:44,920
And orphaned authority accumulates silently because the platform doesn't feel shame.

513
00:21:44,920 --> 00:21:47,920
It doesn't ask, "Are you sure this still makes sense?"

514
00:21:47,920 --> 00:21:49,920
It just continues executing your past decisions.

515
00:21:49,920 --> 00:21:52,920
Shadow Automation is also why incident attribution collapses.

516
00:21:52,920 --> 00:21:54,920
When something breaks, the question is simple.

517
00:21:54,920 --> 00:21:55,920
What changed?

518
00:21:55,920 --> 00:21:57,920
In a healthy system, you can answer that.

519
00:21:57,920 --> 00:22:02,920
There's a release, a deployment, a version, a ticket, a change record, a known owner, a rollback plan.

520
00:22:02,920 --> 00:22:04,920
In Shadow Automation, you get none of that.

521
00:22:04,920 --> 00:22:06,920
The flow is in the default environment.

522
00:22:06,920 --> 00:22:08,920
It has been edited by three people.

523
00:22:08,920 --> 00:22:09,920
It uses a connector

524
00:22:09,920 --> 00:22:11,920
nobody remembers approving.

525
00:22:11,920 --> 00:22:15,920
The run history shows that it succeeded 9,000 times, failed 200 times,

526
00:22:15,920 --> 00:22:18,920
and retried another 400 times.

527
00:22:18,920 --> 00:22:21,920
Then someone asks, "Is this flow still needed?"

528
00:22:21,920 --> 00:22:24,920
And the room goes quiet, because needed is now undefined.

529
00:22:24,920 --> 00:22:26,920
And that is how audits become painful.

530
00:22:26,920 --> 00:22:28,920
Auditors aren't hunting for elegance.

531
00:22:28,920 --> 00:22:30,920
They want defensible control.

532
00:22:30,920 --> 00:22:31,920
Who owns this automation?

533
00:22:31,920 --> 00:22:32,920
What data does it touch?

534
00:22:32,920 --> 00:22:34,920
What identity executes it?

535
00:22:34,920 --> 00:22:36,920
What prevents it from doing the wrong thing?

536
00:22:36,920 --> 00:22:39,920
And how do you stop it quickly during an incident?

537
00:22:39,920 --> 00:22:41,920
Shadow Automation fails every one of those questions.
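Those five audit questions can be treated as required fields on an inventory record. The sketch below assumes a hypothetical `AutomationRecord` shape; the field names are illustrative and do not correspond to any Power Platform export or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AutomationRecord:
    """Hypothetical inventory entry; field names are illustrative only."""
    name: str
    owner: Optional[str] = None
    data_touched: Optional[str] = None
    executing_identity: Optional[str] = None
    guardrail: Optional[str] = None
    stop_procedure: Optional[str] = None


# Field name -> the audit question it answers.
AUDIT_QUESTIONS = {
    "owner": "Who owns this automation?",
    "data_touched": "What data does it touch?",
    "executing_identity": "What identity executes it?",
    "guardrail": "What prevents it from doing the wrong thing?",
    "stop_procedure": "How do you stop it quickly during an incident?",
}


def unanswered(record: AutomationRecord) -> List[str]:
    """Return the audit questions this record cannot answer."""
    return [
        question
        for field_name, question in AUDIT_QUESTIONS.items()
        if not getattr(record, field_name)
    ]
```

A record with only a name returns all five questions, which is the shadow automation case described here.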

538
00:22:41,920 --> 00:22:43,920
No owner means no escalation path.

539
00:22:43,920 --> 00:22:46,920
No escalation path means incidents become group therapy.

540
00:22:46,920 --> 00:22:47,920
Or worse,

541
00:22:47,920 --> 00:22:50,920
a Teams thread where everyone says, "Not us."

542
00:22:50,920 --> 00:22:53,920
Until the business impact forces someone to take ownership temporarily,

543
00:22:53,920 --> 00:22:56,920
which is the enterprise version of hot potato governance.

544
00:22:56,920 --> 00:23:00,920
It also creates duplicate actions in a way that feels like a platform bug,

545
00:23:00,920 --> 00:23:01,920
but isn't.

546
00:23:01,920 --> 00:23:03,920
Two flows solve the same business problem,

547
00:23:03,920 --> 00:23:05,920
because nobody had visibility into the first one.

548
00:23:05,920 --> 00:23:09,920
An old flow keeps sending notifications after a process changed,

549
00:23:09,920 --> 00:23:11,920
so a new flow gets built to correct the message,

550
00:23:11,920 --> 00:23:13,920
and now customers get two emails.

551
00:23:13,920 --> 00:23:15,920
A record is created by one automation,

552
00:23:15,920 --> 00:23:17,920
updated by another,

553
00:23:17,920 --> 00:23:20,920
and then corrected by a third because the schema changed.

554
00:23:20,920 --> 00:23:21,920
None of them are malicious.

555
00:23:21,920 --> 00:23:22,920
They're just uncoordinated.

556
00:23:22,920 --> 00:23:26,920
And uncoordinated execution is indistinguishable from failure.

557
00:23:26,920 --> 00:23:28,920
Security teams experience shadow automation

558
00:23:28,920 --> 00:23:30,920
as a different kind of nightmare.

559
00:23:30,920 --> 00:23:32,920
You can't do least privilege retroactively

560
00:23:32,920 --> 00:23:35,920
when you don't know what the automation was meant to do.

561
00:23:35,920 --> 00:23:39,920
You can't rotate credentials cleanly when the connection lives in a personal context.

562
00:23:39,920 --> 00:23:42,920
You can't prove data residency controls if you can't even inventory

563
00:23:42,920 --> 00:23:44,920
which connectors are in play.

564
00:23:44,920 --> 00:23:46,920
So the punchline lands hard, and it's true.

565
00:23:46,920 --> 00:23:48,920
Shadow automation isn't hidden. It's just unowned.

566
00:23:48,920 --> 00:23:52,920
And because it's unowned, it becomes an authority surface no one governs.

567
00:23:52,920 --> 00:23:54,920
Over time, those surfaces accumulate,

568
00:23:54,920 --> 00:23:58,920
more connections, more permissions, more exceptions, more "just make it work" fixes.

569
00:23:58,920 --> 00:24:01,920
That's how low code turns into distributed authority sprawl.

570
00:24:01,920 --> 00:24:04,920
The immediate instinct is to solve this with cleanup,

571
00:24:04,920 --> 00:24:08,920
inventories, reports, deletion campaigns, naming conventions,

572
00:24:08,920 --> 00:24:10,920
maybe a center of excellence email that says,

573
00:24:10,920 --> 00:24:12,920
"Please assign an owner." Those help.

574
00:24:12,920 --> 00:24:16,920
But they aren't structural. The structural fix is to make ownership and authority explicit

575
00:24:16,920 --> 00:24:19,920
in the architecture, not implied by who clicked create,

576
00:24:19,920 --> 00:24:22,920
which means the next step isn't another policy document.

577
00:24:22,920 --> 00:24:27,920
It's a mental model that forces the right separation, intent, decision, execution.

578
00:24:27,920 --> 00:24:30,920
So ownership and authority attach to the execution layer,

579
00:24:30,920 --> 00:24:32,920
where the side effects actually happen.

580
00:24:32,920 --> 00:24:35,920
The executive mental model: intent, decision, execution.

581
00:24:35,920 --> 00:24:41,920
So here's the model that makes governance stop sounding like opinion: intent, decision, execution.

582
00:24:41,920 --> 00:24:45,920
Not as a diagram to impress a steering committee, but as a way to stop automation

583
00:24:45,920 --> 00:24:47,920
from turning into conditional chaos.

584
00:24:47,920 --> 00:24:51,920
Intent is the business contract. It's what leadership believes the automation is for,

585
00:24:51,920 --> 00:24:53,920
and more importantly, what it must never do.

586
00:24:53,920 --> 00:24:57,920
This is where risk appetite lives in plain language: which data can move,

587
00:24:57,920 --> 00:25:00,920
where it can move, which approvals can be automated,

588
00:25:00,920 --> 00:25:03,920
which approvals must not be automated, what good looks like,

589
00:25:03,920 --> 00:25:07,920
what unsafe looks like, what the kill switch is, and who's allowed to pull it.

590
00:25:07,920 --> 00:25:10,920
Intent also defines the boundary of responsibility.

591
00:25:10,920 --> 00:25:14,920
If the automation touches HR records, that isn't a workflow.

592
00:25:14,920 --> 00:25:16,920
That's regulated behavior with legal consequences.

593
00:25:16,920 --> 00:25:20,920
If it can create a vendor, approve a payment or change access, that's not productivity.

594
00:25:20,920 --> 00:25:22,920
That's control.

595
00:25:22,920 --> 00:25:26,920
And if leadership can't state the intent as a contract, then the platform will invent it

596
00:25:26,920 --> 00:25:29,920
from whatever configuration the builder happened to choose.

597
00:25:29,920 --> 00:25:30,920
You already know how that ends.

598
00:25:30,920 --> 00:25:32,920
Decision is the interpretation layer.

599
00:25:32,920 --> 00:25:35,920
This is where ambiguity belongs: classification, routing, triage.

600
00:25:35,920 --> 00:25:37,920
Is this request valid?

601
00:25:37,920 --> 00:25:38,920
Which team owns it?

602
00:25:38,920 --> 00:25:39,920
Is this high risk?

603
00:25:39,920 --> 00:25:41,920
Is this a duplicate?

604
00:25:41,920 --> 00:25:44,920
And yes, this is where AI reasoning fits because reasoning is probabilistic.

605
00:25:44,920 --> 00:25:49,920
It makes judgments, it uses patterns, it may be wrong, it may change its mind when the model changes.

606
00:25:49,920 --> 00:25:52,920
That is not a bug, that is the nature of decisioning.

607
00:25:52,920 --> 00:25:53,920
The critical rule is simple.

608
00:25:53,920 --> 00:25:55,920
Decisions can be uncertain.

609
00:25:55,920 --> 00:25:56,920
Execution cannot.

610
00:25:56,920 --> 00:25:58,920
Execution is the side effect layer.

611
00:25:58,920 --> 00:26:04,920
Writes, deletes, sends, approves, creates, grants access, moves files, opens tickets, closes tickets,

612
00:26:04,920 --> 00:26:07,920
everything that changes the state of another system.

613
00:26:07,920 --> 00:26:12,920
And because execution creates irreversible consequences, it must behave deterministically.

614
00:26:12,920 --> 00:26:18,920
Bounded retries, idempotency, audit trail, and compensation when something partially succeeds.

615
00:26:18,920 --> 00:26:23,920
If you can't explain what it did and you can't prove it did it once, you don't have automation,

616
00:26:23,920 --> 00:26:25,920
you have a liability with a green check mark.
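A minimal sketch of what "did it once, with proof" could look like for a single execution unit. Everything here is an assumption made for illustration: `ExecutionUnit`, `TransientError`, and the in-memory key store are hypothetical, a real unit would persist keys durably and add backoff between attempts, and compensation for partial success is not shown.

```python
from typing import Callable, Dict, List


class TransientError(Exception):
    """Stand-in for a retryable failure such as throttling."""


class ExecutionUnit:
    """Runs one side effect with bounded retries, an idempotency key, and an audit log."""

    def __init__(self, action: Callable[[dict], str], max_attempts: int = 3) -> None:
        self.action = action
        self.max_attempts = max_attempts
        self.completed: Dict[str, str] = {}  # idempotency key -> recorded result
        self.audit: List[str] = []

    def run(self, idempotency_key: str, payload: dict) -> str:
        # A key that already succeeded is never executed again.
        if idempotency_key in self.completed:
            self.audit.append(f"{idempotency_key}: skipped, already done")
            return self.completed[idempotency_key]
        # Retries are bounded: the loop cannot exceed max_attempts.
        for attempt in range(1, self.max_attempts + 1):
            try:
                result = self.action(payload)
            except TransientError as exc:
                self.audit.append(f"{idempotency_key}: attempt {attempt} failed ({exc})")
                continue
            self.completed[idempotency_key] = result
            self.audit.append(f"{idempotency_key}: attempt {attempt} succeeded")
            return result
        raise RuntimeError(f"{idempotency_key}: gave up after {self.max_attempts} attempts")
```

The audit list is the "proof" half: every attempt, success, and skip leaves one line that can be read back after the fact.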

617
00:26:25,920 --> 00:26:28,920
This separation is why the earlier anti-patterns are not just bad design.

618
00:26:28,920 --> 00:26:30,920
They are category mistakes.

619
00:26:30,920 --> 00:26:35,920
Christmas tree flows collapse decision and execution into one tangled artifact.

620
00:26:35,920 --> 00:26:39,920
API exhaustion happens when execution has no budget, no ceilings and no guardrails.

621
00:26:39,920 --> 00:26:45,920
Shadow automation happens when execution is allowed to exist without explicit ownership and authority.

622
00:26:45,920 --> 00:26:50,920
Intent, decision, execution fixes all three because it creates enforceable seams.

623
00:26:50,920 --> 00:26:53,920
Intent becomes a review surface.

624
00:26:53,920 --> 00:26:59,920
Does this automation align to a business contract and is there a named owner accountable for the contract?

625
00:26:59,920 --> 00:27:05,920
Decision becomes an isolation surface. If the decision logic changes, it doesn't automatically change the execution behavior.

626
00:27:05,920 --> 00:27:09,920
It produces a decision output, an explicit classification or instruction,

627
00:27:09,920 --> 00:27:13,920
so the rest of the system can treat it as data, not as branching chaos.

628
00:27:13,920 --> 00:27:15,920
Execution becomes an enforcement surface.

629
00:27:15,920 --> 00:27:23,920
All side effects occur behind a deterministic gate that can be tested, audited, throttled, paused and replaced without rewriting the world.

630
00:27:23,920 --> 00:27:27,920
And here's the part executives usually appreciate once it's stated plainly.

631
00:27:27,920 --> 00:27:31,920
This is not more process. It's how you make automation survivable.

632
00:27:31,920 --> 00:27:37,920
Because over time, intent changes, decision logic evolves and connectors get updated, throttled, deprecated or replaced.

633
00:27:37,920 --> 00:27:42,920
If those changes directly mutate execution behavior, every change becomes a production risk.

634
00:27:42,920 --> 00:27:47,920
If execution is isolated and deterministic, change becomes local, explainable and reversible.

635
00:27:47,920 --> 00:27:52,920
Architecturally, this is the same separation every high scale system eventually rediscovers.

636
00:27:52,920 --> 00:27:58,920
Policy versus mechanism, planning versus actuation, orchestration versus execution.

637
00:27:58,920 --> 00:28:02,920
Power Automate doesn't exempt you from that. It just makes it easy to pretend you can skip it.

638
00:28:02,920 --> 00:28:07,920
You can't. So the working definition becomes: intent tells you what must happen and what must never happen.

639
00:28:07,920 --> 00:28:10,920
Decision tells you what you think is happening right now.

640
00:28:10,920 --> 00:28:14,920
Execution does the minimum set of side effects once with proof.

641
00:28:14,920 --> 00:28:20,920
That model also gives you an immediate governance payoff. You can assign different accountability to each layer.

642
00:28:20,920 --> 00:28:25,920
Business leadership owns intent, product or operations owns decision criteria and routing outcomes.

643
00:28:25,920 --> 00:28:30,920
Platform and engineering owns deterministic execution units, capacity budgets and audit defensibility.

644
00:28:30,920 --> 00:28:32,920
No more shared guilt, real ownership.

645
00:28:32,920 --> 00:28:36,920
Now that the model is clear, the patterns stop being stylistic preferences.

646
00:28:36,920 --> 00:28:39,920
They become the mechanics of enforcing separation at scale.

647
00:28:39,920 --> 00:28:44,920
Next comes the practical architecture that makes this separation visible in every run history.

648
00:28:44,920 --> 00:28:48,920
Direct path thinking, thin orchestration and execution units you can actually govern.

649
00:28:48,920 --> 00:28:57,920
Pattern: direct path architecture. Direct path is what happens when you stop treating a flow like a decision tree and start treating it like an execution pipeline.

650
00:28:57,920 --> 00:29:02,920
Most makers build wide because the canvas invites it. Every exception becomes a branch.

651
00:29:02,920 --> 00:29:08,920
Every branch becomes its own little world. And pretty soon the run history is eight different stories depending on which path you triggered.

652
00:29:08,920 --> 00:29:14,920
That's the Christmas tree problem. Direct path flips the shape. One endpoint, minimal nesting. Decisions expressed as data.

653
00:29:14,920 --> 00:29:20,920
Execution expressed as a linear sequence that can be read top to bottom without expanding containers like you're spelunking.

654
00:29:20,920 --> 00:29:26,920
The uncomfortable truth is this. Humans don't debug trees well. They debug narratives.

655
00:29:26,920 --> 00:29:32,920
A direct path flow reads like a narrative. Guard, classify, set context, execute units, terminate once.

656
00:29:32,920 --> 00:29:40,920
When it fails, it fails in a place you can point to. When it succeeds, you can explain why it succeeded without "well, it depends which branch you hit." That distinction matters.

657
00:29:40,920 --> 00:29:58,920
Direct path isn't less logic. It's logic moved out of branching and into explicit decision outputs. Instead of forking execution into ten different branches, the flow decides once, stores the decision as a small set of values, then runs the same execution sequence with conditional no-ops where appropriate.

658
00:29:58,920 --> 00:30:05,920
In other words, decision doesn't create endless execution forks. Here is the architectural rule that makes it work. Decision can be complex.

659
00:30:05,920 --> 00:30:07,920
Execution must be simple.

660
00:30:07,920 --> 00:30:21,920
So a direct path orchestration flow does a few things well. It validates inputs early. It normalizes data into a predictable shape. It produces a decision artifact: classification, route, risk tier, whatever your domain uses.

661
00:30:21,920 --> 00:30:25,920
And then it calls deterministic execution units that perform side effects.

662
00:30:25,920 --> 00:30:33,920
If a given execution unit doesn't apply, the flow doesn't branch into a new universe. It just doesn't call that unit. That's how you collapse the number of run shapes.
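A minimal sketch of "decide once, store the decision as data, run one linear sequence with conditional no-ops." The unit names, the `amount` threshold and the `Decision` fields are hypothetical illustrations, not anything prescribed by the episode or by Power Automate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Decision artifact: computed once and stored as data, not as branches."""
    classification: str
    route: str
    risk_tier: str
    actions: tuple  # names of the execution units that apply to this run


def decide(request: dict) -> Decision:
    # Decision can be complex; these rules are illustrative assumptions only.
    high_risk = request.get("amount", 0) > 10_000
    actions = ["create_ticket", "notify"]
    if high_risk:
        actions.append("request_approval")
    return Decision(
        classification="invoice",
        route="finance",
        risk_tier="high" if high_risk else "low",
        actions=tuple(actions),
    )


# One fixed, linear execution order: every run has the same shape.
PIPELINE = ("create_ticket", "request_approval", "notify")


def run(request: dict) -> list:
    decision = decide(request)
    log = []
    for unit in PIPELINE:
        if unit in decision.actions:
            log.append(f"ran {unit}")
        else:
            # Conditional no-op: the unit is skipped, no new branch is created.
            log.append(f"skipped {unit}")
    return log
```

Every call to `run` walks the same three steps in the same order, so two run histories differ only in which steps were skipped.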

663
00:30:33,920 --> 00:30:40,920
And the number of run shapes is directly correlated with mean time to explain. When you have one dominant run shape, the run history becomes readable.

664
00:30:40,920 --> 00:30:48,920
When the run history becomes readable, incident response becomes mechanical instead of interpretive. This is also why direct path isn't a developer preference.

665
00:30:48,920 --> 00:30:57,920
It's a governance enabler. A single endpoint gives you one place to enforce termination semantics. One place to stamp a correlation ID. One place to emit consistent logging.

666
00:30:57,920 --> 00:31:08,920
One place to apply a kill switch pattern. One place to measure latency and retries without doing forensic math across 20 branches. Executives like direct path for a reason they rarely articulate: it localizes impact.

667
00:31:08,920 --> 00:31:16,920
When a flow branches wildly, a small change can affect anything. When a flow stays on a direct path, a change typically affects one decision output or one execution unit.

668
00:31:16,920 --> 00:31:20,920
That's the difference between "deployments are scary" and "deployments are boring."

669
00:31:20,920 --> 00:31:32,920
And yes, there's a measurable impact. MTTE drops because you stop scrolling through nested scopes, trying to figure out which branch ran. Change failure rate drops because you're not duplicating actions across branches and forgetting to update one of them.

670
00:31:32,920 --> 00:31:42,920
And deployment time drops because people stop treating the flow like an artifact you don't touch unless you're desperate. Now, the common objection is, "But branching is how you model real business complexity."

671
00:31:42,920 --> 00:31:54,920
No. Branching is how you hide business complexity inside execution. Direct path still supports complexity. It just forces you to express it in one of two acceptable places: upstream in decisioning or downstream in isolated execution units.

672
00:31:54,920 --> 00:32:04,920
The main orchestration stays thin and readable because its job is not to be clever. Its job is to be explainable. This is where the diamond versus direct path idea is useful. But the lesson isn't the geometry.

673
00:32:04,920 --> 00:32:20,920
The lesson is that collapse is a design choice. You can build a flow that spreads and then reconverges, and that's already better than the Christmas tree. But the direct path goes further. It avoids the spread in the first place by turning different outcomes into different data, then using that data to call the right execution units.

674
00:32:20,920 --> 00:32:24,920
So the orchestration stays narrow and narrow is governable.

675
00:32:24,920 --> 00:32:38,920
One more subtle payoff. Direct path forces ownership boundaries to become explicit. When execution happens through child flows or execution units, you can assign owners to units, set API budgets per unit and control side effects with policy.

676
00:32:38,920 --> 00:32:52,920
You can't do that when everything is buried inside one mega-flow's branches. So if you want automation excellence to be a property of the estate, not a hope, direct path is one of the few patterns that makes the estate converge toward readability instead of drifting toward conditional chaos.

677
00:32:52,920 --> 00:33:02,920
But a pattern is only real if the organization can enforce it. And enforcing it requires the next discipline: design before build. So the path is intentional before the canvas tempts you into entropy.

678
00:33:02,920 --> 00:33:18,920
Design, then build: the minimum viable design discipline. Direct path sounds like a pattern problem. It isn't. It's a design problem that shows up as a pattern problem because people build before they decide what they're building. And the platform encourages that. Power Automate's canvas is optimized for momentum: trigger, action, condition, done.

679
00:33:18,920 --> 00:33:26,920
The UI rewards movement. It doesn't reward intent capture. So teams treat design like paperwork, something you do after the flow exists, when you already know it works. That is backwards.

680
00:33:26,920 --> 00:33:34,920
In an automation control plane, design isn't documentation. Design is the act of declaring boundaries before the system turns your defaults into architecture.

681
00:33:34,920 --> 00:33:41,920
If you don't declare boundaries, the platform will. And it will choose convenience every time. So the minimum viable discipline isn't a 40-page document that nobody reads.

682
00:33:41,920 --> 00:33:51,920
It's a short design step that forces the uncomfortable questions early while the flow is still cheap to change. The simplest useful design artifact has four parts. First, scope.

683
00:33:51,920 --> 00:33:59,920
What problem is this automation solving and what problem is it not solving? This sounds obvious until you watch a flow absorb "just one more thing" for two years.

684
00:33:59,920 --> 00:34:05,920
A written scope is not bureaucracy. It's a stop sign that prevents Christmas tree growth from becoming culturally normal.

685
00:34:05,920 --> 00:34:19,920
Second, systems touched. Not "SharePoint and Teams." Be explicit. Which tenant services, which line-of-business systems, which external APIs, which data stores? Because every system you touch adds throttles, failure modes and audit obligations you don't get to ignore later.

686
00:34:19,920 --> 00:34:32,920
Third, data movement. What data crosses boundaries and which boundaries matter? Internal to the tenant, across environments, out to third parties, into email, into chat, into files. Whenever data moves, risk increases. That's not fear. That's physics.

687
00:34:32,920 --> 00:34:40,920
And if you can't explain data movement in one diagram, you don't understand what you're building. Fourth, the security conversation starter. Which identities execute the side effects?

688
00:34:40,920 --> 00:34:51,920
Service account, service principal, delegated user connection, managed identity where available? What roles are required, what DLP boundary applies, and what the kill switch is. Not "we can disable the flow."

689
00:34:51,920 --> 00:35:02,920
A real kill switch: a control you can activate without editing the flow under pressure. Those four parts take an hour, not a week. But they surface the true bottlenecks before build begins. This is where enablers matter.

690
00:35:02,920 --> 00:35:19,920
In personal automation, the maker owns everything. Their mailbox, their SharePoint access, their connectors, their licenses. In enterprise automation, the builder doesn't own the prerequisites. They need service accounts, premium licensing approvals, connector reviews, environment access, sometimes even firewall or conditional access alignment.

691
00:35:19,920 --> 00:35:27,920
If you don't identify those early, you don't build faster. You just hit a wall mid-build and start hard-coding to get unstuck. That's how production gets poisoned.

692
00:35:27,920 --> 00:35:35,920
Temporary connections become permanent, environment variables become hard-coded values, and "we'll fix it later" becomes the estate's architecture.

693
00:35:35,920 --> 00:35:46,920
Minimum viable design also forces non-functional requirements to exist as control surfaces, not as vague wishes. Latency, how fast does this need to complete, and what happens when it doesn't?

694
00:35:46,920 --> 00:35:51,920
Resiliency, what failures are acceptable, and what failures require a human.

695
00:35:51,920 --> 00:35:56,920
Audit, what evidence must exist after execution, and where is it recorded?

696
00:35:56,920 --> 00:36:04,920
Identity, what is the least authority required to perform side effects, and how is it reviewed? Those aren't nice to have. They're what make deterministic execution possible.

697
00:36:04,920 --> 00:36:14,920
And yes, this is where you deliberately separate scaffolding from production. The scaffolding flow exists to discover unknowns. It can be verbose, inefficient, instrumented for visibility.

698
00:36:14,920 --> 00:36:21,920
But it must still sit on the right foundation, correct environment, correct identity model, correct connection strategy, correct DLP boundary,

699
00:36:21,920 --> 00:36:27,920
otherwise the learning you do becomes unusable, because the moment you migrate to production controls, the flow behaves differently.

700
00:36:27,920 --> 00:36:33,920
That is the hidden cost of skipping design: you validate the wrong thing. Design then build is not about slowing makers down.

701
00:36:33,920 --> 00:36:42,920
It's about preventing the platform from converting prototype shortcuts into permanent control plane debt, because without design, the organization doesn't actually move faster.

702
00:36:42,920 --> 00:36:58,920
It moves earlier. It ships sooner, then pays later. During outages, during audits, during mergers, during connector deprecations, during identity hardening, during the moment someone realizes the flow that runs payroll notifications is authenticated with an employee's personal account.

703
00:36:58,920 --> 00:37:06,920
So the rule is simple: if you can't describe the automation before you build it (scope, systems, data movement, identity), you're not building an automation.
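One way to make that rule checkable is to capture the four-part design artifact as a small record and refuse to start the build while any part is blank. This is only a sketch: the `DesignRecord` class and its field names are hypothetical and simply mirror the four parts described above.

```python
from dataclasses import dataclass, field


@dataclass
class DesignRecord:
    """Hypothetical minimum viable design artifact (four parts)."""
    scope: str = ""                 # what the automation solves
    out_of_scope: str = ""          # what it explicitly does not solve
    systems_touched: list = field(default_factory=list)
    data_movement: list = field(default_factory=list)  # boundaries crossed
    executing_identity: str = ""    # which identity performs side effects
    kill_switch: str = ""           # control usable without editing the flow

    def missing(self) -> list:
        # Any empty string or empty list counts as "not yet described".
        return [name for name, value in vars(self).items() if not value]


def ready_to_build(record: DesignRecord) -> bool:
    return not record.missing()
```

A record with only `scope` filled in reports the remaining five fields as missing, which is the list of uncomfortable questions still to be answered.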

704
00:37:06,920 --> 00:37:20,920
And once you accept that, the next step becomes unavoidable. Every side effect needs a deterministic gate with transaction semantics, not because engineers like ceremony, because execution creates consequences, and consequences require design.

705
00:37:20,920 --> 00:37:27,920
TDD redefined: transaction-driven design as mandate. This is where the conversation stops being philosophical and becomes enforceable.

706
00:37:27,920 --> 00:37:41,920
TDD in this episode does not mean test-driven development. Nobody is asking citizen developers to write unit tests in a drag-and-drop canvas and pretend that's the same thing. TDD here means transaction-driven design. And it's a mandate, because enterprise automation is not logic.

707
00:37:41,920 --> 00:37:45,920
It's transactions across systems you don't control, with side effects you can't un-send.

708
00:37:45,920 --> 00:37:58,920
A transaction in plain terms is any automation step that changes the state of the world in a way that matters. Create a record, update a record, delete a record, send an email, post a message, create a ticket, approve something, change access, move a file.

709
00:37:58,920 --> 00:38:04,920
If it produces an external consequence, it's a transaction. Everything else is preparation.

710
00:38:04,920 --> 00:38:22,920
Here's the uncomfortable truth: the platform makes it trivial to create side effects, but it does not guarantee transaction semantics. It will happily run twice. It will happily retry. It will happily partially succeed. And it will happily leave you with a run history that says "failed" after it already sent the email and created the record. That distinction matters.

711
00:38:22,920 --> 00:38:33,920
So transaction-driven design starts with a simple rule. Every side effect must be treated as a transaction, and every transaction must have three properties: idempotency, an audit trail, and a rollback or compensation plan.

712
00:38:33,920 --> 00:38:47,920
Idempotency means that if the same transaction runs twice, the outcome is still correct. Not mostly correct. No duplicate tickets, no double approvals, no duplicate emails that trigger customer escalations, no double writes that corrupt downstream reporting.

713
00:38:47,920 --> 00:38:53,920
If your automation can't be safely rerun, you don't have reliability, you have a one-shot script pretending to be infrastructure.

714
00:38:53,920 --> 00:39:07,920
Audit trail means the automation leaves evidence that a human can use to explain what happened later. Not just "the flow succeeded." Evidence: correlation IDs, transaction IDs, the target record identifiers, and the decision output that led to the execution.

715
00:39:07,920 --> 00:39:20,920
Because when incidents happen, you won't be debugging code, you'll be defending actions. Rollback or compensation means when the transaction partially succeeds, you have a defined behavior to restore correctness. Some systems support true rollback, most don't.

716
00:39:20,920 --> 00:39:31,920
So in practice, compensation is the pattern. Create a reversing action, mark a record for remediation, reopen a ticket, post a corrective message, or move the item into a quarantine queue with human review.
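A minimal sketch of compensation, assuming each step is supplied together with its own reversing action. The `run_with_compensation` helper and the step names are hypothetical; the approach resembles what is often called a saga, and it bounds harm by undoing completed steps in reverse order rather than providing true rollback.

```python
def run_with_compensation(steps):
    """Run (name, action, compensate) steps; undo completed ones on failure."""
    done = []
    try:
        for name, action, compensate in steps:
            action()
            done.append((name, compensate))
    except Exception as exc:
        undone = []
        # Reverse order: the most recent side effect is compensated first.
        for name, compensate in reversed(done):
            compensate()
            undone.append(name)
        return {"status": "compensated", "error": str(exc), "undone": undone}
    return {"status": "succeeded", "error": None, "undone": []}


# Usage: the second step fails, so the first one is reversed.
records = []


def failing_email():
    raise RuntimeError("connector timed out")


result = run_with_compensation([
    ("create_record", lambda: records.append("R-1"), lambda: records.remove("R-1")),
    ("send_email", failing_email, lambda: None),
])
```

In a real estate the reversing action might instead mark a record for remediation or move the item into a quarantine queue; the structure of the helper stays the same.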

717
00:39:31,920 --> 00:39:42,920
The point isn't perfection, the point is bounded harm. This is why this is executive grade. A duplicate email is annoying. A duplicate approval is a control failure. A duplicate record in a financial system is audit exposure.

718
00:39:42,920 --> 00:39:56,920
A "we don't know if it ran twice" answer is not operational uncertainty. It's governance failure. Transaction-driven design makes that failure impossible to normalize because it forces every builder to declare where side effects happen and how they're contained.

719
00:39:56,920 --> 00:40:07,920
Now connect this back to the model. Intent, decision, execution. Transaction-driven design is the execution layer discipline. It's the thing that keeps probabilistic decisioning from bleeding into irreversible actions.

720
00:40:07,920 --> 00:40:19,920
So, instead of treating "send email" as just another action, you treat it as a side effect with semantics. You ask: What is the unique key for this event? What proves we sent it? What stops a retry from sending it again?

721
00:40:19,920 --> 00:40:27,920
Where is the evidence stored? What happens if the connector times out after sending, but before returning success? Because yes, that happens and the platform will not solve it for you.

722
00:40:27,920 --> 00:40:46,920
This is also where execution units come from. An execution unit is the smallest chunk of automation that performs one business-relevant set of side effects, with an explicit input contract and explicit output evidence. Think of it as a function with receipts. It takes a payload plus a transaction identity, performs the side effect once and returns a result you can log and reason about.

723
00:40:46,920 --> 00:40:56,920
Orchestration should call execution units. It should not perform side effects directly because orchestration is where variability lives, routing, branching, exception decisions, retries and timing.

724
00:40:56,920 --> 00:41:05,920
Execution units are where determinism lives. This is the design move that changes everything without hype. You stop building flows as process diagrams. You build them as a set of transactions with proofs.

725
00:41:05,920 --> 00:41:24,920
And when you do that, you gain a control surface executives actually understand. You can measure how many transactions exist, who owns them, what data they touch, what their API budgets are and whether they can be safely rerun. That's not developer elegance. That's governance you can audit. Deterministic scaffolding: prototype without poisoning production. Most organizations skip a phase.

726
00:41:24,920 --> 00:41:36,920
Then they act surprised when the prototype becomes production. The missing phase is deterministic scaffolding: a working proof that deliberately trades efficiency for visibility while refusing to compromise the foundation. That distinction matters.

727
00:41:36,920 --> 00:41:51,920
A scaffolding flow is not an MVP. An MVP ships and accumulates obligations. Scaffolding exists to answer unknowns: connector behavior, data shape, latency variance, throttling patterns and the one error condition nobody documented because it only happens on Tuesdays.

728
00:41:51,920 --> 00:42:08,920
Scaffolding has one job: make the unknowns obvious before they become outages. But scaffolding fails when teams treat it as a shortcut around governance. They build it in the default environment. They authenticate with personal connections. They hard-code URLs and IDs "for now." They skip DLP conversations because "it's just a prototype."

729
00:42:08,920 --> 00:42:25,920
Then when it works, the organization congratulates itself and asks to "just put it in prod." And now you're trapped, because the foundation is wrong. The prototype validated a configuration that production will not allow. Or worse, production will allow it because nobody enforced the boundary, and the prototype's bad assumptions become permanent.

730
00:42:25,920 --> 00:42:31,920
So deterministic scaffolding has a non-negotiable rule. Foundation first, changeable layers above.

731
00:42:31,920 --> 00:42:42,920
The foundation is the control plane: environment strategy, identity model, connector boundaries, connection references, environment variables and the minimum logging surfaces you need to prove what happened.

732
00:42:42,920 --> 00:42:48,920
Those choices must be production-aligned even in scaffolding because changing them later is expensive and destabilizing.

733
00:42:48,920 --> 00:42:56,920
It's like building a house on sand because it was faster than pouring concrete. Everything on top of that can be ugly. In scaffolding, efficiency is deferred by design.

734
00:42:56,920 --> 00:43:03,920
You accept redundant actions, you accept extra logging, you accept verbose run history, you accept waste that makes behavior legible.

735
00:43:03,920 --> 00:43:07,920
Because early in a build, the expensive thing isn't API calls, it's uncertainty.

736
00:43:07,920 --> 00:43:13,920
Teams love optimizing early because it feels like competence, but early optimization is how you hide the very signals you need.

737
00:43:13,920 --> 00:43:23,920
Which step is slow? Which connector times out? What does the actual payload look like? How often does the trigger fire? Which errors are transient? Which are deterministic? And which are just your own logic failing?

738
00:43:23,920 --> 00:43:28,920
Visibility beats elegance during discovery, so the scaffolding flow should be opinionated about observability.

739
00:43:28,920 --> 00:43:35,920
It should stamp a correlation ID immediately and propagate it through every call, every child flow, every record write, every ticket, every message.

740
00:43:35,920 --> 00:43:42,920
Not as decoration, as the thread you'll pull during incident reviews. It should log decision outputs as data, not as branching logic.

741
00:43:42,920 --> 00:43:48,920
If there is routing or classification, capture it explicitly so you can compare "what we decided" to "what we executed."

742
00:43:48,920 --> 00:44:00,920
That becomes the evidence trail later when someone asks why a particular downstream system changed. It should validate inputs early and fail fast when inputs are invalid, rather than limping forward and generating partial side effects.

743
00:44:00,920 --> 00:44:07,920
A prototype that "kind of works" is how you get the worst production incidents: half-completed transactions with no compensation plan.

744
00:44:07,920 --> 00:44:13,920
And yes, scaffolding should plan for exception handling without implementing every last branch. This is where most makers do the opposite.

745
00:44:13,920 --> 00:44:19,920
They add exception branches early because they're nervous. The flow becomes a Christmas tree before the core path is even stable.

746
00:44:19,920 --> 00:44:26,920
Then they can't see what failed because the error is swallowed in a scope that handles it by sending an email to someone who never reads it.

747
00:44:26,920 --> 00:44:36,920
Deterministic scaffolding plans the exception model as structure. It reserves a place for it: a quarantine path, an escalation unit, a compensation unit, and a clear "terminate with reason" behavior.

748
00:44:36,920 --> 00:44:46,920
But it does not implement a dozen bespoke exceptions while the core execution shape is still changing. Because early exception handling usually encodes wrong assumptions. It treats symptoms instead of causes.

749
00:44:46,920 --> 00:44:56,920
So when does scaffolding become production grade? There's a pivot point and it's not when the business likes it. It's when the unknowns are resolved and the run shape stabilizes.

750
00:44:56,920 --> 00:45:06,920
So the inputs are predictable. Connector behavior is understood. Transaction boundaries are defined and the decision outputs are explicit. At that point you refactor. You remove visibility-only noise.

751
00:45:06,920 --> 00:45:15,920
You implement bounded retries as policy not habit. You replace broad queries with targeted filters. You convert repeated blocks into execution units. You enforce direct path shape intentionally.

752
00:45:15,920 --> 00:45:22,920
And you do it before users depend on it. Because once users depend on it, every refactor becomes politically expensive and operationally risky.

753
00:45:22,920 --> 00:45:32,920
When building hardens, the shortcuts become architecture. The incident gets scheduled for later. Deterministic scaffolding is how you keep that from happening. It preserves the one thing low-code destroys by default.

754
00:45:32,920 --> 00:45:36,920
The ability to learn without accidentally shipping your learning into production.

755
00:45:36,920 --> 00:45:37,920
Pattern.

756
00:45:37,920 --> 00:45:38,920
Orchestration thin.

757
00:45:38,920 --> 00:45:48,920
Execution thick. Deterministic scaffolding solves the "prototype became production" problem. But it doesn't solve the bigger problem. Most estates still put the wrong responsibilities in the wrong place.

758
00:45:48,920 --> 00:45:57,920
They build orchestration flows that are thick with side effects, connector quirks, retries and data mutations. Then they wonder why every change feels dangerous. This is the pattern that corrects that drift.

759
00:45:57,920 --> 00:46:04,920
Orchestration thin. Execution thick. Orchestration is not where work happens. Orchestration is where work gets authorized, shaped and routed.

760
00:46:04,920 --> 00:46:16,920
Execution is where side effects happen. Once. With proof. That distinction matters because it gives you an architecture you can govern. Not a canvas you can decorate. A thin orchestration flow has a tight job description.

761
00:46:16,920 --> 00:46:27,920
It does four things and it does them predictably. First, it triggers and identifies the transaction. It stamps a correlation ID immediately because without correlation you don't have observability, you have anecdotes.

762
00:46:27,920 --> 00:46:37,920
Second, it applies guardrails. Validate inputs. Enforce intent boundaries. Check eligibility. Short-circuit early when the contract isn't met. Not by terminating later.

763
00:46:37,920 --> 00:46:53,920
By preventing useless execution up front. Third, it does routing and produces a decision artifact. Not branching chaos. A small explicit output: route, risk tier, action list, classification, something you can log, audit and replay. Fourth, it delegates. It calls execution units that do the real work. And that's it.
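The four responsibilities can be sketched as one short function. Everything named here (`orchestrate`, the `UNITS` registry, the `customer_id` and `priority` fields) is a hypothetical illustration rather than a Power Automate API; the point is only that the orchestration identifies, guards, decides and delegates, and performs no side effects itself.

```python
import uuid


# Hypothetical execution units; each would own its own side effect and evidence.
def create_ticket(correlation_id: str, payload: dict) -> dict:
    return {"unit": "create_ticket", "correlation_id": correlation_id}


def notify(correlation_id: str, payload: dict) -> dict:
    return {"unit": "notify", "correlation_id": correlation_id}


UNITS = {"create_ticket": create_ticket, "notify": notify}


def orchestrate(event: dict) -> dict:
    # 1. Identify the transaction: stamp a correlation ID immediately.
    correlation_id = str(uuid.uuid4())

    # 2. Guardrails: short-circuit before any execution unit is called.
    if not event.get("customer_id"):
        return {"correlation_id": correlation_id, "status": "rejected",
                "decision": None, "receipts": []}

    # 3. Decision artifact: small, explicit and loggable.
    urgent = event.get("priority") == "high"
    decision = {
        "route": "support",
        "risk_tier": "high" if urgent else "low",
        "actions": ["create_ticket", "notify"] if urgent else ["notify"],
    }

    # 4. Delegate: call execution units and collect what they return.
    receipts = [UNITS[name](correlation_id, event) for name in decision["actions"]]
    return {"correlation_id": correlation_id, "status": "completed",
            "decision": decision, "receipts": receipts}
```

Because the decision is returned alongside the receipts, a single result object is enough to compare what was decided with what was executed.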

764
00:46:53,920 --> 00:47:16,920
If orchestration starts performing writes, approvals, file moves or ticket creation directly, it's no longer orchestration. It's a Christmas tree in training. Execution thick means the opposite. The execution units carry the weight. An execution unit owns the side effect. It owns idempotency. It owns bounded retries. It owns connector-specific failure behavior. It owns compensation. It produces evidence. It returns an outcome that can be logged and reasoned about.

765
00:47:16,920 --> 00:47:28,920
So when a connector changes behavior, you update the execution unit. When an API starts throttling, you tune the unit. When the business wants an additional side effect, you add a new unit or extend an existing one with a contract change.

766
00:47:28,920 --> 00:47:35,920
You don't rewrite the orchestration narrative. You don't multiply branches. You contain change where change belongs: at the side-effect boundary.

767
00:47:35,920 --> 00:47:45,920
This is also how you avoid the most common low-code trap: business logic mixed with connector mechanics. In a thick orchestration flow, you'll see things like: if customer is tier one, create a ticket.

768
00:47:45,920 --> 00:47:55,920
Then post to Teams. Then update Dataverse. But only if the SharePoint call returns 200, unless it times out, in which case retry three times, unless it's Friday, then email finance.

769
00:47:55,920 --> 00:48:07,920
That isn't business logic. That's connector behavior welded to policy. And over time it becomes impossible to tell which part of the flow exists because the business asked for it. And which part exists because the platform misbehaved once and someone patched around it.

770
00:48:07,920 --> 00:48:21,920
Execution thick forces that separation. Business logic expresses intent and decision outputs. Execution units express mechanics: how to reliably write, send, create or update under the constraints of real services. Now the best part: ownership becomes assignable.

771
00:48:21,920 --> 00:48:30,920
A thin orchestration flow can be owned by the process owner or platform team because it's mostly stable. It triggers, validates, routes and delegates.

772
00:48:30,920 --> 00:48:42,920
Execution units can be owned by the teams that actually understand the side effect domain. The Dataverse team owns "write customer record." The ITSM team owns "create incident." The messaging team owns "notify Teams channel."

773
00:48:42,920 --> 00:48:53,920
Security owns "grant access." Finance owns "create vendor with controls." That's not politics. That's accountability aligned to authority. And because execution units are isolated, you can bound blast radius.

774
00:48:53,920 --> 00:49:05,920
If the create ticket unit starts failing or getting throttled, you can disable or degrade that unit without taking down the entire orchestration. You can quarantine. You can fall back to manual. You can stop side effects without losing intake visibility.

775
00:49:05,920 --> 00:49:18,920
This pattern also quietly sets you up for the agentic future everyone is rushing into without the discipline to survive it. Decision agents belong upstream. They classify, triage, recommend. They live in the decision layer where probabilistic behavior is acceptable.

776
00:49:18,920 --> 00:49:31,920
But the moment an agent crosses into execution (writes, deletes, approvals), everything must snap back to deterministic controls. Execution units become the airlock. Structured inputs, strict permissions, bounded retries and evidence trails.

777
00:49:31,920 --> 00:49:41,920
So when people say "we want AI to handle more," the correct architectural response is: fine. But it will only talk to execution through deterministic units that you can audit, throttle and kill.

778
00:49:41,920 --> 00:49:54,920
Orchestration thin, execution thick is how you keep autonomy from becoming conditional chaos. It's not a stylistic preference. It's entropy management. And once it's in place, child flows stop being a convenience feature. They become the execution architecture.

779
00:49:54,920 --> 00:50:07,920
Pattern: child flows as execution units. Child flows are where most Power Automate estates either become governable or become a distributed mess with nicer diagrams. Used correctly, child flows are not "reusable chunks."

780
00:50:07,920 --> 00:50:27,920
That's the beginner framing. Architecturally, they are execution units: the deterministic boundary where side effects happen with contracts, controls and receipts. That distinction matters, because orchestration thin, execution thick only works if execution is actually separable. Child flows give you the seam. Here's the simple version.

781
00:50:27,920 --> 00:50:42,920
A child flow should behave like a function. Clear inputs, clear outputs, predictable behavior, no hidden dependencies. No "it reads a variable that happens to exist upstream." No "it grabs context from whatever connector the parent used." Inputs in, side effects, evidence out.

782
00:50:42,920 --> 00:50:50,920
And once you treat child flows that way, three things become possible that were basically impossible inside a mega-flow. First, reuse without duplication.

783
00:50:50,920 --> 00:51:16,920
The Christmas tree anti-pattern duplicates actions across branches. "Create ticket" exists in four different places because each branch needed it slightly differently. That's how you get drift. One branch gets updated, the other doesn't. And now the system behaves differently depending on which path you hit. A child flow collapses that. There is one "create ITSM ticket" execution unit, one "write Dataverse record" unit, one "send notification" unit. If behavior changes, it changes once. The estate stops forking itself.

784
00:51:16,920 --> 00:51:31,920
Second, testing becomes real. Power Automate testing inside a giant flow is mostly theater because you can't isolate behavior. Everything is coupled. To test one side effect, you must trigger the entire process, hit the right branch and hope the dependencies behave.

785
00:51:31,920 --> 00:51:38,920
With child flows, you can test an execution unit directly with controlled inputs. You can rerun it. You can confirm idempotency.

786
00:51:38,920 --> 00:51:54,920
You can validate error handling. You can prove that a retry won't double-write. That's not pro-code purity. That's incident prevention. Third, governance becomes enforceable. This is the payoff executives and auditors actually care about. If all side effects live inside child flows, you can apply rules at the execution boundary.

787
00:51:54,920 --> 00:52:10,920
Who can own and edit execution units? Which connectors are allowed inside execution units? What identity is permitted to execute them? What logging must be produced? What is the retry policy? What is the compensation path? A child flow becomes a control surface. A place you can enforce intent through configuration.

788
00:52:10,920 --> 00:52:20,920
Because you've made the place where intent turns into action explicit. Now there's a trap here. And most teams fall into it. They turn child flows into micro-flows for everything. One child flow to parse JSON.

789
00:52:20,920 --> 00:52:35,920
One to set variables, one to format the string, one to check a condition. Fifty child flows later, nobody can trace execution and the run history becomes a set of nested invocations that reads like a dependency graph designed by a committee. That isn't modularity. That's sprawl with better naming.

790
00:52:35,920 --> 00:52:43,920
So the rule is blunt: child flows are for side effects. If the step doesn't change the state of another system, it probably doesn't deserve to be an execution unit.

791
00:52:43,920 --> 00:52:55,920
Keep orchestration thin, but don't fracture it into nothing. The orchestration still owns intent enforcement, decision output, correlation and delegation. Execution units own transactions. That also means child flows need contracts.

792
00:52:55,920 --> 00:53:08,920
Inputs should include the transaction identity: correlation ID, business key, whatever makes idempotency possible. If you can't dedupe, you can't safely retry, and if you can't safely retry, you'll eventually ship duplicate side effects under load.

793
00:53:08,920 --> 00:53:19,920
Outputs should include receipts: the record ID created, the ticket number, the message ID, the status code, the timestamp, the decision context used. Because "it succeeded" isn't evidence, it's a feeling.
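Editor's sketch, not from the episode: the input and output contract described above can be modeled in a few lines of Python. All names here (`Request`, `Receipt`, `CreateTicketUnit`) are hypothetical, and the in-memory dictionary stands in for whatever durable store a real execution unit would check before writing.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class Request:
    correlation_id: str   # ties the call back to one orchestration run
    business_key: str     # identity of the business transaction (dedupe key)
    payload: dict


@dataclass(frozen=True)
class Receipt:
    correlation_id: str
    business_key: str
    record_id: str
    status_code: int
    timestamp: str
    replayed: bool        # True when the side effect had already been applied


class CreateTicketUnit:
    """Hypothetical execution unit: one side effect in, one receipt out."""

    def __init__(self) -> None:
        self._receipts: dict[str, Receipt] = {}  # stand-in for a durable store
        self._next_id = 1

    def execute(self, request: Request) -> Receipt:
        prior = self._receipts.get(request.business_key)
        if prior is not None:
            # Already applied: return the evidence, do not repeat the side effect.
            return replace(prior, correlation_id=request.correlation_id, replayed=True)

        record_id = f"TICKET-{self._next_id}"  # the simulated side effect
        self._next_id += 1
        receipt = Receipt(
            correlation_id=request.correlation_id,
            business_key=request.business_key,
            record_id=record_id,
            status_code=201,
            timestamp=datetime.now(timezone.utc).isoformat(),
            replayed=False,
        )
        self._receipts[request.business_key] = receipt
        return receipt
```

Calling `execute` twice with the same `business_key` returns the same `record_id`, which is what makes a retry safe in this sketch.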

794
00:53:19,920 --> 00:53:31,920
And yes, ownership gets cleaner. You can assign execution units to domain owners, you can version them, you can ring-fence updates, and you can run change control on the units that actually create blast radius instead of policing the entire canvas.

795
00:53:31,920 --> 00:53:45,920
This is how you convert Power Automate from a bunch of flows into an automation control plane: not by centralizing everything, but by standardizing the boundaries where authority turns into action. And when those boundaries exist, something else becomes possible: you can stop paying for noise at runtime.

796
00:53:45,920 --> 00:53:52,920
Because now you can terminate early, enforce trigger discipline and keep the orchestration path readable while execution stays deterministic. That's next.

797
00:53:52,920 --> 00:54:03,920
Pattern: terminate early and stop paying for noise. Most Power Automate estates don't have a performance problem, they have a willingness-to-execute problem. The platform will happily run your flow for every event you allow it to see.

798
00:54:03,920 --> 00:54:18,920
It will run it when the email is irrelevant, when the record change is cosmetic, when the attachment is missing, when the payload is empty, when the customer ID is null, when the request is obviously a duplicate, and when the status already equals the value you're about to set.

799
00:54:18,920 --> 00:54:30,920
Then it will give you a run history full of green checkmarks and polite terminations. And you will pay for all of it: capacity, API calls, connector quotas, log volume, and human attention during incident response. So this pattern is simple.

800
00:54:30,920 --> 00:54:37,920
Terminate early and stop paying for noise. But terminate early has two different meanings and most teams only implement the expensive one.

801
00:54:37,920 --> 00:54:55,920
The expensive version is: let the trigger fire, run 10 actions, then realize you didn't need to run, then terminate. That's not early. That's late with manners. The cheap version is: prevent the run from existing. Trigger conditions are the actual guardrail. They block the run before the engine spins up. They keep run history clean. They reduce capacity burn.

802
00:54:55,920 --> 00:55:09,920
They prevent noisy neighbor behavior inside your own tenant because the automation never joins the competition for shared resources. And this is where governance becomes real. If you don't enforce trigger discipline, you are accepting uncontrolled execution as your default. You are choosing entropy.

803
00:55:09,920 --> 00:55:18,920
Once a run exists, early termination still matters, but it's the second line of defense. This is where one-sided conditions earn their keep: not branching logic, escape hatches.

804
00:55:18,920 --> 00:55:29,920
If the input doesn't meet the contract, terminate with a reason code and stop. If the item is already processed, terminate and stop. If the decision layer classifies the request as manual review, terminate and hand off.

805
00:55:29,920 --> 00:55:41,920
If the downstream system is in a known outage state, terminate and quarantine rather than hammering it with retries. That's deterministic behavior. The flow exits on purpose, not by accident. This pattern also fixes a subtle audit and operations problem.
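Editor's sketch, not from the episode: the four escape hatches listed above can be expressed as one guard that returns a reason code, or nothing when the run should proceed. The field names and reason codes are assumptions chosen for illustration, and in Power Automate the first checks would ideally live in the trigger condition so the run never exists.

```python
from typing import Optional

REQUIRED_FIELDS = ("customer_id", "status")  # assumed input contract


def escape_reason(event: dict, processed_keys: set, downstream_healthy: bool) -> Optional[str]:
    """Return a reason code if the run should exit early, else None."""
    if any(event.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return "CONTRACT_VIOLATION"
    if event.get("business_key") in processed_keys:
        return "ALREADY_PROCESSED"
    if event.get("decision") == "manual_review":
        return "MANUAL_REVIEW_HANDOFF"
    if not downstream_healthy:
        return "DOWNSTREAM_OUTAGE_QUARANTINE"
    return None
```

Because every early exit carries a reason code, a run that did nothing is no longer ambiguous in the logs.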

806
00:55:41,920 --> 00:55:50,920
Ambiguous runs. When a flow runs and does nothing, it still leaves a trace. But that trace is confusing. Auditors and incident responders see a run and assume it mattered.

807
00:55:50,920 --> 00:56:02,920
Operations teams see hundreds of daily runs and can't distinguish meaningful execution from noise. So they either ignore the logs, which is what everyone does until the day they can't, or they waste time interpreting runs that should never have existed.

808
00:56:02,920 --> 00:56:11,920
Noise is not neutral, noise is cost and confusion. So the architectural rule is if a run will do nothing, it shouldn't run. That means pushing conditions as far upstream as possible.

809
00:56:11,920 --> 00:56:24,920
Filter at the trigger, filter at the connector query. Reject before loops, reject before child flows. And if you're calling execution units, don't call them just in case. Call them only when the decision artifact says they apply. Now about retries, because teams keep using retries as therapy.

810
00:56:24,920 --> 00:56:33,920
Retries are not a developer convenience. They're part of your reliability policy and they belong to the execution layer, not the orchestration layer. Orchestration should not retry side effects blindly.

811
00:56:33,920 --> 00:56:41,920
Orchestration doesn't know if the side effect already happened. It only knows it didn't get a clean success response. Retrying at orchestration is how you manufacture duplicates.

812
00:56:41,920 --> 00:56:53,920
Execution units can retry because they can implement idempotency. They can check whether the transaction already applied. They can bound retries by policy. They can back off. They can emit receipts. They can stop. That's the difference between resilience and churn.
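Editor's sketch, not from the episode: a bounded retry that checks whether the transaction already applied before each attempt, backs off between attempts, and stops at a policy limit. `already_applied` and `apply` are hypothetical callables supplied by the execution unit, and `TransientError` is an assumed exception type.

```python
import time


class TransientError(Exception):
    """Assumed marker for failures that are worth retrying."""


def run_with_bounded_retry(already_applied, apply, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry `apply` at most `max_attempts` times, never re-applying a landed side effect."""
    attempts = 0
    while attempts < max_attempts:
        if already_applied():
            # The side effect landed (possibly during a failed attempt): stop here.
            return {"status": "applied", "attempts": attempts}
        attempts += 1
        try:
            apply()
            return {"status": "applied", "attempts": attempts}
        except TransientError:
            if attempts < max_attempts:
                sleep(base_delay * 2 ** (attempts - 1))  # exponential backoff
    # Final check: the last attempt may have written before it raised.
    status = "applied" if already_applied() else "gave_up"
    return {"status": status, "attempts": attempts}
```

The idempotency check is what separates this from a blind retry: a call that wrote its record and then timed out is detected on the next pass instead of being repeated.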

813
00:56:53,920 --> 00:57:03,920
So the rule is blunt. If you can't prove an action is safe to retry, you're not retrying. You're doubling down. And when you stop paying for noise, the platform's behavior gets less mysterious.

814
00:57:03,920 --> 00:57:07,920
Latency becomes explainable because you're not competing with your own useless runs.

815
00:57:07,920 --> 00:57:15,920
Throttling becomes a meaningful signal because you're not flooding connectors with irrelevant calls. Run history becomes readable because most runs now represent actual work.

816
00:57:15,920 --> 00:57:28,920
Not a trigger that fired because someone edited a description field. Executives will like this pattern for the simplest reason. It produces immediate cost control without a re-platforming project. You reduce wasted execution, you reduce connector pressure, you reduce operational churn.

817
00:57:28,920 --> 00:57:36,920
And you get cleaner metrics because your dashboards stop measuring activity and start measuring outcomes. Architects will like it because it restores determinism.

818
00:57:36,920 --> 00:57:45,920
The run shape becomes consistent. The path becomes readable. The execution surface becomes bounded. And governance teams will like it because it creates enforceable standards that are easy to audit.

819
00:57:45,920 --> 00:57:54,920
Does this flow have trigger conditions? Does it have explicit escape conditions? Does it avoid null runs? Does it enforce bounded retries at the transaction boundary?

820
00:57:54,920 --> 00:57:57,920
That's how you turn terminate from an action into a design principle.

821
00:57:57,920 --> 00:58:06,920
Next, there's an even cheaper way to stop paying: eliminate actions that never needed to exist in the first place by doing work in expressions, not API calls.

822
00:58:06,920 --> 00:58:14,920
Pattern: expressions before actions. Expressions before actions is the point where a low-code estate either grows up or keeps paying an invisible tax forever.

823
00:58:14,920 --> 00:58:22,920
Because actions aren't steps. They're calls, they're latency, they're quota consumption, they're run history noise. And they're a governance surface you just expanded for no reason.

824
00:58:22,920 --> 00:58:33,920
The platform makes actions feel cheap because they're draggable. But the execution engine doesn't care about your canvas. It cares about how many operations you asked it to perform and how many external systems you pulled into the transaction.

825
00:58:33,920 --> 00:58:50,920
So the rule is simple. If the platform can compute it locally, don't outsource it to an action. That means use expressions to transform data, choose values and shape payloads before you start calling connectors, especially in loops, especially in orchestration, especially anywhere the flow could fan out under load. Here's what most people miss.

826
00:58:50,920 --> 00:59:05,920
A condition block is not just logic. It's a container. Containers create nesting, nesting hides behavior. Hidden behavior increases mean time to explain. And if the condition exists purely to set one value, you didn't need a container. You needed an expression. This is the common example.

827
00:59:05,920 --> 00:59:14,920
Two nearly identical email actions, one for case A and one for case B. Because the recipient or subject changes, that's not branching, that's duplication.

828
00:59:14,920 --> 00:59:29,920
One email action can handle both if you compute the To and Subject using an if expression. Now you have one place to change the template, one place to log evidence, and one place to enforce execution rules like "emails must include a correlation ID."
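Editor's sketch, not from the episode: the same idea in Python, where the recipient and subject are computed once and a single send step consumes them. The addresses and subject prefixes are made up for illustration.

```python
def build_email(case: str, correlation_id: str) -> dict:
    """Compute the differing values once; one send step handles both cases."""
    is_case_a = case == "A"
    to = "team-a@example.com" if is_case_a else "team-b@example.com"
    subject_prefix = "Approved" if is_case_a else "Needs review"
    return {
        "to": to,
        # Every email carries the correlation ID, enforced in exactly one place.
        "subject": f"{subject_prefix} [{correlation_id}]",
    }
```

With the branch reduced to two conditional values, the template and the correlation ID rule each live in one place.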

829
00:59:29,920 --> 00:59:45,920
Branching is how you create drift. Expressions are how you keep execution narrow. The same logic applies to variable spam. Power Automate makes it easy to set variables, so teams set variables for everything. Store the output, store the intermediate value, store a flag, store a string they could compute at the point of use.

830
00:59:45,920 --> 01:00:02,920
And every variable set is another action, another log entry, another opportunity for confusion, and another place for somebody to temporarily change behavior. Most of the time the data is already accessible. The platform's outputs are effectively global within the run. You don't need to copy a value into a variable to use it later.

831
01:00:02,920 --> 01:00:14,920
You just need to reference it. When you do use a variable, it should be because the value genuinely changes over time and you need that state, not because you wanted a label. That distinction matters because variables are where accidental complexity hides.

832
01:00:14,920 --> 01:00:23,920
The flow stops reading like an execution narrative and starts reading like a memory game. Then there's filtering, which is where estates light money on fire without even realizing it.

833
01:00:23,920 --> 01:00:38,920
If you see Get items and then Filter array right after, you're usually looking at waste. You pulled too much data, then you paid for more actions to throw most of it away, and then you looped over what's left anyway. Filter upstream. Use the connector query options where they exist.

834
01:00:38,920 --> 01:00:53,920
And when the connector can't filter the way you need, still prefer expressions that shape the data set once rather than action patterns that explode per item.

835
01:00:53,920 --> 01:01:00,920
Select, join, coalesce, equals, and substring operations don't need to be actions. In many cases they can be computed inline. This is where teams get nervous about readability, and that fear is valid.

836
01:01:00,920 --> 01:01:10,920
Expression-heavy flows can become cryptic. People paste a terrifying formula into a field and call it optimized. Then nobody understands it, and the next person adds another condition block to avoid touching it.

837
01:01:10,920 --> 01:01:14,920
So the governance rule here isn't "be clever," it's "standardize."

838
01:01:14,920 --> 01:01:23,920
If the organization is going to use expressions to reduce actions, then it also needs standards. Naming conventions for Compose outputs when you truly need them.

839
01:01:23,920 --> 01:01:31,920
A small set of approved expression patterns, and a bias towards simple, repeatable constructs over one-liners that require a decoder ring.

840
01:01:31,920 --> 01:01:39,920
In architectural terms, expressions before actions is not a micro-optimization, it's capacity control. Every unnecessary action is a small leak.

841
01:01:39,920 --> 01:01:46,920
A thousand small leaks is how tenants get noisy, run histories get unreadable, and throttling becomes weekly folklore.

842
01:01:46,920 --> 01:02:04,920
And it directly connects back to intent, decision, execution. Expressions belong in orchestration and decision shaping: normalize inputs, compute decision outputs, build payloads. Actions belong in execution units: side effects with receipts. When you keep that boundary clean, you get a flow that runs faster, costs less, and fails in fewer places.

843
01:02:04,920 --> 01:02:13,920
More importantly, you get a run history that tells the truth: what did we decide, and what did we execute? And that's the point. Optimization that destroys explainability isn't excellence, it's just faster entropy.

844
01:02:13,920 --> 01:02:32,920
Next, the problem that quietly amplifies all the others: nesting. Because once you can reduce actions, you also need to reduce depth, or you'll still be debugging like an archaeologist. Pattern: flatten nesting to restore explainability. Nesting is the tax you pay when you let the canvas think for you. Conditions inside loops, loops inside conditions,

845
01:02:32,920 --> 01:02:53,920
scopes inside switches, and then a try-catch scope wrapped around the whole thing because somebody got tired of seeing red failures in run history. It looks organized. It isn't. Nesting hides execution shape, and once execution shape is hidden you lose the only thing that matters in operations: the ability to explain what happened without interpreting a maze. That distinction matters.

846
01:02:53,920 --> 01:03:22,920
A deeply nested flow can still work, but it stops being explainable. And once it stops being explainable, mean time to explain rises, ownership evaporates, and every incident becomes a debate about which container did what. Flattening nesting is how you take that power back. The goal isn't to eliminate all containers. The goal is to make the run shape readable, predictable, stable, so a human can scan a run and understand: what did we decide, what did we do, and where did it fail? Here's the foundational misunderstanding: makers think nesting is structure. It's not.

847
01:03:22,920 --> 01:03:50,920
Nesting is opacity. So the first move is path collapse: pull common work out of branches. If both sides of a condition do the same four actions and only differ in one value, that condition doesn't deserve two branches with duplicated execution. Compute the differing value once, then run the common actions once. You just removed duplication, reduced drift, and collapsed two run shapes into one. Same idea with switches. If the switch is only choosing which field to write, the switch belongs in data shaping, not in execution.

848
01:03:50,920 --> 01:04:19,920
The switch should output an action plan or a payload shape. Execution should remain linear. The second move is to get conditions out of loops. A condition inside an Apply to each is not one condition, it's N conditions, multiplied by your item count. That's how API budgets die quietly: not from one big mistake, but from one small condition that got replicated a thousand times at runtime. So instead, split the data set first. Filter upstream using query options where you can. If you can't, use Select and Filter array once, outside the loop, to create two arrays:

849
01:04:19,920 --> 01:04:41,920
items that need work and items that don't. Then loop only what needs work, with no internal condition. This is the concurrent conditions idea in practice: organize first, execute second. Everything becomes easier: fewer actions per item, fewer container expansions during debugging, and a run history where pass and fail are separate narratives, not interleaved archaeology.
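Editor's sketch, not from the episode: the "organize first, execute second" move in Python. The data set is split once outside the loop, and the loop body runs only for items that need work. `needs_work` and `do_work` are hypothetical callables.

```python
def partition(items, needs_work):
    """Split the data set once, outside any loop, into (todo, skipped)."""
    todo, skipped = [], []
    for item in items:
        (todo if needs_work(item) else skipped).append(item)
    return todo, skipped


def process(items, needs_work, do_work):
    todo, skipped = partition(items, needs_work)
    # The loop contains no condition: every iteration is real work.
    results = [do_work(item) for item in todo]
    return results, skipped
```

The skipped list is kept rather than discarded so the run can still report what it chose not to touch.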

850
01:04:41,920 --> 01:05:10,920
The third move is to stop using scopes as emotional support. Scopes are useful for grouping a transaction boundary and for controlling run-after behavior. But when scopes become the default way to organize, you create hidden life cycles: actions that only exist inside a scope that only exists inside a branch that only runs in one case. Then somebody adds another Configure run after to handle an exception, and now the flow's real logic is encoded in edge-case execution semantics nobody can see without clicking through menus.

851
01:05:10,920 --> 01:05:23,920
That's not engineering, that's conditional chaos with extra steps. Flattening means use scopes sparingly, only where a transaction boundary exists. If a scope doesn't represent a transaction, it's probably just hiding complexity.

852
01:05:23,920 --> 01:05:38,920
The fourth move is to make termination visible and singular. Deep nesting often creates multiple endpoints: termination actions sprinkled across branches, scopes, and loops. That makes it impossible to answer a basic question during an incident: did this run complete cleanly, or did it exit early because it hit an escape condition?

853
01:05:38,920 --> 01:05:53,920
A flattened flow has one endpoint and one clear termination pattern. Early exits happen through explicit escape hatches that terminate with a reason, and the rest of the flow continues on a single dominant path. That gives you explainability back, and it aligns perfectly with direct path.

854
01:05:53,920 --> 01:05:56,920
One narrative, one shape, one place to look for what happened.

855
01:05:56,920 --> 01:06:23,920
Now the counterintuitive part: flattening doesn't necessarily mean fewer actions on the canvas. Sometimes it increases visible steps, because you're pulling things out of containers and making them explicit. But operationally it reduces complexity, because you've reduced depth, reduced run shape variation, and reduced the number of hidden behaviors that only show up at runtime. Executives don't care about nesting, they care about predictability. Architects care about nesting because nesting is how complexity becomes unbounded.

856
01:06:23,920 --> 01:06:45,920
So treat nesting as a budget. Every container you add must earn its existence by creating clarity, not by hiding mess. And when you enforce that, the estate becomes explainable again, which means governance can finally measure something real instead of counting flows and pretending that's control. Control plane KPI dashboard: metrics executives can actually use. Here's the part most governance

857
01:06:45,920 --> 01:07:04,920
programs skip. Then they act confused when governance becomes a vibe instead of a mechanism. If you can't measure the control plane, you can't steer it. You can only react to whatever broke last. So the KPI dashboard can't be a maker leaderboard. It can't be number of flows. It can't be how many runs. Activity is not excellence. Activity is entropy with a graph.

858
01:07:04,920 --> 01:07:14,920
The dashboard has one job: translate architecture into executive-safe signals. Start with the primary metric: mean time to explain, MTTE. Not mean time to resolve. That comes later.

859
01:07:14,920 --> 01:07:33,920
MTTE is how long it takes a competent person to answer one question during an incident: what did this automation do, and why? If MTTE is high, everything else is theater. You can't do reliable incident response, you can't do audit defense, and you can't do change control, because you can't predict impact. And MTTE isn't a platform limitation, it's a design outcome.

860
01:07:33,920 --> 01:08:01,920
Direct path lowers it. Flattened nesting lowers it. Thin orchestration lowers it. Execution units lower it. Often, mega flows raise it. So you measure MTTE as a health signal: median and 95th percentile, by environment and by business domain. You don't need perfect math, you need directional truth. Second, ownership coverage. What percentage of automations have a named business owner and a named technical owner, plus an escalation path? Not who created it. Owner means accountable for intent and accountable for execution.
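Editor's sketch, not from the episode: one way to compute the median and 95th-percentile MTTE per business domain. The input shape (domain, minutes to explain) is an assumption, and the percentile uses the simple nearest-rank method, in keeping with "directional truth" over perfect math.

```python
import math
from statistics import median


def p95(values):
    """95th percentile by the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def mtte_summary(incidents):
    """incidents: iterable of (domain, minutes_to_explain) pairs."""
    by_domain = {}
    for domain, minutes in incidents:
        by_domain.setdefault(domain, []).append(minutes)
    return {
        domain: {"median": median(values), "p95": p95(values)}
        for domain, values in by_domain.items()
    }
```

Reporting both numbers matters because a healthy median can hide the handful of incidents that took days to explain.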

861
01:08:01,920 --> 01:08:15,920
If the percentage isn't near total for anything running in production, you don't have an automation program, you have distributed risk with plausible deniability. And you can make this measurable fast: runs in the last 30 days versus has owner metadata.

862
01:08:15,920 --> 01:08:30,920
Anything that runs and has no owner is an entropy generator. It should be quarantined, reassigned, or killed. Third, deterministic gates coverage. This is the metric that proves you actually adopted intent, decision, execution instead of talking about it. What percentage of side effects occur inside execution units?

863
01:08:30,920 --> 01:08:52,920
In plain terms, how many writes, sends, creates, and approvals happen behind child flows or explicit execution boundaries, versus directly inside orchestration logic? This can be rough. You're not writing a dissertation, you're building a steering signal. Are we consolidating authority into governable seams, or are we still letting every maker embed side effects wherever the canvas made it convenient?

864
01:08:52,920 --> 01:09:06,920
Fourth, retry rate versus failure rate. These are not the same thing, and most dashboards treat them as one red number. Failure rate is how often the automation did not achieve its intended outcome. Retry rate is how often you had to reattempt execution to get a success signal.

865
01:09:06,920 --> 01:09:19,920
High failure with low retries means brittle dependencies or broken logic. Low failure with high retries means churn: you're burning capacity and API calls to compensate for instability, and you're probably generating duplicates unless idempotency exists.
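Editor's sketch, not from the episode: keeping the two rates separate is a small calculation once each run records whether it succeeded and how many attempts it took. The record fields (`succeeded`, `attempts`) are assumptions about what the log surface captures.

```python
def run_rates(runs):
    """runs: list of dicts with 'succeeded' (bool) and 'attempts' (int >= 1)."""
    total = len(runs)
    if total == 0:
        return {"failure_rate": 0.0, "retry_rate": 0.0}
    failed = sum(1 for run in runs if not run["succeeded"])
    retried = sum(1 for run in runs if run["attempts"] > 1)
    return {"failure_rate": failed / total, "retry_rate": retried / total}
```

A low failure rate next to a high retry rate is the churn signature described above: outcomes look fine while capacity is being spent to get them.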

866
01:09:19,920 --> 01:09:32,920
That distinction matters because retry policy is an executive decision. It's literally cost versus latency versus reliability. If teams set retries by habit, the organization is letting developers allocate capacity budgets without accountability.

867
01:09:32,920 --> 01:09:48,920
Fifth, API budget adherence. If you take nothing else from this episode, take this: APIs are shared capacity. They are a governance surface, they are a cost surface, they are a stability surface. So you measure call volume per automation, peak call rate, and budget breaches. You don't need to shame teams. You need to detect runaway execution pathways:

868
01:09:48,920 --> 01:10:00,920
loops that exploded, triggers that are too permissive, retries that turned into storms. And when a budget is breached, the response shouldn't be "increase the limit."

869
01:10:00,920 --> 01:10:15,920
The response should be architectural: fix trigger conditions, terminate early, flatten nesting, move work to expressions, or split execution into bounded units. Finally, explainability coverage. What percentage of production automations stamp a correlation ID

870
01:10:15,920 --> 01:10:43,920
and emit receipts you can tie back to a business transaction? Because audit isn't about logs, audit is about evidence. Executives don't need to see every run. They need confidence that when something goes wrong, the organization can explain it, prove it, and correct it without guessing. This dashboard is how you make governance real: not by adding more meetings, but by making the control plane measurable, so architecture stops being preference and starts being a managed system. And once you can measure it, you can enforce it with artifacts. That's next. Reference architecture plus before/after map.

871
01:10:43,920 --> 01:11:12,920
At this point the mental model is established: intent, decision, execution. Direct path. Thin orchestration, thick execution. Transaction semantics. Metrics that measure explainability, not activity. Now the only question that matters is whether this becomes a picture the organization can enforce. So here's the reference architecture, not as a vendor diagram, but as a control plane map you can hand to a platform team, an audit team, and a delivery team, and everyone can agree what good looks like. Start with the before map, because

872
01:11:12,920 --> 01:11:28,920
that's the one most tenants already have. One massive orchestration flow sits in the middle like a nervous system with no spine. It triggers on everything. It branches immediately. It mixes decision with execution, because the easiest place to decide is right before you act. Connections are shared and personal.

873
01:11:28,920 --> 01:11:57,920
Authority is implicit: whoever owns the connector owns the blast radius. Retries are scattered inline because "try again" feels like reliability. Logging is inconsistent because each branch did its own thing, and the run history looks like a choose-your-own-adventure novel written by five different people. That's not an automation estate, that's distributed side effects with a UI. Now the after map. It has three layers, and each layer has a different job. Layer one is the orchestration flow: one entry, one endpoint, minimal nesting. It stamps correlation immediately. It validates the contract.

874
01:11:57,920 --> 01:12:25,920
It produces a decision artifact: route, risk tier, action list, whatever you use to express intent as data. Then it delegates. It does not perform side effects directly. Layer two is decisioning, and this is where AI and classification can live without contaminating execution. If you use Copilot, a classifier, or an agent, it produces outputs. It does not write records. It does not send emails. It does not create tickets. It does not approve anything. It only answers the question: what should happen, and under which constraints? Layer three is execution:

875
01:12:25,920 --> 01:12:37,920
child flows as execution units. Deterministic, idempotent, receipt-producing. Each unit owns a side effect boundary and returns evidence: record IDs, ticket numbers, message IDs, and a status you can interpret.

876
01:12:37,920 --> 01:12:47,920
Retries live here because retries are only safe where idempotency exists. Compensation lives here because partial success is inevitable in distributed systems.

877
01:12:47,920 --> 01:13:15,920
If you want the simplest visual, the before map is a tangled ball. The after map is a straight line with a few callable blocks. That's path collapse, and path collapse is the real output of this episode, because once the path collapses, everything else becomes enforceable: ownership, budgets, audit trails, kill switches, and change control. Now add the control plane reality most people ignore: connector identity. In the before map, the connector is a convenience. In the after map, the connector is an authority boundary. Execution units should use controlled identities:

878
01:13:15,920 --> 01:13:30,920
service accounts, service principals, managed identities where available, and connection references that can be rotated and audited. Orchestration should not depend on a human's delegated access to production systems. That's not automation, that's a

879
01:13:30,920 --> 01:13:44,920
resignation letter waiting to happen. The architecture also needs zones, because one set of rules is how governance dies. A green zone exists for experimentation: personal productivity, prototypes, scaffolding, learning. It has guardrails, but it's optimized for speed, and it has an expiry model,

880
01:13:44,920 --> 01:14:08,920
because prototypes should die if they don't graduate. An enterprise zone exists for execution: production automations with side effects. That zone requires the patterns: direct path shape, thin orchestration, execution units, explicit identities, API budgets, and an ownership model that survives staff churn. This is where the key line belongs, because it stops governance from becoming performative.

881
01:14:08,920 --> 01:14:27,920
These artifacts aren't documentation, they are enforcement surfaces. A reference architecture is an enforcement surface because it defines allowed shapes. A before/after map is an enforcement surface because it makes drift visible. A control plane KPI dashboard is an enforcement surface because it makes outcomes measurable. Zone governance is an enforcement surface because

882
01:14:27,920 --> 01:14:36,920
it lets you say yes to experimentation without letting it become production by accident. And once you have this, you can stop arguing about best practices. You can just ask one question:

883
01:14:36,920 --> 01:14:58,920
does this automation conform to the reference architecture of the zone it's running in? If the answer is no, it doesn't ship. Not because someone is mean, but because the system will eventually extract payment, and it always collects with interest. Next, this gets real with a case: explainable automation beats smart automation every time. Case study: explainable automation beats smart automation.

884
01:14:58,920 --> 01:15:21,920
A global services organization built an intake automation to route requests and update downstream systems: ITSM tickets, a Dataverse tracker, and a couple of line-of-business APIs. It started as a reasonable goal: reduce manual triage and speed up fulfillment. Then Copilot arrived, so they did what everyone does. They embedded smart classification directly into the flow and let it drive execution.

885
01:15:21,920 --> 01:15:33,920
The orchestration flow triggered on every request, branched immediately into routing logic, then performed the writes inline: create ticket, update Dataverse, notify Teams, email the requester, and sometimes open an approval.

886
01:15:33,920 --> 01:15:44,920
Retries were sprinkled everywhere because it "improved reliability." Ownership was fuzzy because the flow touched multiple domains, so everyone assumed someone else owned it. It worked until it didn't.

887
01:15:44,920 --> 01:16:09,920
They started seeing intermittent latency spikes during peak hours. Sometimes the ticket was created but the Dataverse record failed. Sometimes the Dataverse record existed but the Teams message didn't. Sometimes the flow failed but still sent the customer email, so support got a follow-up for work that never actually happened. And when incidents hit, the run history was a forensic exercise: dozens of branches, nested scopes, and run-after behavior that encoded policy in places nobody remembered.

888
01:16:09,920 --> 01:16:23,920
MTTE wasn't an hour, it was multiple days, because nobody could answer the first question: what did this run actually do? So they didn't rewrite the automation, they changed the architecture. First, they separated decision from execution. The decision layer produced a classification

889
01:16:23,920 --> 01:16:36,920
output only: intent category, priority, required actions, and risk tier. That output was logged as data with a correlation ID. It did not write to downstream systems. It did not send notifications. It did not create tickets. It simply produced a decision

890
01:16:36,920 --> 01:16:50,920
artifact that could be reviewed later. Then they rebuilt the orchestration flow as thin: one path, one endpoint. Validate the input, stamp correlation, call decision, then delegate to execution units based on the decision artifact. No branching explosion, no inline side effects.

891
01:16:50,920 --> 01:17:01,920
Finally, they pushed every side effect into deterministic child flows: execution units with transaction identity, idempotency, receipts, and a bounded retry policy. Create ITSM ticket, return the ticket number.

892
01:17:01,920 --> 01:17:16,920
Write Dataverse record, return the record ID. Notify Teams, return the message ID. Each unit wrote evidence back to a single log surface keyed by correlation ID, so they could answer what happened without scrolling through a maze. The outcomes weren't magical, they were mechanical.

893
01:17:16,920 --> 01:17:24,920
Latency dropped materially because they removed redundant connector calls and stopped retry storms from competing with legitimate work.

894
01:17:24,920 --> 01:17:42,920
Failure rate dropped because side effects became idempotent and compensation paths existed for partial success. MTTE dropped from multi-day interpretation to under an hour because the run shape became stable and the evidence became searchable. And costs dropped because API churn stopped being treated as an acceptable tax for reliability.

895
01:17:42,920 --> 01:17:51,920
The proof point wasn't that the automation became smarter. It became explainable, and explainable systems are the only ones that survive audits, outages, and executive scrutiny.

896
01:17:51,920 --> 01:18:12,920
Conclusion: excellence comes from separating probabilistic decision from deterministic execution, then enforcing that boundary with architecture you can audit. In the next 30 days, mandate execution-only child flows for all side effects, set explicit API budgets with retry limits as policy, and run an automation review board that checks ownership, authority, and a real kill switch before anything ships.

897
01:18:12,920 --> 01:18:22,920
If you want the templates and patterns, connect with Mirko Peters on LinkedIn and leave a review on the podcast with what you want next: agents, CMK, tenant isolation, or CoE metrics.