This episode challenges the common belief that showback and chargeback alone create accountability in enterprise IT cost management. Many organizations implement showback dashboards or reports expecting they will change behavior, only to find that business units ignore, dispute, or game the numbers. The core message is that transparency without consequence is not accountability. Showback must be paired with governed cost allocation, service ownership, meaningful incentives, and integrated enterprise processes in order to influence decisions and deliver sustainable cost optimization.

The discussion starts by defining showback (reporting costs back to consumers) and contrasting it with chargeback (billing cost centers for usage). While showback can increase awareness, it often fails because it decouples information from decision authority. Without mechanisms that tie cost visibility to real organizational levers such as budgeting, approvals, quotas, and enforcement, users treat showback as a dashboard, not a driver. The episode explores why cost signals fail to influence behavior in cloud and Microsoft 365 environments, how tribal knowledge and shadow consumption undermine accountability, and why cost transparency must be embedded in operational workflows rather than siloed in finance reports.

Using examples from cloud governance, identity licensing, endpoint management, and collaboration sprawl, the hosts explain that effective accountability requires shared definitions of cost, common measurement frameworks, and governed access to resources. They discuss how automated enforcement, policy guardrails, and real consequences for exceeding budgets turn cost signals into actual decisions. The core takeaway is that metrics alone do not change behavior; accountability emerges only when cost visibility is coupled with control, ownership, and enforceable consequences across the organization.

Organizations often assume that cost showback — surfacing usage and spend back to IT consumers — will automatically drive accountability and optimization. In this episode, we explain why transparency without consequence fails, and how showback must be paired with governance, ownership, enforcement, and integrated operational processes to deliver real accountability and cost control.


What Is Showback (and Why It Feels Powerful)

Showback refers to:

  • Reporting costs back to consuming teams

  • Assigning visibility to who uses what

  • Creating dashboards that break down spend by business unit

It feels powerful because cost becomes transparent—but visibility is not the same as influence.

Showback provides:

  • Awareness

  • Attribution

  • Data for decisions

But it rarely changes behavior because:

  • There is no direct consequence

  • Ownership is informal

  • Budgets are disconnected

  • Cost signals compete with functional priorities


Showback vs Chargeback

Showback

  • Reports costs

  • Assigns visibility

  • Leaves decisions to teams

Chargeback

  • Allocates costs back to budgets

  • Forces accountability in financial terms

  • Ties consumption to expenditure

Chargeback introduces consequence.
Showback by itself usually does not.


The Transparency Trap

Many organizations stop at transparency and believe accountability will follow naturally. But this overlooks a critical truth:
People change behavior when consequences affect their goals, incentives, or risk profile.

Without that:

  • Teams ignore dashboards

  • Leaders argue over allocation methodology

  • Cost becomes a negotiation, not a decision

  • Metrics become noise


Why Showback Often Fails in Practice

Common failure modes include:

1. Tribal Definitions of Cost
Different teams measure differently:

  • Cloud usage vs reserved spend

  • Metered vs allocated costs

  • Consumption vs chargeable units

When definitions differ, showback becomes opinionated, not factual.

2. Shadow Consumption
Teams consume services outside governance visibility:

  • Dozens of Power Platform environments

  • Unmanaged Azure subscriptions

  • Teams and SharePoint sites spun up without oversight

No dashboard can correct what is not measured.

3. No Enforcement Path
Even with numbers:

  • Budget owners override limits

  • Approvers ignore alerts

  • Policies aren’t tied to approvals

  • Leaders treat warnings as suggestions

Visibility without enforcement is decoration, not governance.

4. Costs Are Not Trusted
When data is noisy or inconsistent:

  • Teams dispute numbers

  • Finance teams reclassify spending

  • Technical owners don’t trust the model

Trust is foundational. Without it, showback is dismissed.
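Several of these failure modes, tribal definitions and untrusted numbers in particular, shrink when a shared cost is split by one agreed rule that every team computes the same way. Below is a minimal sketch, assuming the agreed driver is each team's share of a usage metric (for example, GB ingested into a shared logging workspace). The function name, team names, and figures are hypothetical illustrations, not part of any billing product.

```python
def allocate_shared_cost(total: float, usage: dict[str, float]) -> dict[str, float]:
    """Split one shared bill across teams in proportion to an agreed usage driver.

    Every team calls the same function with the same inputs, so the
    allocation is a shared definition rather than a per-team opinion.
    """
    total_usage = sum(usage.values())
    if total_usage <= 0:
        # With no recorded usage the agreed rule cannot apply; surface that
        # explicitly instead of silently assigning the cost to nobody.
        raise ValueError("no usage recorded for the allocation driver")
    return {team: total * units / total_usage for team, units in usage.items()}


# Hypothetical example: a shared logging workspace split by ingested GB.
shares = allocate_shared_cost(1000.0, {"payments": 300.0, "search": 700.0})
print(shares)  # {'payments': 300.0, 'search': 700.0}
```

The choice of driver is the governance decision. The code only guarantees that, once agreed, the rule is applied identically everywhere.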


Accountability Requires More Than Visibility

Accountability only emerges when cost signals are tied to organizational levers:

✔ Clear ownership of services and costs
✔ Shared definitions and measurement frameworks
✔ Enforcement mechanisms (budgets, quotas, limits)
✔ Automated policies that block over-consumption
✔ Consequences for exceeding allocations

Metrics become meaningful only when they matter in decisions — not just reports.
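As an illustration of "consequences for exceeding allocations", here is a minimal sketch that maps a team's month-to-date spend against its allocation to a concrete action instead of a chart. The tiers, the 80% warning ratio, and the names `BudgetAction` and `evaluate_budget` are assumptions made for illustration, not the API of any cost-management tool.

```python
from enum import Enum


class BudgetAction(Enum):
    """Hypothetical consequence tiers; a real program defines its own."""
    ALLOW = "allow"
    WARN_OWNER = "warn_owner"              # notify the accountable owner
    REQUIRE_APPROVAL = "require_approval"  # new spend needs an approved exception


def evaluate_budget(spend: float, allocation: float,
                    warn_ratio: float = 0.8) -> BudgetAction:
    """Map month-to-date spend against an allocation to a consequence."""
    if allocation <= 0:
        # Spend with no allocation has no agreed funding, so it needs approval.
        return BudgetAction.REQUIRE_APPROVAL
    ratio = spend / allocation
    if ratio >= 1.0:
        return BudgetAction.REQUIRE_APPROVAL
    if ratio >= warn_ratio:
        return BudgetAction.WARN_OWNER
    return BudgetAction.ALLOW


for spend in (500.0, 850.0, 1200.0):
    print(spend, evaluate_budget(spend, allocation=1000.0).name)
```

The point of the sketch is that every outcome is an action owned by someone, not a data point waiting for a meeting.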


Governance as the Mechanism That Turns Signals Into Decisions

Governance should integrate cost visibility into:

  • Provisioning workflows

  • Approval paths

  • Identity and entitlement systems

  • Budgeting and planning processes

  • Automated enforcement at the policy layer

Examples:

  • Identity and entitlement workflows that tie license assignment to cost-aware approvals

  • Guardrails blocking unapproved Azure regions

  • Quota limits with automatic throttling

  • Chargeback tied to financial systems, not spreadsheets

These turn cost signals into triggers for action rather than reference dashboards.
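A deploy-time guardrail is the simplest way to make the platform say "no". Below is a minimal sketch that denies a resource request unless it targets an approved region and carries the allocation metadata named later in the transcript (owner, cost center, workload, environment). The region list, tag keys, and the `ResourceRequest`/`check_request` names are hypothetical. In Azure this job is usually done by Azure Policy (for example, allowed-locations and required-tag definitions with a deny effect), whose actual rule syntax this sketch does not reproduce.

```python
from dataclasses import dataclass, field

# Hypothetical organizational standards; substitute your own values.
APPROVED_REGIONS = {"westeurope", "northeurope"}
REQUIRED_TAGS = ("owner", "cost-center", "workload", "environment")


@dataclass
class ResourceRequest:
    name: str
    region: str
    tags: dict[str, str] = field(default_factory=dict)


def check_request(req: ResourceRequest) -> list[str]:
    """Return reasons to deny the request at deploy time; empty means allow."""
    violations = []
    if req.region not in APPROVED_REGIONS:
        violations.append(f"region '{req.region}' is not approved")
    for key in REQUIRED_TAGS:
        # Blank values count as missing: a tag that names no one owns nothing.
        if not req.tags.get(key, "").strip():
            violations.append(f"missing required tag '{key}'")
    return violations


request = ResourceRequest("vm-test-01", "eastasia", {"owner": "team-a"})
for reason in check_request(request):
    print("DENY:", reason)
```

Returning every violation at once, rather than failing on the first, lets the requester fix the request in one pass.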


Why Ownership Matters

Accountability succeeds when:

  • A service owner is responsible for outcomes

  • Measurements affect performance discussions

  • Leaders enforce consequences consistently

Without definitive ownership:

  • Teams blame dashboards

  • Executives defer action

  • Cost visibility becomes a report, not a lever


Common Organizational Gaps

Different Metrics, No Shared Language
Technical teams think in usage metrics; financial teams think in actual chargeable spend. Without alignment, showback creates confusion.

Governance and Operations Silos
Cloud governance teams own policies; finance owns budgets; product teams own services — but no one owns cost as a controllable outcome.

Worst Case: Dashboard → Email → Ignored
Users get alert emails and ignore them, because ignoring them has no consequences. That's confirmation, not correction.
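The way out of this pattern is what the transcript calls actuation: when spend deviates, a ticket is opened, an owner is assigned, a response window is defined, and an escalation path exists. Below is a minimal sketch, assuming a hypothetical owner registry keyed by cost center, an illustrative 20% deviation threshold, and a 3-day response window. None of these names come from a real ticketing or FinOps API.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical registry: cost center -> (accountable owner, escalation contact).
OWNERS = {"cc-1001": ("alice@example.com", "eng-director@example.com")}


@dataclass
class CostTicket:
    cost_center: str
    owner: str
    escalate_to: str
    due: date
    deviation: float


def open_ticket(cost_center: str, expected: float, actual: float, today: date,
                threshold: float = 0.2, response_days: int = 3) -> Optional[CostTicket]:
    """Turn a spend deviation into an owned ticket with a response deadline."""
    if expected <= 0:
        raise ValueError("expected spend must be positive")
    deviation = (actual - expected) / expected
    if deviation < threshold:
        return None
    # A missing owner raises KeyError on purpose: unowned spend is itself
    # a governance failure and should not be skipped silently.
    owner, escalate_to = OWNERS[cost_center]
    return CostTicket(cost_center, owner, escalate_to,
                      today + timedelta(days=response_days), deviation)


def needs_escalation(ticket: CostTicket, today: date, resolved: bool) -> bool:
    """Escalate once the response window has passed without resolution."""
    return not resolved and today > ticket.due


ticket = open_ticket("cc-1001", expected=1000.0, actual=1300.0, today=date(2024, 1, 1))
print(ticket)
```

Unlike an email, the ticket has an owner, a deadline, and a next step if the deadline passes.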


How to Make Showback Work

Showback must be connected to:

  • Chargeback models with real budget impact

  • Quotas and automated enforcement

  • Planning cycles and approvals

  • Lifecycle policies for resources

  • Identity-based accountability

Instead of focusing on cost as information, focus on:
cost as a decision input that feeds governance mechanisms.
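Lifecycle policies are what stop "temporary" from quietly becoming "forever". Below is a minimal sketch, assuming each resource record carries an owner and an optional expiry date (both fields are a hypothetical convention, not a platform schema): anything unowned or past its expiry is flagged for quarantine rather than left billing in silence.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TrackedResource:
    name: str
    owner: Optional[str]
    expires: Optional[date]  # None means declared permanent (assumed convention)


def quarantine_candidates(resources: list[TrackedResource], today: date) -> list[str]:
    """Names of resources that are unowned or past their declared expiry."""
    flagged = []
    for r in resources:
        if not r.owner:
            flagged.append(r.name)   # anonymous spend: nobody will delete it
        elif r.expires is not None and r.expires < today:
            flagged.append(r.name)   # the temporary decision was never renewed
    return flagged


inventory = [
    TrackedResource("prod-db", "team-a", None),
    TrackedResource("migration-replica", "team-b", date(2024, 1, 31)),
    TrackedResource("orphan-vm", None, None),
]
print(quarantine_candidates(inventory, today=date(2024, 3, 1)))
```

Whether a flagged resource is stopped, moved, or deleted is a policy choice; the sketch only forces the decision to be renewed.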


When Showback Does Work

Showback becomes effective when:

  • It has shared definitions

  • It ties into ownership models

  • It aligns with budgetary consequences

  • It integrates into operational workflows

  • It is enforced, not just displayed

In this mode, visibility is a gateway to accountability, not a substitute for it.


Key Takeaways

  • Transparency without consequence is noise, not governance

  • Showback gives visibility, but not authority

  • Chargeback introduces financial consequences

  • Accountability requires ownership, enforcement, and integration

  • Cost signals must affect decision paths, not just dashboards


Who This Episode Is For

This episode is highly relevant for:

  • Cloud and Microsoft 365 governance teams

  • IT finance, FinOps, and cost management professionals

  • CIOs and CTOs facing uncontrolled consumption

  • Cloud architects and operational leaders

  • Teams implementing chargeback or showback programs


Final Thought

Showback makes costs visible, but visibility isn’t influence.
Accountability only happens when cost signals are live levers in governance, operations, and decision processes.

Transcript

1
00:00:00,000 --> 00:00:02,800
Most organizations believe showback creates accountability.

2
00:00:02,800 --> 00:00:03,720
They are wrong.

3
00:00:03,720 --> 00:00:07,040
Showback creates visibility, and visibility feels like control,

4
00:00:07,040 --> 00:00:08,640
therefore everyone relaxes.

5
00:00:08,640 --> 00:00:11,640
But nothing changes, because nothing in the system is forced to change.

6
00:00:11,640 --> 00:00:13,120
A dashboard is not a decision.

7
00:00:13,120 --> 00:00:17,760
A report is not an escalation path, and a monthly cost review is not governance.

8
00:00:17,760 --> 00:00:19,360
This episode is about the trap.

9
00:00:19,360 --> 00:00:22,440
You can instrument spend perfectly and still drift into financial chaos.

10
00:00:22,440 --> 00:00:26,200
Then we'll convert that theater into an operating model with consequences,

11
00:00:26,200 --> 00:00:28,640
ownership, guardrails, workflows,

12
00:00:28,640 --> 00:00:32,040
and the one governance decision that prevents most drift.

13
00:00:32,040 --> 00:00:34,320
Definitions that people blur on purpose.

14
00:00:34,320 --> 00:00:37,160
Showback, chargeback, accountability.

15
00:00:37,160 --> 00:00:40,120
Words matter because the platform doesn't care about your intentions.

16
00:00:40,120 --> 00:00:41,800
It only cares about what is enforced.

17
00:00:41,800 --> 00:00:43,600
So let's clean up the definitions people blur.

18
00:00:43,600 --> 00:00:45,960
Usually because the blur keeps everyone comfortable.

19
00:00:45,960 --> 00:00:47,680
Showback is attribution without impact.

20
00:00:47,680 --> 00:00:50,040
It answers, "Who did we think spent this money?"

21
00:00:50,040 --> 00:00:53,920
It produces a report, a Power BI page, a set of tags,

22
00:00:53,920 --> 00:00:55,760
maybe a cost allocation model.

23
00:00:55,760 --> 00:00:57,920
It is the passive voice of financial management.

24
00:00:57,920 --> 00:00:59,200
Costs were incurred.

25
00:00:59,200 --> 00:01:01,800
And because no budget moves, no one's incentives change.

26
00:01:01,800 --> 00:01:03,440
Showback is telemetry.

27
00:01:03,440 --> 00:01:04,640
Telemetry is useful.

28
00:01:04,640 --> 00:01:06,080
Telemetry is not a control.

29
00:01:06,080 --> 00:01:08,840
Chargeback is impact without necessarily intelligence.

30
00:01:08,840 --> 00:01:10,720
Chargeback answers, "Who is paying?"

31
00:01:10,720 --> 00:01:13,760
The spend hits a cost center, a department budget, a P&L.

32
00:01:13,760 --> 00:01:16,120
Now people feel it, and yes, that can drive behavior.

33
00:01:16,120 --> 00:01:18,880
But chargeback also creates a predictable pathology.

34
00:01:18,880 --> 00:01:22,080
Teams optimize for appearing cheap, not for being effective.

35
00:01:22,080 --> 00:01:24,120
They will defer work, under-provision,

36
00:01:24,120 --> 00:01:25,880
shift spend to different buckets,

37
00:01:25,880 --> 00:01:30,760
argue about allocations, and hide behind shared services and ambiguity.

38
00:01:30,760 --> 00:01:34,200
Chargeback turns cost into conflict if the ownership model is weak.

39
00:01:34,200 --> 00:01:36,000
So neither of these is accountability.

40
00:01:36,000 --> 00:01:38,800
They are accounting mechanisms with different side effects.

41
00:01:38,800 --> 00:01:41,800
Accountability is owned decisions, plus enforced constraints,

42
00:01:41,800 --> 00:01:43,120
plus an audit trail.

43
00:01:43,120 --> 00:01:45,160
That distinction matters.

44
00:01:45,160 --> 00:01:48,040
Accountability means a human or team can say,

45
00:01:48,040 --> 00:01:51,040
"This spend exists because we chose it, we can justify it,

46
00:01:51,040 --> 00:01:52,520
and we accept the trade-offs."

47
00:01:52,520 --> 00:01:54,720
And it also means the platform can say, "No."

48
00:01:54,720 --> 00:01:57,240
Not metaphorically. Literally: deny at deploy time,

49
00:01:57,240 --> 00:01:59,720
quarantine at run time, escalate when budgets breach,

50
00:01:59,720 --> 00:02:02,120
create evidence when exceptions are granted.

51
00:02:02,120 --> 00:02:04,360
If the system cannot refuse a bad decision,

52
00:02:04,360 --> 00:02:05,760
you are not running governance.

53
00:02:05,760 --> 00:02:08,680
You are running persuasion, and persuasion does not scale.

54
00:02:08,680 --> 00:02:11,920
Here's why Finance loves showback and engineering tolerates it.

55
00:02:11,920 --> 00:02:15,400
Finance likes showback because it creates the appearance of responsibility

56
00:02:15,400 --> 00:02:17,400
without triggering organizational warfare.

57
00:02:17,400 --> 00:02:18,200
It's safe.

58
00:02:18,200 --> 00:02:20,480
It doesn't require reworking budgeting models.

59
00:02:20,480 --> 00:02:23,360
It doesn't require leaders to fight each other over shared costs.

60
00:02:23,360 --> 00:02:27,440
It doesn't require anyone to tell a powerful team, "You don't get to do that."

61
00:02:27,440 --> 00:02:30,200
It produces artifacts that look like progress.

62
00:02:30,200 --> 00:02:33,880
Allocation percentages, variance charts, forecasts, trend lines.

63
00:02:33,880 --> 00:02:36,840
Engineering tolerates showback because it rarely blocks delivery.

64
00:02:36,840 --> 00:02:40,480
Engineers can nod at the dashboard, explain the spike, and go back to shipping.

65
00:02:40,480 --> 00:02:43,640
If the showback meeting becomes annoying, they send a delegate.

66
00:02:43,640 --> 00:02:45,840
If it becomes political, they stop showing up.

67
00:02:45,840 --> 00:02:46,840
The work still happens.

68
00:02:46,840 --> 00:02:48,040
The cloud still builds.

69
00:02:48,040 --> 00:02:48,960
The system keeps running.

70
00:02:48,960 --> 00:02:49,840
The bill keeps growing.

71
00:02:49,840 --> 00:02:52,880
Now, the early warning sign that you are in cost theatre is simple.

72
00:02:52,880 --> 00:02:55,800
If you cannot quickly answer "Who owns this cost?",

73
00:02:55,800 --> 00:02:57,080
you do not have governance.

74
00:02:57,080 --> 00:02:58,960
You have a reporting layer over a mess.

75
00:02:58,960 --> 00:03:01,320
And "owns" doesn't mean "is aware of."

76
00:03:01,320 --> 00:03:05,120
It means who is obligated to respond when the spend deviates

77
00:03:05,120 --> 00:03:08,480
and who has authority to change the conditions that created the spend.

78
00:03:08,480 --> 00:03:10,440
A distribution list is not an owner.

79
00:03:10,440 --> 00:03:12,800
A FinOps team is not the owner of every workload.

80
00:03:12,800 --> 00:03:16,160
A central platform team is not magically responsible for every developer's

81
00:03:16,160 --> 00:03:18,200
bad sizing decisions.

82
00:03:18,200 --> 00:03:19,760
Ownership is a control plane.

83
00:03:19,760 --> 00:03:23,600
It defines who can approve, who can say no, and who can accept risk.

84
00:03:23,600 --> 00:03:26,960
And the more your organization scales, the more this becomes non-negotiable.

85
00:03:26,960 --> 00:03:29,920
Because at scale, cloud spend isn't a budgeting problem.

86
00:03:29,920 --> 00:03:31,560
It's a decision volume problem.

87
00:03:31,560 --> 00:03:33,000
Thousands of small choices.

88
00:03:33,000 --> 00:03:36,960
SKU selection, region, retention, replication, idle compute,

89
00:03:36,960 --> 00:03:39,160
premium licensing, temporary sandboxes,

90
00:03:39,160 --> 00:03:41,000
accumulate into a financial outcome.

91
00:03:41,000 --> 00:03:42,480
Showback observes the outcome.

92
00:03:42,480 --> 00:03:44,080
Accountability shapes the decisions.

93
00:03:44,080 --> 00:03:46,920
So when people say, "We're doing FinOps, we have showback,"

94
00:03:46,920 --> 00:03:49,680
what they usually mean is, "We can measure the blast radius."

95
00:03:49,680 --> 00:03:50,960
They cannot stop it.

96
00:03:50,960 --> 00:03:52,880
They cannot route it to an accountable owner.

97
00:03:52,880 --> 00:03:54,880
They cannot enforce a timeout on exceptions.

98
00:03:54,880 --> 00:03:57,040
They cannot prevent the next identical mistake.

99
00:03:57,040 --> 00:03:58,800
This is where the trap closes.

100
00:03:58,800 --> 00:04:01,200
Showback becomes a substitute for governance.

101
00:04:01,200 --> 00:04:04,480
Everyone can point at the dashboard and say, "We're on it."

102
00:04:04,480 --> 00:04:08,880
Meanwhile, the system's incentives remain unchanged and entropy does what entropy does.

103
00:04:08,880 --> 00:04:12,480
In reality, the goal is not to pick showback versus chargeback

104
00:04:12,480 --> 00:04:14,120
as if one is more mature.

105
00:04:14,120 --> 00:04:18,920
The goal is to design accountability so that visibility turns into action by default.

106
00:04:18,920 --> 00:04:22,240
And once the definitions are clean, the rest of the episode becomes obvious.

107
00:04:22,240 --> 00:04:25,800
Showback fails at scale because it is an observer pattern with no actuator.

108
00:04:25,800 --> 00:04:27,520
Why showback fails at scale.

109
00:04:27,520 --> 00:04:29,600
The observer pattern with no actuator.

110
00:04:29,600 --> 00:04:33,560
Showback fails for the same reason monitoring fails when it isn't paired with a response.

111
00:04:33,560 --> 00:04:35,800
It's an observer pattern with no actuator.

112
00:04:35,800 --> 00:04:36,880
The system can detect.

113
00:04:36,880 --> 00:04:37,640
It can display.

114
00:04:37,640 --> 00:04:38,320
It can email.

115
00:04:38,320 --> 00:04:39,560
It can hold a meeting.

116
00:04:39,560 --> 00:04:42,600
But it cannot force a decision and it cannot force a change.

117
00:04:42,600 --> 00:04:45,720
So showback becomes a strange kind of organizational comfort.

118
00:04:45,720 --> 00:04:49,600
Everyone can see the fire, therefore everyone feels like they did something about the fire.

119
00:04:49,600 --> 00:04:50,960
But there are no sprinklers.

120
00:04:50,960 --> 00:04:52,800
There's no automatic shutoff valve.

121
00:04:52,800 --> 00:04:55,960
There isn't even a fire warden with authority to evacuate the building.

122
00:04:55,960 --> 00:04:58,800
And the uncomfortable truth is that in large organizations,

123
00:04:58,800 --> 00:05:02,160
"someone should do something" is the same as "no one will."

124
00:05:02,160 --> 00:05:03,560
Here's what most people miss.

125
00:05:03,560 --> 00:05:05,600
Cloud costs aren't one big decision.

126
00:05:05,600 --> 00:05:10,080
They are a continuous stream of micro-decisions made by hundreds of people across dozens of systems.

127
00:05:10,080 --> 00:05:13,040
If you expect a monthly report to correct a daily decision stream,

128
00:05:13,040 --> 00:05:14,160
you're not doing governance.

129
00:05:14,160 --> 00:05:15,840
You're doing archaeology.

130
00:05:15,840 --> 00:05:17,880
The organizational gravity is predictable.

131
00:05:17,880 --> 00:05:20,200
Urgent delivery beats cost hygiene every time.

132
00:05:20,200 --> 00:05:21,040
The sprint is due.

133
00:05:21,040 --> 00:05:22,360
The incident just happened.

134
00:05:22,360 --> 00:05:23,600
The customer escalated.

135
00:05:23,600 --> 00:05:25,000
A security finding came in.

136
00:05:25,000 --> 00:05:27,960
And then a FinOps dashboard shows a 12% cost increase.

137
00:05:27,960 --> 00:05:29,240
Everyone agrees it's a problem.

138
00:05:29,240 --> 00:05:31,560
Then everyone goes back to what gets them promoted.

139
00:05:31,560 --> 00:05:33,520
Because showback doesn't change incentives.

140
00:05:33,520 --> 00:05:34,720
It doesn't create deadlines.

141
00:05:34,720 --> 00:05:37,720
It doesn't create accountability that survives competing priorities.

142
00:05:37,720 --> 00:05:38,960
It just creates information.

143
00:05:38,960 --> 00:05:40,240
And information is abundant.

144
00:05:40,240 --> 00:05:41,680
Attention is scarce.

145
00:05:41,680 --> 00:05:43,720
This is how dashboards turn into rituals.

146
00:05:43,720 --> 00:05:48,280
The weekly cost review becomes a status meeting where teams explain what already happened.

147
00:05:48,280 --> 00:05:52,720
The goal becomes "have an explanation," not "change the system," so people show up with narratives.

148
00:05:52,720 --> 00:05:54,160
That spike was a load test.

149
00:05:54,160 --> 00:05:55,640
That's the new region rollout.

150
00:05:55,640 --> 00:05:57,760
And that's because we had to scale for the campaign.

151
00:05:57,760 --> 00:05:58,760
That's shared networking.

152
00:05:58,760 --> 00:06:00,640
That's a reserved instance true up.

153
00:06:00,640 --> 00:06:02,520
Most of those explanations are even true.

154
00:06:02,520 --> 00:06:05,960
But they don't produce outcomes because the meeting has no authority boundary.

155
00:06:05,960 --> 00:06:07,640
Nobody in the room can enforce anything.

156
00:06:07,640 --> 00:06:09,080
Nobody can deny a deployment.

157
00:06:09,080 --> 00:06:11,040
Nobody can quarantine an unknown resource.

158
00:06:11,040 --> 00:06:12,760
Nobody can expire an exception.

159
00:06:12,760 --> 00:06:15,920
So the only thing the organization can do is talk about it again next week.

160
00:06:15,920 --> 00:06:19,240
Over time, cost drift becomes normalized and then defended.

161
00:06:19,240 --> 00:06:23,720
Because once drift becomes normal, anyone trying to stop it looks like they're slowing the business down.

162
00:06:23,720 --> 00:06:26,040
And that label is fatal in most enterprises.

163
00:06:26,040 --> 00:06:29,720
So even when people know the environment is wasteful, they protect the waste

164
00:06:29,720 --> 00:06:32,480
because it protects delivery velocity and political safety.

165
00:06:32,480 --> 00:06:35,680
This is also why the incentive mismatch is structural, not moral.

166
00:06:35,680 --> 00:06:38,320
Engineering optimizes for throughput and stability.

167
00:06:38,320 --> 00:06:39,720
The organization rewards shipping.

168
00:06:39,720 --> 00:06:40,840
It rewards uptime.

169
00:06:40,840 --> 00:06:42,280
It rewards feature delivery.

170
00:06:42,280 --> 00:06:45,000
It rewards being the team that unblocks everyone else.

171
00:06:45,000 --> 00:06:46,600
It does not reward variance reduction.

172
00:06:46,600 --> 00:06:48,480
It does not reward deleting resources.

173
00:06:48,480 --> 00:06:51,800
It does not reward making a messy shared platform bill allocatable.

174
00:06:51,800 --> 00:06:56,560
So showback asks engineering teams to do work that looks like risk with no visible upside.

175
00:06:56,560 --> 00:06:58,320
It's not that they don't care about money.

176
00:06:58,320 --> 00:07:01,880
It's that the system has trained them to care about different consequences.

177
00:07:01,880 --> 00:07:02,680
And that's the point.

178
00:07:02,680 --> 00:07:05,040
Governance is the art of defining consequences.

179
00:07:05,040 --> 00:07:07,080
Showback defines none.

180
00:07:07,080 --> 00:07:09,000
Here's the part where people get confused.

181
00:07:09,000 --> 00:07:10,920
They think the problem is the data quality.

182
00:07:10,920 --> 00:07:15,400
They think the dashboard needs better tagging, better allocation rules, more granular reporting,

183
00:07:15,400 --> 00:07:17,000
or a prettier Power BI layout.

184
00:07:17,000 --> 00:07:20,760
No, the data can be perfect and the behavior can still be wrong because the issue is not

185
00:07:20,760 --> 00:07:21,760
visibility.

186
00:07:21,760 --> 00:07:22,760
The issue is actuation.

187
00:07:22,760 --> 00:07:26,040
Actuation means that when spend deviates, something happens automatically.

188
00:07:26,040 --> 00:07:27,960
A ticket is opened, an owner is assigned.

189
00:07:27,960 --> 00:07:29,440
A response window is defined.

190
00:07:29,440 --> 00:07:31,440
An escalation path exists.

191
00:07:31,440 --> 00:07:34,000
Exceptions have to be requested, approved and expired.

192
00:07:34,000 --> 00:07:35,800
Unknown assets get quarantined.

193
00:07:35,800 --> 00:07:39,000
Guardrails block known bad decisions at deploy time.

194
00:07:39,000 --> 00:07:42,080
Without those mechanics, showback is just a cost weather report.

195
00:07:42,080 --> 00:07:45,920
Accurate, interesting, ignored.

196
00:07:45,920 --> 00:07:49,440
And the larger the environment gets, the more brutal this becomes.

197
00:07:49,440 --> 00:07:51,000
At small scale, heroics can work.

198
00:07:51,000 --> 00:07:52,760
A few people can manually hunt waste.

199
00:07:52,760 --> 00:07:55,280
A few teams can coordinate through goodwill.

200
00:07:55,280 --> 00:07:59,120
But at scale, heroics turn into burnout and goodwill turns into ambiguity.

201
00:07:59,120 --> 00:08:01,320
Therefore, the pattern is consistent.

202
00:08:01,320 --> 00:08:02,560
Showback creates awareness.

203
00:08:02,560 --> 00:08:03,800
Awareness creates meetings.

204
00:08:03,800 --> 00:08:04,800
Meetings create explanations.

205
00:08:04,800 --> 00:08:07,560
And explanations create permission for the status quo to continue.

206
00:08:07,560 --> 00:08:09,240
This is not a failure of character.

207
00:08:09,240 --> 00:08:11,040
It is a failure of system design.

208
00:08:11,040 --> 00:08:14,440
And once you accept that, the path forward becomes clear.

209
00:08:14,440 --> 00:08:16,160
You don't need more observers.

210
00:08:16,160 --> 00:08:17,640
You need actuators.

211
00:08:17,640 --> 00:08:18,640
Cost entropy.

212
00:08:18,640 --> 00:08:21,800
The inevitable drift of tags, ownership and intent.

213
00:08:21,800 --> 00:08:25,640
Everything clicked for a lot of teams when they stopped treating cloud cost as an accounting

214
00:08:25,640 --> 00:08:29,480
artifact and started treating it as an architectural property.

215
00:08:29,480 --> 00:08:31,280
Accounting happens after the fact.

216
00:08:31,280 --> 00:08:34,120
Architecture defines what can happen in the first place.

217
00:08:34,120 --> 00:08:36,560
Cloud spend behaves like security posture.

218
00:08:36,560 --> 00:08:40,040
It degrades unless you continuously apply pressure.

219
00:08:40,040 --> 00:08:45,600
Not because people are sloppy, but because complex systems drift: teams change, names change,

220
00:08:45,600 --> 00:08:47,800
ownership shifts, tooling evolves.

221
00:08:47,800 --> 00:08:53,280
The platform adds new SKUs, new regions, new billing meters, new licensing bundles.

222
00:08:53,280 --> 00:08:56,880
The environment doesn't stay still long enough for your spreadsheet to remain true.

223
00:08:56,880 --> 00:08:58,040
That drift has a name.

224
00:08:58,040 --> 00:08:59,040
Cost entropy.

225
00:08:59,040 --> 00:09:00,840
Entropy is not a metaphor here.

226
00:09:00,840 --> 00:09:05,360
It's the unavoidable tendency for metadata, ownership and intent to decay over time unless

227
00:09:05,360 --> 00:09:06,840
the system forces renewal.

228
00:09:06,840 --> 00:09:09,600
If you rely on voluntary upkeep, you're not managing cost.

229
00:09:09,600 --> 00:09:10,600
You're hoping.

230
00:09:10,600 --> 00:09:12,640
And hope is not an operating model.

231
00:09:12,640 --> 00:09:14,400
Here's what most people miss.

232
00:09:14,400 --> 00:09:16,920
Tags and allocation rules are not just "good hygiene."

233
00:09:16,920 --> 00:09:18,360
They are allocation primitives.

234
00:09:18,360 --> 00:09:22,320
They are the only way to map a real invoice back to a human decision maker.

235
00:09:22,320 --> 00:09:26,200
When those primitives drift, accountability collapses first and cost optimization collapses

236
00:09:26,200 --> 00:09:27,200
right after that.

237
00:09:27,200 --> 00:09:28,960
What are the entropy generators?

238
00:09:28,960 --> 00:09:30,480
Untagged resources are the obvious one.

239
00:09:30,480 --> 00:09:34,200
Not because tagging is magical, but because untagged resources are anonymous and anonymous

240
00:09:34,200 --> 00:09:37,920
things don't get deleted, they don't get right-sized, they don't get challenged, they

241
00:09:37,920 --> 00:09:39,280
just keep billing in silence.

242
00:09:39,280 --> 00:09:43,520
Then you get shared services: the platform subscription, the networking resource group,

243
00:09:43,520 --> 00:09:46,040
the logging workspace everyone uses.

244
00:09:46,040 --> 00:09:50,440
Shared services are legitimate, but they're also the perfect hiding place for cost ambiguity.

245
00:09:50,440 --> 00:09:54,560
If you don't have an allocation model that is agreed, enforced, and periodically revisited,

246
00:09:54,560 --> 00:09:58,800
shared costs turn into a permanent argument, and permanent arguments are a form of financial

247
00:09:58,800 --> 00:09:59,800
waste.

248
00:09:59,800 --> 00:10:03,320
Undefined owners are worse than missing tags because they create the illusion that someone

249
00:10:03,320 --> 00:10:04,160
owns it.

250
00:10:04,160 --> 00:10:08,760
Organizations will point at an ops team or a cloud team or an IT cost center and call it accountability.

251
00:10:08,760 --> 00:10:09,760
It isn't.

252
00:10:09,760 --> 00:10:13,720
That's just routing spend into a bucket where nobody has product-level intent.

253
00:10:13,720 --> 00:10:15,960
Then there's the most common generator.

254
00:10:15,960 --> 00:10:18,320
Temporary resources with no expiry.

255
00:10:18,320 --> 00:10:23,880
Dev environments, migration replicas, diagnostics enabled just for a week, premium tiers just

256
00:10:23,880 --> 00:10:28,560
for launch, trials just to evaluate, an extra region just for resilience testing.

257
00:10:28,560 --> 00:10:33,360
The cloud is incredibly good at turning temporary into forever because it never asks you to renew

258
00:10:33,360 --> 00:10:34,360
the decision.

259
00:10:34,360 --> 00:10:36,520
It bills, so the cost becomes scheduled.

260
00:10:36,520 --> 00:10:38,480
Not in a calendar sense, in a system sense.

261
00:10:38,480 --> 00:10:42,840
If you allow resources to exist without an owner, without allocation metadata and without

262
00:10:42,840 --> 00:10:46,240
an enforced lifecycle, then the future invoice is not a surprise.

263
00:10:46,240 --> 00:10:48,120
It is the expected output of your design.

264
00:10:48,120 --> 00:10:51,680
This is the part finance often misunderstands. They ask for better tagging compliance

265
00:10:51,680 --> 00:10:53,080
as if it's a training issue.

266
00:10:53,080 --> 00:10:54,080
It isn't.

267
00:10:54,080 --> 00:10:57,240
It's a creation-time enforcement issue because drift mechanics are structural.

268
00:10:57,240 --> 00:10:58,240
New teams show up.

269
00:10:58,240 --> 00:10:59,240
Reorgs happen.

270
00:10:59,240 --> 00:11:00,240
Mergers happen.

271
00:11:00,240 --> 00:11:01,240
Vendors change.

272
00:11:01,240 --> 00:11:02,240
A product gets renamed.

273
00:11:02,240 --> 00:11:03,760
A cost center gets split.

274
00:11:03,760 --> 00:11:04,760
People leave.

275
00:11:04,760 --> 00:11:06,120
Distribution lists rot.

276
00:11:06,120 --> 00:11:07,960
Tags stay behind like fossils.

277
00:11:07,960 --> 00:11:11,960
Meanwhile, your showback report still confidently allocates spend to a team that no longer

278
00:11:11,960 --> 00:11:12,960
exists,

279
00:11:12,960 --> 00:11:17,520
a cost center that got retired, or an owner who changed roles six months ago.

280
00:11:17,520 --> 00:11:20,840
And everyone treats the report as truth because it has numbers.

281
00:11:20,840 --> 00:11:22,760
This is why tag decay is so corrosive.

282
00:11:22,760 --> 00:11:24,360
It doesn't just break reporting.

283
00:11:24,360 --> 00:11:25,480
It breaks the decision loop.

284
00:11:25,480 --> 00:11:29,000
You can't route anomalies to owners if ownership is stale.

285
00:11:29,000 --> 00:11:32,520
You can't enforce budgets if budgets aren't mapped to accountable teams.

286
00:11:32,520 --> 00:11:36,160
You can't run exception governance if you can't even find who requested the exception.

287
00:11:36,160 --> 00:11:40,760
So the system quietly converts deterministic accountability into probabilistic accountability.

288
00:11:40,760 --> 00:11:42,840
Sometimes the right person sees the report.

289
00:11:42,840 --> 00:11:43,840
Sometimes they don't.

290
00:11:43,840 --> 00:11:44,840
Sometimes they can act.

291
00:11:44,840 --> 00:11:45,840
Sometimes they can't.

292
00:11:45,840 --> 00:11:50,000
And that randomness is what makes cost drift feel unpredictable.

293
00:11:50,000 --> 00:11:52,280
When in reality it's completely predictable.

294
00:11:52,280 --> 00:11:55,240
Unmanaged systems drift, and drift accumulates.

295
00:11:55,240 --> 00:11:56,600
This is the uncomfortable truth.

296
00:11:56,600 --> 00:11:58,760
The real FinOps work is entropy management.

297
00:11:58,760 --> 00:12:01,960
It's making sure resources remain mappable to intent.
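"Mappable to intent" can be made checkable. Below is a minimal sketch, assuming resource tags are available as a plain dictionary and that an org directory supplies the sets of currently active owners and cost centers. The tag key spellings, the `Resource` type and `mapping_problems` are hypothetical names for illustration, not any platform's API. It flags both missing tags and the "fossil" tags that still exist but point at people or cost centers that are gone.

```python
from dataclasses import dataclass, field

# Hypothetical tag keys for the four intents named in this episode
# (owner, cost center, workload, environment); real key names vary.
REQUIRED_TAGS = ("owner", "cost_center", "workload", "environment")


@dataclass
class Resource:
    resource_id: str
    tags: dict[str, str] = field(default_factory=dict)


def mapping_problems(resource: Resource,
                     active_owners: set[str],
                     active_cost_centers: set[str]) -> list[str]:
    """List every reason this resource can no longer be mapped to intent."""
    problems = [f"missing tag: {key}" for key in REQUIRED_TAGS
                if not resource.tags.get(key)]
    owner = resource.tags.get("owner")
    if owner and owner not in active_owners:
        # The tag is present, but the person left or changed roles: a fossil.
        problems.append(f"stale owner: {owner}")
    cost_center = resource.tags.get("cost_center")
    if cost_center and cost_center not in active_cost_centers:
        problems.append(f"retired cost center: {cost_center}")
    return problems


if __name__ == "__main__":
    vm = Resource("vm-batch-01", {"owner": "alice", "cost_center": "CC-100",
                                  "workload": "batch"})
    print(mapping_problems(vm, active_owners={"bob"},
                           active_cost_centers={"CC-200"}))
```

A fully tagged resource can still fail this check, which is the point: tag presence is not the same as current ownership.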

298
00:12:01,960 --> 00:12:06,920
And if you don't enforce intent at creation time (owner, cost center, workload, environment),

299
00:12:06,920 --> 00:12:09,520
you will spend the rest of your time doing archaeology.

300
00:12:09,520 --> 00:12:14,200
You'll dig through bills, chase teams, argue about allocation, and relitigate old decisions.

301
00:12:14,200 --> 00:12:15,200
That's not governance.

302
00:12:15,200 --> 00:12:17,120
That's cleanup after governance failed.

303
00:12:17,120 --> 00:12:19,880
So now we stop talking in abstractions and we map the failures.

304
00:12:19,880 --> 00:12:24,320
Because once you can name the failure modes, you can build systems that prevent them.

305
00:12:24,320 --> 00:12:28,200
Failure mode one: ignored dashboards and the myth of informed teams.

306
00:12:28,200 --> 00:12:33,040
This failure mode is the one people defend the hardest because it sounds reasonable.

307
00:12:33,040 --> 00:12:34,040
We gave teams the data.

308
00:12:34,040 --> 00:12:35,040
They're informed.

309
00:12:35,040 --> 00:12:36,040
Now they can make better choices.

310
00:12:36,040 --> 00:12:37,680
No, they can explain their choices.

311
00:12:37,680 --> 00:12:39,360
That's what the dashboard actually enables.

312
00:12:39,360 --> 00:12:42,040
In a large environment, a cost report is not a trigger.

313
00:12:42,040 --> 00:12:45,320
It's a reference document for the next meeting where everyone explains why the spend was

314
00:12:45,320 --> 00:12:46,320
inevitable.

315
00:12:46,320 --> 00:12:49,960
And because the organization confuses explanation with control, the same offenders show up

316
00:12:49,960 --> 00:12:51,520
every month like clockwork.

317
00:12:51,520 --> 00:12:53,280
Here's what's happening mechanically.

318
00:12:53,280 --> 00:12:56,200
A dashboard tells you what happened after the spend already occurred.

319
00:12:56,200 --> 00:12:57,920
It doesn't change starting conditions.

320
00:12:57,920 --> 00:13:01,040
It doesn't stop new resources from being created with missing tags.

321
00:13:01,040 --> 00:13:04,340
It doesn't stop premium SKUs from being selected "just in case."

322
00:13:04,340 --> 00:13:07,760
It doesn't expire the dev subscription that was meant to live for two weeks.

323
00:13:07,760 --> 00:13:11,480
It doesn't force anyone to choose between keeping the spend and changing the design.

324
00:13:11,480 --> 00:13:16,000
So teams can read the report, nod, and still do nothing, because nothing in the system requires

325
00:13:16,000 --> 00:13:16,800
closure.

326
00:13:16,800 --> 00:13:18,640
The missing object is always the same.

327
00:13:18,640 --> 00:13:21,000
A decision owner with authority and obligation.

328
00:13:21,000 --> 00:13:23,760
Awareness without obligation is trivia.

329
00:13:23,760 --> 00:13:25,400
Obligation without authority is cruelty.

330
00:13:25,400 --> 00:13:28,720
A lot of organizations accidentally choose the worst option.

331
00:13:28,720 --> 00:13:32,520
They assign obligation to a central FinOps team and authority to everyone else.

332
00:13:32,520 --> 00:13:35,200
The FinOps team gets to chase, escalate and plead.

333
00:13:35,200 --> 00:13:38,600
The engineering teams keep the ability to deploy, resize and ignore.

334
00:13:38,600 --> 00:13:39,600
That's not accountability.

335
00:13:39,600 --> 00:13:41,640
That's a help desk for financial entropy.

336
00:13:41,640 --> 00:13:45,440
The second missing piece is the system: a workflow that forces a decision to end in one

337
00:13:45,440 --> 00:13:46,760
of three states.

338
00:13:46,760 --> 00:13:48,400
Accept, mitigate, transfer.

339
00:13:48,400 --> 00:13:51,960
If a cost anomaly shows up, somebody has to either accept it as an intentional spend

340
00:13:51,960 --> 00:13:56,200
with documented rationale, mitigate it with a change and a due date, or transfer it by formally

341
00:13:56,200 --> 00:14:00,120
escalating the budget impact to a higher owner who can approve the trade-off.

342
00:14:00,120 --> 00:14:03,120
That's governance: a closed loop.
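The three closure states can be enforced as data rather than as a meeting outcome. Below is a minimal sketch, assuming decisions are captured as records. `Closure`, `Decision`, `record` and the in-memory `audit_log` are hypothetical names for illustration. Each state refuses to close without the evidence the episode names: a rationale for accept, a change plus a due date for mitigate, and a higher owner for transfer.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Closure(Enum):
    ACCEPT = "accept"
    MITIGATE = "mitigate"
    TRANSFER = "transfer"


@dataclass(frozen=True)
class Decision:
    anomaly_id: str
    decided_by: str
    closure: Closure
    rationale: str = ""               # required for ACCEPT
    change: str = ""                  # required for MITIGATE
    due_date: Optional[date] = None   # required for MITIGATE
    escalated_to: str = ""            # required for TRANSFER


def validate(decision: Decision) -> None:
    """Reject a decision that lacks what its closure state requires."""
    if decision.closure is Closure.ACCEPT and not decision.rationale:
        raise ValueError("accept requires a documented rationale")
    if decision.closure is Closure.MITIGATE and not (decision.change and decision.due_date):
        raise ValueError("mitigate requires a change and a due date")
    if decision.closure is Closure.TRANSFER and not decision.escalated_to:
        raise ValueError("transfer requires a higher owner to escalate to")


# Append-only trail, so the same spike is not relitigated every month.
audit_log: list[Decision] = []


def record(decision: Decision) -> None:
    validate(decision)
    audit_log.append(decision)
```

A real system would persist the log somewhere durable. The design choice that matters is that an invalid decision never reaches the trail.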

343
00:14:03,120 --> 00:14:04,160
Showback is an open loop.

344
00:14:04,160 --> 00:14:06,200
It creates perpetual motion meetings.

345
00:14:06,200 --> 00:14:10,400
And that's why cost reports get read and still produce no action, because action requires

346
00:14:10,400 --> 00:14:11,400
friction.

347
00:14:11,400 --> 00:14:13,720
It requires a moment where the system says: choose.

348
00:14:13,720 --> 00:14:17,640
And the choice has to be recorded because without an audit trail, the organization will

349
00:14:17,640 --> 00:14:27,880
relitigate the same spike forever.

350
00:14:27,880 --> 00:14:29,560
Sometimes it is.

351
00:14:29,560 --> 00:14:30,840
Usually it's incomplete.

352
00:14:30,840 --> 00:14:32,440
But the pattern is predictable.

353
00:14:32,440 --> 00:14:34,720
When there's no actuator, people attack the observer.

354
00:14:34,720 --> 00:14:38,440
They debate tags, they debate allocation rules, they debate whether shared services are

355
00:14:38,440 --> 00:14:43,040
being charged fairly, they debate whether a reservation true-up counts.

356
00:14:43,040 --> 00:14:45,040
None of that reduces spend volatility.

357
00:14:45,040 --> 00:14:46,800
None of it reduces orphaned inventory.

358
00:14:46,800 --> 00:14:48,560
None of it makes the next month different.

359
00:14:48,560 --> 00:14:52,920
It just creates a safe technical argument that avoids the real question.

360
00:14:52,920 --> 00:14:54,080
Who is going to do what?

361
00:14:54,080 --> 00:14:55,080
By when?

362
00:14:55,080 --> 00:14:56,080
And what happens if they don't?

363
00:14:56,080 --> 00:15:00,000
So, the practical signal that you're stuck in this failure mode is painfully simple.

364
00:15:00,000 --> 00:15:02,160
Look at your top offenders across three months.

365
00:15:02,160 --> 00:15:05,920
If the same subscriptions, the same workloads, the same resource groups, or the same license

366
00:15:05,920 --> 00:15:08,800
pools keep appearing near the top, you don't have a cost problem.

367
00:15:08,800 --> 00:15:10,240
You have a governance problem.

368
00:15:10,240 --> 00:15:11,920
Because a one time spike can be legitimate.

369
00:15:11,920 --> 00:15:15,960
A repeating spike is a policy failure that the organization has chosen to tolerate.
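The three-month check is easy to automate. Below is a minimal sketch, assuming monthly spend has already been aggregated into one `{scope: amount}` dictionary per month, where a scope could be a subscription, workload, resource group or license pool. `repeat_offenders` is a hypothetical helper for illustration. It returns the scopes that rank in the top N in every month supplied.

```python
from collections import Counter


def repeat_offenders(monthly_spend: list[dict[str, float]], top_n: int = 5) -> list[str]:
    """Scopes that rank in the top N by spend in every month supplied."""
    counts = Counter()
    for month in monthly_spend:
        top = sorted(month, key=month.get, reverse=True)[:top_n]
        counts.update(top)
    return sorted(scope for scope, seen in counts.items()
                  if seen == len(monthly_spend))


if __name__ == "__main__":
    last_three_months = [
        {"sub-analytics": 900.0, "sub-lab": 450.0, "sub-web": 400.0},
        {"sub-analytics": 950.0, "sub-lab": 500.0, "sub-web": 100.0},
        {"sub-analytics": 990.0, "sub-lab": 480.0, "sub-ml": 300.0},
    ]
    print(repeat_offenders(last_three_months, top_n=2))  # ['sub-analytics', 'sub-lab']
```

A one-time spike drops out of this list on its own. Anything that stays in it is the "tolerated policy failure" described above.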

370
00:15:15,960 --> 00:15:18,520
And once it's tolerated, teams learn the real rule.

371
00:15:18,520 --> 00:15:19,520
Nothing happens.

372
00:15:19,520 --> 00:15:22,440
That's the informed teams myth in its final form.

373
00:15:22,440 --> 00:15:23,440
Everyone is informed.

374
00:15:23,440 --> 00:15:25,440
Everyone is still optimizing for delivery.

375
00:15:25,440 --> 00:15:28,440
Therefore, the dashboard becomes a cultural artifact.

376
00:15:28,440 --> 00:15:31,320
Proof that someone cared, not proof that anything changed.

377
00:15:31,320 --> 00:15:34,240
This is why the fix is not more reporting cadence.

378
00:15:34,240 --> 00:15:36,400
The fix is to define closure.

379
00:15:36,400 --> 00:15:38,640
Every anomaly creates a work item.

380
00:15:38,640 --> 00:15:40,240
Every work item has an owner.

381
00:15:40,240 --> 00:15:44,480
Every owner has a response window and every response ends in a recorded decision.

382
00:15:44,480 --> 00:15:48,400
Until you have that, showback is just a monthly ritual where the organization watches itself

383
00:15:48,400 --> 00:15:50,560
drift in high resolution.
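The closure rule (work item, owner, response window, recorded decision) can be sketched as follows. This assumes a mapping from scope to accountable owner already exists. The five-day `RESPONSE_WINDOW` is a hypothetical value, since the episode does not specify one, and `WorkItem` and `open_work_item` are illustrative names. An anomaly with no resolvable owner fails loudly instead of landing in a shared mailbox.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical response window; the episode does not specify a length.
RESPONSE_WINDOW = timedelta(days=5)


@dataclass
class WorkItem:
    anomaly_id: str
    owner: str
    opened: date
    decision: Optional[str] = None  # set once a recorded decision closes the item

    def due(self) -> date:
        return self.opened + RESPONSE_WINDOW

    def needs_escalation(self, today: date) -> bool:
        """Still open past its window: escalate now, not at next month's review."""
        return self.decision is None and today > self.due()


def open_work_item(anomaly_id: str, owners_by_scope: dict[str, str],
                   scope: str, today: date) -> WorkItem:
    # Every anomaly becomes a work item; a missing owner is itself a failure.
    owner = owners_by_scope.get(scope)
    if owner is None:
        raise LookupError(f"no accountable owner for scope {scope!r}")
    return WorkItem(anomaly_id, owner, today)
```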

384
00:15:50,560 --> 00:15:53,600
Failure mode two: exception debt and policy without teeth.

385
00:15:53,600 --> 00:15:57,480
The second failure mode is worse than ignored dashboards because it looks like governance.

386
00:15:57,480 --> 00:15:58,480
You have policies.

387
00:15:58,480 --> 00:15:59,480
You have standards.

388
00:15:59,480 --> 00:16:00,720
You have a FinOps playbook.

389
00:16:00,720 --> 00:16:04,160
You might even have a cloud center of excellence that publishes guardrails.

390
00:16:04,160 --> 00:16:05,680
And then you have exceptions.

391
00:16:05,680 --> 00:16:07,880
Exceptions are not rare events in enterprise cloud.

392
00:16:07,880 --> 00:16:09,760
They are the normal operating state.

393
00:16:09,760 --> 00:16:12,960
And that's fine until you treat exceptions as harmless paperwork.

394
00:16:12,960 --> 00:16:16,080
Because architecturally, an exception is an entropy generator.

395
00:16:16,080 --> 00:16:20,000
It creates a second rule set and a second rule set always multiplies ambiguity.

396
00:16:20,000 --> 00:16:21,760
The common story is always the same.

397
00:16:21,760 --> 00:16:22,840
Just for this release.

398
00:16:22,840 --> 00:16:24,200
Just for this region.

399
00:16:24,200 --> 00:16:25,560
Just for this customer.

400
00:16:25,560 --> 00:16:27,440
Just until the migration is done.

401
00:16:27,440 --> 00:16:28,800
Just until we refactor.

402
00:16:28,800 --> 00:16:31,120
Just until Microsoft fixes the feature gap.

403
00:16:31,120 --> 00:16:34,560
Every time someone says "just," the system hears "forever."

404
00:16:34,560 --> 00:16:36,080
Not because people lie.

405
00:16:36,080 --> 00:16:38,000
Because nobody enforces an end date.

406
00:16:38,000 --> 00:16:40,760
This is what policy without teeth actually means.

407
00:16:40,760 --> 00:16:42,560
It does exist as documentation.

408
00:16:42,560 --> 00:16:45,200
But the platform still allows violations by default.

409
00:16:45,200 --> 00:16:46,600
So policy becomes optional.

410
00:16:46,600 --> 00:16:48,760
And once a policy is optional, it's not a policy.

411
00:16:48,760 --> 00:16:49,760
It's a suggestion.

412
00:16:49,760 --> 00:16:51,400
That distinction matters.

413
00:16:51,400 --> 00:16:55,240
In mature organizations, a policy is a constraint with an explicit exception lane.

414
00:16:55,240 --> 00:16:58,480
In immature ones, a policy is a PDF and exceptions are an email thread.

415
00:16:58,480 --> 00:17:00,000
The difference is not bureaucracy.

416
00:17:00,000 --> 00:17:01,000
It's survivability.

417
00:17:01,000 --> 00:17:03,040
Here's how exception debt accumulates.

418
00:17:03,040 --> 00:17:05,600
First, you grant an exception with no owner.

419
00:17:05,600 --> 00:17:09,320
It's approved by a committee or signed off by someone who isn't responsible for

420
00:17:09,320 --> 00:17:10,320
paying the bill.

421
00:17:10,320 --> 00:17:13,520
Nobody carries the operational burden of the exception later.

422
00:17:13,520 --> 00:17:15,120
That exception now exists in limbo.

423
00:17:15,120 --> 00:17:17,400
Second, you grant an exception with no end date.

424
00:17:17,400 --> 00:17:20,280
So there is no future moment where the system forces a redecision.

425
00:17:20,280 --> 00:17:21,440
No renewal, no review.

426
00:17:21,440 --> 00:17:22,720
No "do we still need this?"

427
00:17:22,720 --> 00:17:25,320
Third, you grant an exception with no review cadence.

428
00:17:25,320 --> 00:17:28,800
So even if someone intended to revisit it, the revisit never arrives.

429
00:17:28,800 --> 00:17:29,960
There's always a bigger fire.

430
00:17:29,960 --> 00:17:31,120
That's exception debt.

431
00:17:31,120 --> 00:17:33,000
And it behaves exactly like security debt.

432
00:17:33,000 --> 00:17:34,440
It compounds.

433
00:17:34,440 --> 00:17:36,960
Not metaphorically, mechanically.

434
00:17:36,960 --> 00:17:40,760
Because once one team gets an exception, other teams learn the real process.

435
00:17:40,760 --> 00:17:43,880
The policy is negotiable, so they request their own carve-outs.

436
00:17:43,880 --> 00:17:47,720
Then your standard becomes a set of loosely related guidelines with dozens of undocumented

437
00:17:47,720 --> 00:17:48,720
overrides.

438
00:17:48,720 --> 00:17:51,520
And the only people who understand the real rules are the people who were in the meeting

439
00:17:51,520 --> 00:17:52,520
six months ago.

440
00:17:52,520 --> 00:17:53,600
Those people leave.

441
00:17:53,600 --> 00:17:55,760
And the exceptions stay.

442
00:17:55,760 --> 00:17:59,040
This is where governance collapses into conditional chaos.

443
00:17:59,040 --> 00:18:03,160
You no longer have a deterministic model where the organization can say, "This is allowed,

444
00:18:03,160 --> 00:18:04,160
this is denied."

445
00:18:04,160 --> 00:18:07,880
You have a probabilistic model where outcomes depend on who asked, when they asked, and

446
00:18:07,880 --> 00:18:09,800
how loud the escalation was.

447
00:18:09,800 --> 00:18:11,760
Now, cost governance becomes politics.

448
00:18:11,760 --> 00:18:13,560
And politics is expensive.

449
00:18:13,560 --> 00:18:15,600
The practical impact shows up in two places.

450
00:18:15,600 --> 00:18:17,920
One, cost allocation breaks.

451
00:18:17,920 --> 00:18:22,280
Because exceptions usually bypass tagging requirements or they use temporary tags or they

452
00:18:22,280 --> 00:18:25,840
deploy into shared subscriptions just for now.

453
00:18:25,840 --> 00:18:28,080
So when finance asks, "Who owns this cost?"

454
00:18:28,080 --> 00:18:30,200
the answer becomes, "It's complicated."

455
00:18:30,200 --> 00:18:33,440
And "it's complicated" is the prelude to unallocated spend.

456
00:18:33,440 --> 00:18:36,480
Two, enforcement collapses.

457
00:18:36,480 --> 00:18:41,880
Azure Policy stays in audit mode forever, budgets become alerts that go to a mailbox, and

458
00:18:41,880 --> 00:18:44,400
workflows become "someone should investigate."

459
00:18:44,400 --> 00:18:46,880
So you end up with the worst possible combination.

460
00:18:46,880 --> 00:18:50,840
You have rules that slow down the teams who comply and you have no control over the teams

461
00:18:50,840 --> 00:18:51,840
who don't.

462
00:18:51,840 --> 00:18:54,160
That's why policies without enforcement create resentment.

463
00:18:54,160 --> 00:18:55,840
The compliant teams feel punished.

464
00:18:55,840 --> 00:18:57,560
The exception teams feel empowered.

465
00:18:57,560 --> 00:19:02,120
And the central governance group becomes the villain for trying to restore basic discipline.

466
00:19:02,120 --> 00:19:03,840
The fix is not "be stricter."

467
00:19:03,840 --> 00:19:05,920
The fix is to make exceptions real.

468
00:19:05,920 --> 00:19:07,680
Real exceptions have five properties.

469
00:19:07,680 --> 00:19:11,600
An accountable owner, a scoped blast radius, a justification tied to business intent, a

470
00:19:11,600 --> 00:19:13,440
review date and an expiry date.

471
00:19:13,440 --> 00:19:15,800
If any of those is missing, you didn't grant an exception.

472
00:19:15,800 --> 00:19:17,360
You created a permanent bypass.
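The five properties can be made mandatory at grant time, so a "permanent bypass" cannot be created by accident. Below is a minimal sketch, assuming exceptions are stored as records. The 90-day `DEFAULT_LIFETIME` and the halfway review date are hypothetical choices that illustrate "exceptions expire by default", and `PolicyException` and `grant` are illustrative names.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed default lifetime; the point is only that one always applies.
DEFAULT_LIFETIME = timedelta(days=90)


@dataclass(frozen=True)
class PolicyException:
    owner: str          # accountable owner who carries the bill
    scope: str          # scoped blast radius, e.g. one resource group
    justification: str  # tied to business intent
    review_date: date
    expiry_date: date

    def is_active(self, today: date) -> bool:
        return today < self.expiry_date


def grant(owner: str, scope: str, justification: str, today: date,
          expiry_date: Optional[date] = None) -> PolicyException:
    """Refuse anything that would become a permanent bypass."""
    if not (owner and scope and justification):
        raise ValueError("owner, scope and justification are all required")
    expiry = expiry_date or today + DEFAULT_LIFETIME
    if expiry <= today:
        raise ValueError("expiry must be in the future")
    # Review halfway through the lifetime, so renewal is a decision, not a surprise.
    review = today + (expiry - today) / 2
    return PolicyException(owner, scope, justification, review, expiry)
```

Callers can shorten the lifetime but cannot omit it. Renewal means calling `grant` again, which produces a new record and forces a fresh decision.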

473
00:19:17,360 --> 00:19:20,960
And permanent bypasses are how organizations drift from FinOps into, "We don't know what

474
00:19:20,960 --> 00:19:22,040
we're paying for."

475
00:19:22,040 --> 00:19:24,680
So when you hear "we have policies" offered as proof of governance,

476
00:19:24,680 --> 00:19:27,480
the only correct response is, "Show the enforcement mechanism."

477
00:19:27,480 --> 00:19:31,880
If the platform cannot block the violation, if the workflow cannot force renewal,

478
00:19:31,880 --> 00:19:34,880
and if the exception cannot expire, then the policy has no teeth.

479
00:19:34,880 --> 00:19:36,400
It's a wish list with a logo on it.

480
00:19:36,400 --> 00:19:38,920
And the bill will treat it exactly that way.

481
00:19:38,920 --> 00:19:42,000
Failure mode three: shadow subscriptions and spend outside the graph.

482
00:19:42,000 --> 00:19:46,160
And then there's the third failure mode, the one that makes showback look almost responsible.

483
00:19:46,160 --> 00:19:47,880
Shadow subscriptions.

484
00:19:47,880 --> 00:19:51,160
Spend that never enters your nice, clean allocation model in the first place.

485
00:19:51,160 --> 00:19:54,680
This is what happens when the organization builds a cost governance program around the

486
00:19:54,680 --> 00:19:56,400
subscriptions "we know about,"

487
00:19:56,400 --> 00:20:00,480
then acts surprised when the bill includes things no one can explain.

488
00:20:00,480 --> 00:20:02,680
You can't allocate what you can't enumerate.

489
00:20:02,680 --> 00:20:05,840
You can't hold anyone accountable for a tenant you didn't know existed.

490
00:20:05,840 --> 00:20:10,320
And you definitely can't optimize a workload that finance can't even see as a workload.

491
00:20:10,320 --> 00:20:11,760
Shadow IT didn't die in the cloud.

492
00:20:11,760 --> 00:20:12,760
It evolved.

493
00:20:12,760 --> 00:20:15,360
In the data center era, shadow IT had friction.

494
00:20:15,360 --> 00:20:19,240
Someone had to buy hardware, rack it, power it, hide it under a desk, or at least convince

495
00:20:19,240 --> 00:20:23,000
procurement to purchase something loud enough that IT would notice.

496
00:20:23,000 --> 00:20:24,400
The cost profile was visible.

497
00:20:24,400 --> 00:20:28,040
A big purchase, a project name, a contract, a depreciation schedule.

498
00:20:28,040 --> 00:20:29,520
Cloud removed that friction.

499
00:20:29,520 --> 00:20:34,040
Now a developer can swipe a card, spin up a subscription, create resources, and be billing

500
00:20:34,040 --> 00:20:35,040
in minutes.

501
00:20:35,040 --> 00:20:39,160
A business unit can buy a SaaS tool with an email address and call it a small experiment.

502
00:20:39,160 --> 00:20:42,480
A team can start a trial, forget to cancel and convert to paid.

503
00:20:42,480 --> 00:20:46,240
And because the charges are often small at the line item level, they evade the immune system

504
00:20:46,240 --> 00:20:47,240
of finance.

505
00:20:47,240 --> 00:20:48,480
This is the uncomfortable truth.

506
00:20:48,480 --> 00:20:52,800
Finance processes are tuned to catch large anomalies, not thousands of micro subscriptions.

507
00:20:52,800 --> 00:20:54,200
That's why shadow cloud thrives.

508
00:20:54,200 --> 00:20:55,400
It hides in the long tail.

509
00:20:55,400 --> 00:21:00,320
And in the Microsoft ecosystem specifically, there are a few common pathways. Azure: a dev sandbox

510
00:21:00,320 --> 00:21:04,440
subscription created with a personal Microsoft account or with a temporary project tenant

511
00:21:04,440 --> 00:21:06,240
that never got decommissioned.

512
00:21:06,240 --> 00:21:10,000
A subscription created under a separate billing relationship then peered into your network

513
00:21:10,000 --> 00:21:11,800
because we needed access.

514
00:21:11,800 --> 00:21:15,720
A proof of concept that quietly became production because it worked and nobody wanted to

515
00:21:15,720 --> 00:21:16,760
replatform it.

516
00:21:16,760 --> 00:21:18,320
You can call it agility.

517
00:21:18,320 --> 00:21:20,680
The invoice calls it recurring spend.

518
00:21:20,680 --> 00:21:25,360
Microsoft 365: trial tenants, add-on SKUs purchased by departments.

519
00:21:25,360 --> 00:21:30,520
Power Platform environments created with minimal oversight, Fabric capacities spun up to test

520
00:21:30,520 --> 00:21:32,240
something real quick.

521
00:21:32,240 --> 00:21:36,160
These are especially dangerous because they look like business tooling, not infrastructure.

522
00:21:36,160 --> 00:21:40,600
So they bypass the instinctive controls architects built for Azure subscriptions.

523
00:21:40,600 --> 00:21:45,560
And then there's the procurement angle: SaaS bought outside IT because IT had a backlog.

524
00:21:45,560 --> 00:21:49,280
Or because someone wanted a feature now or because vendor marketing convinced a business

525
00:21:49,280 --> 00:21:51,920
leader that it's just a monthly subscription.

526
00:21:51,920 --> 00:21:55,840
Small monthly subscriptions don't trigger a finance escalation. They accumulate, and

527
00:21:55,840 --> 00:22:00,380
what they accumulate into is governance collapse, because once spend exists outside your known

528
00:22:00,380 --> 00:22:05,440
graph, the organization loses two things at the same time: cost allocation and security posture.

529
00:22:05,440 --> 00:22:09,420
That distinction matters. When the spend is outside your graph, you don't just lose reporting

530
00:22:09,420 --> 00:22:10,420
accuracy.

531
00:22:10,420 --> 00:22:13,600
You lose control of identity, data boundaries, retention and audit.

532
00:22:13,600 --> 00:22:18,020
You lose your ability to apply the same policies (cost center tags, allowed regions, approved

533
00:22:18,020 --> 00:22:21,700
SKUs), because those policies don't reach what you can't see.

534
00:22:21,700 --> 00:22:24,900
So you end up with parallel clouds, the governed one and the actual one.

535
00:22:24,900 --> 00:22:27,120
And the actual one is always bigger than you think.

536
00:22:27,120 --> 00:22:29,860
This is also where showback becomes actively misleading.

537
00:22:29,860 --> 00:22:33,620
Your dashboards might show you every dollar in your enterprise enrollment and still be wrong

538
00:22:33,620 --> 00:22:36,580
because the largest risks often aren't in the biggest subscriptions.

539
00:22:36,580 --> 00:22:40,740
They're in the unowned ones, the ones with no tagging, the ones with a vague name like Innovation

540
00:22:40,740 --> 00:22:43,900
Lab or Project X and no corresponding owner in the org chart.

541
00:22:43,900 --> 00:22:46,100
The real risk is not that someone spent money.

542
00:22:46,100 --> 00:22:50,780
The risk is that someone created a financial and security surface area with no accountability.

543
00:22:50,780 --> 00:22:53,060
So the governance consequence is predictable.

544
00:22:53,060 --> 00:22:58,060
Cost allocation collapses into unallocated spend and arguments, while security posture fragments

545
00:22:58,060 --> 00:23:00,540
into exceptions and blind spots.

546
00:23:00,540 --> 00:23:02,940
Then leadership asks for better showback.

547
00:23:02,940 --> 00:23:06,960
And the organization obediently improves reporting on the part of the system it already

548
00:23:06,960 --> 00:23:07,960
knew about.

549
00:23:07,960 --> 00:23:09,620
This is how the trap reinforces itself.

550
00:23:09,620 --> 00:23:12,740
The fix is not to hunt for shadow subscriptions manually every month.

551
00:23:12,740 --> 00:23:14,220
That's just another form of heroics.

552
00:23:14,220 --> 00:23:18,020
The fix is to make enumeration a first-class governance capability: define what "approved

553
00:23:18,020 --> 00:23:23,220
cloud" means, define how new tenants and subscriptions get created, and treat unknown spend like

554
00:23:23,220 --> 00:23:24,260
an incident.

555
00:23:24,260 --> 00:23:27,220
Not because you're trying to be controlling, but because you're trying to operate a system

556
00:23:27,220 --> 00:23:29,060
that remains coherent.

557
00:23:29,060 --> 00:23:32,460
Because once spend escapes the graph, governance is no longer a design problem.

558
00:23:32,460 --> 00:23:34,020
It becomes a detective story.

559
00:23:34,020 --> 00:23:35,020
And that's when you lose.
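"Treat unknown spend like an incident" implies a reconciliation between what is billed and what is approved. Below is a minimal sketch, assuming billing lines can be exported with an account identifier (subscription, tenant or SaaS account) and that an approved inventory exists as a set of IDs. `BillingLine` and `unknown_spend_incidents` are hypothetical names, not a Microsoft billing API. There is deliberately no minimum-amount threshold, because shadow spend hides in the long tail of small charges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BillingLine:
    account_id: str  # subscription, tenant, or SaaS account the charge came from
    amount: float


def unknown_spend_incidents(lines: list[BillingLine],
                            approved_inventory: set[str]) -> dict[str, float]:
    """Total spend per billed account that is missing from the approved inventory.

    Each key is a candidate incident; no amount is too small to count.
    """
    incidents: dict[str, float] = {}
    for line in lines:
        if line.account_id not in approved_inventory:
            incidents[line.account_id] = incidents.get(line.account_id, 0.0) + line.amount
    return incidents
```

The output is a list of things to enumerate and assign, not a report to read. Each key would open a work item with an owner.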

560
00:23:35,020 --> 00:23:38,300
Now with the failure modes mapped, the pattern is obvious.

561
00:23:38,300 --> 00:23:39,700
Governance isn't documentation.

562
00:23:39,700 --> 00:23:40,700
It's enforced intent.

563
00:23:40,700 --> 00:23:42,300
Governance is not documentation.

564
00:23:42,300 --> 00:23:43,620
It's enforced intent.

565
00:23:43,620 --> 00:23:47,220
So here's the pivot that most organizations refuse to make.

566
00:23:47,220 --> 00:23:49,260
Governance is not the document you publish.

567
00:23:49,260 --> 00:23:51,380
Governance is what the platform will and will not allow.

568
00:23:51,380 --> 00:23:52,380
The PDF is theater.

569
00:23:52,380 --> 00:23:54,020
The control plane is real.

570
00:23:54,020 --> 00:23:58,540
Most FinOps governance programs start as documentation because documentation is politically

571
00:23:58,540 --> 00:23:59,540
cheap.

572
00:23:59,540 --> 00:24:00,540
It doesn't force trade-offs.

573
00:24:00,540 --> 00:24:01,540
It doesn't break deployments.

574
00:24:01,540 --> 00:24:04,860
It doesn't require a fight with a business unit that wants an exception.

575
00:24:04,860 --> 00:24:08,500
It's just words that make leadership feel like the problem is being handled.

576
00:24:08,500 --> 00:24:11,180
But the cloud does not read your policy wiki.

577
00:24:11,180 --> 00:24:14,380
Azure does not care that your standard says required tags.

578
00:24:14,380 --> 00:24:16,980
It will happily accept an untagged resource forever.

579
00:24:16,980 --> 00:24:22,020
Microsoft 365 does not care that your licensing guide says no unapproved add-ons.

580
00:24:22,020 --> 00:24:24,780
It will happily let someone buy the thing you discover three months later.

581
00:24:24,780 --> 00:24:29,780
In other words, governance is control of starting conditions, not review of outcomes.

582
00:24:29,780 --> 00:24:31,060
That distinction matters.

583
00:24:31,060 --> 00:24:32,940
A showback dashboard is an outcome report.

584
00:24:32,940 --> 00:24:35,180
It's a picture of decisions you already allowed.

585
00:24:35,180 --> 00:24:38,340
Real governance designs the conditions under which decisions can be made.

586
00:24:38,340 --> 00:24:40,540
And it forces renewal when the conditions drift.

587
00:24:40,540 --> 00:24:43,780
The reframe: governance as a set of enforced outputs.

588
00:24:43,780 --> 00:24:48,500
Constraints: what can be deployed, where, by whom, and under what metadata requirements.

589
00:24:48,500 --> 00:24:54,100
Workflows: what happens when spend deviates, who must respond, and what closure looks like.

590
00:24:54,100 --> 00:24:58,820
Escalation paths: what happens when a team can't or won't act inside the response window.

591
00:24:58,820 --> 00:24:59,820
Auditability.

592
00:24:59,820 --> 00:25:04,180
Evidence that a human approved risk, accepted variance, or requested an exception that later

593
00:25:04,180 --> 00:25:05,180
expired.

594
00:25:05,180 --> 00:25:07,340
If you don't have those four, you do not have governance.

595
00:25:07,340 --> 00:25:10,500
You have a set of preferences and preferences don't survive scale.

596
00:25:10,500 --> 00:25:12,180
This is the uncomfortable truth.

597
00:25:12,180 --> 00:25:13,740
Guidelines are optional by design.

598
00:25:13,740 --> 00:25:17,420
If you call something a guideline, you have already decided the platform must tolerate

599
00:25:17,420 --> 00:25:18,420
non-compliance.

600
00:25:18,420 --> 00:25:22,060
You have already decided that delivery speed outranks cost discipline.

601
00:25:22,060 --> 00:25:23,340
That can be an intentional choice.

602
00:25:23,340 --> 00:25:25,180
But then stop calling it governance.

603
00:25:25,180 --> 00:25:28,100
Good governance is deterministic, not aspirational.

604
00:25:28,100 --> 00:25:33,180
It creates a predictable system where a deployment either passes the guardrails or it doesn't.

605
00:25:33,180 --> 00:25:37,220
And when it doesn't, there is an explicit lane for exceptions that is slower, more visible,

606
00:25:37,220 --> 00:25:38,220
and time bound.

607
00:25:38,220 --> 00:25:40,420
Not punitive, just real.

608
00:25:40,420 --> 00:25:43,820
Because the purpose of governance isn't to stop work, it's to stop unknown work from

609
00:25:43,820 --> 00:25:45,100
becoming permanent spend.

610
00:25:45,100 --> 00:25:47,340
So what does good look like in this context?

611
00:25:47,340 --> 00:25:50,660
It looks like a deterministic security model, but for cost.

612
00:25:50,660 --> 00:25:53,820
And yes, it's the same pattern security teams learned the hard way.

613
00:25:53,820 --> 00:25:56,820
You don't secure an enterprise by emailing people best practices.

614
00:25:56,820 --> 00:26:01,980
You secure it by making insecure behavior difficult, visible, and expensive to maintain.

615
00:26:01,980 --> 00:26:02,980
Finops is the same.

616
00:26:02,980 --> 00:26:04,900
It's risk management, not cost-cutting.

617
00:26:04,900 --> 00:26:06,420
Cost-cutting is a one-time story.

618
00:26:06,420 --> 00:26:10,100
You found waste, you removed it, you declared victory, the system drifted back, and now

619
00:26:10,100 --> 00:26:12,660
you're doing the same exercise again next quarter.

620
00:26:12,660 --> 00:26:14,580
Risk management is an operating model.

621
00:26:14,580 --> 00:26:18,580
You accept that spend volatility, orphaned assets, and exceptions are risks that must be

622
00:26:18,580 --> 00:26:20,060
continuously constrained.

623
00:26:20,060 --> 00:26:21,580
You don't aim for zero spend.

624
00:26:21,580 --> 00:26:23,260
You aim for spend with intent.

625
00:26:23,260 --> 00:26:24,980
And intent has to be encoded.

626
00:26:24,980 --> 00:26:28,260
This is also why the "central FinOps team" model fails.

627
00:26:28,260 --> 00:26:32,140
A central team can define standards and build tooling, but it cannot own every decision.

628
00:26:32,140 --> 00:26:34,220
The moment it tries, it becomes a bottleneck.

629
00:26:34,220 --> 00:26:36,020
And the organization routes around it.

630
00:26:36,020 --> 00:26:37,500
Then you get shadow subscriptions.

631
00:26:37,500 --> 00:26:41,700
Then you get exceptions by email, then you're back in the trap just with more meetings.

632
00:26:41,700 --> 00:26:44,780
Governance has to be distributed but enforced centrally.

633
00:26:44,780 --> 00:26:46,020
Central defines the invariants.

634
00:26:46,020 --> 00:26:50,780
Tagging requirements, allowed regions, approval workflows, budget escalation contracts.

635
00:26:50,780 --> 00:26:53,460
Distributed owners make the decisions inside those invariants.

636
00:26:53,460 --> 00:26:57,980
They choose sizes, architectures, licensing, retention, and performance trade-offs.

637
00:26:57,980 --> 00:27:01,260
And when distributed owners want to violate the invariants, they can.

638
00:27:01,260 --> 00:27:06,820
But only through an exception lane that creates evidence, enforces expiry, and forces renewal.

639
00:27:06,820 --> 00:27:10,900
The single governance decision that prevents most drift: exceptions expire by default.

640
00:27:10,900 --> 00:27:13,420
Because an exception without expiry is not an exception.

641
00:27:13,420 --> 00:27:14,740
It's a new baseline.

642
00:27:14,740 --> 00:27:19,500
Once you accept that governance is enforced intent, the rest of the episode gets practical.

643
00:27:19,500 --> 00:27:24,060
You need a minimum viable governance stack that can constrain creation, detect variance,

644
00:27:24,060 --> 00:27:27,860
and force closure. Not more reporting: a system of action.

645
00:27:27,860 --> 00:27:29,900
The three enforceable systems of action.

646
00:27:29,900 --> 00:27:31,700
So what replaces showback theater?

647
00:27:31,700 --> 00:27:34,540
Not better dashboards, not more FinOps meetings.

648
00:27:34,540 --> 00:27:37,380
Not another spreadsheet where someone promises to follow up.

649
00:27:37,380 --> 00:27:41,220
A system of action is three enforceable systems working together.

650
00:27:41,220 --> 00:27:43,780
Guardrails, alarms, and actuation.

651
00:27:43,780 --> 00:27:45,700
Each one solves a different failure mode.

652
00:27:45,700 --> 00:27:49,180
And if you skip one, the whole thing collapses back into email and hope.

653
00:27:49,180 --> 00:27:51,180
First, guardrails.

654
00:27:51,180 --> 00:27:54,220
In Azure, the cleanest expression of a guardrail is Azure Policy.

655
00:27:54,220 --> 00:27:57,740
Not because policy is special, but because it sits at the creation boundary.

656
00:27:57,740 --> 00:27:59,180
It can shape starting conditions.

657
00:27:59,180 --> 00:28:02,900
That's the only place governance scales. Azure Policy is the constraint engine.

658
00:28:02,900 --> 00:28:07,460
It's where the organization encodes invariants: required tags, allowed regions, allowed

659
00:28:07,460 --> 00:28:11,460
SKUs, maybe required diagnostic settings, maybe required resource locks for certain classes

660
00:28:11,460 --> 00:28:15,380
of assets, not as recommendations, but as rules that actually evaluate and enforce.

661
00:28:15,380 --> 00:28:16,980
And yes, there's a discipline to it.

662
00:28:16,980 --> 00:28:19,180
Start in audit mode to observe blast radius.

663
00:28:19,180 --> 00:28:20,740
Then move to deny where you can.

664
00:28:20,740 --> 00:28:25,780
Use Modify and DeployIfNotExists where you need the platform to correct things automatically.

665
00:28:25,780 --> 00:28:27,540
But conceptually, this is the point.

666
00:28:27,540 --> 00:28:30,260
You are not asking people to remember to do the right thing.

667
00:28:30,260 --> 00:28:32,380
You are making the wrong thing harder to do.

682
00:29:00,380 --> 00:29:02,380
Early enough that you can still change course.

683
00:29:02,380 --> 00:29:04,380
So budgets and anomaly alerts are the alarm system.

684
00:29:04,380 --> 00:29:06,380
They don't fix anything by themselves.

685
00:29:06,380 --> 00:29:08,380
They generate events.

686
00:29:08,380 --> 00:29:12,380
And events are useless unless they route to a human who is obligated to respond.

687
00:29:12,380 --> 00:29:14,380
Which brings us to the third system.

688
00:29:14,380 --> 00:29:16,380
Actuation. This is the part showback never had.

689
00:29:16,380 --> 00:29:18,380
This is the actuator the observer pattern was missing.

690
00:29:18,380 --> 00:29:20,380
Actuation is workflow automation.

691
00:29:20,380 --> 00:29:21,380
Power Automate.

692
00:29:21,380 --> 00:29:22,380
ServiceNow.

693
00:29:22,380 --> 00:29:26,380
Whatever ITSM or orchestration layer your enterprise already uses.

694
00:29:26,380 --> 00:29:28,380
The tool choice is secondary.

695
00:29:28,380 --> 00:29:29,380
The pattern is non-negotiable.

696
00:29:29,380 --> 00:29:31,380
A cost anomaly is not an email.

697
00:29:31,380 --> 00:29:37,380
It is a work item with an owner, an SLA, required decision states and closure evidence.
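
Here is a minimal sketch of "a work item, not an email", with its lifecycle enforced in code. `CostWorkItem` and its three states are hypothetical. In practice this model would live in ServiceNow, Power Automate, or whatever ITSM layer you already run, not in application code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from typing import List, Optional


class State(Enum):
    OPEN = "open"
    DECIDED = "decided"
    CLOSED = "closed"


@dataclass
class CostWorkItem:
    """A cost anomaly as an owned work item (hypothetical model, not a product schema)."""
    title: str
    owner: str                      # a named, accountable owner
    opened_at: datetime
    sla: timedelta                  # response window before escalation
    state: State = State.OPEN
    decision: Optional[str] = None
    evidence: List[str] = field(default_factory=list)

    def is_breached(self, now: datetime) -> bool:
        """True once the response window has passed without closure."""
        return self.state is not State.CLOSED and now > self.opened_at + self.sla

    def decide(self, decision: str) -> None:
        if self.state is not State.OPEN:
            raise ValueError("a decision has already been recorded")
        self.decision = decision
        self.state = State.DECIDED

    def close(self, evidence: str) -> None:
        # No decision or no evidence means the anomaly was observed, not governed.
        if self.state is not State.DECIDED:
            raise ValueError("cannot close before a decision is recorded")
        if not evidence.strip():
            raise ValueError("closure evidence is required")
        self.evidence.append(evidence)
        self.state = State.CLOSED
```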

698
00:29:37,380 --> 00:29:39,380
This is where the system stops being polite.

699
00:29:39,380 --> 00:29:42,380
Because a workflow can enforce the things humans won't.

700
00:29:42,380 --> 00:29:43,380
Assignment.

701
00:29:43,380 --> 00:29:44,380
Escalation.

702
00:29:44,380 --> 00:29:45,380
Approvals.

703
00:29:45,380 --> 00:29:47,380
Expiry dates and audit trails.

704
00:29:47,380 --> 00:29:51,380
It can also trigger remediation actions when the decision is already defined.

705
00:29:51,380 --> 00:29:52,380
Like auto-tagging.

706
00:29:52,380 --> 00:29:56,380
Stopping dev compute out of hours or quarantining unknown resources.

707
00:29:56,380 --> 00:30:00,380
Now here is the architectural pattern language that ties all three together.

708
00:30:00,380 --> 00:30:01,380
Event. Reasoning.

709
00:30:01,380 --> 00:30:02,380
Orchestration.

710
00:30:02,380 --> 00:30:03,380
Event.

711
00:30:03,380 --> 00:30:04,380
Something changed.

712
00:30:04,380 --> 00:30:05,380
Spend spiked.

713
00:30:05,380 --> 00:30:06,380
A budget threshold crossed.

714
00:30:06,380 --> 00:30:07,380
A policy violation occurred.

715
00:30:07,380 --> 00:30:09,380
An untagged resource appeared.

716
00:30:09,380 --> 00:30:12,380
A new subscription was created outside the approved process.

717
00:30:12,380 --> 00:30:13,380
Reasoning.

718
00:30:13,380 --> 00:30:15,380
What does the organization want to do about that class of event?

719
00:30:15,380 --> 00:30:16,380
Is this acceptable?

720
00:30:16,380 --> 00:30:17,380
Is it expected?

721
00:30:17,380 --> 00:30:18,380
Is it an exception?

722
00:30:18,380 --> 00:30:19,380
Is it a violation?

723
00:30:19,380 --> 00:30:20,380
Is it orphaned?

724
00:30:20,380 --> 00:30:21,380
Is it shared?

725
00:30:21,380 --> 00:30:22,380
Orchestration.

726
00:30:22,380 --> 00:30:23,380
Do the thing.

727
00:30:23,380 --> 00:30:24,380
Assign it.

728
00:30:24,380 --> 00:30:25,380
Approve it.

729
00:30:25,380 --> 00:30:26,380
Assign it.
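
The event, reasoning, orchestration pattern can be sketched as two small functions. The classes mirror the questions listed above: expected, exception, violation, orphaned, shared. Every field name, event kind, and action string is a hypothetical placeholder. A real implementation would read owners from tags and exceptions from your exception register.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostEvent:
    """Event: something changed (all field names are illustrative)."""
    kind: str                     # e.g. "spend_spike", "untagged_resource"
    resource_id: str
    owner: Optional[str] = None   # resolved from the owner tag, if any
    expected: bool = False        # a planned change already on record
    has_exception: bool = False   # covered by an active, unexpired exception
    shared: bool = False          # belongs to a shared platform service


# Hypothetical event kinds that always count as violations.
VIOLATIONS = {"policy_violation", "untagged_resource", "unapproved_subscription"}


def reason(event: CostEvent) -> str:
    """Reasoning: classify the event into a class the organization has a rule for."""
    if event.owner is None:
        return "orphaned"
    if event.shared:
        return "shared"
    if event.has_exception:
        return "exception"
    if event.expected:
        return "expected"
    if event.kind in VIOLATIONS:
        return "violation"
    return "needs_decision"


def orchestrate(event: CostEvent) -> str:
    """Orchestration: do the thing the classification calls for."""
    verdict = reason(event)
    if verdict == "orphaned":
        return f"quarantine {event.resource_id} and ticket the platform owner"
    if verdict == "shared":
        return f"route to shared-platform owner {event.owner} for allocation"
    if verdict in ("exception", "expected"):
        return f"acknowledge {event.resource_id} against the existing record"
    if verdict == "violation":
        return f"assign remediation of {event.resource_id} to {event.owner}"
    return f"assign a decision on {event.resource_id} to {event.owner}"
```

Ownerless events are checked first on purpose. Without an owner there is no one to route to, so the only safe automatic action is containment.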

730
00:30:26,380 --> 00:30:28,380
When people ask, what should we implement first?

731
00:30:28,380 --> 00:30:32,380
And the answer is not Cost Management or Policy or ServiceNow.

732
00:30:32,380 --> 00:30:34,380
The answer is implement a closed loop.

733
00:30:34,380 --> 00:30:38,380
If you only implement guardrails, teams will route around them and you'll get shadow subscriptions.

734
00:30:38,380 --> 00:30:41,380
If you only implement alarms, alerts will die in inboxes.

735
00:30:41,380 --> 00:30:47,380
If you only implement workflows, you'll drown in tickets because you didn't reduce the decision volume at creation time.

736
00:30:47,380 --> 00:30:49,380
The three systems reduce different kinds of entropy.

737
00:30:49,380 --> 00:30:51,380
Guardrails reduce bad creation.

738
00:30:51,380 --> 00:30:54,380
Alarms reduce detection latency.

739
00:30:54,380 --> 00:30:56,380
Actuation reduces closure failure.

740
00:30:56,380 --> 00:30:58,380
There's one more uncomfortable truth buried in this.

741
00:30:58,380 --> 00:31:00,380
Tools don't matter without ownership design.

742
00:31:00,380 --> 00:31:03,380
You can implement the cleanest Azure Policy initiative on Earth.

743
00:31:03,380 --> 00:31:07,380
If the owner tag points to a distribution list, your actuator still doesn't know who to assign the ticket to.

744
00:31:07,380 --> 00:31:09,380
You can set budgets everywhere.

745
00:31:09,380 --> 00:31:12,380
If the alerts go to a shared mailbox, nobody feels compelled to respond.

746
00:31:12,380 --> 00:31:15,380
So the system of action has to be wired to real ownership.

747
00:31:15,380 --> 00:31:19,380
That's why the next dependency is identity, not cost.

748
00:31:19,380 --> 00:31:24,380
Before you can govern spend, you need to know who can say no, who can approve variance, and who can accept risk.

749
00:31:24,380 --> 00:31:26,380
Ownership is the root control plane.

750
00:31:26,380 --> 00:31:28,380
Everything else is just instrumentation.

751
00:31:28,380 --> 00:31:31,380
Ownership is a control plane: the "who can say no" problem.

752
00:31:31,380 --> 00:31:35,380
Ownership is the part everyone agrees with in principle, then quietly sabotages in implementation.

753
00:31:35,380 --> 00:31:37,380
Because real ownership creates friction.

754
00:31:37,380 --> 00:31:40,380
Real ownership means someone can say, "No, you don't get to deploy that."

755
00:31:40,380 --> 00:31:43,380
Or, "Yes, you can keep that spend, but you're accepting it on record."

756
00:31:43,380 --> 00:31:46,380
And most organizations don't actually want that kind of clarity.

757
00:31:46,380 --> 00:31:51,380
They want shared accountability, which is just a polite term for unassigned responsibility.

758
00:31:51,380 --> 00:31:53,380
So here's the definition that matters.

759
00:31:53,380 --> 00:31:57,380
An owner is an accountable person or team that can approve spend and accept risk.

760
00:31:57,380 --> 00:32:00,380
Not a distribution list. Not the cloud team.

761
00:32:00,380 --> 00:32:03,380
Not FinOps. Not IT.

762
00:32:03,380 --> 00:32:05,380
A named owner is a control surface.

763
00:32:05,380 --> 00:32:08,380
It's the routing table for every cost event you claim you want to manage.

764
00:32:08,380 --> 00:32:11,380
And the moment you don't have it, the system falls back to chaos.

765
00:32:11,380 --> 00:32:16,380
Alerts go nowhere, exceptions get approved by whoever is available, and spend becomes shared.

766
00:32:16,380 --> 00:32:18,380
The way technical debt becomes shared.

767
00:32:18,380 --> 00:32:20,380
Everyone benefits, no one pays.

768
00:32:20,380 --> 00:32:22,380
That's the "who can say no" problem.

769
00:32:22,380 --> 00:32:27,380
FinOps programs collapse because they assume that if they assign a cost to a team, that team can act.

770
00:32:27,380 --> 00:32:29,380
But "can act" is an authorization problem.

771
00:32:29,380 --> 00:32:34,380
If the people receiving the showback don't have the rights, the time, or the mandate to change anything,

772
00:32:34,380 --> 00:32:36,380
you just created an inbox ritual.

773
00:32:36,380 --> 00:32:39,380
So ownership isn't a label, it's an authority model.

774
00:32:39,380 --> 00:32:43,380
And the scope matters. At enterprise scale you typically need ownership at three layers,

775
00:32:43,380 --> 00:32:46,380
and the mistake is pretending one layer covers all of them.

776
00:32:46,380 --> 00:32:48,380
First, subscription or tenant ownership.

777
00:32:48,380 --> 00:32:54,380
Someone owns the boundary. They decide who can create resources, which policies apply, which budgets exist,

778
00:32:54,380 --> 00:32:56,380
and what the default guardrails are.

779
00:32:56,380 --> 00:32:59,380
If no one owns the boundary, then any workload can bring its own rules.

780
00:32:59,380 --> 00:33:01,380
And now you're running a multi-cloud inside one cloud.

781
00:33:01,380 --> 00:33:04,380
Second, workload or product ownership.

782
00:33:04,380 --> 00:33:06,380
Someone owns the actual system that spends money.

783
00:33:06,380 --> 00:33:09,380
The subscription might be shared, but the workload cannot be.

784
00:33:09,380 --> 00:33:15,380
If a product team can't name a workload owner, what they actually have is a collection of resources with no life cycle.

785
00:33:15,380 --> 00:33:17,380
That's not a product, that's a liability.

786
00:33:17,380 --> 00:33:19,380
Third, shared platform ownership.

787
00:33:19,380 --> 00:33:24,380
Networking, identity, logging, security tooling, CI/CD runners, shared databases, integration layers:

788
00:33:24,380 --> 00:33:26,380
these are legitimate shared costs.

789
00:33:26,380 --> 00:33:31,380
But they still require an owner who can set allocation rules and enforce consumption constraints.

790
00:33:31,380 --> 00:33:35,380
Otherwise, "shared services" becomes the permanent excuse for ungovernable spend.

791
00:33:35,380 --> 00:33:38,380
Now here's the rule that makes ownership real.

792
00:33:38,380 --> 00:33:42,380
Every resource must map to an owner at creation time, not at month end.

793
00:33:42,380 --> 00:33:45,380
Not when finance asks, not after a cost spike.

794
00:33:45,380 --> 00:33:48,380
Creation time is the only point where the platform still has leverage.

795
00:33:48,380 --> 00:33:50,380
After that, resources become politically protected.

796
00:33:50,380 --> 00:33:54,380
"It's running production. We can't touch it. We'll fix it next sprint."

797
00:33:54,380 --> 00:34:00,380
You know the script. So if you want accountability, you enforce ownership as a precondition for existence.

798
00:34:00,380 --> 00:34:02,380
That sounds harsh until you remember the alternative.

799
00:34:02,380 --> 00:34:07,380
Paying indefinitely for things nobody will claim. This is also why the "central FinOps team owns everything"

800
00:34:07,380 --> 00:34:11,380
anti-pattern is so persistent and so destructive. It feels efficient.

801
00:34:11,380 --> 00:34:14,380
One team, one model, one dashboard, one place to ask questions.

802
00:34:14,380 --> 00:34:17,380
But architecturally it creates two outcomes. First, it creates a bottleneck.

803
00:34:17,380 --> 00:34:21,380
Every decision queues behind the same team. Therefore teams route around it.

804
00:34:21,380 --> 00:34:23,380
Second, it creates abdication.

805
00:34:23,380 --> 00:34:26,380
Product teams stop thinking about cost because "FinOps owns that."

806
00:34:26,380 --> 00:34:30,380
Then when something goes wrong, they blame the FinOps team for not catching it sooner.

807
00:34:30,380 --> 00:34:35,380
Meanwhile, the FinOps team never had the authority to prevent the decision in the first place.

808
00:34:35,380 --> 00:34:38,380
So the correct pattern is federated ownership with centralized enforcement.

809
00:34:38,380 --> 00:34:41,380
Central defines the invariants and the exception lane.

810
00:34:41,380 --> 00:34:45,380
Distributed owners operate inside those invariants and carry the consequences.

811
00:34:45,380 --> 00:34:48,380
Now, none of this works unless ownership can be exercised.

812
00:34:48,380 --> 00:34:53,380
Owners must have the ability to approve spend changes and the ability to accept risk explicitly.

813
00:34:53,380 --> 00:34:57,380
That means they need clear levers: budgets they control, tags that map spend to them,

814
00:34:57,380 --> 00:35:03,380
and workflows that force them to decide. If ownership exists only on paper, you didn't build a control plane.

815
00:35:03,380 --> 00:35:06,380
You built a phone tree. And this is where the system gets uncomfortable

816
00:35:06,380 --> 00:35:09,380
fast. Ownership means someone will be blamed. Good.

817
00:35:09,380 --> 00:35:15,380
Blame is not the goal, but consequence is. Because without consequence, showback stays what it always was:

818
00:35:15,380 --> 00:35:18,380
high-resolution awareness with zero obligation.

819
00:35:18,380 --> 00:35:22,380
So the next step is mechanical. You take ownership and you encode it as metadata.

820
00:35:22,380 --> 00:35:32,380
Not for reporting, for enforcement. That means tags, mandatory at deployment time. Mandatory tagging at deployment time: cost center plus owner as non-negotiable metadata.

821
00:35:32,380 --> 00:35:36,380
Once ownership exists as a concept, it has to become computable.

822
00:35:36,380 --> 00:35:40,380
Otherwise, it's just a slide in a governance deck. That's what mandatory tagging really is.

823
00:35:40,380 --> 00:35:49,380
Turning "who owns this" into something the platform can evaluate, route, and enforce. Not next month, not after a variance meeting: at creation time.

824
00:35:49,380 --> 00:35:56,380
Most organizations treat tags like decoration: cost center, owner, environment, workload. Nice to have.

825
00:35:56,380 --> 00:36:02,380
Useful for reporting, maybe a KPI for tag compliance. That framing is why it fails. Tags are allocation primitives.

826
00:36:02,380 --> 00:36:10,380
They are the join keys between three different worlds that never naturally agree: cloud resources, finance budgets, and human accountability.

827
00:36:10,380 --> 00:36:16,380
If the join keys are missing, your entire operating model becomes guesswork, and guesswork is the root cause of cost theater.

828
00:36:16,380 --> 00:36:22,380
So the minimum viable tag set is boring on purpose. Cost center and owner are non-negotiable.

829
00:36:22,380 --> 00:36:30,380
Environment and workload or app usually come next because prod and dev behave differently and because you need to group spend into something meaningful.

830
00:36:30,380 --> 00:36:35,380
But the critical part isn't the list, it's when and how they get applied. Manual tagging is an entropy generator.

831
00:36:35,380 --> 00:36:40,380
If you ask people to remember to tag, they won't. Not because they're careless but because the system rewards speed.

832
00:36:40,380 --> 00:36:49,380
Tagging happens at the end of a deployment under time pressure right after someone finally got a pipeline working. That's the moment your governance depends on voluntary discipline. It will decay.

833
00:36:49,380 --> 00:36:56,380
So tagging has to move left, to deployment time, and it has to be enforced: Deny, Modify, or DeployIfNotExists, depending on what you're controlling.

834
00:36:56,380 --> 00:37:01,380
The exact mechanism is less important than the architectural principle.

835
00:37:01,380 --> 00:37:06,380
If a resource cannot be mapped to an owner and a cost center, it should not exist.

836
00:37:06,380 --> 00:37:15,380
It sounds extreme until you observe the alternative: a growing inventory of anonymous spend that no one will delete, no one will right-size, and no one can justify.

837
00:37:15,380 --> 00:37:21,380
You can't hold people accountable for costs you can't attribute reliably, and you can't attribute reliably if the attribution is optional.

838
00:37:21,380 --> 00:37:27,380
Now the next predictable objection is shared services: "We can't tag those to one cost center."

839
00:37:27,380 --> 00:37:32,380
Correct, that's why shared services need an allocation model that's explicit, stable and revisited.

840
00:37:32,380 --> 00:37:39,380
The goal isn't to pretend shared costs aren't shared, the goal is to prevent shared from becoming unknown. A platform subscription can still have an owner.

841
00:37:39,380 --> 00:37:47,380
The platform team can still have a cost center, and you can still build a split-charge model: platform tags, consumption-based allocation, or a fixed rate.

842
00:37:47,380 --> 00:37:52,380
Pick your poison. The point is to pick it deliberately and enforce the metadata that makes it possible.
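
To show what "pick it deliberately" can look like, here is a sketch of the consumption-based option only. It assumes you already have a per-cost-center usage measure for the shared service, such as requests, GB stored, or vCPU hours. `allocate_shared_cost` is a hypothetical helper. It works in integer cents and hands out rounding remainders by largest fractional share, so the split always sums exactly to the bill.

```python
from typing import Dict


def allocate_shared_cost(total_cents: int, usage: Dict[str, float]) -> Dict[str, int]:
    """Split a shared platform bill across cost centers in proportion to usage.

    Hypothetical helper: `usage` maps cost center -> consumption in any consistent unit.
    """
    total_usage = sum(usage.values())
    if total_usage <= 0:
        raise ValueError("usage must be positive to allocate by consumption")
    exact = {cc: total_cents * u / total_usage for cc, u in usage.items()}
    alloc = {cc: int(v) for cc, v in exact.items()}  # floor each share
    leftover = total_cents - sum(alloc.values())
    # Give the remaining cents to the cost centers with the largest fractional parts.
    for cc in sorted(exact, key=lambda c: exact[c] - alloc[c], reverse=True)[:leftover]:
        alloc[cc] += 1
    return alloc
```

A fixed-rate model would replace `usage` with agreed weights. The code stays the same; only the input's meaning changes, and that choice is exactly what needs to be explicit and revisited.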

843
00:37:52,380 --> 00:37:59,380
Because shared services are where accountability goes to die. Not due to malice, but due to ambiguity, and ambiguity always wins unless you remove it.

844
00:37:59,380 --> 00:38:06,380
This is also where people make the second foundational mistake. They aim for perfect tagging instead of enforceable tagging. Perfect tagging is a fantasy.

845
00:38:06,380 --> 00:38:14,380
Enforceable tagging is a system. Enforceable tagging accepts that tags will be wrong sometimes. Owners change, cost centers get renamed, workloads get split. That's fine.

846
00:38:14,380 --> 00:38:21,380
But enforceable tagging makes the drift visible and correctable because every resource has to carry an identity for who is responsible today.

847
00:38:21,380 --> 00:38:32,380
And when that identity becomes stale, it becomes a life cycle event, not a surprise. So you design for that reality. Tags must be validated, not just present. Owner tags should map to real identities, not free text.

848
00:38:32,380 --> 00:38:41,380
Cost centers should be from a controlled set, not whatever someone typed at 2am. If you allow arbitrary values, you just move the argument from "who owns this cost?"

849
00:38:41,380 --> 00:38:54,380
To "what did you mean by this tag value?" It's the same dysfunction with better formatting. And no, this is not a FinOps team project alone. This is an identity and governance project, because the metadata has to be authoritative enough to drive automation.
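
"Validated, not just present" can be sketched as a check that returns reasons a resource would be unroutable. Both lookup sets are hypothetical stand-ins. In a real estate, the owner check would query your identity directory and the cost-center list would come from finance's system of record. The tag keys `owner` and `costCenter` are also assumptions.

```python
from typing import Dict, List

# Hypothetical controlled set of cost centers (normally owned by finance).
ALLOWED_COST_CENTERS = {"CC-1001", "CC-1002", "CC-2040"}
# Hypothetical stand-in for a directory lookup of real, individual identities.
KNOWN_IDENTITIES = {"alice@example.com", "bob@example.com"}


def validate_ownership_tags(tags: Dict[str, str]) -> List[str]:
    """Return the problems that would make this resource unroutable (empty list = OK)."""
    problems = []
    owner = tags.get("owner", "").strip().lower()
    cost_center = tags.get("costCenter", "").strip()
    if not owner:
        problems.append("missing owner tag")
    elif owner not in KNOWN_IDENTITIES:
        # Distribution lists and free text fail here: they can't be assigned work.
        problems.append(f"owner '{owner}' is not a known individual identity")
    if not cost_center:
        problems.append("missing costCenter tag")
    elif cost_center not in ALLOWED_COST_CENTERS:
        problems.append(f"costCenter '{cost_center}' is not in the controlled set")
    return problems
```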

850
00:38:54,380 --> 00:39:01,380
If the tags can't be trusted, you can't route tickets, you can't escalate budgets, you can't quarantine resources safely. You're back to manual investigation.

851
00:39:01,380 --> 00:39:16,380
So tagging becomes a control surface: it enables guardrails, alarms, and actuation. Guardrails use tags to decide what can be deployed and where. Alarms use tags to decide who gets paged. Actuation uses tags to decide who gets assigned, who approves, and who gets blamed when an exception expires.

852
00:39:16,380 --> 00:39:21,380
This is why the phrase "tagging strategy" is misleading. A strategy is optional. This is a contract.

853
00:39:21,380 --> 00:39:31,380
And here's the irony: once you enforce tagging at deployment time, showback actually becomes useful, because now your visibility is attached to an accountable owner by design, not by spreadsheet inference.

854
00:39:31,380 --> 00:39:38,380
You can route cost to decisions, not just to teams in theory. That's when the dashboard stops being wallpaper and starts being evidence.

855
00:39:38,380 --> 00:39:48,380
Azure Policy as a cost guardrail engine, not a compliance checkbox. Once tagging becomes a contract, you need something that enforces the contract when people are moving fast. That's Azure Policy.

856
00:39:48,380 --> 00:39:59,380
And no, Azure Policy is not a compliance feature; that's the framing that kills it. Compliance is usually audit after the fact. Azure Policy is valuable when it is precondition enforcement. It shapes what can exist.

857
00:39:59,380 --> 00:40:09,380
So treat policy as a cost guardrail engine: a distributed decision compiler that evaluates deployments against your invariants and either allows them, fixes them, or blocks them.

858
00:40:09,380 --> 00:40:19,380
This is the part where most organizations sabotage themselves. They deploy policy in audit mode, generate a beautiful compliance dashboard, and then stop. Audit mode is not governance. Audit mode is reconnaissance.

859
00:40:19,380 --> 00:40:27,380
Reconnaissance is useful. Reconnaissance is not control. Azure Policy has a set of effects, and you don't need the full taxonomy to understand the architecture.

860
00:40:27,380 --> 00:40:34,380
You need the intent behind them. Deny is the deterministic one. It refuses the deployment. That is what turns guidance into constraint.

861
00:40:34,380 --> 00:40:44,380
When a resource requires an owner and cost center tag, deny makes missing metadata impossible. When a region is disallowed, deny prevents the cost and data gravity of the wrong geography from ever happening.
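
As an illustration of a required-tag guardrail, the sketch below builds the `mode`, `parameters`, and `policyRule` portion of a custom Azure Policy definition and prints it as JSON. The tag names are assumptions. Making the effect a parameter is a design choice, not a requirement: it lets the same definition be assigned in Audit first and switched to Deny later. Check the result against the Azure Policy definition-structure documentation before deploying. Azure also ships built-in definitions such as "Require a tag on resources", which may be all you need.

```python
import json
from typing import List

# Assumed tag names; substitute the ones your tagging contract actually requires.
REQUIRED_TAGS = ["owner", "costCenter"]


def required_tags_policy(tags: List[str]) -> dict:
    """Build the core of a custom Azure Policy definition that matches resources
    missing any required tag. The effect is parameterized (Audit by default)."""
    return {
        "mode": "Indexed",  # evaluate only resource types that support tags and location
        "parameters": {
            "effect": {
                "type": "String",
                "allowedValues": ["Audit", "Deny", "Disabled"],
                "defaultValue": "Audit",
            }
        },
        "policyRule": {
            # Match if ANY required tag is absent.
            "if": {"anyOf": [{"field": f"tags['{t}']", "exists": "false"} for t in tags]},
            "then": {"effect": "[parameters('effect')]"},
        },
    }


print(json.dumps(required_tags_policy(REQUIRED_TAGS), indent=2))
```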

862
00:40:44,380 --> 00:40:53,380
Audit is evidence. It tells you what would have violated the rule, and that's how you measure blast radius before you tighten enforcement. Modify is the most underrated. It auto-corrects.

863
00:40:53,380 --> 00:40:59,380
It's how you reduce human friction while still enforcing your invariants. If you can derive a tag from context, Modify can apply it.

864
00:40:59,380 --> 00:41:06,380
If you can normalize tag casing or standard values, Modify can remove the "we typed it differently" excuse.
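
This is a local illustration of the kind of correction a Modify policy can make. It is not Azure's engine, only a stand-in to show the logic: canonicalize tag-key casing, then inherit missing tags from the resource group without overwriting what the resource already carries. `CANONICAL_KEYS` and `normalize_tags` are hypothetical.

```python
from typing import Dict

# Hypothetical canonical spellings for the tag keys in the contract.
CANONICAL_KEYS = {"owner": "owner", "costcenter": "costCenter", "environment": "environment"}


def normalize_tags(resource_tags: Dict[str, str], rg_tags: Dict[str, str]) -> Dict[str, str]:
    """Simulate Modify-style fixes: canonical key casing, trimmed values,
    and missing tags inherited from the resource group."""
    fixed: Dict[str, str] = {}
    for key, value in resource_tags.items():
        canonical = CANONICAL_KEYS.get(key.lower(), key)  # "CostCenter" -> "costCenter"
        fixed[canonical] = value.strip()
    for key, value in rg_tags.items():
        canonical = CANONICAL_KEYS.get(key.lower(), key)
        fixed.setdefault(canonical, value.strip())  # only fill gaps; never overwrite
    return fixed
```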

865
00:41:06,380 --> 00:41:16,380
DeployIfNotExists is how you force baseline hygiene into the environment: diagnostic settings, resource locks, maybe configuration that makes the cost surface observable enough to govern.

866
00:41:16,380 --> 00:41:25,380
Now the guardrails that matter for cost are not exotic. They're boring, and that's why they work. Require tags, enforce allowed locations, enforce allowed SKUs for specific classes of workloads.

867
00:41:25,380 --> 00:41:34,380
Restrict premium tiers in non-production. Block creation of certain resource types in the wrong scope. Require shutdown schedules or automation hooks for dev/test where you can.

868
00:41:34,380 --> 00:41:39,380
This is not about being draconian, it's about removing the default ability to make expensive decisions by accident.

869
00:41:39,380 --> 00:41:44,380
Because most cloud waste isn't malicious, it's just the platform's permissive defaults meeting human urgency.

870
00:41:44,380 --> 00:41:48,380
Here's the discipline that prevents self-inflicted outages. Audit first.

871
00:41:48,380 --> 00:41:58,380
If you flip deny on day one at the wrong scope, you will break someone's pipeline and they will route around you for the next two years. That's not a joke. That's how shadow subscriptions get born.

872
00:41:58,380 --> 00:42:07,380
So start in audit mode to observe, measure how many things would fail, fix the obvious gaps, build the exception lane, then move to deny where the organization has agreed that the invariant is real.
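
The "measure how many things would fail" step can be prototyped offline before any assignment exists, for example against an inventory exported from Azure Resource Graph. The record shape used here (`id`, `subscription`, `tags`) is an assumption. `audit_blast_radius` is a hypothetical dry run, not a replacement for Azure Policy's own compliance results.

```python
from typing import Dict, List, Tuple


def audit_blast_radius(
    resources: List[dict], required: Tuple[str, ...] = ("owner", "costCenter")
) -> dict:
    """Dry-run a required-tag rule over an inventory export: how much would Deny
    have blocked, and in which subscriptions? (hypothetical record shape)"""
    failing_by_sub: Dict[str, int] = {}
    total_failing = 0
    for r in resources:
        tags = r.get("tags") or {}  # untagged resources may export tags as null
        if any(not tags.get(t) for t in required):
            total_failing += 1
            sub = r.get("subscription", "unknown")
            failing_by_sub[sub] = failing_by_sub.get(sub, 0) + 1
    share = total_failing / len(resources) if resources else 0.0
    return {"failing": total_failing, "share": share, "by_subscription": failing_by_sub}
```

The per-subscription breakdown points to where to fix gaps and build the exception lane before switching the effect to Deny.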

873
00:42:07,380 --> 00:42:15,380
And scope is where this becomes architectural. If you apply policy at a management group, you're defining org level invariants. The rules that should not vary.

874
00:42:15,380 --> 00:42:22,380
Required tags, allowed regions, baseline security and logging. That's a reasonable place to put "you must be attributable to an owner."

875
00:42:22,380 --> 00:42:33,380
If you apply policy at subscription scope, you're defining environment level constraints. Dev subscriptions can be tighter, production subscriptions can be stricter on change paths, platform subscriptions can have different allowed resources.

876
00:42:33,380 --> 00:42:41,380
If you apply policy at resource group scope, you're doing workload level tailoring, which is useful. But it is also where exceptions silently multiply.
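
The sketch below models why lower-scope special cases need tracking. In Azure, policy assignments are inherited down the management group, subscription, and resource group hierarchy and accumulate. A lower scope does not loosen a parent's assignment except through exclusions or exemptions, and exemptions can carry an expiry date. The scope paths, policy names, and dictionaries here are hypothetical simplifications, not ARM resource IDs or the Azure API.

```python
from datetime import date
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical policy assignments per scope; they accumulate down the tree.
ASSIGNMENTS: Dict[str, Set[str]] = {
    "/mg-root": {"require-owner-costcenter", "allowed-locations"},
    "/mg-root/sub-dev": {"no-premium-skus-nonprod"},
}

# Hypothetical exemptions: (scope, policy) -> expiry date, or None if open-ended.
EXEMPTIONS: Dict[Tuple[str, str], Optional[date]] = {
    ("/mg-root/sub-dev/rg-ml", "no-premium-skus-nonprod"): date(2024, 9, 30),
    ("/mg-root/sub-dev/rg-ml", "allowed-locations"): None,
}


def ancestors(scope: str) -> List[str]:
    """'/a/b/c' -> ['/a', '/a/b', '/a/b/c']"""
    parts = scope.strip("/").split("/")
    return ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]


def effective_policies(scope: str, today: date) -> Set[str]:
    """Union of inherited assignments, minus exemptions that are still in force."""
    chain = ancestors(scope)
    active: Set[str] = set()
    for s in chain:
        active |= ASSIGNMENTS.get(s, set())
    for (ex_scope, policy), expires in EXEMPTIONS.items():
        if ex_scope in chain and (expires is None or today < expires):
            active.discard(policy)
    return active


def untracked_exemptions() -> List[Tuple[str, str]]:
    """Open-ended exemptions: the special cases that silently become the baseline."""
    return [key for key, expires in EXEMPTIONS.items() if expires is None]
```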

877
00:42:41,380 --> 00:42:49,380
Be careful. Every special case at lower scope is an entropy generator unless it is tracked and time bound. And here's the key point that has to stand on its own.

878
00:42:49,380 --> 00:42:55,380
A policy without enforcement is a wish list. You can call it a standard, you can call it governance, you can put it in a wiki.

879
00:42:55,380 --> 00:43:01,380
The platform will still happily let people violate it, and then showback will report the violation after you've already paid for it.

880
00:43:01,380 --> 00:43:10,380
This is also why Azure Policy doesn't replace budgets or workflows. It reduces decision volume by blocking known bad starting conditions. It prevents whole classes of waste from ever entering the estate.

881
00:43:10,380 --> 00:43:17,380
But it won't catch everything. It can't tell you whether a workload is worth it. It can't decide that a spike is legitimate.

882
00:43:17,380 --> 00:43:24,380
It can't negotiate trade-offs between performance and cost. That's where alarms and actuation come next. Policy defines the boundaries of acceptable behavior.

883
00:43:24,380 --> 00:43:29,380
Budgets detect drift inside the boundaries. Workflows force a human decision when the drift matters.

884
00:43:29,380 --> 00:43:35,380
So if your current FinOps story is "we have a tagging policy and a dashboard," you are missing the only part that scales:

885
00:43:35,380 --> 00:43:42,380
the refusal. Because until the platform can say no, you're not running guardrails; you're running polite suggestions at cloud speed.

886
00:43:42,380 --> 00:43:51,380
Budgets as escalation contracts: alerts with teeth. Now, budgets. This is where most organizations accidentally reveal they never wanted accountability in the first place.

887
00:43:51,380 --> 00:43:59,380
They treat budgets like a finance document, a number someone said during planning season, then ignored until month end, then explained away.

888
00:43:59,380 --> 00:44:08,380
And Azure Cost Management happily supports that pattern because it can generate a nice alert email and call it governance. But budgets aren't a reporting feature. They're an escalation contract.

889
00:44:08,380 --> 00:44:20,380
A budget is the organization saying if spend crosses this line someone must respond in a defined time window with a defined set of options and a defined escalation path if they don't. That's not accounting, that's incident management.

890
00:44:20,380 --> 00:44:25,380
The reason this matters is simple. Showback fails because it creates awareness with no consequence.

891
00:44:25,380 --> 00:44:31,380
Budgets are how you attach consequence to variance without needing a heroic FinOps analyst to chase every anomaly.

892
00:44:31,380 --> 00:44:39,380
So the first design rule is non-negotiable. Budgets must be tied to ownership. If your budget alert goes to a shared mailbox, you don't have an alert, you have spam.

893
00:44:39,380 --> 00:44:48,380
If your budget alert goes to a FinOps distribution list, you don't have accountability, you have outsourced guilt. The alert has to hit the person or team that can actually change the system.

894
00:44:48,380 --> 00:44:57,380
The workload owner, the subscription owner or the platform owner, depending on what you're budgeting. And it has to be explicit which one it is because otherwise everyone assumes it's someone else.

895
00:44:57,380 --> 00:45:06,380
Second rule, budgets are not only for over budget. The goal isn't a zero variance fantasy. The goal is reduced volatility and faster detection of drift.

896
00:45:06,380 --> 00:45:11,380
Variance is a signal that either the system behavior changed or your assumptions changed. Either way, it requires a decision.

897
00:45:11,380 --> 00:45:22,380
So the budget thresholds need to be staged: early warning, action required, escalation. Not because you enjoy bureaucracy, but because humans ignore binary alarms. They need time to react before you hit the wall.

898
00:45:22,380 --> 00:45:31,380
A practical pattern is to notify at 50% and 75%, require a response at 90%, and escalate at 100%. The exact numbers don't matter. The existence of a ladder does.

899
00:45:31,380 --> 00:45:46,380
Because an escalation ladder is what turns "we noticed" into "we did something." And the ladder should be explicit: team owner first, then platform owner, then finance partner, then exec sponsor. It's not about shaming. It's about ensuring the decision reaches someone with the authority to accept risk or reforecast.
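
The threshold ladder and the escalation chain just described can be sketched together. The percentages and the four rungs come from the example above. `budget_stage` and `escalation_target` are hypothetical helpers, and the assumption that each missed response window moves the decision up exactly one rung is illustrative. Azure budgets can raise alerts at percentage thresholds, but mapping those alerts to stages and people is your workflow's job.

```python
from typing import List, Tuple

# Stages from the example ladder: (threshold as % of budget, required action).
# Ordered highest first so the most severe matching stage wins.
STAGES: List[Tuple[float, str]] = [
    (100, "escalate"),
    (90, "response_required"),
    (75, "notify"),
    (50, "notify"),
]
ESCALATION_CHAIN = ["team owner", "platform owner", "finance partner", "exec sponsor"]


def budget_stage(spend: float, budget: float) -> str:
    """Map current spend to the action the ladder requires."""
    pct = 100 * spend / budget
    for threshold, action in STAGES:
        if pct >= threshold:
            return action
    return "none"


def escalation_target(missed_slas: int) -> str:
    """Assumption: each missed response window moves the decision one rung up."""
    return ESCALATION_CHAIN[min(missed_slas, len(ESCALATION_CHAIN) - 1)]
```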

900
00:45:46,380 --> 00:45:54,380
That last part matters more than people admit. Most organizations act like the only acceptable response to a budget breach is to optimize. No.

901
00:45:54,380 --> 00:46:03,380
There are four legitimate response types, and the system needs to allow all of them or people will lie. Acknowledge: yes, this is expected, and here's why.

902
00:46:03,380 --> 00:46:14,380
Mitigate: yes, this is unplanned, and here's what we will change by when. Request exception: yes, we need to violate the budget temporarily, and here's the justification, scope, and expiry.

903
00:46:14,380 --> 00:46:28,380
Reforecast: yes, the budget was wrong because the business changed, and we're updating the plan with an accountable approver. If you only allow mitigate, teams will invent narratives to avoid escalation. If you allow all four, you get truth, and governance runs on truth, not optimism.
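
One way to make all four responses legitimate but not free is to give each one a minimum set of evidence. The field requirements below follow the descriptions above: a change date for mitigate, a scope and expiry for an exception, an approver for a reforecast. `BudgetResponse` and `missing_fields` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class Response(Enum):
    ACKNOWLEDGE = "acknowledge"
    MITIGATE = "mitigate"
    REQUEST_EXCEPTION = "request_exception"
    REFORECAST = "reforecast"


@dataclass
class BudgetResponse:
    """An owner's answer to a budget breach (hypothetical record)."""
    kind: Response
    justification: str
    change_due: Optional[date] = None   # MITIGATE: what changes, by when
    scope: Optional[str] = None         # REQUEST_EXCEPTION: what the exception covers
    expires_on: Optional[date] = None   # REQUEST_EXCEPTION: exceptions must expire
    approver: Optional[str] = None      # REFORECAST: accountable approver of the new plan


def missing_fields(r: BudgetResponse) -> List[str]:
    """Each legitimate response type has its own minimum evidence; list what's absent."""
    required = {
        Response.ACKNOWLEDGE: [],
        Response.MITIGATE: ["change_due"],
        Response.REQUEST_EXCEPTION: ["scope", "expires_on"],
        Response.REFORECAST: ["approver"],
    }[r.kind]
    gaps = [name for name in required if getattr(r, name) is None]
    if not r.justification.strip():
        gaps.insert(0, "justification")  # every response type needs a reason
    return gaps
```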

904
00:46:28,380 --> 00:46:32,380
Now here's the part people get wrong: they think the budget is the control. It isn't.

905
00:46:32,380 --> 00:46:47,380
The budget is the trigger that creates a decision moment. Without a required workflow, budget alerts die like every other email. People see them, mentally file them under later, and then later becomes the invoice. So a budget without routing and an SLA is just another observer with no actuator.

906
00:46:47,380 --> 00:46:51,380
This is why the budget has to create work. Not conversation: work.

907
00:46:51,380 --> 00:47:05,380
A cost alert should open a ticket, assign it to an owner, set a response timer, and require closure evidence. If the organization won't do that, then stop pretending budgets are governance. They're just notifications. And budgets need to be designed with the right metric focus.

908
00:47:05,380 --> 00:47:17,380
Most teams obsess over absolute spend. That's fine for finance optics. Operationally, volatility is the killer. Volatility means you can't plan. Volatility means surprise. Volatility means rushed decisions under pressure.

909
00:47:17,380 --> 00:47:26,380
The budget system should aim to shrink the amplitude of surprises: fewer spikes, faster response, less orphan noise, fewer unallocated charges.

910
00:47:26,380 --> 00:47:33,380
When that works, savings show up as a side effect, not the headline. Because the real win isn't "we saved money this month."

911
00:47:33,380 --> 00:47:40,380
The real win is "spend became predictable, attributable and intentional." And once you have that, you can actually make trade-offs like adults.

912
00:47:40,380 --> 00:47:51,380
The problem is obvious. Even with budgets and escalation, alerts still die unless the organization has a real actuator. So the missing piece is workflow automation: the part that forces closure and leaves evidence behind.

913
00:47:51,380 --> 00:48:00,380
Workflow automation as the missing actuator. Now you need the actuator, because guardrails prevent some bad creation and budgets detect drift, but neither one forces a human decision to finish.

914
00:48:00,380 --> 00:48:14,380
Without actuation, alerts become background noise and "we'll look at it" becomes the organization's primary control mechanism. That is not a mechanism. Workflow automation is the missing piece because it turns a cost signal into an owned work item with consequences.

915
00:48:14,380 --> 00:48:22,380
And yes, that means you have to treat cost anomalies like incidents. Not because money is more important than uptime, but because unmanaged spend is still a production problem.

916
00:48:22,380 --> 00:48:38,380
It drains budget, increases risk and forces bad decisions later under pressure. So the mental model is simple: an anomaly is not an email, an anomaly is an incident record. It has an owner, it has an SLA, it has a required outcome and it has closure evidence. If you can't produce evidence, you didn't govern it; you observed it.

917
00:48:38,380 --> 00:48:49,380
This is where ServiceNow or your ITSM platform earns its keep: not as a ticketing system, but as an accountability compiler. It takes a fuzzy event ("cost is up") and forces it through a deterministic process.

918
00:48:49,380 --> 00:48:59,380
Assignment, triage, decision, remediation and closure. Power Automate can play the same role if you don't have a full ITSM estate. The platform choice doesn't matter; the behavior does.

919
00:48:59,380 --> 00:49:16,380
Here's what the workflow has to contain, or it will decay into theater again. First, a consistent object model: a ticket with fields that matter (scope, subscription, management group, workload, affected service, cost delta, forecast impact) and the owner derived from authoritative metadata.

920
00:49:16,380 --> 00:49:31,380
Not "someone decides who owns it." The system decides, based on tags and subscription ownership. Second, a response contract. The ticket must require a response type: the same four options you defined for budgets, acknowledge, mitigate, request exception or reforecast.

921
00:49:31,380 --> 00:49:35,380
Those aren't just words. Each option should drive the next step automatically.

922
00:49:35,380 --> 00:49:49,380
Acknowledge means you attach a rationale and an approver and the system records it. Mitigate means you attach an action plan and due date and the system follows up until it's done. Request exception means you enter the exception workflow, not an email thread.

923
00:49:49,380 --> 00:49:57,380
Reforecast means finance gets pulled in with a named sponsor who accepts the new number. Third, assignment and escalation.

924
00:49:57,380 --> 00:50:08,380
If the owner doesn't respond in the window, the system escalates. Not because you enjoy escalation, but because delayed decisions are still decisions; they are just decisions made by default, by the platform, via continued billing.
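A minimal sketch of the ticket object and its escalation rule, assuming a tag key named `owner` and a lookup table of subscription owners (both hypothetical; in practice they would come from your tagging standard and your ITSM or Power Automate flow). It shows ownership derived from metadata rather than assigned by hand, and escalation triggered purely by the clock.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class CostAnomalyTicket:
    scope: str
    subscription: str
    workload: str
    cost_delta: float        # observed spend minus expected spend
    forecast_impact: float   # projected effect on the period forecast
    opened_at: datetime
    owner: Optional[str] = None
    responded_at: Optional[datetime] = None


def derive_owner(tags: dict, subscription_owners: dict, subscription: str) -> Optional[str]:
    """Owner comes from metadata: the resource tag first, then the subscription owner."""
    return tags.get("owner") or subscription_owners.get(subscription)


def needs_escalation(ticket: CostAnomalyTicket, now: datetime, sla: timedelta) -> bool:
    """Escalate when no response was recorded inside the SLA window."""
    return ticket.responded_at is None and now - ticket.opened_at > sla


# Illustrative values only.
ticket = CostAnomalyTicket("rg-analytics", "sub-a", "etl", 1800.0, 5400.0, datetime(2024, 1, 1, 9, 0))
ticket.owner = derive_owner({}, {"sub-a": "alice@contoso.com"}, "sub-a")
print(ticket.owner)  # alice@contoso.com
print(needs_escalation(ticket, datetime(2024, 1, 3, 10, 0), timedelta(hours=48)))  # True
```

Silence is treated as a state the system acts on, which is the "decisions made by default" point in code form.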

925
00:50:08,380 --> 00:50:15,380
Now, the exception workflow is where most organizations fall apart, so it needs to be explicit. An exception request is not "please let me."

926
00:50:15,380 --> 00:50:23,380
It's a structured record with required fields. Business justification, blast radius, start date, end date and what the team will do to exit the exception.

927
00:50:23,380 --> 00:50:33,380
And the important part: expiry, enforced. If the exception hits its end date, the system either renews it with approval or triggers remediation. No silent continuation, no "we forgot."
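The enforced-expiry rule is simple enough to express directly. This hypothetical Python sketch models the exception record with the required fields listed in this segment and returns one of three outcomes for any given day; treating the end date as the last valid day is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class ExpiryOutcome(Enum):
    ACTIVE = "active"
    RENEWED = "renewed"
    REMEDIATE = "remediate"


@dataclass
class ExceptionRecord:
    business_justification: str
    blast_radius: str
    start_date: date
    end_date: date                               # last day the exception is valid
    exit_plan: str
    renewal_approved_until: Optional[date] = None


def evaluate_expiry(record: ExceptionRecord, today: date) -> ExpiryOutcome:
    """No silent continuation: after the end date, an approved renewal exists or remediation starts."""
    if today <= record.end_date:
        return ExpiryOutcome.ACTIVE
    if record.renewal_approved_until is not None and today <= record.renewal_approved_until:
        return ExpiryOutcome.RENEWED
    return ExpiryOutcome.REMEDIATE


exc = ExceptionRecord("migration window", "sub-a, premium SKU", date(2024, 1, 1),
                      date(2024, 3, 31), "cut over to standard tier")
print(evaluate_expiry(exc, date(2024, 4, 1)).value)  # remediate
```

There is deliberately no fourth outcome for "expired but still running."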

928
00:50:33,380 --> 00:50:43,380
That one constraint, expiry enforced by workflow, removes more drift than any dashboard you will ever build. Now, where does automation end and humans begin? Humans set intent.

929
00:50:43,380 --> 00:50:53,380
Systems execute repeatable steps. If the remediation is deterministic and safe, automate it. If it requires judgment about availability risk, route it for approval. That's the boundary.

930
00:50:53,380 --> 00:51:03,380
So, for example, auto-tagging missing metadata via a policy modify effect is safe. Quarantining an unowned dev VM by shutting it down outside business hours can be safe, if your organization agreed to that policy.

931
00:51:03,380 --> 00:51:14,380
Deleting resources is rarely safe without an approval lane, but moving them into a restricted holding state often is. The point is to define tiers of action. Notify-only, where the ticket exists and someone must respond.

932
00:51:14,380 --> 00:51:22,380
Auto-correct, where the platform can fix metadata or configuration. Auto-stop or restrict, where the platform can reduce burn while waiting for a human decision.

933
00:51:22,380 --> 00:51:40,380
And only then, delete, with explicit approvals and evidence. This is what makes the workflow an actuator instead of a bureaucratic museum. It changes reality: it reduces spend while the human process catches up. And it prevents the most common anti-pattern: "email finance." Email is not a workflow. Email has no ownership, no SLA, no audit trail and no forced closure.

934
00:51:40,380 --> 00:51:49,380
It is exactly the kind of soft control that showback trains organizations to rely on. The final piece is the uncomfortable one: workflows surface who is allowed to say no.

935
00:51:49,380 --> 00:51:59,380
If every anomaly ticket ends with "we can't do anything", you didn't build automation, you built a complaint routing system. Ownership has to include authority or the workflow becomes cruelty.

936
00:51:59,380 --> 00:52:11,380
Tickets get assigned to people who can't change the bill. So workflow automation is not the nice-to-have layer. It is the actuator that turns governance from observation into enforcement. And once you have an actuator, you can start doing the thing organizations avoid.

937
00:52:11,380 --> 00:52:15,380
Remediation and quarantine for nobody's-problem spend.

938
00:52:15,380 --> 00:52:21,380
Automatic remediation and quarantine: stop paying for nobody's problem. Once you have a workflow, you are tempted to stop there.

939
00:52:21,380 --> 00:52:27,380
Tickets, SLAs, approvals, evidence. Everyone feels safer. And you're still bleeding money.

940
00:52:27,380 --> 00:52:36,380
Because a ticket doesn't change reality unless the platform takes some form of action. What showback trains organizations to do is tolerate drift while they investigate.

941
00:52:36,380 --> 00:52:46,380
Investigation is fine for root cause. It is not a control strategy. So you need remediation and quarantine: not as a heroic cleanup project, but as a default operating behavior for spend with no owner.

942
00:52:46,380 --> 00:52:51,380
Start with a definition that the platform can evaluate. Orphaned and unowned are not vibes. They are states.

943
00:52:51,380 --> 00:53:00,380
Unowned means the resource has no owner tag, the tag points to something non-routable like a distribution list, or the owner identity no longer exists.

944
00:53:00,380 --> 00:53:15,380
Stale ownership is unowned. If the system can't route to a human, it's orphaned from a governance perspective. Orphaned also means unattached disks, unused public IPs, zombie snapshots, abandoned test databases, stopped compute that still incurs storage, or anything that exists without a workload life cycle.
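Because "unowned" is a state the platform can evaluate, it can be written as a function. The sketch below assumes an `owner` tag, a set of active user identities and a set of known distribution lists; all three inputs are hypothetical stand-ins for your directory and tagging data. It covers only the three ownership conditions; resource-level orphan checks such as unattached disks would need separate, resource-specific rules.

```python
from typing import Optional, Set


def unowned_reason(tags: dict, active_users: Set[str], distribution_lists: Set[str]) -> Optional[str]:
    """Return why a resource counts as unowned, or None if its owner routes to a living person."""
    owner = tags.get("owner")  # hypothetical tag key
    if not owner:
        return "no owner tag"
    if owner in distribution_lists:
        return "owner is not routable to a person"
    if owner not in active_users:
        return "owner identity no longer exists"
    return None


active = {"alice@contoso.com"}
dls = {"data-team@contoso.com"}
print(unowned_reason({"env": "dev"}, active, dls))                # no owner tag
print(unowned_reason({"owner": "bob@contoso.com"}, active, dls))  # owner identity no longer exists
```

Returning a reason rather than a boolean gives the quarantine ticket something concrete to tell whoever tries to claim the resource.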

945
00:53:15,380 --> 00:53:23,380
These are not small inefficiencies. They are scheduled spend. Now quarantine is the architectural middle ground between doing nothing and deleting production.

946
00:53:23,380 --> 00:53:32,380
Most organizations avoid remediation because they are afraid of breaking things. Reasonable. They also avoid quarantine because they think it sounds punitive. Fine. Keep paying forever, then.

947
00:53:32,380 --> 00:53:44,380
Quarantine is simply this: isolate, restrict, notify, expire. Isolate means you move the resource into a known policy boundary. Maybe that's a dedicated resource group with restrictive policies. Maybe it's a subscription with limited networking.

948
00:53:44,380 --> 00:53:49,380
The point is to stop it from continuing to create blast radius while you figure out what it is.

949
00:53:49,380 --> 00:54:00,380
Restrict means you reduce its ability to burn money: shut down compute where safe, disable autoscale, remove public exposure, block new deployments into that scope, and put the resource behind guardrails that make continued drift harder.

950
00:54:00,380 --> 00:54:09,380
Notify means the system still tries to find a human: tag owners, subscription owners, platform owners, finance partners. You give the organization a chance to claim the asset.

951
00:54:09,380 --> 00:54:20,380
Expire means the quarantine has a timer. If no one claims it, it doesn't become the new normal. It gets stopped, archived or deleted based on the class of resource and the risk tolerance you defined.
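The isolate, restrict, notify, expire cycle can be sketched as a small state check. In this hypothetical Python example, the per-class expiry actions and the time-to-live are assumptions standing in for the risk tolerance your organization defines; unknown classes default to the least destructive action, and deletion is only ever requested, never performed silently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical end-of-quarantine action per resource class (your risk tolerance goes here).
EXPIRY_ACTION = {"vm": "stop", "database": "archive", "snapshot": "request delete approval"}


@dataclass
class Quarantine:
    resource_class: str
    isolated_at: datetime   # when the resource was moved into the restricted boundary
    ttl: timedelta          # how long the organization gets to claim it
    claimed_by: Optional[str] = None

    def next_step(self, now: datetime) -> str:
        if self.claimed_by:
            return f"release to {self.claimed_by}"
        if now < self.isolated_at + self.ttl:
            return "stay restricted, keep notifying"
        # Timer ran out and nobody claimed it: it does not become the new normal.
        return EXPIRY_ACTION.get(self.resource_class, "stop")


q = Quarantine("snapshot", datetime(2024, 1, 1), timedelta(days=14))
print(q.next_step(datetime(2024, 1, 10)))  # stay restricted, keep notifying
print(q.next_step(datetime(2024, 1, 20)))  # request delete approval
```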

952
00:54:20,380 --> 00:54:29,380
This is the core psychological shift: people respond to deadlines, not dashboards. Now you need remediation tiers, because not every action should be automatic. Tier one is notify-only.

953
00:54:29,380 --> 00:54:36,380
The system flags the resource as unowned, opens the ticket and starts the clock. That's where you begin if you don't trust your inventory.

954
00:54:36,380 --> 00:54:45,380
Tier two is auto-correct: the platform fixes what it can safely fix. Apply missing tags via policy modify when the tag can be derived from context. Attach diagnostic settings.

955
00:54:45,380 --> 00:54:53,380
Enforce a shutdown schedule on dev/test resources that opted into the program. These are low-risk, high-return actions. Tier three is auto-stop or throttle.

956
00:54:53,380 --> 00:55:00,380
Shut down VMs in non-production outside business hours when they're unowned. Pause Fabric capacities that were created as experiments and never tagged.

957
00:55:00,380 --> 00:55:08,380
Disable scheduled jobs that are running with no accountable owner. This is where you stop the financial bleeding while still allowing recovery. Tier four is delete.

958
00:55:08,380 --> 00:55:14,380
And delete must be gated: approval, retention, evidence and usually a recovery path like backups.

959
00:55:14,380 --> 00:55:20,380
Deleting without a governance lane is how you create a new shadow cloud. Teams will recreate the system elsewhere, more quietly.
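The four remediation tiers can be encoded so that a requested action steps down to the highest tier the guardrails permit. This is a hypothetical sketch: the rules that auto-stop applies only to non-production, and that delete needs both approval and a recovery path, paraphrase the tiers described here rather than any tool's behavior.

```python
from enum import IntEnum


class Tier(IntEnum):
    NOTIFY_ONLY = 1
    AUTO_CORRECT = 2
    AUTO_STOP = 3
    DELETE = 4


def permitted(tier: Tier, *, non_production: bool, approved: bool, has_backup: bool) -> bool:
    """Hypothetical guardrails paraphrasing the four tiers."""
    if tier in (Tier.NOTIFY_ONLY, Tier.AUTO_CORRECT):
        return True                     # ticket, or safe metadata/configuration fixes
    if tier == Tier.AUTO_STOP:
        return non_production           # stop or throttle only outside production
    return approved and has_backup      # delete: explicit approval plus a recovery path


def effective_tier(requested: Tier, *, non_production: bool, approved: bool, has_backup: bool) -> Tier:
    """Step down from the requested tier to the highest one the guardrails allow."""
    for tier in sorted(Tier, reverse=True):
        if tier <= requested and permitted(
            tier, non_production=non_production, approved=approved, has_backup=has_backup
        ):
            return tier
    return Tier.NOTIFY_ONLY


print(effective_tier(Tier.DELETE, non_production=True, approved=False, has_backup=True).name)  # AUTO_STOP
```

Stepping down instead of failing means an unapproved delete still reduces burn, which is the cost-risk versus availability-risk trade-off made explicit.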

960
00:55:20,380 --> 00:55:26,380
The trade-off is always cost risk versus availability risk, and governance exists to make that trade-off explicit, not accidental.

961
00:55:26,380 --> 00:55:34,380
If leadership says "we will never shut anything down automatically," what they just said is "we accept paying indefinitely for whatever the platform allows."

962
00:55:34,380 --> 00:55:40,380
That is a choice. Own it, but don't call it FinOps governance. This is also how you measure progress without lying to yourself.

963
00:55:40,380 --> 00:55:46,380
Don't measure savings. Savings are easy to game and they don't prove control. Measure orphan inventory over time.

964
00:55:46,380 --> 00:55:51,380
If the number of unowned resources is shrinking, you're enforcing ownership. Measure MTTR on anomalies.

965
00:55:51,380 --> 00:55:58,380
If anomalies close faster, your actuator works. Measure how many resources hit quarantine and then get claimed versus deleted.

966
00:55:58,380 --> 00:56:05,380
That ratio tells you whether your tagging and ownership model is real, or whether your environment is full of abandoned assets nobody wants to admit exist.
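The three progress measures described here (orphan inventory trend, MTTR on anomalies, and the claimed-versus-deleted ratio) reduce to simple arithmetic. The Python sketch below assumes you can export anomaly open/close timestamps and quarantine outcomes; the input shapes are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Tuple


def orphan_trend(unowned_counts: List[int]) -> int:
    """Change in unowned-resource count from first to last sample; negative means ownership is enforced."""
    return unowned_counts[-1] - unowned_counts[0]


def mttr_hours(anomalies: List[Tuple[datetime, datetime]]) -> float:
    """Mean time from anomaly opened to anomaly closed, in hours."""
    return mean((closed - opened) / timedelta(hours=1) for opened, closed in anomalies)


def claim_ratio(outcomes: List[str]) -> float:
    """Share of quarantined resources that were claimed rather than deleted."""
    claimed, deleted = outcomes.count("claimed"), outcomes.count("deleted")
    return claimed / (claimed + deleted) if claimed + deleted else 0.0


print(orphan_trend([120, 95, 80]))  # -40
print(mttr_hours([(datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 4)),
                  (datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 8))]))  # 6.0
print(claim_ratio(["claimed", "deleted", "claimed", "claimed"]))  # 0.75
```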

967
00:56:05,380 --> 00:56:10,380
And yes, this feels like security, and that's not an accident. Cost governance is the same pattern:

968
00:56:10,380 --> 00:56:17,380
Define acceptable states, detect violations, respond with predictable actions and require explicit acceptance when you're taking risk.

969
00:56:17,380 --> 00:56:25,380
Once quarantine exists, exceptions stop being special. They become a controlled mechanism to keep something alive with an owner and a timer.

970
00:56:25,380 --> 00:56:29,380
That's the only way you stop paying for nobody's problem at scale.

971
00:56:29,380 --> 00:56:38,380
Time-bound exception management: expiry dates, or it isn't real governance. Now take quarantine and remediation and aim them at the most politically protected object in the enterprise:

972
00:56:38,380 --> 00:56:44,380
The exception. Because if you don't make exceptions time-bound, you just build a very efficient way to formalize bypasses.

973
00:56:44,380 --> 00:56:50,380
You didn't reduce entropy; you industrialized it. Time-bound exception management is not paperwork, it is a life cycle:

974
00:56:50,380 --> 00:56:55,380
Request, approve, enforce constraints, expire, review. And every word in that sequence matters.

975
00:56:55,380 --> 00:57:00,380
Request means the team can't self declare an exception by doing the thing and then writing the justification later.

976
00:57:00,380 --> 00:57:06,380
The system has to catch the violation before it becomes production. Because production status is how exceptions become permanent.

977
00:57:06,380 --> 00:57:11,380
The request has to happen in a lane that is slower than the compliant path. That is the incentive model.

978
00:57:11,380 --> 00:57:19,380
Approve means a named approver accepts the trade-off on record. Not a committee, not the cloud team: a person with budget authority or risk authority.

979
00:57:19,380 --> 00:57:24,380
Someone who can later be asked "why did we allow this?" and can answer without blaming the tool.

980
00:57:24,380 --> 00:57:28,380
Enforce constraints means the exception doesn't remove guardrails; it relocates them.

981
00:57:28,380 --> 00:57:33,380
A good exception does not turn policy off. It turns policy into a scoped, monitored allowance.

982
00:57:33,380 --> 00:57:40,380
Maybe you allow a region for a specific workload for a specific time. Maybe you allow a premium SKU for a specific migration window.

983
00:57:40,380 --> 00:57:47,380
But the exception should still require tagging, still route alerts to the owner, still generate anomaly events, and still live inside the budget model.

984
00:57:47,380 --> 00:57:51,380
An exception that disables observability is not an exception, it is a blind spot.

985
00:57:51,380 --> 00:57:58,380
Expire is the point most organizations refuse to implement because it forces a second decision. They'd rather approve once and never think about it again.

986
00:57:58,380 --> 00:58:07,380
But governance exists to force renewal when conditions change. The purpose of expiry is not to be mean. The purpose of expiry is to prevent temporary from becoming baseline.

987
00:58:07,380 --> 00:58:14,380
So the end date is not optional, and it has to be enforced by the platform. Not "we'll review quarterly," not "we'll put it on the calendar."

988
00:58:14,380 --> 00:58:19,380
If the system doesn't trigger renewal or remediation automatically, you're back to hoping a human remembers.

989
00:58:19,380 --> 00:58:24,380
Review is where you close the loop. When an exception expires, the organization should answer one of three questions.

990
00:58:24,380 --> 00:58:30,380
Did we remove the need? Did we renew intentionally? Or did we learn the policy is wrong? This is the part people miss.

991
00:58:30,380 --> 00:58:39,380
Exception management is how governance evolves without becoming dogma. Exceptions aren't only risk; they're signal. They tell you where the platform or the policy doesn't match reality.

992
00:58:39,380 --> 00:58:45,380
But you only get that signal if you force exceptions to surface again. Now the minimum required fields are boring.

993
00:58:45,380 --> 00:58:47,380
Good. Boredom is stable.

994
00:58:47,380 --> 00:58:50,380
Owner: a routable identity, not a mailbox.

995
00:58:50,380 --> 00:58:59,380
Business justification: tied to an outcome, not "we need it." Blast radius: what scope, what services, what data, what subscriptions, what tenants.

996
00:58:59,380 --> 00:59:05,380
End date: the expiry. And then the one field that turns this into real design: exit criteria.

997
00:59:05,380 --> 00:59:14,380
What has to be true for this exception to die? A refactor completed, a migration finished, a feature gap closed, a contract renewed with the right SKU. Whatever it is, write it down.

998
00:59:14,380 --> 00:59:18,380
Because without exit criteria, the exception is just a permission slip for indefinite drift.

999
00:59:18,380 --> 00:59:24,380
Now you need a cadence that makes the system predictable. A sell-by date discipline for policies and for exceptions.

1000
00:59:24,380 --> 00:59:30,380
Short enough that drift can't hide, long enough that teams can actually execute. And the system should track the age distribution.

1001
00:59:30,380 --> 00:59:39,380
How many exceptions are under 30 days, under 90 days, over 180 days? If you have a long tail of ancient exceptions, you don't have governance, you have historical artifacts.
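Tracking the age distribution needs only the approval date of each open exception. This sketch uses the 30, 90 and 180-day thresholds mentioned here and adds a 90 to 179-day band (an assumption) so that every exception lands in exactly one bucket.

```python
from collections import Counter
from datetime import date
from typing import Iterable


def age_bucket(age_days: int) -> str:
    if age_days < 30:
        return "<30d"
    if age_days < 90:
        return "30-89d"
    if age_days < 180:
        return "90-179d"
    return "180d+"  # the long tail of ancient exceptions


def age_distribution(approved_on: Iterable[date], today: date) -> Counter:
    """Count open exceptions per age bucket."""
    return Counter(age_bucket((today - d).days) for d in approved_on)


dist = age_distribution([date(2024, 6, 20), date(2024, 3, 1), date(2023, 1, 1)], date(2024, 7, 1))
print(dist)
```

A growing "180d+" count is the historical-artifacts problem described here, visible as a single number.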

1002
00:59:39,380 --> 00:59:44,380
This is why deadlines beat dashboards: dashboards invite interpretation, deadlines force decisions.

1003
00:59:44,380 --> 00:59:52,380
And this is also where you measure maturity in a way that can't be faked. Exception volume, exception age, renewal rate versus closure rate, time to expire.

1004
00:59:52,380 --> 00:59:59,380
Time to remediate after expiry. When those numbers improve, it means your organization is reducing ambiguity, not just producing reports.

1005
00:59:59,380 --> 01:00:06,380
So the rule is simple and absolute. If an exception does not expire, it is not an exception. It is a new baseline you were too polite to name.

1006
01:00:06,380 --> 01:00:12,380
And once you accept that, the showback trap loses its favorite hiding place: the email thread labeled "approved".

1007
01:00:12,380 --> 01:00:16,380
Microsoft 365 showback. Licenses, adoption and the shelfware factory.

1008
01:00:16,380 --> 01:00:24,380
Now take everything we just said about Azure cost governance and apply it to Microsoft 365, where the spend looks simple, right up until you audit it.

1009
01:00:24,380 --> 01:00:31,380
Because M365 showback has its own trap: licensing feels like a fixed cost, so people assume it's inherently governable. It isn't.

1010
01:00:31,380 --> 01:00:37,380
It's just a different kind of drift. In Azure, waste often looks like orphaned resources and mis-sized compute.

1011
01:00:37,380 --> 01:00:49,380
In Microsoft 365, waste looks like shelfware: licenses assigned just in case, add-ons bought during a crisis, and premium SKUs that nobody can justify after the renewal email arrives.

1012
01:00:49,380 --> 01:00:53,380
And showback makes it worse because it turns the problem into a report.

1013
01:00:53,380 --> 01:01:02,380
We send departments a license utilization dashboard. Great. Now what? This is the foundational mismatch. In M365, the unit of accountability is rarely the IT admin.

1014
01:01:02,380 --> 01:01:13,380
It's the product owner, the business owner, the department head who requested the capability, got the budget approved and then delegated ongoing ownership to IT, which is the most reliable way to manufacture permanent spend.

1015
01:01:13,380 --> 01:01:22,380
Licenses don't manage themselves. Entitlements don't decay gracefully. They accumulate because Microsoft 365 is designed to be easy to adopt and hard to unwind.

1016
01:01:22,380 --> 01:01:28,380
That's not a complaint. That's how platforms scale. So if your governance model is "we'll review licenses quarterly," you're not governing.

1017
01:01:28,380 --> 01:01:49,380
You're observing drift on a delay. The cost drivers here are boring, but relentless: base licenses, add-ons, premium security SKUs, audio conferencing, Power BI variants, Fabric capacity, Power Platform per-user and per-app licensing, storage overages, archived data, and retention policies that were set during one legal scare and never revisited.

1018
01:01:49,380 --> 01:02:01,380
And the more your organization buys, the more you lose the ability to answer a basic question: who owns the decision that created this recurring charge? Most organizations can't answer that because they treat licensing as procurement, not as an operating model.

1019
01:02:01,380 --> 01:02:12,380
So showback becomes a monthly shame report: "Here are the top departments with unused licenses." And the departments shrug, because nothing forces a decision. No owner, no SLA, no expiry, no quarantine.

1020
01:02:12,380 --> 01:02:24,380
So the licenses stay assigned because removing access is politically harder than paying and the platform happily bills you forever. This is where the shelfware factory becomes systemic. Joiners arrive and get assigned the full bundle because it's easier than thinking.

1021
01:02:24,380 --> 01:02:29,380
Movers change roles and keep their old entitlements because no one wants to break workflows.

1022
01:02:29,380 --> 01:02:38,380
Leavers depart and their accounts get disabled, but their licenses don't get reclaimed quickly, because identity offboarding and license harvesting aren't wired together as a deterministic process.

1023
01:02:38,380 --> 01:02:50,380
And yes, every organization claims they do joiner, mover, leaver. They do not. They do "disable account." That's not the same thing. So the governance moves here are the same pattern as Azure, just applied to entitlements.

1024
01:02:50,380 --> 01:02:59,380
First, entitlement reviews as a forced workflow, not a survey. A product owner should have to attest: "These users need these licenses for these reasons, and we accept the budget impact."

1025
01:02:59,380 --> 01:03:11,380
If they can't, licenses get reclaimed automatically after a grace period. Not because you're saving pennies, but because you're enforcing owned spend. Second, enforce role-based license assignment as much as you can, and treat exceptions like exceptions.

1026
01:03:11,380 --> 01:03:25,380
If a department wants a premium SKU for three people for a migration window, fine. Put it in the exception lane, with an end date and a renewal requirement. Otherwise you just created permanent premium spend for a temporary event. Third, reclaiming must be automatic.

1027
01:03:25,380 --> 01:03:45,380
If reclaiming requires a human to remember, you already lost. This is where workflow automation matters again. Identity events should trigger license changes, and licenses should have an owner-tag equivalent. If you can't route who owns this license spend, you can't govern it. And fourth, your showback metrics need to be honest: not licenses purchased, not licenses assigned.

1028
01:03:45,380 --> 01:03:57,380
The metrics that matter are unused licenses, stale entitlements, time to reclaim after departure, and percentage of premium SKUs with an active business justification on record.
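The honest M365 metrics can be computed from license assignment records joined to identity leaver dates. In this hypothetical sketch the record fields, the SKU names and the 30-day grace period are assumptions; it computes median time to reclaim, premium justification coverage, and the leavers whose licenses should be reclaimed automatically.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import List, Optional


@dataclass
class LicenseAssignment:
    user: str
    sku: str
    premium: bool
    departed_on: Optional[date] = None      # leaver date from the identity system
    reclaimed_on: Optional[date] = None     # date the license was removed
    justification_id: Optional[str] = None  # link to an active business justification


def median_days_to_reclaim(rows: List[LicenseAssignment]) -> Optional[float]:
    """Median days from departure to reclaim, over leavers whose license was already reclaimed."""
    gaps = [(r.reclaimed_on - r.departed_on).days for r in rows if r.departed_on and r.reclaimed_on]
    return median(gaps) if gaps else None


def premium_justification_coverage(rows: List[LicenseAssignment]) -> float:
    """Fraction of premium assignments backed by a justification on record."""
    premium = [r for r in rows if r.premium]
    return sum(1 for r in premium if r.justification_id) / len(premium) if premium else 1.0


def reclaim_candidates(rows: List[LicenseAssignment], today: date, grace_days: int) -> List[str]:
    """Leavers still holding a license past the grace period: reclaim these automatically."""
    return [r.user for r in rows
            if r.departed_on and not r.reclaimed_on and (today - r.departed_on).days > grace_days]


rows = [
    LicenseAssignment("ana", "premium-sku", True, date(2024, 5, 1), date(2024, 5, 4), "J-101"),
    LicenseAssignment("ben", "base-sku", False, date(2024, 5, 1), date(2024, 5, 11)),
    LicenseAssignment("cy", "premium-sku", True, date(2024, 5, 20)),
    LicenseAssignment("dee", "base-sku", False),
]
print(median_days_to_reclaim(rows))                                # 6.5
print(premium_justification_coverage(rows))                        # 0.5
print(reclaim_candidates(rows, date(2024, 6, 30), grace_days=30))  # ['cy']
```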

1029
01:03:57,380 --> 01:04:05,380
Because if you don't measure time to reclaim and justification coverage, you're measuring inventory, not accountability. Microsoft 365 cost looks flat until it isn't.

1030
01:04:05,380 --> 01:04:19,380
Then you get hit with a renewal cliff, a security uplift, or an AI add-on wave. And leadership asks why FinOps didn't catch it. FinOps did catch it. It was in the dashboard. You just didn't have governance. And now the platform is shifting again.

1031
01:04:19,380 --> 01:04:29,380
AI and usage-based services are turning fixed license spend into a probabilistic consumption model, which means the M365 showback trap becomes the same as the Azure one, just faster.

1032
01:04:29,380 --> 01:04:36,380
AI multiplies governance failure: Copilot, Fabric, Power Platform consumption. Now add AI to this picture and watch the failure accelerate.

1033
01:04:36,380 --> 01:04:48,380
Not because AI is uniquely evil, but because it changes the cost model from mostly steady-state to probabilistic. And probabilistic spend destroys organizations that only know how to govern fixed budgets and month-end reports.

1034
01:04:48,380 --> 01:05:04,380
Copilot is the cleanest example. It looks like a license that makes everyone relax: procurement buys the bundle, leadership declares AI transformation and the bill feels predictable. Then reality shows up. Adoption is uneven, usage is unknown, and the value story becomes a vibe.

1035
01:05:04,380 --> 01:05:14,380
Meanwhile, the licenses keep renewing because nobody wants to be the person who took AI away. So Copilot becomes shelfware with executive sponsorship, which is the most durable kind of waste.

1036
01:05:14,380 --> 01:05:26,380
But the deeper problem is this: AI spend is not only the license. AI creates secondary spend: storage growth, data classification work, security controls, logging, and whatever downstream services get pulled into the new workflows.

1037
01:05:26,380 --> 01:05:32,380
When people say "Copilot is X dollars per user," they're describing the smallest line item in the system they just unleashed.

1038
01:05:32,380 --> 01:05:37,380
Fabric makes this worse because it pulls analytics spend into a capacity model that looks simple, right up until it isn't.

1039
01:05:37,380 --> 01:05:45,380
A capacity gets provisioned, teams start building, and the organization learns a new lesson. Shared platforms are the easiest place to hide cost drift.

1040
01:05:45,380 --> 01:05:58,380
If nobody owns the capacity, nobody owns the burn. And in a shared data platform, the owner is not the admin who can click buttons; it's the product owner for the platform, who can set allocation rules, define onboarding requirements, and deny access to teams that refuse attribution.

1041
01:05:58,380 --> 01:06:12,380
If that owner doesn't exist, Fabric becomes a communal buffet with a single invoice. Power Platform is the entropy factory nobody wants to admit they built. It starts as citizen development, which is another way of saying "we're going to distribute software creation without distributing governance."

1042
01:06:12,380 --> 01:06:22,380
Apps multiply, environments multiply, connectors get enabled, premium features get turned on "temporarily," and then the monthly charges become background radiation. The system didn't break; the bill just got noisy.

1043
01:06:22,380 --> 01:06:32,380
Here's the core claim: AI exposes governance gaps faster than infrastructure ever did. Traditional Azure waste often takes weeks to show up as a meaningful number.

1044
01:06:32,380 --> 01:06:44,380
AI usage can spike in hours. A single mis-scoped process, a runaway flow, a misconfigured index, or a team hammering an endpoint can generate cost volatility that your monthly showback process will see only after the money is gone.
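To illustrate why a monthly cadence is too slow for AI consumption, here is a minimal hourly spike detector: it flags any hour whose cost exceeds the trailing 24-hour mean by more than three standard deviations. The window, the threshold and the input series are assumptions; a real deployment would tune them per workload or rely on the platform's own anomaly detection.

```python
from statistics import mean, pstdev
from typing import List


def hourly_spikes(hourly_cost: List[float], window: int = 24, k: float = 3.0) -> List[int]:
    """Indexes of hours whose cost exceeds the trailing mean by more than k standard deviations."""
    spikes = []
    for i in range(window, len(hourly_cost)):
        baseline = hourly_cost[i - window:i]
        mu, sigma = mean(baseline), pstdev(baseline)
        # Floor sigma so a perfectly flat baseline still flags any real jump.
        if hourly_cost[i] > mu + k * max(sigma, 1e-9):
            spikes.append(i)
    return spikes


# A flat day, then one runaway hour: flagged in the hour it happens, not at month end.
series = [10.0] * 24 + [10.0, 80.0, 10.0]
print(hourly_spikes(series))  # [25]
```

Each flagged hour would open the same owned anomaly ticket described earlier, so detection speed only matters if the actuator exists.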

1045
01:06:44,380 --> 01:06:54,380
And the AI story amplifies the human failure mode: teams treat experimentation as free. In the cloud, experimentation is never free. It's just unmetered in the engineer's brain.

1046
01:06:54,380 --> 01:07:06,380
So governance has to shift from "track spend" to "govern usage." That means you treat AI as a product, not a feature. Every AI capability needs an owner, a budget, and a defined purpose. Not "innovation."

1047
01:07:06,380 --> 01:07:26,380
Purpose: a workload, a business outcome, something you can kill if it doesn't perform. Because if you don't, AI becomes a universal exception. "We needed it for AI" becomes the justification for every bypass: extra capacity, premium licenses, new tenants, new environments, new connectors, and of course data copied into places it shouldn't be.

1048
01:07:26,380 --> 01:07:39,380
Cost and security drift together here. They are the same drift, just priced differently. And this is where Microsoft's own messaging accidentally proves the point. The Ignite narrative around agentic operations is "move instantly from visibility to action."

1049
01:07:39,380 --> 01:07:53,380
That only works if action is allowed, owned, and constrained. An optimization agent can suggest resizing a VM and even generate the script. Great, who approves it? Who owns the risk if performance degrades? What's the exception lane when the workload has a legitimate reason to stay oversized?

1050
01:07:53,380 --> 01:08:05,380
If you can't answer those questions, you don't have agentic FinOps. You have faster recommendations going nowhere. AI doesn't fix governance; AI punishes the absence of it. So the leadership move is not "buy more tools."

1051
01:08:05,380 --> 01:08:13,380
The move is to harden the system of action: guardrails that constrain sprawl, budgets that trigger response, and workflows that force a decision with expiry dates.

1052
01:08:13,380 --> 01:08:24,380
Because AI spend doesn't drift politely. It spikes, it spreads, and then it gets defended as strategic. And once it's defended as strategic, it becomes untouchable. That's the trap, upgraded for 2026.

1053
01:08:24,380 --> 01:08:30,380
Showback tells you what happened, and AI ensures it happens faster than your organization can argue about it.

1054
01:08:30,380 --> 01:08:38,380
From cost control to value-driven cloud financial management. At this point, some people will hear all of this and conclude, "so the goal is cost control."

1055
01:08:38,380 --> 01:08:50,380
No. Cost control is the entry fee, not the outcome. The real outcome is value-driven cloud financial management: spend that is intentional, attributable, and tied to something the business can defend without hand-waving.

1056
01:08:50,380 --> 01:09:02,380
Because if the only story your organization can tell is "we reduced spend," you will eventually optimize yourself into irrelevance: you'll starve platforms, delay modernization, and turn governance into a veto machine.

1057
01:09:02,380 --> 01:09:12,380
That's not FinOps; that's austerity with better tagging. This is the uncomfortable truth: the cloud is a variable-cost execution engine. It will happily turn unclear priorities into unpredictable invoices.

1058
01:09:12,380 --> 01:09:20,380
Therefore governance isn't about saving money. It's about encoding priorities into the operating model, so spend follows strategy by default. And that distinction matters.

1059
01:09:20,380 --> 01:09:26,380
Value driven management starts with a value lens that isn't fake. Not digital transformation, not innovation.

1060
01:09:26,380 --> 01:09:37,380
A value lens means unit economics and service outcomes: cost per transaction, cost per customer onboarded, cost per report generated, cost per build, cost per user served at a target latency.

1061
01:09:37,380 --> 01:09:43,380
You don't need perfect precision, you need a stable metric that forces conversation about outcomes, not just consumption.

1062
01:09:43,380 --> 01:09:50,380
Because showback taught the organization to argue about data, value management forces the organization to argue about decisions.

1063
01:09:50,380 --> 01:10:05,380
Here's what most people miss: cost cutting is reactive; cost shaping is intentional. Cost cutting is "we found waste, we removed waste." Great. You should still do that, but it doesn't tell you where to invest next, and it doesn't create a repeatable pattern that prevents the same waste from reappearing under a different name.

1064
01:10:05,380 --> 01:10:10,380
Cost shaping is "we will spend here, not there, because this is what the business prioritizes."

1065
01:10:10,380 --> 01:10:27,380
And then you enforce that with guardrails, budgets, and workflows, so it isn't just a meeting outcome that evaporates. This is also why FinOps can't be a finance-led exercise or an engineering-led exercise. Finance alone will optimize for predictability and reduction; engineering alone will optimize for delivery and performance.

1066
01:10:27,380 --> 01:10:39,380
Both of those are rational; both are incomplete. So the mature model is a cross-functional decision loop: IT, finance, and the business operating as one system that makes trade-offs explicitly, records them, and revisits them when reality changes.

1067
01:10:39,380 --> 01:10:44,380
That's governance as strategy execution, not a policy document, not a steering committee.

1068
01:10:44,380 --> 01:10:56,380
Strategy execution means the priorities show up in the platform: which SKUs are allowed, which regions are permitted, what gets auto-stopped in non-prod, what requires approval, which exceptions are tolerated, how budgets escalate, and how renewals happen.
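One way to picture priorities that "show up in the platform" is a small admission check that evaluates each deployment request against encoded rules. This is a minimal sketch: the `PlatformPolicy` fields, the decision strings, and the SKU and region names are all hypothetical, and a real estate would express the same rules in its policy engine rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass
class PlatformPolicy:
    """Hypothetical encoding of business priorities as platform rules."""
    allowed_skus: set = field(default_factory=set)
    approval_skus: set = field(default_factory=set)  # permitted only with approval
    allowed_regions: set = field(default_factory=set)


@dataclass
class DeploymentRequest:
    sku: str
    region: str
    environment: str  # "prod" or "nonprod"


def evaluate(policy: PlatformPolicy, req: DeploymentRequest) -> str:
    """Return 'deny', 'needs_approval', 'allow', or 'allow_autostop' (non-prod)."""
    if req.region not in policy.allowed_regions:
        return "deny"
    if req.sku in policy.approval_skus:
        return "needs_approval"
    if req.sku not in policy.allowed_skus:
        return "deny"
    # Non-production workloads are allowed but scheduled for auto-stop.
    return "allow" if req.environment == "prod" else "allow_autostop"


if __name__ == "__main__":
    policy = PlatformPolicy(
        allowed_skus={"small-vm", "medium-vm"},
        approval_skus={"gpu-vm"},
        allowed_regions={"westeurope", "northeurope"},
    )
    print(evaluate(policy, DeploymentRequest("medium-vm", "westeurope", "nonprod")))
```

The order of checks is itself a priority decision: region first, then approval-gated SKUs, then the general allow list.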

1069
01:10:56,380 --> 01:11:03,380
If those priorities aren't encoded, then the real strategy is whatever engineers do under pressure, and whatever procurement renews by default.

1070
01:11:03,380 --> 01:11:25,380
Now the hardest shift for leadership is this: stop celebrating savings in isolation. Savings are often just deferred risk, or the artifact of a one-time cleanup that didn't change the starting conditions, or the result of shifting cost somewhere else: a different subscription, a different cost center, a different SKU, a different vendor. And if AI is involved, savings can be a rounding error against sprawl.

1071
01:11:25,380 --> 01:11:45,380
So the better question is: did the organization improve its ability to make decisions with intent? Did unknown spend shrink? Did exception debt shrink? Did volatility shrink? Did the time from anomaly to decision shrink? Did the system get better at forecasting because the underlying ownership and allocation model got less ambiguous? Those are control indicators, and control is what makes value measurable.

1072
01:11:45,380 --> 01:11:56,380
Now connect it back to their world. A product team that can see its cost per customer and has authority over its budget can decide whether to buy performance, resiliency, or speed. That's adult decision-making.

1073
01:11:56,380 --> 01:12:08,380
A platform team that can allocate shared services intentionally can justify a platform tax and reinvest in security, reliability, and automation, instead of begging for headcount every quarter.

1074
01:12:08,380 --> 01:12:18,380
A finance partner who can see variability drivers can stop treating cloud as a runaway train and start treating it like any other managed portfolio, with risk, return, and constraints. That's the point.

1075
01:12:18,380 --> 01:12:20,380
Showback tells you what happened.

1076
01:12:20,380 --> 01:12:27,380
Governance tells the platform what is allowed to happen next. Value-driven cloud financial management is what you get when that governance loop doesn't just reduce waste.

1077
01:12:27,380 --> 01:12:31,380
It directs investment toward outcomes the business can defend.

1078
01:12:31,380 --> 01:12:42,380
A pragmatic maturity path: how to escape the trap without boiling the ocean. Now the pragmatic part, because the fastest way to kill this is to turn it into a multi-year transformation program with a logo.

1079
01:12:42,380 --> 01:12:53,380
Most organizations don't fail because they didn't know what good looks like. They fail because they try to implement all of it at once, offend everyone, and then quietly downgrade governance back to showback with extra meetings.

1080
01:12:53,380 --> 01:13:02,380
So the maturity path has to be sequential, not for methodology reasons but for entropy reasons. You can't automate accountability until you can route it. You can't route it until you can attribute it.

1081
01:13:02,380 --> 01:13:08,380
And you can't attribute it if your estate can't even be enumerated. Phase one is inventory, ownership mapping, and tagging enforcement.

1082
01:13:08,380 --> 01:13:16,380
Inventory means you can enumerate tenants, subscriptions, resource groups, and major cost surfaces without discovering new ones every week.

1083
01:13:16,380 --> 01:13:20,380
Shadow subscriptions are not a FinOps problem. They are a governance boundary problem.

1084
01:13:20,380 --> 01:13:27,380
Ownership mapping means every subscription has a boundary owner and every workload has a named accountable owner: not aspirational, actual.

1085
01:13:27,380 --> 01:13:35,380
If you can't name it, you don't govern it. Tagging enforcement means you move from "tag compliance score" to "resources can't exist without a cost center and an owner."

1086
01:13:35,380 --> 01:13:41,380
Start with audit if you must, but the outcome is deny for missing attribution. This phase doesn't require cultural miracles.
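The audit-then-deny progression can be sketched as a single check with two modes. This is a minimal sketch, not any specific policy engine's API; the tag names `costCenter` and `owner` and the mode strings are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical tag names; use whatever keys your allocation model standardizes on.
REQUIRED_TAGS = ("costCenter", "owner")


def check_attribution(tags: Dict[str, str], mode: str = "deny") -> Tuple[bool, List[str]]:
    """Return (allowed, missing_tags) for a resource creation request.

    'audit' lets the resource through but reports what is missing;
    'deny' blocks creation until attribution is present.
    """
    # Blank or whitespace-only values count as missing.
    missing = [key for key in REQUIRED_TAGS if not tags.get(key, "").strip()]
    if mode == "audit":
        return True, missing
    if mode == "deny":
        return not missing, missing
    raise ValueError(f"unknown mode: {mode!r}")
```

Switching the default from `"audit"` to `"deny"` is the moment the invariant "anonymous spend is not allowed" becomes real.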

1087
01:13:41,380 --> 01:13:47,380
It requires the organization to accept one uncomfortable invariant: anonymous spend is not allowed.

1088
01:13:47,380 --> 01:13:54,380
Your scoreboard in phase one is simple: unallocated spend percentage, number of unowned resources, and the drift rate of tag validity.
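The three phase-one numbers can be computed from a plain inventory export. The sketch assumes a hypothetical `Resource` record and defines tag drift as the share of resources whose tags were valid at the previous snapshot but are invalid now; adjust that definition to whatever your organization agrees on.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    """Hypothetical inventory row."""
    cost: float
    cost_center: Optional[str]
    owner: Optional[str]
    tags_valid_before: bool  # tag validity at the previous snapshot
    tags_valid_now: bool     # tag validity at the current snapshot


def phase_one_scoreboard(resources: List[Resource]) -> dict:
    total = sum(r.cost for r in resources)
    unallocated = sum(r.cost for r in resources if not r.cost_center)
    unowned = sum(1 for r in resources if not r.owner)
    was_valid = [r for r in resources if r.tags_valid_before]
    drifted = sum(1 for r in was_valid if not r.tags_valid_now)
    return {
        "unallocated_spend_pct": 100.0 * unallocated / total if total else 0.0,
        "unowned_resources": unowned,
        "tag_drift_rate": drifted / len(was_valid) if was_valid else 0.0,
    }
```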

1089
01:13:54,380 --> 01:14:01,380
If those aren't improving, stop. Don't move on; you're building automation on sand. Phase two is budgets, alerting, and escalation contracts.

1090
01:14:01,380 --> 01:14:09,380
This is where you convert visibility into decision triggers: budgets at the right scopes, owned by the right people, with an explicit escalation ladder.
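An escalation ladder is essentially an ordered list of thresholds mapped to who must respond. The thresholds and role names below are hypothetical placeholders; the sketch assumes the ladder is sorted in ascending order and returns the highest rung a budget scope has reached.

```python
from typing import List, Optional, Tuple

# Hypothetical ladder, sorted ascending: (percent of budget consumed, who must respond).
ESCALATION_LADDER: List[Tuple[float, str]] = [
    (50.0, "workload owner"),
    (80.0, "workload owner and boundary owner"),
    (100.0, "boundary owner and finance partner"),
    (120.0, "executive sponsor"),
]


def escalation_rung(spend: float, budget: float) -> Optional[str]:
    """Return the highest rung reached for this budget scope, or None below the first."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    consumed_pct = 100.0 * spend / budget
    reached = [who for threshold, who in ESCALATION_LADDER if consumed_pct >= threshold]
    return reached[-1] if reached else None
```

Writing the ladder down as data is what turns it into a contract: everyone can see in advance who gets pulled in at each level.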

1091
01:14:09,380 --> 01:14:16,380
And you measure response, not awareness. If an alert fires and nobody responds, that's not a tooling gap; that's a governance failure you can quantify.

1092
01:14:16,380 --> 01:14:22,380
Time to acknowledge, time to decide, time to close: treat them like incident response metrics, because functionally that's what they are.
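Treating cost alerts like incidents mostly means recording lifecycle timestamps and summarizing them. This sketch uses a hypothetical `CostAlert` record and reports the median hours from firing to each stage, plus the count of alerts nobody acknowledged.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import List, Optional


@dataclass
class CostAlert:
    """Hypothetical alert record with lifecycle timestamps (None = stage not reached)."""
    fired: datetime
    acknowledged: Optional[datetime] = None
    decided: Optional[datetime] = None
    closed: Optional[datetime] = None


def median_hours_to(alerts: List[CostAlert], stage: str) -> Optional[float]:
    """Median hours from firing to `stage` ('acknowledged', 'decided', or 'closed')."""
    hours = [
        (getattr(a, stage) - a.fired).total_seconds() / 3600
        for a in alerts
        if getattr(a, stage) is not None
    ]
    return median(hours) if hours else None


def unanswered(alerts: List[CostAlert]) -> int:
    """Alerts that fired and were never acknowledged: the quantifiable governance gap."""
    return sum(1 for a in alerts if a.acknowledged is None)
```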

1093
01:14:22,380 --> 01:14:30,380
The scoreboard here is volatility and response: cost spikes detected earlier, fewer surprises at month end, and shrinking time to decision when variance happens.

1094
01:14:30,380 --> 01:14:36,380
Phase three is automation, quarantine, and exception lifecycle management. This is where you actually get leverage.

1095
01:14:36,380 --> 01:14:43,380
You wire budget breaches and anomaly signals into workflows. You enforce exception expiry. You quarantine unowned assets.
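Exception expiry can be enforced mechanically once every exception carries an end date. In this minimal sketch, the `PolicyException` record is hypothetical, and an exception is treated as valid through its `expires` date. Expired exceptions hand their resources back to the default policy, and the ages of active exceptions feed the exception age distribution.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class PolicyException:
    """Hypothetical time-bound exception to a platform policy."""
    resource_id: str
    owner: str
    granted: date
    expires: date  # last day the exception is valid


def enforce_expiry(
    exceptions: List[PolicyException], today: date
) -> Tuple[List[PolicyException], List[str]]:
    """Split into still-active exceptions and resource ids that revert to default policy."""
    active = [e for e in exceptions if e.expires >= today]
    reverted = [e.resource_id for e in exceptions if e.expires < today]
    return active, reverted


def exception_ages_days(exceptions: List[PolicyException], today: date) -> List[int]:
    """Sorted ages in days of active exceptions, for the exception age distribution."""
    return sorted((today - e.granted).days for e in exceptions if e.expires >= today)
```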

1096
01:14:43,380 --> 01:14:49,380
You implement remediation tiers so the platform can reduce burn while humans argue. And you keep it deliberately narrow at first.

1097
01:14:49,380 --> 01:14:54,380
Pick one class of resource that is safe to act on where the business impact is low and the waste is common.

1098
01:14:54,380 --> 01:15:02,380
Unowned dev VMs, unattached disks, idle public IPs, abandoned test databases, stale snapshots.

1099
01:15:02,380 --> 01:15:07,380
Then make that class governable end to end: detect, route, act, and close with evidence.
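Taking unattached disks as the one narrow class, the detect, route, act, and close loop can be sketched as below. Everything here is hypothetical: the `Disk` record, the 14-day idle threshold, the three remediation tiers, and the `platform-team` fallback queue for unowned disks. No cloud API is called; the "act" step only advances a tier and appends an evidence record, which is where a real provider call would go.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical remediation tiers, least to most destructive; one step per run.
TIERS = ("notify", "quarantine", "delete")
FALLBACK_QUEUE = "platform-team"  # hypothetical owner of last resort for unowned disks


@dataclass
class Disk:
    disk_id: str
    attached: bool
    owner: Optional[str]
    detached_since: date
    tier: int = 0  # number of remediation steps already applied


def detect(disks: List[Disk], today: date, min_idle_days: int = 14) -> List[Disk]:
    """Unattached disks idle for at least min_idle_days that still have steps left."""
    return [
        d for d in disks
        if not d.attached
        and (today - d.detached_since).days >= min_idle_days
        and d.tier < len(TIERS)
    ]


def route(disk: Disk) -> str:
    """Send the action to the named owner, or to the fallback queue if unowned."""
    return disk.owner or FALLBACK_QUEUE


def run_once(disks: List[Disk], today: date, evidence: List[Tuple[str, str, str, str]]) -> None:
    """Detect, route, and act one tier deeper; append an evidence record per action."""
    for d in detect(disks, today):
        action = TIERS[d.tier]
        # A real implementation would call the cloud provider here; this sketch only records.
        d.tier += 1
        evidence.append((today.isoformat(), d.disk_id, route(d), action))
```

Advancing one tier per run is a deliberate choice: the platform keeps reducing exposure on a schedule while owners still have time to object before anything destructive happens.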

1100
01:15:07,380 --> 01:15:11,380
If you can't close the loop on one class of waste, you can't scale to the rest.

1101
01:15:11,380 --> 01:15:18,380
Now, the culture reality that sits underneath all three phases is this: executive sponsorship is the only scalable prioritization engine.

1102
01:15:18,380 --> 01:15:25,380
Engineers don't ignore cost because they don't care. They ignore it because their priority stack is already full and cost isn't allowed to outrank delivery.

1103
01:15:25,380 --> 01:15:32,380
Until leadership makes cost variance a first-class operational risk, the system will route around your controls, quietly and predictably.

1104
01:15:32,380 --> 01:15:40,380
That's why you don't sell this as savings. You sell it as control: fewer surprises, faster decisions, lower entropy, clearer ownership, and more.

1105
01:15:40,380 --> 01:15:45,380
You need zero unowned resources and a reduced ability for spend to hide. And you don't need perfection.

1106
01:15:45,380 --> 01:15:57,380
You need directional stability: orphan count trending down, exception age distribution trending shorter, unallocated spend trending toward zero, volatility trending down, and mean time to resolve anomalies shrinking.
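Because all five indicators are "lower is better," a simple least-squares slope per series is enough to check direction. This is a minimal sketch over hypothetical period-by-period series; it checks only direction, not the size or statistical significance of the change.

```python
from typing import Dict, List


def slope(values: List[float]) -> float:
    """Ordinary least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def trend_report(series: Dict[str, List[float]]) -> Dict[str, bool]:
    """For lower-is-better indicators, True means the line is trending down."""
    return {name: slope(values) < 0 for name, values in series.items()}
```

With all five series supplied, `all(trend_report(series).values())` is one way to express "those five trend lines move."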

1107
01:15:57,380 --> 01:16:04,380
If those five trend lines move, you escape the showback trap. If they don't, you're still doing theatre, just with nicer dashboards.

1108
01:16:04,380 --> 01:16:11,380
Conclusion: accountability is enforced, not reported. Showback creates awareness, but awareness doesn't change system behavior.

1109
01:16:11,380 --> 01:16:18,380
Enforced ownership, constraints, and workflows do. If you want a practical blueprint for turning Microsoft spend into a real system of action,

1110
01:16:18,380 --> 01:16:25,380
guardrails, escalation contracts, and time-bound exceptions, watch the next episode and subscribe so you don't miss it.