Cost Entropy: The Architectural Flaw Killing Your Azure Budget
Azure doesn’t get expensive because engineers waste money. It gets expensive because the platform is allowed to spend without ownership, limits, or consequences. That isn’t a savings problem — it’s cost entropy.
In this episode, we reframe cloud cost as an authorization outcome, not a finance artifact. Every dollar exists because identity, policy, and subscription boundaries allowed it to exist. When those controls don’t encode financial intent, unowned spend becomes normal: abandoned environments, premium defaults chosen for safety, shared services nobody can allocate, and budget alerts that arrive too late to matter.
We break down why most FinOps programs fail by starting with dashboards instead of governance, and why visibility alone never changes behavior. The real levers live in the control plane: enforced ownership, subscription-level budgets with early escalation, mandatory tagging, constrained SKUs by environment, and time-boxed exceptions.
The takeaway is simple but uncomfortable: cost discipline isn’t about saving money after the fact. It’s about removing unreviewed spending pathways before they ever reach the invoice.
Most organizations believe Azure becomes expensive because engineers waste money.
They are wrong.
Azure becomes expensive because the platform is allowed to spend without ownership, without limits, and without consequences. That isn’t a savings problem. It’s cost entropy: unmanaged deployment pathways that keep generating recurring spend long after the original decision has been forgotten.
Cost overruns rarely arrive as a dramatic mistake. They arrive as a new normal — environments that never shut down, premium tiers chosen “just to be safe,” shared services nobody can attribute, and invoices that feel inevitable rather than intentional.
This episode is not about dashboards, discounts, or folklore about Spot VMs. It’s about the uncomfortable shift from “why is Azure expensive?” to the only question that matters:
What did you allow, and why can nobody stop it?
Core Thesis
Cloud cost is not a finance outcome.
It is an authorization outcome.
Every cloud dollar exists because the platform allowed it to exist.
The Enterprise Cost Failure Mode: When Unowned Spend Becomes Normal
Cost failure doesn’t show up as one bad decision. It compounds quietly.
A “temporary” environment survives because nobody can prove it’s safe to delete.
A premium SKU is selected because engineers are accountable for outages, not invoices.
Egress charges persist after a migration because the path changed but nobody closed it.
Each decision is locally rational.
The problem is that enterprises don’t pay for local rationality.
They pay for the aggregate.
Cloud cost compounds because it is recurring: idle capacity, redundancy chosen “just in case,” shared services without allocation, and resources created by identities that were never forced to name an owner.
This mirrors security drift perfectly. Exceptions accumulate. Determinism erodes. The system stops behaving predictably.
Cost entropy is the result: spend happens because the system tolerates it.
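A toy model makes the "recurring, not one-off" point concrete. Assume, purely hypothetically, that a team creates two "temporary" environments a month at 500 each and never deletes any. No single month's decision is large, yet the monthly bill climbs every month.

```python
# Hypothetical illustration: environment counts and costs are made-up inputs.

def monthly_bills(new_envs_per_month: int, cost_per_env: float, months: int) -> list[float]:
    """Return each month's bill when nothing created earlier is ever cleaned up."""
    bills = []
    live_envs = 0
    for _ in range(months):
        live_envs += new_envs_per_month  # earlier "temporary" environments are still running
        bills.append(live_envs * cost_per_env)
    return bills

if __name__ == "__main__":
    bills = monthly_bills(new_envs_per_month=2, cost_per_env=500.0, months=12)
    print(bills[0], bills[-1], sum(bills))
```

With these made-up inputs, the twelfth month costs 12,000 and the year totals 78,000: the monthly bill grows linearly while cumulative spend grows quadratically.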
And the standard enterprise response—reminders—never works.
“Be more cost conscious.”
“Please tag your resources.”
“Here’s the monthly review.”
Humans are not a control plane.
Azure Resource Manager is.
If identity, policy, hierarchy, and permissions do not encode financial intent, then the organization is operating a distributed spending engine with no brakes.
By the time finance sees the invoice, the cost is no longer a decision. It’s debt.
FinOps Implemented Backwards: Tooling First, Control Never
Most FinOps programs start with visibility.
Cost Management. Dashboards. Power BI. Monthly reviews. Budgets that email at 90%.
Everyone feels responsible. Nobody is constrained.
Observability is not governance.
A report does not stop a pipeline from deploying a premium tier on Tuesday morning. A dashboard does not prevent a subscription from being created without ownership. A budget email does not force a decision.
Alerts without authority become noise. Engineers learn the real policy: nothing happens.
So FinOps becomes cost theater: prettier reports, longer decks, more sophisticated arguments — and unchanged platform behavior.
The foundational mistake is treating cost like telemetry instead of authorization.
Azure doesn’t spend money.
Your permission model does.
The Reframe: Every Cloud Dollar Is an Authorization Decision
Before a dollar appears on an invoice, something had to be created, scaled, or left running. Before that could happen, the platform evaluated an allow/deny decision.
RBAC allowed it.
Policy allowed it.
A subscription boundary absorbed it.
Azure is an authorization compiler.
Intent is submitted through pipelines, templates, or portal clicks. The control plane evaluates that intent. If it passes, the platform materializes capacity that burns money every hour until something stops it.
If you want cost control, you don’t need better dashboards.
You need tighter compilation rules.
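The "authorization compiler" framing can be sketched as a pure function: a deployment request goes in, and a deterministic allow/deny with reasons comes out. This is a minimal illustration, not how Azure Resource Manager or Azure Policy is implemented; `DeploymentRequest`, `compile_intent`, and the allowlists are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class DeploymentRequest:
    """Hypothetical, simplified view of what a pipeline or portal click submits."""
    identity: str     # recorded for audit; not evaluated in this sketch
    environment: str  # e.g. "dev" or "prod"
    sku: str
    region: str
    tags: dict = field(default_factory=dict)

# Hypothetical compilation rules: financial intent per environment.
ALLOWED_SKUS = {
    "dev": {"Standard_B2s", "Standard_D2s_v5"},
    "prod": {"Standard_D2s_v5", "Standard_D4s_v5", "Standard_E8s_v5"},
}
ALLOWED_REGIONS = {"westeurope", "northeurope"}
REQUIRED_TAGS = ("owner", "costCenter")

def compile_intent(req: DeploymentRequest) -> tuple[bool, list[str]]:
    """Return (allowed, reasons). Same input, same answer: no hidden exceptions."""
    reasons = []
    missing = [t for t in REQUIRED_TAGS if not req.tags.get(t)]
    if missing:
        reasons.append("missing tags: " + ", ".join(missing))
    if req.sku not in ALLOWED_SKUS.get(req.environment, set()):
        reasons.append(f"SKU {req.sku} not allowed in {req.environment}")
    if req.region not in ALLOWED_REGIONS:
        reasons.append(f"region {req.region} not allowed")
    return (not reasons, reasons)

if __name__ == "__main__":
    req = DeploymentRequest("pipeline-app1", "dev", "Standard_E8s_v5", "westeurope", {"owner": "team-a"})
    print(compile_intent(req))
```

A dev deployment of `Standard_E8s_v5` carrying only an `owner` tag is denied for two reasons (missing `costCenter`, disallowed SKU). The same request always gets the same answer, which is the deterministic property the text argues for.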
Anonymous spending is the most dangerous anti-pattern. Not because Azure lacks logs, but because the enterprise cannot map spend to a responsible decision-maker in time to intervene.
If accountability is optional, cost control will never scale.
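Anonymous spend, in this sense, is a join that fails: a cost line that cannot be resolved to a current, accountable person. A minimal sketch, assuming hypothetical cost rows of `(resource_id, cost, tags)` and a set of people still in the directory (neither is a real export schema):

```python
def find_anonymous_spend(rows, active_people):
    """Split spend into per-owner totals and 'anonymous' lines.

    A line is anonymous if it has no owner tag, or if the tagged owner is no
    longer in the directory (the "owner left the company" case).
    """
    owned = {}
    anonymous = []
    for resource_id, cost, tags in rows:
        owner = tags.get("owner")
        if owner and owner in active_people:
            owned[owner] = owned.get(owner, 0.0) + cost
        else:
            anonymous.append((resource_id, cost, owner or "<none>"))
    return owned, anonymous

if __name__ == "__main__":
    rows = [
        ("vm-web-01", 100.0, {"owner": "alice"}),
        ("sql-legacy", 250.0, {}),
        ("vm-batch-07", 50.0, {"owner": "bob"}),
    ]
    owned, anonymous = find_anonymous_spend(rows, active_people={"alice"})
    print(owned)
    print(anonymous)
```

Both failure modes land in the same list on purpose: from a cost-control perspective, a departed owner and a missing owner are equally unable to intervene.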
Subscriptions: The Primary Cost Governance Boundary
Subscriptions are not just billing buckets.
They are where RBAC, policy enforcement, and financial blast radius meet.
Resource groups cannot protect you from sprawl.
Management groups are necessary but too abstract for day-to-day accountability.
The subscription is where ownership becomes enforceable.
A subscription should not exist without declared intent:
- A named accountable owner
- A budget with early thresholds
- Allowed SKUs and regions aligned to purpose
- An escalation path when behavior deviates
Subscription sprawl without ownership is how cost becomes weather — unpleasant, inevitable, and nobody’s fault.
A subscription vending model doesn’t prevent spending.
It prevents unreviewed spending.
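A vending gate is essentially a validation step that refuses to create a subscription until the declared intent is complete. The sketch is hypothetical: `SubscriptionRequest` and `vet_request` are illustrative names, and the 50% first-threshold rule borrows the early thresholds recommended for budgets in this episode.

```python
from dataclasses import dataclass

@dataclass
class SubscriptionRequest:
    """Hypothetical vending request; field names are illustrative only."""
    name: str
    owner: str
    monthly_budget: float
    alert_thresholds: tuple      # percentages, e.g. (50, 75, 90)
    allowed_skus: frozenset
    allowed_regions: frozenset
    escalation_contact: str

def vet_request(req: SubscriptionRequest) -> list[str]:
    """Return reasons to reject; an empty list means the subscription may be created."""
    problems = []
    if not req.owner:
        problems.append("no accountable owner")
    if req.monthly_budget <= 0:
        problems.append("no budget")
    if not req.alert_thresholds or min(req.alert_thresholds) > 50:
        problems.append("first alert threshold must be 50% or earlier")
    if not req.allowed_skus or not req.allowed_regions:
        problems.append("SKU and region allowlists are required")
    if not req.escalation_contact:
        problems.append("no escalation path")
    return problems
```

The gate does not judge whether the spend is worthwhile; it only guarantees that every subscription starts life with an owner, a budget, constraints, and somewhere to escalate.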
Tagging Fails Because It’s Treated as Etiquette
Tagging is not manners.
Tagging is financial identity.
When tags are optional:
- Allocation collapses
- Ownership becomes unverifiable
- Deletion becomes politically risky
- Untagged resources survive forever
Enterprises don’t get “slightly imperfect tagging.”
They get unusable data.
If a tag is required for allocation, it must be required for deployment. That means Deny or Modify at the policy layer.
Enforcement changes behavior instantly. Not because people improve — because ambiguity is no longer allowed to exist.
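In Azure Policy terms, "required for deployment" means a rule with a `deny` effect. The sketch builds the `mode` and `policyRule` portion of a policy definition as a Python dict; the rule shape follows the built-in "Require a tag on resources" definition, generalized to several tags with `anyOf`. The tag names are examples, and assignment and parameterization are left out.

```python
import json

def require_tags_policy(tag_names):
    """Policy rule that denies any resource missing at least one required tag."""
    missing_any = [{"field": f"tags['{name}']", "exists": "false"} for name in tag_names]
    return {
        "mode": "Indexed",  # only evaluate resource types that support tags and location
        "policyRule": {
            "if": {"anyOf": missing_any},
            "then": {"effect": "deny"},
        },
    }

if __name__ == "__main__":
    print(json.dumps(require_tags_policy(["owner", "costCenter"]), indent=2))
```

A `modify` effect can add a missing tag instead of denying, but it requires role definitions in the rule and a managed identity on the assignment; that variant is not shown.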
With enforced tagging:
- Unallocated spend shrinks toward zero
- Showback becomes boring
- Chargeback becomes possible
- Deletion becomes defensible
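Once tags are enforced, showback is a plain group-by, and "unallocated spend shrinks toward zero" becomes a number you can track. A minimal sketch over hypothetical `(cost, tags)` rows; the `costCenter` tag name and the `UNALLOCATED` bucket are assumptions:

```python
from collections import defaultdict

def showback(rows, tag="costCenter"):
    """Sum cost per cost-center tag; untagged spend lands in an explicit bucket."""
    totals = defaultdict(float)
    for cost, tags in rows:
        totals[tags.get(tag, "UNALLOCATED")] += cost
    return dict(totals)

def unallocated_share(totals):
    """Fraction of total spend that nobody can be shown the bill for."""
    total = sum(totals.values())
    return totals.get("UNALLOCATED", 0.0) / total if total else 0.0

if __name__ == "__main__":
    rows = [(300.0, {"costCenter": "cc-100"}), (100.0, {}), (600.0, {"costCenter": "cc-200"})]
    totals = showback(rows)
    print(totals, unallocated_share(totals))
```

Tracked month over month, the unallocated share is a direct measure of whether tag enforcement is actually working.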
Determinism replaces debate.
Cost Failure Scenarios That Repeat Everywhere
Subscription Sprawl
Ad-hoc subscriptions with no owner, no budget, inconsistent policies. Cost spikes are discovered at invoice time, followed by archaeology and inaction.
Untagged Resources
Nobody knows who owns them. Nobody deletes them. Everyone keeps paying.
PaaS Over-Provisioning
Premium tiers and multi-region redundancy chosen by default because safety is rewarded more than efficiency.
Unbounded Non-Production
Dev and test become cost landfills: idle environments, excessive diagnostics, parallel stacks nobody dares delete.
Shared Platform Services
Networking, logging, and security costs grow invisibly because everyone depends on them but nobody feels them.
Each scenario shares the same root cause:
Unenforced intent.
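Constraining SKUs by environment, one answer to over-provisioning and unbounded non-production, can use the same policy shape as the built-in "Allowed virtual machine size SKUs" definition, which checks the `Microsoft.Compute/virtualMachines/sku.name` alias. A sketch that builds one rule per environment; the SKU lists are hypothetical, and PaaS services such as databases would need their own aliases, which are not shown here:

```python
import json

def allowed_vm_skus_policy(allowed_skus):
    """Deny virtual machines whose size is not on the allowlist."""
    return {
        "mode": "Indexed",
        "policyRule": {
            "if": {
                "allOf": [
                    {"field": "type", "equals": "Microsoft.Compute/virtualMachines"},
                    {
                        "not": {
                            "field": "Microsoft.Compute/virtualMachines/sku.name",
                            "in": list(allowed_skus),
                        }
                    },
                ]
            },
            "then": {"effect": "deny"},
        },
    }

# Hypothetical per-environment allowlists; each rule would be assigned at the matching scope.
ENVIRONMENT_SKUS = {
    "dev": ["Standard_B2s", "Standard_D2s_v5"],
    "prod": ["Standard_D2s_v5", "Standard_D4s_v5", "Standard_E8s_v5"],
}

if __name__ == "__main__":
    for env, skus in ENVIRONMENT_SKUS.items():
        print(env, json.dumps(allowed_vm_skus_policy(skus)))
```

Keeping the rule identical and varying only the list per environment means "dev is cheaper than prod" is enforced by assignment scope rather than by reminder.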
Budgets: Intent Signals, Not Household Trackers
Budgets do not stop spend.
They signal divergence between intent and reality.
Budgets only work when:
- They are attached to ownership boundaries, not mailboxes
- They fire early (50% / 75%, not 90%)
- They trigger action, not email
A fired budget alert is not failure.
It’s a governance interrupt.
It might signal drift, growth, or fraud.
Budgets tell you when to look — early enough to act.
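"Trigger action, not email" amounts to mapping each threshold to a different response and firing each one only once. Azure budgets can notify action groups as well as email addresses; the sketch models only the decision logic, and the threshold ladder and action names are hypothetical:

```python
# Hypothetical escalation ladder: (percent of budget, action to take).
ESCALATION = [
    (50, "notify_owner"),
    (75, "open_review_ticket"),
    (90, "require_exception_for_new_deployments"),
    (100, "escalate_to_leadership"),
]

def actions_for(spend: float, budget: float, already_fired: set) -> list[str]:
    """Return actions for thresholds newly crossed; records them so each fires once."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    pct = 100.0 * spend / budget
    new_actions = []
    for threshold, action in ESCALATION:
        if pct >= threshold and threshold not in already_fired:
            already_fired.add(threshold)  # mutates caller's state on purpose
            new_actions.append(action)
    return new_actions
```

Each rung hands the problem to someone with more authority, which is what separates a governance interrupt from another message in a full mailbox.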
Accountability: Showback, Chargeback, and the Real Point
Showback builds trust in the data.
Chargeback creates consequence.
Neither works without governance upstream.
Chargeback without enforced ownership turns chaos into internal invoices. Showback without enforcement becomes wallpaper.
The only thing that matters is this:
Does the cost signal reach the person who made the decision — fast enough to change the next one?
Accountability must follow decision rights:
- If engineering chooses SKUs, engineering must see the cost
- If the platform sets defaults, the platform owns the economics
- If leadership demands resilience, leadership accepts the price
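Routing cost by decision right needs one extra attribute on each cost line: which kind of decision produced it. How that attribute is captured (a tag, a deployment stamp, a naming convention) is an assumption here, as are the category names; the sketch only shows the routing:

```python
# Hypothetical mapping from the kind of decision to the party that must see its cost.
DECISION_OWNERS = {
    "sku_choice": "engineering",
    "platform_default": "platform",
    "resilience_mandate": "leadership",
}

def route_cost_signal(line_items):
    """Group (cost, decision_kind) lines by decision owner; unknown kinds go to 'unrouted'."""
    routed = {}
    for cost, decision in line_items:
        party = DECISION_OWNERS.get(decision, "unrouted")
        routed[party] = routed.get(party, 0.0) + cost
    return routed
```

A non-zero "unrouted" total is itself a finding: spend whose decision-maker the platform never recorded.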
The Enforcement Stack That Actually Works
Cost discipline survives only when enforced in the control plane:
- Azure Policy to deny ambiguity and constrain SKUs, regions, and tags
- RBAC to ensure deploy authority and accountability are not merged blindly
- Budgets as escalation engines, not reports
- Deployment stamps to standardize cost behavior by default
- Exceptions that are visible, justified, and time-boxed
This is not restriction for its own sake.
It is enforced autonomy.
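A time-boxed exception is a record that cannot be created without a justification, an approver, and an expiry. Azure Policy exemptions, for example, carry an optional expiration date. The sketch is a hypothetical in-house registry rule, not an Azure API; `SpendException`, `grant`, and the 90-day cap are illustrative:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_DURATION = timedelta(days=90)  # hypothetical cap on any single exception

@dataclass
class SpendException:
    """Hypothetical record of an approved deviation from a cost policy."""
    scope: str
    rule: str
    justification: str
    approved_by: str
    expires_on: datetime

    def is_active(self, now: datetime) -> bool:
        return now < self.expires_on

def grant(scope, rule, justification, approved_by, days, now):
    """Refuse exceptions that are unjustified, unapproved, or effectively open-ended."""
    if not justification or not approved_by:
        raise ValueError("exceptions must be justified and approved")
    duration = timedelta(days=days)
    if duration > MAX_DURATION:
        raise ValueError("exceptions longer than the maximum duration are not allowed")
    return SpendException(scope, rule, justification, approved_by, now + duration)

if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    exc = grant("/subscriptions/example", "allowed-vm-skus", "load test", "alice", 30, now)
    print(exc.expires_on, exc.is_active(now))
```

Expiry is the important design choice: an exception that lapses by default has to be re-justified, while one without an end date quietly becomes the new policy.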
A Practical 90-Day Rollout
Days 1–30
Define subscription intent, ownership, and a minimal financial tagging taxonomy. Start showback to expose allocation gaps.
Days 31–60
Enforce tagging and SKU constraints with policy. Implement subscription-level budgets with early thresholds wired to action.
Days 61–90
Formalize exception workflows. Make shared platform spend accountable. Introduce guarded deployment patterns with known cost behavior.
The output is intentionally boring: fewer surprises, fewer arguments, faster intervention.
Closing Thought
Cloud gets expensive when unbounded choice meets zero accountability.
If you want predictable spend, stop treating FinOps like reporting and start enforcing financial intent where decisions are made: identity, policy, hierarchy, and permissions.
Savings are a byproduct.
Control is the goal.
If you’re done paying for ambiguity, this is where it stops.
1
00:00:00,000 --> 00:00:02,120
Most organizations think Azure gets expensive
2
00:00:02,120 --> 00:00:03,960
because engineers waste money.
3
00:00:03,960 --> 00:00:04,800
They are wrong.
4
00:00:04,800 --> 00:00:07,080
Azure gets expensive because the platform is allowed
5
00:00:07,080 --> 00:00:09,040
to spend without an owner, without limits,
6
00:00:09,040 --> 00:00:10,280
and without consequences.
7
00:00:10,280 --> 00:00:11,600
That isn't a savings problem.
8
00:00:11,600 --> 00:00:12,720
It's cost entropy.
9
00:00:12,720 --> 00:00:15,000
Drift created by unowned deployment pathways
10
00:00:15,000 --> 00:00:16,920
that keep producing recurring spend
11
00:00:16,920 --> 00:00:19,360
long after the original decision got forgotten.
12
00:00:19,360 --> 00:00:21,680
This episode isn't dashboards, savings hacks,
13
00:00:21,680 --> 00:00:23,160
or spot VM folklore.
14
00:00:23,160 --> 00:00:26,080
It's the uncomfortable shift from "why is Azure expensive?"
15
00:00:26,080 --> 00:00:27,600
to the only question that matters:
16
00:00:27,600 --> 00:00:29,800
what did you allow, and why can nobody stop it?
17
00:00:30,320 --> 00:00:32,080
The enterprise cost failure mode:
18
00:00:32,080 --> 00:00:33,760
unowned spend becomes normal.
19
00:00:33,760 --> 00:00:36,200
Cost overruns don't show up as one dramatic mistake.
20
00:00:36,200 --> 00:00:37,480
They show up as a new normal,
21
00:00:37,480 --> 00:00:39,400
a temporary environment that never gets deleted
22
00:00:39,400 --> 00:00:41,600
because nobody can prove it's safe to delete.
23
00:00:41,600 --> 00:00:43,800
A premium SKU chosen for safety,
24
00:00:43,800 --> 00:00:46,120
because the engineer is accountable for outages,
25
00:00:46,120 --> 00:00:47,280
not for invoices.
26
00:00:47,280 --> 00:00:48,800
Silent egress during a migration
27
00:00:48,800 --> 00:00:50,680
because the network path changed,
28
00:00:50,680 --> 00:00:53,040
the data moved, and the bill kept arriving.
29
00:00:53,040 --> 00:00:54,760
None of these are exotic failures.
30
00:00:54,760 --> 00:00:57,600
They are the default behavior of a large Azure estate
31
00:00:57,600 --> 00:00:59,200
when intent is not enforced.
32
00:01:00,120 --> 00:01:01,160
Here's what most people miss.
33
00:01:01,160 --> 00:01:03,760
Every one of those outcomes is locally rational.
34
00:01:03,760 --> 00:01:05,440
The engineer wants a stable deployment,
35
00:01:05,440 --> 00:01:07,000
so they select the higher tier.
36
00:01:07,000 --> 00:01:08,240
The team wants velocity,
37
00:01:08,240 --> 00:01:09,560
so they clone the environment
38
00:01:09,560 --> 00:01:11,240
and come back later to clean up.
39
00:01:11,240 --> 00:01:13,320
The platform team wants to unblock delivery,
40
00:01:13,320 --> 00:01:16,200
so they grant broad permissions temporarily.
41
00:01:16,200 --> 00:01:18,400
Each decision makes sense in isolation,
42
00:01:18,400 --> 00:01:20,720
but the enterprise doesn't pay for isolated decisions.
43
00:01:20,720 --> 00:01:22,600
The enterprise pays for the aggregate,
44
00:01:22,600 --> 00:01:24,200
and that aggregate becomes chaos
45
00:01:24,200 --> 00:01:27,000
because cloud cost is not additive in the way leaders imagine.
46
00:01:27,000 --> 00:01:27,840
It is compounding.
47
00:01:27,840 --> 00:01:29,600
It accumulates from recurring resources,
48
00:01:29,600 --> 00:01:32,360
from idle capacity, from just in case redundancy,
49
00:01:32,360 --> 00:01:34,960
from shared services nobody can allocate,
50
00:01:34,960 --> 00:01:38,360
and from the quiet truth that Azure is a permissioned system.
51
00:01:38,360 --> 00:01:42,440
If something exists, some identity was allowed to create it.
52
00:01:42,440 --> 00:01:45,000
This is the part that should sound familiar to security people.
53
00:01:45,000 --> 00:01:46,280
Security drift doesn't happen
54
00:01:46,280 --> 00:01:48,360
because everyone suddenly forgets security.
55
00:01:48,360 --> 00:01:50,440
It happens because exceptions accumulate.
56
00:01:50,440 --> 00:01:52,600
A conditional access policy gets an exclusion
57
00:01:52,600 --> 00:01:54,160
for this service account.
58
00:01:54,160 --> 00:01:56,320
An RBAC role gets a temporary owner.
59
00:01:56,320 --> 00:01:58,280
A firewall rule gets a one-day opening
60
00:01:58,280 --> 00:01:59,800
that survives three quarters.
61
00:01:59,800 --> 00:02:02,120
Over time, the system stops behaving deterministically
62
00:02:02,120 --> 00:02:04,200
and starts behaving probabilistically.
63
00:02:04,200 --> 00:02:05,520
Cost follows the same physics.
64
00:02:05,520 --> 00:02:08,160
If the platform allows teams to create resources
65
00:02:08,160 --> 00:02:09,680
without ownership metadata,
66
00:02:09,680 --> 00:02:10,840
without budget boundaries,
67
00:02:10,840 --> 00:02:12,800
and without constrained SKU choices,
68
00:02:12,800 --> 00:02:14,280
then drift is not a risk.
69
00:02:14,280 --> 00:02:15,240
Drift is guaranteed.
70
00:02:15,240 --> 00:02:18,040
You are operating a cost system with no memory of intent.
71
00:02:18,040 --> 00:02:20,240
That distinction matters.
72
00:02:20,240 --> 00:02:24,000
The typical enterprise response is predictable: reminders.
73
00:02:24,000 --> 00:02:25,880
We need to be more cost conscious.
74
00:02:25,880 --> 00:02:27,320
Please tag your resources.
75
00:02:27,320 --> 00:02:29,400
Here's the monthly cost review deck.
76
00:02:29,400 --> 00:02:32,000
That approach feels mature because it looks organized,
77
00:02:32,000 --> 00:02:33,720
but awareness does not constrain behavior.
78
00:02:33,720 --> 00:02:34,560
It never did.
79
00:02:34,560 --> 00:02:36,800
Reminders don't close deployment pathways.
80
00:02:36,800 --> 00:02:37,920
They don't stop a pipeline
81
00:02:37,920 --> 00:02:40,120
from deploying a premium database tier.
82
00:02:40,120 --> 00:02:42,720
They don't prevent a team from creating yet another subscription
83
00:02:42,720 --> 00:02:44,400
because procurement takes too long.
84
00:02:44,400 --> 00:02:46,480
They don't shut down an abandoned environment
85
00:02:46,480 --> 00:02:47,320
on Friday night.
86
00:02:47,320 --> 00:02:48,760
Humans are not a control plane.
87
00:02:48,760 --> 00:02:50,840
The platform is. Azure Resource Manager is.
88
00:02:50,840 --> 00:02:52,920
RBAC is. Azure Policy is.
89
00:02:52,920 --> 00:02:54,960
Subscription boundaries are.
90
00:02:54,960 --> 00:02:56,960
Those are the things that decide what can exist
91
00:02:56,960 --> 00:02:57,800
and what cannot.
92
00:02:57,800 --> 00:03:00,200
If those layers do not encode financial intent,
93
00:03:00,200 --> 00:03:01,760
then the enterprise is basically
94
00:03:01,760 --> 00:03:03,480
running a distributed spending engine
95
00:03:03,480 --> 00:03:05,040
with no enforcement mechanism.
96
00:03:05,040 --> 00:03:07,080
So define the failure mode precisely.
97
00:03:07,080 --> 00:03:10,360
Unowned spend becomes normal because the system tolerates it.
98
00:03:10,360 --> 00:03:12,920
It tolerates resources that can't be attributed to a product,
99
00:03:12,920 --> 00:03:15,280
a cost center, or a named owner.
100
00:03:15,280 --> 00:03:16,800
It tolerates platform spend,
101
00:03:16,800 --> 00:03:19,720
smeared across shared subscriptions where nobody feels it.
102
00:03:19,720 --> 00:03:21,800
It tolerates environments that outlive the sprint
103
00:03:21,800 --> 00:03:23,080
they were created for.
104
00:03:23,080 --> 00:03:26,440
It tolerates premium defaults because nothing in the platform
105
00:03:26,440 --> 00:03:27,920
says prove you need this.
106
00:03:27,920 --> 00:03:30,440
And then eventually finance sees the invoice.
107
00:03:30,440 --> 00:03:32,440
By that point, the spend is no longer a decision.
108
00:03:32,440 --> 00:03:33,280
It's debt.
109
00:03:33,280 --> 00:03:34,280
The service is running.
110
00:03:34,280 --> 00:03:35,720
The stakeholders are attached.
111
00:03:35,720 --> 00:03:37,360
The architecture has formed around it.
112
00:03:37,360 --> 00:03:39,240
Turning it off is now a risk discussion,
113
00:03:39,240 --> 00:03:40,520
not a cost discussion.
114
00:03:40,520 --> 00:03:42,360
That's why invoice time escalation fails.
115
00:03:42,360 --> 00:03:44,400
It's always late and it's always political.
116
00:03:44,400 --> 00:03:46,600
Cost entropy is the name for that trap.
117
00:03:46,600 --> 00:03:49,160
It is unmanaged pathways that generate recurring spend
118
00:03:49,160 --> 00:03:50,720
without decision review.
119
00:03:50,720 --> 00:03:54,800
It is the gradual conversion of cost control from a deterministic model
120
00:03:54,800 --> 00:03:57,880
where spending happens because someone explicitly intended it
121
00:03:57,880 --> 00:03:59,440
into a probabilistic one,
122
00:03:59,440 --> 00:04:03,760
where spending happens because the platform is allowed to do whatever it can.
123
00:04:03,760 --> 00:04:06,120
And if you're wondering why waste cleanup never seems to finish,
124
00:04:06,120 --> 00:04:08,200
this is why: you are chasing symptoms
125
00:04:08,200 --> 00:04:10,920
after the authorization decision already happened.
126
00:04:10,920 --> 00:04:12,560
The uncomfortable truth is simple.
127
00:04:12,560 --> 00:04:15,840
The enterprise cost failure mode is not the existence of waste.
128
00:04:15,840 --> 00:04:17,840
It's the absence of enforceable ownership.
129
00:04:17,840 --> 00:04:20,960
Waste is just what unowned systems produce at scale.
130
00:04:20,960 --> 00:04:23,720
And that's why most enterprises start FinOps backwards.
131
00:04:23,720 --> 00:04:27,280
They start with visibility tools, dashboards and reports,
132
00:04:27,280 --> 00:04:29,280
and then wonder why behavior doesn't change.
133
00:04:29,280 --> 00:04:31,160
Visibility doesn't enforce intent.
134
00:04:31,160 --> 00:04:32,320
Governance does.
135
00:04:32,320 --> 00:04:35,800
FinOps implemented backwards: tooling first, governance never.
136
00:04:35,800 --> 00:04:40,000
Most enterprises do FinOps the same way they do security awareness.
137
00:04:40,000 --> 00:04:43,720
They buy tooling, build dashboards, schedule a review meeting,
138
00:04:43,720 --> 00:04:46,200
and then act surprised when behavior doesn't change.
139
00:04:46,200 --> 00:04:47,960
The usual sequence is almost scripted,
140
00:04:47,960 --> 00:04:52,280
first, enable Azure Cost Management, then build reports,
141
00:04:52,280 --> 00:04:57,360
then export to Power BI, then argue about amortization, reservations,
142
00:04:57,360 --> 00:05:01,200
and whether the spend should be grouped by resource group, subscription or tag.
143
00:05:01,200 --> 00:05:04,840
Somewhere in the middle, someone adds an email alert at 90% of budget.
144
00:05:04,840 --> 00:05:06,280
Everyone feels responsible.
145
00:05:06,280 --> 00:05:07,960
Nobody is constrained.
146
00:05:07,960 --> 00:05:09,160
That distinction matters.
147
00:05:09,160 --> 00:05:10,720
Observability is not governance.
148
00:05:10,720 --> 00:05:12,400
Observability tells you what happened.
149
00:05:12,400 --> 00:05:14,400
Governance decides what can happen.
150
00:05:14,400 --> 00:05:17,440
FinOps implemented backwards confuses the two and calls it progress.
151
00:05:17,440 --> 00:05:20,040
This is why so many FinOps programs turn into cost theatre.
152
00:05:20,040 --> 00:05:22,240
The reports get prettier, the decks get longer,
153
00:05:22,240 --> 00:05:24,400
the conversations get more sophisticated,
154
00:05:24,400 --> 00:05:27,440
but the platform remains permissive, so the spend keeps happening,
155
00:05:27,440 --> 00:05:30,280
and the FinOps team becomes a translation layer
156
00:05:30,280 --> 00:05:33,080
between invoices and engineers who never had to feel the cost decision
157
00:05:33,080 --> 00:05:34,480
in the moment it was made.
158
00:05:34,480 --> 00:05:36,680
Here's the uncomfortable behavior pattern that follows.
159
00:05:36,680 --> 00:05:37,840
Alerts become noise.
160
00:05:37,840 --> 00:05:41,040
Budget alert hits, email goes out, nobody responds.
161
00:05:41,040 --> 00:05:42,520
Not because people are lazy,
162
00:05:42,520 --> 00:05:46,000
but because the alert is not attached to an owner with authority and consequence.
163
00:05:46,000 --> 00:05:47,600
The budget doesn't change anything.
164
00:05:47,600 --> 00:05:50,280
It doesn't block a deployment, it doesn't require an exception,
165
00:05:50,280 --> 00:05:52,200
it doesn't trigger escalation with teeth.
166
00:05:52,200 --> 00:05:56,600
It just creates another message in a mailbox already full of messages that sound urgent.
167
00:05:56,600 --> 00:05:58,520
And when alerts don't trigger action,
168
00:05:58,520 --> 00:06:00,360
engineers learn the real policy.
169
00:06:00,360 --> 00:06:01,760
Ignore it.
170
00:06:01,760 --> 00:06:04,600
That is how cost entropy becomes a culture problem.
171
00:06:04,600 --> 00:06:06,280
Not because the people are irresponsible,
172
00:06:06,280 --> 00:06:08,040
but because the system trains them
173
00:06:08,040 --> 00:06:10,040
that nothing happens when you exceed intent.
174
00:06:10,040 --> 00:06:12,600
The platform keeps running, the invoice arrives later.
175
00:06:12,600 --> 00:06:13,880
Somebody else argues about it.
176
00:06:13,880 --> 00:06:16,640
FinOps tooling is good at telling you where the money went.
177
00:06:16,640 --> 00:06:18,960
It is structurally bad at preventing the next dollar,
178
00:06:18,960 --> 00:06:22,160
unless you connect it to controls that shape deployment pathways.
179
00:06:22,160 --> 00:06:26,960
Most organizations don't; they treat cost tooling as the control plane when it's just telemetry.
180
00:06:26,960 --> 00:06:29,760
And nowhere does that failure hide better than shared services.
181
00:06:29,760 --> 00:06:32,360
Shared services is where cost accountability goes to die.
182
00:06:32,360 --> 00:06:34,600
Networking, logging, monitoring, security tooling,
183
00:06:34,600 --> 00:06:39,960
egress, private endpoints, everything that platform teams deploy in the name of standardization and safety.
184
00:06:39,960 --> 00:06:43,400
It's also the perfect place for the organization to stop asking who owns spend
185
00:06:43,400 --> 00:06:44,840
because the answer is uncomfortable.
186
00:06:44,840 --> 00:06:46,920
Nobody owns it, everyone depends on it.
187
00:06:46,920 --> 00:06:50,680
So it becomes central IT spend and central IT becomes a cost sink.
188
00:06:50,680 --> 00:06:54,120
Every application team benefits, but no application team sees a direct bill.
189
00:06:54,120 --> 00:06:57,080
Therefore, nobody has an incentive to question retention, sampling,
190
00:06:57,080 --> 00:07:01,400
SKU tiers, or whether that cross-region log ingestion was actually required.
191
00:07:01,400 --> 00:07:04,040
The system behaves exactly as designed.
192
00:07:04,040 --> 00:07:06,200
Shared costs become invisible costs.
193
00:07:06,200 --> 00:07:10,120
Then finance asks why cloud is expensive and the platform team shows a dashboard.
194
00:07:10,120 --> 00:07:13,640
The foundational mistake is treating the cost problem like a visibility problem.
195
00:07:13,640 --> 00:07:15,880
Visibility is necessary, it is never sufficient.
196
00:07:15,880 --> 00:07:17,800
A dashboard does not create a boundary.
197
00:07:17,800 --> 00:07:19,800
A report does not create a consequence.
198
00:07:19,800 --> 00:07:24,200
A monthly review does not stop a pipeline from deploying a premium tier on Tuesday morning
199
00:07:24,200 --> 00:07:26,440
because the engineer wants to reduce operational risk.
200
00:07:26,440 --> 00:07:30,680
So the FinOps meeting becomes a recurring ritual where everyone agrees something should change
201
00:07:30,680 --> 00:07:33,160
and then the system keeps doing what it's allowed to do.
202
00:07:33,160 --> 00:07:35,400
That's the key phrase, what it's allowed to do.
203
00:07:35,400 --> 00:07:40,200
Because the only place you can reliably change cost behavior at scale is the control plane,
204
00:07:40,200 --> 00:07:42,760
identity, policy, hierarchy and permissions.
205
00:07:42,760 --> 00:07:45,800
Azure doesn't spend money. Your authorization model spends money.
206
00:07:45,800 --> 00:07:49,480
The moment you accept that, the whole tooling first approach looks like putting a
207
00:07:49,480 --> 00:07:51,400
speedometer in a car and calling it braking.
208
00:07:51,400 --> 00:07:53,480
It's useful information, it is not control.
209
00:07:53,480 --> 00:07:56,440
FinOps implemented correctly starts from a different question.
210
00:07:56,440 --> 00:08:00,120
Where is the enterprise allowing spend to occur without explicit intent?
211
00:08:00,120 --> 00:08:03,480
And how does the platform enforce that intent every time?
212
00:08:03,480 --> 00:08:05,080
That means budgets aren't just numbers.
213
00:08:05,080 --> 00:08:06,760
They're signals wired to owners.
214
00:08:06,760 --> 00:08:08,120
Tagging isn't etiquette.
215
00:08:08,120 --> 00:08:09,720
It's enforced metadata.
216
00:08:09,720 --> 00:08:11,320
SKU selection isn't preference.
217
00:08:11,320 --> 00:08:11,960
It's policy.
218
00:08:11,960 --> 00:08:13,560
Subscription creation isn't convenience.
219
00:08:13,560 --> 00:08:15,640
It's a gated act with declared accountability.
220
00:08:15,640 --> 00:08:19,240
In other words, cost isn't a finance artifact you observe after the fact.
221
00:08:19,240 --> 00:08:20,520
It's a control plane outcome.
222
00:08:20,520 --> 00:08:23,160
You either constrained it by design or you didn't.
223
00:08:23,160 --> 00:08:26,280
And once you see cost that way, the next step becomes obvious.
224
00:08:26,280 --> 00:08:29,080
Every cloud dollar is an authorization decision.
225
00:08:29,080 --> 00:08:30,200
The reframe.
226
00:08:30,200 --> 00:08:32,440
Every cloud dollar is an authorization decision.
227
00:08:32,440 --> 00:08:35,800
Here's the reframe that makes everything else painfully obvious.
228
00:08:35,800 --> 00:08:37,720
A cloud bill is not a finance event.
229
00:08:37,720 --> 00:08:40,280
It's a runtime side effect of authorization.
230
00:08:40,280 --> 00:08:41,960
Before a dollar shows up on an invoice,
231
00:08:41,960 --> 00:08:44,920
something had to be created, scaled or left running.
232
00:08:44,920 --> 00:08:46,280
And before that could happen,
233
00:08:46,280 --> 00:08:49,480
the platform evaluated an allow/deny decision somewhere in the graph.
234
00:08:49,480 --> 00:08:51,160
A user, a service principal,
235
00:08:51,160 --> 00:08:53,320
a managed identity, a pipeline,
236
00:08:53,320 --> 00:08:55,160
a landing zone automation account.
237
00:08:55,160 --> 00:08:57,400
Azure didn't get expensive.
238
00:08:57,400 --> 00:08:59,000
Azure did what it was allowed to do.
239
00:08:59,000 --> 00:09:01,720
That distinction matters because it moves the conversation away
240
00:09:01,720 --> 00:09:03,880
from feelings and toward mechanics.
241
00:09:03,880 --> 00:09:05,560
Cost isn't a behavior you inspire.
242
00:09:05,560 --> 00:09:07,000
It's a pathway you permit.
243
00:09:07,000 --> 00:09:08,440
If a resource exists,
244
00:09:08,440 --> 00:09:10,920
some identity had enough permission to create it.
245
00:09:10,920 --> 00:09:13,320
And the hierarchy had enough openness to accept it.
246
00:09:13,320 --> 00:09:14,760
So, basically,
247
00:09:14,760 --> 00:09:17,960
every cloud dollar begins life as an authorization decision.
248
00:09:17,960 --> 00:09:20,920
Most enterprises pretend cost starts in cost management.
249
00:09:20,920 --> 00:09:21,640
It does not.
250
00:09:21,640 --> 00:09:23,960
Cost starts at deploy time and scale time.
251
00:09:23,960 --> 00:09:27,320
Cost starts when the system compiles intent into reality.
252
00:09:27,320 --> 00:09:28,600
RBAC allows the action,
253
00:09:28,600 --> 00:09:32,840
policy allows the configuration and the subscription boundary absorbs the blast radius.
254
00:09:32,840 --> 00:09:35,080
Think of Azure like an authorization compiler.
255
00:09:35,080 --> 00:09:36,360
You write intent as code,
256
00:09:36,360 --> 00:09:39,560
ARM templates, Bicep, Terraform, pipelines, portal clicks.
257
00:09:39,560 --> 00:09:42,200
The control plane evaluates that intent against rules.
258
00:09:42,200 --> 00:09:43,240
If it passes,
259
00:09:43,240 --> 00:09:46,920
the platform materializes capacity that burns money every hour
260
00:09:46,920 --> 00:09:48,360
until something stops it.
261
00:09:48,360 --> 00:09:50,680
If you want cost control, you don't need more visibility.
262
00:09:50,680 --> 00:09:52,520
You need tighter compilation rules.
263
00:09:52,520 --> 00:09:56,680
This is also why anonymous spending is the most dangerous anti-pattern in Azure.
264
00:09:56,680 --> 00:09:59,400
Anonymous spending isn't literally anonymous. Azure logs everything,
265
00:09:59,400 --> 00:10:00,440
billing has line items.
266
00:10:00,440 --> 00:10:05,720
The issue is that the enterprise can't map spend to a responsible decision maker in time to intervene.
267
00:10:05,720 --> 00:10:07,960
The cost is smeared across shared scopes
268
00:10:07,960 --> 00:10:11,320
or resources are created without enforceable ownership metadata
269
00:10:11,320 --> 00:10:13,640
or the owner left the company and the budget stayed.
270
00:10:13,640 --> 00:10:14,920
That's not a reporting gap.
271
00:10:14,920 --> 00:10:18,760
That's an authorization gap because cost control only works when the decision maker
272
00:10:18,760 --> 00:10:20,520
is inside the feedback loop.
273
00:10:20,520 --> 00:10:23,400
If engineering can deploy without owning the financial impact,
274
00:10:23,400 --> 00:10:25,880
you've built a system where accountability is optional.
275
00:10:25,880 --> 00:10:28,200
Optional accountability doesn't survive scale.
276
00:10:28,200 --> 00:10:29,480
Now here's the weird part.
277
00:10:29,480 --> 00:10:31,080
The more exceptions you allow,
278
00:10:31,080 --> 00:10:33,640
the less predictable cost control becomes.
279
00:10:33,640 --> 00:10:36,520
Enterprises love exceptions because they sound pragmatic.
280
00:10:36,520 --> 00:10:38,120
This workload is special.
281
00:10:38,120 --> 00:10:39,240
This team is blocked.
282
00:10:39,240 --> 00:10:40,440
We'll fix it later.
283
00:10:40,440 --> 00:10:45,080
And each exception converts your financial control model from deterministic to probabilistic.
284
00:10:45,080 --> 00:10:47,800
Deterministic means if you try to deploy X,
285
00:10:47,800 --> 00:10:50,200
the platform will deny it unless you meet Y.
286
00:10:50,200 --> 00:10:52,680
Probabilistic means sometimes X is denied,
287
00:10:52,680 --> 00:10:53,880
sometimes it passes,
288
00:10:53,880 --> 00:10:56,040
depending on who asked, what scope they used,
289
00:10:56,040 --> 00:10:57,880
which subscription they found,
290
00:10:57,880 --> 00:10:59,880
which policy is actually assigned,
291
00:10:59,880 --> 00:11:02,520
and which exemption was quietly granted six months ago.
292
00:11:02,520 --> 00:11:05,000
That isn't governance. That's conditional chaos.
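One way to stop exceptions from quietly turning the model probabilistic is to audit them on a schedule. The sketch below assumes exemptions have already been exported into a hypothetical local Exemption type; Azure Policy exemptions do carry an optional expiresOn property, and "optional" is exactly where the drift comes from. The names, scopes, and the 14-day review window are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Exemption:
    # Local stand-in for an exported Azure Policy exemption; only the fields we audit.
    name: str
    scope: str
    expires_on: Optional[datetime]  # expiresOn is optional in Azure, which is the problem


def time_boxed(name: str, scope: str, days: int, now: datetime) -> Exemption:
    """Create an exemption record that lapses after `days` instead of living forever."""
    return Exemption(name, scope, now + timedelta(days=days))


def audit_exemptions(exemptions, now: datetime, review_window_days: int = 14) -> dict:
    """Flag exemptions that never expire and those due to lapse inside the review window."""
    horizon = now + timedelta(days=review_window_days)
    return {
        "never_expires": [e.name for e in exemptions if e.expires_on is None],
        "expiring_soon": [
            e.name for e in exemptions
            if e.expires_on is not None and now < e.expires_on <= horizon
        ],
    }


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
exemptions = [
    Exemption("legacy-gpu-pilot", "/subscriptions/<sub-id>", None),
    time_boxed("load-test-premium-sku", "/subscriptions/<sub-id>", days=7, now=now),
    time_boxed("migration-wave-2", "/subscriptions/<sub-id>", days=60, now=now),
]
print(audit_exemptions(exemptions, now))
```

Anything in never_expires is a standing exception to the deterministic rule and deserves an owner and a date.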
293
00:11:05,000 --> 00:11:07,480
So what is financial intent in architectural terms?
294
00:11:07,480 --> 00:11:09,320
It's not a spreadsheet. It's not a forecast.
295
00:11:09,320 --> 00:11:12,120
It's a set of constraints the platform enforces continuously.
296
00:11:12,120 --> 00:11:16,360
Ownership: every deployable scope has a named accountable party.
297
00:11:16,360 --> 00:11:19,560
Boundaries: budgets and thresholds exist where ownership exists.
298
00:11:19,560 --> 00:11:23,800
Constraints: allowed SKUs, regions and patterns match the environment's purpose.
299
00:11:23,800 --> 00:11:28,040
Escalation: when spend deviates, something happens that is not an email.
300
00:11:28,040 --> 00:11:32,520
Financial intent is the enterprise's decision logic encoded where decisions actually happen.
301
00:11:32,520 --> 00:11:34,840
And once you accept that cost is authorization,
302
00:11:34,840 --> 00:11:36,360
you also accept something else.
303
00:11:36,360 --> 00:11:38,920
FinOps lives with identity, policy,
304
00:11:38,920 --> 00:11:40,520
RBAC and hierarchy.
305
00:11:40,520 --> 00:11:42,280
Not because finance wants to be technical,
306
00:11:42,280 --> 00:11:44,200
but because that's where enforcement lives.
307
00:11:44,200 --> 00:11:45,800
Cost management can tell you what happened.
308
00:11:45,800 --> 00:11:49,800
It can't stop the next deployment. Azure Policy can. RBAC can.
309
00:11:49,800 --> 00:11:51,640
A subscription boundary can.
310
00:11:51,640 --> 00:11:55,880
Exception governance can. This is why the enterprise should stop talking about saving money
311
00:11:55,880 --> 00:11:59,000
and start talking about removing unaudited spending pathways.
312
00:11:59,000 --> 00:12:01,320
The savings are a byproduct; control is the goal.
313
00:12:01,320 --> 00:12:04,040
And if you want one practical implication to hold on to,
314
00:12:04,040 --> 00:12:07,320
if you can't point to the exact boundary where spend is owned and constrained,
315
00:12:07,320 --> 00:12:08,920
you don't have financial governance.
316
00:12:08,920 --> 00:12:10,280
You have financial hope.
317
00:12:10,280 --> 00:12:14,440
So the question becomes, where is the first boundary that actually works at enterprise scale?
318
00:12:14,440 --> 00:12:16,440
It isn't a resource group. It isn't a tag.
319
00:12:16,440 --> 00:12:17,640
It isn't a dashboard.
320
00:12:17,640 --> 00:12:18,840
It's the subscription.
321
00:12:18,840 --> 00:12:21,320
Subscriptions are the primary cost governance boundary.
322
00:12:21,320 --> 00:12:23,640
Most people treat subscriptions as billing buckets.
323
00:12:23,640 --> 00:12:25,160
A place to put workloads.
324
00:12:25,160 --> 00:12:26,760
A line item you can move later.
325
00:12:26,760 --> 00:12:29,000
That mental model is why cost control fails.
326
00:12:29,000 --> 00:12:31,560
A subscription is not primarily a finance construct.
327
00:12:31,560 --> 00:12:34,120
It's a governance boundary where three things collide.
328
00:12:34,120 --> 00:12:35,880
RBAC scope, policy scope,
329
00:12:35,880 --> 00:12:38,040
and a measurable financial blast radius.
330
00:12:38,040 --> 00:12:40,760
It is the first place where you can make ownership real.
331
00:12:40,760 --> 00:12:43,240
Because the platform can attach permissions, budgets,
332
00:12:43,240 --> 00:12:46,360
and policy enforcement to a scope that actually contains damage.
333
00:12:46,360 --> 00:12:48,520
Resource groups don't do that, not reliably.
334
00:12:48,520 --> 00:12:50,440
Resource groups are operational containers.
335
00:12:50,440 --> 00:12:51,720
They help you organize.
336
00:12:51,720 --> 00:12:52,600
They help you deploy.
337
00:12:52,600 --> 00:12:55,400
They do not protect you from a team creating a second resource group
338
00:12:55,400 --> 00:12:58,200
with a different set of tags, a different naming convention,
339
00:12:58,200 --> 00:13:00,360
and a slightly different temporary story.
340
00:13:00,360 --> 00:13:03,560
And they absolutely don't protect you from the oldest enterprise trick.
341
00:13:03,560 --> 00:13:07,800
Burying expensive shared services in a resource group nobody wants to touch.
342
00:13:07,800 --> 00:13:10,040
Management groups are higher level governance.
343
00:13:10,040 --> 00:13:11,560
They're necessary for scale,
344
00:13:11,560 --> 00:13:14,280
but they're not where cost accountability becomes personal.
345
00:13:14,280 --> 00:13:15,960
They're where standards get inherited.
346
00:13:15,960 --> 00:13:18,760
The place where spend becomes owned is lower,
347
00:13:18,760 --> 00:13:21,480
where budgets and permissions map to actual teams.
348
00:13:21,480 --> 00:13:23,160
That's the subscription.
349
00:13:23,160 --> 00:13:25,960
A well-designed subscription is a budget boundary first.
350
00:13:25,960 --> 00:13:27,560
It's the unit where you can say,
351
00:13:27,560 --> 00:13:30,200
this is the maximum financial exposure we will tolerate
352
00:13:30,200 --> 00:13:31,880
for this workload or this team.
353
00:13:31,880 --> 00:13:35,240
And if it exceeds expected behavior, escalation happens immediately.
354
00:13:35,240 --> 00:13:37,400
Not at invoice time, at deviation time.
355
00:13:37,400 --> 00:13:39,400
A subscription is also an RBAC boundary.
356
00:13:39,400 --> 00:13:41,080
If you want to stop anonymous spending,
357
00:13:41,080 --> 00:13:43,560
you need to stop handing out broad Contributor rights at scopes
358
00:13:43,560 --> 00:13:45,240
where nobody can be clearly blamed.
359
00:13:45,240 --> 00:13:47,320
Subscriptions let you define who can deploy,
360
00:13:47,320 --> 00:13:49,080
who can approve, who can see costs,
361
00:13:49,080 --> 00:13:50,360
and who can grant further rights.
362
00:13:50,360 --> 00:13:52,120
That separation matters because otherwise,
363
00:13:52,120 --> 00:13:54,680
the same identity that can create spend can also hide it.
364
00:13:54,680 --> 00:13:57,240
And a subscription is a policy boundary.
365
00:13:57,240 --> 00:13:59,400
Azure Policy assignments at subscription scope
366
00:13:59,400 --> 00:14:01,960
are where enforcement stops being aspirational.
367
00:14:01,960 --> 00:14:03,640
You can deny premium SKUs in dev,
368
00:14:03,640 --> 00:14:06,200
you can restrict regions, you can require tags,
369
00:14:06,200 --> 00:14:07,800
you can force diagnostic settings
370
00:14:07,800 --> 00:14:09,880
that you've decided you're willing to pay for.
371
00:14:09,880 --> 00:14:13,160
You can also carve exceptions with visibility and expiration
372
00:14:13,160 --> 00:14:16,920
instead of letting them live forever as silent entropy generators.
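A minimal sketch of a dev guardrail as an Azure Policy rule, built as a Python dict so it can be serialized into the mode and policyRule parts of a policy definition. The field aliases (type, location, Microsoft.Compute/virtualMachines/sku.name) are standard, but the SKU and region lists are assumptions, and a production definition would normally take them as parameters rather than hard-coding them.

```python
import json


def dev_guardrail_policy(allowed_vm_skus: list, allowed_locations: list) -> dict:
    """Deny VMs outside an allowed SKU list, and any resource outside the allowed regions."""
    return {
        "mode": "Indexed",
        "policyRule": {
            "if": {
                "anyOf": [
                    {
                        "allOf": [
                            {"field": "type", "equals": "Microsoft.Compute/virtualMachines"},
                            {"field": "Microsoft.Compute/virtualMachines/sku.name",
                             "notIn": allowed_vm_skus},
                        ]
                    },
                    {
                        "allOf": [
                            {"field": "location", "notIn": allowed_locations},
                            # Some resources report "global"; excluding it mirrors the
                            # built-in Allowed locations policy.
                            {"field": "location", "notEquals": "global"},
                        ]
                    },
                ]
            },
            "then": {"effect": "deny"},
        },
    }


# Example values only: pick SKUs and regions that match what dev is actually for.
definition = dev_guardrail_policy(
    allowed_vm_skus=["Standard_B2s", "Standard_D2s_v5"],
    allowed_locations=["westeurope", "northeurope"],
)
print(json.dumps(definition, indent=2))
```

Assigned at the dev subscription, this makes the premium-SKU question a deploy-time denial rather than a month-end discovery.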
373
00:14:16,920 --> 00:14:19,320
Now look at the failure mode most enterprises live in.
374
00:14:19,320 --> 00:14:20,600
Subscriptions sprawl.
375
00:14:20,600 --> 00:14:22,120
Subscriptions get created ad hoc.
376
00:14:22,120 --> 00:14:24,280
A team needs a sandbox, so they create one.
377
00:14:24,280 --> 00:14:26,440
And another team needs a POC, so they create one.
378
00:14:26,440 --> 00:14:28,680
The platform team needs to unblock delivery,
379
00:14:28,680 --> 00:14:29,800
so they create one.
380
00:14:29,800 --> 00:14:32,760
Over time, you get dozens or hundreds of subscriptions
381
00:14:32,760 --> 00:14:34,920
with inconsistent policies, inconsistent tagging,
382
00:14:34,920 --> 00:14:37,640
inconsistent permissions, and no coherent budget story.
383
00:14:37,640 --> 00:14:40,040
And when the bill spikes, nobody knows where to look first.
384
00:14:40,040 --> 00:14:42,040
Because the sprawl wasn't just more subscriptions,
385
00:14:42,040 --> 00:14:44,440
it was more unreviewed pathways for spend,
386
00:14:44,440 --> 00:14:46,840
more identities, more places where policies weren't assigned.
387
00:14:47,240 --> 00:14:49,880
More corners where a high tier resource could hide for months.
388
00:14:49,880 --> 00:14:52,520
So the real principle is not use fewer subscriptions.
389
00:14:52,520 --> 00:14:56,280
The principle is, a subscription should not exist without declared intent.
390
00:14:56,280 --> 00:14:58,680
That means subscription creation is not a convenience.
391
00:14:58,680 --> 00:14:59,720
It's a governance event.
392
00:14:59,720 --> 00:15:02,280
It is the moment you decide who owns this,
393
00:15:02,280 --> 00:15:04,840
what it can spend, what it is allowed to deploy,
394
00:15:04,840 --> 00:15:06,440
and what happens when it deviates.
395
00:15:06,440 --> 00:15:07,880
Call it a vending model if you want.
396
00:15:07,880 --> 00:15:08,840
The name doesn't matter.
397
00:15:08,840 --> 00:15:10,280
The enforcement does.
398
00:15:10,280 --> 00:15:13,160
Before a subscription is issued, four things must be true.
399
00:15:13,160 --> 00:15:15,160
First, there is an accountable owner.
400
00:15:15,160 --> 00:15:18,760
A human name, not platform team, not a distribution list,
401
00:15:18,760 --> 00:15:21,800
a role with escalation responsibility when budgets fire.
402
00:15:21,800 --> 00:15:24,120
Second, there is a budget with early thresholds,
403
00:15:24,120 --> 00:15:27,640
not 90% at month end. Early thresholds at 50% and 75%
404
00:15:27,640 --> 00:15:29,880
are governance interrupts, not failure notices.
405
00:15:29,880 --> 00:15:33,080
Third, there are allowed SKUs and regions aligned to purpose.
406
00:15:33,080 --> 00:15:34,040
Dev is not prod.
407
00:15:34,040 --> 00:15:36,440
Non-prod doesn't get premium defaults just in case.
408
00:15:36,440 --> 00:15:40,200
Regions are constrained because global sprawl is both expensive
409
00:15:40,200 --> 00:15:41,400
and operationally chaotic.
410
00:15:41,400 --> 00:15:44,680
Fourth, there is an escalation workflow that actually routes.
411
00:15:45,160 --> 00:15:48,120
If a budget triggers, it creates a ticket, it pages the owner,
412
00:15:48,120 --> 00:15:49,400
it hits the right channel.
413
00:15:49,400 --> 00:15:51,240
Something happens that forces a decision.
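The four preconditions can be checked mechanically before a subscription is issued. This is a sketch of a hypothetical vending-pipeline check, not an Azure feature: the SubscriptionRequest shape, the GROUP_ALIASES deny-list, and the 75% cut-off for an "early" threshold are all assumptions chosen to mirror the list above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical aliases that name a group rather than an accountable person.
GROUP_ALIASES = {"platform-team", "cloud-team", "it-ops", "tbd", "unknown"}


@dataclass
class SubscriptionRequest:
    # Hypothetical shape of a subscription-vending request.
    owner: str                       # accountable person, e.g. "jane.doe@contoso.com"
    monthly_budget: float
    alert_thresholds: list = field(default_factory=list)  # percent of budget
    allowed_skus: list = field(default_factory=list)
    allowed_regions: list = field(default_factory=list)
    escalation_route: str = ""       # e.g. an on-call queue or ticketing target


def vending_preconditions(req: SubscriptionRequest) -> list:
    """Return unmet preconditions; the subscription is issued only when this is empty."""
    failures = []
    local_part = req.owner.split("@")[0].strip().lower()
    if "@" not in req.owner or local_part in GROUP_ALIASES:
        failures.append("owner must be a named, accountable person")
    if req.monthly_budget <= 0 or not any(t <= 75 for t in req.alert_thresholds):
        failures.append("budget needs a positive amount and an early threshold (75% or lower)")
    if not req.allowed_skus or not req.allowed_regions:
        failures.append("allowed SKUs and regions must be declared for this environment")
    if not req.escalation_route.strip():
        failures.append("escalation route must be declared")
    return failures


request = SubscriptionRequest(
    owner="platform-team@contoso.com",
    monthly_budget=5000,
    alert_thresholds=[90, 100],
    allowed_skus=["Standard_D2s_v5"],
    allowed_regions=["westeurope"],
)
for failure in vending_preconditions(request):
    print("blocked:", failure)
```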
414
00:15:51,240 --> 00:15:52,280
This is the point.
415
00:15:52,280 --> 00:15:55,000
Subscriptions turn cost governance from a vague aspiration
416
00:15:55,000 --> 00:15:56,600
into an enforceable boundary.
417
00:15:56,600 --> 00:16:00,200
When you do this, sprawl collapses into an intentional structure.
418
00:16:00,200 --> 00:16:02,920
And that structure is the only thing that lets you scale
419
00:16:02,920 --> 00:16:05,800
Azure without scaling financial chaos,
420
00:16:05,800 --> 00:16:07,400
which brings up the first scenario,
421
00:16:07,400 --> 00:16:09,400
what happens when you don't do any of this.
422
00:16:09,400 --> 00:16:12,200
Scenario one, subscription sprawl with no ownership.
423
00:16:12,200 --> 00:16:14,760
Here's what subscription sprawl looks like in a real enterprise,
424
00:16:14,760 --> 00:16:16,920
not the PowerPoint version, the lived version.
425
00:16:16,920 --> 00:16:21,240
There are dozens of subscriptions because every project needed just one more.
426
00:16:21,240 --> 00:16:25,240
Some were created by Central IT, some by App Teams, some by an MSP,
427
00:16:25,240 --> 00:16:26,920
some by whoever still had the rights.
428
00:16:26,920 --> 00:16:29,160
A few are tied to products, many aren't,
429
00:16:29,160 --> 00:16:32,200
and the ones that aren't become the perfect hiding place for spend,
430
00:16:32,200 --> 00:16:34,920
because ambiguity is a financial shelter.
431
00:16:34,920 --> 00:16:38,520
In the before posture, budgets are either missing or decorative.
432
00:16:38,520 --> 00:16:41,960
Cost management exists, sure, but nobody owns the interpretation.
433
00:16:41,960 --> 00:16:44,040
Tags might exist, but they aren't enforced,
434
00:16:44,040 --> 00:16:48,280
and the billing views are full of resources with blank owners or TBD cost centers.
435
00:16:48,280 --> 00:16:52,040
There's usually at least one shared subscription called something like platform,
436
00:16:52,040 --> 00:16:53,880
connectivity or hub.
437
00:16:53,880 --> 00:16:56,280
And it contains everything expensive and unglamorous:
438
00:16:56,280 --> 00:17:00,120
firewalls, VPN gateways, private endpoints,
439
00:17:00,120 --> 00:17:04,120
log ingestion, cross-region replication, security tooling,
440
00:17:04,120 --> 00:17:06,360
the stuff that always grows and nobody wants to explain.
441
00:17:06,360 --> 00:17:07,880
And here's the operational pattern.
442
00:17:07,880 --> 00:17:10,120
Anomalies are discovered at invoice time.
443
00:17:10,120 --> 00:17:14,040
Finance sees a spike. It escalates to IT. IT forwards it to the cloud team.
444
00:17:14,040 --> 00:17:17,000
The cloud team opens cost analysis and starts doing archaeology:
445
00:17:17,000 --> 00:17:19,400
Who created the resources? Why are they still running?
446
00:17:19,400 --> 00:17:20,920
Which subscription is this even in?
447
00:17:20,920 --> 00:17:22,520
Is this production? Is this a POC?
448
00:17:22,520 --> 00:17:23,560
Is it safe to delete?
449
00:17:23,560 --> 00:17:25,320
The answer is usually nobody knows,
450
00:17:25,320 --> 00:17:26,920
not because the logs are missing,
451
00:17:26,920 --> 00:17:30,040
because ownership was never declared in a way the platform could enforce.
452
00:17:30,040 --> 00:17:33,560
So the system is full of spend that is technically attributable,
453
00:17:33,560 --> 00:17:35,160
but practically unowned.
454
00:17:35,160 --> 00:17:36,360
That's the pathology.
455
00:17:36,360 --> 00:17:38,120
The organization can see cost,
456
00:17:38,120 --> 00:17:41,000
but cannot assign responsibility fast enough to intervene.
457
00:17:41,000 --> 00:17:42,680
So every month becomes the same ritual.
458
00:17:42,680 --> 00:17:45,720
Finance wants a name, engineering wants proof it's safe to change,
459
00:17:45,720 --> 00:17:49,240
the platform team wants to avoid breaking workloads it doesn't own.
460
00:17:49,240 --> 00:17:51,080
And leadership wants the bill to stop growing
461
00:17:51,080 --> 00:17:53,480
without having to learn what a private endpoint is.
462
00:17:53,480 --> 00:17:56,280
Over time, the enterprise adapts in the worst possible way.
463
00:17:56,280 --> 00:17:57,720
It normalizes the unknown.
464
00:17:57,720 --> 00:17:59,320
Cloud is just expensive.
465
00:17:59,320 --> 00:18:00,680
It's probably AI.
466
00:18:00,680 --> 00:18:02,280
It's probably security logging.
467
00:18:02,280 --> 00:18:03,800
It's probably the migration.
468
00:18:03,800 --> 00:18:07,240
The bill becomes weather: unpleasant, inevitable, and nobody's fault.
469
00:18:07,880 --> 00:18:10,280
Now the after posture is not a dashboard upgrade.
470
00:18:10,280 --> 00:18:13,400
It's a subscription-vending model with enforced preconditions.
471
00:18:13,400 --> 00:18:16,360
A subscription cannot be created until it declares intent
472
00:18:16,360 --> 00:18:18,120
in a machine-readable way:
473
00:18:18,120 --> 00:18:18,920
who owns it,
474
00:18:18,920 --> 00:18:20,120
what budget it has,
475
00:18:20,120 --> 00:18:21,640
what environment it is,
476
00:18:21,640 --> 00:18:23,240
what it is allowed to deploy,
477
00:18:23,240 --> 00:18:25,000
and how escalation works.
478
00:18:25,000 --> 00:18:26,440
Ownership is not a wiki page.
479
00:18:26,440 --> 00:18:28,920
It's metadata tied to the subscription itself
480
00:18:28,920 --> 00:18:31,640
and referenced by policy, budget actions, and routing.
481
00:18:31,640 --> 00:18:34,360
This is where budget thresholds stop being polite notifications
482
00:18:34,360 --> 00:18:36,040
and start being governance interrupts.
483
00:18:36,040 --> 00:18:38,360
A budget alert at 50% doesn't mean you're failing.
483
00:18:38,360 --> 00:18:40,760
It means you're deviating from expected behavior early enough
485
00:18:40,760 --> 00:18:42,280
to still have options.
486
00:18:42,280 --> 00:18:45,720
At 75% it escalates harder, not by spamming more people.
487
00:18:45,720 --> 00:18:49,080
By triggering the next step in the enterprise workflow,
488
00:18:49,080 --> 00:18:51,240
a ticket, a routing rule,
489
00:18:51,240 --> 00:18:54,280
an accountable owner who has to either justify the spend,
490
00:18:54,280 --> 00:18:57,320
fix the drift, or request an exception with an expiry.
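A sketch of what "escalation that isn't an email" can look like at the budget level: the request body for a subscription-scoped Microsoft.Consumption/budgets resource, with thresholds at 50%, 75%, and 100% routed to an action group through contactGroups. The field names follow the Consumption budgets schema as we understand it; the action group ID is a placeholder, and wiring that action group to a ticket or page is assumed, not shown. Azure also supports a "Forecasted" threshold type, which is not used here.

```python
import json


def budget_body(amount: float, start_date: str, action_group_id: str,
                thresholds=(50, 75, 100)) -> dict:
    """Request body for a monthly cost budget whose alerts route to an action group."""
    notifications = {
        f"Actual_GreaterThan_{t}_Percent": {
            "enabled": True,
            "operator": "GreaterThan",
            "threshold": t,
            "thresholdType": "Actual",
            "contactEmails": [],
            # The action group, not an inbox, is what opens the ticket or pages the owner.
            "contactGroups": [action_group_id],
        }
        for t in thresholds
    }
    return {
        "properties": {
            "category": "Cost",
            "amount": amount,
            "timeGrain": "Monthly",
            "timePeriod": {"startDate": start_date},
            "notifications": notifications,
        }
    }


def crossed_thresholds(spend: float, amount: float, thresholds=(50, 75, 100)) -> list:
    """Local helper: which thresholds the current spend has already passed."""
    return [t for t in thresholds if spend > amount * t / 100]


body = budget_body(
    amount=5000,
    start_date="2025-01-01T00:00:00Z",
    action_group_id="/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
                    "microsoft.insights/actionGroups/<owner-escalation>",
)
print(json.dumps(body, indent=2))
print(crossed_thresholds(spend=4000, amount=5000))
```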
491
00:18:57,320 --> 00:19:00,360
And yes, this is where the platform team will complain about friction.
492
00:19:00,360 --> 00:19:02,440
Good. Friction is how the system signals
493
00:19:02,440 --> 00:19:03,880
that a decision is happening.
494
00:19:03,880 --> 00:19:05,480
The goal isn't to prevent spending.
495
00:19:05,480 --> 00:19:07,480
The goal is to prevent unreviewed spending.
496
00:19:07,480 --> 00:19:10,680
A subscription-vending model makes spend a conscious act again.
497
00:19:10,680 --> 00:19:13,880
Because it forces the enterprise to answer the questions it avoided,
498
00:19:13,880 --> 00:19:17,080
who owns this, what is it for, and what happens when it grows.
499
00:19:17,080 --> 00:19:18,680
You also get a second order effect
500
00:19:18,680 --> 00:19:20,920
that matters more than savings.
501
00:19:20,920 --> 00:19:23,880
Subscription sprawl collapses into a comprehensible structure.
502
00:19:23,880 --> 00:19:26,440
If every subscription has a named owner, a budget,
503
00:19:26,440 --> 00:19:28,360
and policy constraints aligned to purpose,
504
00:19:28,360 --> 00:19:31,080
then when an anomaly happens, the investigation path is short.
505
00:19:31,080 --> 00:19:32,680
It becomes days, not months.
506
00:19:32,680 --> 00:19:35,240
And the organization stops paying for unknown spend,
507
00:19:35,240 --> 00:19:36,840
simply because it cannot allocate it.
508
00:19:36,840 --> 00:19:39,720
One caveat: don't pretend you can safely invent numbers here.
509
00:19:39,720 --> 00:19:43,320
The measurable outcome isn't "we save 37%."
510
00:19:43,320 --> 00:19:46,200
The measurable outcome is reduction of unknown spend categories,
511
00:19:46,200 --> 00:19:47,800
faster anomaly detection cycles,
512
00:19:47,800 --> 00:19:50,440
and fewer orphaned subscriptions that nobody can defend.
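These outcomes can be tracked without inventing savings figures. A sketch, assuming cost rows exported from a cost report with an owner already resolved per row (the CostRow shape is hypothetical and the sample rows are made up): compute the share of spend with no owner, and list subscriptions where none of the spend is owned.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostRow:
    # Hypothetical row from a cost export after owner resolution.
    subscription: str
    cost: float
    owner: Optional[str]


def unknown_spend_share(rows) -> float:
    """Fraction of total cost that no accountable owner can be mapped to."""
    total = sum(r.cost for r in rows)
    unknown = sum(r.cost for r in rows if not r.owner)
    return unknown / total if total else 0.0


def orphaned_subscriptions(rows) -> list:
    """Subscriptions where none of the spend maps to an owner."""
    owned = {r.subscription for r in rows if r.owner}
    return sorted({r.subscription for r in rows} - owned)


# Made-up sample rows, for illustration only.
rows = [
    CostRow("sub-payments-prod", 1200.0, "jane.doe@contoso.com"),
    CostRow("sub-payments-prod", 300.0, None),
    CostRow("sub-poc-2023", 500.0, None),
]
print(f"unknown spend: {unknown_spend_share(rows):.0%}")
print("orphaned:", orphaned_subscriptions(rows))
```

Tracked month over month, both numbers should trend toward zero as vending preconditions take hold.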
513
00:19:50,440 --> 00:19:52,440
And now the important transition:
514
00:19:52,440 --> 00:19:54,680
ownership alone doesn't solve allocation.
515
00:19:54,680 --> 00:19:57,080
If costs can't be attributed inside the subscription,
516
00:19:57,080 --> 00:19:59,160
down to product, environment, and life cycle,
517
00:19:59,160 --> 00:20:01,240
then subscription ownership becomes a blunt instrument.
518
00:20:01,240 --> 00:20:04,280
You end up with one owner holding a bag of costs they can't explain,
519
00:20:04,280 --> 00:20:06,040
which is where the next slide shows up.
520
00:20:06,040 --> 00:20:09,240
Tagging fails because it's treated as etiquette.
521
00:20:09,240 --> 00:20:12,040
Tagging is where most FinOps programs go to die
522
00:20:12,040 --> 00:20:14,680
because enterprises treat it like manners.
523
00:20:14,680 --> 00:20:16,440
Please tag your resources.
524
00:20:16,440 --> 00:20:18,440
Here's the tagging standard.
525
00:20:18,440 --> 00:20:20,680
Don't forget your cost center.
526
00:20:20,680 --> 00:20:22,360
That language reveals the real posture.
527
00:20:22,360 --> 00:20:24,280
They're asking for compliance the way you ask people
528
00:20:24,280 --> 00:20:25,480
to rinse their dishes.
529
00:20:25,480 --> 00:20:27,880
And then they act surprised when the sink fills up.
530
00:20:27,880 --> 00:20:29,240
Tagging is not etiquette.
531
00:20:29,240 --> 00:20:31,160
Tagging is financial identity.
532
00:20:31,160 --> 00:20:33,720
If a resource doesn't carry ownership metadata,
533
00:20:33,720 --> 00:20:35,880
then it is not a resource in a managed system.
534
00:20:35,880 --> 00:20:38,120
It's a liability with an invoice attached.
535
00:20:38,120 --> 00:20:39,640
And if allocation depends on tags,
536
00:20:39,640 --> 00:20:42,280
then the platform must refuse to create resources without them.
537
00:20:42,280 --> 00:20:43,960
Humans will not do this consistently.
538
00:20:43,960 --> 00:20:44,760
They are busy.
539
00:20:44,760 --> 00:20:46,040
They are optimizing for delivery.
540
00:20:46,040 --> 00:20:47,080
They will forget.
541
00:20:47,080 --> 00:20:49,160
They will type Prod in one place
542
00:20:49,160 --> 00:20:50,360
and prod in another.
543
00:20:50,360 --> 00:20:52,440
And Azure will treat those values as different
544
00:20:52,440 --> 00:20:55,080
because tag values are case sensitive.
545
00:20:55,080 --> 00:20:57,480
So you don't get slightly imperfect tagging.
546
00:20:57,480 --> 00:20:59,080
You get allocation collapse.
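A small illustration of why "slightly imperfect" becomes allocation collapse: grouping the same spend by raw tag value splits one environment into several buckets. The rows are made-up sample data, and the lowercase normalization is only an analysis-time patch, not a substitute for enforcing values at deployment.

```python
from collections import defaultdict


def allocate(rows, key: str = "environment", normalize: bool = False) -> dict:
    """Sum cost per tag value; raw values are compared exactly, as case-sensitive tag values are."""
    totals = defaultdict(float)
    for tags, cost in rows:
        value = tags.get(key, "(untagged)")
        if normalize and value != "(untagged)":
            value = value.strip().lower()
        totals[value] += cost
    return dict(totals)


# Made-up sample data: one production environment, tagged three different ways.
rows = [
    ({"environment": "Prod"}, 120.0),
    ({"environment": "prod"}, 80.0),
    ({"environment": "PRD"}, 40.0),
    ({}, 50.0),
]
print(allocate(rows))                  # four buckets for what is really two
print(allocate(rows, normalize=True))  # lowercasing merges Prod/prod, but PRD stays separate
```

Lowercasing cannot map PRD or production onto prod; only a controlled vocabulary enforced at creation time can.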
547
00:20:59,080 --> 00:21:01,560
This is what the failure actually looks like in an enterprise.
548
00:21:01,560 --> 00:21:03,400
Half the estate has tags, half doesn't.
549
00:21:03,880 --> 00:21:06,680
The tagged half uses inconsistent keys and values.
550
00:21:06,680 --> 00:21:09,400
CostCenter, cost-center, cost_center.
551
00:21:09,400 --> 00:21:12,520
Owner tags contain emails that belong to people who left.
552
00:21:12,520 --> 00:21:15,800
Environment tags say production, prod, PRD, and yes.
553
00:21:15,800 --> 00:21:17,800
Some teams tag at the resource group,
554
00:21:17,800 --> 00:21:19,240
some tag at the resource,
555
00:21:19,240 --> 00:21:22,200
some rely on Terraform modules that never got updated,
556
00:21:22,200 --> 00:21:24,840
some deploy through the portal and don't see the field at all
557
00:21:24,840 --> 00:21:25,800
or they do,
558
00:21:25,800 --> 00:21:28,040
and they skip it because nothing stops them.
559
00:21:28,040 --> 00:21:30,360
Then finance shows up and asks for showback.
560
00:21:30,360 --> 00:21:34,680
Engineering says sure and produces a report that is 40% unallocated.
561
00:21:34,680 --> 00:21:36,760
The conversation immediately turns political,
562
00:21:36,760 --> 00:21:37,960
not because people are emotional
563
00:21:37,960 --> 00:21:40,120
but because the data is unusable.
564
00:21:40,120 --> 00:21:42,520
Every chargeback discussion becomes an argument
565
00:21:42,520 --> 00:21:44,680
about whether the allocation rules are fair
566
00:21:44,680 --> 00:21:47,320
because the tags aren't reliable enough to be treated as truth.
567
00:21:47,320 --> 00:21:49,800
This is where cost entropy hides again:
568
00:21:49,800 --> 00:21:51,640
in the fear of deletion.
569
00:21:51,640 --> 00:21:54,760
When a resource is untagged, nobody can confidently delete it
570
00:21:54,760 --> 00:21:56,440
because nobody can prove ownership.
571
00:21:56,440 --> 00:21:58,440
So the safest move is to keep paying.
572
00:21:58,440 --> 00:22:01,160
Untagged resources become financial fossils,
573
00:22:01,160 --> 00:22:04,920
expensive, old, and politically protected by ambiguity.
574
00:22:04,920 --> 00:22:06,200
The hard rule is simple.
575
00:22:06,200 --> 00:22:08,120
If you require a tag for allocation,
576
00:22:08,120 --> 00:22:09,960
then you require it for deployment.
577
00:22:09,960 --> 00:22:12,200
That means the platform refuses to create resources
578
00:22:12,200 --> 00:22:14,760
that don't carry the minimum financial identity,
579
00:22:14,760 --> 00:22:18,600
owner, environment, and a product or cost center dimension.
580
00:22:18,600 --> 00:22:20,520
Not because those tags are magical,
581
00:22:20,520 --> 00:22:22,120
but because without them,
582
00:22:22,120 --> 00:22:24,600
the organization cannot route accountability
583
00:22:24,600 --> 00:22:28,120
and without routing, budgets and alerts become noise again.
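A minimal sketch of "required for allocation means required for deployment" as an Azure Policy deny rule, generated from Python. The three tag names are an illustrative taxonomy, not a standard. Note that Indexed mode evaluates resource types that support tags and location, so resource groups need a separate rule (typically mode All with a type condition on resource groups).

```python
import json

# Illustrative minimum financial identity; align names with your own taxonomy.
REQUIRED_TAGS = ["Owner", "Environment", "CostCenter"]


def require_tags_policy(tag_names: list) -> dict:
    """Deny creation of any taggable resource that is missing one of the required tags."""
    return {
        "mode": "Indexed",
        "policyRule": {
            "if": {
                "anyOf": [{"field": f"tags['{name}']", "exists": "false"} for name in tag_names]
            },
            "then": {"effect": "deny"},
        },
    }


print(json.dumps(require_tags_policy(REQUIRED_TAGS), indent=2))
```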
584
00:22:28,120 --> 00:22:30,680
Azure gives you the enforcement mechanisms to do this
585
00:22:30,680 --> 00:22:32,200
and most enterprises still don't
586
00:22:32,200 --> 00:22:35,320
because they confuse being strict with being hostile.
587
00:22:35,320 --> 00:22:36,600
Here's what most people miss.
588
00:22:36,600 --> 00:22:38,600
Enforcement doesn't have to be punitive.
589
00:22:38,600 --> 00:22:39,960
It has to be deterministic.
590
00:22:39,960 --> 00:22:42,360
Use Azure Policy. Use deny where you must.
591
00:22:42,360 --> 00:22:44,280
Use modify where you can do it safely.
592
00:22:44,280 --> 00:22:46,600
Modify policies can auto-add baseline tags
593
00:22:46,600 --> 00:22:49,640
when they're missing or inherit tags from the resource group.
594
00:22:49,640 --> 00:22:54,200
That's a useful pattern when you can rely on a well-constructed resource group boundary
595
00:22:54,200 --> 00:22:56,760
but you can't treat inheritance as a substitute for governance.
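A sketch of the modify pattern that inherits a tag from the resource group when the resource lacks it, modeled on the shape of Azure's built-in tag-inheritance policies. The role ID below is the built-in Contributor role as we understand those built-ins to use it; verify it, or a narrower role, before assigning. Modify applies on create and update, so existing resources still need a remediation task, and the assignment needs a managed identity.

```python
import json

# Built-in Contributor role definition ID; verify it (or a narrower role) before use.
CONTRIBUTOR_ROLE = (
    "/providers/microsoft.authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"
)


def inherit_tag_from_rg_policy(tag_name: str) -> dict:
    """Copy a tag from the parent resource group onto resources that lack it."""
    tag_field = f"tags['{tag_name}']"
    rg_value = f"[resourceGroup().tags['{tag_name}']]"
    return {
        "mode": "Indexed",
        "policyRule": {
            "if": {
                "allOf": [
                    {"field": tag_field, "exists": "false"},
                    # Only act when the resource group actually declares the tag.
                    {"value": rg_value, "notEquals": ""},
                ]
            },
            "then": {
                "effect": "modify",
                "details": {
                    "roleDefinitionIds": [CONTRIBUTOR_ROLE],
                    "operations": [
                        {"operation": "add", "field": tag_field, "value": rg_value}
                    ],
                },
            },
        },
    }


print(json.dumps(inherit_tag_from_rg_policy("CostCenter"), indent=2))
```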
596
00:22:56,760 --> 00:22:59,240
If teams can create arbitrary resource groups,
597
00:22:59,240 --> 00:23:02,200
then inheritance just moves the problem one layer down.
598
00:23:02,200 --> 00:23:03,720
So the correct posture is layered.
599
00:23:03,720 --> 00:23:08,280
Subscription-level tags establish ownership and budget responsibility.
600
00:23:08,280 --> 00:23:11,960
Resource group tags establish workload grouping and life cycle intent,
601
00:23:11,960 --> 00:23:15,240
and resource tags handle exceptions and resource-specific dimensions.
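Resources do not automatically inherit tags from their resource group or subscription, so the layering has to be resolved somewhere, by policy or at allocation time. This sketch shows one possible resolution order, with the most specific scope winning; it is an illustration, not a description of how Azure merges tags. Keys are lowercased because Azure tag names are case-insensitive.

```python
def effective_tags(subscription: dict, resource_group: dict, resource: dict) -> dict:
    """Resolve tags with the most specific scope winning: resource, then RG, then subscription.

    Keys are lowercased because Azure tag names are case-insensitive; values are kept as-is.
    """
    merged = {}
    for scope_tags in (subscription, resource_group, resource):
        merged.update({key.lower(): value for key, value in scope_tags.items()})
    return merged


resolved = effective_tags(
    subscription={"Owner": "jane.doe@contoso.com", "CostCenter": "cc-100"},
    resource_group={"Workload": "billing-api", "Lifecycle": "temporary"},
    resource={"CostCenter": "cc-200"},  # a resource-level exception
)
print(resolved)
```

Ownership comes from the subscription, workload and life cycle from the resource group, and the resource overrides only where it has a genuine exception.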
602
00:23:15,240 --> 00:23:16,680
And yes, you will need a taxonomy,
603
00:23:16,680 --> 00:23:19,720
but keep it small, six to eight tags that actually matter.
604
00:23:19,720 --> 00:23:22,120
Every extra tag is another chance for entropy.
605
00:23:22,120 --> 00:23:25,080
There's another uncomfortable truth hiding in tagging: people lie
606
00:23:25,080 --> 00:23:26,440
when tags are optional.
607
00:23:26,440 --> 00:23:28,040
Not maliciously, operationally.
608
00:23:28,040 --> 00:23:30,600
If someone is blocked by a tag policy
609
00:23:30,600 --> 00:23:31,960
and they don't know the right value,
610
00:23:31,960 --> 00:23:33,960
they will pick something to get unblocked.
611
00:23:33,960 --> 00:23:36,040
That's why controlled vocabularies matter.
612
00:23:36,040 --> 00:23:39,800
That's why free-text owner tags turn into "unknown" and "TBD" and "later."
613
00:23:39,800 --> 00:23:41,160
So if you want tagging to work,
614
00:23:41,160 --> 00:23:42,760
you don't just enforce presence.
615
00:23:42,760 --> 00:23:44,920
You enforce meaning: allowed values,
616
00:23:44,920 --> 00:23:47,480
normalized casing, clear ownership mapping,
617
00:23:47,480 --> 00:23:50,760
a real taxonomy tied to the org structure you actually operate,
618
00:23:50,760 --> 00:23:52,440
not the one in your HR system.
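Enforcing meaning rather than presence can start as a pipeline check that runs before deployment. This is a hypothetical check, not an Azure feature; the allowed environments, aliases, and placeholder values are assumptions standing in for whatever taxonomy your organization actually operates.

```python
# Hypothetical controlled vocabulary; replace with the taxonomy your org actually operates.
ALLOWED_ENVIRONMENTS = {"prod", "nonprod", "dev", "test"}
ENVIRONMENT_ALIASES = {"production": "prod", "prd": "prod", "development": "dev"}
PLACEHOLDER_VALUES = {"", "tbd", "unknown", "later", "n/a"}


def validate_tags(tags: dict) -> list:
    """Check meaning, not just presence: allowed values, canonical casing, resolvable owner."""
    errors = []
    raw_env = tags.get("Environment", "")
    canonical = ENVIRONMENT_ALIASES.get(raw_env.strip().lower(), raw_env.strip().lower())
    if canonical not in ALLOWED_ENVIRONMENTS:
        errors.append(f"Environment {raw_env!r} is not an allowed value")
    elif raw_env != canonical:
        errors.append(f"Environment {raw_env!r} should be written as {canonical!r}")
    owner = tags.get("Owner", "")
    if owner.strip().lower() in PLACEHOLDER_VALUES or "@" not in owner:
        errors.append(f"Owner {owner!r} is not a resolvable identity")
    return errors


print(validate_tags({"Environment": "PRD", "Owner": "TBD"}))
print(validate_tags({"Environment": "prod", "Owner": "jane.doe@contoso.com"}))
```

Failing the pipeline with a message that names the canonical value gives engineers the right answer at the moment they would otherwise guess.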
619
00:23:52,440 --> 00:23:53,320
And once you do that,
620
00:23:53,320 --> 00:23:55,640
once the platform refuses untagged deployments,
621
00:23:55,640 --> 00:23:58,040
the entire FinOps conversation changes.
622
00:23:58,040 --> 00:24:00,440
Cost allocation stops being a quarterly negotiation.
623
00:24:00,440 --> 00:24:02,440
It becomes a boring mechanical process,
624
00:24:02,440 --> 00:24:03,640
which is exactly what you want,
625
00:24:03,640 --> 00:24:05,880
because boring means deterministic.
626
00:24:05,880 --> 00:24:07,960
Now to make this concrete,
627
00:24:07,960 --> 00:24:10,120
the next scenario is where tagging failure
628
00:24:10,120 --> 00:24:12,120
becomes financial archaeology,
629
00:24:12,120 --> 00:24:14,360
untagged resources, no ownership,
630
00:24:14,360 --> 00:24:16,600
and weeks of arguing about who pays.
631
00:24:16,600 --> 00:24:20,280
Scenario two, untagged resources and financial archaeology.
632
00:24:20,280 --> 00:24:22,600
Here's the scenario every enterprise recognizes,
633
00:24:22,600 --> 00:24:24,120
even if they pretend they don't.
634
00:24:24,120 --> 00:24:25,240
The cost spike shows up.
635
00:24:25,240 --> 00:24:26,840
Someone opens cost analysis.
636
00:24:26,840 --> 00:24:29,000
The top line item isn't a clear application name.
637
00:24:29,000 --> 00:24:31,160
It's a storage account with a random suffix,
638
00:24:31,160 --> 00:24:32,680
or a log analytics workspace,
639
00:24:32,680 --> 00:24:35,080
or a database server named like a developer sneezed
640
00:24:35,080 --> 00:24:36,040
on the keyboard,
641
00:24:36,040 --> 00:24:37,320
and the tags are empty.
642
00:24:37,320 --> 00:24:39,880
In the before posture, tagging was recommended,
643
00:24:39,880 --> 00:24:41,960
which means it was ignored whenever delivery pressure
644
00:24:41,960 --> 00:24:43,560
was higher than etiquette.
645
00:24:43,560 --> 00:24:45,240
Finance can't allocate the cost.
646
00:24:45,240 --> 00:24:47,320
Engineering can't tell who owns the resource.
647
00:24:47,320 --> 00:24:49,160
The platform team can't safely delete it.
648
00:24:49,160 --> 00:24:52,760
So everyone does the only thing the enterprise teaches them to do.
649
00:24:52,760 --> 00:24:54,040
They investigate slowly,
650
00:24:54,040 --> 00:24:55,000
and they keep paying.
651
00:24:55,000 --> 00:24:57,560
This is what financial archaeology looks like in practice.
652
00:24:57,560 --> 00:25:01,160
First, someone tries to infer ownership from the resource name.
653
00:25:01,160 --> 00:25:03,640
That fails because naming standards are aspirational
654
00:25:03,640 --> 00:25:05,000
and time erodes them.
655
00:25:05,000 --> 00:25:07,880
Then they search activity logs for who created it.
656
00:25:07,880 --> 00:25:09,000
That fails in two common ways.
657
00:25:09,000 --> 00:25:10,600
The identity is a service principal shared
658
00:25:10,600 --> 00:25:11,800
by multiple pipelines,
659
00:25:11,800 --> 00:25:13,080
or the creator left the company.
660
00:25:13,080 --> 00:25:14,360
Then they look for connections,
661
00:25:14,360 --> 00:25:16,440
peering, private endpoints, diagnostic settings,
662
00:25:16,440 --> 00:25:18,520
linked workspaces to determine impact.
663
00:25:18,520 --> 00:25:19,720
That becomes a graph problem,
664
00:25:19,720 --> 00:25:21,960
and graph problems don't finish in a meeting.
665
00:25:21,960 --> 00:25:24,360
So the resource survives, the cost continues.
666
00:25:24,360 --> 00:25:26,760
And a week later you have a second untagged resource
667
00:25:26,760 --> 00:25:28,200
because the system learned nothing.
668
00:25:28,200 --> 00:25:29,480
Here's the key insight.
669
00:25:29,480 --> 00:25:31,960
Untagged resources don't just prevent allocation.
670
00:25:31,960 --> 00:25:33,640
They prevent intervention.
671
00:25:33,640 --> 00:25:37,400
Because deletion in an enterprise is a political act disguised as a technical act.
672
00:25:37,400 --> 00:25:39,480
If you can't name the owner, you can't escalate.
673
00:25:39,480 --> 00:25:41,400
If you can't escalate, you can't get approval.
674
00:25:41,400 --> 00:25:43,320
If you can't get approval, you don't delete.
675
00:25:43,320 --> 00:25:45,640
The resource becomes too risky to touch,
676
00:25:45,640 --> 00:25:47,720
which is the most expensive category in Azure.
677
00:25:47,720 --> 00:25:51,560
Now the after posture is not "we reminded people harder."
678
00:25:51,560 --> 00:25:54,280
It's enforced tagging as a deployment precondition.
679
00:25:54,280 --> 00:25:57,160
In production, the platform denies resource creation
680
00:25:57,160 --> 00:25:59,400
when the minimum financial identity is missing.
681
00:25:59,400 --> 00:26:01,640
Not later, not in a monthly report.
682
00:26:01,640 --> 00:26:03,400
At the moment of creation. That sounds harsh
683
00:26:03,400 --> 00:26:05,320
until you realize what it actually does.
684
00:26:05,320 --> 00:26:07,640
It forces the ownership conversation to happen
685
00:26:07,640 --> 00:26:09,240
while change is still cheap.
686
00:26:09,240 --> 00:26:11,240
When the engineer is still at their keyboard.
687
00:26:11,240 --> 00:26:13,240
When the pipeline can still fail fast.
688
00:26:13,240 --> 00:26:16,200
When the workload is still a proposal, not a dependency.
689
00:26:16,200 --> 00:26:18,600
And yes, you will use two different policy effects
690
00:26:18,600 --> 00:26:20,520
depending on what you're protecting.
691
00:26:20,520 --> 00:26:23,320
For baseline tags that are safe to apply universally,
692
00:26:23,320 --> 00:26:25,560
you use modify to add or normalize them.
693
00:26:25,560 --> 00:26:28,440
A common pattern is to inherit owner and cost center
694
00:26:28,440 --> 00:26:30,280
from the subscription or resource group
695
00:26:30,280 --> 00:26:32,120
where that identity is already declared.
696
00:26:32,120 --> 00:26:34,120
That's how you avoid making every engineer type
697
00:26:34,120 --> 00:26:36,280
the same metadata 400 times.
698
00:26:36,280 --> 00:26:37,560
But for production workloads,
699
00:26:37,560 --> 00:26:40,280
you also use deny for missing or invalid tags.
700
00:26:40,280 --> 00:26:42,360
Because allocation that depends on best effort
701
00:26:42,360 --> 00:26:44,920
is just a slower version of untagged chaos.
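Here is a minimal sketch of the two effects described above, written as Python that emits Azure Policy `policyRule` bodies. The tag keys `owner` and `costCenter` are assumptions. The rules would sit inside a policy definition whose `mode` is `Indexed`, and the role ID is the built-in Contributor role, which a modify assignment's managed identity needs for remediation.

```python
import json

# Hypothetical tag keys; replace with your organization's own names.
REQUIRED_TAGS = ["owner", "costCenter"]

# Built-in Contributor role definition ID. A modify effect needs a role that
# can write tags so its managed identity can remediate existing resources.
CONTRIBUTOR_ROLE = (
    "/providers/microsoft.authorization/roleDefinitions/"
    "b24988ac-6180-42a0-ab88-20f7382dd24c"
)


def inherit_tag_rule(tag: str) -> dict:
    """policyRule: copy a tag from the resource group when the resource lacks it."""
    rg_value = f"[resourceGroup().tags['{tag}']]"
    return {
        "if": {
            "allOf": [
                {"field": f"tags['{tag}']", "exists": "false"},
                # Only act when the resource group actually declares a value.
                {"value": rg_value, "notEquals": ""},
            ]
        },
        "then": {
            "effect": "modify",
            "details": {
                "roleDefinitionIds": [CONTRIBUTOR_ROLE],
                "operations": [
                    {"operation": "add", "field": f"tags['{tag}']", "value": rg_value}
                ],
            },
        },
    }


def deny_missing_tags_rule(tags: list) -> dict:
    """policyRule: deny creation when any required tag is absent."""
    return {
        "if": {"anyOf": [{"field": f"tags['{t}']", "exists": "false"} for t in tags]},
        "then": {"effect": "deny"},
    }


if __name__ == "__main__":
    # Print the bodies so they can be pasted into policy definitions.
    print(json.dumps(inherit_tag_rule("costCenter"), indent=2))
    print(json.dumps(deny_missing_tags_rule(REQUIRED_TAGS), indent=2))
```

Azure Policy evaluates Modify before Deny on a create or update request. Assigning both therefore lets an inherited tag satisfy the deny rule. The deny fires only when neither the request nor the resource group supplies the value.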
702
00:26:44,920 --> 00:26:46,520
This is where value standards matter.
703
00:26:46,520 --> 00:26:50,360
If the tag key exists but the value is "TBD", nothing has improved.
704
00:26:50,360 --> 00:26:52,680
So you constrain values: controlled casing,
705
00:26:52,680 --> 00:26:55,240
allowed environments, approved cost centers,
706
00:26:55,240 --> 00:26:57,240
known owner formats. It's not bureaucracy;
707
00:26:57,240 --> 00:26:59,560
it's how you keep allocation deterministic.
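One way to make value standards checkable is a pre-deployment script in the pipeline. This is a hypothetical sketch: the allowed environments, the cost centers, and the `@contoso.com` owner format are all placeholders. Azure Policy can express allowlists with `in`/`notIn` and simple wildcard patterns with `like` or `match`. It has no regular expressions, though, so a strict format check like the owner pattern here is easier to run in the pipeline.

```python
import re

# Every value below is a hypothetical standard for illustration only.
ALLOWED_ENVIRONMENTS = {"prod", "test", "dev", "sandbox"}
APPROVED_COST_CENTERS = {"CC-1001", "CC-2040"}
OWNER_FORMAT = re.compile(r"^[a-z0-9._-]+@contoso\.com$")
PLACEHOLDERS = {"", "tbd", "n/a", "none", "unknown"}


def tag_violations(tags: dict) -> list:
    """Return the reasons a tag set fails the value standards (empty means pass)."""
    problems = []
    # A key that exists with a filler value is as useless as a missing key.
    for key, value in tags.items():
        if value.strip().lower() in PLACEHOLDERS:
            problems.append(f"{key}: placeholder value {value!r}")
    # Exact, case-sensitive matching enforces lowercase, so "Prod" and "prod"
    # never split a report into two buckets.
    env = tags.get("env", "")
    if env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"env: {env!r} is not an allowed value")
    cost_center = tags.get("costCenter", "")
    if cost_center not in APPROVED_COST_CENTERS:
        problems.append(f"costCenter: {cost_center!r} is not approved")
    owner = tags.get("owner", "")
    if not OWNER_FORMAT.match(owner):
        problems.append(f"owner: {owner!r} does not match the owner format")
    return problems
```

A pipeline step would fail the deployment whenever `tag_violations` returns a non-empty list and print the reasons, so the engineer sees them at the keyboard.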
708
00:26:59,560 --> 00:27:01,560
Now the architecture move that makes this stick
709
00:27:01,560 --> 00:27:04,120
is to normalize tags to ownership boundaries.
710
00:27:04,120 --> 00:27:05,960
The subscription holds accountable ownership
711
00:27:05,960 --> 00:27:07,720
and budget responsibility.
712
00:27:07,720 --> 00:27:09,720
The resource group holds workload grouping
713
00:27:09,720 --> 00:27:11,080
and life cycle intent.
714
00:27:11,080 --> 00:27:13,640
Individual resources only carry special cases
715
00:27:13,640 --> 00:27:15,400
because special cases multiply.
716
00:27:15,400 --> 00:27:19,000
If you don't build that hierarchy, tags become another entropy generator,
717
00:27:19,000 --> 00:27:22,280
endlessly debated, inconsistently applied, and never trusted.
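By default, Azure does not copy subscription or resource group tags onto resources. The precedence described above therefore has to be enforced (for example with the modify effect) or resolved in reporting. Below is a minimal local sketch of that resolution, assuming the most specific scope wins. All tag names and values are hypothetical.

```python
def effective_tags(subscription: dict, resource_group: dict, resource: dict) -> dict:
    """Resolve tags with the most specific scope winning: subscription, then RG, then resource."""
    merged = {}
    for scope in (subscription, resource_group, resource):
        merged.update(scope)
    return merged


def special_cases(subscription: dict, resource_group: dict, resource: dict) -> dict:
    """Resource-level tags that add or override something the parent scopes did not say."""
    inherited = {**subscription, **resource_group}
    return {k: v for k, v in resource.items() if inherited.get(k) != v}


# Ownership and budget live on the subscription; workload and lifecycle on the RG.
sub = {"owner": "team-billing@contoso.com", "costCenter": "CC-1001"}
rg = {"workload": "billing-api", "lifecycle": "permanent"}
res = {"workload": "billing-api", "dataClass": "confidential"}

print(effective_tags(sub, rg, res))
print(special_cases(sub, rg, res))  # {'dataClass': 'confidential'}
```

Auditing `special_cases` over the estate shows where resource-level tags are multiplying, which is the early sign of the entropy described above.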
718
00:27:22,280 --> 00:27:25,480
With enforcement in place, the operational behavior changes immediately.
719
00:27:25,480 --> 00:27:27,640
Engineering stops treating tags like paperwork
720
00:27:27,640 --> 00:27:30,840
because the platform refuses to deploy without them.
721
00:27:30,840 --> 00:27:33,640
Finance stops treating allocation like a quarterly negotiation
722
00:27:33,640 --> 00:27:35,160
because the data is complete.
723
00:27:35,160 --> 00:27:38,360
And leadership stops hearing "we can't tell" as an excuse
724
00:27:38,360 --> 00:27:41,640
because the system no longer allows "we can't tell" resources
725
00:27:41,640 --> 00:27:43,240
to exist in the first place.
726
00:27:43,240 --> 00:27:45,400
The measurable outcome isn't a fantasy percentage,
727
00:27:45,400 --> 00:27:47,160
it's something you can actually verify.
728
00:27:47,160 --> 00:27:49,480
The unallocated bucket shrinks towards zero,
729
00:27:49,480 --> 00:27:50,680
not because people got better
730
00:27:50,680 --> 00:27:53,560
but because the control plane stopped accepting ambiguity.
731
00:27:53,560 --> 00:27:55,400
And once attribution becomes boring,
732
00:27:55,400 --> 00:27:57,080
showback and chargeback become boring too,
733
00:27:57,080 --> 00:27:58,120
which is the entire point:
734
00:27:58,120 --> 00:28:01,240
you want the cost conversation to be factual, not political.
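The unallocated bucket is straightforward to measure. The sketch below assumes the cost rows have already been parsed into hypothetical `(cost, tags)` pairs, for example from a cost export. It makes no assumption about the column layout of any real export.

```python
def unallocated_share(rows: list, key: str = "costCenter") -> float:
    """Fraction of total cost whose allocation tag is missing or blank."""
    total = sum(cost for cost, _ in rows)
    if total == 0:
        return 0.0
    missing = sum(cost for cost, tags in rows if not tags.get(key, "").strip())
    return missing / total


# Hypothetical rows: (cost, tags) pairs.
rows = [
    (600.0, {"costCenter": "CC-1001"}),
    (300.0, {}),
    (100.0, {"costCenter": " "}),
]
print(f"unallocated: {unallocated_share(rows):.0%}")  # unallocated: 40%
```

Tracked month over month, this one number is a verifiable stand-in for "the unallocated bucket shrinks towards zero."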
735
00:28:01,240 --> 00:28:03,240
Now there's a second-order consequence
736
00:28:03,240 --> 00:28:05,400
that shows up right after tagging gets enforced.
737
00:28:05,400 --> 00:28:07,400
Once teams can't hide behind ambiguity,
738
00:28:07,400 --> 00:28:09,160
they hide behind safety.
739
00:28:09,160 --> 00:28:11,240
They start over-provisioning by default
740
00:28:11,240 --> 00:28:12,920
because cost is now visible,
741
00:28:12,920 --> 00:28:15,720
but operational risk still hurts more than the bill.
742
00:28:15,720 --> 00:28:17,000
That's the next failure mode:
743
00:28:17,000 --> 00:28:19,560
premium tiers and multi-region by reflex.
744
00:28:19,560 --> 00:28:22,440
Scenario 3: PaaS over-provisioning by default.
745
00:28:22,440 --> 00:28:24,600
Once tagging and ownership become real,
746
00:28:24,600 --> 00:28:26,200
teams lose the ability to hide,
747
00:28:26,200 --> 00:28:27,480
so they switch strategies.
748
00:28:27,480 --> 00:28:29,000
They hide behind safety.
749
00:28:29,000 --> 00:28:31,640
This is where PaaS turns into a quiet budget murderer
750
00:28:31,640 --> 00:28:34,280
because PaaS defaults are easy to justify
751
00:28:34,280 --> 00:28:35,800
and hard to unwind.
752
00:28:35,800 --> 00:28:37,880
A developer doesn't have to rack servers anymore.
753
00:28:37,880 --> 00:28:39,640
They click a tier, they pick redundancy,
754
00:28:39,640 --> 00:28:41,720
they enable the features that sound responsible
755
00:28:41,720 --> 00:28:44,840
and nobody stops them because the platform treats premium
756
00:28:44,840 --> 00:28:46,600
as just another valid choice.
757
00:28:46,600 --> 00:28:49,320
In the before posture, over-provisioning isn't malicious.
758
00:28:49,320 --> 00:28:52,040
It's rational. Engineers are accountable for availability.
759
00:28:52,040 --> 00:28:53,640
They get paged for latency,
760
00:28:53,640 --> 00:28:54,920
they get blamed for downtime,
761
00:28:54,920 --> 00:28:57,480
they do not get praised for choosing the cheaper SKU.
762
00:28:57,480 --> 00:28:59,160
So when faced with uncertainty,
763
00:28:59,160 --> 00:29:01,960
they pick the tier that reduces operational risk:
764
00:29:01,960 --> 00:29:04,120
Premium database tier, just in case.
765
00:29:04,120 --> 00:29:05,800
Multi-zone, because what if?
766
00:29:05,800 --> 00:29:09,000
Multi-region, because the business might need it later.
767
00:29:09,000 --> 00:29:11,400
Diagnostic retention, because security might ask.
768
00:29:11,400 --> 00:29:14,200
Each of those decisions can be individually defensible.
769
00:29:14,200 --> 00:29:16,200
Collectively, they are financial entropy.
770
00:29:16,200 --> 00:29:17,400
And PaaS makes it worse
771
00:29:17,400 --> 00:29:19,800
because it's designed to abstract capacity decisions.
772
00:29:19,800 --> 00:29:21,080
That's the selling point.
773
00:29:21,080 --> 00:29:22,760
But abstraction doesn't remove cost.
774
00:29:22,760 --> 00:29:24,680
It removes friction. And when you remove friction
775
00:29:24,680 --> 00:29:27,560
in a large organization, consumption expands
776
00:29:27,560 --> 00:29:28,680
until a boundary stops it.
777
00:29:28,680 --> 00:29:30,280
Most enterprises don't build that boundary.
778
00:29:30,280 --> 00:29:32,600
They treat PaaS as if it's inherently optimized
779
00:29:32,600 --> 00:29:34,680
because Microsoft marketing implies that it is.
780
00:29:34,680 --> 00:29:35,240
It is not.
781
00:29:35,240 --> 00:29:38,040
It is a set of cost curves you must choose deliberately.
782
00:29:38,040 --> 00:29:39,800
So here's the real failure mechanism.
783
00:29:39,800 --> 00:29:42,120
Teams externalize the cost of safety.
784
00:29:42,120 --> 00:29:43,480
They buy safety with your budget
785
00:29:43,480 --> 00:29:44,600
and the platform lets them.
786
00:29:44,600 --> 00:29:46,520
The after posture isn't telling engineers
787
00:29:46,520 --> 00:29:47,640
to be more careful.
788
00:29:47,640 --> 00:29:49,640
It's forcing the platform to distinguish
789
00:29:49,640 --> 00:29:51,400
between environments and intents.
790
00:29:51,400 --> 00:29:52,280
Dev is not prod.
791
00:29:52,280 --> 00:29:53,240
Test is not prod.
792
00:29:53,240 --> 00:29:55,320
A sandbox is not a customer-facing service.
793
00:29:55,320 --> 00:29:57,720
If you allow the same SKU catalog everywhere,
794
00:29:57,720 --> 00:30:00,040
you are telling every team in every environment
795
00:30:00,040 --> 00:30:01,800
that the enterprise is comfortable paying
796
00:30:01,800 --> 00:30:03,640
for worst-case assumptions by default.
797
00:30:03,640 --> 00:30:05,480
That is not governance.
798
00:30:05,480 --> 00:30:07,240
That is surrender.
799
00:30:07,240 --> 00:30:09,880
The control is simple and it's always unpopular at first.
800
00:30:09,880 --> 00:30:11,480
Allowed SKUs per environment.
801
00:30:11,480 --> 00:30:13,720
In non-production, deny premium tiers
802
00:30:13,720 --> 00:30:15,640
unless there is an explicit exception.
803
00:30:15,640 --> 00:30:18,200
In production, don't deny everything.
804
00:30:18,200 --> 00:30:21,080
But constrain the choices to a set you can defend.
805
00:30:21,080 --> 00:30:23,080
Tiers that match real SLOs,
806
00:30:23,080 --> 00:30:24,840
actual throughput requirements
807
00:30:24,840 --> 00:30:27,400
and resilience patterns you've agreed to pay for.
808
00:30:27,400 --> 00:30:29,560
This is where Azure Policy stops being compliance
809
00:30:29,560 --> 00:30:31,400
and starts being cost engineering.
810
00:30:31,400 --> 00:30:34,360
You define policy rules that deny specific SKUs
811
00:30:34,360 --> 00:30:37,240
or deny specific features outside approved scopes.
812
00:30:37,240 --> 00:30:38,200
You restrict regions.
813
00:30:38,200 --> 00:30:41,000
You restrict redundancy options where they don't make sense.
814
00:30:41,000 --> 00:30:42,600
You can also enforce patterns.
815
00:30:42,600 --> 00:30:45,080
Prod databases must have backups configured.
816
00:30:45,080 --> 00:30:47,800
And dev databases must not be zone-redundant
817
00:30:47,800 --> 00:30:50,680
because zone redundancy in dev is just expensive cosplay.
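Here is a sketch of an environment-scoped SKU constraint, emitted as an Azure Policy `policyRule` body. It has the same shape as the built-in "Allowed virtual machine size SKUs" definition: the `Microsoft.Compute/virtualMachines/sku.name` alias with an `in` allowlist. The SKU lists themselves are hypothetical. `would_deny` is a local mirror for pipeline pre-checks, not part of any Azure API.

```python
import json

# Hypothetical allowlists; pick sizes that match agreed SLOs and throughput.
ALLOWED_VM_SKUS = {
    "nonprod": ["Standard_B2s", "Standard_D2s_v5"],
    "prod": ["Standard_D2s_v5", "Standard_D4s_v5", "Standard_E4s_v5"],
}

VM_TYPE = "Microsoft.Compute/virtualMachines"


def allowed_vm_sku_rule(skus: list) -> dict:
    """policyRule: deny any virtual machine whose size is outside the allowlist."""
    return {
        "if": {
            "allOf": [
                {"field": "type", "equals": VM_TYPE},
                {"not": {"field": f"{VM_TYPE}/sku.name", "in": skus}},
            ]
        },
        "then": {"effect": "deny"},
    }


def would_deny(resource_type: str, sku: str, skus: list) -> bool:
    """Local mirror of the rule above, comparing case-insensitively like policy conditions."""
    if resource_type.lower() != VM_TYPE.lower():
        return False
    return sku.lower() not in {s.lower() for s in skus}


if __name__ == "__main__":
    for env, skus in ALLOWED_VM_SKUS.items():
        print(env, json.dumps(allowed_vm_sku_rule(skus)))
```

In practice you would parameterize one definition and assign it twice, once at the non-production scope and once at the production scope, each with its own allowlist. That is easier to keep consistent than two separate definitions.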
818
00:30:50,680 --> 00:30:53,080
And yes, someone will argue that policy can't cover
819
00:30:53,080 --> 00:30:54,520
every PaaS configuration perfectly.
820
00:30:54,520 --> 00:30:55,320
That's true.
821
00:30:55,320 --> 00:30:57,080
But the point isn't perfect coverage.
822
00:30:57,080 --> 00:30:59,000
The point is removing default freedom
823
00:30:59,000 --> 00:31:01,560
where default freedom creates default overspend.
824
00:31:01,560 --> 00:31:03,560
Now the weird part: exceptions don't go away.
825
00:31:03,560 --> 00:31:04,520
They never do.
826
00:31:04,520 --> 00:31:06,840
So you treat exceptions as what they are:
827
00:31:06,840 --> 00:31:08,200
entropy generators.
828
00:31:08,200 --> 00:31:11,160
An exception should be tracked, justified, and time-boxed.
829
00:31:11,160 --> 00:31:13,640
Not because the justification is morally important,
830
00:31:13,640 --> 00:31:15,400
but because time is the only thing
831
00:31:15,400 --> 00:31:17,960
that prevents an exception from becoming the new baseline.
832
00:31:17,960 --> 00:31:20,920
An exception without an expiry is policy rot.
833
00:31:20,920 --> 00:31:23,800
A premium SKU approval that lasts forever is not an approval.
834
00:31:23,800 --> 00:31:25,240
It's a quiet surrender
835
00:31:25,240 --> 00:31:27,800
that the platform will remember longer than your org chart.
836
00:31:27,800 --> 00:31:30,600
So the after posture includes an exception workflow.
837
00:31:30,600 --> 00:31:34,120
When a team needs a premium tier in a non-prod subscription,
838
00:31:34,120 --> 00:31:34,920
they request it.
839
00:31:34,920 --> 00:31:37,240
They state why. They pick an expiry date.
840
00:31:37,240 --> 00:31:39,800
The platform team grants an exemption at the policy layer,
841
00:31:39,800 --> 00:31:42,600
not by giving someone Owner and hoping they behave.
842
00:31:42,600 --> 00:31:45,000
Then the exemption expires automatically
843
00:31:45,000 --> 00:31:47,160
and the team has to renew it with intent
844
00:31:47,160 --> 00:31:48,680
or fall back to the baseline.
845
00:31:48,680 --> 00:31:50,440
That's how you stop premium sprawl
846
00:31:50,440 --> 00:31:52,200
from becoming permanent architecture.
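The exception workflow maps onto Azure Policy exemptions. An exemption carries an `exemptionCategory` (`Waiver` or `Mitigated`) and an optional `expiresOn`. This sketch rests on assumptions: the 30-day ceiling is a hypothetical organizational rule, and request validation runs in your own intake process before the exemption is created at the target scope.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical organizational rule, not an Azure setting.
MAX_EXEMPTION = timedelta(days=30)


def validate_request(reason: str, expires_on: Optional[datetime], now: datetime) -> list:
    """Reject exception requests that lack a justification or a bounded expiry."""
    problems = []
    if not reason.strip():
        problems.append("missing justification")
    if expires_on is None:
        problems.append("missing expiry date")
    elif expires_on <= now:
        problems.append("expiry is in the past")
    elif expires_on - now > MAX_EXEMPTION:
        problems.append("expiry exceeds the maximum exemption window")
    return problems


def exemption_body(assignment_id: str, reason: str, expires_on: datetime) -> dict:
    """Body for a policy exemption that lapses on its own at expiresOn (UTC)."""
    return {
        "properties": {
            "policyAssignmentId": assignment_id,
            "exemptionCategory": "Waiver",
            "description": reason,
            "expiresOn": expires_on.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    }
```

Expiry is mandatory in `validate_request` even though the Azure field is optional. An exemption without an end date is exactly the policy rot described above.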
847
00:31:52,200 --> 00:31:53,880
The outcome isn't just cost reduction.
848
00:31:53,880 --> 00:31:55,480
It's fewer emergency rollbacks.
849
00:31:55,480 --> 00:31:58,200
Because the enterprise stops discovering cost explosions
850
00:31:58,200 --> 00:31:59,000
after they happen.
851
00:31:59,000 --> 00:32:00,520
It discovers them at deploy time.
852
00:32:00,520 --> 00:32:01,640
The pipeline fails.
853
00:32:01,640 --> 00:32:03,160
The team sees the denial.
854
00:32:03,160 --> 00:32:05,080
They either adjust to the approved tier
855
00:32:05,080 --> 00:32:06,760
or escalate with a conscious decision.
856
00:32:06,760 --> 00:32:10,280
That is a healthier failure mode than an invoice-time surprise.
857
00:32:10,280 --> 00:32:12,040
And it has a second-order benefit.
858
00:32:12,040 --> 00:32:13,960
The team starts designing for efficiency
859
00:32:13,960 --> 00:32:15,880
because the platform forces them to.
860
00:32:15,880 --> 00:32:19,000
If premium is harder to get, engineers invest in better indexing,
861
00:32:19,000 --> 00:32:20,680
better caching, better query patterns,
862
00:32:20,680 --> 00:32:21,960
better scaling strategies.
863
00:32:21,960 --> 00:32:23,400
Not because they became saints,
864
00:32:23,400 --> 00:32:25,800
but because the control plane changed the incentives.
865
00:32:25,800 --> 00:32:27,640
Now there's one more place where this pattern
866
00:32:27,640 --> 00:32:29,960
becomes pathological: non-production.
867
00:32:29,960 --> 00:32:32,360
Because non-prod is where teams feel the least financial pain,
868
00:32:32,360 --> 00:32:34,520
so it becomes the landfill for over-provisioning,
869
00:32:34,520 --> 00:32:36,840
abandoned experiments and temporary environments
870
00:32:36,840 --> 00:32:37,560
that never die.
871
00:32:37,560 --> 00:32:38,280
That's next.
872
00:32:38,280 --> 00:32:39,240
Scenario 4:
873
00:32:39,240 --> 00:32:41,160
unbounded non-production spend.
874
00:32:41,160 --> 00:32:44,040
Non-production is where Azure budgets go to be embarrassed.
875
00:32:44,040 --> 00:32:45,240
In the before posture,
876
00:32:45,240 --> 00:32:47,800
Dev and test are treated like free real estate.
877
00:32:47,800 --> 00:32:49,080
It's not prod,
878
00:32:49,080 --> 00:32:51,080
so nobody bothers with budgets.
879
00:32:51,080 --> 00:32:52,280
Nobody sets thresholds.
880
00:32:52,280 --> 00:32:54,520
Nobody defines what "done" means for an environment.
881
00:32:54,520 --> 00:32:57,480
And because nobody gets paged when dev costs spike,
882
00:32:57,480 --> 00:32:59,400
the platform quietly turns non-prod
883
00:32:59,400 --> 00:33:02,760
into the largest, least defended surface area of recurring spend.
884
00:33:02,760 --> 00:33:04,440
The failure pattern is always the same.
885
00:33:04,440 --> 00:33:06,920
A team spins up a full stack environment for a sprint.
886
00:33:06,920 --> 00:33:08,360
It was supposed to be temporary.
887
00:33:08,360 --> 00:33:09,080
It isn't.
888
00:33:09,080 --> 00:33:10,920
Another team creates a parallel environment
889
00:33:10,920 --> 00:33:12,600
because the first one is messy
890
00:33:12,600 --> 00:33:14,040
and they don't want to touch it.
891
00:33:14,040 --> 00:33:17,000
Someone enables extra diagnostics just for troubleshooting
892
00:33:17,000 --> 00:33:18,280
and never turns it off.
893
00:33:18,280 --> 00:33:19,960
An internal demo needs more capacity
894
00:33:19,960 --> 00:33:22,520
so the tier gets bumped up and never comes back down.
895
00:33:22,520 --> 00:33:24,360
Then a few of these environments get connected
896
00:33:24,360 --> 00:33:25,720
to shared services:
897
00:33:25,720 --> 00:33:26,920
log ingestion,
898
00:33:26,920 --> 00:33:28,520
private endpoints, hubs,
899
00:33:28,520 --> 00:33:31,080
and now even deleting the compute doesn't stop the bill
900
00:33:31,080 --> 00:33:33,000
because the dependencies keep running.
901
00:33:33,000 --> 00:33:35,800
And because non-prod is full of experimentation,
902
00:33:35,800 --> 00:33:38,200
nobody wants to be the person who deletes the wrong thing.
903
00:33:38,200 --> 00:33:39,400
So they don't delete anything.
904
00:33:39,400 --> 00:33:41,400
That's why non-prod becomes a cost landfill.
905
00:33:41,400 --> 00:33:43,400
It's the intersection of low accountability,
906
00:33:43,400 --> 00:33:44,520
high-change velocity,
907
00:33:44,520 --> 00:33:46,200
and fear-driven retention.
908
00:33:46,200 --> 00:33:48,760
Those three forces always produce the same outcome.
909
00:33:48,760 --> 00:33:51,320
Resources outlive the work that justified them.
910
00:33:51,320 --> 00:33:54,840
This is also where the enterprise commits its most common cost lie.
911
00:33:54,840 --> 00:33:56,600
"It's cheap compared to prod."
912
00:33:56,600 --> 00:33:58,680
That statement is usually true in isolation.
913
00:33:58,680 --> 00:34:00,280
It's just irrelevant.
914
00:34:00,280 --> 00:34:01,880
Non-prod isn't supposed to be cheap.
915
00:34:01,880 --> 00:34:03,320
It's supposed to be bounded.
916
00:34:03,320 --> 00:34:06,120
The point of non-prod is to support delivery
917
00:34:06,120 --> 00:34:07,080
with a known purpose
918
00:34:07,080 --> 00:34:08,920
and a known financial blast radius.
919
00:34:08,920 --> 00:34:11,960
If dev/test can run indefinitely at any SKU
920
00:34:11,960 --> 00:34:14,040
with no budget and no life cycle rule,
921
00:34:14,040 --> 00:34:16,200
then it stops being a delivery capability
922
00:34:16,200 --> 00:34:18,760
and becomes a parallel cloud estate with no governance.
923
00:34:18,760 --> 00:34:20,920
In other words, you've built a second Azure environment
924
00:34:20,920 --> 00:34:21,960
inside your first one.
925
00:34:21,960 --> 00:34:24,120
So the after posture starts with the most boring
926
00:34:24,120 --> 00:34:25,800
but effective design move:
927
00:34:25,800 --> 00:34:27,560
Separate non-production subscriptions.
928
00:34:27,560 --> 00:34:29,320
Not resource groups, not tags.
929
00:34:29,320 --> 00:34:30,600
Subscriptions.
930
00:34:30,600 --> 00:34:34,040
When non-prod lives inside the same subscription as prod,
931
00:34:34,040 --> 00:34:36,200
it inherits prod-level permissions,
932
00:34:36,200 --> 00:34:39,080
prod-level SKU freedom and prod-level ambiguity.
933
00:34:39,080 --> 00:34:41,720
People will also use prod subscriptions
934
00:34:41,720 --> 00:34:43,720
for non-prod work temporarily
935
00:34:43,720 --> 00:34:44,760
because it's convenient.
936
00:34:44,760 --> 00:34:47,080
Separate subscriptions remove that pathway.
937
00:34:47,080 --> 00:34:48,280
They create a clean boundary
938
00:34:48,280 --> 00:34:51,080
where policies can be strict without breaking production.
939
00:34:51,080 --> 00:34:53,320
And now you do something enterprises almost never do.
940
00:34:53,320 --> 00:34:56,600
You treat non-prod budgets as aggressive by design.
941
00:34:56,600 --> 00:34:58,360
Non-prod budgets are not there to track.
942
00:34:58,360 --> 00:35:00,200
They are there to interrupt behavior early.
943
00:35:00,200 --> 00:35:02,920
50% and 70% thresholds aren't warnings.
944
00:35:02,920 --> 00:35:04,040
They are scheduled friction.
945
00:35:04,040 --> 00:35:06,200
They force someone to explain why dev is burning through
946
00:35:06,200 --> 00:35:08,520
its expected envelope halfway through the cycle.
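To make "burning through its expected envelope" concrete, here is a sketch that checks month-to-date spend against a threshold ladder and against a straight-line share of the monthly budget. The thresholds and amounts are hypothetical, and straight-line spend is a simplification. `budget_notifications` builds entries shaped like an Azure budget's `notifications` map (`enabled`, `operator`, `threshold`, `thresholdType`, `contactEmails`). Verify those fields against the Consumption budgets API version you target.

```python
# Hypothetical threshold ladder for a non-production subscription budget.
THRESHOLDS = (50, 70, 90, 100)


def crossed(spend: float, budget: float) -> list:
    """Thresholds (percent of budget) that month-to-date spend has reached."""
    pct = 100 * spend / budget
    return [t for t in THRESHOLDS if pct >= t]


def ahead_of_envelope(spend: float, budget: float, day: int, days_in_month: int) -> bool:
    """True when spend outruns a straight-line share of the monthly budget."""
    return spend > budget * day / days_in_month


def budget_notifications(emails: list) -> dict:
    """Notification entries shaped like an Azure budget's 'notifications' map."""
    return {
        f"actual_{t}": {
            "enabled": True,
            "operator": "GreaterThanOrEqualTo",
            "threshold": t,
            "thresholdType": "Actual",
            "contactEmails": emails,
        }
        for t in THRESHOLDS
    }


# Day 15 of 30, 3,600 spent against a 5,000 budget.
print(crossed(3600, 5000))                    # [50, 70]
print(ahead_of_envelope(3600, 5000, 15, 30))  # True: the envelope is 2,500
```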
947
00:35:08,520 --> 00:35:10,360
And because it's non-prod, you can actually act.
948
00:35:10,360 --> 00:35:11,320
You can scale down.
949
00:35:11,320 --> 00:35:12,360
You can shut things off.
950
00:35:12,360 --> 00:35:13,480
You can delete environments.
951
00:35:13,480 --> 00:35:14,760
You can deny premium tiers.
952
00:35:14,760 --> 00:35:16,040
You can restrict regions.
953
00:35:16,040 --> 00:35:17,640
You can enforce short log retention.
954
00:35:17,640 --> 00:35:19,480
You can stop pretending that a dev environment
955
00:35:19,480 --> 00:35:22,840
needs the same resilience posture as customer-facing revenue.
956
00:35:22,840 --> 00:35:25,400
This is where automation stops being optimization
957
00:35:25,400 --> 00:35:27,720
and becomes enforcement amplification.
958
00:35:27,720 --> 00:35:29,160
The default posture in non-prod
959
00:35:29,160 --> 00:35:30,360
should be that things turn off
960
00:35:30,360 --> 00:35:32,120
unless someone actively keeps them on.
961
00:35:32,120 --> 00:35:35,640
Schedules, auto shutdown, life cycle rules, whatever mechanism you use,
962
00:35:35,640 --> 00:35:37,000
the intent is the same.
963
00:35:37,000 --> 00:35:40,760
The platform should require explicit justification for idle runtime
964
00:35:40,760 --> 00:35:42,760
because idle runtime is not innovation.
965
00:35:42,760 --> 00:35:43,560
It's just billing.
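Here is a sketch of "off unless someone keeps it on," framed as the decision a scheduled job might make for each non-production resource. The `keepAliveUntil` tag is hypothetical. The actual stop or deallocate call is deliberately left out; VM auto-shutdown schedules or your own automation would perform it.

```python
from datetime import date
from typing import Optional


def should_stop(env: str, keep_alive_until: Optional[str], today: date) -> bool:
    """Non-prod stops unless a hypothetical keepAliveUntil tag is still current."""
    if env == "prod":
        return False  # production is out of scope for default-off
    if not keep_alive_until:
        return True  # no justification recorded: off by default
    try:
        until = date.fromisoformat(keep_alive_until)
    except ValueError:
        return True  # "TBD" or any unparseable value counts as no justification
    return today > until
```

The design choice is the default. Keeping something running requires a dated, visible statement of intent; stopping it requires nothing.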
966
00:35:43,560 --> 00:35:46,680
Now there's a predictable pushback here.
967
00:35:46,680 --> 00:35:48,600
"But developers need flexibility."
968
00:35:48,600 --> 00:35:49,240
Yes, they do.
969
00:35:49,240 --> 00:35:50,920
That's why the goal isn't to ban spend.
970
00:35:50,920 --> 00:35:52,120
It's to encode purpose.
971
00:35:52,120 --> 00:35:55,400
If a team needs a larger environment for performance testing, fine.
972
00:35:55,400 --> 00:35:57,240
But it should happen through an approved path.
973
00:35:57,240 --> 00:35:59,320
Time-boxed, budgeted, and visible.
974
00:35:59,320 --> 00:36:01,000
The platform can allow the exception
975
00:36:01,000 --> 00:36:02,760
while keeping the baseline strict.
976
00:36:02,760 --> 00:36:05,560
Without that, flexibility just becomes another word
977
00:36:05,560 --> 00:36:07,080
for architectural erosion.
978
00:36:07,080 --> 00:36:08,760
The outcome is also predictable
979
00:36:08,760 --> 00:36:11,720
and it's measurable without inventing magic savings numbers.
980
00:36:11,720 --> 00:36:14,680
First, you reduce the number of long-lived idle environments
981
00:36:14,680 --> 00:36:16,600
because they get shut down by default.
982
00:36:16,600 --> 00:36:18,760
Second, you surface rogue environments early
983
00:36:18,760 --> 00:36:20,920
because budgets fire when spend deviates
984
00:36:20,920 --> 00:36:23,400
and the deviation can't hide in a shared scope.
985
00:36:23,400 --> 00:36:25,800
Third, you force teams to make conscious choices
986
00:36:25,800 --> 00:36:27,320
about what they're paying for in non-prod
987
00:36:27,320 --> 00:36:28,920
which changes behavior faster
988
00:36:28,920 --> 00:36:31,320
than any cost-awareness campaign ever will.
989
00:36:31,320 --> 00:36:32,680
And here's the real payoff.
990
00:36:32,680 --> 00:36:35,640
The organization stops treating dev/test as a junk drawer.
991
00:36:35,640 --> 00:36:38,680
Non-production becomes what it was supposed to be
992
00:36:38,680 --> 00:36:41,240
an intentionally bounded delivery capability.
993
00:36:41,240 --> 00:36:43,240
Not an unbounded parallel cloud estate.
994
00:36:43,240 --> 00:36:45,880
Now, even with clean non-prod boundaries,
995
00:36:45,880 --> 00:36:47,960
there's still one category of spend
996
00:36:47,960 --> 00:36:51,560
that loves to evade accountability: shared platform services.
997
00:36:51,560 --> 00:36:54,760
Because once you centralize networking, logging, and security,
998
00:36:54,760 --> 00:36:57,160
you've built a cost engine that every team depends on
999
00:36:57,160 --> 00:36:58,280
but few teams can see.
1000
00:36:58,280 --> 00:36:59,560
That's the next failure mode.
1001
00:36:59,560 --> 00:37:03,800
Scenario 5: shared platform services with no cost signal.
1002
00:37:03,800 --> 00:37:07,560
Shared platform services are where good intentions go to inflate quietly.
1003
00:37:07,560 --> 00:37:10,360
In the before posture, the organization centralizes
1004
00:37:10,360 --> 00:37:14,200
the expensive fundamentals: hub networking, firewalls,
1005
00:37:14,200 --> 00:37:18,600
private DNS, Log Analytics, Sentinel, Defender plans,
1006
00:37:18,600 --> 00:37:21,800
central Key Vault patterns, shared container registries,
1007
00:37:21,800 --> 00:37:25,160
monitoring pipelines, maybe an enterprise API gateway.
1008
00:37:25,160 --> 00:37:27,400
All of it is deployed for everyone,
1009
00:37:27,400 --> 00:37:29,560
which sounds efficient and it can be.
1010
00:37:29,560 --> 00:37:32,760
But the cost signal usually disappears the moment it becomes shared
1011
00:37:32,760 --> 00:37:35,000
because those services are billed somewhere:
1012
00:37:35,000 --> 00:37:36,600
a platform subscription,
1013
00:37:36,600 --> 00:37:39,560
a connectivity subscription, a management subscription,
1014
00:37:39,560 --> 00:37:42,280
a catch-all that's treated like a necessary tax.
1015
00:37:42,280 --> 00:37:45,080
The application teams consume it, depend on it,
1016
00:37:45,080 --> 00:37:47,240
and then optimize only their own resource groups
1017
00:37:47,240 --> 00:37:49,560
because that's what they can see and what they're measured on.
1018
00:37:49,560 --> 00:37:53,240
So the platform spend becomes a black hole with a justification attached.
1019
00:37:53,240 --> 00:37:55,400
Here's the operational behavior that follows.
1020
00:37:55,400 --> 00:37:58,680
Log ingestion grows because every team enables diagnostics
1021
00:37:58,680 --> 00:38:01,000
at maximum verbosity temporarily,
1022
00:38:01,000 --> 00:38:02,920
and nobody owns the retention curve.
1023
00:38:02,920 --> 00:38:06,600
Network egress grows because architectures sprawl across regions,
1024
00:38:06,600 --> 00:38:09,880
VNets peer like ivy, and traffic routes get fixed
1025
00:38:09,880 --> 00:38:11,640
in ways that are correct for availability
1026
00:38:11,640 --> 00:38:13,400
but catastrophic for cost.
1027
00:38:13,400 --> 00:38:17,640
Security tooling grows because every new capability adds another billing meter,
1028
00:38:17,640 --> 00:38:20,600
and the platform team is incentivized to be safer, not cheaper.
1029
00:38:20,600 --> 00:38:22,280
And because it's shared, nobody feels it.
1030
00:38:22,280 --> 00:38:24,120
The app team doesn't feel the firewall bill,
1031
00:38:24,120 --> 00:38:27,000
the platform team doesn't feel the app team's burst traffic.
1032
00:38:27,000 --> 00:38:29,720
Finance sees a line item labeled "platform"
1033
00:38:29,720 --> 00:38:32,440
and gets told, "It's foundational," which is true.
1034
00:38:32,440 --> 00:38:34,760
It's also not an excuse for unbounded growth.
1035
00:38:34,760 --> 00:38:37,400
This is the core governance failure of shared services.
1036
00:38:37,400 --> 00:38:39,400
The enterprise funds a cost engine
1037
00:38:39,400 --> 00:38:42,600
without attaching economic feedback to the consumers of that engine.
1038
00:38:42,600 --> 00:38:45,800
Without feedback, consumption expands until it hits a crisis.
1039
00:38:45,800 --> 00:38:48,120
Then the crisis is framed as "Azure is expensive"
1040
00:38:48,120 --> 00:38:51,240
when it's actually "shared services are unmetered internally."
1041
00:38:51,240 --> 00:38:55,240
The after posture is not "split everything into separate subscriptions."
1042
00:38:55,240 --> 00:38:56,200
That's not the point.
1043
00:38:56,200 --> 00:38:59,800
The point is to make shared costs legible and attributable
1044
00:38:59,800 --> 00:39:02,520
even when the underlying service must remain centralized.
1045
00:39:02,520 --> 00:39:05,080
So the first step is explicit platform subscriptions
1046
00:39:05,080 --> 00:39:06,840
with explicit accountability.
1047
00:39:06,840 --> 00:39:08,280
Not "the cloud team owns it,"
1048
00:39:08,280 --> 00:39:10,840
but a named platform owner, a documented service catalog,
1049
00:39:10,840 --> 00:39:13,320
a declared budget, early thresholds,
1050
00:39:13,320 --> 00:39:15,000
the same governance rules you demanded
1051
00:39:15,000 --> 00:39:16,600
for every workload subscription.
1052
00:39:16,600 --> 00:39:18,360
Shared services don't get a pass;
1053
00:39:18,360 --> 00:39:21,880
they are the highest risk-spend category in the entire estate.
1054
00:39:21,880 --> 00:39:24,440
Then you add the piece everyone avoids.
1055
00:39:24,440 --> 00:39:27,720
An allocation model. Not perfect, not theoretically pure,
1056
00:39:27,720 --> 00:39:30,680
just consistent, defensible, and repeatable.
1057
00:39:30,680 --> 00:39:33,080
Some shared costs can be allocated by usage.
1058
00:39:33,080 --> 00:39:35,480
Log Analytics ingestion, Sentinel data,
1059
00:39:35,480 --> 00:39:37,800
firewall processing metrics in some cases,
1060
00:39:37,800 --> 00:39:40,040
egress by workload if you collect flow logs
1061
00:39:40,040 --> 00:39:42,600
and map them back to subscriptions or VNets.
1062
00:39:42,600 --> 00:39:45,000
If you can measure consumption, allocate by consumption.
1063
00:39:45,000 --> 00:39:47,960
But many shared costs can't be allocated cleanly
1064
00:39:47,960 --> 00:39:49,720
without building an internal billing system
1065
00:39:49,720 --> 00:39:50,920
that nobody wants.
1066
00:39:50,920 --> 00:39:53,640
So you use proportional allocation where you must.
1067
00:39:53,640 --> 00:39:55,240
Percentage by subscription spend,
1068
00:39:55,240 --> 00:39:58,040
percentage by headcount, percentage by throughput class,
1069
00:39:58,040 --> 00:39:59,560
whatever the business will accept
1070
00:39:59,560 --> 00:40:02,280
and is stable enough to create a feedback loop.
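Here is a sketch of the mixed model: measured consumption where it exists, proportional allocation for the rest. The teams, the amounts, and the choice of subscription spend as the fallback weight are all hypothetical. The point is that the rule is deterministic and repeatable, not that it is precise.

```python
def allocate(cost: float, weights: dict) -> dict:
    """Split a cost across teams in proportion to their weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {team: cost * w / total for team, w in weights.items()}


def split_shared_bill(measured_cost: float, usage: dict, unmeasured_cost: float, spend: dict) -> dict:
    """Allocate by consumption where it is measured, by subscription spend elsewhere."""
    result = allocate(measured_cost, usage)
    for team, amount in allocate(unmeasured_cost, spend).items():
        result[team] = result.get(team, 0.0) + amount
    return result


# Hypothetical month: 1,000 of log ingestion measured in GB per team,
# plus 500 of hub networking split by each team's own subscription spend.
shares = split_shared_bill(
    1000.0, {"team-a": 300, "team-b": 100},
    500.0, {"team-a": 6000, "team-b": 2000, "team-c": 2000},
)
print(shares)  # {'team-a': 1050.0, 'team-b': 350.0, 'team-c': 100.0}
```

The shares always sum to the shared bill, so the platform line item disappears into named consumers rather than into a "platform" bucket.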
1071
00:40:02,280 --> 00:40:04,840
The critical requirement isn't mathematical perfection.
1072
00:40:04,840 --> 00:40:06,520
The requirement is that shared spend
1073
00:40:06,520 --> 00:40:07,880
stops being invisible.
1074
00:40:07,880 --> 00:40:09,480
Because invisibility is what lets it grow
1075
00:40:09,480 --> 00:40:10,440
without design review.
1076
00:40:10,440 --> 00:40:12,280
This is also where showback and chargeback
1077
00:40:12,280 --> 00:40:14,120
stop being philosophical arguments
1078
00:40:14,120 --> 00:40:15,880
and become engineering inputs.
1079
00:40:15,880 --> 00:40:18,280
If an application team sees that their architecture is driving
1080
00:40:18,280 --> 00:40:20,040
a disproportionate share of log ingestion
1081
00:40:20,040 --> 00:40:21,160
or cross-region traffic,
1082
00:40:21,160 --> 00:40:23,240
the next design discussion changes.
1083
00:40:23,240 --> 00:40:25,480
Suddenly retention settings, sampling strategies,
1084
00:40:25,480 --> 00:40:27,320
diagnostics scope, and topology
1085
00:40:27,320 --> 00:40:29,400
are not abstract platform concerns
1086
00:40:29,400 --> 00:40:32,040
but product decisions with financial consequences.
1087
00:40:32,040 --> 00:40:34,440
And yes, some teams will complain that it's unfair.
1088
00:40:34,440 --> 00:40:35,000
Good.
1089
00:40:35,000 --> 00:40:36,680
Fairness complaints are often the first sign
1090
00:40:36,680 --> 00:40:38,520
that cost signals are finally reaching the people
1091
00:40:38,520 --> 00:40:39,640
making the trade-offs.
1092
00:40:39,640 --> 00:40:42,280
Now, here's the part that most enterprises miss.
1093
00:40:42,280 --> 00:40:44,200
Shared platform costs should be discussed
1094
00:40:44,200 --> 00:40:46,120
as architectural constraints, not as bills.
1095
00:40:46,120 --> 00:40:49,400
If you centralize logging, you must also centralize logging policy.
1096
00:40:49,400 --> 00:40:51,800
Retention limits by environment, sampling defaults,
1097
00:40:51,800 --> 00:40:53,800
what "debug" means in production,
1098
00:40:53,800 --> 00:40:55,560
what data is worth paying to store.
1099
00:40:55,560 --> 00:40:56,920
If you centralize networking,
1100
00:40:56,920 --> 00:40:58,840
you must centralize topology standards,
1101
00:40:58,840 --> 00:41:00,440
where traffic is allowed to flow,
1102
00:41:00,440 --> 00:41:02,440
when cross-region is justified,
1103
00:41:02,440 --> 00:41:04,920
what services are allowed to punch through the hub.
1104
00:41:04,920 --> 00:41:07,720
Otherwise, the platform team becomes the custodian
1105
00:41:07,720 --> 00:41:10,280
of a cost surface area that everyone can expand
1106
00:41:10,280 --> 00:41:11,400
and no one can shrink.
1107
00:41:11,400 --> 00:41:12,920
So the outcome of the after posture
1108
00:41:12,920 --> 00:41:14,200
is not just predictability,
1109
00:41:14,200 --> 00:41:16,120
it's earlier financial design decisions.
1110
00:41:16,120 --> 00:41:17,720
Platform cost becomes a known,
1111
00:41:17,720 --> 00:41:19,960
modeled component of architecture review.
1112
00:41:19,960 --> 00:41:21,800
It becomes part of landing zone standards,
1113
00:41:21,800 --> 00:41:23,960
it becomes part of exception governance.
1114
00:41:23,960 --> 00:41:25,240
And most importantly,
1115
00:41:25,240 --> 00:41:27,960
it stops being a mystery that shows up as a quarterly surprise.
1116
00:41:27,960 --> 00:41:29,720
Now, once platform costs are legible,
1117
00:41:29,720 --> 00:41:32,040
the enterprise usually makes its next mistake.
1118
00:41:32,040 --> 00:41:34,200
They treat budgets like household trackers.
1119
00:41:34,200 --> 00:41:36,360
That's next: budgets are intent signals,
1120
00:41:36,360 --> 00:41:37,640
not household trackers.
1121
00:41:37,640 --> 00:41:39,160
Budgets are the most misunderstood
1122
00:41:39,160 --> 00:41:40,360
FinOps control in Azure,
1123
00:41:40,360 --> 00:41:41,960
and it's predictable why.
1124
00:41:41,960 --> 00:41:44,520
Most organizations use them like a household expense app.
1125
00:41:44,520 --> 00:41:47,080
Set a number, watch it, panic when it turns red,
1126
00:41:47,080 --> 00:41:48,600
then do nothing meaningful,
1127
00:41:48,600 --> 00:41:50,440
because the month is basically over.
1128
00:41:50,440 --> 00:41:53,080
That is not what budgets are for in an enterprise cloud.
1129
00:41:53,080 --> 00:41:54,600
A budget is not a spending limit.
1130
00:41:54,600 --> 00:41:56,120
Azure will not stop your workloads;
1131
00:41:56,120 --> 00:41:58,040
Azure will not shut down your platform.
1132
00:41:58,040 --> 00:41:59,240
A budget is a signal.
1133
00:41:59,240 --> 00:42:01,880
It's the platform telling you that actual behavior
1134
00:42:01,880 --> 00:42:05,080
is diverging from declared intent early enough to intervene.
1135
00:42:05,080 --> 00:42:06,120
That distinction matters,
1136
00:42:06,120 --> 00:42:08,280
because budgets only work when they are attached
1137
00:42:08,280 --> 00:42:09,880
to ownership and action.
1138
00:42:09,880 --> 00:42:12,840
If a budget alert lands in a shared mailbox, it is theater.
1139
00:42:12,840 --> 00:42:14,520
If it lands with an accountable owner
1140
00:42:14,520 --> 00:42:16,280
who has both authority and consequence,
1141
00:42:16,280 --> 00:42:17,320
it becomes governance.
1142
00:42:17,320 --> 00:42:19,960
And if it lands early enough that changes are still cheap,
1143
00:42:19,960 --> 00:42:21,560
it becomes an operational interrupt,
1144
00:42:21,560 --> 00:42:23,000
not a finance post-mortem.
1145
00:42:23,000 --> 00:42:24,280
Most enterprises do the opposite.
1146
00:42:24,280 --> 00:42:27,160
They set budgets late, they set them at the wrong scope,
1147
00:42:27,160 --> 00:42:28,680
and they set thresholds that fire
1148
00:42:28,680 --> 00:42:30,600
after the enterprise has already burned the money.
1149
00:42:30,600 --> 00:42:32,200
The classic example is a monthly budget
1150
00:42:32,200 --> 00:42:33,800
with a 90% alert.
1151
00:42:33,800 --> 00:42:35,800
That alert triggers when the month is nearly finished,
1152
00:42:35,800 --> 00:42:37,000
the spend has already happened,
1153
00:42:37,000 --> 00:42:39,800
and your options are limited to eat it or break something.
1154
00:42:39,800 --> 00:42:40,760
That's not a control.
1155
00:42:40,760 --> 00:42:43,320
That's a notification that your control model failed.
1156
00:42:43,320 --> 00:42:45,640
So budgets need three rules that are non-negotiable.
1157
00:42:45,640 --> 00:42:47,960
First, align budgets to ownership boundaries.
1158
00:42:47,960 --> 00:42:50,840
If a team owns a subscription, that subscription needs a budget.
1159
00:42:50,840 --> 00:42:53,640
If a platform domain owns a shared services subscription,
1160
00:42:53,640 --> 00:42:55,160
that subscription needs a budget.
1161
00:42:55,160 --> 00:42:57,320
If you can't name the owner, you can't budget it,
1162
00:42:57,320 --> 00:42:59,960
because there is no decision maker to receive the signal.
1163
00:42:59,960 --> 00:43:02,120
Budgeting unowned spend is just documenting
1164
00:43:02,120 --> 00:43:03,720
a problem you refuse to fix.
1165
00:43:03,720 --> 00:43:05,720
Second, budgets must fire early.
1166
00:43:05,720 --> 00:43:08,680
50% and 70% thresholds aren't warnings.
1167
00:43:08,680 --> 00:43:10,440
They are deliberately placed interrupts.
1168
00:43:10,440 --> 00:43:12,760
They force the question: is this spend consistent
1169
00:43:12,760 --> 00:43:14,520
with what we expected the subscription to do?
1170
00:43:14,520 --> 00:43:17,000
If yes, then the organization updates its intent:
1171
00:43:17,000 --> 00:43:18,600
budget, forecast, or constraints.
1172
00:43:18,600 --> 00:43:20,680
If no, then the organization intervenes:
1173
00:43:20,680 --> 00:43:22,280
right-sizing, disabling a feature,
1174
00:43:22,280 --> 00:43:23,560
killing a runaway environment
1175
00:43:23,560 --> 00:43:25,800
or denying the next scale out through policy.
1176
00:43:25,800 --> 00:43:27,400
Budgets are not there to shame people.
1177
00:43:27,400 --> 00:43:28,680
They're there to force a decision
1178
00:43:28,680 --> 00:43:30,200
while the decision is still reversible.
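To make the timing argument concrete, here is a small Python sketch of when each threshold fires. It assumes spend accrues at a constant daily rate over a 30-day month and that alerts fire on actual (not forecast) cost; the budget and burn figures are hypothetical.

```python
import math

DAYS_IN_MONTH = 30  # simplifying assumption


def threshold_crossing_day(monthly_budget: float, daily_spend: float, threshold_pct: float) -> int:
    """Day of the month (1-based) on which cumulative actual spend first reaches
    threshold_pct of the budget, assuming a constant daily burn rate."""
    if daily_spend <= 0:
        raise ValueError("daily_spend must be positive")
    target = monthly_budget * threshold_pct / 100
    return math.ceil(target / daily_spend)


def reaction_window(monthly_budget: float, daily_spend: float, threshold_pct: float) -> int:
    """Days left in the month after the alert fires (0 if it fires at or after month end)."""
    return max(0, DAYS_IN_MONTH - threshold_crossing_day(monthly_budget, daily_spend, threshold_pct))


if __name__ == "__main__":
    budget = 30_000.0  # hypothetical monthly budget
    for label, daily in (("on plan", 1_000.0), ("drifting +50%", 1_500.0)):
        for pct in (50, 70, 90):
            day = threshold_crossing_day(budget, daily, pct)
            print(f"{label}: {pct}% alert on day {day}, "
                  f"{reaction_window(budget, daily, pct)} days to react")
```

On a plan-consistent burn, the 90% alert lands on day 27 with three days left, while the 50% and 70% alerts leave 15 and 9 days: the difference between an interrupt and a post-mortem.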
1179
00:43:30,200 --> 00:43:33,960
Third, budget alerts must trigger action, not email.
1180
00:43:33,960 --> 00:43:36,440
Email is where accountability goes to die.
1181
00:43:36,440 --> 00:43:37,800
If you want budgets to matter,
1182
00:43:37,800 --> 00:43:39,720
route the alert into an escalation lane
1183
00:43:39,720 --> 00:43:41,720
that produces a tracked artifact,
1184
00:43:41,720 --> 00:43:43,640
a ticket in your ITSM tool,
1185
00:43:43,640 --> 00:43:46,920
a message into the right Teams channel with the owner tagged,
1186
00:43:46,920 --> 00:43:48,920
an incident workflow for spend spikes
1187
00:43:48,920 --> 00:43:50,600
that threaten financial controls.
1188
00:43:50,600 --> 00:43:51,960
The point is not the tool.
1189
00:43:51,960 --> 00:43:53,800
The point is that an alert becomes a queue item
1190
00:43:53,800 --> 00:43:56,200
with an owner and a response expectation.
1191
00:43:56,200 --> 00:43:57,800
And yes, you can do this in Azure
1192
00:43:57,800 --> 00:43:59,560
with action groups, webhooks, Logic Apps,
1193
00:43:59,560 --> 00:44:01,720
and whatever workflow system your org already pretends
1194
00:44:01,720 --> 00:44:02,840
is standardized.
1195
00:44:02,840 --> 00:44:05,080
The mechanism is implementation detail.
1196
00:44:05,080 --> 00:44:06,600
The model is what matters.
1197
00:44:06,600 --> 00:44:09,400
Budgets trigger governance, not awareness.
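As a rough illustration of "an alert becomes a queue item with an owner and a response expectation", here is a Python sketch of the handler that might sit behind a webhook or Logic App. The alert payload shape, the ownership registry, and the per-threshold response times are all hypothetical; map your real alert schema and ITSM fields onto them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class BudgetAlert:
    # Hypothetical, simplified payload; not the actual Azure alert schema.
    subscription_id: str
    budget_name: str
    threshold_pct: int
    spend: float
    budget_amount: float


@dataclass
class Ticket:
    owner: str
    title: str
    due: datetime
    severity: str


# Hypothetical ownership registry: subscription -> accountable owner.
OWNERS = {"sub-payments-prod": "alice@example.com"}

# Assumed response expectations: later thresholds get less time to respond.
SLA_BY_THRESHOLD = {50: timedelta(days=5), 70: timedelta(days=2), 90: timedelta(hours=8)}


def route(alert: BudgetAlert, now: datetime) -> Ticket:
    """Turn a budget alert into a tracked work item with a named owner and a due time."""
    owner = OWNERS.get(alert.subscription_id)
    if owner is None:
        # No owner means no decision maker: fail loudly instead of mailing a shared inbox.
        raise LookupError(f"no accountable owner for {alert.subscription_id}")
    sla = SLA_BY_THRESHOLD.get(alert.threshold_pct, timedelta(days=1))
    return Ticket(
        owner=owner,
        title=f"{alert.budget_name}: {alert.threshold_pct}% of budget "
              f"({alert.spend:,.0f} / {alert.budget_amount:,.0f})",
        due=now + sla,
        severity="high" if alert.threshold_pct >= 90 else "normal",
    )


if __name__ == "__main__":
    now = datetime(2025, 3, 10, 9, 0, tzinfo=timezone.utc)
    alert = BudgetAlert("sub-payments-prod", "payments-prod-monthly", 70, 21_000, 30_000)
    print(route(alert, now))
```

Raising on an unowned subscription is a deliberate design choice in this sketch: the failure itself becomes the signal that ownership is missing.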
1198
00:44:09,400 --> 00:44:10,760
Now, there's a common objection.
1199
00:44:10,760 --> 00:44:13,320
Budgets create alert fatigue.
1200
00:44:13,320 --> 00:44:14,920
They do, if you design them like spam.
1201
00:44:14,920 --> 00:44:16,280
If you create budgets everywhere
1202
00:44:16,280 --> 00:44:18,360
at every scope with noisy thresholds,
1203
00:44:18,360 --> 00:44:19,720
you will flood the organization
1204
00:44:19,720 --> 00:44:21,240
with alerts that represent nothing.
1205
00:44:21,240 --> 00:44:23,080
Then teams mute them
1206
00:44:23,080 --> 00:44:25,240
and your budgets turn into background radiation.
1207
00:44:25,240 --> 00:44:26,360
That's not a people problem.
1208
00:44:26,360 --> 00:44:27,480
That's a design problem.
1209
00:44:27,480 --> 00:44:29,720
Avoid alert fatigue by having fewer budgets
1210
00:44:29,720 --> 00:44:30,680
with sharper scopes.
1211
00:44:30,680 --> 00:44:32,440
Put budgets where there is real ownership
1212
00:44:32,440 --> 00:44:34,120
and real financial exposure,
1213
00:44:34,120 --> 00:44:35,080
subscriptions,
1214
00:44:35,080 --> 00:44:37,080
platform domains, high-risk workloads
1215
00:44:37,080 --> 00:44:39,960
like AI, data, and egress-heavy architectures,
1216
00:44:39,960 --> 00:44:42,360
and non-prod estates that love to sprawl.
1217
00:44:42,360 --> 00:44:44,360
Don't budget every resource group in the estate
1218
00:44:44,360 --> 00:44:45,720
because it feels thorough.
1219
00:44:45,720 --> 00:44:47,080
Thoroughness is not control.
1220
00:44:47,080 --> 00:44:49,160
Control is knowing which levers matter.
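One way to apply "fewer budgets with sharper scopes" is to derive the budget list from an inventory rather than budgeting everything. This Python sketch assumes a hypothetical inventory with an owner field, monthly spend, and a high-risk flag; the spend floor is a placeholder you would set locally.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Scope:
    name: str
    kind: str                 # e.g. "subscription", "platform_domain", "resource_group"
    owner: Optional[str]      # accountable owner, if one can be named
    monthly_spend: float
    high_risk: bool = False   # AI, data, egress-heavy, or sprawl-prone non-prod


BUDGETABLE_KINDS = {"subscription", "platform_domain"}


def plan_budgets(scopes: List[Scope], min_spend: float) -> Tuple[List[str], List[str]]:
    """Return (scopes that should carry a budget, scopes blocked on ownership).

    A scope earns a budget only if it is an ownership boundary, has a named owner,
    and has material spend or a high-risk profile. Resource groups are skipped.
    """
    budgeted, needs_owner = [], []
    for s in scopes:
        if s.kind not in BUDGETABLE_KINDS:
            continue
        if not s.owner:
            needs_owner.append(s.name)  # can't budget what nobody owns: fix ownership first
        elif s.monthly_spend >= min_spend or s.high_risk:
            budgeted.append(s.name)
    return budgeted, needs_owner
```

The second list is as useful as the first: unowned scopes are the ownership backlog, not budget candidates.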
1221
00:44:49,160 --> 00:44:52,760
Also, don't treat budget alerts as failures.
1222
00:44:52,760 --> 00:44:55,160
A fired budget alert is not an incident by default.
1223
00:44:55,160 --> 00:44:56,760
It's an anomaly indicator.
1224
00:44:56,760 --> 00:44:59,240
Sometimes the anomaly is legitimate growth.
1225
00:44:59,240 --> 00:45:03,000
A new workload, a migration phase, a seasonal spike.
1226
00:45:03,000 --> 00:45:04,360
The alert still did its job
1227
00:45:04,360 --> 00:45:06,120
because it forced the organization to acknowledge
1228
00:45:06,120 --> 00:45:07,560
that intent changed.
1229
00:45:07,560 --> 00:45:09,560
The worst outcome is not budget exceeded.
1230
00:45:09,560 --> 00:45:11,160
The worst outcome is budget exceeded
1231
00:45:11,160 --> 00:45:13,240
and nobody noticed until the invoice.
1232
00:45:13,240 --> 00:45:15,960
So if budgets are signals, what are they signaling?
1233
00:45:15,960 --> 00:45:17,400
They're signaling one of three things:
1234
00:45:17,400 --> 00:45:19,000
drift, growth, or fraud.
1235
00:45:19,000 --> 00:45:21,480
Drift means something is running that shouldn't be.
1236
00:45:21,480 --> 00:45:23,640
Growth means your usage pattern changed
1237
00:45:23,640 --> 00:45:24,920
and your budget model is stale.
1238
00:45:24,920 --> 00:45:28,520
Fraud, in the broad sense, means an unexpected pathway
1239
00:45:28,520 --> 00:45:30,040
is consuming resources
1240
00:45:30,040 --> 00:45:33,000
and cost is acting as the earliest signal something is wrong.
1241
00:45:33,000 --> 00:45:34,600
Budgets can't tell you which one it is.
1242
00:45:34,600 --> 00:45:37,320
That's your job, but budgets can tell you when to look.
1243
00:45:37,320 --> 00:45:39,800
Early, reliably, at scale.
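Budgets can't say which of the three it is, but the owner can get a first hypothesis cheaply. The Python sketch below is a hypothetical triage heuristic over the resources driving the cost delta; it assumes you have an ownership tag, an end-date flag, and a list of workloads with approved changes. It produces a starting point for the investigation, not a verdict.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional, Set


class Signal(Enum):
    UNEXPECTED_PATHWAY = "unexpected pathway"
    DRIFT = "drift"
    GROWTH = "growth"
    NEEDS_REVIEW = "needs human review"


@dataclass
class Contributor:
    resource_id: str
    workload: Optional[str]   # from an ownership tag; None if untagged or unknown
    past_end_date: bool       # e.g. an environment past its declared expiry


def first_guess(top_contributors: Iterable[Contributor], approved_changes: Set[str]) -> Signal:
    """Heuristic first pass: most urgent explanation first."""
    items = list(top_contributors)
    if any(c.workload is None for c in items):
        return Signal.UNEXPECTED_PATHWAY   # spend nobody can attribute: look here first
    if any(c.past_end_date for c in items):
        return Signal.DRIFT                # something is running that shouldn't be
    if items and all(c.workload in approved_changes for c in items):
        return Signal.GROWTH               # matches declared intent: update the budget model
    return Signal.NEEDS_REVIEW
```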
1244
00:45:39,800 --> 00:45:41,160
Now, here's the catch.
1245
00:45:41,160 --> 00:45:44,040
None of this works unless budgets have an accountability model
1246
00:45:44,040 --> 00:45:44,840
behind them.
1247
00:45:44,840 --> 00:45:46,360
Otherwise, you're just watching numbers move
1248
00:45:46,360 --> 00:45:47,880
and calling it governance.
1249
00:45:47,880 --> 00:45:50,520
Accountability models: showback, chargeback,
1250
00:45:50,520 --> 00:45:51,640
and the real point.
1251
00:45:51,640 --> 00:45:53,880
This is where every enterprise wastes six months.
1252
00:45:53,880 --> 00:45:56,360
The religious war between showback and chargeback.
1253
00:45:56,360 --> 00:45:58,600
Finance wants chargeback because it looks like control.
1254
00:45:58,600 --> 00:46:02,120
Engineering wants showback because it looks like collaboration.
1255
00:46:02,120 --> 00:46:05,480
Someone says we can't do chargeback until tagging is perfect.
1256
00:46:05,480 --> 00:46:09,000
And someone else says we won't fix tagging unless we do chargeback.
1257
00:46:09,000 --> 00:46:12,360
Then the meeting ends, nothing changes and the platform keeps spending.
1258
00:46:12,360 --> 00:46:13,720
That argument misses the point.
1259
00:46:13,720 --> 00:46:15,640
Showback and chargeback are not ideologies.
1260
00:46:15,640 --> 00:46:16,920
They are feedback mechanisms.
1261
00:46:16,920 --> 00:46:19,720
The only thing that matters is whether the cost signal
1262
00:46:19,720 --> 00:46:23,000
reaches the person who made the decision that created the spend.
1263
00:46:23,000 --> 00:46:25,400
Fast enough for them to change the next decision.
1264
00:46:25,400 --> 00:46:27,880
If the signal doesn't reach the decision maker,
1265
00:46:27,880 --> 00:46:29,080
you don't have accountability.
1266
00:46:29,080 --> 00:46:30,280
You have reporting.
1267
00:46:30,280 --> 00:46:33,640
Showback is the early stage tool for building trust in the data.
1268
00:46:33,640 --> 00:46:35,400
It says, here's what you consumed.
1269
00:46:35,400 --> 00:46:37,560
Here's the unit of allocation we agreed on.
1270
00:46:37,560 --> 00:46:39,000
And here's how it maps to the org.
1271
00:46:39,000 --> 00:46:40,760
No money moves, no budgets get hit.
1272
00:46:40,760 --> 00:46:43,400
The goal is to remove the "your numbers are wrong" debate
1273
00:46:43,400 --> 00:46:45,080
and replace it with boring acceptance.
1274
00:46:45,080 --> 00:46:46,920
Because until the numbers are boring,
1275
00:46:46,920 --> 00:46:48,280
nobody will accept chargeback.
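Showback only becomes boring when the allocation rule is simple and reproducible. This Python sketch splits a shared bill in proportion to an agreed unit; GB ingested into a shared Log Analytics workspace is used as a hypothetical example unit, and the team names and amounts are made up.

```python
from typing import Dict


def allocate_shared_cost(total_cost: float, usage: Dict[str, float]) -> Dict[str, float]:
    """Split a shared bill in proportion to an agreed allocation unit.

    Shares are rounded to cents; the rounding remainder goes to the largest
    consumer so the shares add back up to the bill.
    """
    total_usage = sum(usage.values())
    if total_usage <= 0:
        raise ValueError("no usage to allocate against")
    shares = {team: round(total_cost * u / total_usage, 2) for team, u in usage.items()}
    remainder = round(total_cost - sum(shares.values()), 2)
    largest = max(usage, key=usage.get)
    shares[largest] = round(shares[largest] + remainder, 2)
    return shares


if __name__ == "__main__":
    # Hypothetical month: a shared workspace bill split by GB ingested per team.
    print(allocate_shared_cost(12_000.0, {"payments": 600, "search": 300, "platform": 100}))
```

Pushing the rounding remainder to one deterministic team keeps the allocated total equal to the invoice, which removes one more excuse for the "your numbers are wrong" debate.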
1276
00:46:48,280 --> 00:46:51,080
Chargeback is the enforcement tool for making cost a real constraint
1277
00:46:51,080 --> 00:46:52,920
that moves money, or at least moves budget.
1278
00:46:52,920 --> 00:46:54,360
It creates an economic consequence
1279
00:46:54,360 --> 00:46:56,360
that forces teams to treat cloud consumption
1280
00:46:56,360 --> 00:46:58,760
like any other resource they can't waste without trade-offs.
1281
00:46:58,760 --> 00:47:00,120
But here's the uncomfortable truth.
1282
00:47:00,120 --> 00:47:03,320
Neither showback nor chargeback fixes anything by itself.
1283
00:47:03,320 --> 00:47:04,840
They are both downstream of governance.
1284
00:47:04,840 --> 00:47:07,000
If you didn't enforce ownership boundaries, tagging,
1285
00:47:07,000 --> 00:47:09,160
SKU constraints and budget escalation,
1286
00:47:09,160 --> 00:47:12,200
then chargeback just turns chaos into internal invoices.
1287
00:47:12,200 --> 00:47:14,920
So you'll spend your year mediating disputes between teams about
1288
00:47:14,920 --> 00:47:17,160
who pays for a shared log analytics workspace
1289
00:47:17,160 --> 00:47:18,680
that nobody scoped correctly.
1290
00:47:18,680 --> 00:47:19,960
That isn't accountability.
1291
00:47:19,960 --> 00:47:21,640
That's internal billing theater.
1292
00:47:21,640 --> 00:47:24,440
And showback without enforcement becomes wallpaper.
1293
00:47:24,440 --> 00:47:27,240
People glance at it, nod and keep deploying the same way
1294
00:47:27,240 --> 00:47:29,560
because nothing in their system changes when the number goes up.
1295
00:47:29,560 --> 00:47:31,160
So the sequence is deterministic.
1296
00:47:31,160 --> 00:47:33,400
First, showback to establish legitimacy.
1297
00:47:33,400 --> 00:47:35,640
Second, chargeback to establish consequence.
1298
00:47:35,640 --> 00:47:38,440
And the real point of both is to create an economic feedback loop
1299
00:47:38,440 --> 00:47:39,800
that closes fast.
1300
00:47:39,800 --> 00:47:42,600
A mature organization doesn't pick showback or chargeback.
1301
00:47:42,600 --> 00:47:44,840
It uses both deliberately at different stages
1302
00:47:44,840 --> 00:47:46,440
for different kinds of spend.
1303
00:47:46,440 --> 00:47:48,600
Now accountability isn't just who pays.
1304
00:47:48,600 --> 00:47:50,280
It's who is accountable for the guardrails.
1305
00:47:50,280 --> 00:47:52,360
This is where enterprises get sloppy
1306
00:47:52,360 --> 00:47:54,920
because they pretend accountability is a cultural concept.
1307
00:47:54,920 --> 00:47:56,680
It isn't. It's a design requirement.
1308
00:47:56,680 --> 00:47:57,960
In a real operating model,
1309
00:47:57,960 --> 00:48:00,440
the FinOps team or cloud economics function,
1310
00:48:00,440 --> 00:48:02,360
call it whatever your org chart tolerates,
1311
00:48:02,360 --> 00:48:04,200
should be accountable for defining the guardrails
1312
00:48:04,200 --> 00:48:05,560
and the measurement model.
1313
00:48:05,560 --> 00:48:08,440
App teams should be responsible for staying inside those guardrails,
1314
00:48:08,440 --> 00:48:11,400
including tagging compliance and cost-aware design decisions.
1315
00:48:11,400 --> 00:48:13,240
Finance should be consulted for budgeting,
1316
00:48:13,240 --> 00:48:16,040
allocation rules, and the mechanics of internal chargeback.
1317
00:48:16,040 --> 00:48:17,320
Leadership should be informed,
1318
00:48:17,320 --> 00:48:19,880
not asked to resolve the same argument every month.
1319
00:48:19,880 --> 00:48:20,920
That's not bureaucracy.
1320
00:48:20,920 --> 00:48:23,960
That's how you stop ownership from dissolving into everyone's problem,
1321
00:48:23,960 --> 00:48:26,520
which is just a polite way to say nobody's problem.
1322
00:48:26,520 --> 00:48:29,400
And there's another shift happening that enterprises keep ignoring.
1323
00:48:29,400 --> 00:48:30,920
FinOps is no longer just cloud.
1324
00:48:30,920 --> 00:48:34,120
The FinOps Foundation's newer Cloud+ framing and its Scopes concept exist
1325
00:48:34,120 --> 00:48:35,720
because spending doesn't stay contained.
1326
00:48:35,720 --> 00:48:36,760
Cloud leads to SaaS.
1327
00:48:36,760 --> 00:48:38,600
SaaS leads to licensing sprawl.
1328
00:48:38,600 --> 00:48:40,600
AI leads to token burn and GPU bills
1329
00:48:40,600 --> 00:48:42,840
that make your old VM arguments look adorable.
1330
00:48:42,840 --> 00:48:44,600
If you build an accountability model
1331
00:48:44,600 --> 00:48:46,360
that only works for VM spend,
1332
00:48:46,360 --> 00:48:48,200
you're building yesterday's governance.
1333
00:48:48,200 --> 00:48:49,960
So the practical posture is scopes.
1334
00:48:49,960 --> 00:48:52,840
Apply accountability where spend concentrates.
1335
00:48:52,840 --> 00:48:56,440
Cloud scope: subscriptions, tagging, budgets, policy enforcement.
1336
00:48:56,440 --> 00:48:59,480
AI scope: model usage, token forecasts,
1337
00:48:59,480 --> 00:49:02,760
stricter anomaly response because costs can spike fast.
1338
00:49:02,760 --> 00:49:05,080
Shared services scope: allocation rules
1339
00:49:05,080 --> 00:49:06,920
that make platform spend legible,
1340
00:49:06,920 --> 00:49:08,760
even when it can't be perfectly metered.
1341
00:49:08,760 --> 00:49:10,280
You don't need to boil the ocean.
1342
00:49:10,280 --> 00:49:13,000
You need to stop pretending a single model fits everything.
1343
00:49:13,000 --> 00:49:14,760
The simplest version is this: accountability
1344
00:49:14,760 --> 00:49:16,520
must follow decision rights.
1345
00:49:16,520 --> 00:49:18,440
If engineering can choose the SKU,
1346
00:49:18,440 --> 00:49:20,360
engineering must see the cost signal.
1347
00:49:20,360 --> 00:49:23,080
If the platform team controls diagnostics defaults,
1348
00:49:23,080 --> 00:49:26,200
the platform team must own the retention economics.
1349
00:49:26,200 --> 00:49:28,600
If leadership demands multi-region resilience,
1350
00:49:28,600 --> 00:49:30,280
leadership must accept the price tag
1351
00:49:30,280 --> 00:49:32,600
as a design decision, not a surprise invoice.
1352
00:49:32,600 --> 00:49:34,200
Once that operating model is real,
1353
00:49:34,200 --> 00:49:36,440
showback and chargeback stop being arguments.
1354
00:49:36,440 --> 00:49:37,960
They become implementation details.
1355
00:49:37,960 --> 00:49:39,560
And now the transition that matters.
1356
00:49:39,560 --> 00:49:41,240
Once accountability exists,
1357
00:49:41,240 --> 00:49:43,400
enforcement has to live where the decisions happen.
1358
00:49:43,400 --> 00:49:45,080
Not in meetings, but in the control plane.
1359
00:49:45,080 --> 00:49:47,240
The enforcement stack: policy, RBAC,
1360
00:49:47,240 --> 00:49:48,760
budgets, deployment stamps.
1361
00:49:48,760 --> 00:49:50,760
So if cost is an authorization outcome,
1362
00:49:50,760 --> 00:49:52,760
and accountability is the feedback loop,
1363
00:49:52,760 --> 00:49:54,360
then enforcement is the only part
1364
00:49:54,360 --> 00:49:56,360
that actually survives contact with reality.
1365
00:49:56,360 --> 00:49:59,480
Meetings don't enforce, slide decks don't enforce.
1366
00:49:59,480 --> 00:50:00,920
Cost awareness doesn't enforce.
1367
00:50:00,920 --> 00:50:03,000
The control plane enforces.
1368
00:50:03,000 --> 00:50:06,280
And the enforcement stack in Azure is not complicated.
1369
00:50:06,280 --> 00:50:08,440
It's just unpopular because it removes freedom people
1370
00:50:08,440 --> 00:50:09,640
already got used to.
1371
00:50:09,640 --> 00:50:10,840
Start with Azure Policy
1372
00:50:10,840 --> 00:50:13,720
because it's the closest thing Azure has to an authorization
1373
00:50:13,720 --> 00:50:15,160
compiler for cost intent.
1374
00:50:15,160 --> 00:50:17,960
Policy is where you encode the non-negotiables:
1375
00:50:17,960 --> 00:50:21,240
required tags, allowed regions, allowed SKUs
1376
00:50:21,240 --> 00:50:22,520
and configuration baselines
1377
00:50:22,520 --> 00:50:24,280
that have real financial impact.
1378
00:50:24,280 --> 00:50:25,640
Deny is the blunt instrument;
1379
00:50:25,640 --> 00:50:27,080
Modify is the quieter one.
1380
00:50:27,080 --> 00:50:28,760
DeployIfNotExists is the
1381
00:50:28,760 --> 00:50:30,040
"you're going to pay for this anyway,
1382
00:50:30,040 --> 00:50:31,880
so we're going to standardize it" move.
1383
00:50:31,880 --> 00:50:34,040
The key is that policy runs at deploy time.
1384
00:50:34,040 --> 00:50:35,240
It doesn't ask for cooperation.
1385
00:50:35,240 --> 00:50:36,760
It evaluates intent
1386
00:50:36,760 --> 00:50:39,400
and either materializes capacity or refuses it.
1387
00:50:39,400 --> 00:50:41,960
That means you stop trying to convince teams to behave
1388
00:50:41,960 --> 00:50:43,480
and you start designing the platform
1389
00:50:43,480 --> 00:50:44,840
so behavior has boundaries.
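To show what encoding intent looks like in practice, here is a Python sketch that emits a custom Azure Policy definition body requiring a set of tags. The rule shape (field conditions on `tags['name']` with `exists: false`, plus a parameterized effect) follows the pattern of Azure's built-in tag policies, but the tag keys, display name, and the choice to default to Deny are assumptions; validate the output against the Azure Policy definition structure before assigning it.

```python
import json
from typing import Any, Dict, List

# Hypothetical financial-identity tag keys (owner, environment, cost center).
REQUIRED_TAGS = ["owner", "environment", "costCenter"]


def require_tags_policy(tags: List[str]) -> Dict[str, Any]:
    """Build a custom policy definition body that applies the chosen effect
    (Deny by default) to any resource missing at least one required tag."""
    return {
        "properties": {
            "displayName": "Require financial identity tags",
            "policyType": "Custom",
            # Indexed mode evaluates only resource types that support tags and location.
            "mode": "Indexed",
            "parameters": {
                "effect": {
                    "type": "String",
                    "allowedValues": ["Audit", "Deny", "Disabled"],
                    "defaultValue": "Deny",
                }
            },
            "policyRule": {
                # anyOf: a single missing tag is enough to trigger the effect.
                "if": {"anyOf": [{"field": f"tags['{t}']", "exists": "false"} for t in tags]},
                "then": {"effect": "[parameters('effect')]"},
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(require_tags_policy(REQUIRED_TAGS), indent=2))
```

Parameterizing the effect lets the same definition be assigned as Audit first and flipped to Deny once compliance is visible, without redeploying the rule.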
1390
00:50:44,840 --> 00:50:47,000
Now this is where most enterprises sabotage themselves.
1391
00:50:47,000 --> 00:50:49,240
They treat policy exemptions like kindness.
1392
00:50:49,240 --> 00:50:50,520
Exemptions are not kindness.
1393
00:50:50,520 --> 00:50:52,200
They are entropy generators.
1394
00:50:52,200 --> 00:50:53,720
Every exemption should be visible,
1395
00:50:53,720 --> 00:50:55,560
justified, time-boxed, and reviewed,
1396
00:50:55,560 --> 00:50:58,760
because if you can't explain why something bypassed the rules,
1397
00:50:58,760 --> 00:51:00,120
you didn't make an exception,
1398
00:51:00,120 --> 00:51:01,800
you created a second rule set
1399
00:51:01,800 --> 00:51:03,640
that only some people know exists.
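Azure Policy exemptions carry an optional expiration date (`expiresOn`), which is the platform hook for time-boxing. The Python sketch below reviews a simplified, hypothetical export of exemption records and flags the ones that have quietly become a second rule set; the record fields are illustrative, not the ARM schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Exemption:
    # Simplified record; field names are illustrative, not the ARM schema.
    name: str
    scope: str
    justification: str
    owner: Optional[str]
    expires_on: Optional[datetime]


def review(exemptions: List[Exemption], now: datetime) -> Dict[str, List[str]]:
    """Return exemption name -> list of governance problems (empty dict means clean)."""
    findings: Dict[str, List[str]] = {}
    for e in exemptions:
        problems = []
        if not e.justification.strip():
            problems.append("no justification")
        if not e.owner:
            problems.append("no owner")
        if e.expires_on is None:
            problems.append("no expiry (a permanent second rule set)")
        elif e.expires_on <= now:
            problems.append("expired: re-justify or remove")
        if problems:
            findings[e.name] = problems
    return findings
```

Running a review like this on a schedule, and routing the findings to the named owners, is what keeps exemptions visible instead of folklore.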
1400
00:51:03,640 --> 00:51:04,760
Next layer is RBAC
1401
00:51:04,760 --> 00:51:06,360
because policy without permission design
1402
00:51:06,360 --> 00:51:08,440
is just a guardrail around a highway exit
1403
00:51:08,440 --> 00:51:09,480
nobody controls.
1404
00:51:09,480 --> 00:51:11,320
Most organizations have a Contributor problem.
1405
00:51:11,320 --> 00:51:13,640
They hand out broad Contributor at subscription scope
1406
00:51:13,640 --> 00:51:15,240
because it makes delivery easy
1407
00:51:15,240 --> 00:51:17,800
and then they act surprised when spend is unbounded.
1408
00:51:17,800 --> 00:51:19,560
Contributor is not empowerment.
1409
00:51:19,560 --> 00:51:21,400
It is a spend authorization primitive.
1410
00:51:21,400 --> 00:51:23,880
If a team can create resources, they can create cost.
1411
00:51:23,880 --> 00:51:27,640
If they can assign roles, they can create new spend pathways,
1412
00:51:27,640 --> 00:51:29,560
and if they can deploy without a cost signal,
1413
00:51:29,560 --> 00:51:31,800
they can externalize the consequences.
1414
00:51:31,800 --> 00:51:33,160
So RBAC needs two outcomes.
1415
00:51:33,160 --> 00:51:35,000
First, the people making deployment decisions
1416
00:51:35,000 --> 00:51:37,560
must be able to see cost: Cost Management Reader
1417
00:51:37,560 --> 00:51:39,720
or equivalent visibility for engineering
1418
00:51:39,720 --> 00:51:41,160
leads and platform owners.
1419
00:51:41,160 --> 00:51:43,960
If they can't see cost trends, budgets, and anomalies,
1420
00:51:43,960 --> 00:51:46,760
they are operating blind, and blind systems drift.
1421
00:51:46,760 --> 00:51:49,720
Second, deploy authority and spend accountability
1422
00:51:49,720 --> 00:51:52,120
can't be the same unmanaged blob.
1423
00:51:52,120 --> 00:51:54,280
That doesn't mean you create a bureaucracy of approvals.
1424
00:51:54,280 --> 00:51:55,880
That means you design roles
1425
00:51:55,880 --> 00:51:58,200
so the enterprise can tell who is allowed to do what
1426
00:51:58,200 --> 00:52:00,280
and who is responsible when it goes wrong.
1427
00:52:00,280 --> 00:52:01,880
And yes, that often means pipelines
1428
00:52:01,880 --> 00:52:03,800
and managed identities get narrowed permissions
1429
00:52:03,800 --> 00:52:05,640
not Contributor "because it works."
1430
00:52:05,640 --> 00:52:08,200
"Because it works" is how cost entropy gets funded.
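Both RBAC outcomes can be checked mechanically. This Python sketch audits a hypothetical export of role assignments: it flags principals holding Owner or Contributor at subscription scope, and deployers who hold no role that grants cost visibility. The set of cost-visible roles is an assumption to verify against the Cost Management documentation, and the sketch ignores scope matching for visibility to stay short.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Assignment:
    principal: str
    role: str    # built-in or custom role name
    scope: str   # ARM scope string, e.g. "/subscriptions/<id>"


BROAD_ROLES = {"Owner", "Contributor"}
# Roles assumed here to include cost read access; verify against the Cost Management docs.
COST_VISIBLE_ROLES = {"Cost Management Reader", "Cost Management Contributor",
                      "Reader", "Contributor", "Owner"}


def is_subscription_scope(scope: str) -> bool:
    parts = [p for p in scope.split("/") if p]
    return len(parts) == 2 and parts[0].lower() == "subscriptions"


def audit(assignments: Iterable[Assignment], deployers: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (principals with broad write at subscription scope,
               deployers with no cost-visible role anywhere)."""
    items = list(assignments)
    broad = sorted({a.principal for a in items
                    if a.role in BROAD_ROLES and is_subscription_scope(a.scope)})
    sees_cost = {a.principal for a in items if a.role in COST_VISIBLE_ROLES}
    blind = sorted(deployers - sees_cost)
    return broad, blind
```

The first list is the "Contributor because it works" backlog; the second is the set of identities deploying without ever seeing the cost signal.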
1431
00:52:08,200 --> 00:52:10,040
The third layer is budgets, not as trackers
1432
00:52:10,040 --> 00:52:11,320
but as escalation engines.
1433
00:52:11,320 --> 00:52:13,720
Budgets are the interrupt system
1434
00:52:13,720 --> 00:52:16,200
that tells you intent and reality diverged.
1435
00:52:16,200 --> 00:52:17,080
They don't stop spend.
1436
00:52:17,080 --> 00:52:18,040
They route attention,
1437
00:52:18,040 --> 00:52:20,280
and if they aren't wired into action,
1438
00:52:20,280 --> 00:52:22,200
tickets, paging, escalation channels,
1439
00:52:22,200 --> 00:52:24,680
they are a compliance checkbox that produces email.
1440
00:52:24,680 --> 00:52:26,520
Budgets should exist at subscription scope
1441
00:52:26,520 --> 00:52:28,120
because that's where ownership is legible
1442
00:52:28,120 --> 00:52:29,560
and blast radius is bounded.
1443
00:52:29,560 --> 00:52:32,040
They can also exist at higher scopes for rollups
1444
00:52:32,040 --> 00:52:33,560
but the place where action happens
1445
00:52:33,560 --> 00:52:35,640
is where someone can actually change something
1446
00:52:35,640 --> 00:52:37,000
without a committee.
1447
00:52:37,000 --> 00:52:38,280
And budgets should fire early
1448
00:52:38,280 --> 00:52:40,280
because the platform needs time to respond.
1449
00:52:40,280 --> 00:52:42,680
50 and 70% thresholds are not conservative.
1450
00:52:42,680 --> 00:52:43,560
They are practical.
1451
00:52:43,560 --> 00:52:44,840
They're the only way to catch drift
1452
00:52:44,840 --> 00:52:46,040
while you still have options.
1453
00:52:46,040 --> 00:52:48,280
Now the fourth layer is the part people ignore
1454
00:52:48,280 --> 00:52:50,120
because it feels like platform engineering
1455
00:52:50,120 --> 00:52:51,000
not FinOps.
1456
00:52:51,000 --> 00:52:52,200
Deployment stamps.
1457
00:52:52,200 --> 00:52:53,480
Guarded environments.
1458
00:52:53,480 --> 00:52:55,240
Standard patterns.
1459
00:52:55,240 --> 00:52:56,520
Call them what you want.
1460
00:52:56,520 --> 00:52:57,640
The idea is the same.
1461
00:52:57,640 --> 00:53:00,600
You stop letting every team invent their own cost model
1462
00:53:00,600 --> 00:53:01,720
by accident.
1463
00:53:01,720 --> 00:53:04,760
A stamp is a pre-approved, pre-constrained deployment pattern:
1464
00:53:04,760 --> 00:53:06,520
networking, logging, diagnostics,
1465
00:53:06,520 --> 00:53:08,920
SKU baselines, scaling rules, retention settings
1466
00:53:08,920 --> 00:53:11,160
and whatever else always turns into surprise spend.
1467
00:53:11,160 --> 00:53:13,800
When teams deploy through the stamp,
1468
00:53:13,800 --> 00:53:15,800
they inherit the constraints and the defaults
1469
00:53:15,800 --> 00:53:17,080
and the platform doesn't relitigate
1470
00:53:17,080 --> 00:53:19,080
the same cost mistakes 400 times.
1471
00:53:19,080 --> 00:53:21,560
This is how you scale autonomy without scaling chaos
1472
00:53:21,560 --> 00:53:24,120
because you're not restricting teams to one architecture.
1473
00:53:24,120 --> 00:53:25,640
You're restricting them to architectures
1474
00:53:25,640 --> 00:53:27,160
with known cost behavior.
1475
00:53:27,160 --> 00:53:28,440
And then you do one more thing
1476
00:53:28,440 --> 00:53:29,640
enterprises avoid:
1477
00:53:29,640 --> 00:53:31,800
You make exceptions expensive in process,
1478
00:53:31,800 --> 00:53:32,920
not in politics.
1479
00:53:32,920 --> 00:53:35,000
If a workload needs to break the stamp, fine,
1480
00:53:35,000 --> 00:53:36,920
but it does so through a visible exception path
1481
00:53:36,920 --> 00:53:37,800
with an expiry.
1482
00:53:37,800 --> 00:53:39,080
That keeps the baseline clean
1483
00:53:39,080 --> 00:53:40,520
and it forces special cases
1484
00:53:40,520 --> 00:53:41,880
to prove they're still special
1485
00:53:41,880 --> 00:53:43,400
every time the clock runs out.
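A stamp can be thought of as defaults plus bounds, with exceptions that expire. The Python sketch below renders a hypothetical non-prod web stamp: team overrides are merged onto the defaults, and any setting outside the bounds is rejected unless a live (unexpired) exception covers it. The setting names, SKUs, regions, and limits are illustrative assumptions, not a recommended baseline.

```python
from datetime import date
from typing import Any, Dict, Optional

# Hypothetical stamp for a non-prod web workload: defaults plus hard bounds.
STAMP_DEFAULTS: Dict[str, Any] = {"app_service_sku": "B1", "region": "westeurope",
                                  "log_retention_days": 30, "max_instances": 2}
ALLOWED_VALUES = {"app_service_sku": {"B1", "S1"}, "region": {"westeurope", "northeurope"}}
MAX_VALUES = {"log_retention_days": 30, "max_instances": 4}


def render_stamp(overrides: Dict[str, Any],
                 exceptions: Optional[Dict[str, date]] = None,
                 today: Optional[date] = None) -> Dict[str, Any]:
    """Merge team overrides onto stamp defaults and enforce the stamp's bounds.

    A setting may leave the bounds only under an exception whose expiry is in the future.
    """
    exceptions = exceptions or {}
    today = today or date.today()
    unknown = set(overrides) - set(STAMP_DEFAULTS)
    if unknown:
        raise ValueError(f"not a stamp setting: {sorted(unknown)}")
    config = {**STAMP_DEFAULTS, **overrides}
    out_of_bounds = [k for k, allowed in ALLOWED_VALUES.items() if config[k] not in allowed]
    out_of_bounds += [k for k, limit in MAX_VALUES.items() if config[k] > limit]
    # Only unexpired exceptions count: special cases must keep proving they're special.
    violations = [k for k in out_of_bounds if exceptions.get(k, date.min) <= today]
    if violations:
        raise ValueError(f"outside stamp bounds without a live exception: {sorted(violations)}")
    return config
```

When an exception's expiry passes, the same request starts failing again, so the special case has to be re-justified rather than silently becoming the new baseline.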
1486
00:53:43,400 --> 00:53:45,400
So the enforcement stack is simple.
1487
00:53:45,400 --> 00:53:47,000
Policy defines what is allowed,
1488
00:53:47,000 --> 00:53:48,520
RBAC defines who can attempt it,
1489
00:53:48,520 --> 00:53:50,440
budgets define when reality diverges,
1490
00:53:50,440 --> 00:53:52,360
stamps define the default pathways,
1491
00:53:52,360 --> 00:53:53,880
so divergence is rarer,
1492
00:53:53,880 --> 00:53:55,880
and the outcome is the only thing that matters.
1493
00:53:55,880 --> 00:53:57,880
Cost becomes an enforced design decision,
1494
00:53:57,880 --> 00:53:59,320
not a post-mortem artifact.
1495
00:53:59,320 --> 00:54:01,000
Now the question isn't which tools to use;
1496
00:54:01,000 --> 00:54:02,680
the real question is,
1497
00:54:02,680 --> 00:54:05,000
can you roll this out in a way that survives
1498
00:54:05,000 --> 00:54:06,600
organizational pressure?
1499
00:54:06,600 --> 00:54:08,600
That's next: the 90-day rollout,
1500
00:54:08,600 --> 00:54:10,520
from surprise bills to enforced intent.
1501
00:54:10,520 --> 00:54:12,760
This only works if you treat it like a platform rollout,
1502
00:54:12,760 --> 00:54:14,200
not a finance initiative.
1503
00:54:14,200 --> 00:54:17,240
90 days is enough time to change the system's behavior
1504
00:54:17,240 --> 00:54:19,400
if you stop negotiating with entropy
1505
00:54:19,400 --> 00:54:21,480
and start removing its pathways.
1506
00:54:21,480 --> 00:54:22,680
Days 1 to 30:
1507
00:54:22,680 --> 00:54:24,680
define ownership boundaries and make them real.
1508
00:54:24,680 --> 00:54:27,480
Lock in your subscription strategy:
1509
00:54:27,480 --> 00:54:28,840
what is prod, what is non-prod,
1510
00:54:28,840 --> 00:54:29,480
what is platform,
1511
00:54:29,480 --> 00:54:30,280
what is sandbox,
1512
00:54:30,280 --> 00:54:32,120
and who owns each of those scopes.
1513
00:54:32,120 --> 00:54:33,880
Then define the minimum tagging taxonomy
1514
00:54:33,880 --> 00:54:35,720
that represents financial identity:
1515
00:54:35,720 --> 00:54:37,960
owner, environment, and cost center or product.
1516
00:54:37,960 --> 00:54:39,400
Keep it small and enforceable.
1517
00:54:39,400 --> 00:54:41,320
At the same time, stand up initial showback
1518
00:54:41,320 --> 00:54:43,400
with whatever accuracy you currently have
1519
00:54:43,400 --> 00:54:45,160
because the point in month one is to surface
1520
00:54:45,160 --> 00:54:46,840
where you can't allocate, and why.
1521
00:54:46,840 --> 00:54:49,080
That gap is your backlog, not your shame.
1522
00:54:49,080 --> 00:54:52,360
Days 31 to 60, move from visibility to enforcement.
1523
00:54:52,360 --> 00:54:54,120
This is where you start using Azure Policy
1524
00:54:54,120 --> 00:54:55,320
like it's meant to be used
1525
00:54:55,320 --> 00:54:57,800
to stop the platform from accepting ambiguity.
1526
00:54:57,800 --> 00:54:59,720
Deny untagged production deployments.
1527
00:54:59,720 --> 00:55:02,520
Use Modify where you can safely add baseline tags
1528
00:55:02,520 --> 00:55:03,560
or inherit them,
1529
00:55:03,560 --> 00:55:06,200
but don't confuse auto tagging with governance.
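One way to express "deny untagged production deployments" and "inherit baseline tags" is as Azure Policy rules. This sketch builds the rule JSON in Python so it can be reviewed and versioned. The `if`/`then` structure, the `deny` and `modify` effects, and the `resourceGroup()` template function are standard Azure Policy syntax; the tag names, and the assumption that the deny rule is assigned only at production subscription scope, are ours.

```python
import json

REQUIRED_PROD_TAGS = ["owner", "costCenter"]  # hypothetical tag names

# Contributor role definition ID; Azure's built-in tag-inheritance policies request it too.
CONTRIBUTOR = "/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"


def deny_missing_tags_rule(required_tags):
    """Deny any resource missing one of the required tags (assign at prod scope only)."""
    return {
        "if": {"anyOf": [{"field": f"tags['{tag}']", "exists": "false"} for tag in required_tags]},
        "then": {"effect": "deny"},
    }


def inherit_tag_from_resource_group_rule(tag):
    """Copy a tag from the resource group when the resource lacks it (convenience, not governance)."""
    return {
        "if": {
            "allOf": [
                {"field": f"tags['{tag}']", "exists": "false"},
                {"value": f"[resourceGroup().tags['{tag}']]", "notEquals": ""},
            ]
        },
        "then": {
            "effect": "modify",
            "details": {
                "roleDefinitionIds": [CONTRIBUTOR],
                "operations": [
                    {"operation": "add", "field": f"tags['{tag}']", "value": f"[resourceGroup().tags['{tag}']]"}
                ],
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps({"mode": "Indexed", "policyRule": deny_missing_tags_rule(REQUIRED_PROD_TAGS)}, indent=2))
```

The Modify rule only fills gaps from a resource group that is already tagged; if the group has no owner, nothing is inherited, which is exactly why auto-tagging is not a substitute for the deny.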
1530
00:55:06,200 --> 00:55:09,480
Implement budgets at subscription scope with early thresholds.
1531
00:55:09,480 --> 00:55:11,880
Route alerts into an action path, not a mailbox.
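Azure budgets evaluate their own notification thresholds, and each threshold can be based on actual or forecasted spend. The sketch below only models that logic to help reason about what "early" means; the specific percentages are hypothetical. Forecast-based thresholds are what make an alert arrive while there is still time to act.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    percent: float  # percentage of the monthly budget amount
    kind: str       # "Actual" or "Forecasted", mirroring Azure budget threshold types


# Hypothetical early thresholds: forecast-based warnings fire before real spend gets close.
EARLY_THRESHOLDS = (
    Threshold(50, "Forecasted"),
    Threshold(80, "Forecasted"),
    Threshold(50, "Actual"),
    Threshold(90, "Actual"),
)


def crossed_thresholds(budget, actual, forecast, thresholds=EARLY_THRESHOLDS):
    """Return every threshold the current month has crossed, in declaration order."""
    hits = []
    for t in thresholds:
        observed = actual if t.kind == "Actual" else forecast
        if observed >= budget * t.percent / 100:
            hits.append(t)
    return hits


if __name__ == "__main__":
    for t in crossed_thresholds(budget=10_000, actual=3_000, forecast=8_500):
        # In a real setup each hit would feed an action path (e.g. a ticket), not just print.
        print(f"{t.kind} spend crossed {t.percent}% of budget")
```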
1532
00:55:11,880 --> 00:55:13,800
Then restrict SKUs by environment.
1533
00:55:13,800 --> 00:55:15,720
Non-prod doesn't get premium by default,
1534
00:55:15,720 --> 00:55:17,160
and regions don't sprawl
1535
00:55:17,160 --> 00:55:19,160
because someone felt adventurous in the portal.
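"Non-prod doesn't get premium by default" and "regions don't sprawl" can also be encoded as deny rules. This sketch uses the `Microsoft.Compute/virtualMachines/sku.name` alias and the `location` field from Azure Policy; the per-environment allow-lists are hypothetical placeholders, and excluding `global` mirrors how the built-in Allowed locations policy avoids blocking non-regional resources.

```python
import json

# Hypothetical allow-lists; the real lists are an organizational decision.
ALLOWED = {
    "nonprod": {"vm_sizes": ["Standard_B2s", "Standard_D2s_v5"], "locations": ["westeurope"]},
    "prod": {"vm_sizes": ["Standard_D4s_v5", "Standard_E8s_v5"], "locations": ["westeurope", "northeurope"]},
}


def sku_and_region_rule(env):
    """Deny VM sizes or regions outside the allow-list for this environment."""
    allowed = ALLOWED[env]
    return {
        "if": {
            "anyOf": [
                {
                    "allOf": [
                        {"field": "type", "equals": "Microsoft.Compute/virtualMachines"},
                        {"not": {"field": "Microsoft.Compute/virtualMachines/sku.name", "in": allowed["vm_sizes"]}},
                    ]
                },
                {
                    "allOf": [
                        {"field": "location", "notIn": allowed["locations"]},
                        {"field": "location", "notEquals": "global"},  # don't block non-regional resources
                    ]
                },
            ]
        },
        "then": {"effect": "deny"},
    }


if __name__ == "__main__":
    print(json.dumps(sku_and_region_rule("nonprod"), indent=2))
```

In practice you might keep SKU and location constraints as two separate definitions grouped in an initiative, so each can carry its own time-boxed exemptions.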
1536
00:55:19,160 --> 00:55:20,440
Days 61 to 90:
1537
00:55:20,440 --> 00:55:23,000
wire escalation and institutionalize exceptions.
1538
00:55:23,000 --> 00:55:24,040
Build the workflow:
1539
00:55:24,040 --> 00:55:25,400
a budget alert creates a ticket,
1540
00:55:25,400 --> 00:55:26,680
it lands with a named owner,
1541
00:55:26,680 --> 00:55:28,520
it has an SLA, and it has an outcome:
1542
00:55:28,520 --> 00:55:31,880
justify, remediate, or request an exception with an expiry.
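The escalation loop can be modeled as a ticket that must name an owner, carries an SLA, and can only close with one of three outcomes, where an exception is invalid without an expiry. The class and field names, the five-day SLA, and the subscription-to-owner map are hypothetical; in Azure the budget's action group could call a webhook or Logic App that feeds whatever ticketing system you already run.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Outcome(Enum):
    JUSTIFY = "justify"
    REMEDIATE = "remediate"
    EXCEPTION = "exception"


@dataclass
class BudgetTicket:
    subscription: str
    owner: str                               # a named person or team, never a shared mailbox
    opened: date
    sla_days: int = 5                        # hypothetical SLA
    outcome: Optional[Outcome] = None
    exception_expires: Optional[date] = None

    def due(self) -> date:
        return self.opened + timedelta(days=self.sla_days)

    def is_breached(self, today: date) -> bool:
        """An open ticket past its due date is an SLA breach to escalate."""
        return self.outcome is None and today > self.due()

    def close(self, outcome: Outcome, expires: Optional[date] = None) -> None:
        if outcome is Outcome.EXCEPTION and expires is None:
            raise ValueError("an exception must carry an expiry date")
        self.outcome = outcome
        self.exception_expires = expires if outcome is Outcome.EXCEPTION else None

    def exception_active(self, today: date) -> bool:
        """Exceptions lapse automatically; after expiry the spend needs a new decision."""
        return (
            self.outcome is Outcome.EXCEPTION
            and self.exception_expires is not None
            and today <= self.exception_expires
        )


def open_ticket_from_alert(subscription: str, owners: dict, today: date) -> BudgetTicket:
    """Route a budget alert to the subscription's named owner."""
    owner = owners.get(subscription)
    if owner is None:
        raise LookupError(f"no named owner for {subscription}; fix ownership before routing alerts")
    return BudgetTicket(subscription=subscription, owner=owner, opened=today)
```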
1543
00:55:31,880 --> 00:55:33,880
Then formalize shared platform accountability.
1544
00:55:33,880 --> 00:55:36,600
If platform subscriptions aren't budgeted and allocated,
1545
00:55:36,600 --> 00:55:38,280
you're funding a black hole.
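Allocating a shared platform subscription needs a usage driver per consuming product. A proportional split is one simple, common choice and is assumed here; the driver itself (for example, requests through a shared gateway) is hypothetical and has to come from your own telemetry. If no driver exists, the sketch refuses to allocate rather than inventing a split.

```python
def allocate_platform_cost(platform_cost: float, usage_by_product: dict) -> dict:
    """Split a shared platform bill across products in proportion to a usage driver."""
    total = sum(usage_by_product.values())
    if total <= 0:
        # No driver means the cost is genuinely unallocatable: surface it, don't spread it evenly.
        raise ValueError("no usage driver recorded; platform cost is unallocatable")
    return {product: platform_cost * units / total for product, units in usage_by_product.items()}


if __name__ == "__main__":
    print(allocate_platform_cost(1000.0, {"checkout": 3, "search": 1}))
```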
1546
00:55:38,280 --> 00:55:40,360
Finally, introduce deployment stamps:
1547
00:55:40,360 --> 00:55:42,520
guarded patterns that encode cost-bounded defaults
1548
00:55:42,520 --> 00:55:45,960
so teams stop reinventing expensive architectures accidentally.
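A deployment stamp can be thought of as a template whose defaults are already cheap and whose ceilings are explicit. This is a hypothetical sketch, not an Azure feature: the `STAMP_LIMITS` catalog, the `WebStamp` type, and the SKU lists are illustrative. Anything outside the stamp fails fast and points the team to the exception path.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stamp catalog: a cheap default plus hard ceilings per environment.
STAMP_LIMITS = {
    "nonprod": {"default_sku": "Standard_B2s", "allowed_skus": {"Standard_B2s", "Standard_D2s_v5"}, "max_instances": 2},
    "prod": {"default_sku": "Standard_D4s_v5", "allowed_skus": {"Standard_D4s_v5", "Standard_D8s_v5"}, "max_instances": 10},
}


@dataclass(frozen=True)
class WebStamp:
    environment: str
    sku: str
    instances: int


def build_stamp(environment: str, sku: Optional[str] = None, instances: int = 1) -> WebStamp:
    """Return a stamp within the environment's bounds, or fail with a pointer to the exception path."""
    limits = STAMP_LIMITS[environment]
    chosen = sku or limits["default_sku"]
    if chosen not in limits["allowed_skus"]:
        raise ValueError(f"{chosen} is outside the {environment} stamp; request a time-boxed exception")
    if not 1 <= instances <= limits["max_instances"]:
        raise ValueError(f"instances must be 1..{limits['max_instances']} for {environment}")
    return WebStamp(environment, chosen, instances)


if __name__ == "__main__":
    print(build_stamp("nonprod"))
```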
1549
00:55:45,960 --> 00:55:48,440
The deliverables at day 90 are boring on purpose:
1550
00:55:48,440 --> 00:55:51,400
a reference architecture for subscriptions and environments,
1551
00:55:51,400 --> 00:55:52,840
a policy starter pack,
1552
00:55:52,840 --> 00:55:55,560
an accountability model that survives org charts,
1553
00:55:55,560 --> 00:55:58,680
and an operating cadence that doesn't depend on heroics.
1554
00:55:58,680 --> 00:56:00,760
And avoid the predictable anti-patterns:
1555
00:56:00,760 --> 00:56:03,000
optimize first, tag later;
1556
00:56:03,000 --> 00:56:06,040
dashboards as governance; and exceptions without expiry.
1557
00:56:06,040 --> 00:56:08,440
Those are just different ways of asking entropy to be polite.
1558
00:56:09,160 --> 00:56:11,400
Cost discipline is enforced autonomy.
1559
00:56:11,400 --> 00:56:15,000
Cloud becomes expensive when unbounded choice meets zero accountability,
1560
00:56:15,000 --> 00:56:18,760
and Azure will happily bill you for every unknown decision you allowed.
1561
00:56:18,760 --> 00:56:20,280
If you want predictable spend,
1562
00:56:20,280 --> 00:56:21,960
stop treating FinOps like reporting
1563
00:56:21,960 --> 00:56:25,000
and start enforcing financial intent in the control plane:
1564
00:56:25,000 --> 00:56:27,240
subscription boundaries, policy constraints,
1565
00:56:27,240 --> 00:56:30,040
budget escalation, and time-boxed exceptions.
1566
00:56:30,040 --> 00:56:31,240
If you want the next layer,
1567
00:56:31,240 --> 00:56:32,760
how to design the authorization graph
1568
00:56:32,760 --> 00:56:35,160
so cost controls don't erode over time,
1569
00:56:35,160 --> 00:56:36,360
watch the next episode.
1570
00:56:36,360 --> 00:56:38,360
Subscribe if you're done paying for ambiguity.