This episode of the M365.FM Podcast explains why Microsoft Fabric governance often fails in real life — even when organizations believe they’ve “solved” governance simply by adopting the platform. The host argues that treating Fabric as a single unified platform with one governance story is a dangerous illusion. Instead, Fabric operates as a composed decision engine with multiple execution paths, shared capacities, and many runtime behaviors that don’t align to org charts or PowerPoint strategies. Common governance efforts — such as naming conventions, Centers of Excellence, and approval workflows — focus on visibility and documentation rather than enforcing actual system constraints. As a result, cost, trust, and meaning quietly decay: costs drift due to shared compute and invisible coupling, workspaces generate entropy when mistaken for control boundaries, and uncontrolled artifacts like semantic models erode metrics and executive confidence. Effective governance in Microsoft Fabric requires enforced defaults, true boundaries between environments, ownership and lifecycle visibility, and constraints that change system behavior — not just policies that look good on paper. The fundamental rule the episode proposes is that any Fabric artifact must declare owner, purpose, and end date — otherwise it doesn’t truly exist.


🧠 Core Theme

  • Microsoft Fabric governance appears simple from the UI and licensing story, but underneath it is a composed system of engines with shared control and runtime behavior.

  • Organizations mistakenly believe one tenant, one bill, and one governance model means one execution model — and that assumption leads to governance failures, spiraling costs, and operational entropy.


📌 What’s Broken in Microsoft Fabric Governance

1️⃣ Fabric Is Not a Single Platform — It’s a Decision Engine

  • Behind the unified UI, multiple engines operate (e.g., semantic analytics, Spark, warehouses, event processing, orchestration).

  • These engines share capacities and make thousands of runtime decisions that do not align with organizational boundaries or strategic intentions.

  • Executives often adopt Fabric thinking it brings consolidation, but the platform executes configuration, not intent.


2️⃣ Governance Theater vs. Governance Reality

  • Many governance programs focus on visibility mechanisms rather than control enforcement, including:

    • Naming conventions

    • Centers of Excellence (CoE)

    • Approval workflows

    • Best-practice documentation

  • These artifacts may look mature, but they do not change what the system actually allows users to create.

  • As a result, risk, cost, and entropy continue to grow despite governance efforts.


3️⃣ Cost Entropy Is Structural, Not Abuse-Driven

  • Shared compute capacity and invisible coupling between workloads drive cost drift.

  • Duplicate refresh pipelines, overlapping refresh windows, and background system operations contribute to escalating charges.

  • Teams often respond to rising costs by scaling capacity, which temporarily relieves pressure but reinforces the problem.
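
One way to surface overlapping refresh windows before they turn into contention is to compare schedules across workspaces that share a capacity. The sketch below is a minimal illustration under stated assumptions: the `RefreshWindow` record, the workspace and item names, and the minutes-after-midnight schedule are all hypothetical and do not correspond to any Fabric API.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class RefreshWindow:
    workspace: str   # hypothetical workspace name
    item: str        # hypothetical semantic model or pipeline name
    start_min: int   # start, in minutes after midnight
    end_min: int     # end (exclusive), in minutes after midnight


def overlapping_pairs(windows):
    """Return pairs of windows from different workspaces whose time ranges intersect."""
    pairs = []
    for a, b in combinations(windows, 2):
        different_workspace = a.workspace != b.workspace
        intersects = a.start_min < b.end_min and b.start_min < a.end_min
        if different_workspace and intersects:
            pairs.append((a, b))
    return pairs


# Hypothetical schedule for three workspaces assigned to one capacity.
schedule = [
    RefreshWindow("sales-prod", "revenue_model", 480, 510),    # 08:00 to 08:30
    RefreshWindow("finance-dev", "scratch_model", 500, 530),   # 08:20 to 08:50
    RefreshWindow("ops-prod", "inventory_model", 540, 560),    # 09:00 to 09:20
]

collisions = overlapping_pairs(schedule)
for a, b in collisions:
    print(f"{a.workspace}/{a.item} overlaps {b.workspace}/{b.item}")
```

A check like this only reports collisions. To count as a constraint in the episode's sense, it would have to block or reschedule the colliding refresh rather than send a notification.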


4️⃣ Workspace Sprawl and Misunderstood Boundaries

  • Workspaces in Fabric act as collaboration containers, not true governance boundaries.

  • Treating them as control or security boundaries introduces entropy rather than containment.

  • Permissions, artifacts, and collaboration contexts can overlap without enforced lifecycle policies.


5️⃣ Domains & OneLake Are Discovery, Not Enforcement

  • Domains and OneLake improve data discovery and organization but do not govern system behavior.

  • Taxonomy and centralized storage alone do not enforce ownership, lifecycle, or access constraints.


6️⃣ Semantic Model Entropy

  • Uncontrolled self-service semantic models lead to KPI drift and executive distrust.

  • Labels like “certified” or “promoted” signal intent but do not enforce it within the system.

  • Refresh chaos and artifact duplication degrade trust in data strategy over time.


🛠️ Why Microsoft Fabric Governance Fails at Scale

The episode outlines the structural reasons governance collapses:

  • Creation is cheap — it’s easy for anyone to create artifacts.

  • Ownership is optional — producers often fail to declare accountability.

  • Lifecycle is unenforced — artifacts persist indefinitely.

  • Capacities are shared — resource contention and cost drift occur.

  • Metrics measure activity, not accountability — dashboards show usage, not risk or policy compliance.
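
The gap between activity and accountability can be made concrete by joining usage figures to an ownership registry. This is a rough sketch only: the usage rows, the item names, and the `owners` registry are invented for illustration, and no real export format of any metrics tool is assumed.

```python
from collections import defaultdict

# Hypothetical usage rows: (item name, CU seconds consumed).
usage = [
    ("revenue_model", 1200.0),
    ("scratch_notebook", 800.0),
    ("old_copy_model", 500.0),
]

# Hypothetical ownership registry; items missing here have no declared owner.
owners = {"revenue_model": "finance-team"}


def attribute(usage_rows, owner_registry):
    """Sum CU seconds per owner, bucketing unregistered items as UNOWNED."""
    totals = defaultdict(float)
    for item, cu_seconds in usage_rows:
        totals[owner_registry.get(item, "UNOWNED")] += cu_seconds
    return dict(totals)


by_owner = attribute(usage, owners)
unowned_share = by_owner.get("UNOWNED", 0.0) / sum(by_owner.values())
print(by_owner)
print(f"Share of compute with no accountable owner: {unowned_share:.0%}")
```

The utilization total is identical either way. What changes is that part of it now maps to a named owner who can make a decision about it, and part of it visibly does not.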


🔑 The Effective Fabric Governance Model

To govern Fabric successfully, governance must function as an actual control plane — not a committee or documentation exercise:

📍 Enforced Constraints

  • Block unsafe structures at creation.

  • Apply defaults that enforce ownership, sensitivity labels, and lifecycle.

  • Establish true boundaries between development and production.

  • Use automation that has operational consequences, not just email notifications.

  • Implement lifecycle governance including:

    • Birth

    • Promotion

    • Retirement

  • Without these controls, governance is merely performative.
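
The birth, promotion, and retirement stages can be read as a small state machine in which promotion is a gate rather than a convention. The following is a sketch under assumptions: the `Stage` names, the `ALLOWED` transition table, and the two gate conditions are illustrative choices, not a Fabric deployment feature.

```python
from enum import Enum


class Stage(Enum):
    DEV = "dev"          # birth: every artifact starts here
    PROD = "prod"        # promotion target
    RETIRED = "retired"  # terminal state


# Permitted lifecycle moves; anything not listed is rejected.
ALLOWED = {
    Stage.DEV: {Stage.PROD, Stage.RETIRED},
    Stage.PROD: {Stage.RETIRED},
    Stage.RETIRED: set(),
}


def transition(current: Stage, target: Stage, *, has_owner: bool, checks_passed: bool) -> Stage:
    """Return the new stage, or raise if the move or the promotion gate fails."""
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not a permitted lifecycle move")
    if target is Stage.PROD and not (has_owner and checks_passed):
        raise PermissionError("promotion blocked: an owner and passing checks are required")
    return target


stage = transition(Stage.DEV, Stage.PROD, has_owner=True, checks_passed=True)
print(stage.value)  # prod
```

The design choice that matters is that a failed gate raises instead of logging a warning, so a non-compliant promotion cannot happen.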


💡 The One Rule That Fixes Fabric Governance

If an artifact in Microsoft Fabric cannot declare:
Owner
Purpose
End date
…then it should not exist.

This rule eliminates more risk, cost, and trust erosion than dashboards, CoEs, or policy documents alone.
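
As a minimal sketch, the rule can be expressed as a creation-time check. Everything here is hypothetical: `ArtifactRequest`, its fields, and the example values are assumptions made for illustration, and wiring such a check into an actual provisioning path is left open.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ArtifactRequest:
    name: str
    owner: Optional[str] = None
    purpose: Optional[str] = None
    end_date: Optional[date] = None


def violations(req: ArtifactRequest, today: date) -> List[str]:
    """List every way the request fails the owner / purpose / end date rule."""
    problems = []
    if not (req.owner and req.owner.strip()):
        problems.append("missing owner")
    if not (req.purpose and req.purpose.strip()):
        problems.append("missing purpose")
    if req.end_date is None:
        problems.append("missing end date")
    elif req.end_date <= today:
        problems.append("end date is not in the future")
    return problems


def may_exist(req: ArtifactRequest, today: date) -> bool:
    """An artifact may be created only if it has no violations."""
    return not violations(req, today)


today = date(2025, 1, 1)  # fixed date so the example is reproducible
good = ArtifactRequest("sales_lakehouse", "data-team@example.com", "Q1 sales reporting", date(2025, 6, 30))
bad = ArtifactRequest("test2_final_final")

print(violations(good, today))  # []
print(violations(bad, today))   # ['missing owner', 'missing purpose', 'missing end date']
```

Returning the full list of violations, rather than failing on the first, gives the requester one actionable message at the moment of creation.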


🎯 Takeaways for Data Leaders

  • Treat Microsoft Fabric as a composed decision engine, not a monolithic platform.

  • Focus governance on system constraints that enforce intent, not on documentation or visibility alone.

  • Ensure every artifact has clear ownership, purpose, and lifecycle enforcement.

  • Design governance so that the platform’s automation compiles your rules into behavior, instead of just reporting on them.

Transcript

1
00:00:00,000 --> 00:00:04,200
Most organizations buy Microsoft Fabric believing they just purchased a platform.

2
00:00:04,200 --> 00:00:09,600
One place, one bill, one governance model, and therefore one set of risks they can control.

3
00:00:09,600 --> 00:00:10,600
They are wrong.

4
00:00:10,600 --> 00:00:14,700
What they actually bought is governance theater plus cost entropy, a system that looks unified

5
00:00:14,700 --> 00:00:18,200
while decisions fragment across engines, people and workspaces.

6
00:00:18,200 --> 00:00:20,100
And the bill doesn't care about your intent.

7
00:00:20,100 --> 00:00:22,400
It will happily monetize your confusion.

8
00:00:22,400 --> 00:00:27,100
This is going to map the failure modes, why they happen, why they persist, and why they're inevitable

9
00:00:27,100 --> 00:00:29,900
unless you enforce design, not etiquette.

10
00:00:30,000 --> 00:00:32,000
The foundational misunderstanding.

11
00:00:32,000 --> 00:00:34,800
Fabric is not a platform, it's a decision engine.

12
00:00:34,800 --> 00:00:38,000
Fabric gets sold as one platform, and the UI supports that story.

13
00:00:38,000 --> 00:00:39,500
The marketing supports that story.

14
00:00:39,500 --> 00:00:42,200
Execs love that story because it sounds like consolidation.

15
00:00:42,200 --> 00:00:45,400
Fewer tools, fewer vendors, fewer invoices, fewer problems.

16
00:00:45,400 --> 00:00:47,200
Architecturally, it is something else.

17
00:00:47,200 --> 00:00:51,800
Fabric is a composed system of engines that share a control plane and a capacity pool

18
00:00:51,800 --> 00:00:54,700
and then make thousands of runtime decisions on your behalf.

19
00:00:54,700 --> 00:00:56,800
Those decisions don't align to your org chart.

20
00:00:56,800 --> 00:00:58,600
They don't align to your project boundaries.

21
00:00:58,600 --> 00:01:01,800
And they definitely don't align to the PowerPoint data strategy

22
00:01:01,800 --> 00:01:04,500
you presented to the steering committee. That distinction matters.

23
00:01:04,500 --> 00:01:07,100
A platform implies a consistent execution model.

24
00:01:07,100 --> 00:01:09,800
One set of guardrails, one predictable cost behavior,

25
00:01:09,800 --> 00:01:12,500
one place where intent becomes enforcement.

26
00:01:12,500 --> 00:01:15,500
Fabric behaves more like a distributed decision engine.

27
00:01:15,500 --> 00:01:17,500
A compiler for analytics workloads.

28
00:01:17,500 --> 00:01:22,400
You submit work, refreshes, queries, pipelines, notebooks, streaming, mirroring,

29
00:01:22,400 --> 00:01:26,500
and the system chooses how to run it within a shared set of constraints.

30
00:01:26,500 --> 00:01:29,400
And the moment you accept that, the governance story changes.

31
00:01:29,400 --> 00:01:31,800
Let's break down the four myths that create the rot.

32
00:01:31,800 --> 00:01:34,600
First, unified versus composed.

33
00:01:34,600 --> 00:01:37,300
Yes, you can build in one place, but behind the curtain,

34
00:01:37,300 --> 00:01:38,600
Fabric isn't one runtime.

35
00:01:38,600 --> 00:01:40,900
You have Power BI's semantic model behaviors.

36
00:01:40,900 --> 00:01:42,600
You have Spark-based workloads.

37
00:01:42,600 --> 00:01:44,700
You have warehouses with their own query patterns.

38
00:01:44,700 --> 00:01:46,300
You have Eventhouse and KQL.

39
00:01:46,300 --> 00:01:48,100
You have Data Factory-style orchestration.

40
00:01:48,100 --> 00:01:51,100
You have background operations that no human asked for in a meeting.

41
00:01:51,100 --> 00:01:53,000
But the system executes anyway.

42
00:01:53,000 --> 00:01:55,300
One UI does not equal one execution model.

43
00:01:55,300 --> 00:01:57,900
In reality, it is many engines with one veneer.

44
00:01:57,900 --> 00:02:01,100
And every engine has different tuning knobs, different failure modes,

45
00:02:01,100 --> 00:02:02,500
and different cost signatures.

46
00:02:02,500 --> 00:02:04,400
Second myth, capacity is a boundary.

47
00:02:04,400 --> 00:02:06,100
Capacity is not a project boundary.

48
00:02:06,100 --> 00:02:07,500
It is not a tenant boundary.

49
00:02:07,500 --> 00:02:09,000
It is not a safety boundary.

50
00:02:09,000 --> 00:02:10,800
Capacity is a shared compute pool.

51
00:02:10,800 --> 00:02:12,500
So when team A runs a heavy refresh,

52
00:02:12,500 --> 00:02:14,800
team B doesn't just see slower dashboards.

53
00:02:14,800 --> 00:02:16,600
Team B inherits contention.

54
00:02:16,600 --> 00:02:17,900
Team B inherits throttling.

55
00:02:17,900 --> 00:02:19,600
Team B inherits the smoothing behavior.

56
00:02:19,600 --> 00:02:21,500
And then team B files an incident.

57
00:02:21,500 --> 00:02:22,900
Not because they did something wrong,

58
00:02:22,900 --> 00:02:25,000
because the shared pool reallocated pressure.

59
00:02:25,000 --> 00:02:26,900
This is where deterministic planning dies.

60
00:02:26,900 --> 00:02:28,800
Executives want cost predictability.

61
00:02:28,800 --> 00:02:30,900
Architects want workload predictability.

62
00:02:30,900 --> 00:02:34,100
Capacity provides neither when you treat it as a shared commons

63
00:02:34,100 --> 00:02:35,900
without enforced allocation.

64
00:02:35,900 --> 00:02:37,700
The system did what it was designed to do.

65
00:02:37,700 --> 00:02:38,600
Share.

66
00:02:38,600 --> 00:02:39,800
You did what you always do.

67
00:02:39,800 --> 00:02:42,100
Assume sharing would behave like isolation.

68
00:02:42,100 --> 00:02:43,000
It will not.

69
00:02:43,000 --> 00:02:45,000
Third myth, workspaces are governance.

70
00:02:45,000 --> 00:02:46,500
Workspaces look like containers.

71
00:02:46,500 --> 00:02:49,500
So people treat them like security boundaries and cost boundaries.

72
00:02:49,500 --> 00:02:51,000
They are neither.

73
00:02:51,000 --> 00:02:53,100
A workspace is a social structure.

74
00:02:53,100 --> 00:02:55,900
A collaboration container with permissions and artifacts.

75
00:02:55,900 --> 00:02:57,500
It is a place for people to build.

76
00:02:57,500 --> 00:02:59,800
That is not the same as a control structure.

77
00:02:59,800 --> 00:03:03,000
Because workspaces don't inherently enforce isolation.

78
00:03:03,000 --> 00:03:04,900
They don't enforce lifecycle separation.

79
00:03:04,900 --> 00:03:06,700
They don't enforce cost accountability.

80
00:03:06,700 --> 00:03:09,600
They don't enforce that prod means anything other than a string

81
00:03:09,600 --> 00:03:10,800
in the workspace name.

82
00:03:10,800 --> 00:03:14,200
And once you put multiple workspaces on the same capacity,

83
00:03:14,200 --> 00:03:15,600
you've admitted the truth.

84
00:03:15,600 --> 00:03:17,600
The workspace boundary is performative.

85
00:03:17,600 --> 00:03:19,100
You can segment the catalog.

86
00:03:19,100 --> 00:03:20,400
You can segment permissions.

87
00:03:20,400 --> 00:03:23,200
But the actual execution pressure lands on the same compute.

88
00:03:23,200 --> 00:03:26,100
That means your governance model must be stronger than the workspace abstraction

89
00:03:26,100 --> 00:03:27,900
or you're just organizing the chaos.

90
00:03:27,900 --> 00:03:30,200
Fourth myth, OneLake is the governance plane.

91
00:03:30,200 --> 00:03:31,500
OneLake is a storage plane.

92
00:03:31,500 --> 00:03:34,700
It's a unification layer for data placement and access patterns.

93
00:03:34,700 --> 00:03:37,700
It's good at being a shared lake, but storage is not governance.

94
00:03:37,700 --> 00:03:39,700
Governance is enforcement of intent.

95
00:03:39,700 --> 00:03:40,800
Who can create what?

96
00:03:40,800 --> 00:03:41,300
Where?

97
00:03:41,300 --> 00:03:42,500
Under which defaults?

98
00:03:42,500 --> 00:03:43,500
With which sensitivity?

99
00:03:43,500 --> 00:03:44,300
With which lineage?

100
00:03:44,300 --> 00:03:45,800
And with which lifecycle constraints?

101
00:03:45,800 --> 00:03:47,100
OneLake can hold labels.

102
00:03:47,100 --> 00:03:48,200
It can hold shortcuts.

103
00:03:48,200 --> 00:03:50,300
It can hold data you wish was curated.

104
00:03:50,300 --> 00:03:53,100
But it cannot by itself prevent the classic behaviors.

105
00:03:53,100 --> 00:03:54,600
"Just load it to my lakehouse."

106
00:03:54,600 --> 00:03:56,300
Just mirror it so I don't wait.

107
00:03:56,300 --> 00:03:58,900
Just make another semantic model because mine is urgent.

108
00:03:58,900 --> 00:04:01,800
The system incentives are speed now, debt later.

109
00:04:01,800 --> 00:04:03,700
OneLake makes the speed-now path easier.

110
00:04:03,700 --> 00:04:04,800
That's not a criticism.

111
00:04:04,800 --> 00:04:07,900
It's the reality of a shared lake in a platform built for adoption.

112
00:04:07,900 --> 00:04:09,600
So what's the consequence of all of this?

113
00:04:09,600 --> 00:04:11,300
Intent doesn't propagate.

114
00:04:11,300 --> 00:04:13,400
You, the architect, may have intent.

115
00:04:13,400 --> 00:04:15,000
This data set is sensitive.

116
00:04:15,000 --> 00:04:16,400
This domain owns the metric.

117
00:04:16,400 --> 00:04:17,700
This workspace is prod.

118
00:04:17,700 --> 00:04:18,800
This capacity is critical.

119
00:04:18,800 --> 00:04:20,000
This pipeline is scheduled.

120
00:04:20,000 --> 00:04:21,800
This model is certified.

121
00:04:21,800 --> 00:04:23,500
But Fabric does not infer intent.

122
00:04:23,500 --> 00:04:26,900
It executes configuration, and configuration fragments across

123
00:04:26,900 --> 00:04:30,100
workspaces, capacities, identities, labels, domains, subdomains,

124
00:04:30,100 --> 00:04:32,100
artifacts, engines, and background operations.

125
00:04:32,100 --> 00:04:35,200
This is how policy turns into a probabilistic security model.

126
00:04:35,200 --> 00:04:38,300
You don't have a single control plane making deterministic decisions.

127
00:04:38,300 --> 00:04:41,600
You have many surfaces where creation is cheap and enforcement is optional.

128
00:04:41,600 --> 00:04:43,800
Every exception becomes an entropy generator.

129
00:04:43,800 --> 00:04:46,200
Every temporary workspace becomes permanent.

130
00:04:46,200 --> 00:04:48,600
Every "just this once" capacity scale-up becomes

131
00:04:48,600 --> 00:04:49,600
the new baseline.

132
00:04:49,600 --> 00:04:53,600
Everything clicks when the system is treated like an authorization compiler.

133
00:04:53,600 --> 00:04:55,400
It will compile whatever you feed it.

134
00:04:55,400 --> 00:04:58,400
If your rules are weak, it will compile weak outcomes at scale.

135
00:04:58,400 --> 00:05:01,400
So if Fabric is a decision engine, governance can't be a committee.

136
00:05:01,400 --> 00:05:04,400
It has to be deterministic constraints that change system behavior.

137
00:05:04,400 --> 00:05:06,200
Next, the governance theater.

138
00:05:06,200 --> 00:05:10,400
Policies that look mature on paper and change nothing in runtime reality.

139
00:05:10,400 --> 00:05:14,000
Governance theater: policies that don't change system behavior.

140
00:05:14,000 --> 00:05:16,900
Most Fabric governance programs fail the same way.

141
00:05:16,900 --> 00:05:18,900
They confuse visibility with control.

142
00:05:18,900 --> 00:05:20,300
They build a center of excellence.

143
00:05:20,300 --> 00:05:21,200
They write standards.

144
00:05:21,200 --> 00:05:22,500
They publish a Confluence page.

145
00:05:22,500 --> 00:05:24,200
They run enablement sessions.

146
00:05:24,200 --> 00:05:26,700
They create a request form for new workspaces.

147
00:05:26,700 --> 00:05:28,000
And then they call it governance.

148
00:05:28,000 --> 00:05:28,900
That is not governance.

149
00:05:28,900 --> 00:05:32,000
That is documentation of intent with no coupling to execution.

150
00:05:32,000 --> 00:05:33,700
The platform doesn't read your slide deck.

151
00:05:33,700 --> 00:05:34,800
It reads configuration.

152
00:05:34,800 --> 00:05:35,800
It reads permissions.

153
00:05:35,800 --> 00:05:39,300
It reads what the user can click, what the service principal can create,

154
00:05:39,300 --> 00:05:43,900
and what the capacity will execute when the refresh hits at 8 o'clock a.m.

155
00:05:43,900 --> 00:05:45,500
This is the uncomfortable truth.

156
00:05:45,500 --> 00:05:48,800
If a policy doesn't change what the system allows, it's not a policy.

157
00:05:48,800 --> 00:05:49,800
It's a suggestion.

158
00:05:49,800 --> 00:05:52,000
And Fabric is a suggestion-friendly environment.

159
00:05:52,000 --> 00:05:53,300
That's the whole adoption play.

160
00:05:53,300 --> 00:05:56,400
Creation is cheap, deployment is fast, self-service is the default,

161
00:05:56,400 --> 00:05:59,600
which means your governance program has to work against gravity.

162
00:05:59,600 --> 00:06:03,000
Calmly, repeatedly, with enforcement.

163
00:06:03,000 --> 00:06:06,000
Now let's talk about the three biggest forms of governance theater

164
00:06:06,000 --> 00:06:07,500
that show up in Fabric estates.

165
00:06:07,500 --> 00:06:10,200
First, naming conventions as entropy management cosplay.

166
00:06:10,200 --> 00:06:11,400
You'll see rules like,

167
00:06:11,400 --> 00:06:14,600
all workspaces must be named BU-Platform-Env-Purpose.

168
00:06:14,600 --> 00:06:17,100
Or all lakehouses must start with LH.

169
00:06:17,100 --> 00:06:19,900
Or all semantic models must include the domain.

170
00:06:19,900 --> 00:06:22,900
Fine, but naming conventions don't constrain behavior.

171
00:06:22,900 --> 00:06:27,200
They don't prevent a user from creating "test2_final_final" at 2 a.m.

172
00:06:27,200 --> 00:06:30,100
They don't prevent someone from publishing an uncertified dataset

173
00:06:30,100 --> 00:06:31,500
used by the CEO's dashboard.

174
00:06:31,500 --> 00:06:32,800
They don't prevent duplication.

175
00:06:32,800 --> 00:06:34,700
They don't prevent cross-domain data grabs.

176
00:06:34,700 --> 00:06:35,900
They don't prevent the bill.

177
00:06:35,900 --> 00:06:38,600
They create the illusion of structure, which is worse than no structure

178
00:06:38,600 --> 00:06:41,600
because it makes leadership think the mess is handled.

179
00:06:41,600 --> 00:06:44,600
Second, approval workflows that don't constrain execution paths.

180
00:06:44,600 --> 00:06:47,600
This one is my favorite because it's the most expensive theater.

181
00:06:47,600 --> 00:06:50,500
A team builds a form: request a workspace.

182
00:06:50,500 --> 00:06:53,500
The CoE approves, everyone feels safe, a ticket gets closed.

183
00:06:53,500 --> 00:06:57,000
And then the user still has contributor in 10 other workspaces.

184
00:06:57,000 --> 00:07:00,000
Can create a lakehouse anywhere they already have access.

185
00:07:00,000 --> 00:07:03,300
Can create a semantic model against any data they can see.

186
00:07:03,300 --> 00:07:07,300
And can run refresh schedules that collide with the rest of the capacity.

187
00:07:07,300 --> 00:07:10,300
The approval workflow approved the creation of a container.

188
00:07:10,300 --> 00:07:14,300
It did not constrain the pathways that actually spend money and create risk.

189
00:07:14,300 --> 00:07:17,400
So what you get is a system where governance approves the front door

190
00:07:17,400 --> 00:07:19,300
while the side doors remain unlocked.

191
00:07:19,300 --> 00:07:23,500
Shortcuts, mirroring, dataset duplication, ad hoc notebooks,

192
00:07:23,500 --> 00:07:26,300
refresh schedules and background operations.

193
00:07:26,300 --> 00:07:29,300
Every unmanaged pathway becomes an entropy generator.

194
00:07:29,300 --> 00:07:32,200
Third, best practices without guardrails.

195
00:07:32,200 --> 00:07:35,300
This is where the CoE publishes guidance: use Direct Lake.

196
00:07:35,300 --> 00:07:39,300
Don't duplicate data, use domains, use sensitivity labels,

197
00:07:39,300 --> 00:07:41,300
use certified data sets.

198
00:07:41,300 --> 00:07:43,300
Separate dev, test, prod.

199
00:07:43,300 --> 00:07:46,300
And then the platform offers no friction for ignoring any of it.

200
00:07:46,300 --> 00:07:49,300
So the estate becomes a museum of half-followed recommendations.

201
00:07:49,300 --> 00:07:50,800
The people who care comply.

202
00:07:50,800 --> 00:07:52,300
The people who ship things fast don't.

203
00:07:52,300 --> 00:07:55,300
And the people who get rewarded are the ones who deliver dashboards,

204
00:07:55,300 --> 00:07:57,300
not the ones who prevent architectural erosion.

205
00:07:57,300 --> 00:07:59,800
This is where audit success becomes the final trap.

206
00:07:59,800 --> 00:08:03,300
Because audits love evidence. Policies exist. Training happened.

207
00:08:03,300 --> 00:08:04,300
There's a governance committee.

208
00:08:04,300 --> 00:08:05,800
Workspaces have owners.

209
00:08:05,800 --> 00:08:08,300
Sensitivity labels are available.

210
00:08:08,300 --> 00:08:09,800
There are quarterly reviews.

211
00:08:09,800 --> 00:08:11,300
The checkboxes light up.

212
00:08:11,300 --> 00:08:13,300
But audits don't measure runtime behavior.

213
00:08:13,300 --> 00:08:16,300
They don't measure whether your prod capacity got throttled by a dev notebook.

214
00:08:16,300 --> 00:08:20,300
They don't measure whether the same KPI exists in five semantic models.

215
00:08:20,300 --> 00:08:24,300
They don't measure whether your OneLake strategy turned into a shared junk drawer

216
00:08:24,300 --> 00:08:27,300
full of mirrored copies and personal lakehouses.

217
00:08:27,300 --> 00:08:29,300
Passing an audit doesn't mean you have control.

218
00:08:29,300 --> 00:08:30,300
It means you have paperwork.

219
00:08:30,300 --> 00:08:34,300
And Fabric, like every distributed platform, punishes paperwork-first governance

220
00:08:34,300 --> 00:08:35,800
because drift is inevitable.

221
00:08:35,800 --> 00:08:37,300
Permissions expand.

222
00:08:37,300 --> 00:08:38,800
Exceptions accumulate.

223
00:08:38,800 --> 00:08:39,800
Teams change.

224
00:08:39,800 --> 00:08:41,800
Dead artifacts stay alive.

225
00:08:41,800 --> 00:08:42,800
Schedules stack.

226
00:08:42,800 --> 00:08:45,800
Capacities get resized temporarily and never revisited.

227
00:08:45,800 --> 00:08:47,300
The platform did not betray you.

228
00:08:47,300 --> 00:08:49,800
You just tried to govern a decision engine with etiquette.

229
00:08:49,800 --> 00:08:50,800
So the test is simple.

230
00:08:50,800 --> 00:08:52,800
Does your governance change system behavior?

231
00:08:52,800 --> 00:08:54,800
Does it prevent creation without ownership?

232
00:08:54,800 --> 00:08:56,800
Does it block unsafe sharing paths?

233
00:08:56,800 --> 00:08:58,300
Does it enforce promotion gates?

234
00:08:58,300 --> 00:09:00,300
Does it quarantine non-compliant artifacts?

235
00:09:00,300 --> 00:09:03,300
Does it make cost visible and attributable at the moment of creation?

236
00:09:03,300 --> 00:09:04,800
Not at invoice time?

237
00:09:04,800 --> 00:09:06,300
If not, you don't have governance.

238
00:09:06,300 --> 00:09:07,800
You have theater.

239
00:09:07,800 --> 00:09:10,800
Next, the bill exposes what governance hides.

240
00:09:10,800 --> 00:09:13,800
Why Fabric spend drifts even when usage looks stable.

241
00:09:13,800 --> 00:09:15,300
Cost entropy.

242
00:09:15,300 --> 00:09:18,300
Why Fabric spend drifts even when usage looks stable.

243
00:09:18,300 --> 00:09:21,800
Here's where the illusion collapses because money has no patience for your governance theater.

244
00:09:21,800 --> 00:09:25,300
Most teams look at Fabric cost drift and assume one of two stories.

245
00:09:25,300 --> 00:09:28,800
Either usage increased, or someone is abusing the platform.

246
00:09:28,800 --> 00:09:29,800
Sometimes that's true.

247
00:09:29,800 --> 00:09:31,800
Usually it's lazier than that.

248
00:09:31,800 --> 00:09:33,300
The real pattern is cost entropy.

249
00:09:33,300 --> 00:09:37,300
Spend rises because the estate accumulates coupling, background load,

250
00:09:37,300 --> 00:09:40,800
and contention even when the visible business workload looks stable.

251
00:09:40,800 --> 00:09:45,300
You still have the same reports, same refresh schedules, same number of users.

252
00:09:45,300 --> 00:09:47,300
Same data sets people point to in meetings.

253
00:09:47,300 --> 00:09:49,300
But the capacity burn climbs anyway.

254
00:09:49,300 --> 00:09:50,300
This is not mysterious.

255
00:09:50,300 --> 00:09:53,300
It's just shared compute plus growing uncertainty.

256
00:09:53,300 --> 00:09:55,300
Start with variable demand.

257
00:09:55,300 --> 00:09:58,300
Fabric workloads don't behave like a nice flat line.

258
00:09:58,300 --> 00:09:59,800
Interactive queries come in bursts.

259
00:09:59,800 --> 00:10:01,800
Refreshes stack, pipelines overlap.

260
00:10:01,800 --> 00:10:04,300
Someone runs a notebook just to test something.

261
00:10:04,300 --> 00:10:05,800
Mirroring catches up.

262
00:10:05,800 --> 00:10:07,800
Eventstreams continue to ingest.

263
00:10:07,800 --> 00:10:09,800
And all of that happens inside the same pool.

264
00:10:09,800 --> 00:10:14,300
So even if the average business usage stays stable, the shape of demand changes.

265
00:10:14,300 --> 00:10:16,800
Peaks get sharper, overlaps get more frequent.

266
00:10:16,800 --> 00:10:19,800
And in a shared pool, peaks are what you pay for operationally,

267
00:10:19,800 --> 00:10:21,800
even if you pay for capacity financially.

268
00:10:21,800 --> 00:10:25,800
This is where organizations confuse "I bought an F SKU" with "I bought predictable behavior."

269
00:10:25,800 --> 00:10:28,300
You didn't. You bought the right to compete for a shared scheduler.

270
00:10:28,300 --> 00:10:29,800
Now add the invisible coupling.

271
00:10:29,800 --> 00:10:33,800
In a well-governed architecture, team A's workload should not become team B's incident.

272
00:10:33,800 --> 00:10:36,800
In Fabric, if you don't enforce isolation, it will.

273
00:10:36,800 --> 00:10:38,800
The coupling is usually not malicious.

274
00:10:38,800 --> 00:10:40,800
It's accidental. The system invites it.

275
00:10:40,800 --> 00:10:41,800
Two workspaces share a capacity.

276
00:10:41,800 --> 00:10:44,800
A third workspace gets added because it's temporary.

277
00:10:44,800 --> 00:10:46,800
A fourth gets added because it's just dev.

278
00:10:46,800 --> 00:10:48,800
Then the refresh windows collide.

279
00:10:48,800 --> 00:10:50,800
The interactive users hit at the same time.

280
00:10:50,800 --> 00:10:53,800
And the smoothing model quietly absorbs the damage until it can't.

281
00:10:53,800 --> 00:10:56,800
And when it can't, the symptoms show up somewhere else.

282
00:10:56,800 --> 00:10:59,800
Not where the cause lives. That's what makes cost entropy so hard to fight.

283
00:10:59,800 --> 00:11:03,800
The operational pain doesn't line up with the organizational ownership.

284
00:11:03,800 --> 00:11:05,800
The capacity absorbs the blast radius.

285
00:11:05,800 --> 00:11:07,800
Then the most visible team screams first.

286
00:11:07,800 --> 00:11:09,800
Then you scale up to stabilize production.

287
00:11:09,800 --> 00:11:11,800
And the bill locks in the mistake.

288
00:11:11,800 --> 00:11:13,800
Now let's talk about background operations.

289
00:11:13,800 --> 00:11:16,800
Because this is where most forecasts die.

290
00:11:16,800 --> 00:11:18,800
Fabric has work you didn't schedule.

291
00:11:18,800 --> 00:11:21,800
Internal operations that support the platform's promises.

292
00:11:21,800 --> 00:11:25,800
Power BI has background evaluation and refresh-related activity.

293
00:11:25,800 --> 00:11:28,800
Lakehouse and warehouse behaviors create their own maintenance patterns.

294
00:11:28,800 --> 00:11:31,800
Mirroring doesn't ask permission to keep itself current.

295
00:11:31,800 --> 00:11:36,800
Even metadata and caching behaviors exist to make interactive experiences feel fast.

296
00:11:36,800 --> 00:11:39,800
And none of that is free and it's not linear.

297
00:11:39,800 --> 00:11:43,800
Background load doesn't politely scale with your report count.

298
00:11:43,800 --> 00:11:49,800
It scales with sprawl. More artifacts, more models, more refresh definitions, more duplication pathways.

299
00:11:49,800 --> 00:11:52,800
More places the platform must keep ready.

300
00:11:52,800 --> 00:11:57,800
So the org thinks, "We didn't add users." But the platform thinks you added complexity.

301
00:11:57,800 --> 00:11:59,800
And complexity consumes compute.

302
00:11:59,800 --> 00:12:04,800
Now the weird part. The smoothing and throttling model turns deterministic planning into probabilistic reality.

303
00:12:04,800 --> 00:12:10,800
In a traditional isolated system, you can predict: this refresh takes X minutes at Y time with Z impact.

304
00:12:10,800 --> 00:12:16,800
In a shared Fabric capacity, the same refresh can behave differently day to day because it's contending with whatever else is happening in the pool.

305
00:12:16,800 --> 00:12:24,800
Smoothing makes this worse in one specific way. It delays the pain. Instead of immediate failure, the system tolerates bursts for a window, which sounds helpful.

306
00:12:24,800 --> 00:12:28,800
But it also makes teams believe they're safe. They aren't. They're borrowing from the future.

307
00:12:28,800 --> 00:12:35,800
And when the future arrives the platform doesn't negotiate. It throttles, it queues, it slows, sometimes it fails.

308
00:12:35,800 --> 00:12:40,800
And now the human response is always the same. Increase capacity because that's the one lever everyone understands.

309
00:12:40,800 --> 00:12:45,800
This is how cost drift becomes permanent because capacity scaling is easy. Cost reduction is political.

310
00:12:45,800 --> 00:12:50,800
So forecasting without workload isolation becomes fiction. You can model average utilization.

311
00:12:50,800 --> 00:12:58,800
You can count refreshes. You can estimate query volume. But you cannot reliably forecast contention behavior when multiple teams share a pool and operate independently.

312
00:12:58,800 --> 00:13:03,800
The variance dominates the mean. What this actually means is simple. Fabric cost doesn't drift because you don't have dashboards.

313
00:13:03,800 --> 00:13:08,800
It drifts because you don't have boundaries. And if you don't believe that you'll love the next trap.

314
00:13:08,800 --> 00:13:12,800
The metric tools exist. They are pretty. And they still tell the wrong story.

315
00:13:12,800 --> 00:13:17,800
The metrics trap: capacity apps measure activity, not accountability.

316
00:13:17,800 --> 00:13:27,800
Now we get to the part where people feel safe again, because Microsoft gives you dashboards: the Capacity Metrics app, chargeback reports, CU seconds, pretty charts, trend lines, percentages that look like engineering.

317
00:13:27,800 --> 00:13:33,800
And everyone relaxes because the platform is measured. But this is the trap. These tools measure activity, not accountability.

318
00:13:33,800 --> 00:13:44,800
They answer what happened in the narrowest technical sense, and they rarely answer the only question the business cares about: who owns this behavior, and what decision will change it? Start with the Capacity Metrics app.

319
00:13:44,800 --> 00:13:55,800
It tells you utilization, peaks, background versus interactive, top workspaces, top items. All useful. But it's still a view of physics, not meaning. Utilization doesn't tell you whether the compute produced value.

320
00:13:55,800 --> 00:14:09,800
It tells you the engine was busy. That's it. A CPU at 5% doing nonsense is still doing nonsense. A capacity at 80% serving duplicate semantic models is still serving duplicate semantic models. And the app can't tell the difference because it has no business semantics.

321
00:14:09,800 --> 00:14:16,800
It doesn't know which data set represents an executive KPI. It doesn't know which notebook exists because someone forgot to turn off a scheduled run.

322
00:14:16,800 --> 00:14:21,800
It doesn't know which refresh is supporting revenue reporting and which refresh is feeding someone's personal sandbox.

323
00:14:21,800 --> 00:14:33,800
So teams optimize the wrong thing. They optimize utilization, not architecture. This might seem backwards, but low utilization is not a win in Fabric, because capacity is a reservation, not a meter.

324
00:14:33,800 --> 00:14:43,800
If you bought the capacity and you keep it running, you pay whether it's busy or idle. 2% utilization isn't a badge of efficiency. It's evidence of overcommitment. It's proof you converted budget into unused headroom.
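The reservation-versus-meter point can be put into rough numbers. The sketch below is only an illustration: the monthly price is a made-up figure, and the two helper functions are hypothetical, not part of any Fabric tooling.

```python
def idle_spend(monthly_cost: float, utilization: float) -> float:
    """Money paid for reserved capacity that did no work."""
    return monthly_cost * (1.0 - utilization)


def cost_per_used_unit(monthly_cost: float, utilization: float) -> float:
    """Effective price of the share of capacity that was actually used."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return monthly_cost / utilization


# Hypothetical flat monthly price for one reserved capacity (assumption).
MONTHLY_COST = 10_000.0

for utilization in (0.02, 0.80):
    idle = idle_spend(MONTHLY_COST, utilization)
    effective = cost_per_used_unit(MONTHLY_COST, utilization)
    print(f"{utilization:.0%} busy: idle spend {idle:,.0f}, "
          f"effective cost of used capacity {effective:,.0f}")
```

Under a flat reservation the bill is the same in both cases. What changes is how much of it bought headroom nobody used.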

325
00:14:43,800 --> 00:14:57,800
And then leadership asks the obvious question: why are we paying for an empty parking lot? Now the chargeback story. Fabric chargeback reporting usually allocates cost by workspace, sometimes by item, and that sounds fair until you realize workspaces are social containers, not economic units.

326
00:14:57,800 --> 00:15:05,800
A workspace doesn't map cleanly to a product, a domain, a value stream or a business outcome. It maps to who had access and clicked create.

327
00:15:05,800 --> 00:15:13,800
So when finance wants to know why the bill doubled, engineering hands them a report that says, "This workspace used 38% of the capacity."

328
00:15:13,800 --> 00:15:21,800
Okay, what does that mean? It means the workspace participated in shared contention. It doesn't mean the workspace caused it. It doesn't mean the workspace benefited from it.

329
00:15:21,800 --> 00:15:28,800
And it doesn't mean the people responsible for the workload will feel pressure to change behavior because they can always move. They can split workloads into a new workspace.

330
00:15:28,800 --> 00:15:38,800
They can spread artifacts across multiple containers. They can create temporary workspaces. They can place the heavy stuff in a shared platform workspace and keep their consumer reports somewhere else.

331
00:15:38,800 --> 00:15:43,800
So workspace-based allocation becomes political, not corrective. The number is precise. The ownership is ambiguous.

332
00:15:43,800 --> 00:15:50,800
CU seconds are the purest form of this problem. CU seconds feel like truth. They're a real unit. They're granular. They look like billing-grade instrumentation.

333
00:15:50,800 --> 00:16:09,800
But CU seconds don't contain intent. They don't contain accountability. They contain consumption, and consumption in Fabric is often a symptom of prior architectural drift: too many semantic models, too many refreshes, too much duplication, too much background work, too many engines running the same logic because nobody enforced reuse.

334
00:16:09,800 --> 00:16:13,800
So CU seconds become an input to conflict, not to design.

335
00:16:13,800 --> 00:16:27,800
The counterintuitive part is what gets optimized. Once you give teams a utilization chart, they will chase utilization. They'll reschedule refreshes to avoid peaks without reducing the number of refreshes. They'll tune one expensive query while leaving three duplicate models untouched.

336
00:16:27,800 --> 00:16:36,800
They'll optimize a notebook run while continuing to mirror data into three lakehouses because "our team needs control." They will do local optimizations that preserve the global problem.

337
00:16:36,800 --> 00:16:48,800
Because the metric encourages tactical behavior, not structural change. And the structural change you need is boring: fewer artifacts, fewer copies, fewer execution paths, stricter defaults, enforced ownership, and real lifecycle control.

338
00:16:48,800 --> 00:16:59,800
But your metrics app doesn't reward that. It rewards "we reduced peak utilization by 10%." Meanwhile, the estate keeps growing. Here's the foundational misunderstanding behind the metrics trap.

339
00:16:59,800 --> 00:17:10,800
A capacity is not an accountability boundary. It is a shared execution pool. So any tooling that starts with capacity utilization is already one level too low. It's measuring engine pressure, not organizational responsibility.

340
00:17:10,800 --> 00:17:18,800
What this actually means is, don't ignore the metrics app. Use it as a smoke detector, but stop treating it as a governance model because governance is not knowing that something burned.

341
00:17:18,800 --> 00:17:26,800
Governance is preventing ignition pathways in the first place, and the first ignition pathway in Fabric is where everyone pretends boundaries exist: the workspace.

342
00:17:26,800 --> 00:17:35,800
Workspace sprawl: the default entropy generator. Workspaces are where Fabric entropy becomes physical. Not because workspaces are evil, but because workspaces are easy.

343
00:17:35,800 --> 00:17:47,800
Creation is a two-click decision with near-zero immediate consequence. No architectural review, no cost signal, no lifecycle gate, no mandatory ownership model that bites you later. Just name, capacity, done.

344
00:17:47,800 --> 00:17:54,800
And the system law is simple. If it can be created without friction, it will be created without thought. So, workspaces proliferate.

345
00:17:54,800 --> 00:18:04,800
They proliferate for legitimate reasons at first. A new project, a proof of concept, a merger, a new team, a vendor engagement, a temporary migration workspace.

346
00:18:04,800 --> 00:18:13,800
Then they proliferate for the reasons nobody admits: politics, speed, and avoiding governance. Because the easiest way to bypass a messy conversation about ownership is to create a new container.

347
00:18:13,800 --> 00:18:17,800
This is why workspace sprawl isn't an accident. It's a platform incentive.

348
00:18:17,800 --> 00:18:35,800
Now the first rot is that dev, test, and prod collapse into whatever workspace exists. Every organization claims they separate environments. Most don't. They separate names. They'll have Sales Dev and Sales Prod and Sales Final and Sales Prod 2 because somebody needed to ship while access was being negotiated. Then the refresh schedules move.

349
00:18:35,800 --> 00:18:46,800
The pipelines get copied. The same semantic model definition shows up in three places with three different data sources because the team didn't want to break prod. And now you don't have environments. You have parallel universes.

350
00:18:46,800 --> 00:18:54,800
And when an executive asks which one is the source of truth, the answer becomes a sentence that starts with "well".

351
00:18:54,800 --> 00:19:04,800
Second rot: shared capacity makes workspace boundaries performative. You can pretend a workspace is isolated, but if it runs on the same capacity, it shares the scheduler, the smoothing model, and the pain.

352
00:19:04,800 --> 00:19:17,800
So the so-called dev workspace runs a notebook at 9:15 a.m. The prod workspace has a refresh at 9:20 a.m., and a critical report renders at 9:22 a.m. The capacity doesn't care what you wrote in the workspace name. It just sees concurrent pressure.
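One way to make that invisible coupling visible is to check schedules per capacity rather than per workspace. The sketch below is a plain-Python illustration, not a Fabric API: the `Job` class, the capacity name, the job durations, and the placement of the report in the prod workspace are all assumptions added for the example. Only the three start times come from the scenario above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import combinations


@dataclass(frozen=True)
class Job:
    workspace: str
    name: str
    capacity: str          # hypothetical capacity identifier
    start: datetime
    duration: timedelta    # assumed; the scenario only gives start times

    @property
    def end(self) -> datetime:
        return self.start + self.duration


def find_collisions(jobs):
    """Pairs of jobs from different workspaces that overlap on one capacity."""
    collisions = []
    for a, b in combinations(jobs, 2):
        same_pool = a.capacity == b.capacity
        different_workspace = a.workspace != b.workspace
        overlapping = a.start < b.end and b.start < a.end
        if same_pool and different_workspace and overlapping:
            collisions.append((a.name, b.name))
    return collisions


day = datetime(2024, 1, 8)  # arbitrary date for the example
jobs = [
    Job("dev", "notebook", "shared-capacity",
        day.replace(hour=9, minute=15), timedelta(minutes=10)),
    Job("prod", "refresh", "shared-capacity",
        day.replace(hour=9, minute=20), timedelta(minutes=5)),
    Job("prod", "report render", "shared-capacity",
        day.replace(hour=9, minute=22), timedelta(minutes=1)),
]

collisions = find_collisions(jobs)
print(collisions)
```

With these assumed durations, the dev notebook overlaps both prod workloads even though the workspace names suggest separation.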

353
00:19:17,800 --> 00:19:27,800
So sprawl doesn't just create more containers. It creates more chances for accidental collisions, more overlapping schedules, more independent teams tuning their own workloads without seeing the global impact.

354
00:19:27,800 --> 00:19:41,800
And because the coupling is invisible, everyone blames everyone else. Third rot: permission drift. Workspace permissions don't degrade because people are malicious. They degrade because work happens. A developer needs access "just for today."

355
00:19:41,800 --> 00:19:48,800
A contractor needs access "just for the sprint." An analyst needs Contributor because Viewer doesn't let them do the one thing they want.

356
00:19:48,800 --> 00:20:02,800
A manager becomes admin because someone has to approve a share, and you know where this goes. Over time, owners multiply, contributors proliferate, and the permission model becomes a social negotiation system. It stops being access control. It becomes access accretion.

357
00:20:02,800 --> 00:20:17,800
And in Fabric, access accretion has a unique problem. Permissions aren't just about viewing. They're about creation. If someone can create, they can create debt. They can create semantic models. They can schedule refreshes. They can publish reports that become business critical because someone bookmarked them.

358
00:20:17,800 --> 00:20:28,800
They can duplicate data into their own lakehouse "for performance." So permission drift isn't just a security risk. It's a spend multiplier. And it grows without anyone noticing because each permission change looks harmless in isolation.

359
00:20:28,800 --> 00:20:43,800
Fourth rot: workspace ownership becomes a fiction. Most governance models say every workspace needs an owner. Great. But "owner" in practice means the person who created it, the person who inherited it, or the person who didn't dodge the meeting invite. That is not accountable ownership.

360
00:20:43,800 --> 00:20:56,800
Accountable ownership means someone is responsible for lifecycle, cost behavior, access pathways, and retirement. But retirement is where the entire model collapses, because nobody gets promoted for shutting things down. So sprawl accumulates dead assets.

361
00:20:56,800 --> 00:21:08,800
Orphaned workspaces, forgotten models, pipelines that still run because turning them off might break something, and reports nobody uses but nobody dares delete. This is where the default Fabric estate becomes a graveyard of still-billing artifacts.

362
00:21:08,800 --> 00:21:15,800
And you can't govern a graveyard with naming standards. What this actually means is the workspace is not just a container. It is the default entropy generator.

363
00:21:15,800 --> 00:21:24,800
Because it's the point where creation is cheap, permission drift begins, lifecycle becomes optional, and shared capacity turns separation into a story you tell auditors.

364
00:21:24,800 --> 00:21:32,800
Next, Fabric introduces domains and subdomains, and everyone expects that to fix the sprawl. It won't. Domains and subdomains: taxonomy without control.

365
00:21:32,800 --> 00:21:36,800
Domains show up right after workspace sprawl like a cleanup crew arriving with clipboards.

366
00:21:36,800 --> 00:21:50,800
And executives love them because domains look like organizational alignment. Finance domain, sales domain, operations domain. Clean labels, clean hierarchy, clean screenshots for a steering committee deck. But domains and subdomains, as implemented, are taxonomy.

367
00:21:50,800 --> 00:22:05,800
Not control. That distinction matters because taxonomy is a story you tell humans. Control is behavior you enforce in systems. Fabric will happily let you build a beautifully organized catalog while the runtime continues to behave like a shared decision engine with no respect for your labels.

368
00:22:05,800 --> 00:22:18,800
Here's the foundational mistake: leaders assume domains behave like boundaries. They don't. Domains group content. They help discovery. They help catalog navigation. They help you reduce the "where is the thing" problem, and that's real value.

369
00:22:18,800 --> 00:22:29,800
But it's not isolation. It's not policy distribution. It's not a security perimeter. It's not compute segmentation. It does not stop a refresh from colliding with a notebook in another domain, because both can still sit on the same capacity.

370
00:22:29,800 --> 00:22:39,800
So when someone says, "We aligned Fabric to the business using domains," what they usually mean is, "We built a directory structure." That is not architecture. It's filing.

371
00:22:39,800 --> 00:22:58,800
Now the next failure mode is subtler: the belief that business alignment equals enforcement. A domain can be named Finance, but that doesn't mean finance owns the metric definitions. It doesn't mean finance controls the semantic layer. It doesn't mean finance controls the OneLake shortcuts pointing into their data. It doesn't mean finance decides what gets mirrored, duplicated, or certified.

372
00:22:58,800 --> 00:23:18,800
A domain label does not grant authority, and it definitely does not prevent the most common cross-domain behavior in Fabric: data appropriation. A team in sales finds a finance table, needs it for a dashboard, and creates their own semantic model because it's faster than asking finance for a change. They might even copy the data for performance or for stability.

373
00:23:18,800 --> 00:23:26,800
Now you have multiple definitions of the same numbers spread across domains, all labeled neatly. The catalog looks aligned. The truth is fractured.

374
00:23:26,800 --> 00:23:53,800
Subdomains make this worse when they're used as a substitute for ownership. People add subdomains like they add folders: North America, EMEA, product, customer, reporting, analytics. It feels like progress. It creates the appearance of a mature information architecture. But labels don't stop duplication. Labels don't stop drift. Labels don't stop cost. They don't stop the same dataset from being recreated five times because each team insists they need their own version.

375
00:23:53,800 --> 00:24:12,800
And because Fabric is optimized for self-service, the system does not naturally converge. It diverges. Over time, subdomains become a map of organizational politics rather than a map of data products. You'll see the same content duplicated across subdomains because nobody wants to be dependent on anyone else and nobody wants to accept a shared support model.

376
00:24:12,800 --> 00:24:39,800
The executive dashboard view is the ultimate deception here. Domains and subdomains generate top-level structure that makes leadership feel like governance exists. You can show charts: number of items per domain, activity per domain, top consumers per domain. You can create the illusion of accountability. But the rot underneath continues: identical metrics, inconsistent definitions, unclear owners, unmanaged refresh schedules, and shared capacity contention that doesn't care about your taxonomy.

377
00:24:39,800 --> 00:25:08,800
What domains should have been is obvious: policy distribution points. In a deterministic governance model, a domain isn't a label. It's a control boundary where defaults and constraints propagate. Creation rules: who can create which artifact types inside the domain. Ownership rules: every artifact must have an accountable owner tied to a real team. Sensitivity rules: baseline labeling and access pathways required for the domain. Lifecycle rules: promotion gates and retirement expectations enforced by automation.

378
00:25:08,800 --> 00:25:37,800
Cost rules: budgets and thresholds tied to domain behavior, not just workspace activity. In other words, domains should have acted like an authorization compiler target: if you're building in this domain, these constraints apply. But without enforcement, you get the comfortable outcome: neat categorization with no change in runtime behavior. So domains are not useless. They are just insufficient. They can help you discover and communicate structure. They can reduce catalog chaos. They can support executive reporting. But they cannot prevent entropy. Not when creation is still cheap.
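The idea of a domain as a place where constraints apply at creation time can be sketched as a simple check. This is a minimal illustration under stated assumptions: `DomainPolicy`, `ArtifactRequest`, and `violations` are hypothetical names invented here, the artifact type strings are placeholders, and nothing below calls a real Fabric API. The episode's point is that this kind of enforcement is what domains do not do by default.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DomainPolicy:
    """Constraints that would apply to anything created in a domain (hypothetical)."""
    allowed_types: frozenset
    require_owner_team: bool = True
    require_end_date: bool = True


@dataclass(frozen=True)
class ArtifactRequest:
    """A request to create an artifact inside a domain (hypothetical)."""
    name: str
    artifact_type: str
    owner_team: Optional[str] = None
    end_date: Optional[date] = None


def violations(policy: DomainPolicy, request: ArtifactRequest) -> list:
    """Return the reasons a request breaks the domain policy, if any."""
    problems = []
    if request.artifact_type not in policy.allowed_types:
        problems.append(f"type '{request.artifact_type}' is not allowed in this domain")
    if policy.require_owner_team and not request.owner_team:
        problems.append("missing accountable owner team")
    if policy.require_end_date and request.end_date is None:
        problems.append("missing end date or retirement expectation")
    return problems


finance_policy = DomainPolicy(allowed_types=frozenset({"semantic_model", "report"}))

sandbox_copy = ArtifactRequest(name="my_copy", artifact_type="lakehouse")
governed_model = ArtifactRequest(
    name="revenue_kpis",
    artifact_type="semantic_model",
    owner_team="finance-analytics",
    end_date=date(2026, 12, 31),
)

print(violations(finance_policy, sandbox_copy))
print(violations(finance_policy, governed_model))
```

The design choice worth noting is that the check returns reasons rather than a boolean, so a rejected request tells the creator exactly which rule applied.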

379
00:25:37,800 --> 00:26:06,800
Not when ownership is still optional and capacities remain shared pools where contention ignores your org chart. And this is why the next expectation is so predictable. Once domains fail to create order, everyone looks to OneLake for the magic. They're about to be disappointed. OneLake: the "OneDrive for data" metaphor that misleads leaders. OneLake gets pitched with a metaphor that leaders instantly understand: it's OneDrive for data. That metaphor is convenient. It is also misleading, because OneDrive works like a personal file cabinet with sharing. It implies ownership is not a problem.

380
00:26:06,800 --> 00:26:35,800
It implies ownership. It implies a default boundary: my stuff, and then stuff I shared. It implies a primary access path and a recognizable permission model. And most importantly, it implies that centralization equals control. OneLake does not behave like that. OneLake is a storage plane that multiple engines can read and write through multiple pathways, with multiple copies and references, all while your organization tells itself a story about a single source of truth. What this actually means is OneLake centralizes placement, not responsibility. And centralizing placement without enforced responsibility just

381
00:26:35,800 --> 00:27:03,800
creates a more efficient way to accumulate junk. Here's what most people miss: OneLake is not a single-copy guarantee. It is a single namespace with multiple ways to materialize data. You can land data. You can shortcut it. You can mirror it. You can write it through pipelines. You can create it through notebooks. You can ingest streaming and persist it. And you can do all of that across workspaces, domains, and engines. So the executive hears "OneLake" and assumes one version. The architect hears "OneLake" and should assume one battlefield, because a shared storage plane does not eliminate fragmentation. It just

382
00:27:03,800 --> 00:27:19,800
moves fragmentation upstream, into how data gets into that plane and how many teams decide they need their own representation of it. That distinction matters. Now add shortcuts. Shortcuts are genuinely useful. They can reduce copying when teams accept shared ownership and shared lifecycle.

383
00:27:19,800 --> 00:27:48,800
But shortcuts also expose the cultural truth: most teams don't want shared ownership. They want independent delivery with no dependency risk. So what happens? Teams avoid shortcuts when shortcuts require coordination. They use shortcuts when shortcuts let them bypass a conversation. And when shortcuts aren't used, the fallback path becomes predictable: copy it into my lakehouse. So OneLake becomes a centralization layer that still hosts multiple versions of the same thing, because the platform can't enforce the social contract required to keep one copy true.

384
00:27:48,800 --> 00:28:04,800
Now add mirroring. Mirroring is another adoption accelerant. It gets operational data into OneLake fast, with minimal engineering friction. Great. But it also changes data gravity. Once mirroring exists, everything wants to land in OneLake, because OneLake becomes the easiest place to build analytics.

385
00:28:04,800 --> 00:28:24,800
And that gravitational pull is not neutral. It drags incomplete governance along with it. You end up with mirrored operational data that looks official because it's centralized, but still lacks the things governance needs to stay deterministic: explicit ownership, clear consumption contracts, controlled exposure pathways, and lifecycle rules about what gets created downstream.

386
00:28:24,800 --> 00:28:33,800
So leaders say, "We mirrored the data. Now the analytics team can self-serve." And the platform hears, "We will multiply semantic models and refresh schedules until the capacity throttles."

387
00:28:33,800 --> 00:28:50,800
This is where sensitivity labels show up in the story as the comforting solution. Yes, labels help. They are necessary. But they don't solve access pathways. A label can follow a file. It can follow a table. It can inform downstream tools. It can trigger DLP policies and reduce obvious leakage. But it can't stop the two most common OneLake failures.

388
00:28:50,800 --> 00:29:10,800
First, the shared junk drawer. If OneLake is the place where everyone lands data "for later," then OneLake becomes a landfill of half-modeled tables, ad hoc files, experimental notebooks, and multiple copies of datasets with slightly different names. Centralization doesn't prevent junk. It just makes junk easier to find. Second, governance lag.

389
00:29:10,800 --> 00:29:30,800
Data lands faster than governance can assign ownership, certify definitions, and enforce lifecycle. That means the estate becomes time-based: the newest thing is the thing people use, not the governed thing. And once people build reports on the newest thing, that thing becomes critical, and now you are retroactively trying to govern a dependency chain you didn't approve. Fabric is not unique here.

390
00:29:30,800 --> 00:29:59,800
This is what happens in every self-service environment when the storage plane is easy and the control plane is optional. And there's a final problem with the OneDrive metaphor. It frames OneLake as a collaboration convenience. But you are not collaborating on files. You are operating an authorization graph. Every shortcut, every mirror, every table, every semantic model, every refresh schedule, every shared item is a new edge in that graph. And edges accumulate. They do not self-heal. They do not retire themselves. They become permanent until you enforce deletion. So OneLake becomes the shared center of gravity for the estate,

391
00:29:59,800 --> 00:30:27,800
and gravity always wins. If you don't enforce constraints on what can land, how it can be reused, and who is accountable for it, OneLake doesn't simplify governance. It concentrates your failure into a single place. Next, the duplication pathways: shortcuts, copies, mirroring, and zero ETL, where the cost and trust rot quietly compounds. Duplication pathways: shortcuts, copies, mirroring, and the illusion of zero ETL. Duplication is the silent killer in Fabric because it looks like progress.

392
00:30:27,800 --> 00:30:43,800
A team lands data once. Then another team reuses it by copying it into their own lakehouse. Then someone mirrors the source system "so we're not blocked." Then a third team builds a semantic model directly over their copy because the original one is too slow or owned by someone else.

393
00:30:43,800 --> 00:30:49,800
And suddenly your unified platform contains three versions of the same truth. Not maliciously. Predictably.

394
00:30:49,800 --> 00:31:03,800
Here's what most people miss: Fabric didn't create duplication. It removed the friction that used to limit duplication. It made the easy path faster than the correct path. Start with shortcuts. Shortcuts are the closest thing Fabric has to an anti-entropy mechanism.

395
00:31:03,800 --> 00:31:18,800
They let you reference data without copying it. In a sane world, shortcuts would become the default: one copy, many consumers, shared ownership, shared lifecycle, shared truth. But shortcuts come with a requirement that most organizations refuse to meet: trust. If Team A shortcuts Team B's data,

396
00:31:18,800 --> 00:31:34,800
Team A has accepted dependency. That means Team A is now exposed to Team B's schema changes, refresh timing, access decisions, retention rules, and incident response quality. And Team A doesn't want that. So shortcuts get used in the narrow cases where dependency risk is low or where a central platform team forces it.

397
00:31:34,800 --> 00:31:44,800
Everywhere else, teams choose the path that guarantees local control: copy. And copying is where both cost and governance rot compound, because once you copy data, you also copy responsibility.

398
00:31:44,800 --> 00:32:00,800
Except nobody agrees to own the copied version. They just use it. Now mirroring. Mirroring sells itself as zero ETL, and leaders hear: less engineering, faster analytics, fewer pipelines. Again, convenient. Again, misleading.

399
00:32:00,800 --> 00:32:12,800
Mirroring is not no work. It is work you no longer control. It creates a new ingestion pathway that runs continuously, and it changes who gets blamed when data looks wrong. The source team says, "That's what the system produced."

400
00:32:12,800 --> 00:32:25,800
The analytics team says, "That's the data we got." And now governance has to deal with the third party: the platform itself, executing sync logic on your behalf. Mirroring also multiplies downstream consumption because it makes data feel official.

401
00:32:25,800 --> 00:32:37,800
Once data is mirrored into OneLake, every team treats it as available, and available turns into allowed. Inside self-service cultures, people don't wait for curated products. They build now and argue later.

402
00:32:37,800 --> 00:32:55,800
So mirroring accelerates ingestion, but it also accelerates duplication: multiple lakehouses layering transformations over the same mirrored base, multiple semantic models over similar structures, multiple refresh schedules fighting for the same capacity pool. You didn't remove ETL. You distributed it.

403
00:32:55,800 --> 00:33:24,800
Now the most common phrase in Fabric estates: "Just load it to my lakehouse." That sentence is not a technical decision. It's an organizational confession. It means the organization lacks a trusted shared layer, lacks a product ownership model, and lacks enforced consumption pathways. So individuals create private certainty. And private certainty is expensive, because the lakehouse copy is not just storage. It is compute: ingestion, transformation, refresh, indexing, caching behaviors, background operations, and repeated query work across multiple engines. Every copy becomes its own

404
00:33:24,800 --> 00:33:45,800
gravitational center. It attracts its own semantic model, its own dashboards, its own "this is critical" stories. Then you get the next layer of entropy: multiple semantic models over similar data. This is where trust dies. The business doesn't care that the data came from the same source system. They care that the KPI is consistent. But when five teams build five semantic models, you get five definitions.

405
00:33:45,800 --> 00:34:02,800
Even if they all start identical, they won't stay identical, because drift is the default state of unmanaged artifacts. And here's the uncomfortable truth: Direct Lake speed can make this worse, because it reduces the pain that used to force rationalization. When everything is fast enough, nobody consolidates. They proliferate.

406
00:34:02,800 --> 00:34:23,800
So why does the system incentivize this? Because the platform rewards speed now and externalizes debt later. Fabric makes it easy to create artifacts, easy to duplicate, easy to mirror, easy to build semantic models quickly, easy to refresh, easy to publish. It is an adoption engine. But adoption engines create debt unless you enforce constraints. So the illusion of zero ETL becomes zero accountability.

407
00:34:23,800 --> 00:34:46,800
The pathways multiply, the copies accumulate, and the bill grows while everyone insists usage is stable. Next, once the data exists in multiple forms, the engines start disagreeing: Lakehouse versus Warehouse versus Eventhouse versus Power BI, all sharing one bill and one capacity scheduler. Multi-engine reality: Lakehouse versus Warehouse versus Eventhouse versus Power BI. Fabric pretends you bought one experience.

408
00:34:46,800 --> 00:35:10,800
What you actually bought is a portfolio of engines with different rules, different optimizers, and different cost behaviors. Then you staple them to the same capacity bill. That is not simplification. That is consolidation of blast radius. Lakehouse is Spark-first and file-first. Warehouse is SQL-first with its own execution model. Eventhouse is KQL-first and time-series optimized. Power BI is semantic-first and interaction-driven. Each engine exists because it solves a different class of problem.

409
00:35:10,800 --> 00:35:27,800
But Fabric markets the boundary as "just choose the item." And that's where the rot begins, because teams treat engine choice like UI preference. I like SQL, so Warehouse. I like notebooks, so Lakehouse. We have streaming, so Eventhouse. We need dashboards, so Power BI. That is not architecture.

410
00:35:27,800 --> 00:35:39,800
That is tool comfort disguised as strategy. The system consequence is predictable: the same data lands in multiple engines, in multiple representations, with multiple refresh patterns, each consuming the same capacity pool in different ways.

411
00:35:39,800 --> 00:36:01,800
And then you wonder why the bill is confusing. Here's the uncomfortable truth: some query patterns are cheap in one engine and expensive in another. A wide scan over Delta tables might be tolerable in Spark when you batch it, but expensive when it's driven by interactive slicing. A star-schema-style query might feel natural in a warehouse, but the moment you bolt on multiple transformations upstream and keep rematerializing, the cost moves.

412
00:36:01,800 --> 00:36:22,800
Eventhouse might make ingestion and time-window queries feel effortless, but if you start using it as a general-purpose store because it feels fast, you'll pay for that illusion later in retention, hot cache behavior, and downstream duplication. And Power BI is its own category of cost behavior. It doesn't just run queries. It creates an interaction surface that triggers compute at the speed of curiosity.

413
00:36:22,800 --> 00:36:46,800
If the model is clean, this is fine. If the model is proliferating, it becomes death by a thousand clicks. Now add the most common multi-engine failure mode: same data, multiple representations. Teams will mirror a database for freshness. Then they'll land a copy into a lakehouse for transformation. Then they'll build a warehouse for SQL users. Then they'll build a semantic model for the exec dashboard.

414
00:36:46,800 --> 00:37:10,800
Four engines, one dataset, four cost signatures. And the platform doesn't stop you, because nothing in Fabric's default posture says, "This is the canonical representation. Everything else is forbidden." So the estate converges on redundancy. That redundancy creates confusion in performance tuning too. One engine hides another engine's constraints. People see a slow report and start tuning DAX measures when the real bottleneck is upstream contention on the capacity.

415
00:37:10,800 --> 00:37:31,800
Or they see a slow query and start indexing in the warehouse while the real issue is that the same data got duplicated and refreshed three times before the query even ran. The engines aren't cooperating. They're competing. And because you are paying at the capacity layer, you don't get a clean bill per engine. You get a shared pool where workload types collide: interactive and background, streaming and batch, notebook and refresh,

416
00:37:31,800 --> 00:37:41,800
SQL query and model render. This is where the "one experience" story does the most damage. It hides that you're operating multiple runtimes that require different governance models.

417
00:37:41,800 --> 00:37:51,800
lakehouse needs guardrails on notebooks jobs scheduling and data product boundaries warehouse needs guardrails on schema ownership query patterns and consumer concurrency

418
00:37:51,800 --> 00:38:09,800
Eventhouse needs guardrails on ingestion patterns retention and downstream export pathways Power BI needs guardrails on semantic model ownership refresh cadence and metric definition control if you apply one generic governance template to all of them naming a COE a request form you will get the same outcome conditional chaos

419
00:38:09,800 --> 00:38:19,800
because the weakest control plane governs the portfolio and the weakest control plane in most orgs is the semantic layer the part that turns data into meaning and then exposes that meaning to everyone

420
00:38:19,800 --> 00:38:28,800
that's where the next failure mode lives semantic model entropy where self service becomes self harm semantic model entropy where self service becomes self harm

421
00:38:28,800 --> 00:38:37,800
this is where fabric's governance illusion turns into executive grade damage because the semantic model is the product not the lakehouse not the warehouse not the pipeline those are plumbing

422
00:38:37,800 --> 00:38:46,800
the semantic layer is where raw data becomes a number someone will bet a budget on and fabric makes that layer easy to create so it gets created a lot

423
00:38:46,800 --> 00:38:58,800
self service sounds empowering but in a shared capacity environment unmanaged self service is just self harm with a Microsoft logo on it the foundational mistake is treating semantic models as disposable artifacts they aren't

424
00:38:58,800 --> 00:39:13,800
a semantic model is an agreement definitions relationships measures filters security rules refresh behavior and performance assumptions when you allow everyone to mint their own agreements you don't get agility you get a combinatorial explosion of meaning and meaning doesn't reconcile itself

425
00:39:13,800 --> 00:39:29,800
the first entropy pattern is data set proliferation someone needs a report quickly they create a model they publish it it works then someone else needs the same report but with one extra column and instead of extending the shared model they copy it or they connect directly to the lakehouse and create a new model just for this use case

426
00:39:29,800 --> 00:39:42,800
or they build a thin model on top of a thick model because the UI makes that feel like progress fabric doesn't block that behavior it accelerates it so you end up with dozens then hundreds of semantic models that differ by accident not by design

427
00:39:42,800 --> 00:39:57,800
they differ in measures they differ in filter logic they differ in time intelligence they differ in which tables got included they differ in relationship direction they differ in who has access and then you get the executive moment two dashboards same KPI name different number

428
00:39:57,800 --> 00:40:09,800
that's not a data problem that's a semantic governance failure now add metric drift a KPI isn't a number it's a definition under pressure revenue becomes revenue excluding refunds becomes revenue excluding refunds but including credits

429
00:40:09,800 --> 00:40:24,800
but only for shipped orders every team thinks their version is reasonable every version might even be defensible but the business needs one version if you don't enforce a canonical semantic layer you create an environment where every team ships their own truth

430
00:40:24,800 --> 00:40:37,800
and then executives stop trusting any of it not because the math is wrong because the meaning is unstable here's the part nobody likes hearing promoted and certified are not governance they are labels they are signals they are not constraints

431
00:40:37,800 --> 00:40:51,800
a data set can be certified and still be duplicated a model can be promoted and still be bypassed a certified model can be too slow too rigid or too politically owned so teams build around it certification doesn't stop sprawl it documents sprawl
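The "two dashboards same KPI name different number" moment described above can be detected before an executive finds it. What follows is a minimal sketch, not a Fabric feature: it assumes you already have some export of measure definitions per semantic model, and the `inventory` dictionary, the model names, and the function `find_metric_drift` are all hypothetical.

```python
from collections import defaultdict


def find_metric_drift(models):
    """Report KPI names that have more than one definition across models.

    `models` maps a semantic model name to {measure_name: expression}.
    This shape is an assumption made for illustration only.
    """
    definitions = defaultdict(lambda: defaultdict(list))
    for model_name, measures in models.items():
        for measure_name, expression in measures.items():
            # Normalize case and whitespace so cosmetic differences
            # are not counted as drift.
            normalized = " ".join(expression.lower().split())
            definitions[measure_name.lower()][normalized].append(model_name)

    # Keep only the names whose definitions disagree.
    return {
        name: {expr: sorted(found_in) for expr, found_in in variants.items()}
        for name, variants in definitions.items()
        if len(variants) > 1
    }


# Hypothetical inventory of three models that all publish "Revenue".
inventory = {
    "sales_exec": {"Revenue": "SUM(Sales[Amount])"},
    "sales_ops": {"Revenue": "SUM(Sales[Amount]) - SUM(Sales[Refunds])"},
    "finance": {"Revenue": "sum(Sales[Amount])", "Margin": "SUM(Sales[Margin])"},
}

drift = find_metric_drift(inventory)
for kpi, variants in drift.items():
    print(kpi, "has", len(variants), "definitions")
```

A report like this only describes the drift. In the episode's terms it becomes governance only when a conflicting definition blocks promotion or certification.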

432
00:40:51,800 --> 00:40:58,800
now we talk about direct lake because it's the perfect example of comfort creating rot direct lake is fast it's also seductive

433
00:40:58,800 --> 00:41:09,800
it removes some of the friction that used to force architects to think do we really need another data set do we really want to refresh this again do we really want to import this data into yet another model

434
00:41:09,800 --> 00:41:19,800
when the model feels instant people stop thinking about its life cycle and when people stop thinking about life cycle artifacts never die so direct lake can become a multiplier of semantic model entropy

435
00:41:19,800 --> 00:41:39,800
because the performance is good enough that nobody feels pain early you don't feel the cost of proliferation until the portfolio is large the capacity is contested and the refresh schedules stack then you feel it all at once and because the pain shows up as capacity issues the response is usually scale up not delete models that's how semantic entropy becomes cost entropy now add security and access behavior

436
00:41:39,800 --> 00:41:56,800
row level security object level security sensitivity labels and sharing policies all interact with semantic models when models proliferate you proliferate security policy surfaces too each model becomes a separate authorization decision graph each one must be reviewed tested and maintained nobody does that at scale

437
00:41:56,800 --> 00:42:08,800
so security becomes probabilistic we think it's restricted we think the right people have access we think the model uses the governed table you are no longer operating a deterministic security model you're operating conditional chaos

438
00:42:08,800 --> 00:42:24,800
and the final insult is that this all looks like success on paper look at the number of reports delivered look at the adoption look at the self service until the CFO asks why the number changed until the CISO asks who can see what until the platform team gets paged because a refresh storm throttled the capacity

439
00:42:24,800 --> 00:42:37,800
semantic model entropy is where fabric stops being a data platform and becomes a trust destruction machine next once meaning drifts forecasting and smart analytics become performance theater because you can't predict the future with a data set you can't even define

440
00:42:37,800 --> 00:42:53,800
forecasting and smart analytics garbage in executive out forecasting is where leadership finally admits what they actually wanted all along not dashboards but decisions they don't want to know what happened they want to know what will happen what to do about it and whether the numbers can be trusted enough to justify a bet

441
00:42:53,800 --> 00:43:11,800
and this is why forecasting is the most reliable lie detector in a fabric estate because forecasting assumes stability stable definitions stable history stable data completeness stable refresh behavior stable lineage stable ownership it assumes your analytics layer behaves like an engineered system most fabric

442
00:43:11,800 --> 00:43:27,800
estates behave like an ecosystem ecosystems don't forecast well start with the basic requirement seasonality and prediction only work when the past is coherent that means your history must be complete and your meaning must be consistent if the last two years of revenue are actually

443
00:43:27,800 --> 00:43:40,800
three different revenue definitions stitched together by data set churn your model will still forecast it will just forecast nonsense and the worst part is that it will forecast nonsense with confidence the chart will be smooth the line will look precise the confidence

444
00:43:40,800 --> 00:43:59,800
interval will look scientific the executive will nod and take the number into a budget meeting this is the moment where governance debt becomes executive debt because dashboards can be ignored forecasts get acted on now add missing data most organizations assume missing data is obvious it's not in real world fabric estates missing data gets introduced by drift

445
00:43:59,800 --> 00:44:16,800
and get silently rerouted a table stops refreshing a model switches from one lakehouse to another temporarily a semantic definition changes a filter logic gets modified a data source gets mirrored in a new way none of this raises a red flag in a meeting

446
00:44:16,800 --> 00:44:30,800
it destroys the integrity of time series and time series forecasting is brutally literal it will find patterns in whatever you feed it even if those patterns are artifacts of your governance failures the system doesn't know your pipeline broke it just sees a dip so it learns that it happens in Q4

447
00:44:30,800 --> 00:44:45,800
then next year Q4 arrives and the model predicts a dip and leadership underorders inventory understaffs support or cuts spend at the wrong time this is how analytics becomes a force multiplier for error the problem isn't that forecasting is bad it's that forecasting amplifies whatever discipline you already have

448
00:44:45,800 --> 00:45:09,800
if your history is stable it helps you if your history is chaotic it operationalizes your chaos now we get to the modern excuse smart analytics AI insights Copilot whatever branding is currently fashionable these tools don't fix governance gaps they build on top of them they still need stable semantics they still need lineage they still need a canonical metric layer they still need consistent access pathways they still need data quality

449
00:45:09,800 --> 00:45:11,800
the

450
00:45:11,800 --> 00:45:31,800
schema drift and partial refreshes before an executive sees a polished story otherwise you are automating the production of plausible misinformation and the reason this is so seductive is that it feels like progress you can take a messy model drop in a forecasting visual add a parameter and generate a future it looks like you leveled up from reporting to prediction but you didn't you just moved from describing

451
00:45:31,800 --> 00:45:50,800
to extrapolating now connect it to cost entropy because this is where the platform punishes you twice forecasting workloads are not free they often encourage larger data sets more refresh frequency more exploration more experimentation more what if analysis that means more semantic models more compute more background operations more interactive queries so the

452
00:45:50,800 --> 00:46:05,800
estate pays for the privilege of producing forecasts it can't trust and then leadership reacts in the only way leadership knows they lose trust they start questioning the numbers they stop funding the platform team they freeze capacity upgrades they turn every analytics initiative into a political negotiation

453
00:46:05,800 --> 00:46:19,800
this is the predictable end state executive distrust of data not because fabric can't do forecasting because you tried to do forecasting on top of semantic entropy duplication pathways workspace sprawl and governance theater garbage in executive out so

454
00:46:19,800 --> 00:46:39,800
when someone says we're going to add forecasting to prove value the answer is simple forecasting doesn't prove value it proves whether you have control and when you don't the organization reaches for the next comforting distraction polish the dashboards standardize the UI build a design system because if the charts look consistent maybe the

455
00:46:39,800 --> 00:46:54,800
meaning is consistent it isn't next why design systems fix the wrong layer first and how they become the final stage of analytics performance theater design systems as a distraction consistency of UI isn't consistency of meaning this is the

456
00:46:54,800 --> 00:47:06,800
stage of the program where the organization runs out of patience for invisible problems capacity feels unstable forecasts look suspicious executives start asking why the same KPI changes between meetings and instead of fixing the semantic layer and the

457
00:47:06,800 --> 00:47:21,800
control plane the organization does the most human thing possible it fixes what it can see it standardizes the dashboard so you get a design system initiative a theme file a component library a report template with approved fonts colors headers navigation buttons slicers a

458
00:47:21,800 --> 00:47:50,800
little last refresh stamp in the corner like a credibility sticker and yes design systems can be useful but only at the right layer in fabric estates design systems become a distraction because they offer the feeling of governance without any governance of meaning visual consistency is not semantic consistency a KPI card with the right color palette is still wrong if the measure definition drifted a clean layout does not fix broken lineage a standardized set of icons does not fix the fact that three teams each created their own revenue and all three are still running refresh schedules on the same shared capacity

459
00:47:50,800 --> 00:47:57,800
design is a presentation layer fabric rot happens in the decision layer that distinction matters

460
00:47:57,800 --> 00:48:16,800
here is the mechanism design systems reduce ambiguity of interpretation but they do not reduce ambiguity of source they help a user navigate they help a report look professional they reduce cognitive load they might even reduce support tickets like where do I click that's real value but they don't answer the two questions that are destroying trust where did this

461
00:48:16,800 --> 00:48:37,800
number come from and who owns it so a design system is often the organization optimizing for first impressions because first impressions are measurable stakeholders can approve a mockup leadership can see the before and after everyone can clap meanwhile the same data set proliferation continues under the surface because nothing in the design system changes what can be created where it can be created how it gets certified or whether it gets retired

462
00:48:37,800 --> 00:48:59,800
the weird part is that design systems can actually make the rot harder to detect when every report looks the same users assume the data behaves the same visual standardization creates implied standardization of meaning it's an unearned trust signal it makes inconsistent semantics feel consistent and once you do that you build the perfect executive trap a portfolio of dashboards that look unified while the underlying definitions quietly diverge

463
00:48:59,800 --> 00:49:28,800
now add the Power BI reality Power BI makes it trivial to make something look polished themes apply fast layouts replicate templates copy paste you can turn a messy report into a professional report in an afternoon but the back end lineage ownership metric definitions refresh behavior capacity contention still runs like a distributed system that nobody is governing deterministically so you get a familiar failure mode the org invests in design not in meaning and then leadership experiences a more expensive version of disappointment

464
00:49:28,800 --> 00:49:42,800
but it looks governed yes it looks governed it isn't this is also where teams confuse design systems with standardization standardization is an enforcement problem a theme file is not enforcement it's a convenience

465
00:49:42,800 --> 00:49:53,800
if someone can publish a report without using the template they will if someone can create a data set without registering ownership they will if someone can bypass the certified semantic model because it's easier they will

466
00:49:53,800 --> 00:50:22,800
and the platform will accept it so design systems often become the last chapter of analytics performance theater a lot of work that improves optics while the trust problem keeps compounding now there is a sane place for design systems in a governed estate they work when they sit downstream of enforced semantics when you have a canonical semantic model per domain when measures are controlled when lineage is visible and accurate when artifact creation is constrained by policy not etiquette when dev and prod are separated by enforced promotion not folder naming

467
00:50:22,800 --> 00:50:38,800
when cost behavior is monitored and acted on then design becomes the final 10% clarity and usability on top of a stable meaning layer but if you apply design before you have deterministic semantics you are just painting over cracks in a load bearing wall it might look better it will fail anyway

468
00:50:38,800 --> 00:50:57,800
and it fails in the most predictable way while everyone debates UX the compute bill keeps running because the platform doesn't charge you for ugly it charges you for behavior so if you want a simple rule use design systems to reduce user confusion not to reduce governance risk if you want to reduce governance risk you need constraints that change system behavior

469
00:50:57,800 --> 00:51:07,800
and that means the next topic capacity chaos where start small and scale quietly turns into permanent over provisioning capacity chaos when start small and scale turns into permanent over provisioning

470
00:51:07,800 --> 00:51:27,800
now we get to the move every fabric program makes eventually capacity escalation as a substitute for design it starts innocently someone enables the trial the org gets an F64 for free everything feels fast everyone gets addicted to no friction then the trial ends reality arrives and the first paid capacity gets purchased usually start small F2

471
00:51:27,800 --> 00:51:42,800
F4 maybe F8 and within weeks someone says the fatal sentence we need to scale up to stabilize not to create value to stabilize that's when capacity stops being a deliberate choice and becomes a reflex here's why it happens

472
00:51:42,800 --> 00:51:55,800
shared capacity makes performance failures look like platform limitations throttling shows up users complain refreshes miss their window reports spin somebody screenshots a capacity metrics chart with a scary spike then leadership asks for the fix

473
00:51:55,800 --> 00:52:20,800
and the only fix that doesn't require a governance conversation is more capacity scaling is easy it's a SKU drop down governance is not so the platform gets bigger and the estate stays undisciplined the weird part is that the bigger you go the less incentive you have to fix root causes extra headroom masks architectural failures duplicated semantic models overlapping refresh storms unnecessary pipelines background operations piled on top of a shared scheduler

474
00:52:20,800 --> 00:52:36,800
it all works again until it doesn't then you scale again that's how start small and scale becomes permanent over provisioning now let's talk about the lie people tell themselves at this stage we can always pause or scale down later technically yes operationally no

475
00:52:36,800 --> 00:52:47,800
pausing and scaling down require certainty they require you to know what workloads run when what breaks if you stop who will scream first and what business process depends on which artifact in a sprawl driven fabric

476
00:52:47,800 --> 00:53:03,800
estate you don't have that certainty you have folklore someone says don't pause it a refresh runs overnight someone else says we've got pipelines on a schedule someone else says an app might be using it nobody knows nobody wants to be the one who finds out so the capacity stays on forever

477
00:53:03,800 --> 00:53:15,800
and the bill becomes a fixed cost with a vague value story attached the next trap is scheduling everyone hears the same advice scale up during business hours scale down at night pause on weekends and yes you can automate this with

478
00:53:15,800 --> 00:53:35,800
Azure Automation Logic Apps or scripts hitting the fabric management APIs but schedules fail for one reason the workload isn't actually aligned to the clock pipelines don't only run in one window refreshes don't only run in one window notebooks get triggered by just testing mirroring continues streaming continues background operations

479
00:53:35,800 --> 00:54:02,800
continue and because teams keep adding artifacts without a life cycle gate you accumulate long tail activity that never stops so the schedule becomes an incident generator the org tries it once something breaks at 6:03 p.m. someone gets paged the capacity gets turned back on manually the schedule gets disabled then everyone says scheduling isn't realistic for us no your architecture isn't realistic for scheduling now add reserved instances because Microsoft will happily offer you discounts for committing to the mistake

480
00:54:02,800 --> 00:54:27,800
reservations make the spend look smarter we saved 40% but the reservation discount is not efficiency it's commitment and commitment encourages complacency once you reserve the pain of turning it off feels higher because now the cost is psychologically prepaid it becomes a sunk cost defense mechanism we're already paying for it so keep it running this is how you end up with predictable cost and unpredictable value a stable bill is not the same thing as a controlled

481
00:54:27,800 --> 00:54:42,800
estate and there's one more brutal truth fabric scaling decisions are often made by the loudest team not the most valuable workload the team with the most executive visibility the team with the most users the team that complains fastest those teams get the capacity increased to protect the business

482
00:54:42,800 --> 00:54:52,800
meanwhile the real driver might be an unmanaged dev workspace running a notebook loop or a duplicated semantic model refresh storm or a mirrored data set with downstream copies stacking background compute

483
00:54:52,800 --> 00:55:02,800
but that requires investigation scaling doesn't so capacity chaos becomes the default operating mode buy bigger hope the problem disappears repeat

484
00:55:02,800 --> 00:55:12,800
this is why in a rotting estate your FinOps program ends up optimizing the invoice rather than controlling behavior and the missing layer is obvious accountability tied to consumption

485
00:55:12,800 --> 00:55:24,800
not chargeback theater not utilization charts accountability that forces decisions retire consolidate enforce reuse block creation isolate workloads or pay knowingly

486
00:55:24,800 --> 00:55:34,800
next the org tries to solve that with charge back and it turns into fiction charge back is fiction allocation without ownership charge back is where fabric programs go to die politely

487
00:55:34,800 --> 00:55:46,800
because it sounds like accountability it produces numbers produces spreadsheets it produces meetings where finance finally feels included and it produces the illusion that you can fix architectural drift with allocation math you can't

488
00:55:46,800 --> 00:56:02,800
charge back only works when ownership exists first not someone has access not someone created it ownership as in a named team that can make binding decisions about what gets built what gets reused what gets retired and what gets funded most fabric estates don't have that

489
00:56:02,800 --> 00:56:19,800
so charge back becomes reporting without consequences here's the core problem workspace charge back turns a shared compute pool into a political boundary finance wants the bill broken down engineering wants the bill to stop being their problem so you allocate costs by workspace maybe by item and you declare victory then the fights start

490
00:56:19,800 --> 00:56:26,800
because a workspace isn't a product it's not a value stream it's not a contract it's a container that exists because someone needed a place to click new item

491
00:56:26,800 --> 00:56:43,800
so when you allocate spend by workspace you're not assigning cost to value you're assigning cost to convenience and people respond to incentives once teams know that cost will be attributed to their workspace they do what every rational actor does inside a weak control plane they move the workload they split it they rename it they spread it they

492
00:56:43,800 --> 00:56:55,800
tuck expensive things into shared workspaces they publish consumer facing reports in one place and run heavy refreshes somewhere else they create platform workspaces that magically absorb cost because nobody wants to argue with the platform team

493
00:56:55,800 --> 00:57:13,800
this isn't fraud it's organizational physics if the attribution model is easy to game it will be gamed and in fabric it is easy because the boundaries are social now add the next failure mode charge back creates conflict not change you show a business unit they spent 40% of capacity their first response

494
00:57:13,800 --> 00:57:25,800
isn't we should consolidate models their first response is we didn't authorize that then you spend three weeks debating which artifacts belong to which team which refreshes count as shared services and which workspaces should be exempt

495
00:57:25,800 --> 00:57:39,800
meanwhile the capacity bill continues the system doesn't stop billing while you argue so charge back becomes a quarterly ritual of resentment finance pushing for granularity engineering pushing back on semantics and business leaders treating the numbers like accusations

496
00:57:39,800 --> 00:57:57,800
and even when the number is correct it still doesn't force a decision because cost without control is just guilt the next problem is externalization in a shared pool it's always tempting to push cost outward this data set supports everyone this pipeline is enterprise this warehouse is shared this domain is foundational sometimes that's true

497
00:57:57,800 --> 00:58:14,800
but without a consumption contract shared becomes a dumping ground for unowned workloads the same way shared mailbox becomes the place where accountability goes to rot shared workspace becomes the place where compute goes to hide and because charge back tends to focus on who consumed not why it exists

498
00:58:14,800 --> 00:58:24,800
you end up punishing the consumers while the producers keep shipping new artifacts that's backward the producers control architecture consumers mostly inherit it

499
00:58:24,800 --> 00:58:39,800
now to the part leaders think will fix it show back versus charge back show back says we're not billing you we're just informing you charge back says we're billing you neither matters if the organization can't take action the only model that works is one where ownership is coupled to life cycle

500
00:58:39,800 --> 00:59:08,800
because spend is rarely the problem unretired assets are duplicated models are unbounded self service is refresh storms are background operations that nobody revisits are so a charge back model that doesn't force life cycle decisions is just accounting theater what charge back should have done is simple force explicit tradeoffs if team X wants to keep 10 semantic models fine but they should have to fund 10 semantic models if they want to mirror data and then copy it into three lakehouses fine but they should have to justify the cost

501
00:59:08,800 --> 00:59:31,800
and accept the support burden and accept an end date that requires ownership not allocation so the harsh truth is when you implement charge back in an estate without enforceable boundaries you don't create accountability you create survival strategies and those strategies increase entropy more workspaces more fragmentation more hidden coupling more exceptions more temporary artifacts that never die

502
00:59:31,800 --> 00:59:59,800
the number gets cleaner the architecture gets worse so if you're determined to do charge back the rule is brutal and non-negotiable never allocate cost to a container allocate cost to an owner with the power to delete that means a workload owner a data product owner domain owner with enforcement authority not a random workspace admin and if you can't identify that owner then the workload is ungoverned by definition at that point the only honest thing you can do is quarantine it because reporting spend on unowned assets doesn't reduce spend

503
00:59:59,800 --> 01:00:08,800
it just documents your inability to control your own platform which brings us to the actual fix the system needs guardrails that prevent entropy not dashboards that describe it
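The rule "never allocate cost to a container, allocate cost to an owner with the power to delete" can be sketched as a small roll-up. This is an illustration under stated assumptions, not a Fabric API: the usage records, the owner registry, the team names, and the function `allocate_by_owner` are hypothetical, and real consumption figures would have to come from whatever capacity reporting you already run.

```python
from collections import defaultdict


def allocate_by_owner(usage, owners):
    """Roll usage up to accountable owners; unowned artifacts are quarantined.

    `usage` is a list of (artifact_id, capacity_units) pairs.
    `owners` maps artifact_id to a team with authority to delete it.
    Both shapes are assumptions made for this sketch.
    """
    by_owner = defaultdict(float)
    quarantine = []
    for artifact_id, units in usage:
        owner = owners.get(artifact_id)
        if owner is None:
            # No accountable owner: do not smear the cost across a
            # workspace, flag the artifact for quarantine instead.
            quarantine.append(artifact_id)
        else:
            by_owner[owner] += units
    return dict(by_owner), sorted(set(quarantine))


# Hypothetical records: one notebook has no registered owner.
usage = [
    ("model_a", 120.0),
    ("model_b", 80.0),
    ("nb_loop", 300.0),
    ("model_a", 30.0),
]
owners = {"model_a": "finance-data", "model_b": "sales-data"}

costs, quarantined = allocate_by_owner(usage, owners)
print(costs)
print(quarantined)
```

The point of the design is that the unowned notebook never shows up as a line on someone's invoice. It shows up as a list of things to quarantine, which is the only action the episode treats as honest for unowned workloads.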

504
01:00:08,800 --> 01:00:20,800
the control plane you actually need policy as an authorization compiler so here's the part that actually fixes it not another committee not another slide deck not another center of excellence with a mailbox and a SharePoint site full of PDFs nobody reads

505
01:00:20,800 --> 01:00:44,800
you need a control plane and in fabric terms a control plane is not a dashboard that tells you what already happened it's a set of constraints that decides what is allowed to exist in the first place what defaults apply when it's created and what happens when it drifts out of compliance governance is not documentation it is denial that sounds harsh it's also how every stable system works the system must be able to say no to bad structure at scale without human intervention

506
01:00:44,800 --> 01:01:13,800
this is the uncomfortable truth most fabric governance programs treat policy like advice they publish standards they publish recommended patterns and they publish please don't and then they act surprised when the platform becomes a landfill fabric doesn't run on your intentions it runs on permissions defaults and automation so treat policy like an authorization compiler think of it the same way a compiler treats code you don't suggest syntax you enforce it if the code doesn't compile it doesn't run full stop

507
01:01:13,800 --> 01:01:42,800
a fabric estate needs the same posture if an artifact doesn't conform to your rules it doesn't get created or it gets quarantined or it gets downgraded to a non-production tier something that changes behavior otherwise your policy is just prose now what does authorization compiler mean in practical terms it means you define rules in four categories and you enforce them at creation and continuously first creation constraints who can create workspaces who can create lakehouses who can create warehouses who can create semantic models who can create event streams notebooks pipelines

508
01:01:42,800 --> 01:02:06,800
and not just who has permission but who can create them in which zones because everyone can create everything everywhere is not self service it's uncontrolled manufacturing second defaults that bite every created artifact must inherit required settings naming sensitivity labeling baselines ownership metadata and the right workspace capacity placement

509
01:02:06,800 --> 01:02:35,800
you don't ask people to remember the rules you make the rules the default the platform already does this in small ways you just haven't weaponized it third authorization boundaries that are real not domains as labels boundaries that map to enforcement decisions production zones with locked down creation paths dev zones where experimentation is allowed but isolated and shared zones where reuse is mandatory and duplication is expensive if you cannot enforce a boundary it is not a boundary it's an opinion

510
01:02:35,800 --> 01:03:04,800
fourth consequences this is where governance stops being theater if someone violates the constraints something happens revoke permissions disable refresh move the artifact to quarantine block sharing or force remediation before it can be promoted email is not a consequence a ticket is not a consequence a consequence is when the system refuses to continue until the structure is corrected now minimum viable enforcement because people love to pretend this requires a two year program it doesn't start with the things that generate entropy fastest

511
01:03:04,800 --> 01:03:22,800
naming is not aesthetics naming is indexability and ownership detection ownership is not a field ownership is who gets paged and who gets charged sensitivity is not a check box it's downstream access behavior and data loss pathways lineage is not a nice to have it's the only way to answer where did this come from without a meeting

512
01:03:22,800 --> 01:03:41,800
life cycle is not clean up later life cycle is an end date a retirement action and a default that everything expires unless renewed if you enforce only these five naming ownership sensitivity lineage life cycle you will remove more entropy than a hundred dashboards ever will because you will change what can be created and what is allowed to persist

513
01:03:41,800 --> 01:03:49,800
and here's the rule that governs all of this the one your estate currently violates if it can be created without friction it will be created without thought

514
01:03:49,800 --> 01:04:01,800
fabric is frictionless by design so you must add friction by design not by making users miserable but by forcing intent to exist forcing someone to declare purpose owner and life cycle before the platform allows the artifact to become real

515
01:04:01,800 --> 01:04:17,800
that's what a control plane does it turns self service into bounded self service freedom inside constraints and once you have that you can finally do what every fabric estate claims it will do but never does you can enforce intent through life cycle not just at creation that's next life cycle governance birth promotion

516
01:04:17,800 --> 01:04:30,800
retirement life cycle governance is the part everyone promises and nobody funds because it isn't glamorous it doesn't ship new dashboards it doesn't make the quarterly update deck it's just the mechanics of keeping a distributed platform from becoming a cemetery

517
01:04:30,800 --> 01:04:41,800
and fabric by design becomes a cemetery unless you treat life cycle as a first class control plane feature here's the core reframe every artifact is a liability until it proves it is a product

518
01:04:41,800 --> 01:04:53,800
lakehouse warehouse eventhouse pipeline notebook semantic model report dashboard shortcut mirror they all start as experiments the system doesn't know when an experiment became business critical

519
01:04:53,800 --> 01:05:22,800
it just sees a growing graph of dependencies so life cycle is how you force intent to exist over time not just at creation start with birth birth is not someone clicked new birth is when the organization agrees an artifact is allowed to exist in the estate with a declared purpose a declared owner and a declared expiration owner means a real team not the person who created it people leave teams persist purpose means the business decision it serves not analytics and expiration means a date not a wish

520
01:05:22,800 --> 01:05:36,800
not will clean it up later a date because in distributed platforms later is the most expensive date on the calendar so the first life cycle gate is simple no owner no artifact no purpose no artifact no end date no artifact that is not harsh that is adulthood
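The birth gate can be sketched as a small validation step. This is a minimal illustration under stated assumptions: `ArtifactRequest` and `birth_gate` are hypothetical names, and the extra check that the end date lies in the future is an added assumption rather than something the episode spells out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ArtifactRequest:
    # Hypothetical request shape: the three declarations the gate demands.
    name: str
    owner_team: Optional[str] = None   # a team, not the person who clicked "new"
    purpose: Optional[str] = None      # the business decision the artifact serves
    end_date: Optional[date] = None    # a date, not a wish


def birth_gate(req: ArtifactRequest, today: date) -> list[str]:
    """Return the reasons to refuse creation; an empty list means it may exist."""
    reasons = []
    if not req.owner_team:
        reasons.append("no owner")
    if not req.purpose:
        reasons.append("no purpose")
    if req.end_date is None:
        reasons.append("no end date")
    elif req.end_date <= today:
        # Assumption: an expiration that has already passed is treated as missing intent.
        reasons.append("end date is not in the future")
    return reasons
```

A provisioning flow would call `birth_gate` first and refuse to continue while the list is non-empty.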

521
01:05:36,800 --> 01:05:53,800
now promotion promotion is where fabric estates fail hardest because they confuse etiquette with enforcement they tell teams dev test prod should be separate and then they implement it as naming conventions dev and prod sprinkled into workspace names like it means something it doesn't

522
01:05:53,800 --> 01:06:04,800
a promotion gate is a boundary that the system enforces the artifact cannot become production without passing checks and those checks are boring on purpose they reduce the probability of surprise

523
01:06:04,800 --> 01:06:17,800
at minimum promotion should require a named owner and support model lineage present and reviewable sensitivity labels correct access paths reviewed who can read who can share who can export

524
01:06:17,800 --> 01:06:46,800
cost behavior understood refresh schedule expected concurrency expected data movement and the crucial one a rollback plan if you can't roll it back it's not production it's a public experiment in fabric terms promotion also means moving from works in my workspace to works in the estate that means you don't just validate the report you validate the runtime behavior background operations refresh overlap notebook scheduling capacity contention and whether it degrades other workloads promotion is an architectural contract not a deployment step
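The minimum promotion checks listed above can be sketched as a checklist the system evaluates before anything is marked production. The `PromotionCandidate` fields and function names are hypothetical, and how each fact gets collected is left out of the sketch.

```python
from dataclasses import dataclass


@dataclass
class PromotionCandidate:
    # All field names are hypothetical; they mirror the checks named in the episode.
    owner_team: str = ""
    support_model: str = ""
    lineage_reviewed: bool = False
    sensitivity_label: str = ""
    access_paths_reviewed: bool = False
    cost_behavior_documented: bool = False
    rollback_plan: str = ""


def failed_checks(c: PromotionCandidate) -> list[str]:
    """Return the names of the checks that currently block promotion."""
    checks = {
        "named owner and support model": bool(c.owner_team and c.support_model),
        "lineage present and reviewed": c.lineage_reviewed,
        "sensitivity label set": bool(c.sensitivity_label),
        "access paths reviewed": c.access_paths_reviewed,
        "cost behavior understood": c.cost_behavior_documented,
        "rollback plan": bool(c.rollback_plan),
    }
    return [name for name, ok in checks.items() if not ok]


def can_promote(c: PromotionCandidate) -> bool:
    """The gate is boring on purpose: every check must pass."""
    return not failed_checks(c)
```

Returning the failed check names, rather than a bare yes or no, gives the requesting team something to remediate without a meeting.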

525
01:06:46,800 --> 01:07:11,800
now retirement retirement is the only scalable FinOps strategy you actually control right sizing helps scheduling helps reservations help those are financial levers retirement is an architectural lever because dead assets still generate cost in fabric even when nobody is looking at them refresh schedules background optimization retention mirrored things orphaned semantic models that keep running because someone might be using it

526
01:07:11,800 --> 01:07:39,800
and the most reliable statement in any platform program is someone might be using it which is why you need a deterministic retirement process that is designed to survive fear retirement isn't deletion first it's deprecation first you mark an artifact as deprecated you surface it in the catalog you warn consumers you remove it from promoted status you disable scheduled refresh you watch for usage you create a replacement path if one exists then you delete and you delete with confidence because the system recorded ownership purpose and consumers
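The deprecation-first sequence can be sketched as an ordered list of steps with deletion as the last one. The step names and the `next_step` helper are hypothetical, and the rule that observed usage halts deletion is an assumption added for illustration.

```python
from typing import Optional

# Ordered retirement steps, following the sequence described in the episode.
RETIREMENT_STEPS = [
    "mark_deprecated",
    "surface_in_catalog",
    "warn_consumers",
    "remove_promoted_status",
    "disable_scheduled_refresh",
    "watch_usage",
    "delete",
]


def next_step(completed: list[str], usage_seen: bool) -> Optional[str]:
    """Return the next retirement step, or None when nothing should happen."""
    for step in RETIREMENT_STEPS:
        if step in completed:
            continue
        if step == "delete" and usage_seen:
            # Assumption: real usage after deprecation sends the decision
            # back to the owner instead of deleting automatically.
            return None
        return step
    return None  # every step is done
```

Because the order is fixed, the process does not depend on anyone's courage: each run either advances one step or stops for a recorded reason.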

527
01:07:39,800 --> 01:08:05,800
that's the point life cycle isn't clean up life cycle is observability plus authority now connect this to data product thinking because it's the only mental model that makes life cycle enforceable a data product has consumers an owner an SLA or SLO definition and a support boundary it has a road map it has a retirement plan it has versioning a random data set does not so you stop pretending every artifact is a product most artifacts should never be promoted most artifacts should expire and that's fine

528
01:08:05,800 --> 01:08:20,800
the failure is when expired artifacts remain running consuming capacity confusing users and dragging your estate into permanent ambiguity fabric doesn't drift because you are careless it drifts because the platform is unattended and creation is cheap so treat life cycle governance as non-negotiable

529
01:08:20,800 --> 01:08:30,800
birth with intent promotion with enforced gates and retirement as a default outcome then the estate starts behaving like an engineered system without it it behaves like a landfill with a GUI

530
01:08:30,800 --> 01:08:40,800
Finops for fabric from cost reporting to behavioral control finops for fabric isn't a dashboard it's not a monthly email with a CSV it's not a cost review meeting where everyone nods and nothing changes

531
01:08:40,800 --> 01:08:52,800
Finops is behavioral control if spend stays flat but architecture keeps rotting you didn't do finops you did accounting this is the uncomfortable truth fabric cost is a symptom of permissions and defaults

532
01:08:52,800 --> 01:09:04,800
it is the byproduct of what the platform allows people to create how long it allows it to live and how many times it allows the same data to be rematerialized if you don't change those behaviors you will only ever get better at watching the bill arrive so the real

533
01:09:04,800 --> 01:09:21,800
Finops move is to stop asking how much did we spend and start asking what behavior did we fund because cost in fabric is not a meter on a single workload it is contention in a shared pool it is background work you didn't schedule it is refresh patterns you didn't standardize it is duplication you didn't prevent it is semantic models you didn't retire

534
01:09:21,800 --> 01:09:36,800
in other words it is governance expressed as money that distinction matters now the first trap is optimizing CU charts capacity utilization feels objective CU seconds feel precise the capacity metrics app gives you curves spikes and heat maps that look like truth

535
01:09:36,800 --> 01:09:45,800
But utilization is not accountability utilization doesn't tell you whether the work created value whether it created trust or whether it created another artifact that will haunt you for two years

536
01:09:45,800 --> 01:09:52,800
a capacity can run at 10% and still be pure waste a capacity can run at 80% and still be the best money you spend

537
01:09:52,800 --> 01:10:04,800
FinOps is not get utilization up that's a data center mindset fabric is not a data center it's a shared decision engine billing you for entropy so instead of worshipping utilization you need unit economics tied to decisions

538
01:10:04,800 --> 01:10:18,800
cost per decision cost per data set cost per refresh cost per domain cost per report render those metrics force the right conversations because they collapse the fantasy that self service is free
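As a worked example of one such unit metric, here is a minimal sketch that turns attributed cost into cost per refresh by domain. The record shape, the function name, and the numbers in the usage note are illustrative assumptions; how cost gets attributed to a domain in the first place is outside the sketch.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def cost_per_refresh_by_domain(
    records: Iterable[Tuple[str, float, int]]
) -> dict[str, float]:
    """Aggregate (domain, attributed_cost, refresh_count) rows into cost per refresh."""
    cost: dict[str, float] = defaultdict(float)
    refreshes: dict[str, int] = defaultdict(int)
    for domain, attributed_cost, refresh_count in records:
        cost[domain] += attributed_cost
        refreshes[domain] += refresh_count
    # Domains with zero refreshes are skipped rather than divided by zero.
    return {d: cost[d] / refreshes[d] for d in cost if refreshes[d] > 0}
```

With made-up rows such as `("sales", 600.0, 200)` twice and `("hr", 50.0, 10)` once, sales comes out at 3.0 per refresh and hr at 5.0, which is the kind of number an owner can be asked to defend.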

539
01:10:18,800 --> 01:10:27,800
every refresh is a decision to burn compute every duplicated data set is a decision to multiply downstream cost every extra semantic model is a decision to multiply meaning drift

540
01:10:27,800 --> 01:10:42,800
and those decisions should have owners so the second move in fabric FinOps is to connect cost to a control loop reporting without consequences is theatre thresholds without automation are email spam if you want the estate to behave the platform must react budget thresholds tied to action

541
01:10:42,800 --> 01:10:55,800
Not notify the team action scale down dev capacities outside business hours automatically because dev environments are not sacred pause capacities when nothing is scheduled to run because someone might be using it is not a business case

542
01:10:55,800 --> 01:11:06,800
quarantine workspaces that exceed policy too many semantic models no owners no lifecycle metadata excessive refresh frequency high background utilization with low consumption value
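A quarantine rule like this can be sketched as a check over per-workspace counts. The limits, the `WorkspaceStats` fields, and the function names are hypothetical, and the utilization versus consumption signal is left out because it needs telemetry this sketch does not model.

```python
from dataclasses import dataclass

# Hypothetical limits; real values would come from your own policy.
MAX_SEMANTIC_MODELS = 5
MAX_REFRESHES_PER_DAY = 24


@dataclass
class WorkspaceStats:
    semantic_models: int
    artifacts_without_owner: int
    artifacts_without_lifecycle: int
    refreshes_per_day: int


def policy_violations(ws: WorkspaceStats) -> list[str]:
    """List every policy the workspace currently exceeds."""
    found = []
    if ws.semantic_models > MAX_SEMANTIC_MODELS:
        found.append("too many semantic models")
    if ws.artifacts_without_owner > 0:
        found.append("artifacts without an owner")
    if ws.artifacts_without_lifecycle > 0:
        found.append("artifacts without lifecycle metadata")
    if ws.refreshes_per_day > MAX_REFRESHES_PER_DAY:
        found.append("excessive refresh frequency")
    return found


def should_quarantine(ws: WorkspaceStats) -> bool:
    """Any violation is enough; the action itself is left to the caller."""
    return bool(policy_violations(ws))
```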

543
01:11:06,800 --> 01:11:21,800
the point isn't to be punitive the point is to make cost a feedback signal that changes build behavior because in platforms behavior follows incentives right now the incentive in most fabric tenants is create anything any time and the bill becomes central IT's problem

544
01:11:21,800 --> 01:11:31,800
that incentive produces the exact estate you deserve so align finance and engineering with shared definitions and shared consequences finance doesn't care about CU seconds engineering doesn't care about cost center allocations

545
01:11:31,800 --> 01:11:50,800
so you need translation a small set of metrics that both sides can defend and a workflow that turns those metrics into decisions decision one consolidate or keep duplicating decision two promote to prod or keep as dev decision three refresh frequency or stale tolerance decision four retire or fund

546
01:11:50,800 --> 01:12:05,800
FinOps that doesn't force these decisions is just reporting and this is where governance and FinOps merge lifecycle enforcement is FinOps creation constraints are FinOps canonical semantic models are FinOps because the cheapest workload in fabric is the one you never allow to exist

547
01:12:05,800 --> 01:12:19,800
so the practical model is simple define cost policies the same way you define security policies you don't recommend encryption you enforce it do the same with spend behavior set a budget per domain set an allowed artifact count per workspace tier

548
01:12:19,800 --> 01:12:40,800
set a maximum refresh cadence for non-prod set a maximum number of semantic models per data set boundary then automate enforcement and yes this means some people will be blocked good blocked creation is how you stop entropy at scale because your estate is not failing due to a lack of dashboards it is failing due to a lack of constraints next the platform level implication is obvious automation isn't optional

549
01:12:40,800 --> 01:13:09,800
it's the only enforcement mechanism that scales faster than your users can create new artifacts automation is enforcement using the platform against entropy automation is where governance stops being a mood and becomes a mechanism people love to describe fabric governance as process process is human humans get tired humans change roles humans ignore emails humans approve exceptions because someone senior asked nicely automation doesn't in a platform that can mint new artifacts in seconds the only control surface that scales is

550
01:13:09,800 --> 01:13:38,800
the one that operates at platform speed that means rules expressed as code enforced continuously with consequences that don't require a meeting this is the uncomfortable truth in fabric every manual control becomes optional over time every optional control becomes drift and drift becomes cost so the goal of automation isn't convenience it's entropy management start with scheduled capacity scaling yes it's useful no it's not a strategy if you scale down every night and scale up every morning but your estate has uncontrolled refresh storms and background operations that run whenever they feel like it

551
01:13:38,800 --> 01:14:04,800
the schedule will become a recurring incident schedules only work when the workload truth matches the schedule truth so the real automation pattern is measure behavior then change behavior not pick a time hope nothing breaks that means you automate around the signals that actually predict pain sustained throttling sustained smoothing debt abnormal background utilization and refresh overlap that correlates with user complaints

552
01:14:04,800 --> 01:14:24,800
signals you treat them like circuit breakers if the estate hits a defined threshold the system reacts notifies yes but also reacts now alerts most orgs don't have alerts what they have is email email is how you outsource consequences to people who will ignore them an alert only matters when it triggers action scale pause block quarantine or revoke

553
01:14:24,800 --> 01:14:43,800
otherwise it's telemetry for your guilt so the practical model is simple tie alerts to runbooks and tie runbooks to APIs fabric has management endpoints Azure has automation primitives logic apps exist functions exist scheduled jobs exist pick your poison the point is the platform must be able to change its own posture when thresholds are met
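The alert to runbook wiring can be sketched as a circuit breaker that fires only on a sustained signal. This is a minimal illustration: the `runbook` is a plain callable standing in for whatever automation you actually call, the sample values are assumed to be a utilization ratio, and no real management endpoint is shown.

```python
from typing import Callable, Optional, Sequence


def sustained(samples: Sequence[float], threshold: float, min_consecutive: int) -> bool:
    """True if the most recent `min_consecutive` samples all exceed the threshold."""
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")
    if len(samples) < min_consecutive:
        return False
    return all(s > threshold for s in samples[-min_consecutive:])


def circuit_breaker(
    samples: Sequence[float],
    threshold: float,
    min_consecutive: int,
    runbook: Callable[[], str],
) -> Optional[str]:
    """Run the runbook when the signal is sustained; otherwise do nothing."""
    if sustained(samples, threshold, min_consecutive):
        # The runbook is hypothetical: it could scale, pause, block, or quarantine.
        return runbook()
    return None
```

Requiring several consecutive samples is a design choice that keeps a single spike from tripping the breaker, so the reaction tracks sustained pain rather than noise.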

554
01:14:43,800 --> 01:15:12,800
the part everyone dodges policy driven provisioning if you let people create workspaces freely they will create workspaces freely if you let people create lakehouses and semantic models with no defaults they will create them without defaults then you will spend your life retrofitting governance onto artifacts that never should have existed so you invert it you create golden paths and you automate creation through them a new workspace request shouldn't create a workspace it should compile a standard environment correct naming correct

555
01:15:12,800 --> 01:15:31,800
capacity placement correct default permissions correct label baselines required ownership metadata and a life cycle timer that starts on day one and if the request doesn't supply the metadata required to compile the workspace the request fails that is governance that's what policy as an authorization compiler looks like in code
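Compiling a workspace from a request can be sketched as a function that either returns a full specification or fails. Everything here is an assumption made for illustration: the required fields, the naming pattern, the `cap-<environment>` capacity names, the default label, and the 90 day lifetime. Applying the resulting spec to a real tenant is deliberately left out.

```python
from datetime import date, timedelta

# Hypothetical metadata a request must supply before anything is created.
REQUIRED_FIELDS = ("domain", "owner_team", "purpose", "environment")


class CompileError(ValueError):
    """Raised when a request lacks the metadata needed to compile a workspace."""


def compile_workspace(request: dict, today: date, lifetime_days: int = 90) -> dict:
    """Turn a request into a standard workspace spec, or fail loudly."""
    missing = [f for f in REQUIRED_FIELDS if not request.get(f)]
    if missing:
        raise CompileError("missing required metadata: " + ", ".join(missing))
    env = request["environment"]
    raw_name = f"{request['domain']}-{env}-{request['purpose']}"
    return {
        "name": raw_name.lower().replace(" ", "-"),   # naming is derived, not typed
        "capacity": f"cap-{env}",                     # placement follows environment
        "sensitivity_label": "internal",              # assumed label baseline
        "owner_team": request["owner_team"],
        "expires_on": today + timedelta(days=lifetime_days),  # timer starts on day one
    }
```

The requester never chooses the name, capacity, or label; they only supply intent, and the defaults come from the compiler.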

556
01:15:31,800 --> 01:15:50,800
next continuous audit with consequences point in time audits are theater because fabric is not static it is a living system so you audit continuously owner still present refresh schedule still justified data set still referenced model still certified correctly sharing still within policy sensitivity labels still aligned

557
01:15:50,800 --> 01:16:14,800
and the artifact count still within tier limits then you act if an artifact drifts downgrade it remove promotion disable refresh quarantine it to a non-prod capacity revoke sharing rights require remediation before reactivation this is the part governance committees hate because it removes negotiation good negotiation is how entropy wins and yes you will get pushback every automated control will produce one of two responses

558
01:16:14,800 --> 01:16:32,800
this is blocking my work or can we get an exception exceptions are entropy generators so you treat exceptions like production deployments time bound reviewed and revoked automatically unless renewed with justification an exception should expire by default because otherwise it becomes the new baseline that's how policy erodes
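Exceptions that expire by default can be sketched with an expiry date on every grant and a renewal step that refuses an empty justification. The `PolicyException` shape and both helper names are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class PolicyException:
    # Hypothetical record: every exception carries its own end date.
    rule: str
    granted_to: str
    justification: str
    expires_on: date


def active_exceptions(exceptions: list[PolicyException], today: date) -> list[PolicyException]:
    """Expired exceptions simply stop applying; nobody has to remember to revoke them."""
    return [e for e in exceptions if e.expires_on > today]


def renew(exc: PolicyException, justification: str, new_expiry: date) -> PolicyException:
    """Renewal needs a fresh justification and a later expiry date."""
    if not justification.strip():
        raise ValueError("renewal requires a justification")
    if new_expiry <= exc.expires_on:
        raise ValueError("new expiry must be later than the current one")
    return replace(exc, justification=justification, expires_on=new_expiry)
```

Because revocation is the default outcome, an exception can only become the new baseline if someone keeps actively arguing for it.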

559
01:16:32,800 --> 01:17:01,800
the simplest automation lever that most people ignore retirement automation if an artifact hasn't been used in n days it gets marked deprecated automatically if it stays unused refresh gets disabled if it stays unused longer it gets deleted if someone wants to keep it they must reassert ownership and purpose this is how you stop paying for dead things and the system law underneath all of this is boring and absolute unattended platforms drift toward waste so if you want fabric to behave like an engineered system you stop managing it and you start enforcing it automation isn't optional

560
01:17:01,800 --> 01:17:22,800
it's the only thing faster than sprawl executive reality check what good looks like in a fabric estate executives don't care about your lake house strategy they care about whether the numbers stay the same long enough to run the business so here's what good actually looks like in a fabric estate when the governance illusion dies and the system finally behaves first fewer artifacts

561
01:17:22,800 --> 01:17:41,800
not because the platform can't scale but because your organization can't reason about unlimited objects good estates don't celebrate how many reports shipped they celebrate how many reports didn't need to exist because the canonical product already solved the question a healthy estate has reuse pressure a rotting estate has creation gravity second ownership is obvious and it's enforceable

562
01:17:41,800 --> 01:17:59,800
not this was built by Alex in 2023 that's trivia real ownership is a team name a support boundary and a decision right the owner can change refresh cadence the owner can retire assets the owner gets paged the owner has budget authority if you can't point to an owner in under 10 seconds you don't have an asset you have an orphan

563
01:17:59,800 --> 01:18:28,800
third self service exists but it's bounded the estate has safe zones for experimentation and hard walls for production in dev people can move fast and break things in prod people can move carefully and prove things the wall between those two isn't etiquette it's enforced that means promotion gates enforced defaults and permissions that don't drift because someone got added just for today in good estates just add me is treated like opening a firewall rule it's a request with an owner and an expiry fourth meaning is stable

564
01:18:28,800 --> 01:18:50,800
executives can look at two dashboards and see the same number because the semantic layer is treated like the product it is there is a canonical model per domain and the organization invests in it the way it invests in any other shared service design testing versioning and retirement certified isn't a sticker it's the default path and anything that wants to compete with the canonical model must justify why with a time limit

565
01:18:50,800 --> 01:19:07,800
fifth lineage works in the direction executives ask questions when someone asks why did revenue drop last week the estate can answer without assembling a committee lineage is visible and it maps to ownership and life cycle this eliminates the most expensive activity in analytics meetings to reconstruct causality

566
01:19:07,800 --> 01:19:25,800
sixth cost behavior is predictable in business terms not CU seconds not utilization curves business terms cost per decision cost per refresh cost per data set cost per domain and those numbers drive action workloads that don't justify their cost get retired workloads that justify their cost get protected with isolation and capacity planning

567
01:19:25,800 --> 01:19:35,800
the estate can scale up or down because it knows what runs when it runs and why it runs in other words scheduling works because the platform is no longer haunted by unknown background activity

568
01:19:35,800 --> 01:19:52,800
seventh the platform team stops being the adult in every room in a good estate the platform team provides guardrails and automation domain owners run their domains product owners own their products finance enforces budgets with engineering security enforces access with policy the platform team doesn't negotiate exceptions all day

569
01:19:52,800 --> 01:20:04,800
because the system denies non-conforming work by default that's what maturity looks like fewer heroics more constraints now the executive reality check none of this happens because you rolled out fabric

570
01:20:04,800 --> 01:20:18,800
it happens because you stopped treating fabric like a tool and started treating it like a distributed system that needs enforced assumptions executives keep asking are we doing well here's the only honest answer you're doing well when the estate gets smaller clearer and more predictable

571
01:20:18,800 --> 01:20:30,800
while value delivered stays flat or rises if your artifact count is exploding your meanings are diverging your capacity keeps scaling and your governance program is updating documentation you are not improving you are decomposing

572
01:20:30,800 --> 01:20:58,800
and the fix isn't more dashboards it's more denial conclusion the only rule that works fabric doesn't rot because people are careless it rots because the platform allows creation without intent and intent never enforces itself if you want to stop the decay enforce one rule if an artifact can't declare owner purpose and end date it doesn't exist if you want the next step listen to the next episode on fabric capacity boundaries what to isolate first and what to block immediately