June 20, 2026

Private RAG Isn't Enough: The Missing Layer Between Data Sovereignty and Data Security

Everyone is talking about Private RAG.

Organizations invest heavily in self-hosted vector databases, sovereign cloud environments, private infrastructure, and regional data residency controls. They focus on where data lives, how it moves, and whether it remains inside specific geographic boundaries.

But there is a critical question that almost nobody asks.

What happens to permissions when documents leave their original system?

In this episode of the M365 FM Podcast, we dive deep into one of the most overlooked security challenges in enterprise AI: the gap between data sovereignty and data security. We explore why Private RAG alone does not solve the authorization problem and how organizations are unknowingly creating massive insider data exposure risks when permissions disappear during the indexing process.

WHY DATA SOVEREIGNTY IS NOT DATA SECURITY

Many organizations assume that storing data inside a specific country or private environment automatically makes it secure.

The reality is very different. A document stored in a German data center can still become accessible to unauthorized users if its permission model is lost during ingestion into a retrieval system.

Key topics include:

Data sovereignty versus data security
Private RAG misconceptions
Regional hosting limitations
Compliance versus authorization
The sovereignty illusion

The discussion highlights why location alone does not determine security and why access control remains the most important security boundary.

THE MOMENT SHAREPOINT PERMISSIONS DISAPPEAR

Most organizations spend years building sophisticated permission structures across SharePoint, Microsoft 365, and enterprise content platforms.

Those permissions define:

Who can access documents
Which teams can view content
Executive-only information
Legal and HR restrictions
External sharing boundaries

The episode explores what happens when documents are extracted, chunked, embedded, and stored inside vector databases without carrying their original authorization context.

The result is often a highly searchable knowledge platform that accidentally exposes information to users who should never have access to it.
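A minimal Python sketch of the alternative: resolve each document's effective permissions at extraction time and copy them onto every chunk. The `Node` hierarchy, the `effective_acl` and `chunk_with_acl` helpers, and the group names are hypothetical stand-ins for SharePoint sites, libraries, folders, and files. The sketch only models "inherits from parent" versus "inheritance broken, with its own list", and deliberately ignores permission levels, sharing links, and other real SharePoint details.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a SharePoint site, library, folder, or file."""
    name: str
    parent: Optional["Node"] = None
    # None = inherits from the parent; a set = inheritance broken, own ACL here.
    unique_acl: Optional[frozenset] = None


def effective_acl(node: Node) -> frozenset:
    """Walk up the hierarchy until a node that defines its own ACL is found."""
    current = node
    while current is not None:
        if current.unique_acl is not None:
            return current.unique_acl
        current = current.parent
    return frozenset()  # nothing granted anywhere: deny by default


@dataclass
class Chunk:
    doc_name: str
    index: int
    text: str
    allowed_principals: frozenset  # travels with the chunk into the index


def chunk_with_acl(doc: Node, text: str, size: int = 40) -> List[Chunk]:
    """Split text into fixed-size chunks and copy the document's effective ACL onto each."""
    acl = effective_acl(doc)  # resolved once, at extraction time
    return [
        Chunk(doc.name, i, text[start:start + size], acl)
        for i, start in enumerate(range(0, len(text), size))
    ]
```

With a site granted to `grp-finance` and `grp-board`, a library that inherits from it, and a board-only folder that breaks inheritance, a file in that folder yields chunks tagged only with `grp-board`, while a file placed directly in the library inherits both groups from the site.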

THE THREE BIGGEST PRIVATE RAG MYTHS

Many AI projects begin with assumptions that sound reasonable but create dangerous security gaps.

This episode breaks down three of the most common misconceptions:

Self-hosted automatically means secure
VPN access equals authorization
The LLM will enforce security policies

Listeners learn why none of these assumptions adequately protect enterprise data and why authorization must be enforced outside the model itself.
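The third myth points to a simple rule: authorization belongs in code that runs before the model, not in the system prompt. A minimal Python sketch, assuming each retrieved chunk already carries its allowed principals (user and group IDs); `RetrievedChunk`, `build_context`, and `UnauthorizedContextError` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Set


class UnauthorizedContextError(Exception):
    """Raised when a chunk the user may not see is about to reach the model."""


@dataclass(frozen=True)
class RetrievedChunk:
    chunk_id: str
    text: str
    allowed_principals: frozenset


def build_context(chunks: Iterable[RetrievedChunk], user_principals: Set[str]) -> str:
    """Assemble the prompt context, refusing any chunk the caller is not entitled to.

    This is ordinary code that runs before the model is called, so it does not
    depend on the model following a system prompt.
    """
    parts = []
    for chunk in chunks:
        if not (chunk.allowed_principals & user_principals):
            # Fail closed: an unauthorized chunk here means retrieval is broken.
            raise UnauthorizedContextError(chunk.chunk_id)
        parts.append(f"[{chunk.chunk_id}] {chunk.text}")
    return "\n\n".join(parts)
```

Raising instead of silently dropping the chunk is a design choice: if retrieval ever hands over something the user cannot see, that is a bug worth surfacing rather than hiding.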

ACL METADATA EXTRACTION: THE MISSING SECURITY LAYER

One of the most important concepts discussed in this episode is ACL metadata extraction.

Rather than simply extracting document content, organizations must also preserve the authorization model that determines who can access each document.

Topics include:

Access Control Lists (ACLs)
Permission inheritance
Microsoft Graph integration
Azure AI Search indexing
Entra ID security identifiers
Authorization metadata design

This missing layer transforms RAG from a potential insider threat into a secure enterprise knowledge system.
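For Azure AI Search, the documented security-trimming pattern stores principal IDs in a filterable string-collection field and filters on it with `search.in`. A minimal Python sketch that builds such a filter from Entra ID object IDs; the field name `group_ids` and the helper `security_filter` are assumptions. Validating each value as a GUID enforces the rule that stable object IDs, not display names or email addresses, go into the filter. Lowercasing assumes the index stores IDs in lowercase hyphenated form.

```python
import uuid
from typing import Iterable


def security_filter(principal_ids: Iterable[str], field: str = "group_ids") -> str:
    """Build an OData filter that keeps only documents shared with one of the caller's IDs."""
    normalized = []
    for pid in principal_ids:
        # uuid.UUID raises ValueError for anything that is not a GUID, which also
        # keeps names, emails, quotes, and commas out of the filter string.
        normalized.append(str(uuid.UUID(pid)))
    if not normalized:
        # Fail closed: no identities must mean no documents, not all documents.
        raise ValueError("caller has no principal IDs")
    # search.in treats commas (and spaces) as delimiters by default.
    joined = ",".join(normalized)
    return f"{field}/any(g:search.in(g, '{joined}'))"
```

The resulting string would be passed as the `filter` of a search request (for example the `filter` keyword of `SearchClient.search` in the Azure SDK for Python), typically with the user's own object ID plus the IDs of the groups they belong to.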

AUTHORIZATION BEFORE RETRIEVAL

A critical architectural principle explored in this episode is simple: Never retrieve first and filter later. Authorization must occur before retrieval.

The discussion covers:

Security trimming
Pre-filtering versus post-filtering
Query-time authorization
Permission-aware vector search
Tenant-aware filtering
Role-based access control

This approach ensures unauthorized content never appears in retrieval results or influences model outputs.
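To make "never retrieve first and filter later" concrete, here is a toy in-memory comparison in Python. It is a sketch, not a vector-database API: chunks are plain dicts with a hypothetical `acl` set, similarity is brute-force cosine, and `post_filter_search` and `pre_filter_search` are illustrative names.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def post_filter_search(index: List[Dict], query: Sequence[float], principals: Set[str], k: int) -> List[str]:
    """Anti-pattern: rank every chunk, keep top-k, then drop what the user cannot see."""
    ranked = sorted(index, key=lambda c: cosine(c["vec"], query), reverse=True)
    return [c["id"] for c in ranked[:k] if c["acl"] & principals]


def pre_filter_search(index: List[Dict], query: Sequence[float], principals: Set[str], k: int) -> List[str]:
    """Restrict candidates to authorized chunks first, then rank within that subset."""
    allowed = [c for c in index if c["acl"] & principals]
    ranked = sorted(allowed, key=lambda c: cosine(c["vec"], query), reverse=True)
    return [c["id"] for c in ranked[:k]]


INDEX = [
    {"id": "board-1", "vec": [1.0, 0.0], "acl": {"grp-board"}},
    {"id": "board-2", "vec": [0.9, 0.1], "acl": {"grp-board"}},
    {"id": "sales-1", "vec": [0.5, 0.5], "acl": {"grp-sales"}},
]
```

For a sales user whose query vector is `[1.0, 0.0]` with `k=2`, `post_filter_search` ranks both board chunks first and then discards them, returning nothing, while `pre_filter_search` returns `["sales-1"]`. The post-filter variant also had board content in hand before discarding it, which is exactly the window a bug or a forgotten filter turns into a leak.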

WHY SINGLE AGENTS CREATE SECURITY RISKS

Many organizations are deploying single-agent AI architectures because they are faster to build and easier to understand.

However, the episode explains how single-agent systems often become "confused deputies" that operate with excessive privileges and insufficient oversight.

Topics include:

Prompt injection risks
Insider threat exposure
Retrieval abuse
Authorization failures
Governance challenges
Agent accountability

The conversation highlights why security architecture must evolve alongside AI architecture.
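One common mitigation for the confused-deputy problem is to let an agent's tool act only on the intersection of what the agent is scoped to and what the end user may see, and to refuse calls that arrive without a user identity. A minimal Python sketch under those assumptions; `Caller`, `RetrievalTool`, `visible_chunks`, the site names, and the tuple-based index are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Caller:
    """The end user on whose behalf the agent is acting."""
    user_id: str
    group_ids: FrozenSet[str]


class RetrievalTool:
    """Hypothetical tool exposed to an agent."""

    def __init__(self, index: List[Tuple[str, str, FrozenSet[str]]], agent_sites: FrozenSet[str]):
        self.index = index              # (chunk_id, source_site, allowed_principals)
        self.agent_sites = agent_sites  # sources the agent identity itself is scoped to

    def visible_chunks(self, caller: Optional[Caller]) -> List[str]:
        if caller is None:
            # No propagated user identity: refuse rather than fall back to
            # the agent's own, broader privileges.
            raise PermissionError("no end-user identity propagated to the tool")
        principals = {caller.user_id} | set(caller.group_ids)
        return [
            chunk_id
            for chunk_id, site, acl in self.index
            # Effective access = agent scope AND user permissions.
            if site in self.agent_sites and acl & principals
        ]
```

A user in both the sales and HR groups still gets nothing from the HR site through an agent scoped only to the sales site, and nothing from the board-only content on the sales site either.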

THE FIVE-AGENT SECURITY MODEL

To address these challenges, the episode introduces a multi-agent retrieval architecture designed around separation of responsibilities.

Listeners learn about:

Routing agents
Query translation agents
Authorized retrieval agents
Validation agents
Response generation agents

Each component performs a specialized function while minimizing the blast radius of potential failures.
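The episode names the five roles but not their interfaces, so the following is only one possible shape: a Python sketch in which each agent is a plain function, the LLM call is replaced by a stub, and the validation step independently re-checks what the retrieval step returned before anything reaches generation. All function names, the keyword router, and the index layout are assumptions.

```python
from typing import Dict, List, Set, Tuple


def route(question: str) -> str:
    """Routing agent: choose a knowledge domain (a keyword rule stands in for a classifier)."""
    return "finance" if "revenue" in question.lower() else "general"


def translate(question: str) -> List[str]:
    """Query translation agent: turn the question into normalized search terms."""
    terms = (w.strip("?.,!").lower() for w in question.split())
    return [t for t in terms if len(t) > 3]


def retrieve(domain: str, terms: List[str], principals: Set[str], index: List[Dict]) -> List[Dict]:
    """Authorized retrieval agent: pre-filter by domain and ACL, then match terms."""
    return [
        c for c in index
        if c["domain"] == domain and c["acl"] & principals
        and any(t in c["text"].lower() for t in terms)
    ]


def validate(domain: str, chunks: List[Dict], principals: Set[str]) -> None:
    """Validation agent: re-check retrieval output without trusting the retriever."""
    for c in chunks:
        if c["domain"] != domain or not (c["acl"] & principals):
            raise PermissionError(f"chunk {c['id']} failed validation")


def generate(chunks: List[Dict]) -> Tuple[str, List[str]]:
    """Response generation agent: stub for an LLM call that cites its sources."""
    if not chunks:
        return "No authorized sources found.", []
    return " ".join(c["text"] for c in chunks), [c["id"] for c in chunks]


def answer(question: str, principals: Set[str], index: List[Dict]) -> Tuple[str, List[str]]:
    domain = route(question)
    terms = translate(question)
    chunks = retrieve(domain, terms, principals, index)
    validate(domain, chunks, principals)  # raises before generation on any violation
    return generate(chunks)
```

Because each stage does one thing, a compromised or buggy retrieval step is caught by validation before its output can shape an answer, which is one way to read "minimizing the blast radius".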

ZERO TRUST FOR AI SYSTEMS

The principles of Zero Trust are rapidly becoming essential for modern AI deployments.

This episode explores how organizations can apply Zero Trust concepts to agentic AI systems by continuously verifying identity, authorization, and trust at every stage of the workflow.

Topics include:

Entra ID integration
OAuth token exchange
Workload identities
Delegated permissions
Mutual TLS
Identity propagation across agents

The result is a system that assumes no implicit trust and verifies every action.
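In Entra ID, propagating a user's identity from one agent to a downstream API is typically done with the OAuth 2.0 on-behalf-of (OBO) flow, in which a middle tier exchanges the user token it received for a new token scoped to the next hop. A minimal Python sketch that only builds the token request and does not send it; the tenant, client, and scope values are placeholders. In practice, MSAL's `ConfidentialClientApplication.acquire_token_on_behalf_of` together with certificate or federated (workload identity) credentials is preferable to a raw client secret.

```python
from typing import Tuple
from urllib.parse import urlencode


def build_obo_request(tenant_id: str, client_id: str, client_secret: str,
                      incoming_user_token: str, downstream_scope: str) -> Tuple[str, str]:
    """Return (token endpoint URL, form-encoded body) for an Entra ID OBO request.

    The middle-tier agent trades the token it received for the user for a new
    token to the next service, so the downstream API authorizes the user's
    identity rather than the agent's.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "client_id": client_id,
        "client_secret": client_secret,
        "assertion": incoming_user_token,   # token the agent received for the user
        "scope": downstream_scope,          # e.g. "https://graph.microsoft.com/.default"
        "requested_token_use": "on_behalf_of",
    })
    return url, body
```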

MULTI-TENANT AI AND CROSS-CUSTOMER DATA EXPOSURE

One of the most dangerous failure modes in enterprise AI is cross-tenant data leakage.

The episode examines real-world architectural mistakes that allow data from one customer, department, or business unit to become visible to another.

Discussion areas include:

Tenant isolation
Semantic cache risks
Cross-tenant retrieval
Shared vector databases
Encryption boundaries
Compliance requirements

These risks become especially significant in healthcare, finance, and government environments.
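Semantic caches are a quiet cross-tenant risk: if the cache key is only the question, one tenant's answer can be served to another. A minimal Python sketch of a partitioned answer cache, assuming each entry is scoped by tenant ID and by a hash of the caller's principal set. Exact matching on a normalized query stands in for the embedding-similarity lookup a real semantic cache would perform inside each partition. `PartitionedAnswerCache` and `scope_fingerprint` are hypothetical names.

```python
import hashlib
from typing import Dict, Iterable, Optional, Tuple


def scope_fingerprint(principals: Iterable[str]) -> str:
    """Order-independent hash of the caller's user and group IDs."""
    joined = "\n".join(sorted(set(principals)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


class PartitionedAnswerCache:
    def __init__(self) -> None:
        # (tenant_id, scope fingerprint) -> {normalized query: answer}
        self._partitions: Dict[Tuple[str, str], Dict[str, str]] = {}

    @staticmethod
    def _normalize(query: str) -> str:
        return " ".join(query.lower().split())

    def get(self, tenant_id: str, principals: Iterable[str], query: str) -> Optional[str]:
        partition = self._partitions.get((tenant_id, scope_fingerprint(principals)), {})
        return partition.get(self._normalize(query))

    def put(self, tenant_id: str, principals: Iterable[str], query: str, answer: str) -> None:
        key = (tenant_id, scope_fingerprint(principals))
        self._partitions.setdefault(key, {})[self._normalize(query)] = answer
```

Keying on the full principal set is deliberately conservative: users whose group memberships differ never share entries, which lowers the hit rate but means a membership change automatically moves a user to a fresh partition.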

THE FUTURE OF GOVERNED AI

As AI adoption accelerates, governance becomes a competitive advantage rather than a compliance burden.

Organizations that preserve permissions, implement authorization-aware retrieval, and embrace Zero Trust principles will be positioned to scale AI safely across regulated environments.

The discussion explores the future of:

Agentic AI governance
Permission-aware retrieval
AI security architecture
Regulatory compliance
Enterprise AI adoption
Sovereign AI strategies

FINAL THOUGHTS

Private RAG solves only part of the problem.

The real challenge begins when organizations move documents from systems that understand permissions into systems that do not. Without authorization-aware retrieval, preserved access controls, and Zero Trust architecture, even the most sophisticated Private RAG deployment can become a large-scale insider data exposure platform.

The future of enterprise AI is not simply about where data lives. It is about ensuring the right people can access the right information at the right time, and nobody else.

Become a supporter of this podcast: https://www.spreaker.com/podcast/m365-fm-modern-work-security-and-productivity-with-microsoft-365--6704921/support.

🚀 Want to be part of m365.fm?

Then stop just listening… and start showing up.

👉 Connect with me on LinkedIn and let’s make something happen:

🎙️ Be a podcast guest and share your story
🎧 Host your own episode (yes, seriously)
💡 Pitch topics the community actually wants to hear
🌍 Build your personal brand in the Microsoft 365 space

This isn’t just a podcast — it’s a platform for people who take action.

🔥 Most people wait. The best ones don’t.

👉 Connect with me on LinkedIn and send me a message:
"I want in"

Let’s build something awesome 👊

1
00:00:00,000 --> 00:00:02,960
There is a specific comment you hear at every enterprise AI

2
00:00:02,960 --> 00:00:05,640
conference lately, and it usually starts as a question

3
00:00:05,640 --> 00:00:07,400
whispered in a security meeting.

4
00:00:07,400 --> 00:00:09,440
An architect leans over to the person next to them

5
00:00:09,440 --> 00:00:12,200
and asks if they are actually carrying each file's permissions

6
00:00:12,200 --> 00:00:15,000
into the index to filter per user at query time.

7
00:00:15,000 --> 00:00:17,160
Most people in the room go silent when they hear that.

8
00:00:17,160 --> 00:00:19,280
They stay quiet because the honest answer is no,

9
00:00:19,280 --> 00:00:20,640
and they know they aren't doing it.

10
00:00:20,640 --> 00:00:22,320
That silence is the entire problem.

11
00:00:22,320 --> 00:00:25,400
This episode is about the massive gap between where your data

12
00:00:25,400 --> 00:00:27,920
sits and who is actually allowed to see it.

13
00:00:27,920 --> 00:00:30,960
We are talking about why the term private RAG does not mean

14
00:00:30,960 --> 00:00:33,120
what you think it means and how this missing layer

15
00:00:33,120 --> 00:00:35,600
transforms a knowledge system into the biggest insider data

16
00:00:35,600 --> 00:00:37,600
leak your company will ever build.

17
00:00:37,600 --> 00:00:40,240
We are going to explore why permission models matter more

18
00:00:40,240 --> 00:00:42,480
than your choice of vector database or your infrastructure

19
00:00:42,480 --> 00:00:43,160
decisions.

20
00:00:43,160 --> 00:00:46,120
To do that, we have to answer one deceptively simple question.

21
00:00:46,120 --> 00:00:47,960
If you extract documents from SharePoint

22
00:00:47,960 --> 00:00:49,760
to build a perfect retrieval system,

23
00:00:49,760 --> 00:00:52,160
but lose the permission model that SharePoint was enforcing,

24
00:00:52,160 --> 00:00:53,880
what have you actually built?

25
00:00:53,880 --> 00:00:55,960
Why everyone talks about location, not access.

26
00:00:55,960 --> 00:00:58,520
The entire narrative around private RAG has a structural flaw

27
00:00:58,520 --> 00:01:00,880
that has been hiding in plain sight for a long time.

28
00:01:00,880 --> 00:01:03,240
If you go to any vendor conference and listen to how

29
00:01:03,240 --> 00:01:05,680
RAG security gets discussed, you will hear a lot

30
00:01:05,680 --> 00:01:07,040
about infrastructure choices.

31
00:01:07,040 --> 00:01:09,160
They talk about self-hosting versus the cloud,

32
00:01:09,160 --> 00:01:11,840
or they brag about German data centers and EU data boundary

33
00:01:11,840 --> 00:01:12,840
certifications.

34
00:01:12,840 --> 00:01:15,160
The conversation centers on one specific obsession,

35
00:01:15,160 --> 00:01:17,120
which is where the data physically sits.

36
00:01:17,120 --> 00:01:19,600
Companies invest heavily in this single question

37
00:01:19,600 --> 00:01:22,880
by building VPNs and deploying on-premises infrastructure.

38
00:01:22,880 --> 00:01:25,760
They sign complex compliance agreements for data residency

39
00:01:25,760 --> 00:01:27,240
because the assumption is simple.

40
00:01:27,240 --> 00:01:29,840
If the data stays in Germany and never leaves the firewall,

41
00:01:29,840 --> 00:01:31,360
then it must be secure.

42
00:01:31,360 --> 00:01:32,720
This is the sovereignty illusion.

43
00:01:32,720 --> 00:01:34,600
The moment you start indexing documents for RAG,

44
00:01:34,600 --> 00:01:36,440
that illusion collapses completely.

45
00:01:36,440 --> 00:01:39,000
Here is what actually happens in a real organization.

46
00:01:39,000 --> 00:01:41,880
You have a SharePoint instance with millions of documents,

47
00:01:41,880 --> 00:01:43,880
and every single one is protected by a permission

48
00:01:43,880 --> 00:01:45,960
model that took your team years to build.

49
00:01:45,960 --> 00:01:48,280
You have site-level access controls and library-level

50
00:01:48,280 --> 00:01:50,360
permissions, plus folder hierarchies

51
00:01:50,360 --> 00:01:53,160
with inherited access rules and item-level ACLs.

52
00:01:53,160 --> 00:01:55,480
All of these layers, from groups to external sharing, work

53
00:01:55,480 --> 00:01:57,320
together to ensure that only the right people

54
00:01:57,320 --> 00:01:58,760
see the right documents.

55
00:01:58,760 --> 00:02:00,320
Then you decide to build a RAG system

56
00:02:00,320 --> 00:02:02,760
because you want to make all that knowledge searchable.

57
00:02:02,760 --> 00:02:05,800
You want employees to ask questions and get answers grounded

58
00:02:05,800 --> 00:02:08,640
in your internal documents, so you start the extraction process.

59
00:02:08,640 --> 00:02:11,040
Document by document, you pull content out of SharePoint

60
00:02:11,040 --> 00:02:12,600
to chunk it and embed it.

61
00:02:12,600 --> 00:02:15,240
You store it in a vector database that might be self-hosted

62
00:02:15,240 --> 00:02:16,720
or sitting on a VPS in Berlin

63
00:02:16,720 --> 00:02:18,480
with premium compliance certifications.

64
00:02:18,480 --> 00:02:20,560
But the moment that document leaves SharePoint,

65
00:02:20,560 --> 00:02:22,040
the permission model dies.

66
00:02:22,040 --> 00:02:24,240
You capture the words and the metadata,

67
00:02:24,240 --> 00:02:25,720
and you even capture the context,

68
00:02:25,720 --> 00:02:27,440
but you did not capture the guardians.

69
00:02:27,440 --> 00:02:29,120
You left behind the access control lists

70
00:02:29,120 --> 00:02:31,120
that actually decided who was allowed to open those files

71
00:02:31,120 --> 00:02:31,880
in the first place.

72
00:02:31,880 --> 00:02:33,160
This is not just a hypothesis.

73
00:02:33,160 --> 00:02:35,720
Gartner is very direct about what comes next,

74
00:02:35,720 --> 00:02:38,400
and their prediction for 2027 is stark.

75
00:02:38,400 --> 00:02:41,520
They expect 40% of AI data breaches to come from cross-border

76
00:02:41,520 --> 00:02:42,920
GenAI misuse.

77
00:02:42,920 --> 00:02:45,560
But if you read carefully, the real issue isn't geography.

78
00:02:45,560 --> 00:02:47,520
It is not about the data moving across borders,

79
00:02:47,520 --> 00:02:50,520
but rather about the data moving without its access controls.

80
00:02:50,520 --> 00:02:52,160
The false equation everyone has been using

81
00:02:52,160 --> 00:02:54,440
is that data in Germany equals security.

82
00:02:54,440 --> 00:02:55,640
But that is not security.

83
00:02:55,640 --> 00:02:57,280
That is sovereignty.

84
00:02:57,280 --> 00:02:59,480
Real security requires a different question entirely,

85
00:02:59,480 --> 00:03:02,160
and you have to ask who can see what and under what conditions.

86
00:03:02,160 --> 00:03:04,960
You have to know what happens when permissions change

87
00:03:04,960 --> 00:03:06,520
or what prevents a junior employee

88
00:03:06,520 --> 00:03:08,920
from accessing executive strategy documents.

89
00:03:08,920 --> 00:03:10,800
Most RAG projects never ask these questions

90
00:03:10,800 --> 00:03:12,320
because they are too busy optimizing

91
00:03:12,320 --> 00:03:13,840
for retrieval speed and accuracy.

92
00:03:13,840 --> 00:03:15,360
They spend their time tuning embeddings

93
00:03:15,360 --> 00:03:18,160
and testing the ranking to make sure the system is fast and private.

94
00:03:18,160 --> 00:03:20,680
The result is a system where your most sensitive information

95
00:03:20,680 --> 00:03:23,640
is now searchable by anyone with a login to the RAG interface.

96
00:03:23,640 --> 00:03:25,840
A single compromised account or a well-crafted prompt

97
00:03:25,840 --> 00:03:27,880
can surface documents that should have stayed locked away

98
00:03:27,880 --> 00:03:28,480
from view.

99
00:03:28,480 --> 00:03:30,360
You have built something that looks like a knowledge system,

100
00:03:30,360 --> 00:03:33,120
but in reality, it functions like an insider data leak

101
00:03:33,120 --> 00:03:34,320
waiting to happen.

102
00:03:34,320 --> 00:03:36,480
The organization invested months building that permission

103
00:03:36,480 --> 00:03:38,520
model in SharePoint for a reason,

104
00:03:38,520 --> 00:03:41,160
and it was about governance rather than bureaucracy.

105
00:03:41,160 --> 00:03:44,200
It was a way of saying that a financial report is for the board,

106
00:03:44,200 --> 00:03:46,600
and a legal document is only for the general counsel.

107
00:01:46,600 --> 00:01:47,920
RAG strips all of that away.

108
00:03:47,920 --> 00:03:49,000
So what do you do about it?

109
00:03:49,000 --> 00:03:50,240
That is where we go next.

110
00:03:51,080 --> 00:03:53,400
The three misconceptions about private RAG.

111
00:03:53,400 --> 00:03:54,800
Before we solve the problem,

112
00:03:54,800 --> 00:03:57,080
we need to dismantle the thinking that created it.

113
00:03:57,080 --> 00:03:59,800
There are three beliefs that teams have about private RAG

114
00:03:59,800 --> 00:04:01,600
that sound reasonable on the surface,

115
00:04:01,600 --> 00:04:02,800
but they are just wrong,

116
00:04:02,800 --> 00:04:04,880
and each one is dangerous in a different way.

117
00:04:04,880 --> 00:04:07,480
The first misconception is about infrastructure.

118
00:04:07,480 --> 00:04:09,360
Self-hosted equals secure.

119
00:04:09,360 --> 00:04:11,680
It feels intuitive to run your own vector database

120
00:04:11,680 --> 00:04:13,880
on your own servers where nobody else can reach it,

121
00:04:13,880 --> 00:04:16,480
and because your data never touches a third-party cloud,

122
00:04:16,480 --> 00:04:18,680
this makes sense from a sovereignty standpoint.

123
00:04:18,680 --> 00:04:21,320
However, from a security standpoint, it is incomplete.

124
00:04:21,320 --> 00:04:24,640
A self-hosted vector database with no role-based access control

125
00:04:24,640 --> 00:04:27,560
is fundamentally less secure than a managed cloud database

126
00:04:27,560 --> 00:04:29,440
that enforces per-user filtering.

127
00:04:29,440 --> 00:04:32,280
When you optimize for location, you solve where the data sits,

128
00:04:32,280 --> 00:04:34,480
but you haven't solved who can actually access it.

129
00:04:34,480 --> 00:04:36,520
A contractor with a laptop on your VPN

130
00:04:36,520 --> 00:04:39,000
can query the same vector index as your CFO,

131
00:04:39,000 --> 00:04:42,040
and because the database has no way to distinguish between them,

132
00:04:42,040 --> 00:04:44,000
both queries return the same documents.

133
00:04:44,000 --> 00:04:46,960
The only difference is which physical network they are coming from.

134
00:04:46,960 --> 00:04:49,040
This is why a German hosted database

135
00:04:49,040 --> 00:04:52,040
with all of SharePoint indexed but zero permission metadata

136
00:04:52,040 --> 00:04:53,680
is not a security achievement.

137
00:04:53,680 --> 00:04:56,760
It is a compliance nightmare waiting to be discovered.

138
00:04:56,760 --> 00:04:58,600
The second misconception is about access.

139
00:04:58,600 --> 00:05:00,960
VPN access equals authorization.

140
00:05:00,960 --> 00:05:03,360
This one creeps in quietly because the thinking assumes

141
00:05:03,360 --> 00:05:05,800
that since only employees are on the VPN,

142
00:05:05,800 --> 00:05:08,080
only employees can reach the RAG system.

143
00:05:08,080 --> 00:05:10,920
The logic follows that only employees will see employee data

144
00:05:10,920 --> 00:05:12,200
and the problem is solved,

145
00:05:12,200 --> 00:05:14,560
but here is the problem: that isn't authorization.

146
00:05:14,560 --> 00:05:16,520
It is just network topology.

147
00:05:16,520 --> 00:05:19,400
Network access and data authorization are completely different layers,

148
00:05:19,400 --> 00:05:21,320
which means you can restrict who reaches the system

149
00:05:21,320 --> 00:05:24,160
without restricting what they see once they are inside.

150
00:05:24,160 --> 00:05:26,960
A junior sales representative and the VP of business development

151
00:05:26,960 --> 00:05:30,320
both have VPN access and can both query the RAG system,

152
00:05:30,320 --> 00:05:32,120
but they should not see the same documents.

153
00:05:32,120 --> 00:05:33,720
In most RAG deployments, they do.

154
00:05:33,720 --> 00:05:36,160
The real question authorization answers is this:

155
00:05:36,160 --> 00:05:37,840
given that you can reach the system,

156
00:05:37,840 --> 00:05:40,040
what subset of documents are you allowed to see?

157
00:05:40,040 --> 00:05:42,720
Network perimeter security doesn't answer that question,

158
00:05:42,720 --> 00:05:44,560
but permission metadata does.

159
00:05:44,560 --> 00:05:47,360
The third misconception is about the LLM itself.

160
00:05:47,360 --> 00:05:48,720
The model will handle it.

161
00:05:48,720 --> 00:05:51,560
This is the most insidious one because it pushes the problem

162
00:05:51,560 --> 00:05:53,200
inside the model where you can't see it.

163
00:05:53,200 --> 00:05:54,960
The reasoning usually sounds like this.

164
00:05:54,960 --> 00:05:57,120
We trust the LLM to be intelligent enough

165
00:05:57,120 --> 00:05:59,160
to figure out which documents it should cite

166
00:05:59,160 --> 00:06:00,480
and which it shouldn't.

167
00:06:00,480 --> 00:06:02,160
We will rely on the model to remember

168
00:06:02,160 --> 00:06:04,320
that certain information is confidential

169
00:06:04,320 --> 00:06:06,160
and we will use a system prompt to instruct it

170
00:06:06,160 --> 00:06:07,920
not to expose sensitive data.

171
00:06:07,920 --> 00:06:09,240
This is a structural failure.

172
00:06:09,240 --> 00:06:11,720
The LLM has no way to know which documents are confidential

173
00:06:11,720 --> 00:06:14,280
because it only knows what is in its context window.

174
00:06:14,280 --> 00:06:16,800
If you have retrieved a document and passed it to the model,

175
00:06:16,800 --> 00:06:18,600
the model sees it as grounded information

176
00:06:18,600 --> 00:06:19,760
and treats it as evidence.

177
00:06:19,760 --> 00:06:21,960
It will reason over it, and if that reasoning serves

178
00:06:21,960 --> 00:06:24,280
the user's query, it will surface that information

179
00:06:24,280 --> 00:06:25,200
in the answer.

180
00:06:25,200 --> 00:06:26,560
A system prompt that says,

181
00:06:26,560 --> 00:06:30,120
"Don't expose confidential data" is a hope rather than a guarantee.

182
00:06:30,120 --> 00:06:32,280
It can be bypassed, it can be misunderstood,

183
00:06:32,280 --> 00:06:33,840
and it can fail silently.

184
00:06:33,840 --> 00:06:36,440
The only reliable way to ensure sensitive documents

185
00:06:36,440 --> 00:06:38,080
don't influence the answer is to ensure

186
00:06:38,080 --> 00:06:40,080
they never reach the model in the first place.

187
00:06:40,080 --> 00:06:42,240
And that requires knowing which documents the user

188
00:06:42,240 --> 00:06:43,800
is actually authorized to see

189
00:06:43,800 --> 00:06:45,080
before retrieval happens.

190
00:06:45,080 --> 00:06:47,360
The stakes of these misconceptions are concrete.

191
00:06:47,360 --> 00:06:49,960
HR documents become visible to contractors,

192
00:06:49,960 --> 00:06:52,320
executive strategy leaks to junior employees

193
00:06:52,320 --> 00:06:54,720
and legal files are accessible to sales teams.

194
00:06:54,720 --> 00:06:56,840
This doesn't happen because the teams were malicious

195
00:06:56,840 --> 00:06:59,040
but because the system never asked if the user had

196
00:06:59,040 --> 00:07:00,520
permission to see the data.

197
00:07:00,520 --> 00:07:03,480
That is where the permission problem becomes concrete.

198
00:07:03,480 --> 00:07:06,080
What happens when documents leave SharePoint?

199
00:07:06,080 --> 00:07:08,600
Let's walk through the actual mechanics of the problem

200
00:07:08,600 --> 00:07:10,600
because understanding the extraction process

201
00:07:10,600 --> 00:07:13,360
is where the vulnerability becomes impossible to ignore.

202
00:07:13,360 --> 00:07:15,480
SharePoint's permission model is built in layers

203
00:07:15,480 --> 00:07:17,360
that start at the site collection level

204
00:07:17,360 --> 00:07:19,800
where you can restrict who accesses an entire site.

205
00:07:19,800 --> 00:07:21,560
Then it flows down to the library level

206
00:07:21,560 --> 00:07:24,000
to limit access to specific document libraries

207
00:07:24,000 --> 00:07:27,000
and folders within those libraries can have their own rules.

208
00:07:27,000 --> 00:07:28,800
Individual items like specific files

209
00:07:28,800 --> 00:07:31,280
or documents can have their own access control lists

210
00:07:31,280 --> 00:07:33,000
that either inherit from above

211
00:07:33,000 --> 00:07:35,440
or break that inheritance to define their own rules.

212
00:07:35,440 --> 00:07:37,000
On top of that, you have security groups

213
00:07:37,000 --> 00:07:39,920
and Microsoft 365 groups that change constantly

214
00:07:39,920 --> 00:07:42,120
as people join projects or leave teams.

215
00:07:42,120 --> 00:07:45,040
That ripple effect flows through the entire permission model

216
00:07:45,040 --> 00:07:47,320
which means a user's access to a document

217
00:07:47,320 --> 00:07:49,480
is the product of every layer they inherit from

218
00:07:49,480 --> 00:07:51,200
and every group they belong to.

219
00:07:51,200 --> 00:07:53,800
This is complex, but it is also why it works.

220
00:07:53,800 --> 00:07:55,760
It is why a financial report can be visible

221
00:07:55,760 --> 00:07:57,920
to the finance team and the executive board

222
00:07:57,920 --> 00:08:00,600
while remaining hidden from the sales team and contractors.

223
00:08:00,600 --> 00:08:02,720
Now you extract that financial report for RAG.

224
00:08:02,720 --> 00:08:05,120
The extraction process looks simple on the surface

225
00:08:05,120 --> 00:08:07,360
because you pull the document and break it into chunks

226
00:08:07,360 --> 00:08:08,680
by section or paragraph.

227
00:08:08,680 --> 00:08:10,360
You generate embeddings for each chunk

228
00:08:10,360 --> 00:08:12,800
and store those embeddings in your vector database.

229
00:08:12,800 --> 00:08:14,360
But here is what you lose at each step.

230
00:08:14,360 --> 00:08:15,680
When you extract the document,

231
00:08:15,680 --> 00:08:17,160
you get the content and the metadata

232
00:08:17,160 --> 00:08:18,400
like the title and author,

233
00:08:18,400 --> 00:08:20,840
but you don't get the complete access control list.

234
00:08:20,840 --> 00:08:22,840
You don't get the list of every user and group

235
00:08:22,840 --> 00:08:26,200
who can open this file, and you don't get the inheritance chain.

236
00:08:26,200 --> 00:08:28,800
You lose the information that says section one is visible

237
00:08:28,800 --> 00:08:31,560
to everyone while section three is visible only to the board.

238
00:08:31,560 --> 00:08:32,400
Then you chunk it.

239
00:08:32,400 --> 00:08:34,800
Now you have broken a document with a single permission model

240
00:08:34,800 --> 00:08:36,240
into a hundred smaller pieces

241
00:08:36,240 --> 00:08:39,000
that are semantically independent but share an origin.

242
00:08:39,000 --> 00:08:41,000
In your RAG system, most setups default

243
00:08:41,000 --> 00:08:43,560
to having all chunks inherit permissions from the parent document.

244
00:08:43,560 --> 00:08:46,400
But if the parent had 20 different permission scopes,

245
00:08:46,400 --> 00:08:48,440
you have now lost that granularity.

246
00:08:48,440 --> 00:08:51,000
The chunk that contained board-only strategy

247
00:08:51,000 --> 00:08:53,120
is now tied to the same access rules

248
00:08:53,120 --> 00:08:55,320
as the chunk that contained public information.

249
00:08:55,320 --> 00:08:56,320
Then you embed it.

250
00:08:56,320 --> 00:08:57,840
The embedding captures semantic meaning

251
00:08:57,840 --> 00:08:59,280
but it does not capture permissions.

252
00:08:59,280 --> 00:09:00,920
Your embedding engine has no concept

253
00:09:00,920 --> 00:09:02,640
of "this vector represents information

254
00:09:02,640 --> 00:09:04,080
that only five people should see,"

255
00:09:04,080 --> 00:09:05,360
and the vector exists in a space

256
00:09:05,360 --> 00:09:07,320
where access control is invisible.

257
00:09:07,320 --> 00:09:09,200
Then you store it in your vector database

258
00:09:09,200 --> 00:09:11,920
and this is where the real problem materializes.

259
00:09:11,920 --> 00:09:14,360
For your database to enforce permissions at query time,

260
00:09:14,360 --> 00:09:16,320
it needs to carry the actual list of users

261
00:09:16,320 --> 00:09:18,320
and groups allowed to see each specific chunk.

262
00:09:18,320 --> 00:09:19,720
If that metadata isn't there

263
00:09:19,720 --> 00:09:21,320
because you lost it during extraction,

264
00:09:21,320 --> 00:09:23,240
then the database has nothing to filter on.

265
00:09:23,240 --> 00:09:26,040
Consider a real scenario where a 50-page financial report

266
00:09:26,040 --> 00:09:29,120
has 20 different permission scopes for different audiences.

267
00:09:29,120 --> 00:09:30,560
When it lands in your vector database,

268
00:09:30,560 --> 00:09:33,320
it becomes a single record with a single permission field

269
00:09:33,320 --> 00:09:35,400
or perhaps no permission metadata at all.

270
00:09:35,400 --> 00:09:37,520
Now that same report is retrievable by anyone

271
00:09:37,520 --> 00:09:39,160
who can query your RAG system.

272
00:09:39,160 --> 00:09:40,480
This is the permission drift problem

273
00:09:40,480 --> 00:09:43,200
that Gartner identifies as a recurring failure mode

274
00:09:43,200 --> 00:09:44,880
in multi-tenant RAG systems.

275
00:09:44,880 --> 00:09:46,600
The source of truth has a permission model

276
00:09:46,600 --> 00:09:48,560
but your retrieval system has a different model

277
00:09:48,560 --> 00:09:49,560
or no model at all.

278
00:09:49,560 --> 00:09:52,520
They drift apart and that gap is where breaches happen.

279
00:09:52,520 --> 00:09:54,040
The false assumption most teams make

280
00:09:54,040 --> 00:09:56,440
is that they will handle permissions at query time.

281
00:09:56,440 --> 00:09:58,240
But if the index has no metadata encoding

282
00:09:58,240 --> 00:10:00,480
who can see what, there is nothing to filter.

283
00:10:00,480 --> 00:10:03,000
You cannot enforce access rules that were never captured.

284
00:10:03,000 --> 00:10:05,120
The technical solution requires something different.

285
00:10:05,120 --> 00:10:08,080
Materializing permissions at ingestion time.

286
00:10:08,080 --> 00:10:10,800
ACL metadata extraction: the missing layer.

287
00:10:10,800 --> 00:10:12,480
Giving up on RAG isn't the answer.

288
00:10:12,480 --> 00:10:14,720
The real solution is to rebuild the very thing

289
00:10:14,720 --> 00:10:16,800
that extraction usually destroys.

290
00:10:16,800 --> 00:10:19,200
ACL metadata extraction means you capture

291
00:10:19,200 --> 00:10:21,080
the entire SharePoint permission model

292
00:10:21,080 --> 00:10:22,920
at the exact moment you pull a document.

293
00:10:22,920 --> 00:10:24,600
You aren't just grabbing the text.

294
00:10:24,600 --> 00:10:27,440
You are grabbing the authorization context that lives with it.

295
00:10:27,440 --> 00:10:30,600
You then encode that context directly into your vector database

296
00:10:30,600 --> 00:10:32,280
so that when a user asks a question,

297
00:10:32,280 --> 00:10:34,680
the system knows exactly who is allowed to see what.

298
00:10:34,680 --> 00:10:36,960
This fundamentally changes how your data is structured.

299
00:10:36,960 --> 00:10:38,760
Instead of just storing a block of text

300
00:10:38,760 --> 00:10:41,120
and its mathematical embedding, every single record

301
00:10:41,120 --> 00:10:44,440
in your database carries explicit permission metadata.

302
00:10:44,440 --> 00:10:46,320
You'll see fields like allowed user IDs,

303
00:10:46,320 --> 00:10:48,840
which list the specific people who can see that chunk.

304
00:10:48,840 --> 00:10:50,680
You'll have allowed group IDs for team access

305
00:10:50,680 --> 00:10:53,480
and a sensitivity label to mark the classification level.

306
00:10:53,480 --> 00:10:55,600
If you're working in a multi-tenant environment,

307
00:10:55,600 --> 00:10:57,160
you'll even include a tenant ID.

308
00:10:57,160 --> 00:10:59,120
Every chunk becomes self-describing

309
00:10:59,120 --> 00:11:01,240
and it tells the system exactly who it's visible to

310
00:11:01,240 --> 00:11:02,800
before the search even starts.
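
As a rough illustration, here is a minimal sketch of what one self-describing chunk record could look like in a Python ingestion pipeline. The class and field names (`ChunkRecord`, `allowed_user_ids`, `allowed_group_ids`, `sensitivity_label`, `tenant_id`) are hypothetical, not a prescribed schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChunkRecord:
    """One indexed chunk plus the authorization context captured at ingestion."""
    chunk_id: str
    text: str
    embedding: tuple[float, ...]
    tenant_id: str
    # Entra ID object IDs (GUIDs) of users granted access directly.
    allowed_user_ids: frozenset[str] = field(default_factory=frozenset)
    # Entra ID object IDs (GUIDs) of groups granted access.
    allowed_group_ids: frozenset[str] = field(default_factory=frozenset)
    # Classification level copied from the source document's label (hypothetical values).
    sensitivity_label: str = "unlabeled"

    def is_visible_to(self, tenant_id: str, user_id: str, group_ids: set[str]) -> bool:
        """True if this chunk's own metadata grants the principal access."""
        if tenant_id != self.tenant_id:
            return False
        return user_id in self.allowed_user_ids or bool(self.allowed_group_ids & group_ids)
```

Because the record is frozen, the permission metadata travels with the text and embedding as one unit and cannot be silently edited after ingestion.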

311
00:11:02,800 --> 00:11:04,840
Microsoft provides specific guidance

312
00:11:04,840 --> 00:11:06,880
on how to handle this in a real-world setup.

313
00:11:06,880 --> 00:11:08,480
When you use Azure AI Search,

314
00:11:08,480 --> 00:11:11,360
you set indexer permission options on your SharePoint data source

315
00:11:11,360 --> 00:11:13,200
to give the indexer a specific job.

316
00:11:13,200 --> 00:11:15,360
You're telling it to pull the user ids and group ids

317
00:11:15,360 --> 00:11:17,920
from SharePoint and turn them into metadata fields

318
00:11:17,920 --> 00:11:18,760
in your index.

319
00:11:18,760 --> 00:11:19,960
The indexer does the heavy lifting

320
00:11:19,960 --> 00:11:22,080
of reading those complex SharePoint ACLs

321
00:11:22,080 --> 00:11:25,200
and translating them into a clean, normalized format

322
00:11:25,200 --> 00:11:27,120
that your search layer can actually understand.

323
00:11:27,120 --> 00:11:30,240
But here is the critical detail that most teams miss.

324
00:11:30,240 --> 00:11:33,200
You cannot use user names or email addresses for this.

325
00:11:33,200 --> 00:11:35,320
People get married and change their last names.

326
00:11:35,320 --> 00:11:37,440
Companies rename their domains, and display names

327
00:11:37,440 --> 00:11:38,360
shift all the time.

328
00:11:38,360 --> 00:11:41,000
Instead, you must use Entra ID GUIDs,

329
00:11:41,000 --> 00:11:43,000
which are the unique strings of numbers and letters

330
00:11:43,000 --> 00:11:44,680
that represent every user and group

331
00:11:44,680 --> 00:11:46,680
in the Microsoft ecosystem.

332
00:11:46,680 --> 00:11:48,360
A GUID is stable and never changes

333
00:11:48,360 --> 00:11:50,120
when a person updates their profile

334
00:11:50,120 --> 00:11:52,280
and it won't break if an old email address

335
00:11:52,280 --> 00:11:53,920
gets reassigned to someone else.

336
00:11:53,920 --> 00:11:55,400
This ensures the identity matching

337
00:11:55,400 --> 00:11:57,400
between your RAG system and your source of truth

338
00:11:57,400 --> 00:11:58,840
stays perfectly consistent.

339
00:11:58,840 --> 00:12:01,080
User A in SharePoint will always be recognized

340
00:12:01,080 --> 00:12:03,160
as user A in your vector database.
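
One way to enforce the "GUIDs only" rule is to normalize every identifier at ingestion and reject anything that is not a GUID. The sketch below uses Python's standard `uuid` module; the helper name `normalize_object_id` is hypothetical.

```python
import uuid


def normalize_object_id(raw: str) -> str:
    """Return a canonical lowercase GUID string, or raise if `raw` isn't a GUID.

    Rejecting emails and display names here keeps mutable identifiers
    out of the permission metadata entirely.
    """
    try:
        # uuid.UUID accepts braces and mixed case; str() yields the canonical form.
        return str(uuid.UUID(raw.strip()))
    except ValueError as exc:
        raise ValueError(f"not a GUID, refusing to index: {raw!r}") from exc
```

Canonicalizing also matters for matching. The same object ID written with braces or uppercase letters would otherwise fail a plain string comparison at query time.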

341
00:12:03,160 --> 00:12:05,040
The extraction happens during the ingestion phase

342
00:12:05,040 --> 00:12:06,400
rather than at query time.

343
00:12:06,400 --> 00:12:08,360
This timing is vital for both your security

344
00:12:08,360 --> 00:12:09,800
and your system performance.

345
00:12:09,800 --> 00:12:12,160
You aren't waiting for a user to ask a question

346
00:12:12,160 --> 00:12:14,680
and then scrambling to figure out permissions on the fly.

347
00:12:14,680 --> 00:12:16,440
You've already resolved every single permission

348
00:12:16,440 --> 00:12:18,240
the moment you index the document.

349
00:12:18,240 --> 00:12:20,440
The metadata is already there: it's encoded

350
00:12:20,440 --> 00:12:21,520
and it's ready to go.

351
00:12:21,520 --> 00:12:22,840
When the query finally arrives,

352
00:12:22,840 --> 00:12:24,400
the filtering is incredibly fast

353
00:12:24,400 --> 00:12:26,800
because the database just applies a simple expression

354
00:12:26,800 --> 00:12:28,800
before it ever returns a result.

355
00:12:28,800 --> 00:12:30,400
A real implementation looks like this.

356
00:12:30,400 --> 00:12:32,160
You start by setting up a SharePoint connector

357
00:12:32,160 --> 00:12:33,400
through Microsoft Graph.

358
00:12:33,400 --> 00:12:35,000
But instead of using the broad Sites.Read.All

359
00:12:35,000 --> 00:12:37,520
permission, which would give the connector access

360
00:12:37,520 --> 00:12:39,960
to every file in the company, you use

361
00:12:39,960 --> 00:12:40,760
Sites.Selected.

362
00:12:40,760 --> 00:12:42,240
This allows you to scope the access

363
00:12:42,240 --> 00:12:45,320
to only the specific sites you actually need to index.

364
00:12:45,320 --> 00:12:47,120
As the connector pulls each document,

365
00:12:47,120 --> 00:12:49,600
it calls the Microsoft Graph API to ask

366
00:12:49,600 --> 00:12:52,200
who can see the file and what the inheritance chain looks like.

367
00:12:52,200 --> 00:12:54,080
It resolves all of those complex rules

368
00:12:54,080 --> 00:12:55,840
into a simple list of GUIDs.

369
00:12:55,840 --> 00:12:58,000
Finally, it packages the document chunk

370
00:12:58,000 --> 00:12:59,280
with that permission data

371
00:12:59,280 --> 00:13:01,160
and sends it off to the vector database.
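
The "resolve and package" step could look roughly like the sketch below. The input entries are loosely modeled on the `grantedToV2` property of Microsoft Graph permission objects. The sample data, the function names, and the output field names are all hypothetical, and the code makes no Graph calls.

```python
from typing import Any


def resolve_grantees(permissions: list[dict[str, Any]]) -> tuple[set[str], set[str]]:
    """Split Graph-style permission entries into user and group object IDs.

    Entries without a named user or group (for example an anonymous
    sharing link) grant nobody here: the sketch fails closed.
    """
    user_ids: set[str] = set()
    group_ids: set[str] = set()
    for perm in permissions:
        grantee = perm.get("grantedToV2") or {}
        if "user" in grantee:
            user_ids.add(grantee["user"]["id"])
        if "group" in grantee:
            group_ids.add(grantee["group"]["id"])
    return user_ids, group_ids


def package_chunk(chunk_id: str, text: str, tenant_id: str,
                  permissions: list[dict[str, Any]]) -> dict[str, Any]:
    """Attach resolved permission metadata to a chunk before it is upserted."""
    user_ids, group_ids = resolve_grantees(permissions)
    return {
        "id": chunk_id,
        "content": text,
        "tenant_id": tenant_id,
        "allowed_user_ids": sorted(user_ids),
        "allowed_group_ids": sorted(group_ids),
    }
```

Dropping unnamed sharing links is a deliberate fail-closed choice. A production connector would need an explicit policy for links, guests, and SharePoint site groups, none of which this sketch models.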

372
00:13:01,160 --> 00:13:03,320
The real challenge is that the SharePoint permission model

373
00:13:03,320 --> 00:13:05,400
is incredibly messy to translate.

374
00:13:05,400 --> 00:13:07,920
You have inheritance rules and broken inheritance

375
00:13:07,920 --> 00:13:09,760
where a specific folder tells the system

376
00:13:09,760 --> 00:13:11,040
to ignore the parent rules.

377
00:13:11,040 --> 00:13:13,160
You have external guests with limited access

378
00:13:13,160 --> 00:13:14,680
and sharing links that grant permission

379
00:13:14,680 --> 00:13:16,120
to people without names.

380
00:13:16,120 --> 00:13:19,320
Direct assignments can override group memberships at any time.

381
00:13:19,320 --> 00:13:21,480
Getting this right requires serious engineering

382
00:13:21,480 --> 00:13:23,320
because you have to turn all that chaos

383
00:13:23,320 --> 00:13:26,080
into a metadata model your database can actually use.

384
00:13:26,080 --> 00:13:27,400
Most teams underestimate

385
00:13:27,400 --> 00:13:28,400
how hard this is.

386
00:13:28,400 --> 00:13:30,520
They assume pulling permissions is a simple checkbox,

387
00:13:30,520 --> 00:13:32,520
but it's actually a complex mapping exercise.

388
00:13:32,520 --> 00:13:35,160
Even though it's difficult, it is completely necessary.
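
Broken inheritance is one part of that mapping exercise. In the simplified model below, each node either inherits from its parent or carries its own unique grants. A node's effective grants are then those of the nearest ancestor, or the node itself, that breaks inheritance. The `Node` type and its fields are hypothetical. Guests, limited access, and sharing links are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A site, library, folder, or file in a simplified SharePoint-like tree."""
    name: str
    parent: Optional["Node"] = None
    # None means "inherits from parent"; a set means inheritance is broken
    # and this node carries its own unique grants (principal GUIDs).
    unique_grants: Optional[set[str]] = None


def effective_grants(node: Node) -> set[str]:
    """Walk up the tree to the nearest node that breaks inheritance."""
    current: Optional[Node] = node
    while current is not None:
        if current.unique_grants is not None:
            return set(current.unique_grants)
        current = current.parent
    return set()  # no grants anywhere: fail closed
```

For example, a file inside an HR folder with broken inheritance resolves to the HR group, while a sibling file directly under the site resolves to the site-wide group.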

389
00:13:35,160 --> 00:13:36,800
Once that metadata lives in your index

390
00:13:36,800 --> 00:13:39,120
and every chunk carries its own security badge,

391
00:13:39,120 --> 00:13:40,840
your retrieval layer finally has something

392
00:13:40,840 --> 00:13:43,560
it can enforce, and that is where the actual security begins.

393
00:13:43,560 --> 00:13:45,640
Authorization before retrieval, not after.

394
00:13:45,640 --> 00:13:46,920
This is the exact moment

395
00:13:46,920 --> 00:13:49,760
where most RAG deployments fail the test.

396
00:13:49,760 --> 00:13:52,920
It happens right when it matters most: at query time.

397
00:13:52,920 --> 00:13:54,040
The principle here is simple

398
00:13:54,040 --> 00:13:55,360
but most people get it wrong.

399
00:13:55,360 --> 00:13:56,840
You should never retrieve a document

400
00:13:56,840 --> 00:13:58,960
and then decide if the user is allowed to see it.

401
00:13:58,960 --> 00:14:01,800
That decision has to happen before the retrieval, not after.

402
00:14:01,800 --> 00:14:04,040
This distinction is much more important than it sounds.

403
00:14:04,040 --> 00:14:06,400
If you retrieve the data first and filter it later,

404
00:14:06,400 --> 00:14:08,120
you've already allowed sensitive information

405
00:14:08,120 --> 00:14:09,680
to influence the entire process.

406
00:14:09,680 --> 00:14:11,520
The vector database has already ranked it.

407
00:14:11,520 --> 00:14:13,720
The re-ranking model has already scored it.

408
00:14:13,720 --> 00:14:15,960
The document might have even made it into the context window

409
00:14:15,960 --> 00:14:17,600
where the LLM started thinking about it

410
00:14:17,600 --> 00:14:19,880
before you realized the user shouldn't have access.

411
00:14:19,880 --> 00:14:21,320
By the time your filter kicks in,

412
00:14:21,320 --> 00:14:23,880
the damage isn't being prevented because it's already happening.

413
00:14:23,880 --> 00:14:25,600
Post-filtering is a structural failure

414
00:14:25,600 --> 00:14:27,480
because sensitive data leaves a trail.

415
00:14:27,480 --> 00:14:30,520
It changes ranking scores and influences semantic caches.

416
00:14:30,520 --> 00:14:32,280
It shapes how the model reasons.

417
00:14:32,280 --> 00:14:34,680
Even if the user doesn't see the final text,

418
00:14:34,680 --> 00:14:36,280
a document that should have been invisible

419
00:14:36,280 --> 00:14:37,920
still changed the final answer.

420
00:14:37,920 --> 00:14:41,000
A filter that shows up too late cannot erase that influence.
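
The toy example below contrasts the two orderings on an in-memory list. The scores and document IDs are made up. With post-filtering, the restricted documents take the top slots, and in a real pipeline a reranker, cache, or context window would already have seen them. They are then dropped, leaving the user with nothing. Pre-filtering ranks only what the user may see.

```python
def top_k(docs: list[dict], scores: dict[str, float], k: int) -> list[dict]:
    """Highest-scoring k documents (scores stand in for vector similarity)."""
    return sorted(docs, key=lambda d: scores[d["id"]], reverse=True)[:k]


def post_filter(docs: list[dict], scores: dict[str, float],
                allowed: set[str], k: int) -> list[dict]:
    """Anti-pattern: rank everything first, then drop what the user can't see."""
    return [d for d in top_k(docs, scores, k) if d["id"] in allowed]


def pre_filter(docs: list[dict], scores: dict[str, float],
               allowed: set[str], k: int) -> list[dict]:
    """Restrict candidates to authorized documents, then rank."""
    return top_k([d for d in docs if d["id"] in allowed], scores, k)
```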

421
00:14:41,000 --> 00:14:44,120
Real security requires a completely different sequence of events.

422
00:14:44,120 --> 00:14:45,480
When a user asks a question,

423
00:14:45,480 --> 00:14:47,760
the system has to follow a very specific order.

424
00:14:47,760 --> 00:14:49,440
First, it must authenticate the user

425
00:14:49,440 --> 00:14:51,400
and verify their session to confirm they are

426
00:14:51,400 --> 00:14:52,760
who they say they are.

427
00:14:52,760 --> 00:14:55,000
Second, it has to resolve their group memberships

428
00:14:55,000 --> 00:14:55,920
in Entra ID.

429
00:14:55,920 --> 00:14:58,280
This is a huge step because permissions usually flow

430
00:14:58,280 --> 00:15:00,840
through groups rather than individual names.

431
00:15:00,840 --> 00:15:03,800
If the system doesn't know which groups a user belongs to,

432
00:15:03,800 --> 00:15:06,120
it has no way of knowing what they are allowed to see.

433
00:15:06,120 --> 00:15:08,440
Third, the system builds an authorization filter.

434
00:15:08,440 --> 00:15:09,920
It creates a query that says,

435
00:15:09,920 --> 00:15:12,600
"Only show documents where the tenant matches this user,

436
00:15:12,600 --> 00:15:15,320
and either the user's ID or one of their group IDs

437
00:15:15,320 --> 00:15:16,600
is in the allowed list."

438
00:15:16,600 --> 00:15:18,840
Fourth, it applies that filter to the vector search

439
00:15:18,840 --> 00:15:20,440
before the search even runs.

440
00:15:20,440 --> 00:15:21,800
The database narrows the field

441
00:15:21,800 --> 00:15:23,520
to only the authorized documents.

442
00:15:23,520 --> 00:15:26,040
Fifth, it retrieves only from that safe subset,

443
00:15:26,040 --> 00:15:27,440
only those documents come back,

444
00:15:27,440 --> 00:15:30,240
and only those documents can influence the final answer.
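
The five steps can be sketched end to end as follows. Session lookup and group resolution are plain dictionaries standing in for real token validation and an Entra ID directory query. The dot-product score stands in for the vector index's similarity metric. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Principal:
    user_id: str
    tenant_id: str
    group_ids: frozenset[str]


def authenticate(session: str, sessions: dict[str, tuple[str, str]]) -> tuple[str, str]:
    """Step 1: map a verified session to (user_id, tenant_id); stand-in for token validation."""
    if session not in sessions:
        raise PermissionError("unauthenticated session")
    return sessions[session]


def resolve_groups(user_id: str, memberships: dict[str, set[str]]) -> frozenset[str]:
    """Step 2: group lookup; stand-in for a directory query against Entra ID."""
    return frozenset(memberships.get(user_id, set()))


def build_filter(p: Principal) -> Callable[[dict], bool]:
    """Step 3: tenant matches AND (user ID OR any group ID is in the allowed lists)."""
    def allowed(chunk: dict) -> bool:
        return chunk["tenant_id"] == p.tenant_id and (
            p.user_id in chunk["allowed_user_ids"]
            or bool(p.group_ids & set(chunk["allowed_group_ids"]))
        )
    return allowed


def retrieve_for_user(session: str, query_vec: list[float], chunks: list[dict],
                      sessions: dict[str, tuple[str, str]],
                      memberships: dict[str, set[str]], k: int = 3) -> list[dict]:
    user_id, tenant_id = authenticate(session, sessions)
    principal = Principal(user_id, tenant_id, resolve_groups(user_id, memberships))
    allowed = build_filter(principal)
    # Step 4: narrow the candidate set before any similarity scoring happens.
    candidates = [c for c in chunks if allowed(c)]

    def similarity(c: dict) -> float:
        # Dot product as a toy stand-in for the vector index's similarity metric.
        return sum(a * b for a, b in zip(query_vec, c["vector"]))

    # Step 5: rank and return only from the authorized subset.
    return sorted(candidates, key=similarity, reverse=True)[:k]
```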

445
00:15:30,240 --> 00:15:33,160
The filter expression is simple in theory, but complex to build.

446
00:15:33,160 --> 00:15:35,080
You are essentially telling the database

447
00:15:35,080 --> 00:15:37,360
to find documents where the tenant ID is correct

448
00:15:37,360 --> 00:15:40,320
and the user's identity intersects with the allowed list.
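
In Azure AI Search, that expression would typically be an OData `$filter` string. Collection fields are matched with an `any` lambda, and a list of group IDs can be matched with `search.in`. The helper below is a sketch that builds such a string. The field names `tenant_id`, `allowed_user_ids`, and `allowed_group_ids` are hypothetical and would need to be filterable string or string-collection fields in your index. For vector queries, check that the service is set to apply the filter before the vector search, for example via the `vectorFilterMode` option.

```python
def build_security_filter(tenant_id: str, user_id: str, group_ids: list[str]) -> str:
    """Build an OData $filter string in Azure AI Search syntax (field names hypothetical)."""
    def lit(value: str) -> str:
        # OData string literals escape a single quote by doubling it.
        return "'" + value.replace("'", "''") + "'"

    clauses = [f"allowed_user_ids/any(u: u eq {lit(user_id)})"]
    if group_ids:
        # search.in with an explicit ',' delimiter; GUIDs never contain commas.
        joined = ",".join(group_ids)
        clauses.append(f"allowed_group_ids/any(g: search.in(g, {lit(joined)}, ','))")
    return f"tenant_id eq {lit(tenant_id)} and ({' or '.join(clauses)})"
```

Using `search.in` for the group list keeps the filter compact even when a user belongs to many groups, instead of chaining one `or` clause per group.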

449
00:15:40,320 --> 00:15:41,960
Notice what this does to the workflow.

450
00:15:41,960 --> 00:15:44,920
You aren't pulling everything and then sorting through the pile.

451
00:15:44,920 --> 00:15:46,320
You are only pulling the documents

452
00:15:46,320 --> 00:15:49,000
that meet the security criteria from the very start.

453
00:15:49,000 --> 00:15:51,440
The filtering is inseparable from the search itself.

454
00:15:51,440 --> 00:15:54,000
The database never even looks at unauthorized documents

455
00:15:54,000 --> 00:15:56,680
because they are outside the search area from the beginning.

456
00:15:56,680 --> 00:15:58,440
But there is a practical side to this

457
00:15:58,440 --> 00:16:00,120
that most architects miss.

458
00:16:00,120 --> 00:16:01,600
Permissions are not static.

459
00:16:01,600 --> 00:16:04,040
People leave projects, group memberships change,

460
00:16:04,040 --> 00:16:06,680
and files get reshared with new teams every day.

461
00:16:06,680 --> 00:16:09,400
The SharePoint model is alive and constantly shifting.

462
00:16:09,400 --> 00:16:11,400
Your system might have indexed everything on Tuesday,

463
00:16:11,400 --> 00:16:13,240
but if permissions changed on Wednesday,

464
00:16:13,240 --> 00:16:15,720
your Thursday query might be using stale data.

465
00:16:15,720 --> 00:16:18,600
This is why you need a plan for continuous authorization.

466
00:16:18,600 --> 00:16:20,840
You can't just assume the metadata you saved

467
00:16:20,840 --> 00:16:23,080
during ingestion is still 100% accurate.

468
00:16:23,080 --> 00:16:25,120
You generally have two ways to handle this.

469
00:16:25,120 --> 00:16:27,680
You can refresh your permission metadata on a tight schedule,

470
00:16:27,680 --> 00:16:29,560
which is thorough, but can get expensive,

471
00:16:29,560 --> 00:16:31,280
or you can use a fallback check.

472
00:16:31,280 --> 00:16:33,560
The database filter is based on the saved metadata,

473
00:16:33,560 --> 00:16:35,440
but before the results go to the LLM,

474
00:16:35,440 --> 00:16:37,520
the system makes a quick live call to SharePoint

475
00:16:37,520 --> 00:16:39,040
to confirm nothing has changed.

476
00:16:39,040 --> 00:16:41,320
This catches those rare cases where someone's access

477
00:16:41,320 --> 00:16:43,640
was revoked right after the document was indexed.
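
The fallback check can be sketched as a second gate that runs between the index results and the LLM. Here `live_check` is a hypothetical callable standing in for a real-time permission lookup against the source system. Any error counts as a denial, so a flaky call never lets a document through.

```python
from typing import Callable, Iterable


def recheck_live(results: Iterable[dict], user_id: str,
                 live_check: Callable[[str, str], bool]) -> list[dict]:
    """Confirm each index hit against the source system before generation.

    `live_check(user_id, doc_id)` is a hypothetical stand-in for a live lookup.
    """
    confirmed = []
    for doc in results:
        try:
            ok = live_check(user_id, doc["doc_id"])
        except Exception:
            ok = False  # fail closed on timeouts or lookup errors
        if ok:
            confirmed.append(doc)
    return confirmed
```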

478
00:16:43,640 --> 00:16:46,320
There is a performance cost to doing things the right way.

479
00:16:46,320 --> 00:16:48,560
One financial firm indexed half a million documents

480
00:16:48,560 --> 00:16:52,120
with this metadata and saw their latency go from 200 milliseconds

481
00:16:52,120 --> 00:16:54,480
to 250 milliseconds, a 25% slowdown.

482
00:16:54,480 --> 00:16:57,080
For most companies, that is a perfectly fine tradeoff.

483
00:16:57,080 --> 00:16:59,440
If you have a massive, high volume system,

484
00:16:59,440 --> 00:17:01,480
you'll need to optimize by caching group lookups

485
00:17:01,480 --> 00:17:04,760
or using database indexes designed for these specific checks.
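
Caching group lookups can be as simple as a time-bounded dictionary. In the sketch below, `fetch` is a hypothetical function that would query the directory. The TTL caps how long a revoked membership can linger, which ties back to the continuous-authorization point above. An injectable clock keeps the class testable.

```python
import time
from typing import Callable


class GroupCache:
    """Cache each user's group memberships for a short, bounded time."""

    def __init__(self, fetch: Callable[[str], frozenset[str]], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # hypothetical directory lookup
        self._ttl = ttl_seconds      # maximum staleness you are willing to accept
        self._clock = clock
        self._entries: dict[str, tuple[float, frozenset[str]]] = {}

    def groups_for(self, user_id: str) -> frozenset[str]:
        now = self._clock()
        hit = self._entries.get(user_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        groups = self._fetch(user_id)
        self._entries[user_id] = (now, groups)
        return groups
```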

486
00:17:04,760 --> 00:17:06,360
The tradeoff is a conscious choice.

487
00:17:06,360 --> 00:17:08,600
A slower search that is actually secure will always

488
00:17:08,600 --> 00:17:10,960
beat the fast search that leaks private data.

489
00:17:10,960 --> 00:17:12,760
And this is where the work gets even harder.

490
00:17:12,760 --> 00:17:15,120
A single retriever following these rules is one thing,

491
00:17:15,120 --> 00:17:17,520
but when you have multiple agents passing user context

492
00:17:17,520 --> 00:17:20,040
back and forth, you run into a completely different set

493
00:17:20,040 --> 00:17:20,800
of problems.

494
00:17:20,800 --> 00:17:23,040
Why one agent can't do everything.

495
00:17:23,040 --> 00:17:25,680
This is where the architecture question becomes unavoidable.

496
00:17:25,680 --> 00:17:27,960
Most organizations start with a single agent,

497
00:17:27,960 --> 00:17:30,400
which is usually just one LLM process.

498
00:17:30,400 --> 00:17:33,200
It receives the user's question, decides what to retrieve,

499
00:17:33,200 --> 00:17:35,760
gets the documents back, and then reasons over them

500
00:17:35,760 --> 00:17:36,720
to generate an answer.

501
00:17:36,720 --> 00:17:38,520
Everything happens in one place, one actor,

502
00:17:38,520 --> 00:17:40,120
one point of responsibility.

503
00:17:40,120 --> 00:17:43,640
The appeal is obvious because it's simpler and easier to build.

504
00:17:43,640 --> 00:17:45,440
You don't need to orchestrate multiple systems,

505
00:17:45,440 --> 00:17:47,600
and you don't need to think about how identity flows

506
00:17:47,600 --> 00:17:49,160
from one agent to another.

507
00:17:49,160 --> 00:17:50,960
You just point the agent at your indexed documents

508
00:17:50,960 --> 00:17:51,880
and let it work.

509
00:17:51,880 --> 00:17:55,280
But it will fail under pressure in specific, predictable ways.

510
00:17:55,280 --> 00:17:57,240
The term for this problem in security architecture

511
00:17:57,240 --> 00:17:59,240
is the confused deputy.

512
00:17:59,240 --> 00:18:01,880
It's an agent that holds all permissions simultaneously,

513
00:18:01,880 --> 00:18:03,800
meaning it can read anything and do anything

514
00:18:03,800 --> 00:18:05,760
without any built-in constraint on itself.

515
00:18:05,760 --> 00:18:07,520
When a user asks it a question, the agent

516
00:18:07,520 --> 00:18:09,760
doesn't think about what the user is allowed to see.

517
00:18:09,760 --> 00:18:13,000
Instead, it only thinks about what information it has access to

518
00:18:13,000 --> 00:18:14,280
that answers the question.

519
00:18:14,280 --> 00:18:15,600
Those are two completely different questions.

520
00:18:15,600 --> 00:18:18,920
The first is about authorization, while the second is about capability.

521
00:18:18,920 --> 00:18:20,560
A single agent conflates those two.

522
00:18:20,560 --> 00:18:22,880
It has the capability to retrieve everything in the index

523
00:18:22,880 --> 00:18:25,160
because it has broad permissions to the database,

524
00:18:25,160 --> 00:18:27,920
and it treats that capability as a guide for what to do.

525
00:18:27,920 --> 00:18:30,040
If the information exists and answers the query,

526
00:18:30,040 --> 00:18:31,400
the agent retrieves it.

527
00:18:31,400 --> 00:18:33,160
Now compound that with prompt injection.

528
00:18:33,160 --> 00:18:36,160
A malicious user, or even a malicious document inside your index,

529
00:18:36,160 --> 00:18:39,440
crafts a prompt designed to make the agent ignore its intended purpose.

530
00:18:39,440 --> 00:18:42,360
It might be something subtle that manipulates the agent

531
00:18:42,360 --> 00:18:45,840
into exfiltrating data through a seemingly innocent query.

532
00:18:45,840 --> 00:18:47,760
The single agent has no checkpoint to stop it.

533
00:18:47,760 --> 00:18:50,600
There's no second system verifying if the request is legitimate,

534
00:18:50,600 --> 00:18:53,640
and there's no gatekeeper saying, "You're not supposed to access that."

535
00:18:53,640 --> 00:18:54,840
The agent just executes.

536
00:18:54,840 --> 00:18:57,240
It retrieves, it reasons, it outputs.

537
00:18:57,240 --> 00:18:59,360
The failure mode is worse than simple data leakage

538
00:18:59,360 --> 00:19:02,040
because it is a case of confused delegation.

539
00:19:02,040 --> 00:19:04,400
The user never explicitly asked for the sensitive data,

540
00:19:04,400 --> 00:19:06,960
but the agent was manipulated into surfacing it anyway

541
00:19:06,960 --> 00:19:09,280
while operating under one set of instructions.

542
00:19:09,280 --> 00:19:10,960
And because the agent is the final authority

543
00:19:10,960 --> 00:19:12,920
with nobody validating what it retrieved,

544
00:19:12,920 --> 00:19:14,240
the damage happens silently.

545
00:19:14,240 --> 00:19:15,680
Here's what happens in practice.

546
00:19:15,680 --> 00:19:19,080
A junior employee asks the system a question about a client project,

547
00:19:19,080 --> 00:19:20,760
and while the question is legitimate,

548
00:19:20,760 --> 00:19:23,760
there is instruction text embedded in the indexed documents.

549
00:19:23,760 --> 00:19:26,520
A well-meaning senior employee put a note there

550
00:19:26,520 --> 00:19:29,680
saying this section contains pricing strategy and margin analysis

551
00:19:29,680 --> 00:19:33,480
and should only be shown to accounts over $5 million in ARR.

552
00:19:33,480 --> 00:19:35,360
The single agent retrieves that section,

553
00:19:35,360 --> 00:19:37,280
but it doesn't understand the note as a boundary.

554
00:19:37,280 --> 00:19:39,080
It understands it as context.

555
00:19:39,080 --> 00:19:41,680
It includes the pricing and margin information in its answer

556
00:19:41,680 --> 00:19:43,880
because the information is relevant to the query.

557
00:19:43,880 --> 00:19:46,400
The junior employee now has access to pricing data

558
00:19:46,400 --> 00:19:47,840
they weren't supposed to see.

559
00:19:47,840 --> 00:19:49,960
This didn't happen because they hacked the system

560
00:19:49,960 --> 00:19:51,760
or manipulated the agent directly.

561
00:19:51,760 --> 00:19:54,760
It happened because the single agent was the only decision maker

562
00:19:54,760 --> 00:19:56,600
and had nobody checking its work.

563
00:19:56,600 --> 00:19:59,040
The individual agent also doesn't know what it doesn't know.

564
00:19:59,040 --> 00:20:01,840
It can't distinguish between a document marked confidential

565
00:20:01,840 --> 00:20:04,640
for executives and a document that is public.

566
00:20:04,640 --> 00:20:07,200
To the agent, those are just different documents

567
00:20:07,200 --> 00:20:08,440
in its retrieval results.

568
00:20:08,440 --> 00:20:09,560
It treats them the same.

569
00:20:09,560 --> 00:20:11,560
Each gets a confidence score and a relevance ranking.

570
00:20:11,560 --> 00:20:13,960
There is no categorical difference and no permission boundary.

571
00:20:13,960 --> 00:20:16,040
The document either answers the question or it doesn't.

572
00:20:16,040 --> 00:20:17,920
This is why Gartner's data matters here.

573
00:20:17,920 --> 00:20:22,640
They predict that 40% of agentic AI projects will be canceled by 2027.

574
00:20:22,640 --> 00:20:24,960
The reason isn't model quality or retrieval accuracy.

575
00:20:24,960 --> 00:20:25,760
It's governance.

576
00:20:25,760 --> 00:20:28,240
It's the realization, usually late in deployment,

577
00:20:28,240 --> 00:20:30,240
that a single agent making all the decisions

578
00:20:30,240 --> 00:20:33,200
creates liability that the organization isn't prepared to accept.

579
00:20:33,200 --> 00:20:37,080
The false economy of building one agent is that you save time upfront

580
00:20:37,080 --> 00:20:39,440
but you trade that time for risk downstream.

581
00:20:39,440 --> 00:20:42,960
When that risk materializes and data leaks or prompt injection succeeds,

582
00:20:42,960 --> 00:20:44,960
the cost of fixing it is exponentially higher

583
00:20:44,960 --> 00:20:47,160
than the cost of building it right initially.

584
00:20:47,160 --> 00:20:48,680
The solution isn't to give up on RAG.

585
00:20:48,680 --> 00:20:50,240
It's to distribute responsibility.

586
00:20:50,240 --> 00:20:52,240
You need to add checkpoints and create agents

587
00:20:52,240 --> 00:20:55,440
with specific roles, specific permissions and specific constraints.

588
00:20:55,440 --> 00:20:57,280
That's the multi-agent model.

589
00:20:57,280 --> 00:20:59,160
The five-agent retrieval model.

590
00:20:59,160 --> 00:21:01,240
The alternative is to think about the retrieval problem

591
00:21:01,240 --> 00:21:03,400
as a sequence of distinct responsibilities.

592
00:21:03,400 --> 00:21:05,160
Each one is owned by an agent

593
00:21:05,160 --> 00:21:07,720
with a specific mandate and specific constraints.

594
00:21:07,720 --> 00:21:09,360
Not all agents need broad access

595
00:21:09,360 --> 00:21:11,800
and not all agents need to see the full permission model.

596
00:21:11,800 --> 00:21:14,560
Some agents don't touch data at all because they handle routing

597
00:21:14,560 --> 00:21:16,480
while others validate or generate.

598
00:21:16,480 --> 00:21:19,840
The idea is that by breaking the monolith into specialized components,

599
00:21:19,840 --> 00:21:23,280
you distribute authority in a way that prevents any single actor

600
00:21:23,280 --> 00:21:24,480
from making all the decisions.

601
00:21:24,480 --> 00:21:25,840
Here's how it works in practice.

602
00:21:25,840 --> 00:21:27,160
Agent one is the router.

603
00:21:27,160 --> 00:21:29,880
Its job is to understand what the user is actually asking for

604
00:21:29,880 --> 00:21:31,800
and determine if the request is legitimate.

605
00:21:31,800 --> 00:21:34,040
It doesn't touch documents and it doesn't retrieve anything.

606
00:21:34,040 --> 00:21:36,040
It receives the natural language question

607
00:21:36,040 --> 00:21:38,680
and evaluates whether it is a reasonable request.

608
00:21:38,680 --> 00:21:40,880
It asks if this person is asking for something

609
00:21:40,880 --> 00:21:42,440
that makes sense within their role.

610
00:21:42,440 --> 00:21:44,720
It checks if the question triggers any red flags

611
00:21:44,720 --> 00:21:46,560
like patterns that look like prompt injection

612
00:21:46,560 --> 00:21:48,480
or data exfiltration attempts.

613
00:21:48,480 --> 00:21:51,120
The router makes a simple decision to either proceed or block

614
00:21:51,120 --> 00:21:52,000
if something looks wrong.

615
00:21:52,000 --> 00:21:53,240
The request stops here.

616
00:21:53,240 --> 00:21:54,200
No retrieval happens.

617
00:21:54,200 --> 00:21:55,120
No data is at risk.

618
00:21:55,120 --> 00:21:56,840
Agent two is the query translator.

619
00:21:56,840 --> 00:21:58,720
Once the router has approved the request,

620
00:21:58,720 --> 00:22:01,080
the translator's job is to convert natural language

621
00:22:01,080 --> 00:22:02,960
into a structured retrieval query.

622
00:22:02,960 --> 00:22:04,400
The user asks the question in English

623
00:22:04,400 --> 00:22:07,080
and the translator transforms that into the exact parameters

624
00:22:07,080 --> 00:22:08,360
the retrieval system needs.

625
00:22:08,360 --> 00:22:10,240
It determines what keywords to search for,

626
00:22:10,240 --> 00:22:11,760
what semantic meaning to match,

627
00:22:11,760 --> 00:22:13,520
and what metadata filters to apply.

628
00:22:13,520 --> 00:22:15,080
But here's the critical part.

629
00:22:15,080 --> 00:22:17,280
The translator never decides which documents the user

630
00:22:17,280 --> 00:22:18,160
is allowed to see.

631
00:22:18,160 --> 00:22:19,320
It only shapes the query.

632
00:22:19,320 --> 00:22:20,920
It has no access to permission models

633
00:22:20,920 --> 00:22:22,440
and no authorization logic.

634
00:22:22,440 --> 00:22:24,040
It's a pure transformation function.

635
00:22:24,040 --> 00:22:26,400
Agent three is the retriever with authorization.

636
00:22:26,400 --> 00:22:28,120
This is where the security boundary lives.

637
00:22:28,120 --> 00:22:30,920
This agent takes the translated query from Agent two

638
00:22:30,920 --> 00:22:33,240
and executes it against the vector database.

639
00:22:33,240 --> 00:22:34,400
But before it returns anything,

640
00:22:34,400 --> 00:22:36,240
it applies the user's permission filter.

641
00:22:36,240 --> 00:22:37,920
It constructs the authorization expression

642
00:22:37,920 --> 00:22:39,720
we talked about earlier and verifies

643
00:22:39,720 --> 00:22:41,800
if this user has access to this document.

644
00:22:41,800 --> 00:22:43,120
It acts as a bouncer.

645
00:22:43,120 --> 00:22:44,680
Documents that fail the permission check

646
00:22:44,680 --> 00:22:46,560
never make it out of this agent.

647
00:22:46,560 --> 00:22:47,840
The retriever has a simple rule.

648
00:22:47,840 --> 00:22:50,360
If the user can't access it, they don't get to see it.

649
00:22:50,360 --> 00:22:51,200
Period.

650
00:22:51,200 --> 00:22:52,640
The retriever is narrow in scope,

651
00:22:52,640 --> 00:22:54,520
but absolute in its enforcement.

652
00:22:54,520 --> 00:22:55,920
Agent four is the validator.

653
00:22:55,920 --> 00:22:58,280
The retriever sent back a set of authorized documents,

654
00:22:58,280 --> 00:23:00,680
but the validator asks if they are the right documents.

655
00:23:00,680 --> 00:23:03,000
It checks if this collection of sources

656
00:23:03,000 --> 00:23:05,320
actually makes sense for answering the user's question

657
00:23:05,320 --> 00:23:06,920
and if it fits this person's role.

658
00:23:06,920 --> 00:23:10,040
It looks for inconsistencies or red flags in the retrieved set.

659
00:23:10,040 --> 00:23:12,640
The validator is a quality gate that can refuse to proceed

660
00:23:12,640 --> 00:23:14,960
if the retrieved documents don't pass validation.

661
00:23:14,960 --> 00:23:17,400
If a financial analyst queries about production logistics

662
00:23:17,400 --> 00:23:19,360
and the retriever comes back with internal memos

663
00:23:19,360 --> 00:23:20,480
from the finance department,

664
00:23:20,480 --> 00:23:22,560
the validator flags that mismatch.

665
00:23:22,560 --> 00:23:24,680
It prevents nonsensical or suspicious retrievals

666
00:23:24,680 --> 00:23:25,960
from reaching the next step.

667
00:23:25,960 --> 00:23:27,400
Agent five is the generator.

668
00:23:27,400 --> 00:23:29,200
This agent receives only the documents

669
00:23:29,200 --> 00:23:31,560
that have passed authorization and validation.

670
00:23:31,560 --> 00:23:33,000
Its job is straightforward.

671
00:23:33,000 --> 00:23:35,160
It creates a coherent, well-sourced answer

672
00:23:35,160 --> 00:23:37,080
using only the context provided.

673
00:23:37,080 --> 00:23:39,600
Here's what's crucial about the generator's position.

674
00:23:39,600 --> 00:23:41,280
It never sees the full permission model.

675
00:23:41,280 --> 00:23:44,120
It doesn't know which groups could access which documents

676
00:23:44,120 --> 00:23:46,040
and it doesn't have the authorization logic.

677
00:23:46,040 --> 00:23:48,160
It only sees documents that have already been vetted

678
00:23:48,160 --> 00:23:49,840
as appropriate for this user.

679
00:23:49,840 --> 00:23:52,280
The generator operates under a fundamental constraint

680
00:23:52,280 --> 00:23:54,280
where it can only work with what it was given.

681
00:23:54,280 --> 00:23:56,720
It cannot fetch additional context on its own

682
00:23:56,720 --> 00:23:59,120
and it cannot bypass the authorization layer.

683
00:23:59,120 --> 00:24:01,560
It has no way to know that sensitive documents exist

684
00:24:01,560 --> 00:24:04,000
because they were filtered out before it ever saw them.

685
00:24:04,000 --> 00:24:05,480
Each agent has a role.

686
00:24:05,480 --> 00:24:07,280
Each agent has specific permissions.

687
00:24:07,280 --> 00:24:09,440
Each agent has specific responsibility.

688
00:24:09,440 --> 00:24:10,800
The router guards the gate.

689
00:24:10,800 --> 00:24:12,480
The translator shapes the request.

690
00:24:12,480 --> 00:24:14,320
The retriever enforces access.

691
00:24:14,320 --> 00:24:15,760
The validator ensures coherence.

692
00:24:15,760 --> 00:24:17,240
The generator creates the answer.
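
A compact way to see the separation of duties is to write each agent as a deterministic stand-in for what would often be an LLM-backed component. In the sketch below, the injection markers, the keyword translator, and the department-based validator rule are all hypothetical simplifications. What it does show is structural: only the retriever holds the authorization predicate, and the generator only ever receives vetted documents.

```python
from typing import Callable

# Hypothetical markers; a real router would use a classifier or an LLM, not a list.
INJECTION_MARKERS = ("ignore previous instructions", "reveal your system prompt")


def router(question: str) -> bool:
    """Agent 1: proceed or block. Never touches documents."""
    lowered = question.lower()
    return not any(marker in lowered for marker in INJECTION_MARKERS)


def translator(question: str) -> list[str]:
    """Agent 2: natural language in, query terms out. No permission data."""
    return [w.strip("?.,!").lower() for w in question.split() if len(w) > 3]


def retriever(terms: list[str], chunks: list[dict],
              allowed: Callable[[dict], bool]) -> list[dict]:
    """Agent 3: the only stage that holds the authorization predicate."""
    return [c for c in chunks
            if allowed(c) and any(t in c["text"].lower() for t in terms)]


def validator(docs: list[dict], role_departments: set[str]) -> list[dict]:
    """Agent 4: refuse the whole set if any source falls outside the requester's role."""
    if any(d["department"] not in role_departments for d in docs):
        return []
    return docs


def generator(docs: list[dict]) -> str:
    """Agent 5: sees only vetted documents and cannot fetch more on its own."""
    return "Answer drawn from: " + ", ".join(d["id"] for d in docs)


def run_pipeline(question: str, chunks: list[dict],
                 allowed: Callable[[dict], bool], role_departments: set[str]) -> str:
    if not router(question):
        return "Request blocked."
    docs = validator(retriever(translator(question), chunks, allowed), role_departments)
    if not docs:
        return "No authorized, relevant sources found."
    return generator(docs)
```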

693
00:24:17,240 --> 00:24:19,120
This separation of duties changes everything.

694
00:24:19,120 --> 00:24:21,040
No single agent holds all permissions

695
00:24:21,040 --> 00:24:23,000
and no single agent makes all decisions.

696
00:24:23,000 --> 00:24:25,480
The permission model itself doesn't live in the LLM.

697
00:24:25,480 --> 00:24:26,400
It lives in the retriever,

698
00:24:26,400 --> 00:24:29,040
which is a deterministic component outside the model.

699
00:24:29,040 --> 00:24:31,080
Microsoft's multi-agent reference architecture

700
00:24:31,080 --> 00:24:33,320
uses this pattern for Copilot deployments

701
00:24:33,320 --> 00:24:34,920
because it works at scale.

702
00:24:34,920 --> 00:24:37,400
And it requires something that basic RAG doesn't demand,

703
00:24:37,400 --> 00:24:39,240
which is strong identity and access control

704
00:24:39,240 --> 00:24:40,480
at every single hop.

705
00:24:40,480 --> 00:24:42,480
From user token to agent action.

706
00:24:42,480 --> 00:24:44,560
We need to look at how identity actually moves

707
00:24:44,560 --> 00:24:46,080
through a multi-agent system

708
00:24:46,080 --> 00:24:48,000
because having five specialized agents

709
00:24:48,000 --> 00:24:50,320
is only useful if every single one knows exactly

710
00:24:50,320 --> 00:24:51,280
who they are serving.

711
00:24:51,280 --> 00:24:52,840
When a user logs in through Entra ID,

712
00:24:52,840 --> 00:24:55,480
the system validates them and issues a digital token.

713
00:24:55,480 --> 00:24:58,720
This token carries specific details like the user's unique ID,

714
00:24:58,720 --> 00:25:01,520
their group memberships, their role, and their tenant.

715
00:25:01,520 --> 00:25:03,160
Because it is cryptographically signed,

716
00:25:03,160 --> 00:25:04,960
nobody can forge or alter it without detection.

717
00:25:04,960 --> 00:25:08,320
So the token stands as proof that says this is user A

718
00:25:08,320 --> 00:25:10,200
and here is what they are allowed to do.

719
00:25:10,200 --> 00:25:11,760
The obvious temptation is to just pass

720
00:25:11,760 --> 00:25:14,160
that raw token to every agent in the chain.

721
00:25:14,160 --> 00:25:16,520
You might think it makes sense to let each agent use it

722
00:25:16,520 --> 00:25:19,040
to prove the user's identity as they talk to one another.

723
00:25:19,040 --> 00:25:21,040
But in reality, that is a massive mistake.

724
00:25:21,040 --> 00:25:24,160
You should never pass raw user tokens directly to an LLM

725
00:25:24,160 --> 00:25:26,000
or share them freely between agents.

726
00:25:26,000 --> 00:25:28,320
The problem isn't that your agents are untrustworthy

727
00:25:28,320 --> 00:25:29,960
since these are your own systems,

728
00:25:29,960 --> 00:25:32,840
but rather that the token itself becomes a massive security

729
00:25:32,840 --> 00:25:33,680
liability.

730
00:25:33,680 --> 00:25:36,240
If an agent gets compromised or a prompt injection trick

731
00:25:36,240 --> 00:25:38,560
forces an agent to reveal what it knows,

732
00:25:38,560 --> 00:25:41,600
that raw token becomes the crown jewel for an attacker.

733
00:25:41,600 --> 00:25:44,000
It can be replayed in other systems or even sold,

734
00:25:44,000 --> 00:25:46,680
acting as a secret key that grants access to places

735
00:25:46,680 --> 00:25:48,000
the user never authorized.

736
00:25:48,000 --> 00:25:51,440
Instead, you should use OAuth 2.0 token exchange, which

737
00:25:51,440 --> 00:25:53,480
acts like a translation layer for identity.

738
00:25:53,480 --> 00:25:56,000
The gateway receives the user's original token

739
00:25:56,000 --> 00:25:57,840
and confirms that the person is real,

740
00:25:57,840 --> 00:25:59,120
but then it does something different.

741
00:25:59,120 --> 00:26:01,200
Instead of passing that original credential forward,

742
00:26:01,200 --> 00:26:03,840
the gateway creates a new, scoped token.

743
00:26:03,840 --> 00:26:06,040
This is a derived token that tells Agent 3

744
00:26:06,040 --> 00:26:09,400
it is authorized to act for user A for exactly 15 minutes

745
00:26:09,400 --> 00:26:12,000
and only for the purpose of retrieving documents.
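To make the exchange step concrete, here is a minimal sketch of the form body a gateway could send to a token endpoint under OAuth 2.0 Token Exchange (RFC 8693). The grant-type and token-type URNs are the ones RFC 8693 defines; the audience `retriever-agent` and the scope `documents.read` are hypothetical. Microsoft Entra ID's documented delegation mechanism is the On-Behalf-Of flow, which uses different parameters, so read this as a protocol illustration rather than an Entra ID call.

```python
from urllib.parse import parse_qs, urlencode

TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def build_token_exchange_body(user_token: str, audience: str, scope: str) -> str:
    """Form-encode an RFC 8693 token exchange request.

    The gateway presents the user's original token as the subject_token and
    asks for a new token limited to one audience (the downstream agent) and
    one scope. The original token itself is never forwarded to the agent.
    """
    params = {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": user_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "requested_token_type": ACCESS_TOKEN_TYPE,
        "audience": audience,  # hypothetical agent identifier
        "scope": scope,        # hypothetical scope name
    }
    return urlencode(params)


body = build_token_exchange_body("user-a-original-token", "retriever-agent", "documents.read")
print(parse_qs(body)["audience"])  # ['retriever-agent']
```

RFC 8693 has no request parameter for token lifetime; a 15-minute expiry like the one described would be set by the issuing server's policy and reported back in the response's `expires_in` field.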

746
00:26:12,000 --> 00:26:14,840
Every agent in your chain receives its own limited token.

747
00:26:14,840 --> 00:26:16,640
The router gets a token that only allows it

748
00:26:16,640 --> 00:26:19,280
to evaluate requests, while the translator gets one

749
00:26:19,280 --> 00:26:21,920
that lets it shape queries without ever seeing a document.

750
00:26:21,920 --> 00:26:23,320
The retriever receives a token that

751
00:26:23,320 --> 00:26:27,240
says, "Act for user A with user A's specific access level."

752
00:26:27,240 --> 00:26:30,400
And the validator gets read-only access to check the results.

753
00:26:30,400 --> 00:26:33,160
Finally, the generator gets no special access at all,

754
00:26:33,160 --> 00:26:35,760
since its only job is to process the text it receives.
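The per-agent scoping above can be pictured as a grant table. The sketch below is hypothetical throughout: the agent names, scope strings, and the choice of which user claims each agent keeps are illustrative, and signing is deliberately left out so the focus stays on what each derived token contains. Claim names follow common JWT conventions (`sub`, `aud`, `iat`, `exp`).

```python
import time
from typing import Dict, List, Optional

# Hypothetical grant table: the scopes and user claims each agent may receive.
AGENT_GRANTS: Dict[str, Dict[str, List[str]]] = {
    "router": {"scopes": ["request.evaluate"], "claims": ["sub"]},
    "translator": {"scopes": ["query.shape"], "claims": ["sub"]},
    "retriever": {"scopes": ["documents.read"], "claims": ["sub", "groups"]},
    "validator": {"scopes": ["metadata.read"], "claims": ["sub"]},
    "generator": {"scopes": [], "claims": []},  # works only on text it is handed
}

TOKEN_LIFETIME_SECONDS = 15 * 60  # the 15-minute window from the example


def derive_agent_claims(user_claims: dict, agent: str, now: Optional[float] = None) -> dict:
    """Build the claim set for a scoped token issued to a single agent.

    Only the user claims this agent needs are copied; everything else is dropped.
    An unknown agent name raises KeyError, so nothing is issued by default.
    """
    grant = AGENT_GRANTS[agent]
    issued_at = int(time.time() if now is None else now)
    derived = {name: user_claims[name] for name in grant["claims"] if name in user_claims}
    derived.update({
        "aud": agent,
        "scope": " ".join(grant["scopes"]),
        "iat": issued_at,
        "exp": issued_at + TOKEN_LIFETIME_SECONDS,
    })
    return derived


user = {"sub": "user-a", "groups": ["sales", "north-america"], "tid": "contoso"}
print(derive_agent_claims(user, "retriever", now=1_000))
print(derive_agent_claims(user, "generator", now=1_000))
```

The retriever's claims keep `sub` and `groups` because it must filter on them, while the generator's contain only `aud`, an empty `scope`, and the timestamps.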

755
00:26:35,760 --> 00:26:37,760
This is the core principle of delegation.

756
00:26:37,760 --> 00:26:39,880
Agent 3 isn't acting as a powerful service account

757
00:26:39,880 --> 00:26:42,480
with broad access, but is instead acting as a deputy

758
00:26:42,480 --> 00:26:44,560
with the exact same authority as the user.

759
00:26:44,560 --> 00:26:46,480
If user A is allowed to see the sales folder

760
00:26:46,480 --> 00:26:48,120
but is blocked from the finance folder,

761
00:26:48,120 --> 00:26:50,560
then Agent 3 will be blocked from finance as well.

762
00:26:50,560 --> 00:26:53,200
Let's look at a real scenario where user A works in sales

763
00:26:53,200 --> 00:26:55,680
and belongs to the sales, North America, and product feedback groups.

764
00:26:55,680 --> 00:26:57,280
When they ask the system a question,

765
00:26:57,280 --> 00:26:59,320
the retriever receives a scoped token

766
00:26:59,320 --> 00:27:01,840
that encodes those specific group memberships.

767
00:27:01,840 --> 00:27:03,520
The retriever then builds a filter

768
00:27:03,520 --> 00:27:06,320
that only returns documents whose access list includes user A

769
00:27:06,320 --> 00:27:08,960
or one of those three specific groups.

770
00:27:08,960 --> 00:27:11,120
The retrieval happens under that strict constraint,

771
00:27:11,120 --> 00:27:13,600
so the user sees their sales and feedback documents

772
00:27:13,600 --> 00:27:16,680
but never catches a glimpse of HR or executive files.
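Azure AI Search documents a security-trimming pattern in which each indexed document carries filterable collections of allowed principals, and the query filters on them with `search.in`. A minimal sketch of building such a filter for the scenario above follows; the field names `user_ids` and `group_ids` are assumptions about the index schema, not fixed names.

```python
from typing import List


def odata_quote(value: str) -> str:
    """Escape a value for an OData string literal (single quotes are doubled)."""
    return value.replace("'", "''")


def build_security_filter(user_id: str, group_ids: List[str]) -> str:
    """Keep only documents shared with the user directly or with one of their groups.

    Assumes hypothetical index fields `user_ids` and `group_ids`, both
    filterable string collections. Entra group object IDs are GUIDs, so a
    comma is safe as the search.in delimiter.
    """
    clauses = [f"user_ids/any(u: u eq '{odata_quote(user_id)}')"]
    if group_ids:
        joined = ",".join(odata_quote(g) for g in group_ids)
        clauses.append(f"group_ids/any(g: search.in(g, '{joined}', ','))")
    return " or ".join(clauses)


print(build_security_filter("user-a", ["sales", "north-america", "product-feedback"]))
```

The resulting string would be sent as the query's filter (`$filter` in the REST API, the `filter` argument of `SearchClient.search` in the Python SDK), so trimming happens inside the search rather than after results come back.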

773
00:27:16,680 --> 00:27:19,040
Handling group memberships is where the technical complexity

774
00:27:19,040 --> 00:27:19,960
usually kicks in.

775
00:27:19,960 --> 00:27:22,320
You generally have two choices for how to handle this.

776
00:27:22,320 --> 00:27:24,080
First, you can resolve groups at the moment

777
00:27:24,080 --> 00:27:26,480
of authentication by fetching everything from Entra ID

778
00:27:26,480 --> 00:27:27,840
and putting it in the token.

779
00:27:27,840 --> 00:27:30,400
This is fast, but if a user joins a new group,

780
00:27:30,400 --> 00:27:31,880
the token won't show that change

781
00:27:31,880 --> 00:27:33,800
until they log out and log back in.

782
00:27:33,800 --> 00:27:35,720
The second option is to cache group memberships

783
00:27:35,720 --> 00:27:36,960
with a short lifespan,

784
00:27:36,960 --> 00:27:39,360
meaning the retriever refreshes the list every few minutes

785
00:27:39,360 --> 00:27:41,160
so changes propagate quickly.
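The second option, a short-lived membership cache, might look like the sketch below. `fetch_groups` is a hypothetical stand-in for a directory lookup, and the five-minute default TTL is an assumption; the clock is injectable so expiry can be tested without waiting.

```python
import time
from typing import Callable, Dict, List, Tuple


class GroupMembershipCache:
    """Caches each user's group IDs for a short time-to-live."""

    def __init__(
        self,
        fetch_groups: Callable[[str], List[str]],
        ttl_seconds: float = 300,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch_groups  # hypothetical directory lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, user_id: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no directory call
        groups = self._fetch(user_id)  # missing or expired: refresh from the source
        self._entries[user_id] = (now, groups)
        return groups
```

A shorter TTL narrows the window in which a removed user keeps old access, at the cost of more directory calls.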

786
00:27:41,160 --> 00:27:43,680
There is also a technical hurdle called group overage

787
00:27:43,680 --> 00:27:45,080
that you need to watch out for.

788
00:27:45,080 --> 00:27:47,800
If a user belongs to more than 150 groups,

789
00:27:47,800 --> 00:27:51,280
Entra ID cannot fit all those IDs into a standard token.

790
00:27:51,280 --> 00:27:52,280
Once you hit that ceiling,

791
00:27:52,280 --> 00:27:54,560
the system has to choose between dropping group info

792
00:27:54,560 --> 00:27:55,960
or making extra API calls.

793
00:27:55,960 --> 00:27:58,200
This matters because if the group information is missing,

794
00:27:58,200 --> 00:28:00,280
your authorization filter won't know which documents

795
00:28:00,280 --> 00:28:01,320
to show the user.
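For reference, Microsoft's documentation gives the overage limit as 150 groups for SAML tokens and 200 for JWTs. When it is exceeded, Entra ID leaves the `groups` claim out and adds an overage indicator (`_claim_names` / `_claim_sources` in JWTs, or `hasgroups` in the implicit flow), signalling that the application should query Microsoft Graph instead. The sketch below detects that case; `fetch_from_directory` is a hypothetical callback for the Graph lookup, and returning an empty list when no group information exists is a fail-closed design choice, not something Entra ID mandates.

```python
from typing import Callable, List


def resolve_groups(claims: dict, fetch_from_directory: Callable[[str], List[str]]) -> List[str]:
    """Return the user's group IDs, falling back to a directory lookup on overage.

    Fails closed: with no groups claim and no overage signal, an empty list is
    returned, so the filter only matches documents shared with the user directly.
    """
    if "groups" in claims:
        return list(claims["groups"])
    overage = "groups" in claims.get("_claim_names", {}) or claims.get("hasgroups") is True
    if overage:
        # The token could not hold every group; look them up instead of guessing.
        return fetch_from_directory(claims["oid"])
    return []
```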

796
00:28:01,320 --> 00:28:04,080
This identity model is the foundation of the whole system.

797
00:28:04,080 --> 00:28:06,640
It ensures that every agent acts with the right authority

798
00:28:06,640 --> 00:28:09,440
and it keeps the user's actual credentials safe.

799
00:28:09,440 --> 00:28:11,560
Why policy can't live in the prompt.

800
00:28:11,560 --> 00:28:13,760
This is where the architecture has to make a hard choice

801
00:28:13,760 --> 00:28:15,840
about where authority actually lives.

802
00:28:15,840 --> 00:28:18,040
When teams start building access controls for RAG systems,

803
00:28:18,040 --> 00:28:20,000
they usually have the same intuitive idea.

804
00:28:20,000 --> 00:28:22,200
They know the LLM understands language and context,

805
00:28:22,200 --> 00:28:24,120
so they try to put the authorization rules

806
00:28:24,120 --> 00:28:25,360
right in the system prompt.

807
00:28:25,360 --> 00:28:27,360
They tell the model to only show public documents

808
00:28:27,360 --> 00:28:29,680
and to never reveal anything tagged as confidential.

809
00:28:29,680 --> 00:28:30,920
It sounds like a reasonable plan

810
00:28:30,920 --> 00:28:33,240
because the model is smart and follows instructions.

811
00:28:33,240 --> 00:28:35,160
But the truth is, it won't work reliably.

812
00:28:35,160 --> 00:28:36,520
The reason is simple.

813
00:28:36,520 --> 00:28:39,560
LLMs do not enforce rules; they follow instructions.

814
00:28:39,560 --> 00:28:41,680
Instructions can be overridden or bypassed

815
00:28:41,680 --> 00:28:45,080
through prompt injection, and a user can easily craft a prompt

816
00:28:45,080 --> 00:28:47,080
that conflicts with your original setup.

817
00:28:47,080 --> 00:28:49,040
If a retrieved document contains text

818
00:28:49,040 --> 00:28:50,440
that contradicts your rules,

819
00:28:50,440 --> 00:28:52,840
the model doesn't have a way to prioritize one over the other.

820
00:28:52,840 --> 00:28:54,200
It just looks at all the information

821
00:28:54,200 --> 00:28:56,080
and makes a probabilistic guess,

822
00:28:56,080 --> 00:28:59,000
which means it will eventually violate your security rules.

823
00:28:59,000 --> 00:29:01,040
This isn't a problem with the quality of the model.

824
00:29:01,040 --> 00:29:03,000
Even the most advanced models fail at this

825
00:29:03,000 --> 00:29:04,640
because you are asking them to do something

826
00:29:04,640 --> 00:29:06,840
that should be handled entirely outside of the AI.

827
00:29:06,840 --> 00:29:08,960
Real security requires a policy decision point,

828
00:29:08,960 --> 00:29:11,440
or PDP, which sits outside the model.

829
00:29:11,440 --> 00:29:12,800
This is a deterministic engine

830
00:29:12,800 --> 00:29:14,360
that looks at an authorization request

831
00:29:14,360 --> 00:29:16,400
and gives a simple yes or no answer.

832
00:29:16,400 --> 00:29:18,480
It isn't making a suggestion or a guess,

833
00:29:18,480 --> 00:29:20,480
but is instead making a rule-based decision

834
00:29:20,480 --> 00:29:22,320
about whether a specific user is allowed

835
00:29:22,320 --> 00:29:24,320
to see a specific document right now.

836
00:29:24,320 --> 00:29:28,160
The PDP takes in structured data like the user's ID and roles,

837
00:29:28,160 --> 00:29:31,120
along with the document's classification and sensitivity level.

838
00:29:31,120 --> 00:29:32,320
It also looks at the context

839
00:29:32,320 --> 00:29:34,360
such as whether the user is a contractor

840
00:29:34,360 --> 00:29:36,960
or if they are logging in from a high-risk location.

841
00:29:36,960 --> 00:29:38,960
It evaluates all these factors against your rules

842
00:29:38,960 --> 00:29:40,680
and returns a final decision.

843
00:29:40,680 --> 00:29:43,160
These rules are very direct, for example: finance employees

844
00:29:43,160 --> 00:29:45,680
can see budgets, but contractors cannot.

845
00:29:45,680 --> 00:29:47,960
You might have a rule that denies access

846
00:29:47,960 --> 00:29:50,280
if a user tries to view a confidential file

847
00:29:50,280 --> 00:29:52,360
from a VPN outside of office hours.

848
00:29:52,360 --> 00:29:54,160
The PDP doesn't care about your prompt

849
00:29:54,160 --> 00:29:55,720
or which LLM you are using

850
00:29:55,720 --> 00:29:58,240
because it only cares about the rules and the requests.

851
00:29:58,240 --> 00:30:00,720
Crucially, the PDP itself is not an LLM.

852
00:30:00,720 --> 00:30:04,000
It is a dedicated rule engine like OpenFGA or Cerbos

853
00:30:04,000 --> 00:30:06,160
that is designed to be auditable and consistent.

854
00:30:06,160 --> 00:30:08,000
You can see exactly which rule was triggered,

855
00:30:08,000 --> 00:30:09,120
you can trace every decision

856
00:30:09,120 --> 00:30:10,320
and you can update your policies

857
00:30:10,320 --> 00:30:12,160
without ever touching your code.
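A deterministic PDP can be very small. The sketch below is a toy, not OpenFGA or Cerbos: the request attributes, rule names, and the 08:00 to 18:00 office-hours window are assumptions chosen to mirror the two example rules. Deny rules are evaluated first, every decision returns the name of the rule that produced it, and anything unmatched is denied.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Request:
    user_id: str
    department: str
    employment: str    # e.g. "employee" or "contractor"
    doc_category: str  # e.g. "budget"
    sensitivity: str   # e.g. "public" or "confidential"
    network: str       # e.g. "office" or "vpn"
    hour: int          # local hour of the request, 0-23


Rule = Tuple[str, str, Callable[[Request], bool]]  # (name, effect, condition)

RULES: List[Rule] = [
    ("contractors-no-budgets", "deny",
     lambda r: r.employment == "contractor" and r.doc_category == "budget"),
    ("confidential-vpn-after-hours", "deny",
     lambda r: r.sensitivity == "confidential" and r.network == "vpn"
     and not 8 <= r.hour < 18),
    ("finance-sees-budgets", "allow",
     lambda r: r.department == "finance" and r.doc_category == "budget"),
    ("public-docs", "allow", lambda r: r.sensitivity == "public"),
]


def decide(request: Request) -> Tuple[bool, str]:
    """Return (allowed, rule_name). Deny rules win; no matching allow means deny."""
    for name, effect, condition in RULES:
        if effect == "deny" and condition(request):
            return False, name
    for name, effect, condition in RULES:
        if effect == "allow" and condition(request):
            return True, name
    return False, "default-deny"
```

In a dedicated engine these rules would live in a policy file or authorization model rather than in Python lambdas, which is what allows policies to change without redeploying application code.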

858
00:30:12,160 --> 00:30:14,800
In a real workflow, the retriever agent calls the PDP

859
00:30:14,800 --> 00:30:16,520
before it ever returns a document.

860
00:30:16,520 --> 00:30:18,160
The retriever might find 10 documents

861
00:30:18,160 --> 00:30:19,440
that match the user's query,

862
00:30:19,440 --> 00:30:22,120
but it asks the PDP for permission before moving forward.

863
00:30:22,120 --> 00:30:25,000
If the PDP says no to five of those documents,

864
00:30:25,000 --> 00:30:26,280
they are dropped immediately.

865
00:30:26,280 --> 00:30:28,760
Only the approved documents make it into the context window

866
00:30:28,760 --> 00:30:30,120
for the LLM to see.
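The retriever-side check could be sketched as follows. `pdp_check` is a hypothetical callable standing in for a call to the policy engine; the important details are that every candidate is checked individually and that a PDP error counts as a denial.

```python
from typing import Callable, Iterable, List, Tuple


def authorize_candidates(
    user_id: str,
    candidates: Iterable[dict],
    pdp_check: Callable[[str, str], bool],
) -> Tuple[List[dict], List[str]]:
    """Ask the PDP about every retrieved candidate before anything reaches the LLM.

    Returns the approved documents plus the IDs of denied ones for the audit log.
    """
    allowed: List[dict] = []
    denied: List[str] = []
    for doc in candidates:
        try:
            ok = pdp_check(user_id, doc["id"])
        except Exception:
            ok = False  # PDP error or timeout: fail closed, never fail open
        if ok:
            allowed.append(doc)
        else:
            denied.append(doc["id"])
    return allowed, denied


docs = [{"id": f"doc-{i}", "text": "..."} for i in range(10)]
allowed, denied = authorize_candidates("user-a", docs, lambda u, d: int(d.split("-")[1]) % 2 == 0)
print(len(allowed), denied)  # 5 ['doc-1', 'doc-3', 'doc-5', 'doc-7', 'doc-9']
```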

867
00:30:30,120 --> 00:30:31,600
If you use Azure AI Search,

868
00:30:31,600 --> 00:30:34,400
this logic is often built directly into the query filter.

869
00:30:34,400 --> 00:30:36,720
You aren't pulling a bunch of data and filtering it later,

870
00:30:36,720 --> 00:30:39,880
but are instead ensuring the database only scores documents

871
00:30:39,880 --> 00:30:41,880
that the user is actually allowed to see.

872
00:30:41,880 --> 00:30:43,080
This separation is how you move

873
00:30:43,080 --> 00:30:45,840
from probabilistic security to deterministic security.

874
00:30:45,840 --> 00:30:48,400
The LLM never has to make a judgment call on access

875
00:30:48,400 --> 00:30:50,560
and it never sees data that shouldn't be there.

876
00:30:50,560 --> 00:30:53,440
Everything happens in a component designed for rules and auditing.

877
00:30:53,440 --> 00:30:55,760
When you need to scale this across different business units

878
00:30:55,760 --> 00:30:57,680
or countries with different regulations,

879
00:30:57,680 --> 00:31:00,280
the PDP architecture is the only way to make it work.

880
00:31:00,280 --> 00:31:01,840
You can't rely on prompt engineering

881
00:31:01,840 --> 00:31:03,720
or hope the model remembers the rules.

882
00:31:03,720 --> 00:31:05,360
You need a separate system that knows the law

883
00:31:05,360 --> 00:31:07,440
and enforces it every single time.

884
00:31:07,440 --> 00:31:10,080
Why one leak can compromise everything.

885
00:31:10,080 --> 00:31:11,880
The problem we have been discussing involves

886
00:31:11,880 --> 00:31:13,680
enforcing permissions correctly

887
00:31:13,680 --> 00:31:15,320
and this challenge becomes catastrophic

888
00:31:15,320 --> 00:31:19,040
when you multiply it across dozens of customers or business units.

889
00:31:19,040 --> 00:31:20,520
In a multi-tenant system,

890
00:31:20,520 --> 00:31:24,160
a single architectural failure does not just leak one person's data

891
00:31:24,160 --> 00:31:26,440
but instead it leaks everyone's data at once.

892
00:31:26,440 --> 00:31:28,360
Imagine you are building a RAG-powered

893
00:31:28,360 --> 00:31:31,440
knowledge management system for enterprise customers.

894
00:31:31,440 --> 00:31:34,240
Each client indexes their own documents into your platform

895
00:31:34,240 --> 00:31:36,640
and your user base includes healthcare providers,

896
00:31:36,640 --> 00:31:38,520
financial firms and manufacturers

897
00:31:38,520 --> 00:31:40,320
with sensitive supply chain data.

898
00:31:40,320 --> 00:31:43,400
These companies trust you with their most confidential information

899
00:31:43,400 --> 00:31:46,600
because they believe their data is isolated from their competitors.

900
00:31:46,600 --> 00:31:47,680
But here is the problem.

901
00:31:47,680 --> 00:31:49,120
Isolation is often an illusion.

902
00:31:49,120 --> 00:31:51,080
One of the first real cases of this breaking

903
00:31:51,080 --> 00:31:53,440
involved a semantic cache collision.

904
00:31:53,440 --> 00:31:56,480
Customer A ran a query about their annual strategy planning

905
00:31:56,480 --> 00:31:58,160
so the system retrieved the documents,

906
00:31:58,160 --> 00:32:01,640
generated an answer and cached the response for speed.

907
00:32:01,640 --> 00:32:03,680
Two minutes later, customer B ran a query

908
00:32:03,680 --> 00:32:05,280
that was semantically similar

909
00:32:05,280 --> 00:32:07,520
and it matched the cached results so perfectly

910
00:32:07,520 --> 00:32:10,800
that the system returned customer A's private answer to customer B.

911
00:32:10,800 --> 00:32:13,200
This was not a breach of the raw documents themselves

912
00:32:13,200 --> 00:32:14,560
but it was a breach of intelligence

913
00:32:14,560 --> 00:32:16,920
that gave a competitor a window into a private strategy.

914
00:32:16,920 --> 00:32:19,560
It was an accident caused by a configuration mistake

915
00:32:19,560 --> 00:32:21,680
and it was also completely unrecoverable.

916
00:32:21,680 --> 00:32:23,400
The root cause was architectural

917
00:32:23,400 --> 00:32:26,120
because the vector database used a single shared index

918
00:32:26,120 --> 00:32:27,840
where all documents from every customer

919
00:32:27,840 --> 00:32:29,080
lived in the same namespace.

920
00:32:29,080 --> 00:32:31,040
Since the cache operated at the query level

921
00:32:31,040 --> 00:32:33,040
without accounting for tenant identity,

922
00:32:33,040 --> 00:32:36,200
the system only asked if anyone had run the query before

923
00:32:36,200 --> 00:32:39,400
rather than asking if this specific customer had run it.

924
00:32:39,400 --> 00:32:42,760
One missing constraint in how the cache key was constructed

925
00:32:42,760 --> 00:32:46,480
meant that confidential information started leaking between rivals.
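The missing constraint is easiest to see in code. Below is a minimal sketch of a semantic cache partitioned by tenant, assuming query embeddings arrive as plain float lists and using a hypothetical similarity threshold of 0.95. A lookup never compares against another tenant's entries, so a near-identical query from customer B cannot return customer A's answer.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TenantScopedSemanticCache:
    """Semantic response cache partitioned by tenant ID."""

    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self._entries: Dict[str, List[Tuple[List[float], str]]] = {}

    def put(self, tenant_id: str, query_embedding: List[float], answer: str) -> None:
        self._entries.setdefault(tenant_id, []).append((query_embedding, answer))

    def get(self, tenant_id: str, query_embedding: List[float]) -> Optional[str]:
        # Only this tenant's partition is searched; other tenants are invisible.
        best, best_score = None, self.threshold
        for embedding, answer in self._entries.get(tenant_id, []):
            score = cosine(embedding, query_embedding)
            if score >= best_score:
                best, best_score = answer, score
        return best
```

Within a single tenant the same reasoning applies to users with different permissions; a stricter variant, which is an additional assumption here, would also partition by user or by a hash of the user's group set.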

926
00:32:46,480 --> 00:32:48,560
The blast radius of that mistake was absolute

927
00:32:48,560 --> 00:32:51,560
because one bug affected every tenant simultaneously.

928
00:32:51,560 --> 00:32:53,840
Every single query running through the cache system

929
00:32:53,840 --> 00:32:55,680
was potentially exposed to other users.

930
00:32:55,680 --> 00:32:57,760
You could not fix the issue for one tenant

931
00:32:57,760 --> 00:32:59,160
without fixing it for all of them

932
00:32:59,160 --> 00:33:00,760
and the worst part was that nobody knew

933
00:33:00,760 --> 00:33:02,640
how many times it had already happened.

934
00:33:02,640 --> 00:33:04,800
Gartner identifies this exact failure mode

935
00:33:04,800 --> 00:33:06,480
as one of the five recurring problems

936
00:33:06,480 --> 00:33:07,840
in multi-agent RAG systems

937
00:33:07,840 --> 00:33:09,280
and they call it cross-tenant leakage.

938
00:33:09,280 --> 00:33:10,840
This is not a hypothetical theory.

939
00:33:10,840 --> 00:33:14,400
It is happening right now, and organizations are discovering the flaw

940
00:33:14,400 --> 00:33:16,920
only after a customer finds evidence of a leak.

941
00:33:16,920 --> 00:33:18,400
The technical fixes are clear

942
00:33:18,400 --> 00:33:20,000
and they start with the principle of using

943
00:33:20,000 --> 00:33:22,640
per-tenant encryption keys or separate indexes

944
00:33:22,640 --> 00:33:24,440
if your scale allows for it.

945
00:33:24,440 --> 00:33:26,920
A more common approach involves strong metadata filters

946
00:33:26,920 --> 00:33:29,560
enforced at the database layer with a hard constraint,

947
00:33:29,560 --> 00:33:32,440
which means every single query must include a tenant ID check.

948
00:33:32,440 --> 00:33:34,400
This cannot be a suggestion or a soft filter

949
00:33:34,400 --> 00:33:36,040
that a developer can disable

950
00:33:36,040 --> 00:33:37,880
but must be a mandatory constraint

951
00:33:37,880 --> 00:33:40,920
that the database enforces before it even looks for documents.

952
00:33:40,920 --> 00:33:42,840
The constraint works because it is enforceable

953
00:33:42,840 --> 00:33:44,520
outside the application logic.

954
00:33:44,520 --> 00:33:47,760
If your database configuration is set to always filter by tenant,

955
00:33:47,760 --> 00:33:50,360
there is no code path where a developer can forget to include it.

956
00:33:50,360 --> 00:33:52,040
It becomes architectural.
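The point above is that enforcement belongs in the database layer itself, through separate indexes or database-level security. Where that is not available, one application-side fallback is to route every query through a wrapper that binds the tenant when it is constructed, as in the hypothetical sketch below; the `tenant_id` field name and the OData-style filter syntax are assumptions.

```python
from typing import Optional


class TenantBoundQuery:
    """Builds search filters that always carry a tenant constraint.

    The tenant ID comes from the verified token when the object is created;
    callers can add filters but cannot remove or replace the tenant clause.
    """

    def __init__(self, tenant_id: str):
        if not tenant_id:
            raise ValueError("tenant_id is required")  # no tenant, no query
        escaped = tenant_id.replace("'", "''")  # OData string-literal escaping
        self._tenant_clause = f"tenant_id eq '{escaped}'"

    def filter(self, extra: Optional[str] = None) -> str:
        if extra:
            return f"({self._tenant_clause}) and ({extra})"
        return self._tenant_clause
```

Parenthesizing the caller's clause keeps an `or` inside it from escaping the tenant constraint.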

957
00:33:52,040 --> 00:33:53,360
But here is why this is urgent.

958
00:33:53,360 --> 00:33:55,760
The failure mode is not rare; it is systemic.

959
00:33:55,760 --> 00:33:58,280
If a healthcare SaaS company indexes patient records

960
00:33:58,280 --> 00:33:59,680
from 50 different hospitals

961
00:33:59,680 --> 00:34:02,640
and the permission metadata is lost during extraction,

962
00:34:02,640 --> 00:34:06,480
the system treats those records as one giant pile of data.

963
00:34:06,480 --> 00:34:09,640
A query from hospital A could return patient data from hospital B

964
00:34:09,640 --> 00:34:11,280
not by accident but by design

965
00:34:11,280 --> 00:34:14,320
because the system never knew which hospital owned which record.

966
00:34:14,320 --> 00:34:16,480
The regulatory exposure here is crushing.

967
00:34:16,480 --> 00:34:19,200
HIPAA does not care if the leak was a simple configuration error;

968
00:34:19,200 --> 00:34:21,080
it only cares that protected health information

969
00:34:21,080 --> 00:34:23,000
was accessed by the wrong person.

970
00:34:23,000 --> 00:34:25,680
GDPR will fine you for data transfers between customers

971
00:34:25,680 --> 00:34:28,000
without authorization, which makes your organization

972
00:34:28,000 --> 00:34:30,200
liable both technically and legally.

973
00:34:30,200 --> 00:34:31,960
Security trimming is not an optional feature

974
00:34:31,960 --> 00:34:33,800
you add when you have extra time.

975
00:34:33,800 --> 00:34:35,320
It is a compliance requirement

976
00:34:35,320 --> 00:34:38,040
that marks the difference between a system you can defend in court

977
00:34:38,040 --> 00:34:39,960
and one that is completely indefensible.

978
00:34:39,960 --> 00:34:41,480
The ungoverned RAG problem.

979
00:34:41,480 --> 00:34:43,920
The multi-agent architecture, identity propagation,

980
00:34:43,920 --> 00:34:45,920
and policy engines we have talked about all assume

981
00:34:45,920 --> 00:34:48,200
you actually know where your RAG systems are running.

982
00:34:48,200 --> 00:34:49,920
The assumption is that you have visibility

983
00:34:49,920 --> 00:34:51,440
and can enforce controls,

984
00:34:51,440 --> 00:34:54,800
but for most organizations that assumption is completely false.

985
00:34:54,800 --> 00:34:57,760
Shadow AI happens when employees build their own AI systems

986
00:34:57,760 --> 00:35:01,360
without any formal approval, IT review or security assessment.

987
00:35:01,360 --> 00:35:03,040
A team member might decide they want to make

988
00:35:03,040 --> 00:35:04,680
the internal wiki searchable

989
00:35:04,680 --> 00:35:07,840
and instead of waiting 18 months for an enterprise project to start,

990
00:35:07,840 --> 00:35:10,560
they spend a weekend with an open source framework.

991
00:35:10,560 --> 00:35:14,000
They grab a free Pinecone account, index the entire company wiki

992
00:35:14,000 --> 00:35:15,720
and share the link with their team.

993
00:35:15,720 --> 00:35:19,280
Gartner predicts that 75% of employees will use unapproved AI tools

994
00:35:19,280 --> 00:35:22,200
by 2026. That is not just a few tech enthusiasts;

995
00:35:22,200 --> 00:35:24,080
it is the vast majority of your workforce,

996
00:35:24,080 --> 00:35:26,960
and every one of them represents a potential data leak.

997
00:35:26,960 --> 00:35:30,200
These shadow systems follow a predictable and dangerous pattern.

998
00:35:30,200 --> 00:35:33,200
You find hard-coded API keys in Python scripts,

999
00:35:33,200 --> 00:35:35,480
no access controls on the vector database,

1000
00:35:35,480 --> 00:35:38,840
and absolutely no audit logging of who is asking what.

1001
00:35:38,840 --> 00:35:41,520
Because the person building it has broad internal access,

1002
00:35:41,520 --> 00:35:44,160
they index everything from design docs to financial forecasts

1003
00:35:44,160 --> 00:35:46,360
into a flat corpus with no permission model.

1004
00:35:46,360 --> 00:35:48,280
The system has no idea what should be public

1005
00:35:48,280 --> 00:35:49,720
and what is a confidential strategy.

1006
00:35:49,720 --> 00:35:51,920
The problem gets worse when that system is shared.

1007
00:35:51,920 --> 00:35:54,800
It starts with the immediate team, then moves to adjacent departments,

1008
00:35:54,800 --> 00:35:57,000
and eventually someone mentions it in a Slack channel

1009
00:35:57,000 --> 00:36:00,440
where contractors see it. Suddenly those contractors have query access

1010
00:36:00,440 --> 00:36:02,640
to your entire internal knowledge base

1011
00:36:02,640 --> 00:36:04,560
and they can get answers based on documents

1012
00:36:04,560 --> 00:36:06,400
that should never have left the building.

1013
00:36:06,400 --> 00:36:08,400
The system cannot enforce the obvious rule

1014
00:36:08,400 --> 00:36:11,040
that a contractor should only see public project materials

1015
00:36:11,040 --> 00:36:12,920
rather than internal financial data.

1016
00:36:12,920 --> 00:36:14,480
Because the system was a weekend project,

1017
00:36:14,480 --> 00:36:17,040
there is no identity propagation or policy engine sitting

1018
00:36:17,040 --> 00:36:18,800
between the query and the results.

1019
00:36:18,800 --> 00:36:21,280
The vector database does not care that a contractor is different

1020
00:36:21,280 --> 00:36:24,920
from an executive, so it just returns whatever documents match the search.

1021
00:36:24,920 --> 00:36:27,280
Since this is a shadow system flying under the radar,

1022
00:36:27,280 --> 00:36:29,520
nobody in security even knows it exists.

1023
00:36:29,520 --> 00:36:31,200
There is no audit log to check

1024
00:36:31,200 --> 00:36:34,160
and no way to detect that sensitive information is being exposed

1025
00:36:34,160 --> 00:36:35,160
to the wrong people.

1026
00:36:35,160 --> 00:36:36,840
The breach happens silently and repeatedly

1027
00:36:36,840 --> 00:36:39,160
until a customer eventually notices their private data

1028
00:36:39,160 --> 00:36:41,360
being quoted back to them by a third party.

1029
00:36:41,360 --> 00:36:44,200
The solution is not to ban AI tools because that is impossible

1030
00:36:44,200 --> 00:36:47,120
and only encourages employees to build more hidden systems.

1031
00:36:47,120 --> 00:36:49,280
Instead you have to inventory what is running

1032
00:36:49,280 --> 00:36:51,720
and classify each system by its risk level.

1033
00:36:51,720 --> 00:36:55,120
A shadow rag that only indexes public documentation is low risk

1034
00:36:55,120 --> 00:36:57,320
so you can just implement basic logging and move on.

1035
00:36:57,320 --> 00:37:00,040
However, if a shadow system has access to customer data

1036
00:37:00,040 --> 00:37:02,760
or proprietary IP, it is a high-risk threat

1037
00:37:02,760 --> 00:37:06,320
that you must shut down or migrate into a governed architecture immediately.
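The inventory-and-triage step could be captured in a few lines. The data-source labels are hypothetical, and the third "unknown" tier for systems whose indexed content has not been established is an added assumption; the two tiers described above map directly to the first two branches.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical labels for what an inventoried system has indexed.
HIGH_RISK_SOURCES = {"customer_data", "proprietary_ip"}
LOW_RISK_SOURCES = {"public_docs"}


@dataclass
class ShadowSystem:
    name: str
    owner: str
    data_sources: Set[str] = field(default_factory=set)


def triage(system: ShadowSystem) -> str:
    """Map an inventoried RAG system to a risk tier and the action it calls for."""
    if system.data_sources & HIGH_RISK_SOURCES:
        return "high: shut down or migrate into the governed architecture"
    if system.data_sources and system.data_sources <= LOW_RISK_SOURCES:
        return "low: add basic logging"
    # Assumption: anything not clearly public is reviewed before a decision.
    return "unknown: review manually before deciding"
```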

1038
00:37:06,320 --> 00:37:09,160
This is why zero trust thinking is operationally urgent.

1039
00:37:09,160 --> 00:37:11,920
It is not just about building the right architecture for new projects

1040
00:37:11,920 --> 00:37:15,800
but about finding the systems that are already running without your permission.

1041
00:37:15,800 --> 00:37:18,480
Never trust, always verify at every hop.

1042
00:37:18,480 --> 00:37:21,200
Zero trust isn't a new buzzword in the security world.

1043
00:37:21,200 --> 00:37:25,240
The military, financial institutions and critical infrastructure providers

1044
00:37:25,240 --> 00:37:27,760
have used this model for years because their stakes are too high

1045
00:37:27,760 --> 00:37:29,120
to assume anyone is safe.

1046
00:37:29,120 --> 00:37:32,520
They operate on the reality that you can't trust a user or a device

1047
00:37:32,520 --> 00:37:35,040
just because they manage to get onto your internal network.

1048
00:37:35,040 --> 00:37:36,240
But here's the problem.

1049
00:37:36,240 --> 00:37:40,600
Most people think a firewall is security, but in reality a perimeter is just a shell.

1050
00:37:40,600 --> 00:37:44,120
The moment you assume a component is safe because it sits behind that shell,

1051
00:37:44,120 --> 00:37:47,560
you've created a massive vulnerability for an attacker to exploit.

1052
00:37:47,560 --> 00:37:50,680
Zero trust rejects that assumption entirely by requiring you to verify

1053
00:37:50,680 --> 00:37:54,960
every single interaction, every request and every piece of data moving between systems.

1054
00:37:54,960 --> 00:37:57,120
There are no shortcuts and there is no implicit trust.

1055
00:37:57,120 --> 00:38:00,960
Now apply that same diagnostic thinking to your multi-agent RAG system.

1056
00:38:00,960 --> 00:38:02,080
Most teams don't do this.

1057
00:38:02,080 --> 00:38:05,200
They build the specialized architecture we've been talking about, with five different agents,

1058
00:38:05,200 --> 00:38:08,520
and then assume those agents are safe because they own the whole stack.

1059
00:38:08,520 --> 00:38:12,280
Since everything is running on the same infrastructure and part of the same project,

1060
00:38:12,280 --> 00:38:13,840
they skip the hard security work.

1061
00:38:13,840 --> 00:38:17,520
They pass tokens freely and trust that if an agent has a capability

1062
00:38:17,520 --> 00:38:18,920
it should be allowed to use it.

1063
00:38:18,920 --> 00:38:21,920
That's the exact assumption zero trust is designed to break.

1064
00:38:21,920 --> 00:38:24,760
In a zero-trust agent system, the rule is absolute.

1065
00:38:24,760 --> 00:38:29,280
Every agent, every retrieval, and every tool call must be authenticated and authorized independently.

1066
00:38:29,280 --> 00:38:30,880
This doesn't just happen at the front door.

1067
00:38:30,880 --> 00:38:32,680
It happens at every single hop in the chain.

1068
00:38:32,680 --> 00:38:35,120
Here is how that looks when the system is actually running.

1069
00:38:35,120 --> 00:38:36,960
It starts with agent one, the router.

1070
00:38:36,960 --> 00:38:40,200
When a user sends a request, the router doesn't just pass it along.

1071
00:38:40,200 --> 00:38:42,880
It verifies the session, checks if the token is valid

1072
00:38:42,880 --> 00:38:44,640
and looks at the user's risk posture.

1073
00:38:44,640 --> 00:38:47,560
It asks if they are logging in from a weird location

1074
00:38:47,560 --> 00:38:51,080
or if they've recently changed their password, and only after gathering all that evidence

1075
00:38:51,080 --> 00:38:53,280
does it decide if the request is legitimate.

1076
00:38:53,280 --> 00:38:56,000
If the verification fails, the process stops right there.

1077
00:38:56,000 --> 00:38:58,760
If it passes, agent two receives a scoped token.

1078
00:38:58,760 --> 00:39:03,280
This is a critical distinction because that token doesn't give agent two the keys to the kingdom

1079
00:39:03,280 --> 00:39:05,640
or the ability to call external APIs.

1080
00:39:05,640 --> 00:39:08,320
It grants the authority to do exactly one thing:

1081
00:39:08,320 --> 00:39:11,120
transform natural language into a structured query.

1082
00:39:11,120 --> 00:39:15,360
When agent two is done, it passes that query and a fresh token to agent three

1083
00:39:15,360 --> 00:39:18,360
but it never passes its own credentials or raw user data.

1084
00:39:18,360 --> 00:39:21,640
It only passes a token representing the user's delegated authority

1085
00:39:21,640 --> 00:39:23,200
for that specific next step.

1086
00:39:23,200 --> 00:39:27,120
Then agent three takes over. The token it receives says it can act for a specific user

1087
00:39:27,120 --> 00:39:30,480
to retrieve documents that the user is permitted to see and nothing more.

1088
00:39:30,480 --> 00:39:35,000
It can't modify files or write data, because its scope is strictly bounded by the signed claims in its token.

1089
00:39:35,000 --> 00:39:38,320
Agent three checks that signature for cryptographic proof of its source,

1090
00:39:38,320 --> 00:39:40,240
pulls the documents that pass the filters

1091
00:39:40,240 --> 00:39:43,400
and then hands those documents and a new token to agent four.

1092
00:39:43,400 --> 00:39:45,000
Next, agent four validates the work.

1093
00:39:45,000 --> 00:39:48,400
Its token only allows read-only access to document metadata

1094
00:39:48,400 --> 00:39:50,600
so it can check if the results actually make sense.

1095
00:39:50,600 --> 00:39:53,760
It can't retrieve more documents or override previous authorizations

1096
00:39:53,760 --> 00:39:56,560
and if it rejects the results the flow stops immediately

1097
00:39:56,560 --> 00:39:59,680
No documents ever reach the model if the validator says no.

1098
00:39:59,680 --> 00:40:02,040
Finally agent five generates the response.

1099
00:40:02,040 --> 00:40:05,280
By this point the token it holds only allows it to work with context

1100
00:40:05,280 --> 00:40:07,880
that has already been pre-validated and pre-authorized.

1101
00:40:07,880 --> 00:40:10,480
It can't go off script to call APIs or fetch new data

1102
00:40:10,480 --> 00:40:12,480
because the token is intentionally narrow.

1103
00:40:12,480 --> 00:40:15,200
Every single one of these agents logs exactly what happened.

1104
00:40:15,200 --> 00:40:18,040
They record who called them, what tokens were exchanged

1105
00:40:18,040 --> 00:40:20,680
and whether a request was approved or blocked.

1106
00:40:20,680 --> 00:40:23,440
This creates a full audit trail so that if something breaks

1107
00:40:23,440 --> 00:40:26,480
you can trace the chain and see exactly where the failure happened.
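The fail-closed chain and its audit trail can be sketched in a few lines. This is an illustration under simplifying assumptions: each agent is a stub function returning `(approved, output)`, the agent names are hypothetical, and a real trail would also record the token identifiers exchanged at each hop.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple

# A step returns (approved, output). Names and signatures are illustrative only.
Step = Callable[[Any], Tuple[bool, Any]]


@dataclass
class AuditEvent:
    agent: str
    caller: str
    approved: bool


@dataclass
class AgentChain:
    steps: List[Tuple[str, Step]]
    trail: List[AuditEvent] = field(default_factory=list)

    def run(self, request: Any) -> Optional[Any]:
        caller, data = "user", request
        for name, step in self.steps:
            approved, data = step(data)
            # Every hop is logged, approved or not, so the chain can be traced later.
            self.trail.append(AuditEvent(agent=name, caller=caller, approved=approved))
            if not approved:
                return None  # fail closed: later agents never see the data
            caller = name
        return data

    def first_failure(self) -> Optional[str]:
        """Name of the first agent that blocked the request, if any."""
        return next((e.agent for e in self.trail if not e.approved), None)


def build_chain(max_docs: int) -> AgentChain:
    # Hypothetical stubs standing in for the five agents described above.
    return AgentChain([
        ("verifier", lambda q: (True, q)),
        ("query-builder", lambda q: (True, {"query": q})),
        ("retriever", lambda q: (True, ["doc-17", "doc-42"])),
        ("validator", lambda docs: (len(docs) <= max_docs, docs)),
        ("generator", lambda docs: (True, f"answer grounded in {len(docs)} documents")),
    ])


ok = build_chain(max_docs=10)
print(ok.run("What is the travel policy?"))  # answer grounded in 2 documents
blocked = build_chain(max_docs=1)
print(blocked.run("What is the travel policy?"), blocked.first_failure())  # None validator
```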

1108
00:40:26,480 --> 00:40:29,800
To make this work you need specific technology like mutual TLS.

1109
00:40:29,800 --> 00:40:31,520
This ensures that both the calling agent

1110
00:40:31,520 --> 00:40:34,440
and the receiving agent verify each other's identity cryptographically

1111
00:40:34,440 --> 00:40:37,200
before a single byte of data is exchanged.

1112
00:40:37,200 --> 00:40:40,560
Every message is signed with a private key to prove it hasn't been tampered with

1113
00:40:40,560 --> 00:40:44,040
and every agent has its own unique identity in the certificate system.
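For the transport side, mutual TLS mostly comes down to the receiving agent insisting on a client certificate. The sketch below uses Python's standard `ssl` module; the certificate and CA file paths are hypothetical placeholders for an internal CA that issues one certificate per agent. Message-level signing with private keys is not shown, since it would need a cryptography library beyond the standard library.

```python
import ssl
from typing import Optional


def server_context(ca_file: Optional[str] = None, cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """Context for a receiving agent: it refuses any caller without a valid client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # this is what makes the handshake mutual
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # this agent's own identity
    return ctx


def client_context(ca_file: Optional[str] = None, cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """Context for a calling agent: it verifies the server and presents its own certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

In use, each agent would pass its own hypothetical files, for example `server_context("internal-ca.pem", "agent-3.pem", "agent-3.key")`, and wrap its listening socket with `wrap_socket(sock, server_side=True)`.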

1114
00:40:44,040 --> 00:40:45,840
This does add a layer of complexity.

1115
00:40:45,840 --> 00:40:48,160
You have to maintain the certificate infrastructure

1116
00:40:48,160 --> 00:40:51,400
and validating tokens at every hop adds latency to the system.

1117
00:40:51,400 --> 00:40:54,880
In fact, network overhead from mTLS handshakes could increase your latency

1118
00:40:54,880 --> 00:40:58,000
by 30% compared to a basic RAG setup.

1119
00:40:58,000 --> 00:40:59,600
That is a real cost you have to weigh.

1120
00:40:59,600 --> 00:41:01,400
But the security gain is substantial.

1121
00:41:01,400 --> 00:41:03,000
No agent can pretend to be another

1122
00:41:03,000 --> 00:41:05,960
and no request can bypass the verification chain.

1123
00:41:05,960 --> 00:41:09,680
If one agent is compromised the damage is trapped within that agent's narrow scope.

1124
00:41:09,680 --> 00:41:12,400
The architecture is built to assume a breach will happen

1125
00:41:12,400 --> 00:41:14,440
and contain it before it spreads.

1126
00:41:14,440 --> 00:41:16,120
Detecting when something goes wrong.

1127
00:41:16,120 --> 00:41:19,640
Even the most perfect architecture has to assume that something will eventually fail.

1128
00:41:19,640 --> 00:41:23,480
Whether it's a compromised agent, a malicious prompt that slips past validation

1129
00:41:23,480 --> 00:41:27,120
or a simple configuration error, the goal isn't just to prevent failures.

1130
00:41:27,120 --> 00:41:30,120
The goal is to detect them fast enough to stop the bleeding.

1131
00:41:30,120 --> 00:41:32,560
Behavioral monitoring is your early warning system.

1132
00:41:32,560 --> 00:41:36,440
You start by establishing a baseline for what normal looks like in your system.

1133
00:41:36,440 --> 00:41:39,160
You track how many documents agent three usually pulls,

1134
00:41:39,160 --> 00:41:43,240
how long a retrieval takes, and how often agent four rejects a result.

1135
00:41:43,240 --> 00:41:48,120
By looking at historical data, you build a profile of how the agents should behave during a typical day.

1136
00:41:48,120 --> 00:41:50,760
Then you watch for the moment things deviate.

1137
00:41:50,760 --> 00:41:54,440
If agent three suddenly pulls a thousand documents instead of the usual five,

1138
00:41:54,440 --> 00:41:55,680
that's a massive red flag.

1139
00:41:55,680 --> 00:41:59,520
It could be a broad query, a malfunction, or an attacker trying to scrape your data,

1140
00:41:59,520 --> 00:42:00,960
but the reason doesn't matter yet.

1141
00:42:00,960 --> 00:42:04,640
What matters is that the pattern is broken and the system needs to react.

1142
00:42:04,640 --> 00:42:08,600
The same applies if agent four's rejection rate jumps from 1% to 30%.

1143
00:42:08,600 --> 00:42:12,080
This tells you that something the retriever is sending back no longer makes sense.

1144
00:42:12,080 --> 00:42:15,880
Either the retrieval logic is broken, or the documents themselves have become problematic,

1145
00:42:15,880 --> 00:42:18,120
but either way, the anomaly needs an answer.
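One simple way to turn "establish a baseline, then watch for deviation" into code is a z-score check per metric. This is a sketch, not a recommended detector: the threshold of 4 standard deviations, the per-metric floors, and the sample histories are all assumptions chosen to mirror the "five versus a thousand documents" and "1% versus 30% rejections" examples.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Baseline:
    mean: float
    stdev: float

    @classmethod
    def fit(cls, history: Sequence[float]) -> "Baseline":
        """Learn what 'normal' looks like from historical observations of one metric."""
        return cls(statistics.fmean(history), statistics.pstdev(history))

    def is_anomalous(self, value: float, z_limit: float = 4.0, min_stdev: float = 1.0) -> bool:
        # The floor stops a very steady metric from alarming on tiny wobbles.
        spread = max(self.stdev, min_stdev)
        return abs(value - self.mean) / spread > z_limit


# Hypothetical history: documents returned per retrieval by agent three.
docs_per_query = Baseline.fit([4, 5, 6, 5, 5, 4, 6])
print(docs_per_query.is_anomalous(7))     # False
print(docs_per_query.is_anomalous(1000))  # True

# Hypothetical history: agent four's daily rejection rate (about 1%).
rejection_rate = Baseline.fit([0.010, 0.012, 0.009, 0.011, 0.010])
print(rejection_rate.is_anomalous(0.30, min_stdev=0.01))  # True
```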

1146
00:42:18,120 --> 00:42:20,800
You also watch for agents calling tools they've never used,

1147
00:42:20,800 --> 00:42:23,360
or making requests from unknown IP addresses.

1148
00:42:23,360 --> 00:42:26,760
If an agent's latency spikes from 200 milliseconds to five seconds,

1149
00:42:26,760 --> 00:42:28,920
that's a behavioral shift worth investigating.

1150
00:42:28,920 --> 00:42:30,880
Your response framework should have layers.

1151
00:42:30,880 --> 00:42:33,440
Low-level issues might just trigger an alert for the ops team,

1152
00:42:33,440 --> 00:42:36,640
but high-severity anomalies should trigger an automated kill switch.

1153
00:42:36,640 --> 00:42:39,200
The system can disable the agent, route traffic around it,

1154
00:42:39,200 --> 00:42:41,880
and save all the evidence for a forensic investigation.
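The layered response can be sketched as a small control plane: low severity only raises an alert, while high severity flips the agent off, routes calls to a fallback, and keeps a copy of the evidence. The two-level severity scale, the agent names, and the `safe-refusal` fallback are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Severity(Enum):
    LOW = 1
    HIGH = 2


@dataclass
class ControlPlane:
    enabled: Dict[str, bool]
    evidence: List[dict] = field(default_factory=list)
    alerts: List[str] = field(default_factory=list)

    def respond(self, agent: str, severity: Severity, context: dict) -> None:
        if severity is Severity.LOW:
            self.alerts.append(f"ops: review {agent}")  # notify only, keep serving
            return
        self.enabled[agent] = False  # automated kill switch
        self.evidence.append({"agent": agent, **context})  # preserved for forensics
        self.alerts.append(f"security: {agent} disabled")

    def route(self, agent: str, fallback: str) -> str:
        """Send traffic to the agent only while it is enabled; otherwise use the fallback."""
        return agent if self.enabled.get(agent, False) else fallback


plane = ControlPlane(enabled={"agent-3": True})
plane.respond("agent-3", Severity.HIGH, {"docs_returned": 1000})
print(plane.route("agent-3", fallback="safe-refusal"))  # safe-refusal
```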

1155
00:42:41,880 --> 00:42:45,880
Let's look at a real scenario where agent three retrieves a thousand documents.

1156
00:42:45,880 --> 00:42:47,760
Because this is outside the baseline,

1157
00:42:47,760 --> 00:42:51,680
the system flags it immediately without waiting for a human to check a dashboard.

1158
00:42:51,680 --> 00:42:54,720
It logs the details and escalates the issue to a security engineer

1159
00:42:54,720 --> 00:42:58,680
who can see the user, the query, and exactly which documents were pulled.

1160
00:42:58,680 --> 00:43:01,840
If it looks like a real threat, the engineer can roll back the results,

1161
00:43:01,840 --> 00:43:04,160
clear the caches, and notify the right people.

1162
00:43:04,160 --> 00:43:06,560
The logging requirements for this are absolute.

1163
00:43:06,560 --> 00:43:09,480
Every retrieval must record the user ID, the agent ID,

1164
00:43:09,480 --> 00:43:11,680
and the specific document IDs that were returned.

1165
00:43:11,680 --> 00:43:16,080
You need the timestamp and the authorization decision to see if the user passed or failed the check.

1166
00:43:16,080 --> 00:43:18,360
If an agent had to try multiple times,

1167
00:43:18,360 --> 00:43:21,520
every single attempt needs to be in that forensic trail.
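A retrieval audit record with the fields listed above might look like the following. Field names and the JSON-lines output are assumptions; the point is that every attempt, including denied ones and retries, becomes its own record.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class RetrievalAudit:
    user_id: str
    agent_id: str
    doc_ids: List[str]
    decision: str  # "allowed" or "denied"
    attempt: int   # 1 for the first try; retries get 2, 3, ...
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def to_log_line(event: RetrievalAudit) -> str:
    """Serialize one audit event as a JSON line for the forensic trail."""
    return json.dumps(asdict(event), sort_keys=True)


# Hypothetical trail: a denied first attempt followed by an allowed retry.
trail = [
    RetrievalAudit("u-1042", "agent-3", [], "denied", attempt=1),
    RetrievalAudit("u-1042", "agent-3", ["doc-17", "doc-42"], "allowed", attempt=2),
]
for event in trail:
    print(to_log_line(event))
```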

1168
00:43:21,520 --> 00:43:22,520
But here is the catch.

1169
00:43:22,520 --> 00:43:23,920
Logging itself is a risk.

1170
00:43:23,920 --> 00:43:27,920
A log file shows exactly which users are looking at which sensitive documents,

1171
00:43:27,920 --> 00:43:32,320
and if those logs are stolen, they become a map of your organization's most private data.

1172
00:43:32,320 --> 00:43:35,280
That metadata is just as sensitive as the documents themselves.

1173
00:43:35,280 --> 00:43:38,760
Because of that, you have to protect your logs as carefully as your primary data.

1174
00:43:38,760 --> 00:43:43,520
You should use redaction or encryption so the log shows the decision but hides the full document title.

1175
00:43:43,520 --> 00:43:47,720
You can log a document ID and keep the actual title in a separate encrypted mapping.
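Splitting the title out of the log could look like this sketch. `TitleVault` is a hypothetical stand-in for the separate mapping store. Its encryption and access control are assumed here, not implemented, and the sample title is made up.

```python
from typing import Dict


class TitleVault:
    """Separate doc_id -> title mapping. In production this store would be encrypted
    and access-controlled; that part is assumed here, not implemented."""

    def __init__(self) -> None:
        self._titles: Dict[str, str] = {}

    def put(self, doc_id: str, title: str) -> None:
        self._titles[doc_id] = title

    def reveal(self, doc_id: str, investigator: bool) -> str:
        if not investigator:
            raise PermissionError("only incident responders may resolve titles")
        return self._titles[doc_id]


def redact_for_log(entry: dict, vault: TitleVault) -> dict:
    """Keep the decision and the opaque ID in the log; move the title into the vault."""
    safe = dict(entry)
    title = safe.pop("title", None)
    if title is not None:
        vault.put(safe["doc_id"], title)
    return safe


vault = TitleVault()
line = redact_for_log({"user_id": "u-1042", "doc_id": "doc-42", "decision": "allowed",
                       "title": "Project Falcon acquisition memo"}, vault)
print(line)  # {'user_id': 'u-1042', 'doc_id': 'doc-42', 'decision': 'allowed'}
```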

1176
00:43:47,720 --> 00:43:52,000
Logs should be stored in a tamper-evident system where any attempt to delete them triggers an alarm.
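A common way to make a log tamper-evident is a hash chain, where each entry's digest covers the previous digest. The sketch below is that generic technique, not a feature of any particular logging product. Note that deleting the newest entry is not detectable from the chain alone, so the latest digest would also need to be anchored somewhere outside the log.

```python
import hashlib
import json
from typing import List, Tuple

GENESIS = "0" * 64
Log = List[Tuple[str, str]]  # (payload, digest) pairs


def append(log: Log, entry: dict) -> None:
    prev = log[-1][1] if log else GENESIS
    payload = json.dumps(entry, sort_keys=True)
    digest = hashlib.sha256((prev + payload).encode()).hexdigest()
    log.append((payload, digest))


def first_broken_link(log: Log) -> int:
    """Return the index where the chain breaks, or -1 if it is intact."""
    prev = GENESIS
    for i, (payload, digest) in enumerate(log):
        if hashlib.sha256((prev + payload).encode()).hexdigest() != digest:
            return i  # an edit or deletion at or before this point; raise the alarm here
        prev = digest
    return -1


log: Log = []
for n in range(3):
    append(log, {"user_id": "u-1042", "doc_id": f"doc-{n}", "decision": "allowed"})
print(first_broken_link(log))  # -1
del log[1]                     # someone quietly removes an entry
print(first_broken_link(log))  # 1
```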

1177
00:43:52,000 --> 00:43:56,280
In a real implementation, you'd use tools like Splunk or Datadog to ingest this telemetry.

1178
00:43:56,280 --> 00:44:00,640
You define the rules, such as firing an alert if retrieval volume hits a certain threshold

1179
00:44:00,640 --> 00:44:04,000
or if a user starts querying at odd hours from a new location.

1180
00:44:04,000 --> 00:44:07,800
The monitoring system correlates these signals to surface suspicious activity.
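The two example rules can be expressed in plain Python before translating them into whatever rule language your monitoring tool uses. This does not use any Splunk or Datadog API. The threshold of 100 documents, the 06:00 to 20:00 UTC working window, and the location data are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class QueryEvent:
    user: str
    docs_returned: int
    hour_utc: int
    location: str


def evaluate(event: QueryEvent, known_locations: Dict[str, Set[str]],
             volume_threshold: int = 100, work_hours: range = range(6, 20)) -> List[str]:
    """Return the names of the hypothetical alert rules this event trips."""
    alerts = []
    if event.docs_returned > volume_threshold:
        alerts.append("retrieval-volume")
    odd_hour = event.hour_utc not in work_hours
    new_place = event.location not in known_locations.get(event.user, set())
    if odd_hour and new_place:  # two weak signals correlated into one stronger one
        alerts.append("odd-hours-new-location")
    return alerts


seen = {"alice": {"Berlin"}}
print(evaluate(QueryEvent("alice", 5, 10, "Berlin"), seen))   # []
print(evaluate(QueryEvent("alice", 1000, 3, "Lagos"), seen))  # ['retrieval-volume', 'odd-hours-new-location']
```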

1181
00:44:07,800 --> 00:44:12,280
This is where your zero-trust architecture meets the reality of daily operations.

1182
00:44:12,280 --> 00:44:15,480
Technical controls will stop most failures, but for the ones that slip through,

1183
00:44:15,480 --> 00:44:17,520
detection and response are your last line of defense.

1184
00:44:17,520 --> 00:44:21,360
And if this shift in how you think about AI security was helpful, follow me,

1185
00:44:21,360 --> 00:44:23,760
Mirko Peters, on LinkedIn for more.

1186
00:44:23,760 --> 00:44:27,040
If you want to see more content like this, leave a review so other people can find it

1187
00:44:27,040 --> 00:44:31,520
and definitely share this with your team if you're building these systems right now.

1188
00:44:31,520 --> 00:44:34,440
Building this is hard; not building it is harder.

1189
00:44:34,440 --> 00:44:36,720
Let's be honest about what we're asking you to build here.

1190
00:44:36,720 --> 00:44:40,880
Everything we've covered, the multi-agent architecture, identity propagation,

1191
00:44:40,880 --> 00:44:45,720
permission-aware retrieval, and a policy decision point sitting outside the model, is not a weekend project.

1192
00:44:45,720 --> 00:44:49,520
This isn't a basic RAG system with some access control bolted on at the end.

1193
00:44:49,520 --> 00:44:53,960
It is a security-first information system that just happens to use LLMs as the interface.

1194
00:44:53,960 --> 00:44:55,280
The time investment is real.

1195
00:44:55,280 --> 00:44:59,160
If you're in a mature organization with solid identity infrastructure and clear governance,

1196
00:44:59,160 --> 00:45:02,160
you can probably design and deploy this in three to six months.

1197
00:45:02,160 --> 00:45:04,920
But if you're dealing with legacy systems, fragmented data,

1198
00:45:04,920 --> 00:45:10,240
or security teams new to fine-grained authorization, you should probably add another six months to that estimate.

1199
00:45:10,240 --> 00:45:13,880
That timeline also assumes you have the right people available from start to finish.

1200
00:45:13,880 --> 00:45:18,360
Most organizations don't actually do it right the first time because they follow the path of least resistance.

1201
00:45:18,360 --> 00:45:23,640
They start with a basic, single-agent RAG to show value and get buy-in from stakeholders quickly.

1202
00:45:23,640 --> 00:45:29,680
Then, around month four or five, the security team asks the obvious question about how permissions are being enforced.

1203
00:45:29,680 --> 00:45:33,720
And the honest answer is usually that they aren't being enforced in any systematic way.

1204
00:45:33,720 --> 00:45:35,920
At that point, you're stuck with a difficult choice.

1205
00:45:35,920 --> 00:45:41,320
You can shut the system down and redesign it properly or you can try to retrofit security trimming onto what you've already built.

1206
00:45:41,320 --> 00:45:42,640
Both options are expensive.

1207
00:45:42,640 --> 00:45:46,960
Shutting down kills your credibility and momentum with stakeholders who wanted this working yesterday.

1208
00:45:46,960 --> 00:45:54,120
Retrofitting creates massive technical debt because you're trying to force permission layers into a system that was never designed to carry them.

1209
00:45:54,120 --> 00:45:57,440
A financial services firm went through this exact nightmare recently.

1210
00:45:57,440 --> 00:46:03,960
They spent $200,000 building a basic RAG system for their trading desk that indexed equity research and market data.

1211
00:46:03,960 --> 00:46:11,640
It was fast and accurate, but then compliance asked what happens if a junior trader accesses strategy documents meant only for portfolio managers.

1212
00:46:11,640 --> 00:46:16,240
That was the moment they realized their fast solution was actually a massive liability.

1213
00:46:16,240 --> 00:46:18,240
Fixing it took $500,000.

1214
00:46:18,240 --> 00:46:24,080
They had to build a new metadata extraction pipeline, redesign the retrieval logic, and create an entirely new index structure.

1215
00:46:24,080 --> 00:46:28,480
If they had spent $600,000 upfront to do it right, they would have come out $100,000 ahead and skipped the rework entirely.

1216
00:46:28,480 --> 00:46:33,680
Instead, they spent $200,000 on the wrong thing and then $500,000 more just to fix their mistakes.

1217
00:46:33,680 --> 00:46:34,880
That isn't an unusual story.

1218
00:46:34,880 --> 00:46:37,280
In fact, it's the typical path for most companies.

1219
00:46:37,280 --> 00:46:39,680
The staffing problem is another hurdle you have to clear.

1220
00:46:39,680 --> 00:46:43,880
You can't build a system like this with just machine learning engineers and database experts.

1221
00:46:43,880 --> 00:46:50,280
You need security architects who understand identity systems and engineers who are comfortable with OAuth, SAML, and authorization frameworks.

1222
00:46:50,280 --> 00:46:55,480
You also need policy experts who can translate complex business rules into machine-readable controls.

1223
00:46:55,480 --> 00:46:58,880
These are not junior roles and these people are not in unlimited supply.

1224
00:46:58,880 --> 00:47:02,080
If you're trying to hire these specialists while you're already building the system,

1225
00:47:02,080 --> 00:47:04,880
you're adding at least six months to your project timeline.

1226
00:47:04,880 --> 00:47:08,880
Organizational alignment is where most of these efforts actually fall apart.

1227
00:47:08,880 --> 00:47:12,880
Building this requires teams to work together that usually stay in their own silos.

1228
00:47:12,880 --> 00:47:14,880
Security has to sign off on the architecture.

1229
00:47:14,880 --> 00:47:17,280
Compliance has to verify the regulatory requirements.

1230
00:47:17,280 --> 00:47:19,880
And IT has to maintain the identity tokens.

1231
00:47:19,880 --> 00:47:23,280
Even the business teams have to accept that some requests will be slower

1232
00:47:23,280 --> 00:47:25,080
because authorization adds latency.

1233
00:47:25,080 --> 00:47:29,880
If any of those groups treats this as someone else's problem, the system will fail immediately.

1234
00:47:29,880 --> 00:47:32,480
The business case for doing this right sounds a bit painful at first.

1235
00:47:32,480 --> 00:47:33,480
It costs more upfront.

1236
00:47:33,480 --> 00:47:37,480
It takes longer to ship and it requires hiring expensive people you don't currently have on staff.

1237
00:47:37,480 --> 00:47:41,280
But that investment helps prevent a $10 million breach and avoids regulatory fines

1238
00:47:41,280 --> 00:47:43,280
that could easily reach into eight figures.

1239
00:47:43,280 --> 00:47:47,480
It keeps your company from becoming the primary example in the next big compliance case study.

1240
00:47:47,480 --> 00:47:50,280
That is the only argument that actually works with leadership.

1241
00:47:50,280 --> 00:47:53,480
The alternative is deploying a system that silently leaks data for years

1242
00:47:53,480 --> 00:47:57,280
until a customer notices their confidential info being quoted by a competitor.

1243
00:47:57,280 --> 00:48:01,080
That scenario costs exponentially more when you factor in the lawsuits,

1244
00:48:01,080 --> 00:48:04,080
the investigations, and the permanent hit to your reputation.

1245
00:48:04,080 --> 00:48:06,680
You aren't saving money by cutting corners today.

1246
00:48:06,680 --> 00:48:09,680
You're just deferring those costs until they become catastrophic.

1247
00:48:09,680 --> 00:48:13,480
The honest assessment is that getting this right requires a massive investment.

1248
00:48:13,480 --> 00:48:17,680
But that investment is always proportional to the risk you're trying to manage.

1249
00:48:17,680 --> 00:48:19,880
The ecosystem for permission-aware RAG.

1250
00:48:19,880 --> 00:48:24,280
Now we can move from the abstract architecture to the actual tools you'll use to build this.

1251
00:48:24,280 --> 00:48:28,280
The honest truth is that no single vendor has solved this end-to-end yet.

1252
00:48:28,280 --> 00:48:31,280
You're going to have to stitch together multiple pieces to make it work.

1253
00:48:31,280 --> 00:48:35,880
If you're already in the Microsoft 365 ecosystem, you should probably start with Azure AI Search.

1254
00:48:35,880 --> 00:48:39,680
It has native support for pulling SharePoint permissions during the ingestion process.

1255
00:48:39,680 --> 00:48:42,680
There is a specific setting called indexerPermissionOptions

1256
00:48:42,680 --> 00:48:46,880
that tells the system to grab user and group IDs directly from the documents.

1257
00:48:46,880 --> 00:48:49,880
Those IDs become metadata fields in your index,

1258
00:48:49,880 --> 00:48:52,280
which allows you to add filters to your queries.

1259
00:48:52,280 --> 00:48:56,280
This ensures the results only show documents the current user is actually allowed to see.

1260
00:48:56,280 --> 00:49:00,280
It's a clean approach because the tool was built for this specific permission model.
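At query time, the classic security-trimming pattern in Azure AI Search is an OData `$filter` using `search.in` over a filterable collection of principal IDs. The sketch below only builds that filter string. It assumes hypothetical `Collection(Edm.String)` fields named `user_ids` and `group_ids`, and it uses `|` as an explicit delimiter. Newer native permission-filtering features may apply trimming from the user's token instead, so check the current Azure AI Search documentation before hand-building filters.

```python
from typing import Iterable


def _in_list(values: Iterable[str]) -> str:
    """Join IDs with '|' for search.in and escape single quotes per OData rules."""
    items = list(values)
    for v in items:
        if "|" in v:
            raise ValueError(f"ID {v!r} contains the '|' delimiter")
    return "|".join(v.replace("'", "''") for v in items)


def security_filter(user_id: str, group_ids: Iterable[str],
                    user_field: str = "user_ids", group_field: str = "group_ids") -> str:
    """OData filter keeping only documents shared with the user or one of their groups."""
    clauses = [f"{user_field}/any(u: search.in(u, '{_in_list([user_id])}', '|'))"]
    groups = list(group_ids)
    if groups:
        clauses.append(f"{group_field}/any(g: search.in(g, '{_in_list(groups)}', '|'))")
    return " or ".join(clauses)


print(security_filter("user-1", ["grp-finance", "grp-all-staff"]))
# user_ids/any(u: search.in(u, 'user-1', '|')) or group_ids/any(g: search.in(g, 'grp-finance|grp-all-staff', '|'))
```

Filtering inside the search engine, rather than after results come back, also means the top results are drawn only from documents the user may see.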

1261
00:49:00,280 --> 00:49:03,480
Microsoft Graph is the engine that makes that whole process possible.

1262
00:49:03,480 --> 00:49:06,880
Graph is where you pull the permission metadata from in the first place,

1263
00:49:06,880 --> 00:49:10,480
and it's the primary way to see which users have access to which files.

1264
00:49:10,480 --> 00:49:14,680
It's also the system you call to check group memberships when a user submits a query.

1265
00:49:14,680 --> 00:49:18,680
Your agent authenticates through Microsoft Entra ID, and the group info can come back as claims in the token.

1266
00:49:18,680 --> 00:49:23,280
The whole architecture hinges on Graph working correctly and your organization using it consistently.

1267
00:49:23,280 --> 00:49:25,080
When it comes to the policy decision point,

1268
00:49:25,080 --> 00:49:27,480
the part that decides if a user can see a file,

1269
00:49:27,480 --> 00:49:29,880
OpenFGA and Cerbos are widely used options.

1270
00:49:29,880 --> 00:49:34,480
These are fine-grained authorization services where you write rules in a dedicated policy language.

1271
00:49:34,480 --> 00:49:37,280
You ask the service if user A can see document B,

1272
00:49:37,280 --> 00:49:39,080
and it gives you a simple yes or no.

1273
00:49:39,080 --> 00:49:41,480
These tools aren't tied to any specific cloud provider,

1274
00:49:41,480 --> 00:49:43,880
so they're great if you're working in a multi-cloud environment.
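The "can user A see document B?" question can be illustrated with relationship tuples, the data shape OpenFGA-style systems store. This is an in-memory teaching sketch, not the OpenFGA SDK or API: a real OpenFGA store evaluates a typed authorization model, while this only handles direct tuples and group usersets. The user, group, and record names mirror the hospital example later in the episode and are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

# (user, relation, object): the tuple shape OpenFGA-style systems store.
RelationTuple = Tuple[str, str, str]


class TupleStore:
    """In-memory relationship check; a teaching sketch, not the OpenFGA SDK or API."""

    def __init__(self, tuples: Iterable[RelationTuple]) -> None:
        self.tuples: FrozenSet[RelationTuple] = frozenset(tuples)

    def check(self, user: str, relation: str, obj: str, depth: int = 0) -> bool:
        if depth > 8:  # guard against cyclic group definitions
            return False
        if (user, relation, obj) in self.tuples:
            return True
        # Usersets such as "group:hospital-a-doctors#member" grant the relation
        # to everyone who holds the named relation on that group.
        for subject, rel, target in self.tuples:
            if rel == relation and target == obj and "#" in subject:
                group, group_relation = subject.split("#", 1)
                if self.check(user, group_relation, group, depth + 1):
                    return True
        return False


store = TupleStore({
    ("user:dr_lee", "member", "group:hospital-a-doctors"),
    ("user:dr_kim", "member", "group:hospital-b-doctors"),
    ("group:hospital-a-doctors#member", "viewer", "record:patient-001"),
})
print(store.check("user:dr_lee", "viewer", "record:patient-001"))  # True
print(store.check("user:dr_kim", "viewer", "record:patient-001"))  # False
```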

1275
00:49:43,880 --> 00:49:45,280
For the actual agent orchestration,

1276
00:49:45,280 --> 00:49:47,880
LangGraph is the framework that's currently leading the way.

1277
00:49:47,880 --> 00:49:52,480
It supports multi-agent workflows where identity is passed explicitly from one step to the next.

1278
00:49:52,480 --> 00:49:54,880
You can build that five agent model we talked about

1279
00:49:54,880 --> 00:49:57,080
and attach permission checks to every single hop.

1280
00:49:57,080 --> 00:50:00,680
LangGraph structures the plumbing, such as where token exchange and credential generation happen,

1281
00:50:00,680 --> 00:50:03,080
which lets you build the system systematically.

1282
00:50:03,080 --> 00:50:06,480
If you need your index to stay in sync with permission changes in real time,

1283
00:50:06,480 --> 00:50:07,880
you should look at Pathway.

1284
00:50:07,880 --> 00:50:10,280
Most systems rely on scheduled re-indexes,

1285
00:50:10,280 --> 00:50:12,480
but Pathway handles continuous synchronization.

1286
00:50:12,480 --> 00:50:16,480
This is vital in environments where permissions change every few minutes

1287
00:50:16,480 --> 00:50:19,880
and you can't afford to show someone a document they just lost access to.
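The difference between scheduled re-indexing and continuous sync is when a revocation takes effect. The sketch below is not Pathway's API; it is plain Python showing an ACL side-table updated per change event, with a hypothetical `PermissionChange` record standing in for whatever change feed your source system provides.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class PermissionChange:
    doc_id: str
    principal: str
    granted: bool  # False means the principal just lost access


class LiveAcl:
    """ACL side-table updated per change event instead of per nightly re-index."""

    def __init__(self) -> None:
        self._acl: Dict[str, Set[str]] = {}

    def apply(self, changes: Iterable[PermissionChange]) -> None:
        for change in changes:
            principals = self._acl.setdefault(change.doc_id, set())
            if change.granted:
                principals.add(change.principal)
            else:
                principals.discard(change.principal)

    def can_see(self, doc_id: str, principals: Set[str]) -> bool:
        return bool(self._acl.get(doc_id, set()) & principals)


acl = LiveAcl()
acl.apply([PermissionChange("doc-42", "group:deal-team", True)])
print(acl.can_see("doc-42", {"group:deal-team"}))  # True
acl.apply([PermissionChange("doc-42", "group:deal-team", False)])  # revoked minutes later
print(acl.can_see("doc-42", {"group:deal-team"}))  # False
```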

1288
00:50:19,880 --> 00:50:23,880
Vector databases themselves have very different levels of support for this kind of work.

1289
00:50:23,880 --> 00:50:27,880
Weaviate, Pinecone, and Qdrant all allow for metadata filtering,

1290
00:50:27,880 --> 00:50:30,080
but they differ in how they enforce those rules.

1291
00:50:30,080 --> 00:50:32,480
Some let you configure access at the collection level

1292
00:50:32,480 --> 00:50:36,280
while others force you to handle the entire authorization model in your application code.

1293
00:50:36,280 --> 00:50:40,080
You need to know exactly how your database handles these controls before you commit to it.

1294
00:50:40,080 --> 00:50:44,480
The standard pattern for most organizations is using Azure AI Search and Microsoft Graph

1295
00:50:44,480 --> 00:50:46,080
for their internal documents.

1296
00:50:46,080 --> 00:50:49,280
The integration is clean because you're staying within one ecosystem,

1297
00:50:49,280 --> 00:50:52,880
but for companies that use multiple clouds or non-Microsoft data sources,

1298
00:50:52,880 --> 00:50:55,680
the pattern shifts toward using OpenFGA for policies

1299
00:50:55,680 --> 00:50:58,880
and a custom pipeline to pull permissions from various APIs.

1300
00:50:58,880 --> 00:51:02,080
Think about a healthcare organization that needs to index patient records

1301
00:51:02,080 --> 00:51:03,280
from several different hospitals.

1302
00:51:03,280 --> 00:51:08,280
They use Azure AI Search with HIPAA-compliant settings where every record is encrypted at the document level.

1303
00:51:08,280 --> 00:51:12,680
Their access policies are written in OpenFGA to ensure doctors in hospital A

1304
00:51:12,680 --> 00:51:14,480
can only see their own patients.

1305
00:51:14,480 --> 00:51:17,080
When a query comes in, the system checks the user's role,

1306
00:51:17,080 --> 00:51:19,080
retrieves only the authorized documents,

1307
00:51:19,080 --> 00:51:20,880
and then passes them back to the agent.

1308
00:51:20,880 --> 00:51:22,280
That isn't one tool doing everything.

1309
00:51:22,280 --> 00:51:26,280
It's a collection of specialized tools integrated into a single coherent system.

1310
00:51:26,280 --> 00:51:31,280
The big gap that nobody really talks about is that no vendor offers a complete out-of-the-box solution for this.

1311
00:51:31,280 --> 00:51:34,680
You have to understand the architecture well enough to assemble these pieces yourself.

1312
00:51:34,680 --> 00:51:36,880
Picking the tools is actually the easy part.

1313
00:51:36,880 --> 00:51:40,280
Making them work together securely is where the real work happens.

1314
00:51:40,280 --> 00:51:42,880
The 40% prediction and what comes after.

1315
00:51:42,880 --> 00:51:46,880
Gartner recently released a prediction for 2027 that carries a lot of weight

1316
00:51:46,880 --> 00:51:48,280
because of how specific it is.

1317
00:51:48,280 --> 00:51:54,080
They project that 40% of AI data breaches will stem from cross-border GenAI misuse.

1318
00:51:54,080 --> 00:51:56,680
This isn't about a single system being configured incorrectly

1319
00:51:56,680 --> 00:51:58,280
and it isn't a one-off accident.

1320
00:51:58,280 --> 00:52:02,080
When 40% of your risk comes from one place, it's no longer a rare event.

1321
00:52:02,080 --> 00:52:07,280
It becomes the primary way your company will likely experience a breach by the end of this decade.

1322
00:52:07,280 --> 00:52:11,280
But what does cross-border GenAI misuse actually look like in practice?

1323
00:52:11,280 --> 00:52:14,480
It happens when organizations move sensitive data into AI services

1324
00:52:14,480 --> 00:52:16,680
without knowing where that data is being processed.

1325
00:52:16,680 --> 00:52:20,480
They fail to maintain the permission models that protected the data at the source

1326
00:52:20,480 --> 00:52:23,480
and they completely ignore the legal frameworks that apply

1327
00:52:23,480 --> 00:52:25,480
once that data crosses a national border.

1328
00:52:25,480 --> 00:52:28,680
If we translate that into plain English, the scenario is much simpler.

1329
00:52:28,680 --> 00:52:32,480
Imagine a company in the EU has data governed by strict GDPR rules.

1330
00:52:32,480 --> 00:52:37,280
That data has a clear permission model where only specific groups of people can see certain documents.

1331
00:52:37,280 --> 00:52:40,680
The company decides to use a third-party AI service for customer support

1332
00:52:40,680 --> 00:52:43,680
so the team starts uploading customer records to that system.

1333
00:52:43,680 --> 00:52:48,480
Now that data is sitting in the US or Singapore or wherever that vendor happens to keep its servers.

1334
00:52:48,480 --> 00:52:51,680
The permission model from the original system didn't travel with the data

1335
00:52:51,680 --> 00:52:54,280
and the EU's legal protections didn't follow it either.

1336
00:52:54,280 --> 00:52:57,080
You have moved sensitive information across a legal boundary

1337
00:52:57,080 --> 00:52:59,080
and lost all continuity of control.

1338
00:52:59,080 --> 00:53:01,480
Let's look at a concrete case to see how this breaks.

1339
00:53:01,480 --> 00:53:04,880
A European bank starts using a US-hosted version of ChatGPT

1340
00:53:04,880 --> 00:53:07,480
to help their support agents draft responses to customers.

1341
00:53:07,480 --> 00:53:10,280
The agents begin pasting customer queries into the chat box

1342
00:53:10,280 --> 00:53:14,280
which often include account numbers, transaction histories, and total balances.

1343
00:53:14,280 --> 00:53:18,280
The vendor's terms of service clearly state they will use your inputs to train their models

1344
00:53:18,280 --> 00:53:21,280
so your private data is now training data for a foreign company.

1345
00:53:21,280 --> 00:53:26,080
Because it's being processed on US infrastructure, it's now subject to US legal discovery

1346
00:53:26,080 --> 00:53:27,280
and intelligence requests.

1347
00:53:27,280 --> 00:53:30,280
Customer data that was supposed to be protected by European law

1348
00:53:30,280 --> 00:53:33,680
is now sitting in a system the EU has no way to monitor.

1349
00:53:33,680 --> 00:53:37,880
That isn't just a small mistake. It is a compliance nightmare that could cost millions of dollars.

1350
00:53:37,880 --> 00:53:42,680
The root cause here is structural because GenAI adoption is moving much faster than governance can keep up.

1351
00:53:42,680 --> 00:53:44,680
Teams are desperate for productivity gains

1352
00:53:44,680 --> 00:53:47,080
so they deploy these systems as fast as possible.

1353
00:53:47,080 --> 00:53:50,080
While security and compliance teams try to build new frameworks,

1354
00:53:50,080 --> 00:53:54,080
three more use cases have already launched before the first policy is even finished.

1355
00:53:54,080 --> 00:53:57,280
The people in charge of safety are essentially chasing shadows.

1356
00:53:57,280 --> 00:54:00,080
Gartner has another prediction that makes this even more dangerous.

1357
00:54:00,080 --> 00:54:05,680
They expect 75% of employees will be using unapproved AI tools by 2026.

1358
00:54:05,680 --> 00:54:08,280
This shadow AI isn't just a niche problem anymore.

1359
00:54:08,280 --> 00:54:10,280
It has become the standard way people work.

1360
00:54:10,280 --> 00:54:14,480
Most of these shadow systems were never reviewed by a security team or checked for compliance.

1361
00:54:14,480 --> 00:54:18,880
They were built by employees who just wanted to solve a problem and get their work done faster.

1362
00:54:18,880 --> 00:54:21,080
This reality cuts both ways for a business.

1363
00:54:21,080 --> 00:54:25,880
You can't just ban AI because the productivity gains and competitive pressures are too high to ignore.

1364
00:54:25,880 --> 00:54:28,680
If you tell your employees they aren't allowed to use these tools,

1365
00:54:28,680 --> 00:54:31,080
they will just do it anyway, without any oversight.

1366
00:54:31,080 --> 00:54:33,280
The game has changed from prevention to governance.

1367
00:54:33,280 --> 00:54:36,880
Instead of saying no, you have to learn how to say yes under very specific conditions.

1368
00:54:36,880 --> 00:54:39,880
Permission-aware RAG is a major part of that governance strategy.

1369
00:54:39,880 --> 00:54:42,880
It gives you a way to let people use AI to improve their work

1370
00:54:42,880 --> 00:54:45,480
while the system itself enforces access controls.

1371
00:54:45,480 --> 00:54:50,680
Your data won't leak outside of its authorized scope because the permission model from your source systems is preserved.

1372
00:54:50,680 --> 00:54:54,480
Every query is logged, every result is filtered and the boundaries stay intact.

1373
00:54:54,480 --> 00:54:56,880
The global regulatory response is already hardening.

1374
00:54:56,880 --> 00:55:02,080
By 2027 AI governance will be a mandatory part of sovereign laws all over the world.

1375
00:55:02,080 --> 00:55:05,480
The EU's AI Act already has strict rules for high-risk systems

1376
00:55:05,480 --> 00:55:08,480
while China and India are building their own residency frameworks.

1377
00:55:08,480 --> 00:55:12,680
Your organization is going to face these requirements in multiple countries at the same time.

1378
00:55:12,680 --> 00:55:15,680
This isn't a problem for the distant future, it is happening right now.

1379
00:55:15,680 --> 00:55:18,680
The real impact is that companies ignoring this will face massive fines,

1380
00:55:18,680 --> 00:55:21,680
lawsuits and mandatory breach notifications.

1381
00:55:21,680 --> 00:55:24,680
A bank that loses data through an unsecured AI system

1382
00:55:24,680 --> 00:55:27,680
might be hit by both data protection laws and financial regulations.

1383
00:55:27,680 --> 00:55:30,680
Healthcare providers could face HIPAA penalties

1384
00:55:30,680 --> 00:55:33,680
and public agencies might see their contracts cancelled entirely.

1385
00:55:33,680 --> 00:55:36,680
But there is an opportunity here if you flip the perspective.

1386
00:55:36,680 --> 00:55:39,680
Organizations that set up permission-aware RAG correctly

1387
00:55:39,680 --> 00:55:43,280
will have a massive advantage in regulated industries like finance and healthcare.

1388
00:55:43,280 --> 00:55:46,080
These sectors need AI but they have to use it safely.

1389
00:55:46,080 --> 00:55:50,480
The companies that solve the safety problem first will become the only trusted partners in the room

1390
00:55:50,480 --> 00:55:53,080
while everyone else will just look reckless.

1391
00:55:53,080 --> 00:55:55,480
Have you built a system or a vulnerability?

1392
00:55:55,480 --> 00:55:59,480
There is one specific question that will haunt your organization if you get this wrong.

1393
00:55:59,480 --> 00:56:03,280
If you take every document from your SharePoint and build a technically perfect RAG system

1394
00:56:03,280 --> 00:56:06,680
that is fast and accurate, but you lose the permission model along the way,

1395
00:56:06,680 --> 00:56:09,680
what have you actually built? Have you created a powerful knowledge system?

1396
00:56:09,680 --> 00:56:13,680
Or have you just constructed the biggest insider data leak in the history of your company?

1397
00:56:13,680 --> 00:56:16,680
For most organizations, the uncomfortable answer is the second one.

1398
00:56:16,680 --> 00:56:19,680
Let's look at how this plays out in a real world scenario.

1399
00:56:19,680 --> 00:56:24,680
Imagine a company with 100,000 employees and 10 million documents stored in SharePoint.

1400
00:56:24,680 --> 00:56:28,680
Every single one of those documents has a permission model attached to it.

1401
00:56:28,680 --> 00:56:33,680
Some are for everyone, some are for specific departments, and some are locked away for executives or auditors.

1402
00:56:33,680 --> 00:56:35,680
SharePoint enforces these rules every single day.

1403
00:56:35,680 --> 00:56:39,680
And it works perfectly. Nobody ever sees a file they aren't supposed to see.

1404
00:56:39,680 --> 00:56:43,680
Then the organization decides to build a RAG system to unlock the value of all that data.

1405
00:56:43,680 --> 00:56:47,680
They index all 10 million documents, turn them into embeddings, and store them in a vector database.

1406
00:56:47,680 --> 00:56:49,680
The system is incredibly fast.

1407
00:56:49,680 --> 00:56:52,680
A junior employee can ask a complex question and get a perfect answer in two seconds.

1408
00:56:52,680 --> 00:56:56,680
The retrieval is flawless, but the permissions have become completely irrelevant.

1409
00:56:56,680 --> 00:57:01,680
The trouble starts when that junior employee asks the system about the future strategy of the company.

1410
00:57:01,680 --> 00:57:06,680
The RAG system finds the most relevant chunks of information, which happen to come from private executive strategy documents.

1411
00:57:06,680 --> 00:57:12,680
Because the permission model was stripped away during the ingestion process, the vector database has no idea that this information is confidential.

1412
00:57:12,680 --> 00:57:14,680
It serves the answer up anyway.

1413
00:57:14,680 --> 00:57:19,680
Now that junior employee is reading about unreleased products, acquisition targets and upcoming layoffs.

1414
00:57:19,680 --> 00:57:22,680
This was information that was never supposed to cross that boundary.

1415
00:57:22,680 --> 00:57:26,680
The legal reality here is very clear. This isn't just a minor technical flaw.

1416
00:57:26,680 --> 00:57:30,680
It is a form of negligence. You had a working permission model and you chose to throw it away.

1417
00:57:30,680 --> 00:57:34,680
You built a system that you knew would leak secrets if the wrong person asked the right question.

1418
00:57:34,680 --> 00:57:39,680
When this eventually comes to light, regulators are going to ask what you did to prevent it.

1419
00:57:39,680 --> 00:57:43,680
Telling them you prioritized speed over security will not be an acceptable answer.

1420
00:57:43,680 --> 00:57:45,680
The fallout usually follows a very predictable pattern.

1421
00:57:45,680 --> 00:57:51,680
Someone discovers the leak when a contractor mentions a secret they shouldn't know, or a compliance auditor flags an unusual query.

1422
00:57:51,680 --> 00:57:54,680
Once the leak is found, the response is immediate and painful.

1423
00:57:54,680 --> 00:57:58,680
The system gets shut down instantly. Every single query has to be audited to see who saw what.

1424
00:57:58,680 --> 00:58:03,680
And external legal counsel has to get involved. The damage to your reputation is often permanent.

1425
00:58:03,680 --> 00:58:07,680
"We built an AI that leaked our secrets" becomes the only thing people remember about your company.

1426
00:58:07,680 --> 00:58:12,680
Investors and customers will start to wonder if you are competent enough to handle their data at all.

1427
00:58:12,680 --> 00:58:17,680
This damage goes far beyond one system. It destroys the trust people have in your entire approach to technology.

1428
00:58:17,680 --> 00:58:21,680
The alternative is to just build it the right way from the start.

1429
00:58:21,680 --> 00:58:23,680
You have to materialize permissions at the moment you ingest data.

1430
00:58:23,680 --> 00:58:28,680
You need to capture the access lists from SharePoint and encode them directly into your metadata.

1431
00:58:28,680 --> 00:58:33,680
This allows you to filter results at the moment of the query and monitor the system for any strange behavior.
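To make "materialize permissions at ingestion, filter at query time" concrete, here is a minimal Python sketch. It assumes a connector has already resolved each SharePoint file's permissions into flat user and group IDs (fetching them is not shown), and it uses toy two-dimensional embeddings with an in-memory list standing in for a real vector database. `PermissionAwareIndex`, `ingest`, and `search` are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    embedding: tuple[float, ...]
    # User and group IDs copied from the source document's ACL at ingestion time.
    allowed_principals: frozenset[str]


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PermissionAwareIndex:
    """Hypothetical in-memory stand-in for a vector store that keeps ACLs next to embeddings."""

    def __init__(self) -> None:
        self._chunks: list[Chunk] = []

    def ingest(self, doc_id: str, text: str, embedding: list[float], acl: set[str]) -> None:
        # Refuse to index anything whose permissions were not captured.
        if not acl:
            raise ValueError(f"refusing to index {doc_id!r} without an ACL")
        self._chunks.append(Chunk(doc_id, text, tuple(embedding), frozenset(acl)))

    def search(self, query_embedding: list[float], user_principals: set[str], k: int = 3) -> list[Chunk]:
        # Trim by permission BEFORE ranking, so restricted chunks never reach the model.
        visible = [c for c in self._chunks if c.allowed_principals & user_principals]
        visible.sort(key=lambda c: cosine(query_embedding, c.embedding), reverse=True)
        return visible[:k]


# Usage: two documents with different ACLs, queried by two different users.
index = PermissionAwareIndex()
index.ingest("exec-strategy", "Acquisition targets for next year", [0.9, 0.1], {"grp:executives"})
index.ingest("handbook", "Holiday policy", [0.2, 0.8], {"grp:all-employees"})

junior = {"user:ana", "grp:all-employees"}
executive = {"user:ceo", "grp:executives", "grp:all-employees"}
print([c.doc_id for c in index.search([0.9, 0.1], junior)])     # ['handbook']
print([c.doc_id for c in index.search([0.9, 0.1], executive)])  # ['exec-strategy', 'handbook']
```

The junior employee asks the same strategy-shaped question as the executive but only ever sees the handbook chunk. Filtering before ranking is the deliberate choice here: trimming after a top-k cutoff can silently return fewer results, and any logging of the raw candidates would already contain restricted text.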

1432
00:58:33,680 --> 00:58:37,680
It will cost more money, it will take more time, and it will require expertise you might not have yet.

1433
00:58:37,680 --> 00:58:44,680
But the cost of doing it wrong is much higher. A CISO at a major bank told us that they spent $600,000 to build their permission-aware system correctly.

1434
00:58:44,680 --> 00:58:49,680
Their competitor tried to save money by spending only $200,000 on a basic RAG setup.

1435
00:58:49,680 --> 00:58:55,680
When that competitor suffered a major breach, they ended up spending $10 million on fines and incident response.

1436
00:58:55,680 --> 00:59:00,680
Investing early wasn't just a security choice. It was the only logical financial move.

1437
00:59:00,680 --> 00:59:03,680
That isn't a rare story. It is the pattern we see everywhere.

1438
00:59:03,680 --> 00:59:06,680
For CISOs, architects, and teams.

1439
00:59:06,680 --> 00:59:10,680
Let's bring this down to what it actually means for the people who have to build and operate this.

1440
00:59:10,680 --> 00:59:13,680
For a CISO, permission-aware RAG stops being optional

1441
00:59:13,680 --> 00:59:16,680
the moment you deploy any agentic system that touches internal data.

1442
00:59:16,680 --> 00:59:22,680
This isn't a recommendation or a best practice you can defer, but a hard compliance requirement that your organization must meet.

1443
00:59:22,680 --> 00:59:29,680
Your data governance framework exists specifically to enforce who can see what, and when you build a system that bypasses that framework,

1444
00:59:29,680 --> 00:59:40,680
you're creating a massive control failure. Regulatory frameworks like GDPR, HIPAA, and SOX all assume your organization maintains a coherent access control model across every system that touches regulated data.

1445
00:59:40,680 --> 00:59:47,680
A RAG system that ignores permissions violates that core assumption, which puts the entire company at risk.

1446
00:59:47,680 --> 00:59:52,680
The governance model has to shift because you can't review AI systems one at a time in isolation anymore.

1447
00:59:52,680 --> 00:59:58,680
You need an AI governance committee that includes security, privacy, legal, and IT leadership to oversee these projects.

1448
00:59:58,680 --> 01:00:02,680
Permission models must be reviewed before any RAG system goes into production.

1449
01:00:02,680 --> 01:00:05,680
And this has to happen as a requirement rather than an afterthought.

1450
01:00:05,680 --> 01:00:11,680
That committee needs to ask the hard questions about where permissions come from and how they stay in sync with the index.

1451
01:00:11,680 --> 01:00:17,680
They need to know if a policy engine is enforcing access at query time, and if you can audit every single retrieval to prove who saw what.
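One way to make "prove who saw what" concrete is to write one append-only record per retrieval. The sketch below is a hypothetical in-memory version; `RetrievalAuditLog`, its fields, and the JSON Lines export are assumptions for illustration, and a production system would write to durable, tamper-evident storage instead.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RetrievalRecord:
    timestamp: str
    user_id: str
    query: str
    returned_doc_ids: tuple[str, ...]
    denied_count: int  # candidates dropped by the permission filter


class RetrievalAuditLog:
    """Hypothetical append-only audit trail, kept in memory for illustration."""

    def __init__(self) -> None:
        self._records: list[RetrievalRecord] = []

    def record(self, user_id: str, query: str, returned_doc_ids: list[str], denied_count: int) -> RetrievalRecord:
        rec = RetrievalRecord(
            timestamp=datetime.now(timezone.utc).isoformat(),
            user_id=user_id,
            query=query,
            returned_doc_ids=tuple(returned_doc_ids),
            denied_count=denied_count,
        )
        self._records.append(rec)
        return rec

    def who_saw(self, doc_id: str) -> list[str]:
        # Answers "who saw what" for a single document.
        return sorted({r.user_id for r in self._records if doc_id in r.returned_doc_ids})

    def export_jsonl(self) -> str:
        # One JSON object per line, e.g. for shipping to a SIEM.
        return "\n".join(json.dumps(asdict(r)) for r in self._records)


log = RetrievalAuditLog()
log.record("user:ana", "company strategy", ["handbook"], denied_count=2)
log.record("user:ceo", "company strategy", ["exec-strategy", "handbook"], denied_count=0)
print(log.who_saw("exec-strategy"))  # ['user:ceo']
```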

1452
01:00:17,680 --> 01:00:22,680
If the answer to any of those questions is that you'll figure it out later, the system simply does not go live.

1453
01:00:22,680 --> 01:00:28,680
A CISO at a major bank put it directly when he said they treat RAG permission trimming the same way they treat database access control.

1454
01:00:28,680 --> 01:00:33,680
It's non-negotiable, and that is the bar every serious organization needs to hit.

1455
01:00:33,680 --> 01:00:37,680
For architects, this changes what a RAG architecture actually is at its core.

1456
01:00:37,680 --> 01:00:45,680
It's no longer just a retrieval system with a language model bolted on, but a security architecture that happens to use retrieval and language generation.

1457
01:00:45,680 --> 01:00:48,680
Your design must answer three specific questions to be viable.

1458
01:00:48,680 --> 01:00:53,680
First, you have to show how identity propagates from the user through every single agent in the chain.

1459
01:00:53,680 --> 01:00:59,680
Second, you must define where authorization is enforced and ensure it stays deterministic and outside the model itself.

1460
01:00:59,680 --> 01:01:06,680
Third, you need to separate responsibility between agents so that a compromise in one doesn't automatically grant access to all your data.
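A minimal sketch of those three answers in code, under stated assumptions: the end user's identity travels with every call as a `UserContext`, each agent carries its own narrow `AgentIdentity` scopes, and a plain `authorize` function (not the language model) makes the access decision. All of these names, and the `docs:read` scope string, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserContext:
    # End-user identity, propagated unchanged through every agent hop.
    user_id: str
    principals: frozenset[str]


@dataclass(frozen=True)
class AgentIdentity:
    name: str
    # Actions this agent may perform at all, independent of any user.
    scopes: frozenset[str]


class AccessDenied(Exception):
    pass


def authorize(user: UserContext, agent: AgentIdentity, action: str, resource_acl: frozenset[str]) -> None:
    """Deterministic check in ordinary code, outside the model.

    Access needs BOTH the agent's scope and the user's own rights, so the
    effective permission is the intersection of the two.
    """
    if action not in agent.scopes:
        raise AccessDenied(f"{agent.name} lacks scope {action!r}")
    if not (user.principals & resource_acl):
        raise AccessDenied(f"{user.user_id} is not on the resource ACL")


def read_document(user: UserContext, agent: AgentIdentity, doc_acl: frozenset[str], doc_text: str) -> str:
    # Every tool call re-checks access with the ORIGINAL user's identity.
    authorize(user, agent, "docs:read", doc_acl)
    return doc_text


ana = UserContext("user:ana", frozenset({"user:ana", "grp:all-employees"}))
retriever = AgentIdentity("retriever", frozenset({"docs:read"}))
summarizer = AgentIdentity("summarizer", frozenset())  # only works on text it is handed

handbook_acl = frozenset({"grp:all-employees"})
strategy_acl = frozenset({"grp:executives"})

print(read_document(ana, retriever, handbook_acl, "Holiday policy"))  # Holiday policy
for agent, acl in [(retriever, strategy_acl), (summarizer, handbook_acl)]:
    try:
        read_document(ana, agent, acl, "restricted text")
    except AccessDenied as err:
        print("denied:", err)
```

Because the summarizer holds no read scope, a prompt injection that takes it over still cannot pull documents on its own; it can only see what the retriever was allowed to hand it for this specific user.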

1461
01:01:06,680 --> 01:01:10,680
Every RAG system must answer those questions before a design review even begins.

1462
01:01:10,680 --> 01:01:17,680
The architecture diagram shouldn't just show how data flows; it needs to show exactly how identity and authorization boundaries work.

1463
01:01:17,680 --> 01:01:25,680
It has to identify which components are trusted to make access decisions and which are not, which is fundamentally different from how most RAG systems are designed today.

1464
01:01:25,680 --> 01:01:31,680
For engineers, the skill set changes entirely because you can't build permission-aware RAG with just ML engineers and database experts.

1465
01:01:31,680 --> 01:01:37,680
You need engineers who deeply understand identity systems like OAuth, SAML, and token exchange to manage credentials properly.

1466
01:01:37,680 --> 01:01:43,680
You need people who can work with complex authorization frameworks and fine-grained access control without breaking the system.

1467
01:01:43,680 --> 01:01:51,680
This isn't something you learn from a quick tutorial, but something you pick up by working in high-stakes environments like banking or health care, where failures have real consequences.

1468
01:01:51,680 --> 01:01:57,680
For product teams, permission-aware retrieval stops being a security concern and actually becomes a competitive feature.

1469
01:01:57,680 --> 01:02:03,680
It's not a bug fix or a technical debt item, but a specific capability that customers will start asking for during the sales process.

1470
01:02:03,680 --> 01:02:10,680
Questions about whether contractors can see sensitive documents or if junior employees are being overexposed will become standard during customer due diligence.

1471
01:02:10,680 --> 01:02:19,680
Teams that build this capability as a first-class feature will have a massive market advantage, while teams that treat it as an afterthought will lose deals to competitors who got it right.

1472
01:02:19,680 --> 01:02:25,680
For organizations, this is a strategic decision about whether you want to deploy RAG safely or become a cautionary tale for the rest of the industry.

1473
01:02:25,680 --> 01:02:31,680
Organizations that implement permission-aware RAG correctly will have a huge advantage in regulated sectors like finance, health care, and government.

1474
01:02:31,680 --> 01:02:37,680
These sectors need the productivity gains that AI offers, but they absolutely must have them safely.

1475
01:02:37,680 --> 01:02:46,680
The vendors who solve this problem first will own those markets, while everyone else will spend the next three years trying to retrofit security onto systems that were never built for it.

1476
01:02:46,680 --> 01:02:47,680
That's the reality.

1477
01:02:47,680 --> 01:02:50,680
Why this matters beyond security.

1478
01:02:50,680 --> 01:02:53,680
Permission models aren't just a security constraint you have to enforce.

1479
01:02:53,680 --> 01:02:56,680
They are business assets you can actually use to make your systems better.

1480
01:02:56,680 --> 01:03:03,680
Most organizations think about access control as friction or a compliance requirement that slows down productivity to reduce risk.

1481
01:03:03,680 --> 01:03:06,680
Permission-aware rag inverts that thinking entirely.

1482
01:03:06,680 --> 01:03:13,680
The permission model becomes the foundation for serving different audiences from the same knowledge base, which makes the system more valuable rather than more frustrating.

1483
01:03:13,680 --> 01:03:18,680
Start with a simple observation that different roles in your company need different types of data to do their jobs.

1484
01:03:18,680 --> 01:03:25,680
A sales team needs customer information and transaction history, while a legal team needs contracts and regulatory letters to stay compliant.

1485
01:03:25,680 --> 01:03:32,680
An executive team needs strategy and financial forecasts, and in a traditional setup you'd solve this by building three separate systems.

1486
01:03:32,680 --> 01:03:37,680
Sales would have their CRM, legal would have their document manager, and executives would have their dashboards.

1487
01:03:37,680 --> 01:03:41,680
That means three systems, three permission models, and three sets of redundant data to maintain.

1488
01:03:41,680 --> 01:03:47,680
RAG with permission enforcement changes this equation by allowing you to build one system with one index and one knowledge base.

1489
01:03:47,680 --> 01:03:54,680
The system is designed so that a sales rep querying it gets customer-focused results, while a lawyer gets contract precedents and an executive gets strategy.

1490
01:03:54,680 --> 01:03:59,680
It's the same underlying index, but the system provides different views and permission scopes for each person.

1491
01:03:59,680 --> 01:04:05,680
You end up with one system serving multiple audiences without ever overexposing sensitive information to the wrong people.

1492
01:04:05,680 --> 01:04:15,680
This unlocks a level of operational efficiency that basic RAG can't touch. You aren't maintaining three separate systems or duplicating data across multiple platforms, which makes the whole thing cheaper to operate.

1493
01:04:15,680 --> 01:04:22,680
You're managing one permission model instead of three, and that is much simpler to maintain and far more reliable as a single source of truth.

1494
01:04:22,680 --> 01:04:30,680
A financial services firm implemented this pattern by building a single RAG system to serve advisors, compliance officers, and executives from one knowledge base.

1495
01:04:30,680 --> 01:04:37,680
The advisors can query customer portfolios and transaction history, while the compliance officers look at regulatory files and policy documents.

1496
01:04:37,680 --> 01:04:44,680
The executives can access strategy documents and performance metrics, and because the system handles permission scopes, everyone sees exactly what they need.

1497
01:04:44,680 --> 01:04:48,680
Nobody sees what they don't, and the firm doesn't have to manage a dozen different AI silos.

1498
01:04:48,680 --> 01:04:55,680
Now zoom out for a moment. Organizations that understand their permission models deeply can answer questions they currently can't even ask.

1499
01:04:55,680 --> 01:05:00,680
When you ask what data only five people can see, it sounds like a security question, but it's actually a vital business question.

1500
01:05:00,680 --> 01:05:06,680
Maybe you have a critical process that only five people know about, which means you have a massive bottleneck in your operations.

1501
01:05:06,680 --> 01:05:13,680
Maybe your institutional knowledge is too concentrated in one spot, and permission metadata is the only thing that reveals these dangerous patterns.

1502
01:05:13,680 --> 01:05:17,680
The same logic applies when you look for documents that are overshared across the company.

1503
01:05:17,680 --> 01:05:22,680
You might think a document is confidential, but if your permission model says it's visible to 200 people, you have a problem.

1504
01:05:22,680 --> 01:05:29,680
Maybe the permissions are stale or were set incorrectly, but the question itself forces an audit of what's actually happening versus what you think is happening.
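Both questions ("what can only five people see?" and "which confidential documents are visible to 200 people?") fall out of the same permission metadata. The sketch below is a hypothetical report over already-exported ACLs; it assumes flat (non-nested) groups, a `label` field on each document, and thresholds that simply mirror the numbers used in the discussion.

```python
def effective_viewers(acl: set[str], group_members: dict[str, set[str]]) -> set[str]:
    # Expand group principals into individual users (assumes groups are not nested).
    users: set[str] = set()
    for principal in acl:
        if principal.startswith("grp:"):
            users |= group_members.get(principal, set())
        else:
            users.add(principal)
    return users


def permission_report(docs: dict, group_members: dict[str, set[str]],
                      narrow_max: int = 5, overshare_min: int = 200) -> dict:
    narrow, overshared = [], []
    for doc_id, meta in docs.items():
        n = len(effective_viewers(meta["acl"], group_members))
        if n <= narrow_max:
            narrow.append((doc_id, n))  # knowledge concentrated in very few people
        if meta.get("label") == "confidential" and n >= overshare_min:
            overshared.append((doc_id, n))  # labelled confidential, visible to many
    return {"narrow": sorted(narrow), "overshared": sorted(overshared)}


group_members = {
    "grp:all-employees": {f"user:{i}" for i in range(250)},
    "grp:ops-leads": {"user:1", "user:2", "user:3"},
}
docs = {
    "payroll-runbook": {"acl": {"grp:ops-leads"}, "label": "internal"},
    "board-memo": {"acl": {"grp:all-employees"}, "label": "confidential"},
    "handbook": {"acl": {"grp:all-employees"}, "label": "public"},
}
report = permission_report(docs, group_members)
print(report)  # {'narrow': [('payroll-runbook', 3)], 'overshared': [('board-memo', 250)]}
```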

1505
01:05:29,680 --> 01:05:35,680
Permission models also reveal your true organizational structure and where decision-making authority actually sits.

1506
01:05:35,680 --> 01:05:41,680
The pattern of who can access what exposes how information flows through your teams, and where the real power is concentrated.

1507
01:05:41,680 --> 01:05:48,680
This isn't about surveillance, but about pattern recognition that makes your organization's silos and collaboration points visible and actionable.

1508
01:05:48,680 --> 01:05:55,680
The innovation opportunities. This is where it gets really interesting: once you have permission-aware retrieval, you can build features that a basic RAG system could never handle.

1509
01:05:55,680 --> 01:06:00,680
You could show employees documents they don't have access to yet, but should request based on their specific role.

1510
01:06:00,680 --> 01:06:06,680
This turns the AI into a career development tool where junior employees can see what senior staff know and ask for access to learn.

1511
01:06:06,680 --> 01:06:12,680
You can even use it to find experts in the organization by looking at who has access to specific high-level information.

1512
01:06:12,680 --> 01:06:19,680
The system doesn't just retrieve documents anymore; it connects you to people and builds relationships based on how information is used.

1513
01:06:19,680 --> 01:06:26,680
Permission-aware RAG stops being a compliance checkbox at that point. It becomes the actual infrastructure for how your organization works.

1514
01:06:26,680 --> 01:06:31,680
Where this is heading. The future of this space isn't a mystery, and the direction is actually very clear.

1515
01:06:31,680 --> 01:06:35,680
Zero trust is going to become the baseline architecture for every agentic RAG system you build.

1516
01:06:35,680 --> 01:06:40,680
This isn't just a specialized pattern for high-risk environments or an option you implement when you finally have the budget.

1517
01:06:40,680 --> 01:06:47,680
It is the default and the minimum bar for entry. Looking at the evolution trajectory, most organizations are currently sitting at step one.

1518
01:06:47,680 --> 01:06:52,680
They are building single-agent RAG with almost no access control because they want to move fast and deliver value quickly.

1519
01:06:52,680 --> 01:06:54,680
In this stage, permission models aren't even on the radar.

1520
01:06:54,680 --> 01:07:01,680
But by 2026, we are going to see a massive migration wave as these organizations realize they have a serious problem.

1521
01:07:01,680 --> 01:07:07,680
They will start upgrading to multi-agent RAG with basic access control, where permissions are tracked and identity begins to propagate.

1522
01:07:07,680 --> 01:07:11,680
It won't be perfect, but it will be a significant step up from where they are now.

1523
01:07:11,680 --> 01:07:17,680
By 2028, the leaders who got this right will be running fully zero-trust agentic RAG with continuous verification.

1524
01:07:17,680 --> 01:07:22,680
Every single hop will be authenticated, every access decision will be logged, and every anomaly will be detected.
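As a small illustration of "every anomaly will be detected", here is one hypothetical heuristic: flag a user whose queries keep hitting documents the permission filter had to drop, within a sliding time window. `DenialSpikeMonitor`, the window, and the threshold are assumptions; real detection would combine many signals, not just this one.

```python
from collections import defaultdict, deque


class DenialSpikeMonitor:
    """Hypothetical heuristic: many denied candidates in a short window may
    indicate someone probing the index for restricted content."""

    def __init__(self, window_seconds: float = 3600, threshold: int = 20) -> None:
        self.window = window_seconds
        self.threshold = threshold
        # user_id -> deque of (timestamp, denied_count), oldest first
        self._events: dict[str, deque] = defaultdict(deque)

    def observe(self, user_id: str, denied_count: int, ts: float) -> bool:
        events = self._events[user_id]
        events.append((ts, denied_count))
        # Drop events that have fallen out of the sliding window.
        while events and events[0][0] <= ts - self.window:
            events.popleft()
        return sum(d for _, d in events) >= self.threshold


monitor = DenialSpikeMonitor(window_seconds=60, threshold=10)
print(monitor.observe("user:ana", 3, ts=0))   # False
print(monitor.observe("user:ana", 4, ts=30))  # False
print(monitor.observe("user:ana", 5, ts=50))  # True
```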

1525
01:07:22,680 --> 01:07:28,680
That is the final endpoint. The organizations that refuse to make this transition are going to hit regulatory pressure first.

1526
01:07:28,680 --> 01:07:36,680
Compliance frameworks will eventually mandate these security measures and privacy regulators will look at your agentic systems with very specific questions.

1527
01:07:36,680 --> 01:07:41,680
They will demand that you prove your agents aren't accessing data they shouldn't be touching.

1528
01:07:41,680 --> 01:07:43,680
If you can't provide that proof, you are non-compliant.

1529
01:07:43,680 --> 01:07:48,680
This isn't just a friendly recommendation from an auditor. It is a legal requirement that carries real weight.

1530
01:07:48,680 --> 01:07:52,680
Once the regulators move in, market disadvantage follows right behind them.

1531
01:07:52,680 --> 01:07:59,680
Customers in regulated industries like finance, healthcare and government will start asking their vendors if their agentic systems enforce strict access control.

1532
01:07:59,680 --> 01:08:01,680
They will want to see the proof.

1533
01:08:01,680 --> 01:08:07,680
The vendors who can answer yes will win those deals, while the vendors who say they are still working on it will simply lose.

1534
01:08:07,680 --> 01:08:12,680
The market has a way of sorting itself out like that. On the technical side, identity systems are moving fast.

1535
01:08:12,680 --> 01:08:18,680
Workload identity is becoming standard infrastructure, which means tokens are getting shorter life spans and much narrower scopes.

1536
01:08:18,680 --> 01:08:22,680
Fine-grained authorization is moving out of specialized services and straight into core platforms.

1537
01:08:22,680 --> 01:08:28,680
By 2028, you will expect your cloud provider to natively support per-agent identity and per-request authorization.

1538
01:08:28,680 --> 01:08:33,680
This won't be a custom integration you have to build yourself. It will be a built-in platform primitive that just works.

1539
01:08:33,680 --> 01:08:36,680
AI governance committees will become a standard part of the enterprise.

1540
01:08:36,680 --> 01:08:40,680
They won't be optional anymore. They will be just as common as cybersecurity governance is today.

1541
01:08:40,680 --> 01:08:46,680
Security reviews for AI systems will become as rigorous as any infrastructure deployment review you've ever been through.

1542
01:08:46,680 --> 01:08:51,680
You won't be allowed to ship an agent without showing that it has the proper authorization controls in place.

1543
01:08:51,680 --> 01:08:56,680
These standards will start with the most security-conscious organizations and then they will become standard industry practice.

1544
01:08:56,680 --> 01:09:01,680
Eventually, the regulators will just bake them into the official requirements. The market itself is shifting its focus.

1545
01:09:01,680 --> 01:09:08,680
Right now, vendors are competing on speed and accuracy by claiming their RAG system is faster or their agent is more capable.

1546
01:09:08,680 --> 01:09:12,680
By 2028, that differentiation shifts towards security and governance.

1547
01:09:12,680 --> 01:09:18,680
Companies will win by proving their system prevents cross-tenant leakage or by showing they enforce fine-grained authorization.

1548
01:09:18,680 --> 01:09:23,680
Building on zero-trust principles won't be a secondary feature. It will be the primary value proposition.

1549
01:09:23,680 --> 01:09:28,680
This timeline matters because you need to move right now. If you start building permission-aware RAG today,

1550
01:09:28,680 --> 01:09:35,680
you will have a mature system ready by 2026. But if you wait until the regulations force your hand, you are already two years behind the curve.

1551
01:09:35,680 --> 01:09:41,680
You'll be stuck retrofitting security onto legacy systems and fighting technical debt while trying to hit a compliance deadline.

1552
01:09:41,680 --> 01:09:47,680
The organizations that understand this shift and invest properly will be the leaders by 2027 and 2028.

1553
01:09:47,680 --> 01:09:52,680
Everyone else will just be trying to catch up. None of this is theoretical because it is already happening.

1554
01:09:52,680 --> 01:09:59,680
Early adopters are reporting success, mid-market organizations are planning their migrations and vendors are building much better tooling.

1555
01:09:59,680 --> 01:10:06,680
The momentum is visible to anyone looking. By the time the regulations officially mandate these changes in 2027, this will already be the industry standard.

1556
01:10:06,680 --> 01:10:08,680
The one question that changes everything.

1557
01:10:08,680 --> 01:10:12,680
Let's go back to that listener comment that started this entire conversation.

1558
01:10:12,680 --> 01:10:16,680
Are you carrying each file's permissions into the index and filtering per user at query time?

1559
01:10:16,680 --> 01:10:23,680
That one question is the divider. It separates the systems that are actually secure from the ones that just look secure on the surface.

1560
01:10:23,680 --> 01:10:27,680
It marks the difference between a system that survives a regulatory audit and one that fails it completely.

1561
01:10:27,680 --> 01:10:31,680
It is the line between protecting your data and exposing it to the world.

1562
01:10:31,680 --> 01:10:34,680
This entire episode has been about finding the answer to that question.

1563
01:10:34,680 --> 01:10:39,680
If your answer is no and you're building RAG systems without permission-aware retrieval, then you have a lot of work to do.

1564
01:10:39,680 --> 01:10:45,680
Data sovereignty without security is just an illusion. Where your data actually sits matters much less than who is allowed to access it.

1565
01:10:45,680 --> 01:10:50,680
Permission-aware RAG isn't an optional add-on. It is the foundation of the whole system.

1566
01:10:50,680 --> 01:10:52,680
Start asking this question about your own systems today.

1567
01:10:52,680 --> 01:10:55,680
If this changed how you think, follow me, Mirko Peters, on LinkedIn for more.

1568
01:10:55,680 --> 01:11:02,680
Subscribe to M365FM for more deep dives on systems that actually work and leave a review to help others find this.

1569
01:11:02,680 --> 01:11:06,680
Share this with your team, especially if you're dealing with these architecture flaws right now.
