May 28, 2026

Microsoft Graph API Discovery for Enterprise Semantic Search

Enterprise search is no longer limited by storage capacity or indexing speed. The real challenge is the growing gap between when information is created and when it becomes discoverable. This article explores how Microsoft Graph API Discovery is changing enterprise search by shifting from traditional crawl-and-index models to a relationship-driven, real-time discovery architecture.

Traditional enterprise search relies on scheduled indexing, which often creates delays, stale results, and fragmented knowledge across systems. As organizations generate data across Teams, SharePoint, Outlook, OneDrive, and other Microsoft 365 services, keeping search indexes current becomes increasingly difficult.

Microsoft Graph approaches the problem differently. Instead of focusing solely on where information is stored, it understands how content, people, conversations, meetings, permissions, and business processes are connected. This graph-based model enables search experiences that are contextual, relationship-aware, and significantly faster at surfacing relevant information.

The article highlights how semantic search performance improves when systems can leverage metadata, organizational relationships, and contextual signals rather than relying only on keywords. Search becomes less about matching words and more about understanding meaning, intent, and relevance. This is especially important for AI-powered experiences such as Microsoft Copilot, which depend on real-time contextual retrieval and semantic indexing to generate accurate and useful responses.

Organizations that continue to depend on legacy search architectures may experience slower discovery, duplicated information, reduced productivity, and weaker AI outcomes. In contrast, Graph API Discovery enables a more dynamic knowledge layer where information is continuously connected, discoverable, and ready for intelligent applications.

You see a major transformation in enterprise search when you use Graph API Discovery. Instead of waiting for scheduled indexing, you get real-time, event-driven discovery that delivers sub-second freshness. This means your search results reflect the most current data and knowledge. With Graph API Discovery, you support AI workflows and automate compliance. The table below shows measurable benefits for any user who depends on search and knowledge graphs.

Benefit	Description
Sub-second freshness	Automatic indexing ensures that searches are executed against the most current data available.
Compliance automation	Enhanced capabilities streamline workflows, reducing export operation time by up to 50%.
Operational efficiency	Organizations report 40-60% time savings in review processes due to AI-driven content indexing.

Key Takeaways

Graph API Discovery provides real-time, event-driven updates, ensuring your search results are always fresh and accurate.
Automating compliance workflows can reduce operational time by up to 50%, streamlining your processes significantly.
A unified semantic layer connects various data sources, enhancing AI's ability to understand and deliver relevant search results.
Real-time data access eliminates the staleness gap, allowing you to make informed decisions quickly.
High precision and recall in search results save time and improve user satisfaction by delivering accurate information.
Integrating Graph API Discovery with existing systems enhances knowledge management and supports AI-driven workflows.
Built-in compliance and security features protect sensitive information and ensure adherence to regulations.
Measuring ROI from Graph API Discovery can reveal cost savings, increased productivity, and improved customer satisfaction.

Graph API Discovery and Enterprise Search

What Is Graph API Discovery

You interact with Microsoft Graph API Discovery when you want to connect your enterprise systems and unlock the power of semantic search. It gives you a single access point for queries across Microsoft cloud services. You can search Outlook, SharePoint, and OneDrive with one endpoint. You can also include external sources using Microsoft 365 Copilot connectors. This unified approach helps you break down data silos and supports data integration. You get a consistent experience, and your queries return relevant results based on graph intelligence. Knowledge graphs become more useful when you can query them in real time. You gain context awareness, which helps you understand user intent and deliver personalized recommendations.
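As a rough illustration of the single endpoint described above, the sketch below builds the JSON body for a Microsoft Search API call (POST to /search/query on Microsoft Graph). The helper name build_search_request and the query text are hypothetical. Sending mail in its own request is an assumption about which entity types the service lets you combine, and authentication and the HTTP call itself are left out.

```python
import json

GRAPH_SEARCH_URL = "https://graph.microsoft.com/v1.0/search/query"


def build_search_request(query_string, entity_types, start=0, size=25):
    """Build the JSON body for one Microsoft Search API request.

    Hypothetical helper: it only assembles the payload and sends nothing.
    """
    if not entity_types:
        raise ValueError("at least one entity type is required")
    return {
        "requests": [
            {
                "entityTypes": list(entity_types),
                "query": {"queryString": query_string},
                "from": start,   # paging offset
                "size": size,    # page size
            }
        ]
    }


# Files and list items from SharePoint and OneDrive in one request.
files_body = build_search_request("quarterly roadmap", ["driveItem", "listItem"])

# Outlook mail goes in a separate request in this sketch (assumption: the
# service restricts which entity types can share a single request).
mail_body = build_search_request("quarterly roadmap", ["message"])

print(json.dumps(files_body, indent=2))
```

Both bodies would be posted to the same URL with an OAuth bearer token, which is what makes the endpoint a single access point in practice.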

Microsoft’s Event-Driven Approach

You notice a big difference when you use Microsoft’s event-driven model for semantic search. Instead of waiting for scheduled crawlers, you get real-time updates. Microsoft Graph change notifications can stream data changes to a webhook or to a message broker such as Azure Event Hubs, which also exposes an Apache Kafka-compatible endpoint. You do not need custom ETL processes for change detection. Your information retrieval pipelines update automatically, and operational data becomes analytics-ready in near real time. This approach supports live dashboards, monitoring, and AI feedback loops. You work with fresh data, and your queries reflect the latest user intent. You improve retrieval speed and accuracy. You can respond to new queries and changes in intent in under a second, which helps you deliver recommendations and insights faster.

You benefit from sub-second freshness and instant data pipelines. Your enterprise search adapts to user intent and queries without delay.
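To make the event-driven model more concrete, here is a minimal sketch of the two pieces a webhook-based subscriber needs: the subscription payload posted to the Microsoft Graph /subscriptions endpoint, and a handler that answers the validation handshake and filters incoming notifications by clientState. The function names, the example.com URL, the secret value, and the one-hour lifetime are assumptions for illustration. Web framework wiring and authentication are omitted.

```python
from datetime import datetime, timedelta, timezone


def build_subscription(resource, notification_url, client_state, lifetime_minutes=60):
    """Build the body for POST /subscriptions (payload only, nothing is sent)."""
    expires = datetime.now(timezone.utc) + timedelta(minutes=lifetime_minutes)
    return {
        "changeType": "created,updated",
        "notificationUrl": notification_url,
        "resource": resource,
        "expirationDateTime": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "clientState": client_state,  # shared secret echoed back in notifications
    }


def handle_webhook(query_params, body, expected_client_state):
    """Return (status_code, response_text, accepted_notifications).

    Hypothetical framework-neutral handler: query_params and body are dicts.
    """
    token = query_params.get("validationToken")
    if token is not None:
        # Validation handshake: echo the token back as plain text.
        return 200, token, []
    accepted = [
        n for n in body.get("value", [])
        if n.get("clientState") == expected_client_state
    ]
    # Acknowledge quickly; downstream indexing should happen asynchronously.
    return 202, "", accepted


subscription = build_subscription(
    "me/mailFolders('Inbox')/messages",
    "https://example.com/graph/notify",  # hypothetical endpoint
    "s3cret",
)
status, text, _ = handle_webhook({"validationToken": "abc"}, {}, "s3cret")
_, _, accepted = handle_webhook(
    {},
    {"value": [
        {"clientState": "s3cret", "resource": "messages/1"},
        {"clientState": "wrong", "resource": "messages/2"},
    ]},
    "s3cret",
)
```

Rejecting notifications whose clientState does not match is a cheap first check that a message really belongs to a subscription you created.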

Role in Modern Semantic Search

You rely on semantic search to understand user intent and deliver relevant results. Microsoft Graph API Discovery plays a key role by connecting your queries to a unified semantic layer. You can analyze user intent across multiple sources and provide recommendations that match the context. You use the API to support AI-driven workflows and automate compliance. You see how knowledge graphs help you answer queries with context awareness. You improve information retrieval and make your enterprise search smarter. You can personalize recommendations and adapt to changing queries and intent. You build a system that learns from user intent and delivers insights that matter. You create a digital twin of your organization, reflecting real-time changes and knowledge.

You use semantic search to handle complex queries and intent.
You leverage API discovery to unify your data integration and retrieval.
You deliver personalized recommendations based on user intent and context.

Challenges of Traditional Semantic Search

Data Silos and Staleness Gap

You often face barriers when you try to access information across different departments or systems. Data silos make it hard for you to find what you need, especially when you want to use AI tools or build knowledge graphs. Many organizations struggle with this issue.

70% of respondents in a recent biopharma study reported difficulty accessing data for AI projects because their systems are siloed.
Only 39% of biopharma companies use standardized formats and ontologies, which limits the flow of knowledge across teams.
The cost of working with intermediaries includes vendor fees, internal labor, and missed opportunities from delayed decisions.
Competitive reports can take 6-16 weeks to deliver insights, so you often receive information that is already outdated.

You lose valuable time and resources when you cannot access current data. The staleness gap means your search results may not reflect the latest knowledge, making it harder for you to make informed decisions.

Indexing Delays and Fragmentation

You depend on search systems to deliver accurate results, but indexing delays can cause problems. When indexing is poorly managed, you may retrieve outdated information. Fragmentation happens when documents are split incorrectly, which reduces the precision of your search outcomes.

Indexing delays can result in outdated information being retrieved, affecting the accuracy of search results.
Fragmentation can lead to loss of context when documents are split improperly, reducing precision in search outcomes.
Effective chunking that respects document structure is crucial for maintaining context and improving search accuracy.
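One simple way to respect document structure when chunking is to split only at paragraph boundaries. The sketch below packs whole paragraphs into chunks up to a size limit. The function name and the 40-character limit in the example are illustrative choices, and a production chunker would likely also consider headings, tables, and token counts.

```python
def chunk_by_structure(text, max_chars=500):
    """Split text into chunks on blank-line paragraph boundaries.

    Paragraphs are packed together up to max_chars. A paragraph is never
    split, so an oversized paragraph becomes a chunk of its own.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


doc = "Intro paragraph.\n\nSecond paragraph that is longer.\n\nThird."
chunks = chunk_by_structure(doc, max_chars=40)
for chunk in chunks:
    print(repr(chunk))
```

Because every chunk ends on a paragraph boundary, joining the chunks with blank lines reproduces the original text, so no context is lost at the seams.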

You also face challenges with designing data pipelines, managing vector databases, and building user interfaces.

Data pipelines can introduce delays in accessing current information.
Vector databases require ongoing tuning for accuracy, which takes time and resources.
Handling permissions at scale complicates the search process and can lead to errors.

You need centralized configuration and clear logs to improve the reliability of your search system. Without these, you risk losing context and accuracy.

Limited Context and AI Integration

You want AI tools to help you find answers quickly, but traditional search systems often lack the context needed for effective automation.

AI initiatives struggle to deliver value when models lack access to unified, reliable, and up-to-date knowledge.
Employees face difficulties due to searching across multiple tools, leading to inefficiencies.
Automation efforts are hindered by the absence of accurate contextual information, resulting in ineffective AI-driven processes.
The return on AI investments drops when you cannot easily access the information you need.

You need a search solution that integrates context and supports AI workflows. Without this, your user experience suffers and your digital investments do not reach their full potential.

How Graph API Discovery Transforms Semantic Search

Real-Time Data Access

You need instant access to information to make fast decisions. Graph API Discovery gives you real-time data access, which means you always work with the latest updates. Traditional search systems often leave you with outdated results because they rely on scheduled indexing. With Graph API Discovery, you eliminate the staleness gap. You see new content as soon as it is created or changed.

Here is how real-time data access works in practice:

Mechanism	Description
Real-time indexing	Continuously updates the system as new content is created or changed, capturing only what’s new.
On-demand data fetching	Retrieves data at the moment of search, ensuring that the information is current and relevant.
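Capturing only what is new maps naturally onto Microsoft Graph delta queries, where each response page carries either an @odata.nextLink (more pages) or an @odata.deltaLink (a bookmark for the next round). The sketch below shows the paging loop under the assumption that fetch_page is some authenticated HTTP GET you provide. Here it is replaced by a dictionary of fake pages so the example runs offline.

```python
def sync_changes(fetch_page, start_url):
    """Follow a delta query until the service returns a deltaLink.

    fetch_page is a hypothetical callable: URL in, parsed JSON dict out.
    Returns (changed_items, delta_link). Store delta_link and use it as
    start_url next time to receive only the changes since this round.
    """
    items, url = [], start_url
    while True:
        page = fetch_page(url)
        items.extend(page.get("value", []))
        if "@odata.nextLink" in page:
            url = page["@odata.nextLink"]
        else:
            return items, page.get("@odata.deltaLink")


START = "https://graph.microsoft.com/v1.0/me/drive/root/delta"

# Fake responses standing in for the service (illustrative data only).
fake_pages = {
    START: {
        "value": [{"id": "1", "name": "plan.docx"}],
        "@odata.nextLink": "page-2",
    },
    "page-2": {
        "value": [{"id": "2", "name": "budget.xlsx"}],
        "@odata.deltaLink": "delta-token-url",
    },
}

changed, delta_link = sync_changes(fake_pages.__getitem__, START)
print([item["name"] for item in changed], delta_link)
```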

You also benefit from Retrieval-Augmented Generation (RAG). RAG grounds large language models in your current company data. This approach provides accurate and relevant responses based on live context. You can trust that your search results reflect the most recent knowledge in your organization.
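The grounding step in RAG can be sketched as plain prompt assembly: retrieved passages are placed in front of the question, and the model is told to answer only from them. The function name, the passage fields (title, text), and the instruction wording are assumptions for illustration. Retrieval and the model call are out of scope here.

```python
def build_grounded_prompt(question, passages, max_passages=3):
    """Assemble a prompt that asks the model to answer only from passages."""
    selected = passages[:max_passages]
    sources = "\n".join(
        f"[{i}] {p['title']}: {p['text']}"
        for i, p in enumerate(selected, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite sources by number, and say so if they do not contain the answer.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


# Illustrative passages, as if returned by a fresh search query.
passages = [
    {"title": "Travel policy", "text": "Economy class is required under 6 hours."},
    {"title": "Expense guide", "text": "Receipts are due within 30 days."},
]
prompt = build_grounded_prompt("When are receipts due?", passages)
print(prompt)
```

The quality of the answer then depends directly on how fresh the retrieved passages are, which is why the staleness gap matters for RAG.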

Tip: Real-time data access helps you close the staleness gap and keeps your semantic search results fresh and reliable.

Unified Semantic Layer

You want your search to be accurate and meaningful. A unified semantic layer organizes and tags your data, making it easier for AI to understand and process information. This layer connects different sources, so you get a complete view of your organization’s knowledge. When you use Graph API Discovery, you create a single point of truth for your semantic search.

A semantic layer enhances AI’s understanding of data by providing structured, high-quality information. This leads to more accurate responses and reduces errors.
It organizes and tags data, allowing AI to deliver faster and more precise answers. This improves search accuracy.
By linking AI outputs to trusted data sources, semantic layers minimize the risk of hallucinations and ensure results align with business rules.
Semantic layers provide a structured view of data, enabling AI models to grasp the meaning behind the data. This enhances the relevance and accuracy of results.
They prevent AI from relying on ambiguous data, ensuring that insights generated are reliable and explainable.

You use this unified approach to support AI-driven workflows. Your search engine can now deliver results that match user intent and context. You also make your knowledge graphs more powerful by connecting all your information in one place.

Compliance and Security by Design

You must protect sensitive information and follow strict rules. Graph API Discovery builds compliance and security into every step of the search process. You do not need to worry about unauthorized access or data leaks. The system respects permissions and privacy settings, so only the right people see the right information.
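Permission-aware retrieval can be pictured as a filter applied before results reach the user. The sketch below is a simplified, client-side illustration only: Microsoft Graph applies its own trimming on the service side, and the ACL shape used here (type, value, accessType) is loosely modeled on connector item ACLs. Treating a matching deny entry as overriding any grant is an assumption of this sketch.

```python
def can_view(acl, user_id, group_ids):
    """Return True if the ACL grants access and no matching entry denies it."""
    def matches(entry):
        if entry["type"] == "everyone":
            return True
        if entry["type"] == "user":
            return entry["value"] == user_id
        if entry["type"] == "group":
            return entry["value"] in group_ids
        return False

    matching = [entry for entry in acl if matches(entry)]
    if any(entry["accessType"] == "deny" for entry in matching):
        return False  # assumption: deny takes precedence over grant
    return any(entry["accessType"] == "grant" for entry in matching)


def trim_results(results, user_id, group_ids):
    """Keep only the results the given user is allowed to see."""
    return [r for r in results if can_view(r["acl"], user_id, group_ids)]


# Illustrative data: identifiers and titles are made up.
results = [
    {"title": "Handbook", "acl": [
        {"type": "everyone", "value": "", "accessType": "grant"}]},
    {"title": "Salary bands", "acl": [
        {"type": "group", "value": "hr", "accessType": "grant"}]},
    {"title": "Reorg plan", "acl": [
        {"type": "group", "value": "staff", "accessType": "grant"},
        {"type": "user", "value": "alice", "accessType": "deny"}]},
]
visible = trim_results(results, user_id="alice", group_ids={"staff"})
print([r["title"] for r in visible])
```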

You also support compliance-aware retrieval. This means your search results always follow your organization’s policies. You can track who accesses what information, which helps you meet legal and regulatory requirements. By using Graph API Discovery, you create a digital twin of your organization. This digital twin reflects real-time changes and gives you a clear view of your data landscape.

Note: Compliance and security by design help you build trust in your semantic search system and protect your organization’s knowledge.

Semantic Search Engine Performance Metrics

Speed and Latency

You want your search results fast. Speed and latency measure how quickly your semantic search engine responds to your queries. When you use Microsoft Graph API Discovery, you see sub-second response times. This means you get answers almost instantly. Fast search helps you make decisions quickly and keeps your workflow smooth. You do not have to wait for scheduled indexing or batch updates. Real-time data access ensures your search always reflects the latest information. Low latency also supports AI-driven tasks, so your tools can use fresh knowledge to deliver better results.

Tip: Fast search performance boosts productivity and helps you stay ahead in a competitive environment.

Precision and Recall

You need your semantic search engine to find the right information. Precision measures how many of your search results are relevant. Recall shows how many relevant items your search engine finds out of all possible matches. High precision and recall mean you get accurate and complete answers. Microsoft Graph API Discovery uses context and intent to improve both metrics. The system understands what you mean, not just what you type. This leads to tailored results that match your needs. You also benefit from knowledge graphs, which connect related data and help your search engine understand complex relationships.

Precision: You get fewer irrelevant results.
Recall: You find more of the information you need.
Context: The search engine uses your intent to deliver better answers.
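The two metrics above have simple definitions: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. A minimal sketch, with made-up document identifiers, shows how you might compute both for a single query against a hand-labeled set of relevant results.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query.

    retrieved: identifiers the search engine returned.
    relevant:  identifiers a human judged relevant (the ground truth).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four results returned, two of them relevant, one relevant item missed.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Tracking both numbers over a fixed set of test queries gives you a baseline to compare against after any change to indexing or ranking.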

A semantic search engine with high precision and recall saves you time and reduces frustration. You spend less effort sorting through unrelated results and more time using the knowledge you find.

User Satisfaction and Adoption

You want a search experience that feels natural and helpful. User satisfaction and adoption rates show how well your semantic search engine meets your needs. Enterprises report clear business value from semantic search. You see improved customer satisfaction and lower support costs. Organizations using Microsoft Graph API Discovery notice better information retrieval and smoother discovery workflows.

Enhanced search precision gives you results that match your intent.
Higher engagement means you use the search engine more often.
Improved conversion rates show that you find what you need faster.
Better content optimization makes important information easier to find.

The intelligent context-building features of Copilot make your assistant more useful. You manage information across many channels with less effort. As a result, you feel more satisfied and confident in your search experience.

Note: High user satisfaction leads to greater adoption of semantic search tools, helping your organization unlock the full value of its data.

Use Cases and Success Stories

AI-Powered Knowledge Management

You can transform your organization’s knowledge management with Microsoft Graph API Discovery. Many enterprises now use API-based agents to improve how they handle information. These agents connect to Microsoft 365 Copilot, which lets you access different sources without switching tools. You do not need to adjust your AI models every time new data appears. Instead, you use Copilot’s reasoning skills to work with fresh information. This approach helps you manage knowledge efficiently and ensures compliance with company policies. You see how these agents make workflows smoother and more reliable.

API-based agents enhance knowledge management by connecting to multiple sources.
You use Copilot to access and reason over new data quickly.
These solutions align with enterprise needs for compliance and efficiency.

You build smarter knowledge graphs that reflect real-time changes. Your team finds answers faster and makes better decisions.

Customer Support and Compliance

You want your customer support and compliance processes to run smoothly. Microsoft Graph API Discovery helps you automate many tasks that used to take hours. You can manage secure scores, handle alerts, and track incidents with ease. Integration with Microsoft Entra ID (formerly Azure Active Directory) improves how you control user access and monitor compliance. You reduce security risks by automating identity and access management. This means you remove access rights quickly when someone leaves the company.

Automation streamlines access reviews and supports regulatory mandates like GDPR and HIPAA.
You monitor user activity and data sharing to maintain governance standards.
Standardized workflows make audits easier and more defensible.

You see fewer errors and faster responses in customer support. Your compliance team meets legal requirements without extra effort.

Use Case	Description
Managing secure scores	Update secure score control profiles to enhance security posture.
Handling alerts and incidents	List and update alerts and incidents to manage security threats effectively.
Utilizing eDiscovery functions	List eDiscovery cases and operations for compliance and investigations.

Tip: Automation ensures you remove group memberships and application access during offboarding, which protects your organization from lingering risks.

Breaking the 80% Accuracy Ceiling

You may notice that traditional search systems often reach a limit in accuracy. Many engines struggle to go beyond 80% because they cannot keep up with real-time data or understand context. With Microsoft Graph API Discovery, you break through this barrier. You get search results that reflect the latest knowledge and user intent. The unified semantic layer connects all your information, so you do not miss important details. AI-driven workflows use live data to deliver precise answers.

You see improvements in both precision and recall. Your search engine understands what you need and finds the right information. This helps you make decisions faster and with more confidence. You move past the old limits and unlock the full value of your organization’s knowledge.

Note: When you use real-time search and unified knowledge graphs, you achieve higher accuracy and better outcomes for your business.

Implementation and Best Practices

Integration with Existing Systems

You want your search solution to work smoothly with your current tools. Microsoft Graph API Discovery makes this possible by supporting different integration patterns. If you connect fewer than four systems, you can use a point-to-point approach. For four or more systems, a hub-and-spoke model works better. When you manage over fifteen systems, you should consider an enterprise service bus pattern.
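The rule of thumb above can be written down as a small decision function. This is only a sketch of the thresholds stated in the text. The function name is hypothetical, and real architecture decisions would weigh more than the system count.

```python
def choose_integration_pattern(system_count):
    """Map the number of connected systems to the suggested pattern."""
    if system_count < 1:
        raise ValueError("system_count must be at least 1")
    if system_count < 4:
        return "point-to-point"
    if system_count <= 15:
        return "hub-and-spoke"
    return "enterprise service bus"


for count in (3, 4, 15, 16):
    print(count, choose_integration_pattern(count))
```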

You can use Microsoft tools like Power Automate, Azure Logic Apps, and Azure Service Bus to build your integrations. Before you start, set clear rules for data consistency. This step helps you avoid problems later. You should also test your setup with real-world data to catch any issues early. Plan for ongoing maintenance, which usually costs about 15-20% of your initial build.

Tip: Always design for failure. Use retry logic, dead-letter queues, and alerting to keep your search running smoothly. Document your integration architecture so you can manage it well after launch.
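Designing for failure can be sketched in a few lines: retry a failing operation with exponential backoff, and park the message in a dead-letter list once the attempts run out. Everything here is illustrative. The handler, the message, and the in-memory list stand in for a real indexing call and a real dead-letter queue such as the one Azure Service Bus provides.

```python
import time


def process_with_retry(message, handler, dead_letters,
                       max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run handler(message), retrying with exponential backoff.

    After max_attempts failures the message and the last error are
    appended to dead_letters and None is returned.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(message)
        except Exception as exc:
            if attempt == max_attempts:
                dead_letters.append({"message": message, "error": str(exc)})
                return None
            # Wait 0.5s, 1s, 2s, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


calls = {"count": 0}


def flaky_handler(msg):
    """Hypothetical handler that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return f"indexed {msg}"


dead_letters = []
# sleep is replaced with a no-op so the example runs instantly.
result = process_with_retry("doc-42", flaky_handler, dead_letters,
                            sleep=lambda seconds: None)
print(result, dead_letters)
```

Pairing the dead-letter list with alerting gives you a clear signal when retries are no longer enough and someone needs to look.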

Security and Governance

You need to protect your organization’s knowledge and keep your search secure. Trust your employees, but always verify their work. Make sure sensitivity labels match your data loss prevention standards. Set up strong lifecycle management and attestation policies. These steps help you hold everyone accountable.

Control how links are shared in your organization to prevent oversharing. Use Microsoft Graph Data Connect to monitor sharing and spot problems quickly. You can extract inventory reports to see where oversharing happens. This helps you keep your knowledge safe and your search compliant with company rules.

Apply sensitivity labels and check them often.
Limit default link-sharing to reduce risks.
Monitor sharing activity and address issues right away.

Measuring ROI

You want to know if your investment in Microsoft Graph API Discovery pays off. You can measure return on investment in several ways. Automating tasks saves money by reducing labor costs and boosting efficiency. You may find new revenue streams and improve pricing strategies, which can increase your income.

Better tools and information help your users work faster and make fewer mistakes. Improved data quality means fewer errors and lower costs. When you deliver personalized experiences and faster service, customer satisfaction goes up. You can also bring new products to market faster, giving you an edge over competitors.

ROI Factor	Benefit Example
Cost Savings	Lower labor costs through automation
Revenue Increase	New revenue streams, better pricing
Productivity Gains	Employees work faster and smarter
Data Quality Improvements	Fewer errors, more accurate results
Customer Satisfaction	Faster, more personal service
Faster Time-to-Market	Quicker product launches

Note: Tracking these benefits helps you show the value of your search investment and guides future decisions.

You see Microsoft Graph API Discovery transform enterprise search by making knowledge instantly available. You benefit from advanced semantic technologies, high-availability clustering, and real-time data flow. These features help you build smarter knowledge graphs and improve user experiences. The table below shows how real-time, event-driven, and compliance-aware capabilities change search for the better:

Capability	Description
Automating Compliance Workflows	Integrates compliance and security policies into your search and data management.
Programmatic Access to Audit Logs	Lets you monitor and improve compliance with automated audit log retrieval.
Supporting Policy-Aware Applications	Ensures your search tools respect governance frameworks and prevent data misuse.

You can expect rapid growth in semantic search applications as the need for context-aware search and better information retrieval increases. Now is the time to adopt these tools and unlock the full value of your organization’s knowledge.

FAQ

How does Graph API Discovery improve search speed?

You get search results almost instantly because Graph API Discovery uses real-time, event-driven updates. This approach removes delays from scheduled indexing. You can access information as soon as it appears, which helps you make decisions faster.

Can I use Graph API Discovery with existing knowledge graphs?

You can connect Graph API Discovery to your current knowledge graphs. This integration lets you unify information from different sources. You gain a complete view of your organization’s knowledge and improve search accuracy.

What makes search results more accurate with Graph API Discovery?

You benefit from a unified semantic layer that organizes data. This layer helps AI understand user intent and context. You receive search results that match your needs and reflect the latest knowledge.

How does Graph API Discovery support compliance?

You use built-in compliance features that respect permissions and privacy settings. The system tracks who accesses information. You meet legal requirements and protect sensitive data during every search.

Is real-time search possible for all users?

You can access real-time search regardless of your role. The system updates instantly for everyone. You always work with the most current information, which improves productivity and user satisfaction.

How does Graph API Discovery handle data security?

You rely on strong security measures. The system enforces access controls and monitors sharing activity. You keep your data safe while using search tools that follow your organization’s rules.

Can Graph API Discovery help automate workflows?

You automate many tasks with Graph API Discovery. The system supports AI-driven workflows that use live data. You save time and reduce errors by letting search tools handle routine operations.



1
00:00:00,000 --> 00:00:03,560
Your enterprise search isn't actually finding what you need when you need it.

2
00:00:03,560 --> 00:00:06,360
It's failing because it assumes your data stays still.

3
00:00:06,360 --> 00:00:10,800
Most systems today still use a model where a crawler periodically scans your files.

4
00:00:10,800 --> 00:00:14,400
But in reality, that model is already dead by the time the scan finishes.

5
00:00:14,400 --> 00:00:17,320
You aren't looking at your data, you're looking at a ghost of it.

6
00:00:17,320 --> 00:00:20,760
Today we are shifting away from the old model of periodic indexing.

7
00:00:20,760 --> 00:00:22,920
We're moving toward a new model of live context.

8
00:00:22,920 --> 00:00:27,240
We're moving from a system that pulls data to one that subscribes to it through the Graph API.

9
00:00:27,240 --> 00:00:29,600
This isn't just a minor speed boost for your team.

10
00:00:29,600 --> 00:00:32,800
It is a total rewrite of how performance works in a semantic world.

11
00:00:32,800 --> 00:00:36,200
If you want to build AI that actually knows what is happening right now,

12
00:00:36,200 --> 00:00:37,480
you need to listen closely.

13
00:00:37,480 --> 00:00:38,960
Before we dive into the protocol level,

14
00:00:38,960 --> 00:00:44,520
make sure you're subscribed to the M365 FM podcast for more deep dives into the systems running your business.

15
00:00:44,520 --> 00:00:47,440
The latency chasm: why search feels broken.

16
00:00:47,440 --> 00:00:51,280
Think about the last time you uploaded a critical document and then tried to find it

17
00:00:51,280 --> 00:00:54,640
using your internal search. You probably waited, and then you waited some more.

18
00:00:54,640 --> 00:00:56,080
This is the staleness gap.

19
00:00:56,080 --> 00:01:00,960
It is the period between when work happens and when the system finally admits that work exists.

20
00:01:00,960 --> 00:01:05,720
In most organizations, this gap is measured in hours, and sometimes it takes a full day for a file to appear.

21
00:01:05,720 --> 00:01:07,360
The reason for this is structural.

22
00:01:07,360 --> 00:01:12,800
Legacy crawling is a brute force approach where the system starts at point A, scans everything to point Z,

23
00:01:12,800 --> 00:01:15,200
and then starts the whole process over again.

24
00:01:15,200 --> 00:01:17,680
But here's the problem. Your data isn't a static library.

25
00:01:17,680 --> 00:01:18,520
It's a flowing river.

26
00:01:18,520 --> 00:01:20,960
As your organization grows, the volume of data increases.

27
00:01:20,960 --> 00:01:26,160
And because legacy crawling is linear, the time it takes to complete a full scan grows right along with your data.

28
00:01:26,160 --> 00:01:31,920
If you have 10,000 files, maybe it takes an hour to scan them all, but the moment you hit 10 million, the math breaks.

29
00:01:31,920 --> 00:01:34,400
You get stuck in a loop where the system is always behind.

30
00:01:34,400 --> 00:01:37,840
It creates a linear bottleneck that cannot be solved by throwing more hardware at it.

31
00:01:37,840 --> 00:01:44,000
You can't just buy a faster crawler when the fundamental assumption is that you have to look at every single thing every single time.

32
00:01:44,000 --> 00:01:47,240
This has a massive psychological cost that most IT leaders miss.

33
00:01:47,240 --> 00:01:51,480
We are asking employees to trust AI. We're giving them Copilots and semantic search tools,

34
00:01:51,480 --> 00:01:53,880
while telling them these tools will make them faster.

35
00:01:53,880 --> 00:01:57,320
But the first time an employee cannot find a file they just edited,

36
00:01:57,320 --> 00:02:00,120
that trust evaporates instantly. They stop using the tool.

37
00:02:00,120 --> 00:02:05,720
They go back to manual navigation. They start sending emails with attachments because they don't believe the search index is real.

38
00:02:05,720 --> 00:02:08,520
The search failure is a silent killer of AI adoption.

39
00:02:08,520 --> 00:02:12,920
If the retrieval layer isn't fresh, the LLM is just hallucinating based on old news.

40
00:02:12,920 --> 00:02:15,960
When the index is stale, the AI becomes a liability.

41
00:02:15,960 --> 00:02:20,040
It provides answers based on outdated versions of policies or project plans,

42
00:02:20,040 --> 00:02:22,840
which creates a risk that goes beyond mere inconvenience.

43
00:02:22,840 --> 00:02:25,400
It leads to bad decision making at the executive level.

44
00:02:25,400 --> 00:02:27,480
People rely on the single source of truth.

45
00:02:27,480 --> 00:02:31,080
But if that truth is 12 hours old, it's just a well-formatted lie.

46
00:02:31,080 --> 00:02:34,920
We have to address this gap if we want semantic search to be a professional-grade tool.

47
00:02:34,920 --> 00:02:37,160
The frustration isn't just about missing files.

48
00:02:37,160 --> 00:02:40,840
It's about the feeling that the digital workplace is disconnected from reality.

49
00:02:40,840 --> 00:02:41,720
You're in a meeting.

50
00:02:41,720 --> 00:02:44,120
You need an answer from a memo sent 10 minutes ago.

51
00:02:44,120 --> 00:02:44,840
You search.

52
00:02:44,840 --> 00:02:45,400
Nothing.

53
00:02:45,400 --> 00:02:48,600
You're forced to pause the conversation to go digging through folders.

54
00:02:48,600 --> 00:02:51,240
This friction is the direct result of the pull model.

55
00:02:51,240 --> 00:02:54,680
The system assumes it should go looking for data on its own schedule

56
00:02:54,680 --> 00:02:58,200
and it treats discovery as a chore it does once a week or once a day.

57
00:02:58,200 --> 00:02:59,240
And one level deeper,

58
00:02:59,240 --> 00:03:02,280
this model fails because it ignores the nature of modern work.

59
00:03:02,280 --> 00:03:04,680
Work today doesn't happen in silos that sit still.

60
00:03:04,680 --> 00:03:08,200
It happens in Teams chats, in shared Loop components, and in live Excel sheets.

61
00:03:08,200 --> 00:03:09,720
These are high-velocity data points.

62
00:03:09,720 --> 00:03:14,760
A crawler designed for a 2005 file share cannot keep up with a 2025 collaboration loop.

63
00:03:14,760 --> 00:03:16,840
The assumption that people know what they're looking for

64
00:03:16,840 --> 00:03:20,600
and are willing to wait for the index to catch up is fundamentally broken.

65
00:03:20,600 --> 00:03:23,640
Most architects try to fix this by increasing the crawl frequency.

66
00:03:23,640 --> 00:03:26,200
They change it from once a day to once an hour,

67
00:03:26,200 --> 00:03:27,720
but that just leads to throttling.

68
00:03:27,720 --> 00:03:30,280
It puts a massive load on your SharePoint environment

69
00:03:30,280 --> 00:03:33,320
and slows down the very services people are trying to use.

70
00:03:33,320 --> 00:03:34,440
It's a losing game.

71
00:03:34,440 --> 00:03:37,320
You're trying to sprint faster on a treadmill that is moving backward.

72
00:03:37,320 --> 00:03:39,480
The more you crawl, the more you strain the system,

73
00:03:39,480 --> 00:03:42,600
and the more likely you are to get blocked by service protection limits.

74
00:03:42,600 --> 00:03:46,440
So what's actually happening is that we've reached the limit of the full crawl era.

75
00:03:46,440 --> 00:03:50,280
We've built hierarchies and indexing schedules for a world that no longer exists.

76
00:03:50,280 --> 00:03:52,120
We need a system that doesn't go looking for work,

77
00:03:52,120 --> 00:03:54,200
but instead hears when work happens.

78
00:03:54,200 --> 00:03:55,880
The flaw isn't the data itself.

79
00:03:55,880 --> 00:03:57,000
It's the model behind it.

80
00:03:57,000 --> 00:04:00,840
We need to move away from the idea that a search engine is an external observer

81
00:04:00,840 --> 00:04:04,520
and start treating it as an integrated part of the ecosystem's nervous system.

82
00:04:04,520 --> 00:04:05,880
That's where things change.

83
00:04:05,880 --> 00:04:08,040
Because when you stop pulling and start listening,

84
00:04:08,040 --> 00:04:09,880
the latency chasm disappears.

85
00:04:09,880 --> 00:04:12,520
Beyond the crawler: the evolution of discovery.

86
00:04:12,520 --> 00:04:15,080
We've spent decades treating discovery as a search problem,

87
00:04:15,080 --> 00:04:17,160
but in reality, it's a synchronization problem.

88
00:04:17,160 --> 00:04:21,400
The shift we're seeing right now is a fundamental move away from the full crawl

89
00:04:21,400 --> 00:04:24,280
toward what we call event-driven discovery.

90
00:04:24,280 --> 00:04:28,280
To understand why this matters, you have to look at how we define the search engine itself.

91
00:04:28,280 --> 00:04:30,520
In the old model, the search engine was an intruder.

92
00:04:30,520 --> 00:04:34,120
It was an external process that had to force its way into your databases

93
00:04:34,120 --> 00:04:35,960
and file shares to see what was going on.

94
00:04:35,960 --> 00:04:38,360
Think of it as a visitor that showed up once a day,

95
00:04:38,360 --> 00:04:41,480
knocked on every single door and asked if anything had changed.

96
00:04:41,480 --> 00:04:45,640
The process was incredibly loud, incredibly expensive and incredibly slow.

97
00:04:45,640 --> 00:04:47,080
Think about your own health for a second.

98
00:04:47,080 --> 00:04:48,360
If you want to know if you're fit,

99
00:04:48,360 --> 00:04:52,280
a weekly checkup at the clinic gives you a snapshot of your blood pressure at 10am on a Tuesday.

100
00:04:52,280 --> 00:04:55,800
But that appointment doesn't tell you how your heart reacted when you ran for the bus on Wednesday

101
00:04:55,800 --> 00:04:57,160
because it lacks a pulse.

102
00:04:57,160 --> 00:05:00,600
A search engine that relies on crawling is exactly like that weekly checkup.

103
00:05:00,600 --> 00:05:02,040
It sees the organization at rest.

104
00:05:02,040 --> 00:05:05,960
It misses the moments of peak activity and the context of the struggle,

105
00:05:05,960 --> 00:05:08,360
which results in a static view of a dynamic world.

106
00:05:08,360 --> 00:05:11,560
And for AI, a static view is a recipe for irrelevance.

107
00:05:11,560 --> 00:05:14,920
What we're building today is a digital twin of the organization.

108
00:05:14,920 --> 00:05:17,000
A twin isn't just a copy, it's a live reflection.

109
00:05:17,000 --> 00:05:18,680
If the physical organization moves,

110
00:05:18,680 --> 00:05:21,480
the digital twin must move at the exact same time.

111
00:05:21,480 --> 00:05:24,840
This is where legacy crawling fails because you can't build a twin with a crawler.

112
00:05:24,840 --> 00:05:25,640
You need a pulse.

113
00:05:25,640 --> 00:05:28,760
You need a continuous stream of telemetry that updates the digital model

114
00:05:28,760 --> 00:05:30,200
as work happens in the real world.

115
00:05:30,200 --> 00:05:33,240
We're moving from a state where discovery is a scheduled event

116
00:05:33,240 --> 00:05:36,040
to a state where discovery is a background reality.

117
00:05:36,040 --> 00:05:38,600
The twin stays in sync because it isn't waiting for a report.

118
00:05:38,600 --> 00:05:39,960
It's experiencing the change.

119
00:05:39,960 --> 00:05:42,680
This is where the Microsoft Graph API comes into play.

120
00:05:42,680 --> 00:05:46,680
Most people think of the graph as just a set of endpoints for developers to pull data

121
00:05:46,680 --> 00:05:48,040
or they see it as a library.

122
00:05:48,040 --> 00:05:50,360
But that's a narrow view.

123
00:05:50,360 --> 00:05:54,360
In reality, the graph is the nervous system of the entire Microsoft 365 ecosystem.

124
00:05:54,360 --> 00:05:57,720
Every time a user sends a message in Teams, a synapse fires in the graph.

125
00:05:57,720 --> 00:06:00,920
Every time a file is shared, a meeting is booked or a task is assigned,

126
00:06:00,920 --> 00:06:02,840
a signal travels through that nervous system.

127
00:06:02,840 --> 00:06:05,240
The graph knows about the change the millisecond it occurs

128
00:06:05,240 --> 00:06:07,400
because it is the medium through which the work happens.

129
00:06:07,400 --> 00:06:10,520
Event-driven discovery means the search engine is no longer a visitor.

130
00:06:10,520 --> 00:06:11,320
It's a listener.

131
00:06:11,320 --> 00:06:13,160
It's plugged directly into that nervous system.

132
00:06:13,160 --> 00:06:16,360
When a document is updated, the graph doesn't wait for someone to ask.

133
00:06:16,360 --> 00:06:17,800
It sends a notification.

134
00:06:17,800 --> 00:06:19,960
The search engine hears it, processes the change,

135
00:06:19,960 --> 00:06:21,720
and updates the index immediately.

136
00:06:21,720 --> 00:06:24,920
There is no waiting for a schedule and no scanning of 10 million files

137
00:06:24,920 --> 00:06:26,520
to find the three that actually changed.
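
To make the listening idea concrete, here's a minimal sketch of the two pieces a listener needs: the body you would POST to the Graph /subscriptions endpoint, and a handler that pulls the changed resources out of an incoming notification. The function names, the 60-minute lifetime, and the example webhook URL are assumptions for illustration, and the maximum lifetime differs per resource type, so treat this as a starting point rather than the definitive setup.

```python
from datetime import datetime, timedelta, timezone


def build_subscription(resource: str, notification_url: str, client_state: str,
                       lifetime_minutes: int = 60) -> dict:
    """Build a request body for POST https://graph.microsoft.com/v1.0/subscriptions.

    lifetime_minutes is an illustrative default; check the allowed maximum
    for the resource type you subscribe to.
    """
    expires = datetime.now(timezone.utc) + timedelta(minutes=lifetime_minutes)
    return {
        "changeType": "updated",
        "notificationUrl": notification_url,
        "resource": resource,
        "expirationDateTime": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "clientState": client_state,
    }


def resources_to_refresh(payload: dict, expected_state: str) -> list[str]:
    """Return the distinct resources named in a notification payload.

    Notifications whose clientState does not match ours are ignored,
    since they may not come from a subscription we created.
    """
    changed = []
    for note in payload.get("value", []):
        if note.get("clientState") != expected_state:
            continue
        resource = note.get("resource")
        if resource and resource not in changed:
            changed.append(resource)
    return changed
```

The notification only tells you that something moved. The listener still has to ask the graph what changed, which is the job of the state-based query described later.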

138
00:06:26,520 --> 00:06:29,240
We finally decoupled discovery from the total size of your data.

139
00:06:29,240 --> 00:06:31,320
It doesn't matter if you have 10 files or 10 billion

140
00:06:31,320 --> 00:06:33,160
because the effort is exactly the same.

141
00:06:33,160 --> 00:06:35,880
This is the fundamental evolution of discovery.

142
00:06:35,880 --> 00:06:38,680
In the old world, performance was limited by the speed of the disk

143
00:06:38,680 --> 00:06:40,040
and the bandwidth of the network.

144
00:06:40,040 --> 00:06:43,080
The more data you had, the worse your performance became.

145
00:06:43,080 --> 00:06:46,840
In the new world, performance is limited only by the latency of the event signal

146
00:06:46,840 --> 00:06:49,800
and because those signals are tiny, just a few bytes of metadata,

147
00:06:49,800 --> 00:06:51,720
they move at the speed of light.

148
00:06:51,720 --> 00:06:53,880
This is how you achieve sub-second freshness.

149
00:06:53,880 --> 00:06:55,800
You stop trying to look at the whole world

150
00:06:55,800 --> 00:06:58,360
and start listening to the parts of the world that are moving.

151
00:06:58,360 --> 00:07:01,400
You move from a model of pull to a model of subscribe.

152
00:07:01,400 --> 00:07:03,560
But to really appreciate the power of this shift,

153
00:07:03,560 --> 00:07:06,360
we have to go deeper than just the concept of listening.

154
00:07:06,360 --> 00:07:09,160
We need to look at how the protocol actually handles these changes

155
00:07:09,160 --> 00:07:12,840
without falling apart under the weight of massive enterprise-scale traffic.

156
00:07:12,840 --> 00:07:15,880
Listening to every single heartbeat of a 100,000 person company

157
00:07:15,880 --> 00:07:17,640
sounds like a recipe for a system crash

158
00:07:17,640 --> 00:07:19,560
and the secret isn't just in the events themselves,

159
00:07:19,560 --> 00:07:22,360
it's in how the graph manages the state of those events.

160
00:07:22,360 --> 00:07:24,760
To understand that, we need to look at the protocol level,

161
00:07:24,760 --> 00:07:27,000
we need to look at the mechanics of the Delta query

162
00:07:27,000 --> 00:07:28,840
and how state tokens change the game.

163
00:07:28,840 --> 00:07:31,080
The Delta query breakthrough.

164
00:07:31,080 --> 00:07:32,280
Efficiency at scale.

165
00:07:32,280 --> 00:07:35,640
The breakthrough that makes this possible at massive scale is the Delta query.

166
00:07:35,640 --> 00:07:38,680
If you've ever built a sync engine, you know the pain of diffing.

167
00:07:38,680 --> 00:07:40,920
In the legacy world, you take a snapshot of the source

168
00:07:40,920 --> 00:07:43,640
compare it to your local index and try to figure out what changed.

169
00:07:43,640 --> 00:07:45,560
This is what I call the reconciliation tax.

170
00:07:45,560 --> 00:07:48,360
It's expensive in terms of CPU, memory and time.

171
00:07:48,360 --> 00:07:50,920
But the graph API removes that tax entirely

172
00:07:50,920 --> 00:07:53,000
by providing a state-based protocol.

173
00:07:53,000 --> 00:07:55,880
It works through a mechanism called the OData delta link (@odata.deltaLink).

174
00:07:55,880 --> 00:07:58,040
Think of this link as a high-tech bookmark.

175
00:07:58,040 --> 00:08:01,400
When you first query a resource, say all the files in a SharePoint site,

176
00:08:01,400 --> 00:08:03,400
the graph gives you the data in pages.

177
00:08:03,400 --> 00:08:06,600
You follow the OData next link (@odata.nextLink) to get the next batch.

178
00:08:06,600 --> 00:08:08,280
But when you reach the end of the list,

179
00:08:08,280 --> 00:08:09,720
the graph doesn't just stop.

180
00:08:09,720 --> 00:08:11,000
It hands you a Delta link.

181
00:08:11,000 --> 00:08:13,240
This link is an opaque string

182
00:08:13,240 --> 00:08:15,880
that represents the exact state of that data set

183
00:08:15,880 --> 00:08:17,720
at that specific moment in time.

184
00:08:17,720 --> 00:08:19,560
Now, here's where the efficiency kicks in.

185
00:08:19,560 --> 00:08:20,760
You don't throw that link away.

186
00:08:20,760 --> 00:08:24,520
You store it in a durable place like Azure Key Vault or a SQL database.

187
00:08:24,520 --> 00:08:26,520
The next time your search engine needs an update,

188
00:08:26,520 --> 00:08:28,200
it doesn't call the standard endpoint.

189
00:08:28,200 --> 00:08:29,800
It calls that specific Delta link.

190
00:08:29,800 --> 00:08:32,360
The graph looks at the token embedded in that URL,

191
00:08:32,360 --> 00:08:34,120
checks its internal change logs,

192
00:08:34,120 --> 00:08:36,440
and returns only the items that were added,

193
00:08:36,440 --> 00:08:39,560
updated or deleted since that token was created.
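
Here's a minimal sketch of that loop, assuming a drive as the resource. The fetch_json helper is hypothetical: it stands in for whatever authenticated HTTP GET you use and just returns the parsed JSON body. The index is a plain dict so the logic stays visible.

```python
from typing import Callable, Optional

# Starting point for a drive; "{drive_id}" is a placeholder you fill in.
INITIAL_URL = "https://graph.microsoft.com/v1.0/drives/{drive_id}/root/delta"


def sync_once(fetch_json: Callable[[str], dict], start_url: str,
              index: dict) -> Optional[str]:
    """Apply one round of changes to `index` and return the new delta link.

    fetch_json is a hypothetical helper: authenticated GET, parsed JSON out.
    Pass INITIAL_URL (filled in) the first time and the stored delta link
    on every later run.
    """
    url = start_url
    delta_link = None
    while url:
        page = fetch_json(url)
        for item in page.get("value", []):
            # Drive items mark deletions with a "deleted" facet; several
            # other resources use an "@removed" annotation instead.
            if "deleted" in item or "@removed" in item:
                index.pop(item["id"], None)
            else:
                index[item["id"]] = item
        # The delta link only appears on the last page of a round.
        delta_link = page.get("@odata.deltaLink", delta_link)
        url = page.get("@odata.nextLink")
    return delta_link
```

Persist the returned link only after the index update has succeeded. If you save it first and the update fails, the next run starts from a bookmark that is ahead of your index.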

194
00:08:39,560 --> 00:08:42,520
The security trimming trap: why vectors aren't safe.

195
00:08:42,520 --> 00:08:45,720
Most architects assume that moving data from SharePoint to a vector store

196
00:08:45,720 --> 00:08:47,960
means the security settings just follow along.

197
00:08:47,960 --> 00:08:51,000
It's a comfortable thought, but in reality, it's fundamentally wrong.

198
00:08:51,000 --> 00:08:53,880
In the old world, security trimming happened at the application layer.

199
00:08:53,880 --> 00:08:56,600
You searched SharePoint and SharePoint checked your permissions

200
00:08:56,600 --> 00:08:58,040
before showing you any results.

201
00:08:58,040 --> 00:09:00,040
But in the new world of semantic search,

202
00:09:00,040 --> 00:09:02,280
we are separating the storage from the retrieval.

203
00:09:02,280 --> 00:09:04,520
We take chunks of text, turn them into vectors,

204
00:09:04,520 --> 00:09:07,240
and shove them into a database like Pinecone or Weaviate.

205
00:09:07,240 --> 00:09:08,680
And that is where the model breaks.

206
00:09:08,680 --> 00:09:11,160
Vectors are just arrays of floating point numbers.

207
00:09:11,160 --> 00:09:13,240
They don't have a read or write permission bit,

208
00:09:13,240 --> 00:09:15,240
and they certainly don't know who created them

209
00:09:15,240 --> 00:09:16,600
or who is allowed to see them.

210
00:09:16,600 --> 00:09:18,680
When a user sends a query to your AI,

211
00:09:18,680 --> 00:09:21,480
the system looks for the closest numbers in that vector space.

212
00:09:21,480 --> 00:09:25,400
If the most relevant piece of information happens to be in a restricted document,

213
00:09:25,400 --> 00:09:28,360
the vector store will happily hand that snippet over to the LLM.

214
00:09:28,360 --> 00:09:30,520
The LLM doesn't know it's looking at forbidden data.

215
00:09:30,520 --> 00:09:31,880
It just sees context.

216
00:09:31,880 --> 00:09:35,240
It processes the information and gives the user an answer.

217
00:09:35,240 --> 00:09:37,480
This is what we call semantic leakage.

218
00:09:37,480 --> 00:09:40,040
It's a silent breach where sensitive data is exposed

219
00:09:40,040 --> 00:09:41,880
through the back door of a search query.

220
00:09:41,880 --> 00:09:43,800
Think about the implications for a moment.

221
00:09:43,800 --> 00:09:46,040
You have an HR document about a planned layoff

222
00:09:46,040 --> 00:09:47,800
that is restricted to a few managers,

223
00:09:47,800 --> 00:09:50,440
but you've indexed your whole tenant into a RAG pipeline

224
00:09:50,440 --> 00:09:52,600
to help people be more productive.

225
00:09:52,600 --> 00:09:56,520
An intern asks a vague question about company changes next month,

226
00:09:56,520 --> 00:09:58,680
and the vector search finds the layoff document

227
00:09:58,680 --> 00:10:00,280
because the semantic match is high.

228
00:10:00,280 --> 00:10:03,320
The LLM summarizes the plan for the intern.

229
00:10:03,320 --> 00:10:05,960
You didn't lose a file, but you lost control of the information.
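
One common way to close that gap is to carry the permissions along as metadata on every chunk and trim the hits before anything reaches the LLM. This is a minimal sketch of that idea, not the way any particular vector store does it. The Chunk type, the allowed_principals field, and the principal IDs are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    score: float
    # IDs of users and groups allowed to read the source document,
    # captured at indexing time. Hypothetical field for this sketch.
    allowed_principals: frozenset = field(default_factory=frozenset)


def trim_results(hits, user_principals):
    """Keep only chunks the caller may see.

    user_principals is the user's own ID plus the IDs of their groups.
    A chunk with no recorded principals is dropped (default deny).
    """
    principals = set(user_principals)
    return [hit for hit in hits if principals & hit.allowed_principals]


hits = [
    Chunk("Planned layoff details", 0.93, frozenset({"grp-hr-managers"})),
    Chunk("Office move schedule", 0.81, frozenset({"grp-all-staff"})),
    Chunk("Orphaned note", 0.78),
]
intern_view = trim_results(hits, {"user-intern", "grp-all-staff"})
```

With this filter in place, the intern's query still matches the layoff document in vector space, but the snippet never makes it into the prompt. The filter is only as fresh as the permissions you stored, which is one more reason the index has to keep up with changes.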

230
00:10:05,960 --> 00:10:09,640
Real-time governance: compliance as a performance metric.

231
00:10:09,640 --> 00:10:13,080
We need to talk about why compliance is actually a performance metric.

232
00:10:13,080 --> 00:10:16,440
For years, we treated governance like a tax that slowed us down,

233
00:10:16,440 --> 00:10:18,440
or a manual audit that happened once a quarter,

234
00:10:18,440 --> 00:10:22,600
but as we move toward 2026, the regulatory environment is shifting.

235
00:10:22,600 --> 00:10:26,440
The EU AI Act and similar laws are making data lineage mandatory.

236
00:10:26,440 --> 00:10:28,680
You can't just say your AI is safe, you have to prove it.

237
00:10:28,680 --> 00:10:31,240
This is where the old model of crawling fails again.

238
00:10:31,240 --> 00:10:32,840
When a crawler scans your tenant,

239
00:10:32,840 --> 00:10:35,000
it creates a massive monolithic index.

240
00:10:35,000 --> 00:10:37,720
It's like a soup where all the ingredients are mixed together.

241
00:10:37,720 --> 00:10:40,840
If an auditor asks where a specific piece of information came from,

242
00:10:40,840 --> 00:10:42,920
you have to work backward through millions of files.

243
00:10:42,920 --> 00:10:46,680
It's a forensic nightmare because legacy systems weren't built for traceability.

244
00:10:46,680 --> 00:10:48,040
They were built for convenience.

245
00:10:48,040 --> 00:10:50,040
They assumed that as long as the data was there,

246
00:10:50,040 --> 00:10:51,880
the how and when didn't matter.

247
00:10:51,880 --> 00:10:54,200
But today, the how is everything.

248
00:10:54,200 --> 00:10:56,280
If you don't know the source, you don't know the risk.

249
00:10:56,280 --> 00:10:57,800
Graph discovery changes the model,

250
00:10:57,800 --> 00:10:59,320
because we're using Delta queries.

251
00:10:59,320 --> 00:11:01,400
Every update is a distinct event.

252
00:11:01,400 --> 00:11:02,680
We aren't just getting data.

253
00:11:02,680 --> 00:11:05,160
We're getting a signed, immutable log of changes.

254
00:11:05,160 --> 00:11:07,640
Every single data fragment that enters your vector store

255
00:11:07,640 --> 00:11:09,720
is linked to a specific delta token.

256
00:11:09,720 --> 00:11:12,760
You have a clear path from the LLM back to the source.
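
As a rough sketch of what that path could look like on your side, here is a lineage record written next to each chunk at ingestion time. The record layout and every field name are assumptions for illustration. The graph hands you the item and the delta link, but keeping the audit trail is up to your pipeline. The delta link is stored as a hash here because it is an opaque URL you may not want copied into every record.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageRecord:
    chunk_id: str
    source_item_id: str
    source_etag: str
    delta_link_sha256: str
    ingested_at: str


def make_lineage(chunk_text: str, item: dict, delta_link: str,
                 ingested_at: str) -> LineageRecord:
    """Tie one chunk to the item version and sync round it came from."""
    chunk_id = hashlib.sha256(
        f"{item['id']}\n{chunk_text}".encode("utf-8")
    ).hexdigest()
    return LineageRecord(
        chunk_id=chunk_id,
        source_item_id=item["id"],
        source_etag=item.get("eTag", ""),
        delta_link_sha256=hashlib.sha256(delta_link.encode("utf-8")).hexdigest(),
        ingested_at=ingested_at,
    )
```

When an auditor asks where an answer came from, you look up the chunk IDs that were in the prompt and read the source item and sync round straight off the records.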

257
00:11:12,760 --> 00:11:14,600
That is the shift from guessing to knowing.

258
00:11:14,600 --> 00:11:16,840
Sub-second discovery.

259
00:11:16,840 --> 00:11:18,680
The 250ms benchmark.

260
00:11:18,680 --> 00:11:21,400
Performance in the AI era isn't a secondary feature.

261
00:11:21,400 --> 00:11:23,080
It is the primary constraint.

262
00:11:23,080 --> 00:11:26,040
If your system feels slow, it is essentially broken.

263
00:11:26,040 --> 00:11:28,200
Most architects focus on the time it takes

264
00:11:28,200 --> 00:11:30,040
for an LLM to produce the first token.

265
00:11:30,040 --> 00:11:31,320
But that's only half the story.

266
00:11:31,320 --> 00:11:33,480
The real bottleneck is the discovery phase.

267
00:11:33,480 --> 00:11:34,920
We need to define a clear target.

268
00:11:34,920 --> 00:11:36,440
I call this the flash latency budget.

269
00:11:36,440 --> 00:11:39,800
It represents the total time allowed for retrieval and prompt assembly.

270
00:11:39,800 --> 00:11:41,880
To keep an experience feeling interactive,

271
00:11:41,880 --> 00:11:45,720
this phase must complete in 250 milliseconds or less at the 95th percentile.

272
00:11:45,720 --> 00:11:47,000
This is a brutal requirement.

273
00:11:47,000 --> 00:11:49,400
It means your system has to be right nearly every single time,

274
00:11:49,400 --> 00:11:51,160
regardless of how much data you're searching.

275
00:11:51,160 --> 00:11:53,240
Why 250 milliseconds?

276
00:11:53,240 --> 00:11:56,680
Humans perceive anything under 100 milliseconds as instantaneous.

277
00:11:56,680 --> 00:11:58,520
But as we move toward half a second,

278
00:11:58,520 --> 00:12:00,520
the brain starts to notice the gap.

279
00:12:00,520 --> 00:12:03,640
Once you hit one second, the user's focus begins to drift.

280
00:12:03,640 --> 00:12:07,640
In an enterprise setting, your AI is competing with the speed of a conversation.

281
00:12:07,640 --> 00:12:09,480
If a colleague asks you a question,

282
00:12:09,480 --> 00:12:11,960
you don't wait three seconds to start your sentence

283
00:12:11,960 --> 00:12:13,800
and your search engine shouldn't either.

284
00:12:13,800 --> 00:12:15,480
Let's look at how that time is spent.

285
00:12:15,480 --> 00:12:18,920
First, you have to turn the user's natural language query into a vector

286
00:12:18,920 --> 00:12:20,760
which takes roughly 30 milliseconds.

287
00:12:20,760 --> 00:12:22,040
Then you run a vector search.

288
00:12:22,040 --> 00:12:24,280
In a large tenant with millions of items,

289
00:12:24,280 --> 00:12:27,480
that can take 100 milliseconds if your index is tuned correctly.

290
00:12:27,480 --> 00:12:30,040
That leaves you with only 120 milliseconds.

291
00:12:30,040 --> 00:12:32,680
In that tiny sliver of time, you have to fetch the text,

292
00:12:32,680 --> 00:12:35,480
check the permissions, and build the prompt for the model.
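
The arithmetic is worth writing down, because it's easy to spend the budget twice. This small sketch uses the figures from the talk (250 ms total, 30 ms for the query embedding, 100 ms for the vector search) and adds a nearest-rank 95th percentile so you can check measured latencies against the target. The function and stage names are assumptions for illustration.

```python
BUDGET_MS = 250

# Stage costs quoted above, in milliseconds.
STAGES_MS = {"embed_query": 30, "vector_search": 100}


def remaining_budget(budget_ms: int, stages_ms: dict) -> int:
    """Milliseconds left for fetch, permission check, and prompt assembly."""
    return budget_ms - sum(stages_ms.values())


def p95(samples) -> float:
    """Nearest-rank 95th percentile; expects at least one sample."""
    ordered = sorted(samples)
    rank = -(-95 * len(ordered) // 100)  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def within_budget(samples, budget_ms: int = BUDGET_MS) -> bool:
    """True when the measured p95 meets the budget."""
    return p95(samples) <= budget_ms


left = remaining_budget(BUDGET_MS, STAGES_MS)  # 250 - 30 - 100 = 120
```

Feed within_budget the end-to-end retrieval timings from your own telemetry. An average that looks fine can still hide a p95 well over the line.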

293
00:12:35,480 --> 00:12:37,640
This is where legacy crawling fails the test.

294
00:12:37,640 --> 00:12:40,360
Because legacy indexes are built on periodic scans,

295
00:12:40,360 --> 00:12:43,800
the data is often stored in a way that prioritizes storage efficiency

296
00:12:43,800 --> 00:12:45,160
over retrieval speed.

297
00:12:45,160 --> 00:12:47,880
The system has to perform multiple joins across relational tables

298
00:12:47,880 --> 00:12:49,480
just to find out where a file lives.

299
00:12:49,480 --> 00:12:52,760
The P95 latency for these old systems often exceeds two seconds,

300
00:12:52,760 --> 00:12:54,520
which is eight times slower than our budget

301
00:12:54,520 --> 00:12:56,920
and a total non-starter for modern work.

302
00:12:56,920 --> 00:12:59,080
The impact on your business is measurable.

303
00:12:59,080 --> 00:13:01,480
When you hit that sub-second discovery benchmark,

304
00:13:01,480 --> 00:13:03,640
user retention jumps by 40%.

305
00:13:03,640 --> 00:13:05,960
People use the tool because it actually helps them.

306
00:13:05,960 --> 00:13:07,480
They don't have to think about the tool,

307
00:13:07,480 --> 00:13:09,000
they just think about the task.

308
00:13:09,000 --> 00:13:11,080
But when you miss it, the AI becomes a nuisance.

309
00:13:11,080 --> 00:13:12,840
It is the difference between a helpful assistant

310
00:13:12,840 --> 00:13:15,880
and a slow intern who constantly interrupts your flow.

311
00:13:15,880 --> 00:13:18,680
Architects often try to hide this latency with clever UI tricks

312
00:13:18,680 --> 00:13:20,200
like loading shimmers or progress bars,

313
00:13:20,200 --> 00:13:22,200
but you can't trick the human brain for long.

314
00:13:22,200 --> 00:13:25,560
If the retrieval is slow, the quality of the conversation suffers.

315
00:13:25,560 --> 00:13:26,920
The LLM gets the data late,

316
00:13:26,920 --> 00:13:28,440
which means the first token appears late

317
00:13:28,440 --> 00:13:30,760
and the whole pipeline starts to sag under its own weight.

318
00:13:30,760 --> 00:13:33,560
We need to stop treating discovery as a background process

319
00:13:33,560 --> 00:13:35,720
and start treating it as a real-time requirement.

320
00:13:35,720 --> 00:13:38,680
This shift in thinking changes how we evaluate our infrastructure.

321
00:13:38,680 --> 00:13:40,920
We stop looking at total items indexed

322
00:13:40,920 --> 00:13:42,200
and start looking at time to knowledge.

323
00:13:42,200 --> 00:13:44,200
We need a discovery engine that bypasses

324
00:13:44,200 --> 00:13:46,200
the traditional file system overhead.

325
00:13:46,200 --> 00:13:48,600
We need a system that can traverse organizational relationships

326
00:13:48,600 --> 00:13:50,120
at the speed of an API call.

327
00:13:50,120 --> 00:13:51,720
This is the only way to hit the numbers

328
00:13:51,720 --> 00:13:54,200
that keep employees engaged and trusting the system.

329
00:13:54,200 --> 00:13:56,920
To achieve this, we have to move into the actual design of the system.

330
00:13:56,920 --> 00:13:58,680
We can't just talk about speed in the abstract.

331
00:13:58,680 --> 00:14:01,400
We have to look at how the data objects are actually linked.

332
00:14:01,400 --> 00:14:03,720
We need to understand the geometry of the graph.

333
00:14:03,720 --> 00:14:05,720
Let's move into the technical architecture

334
00:14:05,720 --> 00:14:07,560
for the designers and architects.

335
00:14:07,560 --> 00:14:09,880
We'll look at how the unified REST surface handles nodes

336
00:14:09,880 --> 00:14:12,600
and edges to make sub-second a reality.

337
00:14:12,600 --> 00:14:15,320
The Graph API architecture: nodes and edges.

338
00:14:15,320 --> 00:14:18,040
The Microsoft graph isn't just a list of separate endpoints

339
00:14:18,040 --> 00:14:18,840
for different apps.

340
00:14:18,840 --> 00:14:21,640
It's a single unified REST API surface

341
00:14:21,640 --> 00:14:23,640
that wraps everything in your tenant.

342
00:14:23,640 --> 00:14:25,800
In the past, if you wanted to find a file

343
00:14:25,800 --> 00:14:27,720
and then see who the author's manager was,

344
00:14:27,720 --> 00:14:29,400
you had to jump between different services

345
00:14:29,400 --> 00:14:31,080
and authenticate multiple times.

346
00:14:31,080 --> 00:14:33,160
You'd call SharePoint for the file metadata,

347
00:14:33,160 --> 00:14:35,480
then you'd call Active Directory for the user profile

348
00:14:35,480 --> 00:14:37,080
and finally, you'd make another call

349
00:14:37,080 --> 00:14:38,680
to find the reporting structure.

350
00:14:38,680 --> 00:14:41,160
It was a fragmented mess that killed your performance

351
00:14:41,160 --> 00:14:43,080
because every jump added network overhead

352
00:14:43,080 --> 00:14:44,200
and authentication lag.

353
00:14:44,200 --> 00:14:46,040
But now, with the graph, you're interacting

354
00:14:46,040 --> 00:14:47,000
with a single gateway.

355
00:14:47,000 --> 00:14:49,080
It doesn't matter if the data lives in Exchange,

356
00:14:49,080 --> 00:14:50,440
SharePoint or Teams.

357
00:14:50,440 --> 00:14:51,880
It all looks and acts the same

358
00:14:51,880 --> 00:14:54,280
because it's presented through a consistent schema.

359
00:14:54,280 --> 00:14:56,840
This unification is what allows us to build discovery engines

360
00:14:56,840 --> 00:14:59,480
that don't get bogged down by the underlying silos.

361
00:14:59,480 --> 00:15:01,720
The magic happens in how the data is structured.

362
00:15:01,720 --> 00:15:03,880
Most of us grew up with relational databases

363
00:15:03,880 --> 00:15:06,200
where everything is organized in rows and columns.

364
00:15:06,200 --> 00:15:09,400
In that world, you have a table for users and a table for files.

365
00:15:09,400 --> 00:15:12,040
To find a relationship, you perform a SQL join.

366
00:15:12,040 --> 00:15:14,520
But joins are expensive and they don't scale well

367
00:15:14,520 --> 00:15:16,040
when relationships get deep.

368
00:15:16,040 --> 00:15:17,480
The graph uses a different design

369
00:15:17,480 --> 00:15:19,080
called a web of connected objects.

370
00:15:19,080 --> 00:15:20,760
In this model, every entity is a node.

371
00:15:20,760 --> 00:15:23,080
Your users are nodes, your groups are nodes.

372
00:15:23,080 --> 00:15:26,280
Every single file, chat message, and calendar event is a node.

373
00:15:26,280 --> 00:15:28,280
The connections between them are called edges

374
00:15:28,280 --> 00:15:30,280
because the database understands these connections

375
00:15:30,280 --> 00:15:32,520
natively without needing to scan entire tables

376
00:15:32,520 --> 00:15:34,920
for matching IDs, traversing these relationships

377
00:15:34,920 --> 00:15:36,920
becomes an operation that is incredibly fast.

378
00:15:36,920 --> 00:15:39,000
You aren't searching through a table to find a match.

379
00:15:39,000 --> 00:15:41,320
You're simply following a pointer from one node to the next.

380
00:15:41,320 --> 00:15:44,680
This geometry changes how we handle organizational hierarchies.

381
00:15:44,680 --> 00:15:47,480
Imagine a scenario where you need to find all the documents shared

382
00:15:47,480 --> 00:15:49,160
with a specific project team

383
00:15:49,160 --> 00:15:52,200
that has members spread across three different geographical regions.

384
00:15:52,200 --> 00:15:54,680
In a relational model, you'd have to query the group,

385
00:15:54,680 --> 00:15:55,880
get the list of members,

386
00:15:55,880 --> 00:15:59,080
and then check the permissions for every file against that list of IDs.

387
00:15:59,080 --> 00:16:00,760
It's a massive computational burden.

388
00:16:00,760 --> 00:16:02,920
But in the graph, you can traverse that entire hierarchy

389
00:16:02,920 --> 00:16:04,200
in a single API call.

390
00:16:04,200 --> 00:16:06,840
You can ask for the files connected to a specific group node.

391
00:16:06,840 --> 00:16:08,120
The system follows the edges

392
00:16:08,120 --> 00:16:10,200
and returns the result set immediately.

393
00:16:10,200 --> 00:16:13,240
This path-based discovery is what makes semantic search feel

394
00:16:13,240 --> 00:16:15,160
like it actually understands your business.

395
00:16:15,160 --> 00:16:18,040
It's not just looking for keywords, it's looking for context.

396
00:16:18,040 --> 00:16:19,720
It understands that a file is important

397
00:16:19,720 --> 00:16:21,480
because it's linked to a person who is linked

398
00:16:21,480 --> 00:16:22,680
to a specific department.
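
To show what following an edge looks like in practice, here's a small sketch that builds two request URLs: the files in a group's default document library, and the manager of a user. Both paths exist in Graph v1.0. The helper names and the chosen $select fields are assumptions for illustration, and the sketch only builds the URLs, so authentication and the HTTP call are left out.

```python
from urllib.parse import quote, urlencode

GRAPH = "https://graph.microsoft.com/v1.0"


def group_files_url(group_id: str,
                    select=("id", "name", "webUrl", "lastModifiedDateTime")) -> str:
    """URL for the items at the root of a group's default drive."""
    query = urlencode({"$select": ",".join(select)}, safe="$,")
    return f"{GRAPH}/groups/{quote(group_id, safe='')}/drive/root/children?{query}"


def manager_url(user_id: str) -> str:
    """URL for the manager edge of a user, addressed by object ID."""
    return f"{GRAPH}/users/{quote(user_id, safe='')}/manager"
```

Each URL is one hop along an edge: group to drive to children, or user to manager. Your code never has to join a membership table against a file table to get there.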

399
00:16:22,680 --> 00:16:26,040
Architects need to realize that this isn't just a convenience for developers.

400
00:16:26,040 --> 00:16:27,240
It's a performance strategy.

401
00:16:27,240 --> 00:16:29,720
When you use the graph to build your discovery pipeline,

402
00:16:29,720 --> 00:16:32,360
you're offloading the heavy lifting of relationship mapping

403
00:16:32,360 --> 00:16:34,760
to Microsoft's back end instead of doing it yourself.

404
00:16:34,760 --> 00:16:37,800
If they've optimized the traversal of these nodes at a global scale,

405
00:16:37,800 --> 00:16:40,920
your application doesn't have to manage the complexity of who reports to whom

406
00:16:40,920 --> 00:16:43,080
or which group owns which SharePoint site.

407
00:16:43,080 --> 00:16:45,000
You just query the relationship you need.

408
00:16:45,000 --> 00:16:47,320
This allows your discovery engine to stay lean.

409
00:16:47,320 --> 00:16:49,320
You can focus on the semantic analysis

410
00:16:49,320 --> 00:16:50,600
and the vector embeddings

411
00:16:50,600 --> 00:16:53,240
while the graph handles the structural truth of the organization.

412
00:16:53,240 --> 00:16:55,880
It's a shift from managing data to managing connections.

413
00:16:55,880 --> 00:16:57,960
It replaces the heavy lifting of your local server

414
00:16:57,960 --> 00:16:59,560
with the optimized paths of the cloud.

415
00:16:59,560 --> 00:17:01,800
But here is where most people mess up.

416
00:17:01,800 --> 00:17:03,640
They see the power of this unified surface

417
00:17:03,640 --> 00:17:05,000
and they want to open the floodgates.

418
00:17:05,000 --> 00:17:07,560
They assume that since one API can see everything,

419
00:17:07,560 --> 00:17:09,720
their application should have access to everything.

420
00:17:09,720 --> 00:17:11,080
This is a dangerous path.

421
00:17:11,080 --> 00:17:13,880
The unified nature of the graph is its greatest strength,

422
00:17:13,880 --> 00:17:15,320
but it's also its biggest risk

423
00:17:15,320 --> 00:17:18,280
if you don't control the scope of what your application can see.

424
00:17:18,280 --> 00:17:21,240
You can't just grant read all permissions and hope for the best.

425
00:17:21,240 --> 00:17:23,320
You need to understand the granular control plane

426
00:17:23,320 --> 00:17:25,240
that sits on top of this architecture

427
00:17:25,240 --> 00:17:28,360
before we can handle the actual data flow into our vector stores.

428
00:17:28,360 --> 00:17:30,040
We have to master the permissions model

429
00:17:30,040 --> 00:17:32,200
that keeps this web of objects secure.

430
00:17:32,200 --> 00:17:34,600
We need to distinguish between what an admin can do

431
00:17:34,600 --> 00:17:36,760
and what an application is allowed to see.

432
00:17:36,760 --> 00:17:39,400
That transition is where search becomes professional.

433
00:17:39,400 --> 00:17:40,680
Scopes and permissions.

434
00:17:40,680 --> 00:17:42,280
The granular control plane.

435
00:17:42,280 --> 00:17:44,200
You might assume that a global admin account

436
00:17:44,200 --> 00:17:45,400
gives your search application

437
00:17:45,400 --> 00:17:48,040
the same sweeping powers you use to manage the tenant.

438
00:17:48,040 --> 00:17:49,640
But in reality, it does the opposite.

439
00:17:49,640 --> 00:17:51,960
Microsoft built a wall between administrative roles

440
00:17:51,960 --> 00:17:53,480
and application permissions.

441
00:17:53,480 --> 00:17:55,160
This is the granular control plane.

442
00:17:55,160 --> 00:17:56,680
It's where most architects get stuck

443
00:17:56,680 --> 00:17:58,440
because they try to use a human model

444
00:17:58,440 --> 00:17:59,640
for a machine problem.

445
00:17:59,640 --> 00:18:03,480
In Entra ID, roles like Security Administrator or SharePoint Administrator

446
00:18:03,480 --> 00:18:05,400
are designed for people who log into portals

447
00:18:05,400 --> 00:18:06,840
to perform manual tasks.

448
00:18:06,840 --> 00:18:08,040
These roles are broad.

449
00:18:08,040 --> 00:18:10,600
But the Graph API doesn't care about your job title

450
00:18:10,600 --> 00:18:12,280
or your seniority in the company.

451
00:18:12,280 --> 00:18:13,880
It cares about scopes.

452
00:18:13,880 --> 00:18:15,880
A scope is a specific permission

453
00:18:15,880 --> 00:18:18,120
granted to an application rather than a person.

454
00:18:18,120 --> 00:18:19,320
You can think of it as a key

455
00:18:19,320 --> 00:18:22,360
that only fits one specific lock on one specific door.

456
00:18:22,360 --> 00:18:24,360
If you want your semantic search to find files,

457
00:18:24,360 --> 00:18:25,640
you don't give it admin rights.

458
00:18:25,640 --> 00:18:28,040
Instead, you give it the Files.Read.All

459
00:18:28,040 --> 00:18:30,520
scope or the even more secure

460
00:18:30,520 --> 00:18:32,200
Files.Read.Selected option.

461
00:18:32,200 --> 00:18:33,640
This is the least-privilege design

462
00:18:33,640 --> 00:18:36,600
that 2026 compliance frameworks are going to demand.

463
00:18:36,600 --> 00:18:39,160
By 2026, simply saying you needed access

464
00:18:39,160 --> 00:18:41,000
to everything to make the AI work

465
00:18:41,000 --> 00:18:43,320
won't be an acceptable answer during an audit.

466
00:18:43,320 --> 00:18:44,680
Regulators are looking for proof

467
00:18:44,680 --> 00:18:46,280
that you limited your application's reach

468
00:18:46,280 --> 00:18:48,920
to the bare minimum required for the task.
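A minimal sketch of how that audit proof could be enforced in code, assuming a hypothetical allow-list that holds only the two scopes named above. The names `ALLOWED_SCOPES` and `excess_scopes` are illustrative and not part of any Microsoft SDK.

```python
# Hypothetical allow-list for a read-only search indexer.
# The two scopes are the ones named in the talk; adjust to your own policy.
ALLOWED_SCOPES = {"Files.Read.All", "Files.Read.Selected"}


def excess_scopes(requested):
    """Return the requested scopes that fall outside the allow-list, sorted."""
    return sorted(set(requested) - ALLOWED_SCOPES)


if __name__ == "__main__":
    # A deployment check could fail the build when this list is not empty.
    print(excess_scopes(["Files.Read.All", "Directory.ReadWrite.All"]))
```

Running a check like this before every deployment gives you a record that the application never asked for more than the task required.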

469
00:18:48,920 --> 00:18:52,040
Ingestion mechanics from SharePoint to Vector Store.

470
00:18:52,040 --> 00:18:53,720
Once you have your scopes locked down,

471
00:18:53,720 --> 00:18:56,200
you have to face the hard reality of moving data.

472
00:18:56,200 --> 00:18:57,640
This is the ingestion pipeline.

473
00:18:57,640 --> 00:19:00,120
It's the bridge between where your data lives in SharePoint

474
00:19:00,120 --> 00:19:02,200
and where your AI retrieves it in the Vector Store.

475
00:19:02,200 --> 00:19:04,520
Most people think ingestion is just a file copy.

476
00:19:04,520 --> 00:19:06,200
But in a semantic search architecture,

477
00:19:06,200 --> 00:19:08,520
it's actually an identity mapping exercise.

478
00:19:08,520 --> 00:19:11,320
When you use a graph connector to pull data into your index,

479
00:19:11,320 --> 00:19:13,480
the most difficult part isn't the text extraction.

480
00:19:13,480 --> 00:19:16,040
It's the access control list or ACL mapping.

481
00:19:16,040 --> 00:19:18,280
You have to take the permissions from the source system

482
00:19:18,280 --> 00:19:20,920
and translate them into something M365 understands.

483
00:19:20,920 --> 00:19:22,760
If you're pulling from an external file share,

484
00:19:22,760 --> 00:19:25,400
those NTFS permissions don't just work in the cloud.

485
00:19:25,400 --> 00:19:27,240
The connector has to map those local IDs

486
00:19:27,240 --> 00:19:28,280
to Entra ID objects.

487
00:19:28,280 --> 00:19:29,960
This is a multi-step logic flow.

488
00:19:29,960 --> 00:19:31,960
First, the connector identifies the user.

489
00:19:31,960 --> 00:19:35,320
Then, it looks up the corresponding object ID in your tenant.

490
00:19:35,320 --> 00:19:38,040
Finally, it attaches that ID to the document metadata

491
00:19:38,040 --> 00:19:39,320
in the search index.

492
00:19:39,320 --> 00:19:42,200
If this mapping fails, your security trimming breaks.

493
00:19:42,200 --> 00:19:43,560
You either end up with a dark document

494
00:19:43,560 --> 00:19:45,640
that nobody can find or a leaked document

495
00:19:45,640 --> 00:19:46,840
that everyone can see.
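The three mapping steps can be sketched roughly as follows. This is an illustration under stated assumptions: `directory` is a hypothetical lookup table from source account names to Entra ID object IDs, and the entry shape (`type`, `value`, `accessType`) is modeled on connector ACL entries and should be checked against the current documentation. The function fails closed, so an unmapped principal stops the item instead of producing a wrong ACL.

```python
def map_acl(source_principals, directory):
    """Translate source-system principals into ACL entries keyed by object ID.

    directory: hypothetical dict of source account name -> Entra ID object ID.
    Raises LookupError for an unmapped principal rather than guessing.
    """
    acl = []
    for principal in source_principals:
        # Steps 1 and 2: identify the user and look up the tenant object ID.
        object_id = directory.get(principal)
        if object_id is None:
            raise LookupError(f"no Entra ID object found for {principal!r}")
        # Step 3: attach the ID to the metadata that travels with the document.
        acl.append({"type": "user", "value": object_id, "accessType": "grant"})
    return acl
```

Raising on a failed lookup is a deliberate choice here: the caller can park the document for review, which avoids both the invisible document and the leaked one.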

496
00:19:46,840 --> 00:19:49,560
Then we have to talk about the physical speed of this process.

497
00:19:49,560 --> 00:19:51,880
The Graph API has built-in guardrails

498
00:19:51,880 --> 00:19:54,760
that most architects don't plan for until they hit them.

499
00:19:54,760 --> 00:19:59,160
Standard throughput for a graph connector is capped at roughly 25 items per second.

500
00:19:59,160 --> 00:20:00,520
On paper, that sounds fast.

501
00:20:00,520 --> 00:20:02,600
But do the math for an enterprise estate.

502
00:20:02,600 --> 00:20:04,200
If you have one million documents,

503
00:20:04,200 --> 00:20:06,440
a single connection will take over 11 hours

504
00:20:06,440 --> 00:20:07,880
to finish the initial load.

505
00:20:07,880 --> 00:20:10,600
And that's assuming you don't hit any other throttling limits.

506
00:20:10,600 --> 00:20:12,600
This is a hard physical constraint of the service.

507
00:20:12,600 --> 00:20:13,880
You can't just wish it away.

508
00:20:13,880 --> 00:20:15,160
So, how do you optimize?

509
00:20:15,160 --> 00:20:17,160
You don't just open one pipe and hope for the best.

510
00:20:17,160 --> 00:20:20,120
You have to use parallel connections and intelligent batching.

511
00:20:20,120 --> 00:20:23,320
The system allows 25 actions per second per connection.

512
00:20:23,320 --> 00:20:25,480
Smart architects segment their data

513
00:20:25,480 --> 00:20:28,120
into multiple connections based on business priority.

514
00:20:28,120 --> 00:20:30,600
You ingest your critical project folders first.

515
00:20:30,600 --> 00:20:32,360
You leave the legacy archives for later.

516
00:20:32,360 --> 00:20:34,920
This ensures that your time to value for the AI stays low

517
00:20:34,920 --> 00:20:37,560
even while the bulk of the data is still moving.
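The arithmetic behind those numbers is easy to check. The sketch below uses the 25 items per second figure quoted above and assumes, optimistically, that parallel connections scale linearly and that no other throttling applies. The function name is illustrative.

```python
ITEMS_PER_SECOND = 25  # per-connection rate quoted in the talk


def initial_load_hours(total_items, connections=1, rate=ITEMS_PER_SECOND):
    """Best-case hours for the initial load, ignoring throttling and retries."""
    seconds = total_items / (rate * connections)
    return seconds / 3600


if __name__ == "__main__":
    print(round(initial_load_hours(1_000_000), 1))     # 11.1 (one connection)
    print(round(initial_load_hours(1_000_000, 4), 1))  # 2.8 (four connections)
```

One million items at 25 per second is 40,000 seconds, which is where the figure of just over 11 hours comes from.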

518
00:20:37,560 --> 00:20:39,640
Another critical lever is the schema annotation.

519
00:20:39,640 --> 00:20:41,640
When you define how your data looks in the graph,

520
00:20:41,640 --> 00:20:44,040
you have to tell the system how to treat each property.

521
00:20:44,040 --> 00:20:46,520
There is a big difference between a property being searchable

522
00:20:46,520 --> 00:20:48,040
and it being retrievable.

523
00:20:48,040 --> 00:20:51,160
Searchable means the engine can use that field to find the document.

524
00:20:51,160 --> 00:20:52,920
Retrievable means the engine can actually

525
00:20:52,920 --> 00:20:54,680
return that data to the user.
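To make the difference concrete, here is a small sketch of a schema with per-property flags. The property names are hypothetical, and the flag names `isSearchable` and `isRetrievable` follow the naming used in connector schemas, which you should confirm against the current documentation.

```python
# Hypothetical schema for an ingested document type.
SCHEMA = [
    {"name": "title", "type": "String", "isSearchable": True, "isRetrievable": True},
    {"name": "body", "type": "String", "isSearchable": True, "isRetrievable": False},
    {"name": "url", "type": "String", "isSearchable": False, "isRetrievable": True},
]


def fields_with(schema, flag):
    """Return the names of the properties that have the given flag set."""
    return [prop["name"] for prop in schema if prop.get(flag)]


if __name__ == "__main__":
    print(fields_with(SCHEMA, "isSearchable"))   # fields the engine can match on
    print(fields_with(SCHEMA, "isRetrievable"))  # fields the engine can return
```

In this sketch the body helps the engine find the document but is never returned, while the URL is returned but plays no part in matching.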

526
00:20:54,680 --> 00:20:57,320
Token life cycles, managing the state of truth.

527
00:20:57,320 --> 00:20:59,720
The most dangerous assumption you can make in a sync pipeline

528
00:20:59,720 --> 00:21:01,320
is that your state is permanent.

529
00:21:01,320 --> 00:21:03,960
Once you've successfully ingested your initial baseline,

530
00:21:03,960 --> 00:21:05,560
you enter the maintenance phase

531
00:21:05,560 --> 00:21:09,160
and this is where the Delta token life cycle takes center stage.

532
00:21:09,160 --> 00:21:11,240
If you treat this token like a simple timestamp,

533
00:21:11,240 --> 00:21:13,320
your search engine will eventually fail,

534
00:21:13,320 --> 00:21:16,760
because in reality, a Delta token is not a date.

535
00:21:16,760 --> 00:21:19,400
It is an opaque pointer to a specific transaction log

536
00:21:19,400 --> 00:21:20,920
on the Microsoft backend.

537
00:21:20,920 --> 00:21:23,480
Think about how a traditional database handles changes.

538
00:21:23,480 --> 00:21:26,360
It keeps a record of every insert, update, and delete,

539
00:21:26,360 --> 00:21:27,960
but that record isn't infinite.

540
00:21:27,960 --> 00:21:30,680
Eventually the logs are rotated or truncated to save space.

541
00:21:30,680 --> 00:21:32,200
The Graph API does the same thing.

542
00:21:32,200 --> 00:21:34,760
Your token represents a specific spot in that log.

543
00:21:34,760 --> 00:21:37,480
As long as that spot exists, you can get your changes.

544
00:21:37,480 --> 00:21:39,800
But if your sync engine stays offline too long,

545
00:21:39,800 --> 00:21:41,400
the log moves past your bookmark

546
00:21:41,400 --> 00:21:43,320
and the pointer now points to nothing.

547
00:21:43,320 --> 00:21:44,440
That's where the system breaks.

548
00:21:44,440 --> 00:21:46,040
When you present an outdated token,

549
00:21:46,040 --> 00:21:47,640
the graph doesn't just give you all data.

550
00:21:47,640 --> 00:21:49,080
It throws a 410 Gone error.

551
00:21:49,080 --> 00:21:50,040
This is a hard stop.

552
00:21:50,040 --> 00:21:53,240
It's the protocol telling you that the chain of truth has been broken.

553
00:21:53,240 --> 00:21:56,840
You might also see a resync required signal in the response body.

554
00:21:56,840 --> 00:21:58,360
Most developers see this and panic.

555
00:21:58,360 --> 00:21:59,480
They try to retry the call.

556
00:21:59,480 --> 00:22:00,920
They think it's a temporary glitch,

557
00:22:00,920 --> 00:22:04,120
but the reality is that your shortcut to the data has expired.

558
00:22:04,120 --> 00:22:06,440
When the chain breaks, you have to start over.

559
00:22:06,440 --> 00:22:08,760
You must discard your old state and perform a full crawl

560
00:22:08,760 --> 00:22:10,200
to rebuild the baseline.

561
00:22:10,200 --> 00:22:13,320
This is why token management is the heartbeat of your system.

562
00:22:13,320 --> 00:22:15,480
If you don't store these tokens with a timestamp

563
00:22:15,480 --> 00:22:16,280
of when they were issued,

564
00:22:16,280 --> 00:22:18,760
you won't know if you're approaching the expiration window.

565
00:22:18,760 --> 00:22:22,280
Most logs in the Microsoft ecosystem only last for 7 to 30 days.

566
00:22:22,280 --> 00:22:24,040
If your service goes down for a long weekend

567
00:22:24,040 --> 00:22:26,200
and you don't have a plan for recovery,

568
00:22:26,200 --> 00:22:29,880
you'll wake up to a broken index and a massive re-indexing bill.
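A minimal sketch of that recovery rule, assuming two hypothetical callables supplied by your own client code: `fetch_changes(token)` returns the changes and a new token or raises `TokenExpired` when the service answers 410 Gone, and `full_crawl()` returns every item plus a fresh token. None of these names come from a Microsoft SDK.

```python
class TokenExpired(Exception):
    """Raised by the hypothetical fetcher when the service answers 410 Gone."""


def sync_once(state, fetch_changes, full_crawl):
    """Run one sync cycle and return the new state.

    state: dict with "token" (opaque string or None) and "index" (id -> item).
    """
    if state["token"] is not None:
        try:
            changes, token = fetch_changes(state["token"])
        except TokenExpired:
            pass  # never retry the same token; the bookmark is gone
        else:
            index = dict(state["index"])
            for item in changes:
                if item.get("deleted"):
                    index.pop(item["id"], None)
                else:
                    index[item["id"]] = item
            return {"token": token, "index": index}
    # No token, or the token expired: discard old state and rebuild the baseline.
    items, token = full_crawl()
    return {"token": token, "index": {item["id"]: item for item in items}}
```

The token is treated as opaque throughout. Storing the time you received it alongside the state is what lets you alert before the retention window runs out.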

569
00:22:29,880 --> 00:22:32,520
Throttling and back-off, engineering for resilience.

570
00:22:32,520 --> 00:22:34,040
We've talked about the logic of the sync,

571
00:22:34,040 --> 00:22:35,640
but we haven't talked about the physics.

572
00:22:35,640 --> 00:22:38,360
The Microsoft Graph API is a shared resource.

573
00:22:38,360 --> 00:22:40,920
Microsoft has to protect the tenant from being overwhelmed

574
00:22:40,920 --> 00:22:42,440
by a single aggressive app,

575
00:22:42,440 --> 00:22:44,040
and this is where we meet throttling.

576
00:22:44,040 --> 00:22:46,840
Most people see a 429 error and think their code is broken,

577
00:22:46,840 --> 00:22:48,040
but in reality,

578
00:22:48,040 --> 00:22:49,480
it's the system working correctly.

579
00:22:49,480 --> 00:22:52,440
It's a signal that you're moving too fast for the current environment.

580
00:22:52,440 --> 00:22:55,480
When your application requests too much data in a short window,

581
00:22:55,480 --> 00:22:57,880
the service sends a message that forces you to wait

582
00:22:57,880 --> 00:22:59,400
before making another call.

583
00:22:59,400 --> 00:23:01,080
Starting in March of 2026,

584
00:23:01,080 --> 00:23:03,560
the rules for data extraction are getting much stricter.

585
00:23:03,560 --> 00:23:05,160
Microsoft is rolling out new limits

586
00:23:05,160 --> 00:23:07,880
that specifically target large-scale discovery engines

587
00:23:07,880 --> 00:23:09,320
because they want to ensure

588
00:23:09,320 --> 00:23:13,320
that search indexing doesn't kill the performance of Teams for the end users.

589
00:23:13,320 --> 00:23:15,400
If you haven't updated your logic by then,

590
00:23:15,400 --> 00:23:17,000
your sync will simply stop.

591
00:23:17,000 --> 00:23:19,720
It won't be a slow crawl, it will be a complete lockout.

592
00:23:19,720 --> 00:23:22,840
You need to understand how to read the signals the API is sending.

593
00:23:22,840 --> 00:23:24,600
Failing to adapt to these new policies

594
00:23:24,600 --> 00:23:27,160
means your AI will lose its connection to the real-time data

595
00:23:27,160 --> 00:23:29,400
it needs to function properly for your employees.

596
00:23:29,400 --> 00:23:31,560
The most important part of a 429 response

597
00:23:31,560 --> 00:23:32,840
isn't the error code itself.

598
00:23:32,840 --> 00:23:34,440
It's the Retry-After header.

599
00:23:34,440 --> 00:23:36,040
This is a specific value in seconds

600
00:23:36,040 --> 00:23:38,760
that tells you exactly how long to wait before trying again.

601
00:23:38,760 --> 00:23:40,520
I see so many developers ignore this.

602
00:23:40,520 --> 00:23:43,160
They write a loop that waits five seconds and tries again.

603
00:23:43,160 --> 00:23:45,800
If the header says 60 seconds and you retry in five,

604
00:23:45,800 --> 00:23:47,720
the system sees you as a hostile actor.

605
00:23:47,720 --> 00:23:49,480
You have to respect the service's instructions

606
00:23:49,480 --> 00:23:51,800
because ignoring them will lead to a permanent block

607
00:23:51,800 --> 00:23:54,920
that requires manual intervention from Microsoft support to resolve.

608
00:23:54,920 --> 00:23:57,400
Respecting the header is the first step,

609
00:23:57,400 --> 00:23:59,800
but you also need exponential back-off logic.

610
00:23:59,800 --> 00:24:01,720
This means that every time you hit a limit,

611
00:24:01,720 --> 00:24:03,240
you increase your wait time.

612
00:24:03,240 --> 00:24:06,280
If the first wait was 10 seconds, the next might be 20

613
00:24:06,280 --> 00:24:09,000
and this gives the service room to breathe and recover from the surge.

614
00:24:09,000 --> 00:24:11,800
It's about being a good citizen in a multi-tenant cloud.

615
00:24:11,800 --> 00:24:14,120
This logic ensures that your pipeline remains stable

616
00:24:14,120 --> 00:24:16,120
even during periods of high network congestion

617
00:24:16,120 --> 00:24:18,600
when the back end is struggling to keep up with global demand

618
00:24:18,600 --> 00:24:20,600
and the needs of other organizations.
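Both rules fit in one small function. This is a sketch, not a definitive implementation: the base of 10 seconds matches the example above, while the cap of 600 seconds and the function name are assumptions.

```python
def next_wait(attempt, retry_after=None, base=10.0, cap=600.0):
    """Seconds to wait before retry number `attempt` (0-based).

    Doubles the wait on every attempt up to `cap`, and never waits less
    than the server's Retry-After value when one was provided.
    """
    backoff = min(cap, base * (2 ** attempt))
    if retry_after is not None:
        return max(backoff, float(retry_after))
    return backoff


if __name__ == "__main__":
    print(next_wait(0))                  # 10.0
    print(next_wait(1))                  # 20.0
    print(next_wait(0, retry_after=60))  # 60.0, the header wins
```

Taking the larger of the two values means the header is always honored, and repeated throttling still pushes the wait upward.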

619
00:24:20,600 --> 00:24:23,480
We need to move toward what I call throttle-aware design.

620
00:24:23,480 --> 00:24:25,160
This means building your discovery engine

621
00:24:25,160 --> 00:24:27,000
with the assumption that you will be throttled.

622
00:24:27,000 --> 00:24:28,040
It's not an exception.

623
00:24:28,040 --> 00:24:29,800
It's a standard part of the workflow.

624
00:24:29,800 --> 00:24:32,360
You design your queues so they can pause and resume

625
00:24:32,360 --> 00:24:33,720
without losing data.

626
00:24:33,720 --> 00:24:35,400
If the system shuts you down for an hour,

627
00:24:35,400 --> 00:24:36,920
your app should just go to sleep.

628
00:24:36,920 --> 00:24:39,880
When the timer expires, it wakes up and picks up right where it left off.

629
00:24:39,880 --> 00:24:42,680
This approach transforms a fragile script

630
00:24:42,680 --> 00:24:45,800
into a resilient service that can handle the unpredictable nature

631
00:24:45,800 --> 00:24:47,880
of cloud scale data synchronization

632
00:24:47,880 --> 00:24:50,520
without crashing or losing its place in the log.

633
00:24:50,520 --> 00:24:52,360
In a large-scale environment,

634
00:24:52,360 --> 00:24:54,760
you might have multiple connections running at once.

635
00:24:54,760 --> 00:24:55,960
This compounds the risk.

636
00:24:55,960 --> 00:24:58,520
If 10 different connections all hit the limit at the same time,

637
00:24:58,520 --> 00:25:00,120
your whole tenant could be flagged.

638
00:25:00,120 --> 00:25:02,280
You need a central orchestrator that manages the total load

639
00:25:02,280 --> 00:25:03,880
across all your graph calls.

640
00:25:03,880 --> 00:25:07,160
It should act as a governor that limits the total requests per second.

641
00:25:07,160 --> 00:25:10,200
This prevents you from ever reaching the threshold in the first place.

642
00:25:10,200 --> 00:25:11,640
By staying just under the limit,

643
00:25:11,640 --> 00:25:13,080
you actually finish the crawl faster

644
00:25:13,080 --> 00:25:14,920
because you avoid the dead time associated

645
00:25:14,920 --> 00:25:17,320
with waiting for long-retry windows to expire.
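One common way to build such a governor is a token bucket. The sketch below swaps in that technique as an assumption, since the talk does not name a specific algorithm. The class name and parameters are illustrative, and the code is not thread-safe as written, so a real orchestrator would wrap `try_acquire` in a lock.

```python
import time


class Governor:
    """Token bucket shared by all connections to cap total requests per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable clock, useful for testing
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self):
        """Take one token if available; return False when the caller must wait."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Every connection asks the same `Governor` instance before each call, so the total request rate stays under the configured limit no matter how many connections run.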

646
00:25:17,320 --> 00:25:18,920
Once you nail this resilience,

647
00:25:18,920 --> 00:25:20,360
the system becomes invisible.

648
00:25:20,360 --> 00:25:21,880
It just works in the background,

649
00:25:21,880 --> 00:25:24,520
keeping your index fresh without causing drama.

650
00:25:24,520 --> 00:25:25,800
Now that we have the plumbing sorted,

651
00:25:25,800 --> 00:25:28,120
let's look at how this applies to something real.

652
00:25:28,120 --> 00:25:30,920
We'll look at a scenario every executive cares about,

653
00:25:30,920 --> 00:25:32,520
meeting intelligence.

654
00:25:32,520 --> 00:25:35,720
This is where the gap between old and new models becomes obvious.

655
00:25:35,720 --> 00:25:37,880
Imagine a world where your meeting notes are

656
00:25:37,880 --> 00:25:39,720
searchable seconds after the call ends

657
00:25:39,720 --> 00:25:42,280
because your discovery engine was built to handle the heat.

658
00:25:42,280 --> 00:25:44,360
This level of responsiveness is only possible

659
00:25:44,360 --> 00:25:46,200
when you move away from the scheduled crawl

660
00:25:46,200 --> 00:25:48,920
and embrace the live pulse of the organization

661
00:25:48,920 --> 00:25:50,760
while maintaining a steady flow of data

662
00:25:50,760 --> 00:25:52,600
that respects the boundaries of the service.

663
00:25:52,600 --> 00:25:55,960
Scenario A, the live meeting intelligence loop.

664
00:25:55,960 --> 00:25:58,040
Let's walk through what actually happens in the real world

665
00:25:58,040 --> 00:25:59,960
when you build discovery the right way.

666
00:25:59,960 --> 00:26:02,440
Imagine it is 3pm on a Tuesday afternoon.

667
00:26:02,440 --> 00:26:05,880
Your executive team is finishing a quarterly planning session in Teams

668
00:26:05,880 --> 00:26:08,440
where they just pivoted the entire go-to-market strategy.

669
00:26:08,440 --> 00:26:11,000
They spent the hour debating which regions to expand into

670
00:26:11,000 --> 00:26:14,280
and which ones to cut while also locking in specific product timelines

671
00:26:14,280 --> 00:26:16,040
and discussing competitive threats.

672
00:26:16,040 --> 00:26:19,480
One executive mentions a high-stakes customer call scheduled for Thursday

673
00:26:19,480 --> 00:26:21,560
that depends entirely on these new decisions.

674
00:26:21,560 --> 00:26:23,640
The meeting ends, the transcript is generated

675
00:26:23,640 --> 00:26:25,640
and the notes are saved into SharePoint.

676
00:26:25,640 --> 00:26:27,640
But in a legacy system, nothing happens.

677
00:26:27,640 --> 00:26:30,680
The meeting wrapped up at 3:01pm

678
00:26:30,680 --> 00:26:32,920
but the next scheduled crawl isn't until 6pm.

679
00:26:32,920 --> 00:26:34,280
This means for the next three hours,

680
00:26:34,280 --> 00:26:36,120
your AI is completely blind to the fact

681
00:26:36,120 --> 00:26:38,360
that a major strategic shift just occurred.

682
00:26:38,360 --> 00:26:41,080
While this is happening, a product manager is at their desk

683
00:26:41,080 --> 00:26:43,240
working on a proposal and asks the AI

684
00:26:43,240 --> 00:26:45,480
for the current strategy in the APAC region.

685
00:26:45,480 --> 00:26:47,240
Because the system is waiting on a schedule,

686
00:26:47,240 --> 00:26:50,680
it searches the old index and finds a document from last quarter.

687
00:26:50,680 --> 00:26:53,480
The AI confidently summarizes the outdated plan

688
00:26:53,480 --> 00:26:56,120
and the product manager spends an hour writing a proposal

689
00:26:56,120 --> 00:26:58,360
based on information that is no longer true.

690
00:26:58,360 --> 00:27:00,760
By the time the crawl finally runs at 6pm,

691
00:27:00,760 --> 00:27:02,360
the damage is already done.

692
00:27:02,360 --> 00:27:04,120
The proposal has been sent to leadership

693
00:27:04,120 --> 00:27:06,840
and now everyone has to stop what they are doing to fix the mistake.

694
00:27:06,840 --> 00:27:08,600
That is institutional friction

695
00:27:08,600 --> 00:27:11,160
and it is the direct result of the cost of stale data.

696
00:27:11,160 --> 00:27:13,400
Now let's look at how this works with graph discovery.

697
00:27:13,400 --> 00:27:15,640
The moment that meeting ends at 3:01pm,

698
00:27:15,640 --> 00:27:17,240
a change event fires in the graph

699
00:27:17,240 --> 00:27:19,160
before the Teams window even closes.

700
00:27:19,160 --> 00:27:21,160
The transcript is treated as a live object

701
00:27:21,160 --> 00:27:23,640
and the Delta query engine picks it up instantly.

702
00:27:23,640 --> 00:27:25,640
Within two seconds, the metadata is updated

703
00:27:25,640 --> 00:27:27,560
in your vector store, the embeddings are finished

704
00:27:27,560 --> 00:27:29,240
and security permissions are verified.

705
00:27:29,240 --> 00:27:32,280
By 3:03pm, the new decision is fully discoverable.

706
00:27:32,280 --> 00:27:35,880
When that same product manager asks their question at 3:15pm,

707
00:27:35,880 --> 00:27:37,720
the AI finds the fresh meeting notes

708
00:27:37,720 --> 00:27:40,680
and incorporates the updated strategy into the response.

709
00:27:40,680 --> 00:27:42,760
The entire cycle takes seconds instead of hours

710
00:27:42,760 --> 00:27:45,240
which means the proposal is written correctly the first time.

711
00:27:45,240 --> 00:27:47,640
But here is where the business impact really shows up.

712
00:27:47,640 --> 00:27:50,440
That Thursday customer call is with a prospect in Singapore

713
00:27:50,440 --> 00:27:53,160
and your sales team is already pitching the new strategy

714
00:27:53,160 --> 00:27:54,680
that was locked in on Tuesday.

715
00:27:54,680 --> 00:27:56,600
They aren't fumbling or realizing mid-call

716
00:27:56,600 --> 00:27:57,880
that their talking points are wrong

717
00:27:57,880 --> 00:28:01,080
because the organization's knowledge is actually synced with reality.

718
00:28:01,080 --> 00:28:02,920
If you scale this across an entire enterprise,

719
00:28:02,920 --> 00:28:04,360
the numbers become staggering.

720
00:28:04,360 --> 00:28:06,120
Think about how many decisions happen every day

721
00:28:06,120 --> 00:28:08,360
that require people to know what just happened in a meeting.

722
00:28:08,360 --> 00:28:10,040
The old model forces your employees

723
00:28:10,040 --> 00:28:12,440
to manually track changes and hunt through emails

724
00:28:12,440 --> 00:28:15,000
which effectively turns your people into the search engine.

725
00:28:15,000 --> 00:28:18,040
The power of graph discovery is that it kills decision latency.

726
00:28:18,040 --> 00:28:19,880
Strategic moves made at 3pm

727
00:28:19,880 --> 00:28:22,520
are ready to influence work by 3:15pm.

728
00:28:22,520 --> 00:28:24,760
This isn't just a small win for efficiency

729
00:28:24,760 --> 00:28:27,960
but a fundamental change in how fast a company can move.

730
00:28:27,960 --> 00:28:30,840
For an executive, this is pure competitive speed.

731
00:28:30,840 --> 00:28:32,360
Your competitor might make a move

732
00:28:32,360 --> 00:28:34,120
and take six hours to operationalize it

733
00:28:34,120 --> 00:28:36,040
because their people are still in the dark.

734
00:28:36,040 --> 00:28:37,400
Your team does it in six minutes.

735
00:28:37,400 --> 00:28:39,960
That is a 60-times difference in decision velocity

736
00:28:39,960 --> 00:28:41,560
that compounds every single day.

737
00:28:41,560 --> 00:28:43,800
The magic here doesn't come from having better processes

738
00:28:43,800 --> 00:28:44,840
or smarter people.

739
00:28:44,840 --> 00:28:46,120
It comes from a discovery system

740
00:28:46,120 --> 00:28:48,200
that understands business moves at the speed of change,

741
00:28:48,200 --> 00:28:49,800
not the speed of a preset schedule.

742
00:28:49,800 --> 00:28:52,360
The system listens to the organization in real time.

743
00:28:52,360 --> 00:28:54,120
This is how search stops being a tool

744
00:28:54,120 --> 00:28:56,120
and starts being an invisible nervous system.

745
00:28:56,120 --> 00:28:59,080
Moving from batch processing to event-driven discovery

746
00:28:59,080 --> 00:29:00,680
isn't just a technical detail,

747
00:29:00,680 --> 00:29:02,840
it is a total business transformation.

748
00:29:02,840 --> 00:29:03,960
Scenario B.

749
00:29:03,960 --> 00:29:05,880
The semantic bridge across silos.

750
00:29:05,880 --> 00:29:08,920
The meeting notes show how discovery works inside one silo

751
00:29:08,920 --> 00:29:11,400
but real organizations work across them.

752
00:29:11,400 --> 00:29:13,960
This is where the true power of the graph becomes visible.

753
00:29:13,960 --> 00:29:15,320
Take a look at this case.

754
00:29:15,320 --> 00:29:18,360
Your VP of sales is getting ready for a major customer renewal.

755
00:29:18,360 --> 00:29:20,360
This client has been with you for three years

756
00:29:20,360 --> 00:29:22,360
but their contract expires in two months

757
00:29:22,360 --> 00:29:24,200
and they are starting to look at competitors.

758
00:29:24,200 --> 00:29:26,440
The VP needs the full story fast.

759
00:29:26,440 --> 00:29:27,960
She needs to know about past complaints

760
00:29:27,960 --> 00:29:29,480
which features they actually use

761
00:29:29,480 --> 00:29:31,080
and if there are any open support tickets

762
00:29:31,080 --> 00:29:32,200
that might blow up the deal.

763
00:29:32,200 --> 00:29:33,320
In a legacy setup,

764
00:29:33,320 --> 00:29:36,360
that information is scattered across three or four different worlds.

765
00:29:36,360 --> 00:29:38,440
The CRM holds the account history,

766
00:29:38,440 --> 00:29:40,120
the support system has the tickets

767
00:29:40,120 --> 00:29:43,160
and the Teams channels hold the messy day-to-day conversations.

768
00:29:43,160 --> 00:29:45,960
There is also a OneDrive folder somewhere with usage reports.

769
00:29:45,960 --> 00:29:47,400
Because these systems aren't connected,

770
00:29:47,400 --> 00:29:49,400
the VP has to jump between Salesforce,

771
00:29:49,400 --> 00:29:50,680
Zendesk and Teams,

772
00:29:50,680 --> 00:29:52,120
while downloading files manually.

773
00:29:52,120 --> 00:29:53,800
She is forced to be the integration layer.

774
00:29:53,800 --> 00:29:55,640
Her brain has to act as the data hub

775
00:29:55,640 --> 00:29:56,520
and because she is busy,

776
00:29:56,520 --> 00:29:58,120
she misses a critical conversation

777
00:29:58,120 --> 00:30:01,240
that happened three weeks ago in a channel she didn't even know existed.

778
00:30:01,240 --> 00:30:04,120
She walks into the renewal meeting with an incomplete picture,

779
00:30:04,120 --> 00:30:05,960
the customer feels misunderstood

780
00:30:05,960 --> 00:30:07,240
and the deal falls through.

781
00:30:07,240 --> 00:30:08,840
Now, imagine the same scenario

782
00:30:08,840 --> 00:30:11,640
with a properly designed graph discovery system.

783
00:30:11,640 --> 00:30:13,160
The VP asks one question,

784
00:30:13,160 --> 00:30:15,320
"What is the complete context for this renewal?"

785
00:30:15,320 --> 00:30:16,360
Behind the scenes,

786
00:30:16,360 --> 00:30:18,520
the system doesn't just search a single index.

787
00:30:18,520 --> 00:30:20,040
It runs a multi-hop query

788
00:30:20,040 --> 00:30:22,120
across the entire organizational graph.

789
00:30:22,120 --> 00:30:23,880
First, it links the customer in the CRM

790
00:30:23,880 --> 00:30:25,880
to the specific team in Entra ID.

791
00:30:25,880 --> 00:30:27,560
Then it pulls every Teams channel

792
00:30:27,560 --> 00:30:29,320
and OneDrive folder connected to that group.

793
00:30:29,320 --> 00:30:31,480
Finally, it grabs the support tickets associated

794
00:30:31,480 --> 00:30:32,600
with those specific users.

795
00:30:32,600 --> 00:30:34,520
In a few seconds, it has mapped out every corner

796
00:30:34,520 --> 00:30:36,440
of the company where this customer exists.
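The hop-by-hop expansion described above can be sketched as a breadth-first walk over a relationship graph. The adjacency dictionary and node names are hypothetical stand-ins for the CRM, Entra ID, Teams, OneDrive, and ticketing links. A real system would resolve each hop with its own API call.

```python
from collections import deque


def reachable(graph, start, max_hops):
    """Return {node: hop_count} for everything within max_hops of start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen


if __name__ == "__main__":
    graph = {
        "customer": ["account_team"],
        "account_team": ["teams_channel", "onedrive_folder", "user_ana"],
        "user_ana": ["support_ticket"],
    }
    print(reachable(graph, "customer", max_hops=3))
```

The hop limit keeps the walk bounded, which matters when one account team is linked to hundreds of channels and folders.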

797
00:30:36,440 --> 00:30:38,680
But mapping the data is just the start.

798
00:30:38,680 --> 00:30:41,880
The system then analyzes the relationships between these pieces.

799
00:30:41,880 --> 00:30:44,120
It notices that a specific feature request

800
00:30:44,120 --> 00:30:47,080
mentioned in Entra ID is the same one being debated

801
00:30:47,080 --> 00:30:48,920
in a Teams chat and noted in the CRM.

802
00:30:48,920 --> 00:30:51,240
This is where discovery becomes a knowledge engine.

803
00:30:51,240 --> 00:30:52,680
It isn't just finding files,

804
00:30:52,680 --> 00:30:55,000
it is understanding how they relate to each other.

805
00:30:55,000 --> 00:30:56,920
The VP doesn't want a pile of documents.

806
00:30:56,920 --> 00:30:59,720
She wants a coherent narrative of the customer relationship.

807
00:30:59,720 --> 00:31:03,160
The AI builds this story using sub-query decomposition.

808
00:31:03,160 --> 00:31:05,640
It breaks the big question about renewal risk

809
00:31:05,640 --> 00:31:07,080
into smaller pieces like,

810
00:31:07,080 --> 00:31:08,440
"What are the open issues?"

811
00:31:08,440 --> 00:31:10,200
And "What is the recent sentiment?"

812
00:31:10,200 --> 00:31:12,600
It answers each one by hitting the right silo

813
00:31:12,600 --> 00:31:14,280
and then weaves those answers together.

814
00:31:14,280 --> 00:31:18,680
This synthesis only works because the graph understands the connections.

815
00:31:18,680 --> 00:31:21,720
It knows that a support ticket can be linked to a specific user

816
00:31:21,720 --> 00:31:23,720
and that a OneDrive folder has metadata

817
00:31:23,720 --> 00:31:25,000
that ties it to a project.

818
00:31:25,000 --> 00:31:28,680
The result is that the VP no longer has to be the bridge between systems.

819
00:31:28,680 --> 00:31:31,720
She gets a full, clear picture in five minutes instead of 30.

820
00:31:31,720 --> 00:31:32,840
She makes a better proposal

821
00:31:32,840 --> 00:31:34,920
and has a much higher chance of keeping the client.

822
00:31:34,920 --> 00:31:37,000
This is the hidden strength of graph discovery.

823
00:31:37,000 --> 00:31:38,760
It doesn't just make things faster.

824
00:31:38,760 --> 00:31:40,520
It makes the whole organization smarter

825
00:31:40,520 --> 00:31:42,840
by surfacing the connections that humans usually miss

826
00:31:42,840 --> 00:31:44,360
when they are trapped in silos.

827
00:31:44,360 --> 00:31:45,960
It builds a bridge across departments

828
00:31:45,960 --> 00:31:48,440
and transforms raw data into actual intelligence.

829
00:31:48,440 --> 00:31:50,760
This foundation of connected data

830
00:31:50,760 --> 00:31:52,920
is what allows us to build reasoning systems

831
00:31:52,920 --> 00:31:53,880
on top of it.

832
00:31:53,880 --> 00:31:55,560
That is where we move into RAG pipelines

833
00:31:55,560 --> 00:31:57,400
and the world of multi-hop reasoning.

834
00:31:57,400 --> 00:31:58,680
The GraphRAG shift.

835
00:31:58,680 --> 00:32:00,200
Reasoning over relationships.

836
00:32:00,200 --> 00:32:01,880
The scenarios we've been discussing work

837
00:32:01,880 --> 00:32:04,520
because graph discovery doesn't just pull data from a pile

838
00:32:04,520 --> 00:32:06,120
but instead it retrieves data

839
00:32:06,120 --> 00:32:08,360
that actually understands its own internal structure.

840
00:32:08,360 --> 00:32:10,760
This distinction matters enormously

841
00:32:10,760 --> 00:32:13,480
when we look at how AI systems process what they find

842
00:32:13,480 --> 00:32:15,160
and it's why we need to talk about the shift

843
00:32:15,160 --> 00:32:18,760
from traditional RAG to something much more powerful called GraphRAG.

844
00:32:18,760 --> 00:32:21,880
Most semantic search systems today use what we call flat RAG

845
00:32:21,880 --> 00:32:24,440
and the architecture behind it is very straightforward.

846
00:32:24,440 --> 00:32:26,680
You take your documents, break them into chunks,

847
00:32:26,680 --> 00:32:28,840
convert those chunks into vector embeddings

848
00:32:28,840 --> 00:32:31,240
and then you store those vectors in a database.

849
00:32:31,240 --> 00:32:32,600
When a user asks a question,

850
00:32:32,600 --> 00:32:34,920
the system converts that question into a vector,

851
00:32:34,920 --> 00:32:36,520
finds the nearest neighbors in that space

852
00:32:36,520 --> 00:32:38,840
and hands those text chunks to an LLM.
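
The flat RAG flow just described can be sketched in a few lines. This is a minimal illustration only: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and all function names here are hypothetical rather than part of any library.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def chunk(document, size=8):
    """Break a document into fixed-size word chunks."""
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_index(documents):
    """Chunk every document and store (chunk text, vector) pairs."""
    return [(piece, embed(piece)) for doc in documents for piece in chunk(doc)]

def retrieve(index, question, k=2):
    """Embed the question and return the k nearest chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

documents = [
    "the billing export fails for contoso every friday",
    "lunch menu for the office cafeteria this week",
]
index = build_index(documents)
top = retrieve(index, "why does the billing export fail", k=1)
```

Note that nothing in the index records whether a chunk came from a ticket or a meeting note; the retrieved text is all the LLM would receive.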

853
00:32:38,840 --> 00:32:41,320
The problem is that the system treats all text

854
00:32:41,320 --> 00:32:42,680
as interchangeable units.

855
00:32:42,680 --> 00:32:44,920
So it doesn't care if a chunk came from a meeting note

856
00:32:44,920 --> 00:32:46,040
or a support ticket.

857
00:32:46,040 --> 00:32:48,680
It has no way of knowing that two different chunks are related

858
00:32:48,680 --> 00:32:50,920
because they mention the same customer or project

859
00:32:50,920 --> 00:32:52,840
because the system is purely semantic,

860
00:32:52,840 --> 00:32:54,920
meaning it matches on general meaning

861
00:32:54,920 --> 00:32:56,600
rather than actual structure.

862
00:32:56,600 --> 00:32:58,120
This approach has real limitations

863
00:32:58,120 --> 00:33:00,200
that show up quickly in a professional setting.

864
00:33:00,200 --> 00:33:01,960
It works beautifully for simple questions

865
00:33:01,960 --> 00:33:04,360
where you just need to find one specific document

866
00:33:04,360 --> 00:33:05,720
but it falls apart on questions

867
00:33:05,720 --> 00:33:08,360
that require reasoning across different relationships.

868
00:33:08,360 --> 00:33:09,800
If you ask about the dependencies

869
00:33:09,800 --> 00:33:11,000
between your product roadmap

870
00:33:11,000 --> 00:33:12,920
and the customer feedback from last month,

871
00:33:12,920 --> 00:33:14,920
the system has to guess at the connection.

872
00:33:14,920 --> 00:33:16,360
It can't inherently understand

873
00:33:16,360 --> 00:33:19,000
that a roadmap item is tied to a piece of feedback

874
00:33:19,000 --> 00:33:21,000
just because they mention the same feature

875
00:33:21,000 --> 00:33:23,560
so it can only match on semantic similarity

876
00:33:23,560 --> 00:33:24,840
and hope for the best.

877
00:33:24,840 --> 00:33:27,480
GraphRAG is different because instead of treating all text

878
00:33:27,480 --> 00:33:29,960
as flat chunks, it builds a literal graph.

879
00:33:29,960 --> 00:33:31,480
The nodes in this graph are entities

880
00:33:31,480 --> 00:33:34,360
like customers, features, products, teams, and time periods

881
00:33:34,360 --> 00:33:37,000
while the edges represent the relationships between them.

882
00:33:37,000 --> 00:33:38,520
You might have a node for a customer

883
00:33:38,520 --> 00:33:40,760
that is connected to a feature they requested

884
00:33:40,760 --> 00:33:42,360
which is then connected to a roadmap

885
00:33:42,360 --> 00:33:43,800
where that feature appears

886
00:33:43,800 --> 00:33:45,400
and a specific team that owns it.

887
00:33:45,400 --> 00:33:48,200
When you build this graph over your organizational data,

888
00:33:48,200 --> 00:33:50,360
you've created a map of not just what was said

889
00:33:50,360 --> 00:33:52,920
but how every single piece of information is connected.

890
00:33:52,920 --> 00:33:54,600
Here is how that looks in practice.

891
00:33:54,600 --> 00:33:57,160
When documents are ingested into a GraphRAG system,

892
00:33:57,160 --> 00:33:58,600
an extraction layer runs first

893
00:33:58,600 --> 00:34:02,200
to identify the entities and relationships within the text.

894
00:34:02,200 --> 00:34:05,160
This layer uses an LLM to find mentions of customers,

895
00:34:05,160 --> 00:34:06,440
products, and decisions,

896
00:34:06,440 --> 00:34:08,600
but it doesn't just pull them out in isolation.

897
00:34:08,600 --> 00:34:10,440
It captures how they relate to each other

898
00:34:10,440 --> 00:34:12,920
so a support ticket becomes a node connected to a customer

899
00:34:12,920 --> 00:34:14,680
and a product while a feature request

900
00:34:14,680 --> 00:34:16,600
links directly to a roadmap entry.

901
00:34:16,600 --> 00:34:17,880
Once this graph is built,

902
00:34:17,880 --> 00:34:20,520
the way we retrieve information changes fundamentally.

903
00:34:20,520 --> 00:34:22,760
Instead of just finding the most similar vectors,

904
00:34:22,760 --> 00:34:24,840
the system can now follow specific paths

905
00:34:24,840 --> 00:34:26,520
through the graph to find answers.

906
00:34:26,520 --> 00:34:28,200
If you ask about a customer's needs,

907
00:34:28,200 --> 00:34:30,280
the system can move from the customer node

908
00:34:30,280 --> 00:34:31,640
to their support tickets,

909
00:34:31,640 --> 00:34:33,080
then to the features they requested,

910
00:34:33,080 --> 00:34:36,520
and finally, to how those features appear in your roadmap.
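
The customer-to-roadmap walk described here can be sketched as a path traversal over typed edges. This is an illustrative sketch, not the retrieval logic of any particular product; the `EntityGraph` class, the relation names, and the node identifiers are all assumptions made up for the example.

```python
from collections import defaultdict

class EntityGraph:
    """Tiny in-memory graph: nodes are entity ids, edges carry a relation name."""

    def __init__(self):
        self.edges = defaultdict(list)  # node -> [(relation, target), ...]

    def add_edge(self, source, relation, target):
        self.edges[source].append((relation, target))

    def follow(self, start, relations):
        """Follow a fixed sequence of relation types and return the nodes reached."""
        frontier = [start]
        for relation in relations:
            next_frontier = []
            for node in frontier:
                for rel, target in self.edges.get(node, []):
                    if rel == relation and target not in next_frontier:
                        next_frontier.append(target)
            frontier = next_frontier
        return frontier

graph = EntityGraph()
graph.add_edge("customer:contoso", "filed", "ticket:101")
graph.add_edge("customer:contoso", "filed", "ticket:102")
graph.add_edge("ticket:101", "requests", "feature:sso")
graph.add_edge("ticket:102", "requests", "feature:sso")
graph.add_edge("feature:sso", "planned_in", "roadmap:q3")

# customer -> support tickets -> requested features -> roadmap entries
roadmap_items = graph.follow("customer:contoso", ["filed", "requests", "planned_in"])
```

Each hop is an explicit edge lookup, so the connection between the customer and the roadmap entry is read from the graph rather than inferred from text similarity.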

911
00:34:36,520 --> 00:34:38,440
It is reasoning through relationships

912
00:34:38,440 --> 00:34:41,240
rather than just matching on keywords or general concepts.

913
00:34:41,240 --> 00:34:42,920
The payoff for LLMs is enormous

914
00:34:42,920 --> 00:34:45,560
because when you hand a model a set of flat text chunks,

915
00:34:45,560 --> 00:34:48,120
it has to figure out the relationships on its own.

916
00:34:48,120 --> 00:34:50,680
That process is context heavy and very prone to errors,

917
00:34:50,680 --> 00:34:53,800
but when you hand an LLM a subgraph of connected entities,

918
00:34:53,800 --> 00:34:55,800
the model can reason with much more clarity.

919
00:34:55,800 --> 00:34:58,200
It knows that two pieces of information are connected

920
00:34:58,200 --> 00:34:59,720
because they share a specific entity,

921
00:34:59,720 --> 00:35:02,440
which means it doesn't have to guess or infer the connection.

922
00:35:02,440 --> 00:35:04,360
Graphs also create a secondary benefit

923
00:35:04,360 --> 00:35:05,560
that people often overlook,

924
00:35:05,560 --> 00:35:08,200
which is that they allow for summarization at a massive scale.

925
00:35:08,200 --> 00:35:09,480
And if your graph is huge,

926
00:35:09,480 --> 00:35:11,160
you can't feed the whole thing to an LLM

927
00:35:11,160 --> 00:35:13,000
without hitting a context window limit.

928
00:35:13,000 --> 00:35:14,760
So GraphRAG systems use a technique called

929
00:35:14,760 --> 00:35:16,200
community detection.

930
00:35:16,200 --> 00:35:18,760
They identify clusters of tightly connected entities

931
00:35:18,760 --> 00:35:21,320
and generate summaries for each of those communities.

932
00:35:21,320 --> 00:35:22,920
When a user asks a question,

933
00:35:22,920 --> 00:35:24,760
the system retrieves these summaries

934
00:35:24,760 --> 00:35:25,880
instead of the raw graph,

935
00:35:25,880 --> 00:35:27,240
which keeps the semantic meaning

936
00:35:27,240 --> 00:35:28,520
and the relational structure

937
00:35:28,520 --> 00:35:30,040
while using much less space.
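
A rough sketch of the idea follows. As a loud simplification, it uses connected components as the clustering step, which is the simplest possible stand-in; real systems typically use a modularity-based community detection algorithm, and the `summarize` function here is a placeholder for an LLM-written summary. All names are hypothetical.

```python
def communities(edges):
    """Group nodes into connected components (a simple stand-in for community detection)."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, groups = set(), []
    for node in sorted(adjacency):
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(adjacency[current] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

def summarize(group):
    """Placeholder summary; a real system would ask an LLM to write this."""
    return f"{len(group)} related entities: " + ", ".join(group)

edges = [("contoso", "sso"), ("sso", "roadmap-q3"), ("fabrikam", "billing")]
groups = communities(edges)
summaries = [summarize(g) for g in groups]
```

At query time the system would retrieve from `summaries` rather than from the raw edge list, which is what keeps the prompt small.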

938
00:35:30,040 --> 00:35:32,760
This solves the context window problem entirely.

939
00:35:32,760 --> 00:35:35,560
Instead of struggling to fit a massive graph into an LLM,

940
00:35:35,560 --> 00:35:37,080
you strategically shrink the data

941
00:35:37,080 --> 00:35:39,640
by moving from raw details to abstracted summaries.

942
00:35:39,640 --> 00:35:41,000
You keep the parts that actually matter

943
00:35:41,000 --> 00:35:42,040
for answering the question

944
00:35:42,040 --> 00:35:43,400
and you throw away the noise.

945
00:35:43,400 --> 00:35:45,080
The computational overhead of building

946
00:35:45,080 --> 00:35:47,560
and maintaining these graphs is a real factor to consider,

947
00:35:47,560 --> 00:35:49,240
but the benefit is just as real.

948
00:35:49,240 --> 00:35:50,600
You get reasoning capabilities

949
00:35:50,600 --> 00:35:52,680
that pure vector retrieval simply cannot match.

950
00:35:52,680 --> 00:35:55,240
And in the end, you are trading indexing complexity

951
00:35:55,240 --> 00:35:57,640
for much higher retrieval intelligence.

952
00:35:57,640 --> 00:35:59,000
This raises the central question

953
00:35:59,000 --> 00:36:00,440
that architects have to figure out,

954
00:36:00,440 --> 00:36:02,840
which is how you build and maintain these graphs

955
00:36:02,840 --> 00:36:05,000
without drowning in the cost of running them.

956
00:36:05,000 --> 00:36:07,640
Lazy graph construction: performance versus cost.

957
00:36:07,640 --> 00:36:09,400
The answer to that cost question determines

958
00:36:09,400 --> 00:36:12,200
whether GraphRAG is actually viable for a real business.

959
00:36:12,200 --> 00:36:13,960
Full GraphRAG is computationally expensive

960
00:36:13,960 --> 00:36:16,840
because you have to extract every entity from every document,

961
00:36:16,840 --> 00:36:18,760
identify every relationship,

962
00:36:18,760 --> 00:36:21,240
and then cluster the graph to generate summaries.

963
00:36:21,240 --> 00:36:23,560
The indexing cost multiplier is staggering,

964
00:36:23,560 --> 00:36:25,720
often reaching about 1,000 times the cost

965
00:36:25,720 --> 00:36:27,080
of traditional vector indexing.

966
00:36:27,080 --> 00:36:30,200
If your vector-only RAG costs $100 a month to run,

967
00:36:30,200 --> 00:36:33,160
a full GraphRAG system could jump to $100,000.

968
00:36:33,160 --> 00:36:36,440
And for most organizations, that is a complete non-starter.

969
00:36:36,440 --> 00:36:38,520
This is where lazy GraphRAG changes the equation

970
00:36:38,520 --> 00:36:41,720
for everyone. Instead of building the entire graph up front,

971
00:36:41,720 --> 00:36:44,600
you build it on demand, only when a query actually requires

972
00:36:44,600 --> 00:36:45,640
that level of detail.

973
00:36:45,640 --> 00:36:48,200
Most of the time, your data just sits as vectors

974
00:36:48,200 --> 00:36:49,720
in storage where they are indexed

975
00:36:49,720 --> 00:36:51,720
for semantic similarity and nothing more.

976
00:36:51,720 --> 00:36:53,480
But when a user asks a complex question

977
00:36:53,480 --> 00:36:55,160
that needs relational reasoning,

978
00:36:55,160 --> 00:36:57,160
the system detects the need and triggers

979
00:36:57,160 --> 00:36:58,600
the lazy graph construction.

980
00:36:58,600 --> 00:36:59,800
Here is how the process works.

981
00:36:59,800 --> 00:37:02,440
A user asks a question that involves multiple hops

982
00:37:02,440 --> 00:37:03,640
across the organization,

983
00:37:03,640 --> 00:37:06,280
and the system recognizes that this requires reasoning,

984
00:37:06,280 --> 00:37:07,960
rather than just simple retrieval.

985
00:37:07,960 --> 00:37:10,120
It takes the top results from an initial vector search

986
00:37:10,120 --> 00:37:11,800
to find the candidate documents,

987
00:37:11,800 --> 00:37:13,800
and then it extracts entities and relationships

988
00:37:13,800 --> 00:37:15,240
only from those specific files.

989
00:37:15,240 --> 00:37:17,320
It builds a small, focused subgraph

990
00:37:17,320 --> 00:37:19,720
that is just large enough to answer the question at hand.

991
00:37:19,720 --> 00:37:21,000
And once the answer is delivered,

992
00:37:21,000 --> 00:37:23,080
that temporary graph is discarded.
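
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions: `vector_search` ranks by shared words instead of real embeddings, `extract_entities` matches a fixed hypothetical vocabulary instead of calling an LLM, and edges are simple co-occurrence within a document.

```python
from itertools import combinations

KNOWN_ENTITIES = {"contoso", "sso", "roadmap", "billing"}  # hypothetical vocabulary

def vector_search(corpus, question, k=2):
    """Stand-in for the vector index: rank documents by shared words."""
    q = set(question.lower().split())
    ranked = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

def extract_entities(text):
    """Stand-in for LLM extraction: pick out known entity names."""
    return sorted(KNOWN_ENTITIES & set(text.lower().split()))

def build_subgraph(documents):
    """Connect entities that appear in the same candidate document."""
    edges = set()
    for doc in documents:
        for a, b in combinations(extract_entities(doc), 2):
            edges.add((a, b))
    return edges

def answer(corpus, question, entity):
    """Build a small graph at query time and return the neighbors of one entity."""
    candidates = vector_search(corpus, question)   # step 1: cheap vector pass
    subgraph = build_subgraph(candidates)          # step 2: extract only from candidates
    related = {b if a == entity else a for a, b in subgraph if entity in (a, b)}
    return sorted(related)                         # step 3: subgraph is discarded on return

corpus = [
    "contoso asked for sso in ticket 101",
    "sso is scheduled on the roadmap for q3",
    "cafeteria lunch menu",
]
related_to_sso = answer(corpus, "what does contoso need from the roadmap", "sso")
```

Only the two candidate documents are ever passed through extraction; the third document is never touched, which is where the cost saving comes from.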

993
00:37:23,080 --> 00:37:24,440
The cost difference is dramatic

994
00:37:24,440 --> 00:37:26,600
because you aren't processing your entire corpus

995
00:37:26,600 --> 00:37:27,960
during the indexing phase.

996
00:37:27,960 --> 00:37:30,200
Full GraphRAG requires mapping every relationship

997
00:37:30,200 --> 00:37:31,480
in your entire database,

998
00:37:31,480 --> 00:37:33,480
but lazy GraphRAG only performs extraction

999
00:37:33,480 --> 00:37:35,240
on the documents that are actually relevant

1000
00:37:35,240 --> 00:37:36,760
to the current query.

1001
00:37:36,760 --> 00:37:39,320
The computational work shrinks by orders of magnitude

1002
00:37:39,320 --> 00:37:42,200
because you are only processing what matters in the moment.

1003
00:37:42,200 --> 00:37:43,560
To put some real numbers on this,

1004
00:37:43,560 --> 00:37:46,200
full GraphRAG indexing costs usually run

1005
00:37:46,200 --> 00:37:49,560
at about 1,000 times the cost of a standard vector pass.

1006
00:37:49,560 --> 00:37:53,160
Lazy GraphRAG adds only about 10% overhead to the retrieval phase.

1007
00:37:53,160 --> 00:37:55,960
So if your base retrieval takes 100 milliseconds,

1008
00:37:55,960 --> 00:37:58,920
the lazy construction adds only 10 milliseconds of work.

1009
00:37:58,920 --> 00:38:01,800
That is the difference between a $100,000 monthly bill

1010
00:38:01,800 --> 00:38:03,240
and a $10,000 bill,

1011
00:38:03,240 --> 00:38:05,640
which finally makes GraphRAG economically viable

1012
00:38:05,640 --> 00:38:06,680
for most companies.

1013
00:38:06,680 --> 00:38:08,920
The trade-off is that you do lose some of the benefits

1014
00:38:08,920 --> 00:38:11,160
of having a pre-built graph ready to go.

1015
00:38:11,160 --> 00:38:13,160
A graph that was constructed with global knowledge

1016
00:38:13,160 --> 00:38:15,400
understands the overall structure of your organization

1017
00:38:15,400 --> 00:38:17,880
and knows which communities are isolated or connected.

1018
00:38:17,880 --> 00:38:20,280
It can generate summaries of entire domains,

1019
00:38:20,280 --> 00:38:22,440
whereas a lazy graph built on demand

1020
00:38:22,440 --> 00:38:25,640
only understands the local context around the current question.

1021
00:38:25,640 --> 00:38:27,720
But here is the insight that most architects miss,

1022
00:38:27,720 --> 00:38:29,320
which is that the big picture isn't always

1023
00:38:29,320 --> 00:38:30,920
what the user actually needs.

1024
00:38:30,920 --> 00:38:32,520
When someone asks a specific question,

1025
00:38:32,520 --> 00:38:34,040
they usually need local reasoning

1026
00:38:34,040 --> 00:38:36,360
rather than a global perspective on the entire company.

1027
00:38:36,360 --> 00:38:38,680
They want to understand how the specific entities

1028
00:38:38,680 --> 00:38:40,360
in their question relate to each other.

1029
00:38:40,360 --> 00:38:43,320
And they don't need to know about every other entity in the system.

1030
00:38:43,320 --> 00:38:45,960
A focused graph built on demand is often more than enough,

1031
00:38:45,960 --> 00:38:47,960
and the system trades global optimization

1032
00:38:47,960 --> 00:38:50,280
for local precision and cost efficiency.

1033
00:38:50,280 --> 00:38:51,880
This represents a fundamental shift

1034
00:38:51,880 --> 00:38:53,960
in how we think about building these systems.

1035
00:38:53,960 --> 00:38:56,520
In the old paradigm, you built the best possible graph up front

1036
00:38:56,520 --> 00:38:57,800
and just paid the high cost.

1037
00:38:57,800 --> 00:38:59,080
But in the lazy paradigm,

1038
00:38:59,080 --> 00:39:01,640
you build just enough graph to answer the question.

1039
00:39:01,640 --> 00:39:05,240
It is a shift from trying to anticipate what users might ask

1040
00:39:05,240 --> 00:39:08,040
to simply responding to what they are actually asking.

1041
00:39:08,040 --> 00:39:10,680
The operational agility here is also a major benefit.

1042
00:39:10,680 --> 00:39:12,120
In a full GraphRAG system,

1043
00:39:12,120 --> 00:39:14,680
when your organization changes or a new product launches,

1044
00:39:14,680 --> 00:39:16,360
you have to rebuild the entire graph

1045
00:39:16,360 --> 00:39:18,600
in a heavy operation that can take days.

1046
00:39:18,600 --> 00:39:19,640
In a lazy system,

1047
00:39:19,640 --> 00:39:21,720
the extraction logic automatically applies

1048
00:39:21,720 --> 00:39:23,320
to new data as it comes in.

1049
00:39:23,320 --> 00:39:24,680
When a user runs a query,

1050
00:39:24,680 --> 00:39:25,880
they get current information

1051
00:39:25,880 --> 00:39:27,800
because the extraction happens at query time

1052
00:39:27,800 --> 00:39:28,760
rather than index time,

1053
00:39:28,760 --> 00:39:31,480
which means you don't have to deal with batch refresh cycles.

1054
00:39:31,480 --> 00:39:33,480
This makes lazy GraphRAG the practical bridge

1055
00:39:33,480 --> 00:39:36,120
between vector-only RAG and the full graph model.

1056
00:39:36,120 --> 00:39:37,320
You get relational reasoning

1057
00:39:37,320 --> 00:39:39,640
when you need it without paying the high cost when you don't

1058
00:39:39,640 --> 00:39:41,000
and you get operational agility

1059
00:39:41,000 --> 00:39:43,080
without sacrificing the quality of the answers.

1060
00:39:43,080 --> 00:39:45,080
Most importantly, it makes GraphRAG affordable enough

1061
00:39:45,080 --> 00:39:47,320
that architects can actually put it into production.

1062
00:39:47,320 --> 00:39:49,080
Now that we've solved the cost problem,

1063
00:39:49,080 --> 00:39:51,560
we have to address a different constraint, which is geography.

1064
00:39:51,560 --> 00:39:54,200
Organizations today don't operate in a single region

1065
00:39:54,200 --> 00:39:57,080
and data often lives in multiple countries with different rules.

1066
00:39:57,080 --> 00:40:00,200
This brings us to the 2026 data residency requirements

1067
00:40:00,200 --> 00:40:01,480
that are currently reshaping

1068
00:40:01,480 --> 00:40:04,040
how we think about discovery infrastructure.

1069
00:40:04,040 --> 00:40:07,320
The EU Data Boundary: residency in the AI era.

1070
00:40:07,320 --> 00:40:09,640
We have spent a lot of time talking about performance

1071
00:40:09,640 --> 00:40:11,640
and discovery speed, but by 2026,

1072
00:40:11,640 --> 00:40:13,000
you cannot have a conversation

1073
00:40:13,000 --> 00:40:14,840
about enterprise search architecture

1074
00:40:14,840 --> 00:40:17,480
without also addressing regulation.

1075
00:40:17,480 --> 00:40:20,120
The EU AI Act does not care about your latency benchmarks

1076
00:40:20,120 --> 00:40:21,960
because it only cares about where your data lives

1077
00:40:21,960 --> 00:40:23,000
and who can see it.

1078
00:40:23,000 --> 00:40:26,360
Microsoft created something called the EU Data Boundary

1079
00:40:26,360 --> 00:40:28,520
and on the surface, it sounds straightforward.

1080
00:40:28,520 --> 00:40:31,320
Data from EU customers stays in EU data centers.

1081
00:40:31,320 --> 00:40:33,080
But the reality is far more complicated

1082
00:40:33,080 --> 00:40:35,160
once you layer in AI systems

1083
00:40:35,160 --> 00:40:37,480
that require inference and logging infrastructure.

1084
00:40:37,480 --> 00:40:38,760
Here is the fundamental tension.

1085
00:40:38,760 --> 00:40:41,720
Your Microsoft 365 tenant sits in an EU region,

1086
00:40:41,720 --> 00:40:44,360
which means SharePoint, Teams and Exchange,

1087
00:40:44,360 --> 00:40:47,080
all run from Frankfurt, Dublin or Amsterdam.

1088
00:40:47,080 --> 00:40:48,920
The data itself never leaves Europe,

1089
00:40:48,920 --> 00:40:50,280
but when you enable Copilot

1090
00:40:50,280 --> 00:40:52,440
or build semantic search on top of that data,

1091
00:40:52,440 --> 00:40:55,160
the AI model needs to process information somewhere.

1092
00:40:55,160 --> 00:40:57,640
The LLM inference might happen in a different region

1093
00:40:57,640 --> 00:40:59,400
and the logs that track what the model did

1094
00:40:59,400 --> 00:41:02,360
and what it accessed might go somewhere else entirely.

1095
00:41:02,360 --> 00:41:04,520
That is where the data boundary creates friction.

1096
00:41:04,520 --> 00:41:07,960
The EU data boundary applies to most of Microsoft 365

1097
00:41:07,960 --> 00:41:10,680
and core services like SharePoint and OneDrive

1098
00:41:10,680 --> 00:41:13,240
commit to keeping customer data within the region.

1099
00:41:13,240 --> 00:41:14,920
But there is a critical carve-out

1100
00:41:14,920 --> 00:41:17,720
because certain services are excluded from this commitment.

1101
00:41:17,720 --> 00:41:19,560
These excluded services are exactly the ones

1102
00:41:19,560 --> 00:41:21,480
that power semantic search and AI.

1103
00:41:21,480 --> 00:41:23,720
When you use Microsoft 365 Copilot,

1104
00:41:23,720 --> 00:41:26,040
the inference does not happen on your local servers

1105
00:41:26,040 --> 00:41:28,360
but on the global infrastructure of Azure,

1106
00:41:28,360 --> 00:41:30,600
which includes data centers outside the EU.

1107
00:41:30,600 --> 00:41:32,840
When your semantic search indexes documents

1108
00:41:32,840 --> 00:41:34,200
and stores embeddings,

1109
00:41:34,200 --> 00:41:35,720
the embedding model might be running

1110
00:41:35,720 --> 00:41:37,960
in a data center in the United States.

1111
00:41:37,960 --> 00:41:39,480
The logs from these operations,

1112
00:41:39,480 --> 00:41:41,320
including which documents were retrieved

1113
00:41:41,320 --> 00:41:43,400
and what context was fed to the model,

1114
00:41:43,400 --> 00:41:44,600
might be stored in a region

1115
00:41:44,600 --> 00:41:47,800
that does not have the same data protection guarantees as the EU.

1116
00:41:47,800 --> 00:41:49,320
This creates a compliance gap.

1117
00:41:49,320 --> 00:41:50,840
Your primary data lives in the EU,

1118
00:41:50,840 --> 00:41:52,600
but the intelligence layer that processes

1119
00:41:52,600 --> 00:41:54,760
and reasons over that data might not.

1120
00:41:54,760 --> 00:41:56,200
From the perspective of a regulator,

1121
00:41:56,200 --> 00:41:58,120
this looks like data is leaving the boundary.

1122
00:41:58,120 --> 00:42:00,520
The customer sees their file in an EU data center,

1123
00:42:00,520 --> 00:42:03,800
but the AI that accesses it is running outside the EU

1124
00:42:03,800 --> 00:42:07,160
and the logs of what the AI did are stored outside the EU as well.

1125
00:42:07,160 --> 00:42:09,080
This violates the principle that customer data

1126
00:42:09,080 --> 00:42:10,760
should be protected under EU law.

1127
00:42:10,760 --> 00:42:13,320
The risk here is not just regulatory but operational.

1128
00:42:13,320 --> 00:42:15,560
If your inference runs in a non-EU region

1129
00:42:15,560 --> 00:42:17,160
and there is a security incident,

1130
00:42:17,160 --> 00:42:20,120
you have to notify regulators under different legal frameworks.

1131
00:42:20,120 --> 00:42:22,360
You might be subject to law enforcement requests

1132
00:42:22,360 --> 00:42:24,120
from countries outside the EU.

1133
00:42:24,120 --> 00:42:27,160
Your customer data, which they thought was protected by GDPR,

1134
00:42:27,160 --> 00:42:29,000
is actually accessible from jurisdictions

1135
00:42:29,000 --> 00:42:30,360
with different privacy standards.

1136
00:42:30,360 --> 00:42:31,960
This is why architects need to understand

1137
00:42:31,960 --> 00:42:33,960
the strategy for geolocking tenants.

1138
00:42:33,960 --> 00:42:35,880
Geolocking means constraining every layer

1139
00:42:35,880 --> 00:42:37,320
of your semantic search system

1140
00:42:37,320 --> 00:42:39,880
to operate within approved geographical boundaries.

1141
00:42:39,880 --> 00:42:41,880
It is not just about where the primary data sits

1142
00:42:41,880 --> 00:42:43,640
but about where every copy of that data,

1143
00:42:43,640 --> 00:42:45,320
every vector embedding, every log file

1144
00:42:45,320 --> 00:42:47,240
and every inference operation happens.
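
One way to make that constraint checkable is to list every layer of the pipeline with the region it runs in and flag anything outside the approved set. This is a sketch only: the region labels and component names are hypothetical placeholders, not real tenant settings or Azure region identifiers.

```python
APPROVED_REGIONS = {"eu-west", "eu-central"}  # hypothetical region labels

def out_of_boundary(components, approved=APPROVED_REGIONS):
    """Return the names of components whose region is not in the approved set."""
    return sorted(name for name, region in components.items() if region not in approved)

# Every layer is listed, not just the primary data store.
pipeline = {
    "primary_storage": "eu-west",
    "embedding_model": "us-east",
    "vector_index": "eu-central",
    "llm_inference": "eu-west",
    "operational_logs": "us-east",
}

violations = out_of_boundary(pipeline)
```

In this made-up inventory the primary data passes, while the embedding model and the logs are the layers that would need to move.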

1145
00:42:47,240 --> 00:42:49,560
To implement this, you have to make architectural choices.

1146
00:42:49,560 --> 00:42:51,160
You cannot just enable Copilot

1147
00:42:51,160 --> 00:42:54,200
and assume the defaults from Microsoft will satisfy regulators.

1148
00:42:54,200 --> 00:42:56,280
You have to explicitly configure your tenant

1149
00:42:56,280 --> 00:42:58,680
to use only EU region services.

1150
00:42:58,680 --> 00:43:00,680
For inference, you might need to use smaller models

1151
00:43:00,680 --> 00:43:02,840
that can run locally within your data center

1152
00:43:02,840 --> 00:43:04,920
rather than calling cloud-hosted APIs

1153
00:43:04,920 --> 00:43:06,680
that span global infrastructure.

1154
00:43:06,680 --> 00:43:08,920
For logging, you have to ensure that all operational logs

1155
00:43:08,920 --> 00:43:11,000
are captured and stored within the region.

1156
00:43:11,000 --> 00:43:12,360
This creates a trade-off.

1157
00:43:12,360 --> 00:43:14,760
Running inference locally in a smaller data center

1158
00:43:14,760 --> 00:43:17,240
or using a smaller model might mean higher latency

1159
00:43:17,240 --> 00:43:18,760
or lower inference quality

1160
00:43:18,760 --> 00:43:20,920
than using a massive global service.

1161
00:43:20,920 --> 00:43:22,760
You are accepting some performance constraints

1162
00:43:22,760 --> 00:43:24,520
to maintain regulatory certainty

1163
00:43:24,520 --> 00:43:26,040
but that is the choice you have to make

1164
00:43:26,040 --> 00:43:29,400
if you want compliance with 2026 EU standards.

1165
00:43:29,400 --> 00:43:31,320
The alternative of ignoring the data boundary

1166
00:43:31,320 --> 00:43:33,400
and letting your AI infrastructure span globally

1167
00:43:33,400 --> 00:43:34,920
is increasingly untenable.

1168
00:43:34,920 --> 00:43:36,840
Regulators are scrutinizing this closely

1169
00:43:36,840 --> 00:43:39,240
and the fines for violations can be severe.

1170
00:43:39,240 --> 00:43:41,400
More fundamentally, customers in the EU

1171
00:43:41,400 --> 00:43:44,600
do not want their AI to run outside their legal jurisdiction.

1172
00:43:44,600 --> 00:43:46,760
This does not mean you cannot have a performant

1173
00:43:46,760 --> 00:43:49,000
modern semantic search system in Europe.

1174
00:43:49,000 --> 00:43:51,800
It means your architecture has to be explicit about boundaries

1175
00:43:51,800 --> 00:43:53,560
and it has to make intentional choices

1176
00:43:53,560 --> 00:43:55,720
about where each component runs.

1177
00:43:55,720 --> 00:43:57,720
It has to treat regulatory requirements

1178
00:43:57,720 --> 00:43:59,960
as architectural constraints from day one

1179
00:43:59,960 --> 00:44:01,720
not as something to bolt on later.

1180
00:44:01,720 --> 00:44:03,960
Compliance is not just about where data sits,

1181
00:44:03,960 --> 00:44:07,000
it is about how the entire intelligence system is governed.

1182
00:44:07,000 --> 00:44:10,200
Metadata lineage: proving the AI's why.

1183
00:44:10,200 --> 00:44:13,000
Compliance requirements shift once you understand

1184
00:44:13,000 --> 00:44:15,320
that the AI generating answers needs to prove

1185
00:44:15,320 --> 00:44:17,080
where those answers came from.

1186
00:44:17,080 --> 00:44:20,840
This is lineage and by 2026 having it will not be optional.

1187
00:44:20,840 --> 00:44:23,640
The EU AI Act and related regulatory frameworks

1188
00:44:23,640 --> 00:44:25,880
are making auditability a legal obligation

1189
00:44:25,880 --> 00:44:27,480
rather than a technical nicety.

1190
00:44:27,480 --> 00:44:28,840
But understanding why this matters

1191
00:44:28,840 --> 00:44:30,680
requires stepping back from the infrastructure

1192
00:44:30,680 --> 00:44:33,960
and looking at what happens when an AI system makes a mistake.

1193
00:44:33,960 --> 00:44:35,640
Imagine your semantic search system

1194
00:44:35,640 --> 00:44:38,680
retrieves a document and feeds it to an LLM.

1195
00:44:38,680 --> 00:44:40,840
The model generates an answer that a user acts on,

1196
00:44:40,840 --> 00:44:43,080
but later it turns out the document was outdated

1197
00:44:43,080 --> 00:44:44,920
or the context was misinterpreted.

1198
00:44:44,920 --> 00:44:47,240
A customer makes a decision based on this answer

1199
00:44:47,240 --> 00:44:48,280
and it costs them money,

1200
00:44:48,280 --> 00:44:51,000
so they demand to know why your system told them something

1201
00:44:51,000 --> 00:44:52,440
that turned out to be wrong.

1202
00:44:52,440 --> 00:44:54,360
This is where lineage becomes critical.

1203
00:44:54,360 --> 00:44:57,160
Without lineage, you are in an impossible position.

1204
00:44:57,160 --> 00:44:59,320
You can see the answer the system generated

1205
00:44:59,320 --> 00:45:01,560
and you can see that it was based on some document,

1206
00:45:01,560 --> 00:45:04,520
but you cannot trace back to show exactly which document it was

1207
00:45:04,520 --> 00:45:05,880
or when it was added to the index.

1208
00:45:05,880 --> 00:45:07,720
You cannot prove who had access to it,

1209
00:45:07,720 --> 00:45:09,480
whether the user was authorized to see it

1210
00:45:09,480 --> 00:45:11,320
or how recent the information was.

1211
00:45:11,320 --> 00:45:13,080
You are forced to say you do not know

1212
00:45:13,080 --> 00:45:14,840
and in a liability situation

1213
00:45:14,840 --> 00:45:16,520
that is the worst answer you can give.

1214
00:45:16,520 --> 00:45:18,920
With proper lineage, you can answer every question.

1215
00:45:18,920 --> 00:45:21,320
You can show the exact document that was retrieved

1216
00:45:21,320 --> 00:45:24,760
and prove when that document entered the graph via a delta event.

1217
00:45:24,760 --> 00:45:27,240
You can show the specific embedding that was calculated

1218
00:45:27,240 --> 00:45:28,760
and the vector that was matched.

1219
00:45:28,760 --> 00:45:31,080
You can demonstrate that the user requesting the answer

1220
00:45:31,080 --> 00:45:33,480
had proper permissions to access that source document

1221
00:45:33,480 --> 00:45:36,360
and you can even trace back to show which previous query

1222
00:45:36,360 --> 00:45:38,840
or update created the data in the first place.

1223
00:45:38,840 --> 00:45:40,840
This chain of evidence is what regulators call

1224
00:45:40,840 --> 00:45:42,360
full data lineage tracking.

1225
00:45:42,360 --> 00:45:45,160
It is the ability to reconstruct exactly what happened

1226
00:45:45,160 --> 00:45:47,880
when a system made a decision at any point in the future.

1227
00:45:47,880 --> 00:45:51,320
For high-risk AI systems under the EU AI Act, this is mandatory.

1228
00:45:51,320 --> 00:45:53,800
By 2026, you must maintain records

1229
00:45:53,800 --> 00:45:55,720
that show the data lineage for any output

1230
00:45:55,720 --> 00:45:58,200
that affects the rights or opportunities of a person.

1231
00:45:58,200 --> 00:46:01,800
The mechanics of capturing this lineage start at the Graph API layer.

1232
00:46:01,800 --> 00:46:04,120
When a document is ingested via a delta query,

1233
00:46:04,120 --> 00:46:05,560
that delta carries metadata,

1234
00:46:05,560 --> 00:46:07,480
including the timestamp when the change occurred,

1235
00:46:07,480 --> 00:46:09,000
the user who made the change

1236
00:46:09,000 --> 00:46:11,400
and the specific properties that changed.

1237
00:46:11,400 --> 00:46:14,760
This delta event becomes the authoritative source of truth marker

1238
00:46:14,760 --> 00:46:16,600
for when data entered your system.

1239
00:46:16,600 --> 00:46:19,000
When that data is converted to a vector embedding,

1240
00:46:19,000 --> 00:46:20,440
you have to attach provenance.

1241
00:46:20,440 --> 00:46:22,520
So the embedding has metadata pointing back

1242
00:46:22,520 --> 00:46:24,920
to the delta event that created the source document.

1243
00:46:24,920 --> 00:46:27,560
When retrieval happens, you log which embeddings were selected,

1244
00:46:27,560 --> 00:46:29,000
what the similarity scores were,

1245
00:46:29,000 --> 00:46:30,440
and what context was assembled.

1246
00:46:30,440 --> 00:46:31,880
When the LLM generates an answer,

1247
00:46:31,880 --> 00:46:33,800
you capture the prompt that was sent to the model,

1248
00:46:33,800 --> 00:46:35,720
the model version, and configuration,

1249
00:46:35,720 --> 00:46:37,960
and the exact tokens that were generated.

1250
00:46:37,960 --> 00:46:40,360
All of this information is linked back through a chain

1251
00:46:40,360 --> 00:46:41,800
from the answer to the prompt,

1252
00:46:41,800 --> 00:46:44,200
the retrieved context, the embedding vector,

1253
00:46:44,200 --> 00:46:46,760
the source document, and finally the delta event.

1254
00:46:46,760 --> 00:46:48,200
This chain is decision lineage.
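
As a rough sketch of that chain, assume each stage writes one record that points at the record it was derived from. The `LineageRecord` type, the `kind` labels, and `trace_back` are hypothetical names for illustration, and a real retrieval step would usually point at several embeddings rather than one parent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LineageRecord:
    """One link in the chain (hypothetical schema, not a Graph API type)."""
    record_id: str
    kind: str                  # e.g. "delta_event", "document", "embedding"
    parent_id: Optional[str]   # the record this one was derived from
    details: dict = field(default_factory=dict)


def trace_back(records: Dict[str, LineageRecord], answer_id: str) -> List[str]:
    """Walk from an answer back to the delta event, returning each kind."""
    chain: List[str] = []
    seen = set()
    current = records.get(answer_id)
    while current is not None and current.record_id not in seen:
        seen.add(current.record_id)  # guard against accidental cycles
        chain.append(current.kind)
        current = records.get(current.parent_id) if current.parent_id else None
    return chain


# Illustrative data only: one record per stage, each pointing at its parent.
stages = ["delta_event", "document", "embedding", "retrieval", "prompt", "answer"]
records: Dict[str, LineageRecord] = {}
previous_id = None
for stage in stages:
    records[stage] = LineageRecord(record_id=stage, kind=stage, parent_id=previous_id)
    previous_id = stage

print(trace_back(records, "answer"))
```

Storing only a parent pointer keeps each stage's logging cheap, and the full chain is rebuilt on demand when someone asks why the AI said that.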

1255
00:46:48,200 --> 00:46:50,680
Decision lineage is what makes the system defensible.

1256
00:46:50,680 --> 00:46:52,840
When someone asks why the AI said that,

1257
00:46:52,840 --> 00:46:55,560
you can follow the chain backward and show the decision tree.

1258
00:46:55,560 --> 00:46:58,120
You can show the exact data that influenced the output,

1259
00:46:58,120 --> 00:46:59,720
and if the data was incorrect,

1260
00:46:59,720 --> 00:47:01,480
you can identify when it became available

1261
00:47:01,480 --> 00:47:03,240
and who had access to modify it.

1262
00:47:03,240 --> 00:47:05,400
If the user was not supposed to see that data,

1263
00:47:05,400 --> 00:47:08,200
you can prove it was a security failure rather than a data problem.

1264
00:47:08,200 --> 00:47:09,880
If the source was outdated,

1265
00:47:09,880 --> 00:47:12,440
you can show exactly when it should have been refreshed.

1266
00:47:12,440 --> 00:47:15,320
This level of auditability requires discipline.

1267
00:47:15,320 --> 00:47:17,320
Every component in your discovery pipeline

1268
00:47:17,320 --> 00:47:19,880
has to be instrumented to capture and pass forward

1269
00:47:19,880 --> 00:47:21,000
this lineage metadata.

1270
00:47:21,000 --> 00:47:22,680
Your ingestion system has to log

1271
00:47:22,680 --> 00:47:24,280
which delta events it processed,

1272
00:47:24,280 --> 00:47:27,000
and your embedding step has to tag vectors with their source.

1273
00:47:27,000 --> 00:47:28,680
Your retrieval pipeline has to record

1274
00:47:28,680 --> 00:47:30,280
which items were matched and why,

1275
00:47:30,280 --> 00:47:33,640
and your LLM interface has to log prompts and generations.

1276
00:47:33,640 --> 00:47:35,080
None of this happens automatically.

1277
00:47:35,080 --> 00:47:36,920
It has to be built in deliberately.

1278
00:47:36,920 --> 00:47:38,360
The operational burden is real.

1279
00:47:38,360 --> 00:47:41,320
You are storing metadata alongside your primary data,

1280
00:47:41,320 --> 00:47:43,720
and adding logging overhead to every query.

1281
00:47:43,720 --> 00:47:45,320
You are creating compliance records

1282
00:47:45,320 --> 00:47:47,000
that have to be retained for years,

1283
00:47:47,000 --> 00:47:49,080
but the alternative of operating without lineage

1284
00:47:49,080 --> 00:47:51,560
is no longer acceptable in 2026.

1285
00:47:51,560 --> 00:47:53,560
Regulators expect it, customers expect it,

1286
00:47:53,560 --> 00:47:55,080
and auditors will demand it.

1287
00:47:55,080 --> 00:47:56,600
Understanding lineage requirements

1288
00:47:56,600 --> 00:47:58,200
forces you to a critical decision.

1289
00:47:58,200 --> 00:48:00,200
You either build this infrastructure yourself

1290
00:48:00,200 --> 00:48:02,840
or you buy a solution that already has it baked in.

1291
00:48:02,840 --> 00:48:04,120
Build versus buy.

1292
00:48:04,120 --> 00:48:06,280
Native connectors versus custom pipelines.

1293
00:48:06,280 --> 00:48:09,240
The decision to build lineage-aware discovery infrastructure

1294
00:48:09,240 --> 00:48:11,240
eventually brings you to a fork in the road

1295
00:48:11,240 --> 00:48:13,080
and most architects delay this choice

1296
00:48:13,080 --> 00:48:14,600
until they run into a wall.

1297
00:48:14,600 --> 00:48:15,960
You have two real options here.

1298
00:48:15,960 --> 00:48:18,040
You can use Microsoft's native Graph connectors

1299
00:48:18,040 --> 00:48:19,880
and lean on their existing ecosystem,

1300
00:48:19,880 --> 00:48:21,320
or you can build a custom pipeline

1301
00:48:21,320 --> 00:48:24,360
that gives you more control, but demands constant maintenance.

1302
00:48:24,360 --> 00:48:26,920
Microsoft's approach is purpose-built for enterprises

1303
00:48:26,920 --> 00:48:29,240
already living in Microsoft 365.

1304
00:48:29,240 --> 00:48:31,000
Their connectors handle the heavy lifting

1305
00:48:31,000 --> 00:48:33,560
like schema mapping, permission resolution, and batching logic,

1306
00:48:33,560 --> 00:48:35,480
and they even manage throttling automatically,

1307
00:48:35,480 --> 00:48:36,840
so you don't have to.

1308
00:48:36,840 --> 00:48:39,560
Because they handle the complexity of the Graph API,

1309
00:48:39,560 --> 00:48:41,720
you get the benefit of their experience at scale.

1310
00:48:41,720 --> 00:48:43,960
They have run into almost every edge case imaginable

1311
00:48:43,960 --> 00:48:45,800
and fixed most of them over the years.

1312
00:48:45,800 --> 00:48:48,640
These connectors come with built-in monitoring and diagnostics,

1313
00:48:48,640 --> 00:48:49,960
which means when something breaks,

1314
00:48:49,960 --> 00:48:51,400
you can contact Microsoft support

1315
00:48:51,400 --> 00:48:53,160
instead of debugging the code yourself,

1316
00:48:53,160 --> 00:48:55,640
but there is a critical constraint you need to consider.

1317
00:48:55,640 --> 00:48:57,280
Microsoft's native connectors are designed

1318
00:48:57,280 --> 00:49:00,280
for a specific set of data sources like SharePoint, OneDrive,

1319
00:49:00,280 --> 00:49:01,040
and Teams.

1320
00:49:01,040 --> 00:49:04,120
They work beautifully if your data lives in Microsoft 365,

1321
00:49:04,120 --> 00:49:05,800
but what happens if your critical knowledge lives

1322
00:49:05,800 --> 00:49:07,360
in Salesforce or Confluence?

1323
00:49:07,360 --> 00:49:09,800
If your customer feedback is spread across Zendesk,

1324
00:49:09,800 --> 00:49:12,160
Jira, and custom line-of-business applications,

1325
00:49:12,160 --> 00:49:15,000
the native connectors simply won't reach those systems.

1326
00:49:15,000 --> 00:49:16,760
You could try the Microsoft connector marketplace

1327
00:49:16,760 --> 00:49:19,160
and hope someone built a connector for your specific source,

1328
00:49:19,160 --> 00:49:21,360
but depending on third-party connectors

1329
00:49:21,360 --> 00:49:22,600
introduces a lot of risk.

1330
00:49:22,600 --> 00:49:24,520
If the author stops maintaining the connector

1331
00:49:24,520 --> 00:49:27,000
or your data source updates and breaks the integration,

1332
00:49:27,000 --> 00:49:27,880
you are stuck.

1333
00:49:27,880 --> 00:49:30,560
This is where custom pipelines start to look attractive.

1334
00:49:30,560 --> 00:49:33,920
You write the code that connects to your data sources directly,

1335
00:49:33,920 --> 00:49:36,120
which gives you total control over the extraction

1336
00:49:36,120 --> 00:49:37,560
and transformation logic.

1337
00:49:37,560 --> 00:49:40,040
You can handle edge cases specific to your organization,

1338
00:49:40,040 --> 00:49:42,720
and you can prioritize which data to index first

1339
00:49:42,720 --> 00:49:44,080
and how aggressively to refresh it.

1340
00:49:44,080 --> 00:49:46,920
Since you aren't constrained by Microsoft's design decisions,

1341
00:49:46,920 --> 00:49:48,800
you can implement whatever integration pattern

1342
00:49:48,800 --> 00:49:50,880
makes sense for your specific architecture.

1343
00:49:50,880 --> 00:49:51,920
But here is the problem.

1344
00:49:51,920 --> 00:49:54,480
There is a hidden cost that most architects underestimate,

1345
00:49:54,480 --> 00:49:56,240
and that is the operational burden.

1346
00:49:56,240 --> 00:49:58,240
A custom pipeline isn't a one-time project

1347
00:49:58,240 --> 00:49:59,800
you can just finish and forget.

1348
00:49:59,800 --> 00:50:02,040
It is a service that requires ongoing maintenance

1349
00:50:02,040 --> 00:50:03,680
because your source systems will change

1350
00:50:03,680 --> 00:50:05,880
and APIs will eventually be deprecated.

1351
00:50:05,880 --> 00:50:07,640
If your source system pushes an update

1352
00:50:07,640 --> 00:50:11,120
that changes the API, your pipeline breaks immediately.

1353
00:50:11,120 --> 00:50:13,200
You have to diagnose the failure, write code

1354
00:50:13,200 --> 00:50:16,600
to handle the new API version, and then test and deploy the fix.

1355
00:50:16,600 --> 00:50:18,440
Meanwhile, your search index is getting stale

1356
00:50:18,440 --> 00:50:19,960
because the pipeline isn't running.

1357
00:50:19,960 --> 00:50:21,960
Think about the personnel cost for a moment.

1358
00:50:21,960 --> 00:50:24,280
You need engineers who understand both your source systems

1359
00:50:24,280 --> 00:50:25,640
and the Graph API, and these people

1360
00:50:25,640 --> 00:50:27,440
have to be on call when the pipeline fails.

1361
00:50:27,440 --> 00:50:29,200
They have to maintain deep documentation,

1362
00:50:29,200 --> 00:50:30,880
so the next person who touches the code

1363
00:50:30,880 --> 00:50:31,960
understands how it works.

1364
00:50:31,960 --> 00:50:33,600
If you have multiple custom connectors

1365
00:50:33,600 --> 00:50:35,040
for multiple data sources,

1366
00:50:35,040 --> 00:50:37,080
this scales into a massive engineering burden.

1367
00:50:37,080 --> 00:50:39,440
You aren't just building a one-off integration anymore.

1368
00:50:39,440 --> 00:50:41,280
You are running an entire platform.

1369
00:50:41,280 --> 00:50:42,760
Then there is the risk of divergence.

1370
00:50:42,760 --> 00:50:44,360
If you build a custom connector,

1371
00:50:44,360 --> 00:50:46,520
your implementation of GraphRAG or security trimming

1372
00:50:46,520 --> 00:50:48,760
will be slightly different from how Microsoft does it.

1373
00:50:48,760 --> 00:50:51,160
You will discover unique edge cases and bugs

1374
00:50:51,160 --> 00:50:53,120
that require custom workarounds.

1375
00:50:53,120 --> 00:50:56,280
Over time, your custom code becomes a pile of special cases

1376
00:50:56,280 --> 00:50:58,200
and messy conditional logic.

1377
00:50:58,200 --> 00:50:59,720
New engineers coming into the code base

1378
00:50:59,720 --> 00:51:01,960
will struggle to understand why a bizarre check exists

1379
00:51:01,960 --> 00:51:04,800
on line 247, and the code becomes fragile.

1380
00:51:04,800 --> 00:51:07,040
Simple changes start to break subtle dependencies

1381
00:51:07,040 --> 00:51:08,680
that nobody remembered were there.

1382
00:51:08,680 --> 00:51:11,440
Microsoft's first-party connectors avoid this trap

1383
00:51:11,440 --> 00:51:12,960
because they are standardized.

1384
00:51:12,960 --> 00:51:14,680
Microsoft handles the complexity,

1385
00:51:14,680 --> 00:51:17,120
so your job is just to configure and monitor them.

1386
00:51:17,120 --> 00:51:18,560
The investment curve is shallow.

1387
00:51:18,560 --> 00:51:21,240
You spend some effort upfront learning how the connector works,

1388
00:51:21,240 --> 00:51:24,200
but then the maintenance burden stays low for the long haul.

1389
00:51:24,200 --> 00:51:25,840
The strategic question is simple.

1390
00:51:25,840 --> 00:51:27,440
What is your core competency?

1391
00:51:27,440 --> 00:51:30,040
If you are an organization that values deep integration

1392
00:51:30,040 --> 00:51:31,480
across many different systems,

1393
00:51:31,480 --> 00:51:33,120
and you have strong engineering resources,

1394
00:51:33,120 --> 00:51:35,480
building custom pipelines might be the right call.

1395
00:51:35,480 --> 00:51:37,280
You gain flexibility and control,

1396
00:51:37,280 --> 00:51:40,400
but you are also committing to those ongoing maintenance costs.

1397
00:51:40,400 --> 00:51:43,080
If Microsoft 365 is your primary system

1398
00:51:43,080 --> 00:51:45,720
and you want to minimize operational complexity,

1399
00:51:45,720 --> 00:51:47,760
using native connectors makes more sense.

1400
00:51:47,760 --> 00:51:48,920
You gain simplicity,

1401
00:51:48,920 --> 00:51:51,640
but you lose coverage of non-Microsoft data sources.

1402
00:51:51,640 --> 00:51:53,600
There is also a timing dimension to consider.

1403
00:51:53,600 --> 00:51:55,400
When should you prioritize completeness

1404
00:51:55,400 --> 00:51:56,880
versus immediate freshness?

1405
00:51:56,880 --> 00:51:58,760
A native connector will give you freshness quickly

1406
00:51:58,760 --> 00:52:00,760
because Microsoft has already optimized it,

1407
00:52:00,760 --> 00:52:01,880
but you might have to wait for them

1408
00:52:01,880 --> 00:52:04,760
to expand support to your specific data sources.

1409
00:52:04,760 --> 00:52:07,480
A custom pipeline lets you ingest everything immediately,

1410
00:52:07,480 --> 00:52:08,600
but the freshness might lag

1411
00:52:08,600 --> 00:52:11,840
because you are building the entire infrastructure from scratch.

1412
00:52:11,840 --> 00:52:15,000
Most mature organizations settle on a hybrid model.

1413
00:52:15,000 --> 00:52:18,240
They use native connectors for core Microsoft 365 systems

1414
00:52:18,240 --> 00:52:19,920
because those are heavily used

1415
00:52:19,920 --> 00:52:23,000
and Microsoft's optimization is too valuable to ignore.

1416
00:52:23,000 --> 00:52:24,680
They build focused custom connectors

1417
00:52:24,680 --> 00:52:26,720
for critical non-Microsoft sources

1418
00:52:26,720 --> 00:52:29,400
and avoid trying to integrate every possible data source.

1419
00:52:29,400 --> 00:52:31,480
They accept that some systems will stay separate

1420
00:52:31,480 --> 00:52:32,600
and that is okay.

1421
00:52:32,600 --> 00:52:34,800
This pragmatic approach balances completeness,

1422
00:52:34,800 --> 00:52:37,080
freshness, and operational burden.

1423
00:52:37,080 --> 00:52:40,120
Once you have settled on how data flows into your discovery system,

1424
00:52:40,120 --> 00:52:42,520
the next constraint is the infrastructure itself.

1425
00:52:42,520 --> 00:52:45,440
Infrastructure collocation: cutting the network cord.

1426
00:52:45,440 --> 00:52:47,840
Once you have decided which data sources to integrate

1427
00:52:47,840 --> 00:52:49,200
and how to handle compliance,

1428
00:52:49,200 --> 00:52:51,720
you face a constraint that architects often overlook

1429
00:52:51,720 --> 00:52:54,680
until they are debugging production latency issues.

1430
00:52:54,680 --> 00:52:56,480
That constraint is the physical location

1431
00:52:56,480 --> 00:52:57,680
of your infrastructure.

1432
00:52:57,680 --> 00:52:59,280
Every network hop adds latency.

1433
00:52:59,280 --> 00:53:01,560
When your vector database sits in one data center,

1434
00:53:01,560 --> 00:53:03,480
your Graph API gateway sits in another,

1435
00:53:03,480 --> 00:53:05,640
and your LLM inference runs in a third,

1436
00:53:05,640 --> 00:53:08,120
you are paying a latency tax on every single query.

1437
00:53:08,120 --> 00:53:09,440
These aren't theoretical delays.

1438
00:53:09,440 --> 00:53:12,680
A round trip between data centers can add 50 to 200 milliseconds

1439
00:53:12,680 --> 00:53:14,520
of pure network overhead.

1440
00:53:14,520 --> 00:53:18,080
When your target retrieval latency is 250 milliseconds total,

1441
00:53:18,080 --> 00:53:20,240
losing 150 milliseconds to network hops

1442
00:53:20,240 --> 00:53:21,960
means you have only 100 milliseconds left

1443
00:53:21,960 --> 00:53:23,240
for actual computation.

1444
00:53:23,240 --> 00:53:25,360
That is a massive squeeze on your performance budget.
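
To make that budget arithmetic concrete, here is a small sketch. The function name and the split of the 150 milliseconds into three 50 millisecond hops are assumptions for illustration; only the 250 millisecond target and the 150 millisecond loss come from the numbers above.

```python
from typing import Iterable


def remaining_compute_budget_ms(target_ms: float, hop_latencies_ms: Iterable[float]) -> float:
    """Return what is left of the latency target after the network hops."""
    return target_ms - sum(hop_latencies_ms)


# Assumed split: three cross-data-center round trips at 50 ms each.
budget = remaining_compute_budget_ms(250, [50, 50, 50])
print(budget)  # 100 ms left for embedding lookup, ranking, and permission checks
```

If any one of those hops lands at the 200 millisecond end of the range instead, the result goes negative, which means the target is missed before any computation has started.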

1445
00:53:25,360 --> 00:53:26,840
The solution is architectural.

1446
00:53:26,840 --> 00:53:28,320
You need collocation.

1447
00:53:28,320 --> 00:53:30,720
Your vector database and your Graph API endpoint

1448
00:53:30,720 --> 00:53:32,440
need to live in the same data center

1449
00:53:32,440 --> 00:53:34,480
or at the very least the same geographic region.

1450
00:53:34,480 --> 00:53:35,880
This isn't just a preference.

1451
00:53:35,880 --> 00:53:37,760
It is non-negotiable if you want to hit

1452
00:53:37,760 --> 00:53:39,480
sub-second discovery targets.

1453
00:53:39,480 --> 00:53:40,760
When these systems are collocated,

1454
00:53:40,760 --> 00:53:43,120
you can make multiple calls to the Graph API

1455
00:53:43,120 --> 00:53:45,200
to pull fresh data or check permissions

1456
00:53:45,200 --> 00:53:48,120
without incurring network latency between each call.

1457
00:53:48,120 --> 00:53:50,520
The calls happen locally across internal network fabric

1458
00:53:50,520 --> 00:53:52,640
instead of traveling across internet-scale distances.

1459
00:53:52,640 --> 00:53:54,160
But collocation alone isn't enough.

1460
00:53:54,160 --> 00:53:55,960
The communication protocol between these systems

1461
00:53:55,960 --> 00:53:57,040
matters enormously.

1462
00:53:57,040 --> 00:54:00,080
Traditional HTTP, which underpins most REST APIs,

1463
00:54:00,080 --> 00:54:01,880
was designed for request-response patterns

1464
00:54:01,880 --> 00:54:02,920
over the public internet.

1465
00:54:02,920 --> 00:54:04,320
Without connection reuse, each request opens a connection,

1466
00:54:04,320 --> 00:54:07,200
sends headers, waits for a response, and then closes.

1467
00:54:07,200 --> 00:54:08,800
For a single query, this is fine.

1468
00:54:08,800 --> 00:54:11,280
But in discovery pipelines, you are making dozens of calls

1469
00:54:11,280 --> 00:54:14,560
to check scopes, fetch delta tokens, and retrieve permissions.

1470
00:54:14,560 --> 00:54:17,240
If each of these calls opens a new HTTP connection,

1471
00:54:17,240 --> 00:54:18,600
you are drowning in overhead.

1472
00:54:18,600 --> 00:54:20,400
This is where gRPC becomes critical.

1473
00:54:20,400 --> 00:54:23,320
gRPC uses HTTP/2 under the hood,

1474
00:54:23,320 --> 00:54:27,320
which multiplexes multiple logical streams over a single TCP connection.

1475
00:54:27,320 --> 00:54:29,640
You open one connection to the Graph API gateway

1476
00:54:29,640 --> 00:54:32,200
and send dozens of requests through it simultaneously.

1477
00:54:32,200 --> 00:54:34,840
The server streams responses back as they are ready,

1478
00:54:34,840 --> 00:54:37,520
which means no connection overhead and no handshake delays.

1479
00:54:37,520 --> 00:54:40,440
It is just efficient bidirectional communication.

1480
00:54:40,440 --> 00:54:43,840
The latency savings from switching to gRPC are dramatic.

1481
00:54:43,840 --> 00:54:45,160
Measurements from production systems

1482
00:54:45,160 --> 00:54:47,960
show that the same discovery pipeline running over HTTP

1483
00:54:47,960 --> 00:54:49,400
can take 300 milliseconds.

1484
00:54:49,400 --> 00:54:51,960
If you switch to gRPC, that same pipeline

1485
00:54:51,960 --> 00:54:53,240
drops to 80 milliseconds.

1486
00:54:53,240 --> 00:54:55,240
That is a 3.75x improvement.

1487
00:54:55,240 --> 00:54:57,080
The difference isn't in the computation itself.

1488
00:54:57,080 --> 00:55:00,240
It is purely in how efficiently the infrastructure moves bits.

1489
00:55:00,240 --> 00:55:03,080
This creates a ripple effect through your entire architecture.

1490
00:55:03,080 --> 00:55:05,520
If you are using Python for your discovery gateway,

1491
00:55:05,520 --> 00:55:07,360
you are already at a disadvantage.

1492
00:55:07,360 --> 00:55:10,520
Python is excellent for data science and rapid prototyping,

1493
00:55:10,520 --> 00:55:13,200
but it isn't optimized for the networking patterns

1494
00:55:13,200 --> 00:55:15,560
that modern semantic search demands.

1495
00:55:15,560 --> 00:55:17,240
Python's global interpreter lock means

1496
00:55:17,240 --> 00:55:19,080
that handling multiple concurrent requests

1497
00:55:19,080 --> 00:55:21,440
introduces contention. Adding gRPC

1498
00:55:21,440 --> 00:55:23,160
to a Python service adds overhead

1499
00:55:23,160 --> 00:55:25,040
from the language runtime's perspective.

1500
00:55:25,040 --> 00:55:26,800
This is why serious production deployments

1501
00:55:26,800 --> 00:55:29,600
are shifting to compiled languages like Rust or Go

1502
00:55:29,600 --> 00:55:31,280
for the discovery gateway layer.

1503
00:55:31,280 --> 00:55:34,720
These languages are built for concurrent networked workloads.

1504
00:55:34,720 --> 00:55:38,040
A Go service using gRPC can handle thousands of concurrent

1505
00:55:38,040 --> 00:55:39,760
requests with minimal overhead.

1506
00:55:39,760 --> 00:55:42,080
A Rust service can be even more efficient, offering

1507
00:55:42,080 --> 00:55:43,880
performance characteristics that approach

1508
00:55:43,880 --> 00:55:45,800
the theoretical limits of the hardware.

1509
00:55:45,800 --> 00:55:47,120
The gateway becomes invisible.

1510
00:55:47,120 --> 00:55:48,960
It is just a fast pass-through that handles

1511
00:55:48,960 --> 00:55:52,240
multiplexing and connection pooling without adding latency.

1512
00:55:52,240 --> 00:55:54,680
This architectural shift represents a maturation

1513
00:55:54,680 --> 00:55:56,720
in how enterprises deploy semantic search.

1514
00:55:56,720 --> 00:55:59,120
You aren't trying to do everything in one language anymore.

1515
00:55:59,120 --> 00:56:01,360
Instead, you are using the right tool for each layer.

1516
00:56:01,360 --> 00:56:03,520
You use Python for data science and model training,

1517
00:56:03,520 --> 00:56:06,640
but you use Go or Rust for the performance-critical networking

1518
00:56:06,640 --> 00:56:07,160
layer.

1519
00:56:07,160 --> 00:56:09,640
Specialized services handle each function.

1520
00:56:09,640 --> 00:56:11,760
The operational impact is also significant.

1521
00:56:11,760 --> 00:56:14,160
A Python-based gateway that is struggling with concurrent load

1522
00:56:14,160 --> 00:56:16,680
requires adding more instances and distributing requests

1523
00:56:16,680 --> 00:56:19,080
across them, which adds operational complexity.

1524
00:56:19,080 --> 00:56:20,800
A Rust gateway handling the same load

1525
00:56:20,800 --> 00:56:23,280
runs on a single instance with headroom to spare.

1526
00:56:23,280 --> 00:56:26,280
It is a simpler deployment with fewer things that can go wrong.

1527
00:56:26,280 --> 00:56:27,360
But here is the thing.

1528
00:56:27,360 --> 00:56:29,840
Collocation and efficient protocols only matter

1529
00:56:29,840 --> 00:56:31,600
if the systems themselves are designed

1530
00:56:31,600 --> 00:56:32,800
to take advantage of them.

1531
00:56:32,800 --> 00:56:35,440
This brings us back to the discovery pipeline design.

1532
00:56:35,440 --> 00:56:37,200
The queries you make to the Graph API

1533
00:56:37,200 --> 00:56:39,720
have to be structured to minimize redundant calls.

1534
00:56:39,720 --> 00:56:42,320
You batch requests where possible, you cache responses,

1535
00:56:42,320 --> 00:56:45,200
and you pipeline operations so that subsequent calls only

1536
00:56:45,200 --> 00:56:47,080
happen when you actually need their results.
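
One way to picture the batching part is the sketch below. It assumes the gateway uses Microsoft Graph JSON batching, where a payload is a `requests` array of items with `id`, `method`, and `url`, and it assumes a cap of 20 requests per batch. The `build_batches` helper and the example URLs are hypothetical, and nothing here sends a request.

```python
from typing import Dict, List

MAX_BATCH_SIZE = 20  # assumed per-batch cap for Graph JSON batching


def build_batches(relative_urls: List[str], batch_size: int = MAX_BATCH_SIZE) -> List[Dict]:
    """Group GET requests into JSON batch payloads, dropping duplicate URLs."""
    # dict.fromkeys removes repeats while keeping first-seen order,
    # so a redundant call never makes it into a batch.
    unique_urls = list(dict.fromkeys(relative_urls))
    batches: List[Dict] = []
    for start in range(0, len(unique_urls), batch_size):
        chunk = unique_urls[start:start + batch_size]
        batches.append({
            "requests": [
                {"id": str(index + 1), "method": "GET", "url": url}
                for index, url in enumerate(chunk)
            ]
        })
    return batches


# 45 distinct lookups plus one repeat become three payloads of 20, 20, and 5.
urls = [f"/users/{n}" for n in range(45)] + ["/users/0"]
batches = build_batches(urls)
print([len(batch["requests"]) for batch in batches])
```

Each payload would then be posted once to the batch endpoint, so 46 individual calls collapse into three round trips.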

1537
00:56:47,080 --> 00:56:49,080
The infrastructure enables this efficiency,

1538
00:56:49,080 --> 00:56:51,040
but your application design has to execute it.

1539
00:56:51,040 --> 00:56:54,200
This foundation of collocated infrastructure, modern protocols,

1540
00:56:54,200 --> 00:56:55,880
and efficient gateway implementations

1541
00:56:55,880 --> 00:56:57,920
is what makes the next phase possible.

1542
00:56:57,920 --> 00:56:59,280
You now have the plumbing in place

1543
00:56:59,280 --> 00:57:01,880
to do something remarkable with organizational data.

1544
00:57:01,880 --> 00:57:04,440
You can start to build systems that treat your entire enterprise

1545
00:57:04,440 --> 00:57:06,160
as a connected living organism.

1546
00:57:06,160 --> 00:57:07,880
That is where we are heading next.

1547
00:57:07,880 --> 00:57:10,920
The future-proof roadmap: phase one to phase three.

1548
00:57:10,920 --> 00:57:13,320
Everything we have discussed so far from the architecture

1549
00:57:13,320 --> 00:57:15,040
and compliance to the infrastructure

1550
00:57:15,040 --> 00:57:17,560
has to move from theory into reality.

1551
00:57:17,560 --> 00:57:19,240
Execution is the only thing that matters now

1552
00:57:19,240 --> 00:57:20,320
and that happens in stages.

1553
00:57:20,320 --> 00:57:22,120
Most organizations simply cannot rip out

1554
00:57:22,120 --> 00:57:23,800
their legacy search systems overnight

1555
00:57:23,800 --> 00:57:25,120
because they have real constraints

1556
00:57:25,120 --> 00:57:28,000
like existing investments and teams that need time to learn.

1557
00:57:28,000 --> 00:57:30,320
This roadmap is how you migrate from where you are today

1558
00:57:30,320 --> 00:57:32,520
to where you need to be by 2026.

1559
00:57:32,520 --> 00:57:35,360
Phase one is about being ruthlessly focused

1560
00:57:35,360 --> 00:57:37,400
and it runs from now through month three.

1561
00:57:37,400 --> 00:57:39,160
Your goal is a single objective

1562
00:57:39,160 --> 00:57:41,440
where you move from full crawls to delta queries

1563
00:57:41,440 --> 00:57:43,040
on your most valuable data source.

1564
00:57:43,040 --> 00:57:44,680
You are not trying to fix everything at once.

1565
00:57:44,680 --> 00:57:47,240
Instead, you should pick one silo that generates the most change

1566
00:57:47,240 --> 00:57:49,080
and the most value for the business.

1567
00:57:49,080 --> 00:57:51,240
Whether that is Teams, SharePoint,

1568
00:57:51,240 --> 00:57:53,040
or a custom line-of-business system

1569
00:57:53,040 --> 00:57:55,480
does not matter as much as establishing the pattern.

1570
00:57:55,480 --> 00:57:56,560
During these first three months,

1571
00:57:56,560 --> 00:57:58,840
you are not building a complete GraphRAG pipeline

1572
00:57:58,840 --> 00:58:01,120
or worrying about complex security trimming.

1573
00:58:01,120 --> 00:58:03,200
You are solving the staleness problem first

1574
00:58:03,200 --> 00:58:05,280
by setting up delta query infrastructure

1575
00:58:05,280 --> 00:58:06,920
against your chosen source.

1576
00:58:06,920 --> 00:58:08,560
You establish your token persistence

1577
00:58:08,560 --> 00:58:10,840
and wire up the back-off logic for throttling

1578
00:58:10,840 --> 00:58:14,240
so documents flow into your vector store in near real time.
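
A minimal sketch of that loop follows, assuming Graph delta responses carry `value`, `@odata.nextLink` on intermediate pages, and `@odata.deltaLink` on the final page, and that throttling shows up as a 429 or 503 with a `Retry-After` header in seconds. The `fetch` callable, the state file layout, and the function names are hypothetical stand-ins for your own HTTP client and storage.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# (status code, response headers, parsed JSON body)
Response = Tuple[int, Dict[str, str], dict]


def load_delta_link(state_path: Path, initial_url: str) -> str:
    """Resume from the persisted delta link, or start a fresh sync."""
    if state_path.exists():
        return json.loads(state_path.read_text())["deltaLink"]
    return initial_url


def sync_once(fetch: Callable[[str], Response], state_path: Path, initial_url: str,
              sleep: Callable[[float], None] = time.sleep,
              max_retries: int = 5) -> List[dict]:
    """Pull one round of changes and persist the token for the next round."""
    url = load_delta_link(state_path, initial_url)
    changes: List[dict] = []
    retries = 0
    while url:
        status, headers, body = fetch(url)
        if status in (429, 503):
            if retries >= max_retries:
                raise RuntimeError("still throttled after retries")
            # Prefer the server's hint; otherwise back off exponentially.
            sleep(float(headers.get("Retry-After", 2 ** retries)))
            retries += 1
            continue
        if status != 200:
            raise RuntimeError(f"unexpected status {status}")
        retries = 0
        changes.extend(body.get("value", []))
        next_link = body.get("@odata.nextLink")
        if next_link:
            url = next_link  # more pages in this round
        else:
            # Last page: save the delta link so the next run sees only new changes.
            state_path.write_text(json.dumps({"deltaLink": body["@odata.deltaLink"]}))
            url = ""
    return changes
```

The token is written only after the last page arrives, so a crash in the middle of a round replays that round instead of silently skipping changes. Your downstream writes to the vector store need to tolerate that replay.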

1579
00:58:14,240 --> 00:58:15,560
Once the data is moving,

1580
00:58:15,560 --> 00:58:18,120
you measure the impact to see how much fresher the results are

1581
00:58:18,120 --> 00:58:20,280
and what the operational cost looks like.

1582
00:58:20,280 --> 00:58:21,440
This is the proof of concept

1583
00:58:21,440 --> 00:58:23,560
that justifies the rest of the project.

1584
00:58:23,560 --> 00:58:25,000
The reason phase one is so short

1585
00:58:25,000 --> 00:58:26,440
is that you are not looking for perfection.

1586
00:58:26,440 --> 00:58:27,480
You are looking for proof.

1587
00:58:27,480 --> 00:58:29,640
You build just enough to show that delta queries

1588
00:58:29,640 --> 00:58:32,120
actually work in your specific environment.

1589
00:58:32,120 --> 00:58:33,960
You will definitely discover edge cases

1590
00:58:33,960 --> 00:58:36,400
and hit throttling limits you did not expect,

1591
00:58:36,400 --> 00:58:37,680
but that is actually the point.

1592
00:58:37,680 --> 00:58:39,600
It is better to find those gaps now

1593
00:58:39,600 --> 00:58:40,640
while the scope is small

1594
00:58:40,640 --> 00:58:42,480
so you can fix them and document the lessons.

1595
00:58:42,480 --> 00:58:44,160
This three-month investment in knowledge

1596
00:58:44,160 --> 00:58:46,680
will guide every decision you make in the next year.

1597
00:58:46,680 --> 00:58:49,000
Phase two is where the real transformation happens,

1598
00:58:49,000 --> 00:58:51,280
spanning from month four through month nine.

1599
00:58:51,280 --> 00:58:52,960
Now that the delta model is proven,

1600
00:58:52,960 --> 00:58:54,840
you extend it to all your data sources

1601
00:58:54,840 --> 00:58:56,920
and implement the full ingestion pipeline.

1602
00:58:56,920 --> 00:58:58,800
The bigger focus here is security,

1603
00:58:58,800 --> 00:59:01,080
specifically building the security trimming layer.

1604
00:59:01,080 --> 00:59:03,080
You are implementing a relationship model

1605
00:59:03,080 --> 00:59:06,040
that ensures permissions flow through your discovery system

1606
00:59:06,040 --> 00:59:07,720
so that when an AI retrieves a document,

1607
00:59:07,720 --> 00:59:10,240
you can prove the user actually had access to it.
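
Here is a small sketch of what that proof could look like at retrieval time, assuming each indexed item carries the set of principals allowed to read it. The `Candidate` type, the principal strings, and `security_trim` are hypothetical. In practice the allowed principals would come from the permissions you ingested alongside the document.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """A retrieved item plus the principals allowed to read it (hypothetical)."""
    doc_id: str
    score: float
    allowed_principals: FrozenSet[str]


def security_trim(candidates: List[Candidate],
                  user_principals: FrozenSet[str]) -> Tuple[List[Candidate], List[dict]]:
    """Drop items the user cannot read and record why each one was kept or cut."""
    kept: List[Candidate] = []
    audit: List[dict] = []
    for candidate in candidates:
        matched = sorted(candidate.allowed_principals & user_principals)
        allowed = bool(matched)
        # The audit entry is the evidence: which grant let this user see the item.
        audit.append({"doc_id": candidate.doc_id, "allowed": allowed,
                      "matched_principals": matched})
        if allowed:
            kept.append(candidate)
    return kept, audit


candidates = [
    Candidate("doc-1", 0.91, frozenset({"group:finance"})),
    Candidate("doc-2", 0.88, frozenset({"user:alice", "group:hr"})),
]
kept, audit = security_trim(candidates, frozenset({"user:alice", "group:sales"}))
print([c.doc_id for c in kept])
```

Only `doc-2` survives here, and the audit entry names `user:alice` as the grant that allowed it. That entry is the kind of record you would attach to the lineage log.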

1608
00:59:10,240 --> 00:59:12,360
This is also the time to prepare your infrastructure

1609
00:59:12,360 --> 00:59:13,440
for the future.

1610
00:59:13,440 --> 00:59:15,920
You need to evaluate whether your current language stack

1611
00:59:15,920 --> 00:59:17,960
can handle the load or if you need to rebuild

1612
00:59:17,960 --> 00:59:20,560
critical paths in languages like Go or Rust.

1613
00:59:20,560 --> 00:59:22,280
You are planning for collocation

1614
00:59:22,280 --> 00:59:23,720
and building compliance mappings

1615
00:59:23,720 --> 00:59:25,600
to show exactly how your controls meet

1616
00:59:25,600 --> 00:59:28,160
2026 regulatory requirements.

1617
00:59:28,160 --> 00:59:30,200
By creating this audit trail now,

1618
00:59:30,200 --> 00:59:32,720
you prove that you have taken compliance seriously

1619
00:59:32,720 --> 00:59:33,720
from the beginning.

1620
00:59:33,720 --> 00:59:35,040
By the end of phase two,

1621
00:59:35,040 --> 00:59:37,760
your discovery system is both fast and defensible.

1622
00:59:37,760 --> 00:59:39,840
You are hitting sub-second retrieval times

1623
00:59:39,840 --> 00:59:41,280
and can prove that your security trimming

1624
00:59:41,280 --> 00:59:42,400
is working as intended.

1625
00:59:42,400 --> 00:59:44,640
You have an immutable log of where your data came from,

1626
00:59:44,640 --> 00:59:46,160
which means your organization's knowledge

1627
00:59:46,160 --> 00:59:48,360
is finally accessible in near real time.

1628
00:59:48,360 --> 00:59:50,720
Teams no longer have to guess if information is current

1629
00:59:50,720 --> 00:59:52,680
because they can trust that their search results

1630
00:59:52,680 --> 00:59:54,720
reflect what is happening right now.

1631
00:59:54,720 --> 00:59:56,520
Phase three is about embedding intelligence

1632
00:59:56,520 --> 00:59:57,680
into the actual workflow

1633
00:59:57,680 --> 01:00:00,200
and it runs from month 10 through month 18.

1634
01:00:00,200 --> 01:00:02,360
This is when you finally build the GraphRAG systems

1635
01:00:02,360 --> 01:00:05,280
and implement lazy graph construction to keep costs down.

1636
01:00:05,280 --> 01:00:07,320
You integrate discovery into AI workflows

1637
01:00:07,320 --> 01:00:09,840
so that tools like Copilot use graph-based retrieval

1638
01:00:09,840 --> 01:00:11,360
instead of just flat vectors.

1639
01:00:11,360 --> 01:00:13,240
You also automate your compliance monitoring

1640
01:00:13,240 --> 01:00:16,320
so the system constantly proves it meets 2026 standards

1641
01:00:16,320 --> 01:00:18,600
while you extend the pattern to new data sources.

1642
01:00:18,600 --> 01:00:21,080
This final phase is where you build a competitive moat

1643
01:00:21,080 --> 01:00:22,880
that others cannot easily cross.

1644
01:00:22,880 --> 01:00:24,520
Most of your competitors will still be stuck

1645
01:00:24,520 --> 01:00:26,160
with legacy search and nightly crawls

1646
01:00:26,160 --> 01:00:27,880
that leave them with stale information.

1647
01:00:27,880 --> 01:00:29,800
You have already moved past those struggles

1648
01:00:29,800 --> 01:00:32,440
and your organization now operates on live context.

1649
01:00:32,440 --> 01:00:34,160
Your decisions are made with what is happening

1650
01:00:34,160 --> 01:00:36,280
this minute, not what happened last night.

1651
01:00:36,280 --> 01:00:37,640
That is not just a small win,

1652
01:00:37,640 --> 01:00:40,720
it is a structural business advantage that grows over time.

1653
01:00:40,720 --> 01:00:42,840
This roadmap is not a theoretical exercise

1654
01:00:42,840 --> 01:00:44,360
but rather a proven pattern

1655
01:00:44,360 --> 01:00:46,640
that successful enterprises are following right now.

1656
01:00:46,640 --> 01:00:48,160
It takes 18 months to transform

1657
01:00:48,160 --> 01:00:50,400
from legacy systems to next-generation intelligence.

1658
01:00:50,400 --> 01:00:52,000
You get concrete wins at each stage

1659
01:00:52,000 --> 01:00:54,640
and build your operational muscle one piece at a time.

1660
01:00:54,640 --> 01:00:58,480
The strategic moat: why this matters to the board.

1661
01:00:58,480 --> 01:01:01,200
The roadmap we just walked through is an engineering narrative

1662
01:01:01,200 --> 01:01:04,120
but boards of directors do not invest in engineering stories.

1663
01:01:04,120 --> 01:01:05,840
They invest in competitive advantage

1664
01:01:05,840 --> 01:01:08,600
and that is exactly what real-time discovery provides.

1665
01:01:08,600 --> 01:01:10,360
It is a structural moat that becomes harder

1666
01:01:10,360 --> 01:01:12,920
for your competitors to cross the longer you maintain it.

1667
01:01:12,920 --> 01:01:14,360
To put this in business terms,

1668
01:01:14,360 --> 01:01:16,120
your enterprise is generating new knowledge

1669
01:01:16,120 --> 01:01:17,320
every single second.

1670
01:01:17,320 --> 01:01:19,560
Meetings are happening, decisions are being made,

1671
01:01:19,560 --> 01:01:20,920
and customers are sending feedback

1672
01:01:20,920 --> 01:01:22,480
that should change how you operate.

1673
01:01:22,480 --> 01:01:24,600
In a legacy system, that knowledge is invisible

1674
01:01:24,600 --> 01:01:25,720
for hours or even days

1675
01:01:25,720 --> 01:01:27,400
so decision making happens in the dark.

1676
01:01:27,400 --> 01:01:29,080
You are essentially working by the light

1677
01:01:29,080 --> 01:01:31,400
of whatever happened during last night's crawl

1678
01:01:31,400 --> 01:01:33,240
and by the time the info is searchable,

1679
01:01:33,240 --> 01:01:34,640
the opportunity has passed.

1680
01:01:34,640 --> 01:01:36,840
With real-time discovery, your organization operates

1681
01:01:36,840 --> 01:01:39,400
in permanent daylight where a decision made at 3 pm

1682
01:01:39,400 --> 01:01:41,720
is visible to everyone by 3:15 pm.

1683
01:01:41,720 --> 01:01:43,440
A customer problem found in a support ticket

1684
01:01:43,440 --> 01:01:45,520
becomes immediately available to the product teams

1685
01:01:45,520 --> 01:01:46,440
who can fix it.

1686
01:01:46,440 --> 01:01:48,600
A competitive threat mentioned on an analyst call

1687
01:01:48,600 --> 01:01:50,920
is searchable within minutes of the call ending.

1688
01:01:50,920 --> 01:01:52,480
This is organizational velocity

1689
01:01:52,480 --> 01:01:54,680
and it is the difference between a company that moves

1690
01:01:54,680 --> 01:01:55,960
and one that is stuck.

1691
01:01:55,960 --> 01:01:57,440
Think about what this means for winning

1692
01:01:57,440 --> 01:01:59,800
and keeping customers in a crowded market.

1693
01:01:59,800 --> 01:02:01,560
When a deal comes down to execution speed,

1694
01:02:01,560 --> 01:02:03,040
the company that responds to requests

1695
01:02:03,040 --> 01:02:04,960
and delivers features faster is going to win.

1696
01:02:04,960 --> 01:02:06,840
You can course correct the moment

1697
01:02:06,840 --> 01:02:08,280
the market changes direction

1698
01:02:08,280 --> 01:02:10,240
because you have real-time intelligence.

1699
01:02:10,240 --> 01:02:11,920
You will not win every single time

1700
01:02:11,920 --> 01:02:13,120
but you will win more often.

1701
01:02:13,120 --> 01:02:15,840
And in a competitive world, that is enough to be decisive.

1702
01:02:15,840 --> 01:02:17,720
When you operationalize this advantage,

1703
01:02:17,720 --> 01:02:20,920
it fundamentally changes how the market sees your brand.

1704
01:02:20,920 --> 01:02:22,600
You are not just telling people you are faster,

1705
01:02:22,600 --> 01:02:24,320
you are proving it by responding to feedback

1706
01:02:24,320 --> 01:02:25,680
in days instead of weeks.

1707
01:02:25,680 --> 01:02:27,320
You ship fixes for problems customers

1708
01:02:27,320 --> 01:02:29,720
just mentioned in conversation and address concerns

1709
01:02:29,720 --> 01:02:31,440
before they ever have a chance to escalate.

1710
01:02:31,440 --> 01:02:34,240
This is an operational reality that your customers feel

1711
01:02:34,240 --> 01:02:35,680
and it becomes a differentiator

1712
01:02:35,680 --> 01:02:37,000
that competitors cannot copy

1713
01:02:37,000 --> 01:02:39,600
without making the same massive architectural investments.

1714
01:02:39,600 --> 01:02:42,520
There is also a massive internal benefit to this kind of speed.

1715
01:02:42,520 --> 01:02:44,960
When information flows through a company in real time,

1716
01:02:44,960 --> 01:02:46,720
thousands of people can stay aligned

1717
01:02:46,720 --> 01:02:48,720
because everyone sees the same current state.

1718
01:02:48,720 --> 01:02:50,040
There is no longer a situation

1719
01:02:50,040 --> 01:02:52,240
where one team is living in yesterday's news

1720
01:02:52,240 --> 01:02:54,480
while another operates on today's reality.

1721
01:02:54,480 --> 01:02:55,680
Whether you are building products

1722
01:02:55,680 --> 01:02:57,320
or planning a long-term strategy,

1723
01:02:57,320 --> 01:02:59,800
you are always informed by the latest market signals

1724
01:02:59,800 --> 01:03:02,320
rather than stale analyses from last quarter.

1725
01:03:02,320 --> 01:03:04,720
This approach eliminates institutional amnesia,

1726
01:03:04,720 --> 01:03:06,200
which is that frustrating tendency

1727
01:03:06,200 --> 01:03:08,680
for companies to forget things they already learned.

1728
01:03:08,680 --> 01:03:10,360
We have all seen a customer problem get solved

1729
01:03:10,360 --> 01:03:12,360
once only to resurface three months later

1730
01:03:12,360 --> 01:03:15,480
because the new team does not realize a solution exists.

1731
01:03:15,480 --> 01:03:16,720
Internal processes break

1732
01:03:16,720 --> 01:03:18,640
because nobody remembers why they were designed

1733
01:03:18,640 --> 01:03:19,920
that way in the first place.

1734
01:03:19,920 --> 01:03:21,240
These are not failures of people,

1735
01:03:21,240 --> 01:03:23,080
they are failures of accessibility.

1736
01:03:23,080 --> 01:03:24,520
A company with real-time discovery

1737
01:03:24,520 --> 01:03:26,280
remembers everything automatically

1738
01:03:26,280 --> 01:03:28,040
because the knowledge is always there.

1739
01:03:28,040 --> 01:03:30,800
The risk of staying on your current path is existential

1740
01:03:30,800 --> 01:03:32,880
and that is the message the board needs to hear.

1741
01:03:32,880 --> 01:03:35,680
If your competitors move to real-time discovery while you wait,

1742
01:03:35,680 --> 01:03:36,680
you are not just slower,

1743
01:03:36,680 --> 01:03:38,720
you are actually operating on different information.

1744
01:03:38,720 --> 01:03:39,960
They are making better decisions

1745
01:03:39,960 --> 01:03:41,560
because their data is better

1746
01:03:41,560 --> 01:03:44,040
and their products will eventually reflect that gap.

1747
01:03:44,040 --> 01:03:46,360
Over time, what started as a small performance difference

1748
01:03:46,360 --> 01:03:48,240
widens into a structural disadvantage

1749
01:03:48,240 --> 01:03:50,080
that is almost impossible to close.

1750
01:03:50,080 --> 01:03:53,680
This is why the 2026 deadline is so important for your strategy.

1751
01:03:53,680 --> 01:03:55,920
It is the point where new compliance requirements

1752
01:03:55,920 --> 01:03:58,720
will force every organization to prove their data lineage

1753
01:03:58,720 --> 01:03:59,760
and governance.

1754
01:03:59,760 --> 01:04:02,440
The companies that have already built this infrastructure

1755
01:04:02,440 --> 01:04:04,720
will find compliance to be a simple checkbox

1756
01:04:04,720 --> 01:04:06,960
because they already have the logs and controls.

1757
01:04:06,960 --> 01:04:08,960
Organizations still running legacy crawls

1758
01:04:08,960 --> 01:04:11,520
will face a total crisis as they try to modernize

1759
01:04:11,520 --> 01:04:13,040
and meet regulations at the same time.

1760
01:04:13,040 --> 01:04:15,520
The real question for the board is not whether this investment

1761
01:04:15,520 --> 01:04:18,240
is worth it but whether the company can afford to wait.

1762
01:04:18,240 --> 01:04:19,320
Waiting is not a neutral move.

1763
01:04:19,320 --> 01:04:21,440
It is a choice to fall behind and cede your advantage

1764
01:04:21,440 --> 01:04:22,760
to those who are moving now.

1765
01:04:22,760 --> 01:04:25,200
You are essentially accumulating technical debt

1766
01:04:25,200 --> 01:04:26,720
that you will have to pay off later

1767
01:04:26,720 --> 01:04:28,600
while everyone else is already moving forward.

1768
01:04:28,600 --> 01:04:29,840
That is the strategic moat.

1769
01:04:29,840 --> 01:04:31,520
It is not just about the technology

1770
01:04:31,520 --> 01:04:34,240
but the organizational intelligence that flows from it.

1771
01:04:34,240 --> 01:04:37,000
That is the future worth building toward.

1772
01:04:37,000 --> 01:04:40,200
The semantic shift from navigation to context.

1773
01:04:40,200 --> 01:04:42,120
Everything we have talked about so far,

1774
01:04:42,120 --> 01:04:44,640
delta queries, security trimming, latency budgets,

1775
01:04:44,640 --> 01:04:46,480
and compliance: these are just tools.

1776
01:04:46,480 --> 01:04:48,680
They are the mechanics that allow for a much bigger shift

1777
01:04:48,680 --> 01:04:50,960
in how your organization actually functions.

1778
01:04:50,960 --> 01:04:53,000
This change isn't really about the technology itself.

1779
01:04:53,000 --> 01:04:54,960
It is about how work actually happens.

1780
01:04:54,960 --> 01:04:57,080
For decades, we built enterprise software

1781
01:04:57,080 --> 01:04:58,840
on one big assumption.

1782
01:04:58,840 --> 01:05:00,920
We assumed that work starts with navigation.

1783
01:05:00,920 --> 01:05:02,800
When you need information, you go find it.

1784
01:05:02,800 --> 01:05:05,440
You open SharePoint and click through folders.

1785
01:05:05,440 --> 01:05:07,600
You log into Salesforce and search for an account.

1786
01:05:07,600 --> 01:05:10,200
You dig through your inbox to find one specific message.

1787
01:05:10,200 --> 01:05:11,880
In this old model, the system is passive.

1788
01:05:11,880 --> 01:05:13,760
It just sits there and waits for you to show up.

1789
01:05:13,760 --> 01:05:15,720
You are the navigator and the burden is on you

1790
01:05:15,720 --> 01:05:18,400
to find the way. That model is dead.

1791
01:05:18,400 --> 01:05:20,480
Modern work does not start with navigation.

1792
01:05:20,480 --> 01:05:21,760
It starts with context.

1793
01:05:21,760 --> 01:05:23,080
You might be in the middle of a meeting

1794
01:05:23,080 --> 01:05:25,960
and need to know exactly what happened with a customer last month.

1795
01:05:25,960 --> 01:05:27,360
Maybe you are writing a proposal

1796
01:05:27,360 --> 01:05:29,440
and need the latest market data right now.

1797
01:05:29,440 --> 01:05:30,800
You could be debugging a system

1798
01:05:30,800 --> 01:05:33,160
and need to see every code change from the last hour.

1799
01:05:33,160 --> 01:05:34,760
In those moments, you do not want to stop

1800
01:05:34,760 --> 01:05:36,600
what you are doing to go on a scavenger hunt.

1801
01:05:36,600 --> 01:05:39,280
You want the information to show up where you already are.

1802
01:05:39,280 --> 01:05:40,440
You want the system to be active

1803
01:05:40,440 --> 01:05:42,040
so you can stay focused on the task.

1804
01:05:42,040 --> 01:05:44,600
This is the big inversion that real-time discovery

1805
01:05:44,600 --> 01:05:45,600
makes possible.

1806
01:05:45,600 --> 01:05:47,920
We are moving away from the idea of searching for things

1807
01:05:47,920 --> 01:05:48,840
when we need them.

1808
01:05:48,840 --> 01:05:50,840
Instead, the system sees what you are working on

1809
01:05:50,840 --> 01:05:52,160
and brings you what matters.

1810
01:05:52,160 --> 01:05:53,640
This shift changes everything.

1811
01:05:53,640 --> 01:05:56,120
It changes what your users expect from their tools

1812
01:05:56,120 --> 01:05:59,120
and it changes how you have to design your intelligence layer.

1813
01:05:59,120 --> 01:06:02,960
Legacy systems force every employee to act like an information architect.

1814
01:06:02,960 --> 01:06:04,480
They have to remember where data lives.

1815
01:06:04,480 --> 01:06:06,320
They have to guess which system to search first.

1816
01:06:06,320 --> 01:06:09,400
They have to build search queries that match how the computer thinks,

1817
01:06:09,400 --> 01:06:10,760
rather than how they think.

1818
01:06:10,760 --> 01:06:13,840
Real-time discovery gets rid of that requirement entirely.

1819
01:06:13,840 --> 01:06:15,960
The user just focuses on the goal

1820
01:06:15,960 --> 01:06:18,320
and the system figures out which data is relevant

1821
01:06:18,320 --> 01:06:20,000
and surfaces it automatically.

1822
01:06:20,000 --> 01:06:23,040
To make this work, you need infrastructure that understands relationships.

1823
01:06:23,040 --> 01:06:24,880
That is exactly what the Graph API is for.

1824
01:06:24,880 --> 01:06:26,320
It is not just another endpoint.

1825
01:06:26,320 --> 01:06:29,000
It is the structure that lets the system understand

1826
01:06:29,000 --> 01:06:32,240
that because you are in this specific meeting with these specific people,

1827
01:06:32,240 --> 01:06:34,640
these three documents are the only ones that matter right now.

1828
01:06:34,640 --> 01:06:37,800
The Graph is the model that makes context-driven work a reality.
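
To make that idea concrete, here is a minimal sketch of relationship-based ranking. It does not call Microsoft Graph. It assumes you have already pulled person-to-document relationships (who authored, edited, or was sent what) into a simple list of pairs. The function name, the sample people, and the file names are all hypothetical.

```python
from collections import Counter

def documents_for_meeting(attendees, person_doc_edges, limit=3):
    """Rank documents by how many meeting attendees are connected to them.

    person_doc_edges is an iterable of (person, document) pairs; this is an
    assumed, simplified stand-in for relationships a real graph would hold.
    """
    scores = Counter()
    for person, doc in set(person_doc_edges):  # set() ignores duplicate edges
        if person in attendees:
            scores[doc] += 1
    # Highest score first; break ties by name so the order is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc for doc, _ in ranked[:limit]]

# Hypothetical sample data.
edges = [
    ("ana", "q3-forecast.xlsx"), ("ben", "q3-forecast.xlsx"),
    ("cy", "q3-forecast.xlsx"), ("ana", "pricing.docx"),
    ("cy", "pricing.docx"), ("ben", "roadmap.pptx"),
    ("dee", "offsite.docx"),
]
top_docs = documents_for_meeting({"ana", "ben", "cy"}, edges)
print(top_docs)
```

A production system would weigh recency, permissions, and relationship type as well. The point of the sketch is only that the ranking comes from connections, not from keywords.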

1829
01:06:37,800 --> 01:06:40,480
By the year 2026, this will be the standard.

1830
01:06:40,480 --> 01:06:42,320
Users will not have the patience for systems

1831
01:06:42,320 --> 01:06:44,160
that force them to click through menus.

1832
01:06:44,160 --> 01:06:45,680
They will not accept a tool that claims

1833
01:06:45,680 --> 01:06:48,440
it doesn't know a piece of information is relevant to the task at hand.

1834
01:06:48,440 --> 01:06:52,080
They will expect their tools to understand the work and bring intelligence to them.

1835
01:06:52,080 --> 01:06:55,280
Organizations that fail to adapt will feel primitive and slow,

1836
01:06:55,280 --> 01:06:58,040
like they are trying to compete with one hand tied behind their back.

1837
01:06:58,040 --> 01:07:01,720
The problem is that your current system probably was not designed for this.

1838
01:07:01,720 --> 01:07:04,400
Most enterprise setups were built for structure and browsing.

1839
01:07:04,400 --> 01:07:07,520
Moving to real-time discovery is not just a small performance patch.

1840
01:07:07,520 --> 01:07:11,320
It is a total redesign of how information flows into the hands of your people.

1841
01:07:11,320 --> 01:07:14,320
It is the difference between a system that just stores data

1842
01:07:14,320 --> 01:07:16,560
and a system that actually understands work.

1843
01:07:16,560 --> 01:07:18,160
Concrete implementation plan.

1844
01:07:18,160 --> 01:07:21,960
This kind of transformation does not happen because of a slide deck or a grand strategy.

1845
01:07:21,960 --> 01:07:23,840
It happens through small concrete steps.

1846
01:07:23,840 --> 01:07:26,200
First, you need to audit your current state.

1847
01:07:26,200 --> 01:07:29,200
Start by measuring your search latency at the 95th percentile (P95)

1848
01:07:29,200 --> 01:07:31,280
to see the real user experience.

1849
01:07:31,280 --> 01:07:33,040
You also need to measure staleness,

1850
01:07:33,040 --> 01:07:37,480
which is the time it takes for a new document to actually show up in a search result.

1851
01:07:37,480 --> 01:07:41,880
Finally, look at your coverage to see what percentage of your company knowledge is even indexed.

1852
01:07:41,880 --> 01:07:45,040
These numbers will show you exactly where you are starting from.
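
The three audit numbers can be computed with very little code. The sketch below is one way to do it, assuming you can export query latencies from your search logs, a created and first-searchable timestamp for a test document, and a list of known versus indexed document IDs. All sample values and function names here are hypothetical.

```python
import math
from datetime import datetime

def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def staleness_seconds(created_at, first_searchable_at):
    """Time between a document being created and first appearing in results."""
    return (first_searchable_at - created_at).total_seconds()

def coverage_pct(indexed_ids, known_ids):
    """Share of known documents that are present in the index."""
    known = set(known_ids)
    if not known:
        return 0.0
    return 100.0 * len(known & set(indexed_ids)) / len(known)

# Hypothetical measurements.
latencies_ms = [120, 135, 150, 180, 210, 240, 300, 450, 900, 2400]
p95 = percentile(latencies_ms, 95)
lag = staleness_seconds(datetime(2026, 1, 5, 15, 0), datetime(2026, 1, 5, 15, 15))
cov = coverage_pct({"a", "b", "x"}, {"a", "b", "c", "d"})
print(p95, lag, cov)
```

Notice how the P95 is dominated by the slowest queries. That is why it reflects the real user experience better than an average does.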

1853
01:07:45,040 --> 01:07:46,520
Second, you need to run a pilot.

1854
01:07:46,520 --> 01:07:48,200
Do not try to fix everything at once.

1855
01:07:48,200 --> 01:07:53,200
Pick one high value data silo or one specific department that would benefit most from real-time discovery.

1856
01:07:53,200 --> 01:07:58,600
Build a delta query pipeline for that one area and get their documents flowing into search in seconds instead of hours.

1857
01:07:58,600 --> 01:08:02,040
Once that is running, measure the impact on their decision-making speed.

1858
01:08:02,040 --> 01:08:05,680
See if they actually use search more often because the results are finally fresh.
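
A pilot pipeline like this can start as a small loop. Microsoft Graph delta endpoints, for example `/drives/{drive-id}/root/delta`, return pages linked by `@odata.nextLink`, and the final page carries an `@odata.deltaLink` that you store and call next time to receive only the changes. The sketch below follows that pattern but leaves the HTTP call injectable, so it runs here against fake pages. The `fetch_page` callable, the in-memory `index`, and the sample items are assumptions, and a real version would perform an authenticated GET.

```python
def sync_once(fetch_page, start_url, apply_change):
    """Walk one delta round and return the deltaLink to use for the next round.

    fetch_page(url) must return the parsed JSON body of a delta response.
    """
    url = start_url
    while url:
        page = fetch_page(url)
        for item in page.get("value", []):
            apply_change(item)
        if "@odata.deltaLink" in page:
            return page["@odata.deltaLink"]
        url = page.get("@odata.nextLink")
    raise RuntimeError("delta response ended without @odata.deltaLink")

# Hypothetical in-memory search index keyed by item id.
index = {}

def apply_change(item):
    # Drive items signal deletion with a "deleted" facet; other Graph
    # resources use "@removed". Handle both in this sketch.
    if "deleted" in item or "@removed" in item:
        index.pop(item["id"], None)
    else:
        index[item["id"]] = item

# Fake pages standing in for real Graph responses.
fake_pages = {
    "start": {
        "value": [{"id": "1", "name": "plan.docx"}],
        "@odata.nextLink": "page2",
    },
    "page2": {
        "value": [{"id": "2", "name": "notes.docx"}, {"id": "1", "deleted": {}}],
        "@odata.deltaLink": "delta-token-1",
    },
}
next_link = sync_once(fake_pages.__getitem__, "start", apply_change)
print(next_link, sorted(index))
```

Persist the returned link between runs. The next call then starts from that link instead of the start URL, which is what keeps each round small and fast.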

1859
01:08:05,680 --> 01:08:08,280
Third, you have to map out your compliance needs.

1860
01:08:08,280 --> 01:08:10,400
We know that 2026 regulations are coming,

1861
01:08:10,400 --> 01:08:12,440
so you need to know which ones apply to you right now.

1862
01:08:12,440 --> 01:08:14,600
Figure out which data needs lineage tracking

1863
01:08:14,600 --> 01:08:17,320
and which regions have specific residency requirements.

1864
01:08:17,320 --> 01:08:22,360
If you build this map today, it will guide every architectural decision you make moving forward.
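
The compliance map itself can begin as a small, explicit data structure that you check your architecture against. This is only an illustrative sketch. The source names, regions, and the `SourcePolicy` type are hypothetical, and the actual requirements have to come from your own legal and compliance review.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SourcePolicy:
    source: str                      # hypothetical data source name
    needs_lineage: bool              # must every change be traceable to its origin?
    residency_region: Optional[str]  # None means no residency constraint

COMPLIANCE_MAP = [
    SourcePolicy("hr-sharepoint", needs_lineage=True, residency_region="eu"),
    SourcePolicy("sales-crm", needs_lineage=True, residency_region=None),
    SourcePolicy("team-wiki", needs_lineage=False, residency_region=None),
]

def residency_violations(policies, index_regions):
    """Compare where each source is indexed against where it is required to stay."""
    issues = []
    for policy in policies:
        actual = index_regions.get(policy.source)
        if policy.residency_region and actual != policy.residency_region:
            issues.append(
                f"{policy.source}: indexed in {actual}, required {policy.residency_region}"
            )
    return issues

# Hypothetical current deployment: everything is indexed in one US region.
issues = residency_violations(
    COMPLIANCE_MAP,
    {"hr-sharepoint": "us", "sales-crm": "us", "team-wiki": "us"},
)
lineage_sources = [p.source for p in COMPLIANCE_MAP if p.needs_lineage]
print(issues, lineage_sources)
```

Keeping the map in code or configuration means a check like this can run on every architecture change instead of once a year.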

1865
01:08:22,360 --> 01:08:23,960
The transformation confirmation.

1866
01:08:23,960 --> 01:08:28,520
Moving from old-school crawling to Graph API discovery isn't just a small update. In reality,

1867
01:08:28,520 --> 01:08:30,680
it's a total structural shift.

1868
01:08:30,680 --> 01:08:33,720
You are leaving behind periodic snapshots for live context.

1869
01:08:33,720 --> 01:08:36,600
You are moving from navigation first to context first.

1870
01:08:36,600 --> 01:08:38,120
The system stops being a library

1871
01:08:38,120 --> 01:08:41,080
you have to search and starts remembering your knowledge automatically.

1872
01:08:41,080 --> 01:08:42,200
But here is the problem.

1873
01:08:42,200 --> 01:08:44,240
The window to make this change is closing fast.

1874
01:08:44,240 --> 01:08:45,920
Your competitors are already moving,

1875
01:08:45,920 --> 01:08:49,520
and they will gain a structural advantage that only gets stronger over time.

1876
01:08:49,520 --> 01:08:53,440
By 2026, your compliance obligations will force you to modernize anyway.

1877
01:08:53,440 --> 01:08:56,000
It is much better to build this intentionally on your own timeline

1878
01:08:56,000 --> 01:08:58,360
than to scramble when new regulations demand it.

1879
01:08:58,360 --> 01:09:00,480
If this changed how you think about enterprise search,

1880
01:09:00,480 --> 01:09:02,760
follow me, Mirko Peters, on LinkedIn.

1881
01:09:02,760 --> 01:09:05,360
Let's talk about what this looks like in your organization.

Mirko Peters

Founder of m365.fm, m365.show and m365con.net

Mirko Peters is a Microsoft 365 expert, content creator, and founder of m365.fm, a platform dedicated to sharing practical insights on modern workplace technologies. His work focuses on Microsoft 365 governance, security, collaboration, and real-world implementation strategies.

Through his podcast and written content, Mirko provides hands-on guidance for IT professionals, architects, and business leaders navigating the complexities of Microsoft 365. He is known for translating complex topics into clear, actionable advice, often highlighting common mistakes and overlooked risks in real-world environments.

With a strong emphasis on community contribution and knowledge sharing, Mirko is actively building a platform that connects experts, shares experiences, and helps organizations get the most out of their Microsoft 365 investments.