Microsoft Cowork IQ Implementation: Architecting Scalable Knowledge Graphs for Modern Hybrid Workforces


In this episode of the Microsoft 365-focused podcast from m365.fm, the discussion explores how organizations can implement intelligent knowledge graph architectures to support modern hybrid work environments.
The episode examines how Microsoft technologies such as Microsoft Graph, Microsoft 365, AI-powered search, and Copilot-related capabilities help connect people, content, conversations, and business processes into a unified knowledge ecosystem. By building scalable knowledge graphs, organizations can improve information discovery, reduce data silos, and deliver more relevant insights to employees when they need them.
Key topics include data integration strategies, metadata management, governance, security, identity resolution, and ensuring that AI systems respect existing permissions and compliance requirements. The conversation also highlights practical business scenarios such as expertise discovery, employee onboarding, project collaboration, knowledge sharing, and organizational intelligence.
The episode emphasizes that successful knowledge graph implementations require more than technology alone. Strong governance, high-quality data, clear ownership models, and phased deployment strategies are critical for long-term success. Listeners gain insight into how enterprise architects and IT leaders can design scalable foundations that enable AI-driven productivity while maintaining trust, security, and operational control.
Overall, the episode provides a practical look at how knowledge graphs and AI can work together to create smarter, more connected digital workplaces for distributed and hybrid teams.
Microsoft Cowork IQ implementation changes how you approach hybrid work. You gain smarter access to knowledge and easier management across your enterprise. AI now enables context-driven retrieval, moving beyond traditional search. With scalable knowledge graphs, you break through information silos and unlock real business value. Microsoft leads this transformation, drawing on trillions of productivity signals and insights from 20,000 AI users worldwide.
| Key Insights | Data Points |
|---|---|
| Productivity Signals | Trillions of anonymized signals |
| AI User Surveys | 20,000 users, 10 countries |
| Organizational Impact | 2× impact over individual skills |
AI elevates your work by providing context-aware insights and supporting innovative business decisions.
Key Takeaways
- Microsoft Cowork IQ transforms hybrid work by providing smarter access to knowledge and easier management across your enterprise.
- AI-driven tools enhance real-time collaboration, automate workflows, and deliver contextual insights tailored to your organization’s needs.
- Scalable knowledge graphs unify data from various sources, breaking down information silos and improving data security.
- Expertise mapping helps you connect with colleagues who have the right skills, fostering collaboration and speeding up problem-solving.
- Predictive knowledge retrieval anticipates your needs, delivering relevant information before you even ask, saving you time.
- Seamless integration with Microsoft 365 tools ensures smooth workflows and secure data access across platforms.
- Regular monitoring and optimization of Cowork IQ implementation help maintain security, improve efficiency, and drive better business outcomes.
- Effective onboarding and support for teams enhance retention and productivity, making your hybrid work environment more effective.
What Is Microsoft Cowork IQ?
Purpose for Hybrid Teams
You face new challenges as hybrid work becomes the standard. Microsoft Cowork IQ helps you overcome these challenges by making knowledge access and collaboration seamless. This solution uses AI-driven tools to connect your team, no matter where you work. You can collaborate in real time, automate complex workflows, and receive insights that fit your organization’s needs.
Beyond these basics, the feature set includes passive observation in applications, shared cursor experiences for concurrent editing, and meeting facilitation that tracks viewpoints and summarizes discussions.
You benefit from features such as:
- Real-time collaboration through AI-powered tools.
- Automation of complex workflows across your favorite applications.
- Contextual insights that help you make better decisions.
- Shared cursor functionality for simultaneous editing.
- Meeting facilitation that tracks and summarizes discussions.
These capabilities ensure you always have the right information at your fingertips. You can focus on your work instead of searching for documents or waiting for updates. Cowork IQ brings your enterprise knowledge together, making teamwork smarter and easier.
Core Architecture Layers
Microsoft Cowork IQ stands out because of its robust architecture. You gain a solution built on three powerful layers that work together to deliver a smarter enterprise experience.
| Layer | Description |
|---|---|
| Data | Encompasses emails, files, meetings, and chats, representing how work is actually performed. |
| Memory | Captures individual styles, preferences, and workflows, providing a personalized work experience. |
| Inference | Connects data and memory to predict the next best actions, enhancing decision-making capabilities. |
Each layer plays a unique role:
- The Data layer gathers information from across your enterprise, including emails, files, meetings, and chats.
- The Memory layer learns your preferences and workflows, so your experience feels personal and efficient.
- The Inference layer uses AI to connect everything, predicting what you need next and helping you act faster.
Cowork IQ also offers features that set it apart from other solutions:
| Feature | Copilot | Cowork |
|---|---|---|
| Primary Role | Assists with tasks | Coordinates and executes tasks |
| Task complexity | Primarily single step | Multi-step workflows |
| Interaction patterns | Prompts to response | Prompts to plan to execution (with permissions) |
| Permissions & control | Operates within user permissions | Inherits permissions and executes actions within limits, requesting user approval for certain tasks |
| Autonomy | Low (assistive) | Moderate (permission-based executions) |
| Operational scope | Within individual applications | Across applications and workflows |
| Outcome | Content and insights | Completed actions and workflows |
You can plan and execute multi-step tasks, manage long-running workflows, and integrate data from across the Microsoft 365 ecosystem. This architecture ensures your enterprise knowledge is always accessible, actionable, and secure.
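The prompt-to-plan-to-execution pattern in the comparison above can be sketched as a small loop that runs each planned step only when it falls within the user's permissions, and pauses for approval on steps flagged as sensitive. This is an illustrative sketch only, not how Cowork is actually implemented: `Step`, `run_plan`, the permission strings, and the `approve` callback are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Step:
    """One action in a multi-step plan (hypothetical structure)."""
    name: str
    required_permission: str
    needs_approval: bool = False


def run_plan(
    steps: List[Step],
    user_permissions: Set[str],
    approve: Callable[[Step], bool],
) -> List[str]:
    """Walk the plan in order and return a log of what happened.

    A step is denied if the user lacks the permission it needs, and a
    sensitive step runs only when the approve callback returns True.
    """
    log = []
    for step in steps:
        if step.required_permission not in user_permissions:
            log.append(f"denied: {step.name}")
            continue
        if step.needs_approval and not approve(step):
            log.append(f"rejected: {step.name}")
            continue
        # A real agent would perform the action here; the sketch only logs it.
        log.append(f"done: {step.name}")
    return log


plan = [
    Step("summarize meeting notes", "files.read"),
    Step("send summary to team", "mail.send", needs_approval=True),
    Step("update finance record", "finance.write"),
]
result = run_plan(plan, {"files.read", "mail.send"}, approve=lambda step: True)
print(result)
```

The design point is that the agent never holds more rights than the user it acts for: permissions are checked before approval is even requested.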
Hybrid Workforce Challenges

Knowledge Fragmentation
You face a growing challenge as hybrid work becomes the standard. Information now lives in many places—emails, chats, cloud drives, and project management tools. This fragmentation makes it difficult to find what you need, when you need it. Experts have observed that while remote work brings flexibility, it also creates long-term issues for businesses. You may notice that your team feels less connected to the organization's culture. Collaboration weakens, and work processes become disrupted. You need the right tools to be effective, but scattered information slows you down.
Tip: Centralizing your knowledge sources helps reduce wasted time and ensures everyone stays aligned.
You may also struggle to coordinate resources, both at home and on-site. This lack of connection can lead to missed opportunities and slower decision-making. When knowledge is fragmented, you spend more time searching and less time doing meaningful work.
Collaboration Barriers
Hybrid teams often run into barriers that make collaboration harder. Communication challenges arise when you cannot meet face-to-face. Gaps in shared awareness can cause confusion. You might experience misalignment in timing and expectations, which hinders productivity. Meeting overload has become common: Microsoft's Work Trend Index reported a 252% increase in weekly meeting time for the average Teams user since February 2020. Too many meetings can lead to burnout and lower engagement.
- Communication tools can both help and limit your ability to connect.
- Inconsistent use of these tools leads to missed updates and uneven participation.
- Remote members may feel excluded from informal decisions, which reduces collaboration.
You need clear processes and reliable tools to keep everyone on the same page. Without them, your team risks falling behind and losing momentum in their work.
Information Silos
Information silos present another major challenge for hybrid organizations. Data silos are isolated collections of information that other teams or systems cannot access easily. This isolation blocks collaboration and unified analysis. A recent study by IDC shows that data silos cost the global economy $3.1 trillion each year. According to Experian, 40% of business-critical data remains trapped in silos. A Harvard Business Review survey found that 84% of executives report negative effects from these silos.
Note: Silos often result from mismatched business logic across departments, making data sets misaligned and hard to access.
When you cannot share or access important information, your team cannot work efficiently. Silos slow down innovation and make it harder to respond to new challenges. Breaking down these barriers is essential for smarter, more connected work.
Cowork IQ Implementation and Data Architecture
Scalable Knowledge Graphs
You need a solution that brings together data from every corner of your enterprise. Cowork IQ implementation uses scalable knowledge graphs to unify information across SharePoint, Teams, and OneDrive. These graphs connect emails, files, meetings, and chats, creating a single source of truth. You gain a unified experience that eliminates fragmentation and supports seamless collaboration.
- Cowork IQ integrates data from SharePoint, Teams, and OneDrive.
- Semantic understanding layers enhance retrieval, making information more accessible.
- Copilot leverages these graphs to provide context-driven insights.
- Custom toolsets execute agentic skills, allowing Copilot to observe, retrieve, reason, and execute tasks.
- Semantic layers capture procedural knowledge from business workflows.
You benefit from a system that understands the intent behind your actions. The knowledge graph adapts as your organization grows, ensuring scalability and flexibility. You can trust that your data security remains intact, even as you expand your cloud solutions.
Tip: Scalable knowledge graphs help you break down information silos and improve data security across your enterprise.
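To make the idea of a knowledge graph concrete, here is a minimal in-memory sketch that links meetings, files, and people with typed relationships and answers a two-hop question. It is a toy model under stated assumptions: the `KnowledgeGraph` class, the node identifiers, and the relation names are hypothetical and do not reflect how Cowork IQ or Microsoft Graph stores data.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class KnowledgeGraph:
    """A tiny in-memory graph of typed, directed relationships."""

    def __init__(self) -> None:
        # node id -> set of (relation, target node id)
        self._edges: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add(self, source: str, relation: str, target: str) -> None:
        self._edges[source].add((relation, target))

    def related(self, source: str, relation: str) -> List[str]:
        """Targets linked from source by relation, sorted for stable output."""
        edges = self._edges.get(source, set())
        return sorted(target for rel, target in edges if rel == relation)

    def two_hop(self, source: str, first: str, second: str) -> List[str]:
        """Follow relation `first`, then relation `second`, from source."""
        results: Set[str] = set()
        for middle in self.related(source, first):
            results.update(self.related(middle, second))
        return sorted(results)


graph = KnowledgeGraph()
graph.add("meeting:q3-review", "discussed", "file:roadmap.docx")
graph.add("meeting:q3-review", "discussed", "file:budget.xlsx")
graph.add("file:roadmap.docx", "authored_by", "person:ana")
graph.add("file:budget.xlsx", "authored_by", "person:ben")

# Who wrote the documents discussed in the Q3 review?
authors = graph.two_hop("meeting:q3-review", "discussed", "authored_by")
print(authors)
```

A question like "who wrote what we discussed in that meeting" is awkward for keyword search but becomes a simple traversal once the relationships are explicit.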
Microsoft Graph and Semantic Index
Microsoft Graph acts as the relationship and context engine within Cowork IQ implementation. You see connections between people, documents, and organizational relationships. The Semantic Index maps organizational data, enhancing search relevance and accuracy. Together, these components create a meaning-aware retrieval experience.
- Microsoft Graph connects elements within Microsoft 365, supporting context-driven data integration.
- Semantic Index improves search accuracy by mapping organizational boundaries and permission structures.
- AI interprets intent and organizational language, moving beyond simple keyword matching.
- Work IQ helps Copilot understand skill profiles and collaboration frequency.
You experience smarter search results that respect your enterprise’s security and data governance. The system reasons across relationships, delivering insights that matter. You can rely on AI to surface information that fits your needs, not just your search terms.
Note: Microsoft Graph and Semantic Index ensure that your data security and privacy are maintained while delivering relevant knowledge.
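For readers who want to query Microsoft 365 content programmatically, the closest publicly documented entry point is the Microsoft Search API in Microsoft Graph (`POST /search/query`). The sketch below only builds the request body and does not send it, since sending requires an access token and app registration that are out of scope here. `build_search_request` is a hypothetical helper, and nothing in this example assumes how the Semantic Index itself participates in ranking.

```python
import json
from typing import Any, Dict, List

# Microsoft Graph v1.0 search endpoint.
SEARCH_URL = "https://graph.microsoft.com/v1.0/search/query"


def build_search_request(
    query_string: str, entity_types: List[str], size: int = 10
) -> Dict[str, Any]:
    """Build the JSON body for a Microsoft Search API query (not sent here)."""
    if not query_string.strip():
        raise ValueError("query_string must not be empty")
    return {
        "requests": [
            {
                "entityTypes": entity_types,  # e.g. "driveItem" for files
                "query": {"queryString": query_string},
                "from": 0,
                "size": size,
            }
        ]
    }


body = build_search_request("onboarding checklist", ["driveItem"])
print(SEARCH_URL)
print(json.dumps(body, indent=2))
```

When called with delegated permissions, results are trimmed to what the signed-in user is allowed to see, which is the permission-respecting behavior this section describes.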
Metadata and Governance
Metadata forms the backbone of data security and governance in Cowork IQ implementation. You need high-quality metadata to ensure AI retrieves accurate and trustworthy information. Metadata tags describe data, track its origin, and enforce compliance rules. You gain control over sensitive information and maintain regulatory standards.
| Metadata Function | Benefit to You |
|---|---|
| Describes data | Improves retrieval accuracy |
| Tracks origin | Supports compliance and audit trails |
| Enforces permissions | Protects sensitive information |
| Supports governance | Maintains data security and privacy |
| Enables AI reasoning | Reduces errors and enhances trust |
You can configure metadata to match your enterprise’s needs. Governance policies ensure that only authorized users access sensitive data. AI systems rely on metadata to reason over organizational knowledge, reducing mistakes and supporting compliance.
- Metadata enhances retrieval accuracy and reduces errors.
- Governance enforces data security and regulatory compliance.
- Continuous monitoring ensures your enterprise stays protected.
Alert: Strong metadata and governance are essential for maintaining data security and supporting AI-driven solutions.
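The role metadata plays in enforcing permissions can be illustrated with a small filter that hides anything a user is not entitled to see before an AI system ever reasons over it. This is a simplified sketch under assumptions: the `Document` fields, the two sensitivity values, and the group names are hypothetical, and real deployments rely on the platform's own labels and access controls rather than application code like this.

```python
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet, List


@dataclass(frozen=True)
class Document:
    title: str
    source: str        # where the item came from (origin tracking)
    sensitivity: str   # hypothetical labels: "general" or "confidential"
    allowed_groups: FrozenSet[str] = frozenset()


def visible_documents(
    docs: List[Document], user_groups: AbstractSet[str]
) -> List[Document]:
    """Keep general documents plus any the user's groups are granted."""
    return [
        doc
        for doc in docs
        if doc.sensitivity == "general" or doc.allowed_groups & user_groups
    ]


catalog = [
    Document("Onboarding guide", "SharePoint", "general"),
    Document("Salary bands", "SharePoint", "confidential", frozenset({"hr"})),
    Document("Merger memo", "OneDrive", "confidential", frozenset({"legal", "executives"})),
]
titles = [doc.title for doc in visible_documents(catalog, {"hr", "engineering"})]
print(titles)
```

Filtering before retrieval, rather than after generation, is what keeps an assistant from leaking content it should never have read.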
Cowork IQ implementation empowers you to unify knowledge, protect your data, and drive smarter decisions. You gain a robust architecture that adapts to your enterprise’s needs, supports cloud solutions, and ensures data security at every step.
AI Intelligence Layer
Predictive Knowledge Retrieval
You experience a new level of efficiency with predictive knowledge retrieval. The AI intelligence layer in Cowork IQ uses advanced agents to anticipate your needs and deliver relevant information before you even ask. Work IQ interprets your intent and organizational language, so you find the right documents even if your phrasing differs. This approach personalizes support for every role and ensures you always access accurate data. Fabric IQ organizes business data into a unified semantic layer, which guarantees consistent definitions and metrics across all sources. You gain trust in insights and improve data governance. The system adapts to your workflow, making knowledge retrieval smarter and faster.
Tip: Predictive retrieval reduces wasted time and helps you focus on meaningful tasks.
Integration with Microsoft 365 Copilot
Cowork IQ offers seamless integration with Microsoft 365 Copilot, enhancing your productivity and workflow. You delegate tasks to agents on-the-go, allowing work to progress in the background. Built-in and custom skills streamline processes, so you apply consistent methods across projects. Deeper tool connections link Microsoft products and third-party solutions, creating a cohesive work environment. Integration capabilities ensure you move smoothly between applications, and agents execute tasks securely within your permission boundaries. You benefit from a unified experience that supports both cloud and on-premises data, maintaining security at every step.
- Seamless task delegation boosts productivity.
- Reusable skills improve efficiency.
- Deeper tool connections enhance your workflow.
- Integration supports secure data access across platforms.
Note: Integration with Copilot empowers you to automate complex workflows and maintain data security.
Context-Aware Insights
You receive context-aware insights that drive informed decisions. Cowork IQ securely accesses structured and unstructured data from Microsoft 365, Dynamics 365, Power Apps, and other connected systems. The AI intelligence layer reasons over data across your organization, providing insights tailored to your context. On company earnings calls, leaders have highlighted the importance of the data underneath Microsoft 365, which contains valuable information about people, relationships, projects, and communications. You can ingest business data from other systems using Copilot connectors, enabling agents to reason over data from Workday, Adobe, and hundreds of other sources. This capability enhances productivity and decision-making while maintaining strict security standards.
| Source | Data Type | Security Level |
|---|---|---|
| Microsoft 365 | Structured/Unstructured | High |
| Dynamics 365 | Structured | High |
| Power Apps | Structured | High |
| Workday/Adobe | Structured | High |
You trust that context-aware insights respect your privacy and security. The AI capabilities in Cowork IQ ensure your data remains protected, and agents operate within secure boundaries. You unlock the full potential of your enterprise data, driving innovation and collaboration.
Alert: Context-aware insights help you make faster, smarter decisions while maintaining data security.
Core Features of Cowork IQ

Real-Time Discovery
You gain immediate access to information with real-time discovery. Cowork IQ implementation uses knowledge graphs to connect data from emails, files, meetings, and chats. This feature helps users in hybrid teams retrieve information quickly, even when data is scattered across platforms. Entity resolution consolidates duplicate organizational concepts, so you find relevant data without confusion. The system shifts from reactive search to predictive intelligence, surfacing knowledge before you ask. You benefit from context-aware retrieval, which is essential for users who lack informal knowledge transfer in hybrid work environments.
- Real-time discovery enhances information retrieval for users.
- Knowledge graphs unify fragmented data sources.
- Entity resolution improves the quality of knowledge access.
- Predictive intelligence surfaces relevant data proactively.
- Context-aware retrieval supports users in hybrid teams.
Tip: Real-time discovery reduces wasted time and ensures users always have the right data at their fingertips.
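Entity resolution, mentioned above, is the step that recognizes several spellings as one organizational concept. The sketch below shows the simplest possible version, grouping mentions by a normalized key. It is an assumption-laden illustration: `normalize` and `resolve_entities` are hypothetical helpers, and production systems typically add fuzzy matching, aliases, and context rather than relying on string cleanup alone.

```python
import re
from collections import defaultdict
from typing import Dict, List


def normalize(name: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", name.lower())
    return " ".join(cleaned.split())


def resolve_entities(names: List[str]) -> Dict[str, List[str]]:
    """Group raw mentions that share the same normalized form."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return dict(groups)


mentions = ["Project Phoenix", "project-phoenix", "PROJECT  PHOENIX", "Project Atlas"]
resolved = resolve_entities(mentions)
for canonical, variants in resolved.items():
    print(canonical, "<-", variants)
```

Once the three "Phoenix" variants collapse into one node, everything attached to any of them becomes discoverable from a single search.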
Expertise Mapping
You identify and connect with experts across your organization using expertise mapping. Cowork IQ implementation analyzes data from productivity tools, meetings, and chats to build a dynamic map of skills and knowledge. Users can locate colleagues with specific expertise, which accelerates problem-solving and fosters collaboration. The system uses AI to track skill profiles and collaboration frequency, so you always know who can help with a task. Expertise mapping breaks down silos and supports users in finding the right resources.
| Feature | Benefit to Users |
|---|---|
| Skill profiles | Locate experts easily |
| Collaboration data | Identify frequent collaborators |
| AI-driven mapping | Connect users to resources |
You leverage expertise mapping to build stronger teams and improve work outcomes. Users gain access to knowledge that might otherwise remain hidden, which boosts productivity and innovation.
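One way to picture expertise mapping is as a scoring problem: each activity signal adds weight to a person's score for a topic, and the highest scores surface as likely experts. The following sketch assumes a made-up signal format and made-up weights in `SIGNAL_WEIGHTS`; it is not the scoring Cowork IQ uses, which the source does not describe.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical weights: authoring a document says more than a chat mention.
SIGNAL_WEIGHTS = {"authored_document": 3.0, "meeting": 1.0, "chat": 0.5}

# Each signal is (person, topic, kind of activity).
Signal = Tuple[str, str, str]


def rank_experts(signals: List[Signal], topic: str) -> List[Tuple[str, float]]:
    """Sum weighted signals per person for one topic, highest score first."""
    scores: Dict[str, float] = defaultdict(float)
    for person, signal_topic, kind in signals:
        if signal_topic == topic:
            scores[person] += SIGNAL_WEIGHTS.get(kind, 0.0)
    # Sort by descending score, then by name so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


signals = [
    ("ana", "power bi", "authored_document"),
    ("ana", "power bi", "chat"),
    ("ben", "power bi", "meeting"),
    ("ben", "sharepoint", "authored_document"),
    ("cy", "power bi", "chat"),
]
ranking = rank_experts(signals, "power bi")
print(ranking)
```

Any real implementation would also need to respect privacy settings and decay old signals over time, neither of which is modeled here.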
Seamless Integration
You experience seamless integration with Microsoft 365 tools and other productivity tools. Cowork IQ implementation connects data across SharePoint, Teams, OneDrive, and third-party tools. Users move smoothly between applications, and agents execute tasks securely within permission boundaries. Integration capabilities ensure data security and compliance, so users trust the system with sensitive information. The platform supports both cloud and on-premises data, maintaining high security standards.
- Integration links Microsoft 365 tools and productivity tools.
- Secure execution protects user data.
- Compliance features support regulatory requirements.
- Users access data across platforms without barriers.
- Integration capabilities enhance workflow efficiency.
Note: Seamless integration empowers users to automate workflows, maintain data security, and improve collaboration.
Cowork IQ implementation delivers capabilities that make hybrid work smarter and easier. You benefit from real-time discovery, expertise mapping, and integration with productivity tools. The platform ensures data security, supports users, and drives innovation across your organization.
Implementation Steps
Assess Needs and Data Sources
You begin your Cowork IQ implementation by assessing your organization’s needs and identifying all relevant data sources. Start by evaluating the data layer, which provides secure access to structured and unstructured data from Microsoft 365, Dynamics 365, and Power Apps. Understanding Microsoft Graph data is essential because it includes collaboration and communication patterns across your Microsoft 365 tenant. Integrate Dynamics 365 and Power Apps data through Dataverse to connect productivity data with systems-of-record data. This integration enhances decision-making and ensures your workflows run smoothly.
You can extend your data sources by using Copilot connectors, which allow you to incorporate external business data from non-Microsoft systems.
Assess the people side as well. Survey employees about job friction and address risk factors that impact productivity. Track employee feedback and act on it to improve the hybrid work environment. Ensure technological accessibility for remote employees, including accommodations and reliable internet access. Set realistic expectations for employees and offer flexibility in work hours and locations. Build trust by providing connection opportunities, offering mental health services, and celebrating efforts.
Tip: Make work personally and professionally meaningful by supporting individual interests and encouraging professional development.
Configure Knowledge Graphs
Once you have assessed your needs and data sources, configure your knowledge graphs to unify information across platforms. Knowledge graphs connect emails, files, meetings, and chats, creating a single source of truth for your organization. Use semantic layers to enhance retrieval, making data more accessible and actionable. Entity resolution consolidates duplicate organizational concepts, so you find relevant data without confusion. Configure metadata tags to describe data, track its origin, and enforce compliance rules. This process ensures that sensitive information remains protected and that your workflows are governed by strong security standards.
You can customize your knowledge graphs to match your enterprise’s needs. Governance policies ensure only authorized users access sensitive data. AI systems rely on metadata to reason over organizational knowledge, reducing mistakes and supporting compliance. Continuous monitoring keeps your enterprise protected and adapts as your organization grows.
Onboard Teams
Onboarding your teams is a critical step in Cowork IQ implementation. Facilitate introductions to cross-functional teammates to help new hires feel part of the big picture. Assign a peer or onboarding buddy to provide guidance and support. Use ice-breakers, welcome sessions, and one-on-one check-ins to incorporate new team members. Design the first day with connection in mind, such as a kickoff video call or a welcome message from leadership.
Start the onboarding process before the new employee’s first day. Integrate company culture into the onboarding process and use mentorship programs to help new hires adapt quickly. Pre-boarding allows engagement to start as soon as the job offer is accepted. Create a resource hub so new employees can find data and information quickly. These strategies improve retention and boost productivity, making your workflows more efficient.
Alert: Effective onboarding helps new hires integrate into your company, improves retention, and accelerates productivity.
Monitor and Optimize
You need to monitor and optimize your Cowork IQ implementation to ensure your organization gets the most value. Start by tracking how your teams use the system. Log meaningful events, such as when users access important data or complete key workflows. Use correlation IDs to connect actions across different platforms. This approach helps you understand how information flows and where you can improve efficiency.
Set up alerts for unusual activity or potential issues. These alerts keep you informed and allow you to respond quickly. Review dashboards as part of your regular operations. Dashboards give you a clear view of how your data moves through the system and how well your workflows perform. You can spot trends, identify bottlenecks, and make informed decisions.
Tip: Regular monitoring helps you catch problems early and keeps your data secure.
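The correlation-ID technique described above can be sketched in a few lines: the first event in a user journey mints an ID, and every later event in that journey carries the same ID so the steps can be stitched together across platforms. The event fields and the `make_event` and `trace` helpers are hypothetical, chosen only for illustration.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def make_event(
    action: str, platform: str, correlation_id: Optional[str] = None
) -> Dict[str, Any]:
    """Create a structured log event, minting a correlation ID if none is given."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "platform": platform,
        "action": action,
    }


def trace(events: List[Dict[str, Any]], correlation_id: str) -> List[str]:
    """Return the actions that belong to one correlated journey, in log order."""
    return [e["action"] for e in events if e["correlation_id"] == correlation_id]


first = make_event("open_document", "SharePoint")
events = [
    first,
    make_event("share_link", "Teams", first["correlation_id"]),
    make_event("unrelated_search", "OneDrive"),
]
for event in events:
    print(json.dumps(event))

journey = trace(events, first["correlation_id"])
```

Emitting events as JSON lines keeps them easy to ship to whatever dashboard or alerting tool you already use.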
You should also focus on building organizational maturity and trust in AI. Make sure your data is accurate and well-structured. High-quality data supports better decision-making and improves the performance of your AI tools. Document and standardize your workflows so everyone follows the same process. Clear governance over agent capabilities ensures only authorized users can create or manage agents. Plan your licensing and cost strategy to control expenses and maximize value.
| Strategy | Description |
|---|---|
| Organizational maturity and trust in AI | Build trust and readiness for AI-based systems. |
| Quality, well-structured data | Ensure your data is accurate and organized. |
| Defined and stable processes | Document and standardize your workflows. |
| Clear governance over agent capabilities | Set rules for who can create and manage agents. |
| Licensing and cost strategy | Plan your resources to control costs and maximize value. |
Cowork IQ creates a collaborative workspace for your teams. You can review documents, approve transactions, and discuss data together in real time. Shared approvals mean you do not wait for emails or chase down signatures. Everyone sees the same information, so you make decisions faster and with more confidence. The system automates approval workflows by routing documents to the right people. It tracks every action, ensuring transparency and building trust across your organization.
To optimize your Cowork IQ environment, analyze usage patterns and feedback. Adjust your data sources and workflows as your business evolves. Train your teams on new features and best practices. Continuous improvement keeps your system aligned with your goals and ensures you get the most from your data.
Note: Monitoring and optimizing Cowork IQ helps you maintain security, improve efficiency, and drive better business outcomes.
Business Impact
Productivity Gains
You see measurable productivity gains when you implement Cowork IQ in your enterprise. The shift from AI as an assistant to AI as a digital colleague changes how you approach work: Copilot Cowork plans and executes tasks, so you can focus on judgment and value creation. Nearly half of Microsoft 365 Copilot conversations support cognitive work such as analysis and problem-solving. AI also changes work design at the organizational level, not just for individual tasks, as teams coordinate work with less friction. Integration with productivity tools automates workflows and accelerates insight generation, which matters especially in regulated environments. Human judgment remains important, with users emphasizing quality control of AI outputs. The net effect is less time on repetitive tasks and more time on strategic business activities.
Tip: Use Cowork IQ to streamline your daily workflows and unlock higher-value business opportunities.
| Key Finding | Description |
|---|---|
| AI's Role in Work Design | AI changes how you design work, enhancing productivity across the enterprise. |
| Cognitive Work Support | 49% of Copilot conversations support analysis and problem-solving. |
| Importance of Human Judgment | 50% of users focus on quality control in AI outputs. |
| Copilot Cowork's Impact | Teams coordinate work and reduce friction, leading to measurable outcomes. |
Improved Collaboration
You improve collaboration across your enterprise with Cowork IQ. You connect users from different departments, breaking down silos and enabling seamless integration of data and tools. You use solutions that unify communication channels, making it easier for users to share information and coordinate tasks. You rely on context-aware capabilities that surface relevant data and insights for users in real time. You access business data securely, ensuring data security and compliance at every step. You use integration features to link productivity tools, allowing users to move smoothly between applications. You build stronger teams by mapping expertise and connecting users to resources. You foster a culture of collaboration where users share knowledge and drive innovation. You trust that your data security remains intact as you collaborate across platforms.
Note: Enhanced collaboration leads to faster decision-making and improved business outcomes for your enterprise.
Case Studies
You see real-world results when you use Cowork IQ in enterprise settings. You use tools that analyze CRM data to prioritize strategic accounts, reducing the time and effort required for sales leaders. You consolidate inputs from meetings and communications to create annual plans and presentations, streamlining the planning process. You generate personalized daily dashboards that compile essential business data for executives, improving productivity and providing a clear overview of tasks and priorities. You rely on integration capabilities to connect productivity tools and ensure data security. You use solutions that support users in regulated environments, enabling automation and insight acceleration.
| Use Case | Description |
|---|---|
| Strategic Account Prioritization | Sales leaders identify top accounts by analyzing CRM data and generating materials in minutes, saving time and effort. |
| Annual Planning and OKR Development | Cowork IQ consolidates meeting inputs to create coherent annual plans, allowing leaders to refine outputs efficiently. |
| Daily Executive Dashboard | Personalized dashboards compile essential business data, enhancing productivity and providing a comprehensive overview. |
You unlock the full potential of your enterprise data with Cowork IQ. You drive productivity, improve collaboration, and achieve measurable business outcomes. You trust that your data security and integration capabilities support users at every level.
You unlock smarter hybrid work with Cowork IQ implementation. AI connects your knowledge, drives productivity, and supports seamless collaboration. You gain strong governance that protects your business data. Real-time discovery and expertise mapping help you work faster and smarter. You boost productivity by automating tasks and surfacing insights. Explore Cowork IQ implementation to transform your work. Request a demo or consult Microsoft experts to start your journey.
- Request a demo for hands-on experience.
- Connect with Microsoft experts for tailored solutions.
FAQ
What are autonomous agents in Cowork IQ?
You use autonomous agents to complete tasks without constant supervision. These agents analyze your data, make decisions, and execute actions. They help you automate repetitive work and improve efficiency. Autonomous agents support your enterprise AI roadmap by enabling smarter, faster business processes.
How does Cowork IQ support deep integration with Microsoft 365?
You benefit from deep integration because Cowork IQ connects seamlessly with Microsoft 365 tools. This integration lets you access data, automate workflows, and use AI agents across SharePoint, Teams, and OneDrive. You experience a unified workspace that boosts productivity and collaboration.
Can I use Cowork IQ for enterprise use in the cloud?
You can deploy Cowork IQ for enterprise use in the cloud. This approach gives you secure access to your data from anywhere. You manage your workflows, automate tasks, and use AI tools for business while maintaining compliance and data protection.
How do agents improve automation in hybrid work?
Agents handle routine tasks and automate complex workflows. You assign tasks to agents, and they execute them autonomously. This automation reduces manual effort and lets you focus on higher-value work. Agents help you streamline operations and drive better business outcomes.
What is the role of autonomous AI in Cowork IQ?
Autonomous AI powers agents that learn from your actions and adapt to your needs. You rely on these systems to make decisions, retrieve information, and execute tasks. Autonomous AI increases efficiency and supports your organization’s digital transformation.
How does Cowork IQ fit into my enterprise AI roadmap?
You use Cowork IQ as a foundation for your enterprise AI roadmap. The platform provides scalable knowledge graphs, autonomous agents, and automation tools. These features help you build a future-ready organization that leverages AI for smarter decision-making and innovation.
Are AI tools for business secure in Cowork IQ?
You trust Cowork IQ to protect your data. The platform uses advanced security measures and governance policies. You control access, monitor activity, and ensure compliance. AI tools for business operate within strict boundaries, keeping your information safe.
1
00:00:00,000 --> 00:00:02,440
Most organizations think they have an AI problem.
2
00:00:02,440 --> 00:00:04,040
They don't. They have an architecture problem.
3
00:00:04,040 --> 00:00:05,760
The Copilot demo looked incredible.
4
00:00:05,760 --> 00:00:08,760
The procurement team signed off, IT rolled it out,
5
00:00:08,760 --> 00:00:11,760
and then it felt like a slightly smarter search bar.
6
00:00:11,760 --> 00:00:15,120
Answers were vague, citations pointed to three-year-old drafts.
7
00:00:15,120 --> 00:00:17,720
Sensitive information surfaced in places it shouldn't,
8
00:00:17,720 --> 00:00:19,240
and the business went back to doing things
9
00:00:19,240 --> 00:00:20,440
the way they always had.
10
00:00:20,440 --> 00:00:21,920
That's not a Copilot failure.
11
00:00:21,920 --> 00:00:24,600
That's what happens when you deploy a 2026 AI capability
12
00:00:24,600 --> 00:00:26,480
on top of a 2010 knowledge architecture
13
00:00:26,480 --> 00:00:28,200
and expect the model to sort it out.
14
00:00:28,200 --> 00:00:29,440
Today we're going to fix that.
15
00:00:29,440 --> 00:00:32,080
We're going to walk through how to architect a scalable knowledge graph
16
00:00:32,080 --> 00:00:35,640
that turns Microsoft Copilot IQ from a glorified search experience
17
00:00:35,640 --> 00:00:37,320
into something that actually reasons
18
00:00:37,320 --> 00:00:40,400
over your organization's knowledge, anticipates context,
19
00:00:40,400 --> 00:00:42,400
and gives you answers you can trust.
20
00:00:42,400 --> 00:00:44,600
If you're an IT pro or a decision maker,
21
00:00:44,600 --> 00:00:47,440
trying to figure out where Copilot actually fits in your organization,
22
00:00:47,440 --> 00:00:49,760
subscribe to m365.fm.
23
00:00:49,760 --> 00:00:52,400
Every episode is built around exactly this problem.
24
00:00:52,400 --> 00:00:54,160
Now, before we get to the architecture,
25
00:00:54,160 --> 00:00:57,480
we need to understand why the current model is broken.
26
00:00:57,480 --> 00:00:59,560
The 2026 information problem.
27
00:00:59,560 --> 00:01:01,120
Here's a number that should bother you.
28
00:01:01,120 --> 00:01:04,200
Knowledge workers still spend somewhere between 15 and 30%
29
00:01:04,200 --> 00:01:06,920
of their working week searching for information they already own.
30
00:01:06,920 --> 00:01:08,520
Not information that doesn't exist,
31
00:01:08,520 --> 00:01:10,840
not information stored somewhere inaccessible.
32
00:01:10,840 --> 00:01:13,240
Information the organization paid to create
33
00:01:13,240 --> 00:01:14,840
that lives inside its own systems
34
00:01:14,840 --> 00:01:17,160
that nobody can reliably find when they need it.
35
00:01:17,160 --> 00:01:18,520
And this isn't a new problem.
36
00:01:18,520 --> 00:01:20,360
It predates AI, predates cloud collaboration,
37
00:01:20,360 --> 00:01:22,560
predates Microsoft 365 entirely,
38
00:01:22,560 --> 00:01:23,880
but hybrid work didn't solve it.
39
00:01:23,880 --> 00:01:25,400
It made it structurally worse.
40
00:01:25,400 --> 00:01:26,480
Think about what used to happen
41
00:01:26,480 --> 00:01:28,160
when someone needed a quick answer.
42
00:01:28,160 --> 00:01:29,480
They walked down the hallway,
43
00:01:29,480 --> 00:01:31,080
caught the right person at their desk,
44
00:01:31,080 --> 00:01:32,840
and had it resolved in 30 seconds.
45
00:01:32,840 --> 00:01:35,000
That interaction was invisible to any system.
46
00:01:35,000 --> 00:01:36,760
It didn't generate a document or a ticket.
47
00:01:36,760 --> 00:01:39,320
It was just how organizations actually transferred knowledge.
48
00:01:39,320 --> 00:01:41,040
Hybrid work eliminated the hallway.
49
00:01:41,040 --> 00:01:43,800
The 30-second conversation became three Slack threads,
50
00:01:43,800 --> 00:01:45,920
a Teams meeting scheduled for next Thursday,
51
00:01:45,920 --> 00:01:48,200
and a document that may or may not reflect the decision
52
00:01:48,200 --> 00:01:49,520
that was actually made.
53
00:01:49,520 --> 00:01:52,560
The tools got significantly better over the last decade.
54
00:01:52,560 --> 00:01:54,680
Teams replaced email for a lot of communication.
55
00:01:54,680 --> 00:01:56,400
SharePoint got modern interfaces,
56
00:01:56,400 --> 00:01:58,360
OneDrive made file access frictionless.
57
00:01:58,360 --> 00:02:01,040
But the underlying model for how knowledge is organized
58
00:02:01,040 --> 00:02:02,040
didn't change at all.
59
00:02:02,040 --> 00:02:04,480
We took the filing cabinet metaphor and made it digital.
60
00:02:04,480 --> 00:02:07,040
Folders became libraries, drawers became sites,
61
00:02:07,040 --> 00:02:09,320
and we called it a knowledge management strategy.
62
00:02:09,320 --> 00:02:10,840
So now you have an organization
63
00:02:10,840 --> 00:02:14,800
where 60 to 73% of enterprise data is never analyzed
64
00:02:14,800 --> 00:02:16,360
according to industry estimates.
65
00:02:16,360 --> 00:02:18,880
Not because it's locked away behind access controls,
66
00:02:18,880 --> 00:02:20,240
because it's undiscoverable,
67
00:02:20,240 --> 00:02:21,960
because it's buried in a SharePoint site
68
00:02:21,960 --> 00:02:23,840
nobody remembered to name properly,
69
00:02:23,840 --> 00:02:26,240
tagged with metadata that was never standardized,
70
00:02:26,240 --> 00:02:28,120
connected to nothing else in the tenant.
71
00:02:28,120 --> 00:02:29,600
And then we point Copilot at it
72
00:02:29,600 --> 00:02:31,760
and wonder why it produces shallow answers.
73
00:02:31,760 --> 00:02:34,440
The gap isn't compute power, it's not model quality.
74
00:02:34,440 --> 00:02:36,560
The large language models powering these tools
75
00:02:36,560 --> 00:02:38,200
are genuinely sophisticated.
76
00:02:38,200 --> 00:02:40,840
The gap is the absence of a semantic layer,
77
00:02:40,840 --> 00:02:43,280
a structure that gives the AI something coherent
78
00:02:43,280 --> 00:02:45,960
to reason over instead of a pile of unrelated files
79
00:02:45,960 --> 00:02:47,600
and disconnected conversations.
80
00:02:47,600 --> 00:02:49,120
What does that mean in practice?
81
00:02:49,120 --> 00:02:50,840
It means that when someone asks Copilot
82
00:02:50,840 --> 00:02:52,400
a question about a customer account,
83
00:02:52,400 --> 00:02:54,240
the model is searching across content
84
00:02:54,240 --> 00:02:56,920
it can't properly interpret, pulling context from documents
85
00:02:56,920 --> 00:02:58,920
it can't rank by authority or recency,
86
00:02:58,920 --> 00:03:01,600
and generating a response without any reliable grounding
87
00:03:01,600 --> 00:03:03,640
in what your organization actually knows.
88
00:03:03,640 --> 00:03:06,920
The model fills gaps with plausible language.
89
00:03:06,920 --> 00:03:09,280
It sounds confident. It's often partially wrong.
90
00:03:09,280 --> 00:03:10,480
And because it sounds confident,
91
00:03:10,480 --> 00:03:12,200
people either trust it when they shouldn't
92
00:03:12,200 --> 00:03:13,840
or they stop trusting it entirely
93
00:03:13,840 --> 00:03:15,400
and go back to manual search.
94
00:03:15,400 --> 00:03:18,000
Both outcomes destroy the ROI case.
95
00:03:18,000 --> 00:03:19,880
The reason this matters so much in 2026
96
00:03:19,880 --> 00:03:22,280
specifically is that the expectation has shifted.
97
00:03:22,280 --> 00:03:24,840
Generative AI raised the bar for what enterprise knowledge
98
00:03:24,840 --> 00:03:26,360
tools are supposed to do.
99
00:03:26,360 --> 00:03:29,080
Users no longer expect to navigate, they expect answers.
100
00:03:29,080 --> 00:03:30,960
And the only way to deliver reliable answers
101
00:03:30,960 --> 00:03:34,240
is to build the semantic layer the AI needs to reason over.
102
00:03:34,240 --> 00:03:36,080
That layer starts with Microsoft Graph
103
00:03:36,080 --> 00:03:37,840
and understanding what Graph actually is,
104
00:03:37,840 --> 00:03:40,000
separate from what most people assume it is,
105
00:03:40,000 --> 00:03:43,040
changes everything about how you approach the architecture.
106
00:03:43,040 --> 00:03:45,160
What Microsoft Graph actually is.
107
00:03:45,160 --> 00:03:47,560
Most people hear Microsoft Graph and think search,
108
00:03:47,560 --> 00:03:49,240
that's the first thing to unlearn.
109
00:03:49,240 --> 00:03:51,800
Graph is not a search engine, it's not a content repository,
110
00:03:51,800 --> 00:03:54,560
it's the unified API surface that exposes relationships
111
00:03:54,560 --> 00:03:57,000
across every Microsoft 365 workload,
112
00:03:57,000 --> 00:04:00,000
SharePoint, OneDrive, Teams, Exchange, Entra ID,
113
00:04:00,000 --> 00:04:02,840
Planner, Viva, and every connected service
114
00:04:02,840 --> 00:04:04,400
running on top of that stack.
115
00:04:04,400 --> 00:04:06,080
What makes it different from a search index
116
00:04:06,080 --> 00:04:08,760
is what it returns. A search engine returns documents;
117
00:04:08,760 --> 00:04:10,080
Graph returns context.
118
00:04:10,080 --> 00:04:12,400
When you query Microsoft Graph, you don't just get a file.
119
00:04:12,400 --> 00:04:14,080
You get the file, the person who created it,
120
00:04:14,080 --> 00:04:16,520
the people who've accessed it, the meetings where it was discussed,
121
00:04:16,520 --> 00:04:18,600
the permissions controlling who can see it,
122
00:04:18,600 --> 00:04:20,080
and the organizational relationships
123
00:04:20,080 --> 00:04:22,240
that connect all of those things.
124
00:04:22,240 --> 00:04:25,320
That context is what the semantic index for Copilot is built on,
125
00:04:25,320 --> 00:04:27,280
and that's why the quality of your graph data
126
00:04:27,280 --> 00:04:31,080
determines the quality of every AI answer your organization gets.
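As a rough illustration of the relationship data just described, the sketch below builds (but does not send) Microsoft Graph v1.0 request URLs for a single file: one for the item's metadata, including who created it, and one for its permissions. The drive and item IDs are placeholders, the helper name `graph_urls_for_item` is hypothetical, and a real call would also need an access token with suitable Graph permissions.

```python
from urllib.parse import quote

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"


def graph_urls_for_item(drive_id: str, item_id: str) -> dict[str, str]:
    """Return Graph request URLs for a file and its permissions (hypothetical helper)."""
    base = f"{GRAPH_ROOT}/drives/{quote(drive_id, safe='')}/items/{quote(item_id, safe='')}"
    return {
        # The driveItem resource carries createdBy and lastModifiedBy identity sets.
        "item": f"{base}?$select=id,name,createdBy,lastModifiedBy,lastModifiedDateTime",
        # The permissions relationship lists who the item is shared with.
        "permissions": f"{base}/permissions",
    }


if __name__ == "__main__":
    # DRIVE_ID and ITEM_ID are placeholders, not real identifiers.
    for name, url in graph_urls_for_item("DRIVE_ID", "ITEM_ID").items():
        print(f"GET {url}  # {name}")
```

The point of the sketch is only that the file, its author, and its permissions are separate but linked resources, which is the context the episode is referring to.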
127
00:04:31,080 --> 00:04:32,600
To understand how Copilot actually works,
128
00:04:32,600 --> 00:04:34,680
you need to see the three-layer stack it operates on.
129
00:04:34,680 --> 00:04:36,720
These layers are converging in 2026,
130
00:04:36,720 --> 00:04:39,160
and most organizations are only using one of them.
131
00:04:39,160 --> 00:04:41,400
The first layer is Microsoft Graph itself,
132
00:04:41,400 --> 00:04:43,120
the data and relationship plane.
133
00:04:43,120 --> 00:04:44,600
This is the raw API surface.
134
00:04:44,600 --> 00:04:46,920
It knows who works with whom, what content exists,
135
00:04:46,920 --> 00:04:49,520
what permissions apply, and how everything is connected.
136
00:04:49,520 --> 00:04:51,840
It's the foundation; without it, nothing else functions,
137
00:04:51,840 --> 00:04:53,960
but Graph alone doesn't give you intelligence.
138
00:04:53,960 --> 00:04:55,160
It gives you access.
139
00:04:55,160 --> 00:04:57,560
The second layer is the semantic index for Copilot,
140
00:04:57,560 --> 00:04:58,880
and this is where things get interesting.
141
00:04:58,880 --> 00:05:01,640
The semantic index builds vector and graph-based representations
142
00:05:01,640 --> 00:05:02,880
of your tenant content.
143
00:05:02,880 --> 00:05:05,080
It's not indexing files in the traditional sense.
144
00:05:05,080 --> 00:05:08,560
It's encoding meaning, topics, relationships, concepts,
145
00:05:08,560 --> 00:05:10,120
and their proximity to each other,
146
00:05:10,120 --> 00:05:13,440
so that Copilot can retrieve content based on what a question means,
147
00:05:13,440 --> 00:05:15,520
rather than which keywords it contains.
148
00:05:15,520 --> 00:05:18,120
When someone asks about competitive positioning for a deal,
149
00:05:18,120 --> 00:05:20,600
the semantic index doesn't just match those words.
150
00:05:20,600 --> 00:05:22,480
It traverses the graph to find related emails,
151
00:05:22,480 --> 00:05:23,960
documents, meetings, and people,
152
00:05:23,960 --> 00:05:26,480
and surfaces the most contextually relevant content
153
00:05:26,480 --> 00:05:29,040
it can find within that user's permissions.
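To make "retrieve by meaning, within permissions" concrete, here is a toy sketch. It is not how the semantic index is implemented: the three-dimensional vectors, document names, and group names are all invented for illustration. It trims candidates by the user's group membership first, then ranks what remains by cosine similarity to the query.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    vector: tuple[float, ...]       # stand-in for a learned embedding
    allowed_groups: frozenset[str]  # groups permitted to read the document


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, docs, user_groups, top_k=2):
    # Security trimming happens before ranking: unreadable docs never compete.
    visible = [d for d in docs if d.allowed_groups & user_groups]
    ranked = sorted(visible, key=lambda d: cosine(query_vec, d.vector), reverse=True)
    return [d.name for d in ranked[:top_k]]


DOCS = [
    Doc("competitor-battlecard.docx", (0.9, 0.1, 0.0), frozenset({"sales"})),
    Doc("deal-review-notes.docx", (0.7, 0.3, 0.1), frozenset({"sales", "finance"})),
    Doc("hr-compensation.xlsx", (0.8, 0.2, 0.0), frozenset({"hr"})),
    Doc("cafeteria-menu.pdf", (0.0, 0.1, 0.9), frozenset({"sales", "hr", "finance"})),
]

if __name__ == "__main__":
    query = (1.0, 0.2, 0.0)  # pretend embedding of "competitive positioning for a deal"
    print(retrieve(query, DOCS, user_groups=frozenset({"sales"})))
```

Note that the HR spreadsheet is semantically close to the query in this toy data, yet a sales user never sees it, because it is removed before ranking rather than after.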
154
00:05:29,040 --> 00:05:31,160
The third layer is Fabric semantic models,
155
00:05:31,160 --> 00:05:32,440
the business logic layer.
156
00:05:32,440 --> 00:05:34,000
This is where your structured data
157
00:05:34,000 --> 00:05:36,040
gets a business-friendly translation.
158
00:05:36,040 --> 00:05:37,600
Instead of raw tables and columns,
159
00:05:37,600 --> 00:05:40,560
you get measures, relationships, hierarchies, and definitions
160
00:05:40,560 --> 00:05:42,400
that reflect how your organization actually
161
00:05:42,400 --> 00:05:43,880
thinks about its data.
162
00:05:43,880 --> 00:05:46,000
Revenue isn't just a number in a database,
163
00:05:46,000 --> 00:05:48,200
it's a governed measure with a clear definition,
164
00:05:48,200 --> 00:05:51,720
tied to specific data sources, accessible to specific roles.
165
00:05:51,720 --> 00:05:54,440
Copilot can reason over this layer using natural language
166
00:05:54,440 --> 00:05:57,080
because the layer itself is designed to be interpretable.
167
00:05:57,080 --> 00:05:58,880
Here's what most organizations are missing.
168
00:05:58,880 --> 00:06:00,920
They've enabled the first layer, partially.
169
00:06:00,920 --> 00:06:02,960
They have Microsoft 365 running,
170
00:06:02,960 --> 00:06:04,360
Graph is technically available,
171
00:06:04,360 --> 00:06:06,160
and Copilot has been licensed.
172
00:06:06,160 --> 00:06:07,880
But the semantic index is only as good
173
00:06:07,880 --> 00:06:10,920
as the content quality and metadata structure feeding into it,
174
00:06:10,920 --> 00:06:13,320
and the Fabric semantic models either don't exist
175
00:06:13,320 --> 00:06:15,640
or haven't been designed with AI in mind.
176
00:06:15,640 --> 00:06:17,600
So Copilot hits the first layer,
177
00:06:17,600 --> 00:06:21,320
finds a pile of unstructured, untagged, disorganized content,
178
00:06:21,320 --> 00:06:23,120
and does the best it can with what's there.
179
00:06:23,120 --> 00:06:25,160
The result feels shallow because it is shallow,
180
00:06:25,160 --> 00:06:27,000
not because the technology is limited,
181
00:06:27,000 --> 00:06:29,600
because the inputs are. This is the architectural gap:
182
00:06:29,600 --> 00:06:32,000
organizations treat Graph as plumbing,
183
00:06:32,000 --> 00:06:33,560
something that works in the background
184
00:06:33,560 --> 00:06:36,120
while users interact with Teams and SharePoint.
185
00:06:36,120 --> 00:06:37,360
But in a Copilot deployment,
186
00:06:37,360 --> 00:06:38,880
Graph is the knowledge substrate.
187
00:06:38,880 --> 00:06:41,080
The quality of the relationships it can express,
188
00:06:41,080 --> 00:06:43,360
the richness of the metadata attached to content,
189
00:06:43,360 --> 00:06:45,440
and the governance of the permissions it enforces
190
00:06:45,440 --> 00:06:47,360
are not infrastructure details.
191
00:06:47,360 --> 00:06:48,200
They are the product.
192
00:06:48,200 --> 00:06:49,880
They determine what Copilot knows,
193
00:06:49,880 --> 00:06:51,440
how confidently it can answer,
194
00:06:51,440 --> 00:06:53,000
and how much you can trust what it says.
195
00:06:53,000 --> 00:06:54,840
And right now, for most organizations,
196
00:06:54,840 --> 00:06:57,520
that substrate was never built for AI.
197
00:06:57,520 --> 00:06:59,760
Why most Copilot deployments underdeliver.
198
00:06:59,760 --> 00:07:01,280
So let's talk about what actually happens
199
00:07:01,280 --> 00:07:02,920
when organizations deploy Copilot
200
00:07:02,920 --> 00:07:04,760
without fixing the foundation first,
201
00:07:04,760 --> 00:07:06,040
because this isn't theoretical.
202
00:07:06,040 --> 00:07:08,640
This is a pattern playing out across enterprises right now,
203
00:07:08,640 --> 00:07:11,280
and it follows a remarkably consistent sequence.
204
00:07:11,280 --> 00:07:14,080
The deployment goes live, the demo works.
205
00:07:14,080 --> 00:07:16,640
Someone asks Copilot to summarize a project,
206
00:07:16,640 --> 00:07:19,320
and it produces something coherent enough to be impressive.
207
00:07:19,320 --> 00:07:20,680
Then it goes into production.
208
00:07:20,680 --> 00:07:23,040
And within weeks, the feedback starts coming in.
209
00:07:23,040 --> 00:07:25,680
Answers reference documents that were superseded two years ago.
210
00:07:25,680 --> 00:07:27,920
Copilot surfaces a sensitive HR file
211
00:07:27,920 --> 00:07:29,440
in response to a general query,
212
00:07:29,440 --> 00:07:31,600
because that file was stored in a broadly shared site
213
00:07:31,600 --> 00:07:33,120
with no sensitivity label.
214
00:07:33,120 --> 00:07:34,760
Someone asks about pricing strategy
215
00:07:34,760 --> 00:07:38,240
and gets a response that blends three different draft proposals,
216
00:07:38,240 --> 00:07:40,280
none of them the current approved version.
217
00:07:40,280 --> 00:07:41,680
None of this is a bug in the model.
218
00:07:41,680 --> 00:07:45,160
It's the expected output of a system reasoning over the content it was given.
219
00:07:45,160 --> 00:07:47,640
The model didn't choose bad sources because it malfunctioned.
220
00:07:47,640 --> 00:07:50,640
It chose bad sources because good sources and bad sources
221
00:07:50,640 --> 00:07:52,920
looked identical to the retrieval layer.
222
00:07:52,920 --> 00:07:54,640
No metadata to distinguish them,
223
00:07:54,640 --> 00:07:57,080
no governance to separate current from deprecated,
224
00:07:57,080 --> 00:07:59,720
no permissions architecture to contain sensitive content
225
00:07:59,720 --> 00:08:01,640
to the people who should actually see it.
226
00:08:01,640 --> 00:08:05,040
Here's the specific mechanism that catches organizations off guard.
227
00:08:05,040 --> 00:08:07,320
Copilot doesn't grant new permissions.
228
00:08:07,320 --> 00:08:08,600
Microsoft is clear on this.
229
00:08:08,600 --> 00:08:11,840
The model can only surface content a user already has access to,
230
00:08:11,840 --> 00:08:15,360
but what organizations consistently underestimate is how much content
231
00:08:15,360 --> 00:08:17,920
their users already have access to that they shouldn't.
232
00:08:17,920 --> 00:08:20,160
Years of "anyone with the link" sharing,
233
00:08:20,160 --> 00:08:24,240
legacy SharePoint sites where entire departments were added as members and never removed.
234
00:08:24,240 --> 00:08:26,120
Groups that carried broad access permissions
235
00:08:26,120 --> 00:08:28,400
and were never recertified after projects ended.
236
00:08:28,400 --> 00:08:30,520
Copilot doesn't create the exposure problem.
237
00:08:30,520 --> 00:08:33,840
It makes the existing exposure problem trivially exploitable.
238
00:08:33,840 --> 00:08:35,760
One prompt and the semantic retrieval layer
239
00:08:35,760 --> 00:08:40,640
goes looking for anything relevant across every piece of content within that user's permission boundary.
240
00:08:40,640 --> 00:08:42,640
If the permission boundary is too wide,
241
00:08:42,640 --> 00:08:44,920
the surface area of that query is enormous.
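One way to reason about that permission boundary before rollout is a simple exposure check per site. The sketch below is an assumption-laden illustration: the `Site` fields, the thresholds, and the sample sites are all hypothetical, and in practice the inputs would come from your own sharing and access-review reports.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Site:
    name: str
    member_count: int
    anyone_links: int                  # count of "anyone with the link" shares
    sensitivity_label: Optional[str]
    last_access_review: Optional[date]


def exposure_flags(site: Site, today: date, max_members: int = 200, review_days: int = 365):
    """Return human-readable reasons a site widens the permission boundary."""
    flags = []
    if site.anyone_links > 0:
        flags.append("anonymous sharing links")
    if site.member_count > max_members:
        flags.append("broad membership")
    if site.sensitivity_label is None:
        flags.append("no sensitivity label")
    if site.last_access_review is None or (today - site.last_access_review).days > review_days:
        flags.append("access not recertified")
    return flags


SITES = [
    Site("HR-Archive", 1800, 12, None, None),
    Site("Phoenix-Core-Team", 9, 0, "Confidential", date(2025, 11, 3)),
]

if __name__ == "__main__":
    today = date(2026, 1, 15)
    for site in SITES:
        print(site.name, exposure_flags(site, today) or "ok")
```

The thresholds are arbitrary defaults; the useful part is that each flag maps to one of the exposure sources named above.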
242
00:08:44,920 --> 00:08:46,880
The metadata problem compounds this.
243
00:08:46,880 --> 00:08:49,520
When retrieval can't distinguish a current approved policy
244
00:08:49,520 --> 00:08:52,080
from a three-year-old draft sitting in the same library,
245
00:08:52,080 --> 00:08:54,120
the model synthesizes across both.
246
00:08:54,120 --> 00:08:57,880
When it can't determine which version of a proposal was actually sent to the client,
247
00:08:57,880 --> 00:08:59,880
it blends context from multiple versions
248
00:08:59,880 --> 00:09:02,160
and fills the gaps with confident language.
249
00:09:02,160 --> 00:09:04,440
That's not hallucination in the science fiction sense.
250
00:09:04,440 --> 00:09:07,720
It's a precise and predictable consequence of retrieval failure.
251
00:09:07,720 --> 00:09:09,960
The model can only be as accurate as what it retrieves
252
00:09:09,960 --> 00:09:13,080
and retrieval quality is a direct function of metadata quality.
253
00:09:13,080 --> 00:09:15,200
There's a broader failure pattern worth naming here,
254
00:09:15,200 --> 00:09:17,520
one that shows up specifically in organizations
255
00:09:17,520 --> 00:09:20,920
that invested heavily in the technology before addressing the data.
256
00:09:20,920 --> 00:09:23,040
Call it the million-dollar graph nobody uses.
257
00:09:23,040 --> 00:09:25,800
The IT team builds something architecturally impressive.
258
00:09:25,800 --> 00:09:27,840
They connect data sources, configure connectors,
259
00:09:27,840 --> 00:09:29,560
stand up the graph infrastructure.
260
00:09:29,560 --> 00:09:31,680
The demo at the executive briefing lands well.
261
00:09:31,680 --> 00:09:34,800
The usage data comes in six months later and shows that daily active users
262
00:09:34,800 --> 00:09:36,400
are still running manual searches,
263
00:09:36,400 --> 00:09:39,120
still emailing the SharePoint admin to find documents,
264
00:09:39,120 --> 00:09:42,240
still asking each other on Teams instead of asking the system.
265
00:09:42,240 --> 00:09:45,280
What happened? They optimized for completeness rather than utility.
266
00:09:45,280 --> 00:09:48,480
They built a graph that covered everything before it was useful for anything.
267
00:09:48,480 --> 00:09:50,880
The ontology modeled every conceivable concept,
268
00:09:50,880 --> 00:09:53,280
the ingestion pipeline pulled from every available source,
269
00:09:53,280 --> 00:09:56,560
and nobody asked the actual users what question they needed answered today.
270
00:09:56,560 --> 00:09:58,000
The graph became shelfware,
271
00:09:58,000 --> 00:09:59,600
not because the technology failed,
272
00:09:59,600 --> 00:10:02,320
because the implementation never started from a user problem
273
00:10:02,320 --> 00:10:06,480
and never validated that the output was trustworthy enough to change behavior.
274
00:10:06,480 --> 00:10:09,440
Fixing this requires understanding what a knowledge graph actually is
275
00:10:09,440 --> 00:10:11,440
in an M365 context,
276
00:10:11,440 --> 00:10:12,800
not as a product feature,
277
00:10:12,800 --> 00:10:16,560
but as an architectural pattern with very specific requirements.
278
00:10:16,560 --> 00:10:19,520
Defining the knowledge graph in an M365 context.
279
00:10:19,520 --> 00:10:22,320
So what is a knowledge graph, precisely, in this context?
280
00:10:22,320 --> 00:10:25,200
Strip away the vendor marketing and the academic definitions,
281
00:10:25,200 --> 00:10:26,800
and it comes down to this.
282
00:10:26,800 --> 00:10:28,560
A knowledge graph is a structured network
283
00:10:28,560 --> 00:10:31,840
where nodes represent entities (people, documents, projects,
284
00:10:31,840 --> 00:10:33,840
policies, systems, customers),
285
00:10:33,840 --> 00:10:36,960
and edges represent the relationships between them.
286
00:10:36,960 --> 00:10:39,360
The edge is the thing most data architectures miss.
287
00:10:39,360 --> 00:10:41,280
Traditional relational databases
288
00:10:41,280 --> 00:10:43,280
store records in rows and columns.
289
00:10:43,280 --> 00:10:44,880
They're good at answering "what do we have?"
290
00:10:44,880 --> 00:10:47,360
They're poor at answering "how does it connect?"
291
00:10:47,360 --> 00:10:51,280
A knowledge graph is built specifically to answer the second question.
292
00:10:51,280 --> 00:10:53,200
In an M365 environment,
293
00:10:53,200 --> 00:10:54,640
those entities already exist.
294
00:10:54,640 --> 00:10:56,000
They just aren't connected.
295
00:10:56,000 --> 00:10:57,760
A SharePoint document is a node,
296
00:10:57,760 --> 00:10:59,200
the person who authored it is a node.
297
00:10:59,200 --> 00:11:00,880
The project it belongs to is a node.
298
00:11:00,880 --> 00:11:03,200
The policy that governs its retention is a node.
299
00:11:03,200 --> 00:11:06,320
The sensitivity label that controls its distribution is a node.
300
00:11:06,320 --> 00:11:08,560
Right now most of those nodes sit in isolation.
301
00:11:08,560 --> 00:11:10,080
The document exists in a library.
302
00:11:10,080 --> 00:11:11,760
The author exists in Entra ID,
303
00:11:11,760 --> 00:11:13,920
the project exists in Planner or a Teams channel,
304
00:11:13,920 --> 00:11:15,360
the policy exists in Purview.
305
00:11:15,360 --> 00:11:18,240
Nothing explicitly models the relationship between them.
306
00:11:18,240 --> 00:11:20,240
A knowledge graph draws those edges.
307
00:11:20,240 --> 00:11:22,800
"Authored by" connects the document to the person.
308
00:11:22,800 --> 00:11:25,360
"Belongs to" connects it to the project.
309
00:11:25,360 --> 00:11:27,760
"Governed by" connects it to the retention policy.
310
00:11:27,760 --> 00:11:30,560
"Classified as" connects it to the sensitivity label.
311
00:11:30,560 --> 00:11:33,280
Once those relationships exist in a queryable structure,
312
00:11:33,280 --> 00:11:35,120
an AI system can traverse them.
313
00:11:35,120 --> 00:11:36,720
It doesn't just retrieve the document.
314
00:11:36,720 --> 00:11:38,720
It retrieves the document in context.
315
00:11:38,720 --> 00:11:40,560
Who owns it, what it's part of,
316
00:11:40,560 --> 00:11:41,920
what rules apply to it,
317
00:11:41,920 --> 00:11:43,280
and what else is connected to it.
318
00:11:43,280 --> 00:11:47,600
That traversal capability is what makes the difference between a search result and an answer.
319
00:11:47,600 --> 00:11:48,640
When someone asks,
320
00:11:48,640 --> 00:11:50,720
"What's the current status of Project Phoenix?",
321
00:11:50,720 --> 00:11:54,880
a search returns a list of documents that contain the words "Project Phoenix".
322
00:11:54,880 --> 00:11:57,520
A knowledge graph traversal returns the project entity,
323
00:11:57,520 --> 00:11:59,280
follows the edges to current tasks,
324
00:11:59,280 --> 00:12:00,880
open decisions, relevant documents,
325
00:12:00,880 --> 00:12:02,800
and the people accountable for each.
326
00:12:02,800 --> 00:12:05,280
Then it assembles that context into a coherent response.
327
00:12:05,280 --> 00:12:08,080
The user never needed to know where any of those artifacts lived.
328
00:12:08,080 --> 00:12:08,960
The graph knew.
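The Project Phoenix example can be sketched as a tiny in-memory graph of typed edges. Everything here is hypothetical: the entity names, the edge labels such as `has_task`, and the `build_index`, `neighbors`, and `project_context` helpers. It illustrates traversal only and does not represent any Microsoft API.

```python
from collections import defaultdict

# (source, relationship, target) triples; all names are illustrative.
TRIPLES = [
    ("Project Phoenix", "has_task", "Migrate file shares"),
    ("Project Phoenix", "has_decision", "Choose rollout region"),
    ("Project Phoenix", "has_document", "Phoenix status report.docx"),
    ("Migrate file shares", "owned_by", "Avery"),
    ("Choose rollout region", "owned_by", "Jordan"),
    ("Phoenix status report.docx", "authored_by", "Avery"),
    ("Phoenix status report.docx", "classified_as", "Confidential"),
]


def build_index(triples):
    """Group outgoing edges by source node."""
    index = defaultdict(list)
    for source, relation, target in triples:
        index[source].append((relation, target))
    return index


def neighbors(index, node, relation):
    """Targets reachable from node over edges with the given label."""
    return [t for r, t in index.get(node, []) if r == relation]


def project_context(index, project):
    """Follow edges out from the project, then one hop further to accountable people."""
    context = {}
    for relation in ("has_task", "has_decision", "has_document"):
        for item in neighbors(index, project, relation):
            people = neighbors(index, item, "owned_by") + neighbors(index, item, "authored_by")
            context[item] = {"kind": relation.removeprefix("has_"), "people": people}
    return context


if __name__ == "__main__":
    graph = build_index(TRIPLES)
    for item, info in project_context(graph, "Project Phoenix").items():
        print(f"{info['kind']}: {item} -> {', '.join(info['people'])}")
```

A keyword search would stop at documents containing the project name; the traversal starts from the project entity and collects tasks, decisions, documents, and the people attached to each.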
329
00:12:08,960 --> 00:12:11,120
In an M365 context specifically,
330
00:12:11,120 --> 00:12:13,440
building this means connecting SharePoint content,
331
00:12:13,440 --> 00:12:14,720
Teams conversations,
332
00:12:14,720 --> 00:12:15,920
Entra ID identities,
333
00:12:15,920 --> 00:12:17,280
Purview classifications,
334
00:12:17,280 --> 00:12:20,560
and line-of-business data into a single queryable model.
335
00:12:20,560 --> 00:12:22,880
That's a broader scope than most organizations attempt.
336
00:12:22,880 --> 00:12:24,800
Most start with SharePoint and stop there.
337
00:12:24,800 --> 00:12:27,680
The power compounds when you include the people graph from Entra,
338
00:12:27,680 --> 00:12:29,360
the activity signals from Teams,
339
00:12:29,360 --> 00:12:31,440
and the governance metadata from Purview,
340
00:12:31,440 --> 00:12:33,520
because that's when the context becomes rich enough
341
00:12:33,520 --> 00:12:35,520
to actually disambiguate.
342
00:12:35,520 --> 00:12:37,360
One clarification that matters architecturally:
343
00:12:37,360 --> 00:12:39,360
the knowledge graph doesn't replace Microsoft Graph.
344
00:12:39,360 --> 00:12:40,240
It sits on top of it.
345
00:12:40,240 --> 00:12:42,720
Microsoft Graph is the data and relationship plane,
346
00:12:42,720 --> 00:12:45,680
the API surface that exposes the raw connections.
347
00:12:45,680 --> 00:12:48,880
The knowledge graph adds business semantics and ontological structure
348
00:12:48,880 --> 00:12:50,960
that the raw API layer doesn't provide.
349
00:12:50,960 --> 00:12:54,000
Graph knows that a document exists in a SharePoint site.
350
00:12:54,000 --> 00:12:57,040
The knowledge graph knows that document is the authoritative source of record
351
00:12:57,040 --> 00:12:58,560
for pricing in the EMEA region,
352
00:12:58,560 --> 00:13:00,400
supersedes three previous versions,
353
00:13:00,400 --> 00:13:03,280
and was last reviewed by the VP of Sales in Q1.
354
00:13:03,280 --> 00:13:06,480
That layer of meaning is what Copilot needs to reason reliably,
355
00:13:06,480 --> 00:13:08,160
rather than retrieve randomly,
356
00:13:08,160 --> 00:13:10,160
and that layer of meaning has to be deliberately built.
357
00:13:10,160 --> 00:13:11,760
It doesn't emerge from the data on its own.
358
00:13:11,760 --> 00:13:14,720
It starts with a decision about what metadata will anchor it.
359
00:13:14,720 --> 00:13:16,800
Metadata is the foundation of intelligence.
360
00:13:16,800 --> 00:13:20,160
Let's be direct about something the industry consistently undersells.
361
00:13:20,160 --> 00:13:22,000
Metadata is not an administrative task.
362
00:13:22,000 --> 00:13:26,160
It's not the thing your SharePoint admin configures after the real work is done.
363
00:13:26,160 --> 00:13:29,440
In a knowledge graph architecture, metadata is an AI safety control.
364
00:13:29,440 --> 00:13:31,200
The decisions you make about site structures,
365
00:13:31,200 --> 00:13:33,520
content types, and managed metadata columns
366
00:13:33,520 --> 00:13:35,280
don't just affect discoverability.
367
00:13:35,280 --> 00:13:38,800
They directly determine whether Copilot's answers are trustworthy or dangerous.
368
00:13:38,800 --> 00:13:40,240
That's not an exaggeration.
369
00:13:40,240 --> 00:13:42,720
Microsoft's own guidance on hallucination mitigation
370
00:13:42,720 --> 00:13:46,480
in enterprise LLM deployments identifies metadata filtering as a primary control,
371
00:13:46,480 --> 00:13:47,680
not a secondary one.
372
00:13:47,680 --> 00:13:49,600
Not a nice-to-have once the model is running.
373
00:13:49,600 --> 00:13:53,520
A foundational requirement that shapes what retrieval can and can't do
374
00:13:53,520 --> 00:13:55,760
before the model ever generates a word.
375
00:13:55,760 --> 00:13:59,040
So what does a metadata schema actually need to do for an AI system?
376
00:13:59,040 --> 00:14:03,360
There are four dimensions that consistently have the highest impact on retrieval quality,
377
00:14:03,360 --> 00:14:06,320
and they map to four different failure modes when they're missing.
378
00:14:06,320 --> 00:14:08,160
The first is time and recency.
379
00:14:08,160 --> 00:14:12,160
The retrieval layer needs to distinguish between a policy that's currently in effect
380
00:14:12,160 --> 00:14:14,240
and a policy that was valid three years ago.
381
00:14:14,240 --> 00:14:16,720
Without a reliable effective date, last-reviewed date,
382
00:14:16,720 --> 00:14:19,760
or version status field consistently populated across your content,
383
00:14:19,760 --> 00:14:23,600
the semantic index treats a deprecated draft as equally authoritative
384
00:14:23,600 --> 00:14:25,520
as the current approved document.
385
00:14:25,520 --> 00:14:27,760
It has no basis to prefer one over the other,
386
00:14:27,760 --> 00:14:29,520
so it retrieves both, blends them,
387
00:14:29,520 --> 00:14:32,880
and produces an answer that's partially outdated and entirely untrustworthy.
388
00:14:32,880 --> 00:14:35,280
The second dimension is authority and reliability.
389
00:14:35,280 --> 00:14:37,120
Not all content is created equal,
390
00:14:37,120 --> 00:14:39,120
and the retrieval layer needs to know that.
391
00:14:39,120 --> 00:14:42,480
A document published by Legal and tagged as the system-of-record policy
392
00:14:42,480 --> 00:14:45,520
is not the same as a working draft someone saved in a project folder.
393
00:14:45,520 --> 00:14:49,920
A proposal that went through an approval workflow is not the same as a brainstorm from a Teams chat.
394
00:14:49,920 --> 00:14:53,360
Without source system tags, owner roles, document type classifications,
395
00:14:53,360 --> 00:14:56,720
and approval status fields, retrieval treats them identically.
396
00:14:56,720 --> 00:15:00,480
The model synthesizes across authoritative and non-authoritative sources
397
00:15:00,480 --> 00:15:03,120
without any signal that they shouldn't carry equal weight.
398
00:15:03,120 --> 00:15:04,960
The third is topic and taxonomy.
399
00:15:04,960 --> 00:15:07,920
When content isn't tagged with standardized business concepts,
400
00:15:07,920 --> 00:15:10,480
product lines, regions, departments, process areas,
401
00:15:10,480 --> 00:15:12,320
retrieval can't scope correctly.
402
00:15:12,320 --> 00:15:15,680
Someone asking a question about EMEA pricing shouldn't get results from APAC.
403
00:15:15,680 --> 00:15:20,320
Someone asking about HR onboarding policy shouldn't get results from the IT onboarding checklist.
404
00:15:20,320 --> 00:15:22,640
Without a governed taxonomy applied consistently,
405
00:15:22,640 --> 00:15:24,640
the model can't make those distinctions.
406
00:15:24,640 --> 00:15:28,160
It retrieves on semantic similarity alone and occasionally mixes contexts
407
00:15:28,160 --> 00:15:29,440
that should never be combined.
408
00:15:29,440 --> 00:15:32,400
The fourth dimension is confidentiality and audience.
409
00:15:32,400 --> 00:15:36,080
This one connects directly to the security concerns from the previous section.
410
00:15:36,080 --> 00:15:40,480
Sensitivity classifications and intended audience fields are not just governance metadata.
411
00:15:40,480 --> 00:15:44,160
They're retrieval constraints. When these fields are missing or inconsistently applied,
412
00:15:44,160 --> 00:15:48,720
the system has no policy-level signal about who should and shouldn't see a given piece of content
413
00:15:48,720 --> 00:15:50,240
beyond raw permissions.
414
00:15:50,240 --> 00:15:54,240
And as we established, raw permissions in most tenants are far too permissive.
415
00:15:54,240 --> 00:15:56,720
Now here's what a practical schema review actually covers.
416
00:15:56,720 --> 00:16:00,960
You're looking at five field families. Provenance fields: source, author, owner,
417
00:16:00,960 --> 00:16:05,760
system of record, approval status. Freshness fields: created date, last reviewed,
418
00:16:05,760 --> 00:16:07,920
effective date, expiration date.
419
00:16:07,920 --> 00:16:12,080
Authority fields: document type, version, policy tier, review status.
420
00:16:12,080 --> 00:16:16,960
Access and sensitivity fields: classification, region, business unit, PII flag.
421
00:16:16,960 --> 00:16:17,920
And retrieval fields:
422
00:16:17,920 --> 00:16:20,320
topic keywords, language, product, jurisdiction.
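As a rough illustration of the five field families just listed, the schema could be written down as a single record type. This is a minimal sketch, not a SharePoint or Microsoft Graph schema: the class name `DocumentMetadata`, the attribute names, and the helper `missing_fields` are all hypothetical, chosen only to mirror the fields named in the episode.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class DocumentMetadata:
    """Hypothetical record mirroring the five field families from the episode."""
    # Provenance fields
    source: Optional[str] = None
    author: Optional[str] = None
    owner: Optional[str] = None
    system_of_record: Optional[str] = None
    approval_status: Optional[str] = None
    # Freshness fields
    created_date: Optional[date] = None
    last_reviewed: Optional[date] = None
    effective_date: Optional[date] = None
    expiration_date: Optional[date] = None
    # Authority fields
    document_type: Optional[str] = None
    version: Optional[str] = None
    policy_tier: Optional[str] = None
    review_status: Optional[str] = None
    # Access and sensitivity fields
    classification: Optional[str] = None
    region: Optional[str] = None
    business_unit: Optional[str] = None
    pii_flag: Optional[bool] = None
    # Retrieval fields
    topic_keywords: Optional[list[str]] = None
    language: Optional[str] = None
    product: Optional[str] = None
    jurisdiction: Optional[str] = None


def missing_fields(meta: DocumentMetadata) -> list[str]:
    """Return the names of fields that are still unpopulated (None)."""
    return [f.name for f in fields(meta) if getattr(meta, f.name) is None]
```

A schema review could run `missing_fields` over a sample of documents to see which field families are populated consistently and which are mostly empty.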
423
00:16:20,320 --> 00:16:23,360
Each of those field families is a control point in the RAG pipeline.
424
00:16:23,360 --> 00:16:26,560
Pre-filter by document type and approval status before you run semantic search,
425
00:16:26,560 --> 00:16:29,680
and you've already eliminated a huge class of retrieval errors.
426
00:16:29,680 --> 00:16:33,040
Re-rank by recency and source authority, and the model sees better evidence.
427
00:16:33,040 --> 00:16:36,400
Route queries to different corpora based on topic and jurisdiction tags,
428
00:16:36,400 --> 00:16:39,040
and cross-domain contamination drops significantly.
429
00:16:39,040 --> 00:16:42,880
The organizations that get this right don't treat schema design as a metadata project.
430
00:16:42,880 --> 00:16:44,640
They treat it as AI architecture.
431
00:16:44,640 --> 00:16:48,320
Because when those fields are consistently populated, retrieval becomes precise.
432
00:16:48,320 --> 00:16:51,360
When they're missing, the model fills the gaps with plausible language
433
00:16:51,360 --> 00:16:53,280
and plausible is not the same as accurate.
434
00:16:53,280 --> 00:16:54,880
Building the schema is half the work.
435
00:16:54,880 --> 00:16:58,080
The other half is making sure it doesn't drift the moment it's deployed.
436
00:16:58,080 --> 00:17:03,600
And that requires a governance model built for how organizations actually work in 2026, not 2019.
437
00:17:03,600 --> 00:17:07,120
The governance model that actually scales. Governance has a reputation problem.
438
00:17:07,120 --> 00:17:10,560
For most of the last decade, it meant retention schedules nobody enforced,
439
00:17:10,560 --> 00:17:13,600
classification policies applied inconsistently, and access reviews
440
00:17:13,600 --> 00:17:16,320
that happened once a year when Legal reminded IT they were overdue.
441
00:17:16,320 --> 00:17:18,800
It was compliance theatre.
442
00:17:18,800 --> 00:17:20,960
Documentation that satisfied an audit requirement
443
00:17:20,960 --> 00:17:22,800
without changing how anyone actually worked.
444
00:17:22,800 --> 00:17:25,520
That model doesn't just fail in an AI-enabled environment;
445
00:17:25,520 --> 00:17:26,800
it actively makes things worse.
446
00:17:26,800 --> 00:17:29,360
Because every governance gap that existed before Copilot
447
00:17:29,360 --> 00:17:31,200
was a manageable inefficiency.
448
00:17:31,200 --> 00:17:34,240
After Copilot, it's a liability with a natural language interface.
449
00:17:34,240 --> 00:17:38,080
So what does governance actually look like when it's designed for AI, not compliance?
450
00:17:38,080 --> 00:17:42,080
Microsoft's 2026 guidance frames it as three interdependent disciplines
451
00:17:42,080 --> 00:17:44,880
and the order matters more than most organizations realize.
452
00:17:44,880 --> 00:17:46,160
The first is readiness.
453
00:17:46,160 --> 00:17:50,800
This means fixing permissions, lifecycle, and content quality before AI touches any of it.
454
00:17:50,800 --> 00:17:53,360
Not in parallel, not afterward, before.
455
00:17:53,360 --> 00:17:56,320
The instinct in most deployments is to enable Copilot first
456
00:17:56,320 --> 00:17:58,560
and then tighten governance as problems surface.
457
00:17:58,560 --> 00:17:59,760
That instinct is backwards.
458
00:17:59,760 --> 00:18:03,920
Once the semantic index is built on top of a permissive unstructured content estate,
459
00:18:03,920 --> 00:18:06,320
you're retrofitting controls onto a live system
460
00:18:06,320 --> 00:18:10,160
while users are already drawing conclusions from whatever the AI surfaces.
461
00:18:10,160 --> 00:18:11,760
The cost of correcting that,
462
00:18:11,760 --> 00:18:14,320
in trust, in rework, in potential exposure,
463
00:18:14,320 --> 00:18:16,880
is much higher than the cost of doing it right up front.
464
00:18:16,880 --> 00:18:19,600
Readiness isn't a single action, it's a checklist with teeth.
465
00:18:19,600 --> 00:18:20,800
Which sites are overshared?
466
00:18:20,800 --> 00:18:24,240
Which groups still carry permissions from projects that ended 18 months ago?
467
00:18:24,240 --> 00:18:27,120
Which content types have no lifecycle policy and no owner?
468
00:18:27,120 --> 00:18:28,720
Until those questions have answers,
469
00:18:28,720 --> 00:18:31,600
the semantic index you build reflects the chaos underneath it.
470
00:18:31,600 --> 00:18:34,080
The second discipline is relevance.
471
00:18:34,080 --> 00:18:38,240
This one is counterintuitive for organizations that default to keep everything.
472
00:18:38,240 --> 00:18:41,200
Relevance means actively deciding what the AI should not see.
473
00:18:41,200 --> 00:18:42,800
Not just archiving dormant content,
474
00:18:42,800 --> 00:18:45,520
removing it from the semantic index entirely.
475
00:18:45,520 --> 00:18:48,560
A document that hasn't been accessed in three years that belongs to a project
476
00:18:48,560 --> 00:18:50,000
that's been formally closed,
477
00:18:50,000 --> 00:18:51,360
that has no current owner,
478
00:18:51,360 --> 00:18:53,200
that document is not a knowledge asset,
479
00:18:53,200 --> 00:18:54,000
it's noise.
480
00:18:54,000 --> 00:18:56,400
And when the retrieval layer gives noise the same weight
481
00:18:56,400 --> 00:18:59,760
as signal, answer quality degrades in ways that are hard to diagnose
482
00:18:59,760 --> 00:19:01,440
because they don't look like errors.
483
00:19:01,440 --> 00:19:02,960
They look like slightly off answers.
484
00:19:02,960 --> 00:19:04,320
Answers that are mostly right.
485
00:19:04,320 --> 00:19:07,920
Answers users stop trusting without being able to articulate exactly why.
486
00:19:07,920 --> 00:19:09,600
Relevance audits aren't glamorous work.
487
00:19:09,600 --> 00:19:11,600
But they're the difference between a semantic index
488
00:19:11,600 --> 00:19:13,920
that reasons over your organization's current knowledge
489
00:19:13,920 --> 00:19:17,120
and one that reasons over everything your organization has ever created
490
00:19:17,120 --> 00:19:19,840
regardless of whether it still reflects reality.
491
00:19:19,840 --> 00:19:21,840
The third discipline is resiliency.
492
00:19:21,840 --> 00:19:24,640
This is the one most governance frameworks skip entirely,
493
00:19:24,640 --> 00:19:27,280
and it's the one that determines whether your architecture survives
494
00:19:27,280 --> 00:19:29,520
contact with a real production environment.
495
00:19:29,520 --> 00:19:31,840
Resiliency means building monitoring and rollback
496
00:19:31,840 --> 00:19:34,000
into the governance model from the start.
497
00:19:34,000 --> 00:19:37,680
Not as an incident response capability you develop after something goes wrong,
498
00:19:37,680 --> 00:19:40,320
as a designed operational function with clear ownership,
499
00:19:40,320 --> 00:19:42,720
defined triggers, and tested procedures.
500
00:19:42,720 --> 00:19:46,000
What does a governance failure actually look like in a Copilot deployment?
501
00:19:46,000 --> 00:19:50,560
A sensitivity label policy changes and nobody updates the corresponding retrieval scopes.
502
00:19:50,560 --> 00:19:54,000
A new connector brings external data into the semantic index
503
00:19:54,000 --> 00:19:58,080
without going through the same permissions validation as native M365 content.
504
00:19:58,080 --> 00:20:00,800
An agent built by a business unit gets tenant-wide graph access
505
00:20:00,800 --> 00:20:03,360
because nobody reviewed the permission scope before it went live.
506
00:20:03,360 --> 00:20:06,080
These aren't dramatic failures; they're quiet drifts.
507
00:20:06,080 --> 00:20:07,600
And without continuous monitoring,
508
00:20:07,600 --> 00:20:10,720
they compound silently until someone asks the wrong question
509
00:20:10,720 --> 00:20:13,120
and gets the wrong answer at exactly the wrong moment.
510
00:20:13,120 --> 00:20:16,320
The organizations doing this well treat governance as a function,
511
00:20:16,320 --> 00:20:20,160
not a phase. It has staffing, tooling, metrics, and a regular cadence.
512
00:20:20,160 --> 00:20:22,560
It isn't declared complete after the initial cleanup.
513
00:20:22,560 --> 00:20:24,880
It runs alongside the deployment indefinitely
514
00:20:24,880 --> 00:20:28,080
because the content estate, the workforce, and the AI capabilities
515
00:20:28,080 --> 00:20:29,520
are all changing constantly.
516
00:20:29,520 --> 00:20:32,400
That operational discipline defines the boundaries of the architecture.
517
00:20:32,400 --> 00:20:35,600
What lives inside those boundaries and how securely it's protected
518
00:20:35,600 --> 00:20:37,280
is where we go next.
519
00:20:37,280 --> 00:20:38,800
Hardening the semantic layer.
520
00:20:38,800 --> 00:20:42,160
Here's a framing shift that changes how you approach the next phase of the architecture.
521
00:20:42,160 --> 00:20:44,880
The semantic index for Copilot is not a search optimization.
522
00:20:44,880 --> 00:20:46,240
It's a security boundary.
523
00:20:46,240 --> 00:20:49,360
And right now most organizations are treating it like the former
524
00:20:49,360 --> 00:20:51,840
while leaving it completely unprotected as the latter.
525
00:20:51,840 --> 00:20:55,120
The distinction matters because of what the semantic index actually does.
526
00:20:55,120 --> 00:20:58,080
It builds meaning-aware representations of your tenant content
527
00:20:58,080 --> 00:20:59,920
and uses them to answer questions.
528
00:20:59,920 --> 00:21:02,640
That means it doesn't just know what documents exist.
529
00:21:02,640 --> 00:21:04,720
It knows what they mean, how they relate,
530
00:21:04,720 --> 00:21:07,360
and how to surface the most contextually relevant content
531
00:21:07,360 --> 00:21:09,520
in response to a natural language prompt.
532
00:21:09,520 --> 00:21:11,440
That capability is enormously valuable.
533
00:21:11,440 --> 00:21:15,200
It's also enormously dangerous if the content feeding it hasn't been hardened first.
534
00:21:15,200 --> 00:21:17,600
One practitioner methodology that's been applied across
535
00:21:17,600 --> 00:21:19,280
multiple enterprise deployments
536
00:21:19,280 --> 00:21:21,840
found that collaborative recertification of access,
537
00:21:21,840 --> 00:21:24,240
systematically reviewing who actually needs what,
538
00:21:24,240 --> 00:21:26,480
and removing access that can't be justified,
539
00:21:26,480 --> 00:21:30,400
reduced attack surface by 30 to 40% within weeks.
540
00:21:30,400 --> 00:21:31,920
Not months, weeks.
541
00:21:31,920 --> 00:21:34,560
That number reflects how much accumulated permission drift exists
542
00:21:34,560 --> 00:21:37,840
in a typical Microsoft 365 tenant that's been running for several years
543
00:21:37,840 --> 00:21:39,840
without regular access hygiene.
544
00:21:39,840 --> 00:21:41,600
So where does hardening actually begin?
545
00:21:41,600 --> 00:21:43,280
At the data layer, not the AI layer.
546
00:21:43,280 --> 00:21:45,360
Before you configure anything in the semantic index,
547
00:21:45,360 --> 00:21:47,440
before you tune retrieval or set up connectors,
548
00:21:47,440 --> 00:21:50,640
you work through a specific sequence at the content and permissions level.
549
00:21:50,640 --> 00:21:52,480
Eliminate broad sharing first.
550
00:21:52,480 --> 00:21:55,280
Every "Anyone with the link" configuration on a site
551
00:21:55,280 --> 00:21:57,840
containing anything sensitive needs to be reviewed
552
00:21:57,840 --> 00:21:59,840
and either justified or revoked.
553
00:21:59,840 --> 00:22:03,520
Every "Everyone in org" permission on a library that holds HR,
554
00:22:03,520 --> 00:22:06,800
finance, legal, or customer data needs to come off.
555
00:22:06,800 --> 00:22:10,560
Every Microsoft 365 group that was created for a project three years ago,
556
00:22:10,560 --> 00:22:13,440
that still carries edit access to a sensitive workspace,
557
00:22:13,440 --> 00:22:14,800
and that has no active owner:
558
00:22:14,800 --> 00:22:17,040
those need to be audited and expired.
559
00:22:17,040 --> 00:22:18,480
This isn't a one-time cleanup.
560
00:22:18,480 --> 00:22:20,080
It's the baseline you need to establish
561
00:22:20,080 --> 00:22:21,920
before the semantic layer is built.
562
00:22:21,920 --> 00:22:23,920
Because once the semantic index is running,
563
00:22:23,920 --> 00:22:26,400
it reflects whatever permissions exist in the tenant.
564
00:22:26,400 --> 00:22:28,560
You can't harden the index retroactively
565
00:22:28,560 --> 00:22:30,960
without fixing the underlying data state first.
566
00:22:30,960 --> 00:22:32,720
The second thing to understand is that
567
00:22:32,720 --> 00:22:36,000
container-level labels give you a false sense of security.
568
00:22:36,000 --> 00:22:38,080
A SharePoint site labelled Confidential
569
00:22:38,080 --> 00:22:40,800
does not automatically label the files inside it.
570
00:22:40,800 --> 00:22:43,600
An unlabeled document sitting inside a confidential container
571
00:22:43,600 --> 00:22:45,680
is treated by the retrieval layer as accessible
572
00:22:45,680 --> 00:22:47,520
if the user's permissions allow it.
573
00:22:47,520 --> 00:22:49,120
Item-level sensitivity labels
574
00:22:49,120 --> 00:22:52,080
enforced through Microsoft Purview auto-labelling policies
575
00:22:52,080 --> 00:22:54,720
are what actually constrain what Copilot can see,
576
00:22:54,720 --> 00:22:56,080
summarize, and surface.
577
00:22:56,080 --> 00:22:57,360
Container labels set intent.
578
00:22:57,360 --> 00:22:58,800
Item labels enforce.
579
00:22:58,800 --> 00:23:01,440
There's a practical tool for validating your exposure surface
580
00:23:01,440 --> 00:23:02,560
before and after hardening,
581
00:23:02,560 --> 00:23:04,720
and it's one most organizations already have access to
582
00:23:04,720 --> 00:23:05,920
but rarely use this way.
583
00:23:05,920 --> 00:23:07,840
Microsoft Search is your exposure oracle.
584
00:23:07,840 --> 00:23:10,720
Whatever a given user persona can find in Microsoft Search
585
00:23:10,720 --> 00:23:12,720
is in scope for semantic retrieval
586
00:23:12,720 --> 00:23:14,640
and therefore in scope for Copilot.
587
00:23:14,640 --> 00:23:15,840
Test it deliberately.
588
00:23:15,840 --> 00:23:17,600
Build test accounts for different roles.
589
00:23:17,600 --> 00:23:20,000
A new employee, a contractor, a junior analyst,
590
00:23:20,000 --> 00:23:21,680
and run Microsoft Search queries
591
00:23:21,680 --> 00:23:24,240
against sensitive topic areas from each persona.
592
00:23:24,240 --> 00:23:26,960
What surfaces tells you exactly what the semantic index
593
00:23:26,960 --> 00:23:28,640
will return for those same users
594
00:23:28,640 --> 00:23:30,160
when they ask Copilot a question.
595
00:23:30,160 --> 00:23:32,400
If you find something in that test that shouldn't be there,
596
00:23:32,400 --> 00:23:34,640
the fix belongs at the permissions and labeling layer,
597
00:23:34,640 --> 00:23:36,000
not at the prompt layer.
598
00:23:36,000 --> 00:23:37,840
Prompt constraints can reduce the probability
599
00:23:37,840 --> 00:23:40,160
that Copilot volunteers sensitive content.
600
00:23:40,160 --> 00:23:42,160
They can't prevent retrieval from finding it
601
00:23:42,160 --> 00:23:45,520
and they can't prevent a determined user from asking for it directly.
602
00:23:45,520 --> 00:23:47,520
The hardening sequence follows a specific order
603
00:23:47,520 --> 00:23:49,920
because the dependencies run in one direction.
604
00:23:49,920 --> 00:23:51,440
Data and permissions first.
605
00:23:51,440 --> 00:23:52,880
Clean up the content estate
606
00:23:52,880 --> 00:23:56,160
and validate that sensitivity labels are applied at item level.
607
00:23:56,160 --> 00:23:58,320
Semantic index validation second.
608
00:23:58,320 --> 00:24:01,920
Use search testing to confirm the exposure surface matches your intent.
609
00:24:01,920 --> 00:24:03,440
Connector governance third.
610
00:24:03,440 --> 00:24:06,080
Every external data source brought in through Microsoft Graph connectors
611
00:24:06,080 --> 00:24:07,680
needs the same permissions validation
612
00:24:07,680 --> 00:24:09,680
as native M365 content.
613
00:24:09,680 --> 00:24:11,200
Prompt and output controls fourth.
614
00:24:11,200 --> 00:24:13,360
ISE pattern system prompts, abstention instructions
615
00:24:13,360 --> 00:24:14,960
and output DLP policies.
616
00:24:14,960 --> 00:24:17,520
Skipping that order or running those steps in parallel
617
00:24:17,520 --> 00:24:18,960
creates compounding risk.
618
00:24:18,960 --> 00:24:21,920
You end up with output controls governing a retrieval layer
619
00:24:21,920 --> 00:24:22,960
that hasn't been cleaned,
620
00:24:22,960 --> 00:24:24,720
which is the equivalent of putting a lock on a door
621
00:24:24,720 --> 00:24:26,160
with no walls around it.
622
00:24:26,160 --> 00:24:28,160
Permissions define what Copilot can see.
623
00:24:28,160 --> 00:24:31,120
What it sees determines whether its answers are worth trusting.
624
00:24:31,120 --> 00:24:33,360
That's the retrieval problem we tackle next.
625
00:24:33,360 --> 00:24:35,280
Hallucination as a retrieval problem.
626
00:24:35,280 --> 00:24:38,400
Enterprise hallucinations fall into two distinct categories
627
00:24:38,400 --> 00:24:40,480
and conflating them leads to the wrong fix.
628
00:24:40,480 --> 00:24:43,360
The first category is retrieval-induced hallucination.
629
00:24:43,360 --> 00:24:45,600
This happens when the model receives context
630
00:24:45,600 --> 00:24:47,280
that's technically related to the query
631
00:24:47,280 --> 00:24:49,600
but wrong in the specific ways that matter.
632
00:24:49,600 --> 00:24:51,440
An outdated version of a policy,
633
00:24:51,440 --> 00:24:53,200
a proposal from a lost deal
634
00:24:53,200 --> 00:24:55,360
that shares a client name with an active one,
635
00:24:55,360 --> 00:24:57,840
a procedure document from a deprecated system
636
00:24:57,840 --> 00:24:59,280
that no longer exists.
637
00:24:59,280 --> 00:25:00,800
The model doesn't invent from nothing,
638
00:25:00,800 --> 00:25:02,960
it synthesizes from what retrieval hands it.
639
00:25:02,960 --> 00:25:04,720
When retrieval hands it the wrong context,
640
00:25:04,720 --> 00:25:07,520
synthesis produces confident, fluent wrong answers.
641
00:25:07,520 --> 00:25:08,960
The output sounds authoritative
642
00:25:08,960 --> 00:25:11,680
because the model is doing exactly what it's designed to do.
643
00:25:11,680 --> 00:25:13,360
It's just doing it with bad inputs.
644
00:25:13,360 --> 00:25:16,720
The second category is gap-filling hallucination.
645
00:25:16,720 --> 00:25:20,000
This is what happens when retrieval returns nothing useful at all.
646
00:25:20,000 --> 00:25:22,560
The query hits a topic the organization has documented somewhere,
647
00:25:22,560 --> 00:25:24,800
but the documentation is so poorly tagged,
648
00:25:24,800 --> 00:25:27,440
so deeply buried in an unindexed library
649
00:25:27,440 --> 00:25:30,400
or so thoroughly excluded by overly restrictive permissions
650
00:25:30,400 --> 00:25:32,640
that the semantic search comes back empty.
651
00:25:32,640 --> 00:25:34,880
The model doesn't say "I don't know".
652
00:25:34,880 --> 00:25:36,720
It falls back on training priors,
653
00:25:36,720 --> 00:25:38,400
general knowledge from pre-training
654
00:25:38,400 --> 00:25:42,160
that may bear only a passing resemblance to your organization's actual policies,
655
00:25:42,160 --> 00:25:43,680
processes or decisions.
656
00:25:43,680 --> 00:25:45,120
The answer sounds plausible.
657
00:25:45,120 --> 00:25:47,520
It has nothing to do with your specific situation.
658
00:25:47,520 --> 00:25:49,200
Both of these are retrieval problems,
659
00:25:49,200 --> 00:25:51,360
neither is a fundamental model limitation
660
00:25:51,360 --> 00:25:53,600
and both are addressable at the architecture layer.
661
00:25:53,600 --> 00:25:55,920
The practical fix is metadata-first RAG design.
662
00:25:55,920 --> 00:25:57,520
The principle here is sequence.
663
00:25:57,520 --> 00:25:59,280
Before you run semantic vector search,
664
00:25:59,280 --> 00:26:00,880
apply metadata filters.
665
00:26:00,880 --> 00:26:02,800
Scope retrieval to approved documents,
666
00:26:02,800 --> 00:26:04,800
current versions, relevant business units
667
00:26:04,800 --> 00:26:07,760
and appropriate jurisdictions before the embedding comparison happens.
668
00:26:07,760 --> 00:26:09,600
This reduces the candidate pool to content
669
00:26:09,600 --> 00:26:11,040
that's actually worth reasoning over,
670
00:26:11,040 --> 00:26:14,240
which means the semantic layer operates on a pre-qualified set
671
00:26:14,240 --> 00:26:16,400
rather than your entire content estate.
672
00:26:16,400 --> 00:26:18,400
After retrieval, use metadata for re-ranking.
673
00:26:18,400 --> 00:26:20,640
A document published last month by the policy owner
674
00:26:20,640 --> 00:26:23,840
with an approved status should rank higher than a working draft
675
00:26:23,840 --> 00:26:25,200
from 18 months ago,
676
00:26:25,200 --> 00:26:28,000
even if the semantic similarity score favors the draft.
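The filter-then-re-rank sequence described here can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not how the Copilot semantic index works internally: the `Candidate` type, the `prefilter` and `rerank` functions, and the weights (`authority_weight`, `recency_weight`, `horizon_days`) are all hypothetical, and the `similarity` value stands in for whatever score your vector search returns.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    doc_id: str
    approval_status: str      # e.g. "approved" or "draft"
    is_current_version: bool
    business_unit: str
    published: date
    similarity: float         # semantic similarity score, assumed to be in 0..1


def prefilter(candidates: list[Candidate], business_unit: str) -> list[Candidate]:
    """Metadata filter applied BEFORE any embedding comparison."""
    return [
        c for c in candidates
        if c.approval_status == "approved"
        and c.is_current_version
        and c.business_unit == business_unit
    ]


def rerank(candidates: list[Candidate], today: date,
           authority_weight: float = 0.3,
           recency_weight: float = 0.2,
           horizon_days: int = 730) -> list[Candidate]:
    """Re-rank AFTER retrieval: similarity plus authority and recency bonuses."""
    def score(c: Candidate) -> float:
        age_days = max((today - c.published).days, 0)
        recency = max(0.0, 1.0 - age_days / horizon_days)   # 1.0 = brand new
        authority = 1.0 if c.approval_status == "approved" else 0.0
        return c.similarity + authority_weight * authority + recency_weight * recency

    return sorted(candidates, key=score, reverse=True)
```

With these illustrative weights, an approved document from last month with similarity 0.70 outranks an 18-month-old draft with similarity 0.80, which is the behavior the episode describes. The weights themselves would need tuning against real queries.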
677
00:26:28,000 --> 00:26:31,360
Recency and authority signals built into the metadata schema
678
00:26:31,360 --> 00:26:33,520
feed directly into that re-ranking logic.
679
00:26:33,520 --> 00:26:35,280
And use metadata for routing.
680
00:26:35,280 --> 00:26:37,040
A query about HR compliance in Germany
681
00:26:37,040 --> 00:26:38,320
should be routed to a corpus scoped
682
00:26:38,320 --> 00:26:40,800
to EU employment regulations and HR-owned content.
683
00:26:40,800 --> 00:26:42,480
A query about product pricing should route
684
00:26:42,480 --> 00:26:45,360
to sales-owned content within a specific product taxonomy.
685
00:26:45,360 --> 00:26:47,280
When queries are routed to the right corpora
686
00:26:47,280 --> 00:26:48,480
before retrieval runs,
687
00:26:48,480 --> 00:26:50,640
cross-domain contamination drops sharply
688
00:26:50,640 --> 00:26:52,480
and the model sees a context window
689
00:26:52,480 --> 00:26:53,600
that's actually coherent.
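Routing of this kind can be pictured as a lookup from metadata tags to a corpus name. The sketch below assumes the query has already been classified into a topic and a jurisdiction by some upstream step, which is not shown. The corpus names, the `CORPUS_ROUTES` table, and the `route` function are hypothetical examples built from the two scenarios in the episode.

```python
from typing import Optional

# Hypothetical routing table: (topic, jurisdiction) -> corpus name.
# A jurisdiction of None means "any jurisdiction" for that topic.
CORPUS_ROUTES: dict[tuple[str, Optional[str]], str] = {
    ("hr_compliance", "DE"): "hr-eu-employment",
    ("product_pricing", None): "sales-pricing",
}
DEFAULT_CORPUS = "general"


def route(topic: str, jurisdiction: Optional[str] = None) -> str:
    """Pick a corpus before retrieval runs: exact match first, then topic-only."""
    for key in ((topic, jurisdiction), (topic, None)):
        if key in CORPUS_ROUTES:
            return CORPUS_ROUTES[key]
    return DEFAULT_CORPUS
```

Falling back to a general corpus is one possible design choice. A stricter deployment might refuse to answer when no route matches, which trades coverage for less cross-domain mixing.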
690
00:26:53,600 --> 00:26:56,320
There's a framing that captures the dependency precisely.
691
00:26:56,320 --> 00:26:58,400
RAG without good metadata retrieves poorly.
692
00:26:58,400 --> 00:27:01,200
Metadata without RAG relies on generic model knowledge.
693
00:27:01,200 --> 00:27:02,640
These aren't competing approaches
694
00:27:02,640 --> 00:27:04,400
or alternatives you choose between.
695
00:27:04,400 --> 00:27:07,680
They're dependent layers that only work when both are present.
696
00:27:07,680 --> 00:27:09,920
Organizations that implement one without the other
697
00:27:09,920 --> 00:27:12,000
consistently underperform organizations
698
00:27:12,000 --> 00:27:14,400
that treat them as inseparable parts of the same system.
699
00:27:14,400 --> 00:27:18,080
One emerging control worth tracking is provenance tracing.
700
00:27:18,080 --> 00:27:21,040
Microsoft Research's VeriTrail work is developing approaches
701
00:27:21,040 --> 00:27:24,480
to make every step of a multi-stage AI workflow traceable,
702
00:27:24,480 --> 00:27:25,680
which documents were retrieved,
703
00:27:25,680 --> 00:27:26,880
which chunks were used,
704
00:27:26,880 --> 00:27:28,640
where in the context window they appeared
705
00:27:28,640 --> 00:27:30,560
when the answer was generated.
706
00:27:30,560 --> 00:27:32,080
This isn't a production-ready toggle
707
00:27:32,080 --> 00:27:33,840
you flip in the admin center today,
708
00:27:33,840 --> 00:27:36,640
but it signals where enterprise AI governance is heading.
709
00:27:36,640 --> 00:27:38,800
Traceability is becoming a first-class control,
710
00:27:38,800 --> 00:27:40,640
not a debugging convenience.
711
00:27:40,640 --> 00:27:43,680
Organizations that invest in the metadata architecture now,
712
00:27:43,680 --> 00:27:46,080
that tag provenance fields and authority signals
713
00:27:46,080 --> 00:27:49,120
into their content estate will have a much shorter path
714
00:27:49,120 --> 00:27:52,080
to compliance when traceability requirements become mandatory
715
00:27:52,080 --> 00:27:53,920
rather than aspirational.
716
00:27:53,920 --> 00:27:55,600
The architecture pattern is clear.
717
00:27:55,600 --> 00:27:57,920
Clean metadata enables precise retrieval.
718
00:27:57,920 --> 00:28:01,120
Precise retrieval gives the model authoritative context.
719
00:28:01,120 --> 00:28:03,840
Authoritative context produces answers worth trusting.
720
00:28:03,840 --> 00:28:06,800
Every element of that chain depends on the one before it,
721
00:28:06,800 --> 00:28:09,280
which means the leverage point is always upstream.
722
00:28:09,280 --> 00:28:11,200
In the content, the schema,
723
00:28:11,200 --> 00:28:13,280
and the ingestion pipeline that keeps them current.
724
00:28:13,280 --> 00:28:16,240
Building scalable ingestion pipelines.
725
00:28:16,240 --> 00:28:19,280
Clean retrieval depends on clean content reaching the index
726
00:28:19,280 --> 00:28:21,280
and clean content reaching the index
727
00:28:21,280 --> 00:28:22,880
depends on an ingestion pipeline
728
00:28:22,880 --> 00:28:25,440
that doesn't collapse under the weight of real enterprise volumes,
729
00:28:25,440 --> 00:28:27,200
real-world content velocity,
730
00:28:27,200 --> 00:28:30,560
and the geographic distribution of a global hybrid workforce.
731
00:28:30,560 --> 00:28:32,320
This is where architectures that look sound
732
00:28:32,320 --> 00:28:34,560
on a whiteboard start breaking down in production.
733
00:28:34,560 --> 00:28:38,160
The framework that holds up at scale is the bronze-silver-gold pattern,
734
00:28:38,160 --> 00:28:39,680
borrowed from data engineering,
735
00:28:39,680 --> 00:28:42,080
and applied directly to knowledge graph ingestion.
736
00:28:42,080 --> 00:28:42,880
Bronze is raw.
737
00:28:42,880 --> 00:28:44,640
You're pulling content from SharePoint,
738
00:28:44,640 --> 00:28:46,320
OneDrive, Teams, Exchange,
739
00:28:46,320 --> 00:28:49,360
and connected line-of-business systems in its original state.
740
00:28:49,360 --> 00:28:53,120
Unvalidated, potentially duplicated, inconsistently tagged.
741
00:28:53,120 --> 00:28:55,600
Nothing from bronze feeds the knowledge graph directly.
742
00:28:55,600 --> 00:28:58,560
Ever. Bronze is the intake layer, not the intelligence layer.
743
00:28:58,560 --> 00:29:00,960
Silver is where cleansing and conformance happen.
744
00:29:00,960 --> 00:29:04,000
Duplicate detection runs here, structural normalization runs here.
745
00:29:04,000 --> 00:29:05,520
You're not yet adding business semantics,
746
00:29:05,520 --> 00:29:06,800
you're removing noise.
747
00:29:06,800 --> 00:29:08,720
Documents without owners get flagged.
748
00:29:08,720 --> 00:29:11,360
Content with missing provenance fields gets quarantined for review
749
00:29:11,360 --> 00:29:12,720
rather than passed forward.
750
00:29:12,720 --> 00:29:14,480
Stale content past its effective date
751
00:29:14,480 --> 00:29:17,120
gets tagged for exclusion from the semantic index.
752
00:29:17,120 --> 00:29:19,040
The silver layer is your quality gate,
753
00:29:19,040 --> 00:29:22,800
and the discipline you apply here determines whether the gold layer is actually useful.
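The silver-layer checks described above could look roughly like the following. This is a sketch under stated assumptions: documents are plain dictionaries with hypothetical keys (`id`, `content`, `source`, `author`, `owner`, `expiration_date`), duplicates are detected by exact content hash only, "stale" is read as "past its expiration date", and flagged documents are held back rather than promoted. None of this reflects a specific Microsoft API.

```python
import hashlib
from datetime import date


def silver_gate(bronze_docs: list[dict], today: date) -> dict[str, list[str]]:
    """Sort bronze documents into outcomes; only 'promote' moves on toward gold."""
    seen_hashes: set[str] = set()
    result: dict[str, list[str]] = {
        "promote": [], "duplicates": [], "quarantined": [],
        "flagged": [], "excluded": [],
    }
    for doc in bronze_docs:
        digest = hashlib.sha256(doc["content"].encode("utf-8")).hexdigest()
        expires = doc.get("expiration_date")
        if digest in seen_hashes:
            # Exact duplicate of something already seen in this batch
            result["duplicates"].append(doc["id"])
            continue
        seen_hashes.add(digest)
        if not doc.get("source") or not doc.get("author"):
            # Missing provenance: hold for human review instead of passing forward
            result["quarantined"].append(doc["id"])
        elif not doc.get("owner"):
            # No owner: flag so someone can claim or retire the document
            result["flagged"].append(doc["id"])
        elif expires is not None and expires < today:
            # Stale: tag for exclusion from the semantic index
            result["excluded"].append(doc["id"])
        else:
            result["promote"].append(doc["id"])
    return result
```

Real deduplication would likely need near-duplicate detection as well, since two copies of a policy often differ by a footer or a date stamp.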
754
00:29:22,800 --> 00:29:25,200
Gold is business ready and semantically enriched.
755
00:29:25,200 --> 00:29:27,280
This is the content that feeds the knowledge graph.
756
00:29:27,280 --> 00:29:30,880
It has consistent metadata across all five field families we covered.
757
00:29:30,880 --> 00:29:32,400
It has passed quality validation.
758
00:29:32,400 --> 00:29:35,040
It carries entity tags that map it to the ontology.
759
00:29:35,040 --> 00:29:37,280
It's the content the semantic index reasons over,
760
00:29:37,280 --> 00:29:40,640
and it's the content Copilot draws from when it generates a response.
761
00:29:40,640 --> 00:29:42,480
Only gold.
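The silver-layer gate described above can be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: the `BronzeRecord` fields (`source_system`, `effective_until`) and the four status strings are hypothetical names chosen for the example, and holding ownerless documents back from promotion is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BronzeRecord:
    doc_id: str
    owner: Optional[str] = None
    source_system: Optional[str] = None     # provenance field (hypothetical name)
    effective_until: Optional[date] = None  # hypothetical expiry field


def silver_gate(record: BronzeRecord, today: date) -> str:
    """Decide what happens to one bronze record at the silver layer."""
    if record.source_system is None:
        return "quarantine"   # missing provenance: hold for review
    if record.effective_until is not None and record.effective_until < today:
        return "exclude"      # stale: keep out of the semantic index
    if record.owner is None:
        return "flag"         # no owner: flagged, not promoted (assumption)
    return "promote"          # eligible for gold enrichment
```

Only records that come back as "promote" would move on to semantic enrichment; everything else stays out of the graph until someone resolves it.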
762
00:29:42,480 --> 00:29:46,080
Now, the operational challenge for hybrid teams distributed across time zones
763
00:29:46,080 --> 00:29:48,880
is keeping that pipeline current without running full rescans
764
00:29:48,880 --> 00:29:50,240
every time something changes.
765
00:29:50,240 --> 00:29:53,760
Full rescans are architecturally incompatible with scale.
766
00:29:53,760 --> 00:29:56,800
They're expensive. They introduce latency between content creation
767
00:29:56,800 --> 00:29:58,240
and index availability,
768
00:29:58,240 --> 00:30:03,120
and they create synchronization problems when content is changing in multiple regions simultaneously.
769
00:30:03,120 --> 00:30:05,680
The answer is delta queries and webhook notifications.
770
00:30:05,680 --> 00:30:08,800
Delta queries let your pipeline ask Microsoft Graph
771
00:30:08,800 --> 00:30:11,040
"What changed since the last time I checked?"
772
00:30:11,040 --> 00:30:13,280
rather than "Give me everything again."
773
00:30:13,280 --> 00:30:16,960
Webhook notifications push change events to your pipeline the moment they occur
774
00:30:16,960 --> 00:30:18,800
rather than waiting for a scheduled poll.
775
00:30:18,800 --> 00:30:22,560
Together, they let you maintain a continuously current semantic cache
776
00:30:22,560 --> 00:30:25,120
without the overhead of periodic full scans.
777
00:30:25,120 --> 00:30:27,920
If you're building ingestion pipelines for a global deployment
778
00:30:27,920 --> 00:30:29,440
and you're still polling on a schedule,
779
00:30:29,440 --> 00:30:32,960
you're building for a world that doesn't match your actual content velocity.
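As a rough sketch of the delta-query side, the loop below follows `@odata.nextLink` pages until Microsoft Graph returns an `@odata.deltaLink`, which is the URL to store and call on the next sync. The `fetch` callable is an assumption standing in for whatever authenticated HTTP client the pipeline uses, and the in-memory `pages` dictionary is fake data so the example runs without a tenant.

```python
from typing import Callable, Optional


def sync_changes(fetch: Callable[[str], dict], start_url: str) -> tuple[list[dict], Optional[str]]:
    """Collect changed items and the delta link to use for the next sync."""
    changes: list[dict] = []
    delta_link: Optional[str] = None
    url: Optional[str] = start_url
    while url:
        page = fetch(url)                        # parsed JSON body of one response
        changes.extend(page.get("value", []))
        delta_link = page.get("@odata.deltaLink", delta_link)
        url = page.get("@odata.nextLink")        # absent on the last page
    return changes, delta_link


# Fake two-page response used only to exercise the loop.
pages = {
    "start": {"value": [{"id": "1"}], "@odata.nextLink": "page2"},
    "page2": {"value": [{"id": "2"}], "@odata.deltaLink": "delta-token-url"},
}
changes, next_sync_url = sync_changes(pages.__getitem__, "start")
```

In a real pipeline the first `start_url` would be a delta endpoint such as a drive's `/root/delta`, and a webhook notification would simply trigger another call to `sync_changes` with the stored delta link.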
780
00:30:32,960 --> 00:30:34,880
Chunking strategy is the part of ingestion
781
00:30:34,880 --> 00:30:37,120
that most architectures underinvest in,
782
00:30:37,120 --> 00:30:40,320
and it directly affects retrieval quality in ways that don't surface
783
00:30:40,320 --> 00:30:42,640
until you're debugging poor answers in production.
784
00:30:42,640 --> 00:30:45,040
Fixed-size chunking,
785
00:30:45,040 --> 00:30:48,560
splitting documents into blocks of a defined character or token count,
786
00:30:48,560 --> 00:30:51,920
is fast to implement and genuinely problematic in practice.
787
00:30:51,920 --> 00:30:56,080
A fixed-size chunk that splits mid-paragraph, mid-procedure, or mid-argument
788
00:30:56,080 --> 00:30:58,560
strips the context that makes a passage interpretable.
789
00:30:58,560 --> 00:31:01,840
The model retrieves the chunk but can't reason over it coherently
790
00:31:01,840 --> 00:31:04,800
because the chunk doesn't represent a coherent unit of meaning.
791
00:31:04,800 --> 00:31:06,720
Chunk by semantic units instead.
792
00:31:06,720 --> 00:31:07,520
Sections.
793
00:31:07,520 --> 00:31:08,480
Procedures.
794
00:31:08,480 --> 00:31:10,480
API endpoint definitions.
795
00:31:10,480 --> 00:31:11,840
Policy clauses.
796
00:31:11,840 --> 00:31:14,000
Each chunk should represent a complete thought,
797
00:31:14,000 --> 00:31:16,640
and each chunk needs to carry local metadata:
798
00:31:16,640 --> 00:31:19,120
section heading, document type, version, effective date,
799
00:31:19,120 --> 00:31:23,040
so that when retrieval surfaces it, the re-ranking layer has the signals it needs
800
00:31:23,040 --> 00:31:27,360
to evaluate its authority and recency without having to go back to the parent document.
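One way to picture chunking by semantic unit is the sketch below. It assumes the document has already been converted to plain text where section headings start with `#`; that convention, the `Chunk` field names, and the `(preamble)` label are all illustrative assumptions rather than anything a specific product requires.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str
    text: str
    doc_type: str
    version: str
    effective_date: str


def chunk_by_section(body: str, doc_type: str, version: str, effective_date: str) -> list[Chunk]:
    """Split on heading lines so each chunk is one complete section."""
    chunks: list[Chunk] = []
    heading, buffer = "(preamble)", []

    def emit() -> None:
        text = "\n".join(buffer).strip()
        if text:  # skip empty sections
            chunks.append(Chunk(heading, text, doc_type, version, effective_date))

    for line in body.splitlines():
        if line.startswith("#"):
            emit()                                   # close the previous section
            heading, buffer = line.lstrip("#").strip(), []
        else:
            buffer.append(line)
    emit()                                           # close the final section
    return chunks


doc = "# Scope\nApplies to all staff.\n# Procedure\nStep 1.\nStep 2.\n"
chunks = chunk_by_section(doc, "policy", "2.1", "2025-01-01")
```

Every chunk carries its own heading, document type, version, and effective date, so a re-ranker can judge authority and recency without fetching the parent document.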
801
00:31:27,360 --> 00:31:31,200
The transition from manual tagging to automated intelligent categorization
802
00:31:31,200 --> 00:31:34,400
is where Microsoft Syntex taxonomy tagging becomes relevant.
803
00:31:34,400 --> 00:31:35,920
The capability is genuine.
804
00:31:35,920 --> 00:31:39,040
Syntex can automatically assign terms from your managed metadata
805
00:31:39,040 --> 00:31:41,200
term store to documents as they're ingested,
806
00:31:41,200 --> 00:31:43,760
without requiring a custom model or manual intervention.
807
00:31:43,760 --> 00:31:46,000
But the qualification matters.
808
00:31:46,000 --> 00:31:48,080
It works when the term store is governed.
809
00:31:48,080 --> 00:31:51,200
When the taxonomy is clean, consistent, and actively maintained,
810
00:31:51,200 --> 00:31:54,400
Syntex amplifies that quality across your entire library.
811
00:31:54,400 --> 00:31:58,240
When the taxonomy is inconsistent or overgrown with redundant terms,
812
00:31:58,240 --> 00:32:01,040
Syntex amplifies the inconsistency at the same scale.
813
00:32:01,040 --> 00:32:03,280
The automation doesn't fix a broken taxonomy.
814
00:32:03,280 --> 00:32:04,400
It applies it faster.
815
00:32:04,400 --> 00:32:06,960
Get the Graph API mechanics right too.
816
00:32:06,960 --> 00:32:10,000
Use $select to retrieve only the properties your pipeline needs.
817
00:32:10,000 --> 00:32:11,840
Not every field on every object.
818
00:32:11,840 --> 00:32:13,600
Handle pagination with @odata.nextLink,
819
00:32:13,600 --> 00:32:17,360
or you'll silently miss content on large result sets.
820
00:32:17,360 --> 00:32:21,040
Use JSON batching to reduce network round trips when you're processing
821
00:32:21,040 --> 00:32:22,960
multiple sites or libraries in parallel.
822
00:32:22,960 --> 00:32:24,720
These aren't optimizations you add later.
823
00:32:24,720 --> 00:32:27,360
They're the difference between a pipeline that holds at enterprise scale
824
00:32:27,360 --> 00:32:28,800
and one that throttles and fails.
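To make the `$select` and JSON batching points concrete, here is a small sketch that only builds the request bodies for the Graph `$batch` endpoint; it does not send anything. The site IDs are made up, and the helper name `build_batches` is hypothetical. The default of 20 reflects the documented per-batch request limit.

```python
def build_batches(site_ids, select=("id", "displayName"), batch_size=20):
    """Build JSON batch bodies that list each site's libraries with $select."""
    fields = ",".join(select)
    requests = [
        {
            "id": str(n),                                    # unique within the batch
            "method": "GET",
            "url": f"/sites/{site_id}/lists?$select={fields}",
        }
        for n, site_id in enumerate(site_ids, start=1)
    ]
    # Split into groups no larger than the batch limit.
    return [
        {"requests": requests[i:i + batch_size]}
        for i in range(0, len(requests), batch_size)
    ]


batches = build_batches([f"site-{n}" for n in range(45)])
```

Each dictionary in `batches` would be posted as the body of one `$batch` call. Every sub-response still needs its own status check and its own `@odata.nextLink` handling, since batching reduces round trips but not pagination.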
825
00:32:28,800 --> 00:32:30,480
Ingestion handles the flow.
826
00:32:30,480 --> 00:32:33,360
What the data flows into has to be designed with equal care.
827
00:32:33,360 --> 00:32:36,560
And that starts with a deliberate decision about the ontology.
828
00:32:36,560 --> 00:32:38,240
Designing the enterprise ontology.
829
00:32:38,240 --> 00:32:40,480
People use ontology and taxonomy interchangeably,
830
00:32:40,480 --> 00:32:42,240
and it causes real architectural damage.
831
00:32:42,240 --> 00:32:45,120
A taxonomy classifies. It puts things in buckets.
832
00:32:45,120 --> 00:32:46,640
This document is a policy.
833
00:32:46,640 --> 00:32:48,800
This project is in the EMEA region.
834
00:32:48,800 --> 00:32:50,480
This person is in finance.
835
00:32:50,480 --> 00:32:52,320
Taxonomies are useful.
836
00:32:52,320 --> 00:32:55,520
We spent the last two sections explaining exactly why they matter,
837
00:32:55,520 --> 00:32:56,640
but they only sort.
838
00:32:56,640 --> 00:32:58,320
An ontology models relationships.
839
00:32:58,320 --> 00:33:00,320
It answers not just what something is,
840
00:33:00,320 --> 00:33:02,160
but how it connects to everything else.
841
00:33:02,160 --> 00:33:04,880
That distinction determines whether your knowledge graph can reason
842
00:33:04,880 --> 00:33:06,080
or just organize.
843
00:33:06,080 --> 00:33:09,280
Building an enterprise ontology for an M365 environment
844
00:33:09,280 --> 00:33:11,440
starts with identifying the core entities.
845
00:33:11,440 --> 00:33:13,760
The nodes that everything else connects to.
846
00:33:13,760 --> 00:33:16,560
For most organizations, that set is smaller than they expect.
847
00:33:16,560 --> 00:33:20,560
Customers, products, projects, policies, processes,
848
00:33:20,560 --> 00:33:22,320
assets, people.
849
00:33:22,320 --> 00:33:25,360
Seven entity types cover the vast majority of what knowledge
850
00:33:25,360 --> 00:33:27,120
workers actually query about.
851
00:33:27,120 --> 00:33:29,920
Everything else, documents, tasks, decisions, meetings,
852
00:33:29,920 --> 00:33:32,320
emails, is either a subtype or an artifact
853
00:33:32,320 --> 00:33:34,800
that connects to those core entities rather than standing
854
00:33:34,800 --> 00:33:36,560
independently in the graph.
855
00:33:36,560 --> 00:33:39,120
The edges are where the architecture earns its value.
856
00:33:39,120 --> 00:33:40,440
Owned by is obvious.
857
00:33:40,440 --> 00:33:42,240
Governed by is where it gets interesting,
858
00:33:42,240 --> 00:33:44,960
connecting a document to the policy that controls its life cycle
859
00:33:44,960 --> 00:33:46,880
or a project to the compliance obligation
860
00:33:46,880 --> 00:33:49,040
that shapes its deliverables.
861
00:33:49,040 --> 00:33:51,280
Depends on lets you traverse system dependencies
862
00:33:51,280 --> 00:33:52,800
during incident response.
863
00:33:52,800 --> 00:33:55,600
Supersedes lets retrieval understand version lineage
864
00:33:55,600 --> 00:33:58,000
without having to compare timestamps manually.
865
00:33:58,000 --> 00:34:00,480
Applies to connects a policy to the business unit,
866
00:34:00,480 --> 00:34:02,200
region, or role it covers.
867
00:34:02,200 --> 00:34:03,920
These relationships are what allow Copilot
868
00:34:03,920 --> 00:34:06,360
to traverse context rather than retrieve documents.
869
00:34:06,360 --> 00:34:09,360
Without explicit edges, you have a well-organized filing system.
870
00:34:09,360 --> 00:34:11,160
With them, you have something that can reason.
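The difference between classifying and traversing can be shown with a toy in-memory graph. This is only an illustration of explicit, typed edges: the `Ontology` class, the snake_case relation labels, and the sample nodes are all invented for the example and do not represent how Copilot or Microsoft Graph stores relationships.

```python
from collections import defaultdict


class Ontology:
    """Tiny typed-edge store: (source, relation) -> list of targets."""

    def __init__(self):
        self._edges = defaultdict(list)

    def relate(self, source: str, relation: str, target: str) -> None:
        self._edges[(source, relation)].append(target)

    def follow(self, start: str, *relations: str) -> list[str]:
        """Walk a chain of relations from one node and return where it ends."""
        frontier = [start]
        for relation in relations:
            frontier = [
                target
                for node in frontier
                for target in self._edges.get((node, relation), [])
            ]
        return frontier


graph = Ontology()
graph.relate("Retention Schedule v2", "supersedes", "Retention Schedule v1")
graph.relate("Retention Schedule v2", "governed_by", "Records Policy")
graph.relate("Records Policy", "applies_to", "EMEA")

regions = graph.follow("Retention Schedule v2", "governed_by", "applies_to")
```

A taxonomy could say the schedule is a policy document; only the edges can answer which region the governing policy covers, which is what `regions` holds here.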
871
00:34:11,160 --> 00:34:13,720
The most common ontology failure isn't poor design.
872
00:34:13,720 --> 00:34:15,560
It's overambitious design.
873
00:34:15,560 --> 00:34:18,800
Teams spend months trying to model every conceivable concept,
874
00:34:18,800 --> 00:34:21,440
every document subtype, every workflow state,
875
00:34:21,440 --> 00:34:23,200
every organizational relationship
876
00:34:23,200 --> 00:34:25,920
before a single production use case has been validated.
877
00:34:25,920 --> 00:34:27,880
The ontology becomes a monument to thoroughness
878
00:34:27,880 --> 00:34:29,240
and a barrier to delivery.
879
00:34:29,240 --> 00:34:32,320
Nobody ships anything because everything needs to be modeled first
880
00:34:32,320 --> 00:34:34,200
and everything always leads to something else
881
00:34:34,200 --> 00:34:35,720
that also needs to be modeled.
882
00:34:35,720 --> 00:34:38,200
The fix is the minimal viable ontology.
883
00:34:38,200 --> 00:34:40,960
Before you model anything, identify one high-value use case,
884
00:34:40,960 --> 00:34:42,360
not a category of use cases.
885
00:34:42,360 --> 00:34:45,400
One specific problem a specific team needs solved.
886
00:34:45,400 --> 00:34:47,320
Then model only the entities and relationships
887
00:34:47,320 --> 00:34:49,200
required to support that problem.
888
00:34:49,200 --> 00:34:52,000
If the use case is "account teams need to find everything
889
00:34:52,000 --> 00:34:54,560
relevant to a customer renewal before a QBR,"
890
00:34:54,560 --> 00:34:56,600
your initial ontology needs customer entities,
891
00:34:56,600 --> 00:34:58,720
project entities, document entities,
892
00:34:58,720 --> 00:35:00,240
and the relationships connecting them.
893
00:35:00,240 --> 00:35:02,080
You don't need to model HR policies
894
00:35:02,080 --> 00:35:04,200
or engineering architecture diagrams.
895
00:35:04,200 --> 00:35:05,760
Build for the problem in front of you,
896
00:35:05,760 --> 00:35:08,200
ship it, learn from actual query patterns,
897
00:35:08,200 --> 00:35:10,680
and let usage data drive ontology evolution
898
00:35:10,680 --> 00:35:12,440
rather than theoretical completeness.
899
00:35:12,440 --> 00:35:14,520
There's a technical discipline that matters here
900
00:35:14,520 --> 00:35:17,360
and often gets skipped in the rush to get the graph running.
901
00:35:17,360 --> 00:35:18,960
The ontology needs to be humanized.
902
00:35:18,960 --> 00:35:21,120
That term comes from Microsoft's own guidance
903
00:35:21,120 --> 00:35:24,440
on making Fabric semantic models interpretable by AI,
904
00:35:24,440 --> 00:35:26,880
and it applies equally to knowledge graph design.
905
00:35:26,880 --> 00:35:29,120
Clear, business-friendly names for entity types,
906
00:35:29,120 --> 00:35:31,840
relationship labels, site structures, library names,
907
00:35:31,840 --> 00:35:33,520
and content type definitions.
908
00:35:33,520 --> 00:35:35,840
Not because users read the ontology directly
909
00:35:35,840 --> 00:35:37,880
but because Copilot reads it constantly.
910
00:35:37,880 --> 00:35:39,240
When someone asks a question,
911
00:35:39,240 --> 00:35:42,120
the model uses the semantic structure of your ontology,
912
00:35:42,120 --> 00:35:45,120
the names you gave things, the relationships you defined,
913
00:35:45,120 --> 00:35:48,240
to interpret intent and generate responses.
914
00:35:48,240 --> 00:35:50,160
An ontology full of internal system codes,
915
00:35:50,160 --> 00:35:53,120
legacy naming conventions and IT-centric abbreviations
916
00:35:53,120 --> 00:35:55,120
produces answers that don't map clearly
917
00:35:55,120 --> 00:35:56,480
to how the business talks.
918
00:35:56,480 --> 00:35:58,360
An ontology that uses the same vocabulary
919
00:35:58,360 --> 00:36:00,520
your organization uses to describe its work
920
00:36:00,520 --> 00:36:02,280
produces answers that feel native.
921
00:36:02,280 --> 00:36:03,680
This isn't a cosmetic concern.
922
00:36:03,680 --> 00:36:05,400
It's a retrieval precision concern.
923
00:36:05,400 --> 00:36:07,160
The closer the ontology's language is
924
00:36:07,160 --> 00:36:08,960
to the language of actual queries,
925
00:36:08,960 --> 00:36:10,520
the better the disambiguation,
926
00:36:10,520 --> 00:36:12,080
the better the context assembly,
927
00:36:12,080 --> 00:36:13,800
and the better the final answer.
928
00:36:13,800 --> 00:36:15,600
Clean entities and well-defined relationships
929
00:36:15,600 --> 00:36:16,880
give you a trustworthy graph.
930
00:36:16,880 --> 00:36:18,760
Keeping that graph trustworthy as data flows
931
00:36:18,760 --> 00:36:20,000
through it at enterprise scale
932
00:36:20,000 --> 00:36:22,440
is the entity resolution problem we address next.
933
00:36:22,440 --> 00:36:24,360
Entity resolution and graph quality.
934
00:36:24,360 --> 00:36:26,680
The moment data starts flowing into your knowledge graph
935
00:36:26,680 --> 00:36:27,720
at enterprise scale,
936
00:36:27,720 --> 00:36:30,160
you encounter a problem that pure metadata work
937
00:36:30,160 --> 00:36:31,600
can't solve on its own.
938
00:36:31,600 --> 00:36:33,600
Different systems use different identifiers
939
00:36:33,600 --> 00:36:35,240
for the same real-world thing.
940
00:36:35,240 --> 00:36:37,400
Salesforce knows a client as ACMECorp.
941
00:36:37,400 --> 00:36:40,520
SharePoint has folders labeled ACME and ACMECorp.
942
00:36:40,520 --> 00:36:43,200
Teams channels reference the ACME account.
943
00:36:43,200 --> 00:36:45,680
Purview classified three separate document sets
944
00:36:45,680 --> 00:36:48,040
under three variations of the same name.
945
00:36:48,040 --> 00:36:49,480
Without a mechanism to recognize
946
00:36:49,480 --> 00:36:51,720
that all of these refer to the same entity,
947
00:36:51,720 --> 00:36:53,160
your graph fragments.
948
00:36:53,160 --> 00:36:54,520
Project Phoenix doesn't exist
949
00:36:54,520 --> 00:36:56,800
as a single connected node you can traverse.
950
00:36:56,800 --> 00:36:59,360
It exists as 17 separate, unconnected records,
951
00:36:59,360 --> 00:37:01,440
each containing partial context,
952
00:37:01,440 --> 00:37:03,320
none containing the full picture.
953
00:37:03,320 --> 00:37:05,400
Entity resolution is the process of determining
954
00:37:05,400 --> 00:37:08,040
when different records across different systems
955
00:37:08,040 --> 00:37:10,280
refer to the same real-world entity
956
00:37:10,280 --> 00:37:14,360
and merging them into a single canonical representation.
957
00:37:14,360 --> 00:37:16,560
It's not deduplication in the traditional sense,
958
00:37:16,560 --> 00:37:18,400
it's not just catching exact string matches.
959
00:37:18,400 --> 00:37:21,080
It uses contextual signals (shared relationships,
960
00:37:21,080 --> 00:37:23,600
overlapping attributes, co-occurring references)
961
00:37:23,600 --> 00:37:26,160
to make probabilistic determinations about identity
962
00:37:26,160 --> 00:37:28,560
across heterogeneous sources.
963
00:37:28,560 --> 00:37:30,920
Done well, it's what turns a collection of related fragments
964
00:37:30,920 --> 00:37:32,600
into a coherent node.
965
00:37:32,600 --> 00:37:35,000
Recent research on LLM-generated knowledge graphs
966
00:37:35,000 --> 00:37:36,600
makes the value concrete.
967
00:37:36,600 --> 00:37:38,240
Applying entity resolution to graphs
968
00:37:38,240 --> 00:37:40,000
built from unstructured content eliminated
969
00:37:40,000 --> 00:37:42,440
roughly 40% of entities and relationships,
970
00:37:42,440 --> 00:37:44,560
not by discarding useful information,
971
00:37:44,560 --> 00:37:46,760
but by collapsing redundant representations
972
00:37:46,760 --> 00:37:50,040
of the same thing into single canonical nodes.
973
00:37:50,040 --> 00:37:52,400
And despite that reduction in raw graph size,
974
00:37:52,400 --> 00:37:54,800
question answering performance improved consistently
975
00:37:54,800 --> 00:37:56,360
across multiple RAG methods.
976
00:37:56,360 --> 00:37:58,960
Smaller, cleaner graph, better answers.
977
00:37:58,960 --> 00:38:01,040
The relationship between completeness and quality
978
00:38:01,040 --> 00:38:02,760
isn't linear. It runs the other way.
979
00:38:02,760 --> 00:38:05,400
There's a precision-recall trade-off in entity resolution
980
00:38:05,400 --> 00:38:07,640
that matters more than most architects appreciate
981
00:38:07,640 --> 00:38:09,480
until they've hit both failure modes.
982
00:38:09,480 --> 00:38:12,600
Over-optimizing for recall, capturing every possible match,
983
00:38:12,600 --> 00:38:15,680
creates what practitioners call monster entities.
984
00:38:15,680 --> 00:38:18,000
Two Project Phoenix initiatives from different business
985
00:38:18,000 --> 00:38:21,040
units, running concurrently, get merged into a single node
986
00:38:21,040 --> 00:38:22,600
because both contain the same name
987
00:38:22,600 --> 00:38:24,320
and some overlapping team members.
988
00:38:24,320 --> 00:38:26,680
Now every query about either project pulls context
989
00:38:26,680 --> 00:38:29,760
from both. Centrality signals distort. Downstream reasoning
990
00:38:29,760 --> 00:38:30,480
breaks.
991
00:38:30,480 --> 00:38:32,000
The graph confidently conflates things
992
00:38:32,000 --> 00:38:33,240
that should be distinct.
993
00:38:33,240 --> 00:38:36,280
Over-optimizing for precision produces the opposite failure.
994
00:38:36,280 --> 00:38:38,000
Legitimate variants of the same entity
995
00:38:38,000 --> 00:38:40,400
stay fragmented because the resolution threshold
996
00:38:40,400 --> 00:38:41,560
is too conservative.
997
00:38:41,560 --> 00:38:44,400
The problem you started with, 17 unconnected nodes,
998
00:38:44,400 --> 00:38:47,040
persists, just with fewer obvious duplicates.
999
00:38:47,040 --> 00:38:48,960
The metrics that actually matter for evaluating
1000
00:38:48,960 --> 00:38:52,200
your resolution quality aren't captured by any single number.
1001
00:38:52,200 --> 00:38:53,680
Precision and recall on matches
1002
00:38:53,680 --> 00:38:54,960
establish the baseline.
1003
00:38:54,960 --> 00:38:56,520
Cluster size distributions tell you
1004
00:38:56,520 --> 00:38:58,520
whether you're creating monster entities;
1005
00:38:58,520 --> 00:39:01,080
an unusually large cluster warrants investigation
1006
00:39:01,080 --> 00:39:03,640
before it propagates through downstream retrieval.
1007
00:39:03,640 --> 00:39:06,280
And downstream task performance is the honest measure.
1008
00:39:06,280 --> 00:39:08,920
Does answer quality improve when you apply resolution?
1009
00:39:08,920 --> 00:39:11,560
Does it degrade in ways that suggest over-merging?
1010
00:39:11,560 --> 00:39:13,080
Those behavioral signals tell you more
1011
00:39:13,080 --> 00:39:15,280
than any matching metric in isolation.
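The threshold trade-off and the cluster-size check can be seen in a deliberately small sketch. Everything here is an assumption made for illustration: the similarity score is an equal blend of name-token overlap and shared-people overlap, the stop-word list is arbitrary, and the four records are invented. Production resolution would use richer signals than this.

```python
import re
from itertools import combinations

STOP_WORDS = {"corp", "corporation", "inc", "the", "account"}  # illustrative only


def name_tokens(name: str) -> set:
    return {t for t in re.findall(r"[a-z0-9]+", name.lower()) if t not in STOP_WORDS}


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a | b) else 0.0


def similarity(r1: dict, r2: dict) -> float:
    # Blend name overlap with a contextual signal (shared people).
    names = jaccard(name_tokens(r1["name"]), name_tokens(r2["name"]))
    people = jaccard(r1["people"], r2["people"])
    return 0.5 * names + 0.5 * people


def resolve(records: list, threshold: float) -> list:
    """Cluster records whose pairwise similarity meets the threshold."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)       # merge the two clusters

    clusters: dict = {}
    for i, record in enumerate(records):
        clusters.setdefault(find(i), []).append(record["name"])
    return sorted(clusters.values(), key=len, reverse=True)  # largest first


records = [
    {"name": "Acme Corp", "people": {"ana", "raj"}},
    {"name": "ACME", "people": {"ana", "raj", "li"}},
    {"name": "The Acme account", "people": {"raj"}},
    {"name": "Project Phoenix", "people": {"ana"}},
]
balanced = resolve(records, threshold=0.6)
too_loose = resolve(records, threshold=0.2)
too_strict = resolve(records, threshold=0.9)
```

With these records, the middle threshold merges the three Acme variants and leaves Project Phoenix alone, the loose threshold collapses everything into one monster entity because of a single shared person, and the strict threshold leaves all four fragmented. Watching the size of the first cluster in the returned list is the cluster-size signal the discussion refers to.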
1012
00:39:15,280 --> 00:39:17,880
For M365 environments specifically,
1013
00:39:17,880 --> 00:39:19,200
the entity resolution challenge
1014
00:39:19,200 --> 00:39:21,400
isn't primarily a data science problem.
1015
00:39:21,400 --> 00:39:22,880
It's a naming discipline problem.
1016
00:39:22,880 --> 00:39:25,800
Ambiguous SharePoint site names, duplicate document libraries
1017
00:39:25,800 --> 00:39:28,720
created when different teams solve the same problem independently,
1018
00:39:28,720 --> 00:39:31,760
project spaces that reuse names across budget cycles.
1019
00:39:31,760 --> 00:39:33,040
These are organizational habits
1020
00:39:33,040 --> 00:39:35,080
that create resolution complexity.
1021
00:39:35,080 --> 00:39:37,320
You can build sophisticated matching algorithms,
1022
00:39:37,320 --> 00:39:39,840
but the more you can standardize naming conventions
1023
00:39:39,840 --> 00:39:42,080
and enforce controlled vocabularies upstream
1024
00:39:42,080 --> 00:39:44,040
in the ingestion pipeline, the less work
1025
00:39:44,040 --> 00:39:46,560
the resolution layer has to do.
1026
00:39:46,560 --> 00:39:49,320
Quality controls that belong in every implementation.
1027
00:39:49,320 --> 00:39:52,320
Automated duplicate detection that runs continuously
1028
00:39:52,320 --> 00:39:54,280
rather than as a one-time cleanup,
1029
00:39:54,280 --> 00:39:56,760
stale content flags that surface documents approaching
1030
00:39:56,760 --> 00:39:59,320
their review dates before they become retrieval noise
1031
00:39:59,320 --> 00:40:02,160
and missing metadata alerts that catch new content entering
1032
00:40:02,160 --> 00:40:04,800
the graph without the provenance and authority fields
1033
00:40:04,800 --> 00:40:06,720
required for reliable retrieval.
1034
00:40:06,720 --> 00:40:09,480
These aren't monitoring dashboards you check occasionally.
1035
00:40:09,480 --> 00:40:11,200
They're operational loops that keep the graph
1036
00:40:11,200 --> 00:40:13,720
trustworthy as the content estate evolves.
1037
00:40:13,720 --> 00:40:15,600
Clean entities and well-defined relationships
1038
00:40:15,600 --> 00:40:17,520
give you a graph worth querying.
1039
00:40:17,520 --> 00:40:19,080
How you expose that graph securely
1040
00:40:19,080 --> 00:40:20,640
to a globally distributed workforce
1041
00:40:20,640 --> 00:40:22,880
is the architecture question we turn to next.
1042
00:40:22,880 --> 00:40:25,280
Security architecture for global hybrid teams.
1043
00:40:25,280 --> 00:40:27,320
The security of your knowledge graph is exactly
1044
00:40:27,320 --> 00:40:29,560
as strong as the Entra ID design underneath it.
1045
00:40:29,560 --> 00:40:32,320
That's not a general principle about identity being important.
1046
00:40:32,320 --> 00:40:34,160
It's a specific architectural statement
1047
00:40:34,160 --> 00:40:36,000
about where the control plane lives.
1048
00:40:36,000 --> 00:40:38,040
Every retrieval decision Copilot makes,
1049
00:40:38,040 --> 00:40:39,880
every permission boundary it respects,
1050
00:40:39,880 --> 00:40:42,520
every access scope an agent operates within,
1051
00:40:42,520 --> 00:40:44,160
all of it traces back to the identity
1052
00:40:44,160 --> 00:40:47,160
and policy structures you've built in Entra ID.
1053
00:40:47,160 --> 00:40:48,840
If those structures are well designed,
1054
00:40:48,840 --> 00:40:50,160
the graph inherits that precision.
1055
00:40:50,160 --> 00:40:52,520
If they're not, no amount of sensitivity labeling
1056
00:40:52,520 --> 00:40:54,320
or connector governance compensates.
1057
00:40:54,320 --> 00:40:56,960
Groups, Conditional Access, and Privileged Identity Management
1058
00:40:56,960 --> 00:40:59,280
are the three Entra mechanisms that do most of the work
1059
00:40:59,280 --> 00:41:01,240
in a knowledge graph security architecture.
1060
00:41:01,240 --> 00:41:03,040
Groups are the unit of access control.
1061
00:41:03,040 --> 00:41:05,120
Not individual users. Groups. Every workspace,
1062
00:41:05,120 --> 00:41:07,240
every library, every sensitive content set
1063
00:41:07,240 --> 00:41:09,760
should have its permissions managed through a named security group
1064
00:41:09,760 --> 00:41:12,000
rather than through direct user assignment.
1065
00:41:12,000 --> 00:41:13,800
This isn't just an operational convenience,
1066
00:41:13,800 --> 00:41:16,000
it's what makes access reviewable at scale.
1067
00:41:16,000 --> 00:41:18,000
When permissions are attached to individuals,
1068
00:41:18,000 --> 00:41:19,920
auditing who has access to what requires
1069
00:41:19,920 --> 00:41:22,440
inspecting every item across every system.
1070
00:41:22,440 --> 00:41:23,840
When permissions are attached to groups,
1071
00:41:23,840 --> 00:41:25,320
you audit the group memberships
1072
00:41:25,320 --> 00:41:27,360
and the access picture becomes readable.
1073
00:41:27,360 --> 00:41:29,280
Dynamic groups extend this further.
1074
00:41:29,280 --> 00:41:31,840
When group membership is driven by Entra ID attributes,
1075
00:41:31,840 --> 00:41:34,080
department, job title, region, cost center,
1076
00:41:34,080 --> 00:41:37,000
access follows organizational changes automatically.
1077
00:41:37,000 --> 00:41:39,640
A team member who moves from EMEA sales to APAC sales
1078
00:41:39,640 --> 00:41:41,400
stops seeing EMEA-scoped content
1079
00:41:41,400 --> 00:41:43,280
and starts seeing APAC-scoped content
1080
00:41:43,280 --> 00:41:45,680
without anyone manually updating a permission list.
1081
00:41:45,680 --> 00:41:48,720
For global hybrid teams where org structure shifts frequently,
1082
00:41:48,720 --> 00:41:50,600
dynamic groups reduce the permission drift
1083
00:41:50,600 --> 00:41:53,160
that accumulates when access provisioning relies on humans
1084
00:41:53,160 --> 00:41:54,680
remembering to update it.
1085
00:41:54,680 --> 00:41:56,280
Conditional Access is the outer ring.
1086
00:41:56,280 --> 00:41:58,840
It controls whether a user can reach the semantic layer
1087
00:41:58,840 --> 00:42:00,960
at all under a given set of conditions.
1088
00:42:00,960 --> 00:42:02,840
Require MFA before accessing sites
1089
00:42:02,840 --> 00:42:04,440
that contain M&A documents.
1090
00:42:04,440 --> 00:42:07,680
Restrict access to HR content to managed, compliant devices.
1091
00:42:07,680 --> 00:42:09,840
Block access from high-risk sign-in contexts
1092
00:42:09,840 --> 00:42:12,280
entirely for specific high-sensitivity workspaces.
1093
00:42:12,280 --> 00:42:14,600
These policies don't operate at the document level,
1094
00:42:14,600 --> 00:42:16,080
they operate at the session level
1095
00:42:16,080 --> 00:42:17,760
before retrieval ever runs.
1096
00:42:17,760 --> 00:42:19,440
They're the boundary condition that determines
1097
00:42:19,440 --> 00:42:20,960
whether Copilot can even attempt
1098
00:42:20,960 --> 00:42:23,240
to surface content from a protected scope.
1099
00:42:23,240 --> 00:42:24,880
For the highest sensitivity workloads,
1100
00:42:24,880 --> 00:42:27,920
Privileged Identity Management introduces just-in-time access.
1101
00:42:27,920 --> 00:42:30,040
Instead of permanently granting a user membership
1102
00:42:30,040 --> 00:42:32,760
in a group that controls access to a sensitive project,
1103
00:42:32,760 --> 00:42:35,000
PIM requires them to activate that membership
1104
00:42:35,000 --> 00:42:37,160
for a specific approved time window.
1105
00:42:37,160 --> 00:42:39,600
Once the window closes, the access expires.
1106
00:42:39,600 --> 00:42:42,320
Copilot's effective access scope contracts with it.
1107
00:42:42,320 --> 00:42:44,280
This sharply limits what a compromised account
1108
00:42:44,280 --> 00:42:46,320
or a misdirected prompt can reach.
1109
00:42:46,320 --> 00:42:49,000
The blast radius of any query is bounded
1110
00:42:49,000 --> 00:42:51,320
by what the identity can access right now,
1111
00:42:51,320 --> 00:42:53,880
not what it was granted access to six months ago
1112
00:42:53,880 --> 00:42:55,200
and never had revoked.
1113
00:42:55,200 --> 00:42:57,240
The principle of least privilege applies to agents
1114
00:42:57,240 --> 00:42:59,240
as forcefully as it applies to users.
1115
00:42:59,240 --> 00:43:01,520
And this is where many deployments create new exposure
1116
00:43:01,520 --> 00:43:03,280
while trying to extend capability.
1117
00:43:03,280 --> 00:43:05,600
Each Copilot agent, each declarative agent,
1118
00:43:05,600 --> 00:43:08,480
each Copilot Studio bot, each custom RAG application
1119
00:43:08,480 --> 00:43:10,360
should operate under its own service identity
1120
00:43:10,360 --> 00:43:13,520
with its own permission scope. Not tenant-wide Graph access.
1121
00:43:13,520 --> 00:43:15,760
Scoped access to the specific sites, libraries,
1122
00:43:15,760 --> 00:43:18,800
or data sources it legitimately needs to serve its purpose.
1123
00:43:18,800 --> 00:43:21,000
An agent built to answer HR policy questions
1124
00:43:21,000 --> 00:43:23,240
doesn't need access to the finance data estate.
1125
00:43:23,240 --> 00:43:25,080
An agent built to support account teams
1126
00:43:25,080 --> 00:43:27,480
doesn't need access to internal legal communications.
1127
00:43:27,480 --> 00:43:29,720
Scope the identity, scope the risk.
1128
00:43:29,720 --> 00:43:31,480
Data residency and sovereignty requirements
1129
00:43:31,480 --> 00:43:33,520
add a layer of complexity that's particularly
1130
00:43:33,520 --> 00:43:35,360
acute for global hybrid teams.
1131
00:43:35,360 --> 00:43:37,600
When employees in the EU query Copilot,
1132
00:43:37,600 --> 00:43:41,160
the content surfaced must respect data residency obligations.
1133
00:43:41,160 --> 00:43:42,920
When flex routing routes inference capacity
1134
00:43:42,920 --> 00:43:44,920
across regions under capacity pressure,
1135
00:43:44,920 --> 00:43:48,000
semantically rich governance metadata (geotags,
1136
00:43:48,000 --> 00:43:51,200
residency policy fields, sensitivity classifications)
1137
00:43:51,200 --> 00:43:54,440
needs to be attached to every content object Copilot may access
1138
00:43:54,440 --> 00:43:56,240
so that routing decisions can be evaluated
1139
00:43:56,240 --> 00:43:57,720
against those constraints.
1140
00:43:57,720 --> 00:44:00,320
Residency compliance isn't enforced by infrastructure alone.
1141
00:44:00,320 --> 00:44:01,840
It requires the metadata that makes
1142
00:44:01,840 --> 00:44:03,720
residency-aware retrieval possible.
1143
00:44:03,720 --> 00:44:05,720
Sensitivity labels are the last control
1144
00:44:05,720 --> 00:44:07,560
worth calling out specifically
1145
00:44:07,560 --> 00:44:09,920
and the item-level distinction matters here
1146
00:44:09,920 --> 00:44:11,760
just as it did during hardening.
1147
00:44:11,760 --> 00:44:13,880
Labels at the container level declare intent.
1148
00:44:13,880 --> 00:44:15,720
Labels at the item level enforce it.
1149
00:44:15,720 --> 00:44:19,120
Copilot's ability to see, summarize, or export a piece of content
1150
00:44:19,120 --> 00:44:22,080
is constrained by the label applied to that specific item.
1151
00:44:22,080 --> 00:44:23,880
Not the site it lives in.
1152
00:44:23,880 --> 00:44:26,000
Security defines who can access the graph.
1153
00:44:26,000 --> 00:44:28,440
Continuous governance is what keeps that access picture
1154
00:44:28,440 --> 00:44:30,520
accurate as the organization changes.
1155
00:44:30,520 --> 00:44:33,120
Continuous governance as an operational discipline.
1156
00:44:33,120 --> 00:44:35,440
Security defines who can access the graph.
1157
00:44:35,440 --> 00:44:36,960
But access boundaries erode.
1158
00:44:36,960 --> 00:44:39,760
Content accumulates, permissions drift, teams change,
1159
00:44:39,760 --> 00:44:42,040
projects close and their workspaces stay open.
1160
00:44:42,040 --> 00:44:43,520
The hardening work from section seven
1161
00:44:43,520 --> 00:44:45,360
doesn't hold indefinitely on its own.
1162
00:44:45,360 --> 00:44:47,920
It holds because someone is watching it, measuring it
1163
00:44:47,920 --> 00:44:49,600
and correcting it when it slips.
1164
00:44:49,600 --> 00:44:51,000
That's the governance discipline.
1165
00:44:51,000 --> 00:44:54,320
And in 2026 it runs as a continuous operational loop,
1166
00:44:54,320 --> 00:44:55,480
not a calendar event.
1167
00:44:55,480 --> 00:44:57,280
The organizations finding surprises
1168
00:44:57,280 --> 00:45:00,320
in their Copilot outputs six months after deployment
1169
00:45:00,320 --> 00:45:02,000
share a common characteristic.
1170
00:45:02,000 --> 00:45:03,520
They treated governance as a phase.
1171
00:45:03,520 --> 00:45:07,160
Phase one, cleanup. Phase two, enablement. Phase three, done.
1172
00:45:07,160 --> 00:45:08,600
The problem is that the content estate
1173
00:45:08,600 --> 00:45:11,160
doesn't stop changing when the project plan says it should.
1174
00:45:11,160 --> 00:45:12,800
New sites get created.
1175
00:45:12,800 --> 00:45:15,760
External collaborators get invited and never offboarded.
1176
00:45:15,760 --> 00:45:18,120
Someone grants broad access to accelerate a deadline
1177
00:45:18,120 --> 00:45:19,400
and never revisits it.
1178
00:45:19,400 --> 00:45:21,320
Each of those events is individually small.
1179
00:45:21,320 --> 00:45:23,680
Collectively, over months, they reconstitute
1180
00:45:23,680 --> 00:45:25,880
exactly the permissions sprawl that the initial cleanup
1181
00:45:25,880 --> 00:45:26,600
removed.
1182
00:45:26,600 --> 00:45:28,680
The SharePoint Data Access Governance Reports
1183
00:45:28,680 --> 00:45:30,960
in the SharePoint admin center
1184
00:45:30,960 --> 00:45:33,160
are the starting instrument for every governance cycle.
1185
00:45:33,160 --> 00:45:34,720
They surface overshared sites,
1186
00:45:34,720 --> 00:45:37,880
sites with anonymous access enabled, sites where everyone
1187
00:45:37,880 --> 00:45:41,200
except external users has permissions that go beyond read,
1188
00:45:41,200 --> 00:45:43,400
and a permission change history that shows exactly
1189
00:45:43,400 --> 00:45:46,160
when access expanded and by how much.
1190
00:45:46,160 --> 00:45:48,400
This isn't a forensic tool you reach for after a problem
1191
00:45:48,400 --> 00:45:49,200
surfaces.
1192
00:45:49,200 --> 00:45:51,880
It's the regular readout that tells you whether the architecture
1193
00:45:51,880 --> 00:45:53,920
you built is still the architecture you're running.
1194
00:45:53,920 --> 00:45:57,240
The SharePoint Admin agent extends this into automation.
1195
00:45:57,240 --> 00:45:59,360
It can identify ownerless sites at scale,
1196
00:45:59,360 --> 00:46:01,120
sites where the original owner has left
1197
00:46:01,120 --> 00:46:02,800
and no successor has been assigned,
1198
00:46:02,800 --> 00:46:04,600
as well as patterns of permissions sprawl
1199
00:46:04,600 --> 00:46:07,480
that would take a human admin days to surface manually.
1200
00:46:07,480 --> 00:46:09,840
The distinction worth making is between governance automation
1201
00:46:09,840 --> 00:46:11,200
and governance theater.
1202
00:46:11,200 --> 00:46:13,720
Governance theater produces reports that nobody acts on.
1203
00:46:13,720 --> 00:46:16,520
Governance automation produces alerts that trigger defined
1204
00:46:16,520 --> 00:46:19,920
workflows with clear owners and measurable resolution times.
1205
00:46:19,920 --> 00:46:21,000
The tool is the same.
1206
00:46:21,000 --> 00:46:22,400
The discipline is the difference.
1207
00:46:22,400 --> 00:46:25,120
Site access reviews need to be owned by the right people,
1208
00:46:25,120 --> 00:46:26,760
and that's rarely IT.
1209
00:46:26,760 --> 00:46:29,280
The person who knows whether the EMEA sales team still needs
1210
00:46:29,280 --> 00:46:31,280
edit access to a sensitive project workspace
1211
00:46:31,280 --> 00:46:34,280
is the EMEA sales lead, not the SharePoint administrator.
1212
00:46:34,280 --> 00:46:36,680
IT can surface the question. Only the business owner can
1213
00:46:36,680 --> 00:46:37,800
answer it accurately.
1214
00:46:37,800 --> 00:46:39,240
Embedding quarterly access reviews
1215
00:46:39,240 --> 00:46:42,440
into existing business rhythms, alongside budget reviews,
1216
00:46:42,440 --> 00:46:45,040
team retrospectives, project closeouts,
1217
00:46:45,040 --> 00:46:47,720
makes them a normal part of how the organization operates
1218
00:46:47,720 --> 00:46:51,600
rather than an IT compliance request that gets deprioritized.
1219
00:46:51,600 --> 00:46:53,880
Legacy SharePoint alerts were the previous mechanism
1220
00:46:53,880 --> 00:46:56,600
for monitoring change events, new external shares,
1221
00:46:56,600 --> 00:46:59,720
permission additions, content access by unexpected users.
1222
00:46:59,720 --> 00:47:01,960
Those alerts are retiring in mid-2026, and
1223
00:47:01,960 --> 00:47:04,560
replacing them with Power Automate-based governance flows
1224
00:47:04,560 --> 00:47:06,160
isn't just a migration task.
1225
00:47:06,160 --> 00:47:08,040
It's an upgrade to real-time visibility:
1226
00:47:08,040 --> 00:47:09,600
a flow that triggers the moment someone
1227
00:47:09,600 --> 00:47:12,080
adds a broad access group to a sensitive site
1228
00:47:12,080 --> 00:47:14,400
and routes an immediate notification to the data owner
1229
00:47:14,400 --> 00:47:17,800
with a one-click review action is categorically more useful
1230
00:47:17,800 --> 00:47:20,720
than a weekly email summary that arrives after the damage
1231
00:47:20,720 --> 00:47:21,800
is already done.
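To make that trigger concrete, here is a minimal sketch of the decision such a flow would encode, written in Python rather than as a Power Automate definition. The event shape, the review_request function, the site URL, and the fallback recipient are all hypothetical, chosen only for illustration; the two broad group names are the ones SharePoint commonly shows.

```python
# Hypothetical rule: notify the data owner when a broad access group
# is added to a site that has been classified as sensitive.
BROAD_GROUPS = {"Everyone", "Everyone except external users"}


def review_request(event, sensitive_sites, owners):
    """Return a notification dict if the event needs owner review, else None.

    `event` is an assumed dict with "type", "site", and "principal" keys.
    """
    if event["type"] != "permission_added":
        return None
    if event["site"] not in sensitive_sites:
        return None
    if event["principal"] not in BROAD_GROUPS:
        return None
    return {
        # Fall back to a governance mailbox when no owner is recorded.
        "to": owners.get(event["site"], "governance-board"),
        "site": event["site"],
        "principal": event["principal"],
        "action": "review",
    }


event = {
    "type": "permission_added",
    "site": "/sites/project-alpha",
    "principal": "Everyone except external users",
}
notice = review_request(
    event,
    sensitive_sites={"/sites/project-alpha"},
    owners={"/sites/project-alpha": "alpha.lead@example.com"},
)
print(notice)
```

The point of the sketch is the routing: the notification goes to the business owner of the site, with the governance board as a fallback only when no owner is on record.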
1232
00:47:21,800 --> 00:47:23,880
The operational rhythm that ties this together
1233
00:47:23,880 --> 00:47:25,840
maps directly to the three Rs.
1234
00:47:25,840 --> 00:47:28,920
Readiness reviews on a quarterly cycle: permissions audits,
1235
00:47:28,920 --> 00:47:31,880
access recertifications, lifecycle checks for sites
1236
00:47:31,880 --> 00:47:34,360
and groups approaching expiration. Relevance audits
1237
00:47:34,360 --> 00:47:36,840
that archive or formally exclude dormant content
1238
00:47:36,840 --> 00:47:39,520
from the semantic index, not just from SharePoint views,
1239
00:47:39,520 --> 00:47:41,800
but from the retrieval layer itself.
1240
00:47:41,800 --> 00:47:44,680
And resiliency testing that validates rollback capabilities
1241
00:47:44,680 --> 00:47:47,280
before an incident forces you to use them under pressure.
1242
00:47:47,280 --> 00:47:48,880
These aren't three separate programs.
1243
00:47:48,880 --> 00:47:51,480
They're three lenses applied to the same operational cadence,
1244
00:47:51,480 --> 00:47:53,200
owned by the same cross-functional group
1245
00:47:53,200 --> 00:47:55,760
measured against the same governance KPIs.
1246
00:47:55,760 --> 00:47:57,080
With that loop running reliably,
1247
00:47:57,080 --> 00:47:59,000
the architecture has what it needs to do something
1248
00:47:59,000 --> 00:48:00,880
more than retrieve content accurately.
1249
00:48:00,880 --> 00:48:03,040
It has what it needs to anticipate what users need
1250
00:48:03,040 --> 00:48:04,480
before they ask.
1251
00:48:04,480 --> 00:48:06,280
From search to predictive intelligence.
1252
00:48:06,280 --> 00:48:08,520
The shift from reactive search to predictive intelligence
1253
00:48:08,520 --> 00:48:09,960
isn't a feature you enable.
1254
00:48:09,960 --> 00:48:11,520
It's a state the architecture reaches
1255
00:48:11,520 --> 00:48:14,120
when the knowledge graph has enough clean, connected,
1256
00:48:14,120 --> 00:48:16,080
governed data to anticipate context
1257
00:48:16,080 --> 00:48:17,560
rather than wait for a query.
1258
00:48:17,560 --> 00:48:19,880
Think about what that actually means operationally.
1259
00:48:19,880 --> 00:48:21,720
In a reactive search model, the user
1260
00:48:21,720 --> 00:48:23,360
carries the cognitive burden.
1261
00:48:23,360 --> 00:48:24,680
They have to know what to ask for.
1262
00:48:24,680 --> 00:48:27,360
Know where it might live and know enough about the content
1263
00:48:27,360 --> 00:48:30,000
estate to construct a query that will surface the right thing.
1264
00:48:30,000 --> 00:48:32,400
That's a significant tax on every knowledge worker every day.
1265
00:48:32,400 --> 00:48:35,000
In a predictive model, that burden shifts to the graph.
1266
00:48:35,000 --> 00:48:36,600
The graph knows the relationships.
1267
00:48:36,600 --> 00:48:37,880
The graph knows the context.
1268
00:48:37,880 --> 00:48:40,520
The graph knows that when someone on the EMEA account team
1269
00:48:40,520 --> 00:48:42,320
opens a customer record the morning
1270
00:48:42,320 --> 00:48:44,720
before a renewal meeting, the relevant artifacts
1271
00:48:44,720 --> 00:48:46,240
aren't just the latest contract.
1272
00:48:46,240 --> 00:48:47,880
They're the open support tickets,
1273
00:48:47,880 --> 00:48:49,800
the last three executive touchpoints,
1274
00:48:49,800 --> 00:48:52,960
the competitive risks flagged in the most recent deal review
1275
00:48:52,960 --> 00:48:55,160
and the decision the product team made in Q3
1276
00:48:55,160 --> 00:48:57,480
that directly affects the renewal terms.
1277
00:48:57,480 --> 00:49:00,480
None of those connections require the user to know they exist.
1278
00:49:00,480 --> 00:49:03,640
The graph already modeled them. Copilot traverses them.
1279
00:49:03,640 --> 00:49:06,000
That traversal is what distinguishes the architecture
1280
00:49:06,000 --> 00:49:07,920
we've been building throughout this episode
1281
00:49:07,920 --> 00:49:09,800
from a well-configured search engine.
1282
00:49:09,800 --> 00:49:11,200
A search engine returns documents.
1283
00:49:11,200 --> 00:49:12,880
A knowledge graph enables navigation
1284
00:49:12,880 --> 00:49:15,640
across project entities, task entities, document entities
1285
00:49:15,640 --> 00:49:17,800
and owner entities in a single query path.
1286
00:49:17,800 --> 00:49:20,520
"What's the current status and risk on the X initiative?"
1287
00:49:20,520 --> 00:49:22,360
becomes answerable without the user knowing
1288
00:49:22,360 --> 00:49:24,760
which system owns the status, which document holds
1289
00:49:24,760 --> 00:49:27,560
the risk register or which person is currently accountable
1290
00:49:27,560 --> 00:49:28,760
for each work stream.
1291
00:49:28,760 --> 00:49:31,720
The graph resolves all of that before the answer surfaces.
1292
00:49:31,720 --> 00:49:34,640
Expert and team discovery works the same way.
1293
00:49:34,640 --> 00:49:36,480
The question "Who has implemented feature X
1294
00:49:36,480 --> 00:49:38,840
for a customer in the financial services sector?"
1295
00:49:38,840 --> 00:49:41,160
is, at its core, a graph traversal problem.
1296
00:49:41,160 --> 00:49:44,240
Connect people to skills, connect skills to project deliverables,
1297
00:49:44,240 --> 00:49:45,720
connect deliverables to customers,
1298
00:49:45,720 --> 00:49:48,040
connect customers to industry classification.
1299
00:49:48,040 --> 00:49:49,800
When those edges exist in the graph,
1300
00:49:49,800 --> 00:49:51,400
the query resolves in seconds.
1301
00:49:51,400 --> 00:49:53,920
Without them, the answer requires someone to remember
1302
00:49:53,920 --> 00:49:55,480
who worked on what, ask around
1303
00:49:55,480 --> 00:49:57,800
and hope the right person responds before the deadline.
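As a rough illustration of that traversal, here is a small Python sketch that follows the four edge types just described: person to skill, skill to deliverable, deliverable to customer, customer to industry. The relation names, the people, and the customers are invented for the example, and the in-memory edge list stands in for whatever graph store you actually run.

```python
from collections import defaultdict

# Typed edges as (source, relation, target). All names are made up.
EDGES = [
    ("avery", "has_skill", "feature-x"),
    ("blake", "has_skill", "feature-y"),
    ("feature-x", "used_in", "deliverable-1"),
    ("feature-y", "used_in", "deliverable-2"),
    ("deliverable-1", "delivered_to", "contoso-bank"),
    ("deliverable-2", "delivered_to", "fabrikam-retail"),
    ("contoso-bank", "classified_as", "financial-services"),
    ("fabrikam-retail", "classified_as", "retail"),
]


def build_index(edges):
    """Index targets by (source, relation) so each hop is a dictionary lookup."""
    index = defaultdict(set)
    for source, relation, target in edges:
        index[(source, relation)].add(target)
    return index


def follow(index, start, relations):
    """Walk the given relation path from `start` and return the nodes reached."""
    frontier = {start}
    for relation in relations:
        frontier = {
            target
            for node in frontier
            for target in index.get((node, relation), set())
        }
    return frontier


def experts_for(index, people, skill, industry):
    """People holding `skill`, where that skill reached a customer in `industry`."""
    path = ["used_in", "delivered_to", "classified_as"]
    if industry not in follow(index, skill, path):
        return []
    return sorted(
        person
        for person in people
        if skill in index.get((person, "has_skill"), set())
    )


INDEX = build_index(EDGES)
print(experts_for(INDEX, ["avery", "blake"], "feature-x", "financial-services"))
```

One caveat on the design: with only these four edge types, anyone holding the skill matches once the skill has reached a matching customer. A real model would probably add a direct person-to-deliverable edge so the answer reflects who actually did the work.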
1304
00:49:57,800 --> 00:50:00,720
For hybrid workforces, this isn't a productivity convenience.
1305
00:50:00,720 --> 00:50:02,120
It's a structural requirement.
1306
00:50:02,120 --> 00:50:04,440
Distributed teams operating across time zones
1307
00:50:04,440 --> 00:50:06,600
have already lost the ambient knowledge transfer
1308
00:50:06,600 --> 00:50:09,080
that happened naturally in colocated environments:
1309
00:50:09,080 --> 00:50:11,480
the hallway conversation, the overheard discussion,
1310
00:50:11,480 --> 00:50:13,640
the informal briefing before a meeting.
1311
00:50:13,640 --> 00:50:16,120
That ambient layer carried enormous informational value
1312
00:50:16,120 --> 00:50:18,480
and hybrid work didn't replace it with anything.
1313
00:50:18,480 --> 00:50:21,720
The knowledge graph is the closest architectural substitute.
1314
00:50:21,720 --> 00:50:23,640
It captures the organizational relationships
1315
00:50:23,640 --> 00:50:26,840
and content connections that used to exist only in people's heads
1316
00:50:26,840 --> 00:50:29,920
and makes them queryable at any hour from any location.
1317
00:50:29,920 --> 00:50:31,880
The ROI case compounds at scale in a way
1318
00:50:31,880 --> 00:50:33,720
that's worth being precise about.
1319
00:50:33,720 --> 00:50:36,320
Industry research consistently estimates knowledge workers
1320
00:50:36,320 --> 00:50:39,240
spend 15 to 30% of their time searching for information
1321
00:50:39,240 --> 00:50:40,520
they already own.
1322
00:50:40,520 --> 00:50:43,160
A reduction of 20 to 40% in that friction,
1323
00:50:43,160 --> 00:50:45,120
achievable with a well-implemented graph,
1324
00:50:45,120 --> 00:50:47,160
translates into tens of millions of dollars
1325
00:50:47,160 --> 00:50:48,960
in recovered productivity annually
1326
00:50:48,960 --> 00:50:51,800
for organizations above 10,000 employees.
1327
00:50:51,800 --> 00:50:54,160
That's before accounting for the quality improvements.
1328
00:50:54,160 --> 00:50:56,480
Fewer decisions made on stale information,
1329
00:50:56,480 --> 00:50:59,800
faster onboarding, lower duplication rates across projects.
1330
00:50:59,800 --> 00:51:02,800
The productivity gain and the quality gain compound together.
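To see where "tens of millions" comes from, the arithmetic can be sketched in a few lines of Python. The percentages and the 10,000-employee headcount are the ones quoted above; the fully loaded cost of 100,000 dollars per employee per year is purely an assumption for illustration, so substitute your own figure.

```python
def recovered_productivity(employees, loaded_cost, search_share, friction_reduction):
    """Annual value of time no longer spent searching.

    employees          -- headcount of knowledge workers
    loaded_cost        -- assumed fully loaded annual cost per employee
    search_share       -- fraction of working time spent searching (0.15 to 0.30)
    friction_reduction -- fraction of that search time removed (0.20 to 0.40)
    """
    return employees * loaded_cost * search_share * friction_reduction


EMPLOYEES = 10_000
LOADED_COST = 100_000  # assumption, not a figure from the episode

low = recovered_productivity(EMPLOYEES, LOADED_COST, 0.15, 0.20)
high = recovered_productivity(EMPLOYEES, LOADED_COST, 0.30, 0.40)
print(f"Low estimate:  {low:,.0f}")
print(f"High estimate: {high:,.0f}")
```

Under that cost assumption, the range works out to roughly 30 million to 120 million dollars a year, which is why the baseline measurements matter: the result is only as credible as the inputs.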
1331
00:51:02,800 --> 00:51:05,120
But predictive intelligence only reaches that potential
1332
00:51:05,120 --> 00:51:07,920
when the foundation we've built is solid: clean metadata,
1333
00:51:07,920 --> 00:51:10,920
governed ingestion, hardened permissions, resolved entities.
1334
00:51:10,920 --> 00:51:13,600
Every section of this architecture connects to this outcome.
1335
00:51:13,600 --> 00:51:15,040
None of it works in isolation
1336
00:51:15,040 --> 00:51:16,720
and none of it can be skipped in favor
1337
00:51:16,720 --> 00:51:19,000
of moving faster to the impressive output.
1338
00:51:19,000 --> 00:51:21,800
The path to that destination is a phased implementation
1339
00:51:21,800 --> 00:51:24,440
and phase one is where most organizations lose the plot.
1340
00:51:24,440 --> 00:51:28,160
The implementation roadmap, phase one.
1341
00:51:28,160 --> 00:51:29,680
Phase one has one job:
1342
00:51:29,680 --> 00:51:31,440
build the foundation without cutting corners
1343
00:51:31,440 --> 00:51:34,280
to get to the demo faster. That instinct,
1344
00:51:34,280 --> 00:51:36,080
to skip ahead to the visible output,
1345
00:51:36,080 --> 00:51:37,280
the impressive proof of concept,
1346
00:51:37,280 --> 00:51:39,800
the thing you can show an executive in a slide deck,
1347
00:51:39,800 --> 00:51:44,120
is the single most reliable predictor of a phase three failure.
1348
00:51:44,120 --> 00:51:46,240
Organizations that rush the foundation
1349
00:51:46,240 --> 00:51:48,440
spend the back half of their implementation
1350
00:51:48,440 --> 00:51:50,520
fixing problems they created in the front half
1351
00:51:50,520 --> 00:51:52,120
and they fix them under pressure
1352
00:51:52,120 --> 00:51:54,320
with live users already drawing conclusions
1353
00:51:54,320 --> 00:51:57,360
from a system that was never properly grounded.
1354
00:51:57,360 --> 00:51:58,800
The sequence matters.
1355
00:51:58,800 --> 00:52:01,600
Strategy and use case definition come first.
1356
00:52:01,600 --> 00:52:04,600
Before any configuration, before any ontology work,
1357
00:52:04,600 --> 00:52:06,160
before any data preparation,
1358
00:52:06,160 --> 00:52:07,600
not because process requires it
1359
00:52:07,600 --> 00:52:09,440
but because everything downstream depends
1360
00:52:09,440 --> 00:52:12,640
on having a specific measurable problem to solve.
1361
00:52:12,640 --> 00:52:14,920
"Build a knowledge graph" is not a use case.
1362
00:52:14,920 --> 00:52:16,600
"Reduce time to insight for account teams
1363
00:52:16,600 --> 00:52:19,240
preparing quarterly business reviews by 40%"
1364
00:52:19,240 --> 00:52:20,240
is a use case.
1365
00:52:20,240 --> 00:52:22,520
The difference is that the second version tells you exactly
1366
00:52:22,520 --> 00:52:24,760
which entities to model, which content to enrich,
1367
00:52:24,760 --> 00:52:26,040
which permissions to audit
1368
00:52:26,040 --> 00:52:28,440
and which metrics will tell you whether the work succeeded.
1369
00:52:28,440 --> 00:52:29,720
The first version tells you nothing
1370
00:52:29,720 --> 00:52:32,120
except that someone approved a budget line.
1371
00:52:32,120 --> 00:52:34,200
Identify three to five high value use cases
1372
00:52:34,200 --> 00:52:36,000
tied to executive KPIs before writing
1373
00:52:36,000 --> 00:52:37,440
a single line of configuration.
1374
00:52:37,440 --> 00:52:39,840
Three to five is a deliberate range. Fewer than three
1375
00:52:39,840 --> 00:52:41,560
and you risk building something too narrow
1376
00:52:41,560 --> 00:52:44,320
to demonstrate organizational value. More than five
1377
00:52:44,320 --> 00:52:45,880
and you've already started boiling the ocean
1378
00:52:45,880 --> 00:52:47,160
before phase one is complete.
1379
00:52:47,160 --> 00:52:49,320
Establish baselines before anything changes.
1380
00:52:49,320 --> 00:52:52,440
Average time knowledge workers spend searching per week.
1381
00:52:52,440 --> 00:52:54,720
Rate and cost of duplicated work.
1382
00:52:54,720 --> 00:52:56,720
How often are teams recreating analysis
1383
00:52:56,720 --> 00:52:59,320
that already exists somewhere in the content estate?
1384
00:52:59,320 --> 00:53:01,120
Onboarding time to full productivity
1385
00:53:01,120 --> 00:53:03,720
for the roles most affected by information friction.
1386
00:53:03,720 --> 00:53:06,080
These numbers exist in your organization right now.
1387
00:53:06,080 --> 00:53:07,640
They're just not being measured.
1388
00:53:07,640 --> 00:53:10,160
Capturing them before the implementation begins
1389
00:53:10,160 --> 00:53:13,000
is the only way to produce a credible ROI case
1390
00:53:13,000 --> 00:53:15,760
when phase three asks you to justify the investment.
1391
00:53:15,760 --> 00:53:17,440
Define the minimal viable ontology
1392
00:53:17,440 --> 00:53:20,120
for your first use case and hold the line on scope.
1393
00:53:20,120 --> 00:53:22,040
That specific constraint, model only
1394
00:53:22,040 --> 00:53:23,880
what the first use case requires,
1395
00:53:23,880 --> 00:53:26,000
will be challenged repeatedly during phase one.
1396
00:53:26,000 --> 00:53:27,840
Someone will point out that modeling customers
1397
00:53:27,840 --> 00:53:30,160
without also modeling contracts is incomplete.
1398
00:53:30,160 --> 00:53:32,160
Someone will argue that project entities need
1399
00:53:32,160 --> 00:53:34,160
to connect to budget entities or the graph
1400
00:53:34,160 --> 00:53:36,400
won't reflect how the business actually works.
1401
00:53:36,400 --> 00:53:38,120
These arguments aren't wrong in principle.
1402
00:53:38,120 --> 00:53:39,440
They're wrong in phase one.
1403
00:53:39,440 --> 00:53:41,760
Every scope addition extends the timeline,
1404
00:53:41,760 --> 00:53:44,160
adds complexity to the quality validation work
1405
00:53:44,160 --> 00:53:46,840
and delays the moment when a real user gets a real answer
1406
00:53:46,840 --> 00:53:48,920
from a production system. Ship something,
1407
00:53:48,920 --> 00:53:51,920
then expand based on what the usage data actually shows
1408
00:53:51,920 --> 00:53:53,000
you needs expanding.
1409
00:53:53,000 --> 00:53:54,760
The most critical governance action in phase one
1410
00:53:54,760 --> 00:53:57,240
isn't a governance task in the traditional sense.
1411
00:53:57,240 --> 00:53:58,800
It's the tenant-wide permissions audit
1412
00:53:58,800 --> 00:54:01,360
using SharePoint Data Access Governance Reports.
1413
00:54:01,360 --> 00:54:02,760
Run it before you build anything.
1414
00:54:02,760 --> 00:54:05,040
Map the gap between what Copilot would currently see
1415
00:54:05,040 --> 00:54:08,040
across your content estate and what your security posture says
1416
00:54:08,040 --> 00:54:08,640
it should see.
1417
00:54:08,640 --> 00:54:10,880
That gap is the single most important input
1418
00:54:10,880 --> 00:54:14,240
to your hardening roadmap and it almost always contains surprises.
1419
00:54:14,240 --> 00:54:16,400
Overshared sites nobody remembered.
1420
00:54:16,400 --> 00:54:19,240
Broad access groups from projects that closed two years ago,
1421
00:54:19,240 --> 00:54:21,840
content with no owner and no life cycle policy,
1422
00:54:21,840 --> 00:54:23,600
sitting in libraries the semantic index
1423
00:54:23,600 --> 00:54:25,880
will treat as authoritative.
1424
00:54:25,880 --> 00:54:27,920
Finding those surprises in phase one
1425
00:54:27,920 --> 00:54:30,320
before the semantic layer is built on top of them
1426
00:54:30,320 --> 00:54:32,040
costs almost nothing to fix.
1427
00:54:32,040 --> 00:54:34,480
Finding them in phase three after users are relying
1428
00:54:34,480 --> 00:54:37,280
on Copilot outputs derived from that content costs
1429
00:54:37,280 --> 00:54:38,320
significantly more.
1430
00:54:38,320 --> 00:54:39,920
Phase one establishes the foundation.
1431
00:54:39,920 --> 00:54:41,800
Phase two is where the architecture starts
1432
00:54:41,800 --> 00:54:43,720
delivering the value that justifies it.
1433
00:54:43,720 --> 00:54:46,120
The implementation roadmap, phase two.
1434
00:54:46,120 --> 00:54:48,240
Phase two is where the architecture transitions
1435
00:54:48,240 --> 00:54:49,720
from foundation to function.
1436
00:54:49,720 --> 00:54:50,840
The permissions are clean.
1437
00:54:50,840 --> 00:54:52,200
The baseline metrics are captured.
1438
00:54:52,200 --> 00:54:54,360
The minimal viable ontology exists on paper.
1439
00:54:54,360 --> 00:54:56,680
Now the work shifts from preparation to construction
1440
00:54:56,680 --> 00:54:58,680
and the discipline required is different.
1441
00:54:58,680 --> 00:55:01,440
Less about what you exclude and more about what you build
1442
00:55:01,440 --> 00:55:05,440
with enough precision to actually deliver a result users trust.
1443
00:55:05,440 --> 00:55:08,200
The central activity in phase two is metadata enrichment
1444
00:55:08,200 --> 00:55:10,760
and it deserves more respect than it typically gets.
1445
00:55:10,760 --> 00:55:11,880
This isn't data entry.
1446
00:55:11,880 --> 00:55:15,120
It's the highest leverage work in the entire implementation.
1447
00:55:15,120 --> 00:55:17,080
Every hour spent ensuring that the content
1448
00:55:17,080 --> 00:55:20,080
feeding your pilot use case carries consistent provenance
1449
00:55:20,080 --> 00:55:23,040
fields, accurate authority signals and correct sensitivity
1450
00:55:23,040 --> 00:55:25,720
classifications compounds forward into every query
1451
00:55:25,720 --> 00:55:28,040
the system processes from that point on.
1452
00:55:28,040 --> 00:55:31,000
Retrieval quality isn't set once at configuration time.
1453
00:55:31,000 --> 00:55:33,280
It's set by the quality of what's in the index
1454
00:55:33,280 --> 00:55:35,560
and what's in the index reflects the metadata decisions
1455
00:55:35,560 --> 00:55:36,760
made during enrichment.
1456
00:55:36,760 --> 00:55:39,320
Get this right in phase two and everything downstream
1457
00:55:39,320 --> 00:55:39,960
gets easier.
1458
00:55:39,960 --> 00:55:42,520
Rush it and you spend phase three debugging answers
1459
00:55:42,520 --> 00:55:44,880
that are subtly wrong in ways that erode user trust
1460
00:55:44,880 --> 00:55:46,720
faster than obvious errors do.
1461
00:55:46,720 --> 00:55:48,680
Microsoft Syntex taxonomy tagging
1462
00:55:48,680 --> 00:55:52,240
has a specific role in phase two and the sequencing matters.
1463
00:55:52,240 --> 00:55:54,880
The term store needs to be governed, clean, consistent,
1464
00:55:54,880 --> 00:55:58,200
actively maintained, before you enable Syntex on any library
1465
00:55:58,200 --> 00:56:00,920
set. As we established in the ingestion section, the automation
1466
00:56:00,920 --> 00:56:02,680
amplifies what's underneath it.
1467
00:56:02,680 --> 00:56:06,440
Enable Syntex on a focused set of libraries, validate the tag
1468
00:56:06,440 --> 00:56:09,440
quality against your pilot use case requirements, and expand
1469
00:56:09,440 --> 00:56:12,600
only after the taxonomy has proven reliable in that
1470
00:56:12,600 --> 00:56:13,760
constrained scope.
1471
00:56:13,760 --> 00:56:15,240
This is not a tenant-wide toggle.
1472
00:56:15,240 --> 00:56:17,920
It's a targeted enrichment tool for specific content sets
1473
00:56:17,920 --> 00:56:19,760
that are already well structured.
1474
00:56:19,760 --> 00:56:22,320
Copilot Studio agents built in phase two
1475
00:56:22,320 --> 00:56:24,400
should be scoped narrowly and deliberately.
1476
00:56:24,400 --> 00:56:27,600
Each agent gets access to specific, well-curated knowledge
1477
00:56:27,600 --> 00:56:30,280
sources, not the full content estate,
1478
00:56:30,280 --> 00:56:32,200
not even the full enriched corpus,
1479
00:56:32,200 --> 00:56:34,800
but the subset directly relevant to the pilot use case
1480
00:56:34,800 --> 00:56:36,000
it is designed to serve.
1481
00:56:36,000 --> 00:56:38,640
This is both a security decision and a quality decision.
1482
00:56:38,640 --> 00:56:41,680
Narrow scope means fewer irrelevant retrieval candidates.
1483
00:56:41,680 --> 00:56:44,480
Fewer irrelevant candidates means better answer precision.
1484
00:56:44,480 --> 00:56:46,800
Better answer precision means users keep coming back
1485
00:56:46,800 --> 00:56:49,520
rather than deciding early that the system can't be trusted.
1486
00:56:49,520 --> 00:56:51,920
The agent that covers less ground confidently outperforms
1487
00:56:51,920 --> 00:56:54,840
the agent that attempts everything and hedges constantly.
1488
00:56:54,840 --> 00:56:57,480
Measurement infrastructure belongs in phase two from day one,
1489
00:56:57,480 --> 00:57:00,120
not as an afterthought added once something feels wrong.
1490
00:57:00,120 --> 00:57:02,400
Instrument the pilot with search success rate tracking,
1491
00:57:02,400 --> 00:57:05,000
time-to-answer logging, user satisfaction scoring,
1492
00:57:05,000 --> 00:57:06,960
and citation quality review from the moment
1493
00:57:06,960 --> 00:57:10,440
the first real user asks the first real question.
1494
00:57:10,440 --> 00:57:12,360
The baseline numbers you captured in phase one
1495
00:57:12,360 --> 00:57:15,200
have no value unless you're actively measuring against them.
1496
00:57:15,200 --> 00:57:17,480
Without that comparison, you can't demonstrate progress
1497
00:57:17,480 --> 00:57:19,880
to the stakeholders who need to fund phase three
1498
00:57:19,880 --> 00:57:22,160
and you can't identify the specific failure modes
1499
00:57:22,160 --> 00:57:24,440
that deserve attention before you scale.
1500
00:57:24,440 --> 00:57:27,320
Hallucination audits are a phase two responsibility
1501
00:57:27,320 --> 00:57:30,440
that most implementations defer and later regret.
1502
00:57:30,440 --> 00:57:33,040
Build labeled test sets during the pilot.
1503
00:57:33,040 --> 00:57:35,240
Curated questions with known correct answers
1504
00:57:35,240 --> 00:57:37,640
drawn from your governed content. Run those tests
1505
00:57:37,640 --> 00:57:40,520
before and after each significant metadata change,
1506
00:57:40,520 --> 00:57:43,120
log which documents were retrieved for each query
1507
00:57:43,120 --> 00:57:45,000
and trace the ones that produced poor answers
1508
00:57:45,000 --> 00:57:47,120
back to their metadata characteristics.
1509
00:57:47,120 --> 00:57:48,840
This isn't academic quality assurance.
1510
00:57:48,840 --> 00:57:50,600
It's the feedback loop that tells you exactly
1511
00:57:50,600 --> 00:57:53,120
which fields in your schema have the most impact
1512
00:57:53,120 --> 00:57:55,120
on retrieval reliability, which shapes
1513
00:57:55,120 --> 00:57:57,880
every enrichment priority in the weeks that follow.
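A minimal sketch of that audit loop in Python might look like the following. The LabeledCase type, the run_audit harness, and the stub agents are all hypothetical; `ask` stands in for however you call your own agent, and the substring check is a deliberately crude stand-in for real answer grading.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LabeledCase:
    question: str
    expected_answer: str  # known-correct answer drawn from governed content


def run_audit(cases, ask):
    """Run every labeled case through `ask` and log what was retrieved.

    `ask` is an assumed callable: question -> (answer_text, retrieved_doc_ids).
    """
    results = []
    for case in cases:
        answer, doc_ids = ask(case.question)
        results.append({
            "question": case.question,
            "passed": case.expected_answer.lower() in answer.lower(),
            "retrieved": list(doc_ids),
        })
    return results


def regressions(before, after):
    """Questions that passed before a metadata change and fail after it."""
    passed_before = {r["question"] for r in before if r["passed"]}
    return [
        r["question"]
        for r in after
        if not r["passed"] and r["question"] in passed_before
    ]


def suspect_documents(results):
    """Count how often each document was retrieved for a failing answer."""
    counts = Counter()
    for r in results:
        if not r["passed"]:
            counts.update(r["retrieved"])
    return counts.most_common()


# Stub agents standing in for the system before and after a metadata change.
def agent_before(question):
    return ("The contract renews on 2025-03-01.", ["doc-1"])


def agent_after(question):
    return ("I could not find a renewal date.", ["doc-2", "doc-3"])


cases = [LabeledCase("When does the contract renew?", "2025-03-01")]
before = run_audit(cases, agent_before)
after = run_audit(cases, agent_after)
print(regressions(before, after))
print(suspect_documents(after))
```

The documents that keep showing up behind failing answers are the ones whose metadata you inspect first, which is how the test set turns into an enrichment priority list.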
1514
00:57:57,880 --> 00:57:59,160
The pilot proves the model.
1515
00:57:59,160 --> 00:58:01,160
Phase three is where you find out whether the governance
1516
00:58:01,160 --> 00:58:05,200
discipline that made the pilot work can survive at scale.
1517
00:58:05,200 --> 00:58:07,440
The implementation roadmap, phase three.
1518
00:58:07,440 --> 00:58:09,680
Phase three is where organizations discover whether they
1519
00:58:09,680 --> 00:58:11,120
built a system or a project.
1520
00:58:11,120 --> 00:58:13,440
Projects have endpoints. Systems have owners.
1521
00:58:13,440 --> 00:58:15,800
The difference between the two shows up clearly at scale
1522
00:58:15,800 --> 00:58:18,240
when the governance discipline that kept the pilot clean
1523
00:58:18,240 --> 00:58:20,480
either holds under the pressure of broader deployment
1524
00:58:20,480 --> 00:58:24,160
or quietly dissolves as attention moves to the next initiative.
1525
00:58:24,160 --> 00:58:26,000
Scale in phase three means extending knowledge
1526
00:58:26,000 --> 00:58:27,680
graph coverage to additional domains
1527
00:58:27,680 --> 00:58:29,400
and the order in which you extend matters
1528
00:58:29,400 --> 00:58:31,560
as much as the decision to extend at all.
1529
00:58:31,560 --> 00:58:33,280
Don't choose the next domain based on who
1530
00:58:33,280 --> 00:58:35,360
asks loudest or which executive sponsor
1531
00:58:35,360 --> 00:58:36,880
has the most political capital.
1532
00:58:36,880 --> 00:58:39,640
Choose based on what the usage data from the pilot actually
1533
00:58:39,640 --> 00:58:42,680
shows: which entities are being traversed most heavily,
1534
00:58:42,680 --> 00:58:44,400
which relationship types are generating
1535
00:58:44,400 --> 00:58:47,080
the most downstream value, which metadata fields
1536
00:58:47,080 --> 00:58:49,560
are appearing in every high performing retrieval path.
1537
00:58:49,560 --> 00:58:52,160
The graph itself is telling you where to invest next.
1538
00:58:52,160 --> 00:58:54,320
Follow the signal from real queries rather
1539
00:58:54,320 --> 00:58:56,360
than the assumptions from original planning.
1540
00:58:56,360 --> 00:58:58,960
The ROI assessment belongs in phase three,
1541
00:58:58,960 --> 00:59:01,000
and it needs to cover more than adoption numbers.
1542
00:59:01,000 --> 00:59:02,960
License utilization and active user counts
1543
00:59:02,960 --> 00:59:04,720
tell you that people opened the product.
1544
00:59:04,720 --> 00:59:06,560
They don't tell you that the product changed
1545
00:59:06,560 --> 00:59:08,600
how the organization makes decisions.
1546
00:59:08,600 --> 00:59:10,680
The assessment that matters covers time savings
1547
00:59:10,680 --> 00:59:13,040
against the baselines established in phase one,
1548
00:59:13,040 --> 00:59:15,760
cost reductions from duplicated work that didn't happen,
1549
00:59:15,760 --> 00:59:18,320
changes in win rates for teams using graph-backed account
1550
00:59:18,320 --> 00:59:21,120
preparation, compliance metrics showing audit readiness
1551
00:59:21,120 --> 00:59:23,720
that previously required weeks of manual scramble
1552
00:59:23,720 --> 00:59:25,480
and Copilot answer quality scores
1553
00:59:25,480 --> 00:59:28,360
tracked against the hallucination baseline from phase two.
1554
00:59:28,360 --> 00:59:29,960
Build the case from those dimensions.
1555
00:59:29,960 --> 00:59:33,080
Adoption is a prerequisite for impact, not a substitute for it.
1556
00:59:33,080 --> 00:59:34,960
Agent governance becomes a distinct challenge
1557
00:59:34,960 --> 00:59:37,520
at phase three scale that doesn't exist at pilot scale.
1558
00:59:37,520 --> 00:59:39,720
During the pilot you had one or two agents
1559
00:59:39,720 --> 00:59:42,120
scoped tightly, monitored closely.
1560
00:59:42,120 --> 00:59:44,840
At enterprise scale, every team with a business problem
1561
00:59:44,840 --> 00:59:47,440
and a Copilot Studio license wants to build an agent.
1562
00:59:47,440 --> 00:59:49,760
Without a formal request and approval workflow,
1563
00:59:49,760 --> 00:59:52,560
that proliferation recreates, inside the AI layer,
1564
00:59:52,560 --> 00:59:55,040
the exact permission sprawl problem you spent phase one
1565
00:59:55,040 --> 00:59:57,320
eliminating at the content layer.
1566
00:59:57,320 --> 00:59:59,240
Each new agent is a new service identity
1567
00:59:59,240 --> 01:00:02,160
with its own access scope and its own retrieval behavior.
1568
01:00:02,160 --> 01:00:04,400
Without governance over which agents get built,
1569
01:00:04,400 --> 01:00:06,160
which data sources they connect to,
1570
01:00:06,160 --> 01:00:07,920
and who approves new connections,
1571
01:00:07,920 --> 01:00:10,680
you end up with an ungoverned constellation of agents
1572
01:00:10,680 --> 01:00:12,880
that collectively expose more than any single one
1573
01:00:12,880 --> 01:00:14,360
was designed to access.
1574
01:00:14,360 --> 01:00:16,200
The structural answer is a center of excellence
1575
01:00:16,200 --> 01:00:18,720
that operates as a cross-functional governance board,
1576
01:00:18,720 --> 01:00:20,160
rather than an IT committee.
1577
01:00:20,160 --> 01:00:23,160
Security, compliance, legal and business stakeholders
1578
01:00:23,160 --> 01:00:24,560
need seats at the same table
1579
01:00:24,560 --> 01:00:26,960
where ontology evolution decisions get made,
1580
01:00:26,960 --> 01:00:30,520
use case priorities get set, and incidents get reviewed.
1581
01:00:30,520 --> 01:00:31,800
This board shouldn't meet quarterly
1582
01:00:31,800 --> 01:00:34,080
to ratify decisions it has already made.
1583
01:00:34,080 --> 01:00:35,800
It should meet on a regular cadence,
1584
01:00:35,800 --> 01:00:39,040
monthly at minimum, to actively shape the knowledge architecture
1585
01:00:39,040 --> 01:00:41,000
as the organization's needs evolve.
1586
01:00:41,000 --> 01:00:42,480
The ontology is a living artifact.
1587
01:00:42,480 --> 01:00:44,200
It should change when usage patterns shift,
1588
01:00:44,200 --> 01:00:45,400
when new domains get added,
1589
01:00:45,400 --> 01:00:48,360
when regulatory requirements create new classification needs.
1590
01:00:48,360 --> 01:00:49,720
The governance board is the mechanism
1591
01:00:49,720 --> 01:00:51,400
that keeps those changes intentional,
1592
01:00:51,400 --> 01:00:52,640
rather than accidental.
1593
01:00:52,640 --> 01:00:54,920
One failure pattern specific to phase three
1594
01:00:54,920 --> 01:00:57,360
is treating the organizational lift as complete,
1595
01:00:57,360 --> 01:00:58,960
because the technology is running.
1596
01:00:58,960 --> 01:01:02,320
The cultural change required to sustain a knowledge graph at scale,
1597
01:01:02,320 --> 01:01:04,680
employees filing content in governed spaces,
1598
01:01:04,680 --> 01:01:06,720
data owners completing access reviews,
1599
01:01:06,720 --> 01:01:08,520
teams maintaining metadata quality
1600
01:01:08,520 --> 01:01:10,040
as a normal operating habit,
1601
01:01:10,040 --> 01:01:11,960
doesn't consolidate during a pilot.
1602
01:01:11,960 --> 01:01:14,480
It consolidates over months of consistent reinforcement,
1603
01:01:14,480 --> 01:01:16,480
visible leadership and friction reduction.
1604
01:01:16,480 --> 01:01:18,520
The roadmap gets you to enterprise scale.
1605
01:01:18,520 --> 01:01:21,440
The metrics are what keep you honest once you're there.
1606
01:01:21,440 --> 01:01:23,760
ROI metrics that actually matter.
1607
01:01:23,760 --> 01:01:26,160
Most organizations measure Copilot adoption
1608
01:01:26,160 --> 01:01:27,520
and call it ROI.
1609
01:01:27,520 --> 01:01:29,520
Active users, prompts per day,
1610
01:01:29,520 --> 01:01:31,720
percentage of license seats that opened the product
1611
01:01:31,720 --> 01:01:33,560
at least once in a 30-day window.
1612
01:01:33,560 --> 01:01:35,080
These numbers go into a slide deck,
1613
01:01:35,080 --> 01:01:36,760
get presented to a leadership team,
1614
01:01:36,760 --> 01:01:38,280
and get interpreted as evidence
1615
01:01:38,280 --> 01:01:39,720
that the investment is working.
1616
01:01:39,720 --> 01:01:40,400
They're not.
1617
01:01:40,400 --> 01:01:42,160
Measuring how many times someone uses a search bar
1618
01:01:42,160 --> 01:01:43,560
and calling it productivity improvement
1619
01:01:43,560 --> 01:01:45,360
is the analytical equivalent of measuring
1620
01:01:45,360 --> 01:01:47,240
how many times someone opened a refrigerator
1621
01:01:47,240 --> 01:01:48,760
and calling it nutrition.
1622
01:01:48,760 --> 01:01:50,920
The behavior and the outcome are not the same thing.
1623
01:01:50,920 --> 01:01:53,360
The ROI framework that maps to real business value
1624
01:01:53,360 --> 01:01:55,960
runs across five dimensions. Hard savings:
1625
01:01:55,960 --> 01:01:57,400
actual cost reductions from work
1626
01:01:57,400 --> 01:01:59,960
that no longer requires the same labor input.
1627
01:01:59,960 --> 01:02:01,400
Productivity value:
1628
01:02:01,400 --> 01:02:03,880
time recovered from friction that used to consume it.
1629
01:02:03,880 --> 01:02:06,000
Quality gains: decisions made faster
1630
01:02:06,000 --> 01:02:07,960
and with better information than before.
1631
01:02:07,960 --> 01:02:10,520
Avoided costs: incidents that didn't happen,
1632
01:02:10,520 --> 01:02:12,320
duplicated projects that weren't funded,
1633
01:02:12,320 --> 01:02:14,640
compliance findings that didn't materialize.
1634
01:02:14,640 --> 01:02:17,320
And strategic benefits: competitive differentiation,
1635
01:02:17,320 --> 01:02:20,160
faster onboarding, knowledge that stays inside the organization
1636
01:02:20,160 --> 01:02:22,280
when people leave. Every dimension matters.
1637
01:02:22,280 --> 01:02:24,760
Organizations that measure only the first one
1638
01:02:24,760 --> 01:02:26,000
miss most of the value
1639
01:02:26,000 --> 01:02:28,680
and most of the argument for continued investment.
1640
01:02:28,680 --> 01:02:30,360
Input metrics tell you about coverage:
1641
01:02:30,360 --> 01:02:32,480
how many content sources are connected to the graph,
1642
01:02:32,480 --> 01:02:34,080
how many entities are modeled,
1643
01:02:34,080 --> 01:02:36,160
how complete the metadata is across your priority
1644
01:02:36,160 --> 01:02:37,120
content sets.
1645
01:02:37,120 --> 01:02:38,680
These numbers matter because the graph
1646
01:02:38,680 --> 01:02:41,040
can only return value from what it contains.
1647
01:02:41,040 --> 01:02:43,720
A graph with 50% metadata completeness
1648
01:02:43,720 --> 01:02:46,920
on its core document corpus is running at half capacity
1649
01:02:46,920 --> 01:02:48,280
before a single query runs.
1650
01:02:48,280 --> 01:02:49,800
Coverage isn't the destination,
1651
01:02:49,800 --> 01:02:52,000
but it's the floor beneath everything else.
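To make the completeness figure concrete, here is a minimal Python sketch of one way it could be computed. The required field names and the sample documents are hypothetical, not taken from the episode, and a real list would come from your own content types.

```python
# Hypothetical required fields; a real list would come from your content types.
REQUIRED_FIELDS = ("owner", "document_type", "sensitivity_label", "retention_policy")


def metadata_completeness(documents, required=REQUIRED_FIELDS):
    """Return the fraction of required metadata fields filled in across all documents."""
    docs = list(documents)
    if not docs or not required:
        return 0.0
    filled = sum(
        1
        for doc in docs
        for name in required
        # Treat missing keys, None, and blank strings as "not filled in".
        if str(doc.get(name) or "").strip()
    )
    return filled / (len(docs) * len(required))


# Illustrative sample: one fully tagged document, one untagged document.
corpus = [
    {"owner": "a.lee", "document_type": "contract",
     "sensitivity_label": "Confidential", "retention_policy": "7y"},
    {"owner": "", "document_type": "", "sensitivity_label": None, "retention_policy": "  "},
]

print(f"{metadata_completeness(corpus):.0%}")  # prints 50%
```

Tracked per priority content set rather than across the whole tenant, a number like this shows where enrichment effort is still needed.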
1652
01:02:52,000 --> 01:02:53,880
Adoption metrics tell you about usage.
1653
01:02:53,880 --> 01:02:56,200
Active users, query volume, repeat usage rates,
1654
01:02:56,200 --> 01:02:58,160
department coverage. These are necessary,
1655
01:02:58,160 --> 01:02:59,200
but not sufficient.
1656
01:02:59,200 --> 01:03:01,120
Adoption without impact is a vanity metric
1657
01:03:01,120 --> 01:03:02,640
with a license cost attached.
1658
01:03:02,640 --> 01:03:04,560
The question adoption answers is whether people
1659
01:03:04,560 --> 01:03:05,680
are trying the system.
1660
01:03:05,680 --> 01:03:07,840
The question that matters is whether trying it is changing
1661
01:03:07,840 --> 01:03:08,600
anything.
1662
01:03:08,600 --> 01:03:11,200
Efficiency metrics are where the real signal lives.
1663
01:03:11,200 --> 01:03:12,960
Minutes saved per task.
1664
01:03:12,960 --> 01:03:17,080
Not estimated, but measured against the baselines from phase one.
1665
01:03:17,080 --> 01:03:19,240
Faster resolution times on the specific workflows
1666
01:03:19,240 --> 01:03:21,000
the pilot was designed to accelerate.
1667
01:03:21,000 --> 01:03:23,480
Fewer escalations to subject matter experts
1668
01:03:23,480 --> 01:03:26,400
because answers are accessible without human intermediation.
1669
01:03:26,400 --> 01:03:29,680
Reduced time preparing for audits or compliance assessments
1670
01:03:29,680 --> 01:03:31,920
because the graph can surface the relevant evidence set
1671
01:03:31,920 --> 01:03:33,400
in seconds rather than days.
1672
01:03:33,400 --> 01:03:35,840
These are the numbers that connect the technology investment
1673
01:03:35,840 --> 01:03:38,320
to the operational reality of the people using it.
1674
01:03:38,320 --> 01:03:41,280
Quality metrics are the leading indicator of long-term trust.
1675
01:03:41,280 --> 01:03:42,880
Search relevance scores.
1676
01:03:42,880 --> 01:03:45,560
Answer accuracy tracked against the labeled test sets
1677
01:03:45,560 --> 01:03:46,960
built during phase two.
1678
01:03:46,960 --> 01:03:48,120
Citation quality.
1679
01:03:48,120 --> 01:03:50,080
Can every answer trace back to a specific,
1680
01:03:50,080 --> 01:03:51,880
verifiable, governed source?
1681
01:03:51,880 --> 01:03:54,640
Hallucination rate measured as the fraction of answers
1682
01:03:54,640 --> 01:03:57,080
containing unsupported claims.
1683
01:03:57,080 --> 01:04:00,080
And the abstention rate: how often does the system correctly say
1684
01:04:00,080 --> 01:04:03,360
"I don't know" rather than generating a plausible-sounding answer
1685
01:04:03,360 --> 01:04:05,200
with no supporting evidence?
1686
01:04:05,200 --> 01:04:08,240
That last metric deserves more attention than it typically gets.
1687
01:04:08,240 --> 01:04:10,520
A system that admits uncertainty appropriately
1688
01:04:10,520 --> 01:04:13,800
is more trustworthy than one that answers everything confidently.
1689
01:04:13,800 --> 01:04:15,960
And trustworthiness is the variable
1690
01:04:15,960 --> 01:04:19,840
that determines whether adoption converts to dependency.
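As a rough illustration of how these quality metrics could be computed from a labeled test set, here is a minimal Python sketch. The record fields and the sample data are hypothetical. It also assumes that "abstention rate" means the share of unanswerable questions where the system abstained, which is one reasonable reading of the definition above, not the only one.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    """One labeled test-set item. All field names are hypothetical."""
    answerable: bool             # reviewers judged the governed sources can answer it
    abstained: bool              # the system said "I don't know"
    unsupported_claims: int      # claims in the answer with no supporting source
    cited_governed_source: bool  # answer traces back to a governed source


def _ratio(numerator, denominator):
    return numerator / denominator if denominator else 0.0


def quality_metrics(records):
    """Compute hallucination, citation, and correct-abstention rates."""
    answered = [r for r in records if not r.abstained]
    unanswerable = [r for r in records if not r.answerable]
    return {
        # Fraction of given answers that contain at least one unsupported claim.
        "hallucination_rate": _ratio(
            sum(r.unsupported_claims > 0 for r in answered), len(answered)),
        # Fraction of given answers that cite a governed source.
        "citation_rate": _ratio(
            sum(r.cited_governed_source for r in answered), len(answered)),
        # Fraction of unanswerable questions where the system abstained.
        "correct_abstention_rate": _ratio(
            sum(r.abstained for r in unanswerable), len(unanswerable)),
    }


# Illustrative sample of four labeled items.
records = [
    EvalRecord(answerable=True, abstained=False, unsupported_claims=0, cited_governed_source=True),
    EvalRecord(answerable=True, abstained=False, unsupported_claims=1, cited_governed_source=True),
    EvalRecord(answerable=False, abstained=True, unsupported_claims=0, cited_governed_source=False),
    EvalRecord(answerable=False, abstained=False, unsupported_claims=2, cited_governed_source=False),
]

metrics = quality_metrics(records)
print(metrics["correct_abstention_rate"])  # prints 0.5
```

Keeping the abstention rate separate from the hallucination rate matters because a system can lower one simply by refusing more often; reporting both makes that trade-off visible.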
1691
01:04:19,840 --> 01:04:22,200
The Forrester Total Economic Impact study
1692
01:04:22,200 --> 01:04:26,000
for a 25,000-employee enterprise found Microsoft Copilot
1693
01:04:26,000 --> 01:04:29,440
delivered 116% ROI over three years
1694
01:04:29,440 --> 01:04:32,480
with nearly $20 million in net present value.
1695
01:04:32,480 --> 01:04:34,720
That number is real, but it carries a condition
1696
01:04:34,720 --> 01:04:36,640
that rarely makes it into the headline.
1697
01:04:36,640 --> 01:04:38,880
It assumes the underlying knowledge architecture
1698
01:04:38,880 --> 01:04:40,200
is built to support it.
1699
01:04:40,200 --> 01:04:42,280
That assumption is exactly what this entire episode
1700
01:04:42,280 --> 01:04:43,240
has been addressing.
1701
01:04:43,240 --> 01:04:44,640
The licenses are the easy part.
1702
01:04:44,640 --> 01:04:45,920
The architecture is the work.
1703
01:04:45,920 --> 01:04:47,960
The ROI compounds when the architecture is right
1704
01:04:47,960 --> 01:04:49,400
and evaporates when it isn't.
1705
01:04:49,400 --> 01:04:51,520
Metrics keep the architecture honest.
1706
01:04:51,520 --> 01:04:54,240
But the numbers only tell you whether the system is working,
1707
01:04:54,240 --> 01:04:57,160
not whether the organization is ready to sustain it.
1708
01:04:57,160 --> 01:04:58,960
The organizational shift required.
1709
01:04:58,960 --> 01:05:00,720
The technology case for a knowledge graph
1710
01:05:00,720 --> 01:05:02,400
is relatively straightforward to make.
1711
01:05:02,400 --> 01:05:05,040
The organizational case is where implementation stalls,
1712
01:05:05,040 --> 01:05:07,240
and it stalls in a specific place:
1713
01:05:07,240 --> 01:05:09,840
the moment someone realizes that making this work permanently
1714
01:05:09,840 --> 01:05:12,920
requires changing how the organization thinks about information
1715
01:05:12,920 --> 01:05:16,800
as a shared resource rather than a personal or departmental asset.
1716
01:05:16,800 --> 01:05:19,080
Information architecture is now AI architecture.
1717
01:05:19,080 --> 01:05:20,560
That restatement isn't rhetorical.
1718
01:05:20,560 --> 01:05:23,880
It's a structural fact with direct operational consequences.
1719
01:05:23,880 --> 01:05:26,440
The SharePoint admin who decides whether to use a flat folder
1720
01:05:26,440 --> 01:05:29,200
structure or a content-type-driven library hierarchy,
1721
01:05:29,200 --> 01:05:31,320
whether to enforce required metadata fields
1722
01:05:31,320 --> 01:05:34,400
or leave them optional, whether to set document lifecycle
1723
01:05:34,400 --> 01:05:37,480
policies or let content accumulate indefinitely,
1724
01:05:37,480 --> 01:05:39,280
that person is now making decisions
1725
01:05:39,280 --> 01:05:42,200
that directly shape the trustworthiness of every AI output
1726
01:05:42,200 --> 01:05:43,800
the organization produces.
1727
01:05:43,800 --> 01:05:45,600
The stakes attached to those decisions
1728
01:05:45,600 --> 01:05:48,040
changed when Copilot was deployed on top of them.
1729
01:05:48,040 --> 01:05:51,240
The role hasn't caught up to the stakes yet in most organizations.
1730
01:05:51,240 --> 01:05:53,920
The consequence of that gap is that AI governance decisions
1731
01:05:53,920 --> 01:05:56,120
are being made by default rather than by design.
1732
01:05:56,120 --> 01:05:58,280
Nobody decided that untagged OneDrive documents
1733
01:05:58,280 --> 01:06:00,000
should feed the semantic index.
1734
01:06:00,000 --> 01:06:02,480
It happened because nobody decided they shouldn't.
1735
01:06:02,480 --> 01:06:05,000
Nobody decided that project workspaces should stay open
1736
01:06:05,000 --> 01:06:07,960
and fully permissioned for three years after the project closed.
1737
01:06:07,960 --> 01:06:09,960
It happened because the decision to close them
1738
01:06:09,960 --> 01:06:12,160
was never built into the lifecycle policy.
1739
01:06:12,160 --> 01:06:15,040
Default behavior in a system designed for human navigation
1740
01:06:15,040 --> 01:06:18,280
becomes deliberate policy in a system designed for AI retrieval.
1741
01:06:18,280 --> 01:06:20,200
Organizations that don't recognize that shift
1742
01:06:20,200 --> 01:06:22,640
are governing an architecture they don't think they're running.
1743
01:06:22,640 --> 01:06:24,600
Cross-functional collaboration on schema design
1744
01:06:24,600 --> 01:06:26,520
isn't a best practice to aspire to.
1745
01:06:26,520 --> 01:06:28,120
It's a prerequisite for accuracy.
1746
01:06:28,120 --> 01:06:29,920
IT can build the technical container;
1747
01:06:29,920 --> 01:06:31,520
business units own the vocabulary.
1748
01:06:31,520 --> 01:06:34,120
Legal and compliance define what classification means
1749
01:06:34,120 --> 01:06:36,040
and what retention requires.
1750
01:06:36,040 --> 01:06:38,320
Knowledge managers understand how information actually
1751
01:06:38,320 --> 01:06:39,880
flows through the organization
1752
01:06:39,880 --> 01:06:42,240
as opposed to how the org chart suggests it should.
1753
01:06:42,240 --> 01:06:44,040
No single function has a complete picture.
1754
01:06:44,040 --> 01:06:46,000
When schemas are designed by IT alone,
1755
01:06:46,000 --> 01:06:48,760
they reflect technical constraints rather than business reality
1756
01:06:48,760 --> 01:06:51,440
and the ontology ends up full of system terminology
1757
01:06:51,440 --> 01:06:53,960
that doesn't match how anyone actually describes their work.
1758
01:06:53,960 --> 01:06:55,800
The cultural dimension is harder to instrument,
1759
01:06:55,800 --> 01:06:57,360
but no less consequential.
1760
01:06:57,360 --> 01:06:59,920
Employees who understand why governed storage matters
1761
01:06:59,920 --> 01:07:02,200
behave differently from employees who experience it
1762
01:07:02,200 --> 01:07:03,920
as bureaucratic friction.
1763
01:07:03,920 --> 01:07:06,200
The framing that tends to land is this.
1764
01:07:06,200 --> 01:07:08,080
Storing a document in a governed library
1765
01:07:08,080 --> 01:07:10,840
with consistent metadata isn't compliance overhead.
1766
01:07:10,840 --> 01:07:13,960
It's a contribution to the organization's collective intelligence.
1767
01:07:13,960 --> 01:07:15,920
The document you file properly today
1768
01:07:15,920 --> 01:07:18,360
is the context that helps a colleague in a different time zone
1769
01:07:18,360 --> 01:07:20,200
make a better decision six months from now,
1770
01:07:20,200 --> 01:07:21,960
in a distributed workforce where that colleague
1771
01:07:21,960 --> 01:07:24,520
can't walk across the office to ask a follow-up question.
1772
01:07:24,520 --> 01:07:27,360
The quality of what's in the graph is the quality of the connection.
1773
01:07:27,360 --> 01:07:30,880
Reducing friction is what actually changes behavior at scale.
1774
01:07:30,880 --> 01:07:34,120
Simple templates that pre-populate required fields.
1775
01:07:34,120 --> 01:07:37,320
Default sensitivity labels applied automatically by library location.
1776
01:07:37,320 --> 01:07:39,240
Suggested metadata surfaced by Syntex
1777
01:07:39,240 --> 01:07:40,840
rather than demanded by a form.
1778
01:07:40,840 --> 01:07:42,520
The goal isn't to eliminate human judgment
1779
01:07:42,520 --> 01:07:44,160
about how to classify information.
1780
01:07:44,160 --> 01:07:46,040
It's to make exercising that judgment
1781
01:07:46,040 --> 01:07:48,880
the path of least resistance rather than an extra step.
1782
01:07:48,880 --> 01:07:51,920
When the default behavior produces a compliant, well-tagged document,
1783
01:07:51,920 --> 01:07:53,680
most people follow the default.
1784
01:07:53,680 --> 01:07:56,680
When compliance requires deliberate extra effort, most people don't.
1785
01:07:56,680 --> 01:07:58,480
The AI governance board is the structure
1786
01:07:58,480 --> 01:08:01,600
that keeps all of this coherent as the organization grows,
1787
01:08:01,600 --> 01:08:04,360
as the technology evolves, and as the use case portfolio
1788
01:08:04,360 --> 01:08:07,280
expands beyond what any single team can oversee.
1789
01:08:07,280 --> 01:08:08,960
It owns the acceptable use policy.
1790
01:08:08,960 --> 01:08:11,360
It owns the data access and labeling strategy.
1791
01:08:11,360 --> 01:08:13,520
It owns the risk tolerance decisions that determine
1792
01:08:13,520 --> 01:08:16,640
which workloads get graph-backed AI access and which don't.
1793
01:08:16,640 --> 01:08:19,400
And it meets regularly, not at annual strategy reviews,
1794
01:08:19,400 --> 01:08:22,680
but on a cadence frequent enough to catch drift before it compounds.
1795
01:08:22,680 --> 01:08:24,160
The technology will keep changing.
1796
01:08:24,160 --> 01:08:27,880
The governance discipline is the only sustainable competitive advantage.
1797
01:08:27,880 --> 01:08:29,760
Failure modes to avoid.
1798
01:08:29,760 --> 01:08:32,880
Knowing the architecture is one thing. Knowing where it breaks,
1799
01:08:32,880 --> 01:08:36,400
specifically, predictably, and repeatedly across organizations
1800
01:08:36,400 --> 01:08:39,080
that look like they were doing everything right,
1801
01:08:39,080 --> 01:08:43,200
is what separates implementations that deliver from implementations
1802
01:08:43,200 --> 01:08:47,480
that get quietly decommissioned 18 months after launch.
1803
01:08:47,480 --> 01:08:49,240
The first failure is the one we've referenced
1804
01:08:49,240 --> 01:08:51,520
throughout this episode without dwelling on it directly:
1805
01:08:51,520 --> 01:08:52,840
the boil-the-ocean pattern.
1806
01:08:52,840 --> 01:08:54,440
A team spends four to six months designing
1807
01:08:54,440 --> 01:08:56,440
a comprehensive enterprise ontology.
1808
01:08:56,440 --> 01:08:59,480
Every entity type is modeled, every relationship is mapped,
1809
01:08:59,480 --> 01:09:01,200
every edge case is accounted for.
1810
01:09:01,200 --> 01:09:02,680
The term store is immaculate.
1811
01:09:02,680 --> 01:09:05,400
The documentation is thorough, and then nobody uses it
1812
01:09:05,400 --> 01:09:07,560
because the first production use case never shipped
1813
01:09:07,560 --> 01:09:10,120
and the window of executive enthusiasm closed
1814
01:09:10,120 --> 01:09:12,400
while the architecture team was still debating
1815
01:09:12,400 --> 01:09:16,680
whether "deliverable" and "artifact" should be separate node types.
1816
01:09:16,680 --> 01:09:20,120
The graph becomes technically impressive and operationally irrelevant.
1817
01:09:20,120 --> 01:09:22,240
The fix isn't to model less carefully.
1818
01:09:22,240 --> 01:09:24,440
It's to ship something before perfecting everything.
1819
01:09:24,440 --> 01:09:26,880
A minimal ontology supporting one working use case
1820
01:09:26,880 --> 01:09:29,920
outperforms a comprehensive ontology supporting zero.
1821
01:09:29,920 --> 01:09:31,800
The second failure looks different on the surface
1822
01:09:31,800 --> 01:09:35,480
but comes from the same root cause: building without users.
1823
01:09:35,480 --> 01:09:37,720
The technical team constructs a sophisticated graph,
1824
01:09:37,720 --> 01:09:39,680
integrates it with Teams and SharePoint,
1825
01:09:39,680 --> 01:09:42,480
demos it to stakeholders and declares it ready.
1826
01:09:42,480 --> 01:09:44,960
Adoption numbers come in at single digits.
1827
01:09:44,960 --> 01:09:46,720
The diagnosis is usually the same:
1828
01:09:46,720 --> 01:09:48,800
limited user research before design,
1829
01:09:48,800 --> 01:09:50,640
no co-design with the frontline staff
1830
01:09:50,640 --> 01:09:53,200
who were supposed to benefit, success criteria
1831
01:09:53,200 --> 01:09:54,720
defined in technical milestones,
1832
01:09:54,720 --> 01:09:56,920
rather than behavioral outcomes.
1833
01:09:56,920 --> 01:09:58,600
A graph that knowledge workers don't trust
1834
01:09:58,600 --> 01:10:00,320
or don't find in their natural workflow
1835
01:10:00,320 --> 01:10:01,520
doesn't change how they work.
1836
01:10:01,520 --> 01:10:03,280
It sits adjacent to how they work
1837
01:10:03,280 --> 01:10:04,680
and people route around it.
1838
01:10:04,680 --> 01:10:07,000
The third failure is a tool selection error
1839
01:10:07,000 --> 01:10:09,080
that's more common than the knowledge graph community
1840
01:10:09,080 --> 01:10:11,240
usually acknowledges. Full knowledge graphs
1841
01:10:11,240 --> 01:10:13,120
excel in specific conditions.
1842
01:10:13,120 --> 01:10:14,640
When explainability matters,
1843
01:10:14,640 --> 01:10:17,680
when complex multi-hop relationship traversal is required,
1844
01:10:17,680 --> 01:10:19,800
when contextual boundaries need to be enforced
1845
01:10:19,800 --> 01:10:21,560
at the entity level.
1846
01:10:21,560 --> 01:10:23,720
For generic semantic similarity,
1847
01:10:23,720 --> 01:10:26,560
finding documents that are topically related to a query,
1848
01:10:26,560 --> 01:10:28,480
vector search delivers most of the value
1849
01:10:28,480 --> 01:10:31,480
at a fraction of the implementation cost and complexity.
1850
01:10:31,480 --> 01:10:33,760
Organizations that reach for a full knowledge graph
1851
01:10:33,760 --> 01:10:36,280
when vector search or a well-structured search index
1852
01:10:36,280 --> 01:10:38,120
would have solved the problem faster
1853
01:10:38,120 --> 01:10:39,720
haven't made a technical mistake.
1854
01:10:39,720 --> 01:10:41,200
They've made a scoping mistake.
1855
01:10:41,200 --> 01:10:43,480
The right architecture for the problem at hand
1856
01:10:43,480 --> 01:10:45,320
beats the most sophisticated architecture
1857
01:10:45,320 --> 01:10:46,640
for a problem you don't have.
1858
01:10:46,640 --> 01:10:48,600
The fourth failure is governance drift.
1859
01:10:48,600 --> 01:10:49,920
The taxonomy starts clean.
1860
01:10:49,920 --> 01:10:51,600
The metadata schema is consistent,
1861
01:10:51,600 --> 01:10:53,200
enrichment quality is high.
1862
01:10:53,200 --> 01:10:54,880
Then, six months into operation,
1863
01:10:54,880 --> 01:10:56,040
a team creates a new site
1864
01:10:56,040 --> 01:10:57,760
with a slightly different naming convention.
1865
01:10:57,760 --> 01:11:00,360
Another team starts filing documents directly in libraries
1866
01:11:00,360 --> 01:11:02,760
that were supposed to require content type selection.
1867
01:11:02,760 --> 01:11:04,640
A third team skips the approval workflow
1868
01:11:04,640 --> 01:11:06,640
for a new agent because the deadline was tight
1869
01:11:06,640 --> 01:11:07,760
and it seemed low-risk.
1870
01:11:07,760 --> 01:11:09,600
None of these individually breaks anything.
1871
01:11:09,600 --> 01:11:11,880
Collectively, they reconstitute the inconsistency
1872
01:11:11,880 --> 01:11:13,120
the initial cleanup removed,
1873
01:11:13,120 --> 01:11:15,560
and it accumulates silently until users start noticing
1874
01:11:15,560 --> 01:11:17,400
that answers have become less reliable.
1875
01:11:17,400 --> 01:11:20,080
By the time the degradation is visible in quality metrics,
1876
01:11:20,080 --> 01:11:22,080
the root cause is distributed across dozens
1877
01:11:22,080 --> 01:11:25,080
of small decisions that nobody flagged as consequential.
1878
01:11:25,080 --> 01:11:28,080
The fifth failure is the one that kills adoption fastest:
1879
01:11:28,080 --> 01:11:29,120
building the knowledge graph
1880
01:11:29,120 --> 01:11:32,000
as a standalone experience rather than an embedded one,
1881
01:11:32,000 --> 01:11:35,320
a portal that requires users to navigate away from Teams,
1882
01:11:35,320 --> 01:11:36,720
away from their CRM,
1883
01:11:36,720 --> 01:11:38,760
away from wherever their actual work lives,
1884
01:11:38,760 --> 01:11:40,600
to query a separate interface.
1885
01:11:40,600 --> 01:11:44,120
That portal gets used for demos and not much else.
1886
01:11:44,120 --> 01:11:46,200
Graph-powered intelligence needs to surface
1887
01:11:46,200 --> 01:11:47,560
where decisions get made,
1888
01:11:47,560 --> 01:11:50,280
not where the technology team prefers to expose it.
1889
01:11:50,280 --> 01:11:52,520
And the sixth, as we've established in both the metadata
1890
01:11:52,520 --> 01:11:53,920
and the RAG sections:
1891
01:11:53,920 --> 01:11:56,040
implementing either layer without the other.
1892
01:11:56,040 --> 01:11:58,280
RAG without metadata retrieves noisily.
1893
01:11:58,280 --> 01:12:00,960
Metadata without RAG relies on generic model priors.
1894
01:12:00,960 --> 01:12:02,920
The dependency runs both directions.
1895
01:12:02,920 --> 01:12:05,320
Organizations that treat them as alternatives
1896
01:12:05,320 --> 01:12:08,520
instead of complements consistently underperform organizations
1897
01:12:08,520 --> 01:12:11,640
that build them as a single coherent pipeline from the start.
1898
01:12:11,640 --> 01:12:14,080
What good looks like in 2026.
1899
01:12:14,080 --> 01:12:16,640
It's worth pausing here before the final section
1900
01:12:16,640 --> 01:12:18,880
to describe what the destination actually looks like
1901
01:12:18,880 --> 01:12:20,080
when you've done this right.
1902
01:12:20,080 --> 01:12:21,920
Not the theoretical capability,
1903
01:12:21,920 --> 01:12:23,600
the lived operational reality
1904
01:12:23,600 --> 01:12:26,240
that emerges when a knowledge graph is properly built,
1905
01:12:26,240 --> 01:12:29,360
governed and embedded into how work gets done.
1906
01:12:29,360 --> 01:12:32,520
In a well-implemented M365 knowledge graph environment,
1907
01:12:32,520 --> 01:12:34,440
Copilot doesn't feel like a search tool
1908
01:12:34,440 --> 01:12:36,040
with a conversational interface.
1909
01:12:36,040 --> 01:12:37,440
It feels like an informed colleague
1910
01:12:37,440 --> 01:12:39,640
who has read everything, remembers everything,
1911
01:12:39,640 --> 01:12:42,200
and can connect anything to anything else in seconds.
1912
01:12:42,200 --> 01:12:43,800
The user asking, "What are the key risks
1913
01:12:43,800 --> 01:12:45,760
for the ACME renewal next quarter?"
1914
01:12:45,760 --> 01:12:48,040
gets an answer that draws from the most recent contract,
1915
01:12:48,040 --> 01:12:50,480
the open support tickets logged in the last 90 days,
1916
01:12:50,480 --> 01:12:53,440
The risk flags raised in the last executive business review,
1917
01:12:53,440 --> 01:12:55,840
the competitive intelligence filed by the account team
1918
01:12:55,840 --> 01:12:56,680
three weeks ago,
1919
01:12:56,680 --> 01:12:58,600
and the product limitation documented
1920
01:12:58,600 --> 01:13:00,000
in the engineering tracker.
1921
01:13:00,000 --> 01:13:01,840
Not because someone built a custom integration
1922
01:13:01,840 --> 01:13:03,240
for that specific query,
1923
01:13:03,240 --> 01:13:05,200
but because the graph connected those entities
1924
01:13:05,200 --> 01:13:06,600
when the content was ingested,
1925
01:13:06,600 --> 01:13:08,880
and the retrieval layer traversed those connections
1926
01:13:08,880 --> 01:13:10,160
when the question was asked.
1927
01:13:10,160 --> 01:13:11,520
That's not a demo scenario.
1928
01:13:11,520 --> 01:13:13,800
That's what happens when the architecture we've been building
1929
01:13:13,800 --> 01:13:16,080
throughout this episode reaches production maturity.
1930
01:13:16,080 --> 01:13:19,040
New hire onboarding is one of the most visible transformations
1931
01:13:19,040 --> 01:13:21,000
because the friction is so tangible before
1932
01:13:21,000 --> 01:13:23,120
and so dramatically reduced after.
1933
01:13:23,120 --> 01:13:25,920
A new account executive joining a global team today
1934
01:13:25,920 --> 01:13:27,360
spends weeks piecing together
1935
01:13:27,360 --> 01:13:29,600
institutional knowledge through a combination
1936
01:13:29,600 --> 01:13:32,200
of calendar invitations, introductory calls,
1937
01:13:32,200 --> 01:13:34,640
and asking colleagues who are already overloaded.
1938
01:13:34,640 --> 01:13:36,160
The knowledge lives in people's heads
1939
01:13:36,160 --> 01:13:39,080
and in documents scattered across dozens of SharePoint sites
1940
01:13:39,080 --> 01:13:40,840
that nobody gave them a map to.
1941
01:13:40,840 --> 01:13:42,880
With a graph-backed onboarding path,
1942
01:13:42,880 --> 01:13:45,040
that same person can follow a structured traversal
1943
01:13:45,040 --> 01:13:47,400
from their role definition through the systems they need
1944
01:13:47,400 --> 01:13:50,560
to access, the canonical documents that govern their work,
1945
01:13:50,560 --> 01:13:52,840
and the people who are the authoritative sources
1946
01:13:52,840 --> 01:13:54,720
on each domain.
1947
01:13:54,720 --> 01:13:56,160
Hours, not weeks.
1948
01:13:56,160 --> 01:13:57,800
The graph replaces the ambient knowledge
1949
01:13:57,800 --> 01:14:00,760
that co-located environments transferred through proximity
1950
01:14:00,760 --> 01:14:03,320
and it does it at any hour in any time zone
1951
01:14:03,320 --> 01:14:06,280
without burdening a single senior team member.
1952
01:14:06,280 --> 01:14:08,200
Compliance and audit preparation undergoes a similar
1953
01:14:08,200 --> 01:14:09,160
structural shift.
1954
01:14:09,160 --> 01:14:11,920
The question "show me all sites storing customer PII
1955
01:14:11,920 --> 01:14:14,240
that aren't covered by retention policy X"
1956
01:14:14,240 --> 01:14:17,400
used to require a multi-week manual review involving IT,
1957
01:14:17,400 --> 01:14:19,920
legal, and compliance working through spreadsheets.
1958
01:14:19,920 --> 01:14:21,240
In a graph where content objects
1959
01:14:21,240 --> 01:14:23,560
carry semantically rich governance metadata
1960
01:14:23,560 --> 01:14:25,400
(classification, residency policy,
1961
01:14:25,400 --> 01:14:27,920
retention assignment, sensitivity label),
1962
01:14:27,920 --> 01:14:29,880
that question becomes a graph query, and
1963
01:14:29,880 --> 01:14:31,680
the answer surfaces in seconds.
1964
01:14:31,680 --> 01:14:33,560
The gap between what's covered and what isn't
1965
01:14:33,560 --> 01:14:36,440
is visible immediately, not at the end of a project.
1966
01:14:36,440 --> 01:14:38,520
Audit readiness stops being a periodic scramble
1967
01:14:38,520 --> 01:14:40,120
and becomes a permanent state.
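To make the compliance question concrete, here is a minimal sketch of it as a filter over site records that carry governance metadata. The `Site` type, its field names, the example URLs, and the policy identifiers are assumptions for illustration; a real tenant would pull these attributes from its own governance tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Site:
    # Hypothetical governance metadata attached to each site.
    url: str
    classification: str
    retention_policy: Optional[str]


SITES = [
    Site("https://contoso.example/sites/sales", "customer-pii", "policy-x"),
    Site("https://contoso.example/sites/support", "customer-pii", None),
    Site("https://contoso.example/sites/marketing", "public", None),
    Site("https://contoso.example/sites/billing", "customer-pii", "policy-y"),
]


def uncovered_pii_sites(sites, policy):
    """Return URLs of sites storing customer PII that are not under `policy`."""
    return [
        s.url
        for s in sites
        if s.classification == "customer-pii" and s.retention_policy != policy
    ]


gaps = uncovered_pii_sites(SITES, "policy-x")
for url in gaps:
    print(url)
```

The query is only as good as the metadata: a site with a missing or wrong classification never shows up in the gap list.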
1968
01:14:40,120 --> 01:14:42,280
The hybrid workforce gains are structural rather
1969
01:14:42,280 --> 01:14:44,240
than incremental because the graph addresses
1970
01:14:44,240 --> 01:14:47,160
the specific friction that hybrid work creates
1971
01:14:47,160 --> 01:14:49,600
and that no other tool has adequately solved:
1972
01:14:49,600 --> 01:14:52,720
asynchronous access to organizational context.
1973
01:14:52,720 --> 01:14:54,600
As the Trip.com research showed,
1974
01:14:54,600 --> 01:14:57,760
hybrid models hold productivity and improve retention,
1975
01:14:57,760 --> 01:15:00,280
but information friction is the remaining constraint.
1976
01:15:00,280 --> 01:15:01,600
The graph removes that constraint
1977
01:15:01,600 --> 01:15:04,000
by making organizational knowledge accessible
1978
01:15:04,000 --> 01:15:06,360
regardless of where or when someone is working.
1979
01:15:06,360 --> 01:15:08,320
The competitive differentiation that emerges
1980
01:15:08,320 --> 01:15:11,480
from this architecture is harder to copy than a product feature
1981
01:15:11,480 --> 01:15:12,640
or a process improvement.
1982
01:15:12,640 --> 01:15:15,160
Any organization can buy Copilot licenses.
1983
01:15:15,160 --> 01:15:16,960
The intelligence behind those licenses,
1984
01:15:16,960 --> 01:15:18,520
the clean graph, the governed metadata,
1985
01:15:18,520 --> 01:15:20,680
the resolved entities, the trusted retrieval,
1986
01:15:20,680 --> 01:15:23,000
takes months to build and requires continuous discipline
1987
01:15:23,000 --> 01:15:24,040
to maintain.
1988
01:15:24,040 --> 01:15:26,080
And that depth doesn't replicate quickly.
1989
01:15:26,080 --> 01:15:27,560
It's the kind of structural advantage
1990
01:15:27,560 --> 01:15:29,920
that compounds quietly while competitors are still trying
1991
01:15:29,920 --> 01:15:31,920
to understand why their own deployments feel
1992
01:15:31,920 --> 01:15:33,480
like expensive search bars.
1993
01:15:33,480 --> 01:15:35,840
Explainable, trusted AI: every answer
1994
01:15:35,840 --> 01:15:37,400
traceable to a governed source.
1995
01:15:37,400 --> 01:15:40,280
Uncertainty acknowledged rather than papered over.
1996
01:15:40,280 --> 01:15:41,280
That's the standard.
1997
01:15:41,280 --> 01:15:43,480
And the path there starts with one honest look at where
1998
01:15:43,480 --> 01:15:45,640
your architecture actually stands today.
1999
01:15:45,640 --> 01:15:46,840
Where to start tomorrow?
2000
01:15:46,840 --> 01:15:48,160
The first action isn't technical.
2001
01:15:48,160 --> 01:15:49,240
It's diagnostic.
2002
01:15:49,240 --> 01:15:51,400
Open the SharePoint Data Access Governance Reports
2003
01:15:51,400 --> 01:15:53,880
in your admin center, run them against your highest-value
2004
01:15:53,880 --> 01:15:57,440
content estate, and map the gap between what Copilot can currently
2005
01:15:57,440 --> 01:16:00,440
see and what your security posture says it should see.
2006
01:16:00,440 --> 01:16:01,640
Don't interpret what you find.
2007
01:16:01,640 --> 01:16:04,120
Just document it: the overshared sites, the broad access
2008
01:16:04,120 --> 01:16:06,400
groups, the content with no owner and no lifecycle
2009
01:16:06,400 --> 01:16:06,920
policy.
2010
01:16:06,920 --> 01:16:09,400
That inventory is the starting condition for everything else.
2011
01:16:09,400 --> 01:16:10,840
And you can't accurately plan a route
2012
01:16:10,840 --> 01:16:12,360
without knowing where you're standing.
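As a rough sketch of that inventory step, the snippet below tags each site with the conditions named above. It assumes the report data has already been exported into simple records; the field names (`shared_with_everyone`, `owner`, `lifecycle_policy`) and the example URLs are hypothetical and do not reflect the actual report schema.

```python
# Hypothetical inventory rows; in practice these would come from exported reports.
SITES = [
    {"url": "https://contoso.example/sites/finance",
     "shared_with_everyone": True, "owner": "dana", "lifecycle_policy": "7y"},
    {"url": "https://contoso.example/sites/legacy-projects",
     "shared_with_everyone": False, "owner": None, "lifecycle_policy": None},
    {"url": "https://contoso.example/sites/hr",
     "shared_with_everyone": False, "owner": "lee", "lifecycle_policy": "5y"},
]


def audit_findings(site):
    """List the governance gaps for one site record, without interpreting them."""
    findings = []
    if site.get("shared_with_everyone"):
        findings.append("overshared")
    if not site.get("owner"):
        findings.append("no owner")
    if not site.get("lifecycle_policy"):
        findings.append("no lifecycle policy")
    return findings


# Document only: map each site to its findings, empty list meaning no gaps found.
inventory = {site["url"]: audit_findings(site) for site in SITES}
for url, findings in inventory.items():
    print(url, findings)
```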
2013
01:16:12,360 --> 01:16:14,080
The second action is strategic.
2014
01:16:14,080 --> 01:16:16,560
And it requires sitting with a specific business pain
2015
01:16:16,560 --> 01:16:18,360
rather than a general capability goal.
2016
01:16:18,360 --> 01:16:21,200
Find one workflow where information friction is causing
2017
01:16:21,200 --> 01:16:22,520
a measurable problem.
2018
01:16:22,520 --> 01:16:24,280
Not "our search could be better."
2019
01:16:24,280 --> 01:16:26,920
A specific team, a specific question, a specific answer
2020
01:16:26,920 --> 01:16:28,840
that currently takes three days to assemble
2021
01:16:28,840 --> 01:16:30,480
from 12 different places.
2022
01:16:30,480 --> 01:16:31,880
The account team that spends half a day
2023
01:16:31,880 --> 01:16:34,280
before every renewal gathering context that should
2024
01:16:34,280 --> 01:16:35,720
be instantly accessible.
2025
01:16:35,720 --> 01:16:37,800
The compliance function that scrambles for two weeks
2026
01:16:37,800 --> 01:16:39,320
before every audit, pulling evidence
2027
01:16:39,320 --> 01:16:41,560
that a governed graph could surface in minutes.
2028
01:16:41,560 --> 01:16:44,120
That concrete friction point is your first use case,
2029
01:16:44,120 --> 01:16:47,040
and it's the only scope that belongs in your phase one plan.
2030
01:16:47,040 --> 01:16:49,240
The third action connects those first two.
2031
01:16:49,240 --> 01:16:51,000
Define the minimal metadata schema
2032
01:16:51,000 --> 01:16:53,880
required to make retrieval reliable for that one problem.
2033
01:16:53,880 --> 01:16:56,680
Not the enterprise ontology, not the complete classification
2034
01:16:56,680 --> 01:16:59,240
framework, just the fields that, if consistently
2035
01:16:59,240 --> 01:17:01,960
populated on the content relevant to your first use case,
2036
01:17:01,960 --> 01:17:03,800
would give the retrieval layer enough signal
2037
01:17:03,800 --> 01:17:07,000
to return the right answer instead of a plausible guess.
2038
01:17:07,000 --> 01:17:11,880
Source, effective date, authority tier, topic, sensitivity,
2039
01:17:11,880 --> 01:17:15,200
five fields consistently applied to a focused content set
2040
01:17:15,200 --> 01:17:17,960
will outperform 50 fields applied inconsistently
2041
01:17:17,960 --> 01:17:19,400
across everything.
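One way to picture that five-field schema is as a small record type plus a completeness check. The field names follow the five listed above; the `ContentMetadata` class, the `missing_fields` helper, and the sample values are assumptions for illustration, not a prescribed Microsoft 365 content type.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class ContentMetadata:
    # The five fields named in the episode; types are illustrative guesses.
    source: Optional[str] = None
    effective_date: Optional[date] = None
    authority_tier: Optional[str] = None
    topic: Optional[str] = None
    sensitivity: Optional[str] = None


def missing_fields(meta):
    """Return the names of fields that are unset or empty, in schema order."""
    return [f.name for f in fields(meta) if getattr(meta, f.name) in (None, "")]


complete = ContentMetadata(
    source="HR portal",
    effective_date=date(2024, 1, 1),
    authority_tier="canonical",
    topic="leave policy",
    sensitivity="internal",
)
partial = ContentMetadata(source="HR portal", topic="leave policy")

print(missing_fields(complete))
print(missing_fields(partial))
```

Running a check like this over the focused content set shows how consistently the schema is populated before anything is built on top of it.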
2042
01:17:19,400 --> 01:17:21,600
What most organizations discover when they work through
2043
01:17:21,600 --> 01:17:24,520
the sequence honestly is that the distance between their current
2044
01:17:24,520 --> 01:17:26,480
state and a production-ready knowledge graph
2045
01:17:26,480 --> 01:17:28,520
isn't primarily a technology problem.
2046
01:17:28,520 --> 01:17:29,240
The tools are there.
2047
01:17:29,240 --> 01:17:31,240
Microsoft Graph is production capable.
2048
01:17:31,240 --> 01:17:33,800
The semantic index is running in your tenant right now.
2049
01:17:33,800 --> 01:17:36,640
Purview, Syntex, and Copilot Studio are all available
2050
01:17:36,640 --> 01:17:38,040
and mature enough to build on.
2051
01:17:38,040 --> 01:17:40,000
The gap is governance, metadata discipline,
2052
01:17:40,000 --> 01:17:42,560
and clarity about what problem you're actually solving.
2053
01:17:42,560 --> 01:17:45,160
Those are organizational constraints, not technical ones,
2054
01:17:45,160 --> 01:17:47,960
and they don't resolve by adding licenses or deploying
2055
01:17:47,960 --> 01:17:50,480
more tooling on top of the existing foundation.
2056
01:17:50,480 --> 01:17:53,160
The organizations that will lead on this in the next two years
2057
01:17:53,160 --> 01:17:54,560
aren't the ones who moved fastest.
2058
01:17:54,560 --> 01:17:56,360
They're the ones who built deliberately.
2059
01:17:56,360 --> 01:17:58,680
They audited first, governed before they deployed,
2060
01:17:58,680 --> 01:18:00,840
started narrow and expanded based on evidence,
2061
01:18:00,840 --> 01:18:02,600
and treated the knowledge infrastructure
2062
01:18:02,600 --> 01:18:06,160
as a long-lived organizational asset rather than an IT project
2063
01:18:06,160 --> 01:18:07,360
with a go-live date.
2064
01:18:07,360 --> 01:18:09,360
That discipline is available to any organization
2065
01:18:09,360 --> 01:18:10,760
willing to apply it.
2066
01:18:10,760 --> 01:18:13,640
The technology has been waiting for the foundation to catch up.
2067
01:18:13,640 --> 01:18:17,240
Start with the audit, find the use case, define the schema.
2068
01:18:17,240 --> 01:18:19,440
Everything else in this episode is the architecture
2069
01:18:19,440 --> 01:18:21,920
that scales from those three steps.
2070
01:18:21,920 --> 01:18:24,640
The move from siloed search to predictive intelligence
2071
01:18:24,640 --> 01:18:27,360
is an architecture decision, not a product decision.
2072
01:18:27,360 --> 01:18:28,960
It starts with the knowledge graph layer
2073
01:18:28,960 --> 01:18:31,720
underneath Copilot, not with the prompts on top of it.
2074
01:18:31,720 --> 01:18:33,480
Three things to carry out of this episode.
2075
01:18:33,480 --> 01:18:35,360
Metadata is an AI safety control.
2076
01:18:35,360 --> 01:18:37,920
Governance is an operational discipline, not a project,
2077
01:18:37,920 --> 01:18:40,760
and the minimal viable ontology beats the perfect enterprise
2078
01:18:40,760 --> 01:18:42,800
ontology every time.
2079
01:18:42,800 --> 01:18:44,880
Your challenge this week: run one permissions
2080
01:18:44,880 --> 01:18:47,160
audit on your highest-value SharePoint site
2081
01:18:47,160 --> 01:18:49,640
and ask whether what Copilot can currently see there
2082
01:18:49,640 --> 01:18:51,360
is what you actually want it to see.
2083
01:18:51,360 --> 01:18:53,400
Then come tell me what you found. Connect with me,
2084
01:18:53,400 --> 01:18:56,640
Mirko Peters, on LinkedIn. Share what your audit surfaces
2085
01:18:56,640 --> 01:18:59,320
or tell me the knowledge graph use case you're trying to solve.
2086
01:18:59,320 --> 01:19:02,480
I read every message and it directly shapes what gets covered next.
2087
01:19:02,480 --> 01:19:04,880
If this episode shifted how you think about your Copilot
2088
01:19:04,880 --> 01:19:06,800
architecture, leave a review.
2089
01:19:06,800 --> 01:19:09,920
It helps other IT pros and decision makers find this podcast
2090
01:19:09,920 --> 01:19:11,800
and avoid the mistakes we covered today.

Founder of m365.fm, m365.show and m365con.net
Mirko Peters is a Microsoft 365 expert, content creator, and founder of m365.fm, a platform dedicated to sharing practical insights on modern workplace technologies. His work focuses on Microsoft 365 governance, security, collaboration, and real-world implementation strategies.
Through his podcast and written content, Mirko provides hands-on guidance for IT professionals, architects, and business leaders navigating the complexities of Microsoft 365. He is known for translating complex topics into clear, actionable advice, often highlighting common mistakes and overlooked risks in real-world environments.
With a strong emphasis on community contribution and knowledge sharing, Mirko is actively building a platform that connects experts, shares experiences, and helps organizations get the most out of their Microsoft 365 investments.









