June 13, 2026

The Death of the Dropdown: Why Manual Tagging is Killing Your Governance

Manual tagging is dead—and it’s quietly undermining your Microsoft 365 governance strategy.

In this episode, we explore why traditional metadata management based on dropdown menus, user-selected labels, and manual classification no longer works in modern organizations. The volume of content generated across SharePoint, Teams, OneDrive, Copilot, and Microsoft 365 has grown beyond what humans can reliably classify.

The problem isn’t that users are unwilling to tag content—it’s that manual tagging is inconsistent, incomplete, and impossible to scale. When metadata quality declines, governance suffers. Search results become unreliable, retention policies lose effectiveness, compliance controls weaken, and AI tools like Microsoft Copilot struggle to understand and protect organizational data.

The episode examines how Microsoft Purview and AI-powered classification are changing the game. Instead of relying on users to choose the correct label, modern governance systems can analyze content, context, sensitivity, ownership, and usage patterns automatically. This enables real-time classification, more accurate compliance enforcement, and better data discovery across the enterprise.

You’ll learn why governance must move from human-driven processes to system-driven intelligence, how automated classification improves security and compliance, and why organizations that continue relying on manual tagging are creating governance debt that becomes harder to fix over time.

If your governance strategy still depends on users selecting the right option from a dropdown menu, this episode explains why that model is failing—and what needs to replace it.

Manual dropdown tagging creates real challenges for governance. You face errors, inconsistency, and limited scalability when you depend on manual processes. The death of the dropdown signals a shift. Your data grows fast, and manual processes cannot keep up. Poor tagging leads to dark data, hiding important files from your organization. Compliance suffers, and your AI tools lose effectiveness. You need strong governance to manage data and support AI in your workplace.

Key Takeaways

Manual dropdown tagging leads to errors and inconsistencies, making data governance challenging.
Automation and AI can reduce human effort in data classification by up to 80%, improving accuracy significantly.
Implementing automated tagging helps organizations manage data efficiently and ensures compliance.
Decentralizing metadata ownership allows domain experts to manage tags, increasing accuracy and flexibility.
Real-time metadata injection keeps your tag management systems up to date and supports compliance.
Regular monitoring of governance outcomes helps identify gaps and improve data accuracy.
Training users on automated tools enhances their skills and builds a strong data governance culture.
A living metadata layer adapts to changes, ensuring your organization remains ready for future challenges.

Death of the Dropdown in Data Governance

Why Dropdowns Persist

You have seen dropdown menus and manual tagging dominate metadata governance for years. In Microsoft 365 environments, dropdowns became the standard way to tag files, emails, and documents. This approach seemed simple at first. You could select a tag from a list and move on.

Dropdowns offered a quick solution for tagging data, but they created hidden risks for governance.

The article highlights the decline of dropdowns in governance practices.
Manual tagging in metadata governance within Microsoft 365 led to challenges that you cannot ignore.

You often skipped fields or guessed which tag to use. Sometimes you chose the default tag just to finish faster. This behavior caused inconsistent data and weakened governance. As your data grew, dropdowns failed to keep up. You faced more errors and lost control over your metadata. The death of the dropdown signals that you need a new way to manage data.

The Shift Away from Manual Tagging

You now see a shift in data governance. Automation and AI-driven solutions are changing how you tag content. You no longer depend on manual tagging for metadata governance. Automation reduces human errors and improves data integrity. AI-driven classification identifies and tags sensitive data automatically. You gain real-time updates across all governed content without manual reclassification.

Automation and AI can reduce human effort in data classification by up to 80%.
Accuracy improves by 25-40% compared to manual tagging.
The combination of automation and AI moves governance from a reactive process to continuous assurance.

You benefit from this shift. Your data gets tagged correctly and quickly. You do not need to remember to tag every file. AI tools help you manage metadata governance and keep your data ready for compliance. The death of the dropdown means you can focus on your work while governance happens in the background. You unlock the power of your data and prepare for future AI advancements.

Tag Governance Challenges

Human Error and Inconsistency

Manual tag selection creates many problems for your data governance initiatives. You often face tedious and error-prone tasks when you tag files by hand. Studies show that manual metadata management leads to frequent mistakes. You may forget to tag a file, or you may choose the wrong tag. These errors lower data accuracy and make it hard for your organization to trust its data.

Skipped Fields

You sometimes skip fields when you tag content. This happens because you want to finish your work quickly. You may not see the value in every tag, or you may not know which tag fits best. Skipped fields leave gaps in your metadata. These gaps hurt data accuracy and weaken your governance. When you skip tags, your data becomes harder to find and use.

Guesswork and Defaults

You may guess which tag to use if you are unsure. You might pick the default tag just to move on. This guesswork lowers data accuracy and creates confusion. When you use the wrong tag, your data ends up in the wrong place. Your AI tools cannot find or use this data correctly. Over time, these mistakes add up and damage your data governance initiatives.

Fragmented Metadata

Fragmented metadata creates silos in your organization. You may use different tags for the same type of data. This leads to inconsistent definitions and low data discoverability. For example:

A fintech company, Naranja X, struggled with fragmented metadata. They used manual processes and Excel sheets to manage tags.
Their data governance initiatives became hard to scale and failed to meet regulations.
Fragmented metadata made it difficult for teams to make decisions and slowed down their AI projects.

When you have silos, your data accuracy drops. Your AI tools cannot connect related data. You lose the full value of your data governance initiatives.

Dark Data Risks

Dark data hides in your systems when you do not tag files correctly. This untagged data creates risks for your organization. You may store data longer than allowed by law, which can lead to fines. Regulators may penalize you for failing to secure or classify sensitive data. During legal cases, dark data can surface and cause unexpected problems.

Dark data often contains sensitive information like personal details, credentials, or financial records. Hackers target this data because it is unprotected and unmonitored.

Cybercriminals can exploit old archives or exposed file shares.
You may not know when someone accesses or steals dark data.
Attacks on dark data can lead to identity theft, data breaches, or reputational damage.
Legacy backups or archived emails may expose customer information, resulting in compliance failures.

You need strong governance and accurate tagging practices to reduce these risks. Proper tagging supports your AI tools and keeps your data safe and compliant.

Data Governance Strategy for Modern Workplaces

Scaling Beyond Manual Processes

You need a data governance strategy that grows with your organization. Manual tagging cannot keep pace with the volume of data in cloud platforms like Microsoft 365. You must scale your governance by using automated tools and clear policies. This approach helps you manage data efficiently and reduces errors in tagging. You can use automation to tag files, emails, and documents as soon as they are created. This keeps your data organized and ready for AI tools.

To build a scalable data governance strategy, you should:

Establish clear data usage policies to define how you use and share data.
Implement retention and deletion policies to manage the lifecycle of your data.
Create security and privacy policies to support compliance.
Set up governance escalation paths for handling exceptions or policy violations.
Standardize workspace creation with naming conventions and templates to ensure accountability.
Establish lifecycle management rules for archiving or deleting inactive workspaces.
Monitor data usage and enforce policies through regular reporting.
Implement scalable access review cycles and automated permissions tracking.

You can use these steps to create a strong data governance strategy that supports AI and compliance. Automation helps you tag data accurately and keeps your governance system running smoothly.
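Two of the steps above, naming conventions and lifecycle rules for inactive workspaces, are easy to express as code. The following is a minimal sketch, not a Microsoft 365 API: the `Workspace` record, the naming pattern, and the 180-day and 365-day thresholds are all hypothetical values chosen for illustration. Your own policy would supply the real ones.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Workspace:
    name: str
    owner: str
    last_activity: date


# Hypothetical convention: PREFIX-Department-Identifier, e.g. "PRJ-Finance-Q1".
NAME_PATTERN = re.compile(r"^[A-Z]{2,5}-[A-Za-z]+-[A-Za-z0-9]+$")

# Hypothetical thresholds; real values come from your retention policy.
ARCHIVE_AFTER = timedelta(days=180)
DELETE_AFTER = timedelta(days=365)


def follows_naming_convention(ws: Workspace) -> bool:
    """Check the workspace name against the assumed naming pattern."""
    return NAME_PATTERN.match(ws.name) is not None


def lifecycle_action(ws: Workspace, today: date) -> str:
    """Return 'keep', 'archive', or 'delete' based on days of inactivity."""
    idle = today - ws.last_activity
    if idle >= DELETE_AFTER:
        return "delete"
    if idle >= ARCHIVE_AFTER:
        return "archive"
    return "keep"
```

Because the rule is a pure function of the workspace and the current date, you can run it on a schedule and report the results before you enforce anything.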

Compliance and Audit Gaps

Manual tagging creates gaps in compliance and audits. You may miss important files or fail to tag sensitive data. This can lead to problems during compliance audits. You need a data governance strategy that closes these gaps and keeps your data safe.

Here is a table showing common compliance and audit gaps caused by manual tagging:

Compliance and Audit Gap	Description
Reliance on static documentation	You may use outdated documentation that does not reflect current data, leading to compliance issues.
No real-time monitoring	Batch monitoring can leave breaches undetected for long periods, increasing vulnerability.
Limited lineage coverage	Incomplete data lineage can hinder audit readiness, especially when data moves across environments.
Manual audit preparation	Extensive manual work during audits creates operational burdens and delays compliance responses.

You must use automation to tag data in real time. This helps you meet compliance requirements and makes audits easier. Your data governance strategy should include tools that monitor data and tag files automatically. This reduces risks and supports AI readiness.

Ownership and Accountability

Clear ownership and accountability improve metadata management in your workplace. You need a data governance strategy that defines who is responsible for tagging and managing data. This helps you avoid confusion and ensures that your data stays organized.

You can improve metadata management by:

Fostering a shared understanding of data across departments. This supports effective communication and decision-making.
Establishing clear standards for metadata creation and classification. This prevents discrepancies and makes metadata easy to find and understand.

You must assign roles for tagging and data governance. This makes your strategy stronger and supports compliance. When you know who owns each tag, you can track changes and fix problems quickly. Your AI tools will work better because your data stays accurate and well-tagged.

Tip: Assign data owners for each workspace or project. This helps you maintain accountability and ensures that your data governance strategy works at every level.

You can build a modern data governance strategy by scaling beyond manual tagging, closing compliance gaps, and defining ownership. This prepares your organization for AI and keeps your data safe and organized.

AI-Driven Data Governance Solutions

Automated Tagging with AI

You face many challenges when you rely on manual tag management. Errors, skipped fields, and inconsistent tag choices make your data unreliable. Automated tagging with AI changes this. You can use AI-driven data governance to tag files, emails, and documents without manual effort. AI scans your content and applies a tag based on context. This improves tag management and reduces mistakes.

SharePoint with AI-powered add-ons can automatically tag documents with metadata, which Copilot can then use to perform more robust queries and assist with organizing content.

We’re using the SharePoint AI capabilities to help with things like automatic processing and auto-tagging. These are mundane tasks that people don’t like to do.

It’s about integrating AI capabilities into daily practices to automate mundane tasks like tagging content, making it more discoverable, and keeping it up to date.

You gain consistency across your tag management systems. AI-driven data governance applies tags based on content and context, so you do not need to guess or select defaults. Your tag management becomes faster and more reliable. Automated data governance also keeps you from falling behind as AI tools arrive. Your data stays organized, and your compliance improves.
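To make the idea of content-based tagging concrete, here is a deliberately simple rule-based sketch. It is a stand-in for illustration only: it is not how Microsoft Purview or any AI classifier works internally, and the label names, patterns, and keywords are assumptions. What it shows is the principle that the system, not the user, reads the content and picks the label.

```python
import re

# Illustrative rules, ordered from most to least sensitive; first match wins.
# Labels and patterns are hypothetical, not taken from any product.
RULES = [
    ("Highly Confidential", re.compile(r"\b(?:\d[ -]?){13,16}\b")),  # card-like digit runs
    ("Confidential", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),   # email addresses
    ("Confidential", re.compile(r"\b(?:salary|passport|invoice)\b", re.IGNORECASE)),
]
DEFAULT_LABEL = "General"


def classify(text: str) -> str:
    """Return the label of the first rule whose pattern appears in the text."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return DEFAULT_LABEL
```

A call such as `classify("Please send the invoice by Friday")` returns `"Confidential"` without anyone opening a dropdown. A production classifier would add trained models, confidence scores, and human review for uncertain cases.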

Here is a table showing the differences between traditional and AI-driven data governance:

Feature	Traditional Data Governance	AI-Driven Data Governance
Focus	Structured data, reporting, compliance	Unstructured & real-time data, model training, explainability
Goal	Accuracy, compliance, data reuse	Model fairness, trust, regulatory readiness
Scope	Data quality, cataloging, access control	Data lineage, annotation standards, AI ethics, risk monitoring
Stakeholders	IT, compliance, data stewards	Data scientists, ML engineers, legal, ethics teams

Microsoft 365 Governance Tools

You need strong tag management systems to support compliance and AI. Microsoft 365 Governance offers tools that automate tag management and classification. Microsoft Purview is one of these tools. It uses AI to tag data, track data lineage, and manage compliance. You can rely on Purview to handle tag management across your organization.

Microsoft Purview has evolved into a platform for data governance, security, and compliance. Its Unified Catalog helps you discover data and automate tag management. It can support your work toward standards like GDPR, CMMC, HIPAA, and SOC 2. Purview uses AI to manage policies and provide lineage insights.

Supported Compliance Standards	Core AI Features
GDPR, CMMC, HIPAA, SOC 2	Auto-tagging, lineage insights, AI-based policy management

You benefit from tag management systems that work across Teams, SharePoint, and OneDrive. AI-driven data governance keeps your data ready for audits and AI projects. Your tag management becomes proactive, not reactive.

Real-Time Metadata Injection

Real-time metadata injection gives you a new way to handle tag management. You do not wait for manual reviews or batch updates. AI injects metadata as soon as you create or modify data. This supports real-time monitoring and keeps your tag management systems up to date.

Active metadata management enables real-time governance by dynamically detecting data or model drift, reducing the need for manual reviews.
Governance tools provide transparency by documenting data sources, transformations, and downstream usage, which is essential for auditing and validating AI outputs.
These capabilities help organizations build an AI-ready data architecture, ensuring compliance, risk management, and alignment with regulatory requirements.

You gain transparency and control over your tag management. Real-time metadata injection helps you track data lineage and meet compliance standards. Your tag management systems become smarter and more responsive. You build a foundation for AI and future analytics.
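The pattern behind real-time injection is event-driven: a create or modify event triggers metadata derivation immediately, instead of waiting for a batch job or a user. The sketch below assumes a hypothetical `FileEvent` shape and an in-memory `MetadataStore`. Neither is a real Microsoft 365 type, and the derivation rules (department from the first path segment, a keyword check for contract terms) are placeholders for whatever classifier you actually use.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FileEvent:
    path: str
    author: str
    kind: str  # "created" or "modified"
    content: str


@dataclass
class MetadataStore:
    """Hypothetical in-memory store standing in for a real metadata catalog."""
    records: dict = field(default_factory=dict)

    def upsert(self, path: str, metadata: dict) -> None:
        self.records.setdefault(path, {}).update(metadata)


def derive_metadata(event: FileEvent, now: datetime) -> dict:
    """Derive metadata from the event itself; no user input required."""
    department = event.path.strip("/").split("/")[0]
    return {
        "owner": event.author,
        "department": department,
        "contains_contract_terms": "agreement" in event.content.lower(),
        "last_classified": now.isoformat(),
    }


def on_file_event(event: FileEvent, store: MetadataStore,
                  now: Optional[datetime] = None) -> None:
    """Handler to call for every create or modify event."""
    now = now or datetime.now(timezone.utc)
    store.upsert(event.path, derive_metadata(event, now))
```

Because the handler runs on every modification, `last_classified` also gives you a simple freshness signal for audits.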

Tip: Use real-time monitoring to catch errors and update tags instantly. This keeps your data accurate and supports compliance.

You can transform your tag management with AI-driven data governance. Automated tagging, Microsoft 365 Governance tools, and real-time metadata injection help you manage data, improve compliance, and prepare for AI. Your tag management systems become a living layer that adapts to your needs.

Action Steps for Better Governance

Adopt Automated Tools

You need to move beyond manual tag processes to achieve real governance. Automated tools help you reach higher accuracy and reduce errors in your tag system. These tools use AI to scan your files and apply tags without human guesswork. You save time and improve compliance because automated systems enforce rules every time you create or update content.

Here is a table that shows the measurable benefits you gain when you use automated tools for governance:

Benefit	Description
Improved Data Quality	Continuous validation and rule enforcement boost accuracy and trust in your tag system.
Faster Compliance Reporting	Automation shortens audit cycles and speeds up compliance processes.
Increased Operational Efficiency	Streamlined audits and fewer manual tasks save time for your team.
Enhanced Trust in Data Systems	Automated governance builds trust, helping you make decisions with confidence.

Automated tag monitoring also helps you spot issues as soon as they happen. You can set up alerts for missing or incorrect tags. This real-time monitoring keeps your tag accuracy high and supports compliance. AI-driven tools make your tag system smarter and more reliable. You can focus on your work while the system handles the details.
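An alert for missing or incorrect tags comes down to a validation pass over item metadata. Here is a minimal sketch. The required fields and the allowed sensitivity values are assumptions for illustration, and the items are plain dictionaries rather than objects from any real API.

```python
# Hypothetical policy: every item needs these fields filled in.
REQUIRED_FIELDS = ("department", "sensitivity", "owner")
# Hypothetical label set; substitute the labels your organization publishes.
ALLOWED_SENSITIVITY = {"Public", "General", "Confidential", "Highly Confidential"}


def find_tag_issues(item: dict) -> list:
    """Return a list of issue codes for one item; empty means the item is clean."""
    issues = []
    for field_name in REQUIRED_FIELDS:
        if not str(item.get(field_name) or "").strip():
            issues.append(f"missing:{field_name}")
    sensitivity = item.get("sensitivity")
    if sensitivity and sensitivity not in ALLOWED_SENSITIVITY:
        issues.append(f"invalid:sensitivity={sensitivity}")
    return issues


def build_alerts(items: list) -> dict:
    """Map each problematic item's path to its issues."""
    alerts = {}
    for item in items:
        issues = find_tag_issues(item)
        if issues:
            alerts[item["path"]] = issues
    return alerts
```

You could feed the result of `build_alerts` into whatever notification channel you already use. The point is that the check runs continuously and does not depend on a user noticing a blank field.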

Tip: Choose automated tools that offer real-time tag monitoring and AI-powered classification. This will help you maintain accuracy and compliance as your data grows.

Decentralize Metadata Ownership

You improve governance when you give ownership of tags to the people who know the data best. Decentralizing metadata ownership means that domain experts manage their own tag systems. These experts understand the context and can ensure the right tag is used every time. This approach increases accuracy and makes your tag system more flexible.

Domain owners take responsibility for tag accuracy and decision-making.
Governance checks built into domain pipelines catch issues early and prevent problems from spreading.
Policies adapt to real usage, so your tag system stays aligned with your operations and compliance needs.
Automated, self-enforcing systems help your governance scale and keep tag accuracy high without extra manual work.

When you decentralize, you also improve compliance. Each team can monitor its own tag system and fix issues quickly. This reduces the risk of errors and keeps your tag system ready for audits. AI tools can support each domain by suggesting the best tag based on content and context. You get better accuracy and faster responses to changes in your data.

Note: Assign clear tag ownership for every workspace or project. This helps you track who is responsible for tag accuracy and compliance.

Track Governance Outcomes

You need to track governance outcomes to see if your tag strategy works. Monitoring your tag system helps you measure accuracy, compliance, and efficiency. You can use metrics to spot gaps and improve your process. AI tools can help you collect and analyze these metrics in real time.

Here is a table with key categories and example metrics you should monitor:

Category	Description	Example Metrics
Data Quality & Trust	Measures accuracy, completeness, and consistency of your tag system.	% of tables with tag descriptions, owners, or lineage. # of tag accuracy incidents per month.
Compliance & Risk	Checks if your tag system meets compliance rules and controls access to sensitive data.	% of sensitive tags with access controls. Audit trail completeness. Time to revoke access.
Data Usage & Adoption	Tracks how often users find and use tagged data.	Active users of the tag catalog. # of certified tags vs. total tags.
Operational Efficiency	Looks at how quickly you resolve tag issues and onboard new users.	Time from tag issue identification to resolution. Time to approve tag access requests.

You should set up regular tag monitoring to keep your tag system healthy. Use dashboards to visualize tag accuracy and compliance trends. AI can help you spot patterns and suggest improvements. When you track governance outcomes, you make sure your tag system supports your business goals and keeps your data ready for AI projects.
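Several of the example metrics in the table are simple coverage percentages, and you can compute them from an export of item metadata. The sketch below assumes each item is a dictionary with hypothetical field names (`owner`, `description`, `sensitivity`, `access_controlled`). Adapt them to whatever your catalog actually exports.

```python
# Hypothetical labels treated as sensitive for the access-control metric.
SENSITIVE_LABELS = {"Confidential", "Highly Confidential"}


def coverage(items: list, field_name: str) -> float:
    """Percentage of items where the given field is present and non-empty."""
    if not items:
        return 0.0
    filled = sum(1 for item in items if item.get(field_name))
    return round(100 * filled / len(items), 1)


def governance_scorecard(items: list) -> dict:
    """Compute a few coverage metrics for a dashboard."""
    sensitive = [i for i in items if i.get("sensitivity") in SENSITIVE_LABELS]
    return {
        "pct_with_owner": coverage(items, "owner"),
        "pct_with_description": coverage(items, "description"),
        "pct_sensitive_with_access_control": coverage(sensitive, "access_controlled"),
    }
```

Tracking these numbers over time matters more than any single reading, because a falling owner-coverage trend is an early sign of governance debt.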

Callout: Regular tag monitoring and tracking governance outcomes help you maintain high accuracy, meet compliance needs, and prepare for future AI advancements.

Educate and Enable Users

You play a key role in successful data governance. Automated tools help you manage data, but you must understand how to use them. Training and enablement programs give you the skills to work with these tools. You need ongoing formal training to keep up with new features and best practices. This training helps you avoid mistakes and builds confidence.

You benefit most from training that matches your role. Data stewards, engineers, and business users each need different skills. Role-specific curricula make learning easier and more relevant. For example, data engineers focus on pipelines and automation. Business users learn how to use data catalogs and search for information. Tailored training keeps you engaged and helps you master your tasks.

You can choose from several training methods. In-person sessions let you ask questions and practice skills. Virtual live sessions connect you with experts and other learners. Self-paced modules give you flexibility to learn when you have time. Each method has strengths. You can combine them to fit your schedule and learning style.

Tip: Track your training progress with a learning management system (LMS). This helps you stay compliant and shows your growth.

You must keep training up to date. Governance tools change often. New policies and features appear. Updated training helps you stay current and avoid confusion. You can use external resources to learn faster and reduce the need for custom programs.

You need to understand governance policies. These cover data access, quality, and compliance rules. You learn about data quality standards, lineage documentation, and trust-building. Security training teaches you about role-based access controls and handling sensitive data. When you know the rules, you protect your organization and support strong governance.

Here is a list of best practices for user enablement:

Adjust training for each role to address specific needs.
Track training centrally for compliance and progress monitoring.
Keep training up to date as tools and procedures evolve.
Leverage external training resources to reduce custom program burden.
Choose intuitive governance tools to simplify training and accelerate adoption.

You help your organization succeed when you learn and use automated governance tools. Training gives you the knowledge to manage data, follow policies, and support compliance. You become a trusted user and help build a strong data governance culture.

Future-Proofing with AI and Automation

Building a Living Metadata Layer

You need a living metadata layer to keep your organization ready for change. This layer adapts as your business grows and as new tools appear. You can avoid data classification issues by following best practices. Start with a clear strategy for what metadata you need and who will maintain it. Use a controlled vocabulary so everyone uses the same terms for each tag. Organize your tags with a taxonomy to make information easy to find and use.

Best Practice	Description
Establish a Clear Metadata Strategy	Define what metadata is needed, which systems will use it, and who is responsible for it.
Create and Maintain a Controlled Vocabulary	Use a standard list of approved terms for each tag and update it often.
Implement a Taxonomy or Ontology	Structure your tags in hierarchies to improve navigation and precision.

A unified metadata layer helps you unify governance across platforms. Automation enriches each tag with context, turning static records into a dynamic pipeline. This approach supports compliance and makes auditing easier. You can enforce policies across all AI workloads and keep your data ready for new challenges.
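A controlled vocabulary and a taxonomy can both be represented as small lookup structures. In this sketch the synonym map and the parent-child hierarchy are invented examples, not terms from any real tenant. The first maps free-text variants onto one approved term, and the second places that term in a hierarchy.

```python
from typing import Optional

# Hypothetical controlled vocabulary: free-text variants -> approved term.
SYNONYMS = {
    "hr": "Human Resources",
    "human resources": "Human Resources",
    "people ops": "Human Resources",
    "fin": "Finance",
    "finance": "Finance",
}

# Hypothetical taxonomy: each term points to its parent; None marks the root.
TAXONOMY_PARENT = {
    "Corporate": None,
    "Human Resources": "Corporate",
    "Finance": "Corporate",
    "Payroll": "Human Resources",
}


def normalize(raw: str) -> Optional[str]:
    """Map a raw tag to its approved term, or None if it is not in the vocabulary."""
    return SYNONYMS.get(raw.strip().lower())


def taxonomy_path(term: str) -> list:
    """Return the path from the root of the taxonomy down to the term."""
    path = []
    current = term
    while current is not None:
        path.append(current)
        current = TAXONOMY_PARENT.get(current)
    return list(reversed(path))
```

Returning `None` for unknown terms is a deliberate choice here: an unrecognized tag is better routed to a steward for review than silently accepted into the vocabulary.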

Preparing for AI Readiness

You must prepare your data for AI. Start by building a strong foundation. Standardize your tags and organize your data to avoid data classification issues. Assign clear ownership for each tag and make sure everyone knows their role. Upgrade your infrastructure to handle the storage and processing needs of AI.

Build a strong data foundation for AI integration.
Establish a scalable governance framework with clear tag ownership.
Standardize and organize data for consistency.
Upgrade infrastructure for AI workloads.
Ensure continuous improvement and compliance with regulations.

You can unify governance across platforms by using these steps. Start with small pilot projects to test AI applications. Invest in data preparation to keep quality high and avoid bias. Engage teams from across your business to align AI with your goals. Protect your data with strong access controls and review your AI models often. This keeps your compliance strong and supports auditing.

Continuous Improvement

Continuous improvement keeps your governance effective as technology changes. Set up a framework that aligns people, processes, and technology. Populate your data catalog with key assets and keep it updated. Empower business stewards to manage tags in their areas. Curate your assets to build trust and knowledge. Apply policies and controls to protect sensitive data and support compliance.

Establish a governance framework with clear responsibilities.
Populate and maintain your data catalog for auditing.
Empower business stewards to drive tag management.
Curate and refine asset attributes for trust.
Apply policies and controls for compliance.
Foster collaboration in your data community.
Monitor and measure curation for ongoing improvement.

You should also focus on ethics and privacy. Create guidelines to reduce bias and discrimination in AI. Control data quality and manage the lifecycle from collection to storage. Protect sensitive data with strong access measures. Define stewardship and ownership for each tag. These steps help you meet compliance needs and make auditing more efficient.

Organizations that use AI and automation see big gains. Automation leads to better standardization, higher compliance, and fewer data-related incidents. You can achieve faster time-to-compliance and reduce risk. Real-time monitoring and predictive governance will shape the future. GenAI will help you classify data, monitor for anomalies, and update tags dynamically. This will make auditing and compliance easier and more reliable.

Tip: Review your governance strategy often. Use AI tools to monitor tag accuracy, access, and compliance. This keeps your data ready for any challenge.

You need to move beyond manual dropdowns to achieve scalable governance. Manual tagging slows you down, introduces errors, and limits your ability to grow. AI-driven automation improves accuracy, speeds up workflows, and reduces compliance risks. Microsoft 365 Governance empowers you with unified controls, automated audits, and intuitive dashboards.

Automation embeds governance into your data workflows, keeping controls effective as your business grows.
AI frees your team from repetitive tasks and enhances search, compliance, and cost efficiency.

Data leaders should prioritize modern governance strategies to future-proof their organizations and unlock the full value of their data.

FAQ

What is manual tagging in data governance?

Manual tagging means you select tags for files or documents yourself. You use dropdown menus or type in values. This process often leads to mistakes and missing information.

Why should you move away from dropdown tagging?

Dropdown tagging slows you down. You may skip fields or choose the wrong tag. Automation helps you tag data faster and more accurately.

How does AI improve metadata tagging?

AI scans your content and assigns the right tags. You do not need to guess or remember rules. This process increases accuracy and saves time.

What are the risks of poor metadata tagging?

Poor tagging hides important files. You may face compliance issues or lose track of sensitive data. Hackers can target untagged files.

How does Microsoft 365 Governance help with compliance?

Microsoft 365 Governance uses AI to tag and organize your data. You meet regulations and prepare for audits. The system updates tags in real time.

Can automation solve data access bottlenecks?

Yes. Automation removes delays caused by manual tagging. You find and use data faster. This helps your team work without waiting for information.

Who should own metadata in your organization?

Domain experts should own metadata. They know the data best. Assigning ownership improves accuracy and keeps your system organized.

How do you keep users engaged with new governance tools?

You should offer training and support. Use simple tools and clear instructions. Track progress and celebrate success to keep users motivated.


1
00:00:00,000 --> 00:00:02,400
Everyone told you metadata governance was about training.

2
00:00:02,400 --> 00:00:05,040
Better taxonomies, clearer dropdowns, more communication.

3
00:00:05,040 --> 00:00:06,960
But what if the real problem isn't the users?

4
00:00:06,960 --> 00:00:09,360
What if the real problem is that you put governance logic

5
00:00:09,360 --> 00:00:11,120
in the wrong layer entirely?

6
00:00:11,120 --> 00:00:12,880
Because the organizations fixing this

7
00:00:12,880 --> 00:00:15,240
aren't sending more emails. They're removing the user

8
00:00:15,240 --> 00:00:16,080
from the loop.

9
00:00:16,080 --> 00:00:18,240
And what they're building instead changes everything.

10
00:00:18,240 --> 00:00:20,560
Your governance strategy depends on a behavior

11
00:00:20,560 --> 00:00:22,600
your users stopped doing years ago.

12
00:00:22,600 --> 00:00:24,840
They don't tag files. They don't fill dropdowns.

13
00:00:24,840 --> 00:00:26,760
They don't classify content on upload.

14
00:00:26,760 --> 00:00:29,200
And yet your entire search, compliance, and AI readiness

15
00:00:29,200 --> 00:00:30,840
strategy assumes they do.

16
00:00:30,840 --> 00:00:32,320
The research on this is clear.

17
00:00:32,320 --> 00:00:34,880
The gap between what your policy asks and what your people

18
00:00:34,880 --> 00:00:37,480
actually do is widening every single day.

19
00:00:37,480 --> 00:00:39,840
And the cost of that gap is bigger than most organizations

20
00:00:39,840 --> 00:00:40,840
realize.

21
00:00:40,840 --> 00:00:43,600
When a file lands in SharePoint or Teams without metadata,

22
00:00:43,600 --> 00:00:44,960
it doesn't just lack labels.

23
00:00:44,960 --> 00:00:46,280
It becomes invisible to search.

24
00:00:46,280 --> 00:00:48,160
It becomes unreachable to compliance tools.

25
00:00:48,160 --> 00:00:50,720
It becomes useless to the AI agents your organization

26
00:00:50,720 --> 00:00:51,680
is about to deploy.

27
00:00:51,680 --> 00:00:52,680
You still pay to store it.

28
00:00:52,680 --> 00:00:53,920
You still pay to secure it.

29
00:00:53,920 --> 00:00:56,600
But you can't find it, govern it, or learn from it.

30
00:00:56,600 --> 00:00:58,280
That is what dark data actually means.

31
00:00:58,280 --> 00:01:01,400
Not deleted, not lost, just present and completely unreachable

32
00:01:01,400 --> 00:01:03,560
because nobody clicked the dropdown.

33
00:01:03,560 --> 00:01:06,200
The manual metadata crisis. Work changed.

34
00:01:06,200 --> 00:01:08,120
The way content gets created changed.

35
00:01:08,120 --> 00:01:11,560
And governance models built for 2015 can't handle 2026.

36
00:01:11,560 --> 00:01:13,480
Hybrid work transformed content creation

37
00:01:13,480 --> 00:01:15,560
into a continuous decentralized stream.

38
00:01:15,560 --> 00:01:18,880
Research from tier one shows that roughly 42% of the workforce

39
00:01:18,880 --> 00:01:22,120
follows a hybrid pattern, 12% is fully remote,

40
00:01:22,120 --> 00:01:24,320
and the rest is back in the office full time.

41
00:01:24,320 --> 00:01:26,520
That distribution, combined with mobile devices

42
00:01:26,520 --> 00:01:29,400
and cloud collaboration, means documents, chats, and files

43
00:01:29,400 --> 00:01:32,120
are produced across many tools and contexts all day long.

44
00:01:32,120 --> 00:01:34,120
Content doesn't arrive in batches anymore.

45
00:01:34,120 --> 00:01:37,080
It arrives in a constant flow from Teams, OneDrive,

46
00:01:37,080 --> 00:01:40,320
email, mobile uploads, and third-party integrations.

47
00:01:40,320 --> 00:01:42,200
In that environment, expecting users

48
00:01:42,200 --> 00:01:45,320
to consciously manage metadata through manual tagging gestures

49
00:01:45,320 --> 00:01:48,280
runs directly against how content is naturally created.

50
00:01:48,280 --> 00:01:49,280
People are in meetings.

51
00:01:49,280 --> 00:01:50,600
They're on mobile devices.

52
00:01:50,600 --> 00:01:52,120
They're switching between tools.

53
00:01:52,120 --> 00:01:54,200
And at the moment of creation, they're not thinking

54
00:01:54,200 --> 00:01:55,320
about taxonomy.

55
00:01:55,320 --> 00:01:56,560
They are thinking about the work.

56
00:01:56,560 --> 00:01:58,920
The content gets saved, the dropdown appears.

57
00:01:58,920 --> 00:02:00,640
And in most cases, it gets skipped.

58
00:02:00,640 --> 00:02:03,680
The 2026 reality is that content volume has outpaced

59
00:02:03,680 --> 00:02:05,560
any human-driven tagging effort.

60
00:02:05,560 --> 00:02:08,480
Organizations with 10,000 users generate millions of files

61
00:02:08,480 --> 00:02:09,120
per year.

62
00:02:09,120 --> 00:02:11,120
Each file in theory needs classification

63
00:02:11,120 --> 00:02:13,760
for department, project, sensitivity, retention,

64
00:02:13,760 --> 00:02:14,920
and regulatory scope.

65
00:02:14,920 --> 00:02:16,720
That isn't a task humans can scale.

66
00:02:16,720 --> 00:02:18,880
It is a task humans were never meant to scale.

67
00:02:18,880 --> 00:02:21,040
And the assumption that they would was always optimistic.

68
00:02:21,040 --> 00:02:22,360
Consider the arithmetic.

69
00:02:22,360 --> 00:02:25,200
A mid-sized enterprise with 5,000 active users

70
00:02:25,200 --> 00:02:27,920
might see each user create or upload five files per day

71
00:02:27,920 --> 00:02:28,600
on average.

72
00:02:28,600 --> 00:02:29,720
Some days are heavier.

73
00:02:29,720 --> 00:02:30,720
Some are lighter.

74
00:02:30,720 --> 00:02:33,160
But over a year, that's more than 6 million files.

75
00:02:33,160 --> 00:02:35,640
If each file requires three metadata selections

76
00:02:35,640 --> 00:02:38,680
from drop-down menus, that's 18 million tagging decisions

77
00:02:38,680 --> 00:02:39,160
per year.

78
00:02:39,160 --> 00:02:41,400
Each decision requires the user to read the field,

79
00:02:41,400 --> 00:02:44,440
recall the correct taxonomy term, navigate the drop-down,

80
00:02:44,440 --> 00:02:46,280
and select the appropriate value.

81
00:02:46,280 --> 00:02:48,880
At 30 seconds per file, that's more than 50,000 hours

82
00:02:48,880 --> 00:02:50,200
of labor annually.
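
A rough sketch of the arithmetic above. The episode gives the user count, files per day, selections per file, and 30 seconds per file. The 250 working days per year is an assumption added here, chosen because it is consistent with the "more than 6 million files" figure.

```python
# Back-of-the-envelope tagging workload using the figures from the episode.
# WORKING_DAYS is an assumption; the episode does not state it.
USERS = 5_000
FILES_PER_USER_PER_DAY = 5
WORKING_DAYS = 250            # assumed
SELECTIONS_PER_FILE = 3
SECONDS_PER_FILE = 30

files_per_year = USERS * FILES_PER_USER_PER_DAY * WORKING_DAYS
decisions_per_year = files_per_year * SELECTIONS_PER_FILE
hours_per_year = files_per_year * SECONDS_PER_FILE / 3600

print(f"{files_per_year:,} files per year")
print(f"{decisions_per_year:,} tagging decisions per year")
print(f"{hours_per_year:,.0f} hours of tagging labor per year")
```

Under these assumptions the totals come out slightly above the rounded numbers quoted in the episode, which is consistent with "more than 6 million" and "more than 50,000 hours".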

83
00:02:50,200 --> 00:02:52,000
And that's just for the files that get tagged.

84
00:02:52,000 --> 00:02:53,400
It doesn't account for the files that

85
00:02:53,400 --> 00:02:56,560
users skip entirely because they're in a hurry, on a mobile device,

86
00:02:56,560 --> 00:02:58,200
or simply don't see the value.

87
00:02:58,200 --> 00:03:01,000
The hidden cost extends beyond the time spent on tagging.

88
00:03:01,000 --> 00:03:02,880
When metadata is missing or incorrect,

89
00:03:02,880 --> 00:03:04,480
downstream processes fail.

90
00:03:04,480 --> 00:03:05,840
Search can't find the content.

91
00:03:05,840 --> 00:03:07,280
Compliance tools can't classify it.

92
00:03:07,280 --> 00:03:09,320
Retention policies can't apply to it.

93
00:03:09,320 --> 00:03:10,880
Analytics can't measure it.

94
00:03:10,880 --> 00:03:12,960
The organization still pays for storage, backup,

95
00:03:12,960 --> 00:03:15,880
security scanning, and e-discovery indexing of these files.

96
00:03:15,880 --> 00:03:17,960
But it receives none of the governance value

97
00:03:17,960 --> 00:03:19,440
that would justify those costs.

98
00:03:19,440 --> 00:03:23,120
Every untagged file is a small leak in the governance budget.

99
00:03:23,120 --> 00:03:25,680
And at enterprise scale, the leak becomes a flood.

100
00:03:25,680 --> 00:03:27,680
The hidden cost of human error in classification

101
00:03:27,680 --> 00:03:31,000
goes far beyond missing tags. When users do engage with dropdowns,

102
00:03:31,000 --> 00:03:32,720
they make inconsistent choices.

103
00:03:32,720 --> 00:03:35,040
Two employees can classify the same document differently

104
00:03:35,040 --> 00:03:37,000
based on their understanding of the taxonomy,

105
00:03:37,000 --> 00:03:39,480
their risk perception, or their immediate incentives.

106
00:03:39,480 --> 00:03:42,440
One person marks a customer contract as general business.

107
00:03:42,440 --> 00:03:45,000
Another marks a similar contract as confidential.

108
00:03:45,000 --> 00:03:47,480
Both are guessing, and both guesses become permanent

109
00:03:47,480 --> 00:03:49,440
because once a file is saved, it's rarely

110
00:03:49,440 --> 00:03:52,040
revisited to update tags as its usage or sensitivity

111
00:03:52,040 --> 00:03:52,920
changes.

112
00:03:52,920 --> 00:03:54,520
This inconsistency leads directly to what

113
00:03:54,520 --> 00:03:56,240
governance teams call dark data:

114
00:03:56,240 --> 00:03:57,720
files that exist but can't be found

115
00:03:57,720 --> 00:03:59,560
because they carry no useful metadata.

116
00:03:59,560 --> 00:04:01,360
Enterprise search initiatives fail

117
00:04:01,360 --> 00:04:03,040
because the majority of indexed items

118
00:04:03,040 --> 00:04:04,960
have little or no manual metadata,

119
00:04:04,960 --> 00:04:07,960
so queries and refiners return incomplete results.

120
00:04:07,960 --> 00:04:10,640
Users learn that the intranet search doesn't find anything,

121
00:04:10,640 --> 00:04:13,640
and they default to email or Teams search instead.

122
00:04:13,640 --> 00:04:15,440
The governance investment becomes invisible

123
00:04:15,440 --> 00:04:17,840
because the underlying data layer is empty.

124
00:04:17,840 --> 00:04:19,200
The enterprise search failure pattern

125
00:04:19,200 --> 00:04:20,720
follows a predictable arc.

126
00:04:20,720 --> 00:04:23,520
An organization invests heavily in a new SharePoint intranet

127
00:04:23,520 --> 00:04:25,120
or document management system.

128
00:04:25,120 --> 00:04:27,680
Information architects define detailed managed metadata

129
00:04:27,680 --> 00:04:29,800
columns for department, project, product,

130
00:04:29,800 --> 00:04:31,840
confidentiality level, and topic.

131
00:04:31,840 --> 00:04:33,280
Search configuration and navigation

132
00:04:33,280 --> 00:04:35,840
are built around these columns with refiners on tags.

133
00:04:35,840 --> 00:04:38,080
The launch is celebrated, training is delivered,

134
00:04:38,080 --> 00:04:39,360
and then reality sets in.

135
00:04:39,360 --> 00:04:41,760
Users store documents in Teams and OneDrive rather

136
00:04:41,760 --> 00:04:43,760
than in the official SharePoint libraries.

137
00:04:43,760 --> 00:04:47,400
Even within SharePoint, users skip or superficially fill out

138
00:04:47,400 --> 00:04:50,320
required metadata because it slows them down.

139
00:04:50,320 --> 00:04:52,840
Many items like chats, emails, meeting recordings,

140
00:04:52,840 --> 00:04:55,240
and Loop components never pass through the

141
00:04:55,240 --> 00:04:56,280
metadata-enforced libraries at all.

142
00:04:56,280 --> 00:04:58,200
What you get is a search experience that looks broken

143
00:04:58,200 --> 00:05:00,040
but is technically functioning correctly.

144
00:05:00,040 --> 00:05:02,160
The search engine is doing exactly what it was configured

145
00:05:02,160 --> 00:05:02,680
to do.

146
00:05:02,680 --> 00:05:05,280
It returns results based on the metadata that exists.

147
00:05:05,280 --> 00:05:08,160
The issue is that the metadata doesn't exist for most items.

148
00:05:08,160 --> 00:05:11,720
Search refiners based on tags show very small or skewed counts.

149
00:05:11,720 --> 00:05:14,120
Important content is invisible to taxonomy-based queries

150
00:05:14,120 --> 00:05:15,560
because it isn't tagged.

151
00:05:15,560 --> 00:05:18,520
Users conclude that search doesn't work and abandon it.

152
00:05:18,520 --> 00:05:21,200
The organization has spent hundreds of thousands of dollars

153
00:05:21,200 --> 00:05:23,640
on a search infrastructure that can't find its own documents

154
00:05:23,640 --> 00:05:26,680
because the governance layer beneath it has nothing in it.

155
00:05:26,680 --> 00:05:29,400
The cost model for manual governance tells the same story.

156
00:05:29,400 --> 00:05:31,560
Manual metadata management is widely described

157
00:05:31,560 --> 00:05:33,880
as a hidden cost center because it depends

158
00:05:33,880 --> 00:05:36,760
on repeated human effort that becomes expensive and error-prone

159
00:05:36,760 --> 00:05:38,360
as data volumes increase.

160
00:05:38,360 --> 00:05:40,440
In the first year, manual governance looks cheaper

161
00:05:40,440 --> 00:05:42,880
because it avoids licensing and integration work.

162
00:05:42,880 --> 00:05:46,280
But its costs accumulate through analyst time, remediation,

163
00:05:46,280 --> 00:05:48,640
exception handling, periodic reviews,

164
00:05:48,640 --> 00:05:51,200
and the inefficiency of keeping metadata current

165
00:05:51,200 --> 00:05:53,080
in a constantly changing environment.

166
00:05:53,080 --> 00:05:55,440
By year three, the total cost of ownership

167
00:05:55,440 --> 00:05:58,280
has usually overtaken any automation alternative.

168
00:05:58,280 --> 00:06:01,320
Remediation is the cost that manual governance teams rarely

169
00:06:01,320 --> 00:06:02,800
budget for explicitly.

170
00:06:02,800 --> 00:06:05,640
When an audit reveals that sensitive files were mislabeled,

171
00:06:05,640 --> 00:06:08,680
someone must find them, review them, and correct their metadata.

172
00:06:08,680 --> 00:06:10,920
When a search project fails because the indexed items

173
00:06:10,920 --> 00:06:14,120
have no tags, someone must retroactively tag thousands of documents

174
00:06:14,120 --> 00:06:16,840
or rebuild the search index with lower expectations.

175
00:06:16,840 --> 00:06:19,960
When a DLP rule misfires because of inconsistent labeling,

176
00:06:19,960 --> 00:06:23,360
someone must investigate the false positives, adjust the rule,

177
00:06:23,360 --> 00:06:25,480
and apologize to blocked users.

178
00:06:25,480 --> 00:06:28,120
None of these tasks appear in the original governance plan.

179
00:06:28,120 --> 00:06:31,040
They emerge as consequences of the metadata gap.

180
00:06:31,040 --> 00:06:33,440
And they consume resources that could have been spent

181
00:06:33,440 --> 00:06:35,000
on proactive improvements.

182
00:06:35,000 --> 00:06:37,400
A 2026 governance tool buyer guide explicitly

183
00:06:37,400 --> 00:06:39,080
warns that enterprise governance platforms

184
00:06:39,080 --> 00:06:41,480
can look affordable until implementation, training,

185
00:06:41,480 --> 00:06:44,400
and ongoing stewardship labor are included.

186
00:06:44,400 --> 00:06:47,960
The rule-of-thumb budget threshold is at least $150,000

187
00:06:47,960 --> 00:06:51,840
over three years for enterprise grade tooling to be realistic.

188
00:06:51,840 --> 00:06:54,760
Below that, the guide suggests a lighter native or open source

189
00:06:54,760 --> 00:06:56,640
approach may be more appropriate.

190
00:06:56,640 --> 00:06:58,640
That number is important because it frames the decision

191
00:06:58,640 --> 00:06:59,120
point.

192
00:06:59,120 --> 00:07:00,840
Manual governance isn't free.

193
00:07:00,840 --> 00:07:03,040
It is just paid in labor instead of licenses.

194
00:07:03,040 --> 00:07:04,720
And at scale, labor is more expensive.

195
00:07:04,720 --> 00:07:06,880
The shift from content management to content governance

196
00:07:06,880 --> 00:07:08,200
makes this even clearer.

197
00:07:08,200 --> 00:07:10,520
Microsoft describes content management at scale

198
00:07:10,520 --> 00:07:13,160
as a process that encompasses strategic planning, tool

199
00:07:13,160 --> 00:07:16,040
selection, governance and retention policy design, deployment

200
00:07:16,040 --> 00:07:19,280
and integration, and continuous training and optimization.

201
00:07:19,280 --> 00:07:21,080
Under this view, content governance

202
00:07:21,080 --> 00:07:23,080
isn't just about where files live.

203
00:07:23,080 --> 00:07:26,000
It is about how they're described, secured, retained,

204
00:07:26,000 --> 00:07:28,440
and surfaced to the right audiences over time in line

205
00:07:28,440 --> 00:07:30,880
with business, legal, and regulatory requirements.

206
00:07:30,880 --> 00:07:33,120
Without robust metadata, governance policies

207
00:07:33,120 --> 00:07:35,680
are hard to target, discovery is inefficient,

208
00:07:35,680 --> 00:07:38,440
and analytics or AI capabilities become unreliable.

209
00:07:38,440 --> 00:07:41,360
Manual tagging underdelivers on metadata quality,

210
00:07:41,360 --> 00:07:43,880
precisely when governance is becoming more data-driven

211
00:07:43,880 --> 00:07:45,200
and fine-grained.

212
00:07:45,200 --> 00:07:47,120
Why dropdowns are a design failure.

213
00:07:47,120 --> 00:07:48,640
It is tempting to blame the users.

214
00:07:48,640 --> 00:07:49,800
They didn't fill out the form.

215
00:07:49,800 --> 00:07:51,360
They skipped the required field.

216
00:07:51,360 --> 00:07:52,680
They picked the default option.

217
00:07:52,680 --> 00:07:55,000
But the real problem is the interface itself.

218
00:07:55,000 --> 00:07:57,400
Dropdown selectors have been the default UI control

219
00:07:57,400 --> 00:07:59,800
for metadata fields in SharePoint file upload forms

220
00:07:59,800 --> 00:08:02,000
and line-of-business applications for decades.

221
00:08:02,000 --> 00:08:04,320
They promise consistency by constraining choices

222
00:08:04,320 --> 00:08:05,880
to a predefined taxonomy.

223
00:08:05,880 --> 00:08:07,800
But user experience research has repeatedly

224
00:08:07,800 --> 00:08:09,920
identified serious limitations when dropdowns

225
00:08:09,920 --> 00:08:12,480
are used for complex or frequent decision making.

226
00:08:12,480 --> 00:08:14,000
The Nielsen Norman Group's guidelines

227
00:08:14,000 --> 00:08:16,800
on drop-down design highlight several critical issues.

228
00:08:16,800 --> 00:08:19,600
Long dropdowns that require scrolling make it impossible

229
00:08:19,600 --> 00:08:21,880
for users to see all choices at once.

230
00:08:21,880 --> 00:08:23,920
Interacting menus, where options change based

231
00:08:23,920 --> 00:08:26,520
on another field, confuse users. And hiding

232
00:08:26,520 --> 00:08:28,120
or obscuring labels when menus are open

233
00:08:28,120 --> 00:08:29,840
deprives users of important context

234
00:08:29,840 --> 00:08:31,040
about what they are choosing.

235
00:08:31,040 --> 00:08:33,320
These patterns align closely with common experiences

236
00:08:33,320 --> 00:08:35,800
in metadata-driven forms, where users confront

237
00:08:35,800 --> 00:08:38,360
long lists of content types, departments, regions,

238
00:08:38,360 --> 00:08:40,440
or sensitivity categories, and struggle

239
00:08:40,440 --> 00:08:43,040
to quickly identify the correct choice.

240
00:08:43,040 --> 00:08:45,200
The same guidelines recommend avoiding drop-down boxes

241
00:08:45,200 --> 00:08:47,880
when typing would be faster, such as for familiar data,

242
00:08:47,880 --> 00:08:50,360
like states, dates, or other well-known options.

243
00:08:50,360 --> 00:08:52,160
They also caution against using dropdowns

244
00:08:52,160 --> 00:08:54,120
for data that's highly familiar to users,

245
00:08:54,120 --> 00:08:56,920
because the motor memory they have for such information

246
00:08:56,920 --> 00:09:00,120
is disrupted by the need to hunt through a list.

247
00:09:00,120 --> 00:09:02,440
In the context of content tagging, these findings

248
00:09:02,440 --> 00:09:04,920
suggest that presenting users with multiple dropdowns

249
00:09:04,920 --> 00:09:08,720
for categories, tags, business units, and confidentiality levels

250
00:09:08,720 --> 00:09:10,960
slows them down and disrupts their flow,

251
00:09:10,960 --> 00:09:14,640
even when the underlying taxonomy is logically designed.

252
00:09:14,640 --> 00:09:16,160
Instead of being an unobtrusive way

253
00:09:16,160 --> 00:09:18,760
to capture useful metadata, drop-down heavy forms

254
00:09:18,760 --> 00:09:21,160
become obstacles that users learn to circumvent

255
00:09:21,160 --> 00:09:22,920
or complete perfunctorily.

256
00:09:22,920 --> 00:09:25,120
The NN/g research emphasizes that dropdowns

257
00:09:25,120 --> 00:09:27,800
are particularly ill-suited when there are many items,

258
00:09:27,800 --> 00:09:29,680
or when users must repeatedly interact

259
00:09:29,680 --> 00:09:31,800
with them during routine workflows.

260
00:09:31,800 --> 00:09:33,720
Content authors in Microsoft 365 often

261
00:09:33,720 --> 00:09:35,560
belong to this high-frequency category.

262
00:09:35,560 --> 00:09:37,880
They create, share, and update documents every day,

263
00:09:37,880 --> 00:09:39,840
often under time pressure. Requiring them

264
00:09:39,840 --> 00:09:42,840
to scroll through multiple long drop-down lists for each file

265
00:09:42,840 --> 00:09:45,440
is a fundamentally poor fit for their work patterns.

266
00:09:45,440 --> 00:09:47,000
Over time, this misfit manifests

267
00:09:47,000 --> 00:09:50,040
as partial completion of metadata, inaccurate guesses,

268
00:09:50,040 --> 00:09:52,320
or outright resistance to governance initiatives

269
00:09:52,320 --> 00:09:54,600
that are perceived as bureaucratic and disconnected

270
00:09:54,600 --> 00:09:56,080
from actual tasks.

271
00:09:56,080 --> 00:09:58,040
UX studies on enterprise software adoption

272
00:09:58,040 --> 00:09:59,760
underscore that tools and processes which

273
00:09:59,760 --> 00:10:01,720
add friction without obvious value are quickly

274
00:10:01,720 --> 00:10:04,600
sidelined, especially when alternatives exist.

275
00:10:04,600 --> 00:10:06,320
If tagging a file takes 30 seconds,

276
00:10:06,320 --> 00:10:08,840
and the user doesn't personally benefit from the metadata,

277
00:10:08,840 --> 00:10:09,680
they will skip it.

278
00:10:09,680 --> 00:10:11,320
And if skipping is faster than doing it,

279
00:10:11,320 --> 00:10:13,120
skipping becomes the default behavior.

280
00:10:13,120 --> 00:10:14,800
Human factors play a role too.

281
00:10:14,800 --> 00:10:16,960
People are more likely to tag content accurately

282
00:10:16,960 --> 00:10:18,920
when they perceive a clear personal benefit,

283
00:10:18,920 --> 00:10:21,360
such as improved findability of their own files.

284
00:10:21,360 --> 00:10:23,840
But they're less motivated when benefits are abstract,

285
00:10:23,840 --> 00:10:25,160
like enterprise-wide reporting,

286
00:10:25,160 --> 00:10:27,480
or when they fear that tagging content as sensitive will

287
00:10:27,480 --> 00:10:29,200
expose them to scrutiny.

288
00:10:29,200 --> 00:10:30,800
In high-volume environments, fatigue

289
00:10:30,800 --> 00:10:34,000
leads to minimal tagging or reliance on default values.

290
00:10:34,000 --> 00:10:36,920
The first few files of the day might get careful attention.

291
00:10:36,920 --> 00:10:38,960
By the 20th, the user is clicking anything

292
00:10:38,960 --> 00:10:40,640
that makes the dialog disappear.

293
00:10:40,640 --> 00:10:42,040
This creates a two-tier system

294
00:10:42,040 --> 00:10:44,120
that most organizations never acknowledge.

295
00:10:44,120 --> 00:10:46,360
Diligent teams with strong process discipline

296
00:10:46,360 --> 00:10:48,160
produce well-tagged content.

297
00:10:48,160 --> 00:10:49,920
Everyone else produces chaos.

298
00:10:49,920 --> 00:10:51,400
And because governance policies depend

299
00:10:51,400 --> 00:10:53,680
on accurate metadata to function correctly,

300
00:10:53,680 --> 00:10:56,400
the policies only work on the diligent fraction.

301
00:10:56,400 --> 00:10:58,680
The rest of the estate operates in a metadata vacuum

302
00:10:58,680 --> 00:11:01,080
where search fails, compliance tools misfire,

303
00:11:01,080 --> 00:11:03,480
and DLP rules generate false positives

304
00:11:03,480 --> 00:11:05,320
or miss real risks entirely.

305
00:11:05,320 --> 00:11:06,840
The fatigue cycle is worth examining

306
00:11:06,840 --> 00:11:09,440
because it explains why training has diminishing returns.

307
00:11:09,440 --> 00:11:11,360
On Monday morning, a motivated employee

308
00:11:11,360 --> 00:11:13,920
might carefully tag her first three files of the week.

309
00:11:13,920 --> 00:11:16,120
By Wednesday afternoon, after back-to-back meetings

310
00:11:16,120 --> 00:11:18,720
and urgent requests, she is clicking the first option

311
00:11:18,720 --> 00:11:21,640
in every drop-down just to make the dialog disappear.

312
00:11:21,640 --> 00:11:24,040
By Friday, she has stopped looking at the fields entirely.

313
00:11:24,040 --> 00:11:25,240
This isn't laziness.

314
00:11:25,240 --> 00:11:27,080
It's the predictable result of asking people

315
00:11:27,080 --> 00:11:29,480
to make low-value decisions at high frequency.

316
00:11:29,480 --> 00:11:31,800
No amount of training changes the cognitive cost

317
00:11:31,800 --> 00:11:32,880
of those decisions.

318
00:11:32,880 --> 00:11:34,640
The only thing that changes is how quickly people

319
00:11:34,640 --> 00:11:35,840
abandon the behavior.

320
00:11:35,840 --> 00:11:38,320
The psychological cost is real and measurable.

321
00:11:38,320 --> 00:11:39,920
Decision fatigue research shows

322
00:11:39,920 --> 00:11:42,440
that the quality of human decisions deteriorates

323
00:11:42,440 --> 00:11:45,120
as the number of consecutive decisions increases.

324
00:11:45,120 --> 00:11:47,520
A knowledge worker who begins their day with careful,

325
00:11:47,520 --> 00:11:50,280
deliberate choices will make progressively worse choices

326
00:11:50,280 --> 00:11:51,480
as the day wears on.

327
00:11:51,480 --> 00:11:53,960
By afternoon, the same person who carefully tagged

328
00:11:53,960 --> 00:11:56,200
their morning documents is randomly selecting

329
00:11:56,200 --> 00:11:58,760
drop-down values just to reduce cognitive load.

330
00:11:58,760 --> 00:12:00,720
The metadata quality curve throughout a workday

331
00:12:00,720 --> 00:12:03,680
resembles a downward slope, not a flat line.

332
00:12:03,680 --> 00:12:05,880
And governance systems that depend on consistent human

333
00:12:05,880 --> 00:12:07,800
judgment across eight hours of work

334
00:12:07,800 --> 00:12:09,840
are fighting a battle against human cognition

335
00:12:09,840 --> 00:12:10,800
that they can't win.

336
00:12:10,800 --> 00:12:13,680
Enterprise software adoption research confirms this pattern.

337
00:12:13,680 --> 00:12:16,120
Tools and processes that add friction without obvious value

338
00:12:16,120 --> 00:12:19,080
are quickly sidelined, especially when alternatives exist.

339
00:12:19,080 --> 00:12:21,320
If a user can save a file to their local desktop

340
00:12:21,320 --> 00:12:23,040
without any metadata forms, they will.

341
00:12:23,040 --> 00:12:25,560
If they can share through email instead of uploading

342
00:12:25,560 --> 00:12:27,360
to a governed library, they will.

343
00:12:27,360 --> 00:12:29,960
The governance system is competing with easier alternatives

344
00:12:29,960 --> 00:12:31,000
and it's losing.

345
00:12:31,000 --> 00:12:32,360
Not because the users are wrong,

346
00:12:32,360 --> 00:12:34,480
but because the system was designed for compliance

347
00:12:34,480 --> 00:12:35,800
rather than for workflow.

348
00:12:35,800 --> 00:12:37,960
Governance teams often miss this dynamic

349
00:12:37,960 --> 00:12:40,480
because they measure adoption rather than quality.

350
00:12:40,480 --> 00:12:42,480
They track how many users have been trained,

351
00:12:42,480 --> 00:12:44,960
how many libraries have metadata columns configured,

352
00:12:44,960 --> 00:12:46,760
and how many documents have been tagged.

353
00:12:46,760 --> 00:12:48,760
These metrics look good in status reports,

354
00:12:48,760 --> 00:12:50,680
but they don't measure whether the tags are accurate,

355
00:12:50,680 --> 00:12:52,240
whether the coverage is complete,

356
00:12:52,240 --> 00:12:53,600
or whether the metadata is actually

357
00:12:53,600 --> 00:12:55,720
being used by search and compliance tools.

358
00:12:55,720 --> 00:12:58,720
A library with 1,000 documents where 800 have default

359
00:12:58,720 --> 00:13:02,120
or incorrect tags looks like 80% adoption on paper.

360
00:13:02,120 --> 00:13:04,800
In reality, it's 80% noise.
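
A minimal sketch of the difference between measuring adoption and measuring quality. It assumes one reading of the example above: 800 of the 1,000 documents carry a tag, and every one of those tags is the library default. The function name, the "General" default value, and the sample data are all hypothetical, and the default-value share is only a rough proxy for inaccurate tagging.

```python
def tagging_metrics(tags, default_value):
    """Return (coverage, default_share) for a list of tag values.

    coverage      - share of documents that carry any tag at all
    default_share - share of tagged documents whose tag is the default
    """
    tagged = [t for t in tags if t]
    coverage = len(tagged) / len(tags) if tags else 0.0
    defaults = sum(1 for t in tagged if t == default_value)
    default_share = defaults / len(tagged) if tagged else 0.0
    return coverage, default_share


# Hypothetical library: 800 documents left on the default tag, 200 untagged.
tags = ["General"] * 800 + [None] * 200
coverage, default_share = tagging_metrics(tags, default_value="General")

print(f"coverage: {coverage:.0%}")            # what the status report shows
print(f"default share: {default_share:.0%}")  # what the report does not show
```

A status report built only on coverage would show this library as 80% tagged, while the second number shows that none of those tags reflects a deliberate choice.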

361
00:13:04,800 --> 00:13:07,920
The frustration this creates is palpable on both sides.

362
00:13:07,920 --> 00:13:10,200
Users feel burdened by forms that slow them down

363
00:13:10,200 --> 00:13:12,000
and deliver no personal benefit.

364
00:13:12,000 --> 00:13:14,720
Governance teams feel ignored by users

365
00:13:14,720 --> 00:13:17,240
who don't appreciate the importance of metadata.

366
00:13:17,240 --> 00:13:20,200
Executives see governance dashboards that show green status

367
00:13:20,200 --> 00:13:23,000
while search remains broken and compliance remains exposed.

368
00:13:23,000 --> 00:13:24,400
Everyone is working hard.

369
00:13:24,400 --> 00:13:25,640
Nobody's getting what they need.

370
00:13:25,640 --> 00:13:27,760
And the root cause is a structural mismatch

371
00:13:27,760 --> 00:13:29,600
between how governance was designed

372
00:13:29,600 --> 00:13:31,440
and how work actually happens.

373
00:13:31,440 --> 00:13:34,160
The compliance implications are direct and serious.

374
00:13:34,160 --> 00:13:35,800
Data loss prevention solutions rely

375
00:13:35,800 --> 00:13:37,840
on accurate classification of data

376
00:13:37,840 --> 00:13:40,440
to detect and block unauthorized access, movement,

377
00:13:40,440 --> 00:13:42,480
or sharing of sensitive information.

378
00:13:42,480 --> 00:13:44,680
If sensitive documents are mislabeled as public,

379
00:13:44,680 --> 00:13:47,680
DLP rules may not trigger, allowing data to be exfiltrated

380
00:13:47,680 --> 00:13:48,760
or exposed.

381
00:13:48,760 --> 00:13:50,840
Conversely, if many documents are tagged

382
00:13:50,840 --> 00:13:53,120
as highly sensitive without justification,

383
00:13:53,120 --> 00:13:55,240
DLP systems may generate false alerts

384
00:13:55,240 --> 00:13:58,000
that overwhelm security teams and frustrate users

385
00:13:58,000 --> 00:14:00,040
whose work is blocked unnecessarily.

386
00:14:00,040 --> 00:14:02,080
Inconsistent manual tagging directly translates

387
00:14:02,080 --> 00:14:03,920
into inconsistent control application.

388
00:14:03,920 --> 00:14:05,640
Microsoft's own security messaging

389
00:14:05,640 --> 00:14:07,520
underscores this point.

390
00:14:07,520 --> 00:14:09,440
Insightful and intelligent classification

391
00:14:09,440 --> 00:14:12,960
is key to data security, and classification accuracy

392
00:14:12,960 --> 00:14:16,120
directly determines the effectiveness of protection measures.

393
00:14:16,120 --> 00:14:18,320
Sensitivity labels in Microsoft Purview serve

394
00:14:18,320 --> 00:14:21,360
as a foundation for a wide range of protective actions,

395
00:14:21,360 --> 00:14:22,800
from encryption and access controls

396
00:14:22,800 --> 00:14:25,480
to content marking and conditional access integration.

397
00:14:25,480 --> 00:14:27,600
If labels aren't properly applied or maintained,

398
00:14:27,600 --> 00:14:29,800
these controls can't function as intended.

399
00:14:29,800 --> 00:14:31,880
When labels are applied manually by end users,

400
00:14:31,880 --> 00:14:34,000
coverage is typically incomplete and biased

401
00:14:34,000 --> 00:14:35,760
towards certain workloads or teams

402
00:14:35,760 --> 00:14:38,880
that have received more training or are more compliance conscious.

403
00:14:38,880 --> 00:14:40,920
Other parts of the estate are left underprotected.

404
00:14:40,920 --> 00:14:42,320
Regulators and auditors want proof

405
00:14:42,320 --> 00:14:43,920
that sensitive data is identified

406
00:14:43,920 --> 00:14:46,040
and protected through repeatable processes.

407
00:14:46,040 --> 00:14:47,880
Manual tagging makes this proof difficult

408
00:14:47,880 --> 00:14:50,240
because individual decisions are rarely documented

409
00:14:50,240 --> 00:14:53,320
and even harder to justify. Automated classification

410
00:14:53,320 --> 00:14:55,840
backed by centralized tools and published policies

411
00:14:55,840 --> 00:14:57,720
gives you a defensible audit trail.

412
00:14:57,720 --> 00:15:00,480
Retention policies in Microsoft 365

413
00:15:00,480 --> 00:15:02,040
depend on correct metadata scoping

414
00:15:02,040 --> 00:15:03,880
to retain or delete content on schedule.

415
00:15:03,880 --> 00:15:06,040
When manual tagging weakens that chain of control,

416
00:15:06,040 --> 00:15:09,040
both compliance exposure and operational risk grow.

417
00:15:09,040 --> 00:15:10,640
So if the user can't scale tagging

418
00:15:10,640 --> 00:15:12,480
and the interface is structurally flawed,

419
00:15:12,480 --> 00:15:14,440
where does governance logic actually belong?

420
00:15:14,440 --> 00:15:16,840
Graph API as the organizational nervous system.

421
00:15:16,840 --> 00:15:18,600
Microsoft Graph isn't just an API.

422
00:15:18,600 --> 00:15:20,880
It is a live map of your entire organization.

423
00:15:20,880 --> 00:15:24,360
Most people think of Graph as a way to query users, groups and files,

424
00:15:24,360 --> 00:15:26,960
but that description misses what actually makes it powerful.

425
00:15:26,960 --> 00:15:29,960
Graph captures relationships, permissions, behavior patterns

426
00:15:29,960 --> 00:15:32,080
and contextual signals in real time.

427
00:15:32,080 --> 00:15:33,440
It knows who created a file,

428
00:15:33,440 --> 00:15:36,400
what team they belong to, what project that team is assigned to,

429
00:15:36,400 --> 00:15:37,840
who has access to the document

430
00:15:37,840 --> 00:15:39,720
and what other files are related to it.

431
00:15:39,720 --> 00:15:42,880
All of this happens automatically without any user intervention

432
00:15:42,880 --> 00:15:44,960
because Graph is the underlying fabric

433
00:15:44,960 --> 00:15:48,200
that connects every Microsoft 365 service.

434
00:15:48,200 --> 00:15:49,520
This is the critical distinction.

435
00:15:49,520 --> 00:15:52,560
Manual tagging asks the user to declare what a file is.

436
00:15:52,560 --> 00:15:55,480
Graph-based governance observes what the file actually is

437
00:15:55,480 --> 00:15:57,520
in the context of the organization.

438
00:15:57,520 --> 00:16:00,480
One depends on human memory, judgment and willingness.

439
00:16:00,480 --> 00:16:02,880
The other depends on signals that are already being generated

440
00:16:02,880 --> 00:16:04,920
by the system every second of every day.

441
00:16:04,920 --> 00:16:07,080
The modern role of Graph API is best understood

442
00:16:07,080 --> 00:16:10,040
as the central nervous system for organizational intelligence.

443
00:16:10,040 --> 00:16:13,760
It connects SharePoint, OneDrive, Teams, Outlook, Azure AD

444
00:16:13,760 --> 00:16:16,920
and the Power Platform into a single programmable layer.

445
00:16:16,920 --> 00:16:18,840
When a document is created in Teams,

446
00:16:18,840 --> 00:16:20,640
Graph knows the channel, the team,

447
00:16:20,640 --> 00:16:22,760
the members, the associated SharePoint site

448
00:16:22,760 --> 00:16:24,200
and the broader project context.

449
00:16:24,200 --> 00:16:26,360
When an email is sent, Graph knows the sender,

450
00:16:26,360 --> 00:16:28,960
the recipients, their departments and their access patterns.

451
00:16:28,960 --> 00:16:30,560
These aren't abstract connections.

452
00:16:30,560 --> 00:16:31,520
They are concrete,

453
00:16:31,520 --> 00:16:34,600
queryable relationships that can drive automated decisions.

454
00:16:34,600 --> 00:16:37,360
The shift from static storage to dynamic intelligence

455
00:16:37,360 --> 00:16:39,720
is what makes automated governance possible.

456
00:16:39,720 --> 00:16:41,720
In the old model, a file was a blob in a library

457
00:16:41,720 --> 00:16:43,800
with whatever metadata a user attached.

458
00:16:43,800 --> 00:16:46,440
In the new model, a file is an entity in a living graph

459
00:16:46,440 --> 00:16:48,600
surrounded by signals that describe its purpose,

460
00:16:48,600 --> 00:16:50,800
audience, sensitivity and life cycle.

461
00:16:50,800 --> 00:16:52,760
The metadata isn't attached by a person,

462
00:16:52,760 --> 00:16:54,160
it is inferred from the graph.

463
00:16:54,160 --> 00:16:56,400
Context-aware metadata is fundamentally different

464
00:16:56,400 --> 00:16:58,240
from user-declared metadata.

465
00:16:58,240 --> 00:17:00,440
User-declared metadata is a snapshot

466
00:17:00,440 --> 00:17:02,720
of what someone thought at the moment of upload.

467
00:17:02,720 --> 00:17:05,160
Context-aware metadata is a continuous reflection

468
00:17:05,160 --> 00:17:08,080
of how the file exists within the organization's structure.

469
00:17:08,080 --> 00:17:10,520
A file might start as a draft in a project channel,

470
00:17:10,520 --> 00:17:12,280
move to a formal review library

471
00:17:12,280 --> 00:17:14,840
and eventually become a record in a compliance archive.

472
00:17:14,840 --> 00:17:16,960
At each stage, its context changes.

473
00:17:16,960 --> 00:17:20,040
Its team ownership might shift, its sensitivity might increase,

474
00:17:20,040 --> 00:17:21,880
its retention requirements might extend.

475
00:17:21,880 --> 00:17:24,840
User-declared tags would remain frozen at the initial upload.

476
00:17:24,840 --> 00:17:27,040
Graph-derived context would update automatically

477
00:17:27,040 --> 00:17:29,440
because the relationships around the file have changed.

478
00:17:29,440 --> 00:17:31,520
This matters enormously for the next generation

479
00:17:31,520 --> 00:17:33,720
of enterprise search and AI agents.

480
00:17:33,720 --> 00:17:35,720
Traditional search depends on keyword matching

481
00:17:35,720 --> 00:17:37,040
and static metadata.

482
00:17:37,040 --> 00:17:38,160
If a file isn't tagged,

483
00:17:38,160 --> 00:17:40,280
it might as well not exist for most queries.

484
00:17:40,280 --> 00:17:43,080
Graph-powered search, by contrast, can traverse relationships.

485
00:17:43,080 --> 00:17:45,000
It can find every document related to a project

486
00:17:45,000 --> 00:17:46,440
regardless of where it's stored

487
00:17:46,440 --> 00:17:49,440
because the project relationship is maintained in the graph.

488
00:17:49,440 --> 00:17:51,720
It can surface content based on who you work with,

489
00:17:51,720 --> 00:17:53,080
what teams you belong to,

490
00:17:53,080 --> 00:17:54,760
and what you're currently focused on.

491
00:17:54,760 --> 00:17:57,280
The search doesn't ask you to remember the right keyword.

492
00:17:57,280 --> 00:18:00,440
It asks the graph, "What is relevant to your context?"

493
00:18:00,440 --> 00:18:03,040
The relationship model in Graph is what makes this possible.

494
00:18:03,040 --> 00:18:06,800
Every user, group, team, channel, site, file and event is a node.

495
00:18:06,800 --> 00:18:08,720
The connections between them are edges.

496
00:18:08,720 --> 00:18:10,880
When a file is created in a Teams channel,

497
00:18:10,880 --> 00:18:12,840
Graph immediately knows the channel,

498
00:18:12,840 --> 00:18:15,480
the team members, their departments, their managers

499
00:18:15,480 --> 00:18:17,200
and the projects they're assigned to.

500
00:18:17,200 --> 00:18:18,720
These relationships aren't static.

501
00:18:18,720 --> 00:18:21,720
They update continuously as people join and leave teams

502
00:18:21,720 --> 00:18:25,000
as projects change status and as access permissions shift.

503
00:18:25,000 --> 00:18:26,440
The graph is a living representation

504
00:18:26,440 --> 00:18:28,120
of how work actually happens,

505
00:18:28,120 --> 00:18:30,200
not a snapshot of how it was structured

506
00:18:30,200 --> 00:18:31,800
at the last reorganization.

507
00:18:31,800 --> 00:18:32,840
For governance purposes,

508
00:18:32,840 --> 00:18:34,560
this means that metadata can be derived

509
00:18:34,560 --> 00:18:37,040
from signals rather than declared by users.

510
00:18:37,040 --> 00:18:39,400
A file created by someone in the finance department

511
00:18:39,400 --> 00:18:42,080
in a channel associated with the quarterly close project

512
00:18:42,080 --> 00:18:43,800
during the week before quarter end

513
00:18:43,800 --> 00:18:45,640
carries a wealth of contextual signal

514
00:18:45,640 --> 00:18:47,960
that no user would ever tag manually.

515
00:18:47,960 --> 00:18:50,680
The Graph API makes those signals available programmatically.

516
00:18:50,680 --> 00:18:51,920
The middleware can query them,

517
00:18:51,920 --> 00:18:53,520
map them to governance properties

518
00:18:53,520 --> 00:18:56,120
and write them back to the file as structured metadata.

519
00:18:56,120 --> 00:18:57,160
The user doesn't think.

520
00:18:57,160 --> 00:18:58,520
The file is fully described.
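
A minimal sketch of that signal-to-metadata mapping, using the finance and quarterly close example. The `FileContext` fields, the property names, and the seven-day window are illustrative assumptions, not Graph or SharePoint definitions; in a real pipeline the context values would come from Graph queries.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FileContext:
    # Hypothetical context record; values would be looked up via Graph.
    creator_department: str
    channel_name: str
    created_on: date


def days_to_quarter_end(d: date) -> int:
    """Days remaining until the last day of the calendar quarter containing d."""
    quarter_end_month = ((d.month - 1) // 3 + 1) * 3
    if quarter_end_month == 12:
        next_month_start = date(d.year + 1, 1, 1)
    else:
        next_month_start = date(d.year, quarter_end_month + 1, 1)
    return (next_month_start - d).days - 1


def derive_metadata(ctx: FileContext) -> dict:
    """Map context signals to governance properties (names are assumptions)."""
    props = {"Department": ctx.creator_department}
    in_close_channel = "quarterly close" in ctx.channel_name.lower()
    if ctx.creator_department == "Finance" and in_close_channel:
        props["BusinessProcess"] = "QuarterlyClose"
        # Files created in the final week of the quarter get a stricter label.
        if days_to_quarter_end(ctx.created_on) <= 7:
            props["Sensitivity"] = "Confidential"
    return props
```

The user never sees this function; it runs on signals the platform already holds.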

521
00:18:58,520 --> 00:19:00,160
The depth of Graph's relationship data

522
00:19:00,160 --> 00:19:02,040
is what distinguishes it from simpler APIs

523
00:19:02,040 --> 00:19:03,840
that only expose direct properties.

524
00:19:03,840 --> 00:19:05,440
Graph supports transitive queries

525
00:19:05,440 --> 00:19:07,960
that traverse multiple relationship hops.

526
00:19:07,960 --> 00:19:10,240
A middleware query can start from a file,

527
00:19:10,240 --> 00:19:11,480
move to its creator,

528
00:19:11,480 --> 00:19:12,880
then to the creator's department,

529
00:19:12,880 --> 00:19:15,440
then to the department's assigned sensitivity level

530
00:19:15,440 --> 00:19:17,480
and then to the retention schedule associated

531
00:19:17,480 --> 00:19:19,120
with that sensitivity level.

532
00:19:19,120 --> 00:19:21,840
This four-hop traversal happens in a single API call

533
00:19:21,840 --> 00:19:23,440
because Graph's OData interface

534
00:19:23,440 --> 00:19:25,240
supports $expand and $select parameters

535
00:19:25,240 --> 00:19:27,560
that bring related entities into the response.

536
00:19:27,560 --> 00:19:29,800
The middleware doesn't need to make four separate calls

537
00:19:29,800 --> 00:19:31,440
and join the results manually.

538
00:19:31,440 --> 00:19:32,680
It makes one expressive call

539
00:19:32,680 --> 00:19:34,320
and receives a structured object graph

540
00:19:34,320 --> 00:19:35,840
that contains everything it needs.
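
As a hedged illustration of the `$expand` and `$select` parameters, the sketch below builds a request URL that fetches a SharePoint list item together with selected column values in one call. The helper name and the column names are assumptions. Graph restricts which relationships can be expanded and how deeply, so a longer chain such as creator to department to retention schedule may still need more than one request; check the current Graph documentation for the resource involved.

```python
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def list_item_url(site_id: str, list_id: str, item_id: str, columns: list) -> str:
    """Build a URL for one list item with its fields expanded inline.

    $expand pulls the related `fields` entity into the response;
    the nested $select trims it to the named columns.
    """
    select = ",".join(columns)
    path = (
        f"/sites/{quote(site_id, safe=',')}"
        f"/lists/{quote(list_id)}"
        f"/items/{quote(item_id)}"
    )
    return f"{GRAPH_BASE}{path}?$expand=fields($select={select})"


# Hypothetical identifiers and column names, for illustration only.
url = list_item_url("site1", "list1", "7", ["Department", "Sensitivity"])
print(url)
```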

541
00:19:35,840 --> 00:19:38,880
This expressiveness reduces both latency and complexity.

542
00:19:38,880 --> 00:19:40,640
A middleware implementation that would require

543
00:19:40,640 --> 00:19:43,200
a dozen API calls against a traditional REST interface

544
00:19:43,200 --> 00:19:45,520
can often be accomplished in two or three calls

545
00:19:45,520 --> 00:19:46,480
against Graph.

546
00:19:46,480 --> 00:19:48,040
The difference matters at scale.

547
00:19:48,040 --> 00:19:50,160
A middleware processing 10,000 events per day

548
00:19:50,160 --> 00:19:53,120
makes roughly 900,000 API calls per month

549
00:19:53,120 --> 00:19:55,000
if each event requires three calls.

550
00:19:55,000 --> 00:19:57,040
If each event required 12 calls,

551
00:19:57,040 --> 00:19:59,200
the monthly total would reach 3.6 million.

552
00:19:59,200 --> 00:20:01,720
The throttling limits, latency and cost implications

553
00:20:01,720 --> 00:20:03,520
of that difference are substantial.
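
As a quick check of the call-volume arithmetic, assuming a 30-day month (the function name is illustrative):

```python
def monthly_calls(events_per_day: int, calls_per_event: int, days_per_month: int = 30) -> int:
    """Total API calls per month for a given per-event call count."""
    return events_per_day * calls_per_event * days_per_month


print(monthly_calls(10_000, 3))   # 900000
print(monthly_calls(10_000, 12))  # 3600000
```

Going from three calls per event to twelve quadruples the load that has to fit under throttling limits.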

554
00:20:03,520 --> 00:20:06,560
Graph's relationship-aware design isn't a convenience feature.

555
00:20:06,560 --> 00:20:07,920
It is an architectural decision

556
00:20:07,920 --> 00:20:10,640
that makes enterprise scale governance middleware feasible.

557
00:20:10,640 --> 00:20:12,400
Microsoft describes this capability

558
00:20:12,400 --> 00:20:14,360
in its document processing services,

559
00:20:14,360 --> 00:20:16,280
which use AI-powered classification

560
00:20:16,280 --> 00:20:19,520
to understand document types, extract key data points

561
00:20:19,520 --> 00:20:21,760
and integrate results into workflows.

562
00:20:21,760 --> 00:20:24,160
These services support intelligent document discovery,

563
00:20:24,160 --> 00:20:26,040
classification analysis and processing

564
00:20:26,040 --> 00:20:28,480
using a pay-as-you-go model that lowers barriers

565
00:20:28,480 --> 00:20:29,640
to experimentation.

566
00:20:29,640 --> 00:20:32,280
But the real power comes from combining document processing

567
00:20:32,280 --> 00:20:34,320
with the relationship data in Graph.

568
00:20:34,320 --> 00:20:36,320
A document isn't just a collection of words

569
00:20:36,320 --> 00:20:37,760
that an AI can classify.

570
00:20:37,760 --> 00:20:41,640
It is an entity with an owner, a team, a project, a timeline

571
00:20:41,640 --> 00:20:43,360
and a sensitivity profile.

572
00:20:43,360 --> 00:20:45,800
The AI can make far better classification decisions

573
00:20:45,800 --> 00:20:47,600
when it has access to that context.

574
00:20:47,600 --> 00:20:50,800
Power Platform's AI Builder further democratizes this pattern

575
00:20:50,800 --> 00:20:53,360
by allowing makers to train custom classification models

576
00:20:53,360 --> 00:20:55,640
and use them directly in Power Automate flows.

577
00:20:55,640 --> 00:20:58,520
A flow can take text input, call an AI Builder category

578
00:20:58,520 --> 00:21:01,440
classification model and use the output in subsequent actions

579
00:21:01,440 --> 00:21:03,600
such as routing or updating records.

580
00:21:03,600 --> 00:21:06,320
While the standard examples use manually triggered flows,

581
00:21:06,320 --> 00:21:08,560
the same pattern can be applied to content arriving

582
00:21:08,560 --> 00:21:10,960
from Microsoft 365 connectors.

583
00:21:10,960 --> 00:21:12,360
This enables low-code automation

584
00:21:12,360 --> 00:21:14,320
that enriches content with classification metadata

585
00:21:14,320 --> 00:21:17,240
based on custom models without requiring professional data

586
00:21:17,240 --> 00:21:18,080
scientists.

587
00:21:18,080 --> 00:21:20,280
It is a bridge between governance requirements

588
00:21:20,280 --> 00:21:23,640
and line of business teams who understand domain semantics.

589
00:21:23,640 --> 00:21:26,160
Text classification is one of the foundational tasks

590
00:21:26,160 --> 00:21:27,280
that makes this work.

591
00:21:27,280 --> 00:21:30,120
It involves categorizing texts such as documents, emails,

592
00:21:30,120 --> 00:21:33,400
social media posts or web pages into predefined classes

593
00:21:33,400 --> 00:21:35,520
based on patterns detected in the text.

594
00:21:35,520 --> 00:21:38,200
The typical workflow involves defining the objective,

595
00:21:38,200 --> 00:21:41,000
collecting text, cleaning and normalizing it,

596
00:21:41,000 --> 00:21:43,360
transforming it into numerical features,

597
00:21:43,360 --> 00:21:44,960
training machine learning models

598
00:21:44,960 --> 00:21:48,600
and then using the trained model to classify new text.

599
00:21:48,600 --> 00:21:51,440
In the context of Microsoft 365 content governance,

600
00:21:51,440 --> 00:21:54,120
text classification can infer topics, business processes

601
00:21:54,120 --> 00:21:57,160
or sensitivity levels from document content, email bodies

602
00:21:57,160 --> 00:22:00,840
or chat messages and map them to metadata fields or labels.

603
00:22:00,840 --> 00:22:02,680
Microsoft Syntex provides these capabilities

604
00:22:02,680 --> 00:22:04,560
as a managed service, but the same patterns

605
00:22:04,560 --> 00:22:07,800
can be built using Graph data as additional input features.

606
00:22:07,800 --> 00:22:10,000
What makes Graph powerful here is that it doesn't replace

607
00:22:10,000 --> 00:22:11,520
classification algorithms.

608
00:22:11,520 --> 00:22:12,880
It feeds them better inputs.

609
00:22:12,880 --> 00:22:15,200
A classifier trying to determine whether a document

610
00:22:15,200 --> 00:22:18,160
contains sensitive financial data will perform better

611
00:22:18,160 --> 00:22:19,880
if it also knows the document was created

612
00:22:19,880 --> 00:22:21,680
by someone in the finance department,

613
00:22:21,680 --> 00:22:23,400
stored in a finance team channel

614
00:22:23,400 --> 00:22:25,920
and shared only with people who have finance roles.

615
00:22:25,920 --> 00:22:27,240
Those signals come from Graph.

616
00:22:27,240 --> 00:22:28,560
They cost nothing to generate.

617
00:22:28,560 --> 00:22:29,640
They are already there.

618
00:22:29,640 --> 00:22:31,800
The only question is whether your governance architecture

619
00:22:31,800 --> 00:22:33,760
is designed to use them.

620
00:22:33,760 --> 00:22:36,080
Automated content tagging systems use algorithms,

621
00:22:36,080 --> 00:22:37,960
often based on machine learning and natural language

622
00:22:37,960 --> 00:22:40,520
processing to assign descriptive tags or metadata

623
00:22:40,520 --> 00:22:43,000
to digital content without requiring manual input

624
00:22:43,000 --> 00:22:44,080
for every item.

625
00:22:44,080 --> 00:22:47,160
The advantages include speed, consistency and coverage.

626
00:22:47,160 --> 00:22:49,640
Machines can process large volumes of content,

627
00:22:49,640 --> 00:22:52,920
far faster than humans, apply the same rules consistently

628
00:22:52,920 --> 00:22:54,520
and be tuned to detect patterns

629
00:22:54,520 --> 00:22:57,120
that may not be obvious to casual readers.

630
00:22:57,120 --> 00:22:59,920
Automated tagging also supports dynamic updates.

631
00:22:59,920 --> 00:23:02,640
When classification rules change or new tags are introduced,

632
00:23:02,640 --> 00:23:04,680
the system can reprocess existing content

633
00:23:04,680 --> 00:23:06,840
to update metadata, something that would be

634
00:23:06,840 --> 00:23:08,880
prohibitively expensive manually.

635
00:23:08,880 --> 00:23:10,880
However, automation isn't a magic bullet.

636
00:23:10,880 --> 00:23:12,680
The accuracy of automated tagging depends

637
00:23:12,680 --> 00:23:15,280
on training data, model quality, and the clarity

638
00:23:15,280 --> 00:23:16,440
of tag definitions.

639
00:23:16,440 --> 00:23:19,200
Systems trained on generic data may struggle

640
00:23:19,200 --> 00:23:22,720
with domain-specific terminology or compliance categories.

641
00:23:22,720 --> 00:23:25,080
This is why practitioners emphasize combining automated

642
00:23:25,080 --> 00:23:28,200
tagging with well-defined taxonomies and governance frameworks

643
00:23:28,200 --> 00:23:30,960
rather than treating AI as a substitute for human domain

644
00:23:30,960 --> 00:23:32,120
expertise.

645
00:23:32,120 --> 00:23:34,440
Effective implementations often use human-in-the-loop

646
00:23:34,440 --> 00:23:36,840
approaches, where automation provides suggestions

647
00:23:36,840 --> 00:23:39,040
that are reviewed or refined by experts,

648
00:23:39,040 --> 00:23:41,280
especially for high-risk decisions.
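
One way to sketch the human-in-the-loop idea is a routing rule: apply a suggested label automatically only when the model is confident and the label is not high risk. The threshold, label names, and function names here are assumptions for illustration.

```python
from typing import NamedTuple


class Suggestion(NamedTuple):
    label: str
    confidence: float  # model score between 0.0 and 1.0


def route(s: Suggestion, high_risk_labels: set, auto_threshold: float = 0.9) -> str:
    """Return "review" for risky or uncertain suggestions, else "auto_apply"."""
    if s.label in high_risk_labels or s.confidence < auto_threshold:
        return "review"
    return "auto_apply"


HIGH_RISK = {"Highly Confidential"}
print(route(Suggestion("General", 0.97), HIGH_RISK))
print(route(Suggestion("Highly Confidential", 0.99), HIGH_RISK))
```

High-risk labels go to an expert regardless of score, which keeps the automation advisory where mistakes are most costly.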

649
00:23:41,280 --> 00:23:43,520
Even with these caveats, automated tagging

650
00:23:43,520 --> 00:23:45,040
offers a path out of the bottleneck

651
00:23:45,040 --> 00:23:48,040
created by drop-down-based manual metadata entry.

652
00:23:48,040 --> 00:23:49,480
But knowing the context isn't enough,

653
00:23:49,480 --> 00:23:51,360
you need a layer that acts on it.

654
00:23:51,360 --> 00:23:53,680
Architecting the middleware layer. Think of middleware

655
00:23:53,680 --> 00:23:55,880
as a customs checkpoint for your content.

656
00:23:55,880 --> 00:23:58,520
It is a programmatic layer that sits between ingestion

657
00:23:58,520 --> 00:24:01,080
and storage, intercepting files before they reach

658
00:24:01,080 --> 00:24:03,840
their final destination and injecting governance logic

659
00:24:03,840 --> 00:24:04,840
in real time.

660
00:24:04,840 --> 00:24:06,640
Think of it as a customs checkpoint.

661
00:24:06,640 --> 00:24:09,240
Content arrives from Teams, OneDrive, email,

662
00:24:09,240 --> 00:24:10,920
or third-party integrations.

663
00:24:10,920 --> 00:24:12,840
Before it's stored, the middleware examines it.

664
00:24:12,840 --> 00:24:14,400
It queries Graph for context.

665
00:24:14,400 --> 00:24:15,920
It applies classification rules.

666
00:24:15,920 --> 00:24:17,640
It injects metadata properties.

667
00:24:17,640 --> 00:24:18,960
It assigns labels.

668
00:24:18,960 --> 00:24:21,960
And only then does it release the content to its destination.

669
00:24:21,960 --> 00:24:23,000
The user saved a file.

670
00:24:23,000 --> 00:24:24,440
The system handled governance.

671
00:24:24,440 --> 00:24:26,040
Those two actions are decoupled.

672
00:24:26,040 --> 00:24:28,360
Azure Functions are the natural host for this logic.

673
00:24:28,360 --> 00:24:31,040
They are serverless, event-driven, and integrate directly

674
00:24:31,040 --> 00:24:33,160
with Microsoft Graph through the SDK.

675
00:24:33,160 --> 00:24:34,920
They scale automatically with demand,

676
00:24:34,920 --> 00:24:36,640
which means a spike in content uploads

677
00:24:36,640 --> 00:24:38,720
doesn't overwhelm your governance pipeline.

678
00:24:38,720 --> 00:24:40,520
They also follow a pay-as-you-go model,

679
00:24:40,520 --> 00:24:42,840
so you can experiment with automated governance

680
00:24:42,840 --> 00:24:44,680
without committing to enterprise licensing

681
00:24:44,680 --> 00:24:46,400
before you know the pattern works.

682
00:24:46,400 --> 00:24:48,920
The Microsoft Graph SDK provides a middleware pipeline

683
00:24:48,920 --> 00:24:50,520
that's the key extensibility point

684
00:24:50,520 --> 00:24:53,400
for implementing robust ingestion architectures.

685
00:24:53,400 --> 00:24:55,840
The pipeline wraps HTTP requests and allows middleware

686
00:24:55,840 --> 00:24:58,040
components to be chained to add cross-cutting behavior,

687
00:24:58,040 --> 00:25:01,520
such as authentication, retry, logging, and telemetry.

688
00:25:01,520 --> 00:25:04,360
The pipeline is ordered, meaning each handler wraps the next,

689
00:25:04,360 --> 00:25:07,520
and middleware can inspect or modify requests and responses,

690
00:25:07,520 --> 00:25:10,040
short circuit execution, or handle errors.
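
The chaining idea can be sketched in plain Python. This is a generic illustration of the "each handler wraps the next" pattern, not the Microsoft Graph SDK's actual handler interface; every name below is hypothetical.

```python
from typing import Callable

Handler = Callable[[dict], dict]
Middleware = Callable[[dict, Handler], dict]


def build_pipeline(middlewares: list, terminal: Handler) -> Handler:
    """Wrap the terminal handler so the first middleware runs outermost."""
    handler = terminal
    for mw in reversed(middlewares):
        # Bind mw and the current handler now to avoid late-binding surprises.
        handler = (lambda m, nxt: lambda req: m(req, nxt))(mw, handler)
    return handler


def logging_mw(req: dict, nxt: Handler) -> dict:
    """Record a trace entry before and after the inner handlers run."""
    req.setdefault("trace", []).append("log:before")
    resp = nxt(req)
    resp["trace"] = req["trace"] + ["log:after"]
    return resp


def block_unclassified(req: dict, nxt: Handler) -> dict:
    """Short-circuit: reject requests that carry no classification."""
    if "classification" not in req:
        return {"status": 400, "reason": "missing classification"}
    return nxt(req)


def send(req: dict) -> dict:
    """Stand-in for the final HTTP call."""
    return {"status": 200}


pipeline = build_pipeline([logging_mw, block_unclassified], send)
print(pipeline({"classification": "General"})["status"])
print(pipeline({})["status"])
```

Order matters: logging sits outermost here so that even short-circuited requests leave a trace.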

691
00:25:10,040 --> 00:25:11,480
The built-in middleware components

692
00:25:11,480 --> 00:25:13,760
are essential for reliable content ingestion.

693
00:25:13,760 --> 00:25:16,360
The authentication handler injects access tokens obtained

694
00:25:16,360 --> 00:25:18,200
via Azure AD into requests,

695
00:25:18,200 --> 00:25:21,160
using client credentials flow for background ingestion jobs,

696
00:25:21,160 --> 00:25:23,960
or delegated permissions for user-initiated actions.

697
00:25:23,960 --> 00:25:26,360
The retry handler automatically retries failed calls

698
00:25:26,360 --> 00:25:29,240
based on status codes like 429 and 503,

699
00:25:29,240 --> 00:25:31,080
respecting the Retry-After headers

700
00:25:31,080 --> 00:25:33,400
that Graph returns when throttling occurs.

701
00:25:33,400 --> 00:25:35,000
This is critical for ingestion workloads

702
00:25:35,000 --> 00:25:38,160
that hit throttling, which is almost inevitable at scale.
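
A minimal sketch of that retry behavior, written as a standalone function rather than the SDK's built-in handler. It assumes `Retry-After` is given in seconds (the header can also be an HTTP date, which this sketch does not handle) and takes the `send` and `sleep` callables as parameters so it can be exercised without a network.

```python
import time

RETRYABLE = {429, 503}


def call_with_retry(send, max_retries: int = 3, sleep=time.sleep):
    """Call send() until it succeeds or retries run out.

    send() must return (status_code, headers_dict, body).
    """
    attempt = 0
    while True:
        status, headers, body = send()
        if status not in RETRYABLE or attempt >= max_retries:
            return status, body
        # Prefer the server's Retry-After hint; otherwise back off exponentially.
        retry_after = headers.get("Retry-After")
        delay = float(retry_after) if retry_after is not None else 2.0 ** attempt
        sleep(delay)
        attempt += 1
```

Honoring the server's hint matters because retrying sooner than asked tends to prolong throttling rather than escape it.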

703
00:25:38,160 --> 00:25:41,600
The redirect handler follows HTTP 3xx responses,

704
00:25:41,600 --> 00:25:44,480
where Graph or underlying services redirect requests.

705
00:25:44,480 --> 00:25:46,680
The compression handler adds support for compressed payloads,

706
00:25:46,680 --> 00:25:48,600
which is particularly helpful for bandwidth-sensitive

707
00:25:48,600 --> 00:25:49,720
ingestion scenarios.

708
00:25:49,720 --> 00:25:52,680
Custom middleware is where governance-specific logic lives.

709
00:25:52,680 --> 00:25:55,040
The SDK explicitly supports custom middleware

710
00:25:55,040 --> 00:25:56,680
by implementing a handler interface

711
00:25:56,680 --> 00:25:58,160
and adding it to the pipeline.

712
00:25:58,160 --> 00:26:00,120
This is where you build logging and telemetry

713
00:26:00,120 --> 00:26:02,800
handlers for ingestion traces, circuit-breaker style

714
00:26:02,800 --> 00:26:05,600
handlers to pause ingestion on repeated failures,

715
00:26:05,600 --> 00:26:07,920
policy enforcement handlers for PII filtering

716
00:26:07,920 --> 00:26:10,080
or tenant-specific compliance rules,

717
00:26:10,080 --> 00:26:12,080
and multi-tenant routing for SaaS products

718
00:26:12,080 --> 00:26:14,320
ingesting content from many tenants.

719
00:26:14,320 --> 00:26:15,840
For governance implementations,

720
00:26:15,840 --> 00:26:17,920
the typical middleware stack looks like this.

721
00:26:17,920 --> 00:26:19,480
Authentication ensures the pipeline

722
00:26:19,480 --> 00:26:22,440
has the right permissions to read and write metadata.

723
00:26:22,440 --> 00:26:25,160
Logging captures every decision for audit purposes.

724
00:26:25,160 --> 00:26:27,520
Retry handles throttling and transient failures.

725
00:26:27,520 --> 00:26:30,160
A custom governance handler queries Graph for context,

726
00:26:30,160 --> 00:26:33,040
applies classification rules, and injects property bags.

727
00:26:33,040 --> 00:26:35,000
Compression reduces payload size,

728
00:26:35,000 --> 00:26:36,840
and the request finally reaches Graph

729
00:26:36,840 --> 00:26:39,520
to write the enriched metadata back to the file.

730
00:26:39,520 --> 00:26:40,800
The event-driven pattern is what

731
00:26:40,800 --> 00:26:42,760
makes this architecture practical.

732
00:26:42,760 --> 00:26:44,280
Instead of polling for new content,

733
00:26:44,280 --> 00:26:46,960
which is inefficient and always slightly out of date,

734
00:26:46,960 --> 00:26:49,480
you use webhooks or Microsoft 365 connectors

735
00:26:49,480 --> 00:26:52,320
to trigger the middleware when content is created or modified.

736
00:26:52,320 --> 00:26:54,720
The moment a file is uploaded to a SharePoint library,

737
00:26:54,720 --> 00:26:58,120
a Teams channel, or a OneDrive folder, an event fires.

738
00:26:58,120 --> 00:27:00,440
The Azure Function wakes up, processes the file,

739
00:27:00,440 --> 00:27:02,120
and completes its work in seconds.

740
00:27:02,120 --> 00:27:03,720
The user experiences no delay.

741
00:27:03,720 --> 00:27:05,360
The governance happens invisibly.

742
00:27:05,360 --> 00:27:07,560
Most organizations miss this distinction.

743
00:27:07,560 --> 00:27:09,440
They think automated governance means running

744
00:27:09,440 --> 00:27:12,200
a nightly batch job that scans the entire estate

745
00:27:12,200 --> 00:27:13,880
and fixes whatever it finds.

746
00:27:13,880 --> 00:27:14,960
That isn't governance.

747
00:27:14,960 --> 00:27:16,640
That is cleanup with a schedule.

748
00:27:16,640 --> 00:27:19,360
Real governance happens before the file settles into storage,

749
00:27:19,360 --> 00:27:20,200
not after.

750
00:27:20,200 --> 00:27:21,200
The difference isn't technical.

751
00:27:21,200 --> 00:27:23,840
It is structural, and it determines whether your metadata

752
00:27:23,840 --> 00:27:26,000
is a foundation or a repair job.

753
00:27:26,000 --> 00:27:28,320
Webhook registration requires a subscription endpoint

754
00:27:28,320 --> 00:27:30,240
that Graph can call when changes occur.

755
00:27:30,240 --> 00:27:32,520
You register the webhook by specifying the resource

756
00:27:32,520 --> 00:27:34,320
you want to monitor, such as a SharePoint site

757
00:27:34,320 --> 00:27:36,680
or a Teams channel, and the callback URL,

758
00:27:36,680 --> 00:27:38,600
where graph should send notifications.
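
A hedged sketch of the request body for creating such a subscription, which is sent as a POST to Graph's `/subscriptions` endpoint. The property names follow the Graph subscription resource; the helper name, the example values, and the chosen lifetime are assumptions. Maximum lifetimes and supported change types vary by resource type, so confirm both against the documentation.

```python
from datetime import datetime, timedelta, timezone


def subscription_body(resource: str, notification_url: str, client_state: str,
                      lifetime: timedelta, change_type: str = "updated",
                      now: datetime = None) -> dict:
    """Build the JSON body for POST /subscriptions."""
    now = now or datetime.now(timezone.utc)
    expires = now + lifetime
    return {
        "changeType": change_type,
        "notificationUrl": notification_url,  # must be a public HTTPS endpoint
        "resource": resource,
        "expirationDateTime": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Opaque secret echoed back in notifications so the receiver
        # can discard requests that did not originate from Graph.
        "clientState": client_state,
    }


# Hypothetical drive ID and callback URL.
body = subscription_body(
    "drives/DRIVE_ID/root",
    "https://example.com/api/notify",
    "s3cret",
    timedelta(days=2),
)
print(body["resource"])
```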

759
00:27:38,600 --> 00:27:41,000
The subscription is validated through a handshake process,

760
00:27:41,000 --> 00:27:43,640
where Graph sends a validation token to your endpoint,

761
00:27:43,640 --> 00:27:45,840
and expects it back within a short timeout.
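
The handshake can be sketched as a framework-neutral handler. During validation Graph calls the endpoint with a `validationToken` query parameter and expects a 200 response that echoes the token as plain text within about ten seconds. The function signature, the queue, and the 202 acknowledgement for normal notifications are design assumptions of this sketch.

```python
def handle_notification_request(query: dict, body: dict, expected_state: str, queue: list):
    """Return (status, headers, text) for an incoming webhook call.

    query: parsed (URL-decoded) query-string parameters
    body:  parsed JSON body, or None during validation
    """
    token = query.get("validationToken")
    if token is not None:
        # Validation handshake: echo the token back as plain text.
        return 200, {"Content-Type": "text/plain"}, token

    # Normal notification: acknowledge quickly and defer the real work.
    for note in (body or {}).get("value", []):
        if note.get("clientState") == expected_state:
            queue.append(note.get("resource"))
        # Notifications with a mismatched clientState are silently dropped.
    return 202, {}, ""
```

Keeping the handler this thin is what lets it answer within the timeout; classification and Graph lookups happen later, off the queue.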

762
00:27:45,840 --> 00:27:48,000
Once established, the subscription remains active

763
00:27:48,000 --> 00:27:50,640
for a default period, after which it must be renewed.

764
00:27:50,640 --> 00:27:53,000
Most implementations handle this renewal automatically

765
00:27:53,000 --> 00:27:54,960
as part of the function's startup logic.

766
00:27:54,960 --> 00:27:57,320
The notification payload itself is lightweight.

767
00:27:57,320 --> 00:27:59,440
It doesn't contain the full file content,

768
00:27:59,440 --> 00:28:02,400
which would be impractical for large files and high volumes.

769
00:28:02,400 --> 00:28:04,520
Instead, it contains the file identifier,

770
00:28:04,520 --> 00:28:06,720
the change type, and a change token.

771
00:28:06,720 --> 00:28:08,320
The middleware uses this information

772
00:28:08,320 --> 00:28:10,480
to query Graph for the details it needs.

773
00:28:10,480 --> 00:28:12,360
This separation of concerns is important

774
00:28:12,360 --> 00:28:15,400
because it keeps the webhook pipeline fast and resilient.

775
00:28:15,400 --> 00:28:18,640
If a single notification fails, the subscription doesn't break.

776
00:28:18,640 --> 00:28:21,200
The middleware simply processes the next notification

777
00:28:21,200 --> 00:28:22,640
and catches up on missed changes

778
00:28:22,640 --> 00:28:26,360
through delta queries during its next scheduled synchronization.

779
00:28:26,360 --> 00:28:30,000
Microsoft 365 Connectors provide an alternative trigger mechanism

780
00:28:30,000 --> 00:28:31,480
for Teams and Outlook.

781
00:28:31,480 --> 00:28:35,680
Connectors can post events to a webhook URL when messages are sent,

782
00:28:35,680 --> 00:28:38,080
meetings are scheduled, or files are shared.

783
00:28:38,080 --> 00:28:40,760
While connectors have some overlap with Graph webhooks,

784
00:28:40,760 --> 00:28:43,760
they're often easier to configure for specific Teams channels

785
00:28:43,760 --> 00:28:45,200
and can be set up by team owners

786
00:28:45,200 --> 00:28:47,800
without tenant administrator privileges.

787
00:28:47,800 --> 00:28:49,840
This makes them useful for pilot scenarios

788
00:28:49,840 --> 00:28:52,280
where you want to test the middleware on a single team's content

789
00:28:52,280 --> 00:28:55,320
before rolling out tenant-wide webhook subscriptions.

790
00:28:55,320 --> 00:28:56,880
Delta queries complement this pattern

791
00:28:56,880 --> 00:28:58,720
for keeping metadata fresh over time.

792
00:28:58,720 --> 00:29:00,800
A Delta query tracks changes in a resource

793
00:29:00,800 --> 00:29:03,040
since the last query, allowing the middleware

794
00:29:03,040 --> 00:29:05,720
to detect when a file has been modified, moved,

795
00:29:05,720 --> 00:29:07,360
or shared with new people.

796
00:29:07,360 --> 00:29:09,120
Instead of rescanning the entire estate,

797
00:29:09,120 --> 00:29:11,080
the middleware only processes what changed.

798
00:29:11,080 --> 00:29:14,520
This is essential for maintaining accurate context-aware metadata

799
00:29:14,520 --> 00:29:16,400
because a file's sensitivity and relevance

800
00:29:16,400 --> 00:29:18,200
often shift over its life cycle.

801
00:29:18,200 --> 00:29:20,160
A draft shared within a small team is different

802
00:29:20,160 --> 00:29:22,880
from a final version distributed to external partners.

803
00:29:22,880 --> 00:29:26,080
Delta queries ensure the metadata keeps pace with reality.
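
A sketch of the delta loop, assuming a `fetch_page` callable that performs the authenticated GET and returns parsed JSON. A delta response pages through `@odata.nextLink` and ends with an `@odata.deltaLink`, which is stored and used as the starting URL for the next sync. The helper name is hypothetical.

```python
def collect_changes(fetch_page, start_url: str):
    """Follow nextLink pages until a deltaLink appears.

    Returns (changed_items, delta_link). Persist delta_link and pass it
    as start_url next time so only newer changes are returned.
    """
    changes, url = [], start_url
    while True:
        page = fetch_page(url)
        changes.extend(page.get("value", []))
        if "@odata.nextLink" in page:
            url = page["@odata.nextLink"]
        else:
            return changes, page.get("@odata.deltaLink")


# Canned pages standing in for real Graph responses.
pages = {
    "start": {"value": [{"id": "a"}], "@odata.nextLink": "page2"},
    "page2": {"value": [{"id": "b"}], "@odata.deltaLink": "delta1"},
}
items, delta_link = collect_changes(pages.__getitem__, "start")
print(len(items), delta_link)
```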

804
00:29:26,080 --> 00:29:28,080
The middleware layer also integrates naturally

805
00:29:28,080 --> 00:29:29,600
with Microsoft Purview.

806
00:29:29,600 --> 00:29:31,600
Purview offers built-in pattern detectors,

807
00:29:31,600 --> 00:29:34,000
trainable classifiers, and custom information

808
00:29:34,000 --> 00:29:37,320
types that recognize sensitive data like credit card numbers,

809
00:29:37,320 --> 00:29:39,800
social security numbers, and health records.

810
00:29:39,800 --> 00:29:42,440
The middleware can invoke these detectors during ingestion,

811
00:29:42,440 --> 00:29:44,680
receive classification results in real time,

812
00:29:44,680 --> 00:29:47,680
and apply the corresponding sensitivity labels automatically.

813
00:29:47,680 --> 00:29:50,520
This closes the gap between discovering sensitive content

814
00:29:50,520 --> 00:29:53,360
and enforcing protection rules so labels are applied

815
00:29:53,360 --> 00:29:55,200
at the point of creation rather than

816
00:29:55,200 --> 00:29:57,040
through retrospective cleanup.

817
00:29:57,040 --> 00:29:59,280
Real-time interception at the point of creation

818
00:29:59,280 --> 00:30:02,320
is the architectural principle that makes all of this work.

819
00:30:02,320 --> 00:30:05,920
The old model stores content first and governs it later, if ever.

820
00:30:05,920 --> 00:30:07,880
The new model governs content before it's stored.

821
00:30:07,880 --> 00:30:11,200
This isn't a minor optimization. It is a structural inversion.

822
00:30:11,200 --> 00:30:13,560
Governance moves from a retrospective cleanup task

823
00:30:13,560 --> 00:30:15,280
to a preventive control, and prevention

824
00:30:15,280 --> 00:30:16,960
is always cheaper than remediation.

825
00:30:16,960 --> 00:30:19,360
The Azure Function implementation typically follows

826
00:30:19,360 --> 00:30:22,120
an event-driven architecture that scales automatically

827
00:30:22,120 --> 00:30:23,000
with demand.

828
00:30:23,000 --> 00:30:24,520
When a file is uploaded to SharePoint,

829
00:30:24,520 --> 00:30:26,560
the platform generates a webhook notification

830
00:30:26,560 --> 00:30:27,760
that triggers the function.

831
00:30:27,760 --> 00:30:30,400
The function receives a payload containing the file identifier,

832
00:30:30,400 --> 00:30:33,680
the site identifier, the user identifier, and a change token.

833
00:30:33,680 --> 00:30:36,640
It then authenticates to graph using either managed identity

834
00:30:36,640 --> 00:30:39,200
or a service principal with application permissions.

835
00:30:39,200 --> 00:30:41,520
Managed identity is preferred for production

836
00:30:41,520 --> 00:30:44,960
because it eliminates the need to store and rotate client secrets.

837
00:30:44,960 --> 00:30:46,840
Once authenticated, the function queries

838
00:30:46,840 --> 00:30:48,600
Graph for the file's current properties,

839
00:30:48,600 --> 00:30:50,680
the uploader's profile and group memberships,

840
00:30:50,680 --> 00:30:52,680
and the containing site's metadata.

841
00:30:52,680 --> 00:30:54,680
These queries are batched where possible to minimize

842
00:30:54,680 --> 00:30:57,040
API call overhead. The function then applies

843
00:30:57,040 --> 00:30:59,400
the classification rules, which can be stored

844
00:30:59,400 --> 00:31:01,440
in an external configuration store,

845
00:31:01,440 --> 00:31:05,880
like Azure App Configuration or a simple JSON file in Blob Storage.

846
00:31:05,880 --> 00:31:07,360
This externalization is important

847
00:31:07,360 --> 00:31:09,720
because it allows governance teams to update rules

848
00:31:09,720 --> 00:31:11,560
without redeploying the function code.

849
00:31:11,560 --> 00:31:13,960
A new retention schedule or a new sensitivity mapping

850
00:31:13,960 --> 00:31:15,920
can be applied by updating the configuration,

851
00:31:15,920 --> 00:31:17,800
not by pushing a code release.
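
As a rough sketch of that externalization, the rules could live in a small JSON document that the function parses on each run. The field names and the `load_rules` and `classify` helpers below are hypothetical, not a Microsoft schema; the retention values reuse figures mentioned elsewhere in this episode.

```python
import json

# Hypothetical rule document, as it might be stored in a configuration
# store or a JSON blob. Field names are illustrative only.
RULES_JSON = """
{
  "retention_years": {"vendor contract": 3, "general business document": 7},
  "sensitivity": {"vendor contract": "Confidential"},
  "default_sensitivity": "General"
}
"""


def load_rules(raw: str) -> dict:
    """Parse the externalized rule document."""
    return json.loads(raw)


def classify(content_type: str, rules: dict) -> dict:
    """Map a detected content type to retention and sensitivity values."""
    return {
        "contentType": content_type,
        # None means the governance team has not defined a schedule yet.
        "retentionYears": rules["retention_years"].get(content_type),
        "sensitivity": rules["sensitivity"].get(
            content_type, rules["default_sensitivity"]
        ),
    }


rules = load_rules(RULES_JSON)
result = classify("vendor contract", rules)
```

Changing a retention period then means editing the JSON document, and the function code stays untouched.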

852
00:31:17,800 --> 00:31:20,520
After classification, the function writes the metadata back

853
00:31:20,520 --> 00:31:23,440
to the file using Graph's schema extension endpoints.

854
00:31:23,440 --> 00:31:25,280
It then calls Purview or the Security

855
00:31:25,280 --> 00:31:28,120
and Compliance Center APIs to apply sensitivity labels

856
00:31:28,120 --> 00:31:29,880
if the classification warrants them.

857
00:31:29,880 --> 00:31:32,240
Finally, it writes a log entry to Azure Monitor

858
00:31:32,240 --> 00:31:34,720
or a dedicated logging table for audit purposes.

859
00:31:34,720 --> 00:31:37,080
The entire cold-start execution typically completes

860
00:31:37,080 --> 00:31:39,480
in under 10 seconds, and warm executions

861
00:31:39,480 --> 00:31:41,080
complete in two to four seconds.

862
00:31:41,080 --> 00:31:43,320
For high-volume estates, the function can be configured

863
00:31:43,320 --> 00:31:45,640
with premium plans that keep instances warm

864
00:31:45,640 --> 00:31:47,760
and reduce latency to under one second.

865
00:31:47,760 --> 00:31:49,240
Once the middleware layer exists,

866
00:31:49,240 --> 00:31:50,880
you have to decide what to inject.

867
00:31:50,880 --> 00:31:52,400
Dynamic property injection.

868
00:31:52,400 --> 00:31:54,880
Property bags are one of the most underutilized mechanisms

869
00:31:54,880 --> 00:31:57,680
for lightweight metadata in Microsoft 365.

870
00:31:57,680 --> 00:32:00,160
They are essentially key-value pairs that can be attached

871
00:32:00,160 --> 00:32:02,480
to SharePoint sites, lists, or items,

872
00:32:02,480 --> 00:32:05,360
and they provide a flexible way to store additional context

873
00:32:05,360 --> 00:32:08,400
without modifying the formal content type schema.

874
00:32:08,400 --> 00:32:10,080
In the context of automated governance,

875
00:32:10,080 --> 00:32:13,680
property bags are ideal for storing context-aware metadata

876
00:32:13,680 --> 00:32:16,000
that the middleware derives from Graph signals.

877
00:32:16,000 --> 00:32:17,880
They are lightweight, queryable,

878
00:32:17,880 --> 00:32:19,600
and can be updated dynamically

879
00:32:19,600 --> 00:32:21,720
without requiring a formal schema change.

880
00:32:21,720 --> 00:32:23,400
This makes them perfect for the kind

881
00:32:23,400 --> 00:32:25,400
of rapidly evolving metadata

882
00:32:25,400 --> 00:32:27,880
that automated governance produces.

883
00:32:27,880 --> 00:32:29,680
Open extensions and schema extensions

884
00:32:29,680 --> 00:32:31,960
are not the same thing, and the difference matters

885
00:32:31,960 --> 00:32:33,040
for governance.

886
00:32:33,040 --> 00:32:35,720
Open extensions allow any application to add custom data

887
00:32:35,720 --> 00:32:38,960
to a resource, but they lack strong typing and discoverability.

888
00:32:38,960 --> 00:32:41,440
Schema extensions, by contrast, define a schema

889
00:32:41,440 --> 00:32:43,160
with typed properties that can be registered

890
00:32:43,160 --> 00:32:45,560
in the tenant and discovered by other applications.

891
00:32:45,560 --> 00:32:48,400
For governance scenarios, schema extensions are preferable

892
00:32:48,400 --> 00:32:50,280
because they provide stronger typing,

893
00:32:50,280 --> 00:32:53,600
better discoverability and more reliable querying.

894
00:32:53,600 --> 00:32:55,480
When a middleware layer injects metadata

895
00:32:55,480 --> 00:32:58,560
using schema extensions, other applications and services

896
00:32:58,560 --> 00:33:00,480
can understand and trust that metadata

897
00:33:00,480 --> 00:33:02,160
because it follows a published schema.

898
00:33:02,160 --> 00:33:05,000
The injection process itself is straightforward in concept

899
00:33:05,000 --> 00:33:06,960
but requires careful implementation.

900
00:33:06,960 --> 00:33:08,720
When the middleware intercepts a file,

901
00:33:08,720 --> 00:33:11,360
it queries Graph for the relevant context signals:

902
00:33:11,360 --> 00:33:13,600
who created the file, what team they belong to,

903
00:33:13,600 --> 00:33:15,560
what project is associated with that team,

904
00:33:15,560 --> 00:33:18,120
and what sensitivity level the project requires.

905
00:33:18,120 --> 00:33:20,520
The middleware then maps these signals to property values

906
00:33:20,520 --> 00:33:23,400
and writes them to the file as schema extension properties.

907
00:33:23,400 --> 00:33:26,960
Registering a schema extension is a one-time administrative task

908
00:33:26,960 --> 00:33:29,520
that defines the metadata structure for your tenant.

909
00:33:29,520 --> 00:33:32,080
You create a schema definition that names each property,

910
00:33:32,080 --> 00:33:34,520
specifies its data type and describes its purpose.

911
00:33:34,520 --> 00:33:37,480
For governance purposes, a typical schema might define

912
00:33:37,480 --> 00:33:40,320
project code, department, content type

913
00:33:40,320 --> 00:33:42,400
and sensitivity level as string values,

914
00:33:42,400 --> 00:33:44,680
while retention years is defined as an integer.
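
A minimal sketch of such a definition, shaped like the request body Graph expects when a schema extension is registered (`id`, `description`, `targetTypes`, `properties`). The extension id, the property names, and the `build_schema_extension` helper are hypothetical. The `group` target type is only a placeholder: Graph accepts schema extensions on a limited set of resource types, so the target should be confirmed against the current documentation before relying on it.

```python
import json

# Property types that Graph schema extensions accept.
ALLOWED_TYPES = {"Binary", "Boolean", "DateTime", "Integer", "String"}


def build_schema_extension(extension_id, description, target_types, properties):
    """Build a schema extension definition and validate its property types."""
    for name, type_name in properties.items():
        if type_name not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type {type_name!r} for {name!r}")
    return {
        "id": extension_id,
        "description": description,
        "targetTypes": list(target_types),
        "properties": [
            {"name": name, "type": type_name}
            for name, type_name in properties.items()
        ],
    }


definition = build_schema_extension(
    extension_id="governanceMetadata",   # hypothetical name
    description="Governance metadata written by the middleware",
    target_types=["group"],              # placeholder, verify supported types
    properties={
        "projectCode": "String",
        "department": "String",
        "contentType": "String",
        "sensitivityLevel": "String",
        "retentionYears": "Integer",
    },
)
payload = json.dumps(definition, indent=2)
```

Validating the types locally catches a typo in the definition before the registration request is sent.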

915
00:33:44,680 --> 00:33:46,960
Once registered, this schema becomes available

916
00:33:46,960 --> 00:33:49,800
across the tenant and can be applied to any drive item

917
00:33:49,800 --> 00:33:51,280
or list item resource.

918
00:33:51,280 --> 00:33:54,040
Other applications can discover the schema by querying Graph's

919
00:33:54,040 --> 00:33:56,680
schema extension catalog, which means the metadata

920
00:33:56,680 --> 00:33:57,960
isn't hidden in a custom field

921
00:33:57,960 --> 00:33:59,680
that only your middleware understands.

922
00:33:59,680 --> 00:34:01,960
It is a published, typed, discoverable structure

923
00:34:01,960 --> 00:34:04,520
that any authorized application can read and write.

924
00:34:04,520 --> 00:34:06,920
The registration process requires administrator consent

925
00:34:06,920 --> 00:34:09,760
because schema extensions modify the tenant's data model.

926
00:34:09,760 --> 00:34:11,920
This is a security feature, not an inconvenience.

927
00:34:11,920 --> 00:34:14,800
It ensures that only governed, approved metadata structures

928
00:34:14,800 --> 00:34:16,800
are added to the organization's graph schema.

929
00:34:16,800 --> 00:34:18,640
Once approved, the schema is versioned

930
00:34:18,640 --> 00:34:20,560
and can't be arbitrarily modified

931
00:34:20,560 --> 00:34:22,080
without a new registration.

932
00:34:22,080 --> 00:34:24,640
This stability is essential for downstream applications

933
00:34:24,640 --> 00:34:27,240
that depend on the schema, such as search indexes,

934
00:34:27,240 --> 00:34:30,280
compliance reports and AI training pipelines.

935
00:34:30,280 --> 00:34:32,680
What this means in practice is that your metadata

936
00:34:32,680 --> 00:34:35,560
becomes a first-class citizen of your data architecture.

937
00:34:35,560 --> 00:34:37,040
It isn't a hidden custom field

938
00:34:37,040 --> 00:34:38,920
that only one application understands.

939
00:34:38,920 --> 00:34:41,120
It is a published, typed, discoverable structure

940
00:34:41,120 --> 00:34:43,360
that any authorized service can query, filter,

941
00:34:43,360 --> 00:34:44,200
and reason about.

942
00:34:44,200 --> 00:34:45,960
That is the difference between a governance hack

943
00:34:45,960 --> 00:34:47,120
and a governance platform.

944
00:34:47,120 --> 00:34:50,200
For example, a project document created in a Teams channel

945
00:34:50,200 --> 00:34:51,800
might automatically receive properties

946
00:34:51,800 --> 00:34:54,560
indicating the project code, the department,

947
00:34:54,560 --> 00:34:57,160
the current phase, the assigned sensitivity level

948
00:34:57,160 --> 00:34:59,080
and the retention schedule.

949
00:34:59,080 --> 00:35:01,000
None of these properties were selected by a user

950
00:35:01,000 --> 00:35:01,960
from a drop-down.

951
00:35:01,960 --> 00:35:04,080
They were derived from the organizational context

952
00:35:04,080 --> 00:35:05,800
that already existed in Graph.

953
00:35:05,800 --> 00:35:08,240
The project code came from the team's associated project.

954
00:35:08,240 --> 00:35:10,400
The department came from the team owner's department.

955
00:35:10,400 --> 00:35:13,160
The sensitivity came from the project's classification.

956
00:35:13,160 --> 00:35:15,120
The retention schedule came from a mapping table

957
00:35:15,120 --> 00:35:16,880
that the governance team maintains.

958
00:35:16,880 --> 00:35:19,160
Dynamic property updates are equally important.

959
00:35:19,160 --> 00:35:21,000
A file's context changes over time.

960
00:35:21,000 --> 00:35:23,920
A document might move from draft to review to final.

961
00:35:23,920 --> 00:35:26,160
A project might shift from active to archived.

962
00:35:26,160 --> 00:35:28,320
A security level might escalate after a merger

963
00:35:28,320 --> 00:35:29,600
or regulatory change.

964
00:35:29,600 --> 00:35:31,360
The middleware can detect these changes

965
00:35:31,360 --> 00:35:33,160
through delta queries or webhooks

966
00:35:33,160 --> 00:35:35,320
and update the file's properties accordingly.

967
00:35:35,320 --> 00:35:37,160
This ensures that metadata remains accurate

968
00:35:37,160 --> 00:35:38,360
throughout the content lifecycle,

969
00:35:38,360 --> 00:35:41,120
from the initial upload through every subsequent change.

970
00:35:41,120 --> 00:35:42,600
The delta query and webhook pattern

971
00:35:42,600 --> 00:35:44,520
is what makes continuous freshness possible

972
00:35:44,520 --> 00:35:45,880
without constant polling.

973
00:35:45,880 --> 00:35:48,360
Delta queries track changes in a resource collection,

974
00:35:48,360 --> 00:35:50,040
returning only items that have changed

975
00:35:50,040 --> 00:35:51,680
since the last synchronization.

976
00:35:51,680 --> 00:35:54,040
Webhooks push notifications to the middleware

977
00:35:54,040 --> 00:35:55,800
when specific events occur,

978
00:35:55,800 --> 00:35:57,960
such as a file being modified or shared.

979
00:35:57,960 --> 00:35:59,720
Together, these patterns allow the middleware

980
00:35:59,720 --> 00:36:01,600
to maintain an up-to-date governance layer

981
00:36:01,600 --> 00:36:03,120
without the performance overhead

982
00:36:03,120 --> 00:36:05,920
of repeatedly scanning the entire content estate.
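
The delta half of that pattern can be sketched as a small loop. Graph delta responses carry `@odata.nextLink` while more pages remain and `@odata.deltaLink` once the round is complete; the `sync_changes` helper, the injected `fetch_page` callable, and the fake pages below are assumptions standing in for real HTTP calls.

```python
from typing import Callable, List, Tuple


def sync_changes(
    fetch_page: Callable[[str], dict], start_url: str
) -> Tuple[List[dict], str]:
    """Collect changed items, then return them with the link for the next sync."""
    changed: List[dict] = []
    url = start_url
    while True:
        page = fetch_page(url)
        changed.extend(page.get("value", []))
        if "@odata.nextLink" in page:
            url = page["@odata.nextLink"]          # more pages in this round
        elif "@odata.deltaLink" in page:
            return changed, page["@odata.deltaLink"]  # store for the next run
        else:
            raise ValueError("response has neither nextLink nor deltaLink")


# Fake responses keyed by URL, used here instead of a network call.
FAKE_PAGES = {
    "delta-start": {
        "value": [{"id": "a"}, {"id": "b"}],
        "@odata.nextLink": "delta-page-2",
    },
    "delta-page-2": {"value": [{"id": "c"}], "@odata.deltaLink": "delta-token-1"},
    "delta-token-1": {"value": [], "@odata.deltaLink": "delta-token-2"},
}

first_changes, saved_link = sync_changes(FAKE_PAGES.__getitem__, "delta-start")
second_changes, next_link = sync_changes(FAKE_PAGES.__getitem__, saved_link)
```

The second run returns nothing because nothing changed, which is the point: the work tracks the change rate, not the size of the estate.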

983
00:36:05,920 --> 00:36:08,160
This pattern scales because the work is proportional

984
00:36:08,160 --> 00:36:10,320
to change, not to total volume.

985
00:36:10,320 --> 00:36:12,200
In an organization with 10 million files,

986
00:36:12,200 --> 00:36:14,120
a daily full scan is impractical,

987
00:36:14,120 --> 00:36:16,400
but if only 1,000 files change per day,

988
00:36:16,400 --> 00:36:18,800
delta queries and webhooks ensure the middleware

989
00:36:18,800 --> 00:36:20,880
only processes those 1,000.

990
00:36:20,880 --> 00:36:22,760
The computational cost remains manageable

991
00:36:22,760 --> 00:36:24,320
even as the estate grows

992
00:36:24,320 --> 00:36:25,640
because growth in total storage

993
00:36:25,640 --> 00:36:28,480
doesn't translate to growth in daily change volume.

994
00:36:28,480 --> 00:36:31,240
Accuracy and evaluation are critical considerations.

995
00:36:31,240 --> 00:36:33,080
Automated classification hinges on metrics

996
00:36:33,080 --> 00:36:35,720
such as precision, recall, and F1 score,

997
00:36:35,720 --> 00:36:37,040
and different governance scenarios

998
00:36:37,040 --> 00:36:39,160
may prioritize different aspects of performance.

999
00:36:39,160 --> 00:36:41,160
In a compliance scenario, false negatives,

1000
00:36:41,160 --> 00:36:43,720
which are failures to detect sensitive content,

1001
00:36:43,720 --> 00:36:45,880
may be more costly than false positives.

1002
00:36:45,880 --> 00:36:47,920
Models should be tuned accordingly.
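
To make those metrics concrete, here is a small sketch that computes precision, recall, and an F-beta score from confusion counts. The counts are made-up illustration values, and using beta = 2 to weight recall more heavily is one common way to express that false negatives cost more, not the only one.

```python
def precision_recall_fbeta(tp: int, fp: int, fn: int, beta: float = 1.0):
    """Return (precision, recall, F-beta) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta


# Illustrative counts: 80 sensitive files caught, 20 false alarms, 40 missed.
precision, recall, f1 = precision_recall_fbeta(tp=80, fp=20, fn=40)
_, _, f2 = precision_recall_fbeta(tp=80, fp=20, fn=40, beta=2.0)
```

With these counts, recall is lower than precision, so the recall-weighted F2 comes out below F1, which flags the missed files as the weak spot.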

1003
00:36:47,920 --> 00:36:49,880
Governance teams must establish thresholds

1004
00:36:49,880 --> 00:36:52,560
for acceptable error rates, design review processes

1005
00:36:52,560 --> 00:36:55,840
for borderline cases, and involve subject matter experts

1006
00:36:55,840 --> 00:36:57,640
in validating model outputs.

1007
00:36:57,640 --> 00:36:59,440
Human-in-the-loop patterns remain important

1008
00:36:59,440 --> 00:37:01,760
in high-risk domains, where automation handles

1009
00:37:01,760 --> 00:37:04,000
baseline classification and human reviewers

1010
00:37:04,000 --> 00:37:06,600
focus on exceptions, contentious categories,

1011
00:37:06,600 --> 00:37:08,000
or policy changes.

1012
00:37:08,000 --> 00:37:10,440
The hybrid model, where automation handles discovery,

1013
00:37:10,440 --> 00:37:12,400
classification, and policy triggers

1014
00:37:12,400 --> 00:37:15,160
while humans manage exceptions, taxonomy design,

1015
00:37:15,160 --> 00:37:17,560
and high-risk review, is the most practical approach

1016
00:37:17,560 --> 00:37:19,000
for most organizations.

1017
00:37:19,000 --> 00:37:21,840
It reduces cost without fully removing human judgment

1018
00:37:21,840 --> 00:37:24,440
from governance decisions that require business context.

1019
00:37:24,440 --> 00:37:26,040
This aligns with governance frameworks

1020
00:37:26,040 --> 00:37:28,800
that assign roles for data stewardship and accountability,

1021
00:37:28,800 --> 00:37:31,240
ensuring that metadata and classification systems

1022
00:37:31,240 --> 00:37:34,040
are actively managed rather than set and forgotten.

1023
00:37:34,040 --> 00:37:35,680
But let's step back from the architecture

1024
00:37:35,680 --> 00:37:38,760
and look at what this actually looks like in a real organization.

1025
00:37:38,760 --> 00:37:40,960
What this actually looks like in practice.

1026
00:37:40,960 --> 00:37:42,840
Consider a typical Tuesday morning.

1027
00:37:42,840 --> 00:37:46,040
A marketing manager creates a campaign brief in Microsoft Teams.

1028
00:37:46,040 --> 00:37:47,600
She attaches a budget spreadsheet,

1029
00:37:47,600 --> 00:37:49,960
a creative brief document, and a vendor contract.

1030
00:37:49,960 --> 00:37:51,640
In the old model, she hits save.

1031
00:37:51,640 --> 00:37:54,000
Three files land in the team's SharePoint library.

1032
00:37:54,000 --> 00:37:55,440
A drop-down appears for each file,

1033
00:37:55,440 --> 00:37:58,400
asking for department, project, content type, sensitivity,

1034
00:37:58,400 --> 00:37:59,600
and retention category.

1035
00:37:59,600 --> 00:38:00,960
She has a meeting in two minutes.

1036
00:38:00,960 --> 00:38:02,040
She clicks the defaults.

1037
00:38:02,040 --> 00:38:04,440
The files are now stored with meaningless metadata

1038
00:38:04,440 --> 00:38:06,680
that doesn't reflect what they actually are.

1039
00:38:06,680 --> 00:38:08,320
Six months later, the compliance team

1040
00:38:08,320 --> 00:38:09,800
needs to find all vendor contracts

1041
00:38:09,800 --> 00:38:11,280
created in the last quarter.

1042
00:38:11,280 --> 00:38:13,920
They run a search filtered by content type and date.

1043
00:38:13,920 --> 00:38:15,080
The new contract doesn't appear

1044
00:38:15,080 --> 00:38:17,520
because it was tagged as "general business document."

1045
00:38:17,520 --> 00:38:18,680
It is dark data.

1046
00:38:18,680 --> 00:38:19,520
It exists.

1047
00:38:19,520 --> 00:38:20,360
It is stored.

1048
00:38:20,360 --> 00:38:20,960
It is backed up.

1049
00:38:20,960 --> 00:38:23,120
But it's invisible to the systems that need it.

1050
00:38:23,120 --> 00:38:24,880
In the new model, the same upload triggers

1051
00:38:24,880 --> 00:38:26,520
a completely different sequence.

1052
00:38:26,520 --> 00:38:28,760
The file hits SharePoint, a webhook fires,

1053
00:38:28,760 --> 00:38:31,400
and the Azure Function middleware wakes up and queries Graph.

1054
00:38:31,400 --> 00:38:34,120
It learns that the uploader is in the marketing department.

1055
00:38:34,120 --> 00:38:35,840
It learns that the team is associated

1056
00:38:35,840 --> 00:38:37,920
with the Q3 campaign project.

1057
00:38:37,920 --> 00:38:39,840
It examines the file content and detects

1058
00:38:39,840 --> 00:38:42,600
financial terms, vendor names, and contractual language.

1059
00:38:42,600 --> 00:38:44,200
It checks the project's classification

1060
00:38:44,200 --> 00:38:45,880
in the governance system and finds

1061
00:38:45,880 --> 00:38:48,320
that vendor contracts require a three-year retention

1062
00:38:48,320 --> 00:38:50,080
and a confidentiality label.

1063
00:38:50,080 --> 00:38:52,040
All of this happens in under five seconds.

1064
00:38:52,040 --> 00:38:54,240
The middleware then writes schema extension properties

1065
00:38:54,240 --> 00:38:56,600
to the file: project code Q3 campaign,

1066
00:38:56,600 --> 00:38:58,000
department marketing, content type

1067
00:38:58,000 --> 00:39:00,160
vendor contract, sensitivity confidential,

1068
00:39:00,160 --> 00:39:01,200
retention three years.

1069
00:39:01,200 --> 00:39:03,320
It applies the appropriate Purview sensitivity label

1070
00:39:03,320 --> 00:39:04,320
automatically.

1071
00:39:04,320 --> 00:39:05,920
It sets a DLP policy trigger based

1072
00:39:05,920 --> 00:39:07,560
on the financial data detected,

1073
00:39:07,560 --> 00:39:09,640
and only then does the file settle into storage.

1074
00:39:09,640 --> 00:39:11,280
The user saved three files.

1075
00:39:11,280 --> 00:39:12,760
Nothing else was asked of her.

1076
00:39:12,760 --> 00:39:14,720
The governance happened in the architecture,

1077
00:39:14,720 --> 00:39:16,080
not in the interface.

1078
00:39:16,080 --> 00:39:18,240
Let us look at the technical sequence in more detail

1079
00:39:18,240 --> 00:39:20,280
because this is where the architecture proves itself.

1080
00:39:20,280 --> 00:39:22,800
The file upload triggers a SharePoint webhook

1081
00:39:22,800 --> 00:39:25,240
that posts to the Azure Function endpoint.

1082
00:39:25,240 --> 00:39:27,520
The function authenticates using the Graph SDK's

1083
00:39:27,520 --> 00:39:28,800
authentication handler,

1084
00:39:28,800 --> 00:39:30,640
which manages token refresh and permission

1085
00:39:30,640 --> 00:39:31,920
scoping automatically.

1086
00:39:31,920 --> 00:39:33,360
It then queries the Graph API

1087
00:39:33,360 --> 00:39:35,400
for three categories of signal in parallel:

1088
00:39:35,400 --> 00:39:37,040
the file's direct properties,

1089
00:39:37,040 --> 00:39:39,120
the uploader's organizational context,

1090
00:39:39,120 --> 00:39:41,480
and the team's associated metadata.

1091
00:39:41,480 --> 00:39:43,920
The file properties query returns the file name,

1092
00:39:43,920 --> 00:39:45,720
size, type and a content preview

1093
00:39:45,720 --> 00:39:48,040
that the middleware can scan for pattern matching.

1094
00:39:48,040 --> 00:39:50,480
The uploader context query returns their department,

1095
00:39:50,480 --> 00:39:53,760
role, manager and project assignments from Azure AD.

1096
00:39:53,760 --> 00:39:55,680
The team metadata query returns the team's

1097
00:39:55,680 --> 00:39:58,640
associated project code, sensitivity classification,

1098
00:39:58,640 --> 00:40:00,680
retention schedule and membership list

1099
00:40:00,680 --> 00:40:03,040
from the associated SharePoint site properties.

1100
00:40:03,040 --> 00:40:04,840
All three queries complete in under two seconds

1101
00:40:04,840 --> 00:40:07,160
because they're batched and the Graph API is optimized

1102
00:40:07,160 --> 00:40:09,520
for exactly this kind of relationship traversal.
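
One way to picture the batching is Graph's JSON batching format, where several requests travel in a single POST to the `$batch` endpoint. The sketch below only builds the request body; the `build_batch` helper, the request ids, and the placeholder identifiers in the URLs are assumptions for illustration.

```python
from typing import Dict

# Graph JSON batching accepts a limited number of requests per batch
# (20 at the time of writing).
MAX_BATCH_SIZE = 20


def build_batch(get_urls: Dict[str, str]) -> dict:
    """Build a JSON batch body from {request_id: relative_url} pairs."""
    if len(get_urls) > MAX_BATCH_SIZE:
        raise ValueError(f"a batch holds at most {MAX_BATCH_SIZE} requests")
    return {
        "requests": [
            {"id": request_id, "method": "GET", "url": url}
            for request_id, url in get_urls.items()
        ]
    }


# Placeholder identifiers; a real function would take them from the event.
batch = build_batch(
    {
        "file": "/drives/DRIVE_ID/items/ITEM_ID",
        "uploader": "/users/USER_ID?$select=department,jobTitle",
        "team": "/groups/GROUP_ID",
    }
)
```

Each response in the batch result carries the same `id` as its request, so the middleware can match the three signal categories back up after a single round trip.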

1103
00:40:09,520 --> 00:40:12,000
The middleware then runs a classification pipeline.

1104
00:40:12,000 --> 00:40:13,960
First, it applies deterministic rules.

1105
00:40:13,960 --> 00:40:16,240
Project code Q3 campaign is assigned

1106
00:40:16,240 --> 00:40:18,680
because the team is associated with that project.

1107
00:40:18,680 --> 00:40:20,200
Department marketing is assigned

1108
00:40:20,200 --> 00:40:23,160
because the uploader's Azure AD profile lists marketing

1109
00:40:23,160 --> 00:40:24,280
as their department.

1110
00:40:24,280 --> 00:40:27,280
Second, it runs pattern detection on the file content.

1111
00:40:27,280 --> 00:40:29,120
The vendor contract contains financial terms,

1112
00:40:29,120 --> 00:40:30,720
a vendor name and contractual language

1113
00:40:30,720 --> 00:40:33,600
that matches the trained classifier for vendor agreement.

1114
00:40:33,600 --> 00:40:35,800
The budget spreadsheet contains numerical data

1115
00:40:35,800 --> 00:40:38,960
in columns labeled budget, forecast, and actual

1116
00:40:38,960 --> 00:40:41,840
that match the pattern for financial plan.

1117
00:40:41,840 --> 00:40:44,200
The creative brief contains campaign terminology

1118
00:40:44,200 --> 00:40:46,120
and brand language that matches the pattern

1119
00:40:46,120 --> 00:40:47,680
for marketing creative.

1120
00:40:47,680 --> 00:40:50,080
Third, the middleware resolves any conflicts.

1121
00:40:50,080 --> 00:40:52,360
If the file content suggests financial document

1122
00:40:52,360 --> 00:40:54,640
but the team context suggests marketing project,

1123
00:40:54,640 --> 00:40:56,440
the middleware applies a priority rule

1124
00:40:56,440 --> 00:40:58,800
that marketing project context takes precedence

1125
00:40:58,800 --> 00:41:00,440
for files in this team's channel.

1126
00:41:00,440 --> 00:41:03,040
This is configurable per tenant and reflects the reality

1127
00:41:03,040 --> 00:41:05,440
that organizational context often matters more

1128
00:41:05,440 --> 00:41:08,240
than content analysis for classification accuracy.

1129
00:41:08,240 --> 00:41:10,600
The middleware logs this decision for audit purposes,

1130
00:41:10,600 --> 00:41:12,360
so a compliance officer can later review

1131
00:41:12,360 --> 00:41:14,520
why a financial-looking document was classified

1132
00:41:14,520 --> 00:41:16,520
as marketing rather than finance.
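
A minimal sketch of that priority rule, assuming each signal source proposes one label and the configured order decides the winner. The `Decision` class, the `resolve` function, and the source names are hypothetical; the overridden labels are kept so the audit entry can explain the outcome later.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Decision:
    label: str
    winning_source: str
    overridden: Dict[str, str]


def resolve(candidates: Dict[str, str], priority: Sequence[str]) -> Decision:
    """Pick the label from the highest-priority source that proposed one."""
    for source in priority:
        label = candidates.get(source)
        if label:
            # Record the losing labels so reviewers can see what was set aside.
            overridden = {
                other: other_label
                for other, other_label in candidates.items()
                if other != source and other_label != label
            }
            return Decision(label, source, overridden)
    raise ValueError("no candidate label from any prioritized source")


candidates = {"content": "financial document", "team_context": "marketing project"}
decision = resolve(candidates, priority=("team_context", "content"))
audit_entry = {
    "label": decision.label,
    "source": decision.winning_source,
    "overridden": decision.overridden,
}
```

Because the priority order is an argument rather than hard-coded, a finance team's channel could reverse it without any change to the resolver.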

1133
00:41:16,520 --> 00:41:18,360
Finally, the middleware writes the metadata.

1134
00:41:18,360 --> 00:41:20,680
It creates a schema extension instance on each file

1135
00:41:20,680 --> 00:41:23,640
with properties for project, department, content type,

1136
00:41:23,640 --> 00:41:25,120
sensitivity and retention.

1137
00:41:25,120 --> 00:41:26,960
It calls the Purview API to apply

1138
00:41:26,960 --> 00:41:28,800
the appropriate sensitivity label.

1139
00:41:28,800 --> 00:41:31,040
It sets a DLP trigger on the vendor contract

1140
00:41:31,040 --> 00:41:33,040
because it detected financial data.

1141
00:41:33,040 --> 00:41:35,920
And it records the entire transaction in an audit log

1142
00:41:35,920 --> 00:41:37,240
that includes the source signals,

1143
00:41:37,240 --> 00:41:40,000
the classification rules applied, the confidence scores

1144
00:41:40,000 --> 00:41:41,680
and the final metadata values.

1145
00:41:41,680 --> 00:41:43,280
The whole process takes four seconds.

1146
00:41:43,280 --> 00:41:44,840
The user is already in her next meeting.

1147
00:41:44,840 --> 00:41:47,080
When the compliance team runs their quarterly search,

1148
00:41:47,080 --> 00:41:48,440
the contract appears instantly

1149
00:41:48,440 --> 00:41:50,920
because its metadata is accurate and queryable.

1150
00:41:50,920 --> 00:41:53,560
When DLP scans for sensitive financial data,

1151
00:41:53,560 --> 00:41:55,160
the contract is correctly classified

1152
00:41:55,160 --> 00:41:57,880
and routed through the appropriate approval workflow.

1153
00:41:57,880 --> 00:41:59,520
When Copilot needs to answer a question

1154
00:41:59,520 --> 00:42:00,760
about campaign spending,

1155
00:42:00,760 --> 00:42:02,320
it can find the budget spreadsheet

1156
00:42:02,320 --> 00:42:05,560
because the project metadata links it to the right context.

1157
00:42:05,560 --> 00:42:07,240
When the retention period expires,

1158
00:42:07,240 --> 00:42:09,960
the system knows exactly when to delete or archive the file

1159
00:42:09,960 --> 00:42:12,320
because the metadata was set correctly from day one.

1160
00:42:12,320 --> 00:42:13,920
This isn't a theoretical architecture.

1161
00:42:13,920 --> 00:42:16,520
It is a pattern that organizations are already implementing.

1162
00:42:16,520 --> 00:42:19,240
The pieces are all native to Microsoft 365.

1163
00:42:19,240 --> 00:42:21,440
The Graph API is a first-party service.

1164
00:42:21,440 --> 00:42:23,520
Azure Functions are a first-party platform.

1165
00:42:23,520 --> 00:42:25,560
Purview is a first-party governance tool.

1166
00:42:25,560 --> 00:42:26,880
The middleware is custom code,

1167
00:42:26,880 --> 00:42:29,920
but it's built from standard SDKs and standard patterns.

1168
00:42:29,920 --> 00:42:31,600
No exotic technology is required.

1169
00:42:31,600 --> 00:42:33,360
The challenge isn't technical feasibility.

1170
00:42:33,360 --> 00:42:35,880
The challenge is recognizing that the old model can't scale

1171
00:42:35,880 --> 00:42:38,080
and that the new model is already available.

1172
00:42:38,080 --> 00:42:40,840
The user experience is what makes this transformative.

1173
00:42:40,840 --> 00:42:42,600
In the old model, governance is friction.

1174
00:42:42,600 --> 00:42:45,360
It is a form to fill out, a drop-down to scroll through,

1175
00:42:45,360 --> 00:42:48,080
a decision to make when the user is trying to do something else.

1176
00:42:48,080 --> 00:42:50,040
In the new model, governance is invisible.

1177
00:42:50,040 --> 00:42:51,160
It happens in the background.

1178
00:42:51,160 --> 00:42:52,600
The user focuses on their work,

1179
00:42:52,600 --> 00:42:54,240
the system focuses on compliance.

1180
00:42:54,240 --> 00:42:56,000
Those two goals are no longer in conflict.

1181
00:42:56,000 --> 00:42:58,040
Adoption resistance often surprises teams

1182
00:42:58,040 --> 00:43:01,440
that expect users to celebrate the removal of metadata forms.

1183
00:43:01,440 --> 00:43:02,640
Some users do celebrate.

1184
00:43:02,640 --> 00:43:05,960
Others worry that the system is making decisions they can't see.

1185
00:43:05,960 --> 00:43:07,880
They worry that their files will be misclassified

1186
00:43:07,880 --> 00:43:09,320
and shared with the wrong people.

1187
00:43:09,320 --> 00:43:11,360
They worry that automation removes their control

1188
00:43:11,360 --> 00:43:13,120
over how their content is governed.

1189
00:43:13,120 --> 00:43:14,440
These concerns are legitimate

1190
00:43:14,440 --> 00:43:17,360
and must be addressed through transparency rather than dismissal.

1191
00:43:17,360 --> 00:43:19,640
Transparency in automated governance means users

1192
00:43:19,640 --> 00:43:22,280
can see what metadata was applied to their files and why.

1193
00:43:22,280 --> 00:43:24,240
It means they can request a review

1194
00:43:24,240 --> 00:43:26,200
when they believe a classification is wrong.

1195
00:43:26,200 --> 00:43:27,960
It means they can view the audit trail

1196
00:43:27,960 --> 00:43:31,280
that shows which signals were used and which rules were applied.

1197
00:43:31,280 --> 00:43:33,040
And it means they receive notifications

1198
00:43:33,040 --> 00:43:35,880
when their files are classified as sensitive or restricted,

1199
00:43:35,880 --> 00:43:38,000
not because the system is punishing them,

1200
00:43:38,000 --> 00:43:39,320
but because they deserve to know

1201
00:43:39,320 --> 00:43:41,720
when their content carries elevated protection.

1202
00:43:41,720 --> 00:43:44,840
Trust is earned through visibility, not assumed through automation.

1203
00:43:44,840 --> 00:43:46,040
One file is easy.

1204
00:43:46,040 --> 00:43:49,600
The real question is what happens when you do this for everything?

1205
00:43:49,600 --> 00:43:52,280
Governance at the point of action.

1206
00:43:52,280 --> 00:43:54,720
The old model of governance is retrospective.

1207
00:43:54,720 --> 00:43:56,560
Content gets created. It sits in storage.

1208
00:43:56,560 --> 00:43:59,000
Periodically, someone reviews it, maybe quarterly,

1209
00:43:59,000 --> 00:44:00,120
maybe annually.

1210
00:44:00,120 --> 00:44:01,800
They discover mislabeled files.

1211
00:44:01,800 --> 00:44:04,000
They find sensitive data in public libraries.

1212
00:44:04,000 --> 00:44:05,520
They uncover retention violations

1213
00:44:05,520 --> 00:44:07,440
that have been accumulating for months.

1214
00:44:07,440 --> 00:44:08,280
Then they remediate.

1215
00:44:08,280 --> 00:44:09,640
This is governance as cleanup

1216
00:44:09,640 --> 00:44:12,000
and cleanup is always more expensive than prevention.

1217
00:44:12,000 --> 00:44:13,400
The new model is preventive.

1218
00:44:13,400 --> 00:44:15,240
Governance happens at the point of action,

1219
00:44:15,240 --> 00:44:17,200
which means it happens at the exact moment

1220
00:44:17,200 --> 00:44:19,400
content is created, modified or shared.

1221
00:44:19,400 --> 00:44:20,720
The middleware intercepts the file

1222
00:44:20,720 --> 00:44:22,640
before it ever reaches static storage.

1223
00:44:22,640 --> 00:44:25,520
It applies classification, labels and retention rules

1224
00:44:25,520 --> 00:44:26,280
in real time.

1225
00:44:26,280 --> 00:44:28,240
There is no gap between creation and governance

1226
00:44:28,240 --> 00:44:30,520
because the two are part of the same workflow.

1227
00:44:30,520 --> 00:44:31,840
Shifting from retrospective review

1228
00:44:31,840 --> 00:44:33,920
to real-time enforcement isn't a subtle improvement.

1229
00:44:33,920 --> 00:44:35,200
It is a structural inversion

1230
00:44:35,200 --> 00:44:37,520
that changes the economics of governance entirely.

1231
00:44:37,520 --> 00:44:39,600
Retrospective governance scales linearly

1232
00:44:39,600 --> 00:44:40,760
with content volume,

1233
00:44:40,760 --> 00:44:43,080
because more content means more files to review,

1234
00:44:43,080 --> 00:44:45,960
more exceptions to handle and more remediation to perform.

1235
00:44:45,960 --> 00:44:48,520
Real-time governance scales with the rate of change,

1236
00:44:48,520 --> 00:44:51,120
which grows far more slowly than total volume.

1237
00:44:51,120 --> 00:44:52,920
An organization with 10 million files

1238
00:44:52,920 --> 00:44:56,320
and 1,000 daily changes processes 1,000 events per day.

1239
00:44:56,320 --> 00:44:58,520
The same organization using retrospective governance

1240
00:44:58,520 --> 00:45:00,920
must eventually review all 10 million files.

1241
00:45:00,920 --> 00:45:03,720
Automated tagging ensures consistent policy application

1242
00:45:03,720 --> 00:45:05,200
across the entire estate.

1243
00:45:05,200 --> 00:45:07,280
In the manual model, coverage is patchy.

1244
00:45:07,280 --> 00:45:09,320
Diligent teams produce well-governed content.

1245
00:45:09,320 --> 00:45:10,640
Other teams produce chaos.

1246
00:45:10,640 --> 00:45:13,160
The policies only work where the metadata is accurate,

1247
00:45:13,160 --> 00:45:15,800
which means they only work on a fraction of the total estate.

1248
00:45:15,800 --> 00:45:18,240
Automated governance eliminates this coverage gap

1249
00:45:18,240 --> 00:45:20,320
by applying the same rules to every file,

1250
00:45:20,320 --> 00:45:22,120
regardless of which team created it

1251
00:45:22,120 --> 00:45:23,920
or how careful the user was.

1252
00:45:23,920 --> 00:45:27,120
This consistency is what makes enterprise policies trustworthy.

1253
00:45:27,120 --> 00:45:30,080
A DLP rule that says block sharing of confidential documents

1254
00:45:30,080 --> 00:45:32,320
outside the organization only works

1255
00:45:32,320 --> 00:45:34,920
if confidential documents are correctly identified.

1256
00:45:34,920 --> 00:45:36,400
A retention policy that says

1257
00:45:36,400 --> 00:45:38,720
delete general business documents after seven years

1258
00:45:38,720 --> 00:45:41,920
only works if general business documents are correctly labeled.

1259
00:45:41,920 --> 00:45:43,440
When labeling is inconsistent,

1260
00:45:43,440 --> 00:45:45,440
policies behave inconsistently.

1261
00:45:45,440 --> 00:45:47,040
They miss risks they should catch

1262
00:45:47,040 --> 00:45:48,920
and they block work they should allow.

1263
00:45:48,920 --> 00:45:50,840
Users learn not to trust the system

1264
00:45:50,840 --> 00:45:52,200
and they find workarounds.

1265
00:45:52,200 --> 00:45:54,320
Governance becomes theatre.

1266
00:45:54,320 --> 00:45:55,840
The total cost of ownership comparison

1267
00:45:55,840 --> 00:45:57,640
between manual and automated governance

1268
00:45:57,640 --> 00:46:00,240
tells a clear story over a three-year horizon.

1269
00:46:00,240 --> 00:46:02,360
Manual governance looks cheaper in the first year

1270
00:46:02,360 --> 00:46:04,840
because it avoids licensing and integration costs.

1271
00:46:04,840 --> 00:46:07,600
Over time though, the labor burden grows through analyst hours,

1272
00:46:07,600 --> 00:46:10,800
remediation work, exception reviews, periodic audits

1273
00:46:10,800 --> 00:46:12,960
and the constant drag of keeping metadata

1274
00:46:12,960 --> 00:46:14,760
current in a shifting environment.

1275
00:46:14,760 --> 00:46:16,800
By year three, the labor cost of manual governance

1276
00:46:16,800 --> 00:46:20,000
at enterprise scale usually exceeds the cost of an automated platform.

1277
00:46:20,000 --> 00:46:22,440
A 2026 governance tool buyer guide

1278
00:46:22,440 --> 00:46:24,600
suggests that the rule of thumb budget threshold

1279
00:46:24,600 --> 00:46:27,800
for enterprise-grade tooling is at least $150,000

1280
00:46:27,800 --> 00:46:29,240
over three years.

1281
00:46:29,240 --> 00:46:32,080
Below that, a lighter native approach may be more appropriate.

1282
00:46:32,080 --> 00:46:35,160
That threshold is important because it frames the decision point.

1283
00:46:35,160 --> 00:46:38,120
If your organization is spending more than $50,000 per year

1284
00:46:38,120 --> 00:46:41,240
on manual metadata stewardship, remediation

1285
00:46:41,240 --> 00:46:43,160
and search failure recovery,

1286
00:46:43,160 --> 00:46:44,480
you have already crossed the line

1287
00:46:44,480 --> 00:46:46,640
where automation would have been cheaper.

1288
00:46:46,640 --> 00:46:48,840
The cost comparison becomes even clearer

1289
00:46:48,840 --> 00:46:50,360
when you include the secondary costs

1290
00:46:50,360 --> 00:46:52,880
that manual governance generates but rarely tracks.

1291
00:46:52,880 --> 00:46:54,480
Time spent by compliance officers

1292
00:46:54,480 --> 00:46:56,320
manually reviewing mislabeled files.

1293
00:46:56,320 --> 00:46:58,880
Time spent by IT support fielding search complaints.

1294
00:46:58,880 --> 00:47:01,040
Time spent by legal teams preparing for audits

1295
00:47:01,040 --> 00:47:03,000
without reliable metadata reports.

1296
00:47:03,000 --> 00:47:05,080
Time spent by security teams investigating

1297
00:47:05,080 --> 00:47:08,360
DLP false positives that stem from inconsistent tagging.

1298
00:47:08,360 --> 00:47:11,760
These activities don't appear on a metadata governance budget line

1299
00:47:11,760 --> 00:47:14,280
and here is the part that most cost analyses miss.

1300
00:47:14,280 --> 00:47:17,800
Every hour spent on remediation is an hour not spent on improvement.

1301
00:47:17,800 --> 00:47:20,600
Your governance team is so busy fixing yesterday's mistakes

1302
00:47:20,600 --> 00:47:23,400
that they have no capacity to design tomorrow's architecture.

1303
00:47:23,400 --> 00:47:25,280
That is the hidden tax of manual governance.

1304
00:47:25,280 --> 00:47:26,280
It doesn't just cost money.

1305
00:47:26,280 --> 00:47:29,360
It consumes the time and attention of the very people

1306
00:47:29,360 --> 00:47:31,200
who could be building something better.

1307
00:47:31,200 --> 00:47:32,520
They appear on compliance budgets,

1308
00:47:32,520 --> 00:47:36,080
IT support budgets, legal budgets and security operations budgets

1309
00:47:36,080 --> 00:47:38,400
but they're all caused by the same underlying problem.

1310
00:47:38,400 --> 00:47:41,200
Metadata that was never reliable in the first place.

1311
00:47:41,200 --> 00:47:43,440
Automated governance consolidates these costs.

1312
00:47:43,440 --> 00:47:45,160
The middleware runs on Azure Functions

1313
00:47:45,160 --> 00:47:46,960
with predictable compute costs.

1314
00:47:46,960 --> 00:47:51,200
The Graph API calls are included in standard Microsoft 365 licensing.

1315
00:47:51,200 --> 00:47:54,760
The Purview integration uses existing sensitivity label infrastructure.

1316
00:47:54,760 --> 00:47:56,560
The primary investment is upfront,

1317
00:47:56,560 --> 00:47:59,120
building the middleware, designing the taxonomy

1318
00:47:59,120 --> 00:48:00,560
and training the models.

1319
00:48:00,560 --> 00:48:03,000
After that, the ongoing cost scales with content volume

1320
00:48:03,000 --> 00:48:05,280
at a far slower rate than manual labor would.

1321
00:48:05,280 --> 00:48:08,320
An organization that processes one million files per year

1322
00:48:08,320 --> 00:48:11,960
might spend $20,000 in Azure compute and licensing.

1323
00:48:11,960 --> 00:48:14,400
The manual equivalent at 30 seconds per file

1324
00:48:14,400 --> 00:48:16,880
would cost more than 8,000 hours of labor.

1325
00:48:16,880 --> 00:48:19,560
At an average loaded cost of $60 per hour,

1326
00:48:19,560 --> 00:48:21,400
that's nearly $500,000.

1327
00:48:21,400 --> 00:48:22,480
The math isn't close.
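The arithmetic behind that comparison can be checked with a short sketch. The figures are the ones quoted in the episode and should be read as illustrative rather than as benchmarks; the constant and function names are made up for this example.

```python
# Figures quoted in the episode; illustrative, not measured benchmarks.
FILES_PER_YEAR = 1_000_000
SECONDS_PER_FILE = 30        # manual tagging time per file
LOADED_RATE_USD = 60         # average loaded labor cost per hour
AUTOMATED_COST_USD = 20_000  # estimated Azure compute and licensing per year


def manual_hours(files: int, seconds_per_file: int) -> float:
    """Labor hours needed to tag every file by hand."""
    return files * seconds_per_file / 3600


def manual_cost(files: int, seconds_per_file: int, hourly_rate: float) -> float:
    """Labor cost in dollars for tagging every file by hand."""
    return manual_hours(files, seconds_per_file) * hourly_rate


hours = manual_hours(FILES_PER_YEAR, SECONDS_PER_FILE)
cost = manual_cost(FILES_PER_YEAR, SECONDS_PER_FILE, LOADED_RATE_USD)

print(f"manual hours: {hours:,.0f}")
print(f"manual cost:  ${cost:,.0f}")
print(f"manual cost is roughly {cost / AUTOMATED_COST_USD:.0f}x the automated estimate")
```

Changing the seconds per file or the hourly rate shows how sensitive the gap is to those two assumptions.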

1328
00:48:22,480 --> 00:48:25,400
The hybrid model of automation plus human oversight

1329
00:48:25,400 --> 00:48:28,440
also has a cost structure that manual governance can't match.

1330
00:48:28,440 --> 00:48:29,440
In the automated model,

1331
00:48:29,440 --> 00:48:33,400
human labor is reserved for exceptions, edge cases and high-risk decisions.

1332
00:48:33,400 --> 00:48:35,440
A governance team of three people can oversee

1333
00:48:35,440 --> 00:48:37,080
an estate of 10 million files

1334
00:48:37,080 --> 00:48:39,000
because the middleware handles the routine

1335
00:48:39,000 --> 00:48:41,160
and only escalates the unusual.

1336
00:48:41,160 --> 00:48:42,160
In the manual model,

1337
00:48:42,160 --> 00:48:44,080
the same estate would require dozens of people

1338
00:48:44,080 --> 00:48:47,600
across departments to tag, review and remediate content continuously.

1339
00:48:47,600 --> 00:48:49,880
The labor requirement grows linearly with volume.

1340
00:48:49,880 --> 00:48:51,800
The automated model grows far more slowly

1341
00:48:51,800 --> 00:48:53,520
because exceptions are a small fraction

1342
00:48:53,520 --> 00:48:55,400
of total volume regardless of scale.

1343
00:48:55,400 --> 00:48:57,680
The hybrid model of automation plus human oversight

1344
00:48:57,680 --> 00:48:59,680
is where most organizations will land

1345
00:48:59,680 --> 00:49:02,280
and it's the most practical path for most estates.

1346
00:49:02,280 --> 00:49:05,320
Automation handles the baseline, discovery, classification,

1347
00:49:05,320 --> 00:49:07,960
lineage capture and policy triggers.

1348
00:49:07,960 --> 00:49:11,120
Humans manage exceptions, taxonomy design and high-risk review.

1349
00:49:11,120 --> 00:49:14,200
This keeps costs down while preserving human judgment

1350
00:49:14,200 --> 00:49:16,080
for the decisions that actually need it.

1351
00:49:16,080 --> 00:49:18,320
It also provides a natural escalation path.

1352
00:49:18,320 --> 00:49:19,920
When the middleware encounters a file

1353
00:49:19,920 --> 00:49:21,920
it can't classify with sufficient confidence,

1354
00:49:21,920 --> 00:49:25,120
it routes the file to a human reviewer rather than guessing.

1355
00:49:25,120 --> 00:49:27,320
The system is automated but not autonomous.

1356
00:49:27,320 --> 00:49:29,920
The elimination of the coverage gap has a secondary benefit

1357
00:49:29,920 --> 00:49:31,120
that's easy to overlook.

1358
00:49:31,120 --> 00:49:33,520
When governance is consistent across the entire estate,

1359
00:49:33,520 --> 00:49:35,720
analytics and reporting become trustworthy.

1360
00:49:35,720 --> 00:49:38,320
You can query your content store and believe the results.

1361
00:49:38,320 --> 00:49:39,920
You can answer executive questions

1362
00:49:39,920 --> 00:49:42,480
about sensitive data exposure, retention compliance

1363
00:49:42,480 --> 00:49:44,320
and content growth with confidence.

1364
00:49:44,320 --> 00:49:46,240
In the manual model reports are always qualified

1365
00:49:46,240 --> 00:49:48,920
with caveats about incomplete metadata.

1366
00:49:48,920 --> 00:49:51,920
In the automated model reports reflect reality

1367
00:49:51,920 --> 00:49:54,320
because the metadata layer is complete.

1368
00:49:54,320 --> 00:49:56,720
The transformation of governance teams themselves

1369
00:49:56,720 --> 00:49:58,520
is another overlooked consequence.

1370
00:49:58,520 --> 00:50:01,320
In the manual model, governance teams spend most of their time

1371
00:50:01,320 --> 00:50:03,520
on low-value, repetitive tasks.

1372
00:50:03,520 --> 00:50:06,120
They review untouched files, they send reminder emails,

1373
00:50:06,120 --> 00:50:07,920
they remediate mislabeled content,

1374
00:50:07,920 --> 00:50:10,320
they argue with departments about compliance deadlines.

1375
00:50:10,320 --> 00:50:12,520
It is tedious work that burns out talented people

1376
00:50:12,520 --> 00:50:14,520
and produces modest results.

1377
00:50:14,520 --> 00:50:17,320
In the automated model, governance teams become architects

1378
00:50:17,320 --> 00:50:18,720
rather than custodians.

1379
00:50:18,720 --> 00:50:21,320
They design classification rules, they tune taxonomy models,

1380
00:50:21,320 --> 00:50:22,720
they analyze exception patterns,

1381
00:50:22,720 --> 00:50:24,720
they advise the business on data strategy

1382
00:50:24,720 --> 00:50:27,120
rather than chasing users to fill out forms.

1383
00:50:27,120 --> 00:50:29,520
The work becomes creative and strategic

1384
00:50:29,520 --> 00:50:31,720
rather than repetitive and adversarial.

1385
00:50:31,720 --> 00:50:34,120
Morale improves because the team can see its impact

1386
00:50:34,120 --> 00:50:36,720
in measurable outcomes rather than in spreadsheet rows

1387
00:50:36,720 --> 00:50:38,520
of manually corrected tags.

1388
00:50:38,520 --> 00:50:40,720
And the organization gets more value from its governance

1389
00:50:40,720 --> 00:50:43,320
investment because the people are focused on design

1390
00:50:43,320 --> 00:50:44,320
rather than cleanup.

1391
00:50:44,320 --> 00:50:46,920
Microsoft Purview integration strengthens this further

1392
00:50:46,920 --> 00:50:48,920
by providing a unified governance layer

1393
00:50:48,920 --> 00:50:50,920
that spans structured and unstructured data

1394
00:50:50,920 --> 00:50:53,520
across multi-cloud and on-premises sources.

1395
00:50:53,520 --> 00:50:56,520
Purview's data map scans data assets and captures metadata

1396
00:50:56,520 --> 00:50:58,720
while the unified catalog allows organizations

1397
00:50:58,720 --> 00:51:01,520
to build governance domains, curate data products

1398
00:51:01,520 --> 00:51:04,320
and connect data assets to business concepts.

1399
00:51:04,320 --> 00:51:07,920
When the middleware injects Graph-derived metadata into files,

1400
00:51:07,920 --> 00:51:10,320
that metadata becomes visible to Purview's scanning

1401
00:51:10,320 --> 00:51:11,720
and cataloging capabilities.

1402
00:51:11,720 --> 00:51:13,720
The organization gets a single view of governance

1403
00:51:13,720 --> 00:51:15,720
across all repositories including the ones

1404
00:51:15,720 --> 00:51:17,320
that were never tagged manually.

1405
00:51:17,320 --> 00:51:20,120
The reporting transformation is one of the most immediate benefits

1406
00:51:20,120 --> 00:51:21,320
of automated governance.

1407
00:51:21,320 --> 00:51:24,320
In the manual model compliance reports are always qualified.

1408
00:51:24,320 --> 00:51:26,720
The governance team can report that 90% of files

1409
00:51:26,720 --> 00:51:28,720
in the finance library have retention labels

1410
00:51:28,720 --> 00:51:30,520
but they can't report on the marketing library

1411
00:51:30,520 --> 00:51:32,320
because nobody tagged those files.

1412
00:51:32,320 --> 00:51:35,920
They can report that DLP rules caught 12 incidents last quarter

1413
00:51:35,920 --> 00:51:38,320
but they can't say how many incidents were missed

1414
00:51:38,320 --> 00:51:40,120
because files were mislabeled.

1415
00:51:40,120 --> 00:51:42,920
Every report comes with caveats, estimates and gaps.

1416
00:51:42,920 --> 00:51:46,320
In the automated model, reporting becomes precise and comprehensive

1417
00:51:46,320 --> 00:51:49,520
because every file is classified at the point of creation.

1418
00:51:49,520 --> 00:51:53,120
The governance team can report exact counts for any content type,

1419
00:51:53,120 --> 00:51:54,920
any department, any time period.

1420
00:51:54,920 --> 00:51:56,320
They can answer questions like

1421
00:51:56,320 --> 00:51:58,920
how many confidential documents were created in Teams channels

1422
00:51:58,920 --> 00:52:00,720
with external guests last month?

1423
00:52:00,720 --> 00:52:03,920
Or what percentage of project files have complete metadata

1424
00:52:03,920 --> 00:52:05,720
within 24 hours of creation?

1425
00:52:05,720 --> 00:52:07,120
These are operational questions

1426
00:52:07,120 --> 00:52:09,520
that manual governance can't answer with confidence.

1427
00:52:09,520 --> 00:52:10,720
They are strategic questions

1428
00:52:10,720 --> 00:52:12,520
that automated governance answers

1429
00:52:12,520 --> 00:52:14,720
as a byproduct of normal operation.

1430
00:52:14,720 --> 00:52:16,120
The confidence improvement ripples

1431
00:52:16,120 --> 00:52:18,120
through every governance conversation.

1432
00:52:18,120 --> 00:52:20,320
When the chief compliance officer meets with the board

1433
00:52:20,320 --> 00:52:22,320
she can present numbers that are defensible

1434
00:52:22,320 --> 00:52:23,720
rather than estimated.

1435
00:52:23,720 --> 00:52:25,520
When the chief information security officer

1436
00:52:25,520 --> 00:52:27,120
reviews DLP effectiveness

1437
00:52:27,120 --> 00:52:30,520
he can distinguish between policy failures and classification failures.

1438
00:52:30,520 --> 00:52:33,320
When the chief technology officer evaluates AI readiness

1439
00:52:33,320 --> 00:52:36,120
she can measure metadata coverage as a concrete percentage

1440
00:52:36,120 --> 00:52:38,320
rather than a subjective assessment.

1441
00:52:38,320 --> 00:52:41,120
Governance shifts from a soft assurance to a hard metric

1442
00:52:41,120 --> 00:52:43,120
and hard metrics are what justify budgets,

1443
00:52:43,120 --> 00:52:45,520
drive priorities and demonstrate value.

1444
00:52:45,520 --> 00:52:47,520
But governance isn't just about finding files,

1445
00:52:47,520 --> 00:52:48,920
it is about not getting fined.

1446
00:52:48,920 --> 00:52:50,720
Compliance without human friction.

1447
00:52:50,720 --> 00:52:52,920
Manual tagging creates compliance exposure

1448
00:52:52,920 --> 00:52:55,320
in ways that most organizations don't measure

1449
00:52:55,320 --> 00:52:57,120
until an audit forces them to look.

1450
00:52:57,120 --> 00:53:00,120
Mislabeled sensitive data is the most obvious risk.

1451
00:53:01,120 --> 00:53:03,520
A customer health record tagged as general business

1452
00:53:03,520 --> 00:53:06,520
is invisible to DLP, invisible to compliance scanning

1453
00:53:06,520 --> 00:53:08,320
and invisible to retention policies.

1454
00:53:08,320 --> 00:53:11,120
It sits in public libraries, gets shared with unauthorized users

1455
00:53:11,120 --> 00:53:12,720
and accumulates regulatory violations

1456
00:53:12,720 --> 00:53:15,720
that nobody detects until the auditor asks the right question.

1457
00:53:15,720 --> 00:53:18,120
Inconsistent retention is another hidden cost.

1458
00:53:18,120 --> 00:53:20,920
One team applies seven-year retention to financial records.

1459
00:53:20,920 --> 00:53:22,720
Another team applies three years.

1460
00:53:22,720 --> 00:53:25,520
A third team forgets to apply any retention at all.

1461
00:53:25,520 --> 00:53:27,920
When the organization is sued or audited

1462
00:53:27,920 --> 00:53:30,520
it can't produce a coherent record of what was kept,

1463
00:53:30,520 --> 00:53:31,920
what was deleted and why.

1464
00:53:31,920 --> 00:53:33,320
The policy exists on paper,

1465
00:53:33,320 --> 00:53:36,120
but the execution is fragmented across hundreds of teams

1466
00:53:36,120 --> 00:53:37,520
and thousands of libraries.

1467
00:53:37,520 --> 00:53:40,120
Auditors don't accept policy documents as evidence.

1468
00:53:40,120 --> 00:53:43,120
They want proof that the policy was applied systematically.

1469
00:53:43,120 --> 00:53:46,720
Automated classification and regulatory frameworks align naturally

1470
00:53:46,720 --> 00:53:49,320
when the classification is derived from system signals

1471
00:53:49,320 --> 00:53:50,720
rather than user guesses.

1472
00:53:50,720 --> 00:53:53,120
GDPR requires organizations to know

1473
00:53:53,120 --> 00:53:54,520
where personal data lives,

1474
00:53:54,520 --> 00:53:57,120
who has access to it and how long it's retained.

1475
00:53:57,120 --> 00:54:00,320
HIPAA requires similar visibility for health information.

1476
00:54:00,320 --> 00:54:02,520
Sector-specific regulations in finance,

1477
00:54:02,520 --> 00:54:04,520
legal and government impose additional metadata

1478
00:54:04,520 --> 00:54:05,920
and retention requirements.

1479
00:54:05,920 --> 00:54:08,520
All of these frameworks depend on accurate classification.

1480
00:54:08,520 --> 00:54:10,520
And accurate classification at scale

1481
00:54:10,520 --> 00:54:12,720
isn't something manual tagging can deliver.

1482
00:54:12,720 --> 00:54:14,720
Sensitivity labels in Microsoft Purview

1483
00:54:14,720 --> 00:54:17,520
are the foundation for a wide range of protective actions.

1484
00:54:17,520 --> 00:54:20,320
They control encryption, access restrictions,

1485
00:54:20,320 --> 00:54:23,120
content marking and conditional access integration.

1486
00:54:23,120 --> 00:54:24,920
When labels are applied automatically

1487
00:54:24,920 --> 00:54:28,120
by the middleware based on content analysis and Graph context,

1488
00:54:28,120 --> 00:54:30,320
they're applied consistently and immediately.

1489
00:54:30,320 --> 00:54:32,920
There is no delay between creation and protection.

1490
00:54:32,920 --> 00:54:35,920
There is no dependency on user training or user attention.

1491
00:54:35,920 --> 00:54:39,320
The label follows the content because the architecture was designed that way.

1492
00:54:39,320 --> 00:54:41,720
The false positive problem is equally important.

1493
00:54:41,720 --> 00:54:43,720
When manual tagging is inconsistent,

1494
00:54:43,720 --> 00:54:46,920
DLP systems generate either too many alerts or too few.

1495
00:54:46,920 --> 00:54:48,920
If users over-tag content as sensitive,

1496
00:54:48,920 --> 00:54:51,320
security teams are overwhelmed with false alarms

1497
00:54:51,320 --> 00:54:53,320
and users are blocked from legitimate work.

1498
00:54:53,320 --> 00:54:56,720
If users under-tag content, real risks slip through undetected.

1499
00:54:56,720 --> 00:54:59,120
Both outcomes erode trust in the governance system.

1500
00:54:59,120 --> 00:55:00,720
Users find workarounds,

1501
00:55:00,720 --> 00:55:02,720
security teams start ignoring alerts,

1502
00:55:02,720 --> 00:55:04,520
the entire control layer degrades.

1503
00:55:04,520 --> 00:55:06,520
Automated labeling specifically addresses

1504
00:55:06,520 --> 00:55:09,120
the false positive problem through tunable thresholds.

1505
00:55:09,120 --> 00:55:11,120
In a high-confidence scenario

1506
00:55:11,120 --> 00:55:13,120
where the middleware detects a clear pattern

1507
00:55:13,120 --> 00:55:15,720
like a credit card number in a known finance document,

1508
00:55:15,720 --> 00:55:18,920
it can apply the label immediately with no human review.

1509
00:55:18,920 --> 00:55:20,120
In a borderline scenario,

1510
00:55:20,120 --> 00:55:22,120
where the content contains ambiguous terms

1511
00:55:22,120 --> 00:55:24,520
that might indicate sensitivity or might be innocent,

1512
00:55:24,520 --> 00:55:26,520
the middleware can apply a tentative label

1513
00:55:26,520 --> 00:55:29,520
and route the file to a human reviewer for confirmation.

1514
00:55:29,520 --> 00:55:31,720
This two-tier approach reduces the volume of alerts

1515
00:55:31,720 --> 00:55:33,120
that reach security teams,

1516
00:55:33,120 --> 00:55:35,920
while ensuring that nothing truly sensitive is ignored.

1517
00:55:35,920 --> 00:55:37,320
The tunability is key.

1518
00:55:37,320 --> 00:55:40,120
Different organizations will set different thresholds

1519
00:55:40,120 --> 00:55:41,720
based on their risk tolerance,

1520
00:55:41,720 --> 00:55:45,320
regulatory environment, and available review capacity.
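The two-tier routing described above could be sketched roughly as follows. This is a minimal illustration, not the middleware's actual logic: the threshold values, the `Decision` type, and the `route` function are all hypothetical, and the floor below which no label is proposed is an added assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; each organization would tune these to its
# risk tolerance, regulatory environment, and review capacity.
AUTO_APPLY_THRESHOLD = 0.90  # at or above: apply the label with no review
REVIEW_FLOOR = 0.50          # at or above: tentative label plus human review


@dataclass(frozen=True)
class Decision:
    label: Optional[str]  # label applied (or proposed), None if no match
    tentative: bool       # True when the label awaits confirmation
    needs_review: bool    # True when the file goes to a human reviewer


def route(label: str, confidence: float) -> Decision:
    """Map a classifier result to one of the tiers described in the episode."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= AUTO_APPLY_THRESHOLD:
        # Clear pattern: apply immediately, no human in the loop.
        return Decision(label, tentative=False, needs_review=False)
    if confidence >= REVIEW_FLOOR:
        # Borderline: apply a tentative label and ask a reviewer to confirm.
        return Decision(label, tentative=True, needs_review=True)
    # Assumption: below the floor the signal is treated as no match.
    return Decision(None, tentative=False, needs_review=False)
```

Lowering `REVIEW_FLOOR` sends more borderline files to reviewers, which trades review workload for fewer missed cases.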

1521
00:55:45,320 --> 00:55:47,320
Automated labeling avoids this degradation

1522
00:55:47,320 --> 00:55:49,520
by applying the same standards every time.

1523
00:55:49,520 --> 00:55:51,320
The middleware uses defined classifiers,

1524
00:55:51,320 --> 00:55:53,720
trainable classifiers, and sensitive information types

1525
00:55:53,720 --> 00:55:55,720
that detect patterns like credit card numbers,

1526
00:55:55,720 --> 00:55:58,720
social security numbers, or health identifiers.

1527
00:55:58,720 --> 00:56:01,320
It can be tuned to favor recall over precision

1528
00:56:01,320 --> 00:56:02,920
in high-risk scenarios,

1529
00:56:02,920 --> 00:56:04,520
ensuring that borderline cases are flagged

1530
00:56:04,520 --> 00:56:06,320
for review rather than ignored.
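As a rough idea of what pattern-based detection involves, the sketch below finds candidate credit card numbers with a regular expression and then filters them with the Luhn checksum. It is a simplified stand-in written for this transcript, not Purview's sensitive information type logic, and the function names are hypothetical.

```python
import re
from typing import List

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Return digit strings in text that look like valid card numbers."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(digits)
    return hits
```

The checksum step is what keeps precision reasonable: a pattern match alone would flag order numbers and other long digit runs.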

1531
00:56:06,320 --> 00:56:08,520
And because the decisions are logged and traceable,

1532
00:56:08,520 --> 00:56:11,320
auditors can see exactly why each label was applied

1533
00:56:11,320 --> 00:56:13,720
and how the system arrived at that conclusion.

1534
00:56:13,720 --> 00:56:16,320
Regulatory frameworks often require demonstrable,

1535
00:56:16,320 --> 00:56:18,520
repeatable processes for identifying

1536
00:56:18,520 --> 00:56:20,120
and protecting sensitive data.

1537
00:56:20,120 --> 00:56:21,720
Manual tagging is hard to audit

1538
00:56:21,720 --> 00:56:23,520
because it depends on individual decisions

1539
00:56:23,520 --> 00:56:25,720
that are rarely documented.

1540
00:56:25,720 --> 00:56:28,520
The user picked general business from a drop-down.

1541
00:56:28,520 --> 00:56:29,720
Why? Was it intentional?

1542
00:56:29,720 --> 00:56:30,520
Was it a mistake?

1543
00:56:30,520 --> 00:56:31,320
Was it fatigue?

1544
00:56:31,320 --> 00:56:32,920
No log entry answers those questions.

1545
00:56:32,920 --> 00:56:34,520
Automated classification,

1546
00:56:34,520 --> 00:56:36,520
configured through centralized governance tools

1547
00:56:36,520 --> 00:56:37,920
and documented policies,

1548
00:56:37,920 --> 00:56:39,120
provides a defensible basis

1549
00:56:39,120 --> 00:56:40,920
for demonstrating due diligence.

1550
00:56:40,920 --> 00:56:42,520
The middleware logs every decision,

1551
00:56:42,520 --> 00:56:43,320
the rules are published,

1552
00:56:43,320 --> 00:56:44,320
the models are versioned,

1553
00:56:44,320 --> 00:56:45,720
the audit trail is complete.
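One way to picture that audit trail is a structured record written for every classification decision. The field names below are assumptions chosen for illustration; the episode does not specify a log schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Tuple


def _utc_now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class ClassificationDecision:
    # All field names are hypothetical, not a documented schema.
    file_id: str
    label: str
    confidence: float
    signals: Tuple[str, ...]  # which classifiers or patterns fired
    rule_version: str         # version of the published rule set
    model_version: str        # version of the classification model
    decided_at: str = field(default_factory=_utc_now)


def to_log_line(decision: ClassificationDecision) -> str:
    """Serialize a decision as one JSON line for an append-only log."""
    return json.dumps(asdict(decision), sort_keys=True)
```

Recording the rule and model versions alongside the signals is what lets an auditor reconstruct why a given label was applied at a given time.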

1554
00:56:45,720 --> 00:56:48,520
Retention policies and labels in Microsoft 365

1555
00:56:48,520 --> 00:56:50,520
can be used to retain or delete content

1556
00:56:50,520 --> 00:56:51,720
according to prescribed rules

1557
00:56:51,720 --> 00:56:54,920
with logs and reports supporting accountability.

1558
00:56:54,920 --> 00:56:56,680
But the effectiveness of these policies

1559
00:56:56,680 --> 00:56:58,120
depends on correct scoping,

1560
00:56:58,120 --> 00:57:00,120
which is tied to metadata and labels.

1561
00:57:00,120 --> 00:57:02,520
When the middleware applies retention metadata

1562
00:57:02,520 --> 00:57:04,520
automatically at the point of creation,

1563
00:57:04,520 --> 00:57:07,120
the retention clock starts correctly from day one.

1564
00:57:07,120 --> 00:57:08,720
Nobody has to guess retroactively

1565
00:57:08,720 --> 00:57:09,920
about when a file was created

1566
00:57:09,920 --> 00:57:11,520
or what category it belongs to.

1567
00:57:11,520 --> 00:57:12,720
The metadata is accurate

1568
00:57:12,720 --> 00:57:14,520
because it was derived from context,

1569
00:57:14,520 --> 00:57:16,320
not declared by a user in a hurry.

1570
00:57:16,320 --> 00:57:17,720
The compliance payoff extends

1571
00:57:17,720 --> 00:57:20,520
beyond individual files to the overall data architecture.

1572
00:57:20,520 --> 00:57:22,520
When metadata is consistent and complete,

1573
00:57:22,520 --> 00:57:25,120
compliance tools can reason about the estate as a whole.

1574
00:57:25,120 --> 00:57:27,520
They can identify concentrations of sensitive data,

1575
00:57:27,520 --> 00:57:29,120
detect access anomalies,

1576
00:57:29,120 --> 00:57:30,320
and predict retention load.

1577
00:57:30,320 --> 00:57:31,520
They can answer questions like,

1578
00:57:31,520 --> 00:57:33,720
"How much health data do we have in Teams channels

1579
00:57:33,720 --> 00:57:34,920
with external guests?"

1580
00:57:34,920 --> 00:57:37,520
Or, "Which project libraries contain financial records

1581
00:57:37,520 --> 00:57:39,520
that are approaching their retention deadline?"

1582
00:57:39,520 --> 00:57:41,520
These are the questions that regulators ask

1583
00:57:41,520 --> 00:57:43,520
and they're the questions that manual tagging

1584
00:57:43,520 --> 00:57:45,520
makes impossible to answer accurately.
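The retention-deadline question can be sketched as a simple query over metadata records. This assumes retention is counted from the creation date, as the episode describes; the `FileRecord` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class FileRecord:
    # Hypothetical metadata record captured at the point of creation.
    path: str
    category: str
    created: date
    retention_years: int


def retention_deadline(record: FileRecord) -> date:
    """Date on which the retention period ends, counted from creation."""
    target_year = record.created.year + record.retention_years
    try:
        return record.created.replace(year=target_year)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year.
        return record.created.replace(year=target_year, day=28)


def approaching_deadline(
    records: List[FileRecord], today: date, within_days: int = 90
) -> List[FileRecord]:
    """Records whose retention deadline falls within the next within_days."""
    cutoff = today + timedelta(days=within_days)
    return [r for r in records if today <= retention_deadline(r) <= cutoff]
```

The query is only as trustworthy as the `created` and `category` values, which is the episode's point about capturing them automatically.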

1585
00:57:45,520 --> 00:57:48,320
Consider a GDPR data subject access request.

1586
00:57:48,320 --> 00:57:50,320
A European customer asks for every file

1587
00:57:50,320 --> 00:57:51,720
containing their personal data.

1588
00:57:51,720 --> 00:57:53,120
In the manual tagging model,

1589
00:57:53,120 --> 00:57:54,520
the compliance team must search

1590
00:57:54,520 --> 00:57:57,320
across the entire estate using keyword queries

1591
00:57:57,320 --> 00:57:58,520
and hope that files were tagged

1592
00:57:58,520 --> 00:58:00,920
with the correct data subject identifiers.

1593
00:58:00,920 --> 00:58:02,320
The search misses documents

1594
00:58:02,320 --> 00:58:04,320
where the user skipped the metadata field,

1595
00:58:04,320 --> 00:58:05,320
mislabeled the content

1596
00:58:05,320 --> 00:58:07,920
or stored the file in a personal OneDrive folder.

1597
00:58:07,920 --> 00:58:09,120
The response is incomplete,

1598
00:58:09,120 --> 00:58:10,920
the regulator is dissatisfied

1599
00:58:10,920 --> 00:58:13,720
and the organization faces fines for non-compliance.

1600
00:58:13,720 --> 00:58:14,920
In the automated model,

1601
00:58:14,920 --> 00:58:16,720
the middleware has already classified files

1602
00:58:16,720 --> 00:58:19,520
by content type and detected personal data patterns

1603
00:58:19,520 --> 00:58:20,520
during ingestion.

1604
00:58:20,520 --> 00:58:22,720
The compliance team queries the metadata layer

1605
00:58:22,720 --> 00:58:25,720
for all files tagged with contains personal data

1606
00:58:25,720 --> 00:58:27,520
and the specific customer identifier.

1607
00:58:27,520 --> 00:58:28,720
The search is exhaustive

1608
00:58:28,720 --> 00:58:30,320
because the metadata is complete.

1609
00:58:30,320 --> 00:58:31,520
The response is defensible

1610
00:58:31,520 --> 00:58:33,520
because the classification was systematic

1611
00:58:33,520 --> 00:58:35,120
and the audit trail shows exactly

1612
00:58:35,120 --> 00:58:36,520
how each file was identified

1613
00:58:36,520 --> 00:58:37,720
when it was classified

1614
00:58:37,720 --> 00:58:39,120
and what signals were used.

1615
00:58:39,120 --> 00:58:40,920
The regulator sees a repeatable process

1616
00:58:40,920 --> 00:58:42,320
not a best-effort search.
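The metadata-layer query for a data subject access request might look something like this sketch. The `FileMetadata` type, its fields, and the in-memory index are assumptions for illustration; a real deployment would query whatever store holds the metadata.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class FileMetadata:
    # Hypothetical fields written by the middleware during ingestion.
    path: str
    contains_personal_data: bool
    subject_ids: FrozenSet[str]  # customer identifiers detected in the file
    classified_at: str           # when the classification was made
    signals: Tuple[str, ...]     # which detections produced the tag


def dsar_matches(index: List[FileMetadata], subject_id: str) -> List[FileMetadata]:
    """Return every file tagged as personal data for the given subject."""
    return [
        meta
        for meta in index
        if meta.contains_personal_data and subject_id in meta.subject_ids
    ]
```

Each returned record carries its own classification timestamp and signals, so the response can show how every file was identified.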

1617
00:58:42,320 --> 00:58:45,120
HIPAA creates similar requirements for health information.

1618
00:58:45,120 --> 00:58:46,520
Covered entities must know

1619
00:58:46,520 --> 00:58:48,120
where protected health information lives,

1620
00:58:48,120 --> 00:58:50,320
who has access to it and how long it's retained.

1621
00:58:50,320 --> 00:58:52,920
Manual tagging makes all three requirements unreliable.

1622
00:58:52,920 --> 00:58:55,920
Automated classification makes all three requirements traceable.

1623
00:58:55,920 --> 00:58:58,520
The middleware detects health-related terminology,

1624
00:58:58,520 --> 00:59:01,520
provider names, and patient identifiers during ingestion.

1625
00:59:01,520 --> 00:59:04,120
It applies the HIPAA retention schedule automatically

1626
00:59:04,120 --> 00:59:05,720
and it logs every access event

1627
00:59:05,720 --> 00:59:07,520
through Graph's audit capabilities.

1628
00:59:07,520 --> 00:59:08,520
When an auditor arrives,

1629
00:59:08,520 --> 00:59:10,520
the organization can produce a complete inventory

1630
00:59:10,520 --> 00:59:12,720
of PHI locations, access patterns,

1631
00:59:12,720 --> 00:59:15,720
and retention status without manual searching or guessing.

1632
00:59:15,720 --> 00:59:17,720
Financial services regulations impose

1633
00:59:17,720 --> 00:59:19,320
additional metadata requirements.

1634
00:59:19,320 --> 00:59:22,520
SEC rules on record retention require broker-dealers

1635
00:59:22,520 --> 00:59:24,120
to preserve business communications

1636
00:59:24,120 --> 00:59:25,520
for specified periods.

1637
00:59:25,520 --> 00:59:27,120
FINRA examinations test

1638
00:59:27,120 --> 00:59:28,720
whether firms can produce complete records

1639
00:59:28,720 --> 00:59:31,520
of customer communications and trading decisions.

1640
00:59:31,520 --> 00:59:33,520
Manual tagging can't guarantee completeness

1641
00:59:33,520 --> 00:59:35,320
because it depends on user compliance.

1642
00:59:35,320 --> 00:59:37,920
Automated tagging can guarantee completeness

1643
00:59:37,920 --> 00:59:39,920
because it happens at the point of creation

1644
00:59:39,920 --> 00:59:42,320
for every file, every email, and every chat message.

1645
00:59:42,320 --> 00:59:44,120
The retention clock starts immediately.

1646
00:59:44,120 --> 00:59:45,720
The classification is consistent

1647
00:59:45,720 --> 00:59:47,520
and the audit trail is continuous,

1648
00:59:47,520 --> 00:59:48,920
rather than retrospective.

1649
00:59:48,920 --> 00:59:50,120
And here is where it gets interesting

1650
00:59:50,120 --> 00:59:52,720
for the future of your knowledge base.

1651
00:59:52,720 --> 00:59:54,120
The search and AI payoff.

1652
00:59:54,120 --> 00:59:57,320
Enterprise search in Microsoft 365 regularly underperforms

1653
00:59:57,320 --> 00:59:59,320
when it depends on manual metadata.

1654
00:59:59,320 --> 01:00:02,320
Most indexed items carry almost no useful metadata.

1655
01:00:02,320 --> 01:00:05,320
So tag-based queries and refiners come back incomplete.

1656
01:00:05,320 --> 01:00:06,820
Users figure out that the intranet search

1657
01:00:06,820 --> 01:00:08,220
isn't finding what they need

1658
01:00:08,220 --> 01:00:10,820
and they switch to email or Teams search instead.

1659
01:00:10,820 --> 01:00:13,120
The money spent on search infrastructure is wasted

1660
01:00:13,120 --> 01:00:15,720
because the data layer beneath it has nothing to work with.

1661
01:00:15,720 --> 01:00:18,420
This failure isn't a technical problem with the search engine.

1662
01:00:18,420 --> 01:00:20,920
Microsoft Search is capable of traversing SharePoint,

1663
01:00:20,920 --> 01:00:23,420
OneDrive, Exchange, Teams, and external connectors

1664
01:00:23,420 --> 01:00:25,020
to return unified results.

1665
01:00:25,020 --> 01:00:26,720
The problem is the metadata gap.

1666
01:00:26,720 --> 01:00:29,420
When a file has no tags, no labels, and no context,

1667
01:00:29,420 --> 01:00:31,220
the search engine has nothing to rank it by

1668
01:00:31,220 --> 01:00:33,020
except the text inside the document.

1669
01:00:33,020 --> 01:00:34,620
And text alone is a poor signal

1670
01:00:34,620 --> 01:00:37,120
for relevance, sensitivity, and audience.

1671
01:00:37,120 --> 01:00:40,520
Clean, machine-generated metadata changes the equation entirely.

1672
01:00:40,520 --> 01:00:42,720
When every file carries accurate project codes,

1673
01:00:42,720 --> 01:00:45,920
department assignments, sensitivity levels, and content types,

1674
01:00:45,920 --> 01:00:50,520
the search engine can filter, rank, and surface results with precision.

1675
01:00:50,520 --> 01:00:52,820
A query for Q3 campaign vendor contracts

1676
01:00:52,820 --> 01:00:54,520
returns exactly the right documents

1677
01:00:54,520 --> 01:00:56,420
because the metadata confirms the project,

1678
01:00:56,420 --> 01:00:58,820
the content type, and the time frame.

1679
01:00:58,820 --> 01:01:01,020
A query for confidential financial documents

1680
01:01:01,020 --> 01:01:02,620
shared with external partners

1681
01:01:02,620 --> 01:01:05,120
returns only the files that match all three criteria

1682
01:01:05,120 --> 01:01:07,820
because the labels and sharing metadata are accurate.

1683
01:01:07,820 --> 01:01:10,520
But the real transformation isn't traditional search.

1684
01:01:10,520 --> 01:01:14,220
It is what happens when AI agents navigate a fully tagged knowledge graph.

1685
01:01:14,220 --> 01:01:16,920
Copilot and similar tools don't just search for keywords.

1686
01:01:16,920 --> 01:01:19,320
They reason about context, they infer intent,

1687
01:01:19,320 --> 01:01:21,520
they connect related content across repositories,

1688
01:01:21,520 --> 01:01:23,620
and they depend on metadata to do this accurately.

1689
01:01:23,620 --> 01:01:26,320
When Copilot answers a question about project spending,

1690
01:01:26,320 --> 01:01:28,720
it needs to know which files are budget spreadsheets,

1691
01:01:28,720 --> 01:01:31,820
which are vendor contracts, and which are informal estimates.

1692
01:01:31,820 --> 01:01:34,620
Without metadata, it guesses. With metadata, it knows.

1693
01:01:34,620 --> 01:01:38,120
Microsoft's security messaging makes this connection explicitly.

1694
01:01:38,120 --> 01:01:39,820
Copilot doesn't create new data risks.

1695
01:01:39,820 --> 01:01:41,620
It exposes the ones you already have.

1696
01:01:41,620 --> 01:01:44,320
And in many organizations, that means something uncomfortable.

1697
01:01:44,320 --> 01:01:47,320
Your data architecture was never designed for visibility at this level.

1698
01:01:47,320 --> 01:01:50,520
When an AI agent can query every document in your tenant,

1699
01:01:50,520 --> 01:01:53,920
the quality of your metadata determines the quality of its answers.

1700
01:01:53,920 --> 01:01:57,320
Bad metadata means bad answers. Missing metadata means missing answers.

1701
01:01:57,320 --> 01:01:59,820
And in either case, the user blames the AI

1702
01:01:59,820 --> 01:02:01,720
rather than the data layer underneath it.

1703
01:02:01,720 --> 01:02:04,720
The shift from users searching for data to data finding users

1704
01:02:04,720 --> 01:02:06,620
is what a graph-powered ecosystem enables.

1705
01:02:06,620 --> 01:02:09,020
In the old model, you navigate, you search.

1706
01:02:09,020 --> 01:02:11,120
You remember file names and folder paths.

1707
01:02:11,120 --> 01:02:13,920
In the new model, the system understands your context.

1708
01:02:13,920 --> 01:02:17,020
It knows your team, your projects, your recent activity, and your role.

1709
01:02:17,020 --> 01:02:19,120
It surfaces content that's relevant to you

1710
01:02:19,120 --> 01:02:21,520
without requiring you to formulate the right query.

1711
01:02:21,520 --> 01:02:23,220
This isn't search optimization.

1712
01:02:23,220 --> 01:02:26,920
It is a fundamental change in how knowledge flows through an organization.

1713
01:02:26,920 --> 01:02:29,920
Traditional keyword search fails because it puts the burden on the user

1714
01:02:29,920 --> 01:02:32,820
to know what they're looking for and how to describe it.

1715
01:02:32,820 --> 01:02:35,020
Graph-powered context awareness removes that burden

1716
01:02:35,020 --> 01:02:37,620
by letting the system infer relevance from relationships.

1717
01:02:37,620 --> 01:02:40,620
A document is relevant to you, not because it contains the right words,

1718
01:02:40,620 --> 01:02:43,720
but because it's connected to your projects, your colleagues,

1719
01:02:43,720 --> 01:02:44,820
and your current work.

1720
01:02:44,820 --> 01:02:47,820
The graph makes those connections visible and queryable.

1721
01:02:47,820 --> 01:02:49,620
The comparison between old and new search

1722
01:02:49,620 --> 01:02:51,220
isn't just about speed or accuracy.

1723
01:02:51,220 --> 01:02:53,920
It is about findability versus discoverability.

1724
01:02:53,920 --> 01:02:56,120
Search finds what you already know exists.

1725
01:02:56,120 --> 01:02:59,020
Discovery surfaces what you didn't know to look for.

1726
01:02:59,020 --> 01:03:01,220
A well-governed knowledge graph enables both.

1727
01:03:01,220 --> 01:03:04,620
When metadata is complete, you can find specific documents instantly.

1728
01:03:04,620 --> 01:03:06,820
But you can also discover related content,

1729
01:03:06,820 --> 01:03:09,620
past decisions, and relevant expertise

1730
01:03:09,620 --> 01:03:12,120
that would have remained hidden in the manual tagging model.

1731
01:03:12,120 --> 01:03:13,720
Consider a practical example.

1732
01:03:13,720 --> 01:03:16,220
A project manager is preparing for a quarterly review

1733
01:03:16,220 --> 01:03:17,720
and needs to understand the evolution

1734
01:03:17,720 --> 01:03:20,320
of a product decision made six months ago.

1735
01:03:20,320 --> 01:03:21,720
In the manual tagging model,

1736
01:03:21,720 --> 01:03:24,120
she searches for keywords like product decision

1737
01:03:24,120 --> 01:03:26,320
and quarterly review and hopes that someone tagged

1738
01:03:26,320 --> 01:03:28,620
the relevant documents with the right project code.

1739
01:03:28,620 --> 01:03:31,720
She finds three documents, misses the critical email thread

1740
01:03:31,720 --> 01:03:33,520
where the decision was actually finalized

1741
01:03:33,520 --> 01:03:35,720
and spends an hour reconstructing a partial story

1742
01:03:35,720 --> 01:03:37,120
from fragmented results.

1743
01:03:37,120 --> 01:03:38,420
In the graph-powered model,

1744
01:03:38,420 --> 01:03:39,920
she navigates through relationships

1745
01:03:39,920 --> 01:03:41,220
rather than keywords.

1746
01:03:41,220 --> 01:03:43,820
She finds the project node for the product initiative.

1747
01:03:43,820 --> 01:03:45,620
From there, she traverses to all documents

1748
01:03:45,620 --> 01:03:46,920
associated with that project,

1749
01:03:46,920 --> 01:03:49,320
filtered by the decision-making time period.

1750
01:03:49,320 --> 01:03:50,720
She sees the initial proposal,

1751
01:03:50,720 --> 01:03:52,720
the budget approval, the stakeholder feedback,

1752
01:03:52,720 --> 01:03:54,220
and the final sign-off email,

1753
01:03:54,220 --> 01:03:56,720
all connected through shared project metadata.

1754
01:03:56,720 --> 01:03:59,020
She also discovers a related risk assessment document

1755
01:03:59,020 --> 01:04:00,620
that she didn't know existed, surfaced

1756
01:04:00,620 --> 01:04:02,820
because it shares the same project context.

1757
01:04:02,820 --> 01:04:04,820
What took an hour of frustrated searching

1758
01:04:04,820 --> 01:04:07,420
now takes two minutes of intuitive exploration.

1759
01:04:07,420 --> 01:04:09,220
This is the difference between a search index

1760
01:04:09,220 --> 01:04:10,320
and a knowledge graph.

1761
01:04:10,320 --> 01:04:12,220
A search index is a flat list of documents

1762
01:04:12,220 --> 01:04:13,920
ranked by keyword relevance.

1763
01:04:13,920 --> 01:04:16,120
A knowledge graph is a network of connected entities

1764
01:04:16,120 --> 01:04:18,920
that can be traversed, filtered, and reasoned about.
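To make the contrast concrete, here is a minimal sketch of the traversal from the project manager example. The `KnowledgeGraph` class and its property names are hypothetical stand-ins, not the Microsoft Graph API.

```python
from collections import defaultdict
from datetime import date

class KnowledgeGraph:
    def __init__(self):
        self.edges = defaultdict(set)   # node id -> connected node ids
        self.props = {}                 # node id -> property dict

    def add_node(self, node_id, **props):
        self.props[node_id] = props

    def link(self, a, b):
        # Relationships are undirected in this sketch.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def documents_for(self, project_id, start, end):
        """Return documents linked to a project, dated within [start, end]."""
        found = []
        for node_id in self.edges[project_id]:
            p = self.props[node_id]
            if p.get("kind") == "document" and start <= p["dated"] <= end:
                found.append(node_id)
        return sorted(found, key=lambda n: self.props[n]["dated"])
```

The query starts from the project node and filters by date, so a document turns up because of what it is connected to, not because of the words in it.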

1765
01:04:18,920 --> 01:04:21,720
The search index depends on users knowing what to ask.

1766
01:04:21,720 --> 01:04:23,420
The knowledge graph helps users understand

1767
01:04:23,420 --> 01:04:25,420
what they should be asking. Both have value,

1768
01:04:25,420 --> 01:04:27,820
but in an enterprise where most documents are untagged,

1769
01:04:27,820 --> 01:04:29,220
the search index is crippled

1770
01:04:29,220 --> 01:04:32,120
and the knowledge graph is the only viable path forward.

1771
01:04:32,120 --> 01:04:35,120
The future of knowledge work depends on this transition.

1772
01:04:35,120 --> 01:04:38,020
As organizations generate more content across more platforms,

1773
01:04:38,020 --> 01:04:40,220
the human capacity to organize and find information

1774
01:04:40,220 --> 01:04:41,320
reaches its limits.

1775
01:04:41,320 --> 01:04:43,220
We can't train our way out of this constraint.

1776
01:04:43,220 --> 01:04:44,720
We can't hire our way out of it.

1777
01:04:44,720 --> 01:04:47,720
The only sustainable path is to shift the organizational burden

1778
01:04:47,720 --> 01:04:50,320
from human memory to machine-readable structure.

1779
01:04:50,320 --> 01:04:53,120
Automated metadata isn't just a governance improvement.

1780
01:04:53,120 --> 01:04:56,020
It is a prerequisite for the next generation of knowledge work.

1781
01:04:56,020 --> 01:04:59,220
The evolution of Microsoft Search reflects this shift.

1782
01:04:59,220 --> 01:05:02,520
Early versions relied heavily on manual metadata and keyword matching.

1783
01:05:02,520 --> 01:05:05,220
Modern versions incorporate signals from graph relationships,

1784
01:05:05,220 --> 01:05:07,820
user behavior, and content context to rank results

1785
01:05:07,820 --> 01:05:09,820
by relevance to the individual searcher.

1786
01:05:09,820 --> 01:05:12,020
But these advanced ranking algorithms only work

1787
01:05:12,020 --> 01:05:14,820
when the underlying metadata is rich enough to support them.

1788
01:05:14,820 --> 01:05:16,120
If a document has no metadata,

1789
01:05:16,120 --> 01:05:18,120
the algorithm has no signals to rank it by.

1790
01:05:18,120 --> 01:05:20,820
It falls to the bottom of results or disappears entirely.

1791
01:05:20,820 --> 01:05:23,120
The most sophisticated search engine in the world

1792
01:05:23,120 --> 01:05:25,920
can't find a document that has no describable properties.

1793
01:05:25,920 --> 01:05:28,620
The final piece of this payoff is AI readiness.

1794
01:05:28,620 --> 01:05:31,420
Organizations are deploying AI agents across their estates

1795
01:05:31,420 --> 01:05:32,820
at an accelerating pace.

1796
01:05:32,820 --> 01:05:35,720
These agents need clean, structured, accurate metadata to function.

1797
01:05:35,720 --> 01:05:37,720
They need to know what content is authoritative,

1798
01:05:37,720 --> 01:05:40,720
what is draft, what is sensitive, and what is obsolete.

1799
01:05:40,720 --> 01:05:42,820
They need to understand organizational structure,

1800
01:05:42,820 --> 01:05:44,820
project timelines, and access patterns.

1801
01:05:44,820 --> 01:05:46,520
All of this information lives in Graph.

1802
01:05:46,520 --> 01:05:49,620
The only question is whether your governance architecture makes it available.

1803
01:05:49,620 --> 01:05:52,820
Copilot for Microsoft 365 illustrates why this matters.

1804
01:05:52,820 --> 01:05:55,620
When a user asks Copilot to summarize project status,

1805
01:05:55,620 --> 01:05:58,320
Copilot queries the graph for project-related documents,

1806
01:05:58,320 --> 01:06:00,420
meetings, emails, and chat threads.

1807
01:06:00,420 --> 01:06:02,020
If the documents are untagged,

1808
01:06:02,020 --> 01:06:05,820
Copilot must infer relevance from file names and text content alone.

1809
01:06:05,820 --> 01:06:07,520
It might miss the critical budget revision

1810
01:06:07,520 --> 01:06:09,620
because it was saved as numbers_final_v3.xlsx

1811
01:06:09,620 --> 01:06:11,820
with no project metadata.

1812
01:06:11,820 --> 01:06:13,520
It might include an irrelevant file

1813
01:06:13,520 --> 01:06:17,020
because it happens to contain the word "budget" in a different context.

1814
01:06:17,020 --> 01:06:19,920
The user gets a partially wrong summary and loses trust in the AI.

1815
01:06:19,920 --> 01:06:23,120
With automated metadata, Copilot queries become precise.

1816
01:06:23,120 --> 01:06:25,920
It can request documents where project equals Q3 campaign,

1817
01:06:25,920 --> 01:06:28,920
content type equals budget, and status equals approved.

1818
01:06:28,920 --> 01:06:30,420
It can exclude draft documents,

1819
01:06:30,420 --> 01:06:32,620
include only files from the current project phase,

1820
01:06:32,620 --> 01:06:34,520
and surface content from team members,

1821
01:06:34,520 --> 01:06:38,220
rather than from people who happen to mention the project in unrelated emails.

1822
01:06:38,220 --> 01:06:41,320
The summary is accurate because the underlying metadata is accurate.
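As an illustration only, the kind of filter described here looks like this when metadata is present. The `DocMeta` type, the file names, and the field values are made up for the sketch; this is not how Copilot is implemented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DocMeta:
    name: str
    project: str
    content_type: str
    status: str

def select(docs, **criteria):
    """Keep documents whose metadata equals every given criterion."""
    return [d for d in docs if all(getattr(d, k) == v for k, v in criteria.items())]

docs = [
    DocMeta("q3_budget_v3.xlsx", "Q3 campaign", "budget", "approved"),
    DocMeta("budget_draft.xlsx", "Q3 campaign", "budget", "draft"),
    DocMeta("lunch_budget.docx", "office", "memo", "approved"),
]
approved = select(docs, project="Q3 campaign", content_type="budget", status="approved")
```

Only the approved Q3 budget survives. The draft and the unrelated file that merely mentions a budget are excluded by metadata, not by guessing from text.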

1823
01:06:41,320 --> 01:06:45,120
The user trusts the AI because the data layer beneath it is trustworthy.

1824
01:06:45,120 --> 01:06:47,220
This isn't a Copilot-specific problem.

1825
01:06:47,220 --> 01:06:51,520
Every AI agent that interacts with enterprise content faces the same challenge.

1826
01:06:51,520 --> 01:06:53,520
Retrieval-augmented generation systems,

1827
01:06:53,520 --> 01:06:55,820
which combine language models with enterprise search,

1828
01:06:55,820 --> 01:06:57,420
depend on retrieval quality.

1829
01:06:57,420 --> 01:07:00,220
And retrieval quality depends on metadata quality.

1830
01:07:00,220 --> 01:07:03,320
A retrieval system that searches untagged documents is guessing.

1831
01:07:03,320 --> 01:07:07,420
A retrieval system that searches a fully governed knowledge graph is knowing.

1832
01:07:07,420 --> 01:07:10,220
The difference between guessing and knowing is the difference between an AI

1833
01:07:10,220 --> 01:07:13,620
that occasionally hallucinates and an AI that consistently delivers.

1834
01:07:13,620 --> 01:07:17,220
The knowledge graph concept is what separates old search from new discovery.

1835
01:07:17,220 --> 01:07:20,520
In the old model, search indexes documents as isolated objects

1836
01:07:20,520 --> 01:07:22,520
with whatever metadata was attached.

1837
01:07:22,520 --> 01:07:27,620
In the new model, the graph indexes documents as nodes in a network of relationships.

1838
01:07:27,620 --> 01:07:31,020
A budget spreadsheet isn't just a file with the word budget in it.

1839
01:07:31,020 --> 01:07:34,520
It is a node connected to a project, a department, a time period,

1840
01:07:34,520 --> 01:07:38,020
an approval workflow, and a set of people with permission to view it.

1841
01:07:38,020 --> 01:07:41,520
These relationships are the context that makes the document meaningful,

1842
01:07:41,520 --> 01:07:45,620
and they're the context that AI agents need to reason accurately about enterprise content.

1843
01:07:45,620 --> 01:07:49,220
Organizations that build this infrastructure now will have a compounding advantage.

1844
01:07:49,220 --> 01:07:53,520
Their AI agents will become more accurate over time as the knowledge graph grows richer.

1845
01:07:53,520 --> 01:07:56,520
Their search will become more precise as metadata coverage expands.

1846
01:07:56,520 --> 01:08:00,720
Their compliance posture will become more defensible as audit trails accumulate.

1847
01:08:00,720 --> 01:08:04,620
And their governance teams will shift from reactive cleanup to proactive design.

1848
01:08:04,620 --> 01:08:07,220
The investment in automated metadata isn't a cost center.

1849
01:08:07,220 --> 01:08:11,120
It is the foundation that makes every other data-driven initiative more effective.

1850
01:08:11,120 --> 01:08:13,720
The competitive dimension is worth stating explicitly.

1851
01:08:13,720 --> 01:08:17,520
In the next three to five years, enterprise AI readiness will separate organizations

1852
01:08:17,520 --> 01:08:21,520
that can leverage intelligent agents from organizations that are stuck with basic search.

1853
01:08:21,520 --> 01:08:25,520
The difference between those two categories isn't budget or technology or talent.

1854
01:08:25,520 --> 01:08:26,920
It is metadata quality.

1855
01:08:26,920 --> 01:08:31,220
An organization with a clean, complete, automated metadata layer can deploy AI agents

1856
01:08:31,220 --> 01:08:34,820
that reason accurately about its content, recommend relevant documents,

1857
01:08:34,820 --> 01:08:38,220
summarize project status, and detect compliance risks.

1858
01:08:38,220 --> 01:08:44,320
An organization with sparse, inconsistent, manual metadata can't deploy those agents reliably

1859
01:08:44,320 --> 01:08:47,020
regardless of how much it spends on AI licenses.

1860
01:08:47,020 --> 01:08:48,320
This is why the timing matters.

1861
01:08:48,320 --> 01:08:52,820
The organizations that automate governance now aren't just solving today's metadata problem.

1862
01:08:52,820 --> 01:08:56,720
They are building the data foundation that every future AI initiative will depend on.

1863
01:08:56,720 --> 01:09:00,120
The organizations that delay aren't just maintaining manual tagging.

1864
01:09:00,120 --> 01:09:03,520
They are accumulating technical debt that will make AI deployment harder,

1865
01:09:03,520 --> 01:09:07,120
more expensive and less effective when they eventually decide to pursue it.

1866
01:09:07,120 --> 01:09:11,220
The gap between prepared and unprepared organizations isn't a temporary advantage.

1867
01:09:11,220 --> 01:09:14,520
It is a structural advantage that compounds with every new file,

1868
01:09:14,520 --> 01:09:17,720
every new project, and every new AI capability that arrives.

1869
01:09:17,720 --> 01:09:21,320
So if this is where things are heading, how do you actually get there from where you are now?

1870
01:09:21,320 --> 01:09:22,720
Building your roadmap.

1871
01:09:22,720 --> 01:09:26,320
Transitioning from manual tagging to automated governance isn't a single project.

1872
01:09:26,320 --> 01:09:29,520
It is a phased evolution that requires planning, piloting, and scaling.

1873
01:09:29,520 --> 01:09:32,920
Organizations that try to automate everything at once usually fail.

1874
01:09:32,920 --> 01:09:34,520
The taxonomy is too complex.

1875
01:09:34,520 --> 01:09:37,420
The change management is too heavy. The exceptions are too numerous.

1876
01:09:37,420 --> 01:09:40,120
A phased approach reduces risk and builds confidence.

1877
01:09:40,120 --> 01:09:41,320
Phase one is audit.

1878
01:09:41,320 --> 01:09:45,020
Before you automate anything, you need to understand what you have and where the gaps are.

1879
01:09:45,020 --> 01:09:48,420
Run a scan of your content estate to identify where metadata is missing,

1880
01:09:48,420 --> 01:09:51,420
where search is failing, and where compliance exposure is highest.

1881
01:09:51,420 --> 01:09:52,220
Look for patterns.

1882
01:09:52,220 --> 01:09:54,920
Are certain teams or content types consistently untagged?

1883
01:09:54,920 --> 01:09:58,120
Are there libraries where metadata is complete and others where it's empty?

1884
01:09:58,120 --> 01:10:01,320
Are sensitivity labels applied unevenly across departments?

1885
01:10:01,320 --> 01:10:03,620
This baseline becomes your measurement framework.

1886
01:10:03,620 --> 01:10:06,320
You will use it to prove that automation improved things.
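A baseline like this can be computed from an export of library items. The sketch below assumes each item is a dictionary with a `library` key, and the required field names are placeholders for whatever your taxonomy ends up using.

```python
REQUIRED = ("project", "department", "content_type", "sensitivity", "retention")

def completeness(items, required=REQUIRED):
    """Fraction of items that have a non-empty value for every required field."""
    if not items:
        return 0.0
    complete = sum(1 for item in items if all(item.get(f) for f in required))
    return complete / len(items)

def baseline_by_library(items):
    """Group items by their 'library' key and report completeness for each."""
    groups = {}
    for item in items:
        groups.setdefault(item["library"], []).append(item)
    return {lib: completeness(rows) for lib, rows in groups.items()}
```

Running the same function before and after the pilot gives you a like-for-like comparison per library.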

1887
01:10:06,320 --> 01:10:08,120
Phase two is taxonomy design.

1888
01:10:08,120 --> 01:10:10,720
Define what matters, not everything possible.

1889
01:10:10,720 --> 01:10:14,320
A common mistake is overengineering the taxonomy with dozens of fields,

1890
01:10:14,320 --> 01:10:17,420
nested categories, and complex rules that nobody can maintain.

1891
01:10:17,420 --> 01:10:21,220
Start with five to seven core properties that drive real governance decisions.

1892
01:10:21,220 --> 01:10:27,220
Project association, department, content type, sensitivity, retention.

1893
01:10:27,220 --> 01:10:31,220
These five properties cover most compliance, search, and policy use cases.

1894
01:10:31,220 --> 01:10:35,420
You can add more later, but a simple taxonomy that's fully automated is more valuable

1895
01:10:35,420 --> 01:10:38,020
than a complex taxonomy that's partially implemented.
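One way to keep the taxonomy this small is to pin it down as a typed record. The class names, the three sensitivity levels, and the choice to express retention in years are assumptions made for this sketch.

```python
from dataclasses import dataclass, asdict
from enum import Enum

class Sensitivity(Enum):
    PUBLIC = "Public"
    INTERNAL = "Internal"
    CONFIDENTIAL = "Confidential"

@dataclass(frozen=True)
class CoreMetadata:
    project: str
    department: str
    content_type: str
    sensitivity: Sensitivity
    retention_years: int

    def to_fields(self) -> dict:
        """Flatten to plain values suitable for writing to list columns."""
        fields = asdict(self)
        fields["sensitivity"] = self.sensitivity.value
        return fields
```

Because the record has exactly five fields, adding a sixth becomes a deliberate code change rather than something that creeps in.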

1896
01:10:38,020 --> 01:10:39,220
Phase three is pilot.

1897
01:10:39,220 --> 01:10:40,920
Pick one content type in one team.

1898
01:10:40,920 --> 01:10:43,320
Maybe it's project documents in the engineering department.

1899
01:10:43,320 --> 01:10:44,820
Maybe it's contracts in the legal team.

1900
01:10:44,820 --> 01:10:47,520
The key is to choose a group that feels the pain of manual tagging

1901
01:10:47,520 --> 01:10:49,320
and will appreciate the automation.

1902
01:10:49,320 --> 01:10:52,820
Build a single Azure Function that intercepts files in their SharePoint library,

1903
01:10:52,820 --> 01:10:56,120
queries Graph for context, and injects the five core properties.
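The write-back step can be sketched as a request against the Microsoft Graph list item fields endpoint. This version only builds the request, so it runs without credentials. The column names you pass in, the IDs, and the token are placeholders you would supply.

```python
import json

GRAPH = "https://graph.microsoft.com/v1.0"

def build_fields_patch(site_id, list_id, item_id, fields, token):
    """Describe the PATCH that writes column values onto a SharePoint list item."""
    return {
        "method": "PATCH",
        "url": f"{GRAPH}/sites/{site_id}/lists/{list_id}/items/{item_id}/fields",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        # Keys must match the internal column names in your library (assumed here).
        "body": json.dumps(fields),
    }
```

Keeping request construction separate from sending makes the function easy to test during the 30-day pilot.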

1904
01:10:56,120 --> 01:10:57,520
Test it for 30 days.

1905
01:10:57,520 --> 01:10:59,820
Measure metadata completeness before and after.

1906
01:10:59,820 --> 01:11:01,120
Measure search accuracy.

1907
01:11:01,120 --> 01:11:02,520
Measure user feedback.

1908
01:11:02,520 --> 01:11:05,720
This pilot becomes your proof of concept and your training ground.

1909
01:11:05,720 --> 01:11:08,420
Pilot success metrics should be specific and measurable.

1910
01:11:08,420 --> 01:11:13,020
Metadata completeness before automation is typically between 10 and 30 percent

1911
01:11:13,020 --> 01:11:14,620
for most enterprise libraries.

1912
01:11:14,620 --> 01:11:18,220
After automation, it should exceed 95 percent within the first week.

1913
01:11:18,220 --> 01:11:22,320
Search accuracy can be measured by asking pilot users to find specific documents

1914
01:11:22,320 --> 01:11:24,520
using search and tracking whether they succeed.

1915
01:11:24,520 --> 01:11:27,020
User feedback should be collected through a short survey

1916
01:11:27,020 --> 01:11:30,120
that asks whether users noticed any change in their workflow,

1917
01:11:30,120 --> 01:11:32,020
whether they trust the automated tags,

1918
01:11:32,020 --> 01:11:35,320
and whether they would support expanding the system to other teams.

1919
01:11:35,320 --> 01:11:38,220
These three metrics, completeness, accuracy, and trust,

1920
01:11:38,220 --> 01:11:41,520
form the measurement framework that will guide your scaling decisions.

1921
01:11:41,520 --> 01:11:42,520
Phase four is scale.

1922
01:11:42,520 --> 01:11:44,320
This is where the pilot becomes production.

1923
01:11:44,320 --> 01:11:48,120
Expand the middleware to cover core business content across the organization.

1924
01:11:48,120 --> 01:11:52,120
Integrate with Microsoft Purview for sensitivity labeling and retention management.

1925
01:11:52,120 --> 01:11:55,820
Add delta query support to keep metadata fresh as files move and change.

1926
01:11:55,820 --> 01:11:59,120
Implement webhook coverage for Teams, OneDrive, and email attachments.

1927
01:11:59,120 --> 01:12:02,820
At this stage, you're building a production governance platform, not a prototype.
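For the delta query piece, Graph delta responses are paged with `@odata.nextLink` and end with an `@odata.deltaLink` that you store for the next sync. A minimal sketch of that loop, where `fetch` is a hypothetical callable that takes a URL and returns the parsed JSON page:

```python
def drain_delta(fetch, start_url):
    """Follow @odata.nextLink pages, then return changes and the deltaLink to save."""
    changes, url = [], start_url
    while True:
        page = fetch(url)
        changes.extend(page.get("value", []))
        if "@odata.nextLink" in page:
            url = page["@odata.nextLink"]
        else:
            # The final page carries the link to resume from on the next run.
            return changes, page.get("@odata.deltaLink")
```

Passing `fetch` in keeps authentication and HTTP details out of the paging logic, so you can test it with canned pages.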

1928
01:12:02,820 --> 01:12:06,720
Scaling requires attention to infrastructure rather than just functionality.

1929
01:12:06,720 --> 01:12:10,520
You need logging, error handling, circuit breakers and monitoring dashboards.

1930
01:12:10,520 --> 01:12:13,520
You need a process for handling exceptions and edge cases

1931
01:12:13,520 --> 01:12:15,420
that the middleware can't resolve.

1932
01:12:15,420 --> 01:12:19,320
You need a support model that defines who responds when the middleware misclassifies a file,

1933
01:12:19,320 --> 01:12:22,720
who reviews the audit logs and who updates the classification rules

1934
01:12:22,720 --> 01:12:24,520
when business requirements change.

1935
01:12:24,520 --> 01:12:28,720
These operational considerations are what separate a successful production deployment

1936
01:12:28,720 --> 01:12:32,420
from a promising pilot that never expands beyond its initial scope.

1937
01:12:32,420 --> 01:12:35,520
Organizational change management becomes critical during scaling.

1938
01:12:35,520 --> 01:12:39,220
Some teams will embrace automation because it removes a burden they never wanted.

1939
01:12:39,220 --> 01:12:42,420
Others will resist because they distrust automated decisions

1940
01:12:42,420 --> 01:12:45,720
or because they have built personal workflows around manual metadata.

1941
01:12:45,720 --> 01:12:50,120
The governance team must communicate clearly that automation handles routine classification

1942
01:12:50,120 --> 01:12:52,620
while humans retain control over exceptions.

1943
01:12:52,620 --> 01:12:56,820
They must demonstrate that the system is transparent, auditable and adjustable.

1944
01:12:56,820 --> 01:13:00,720
And they must provide a simple escalation path for files that were misclassified

1945
01:13:00,720 --> 01:13:04,320
so users feel heard rather than overridden by a black box.

1954
01:13:33,120 --> 01:13:34,520
Phase five is optimize.

1955
01:13:34,520 --> 01:13:38,520
Use analytics from the middleware to identify where classification accuracy is low

1956
01:13:38,520 --> 01:13:39,920
and where models need retraining.

1957
01:13:39,920 --> 01:13:42,520
Refine the taxonomy based on real usage patterns.

1958
01:13:42,520 --> 01:13:45,620
Tune the Graph queries to capture additional context signals.

1959
01:13:45,620 --> 01:13:50,720
Implement throttling best practices to ensure the middleware doesn't exceed Graph API limits.

1960
01:13:50,720 --> 01:13:54,120
Add circuit breakers and retry logic to handle service outages gracefully.
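Here is a small sketch of the retry half of that advice. It covers retries only, not a circuit breaker. Graph signals throttling with HTTP 429 and a Retry-After header, and the `Throttled` exception is a hypothetical wrapper your HTTP layer would raise for that case.

```python
import time

class Throttled(Exception):
    """Raised by the caller's HTTP layer on a 429; carries Retry-After if present."""
    def __init__(self, retry_after=None):
        super().__init__("throttled")
        self.retry_after = retry_after

def call_with_retry(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry a throttled call, preferring the server's Retry-After hint."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Throttled as exc:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** attempt  # exponential backoff
            sleep(delay)
```

The `sleep` parameter exists so tests can record the waits instead of actually pausing.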

1961
01:13:54,120 --> 01:13:55,420
This phase never really ends.

1962
01:13:55,420 --> 01:13:58,520
Governance is a living system that requires continuous refinement.

1963
01:13:58,520 --> 01:14:02,320
Optimization metrics should focus on accuracy, coverage and latency.

1964
01:14:02,320 --> 01:14:07,920
Classification accuracy measures how often the middleware assigns the correct metadata on the first attempt.

1965
01:14:07,920 --> 01:14:11,620
Human reviewers should periodically sample automated classifications

1966
01:14:11,620 --> 01:14:14,020
and score them against a ground truth data set.

1967
01:14:14,020 --> 01:14:19,920
Coverage measures what percentage of eligible files receive automated metadata within a defined time window after creation.

1968
01:14:19,920 --> 01:14:22,920
It should exceed 95% for most content types.

1969
01:14:22,920 --> 01:14:27,320
Latency measures how long the middleware takes to process an event from trigger to completion.

1970
01:14:27,320 --> 01:14:33,020
It should remain under 5 seconds for user-facing workflows and under 30 seconds for background batch processing.
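These three metrics are simple to compute once the middleware logs the right events. In this sketch the input shapes and the five-minute coverage window are assumptions. The 95th percentile uses the nearest-rank method.

```python
from datetime import datetime, timedelta

def accuracy(samples):
    """samples: (predicted, ground_truth) pairs from a human-reviewed sample."""
    return sum(p == t for p, t in samples) / len(samples)

def coverage(events, window=timedelta(minutes=5)):
    """events: (created_at, tagged_at or None); share tagged within the window."""
    on_time = sum(1 for c, t in events if t is not None and t - c <= window)
    return on_time / len(events)

def p95_latency(seconds):
    """Nearest-rank 95th percentile of processing times."""
    ordered = sorted(seconds)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = max(1, -(-95 * len(ordered) // 100))
    return ordered[rank - 1]
```

Tracking a percentile rather than the average keeps a few slow events from hiding behind many fast ones.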

1971
01:14:33,020 --> 01:14:36,120
Model drift is a real concern that governance teams must monitor.

1972
01:14:36,120 --> 01:14:39,220
The classification models and rules that work today may not work next year

1973
01:14:39,220 --> 01:14:44,020
because the organization's structure, terminology, and content patterns evolve.

1974
01:14:44,020 --> 01:14:47,720
A model trained on last year's project codes will fail when new codes are introduced.

1975
01:14:47,720 --> 01:14:52,420
A taxonomy designed for last quarter's product lines will misclassify documents from new products.

1976
01:14:52,420 --> 01:14:57,720
Regular retraining, quarterly rule reviews and annual taxonomy audits are essential maintenance tasks.

1977
01:14:57,720 --> 01:15:01,820
They aren't signs that the system is failing. They are signs that the system is adapting.

1978
01:15:01,820 --> 01:15:06,120
The long-term evolution of this architecture points toward increasingly intelligent governance.

1979
01:15:06,120 --> 01:15:08,920
Today's middleware applies rules that humans designed.

1980
01:15:08,920 --> 01:15:13,720
Tomorrow's middleware will learn from human corrections and automatically suggest rule improvements.

1981
01:15:13,720 --> 01:15:17,320
Today's models classify based on patterns in the content and context.

1982
01:15:17,320 --> 01:15:20,820
Tomorrow's models will reason about intent, predicting what a document is for,

1983
01:15:20,820 --> 01:15:22,620
rather than just what it contains.

1984
01:15:22,620 --> 01:15:26,220
The foundation that organizations build now, the metadata layer, the schema extensions,

1985
01:15:26,220 --> 01:15:30,820
the audit logs and the feedback loops, is what will make those future capabilities possible.

1986
01:15:30,820 --> 01:15:34,720
Common pitfalls are worth mentioning because they derail many automation initiatives.

1987
01:15:34,720 --> 01:15:37,820
Overengineering the taxonomy is the most frequent mistake.

1988
01:15:37,820 --> 01:15:41,020
Teams try to replicate every manual field in an automated system

1989
01:15:41,020 --> 01:15:44,420
and end up with complexity that's harder to maintain than the original problem.

1990
01:15:44,420 --> 01:15:46,820
Trying to tag everything at once is another trap.

1991
01:15:46,820 --> 01:15:51,020
Automation should start with high-value content types and expand gradually.

1992
01:15:51,020 --> 01:15:54,420
Neglecting the human in the loop for edge cases is a third pitfall.

1993
01:15:54,420 --> 01:15:56,620
Some files genuinely require human judgment.

1994
01:15:56,620 --> 01:16:01,420
The middleware should recognize when confidence is low and escalate to a reviewer rather than guessing wrong.
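
A minimal sketch of that escalation rule, assuming the classifier returns a confidence score between 0 and 1. The 0.80 threshold and the names `Classification` and `route` are hypothetical and are not taken from the episode or from any Microsoft API.

```python
from dataclasses import dataclass

# Hypothetical cut-off; the episode does not name a specific value.
REVIEW_THRESHOLD = 0.80


@dataclass
class Classification:
    file_id: str
    label: str
    confidence: float  # 0.0 (no confidence) to 1.0 (certain)


def route(result: Classification, threshold: float = REVIEW_THRESHOLD) -> str:
    """Apply the label automatically only when confidence clears the threshold."""
    if result.confidence >= threshold:
        return "auto_apply"
    # Low confidence: hand the file to a reviewer instead of guessing.
    return "human_review"
```

In practice the threshold would likely be tuned per content type, using the reviewers' corrections as feedback.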

1995
01:16:01,420 --> 01:16:05,820
The hybrid reality is that automation handles the baseline while humans handle exceptions.

1996
01:16:05,820 --> 01:16:09,020
This isn't a compromise. It is the optimal division of labor.

1997
01:16:09,020 --> 01:16:12,820
Machines excel at applying consistent rules to large volumes of content.

1998
01:16:12,820 --> 01:16:17,620
Humans excel at interpreting nuance, handling exceptions and making judgment calls.

1999
01:16:17,620 --> 01:16:22,420
A governance system that tries to eliminate human judgment entirely will make expensive mistakes.

2000
01:16:22,420 --> 01:16:26,420
A governance system that relies on humans for routine tagging will never scale.

2001
01:16:26,420 --> 01:16:28,220
The middle ground is where the value lives.

2002
01:16:28,220 --> 01:16:31,020
Microsoft Purview integration becomes critical at scale.

2003
01:16:31,020 --> 01:16:35,020
Purview's pattern detectors, trainable classifiers and custom information types

2004
01:16:35,020 --> 01:16:37,620
create a solid foundation for automated labeling.

2005
01:16:37,620 --> 01:16:40,820
The middleware can call Purview APIs during ingestion to scan content,

2006
01:16:40,820 --> 01:16:43,620
receive classification results and apply labels automatically.

2007
01:16:43,620 --> 01:16:46,620
This connects content detection directly to policy enforcement,

2008
01:16:46,620 --> 01:16:49,620
extending Purview's governance reach to every file in the estate,

2009
01:16:49,620 --> 01:16:52,820
regardless of whether a user ever touched a dropdown.

2010
01:16:52,820 --> 01:16:55,820
Throttling and resilience are operational concerns that become important

2011
01:16:55,820 --> 01:16:58,620
once the middleware is handling thousands of events per day.

2012
01:16:58,620 --> 01:17:02,420
The Graph API imposes rate limits that vary by workload and tenant size.

2013
01:17:02,420 --> 01:17:05,620
The middleware must implement retry logic with exponential back-off,

2014
01:17:05,620 --> 01:17:10,620
respect Retry-After headers and use delta queries to minimize unnecessary API calls.
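
The retry behaviour described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the Graph SDK's own retry handler: `send` is a hypothetical callable that returns a status code and a headers mapping, and the base delay, cap, retryable status codes and attempt count are placeholder values.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)

# Assumed set of statuses worth retrying: throttling and transient outages.
RETRYABLE = {429, 503, 504}


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0,
                  jitter: float = 0.0) -> float:
    """Seconds to wait before the next try (attempt is 0-based)."""
    if retry_after is not None:
        try:
            # The server's own hint takes priority over our schedule.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form of Retry-After is not handled in this sketch
    # Exponential growth, capped, with optional random jitter on top.
    return min(cap, base * (2 ** attempt)) + random.uniform(0, jitter)


def call_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until it succeeds, fails permanently, or attempts run out."""
    attempt = 0
    while True:
        status, headers = send()
        attempt += 1
        if status not in RETRYABLE or attempt >= max_attempts:
            return status, headers
        # Header lookup is case-sensitive here; real HTTP clients normalise it.
        sleep(backoff_delay(attempt - 1, headers.get("Retry-After")))
```

Passing `sleep` as a parameter keeps the function testable without real waiting.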

2015
01:17:10,620 --> 01:17:14,820
Circuit breakers should pause ingestion if repeated failures suggest a service issue,

2016
01:17:14,820 --> 01:17:17,220
preventing a cascade of errors and backpressure.

2017
01:17:17,220 --> 01:17:21,020
A circuit breaker works by monitoring the failure rate of Graph API calls

2018
01:17:21,020 --> 01:17:22,220
over a sliding window.

2019
01:17:22,220 --> 01:17:24,620
If the failure rate exceeds a defined threshold,

2020
01:17:24,620 --> 01:17:27,620
such as 50% of calls failing within a 60-second window,

2021
01:17:27,620 --> 01:17:30,220
the breaker opens and stops sending new requests.

2022
01:17:30,220 --> 01:17:34,820
Instead, it returns a cached response or queues the event for retry once the breaker closes.

2023
01:17:34,820 --> 01:17:37,420
This prevents the middleware from hammering an already struggling service

2024
01:17:37,420 --> 01:17:38,820
and gives Graph time to recover.
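
A simplified sketch of such a breaker, using the 50% threshold and 60-second window mentioned above. The class name, the 30-second cooldown and the minimum call count are assumptions added for illustration, and the half-open probing state that production breakers usually have is left out.

```python
import time
from collections import deque


class CircuitBreaker:
    """Opens when the failure rate over a sliding time window exceeds a threshold."""

    def __init__(self, threshold=0.5, window=60.0, cooldown=30.0,
                 min_calls=10, clock=time.monotonic):
        self.threshold = threshold  # failure rate that trips the breaker
        self.window = window        # sliding window length in seconds
        self.cooldown = cooldown    # assumed: how long the breaker stays open
        self.min_calls = min_calls  # assumed: avoid tripping on tiny samples
        self.clock = clock
        self._calls = deque()       # (timestamp, succeeded) pairs
        self._opened_at = None

    def allow(self) -> bool:
        """Return True if a new request may be sent right now."""
        if self._opened_at is None:
            return True
        if self.clock() - self._opened_at >= self.cooldown:
            # Cooldown elapsed: close the breaker and start a fresh window.
            self._opened_at = None
            self._calls.clear()
            return True
        return False

    def record(self, succeeded: bool) -> None:
        """Record the outcome of a call and open the breaker if needed."""
        now = self.clock()
        self._calls.append((now, succeeded))
        # Drop outcomes that have fallen out of the sliding window.
        while self._calls and now - self._calls[0][0] > self.window:
            self._calls.popleft()
        failures = sum(1 for _, ok in self._calls if not ok)
        if (len(self._calls) >= self.min_calls
                and failures / len(self._calls) > self.threshold):
            self._opened_at = now
```

While `allow()` returns False, the caller would queue the event or serve a cached response rather than calling Graph.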

2025
01:17:38,820 --> 01:17:42,020
These patterns are well documented in the Graph SDK and should be treated

2026
01:17:42,020 --> 01:17:44,620
as standard infrastructure, not optional extras.

2027
01:17:44,620 --> 01:17:48,020
Security and permissions deserve attention during the design phase.

2028
01:17:48,020 --> 01:17:51,620
The middleware needs application permissions to read and write metadata

2029
01:17:51,620 --> 01:17:54,620
across the estate, which means it operates with significant authority.

2030
01:17:54,620 --> 01:17:58,420
The principle of least privilege applies here just as it does for human administrators.

2031
01:17:58,420 --> 01:18:01,420
The middleware should only request the specific permissions it needs:

2032
01:18:01,420 --> 01:18:05,220
read user profiles and group memberships, read and write file metadata,

2033
01:18:05,220 --> 01:18:07,020
and apply sensitivity labels.

2034
01:18:07,020 --> 01:18:10,820
It shouldn't request mail access, calendar access, or administrative privileges

2035
01:18:10,820 --> 01:18:15,020
unless those are specifically required for the governance rules it enforces.

2036
01:18:15,020 --> 01:18:17,420
Permission scopes should be documented, reviewed regularly,

2037
01:18:17,420 --> 01:18:19,820
and audited just like any other privileged access.

2038
01:18:19,820 --> 01:18:23,620
Logging and observability are non-negotiable for production governance middleware.

2039
01:18:23,620 --> 01:18:26,420
Every classification decision should be logged with the source file,

2040
01:18:26,420 --> 01:18:29,220
the signals used, the rules applied, the confidence score,

2041
01:18:29,220 --> 01:18:31,220
and the final metadata values.
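
One possible shape for such a log record, sketched as a JSON line per decision. The field names follow the list above; the class name, the timestamp field and the JSON-lines format are assumptions, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ClassificationDecision:
    source_file: str      # path or ID of the file that was classified
    signals: dict         # inputs the classifier looked at
    rules_applied: list   # names of the rules that fired
    confidence: float     # score between 0.0 and 1.0
    final_metadata: dict  # values actually written to the file
    # Assumed extra field: when the decision was made, in UTC.
    decided_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(decision: ClassificationDecision) -> str:
    """Serialize one decision as a single JSON line for the audit log."""
    return json.dumps(asdict(decision), sort_keys=True)
```

Writing one self-contained line per decision keeps the audit trail easy to query and to feed into dashboards.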

2042
01:18:31,220 --> 01:18:34,220
These logs become the audit trail that proves compliance to regulators

2043
01:18:34,220 --> 01:18:35,820
and demonstrates value to executives.

2044
01:18:35,820 --> 01:18:39,620
Dashboards should show daily throughput, classification accuracy trends,

2045
01:18:39,620 --> 01:18:41,420
exception counts, and error rates.

2046
01:18:41,420 --> 01:18:44,620
Alerts should fire when classification accuracy drops below a threshold,

2047
01:18:44,620 --> 01:18:47,620
when error rates spike, or when the middleware encounters a content type

2048
01:18:47,620 --> 01:18:49,220
it has never seen before.

2049
01:18:49,220 --> 01:18:52,420
Governance without observability is governance without accountability.

2050
01:18:52,420 --> 01:18:56,020
Change management is often underestimated in governance automation projects.

2051
01:18:56,020 --> 01:18:58,820
Users who have spent years ignoring dropdowns may not notice

2052
01:18:58,820 --> 01:19:01,220
when the middleware starts tagging files automatically,

2053
01:19:01,220 --> 01:19:04,020
but power users who build workflows around manual metadata

2054
01:19:04,020 --> 01:19:05,420
will notice immediately.

2055
01:19:05,420 --> 01:19:08,820
They may have SharePoint views filtered by specific metadata values,

2056
01:19:08,820 --> 01:19:11,620
Power Automate flows triggered by manual tag changes,

2057
01:19:11,620 --> 01:19:14,620
or reporting dashboards that depend on user-entered categories.

2058
01:19:14,620 --> 01:19:18,220
Any of these can break when the middleware introduces new metadata sources

2059
01:19:18,220 --> 01:19:21,420
or changes existing values. A communication plan that explains

2060
01:19:21,420 --> 01:19:23,020
what is changing, why it's changing,

2061
01:19:23,020 --> 01:19:27,020
and how existing workflows will be preserved is essential for maintaining trust.

2062
01:19:27,020 --> 01:19:30,220
The roadmap isn't about buying a product. It is about shifting a mindset.

2063
01:19:30,220 --> 01:19:33,020
The old mindset says governance is a user responsibility.

2064
01:19:33,020 --> 01:19:35,620
Train them, remind them, enforce compliance.

2065
01:19:35,620 --> 01:19:38,820
The new mindset says governance is an architectural responsibility.

2066
01:19:38,820 --> 01:19:41,820
Build it into the system, let the middleware handle the routine,

2067
01:19:41,820 --> 01:19:43,420
let humans handle the exceptions,

2068
01:19:43,420 --> 01:19:46,820
and measure success by how little users have to think about metadata

2069
01:19:46,820 --> 01:19:48,820
while their content remains fully governed.

2070
01:19:48,820 --> 01:19:50,820
Measuring success requires new metrics

2071
01:19:50,820 --> 01:19:53,020
because the old metrics no longer apply.

2072
01:19:53,020 --> 01:19:56,220
Adoption rates become irrelevant when users aren't asked to adopt anything.

2073
01:19:56,220 --> 01:20:00,020
Instead, measure metadata completeness as a percentage of eligible files

2074
01:20:00,020 --> 01:20:03,420
that received automated classification within 24 hours of creation.

2075
01:20:03,420 --> 01:20:07,820
Measure classification accuracy through periodic human review of random samples.

2076
01:20:07,820 --> 01:20:12,420
Measure search effectiveness by tracking user search success rates and time to result.

2077
01:20:12,420 --> 01:20:15,020
Measure compliance posture by the percentage of sensitive files

2078
01:20:15,020 --> 01:20:18,620
that carry correct labels and the time required to respond to audit requests.
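
As an illustration, the metadata completeness metric could be computed along these lines. The function name and the input shape (pairs of creation time and classification time, with None for unclassified files) are assumptions made for this sketch.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def metadata_completeness(
    files: Iterable[Tuple[datetime, Optional[datetime]]],
    sla: timedelta = timedelta(hours=24),
) -> float:
    """Percentage of eligible files classified within the SLA after creation."""
    total = 0
    on_time = 0
    for created, classified in files:
        total += 1
        # Files never classified, or classified too late, do not count.
        if classified is not None and classified - created <= sla:
            on_time += 1
    return 100.0 * on_time / total if total else 0.0
```

The same pattern extends to the other metrics: define the eligible population first, then count the files that meet the target.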

2079
01:20:18,620 --> 01:20:21,620
These metrics describe outcomes rather than activities

2080
01:20:21,620 --> 01:20:25,420
and they align governance measurement with business value rather than with process compliance.

2081
01:20:25,420 --> 01:20:29,020
The ultimate measure of success is whether governance becomes invisible.

2082
01:20:29,020 --> 01:20:31,220
When users no longer think about metadata

2083
01:20:31,220 --> 01:20:33,220
because the system handles it automatically,

2084
01:20:33,220 --> 01:20:35,820
when compliance officers no longer chase missing labels

2085
01:20:35,820 --> 01:20:37,420
because the coverage is complete,

2086
01:20:37,420 --> 01:20:40,420
when search just works because the underlying data is clean,

2087
01:20:40,420 --> 01:20:42,620
then governance has achieved its purpose.

2088
01:20:42,620 --> 01:20:45,020
It has moved from being an obstacle to being an enabler

2089
01:20:45,020 --> 01:20:47,220
and that shift from friction to foundation

2090
01:20:47,220 --> 01:20:49,420
is what makes the entire investment worthwhile.

2091
01:20:49,420 --> 01:20:52,220
The dropdown isn't coming back, and that's a good thing.

2092
01:20:52,220 --> 01:20:54,620
Manual tagging was never a governance strategy.

2093
01:20:54,620 --> 01:20:58,220
It was a hope that users would do unpaid work with no visible payoff

2094
01:20:58,220 --> 01:21:01,020
and that hope is collapsing under the weight of hybrid work,

2095
01:21:01,020 --> 01:21:03,020
regulatory pressure and AI readiness.

2096
01:21:03,020 --> 01:21:05,820
The alternative isn't more training or better dropdowns.

2097
01:21:05,820 --> 01:21:10,020
It is moving governance logic out of the interface and into the architecture.

2098
01:21:10,020 --> 01:21:13,220
When classification happens automatically through Graph API middleware,

2099
01:21:13,220 --> 01:21:16,020
compliance becomes consistent, search becomes accurate

2100
01:21:16,020 --> 01:21:19,620
and your data becomes ready for the AI agents that are already on their way.

2101
01:21:19,620 --> 01:21:21,820
If this changed how you think about metadata governance,

2102
01:21:21,820 --> 01:21:24,020
follow me, Mirko Peters, on LinkedIn.

2103
01:21:24,020 --> 01:21:27,620
I post regularly about M365 architecture, Power Platform strategy

2104
01:21:27,620 --> 01:21:29,820
and the governance patterns that actually scale.

2105
01:21:29,820 --> 01:21:33,020
The next video dives deeper into the middleware implementation

2106
01:21:33,020 --> 01:21:36,220
with specific code patterns and Azure Functions configurations.

2107
01:21:36,220 --> 01:21:40,020
Share this with your team, especially if you're dealing with metadata chaos right now.

2108
01:21:40,020 --> 01:21:42,620
The organizations that figure this out first will have an advantage

2109
01:21:42,620 --> 01:21:45,220
that compounds over time because the real question isn't

2110
01:21:45,220 --> 01:21:47,220
whether you can afford to automate governance.

2111
01:21:47,220 --> 01:21:49,620
The real question is whether you can afford not to.

2112
01:21:49,620 --> 01:21:52,020
The organizations that answer this question correctly

2113
01:21:52,020 --> 01:21:55,620
will build knowledge graphs that power the next decade of enterprise intelligence.

2114
01:21:55,620 --> 01:21:59,420
The organizations that delay will spend that decade cleaning up the metadata mess

2115
01:21:59,420 --> 01:22:01,820
that manual tagging guaranteed they would have.

Mirko Peters

Founder of m365.fm, m365.show and m365con.net

Mirko Peters is a Microsoft 365 expert, content creator, and founder of m365.fm, a platform dedicated to sharing practical insights on modern workplace technologies. His work focuses on Microsoft 365 governance, security, collaboration, and real-world implementation strategies.

Through his podcast and written content, Mirko provides hands-on guidance for IT professionals, architects, and business leaders navigating the complexities of Microsoft 365. He is known for translating complex topics into clear, actionable advice, often highlighting common mistakes and overlooked risks in real-world environments.

With a strong emphasis on community contribution and knowledge sharing, Mirko is actively building a platform that connects experts, shares experiences, and helps organizations get the most out of their Microsoft 365 investments.