May 30, 2026

The Copilot Tax: Why Your AI Strategy is Bleeding Cash

M365 FM Podcast

In this episode of M365.fm, host Mirko Peters explores the hidden cost behind many enterprise AI initiatives: what he calls the “Copilot Tax.” While organizations often focus on licenses, adoption metrics, and productivity gains, the real challenge lies in the growing operational complexity, governance overhead, and architectural debt created when AI is deployed into environments that were never designed for probabilistic systems.

The episode argues that Microsoft Copilot is not simply another software feature. It changes how decisions are made, how information is interpreted, and how accountability works inside the enterprise. Traditional Microsoft 365 environments are built around deterministic workflows where actions are traceable and predictable. AI systems operate differently, generating outputs based on context, inference, and probability, making governance, auditing, and risk management significantly more difficult.

Listeners learn why many AI strategies unintentionally scale confusion instead of performance. Key topics include AI observability, decision ownership, governance failures, data quality challenges, and the importance of defining clear boundaries around what AI should and should not decide. The discussion introduces concepts such as Mean Time To Explain (MTTE) and explains why explainability is becoming a critical metric for enterprise AI success.

Rather than treating Copilot as a productivity tool, the episode encourages IT leaders, architects, CIOs, and governance teams to view AI as a new execution layer that requires redesigned architecture, stronger controls, and continuous oversight. The core message is clear: organizations that deploy AI without rethinking governance, accountability, and system design risk paying an ever-growing Copilot Tax in the form of increased complexity, operational friction, and unmanaged risk.

You may think the $30 per user monthly fee for Copilot covers everything, but the Copilot Tax runs much deeper. Every time you use Copilot for AI-powered tasks, hidden costs like token consumption and compute usage quietly add up. If you ignore these operational inefficiencies, your money can slip away fast. Leaders who track these costs gain real control over AI spending and see better returns.

Key Takeaways

  • The Copilot Tax includes hidden costs beyond the $30 per user fee, such as token consumption and compute usage.
  • Regularly review your Copilot usage to identify idle licenses and avoid unnecessary expenses.
  • Invest in training and onboarding to maximize your team's efficiency with Copilot and reduce operational waste.
  • Establish clear cost centers to track AI spending by department, helping to identify areas for optimization.
  • Use metrics like cost per hour saved and productivity lift to measure the value of your AI investment.
  • Negotiate contracts strategically to leverage your organization's size and secure better terms.
  • Implement strong governance practices to monitor compliance and manage risks associated with AI usage.
  • Conduct regular audits to ensure your AI tools align with business goals and deliver expected returns.

What Is the Copilot Tax?

Defining Copilot Tax

You may hear the term Copilot Tax when people talk about the real price of using Copilot in your organization. This tax is not just the $30 per user each month. It describes the full economic impact of Copilot on your business. When you pay for Copilot, you do more than buy a tool. You set a new standard for how your team works. This tax pushes you to improve your data governance and rethink your workflows. Microsoft uses this fee to create a new baseline for productivity across companies worldwide. As a result, you need to pay close attention to how Copilot changes your daily operations and your budget.

Why Copilot Costs Add Up

You might think the main cost comes from the license fee, but many hidden factors drive up your spending. These extra costs can surprise you if you do not track them closely. Here are the main reasons why Copilot costs add up:

  • Billing rates change based on the features and capabilities you use.
  • Copilot credits get used up as you design and run different agents.
  • The licensing structure pools capacity across your organization, which can make it hard to see where your money goes.
  • Token consumption increases as more users interact with Copilot AI.
  • Compute usage grows when you process large documents or run complex tasks.
  • Zombie licenses, or unused seats, still cost you money.
  • The verification tax appears when you need to check or validate AI outputs.
  • Operational inefficiencies, like poor onboarding or lack of training, waste both time and resources.

Tip: Review your Copilot usage regularly to avoid paying for idle licenses or wasted tokens.

Impact on AI Budgets

The Copilot Tax can have a big effect on your overall AI budget. You may start with a simple license fee, but as your team uses Copilot more, your costs can rise quickly. Many organizations find it hard to predict these expenses because AI usage changes all the time. When your team relies on Copilot for daily tasks, you see more interactions and higher operating costs, especially with large projects.

The table below shows the main cost factors that can impact your AI budget:

| Cost Factor | Description |
| --- | --- |
| Licensing Costs | The starting price for Copilot licenses. |
| Usage-Based Fees | Extra charges based on how much you use Copilot AI. |
| Training Costs | Money spent to teach your team how to use Copilot. |
| Auto-Enabled Seats | Costs from user seats that are turned on but not always used. |
| Storage Growth | More storage needed as Copilot AI handles more data. |
| Shadow AI | Unmanaged AI tools that can create surprise expenses. |
| Unmanaged Renewals | Renewed licenses without careful review. |
| Optimization Strategies | Steps you take to control spending, like better governance and analytics. |

Only about 20% of organizations use Microsoft's AI tools at a large scale. Many buy Copilot but do not fully deploy it. Sometimes, the promised plug-and-play experience does not match what you need in real life. This gap can lead to wasted money and missed opportunities, and many companies struggle with these hidden costs. You need to watch your spending and make sure every dollar helps your business grow.

Main Cost Drivers of Copilot AI

Subscription and Usage Fees

Per-User and Usage-Based Pricing

You pay a base licensing fee for Copilot, but the real cost often goes beyond that. Copilot charges $30 per user each month, and you may see extra fees for compute, AI APIs, or premium models. These usage-based charges can make your total cost rise by 40% to 100% depending on your plan. Many organizations find the pricing structure complex, with mandatory add-ons and hidden costs that can surprise you.

| Feature/Cost Aspect | Microsoft Copilot | Other Enterprise AI Tools |
| --- | --- | --- |
| Base Licensing Fee | $30 per user/month | Varies by tool |
| Additional Usage Charges | Yes (compute, AI APIs, premium models) | Varies by tool |
| Total Cost Increase | 40% to 100% depending on base plan | Varies by tool |
| Complexity of Pricing | High (mandatory add-ons, hidden costs) | Varies by tool |

You may also face unpredictable pay-as-you-go billing, which makes it hard to plan your AI budget. Licenses often come bundled with add-ons, increasing your overall tax burden.
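To make the range above concrete, here is a minimal sketch of the arithmetic. It assumes the 40% to 100% increase applies on top of the $30 Copilot license fee; the function name, the 500-user headcount, and that interpretation are illustrative assumptions, not Microsoft's billing logic.

```python
from dataclasses import dataclass

COPILOT_FEE = 30.0  # USD per user per month, as quoted in the text


@dataclass
class CostRange:
    low: float
    high: float


def monthly_cost_range(users: int,
                       uplift_low: float = 0.40,
                       uplift_high: float = 1.00,
                       base_fee: float = COPILOT_FEE) -> CostRange:
    """Estimate monthly spend once usage-based charges are added.

    Assumption: the uplift is a percentage on top of the license fee.
    """
    if users < 0:
        raise ValueError("users must be non-negative")
    license_cost = users * base_fee
    return CostRange(low=license_cost * (1 + uplift_low),
                     high=license_cost * (1 + uplift_high))


# Hypothetical organization with 500 licensed users
estimate = monthly_cost_range(users=500)
print(f"License only:       ${500 * COPILOT_FEE:,.0f}")
print(f"With usage charges: ${estimate.low:,.0f} to ${estimate.high:,.0f}")
```

Budgeting against the upper bound rather than the license line alone is one way to absorb pay-as-you-go surprises.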

Idle Licenses and Token Waste

Idle licenses, sometimes called zombie licenses, create a hidden Copilot Tax. When users do not use their assigned seats, you still pay for them. Token waste happens when Copilot processes unnecessary or repeated queries, using up valuable compute resources. This waste can push your costs higher without adding value. Regular audits help you find and remove these idle licenses.

Engineering Overhead

Training, Onboarding, and Support

You need to invest in training and onboarding to help your team use Copilot AI effectively. This process takes time and resources. Support costs can rise as users ask for help or need troubleshooting. If you skip proper onboarding, you risk pilot purgatory, where your AI project stalls and never reaches full deployment. This purgatory drains efficiency and increases your implementation risk.

| Cost Category | Primary Cost Driver |
| --- | --- |
| Platform licensing | Number of users and feature tier |
| Model usage | Query volume and context size |
| Operations and support | Number of workflows and business units |

Integration and Workflow Challenges

Compatibility Issues

Integrating Copilot AI with your existing systems can be complex. You may need to hire consultants or buy extra software to make everything work together. This adds to your Copilot Tax and can slow down your project.

Workflow Disruptions

Workflow disruptions can lower accuracy and reliability. If Copilot AI does not fit smoothly into your daily tasks, you may see productivity drop. Low adoption rates lead to underutilization, which means you pay for features you do not use. The absence of native tools to track usage makes it hard to measure return on investment. You may also face unpredictable usage-based billing, which complicates your financial planning.

Note: Strong governance and regular reviews help you avoid these hidden costs and improve your AI efficiency.

Hidden and Operational Expenses

Productivity Losses

You may notice that productivity drops when you deploy Copilot AI without proper planning. Copilot can automate tasks and boost efficiency, but operational inefficiencies often lead to wasted time. If users do not receive enough training, they may struggle to use Copilot effectively. This confusion can cause repeated queries, which increases token consumption and compute usage. You pay for these extra resources, but you do not see a real improvement in productivity. Idle licenses and underutilized features add to the hidden Copilot Tax. When you track usage and focus on high-value applications, you protect your budget and improve accuracy.

Tip: Regular audits help you identify idle licenses and wasted tokens, so you can optimize your Copilot deployment and maximize productivity.

Security and Compliance Risks

Security and compliance risks can create unexpected financial impact for your organization. Misconfigured permissions may allow unauthorized access, which leads to compliance violations and costly penalties. Oversharing sensitive data increases the risk of data breaches, causing financial losses and damaging your reputation. Content sprawl results in inefficient resource allocation and higher storage costs. You must monitor these risks to avoid hidden operational expenses.

| Risk Factor | Impact on Operational Expenses |
| --- | --- |
| Misconfigured permissions | Unauthorized access and compliance violations, increasing costs due to penalties and remediation |
| Oversharing of sensitive data | Data breaches, financial losses, reputational damage |
| Content sprawl | Inefficient resource allocation, increased storage costs |

  • Increased licensing costs often occur when users misunderstand Copilot capabilities.
  • Non-compliance penalties arise from improper data management.
  • Cybersecurity risks can lead to significant financial losses.

You can reduce these risks by defining clear governance policies and conducting ongoing access reviews. Monitoring usage metrics helps you maintain compliance and protect sensitive data.

Data and API Costs

Data and API costs can escalate quickly when you integrate Copilot with your enterprise systems. You pay $30 per user per month for Copilot, but you also need supporting infrastructure. Professional services for deployment and configuration add to your expenses. Training programs for end users increase your spending. Ongoing management overhead pushes your costs higher over time. Many organizations see a 15-20% increase in overall Microsoft spending after integrating Copilot.

| Cost Component | Description |
| --- | --- |
| Licensing | $30 per user per month, plus additional infrastructure licensing |
| Professional Services | Deployment and configuration costs |
| Training Programs | Expenses for end user training |
| Ongoing Management Overhead | Costs for managing integration over time |
| Overall Spending Increase | 15-20% rise in Microsoft spending with AI integration |

  • Copilot requires existing Microsoft 365 E3/E5 or Business Standard/Premium subscriptions.
  • Users on lower-tier licenses may need upgrades, which adds to your Copilot Tax.
  • Usage-based charges for AI APIs and premium models increase your overall cost.

You can control these expenses by aligning subscription plans with your business needs and monitoring usage. Governance and prompt discipline help you avoid unnecessary spending and maintain a sustainable AI budget.

Copilot vs. Alternatives

Open Source and In-House AI

You may wonder if open-source or in-house AI solutions can help you avoid the Copilot Tax. Many organizations choose open-source tools because they eliminate token fees and reduce strategic dependency on a single vendor. You do not face vendor lock-in, which means you have more control over your AI costs. Open-source AI requires you to invest in infrastructure and maintenance, but you often see long-term savings. For example, a global law firm switched from cloud-based tools to a secure, on-prem internal AI model. This move eliminated token fees and cut annual AI spending by $500,000. You also improve compliance and audit readiness with this approach.

Here are some key points to consider:

  • Open-source solutions remove token fees and lower total cost of ownership.
  • Internal AI gives you more control and flexibility.
  • Proprietary solutions like Copilot involve ongoing subscription and token-based fees.
  • Open-source AI needs technical resources for setup and support.

If your team has the right skills, you can use internal AI to drive innovation and keep costs predictable.

Competing AI Tools

You have many choices when you compare Copilot to other leading tools. Some popular options include ChatGPT, Tabnine, and Codeium. Each tool has its own pricing and features. The table below shows how Copilot compares to ChatGPT across different tiers:

| Tier | ChatGPT | Microsoft Copilot |
| --- | --- | --- |
| Free | $0 (GPT-4o-mini, limited) | $0 (basic, session context) |
| Individual Pro | $20/mo (Plus) / $200/mo (Pro) | $30/mo (Copilot Pro) |
| Team / Small Business | $25/seat/mo (Team) | $30/seat/mo (M365 add-on) |
| Enterprise | ~$30/seat/mo (custom) | $30/seat/mo (M365 add-on) |
| API Access | Pay-per-token (OpenAI API) | Azure OpenAI (consumption-based) |
| Annual Discount | ~17% on Team plan | Bundled with M365 agreements |

[Figure: bar chart comparing ChatGPT and Copilot pricing across tiers]

Copilot offers a free tier and a Pro version at $30 per month. ChatGPT Plus costs $20 per month and gives you more raw AI capability per dollar. If your organization already uses Microsoft 365 E5, adding Copilot is a cost-effective choice. ChatGPT may work better for independent users or small teams.

Cost Divergence Explained

You need to understand why costs diverge between Copilot and other AI tools. The main factors include features, support, and compliance. Some tools, like Codeium, offer unlimited completions for free. Others, such as Tabnine Enterprise, charge $59 per user for advanced features and compliance support. Copilot sits in the middle, balancing price and integration with Microsoft 365.

| Solution | Cost (Monthly) | Features |
| --- | --- | --- |
| GitHub Copilot | $10 | Code suggestions, chat features, CLI assistance |
| Cursor Pro | $20 | Unlimited completions, 500 fast premium requests per month |
| Tabnine Enterprise | $59/user | Advanced features, team management, compliance features, custom model training |
| Codeium | Free | Unlimited completions, meaningful functionality for individual developers |
| Mid-tier options | $10-20 | Balance features with affordability, suitable for freelancers and small teams |
| Enterprise plans | $19-59/user | Organization-wide policy management, IP indemnity, enhanced security |

You see higher costs when you need advanced compliance, team management, or custom model training. Lower-cost options may lack these features but support rapid innovation for smaller teams. You must weigh the value of integration, support, and compliance against the Copilot Tax. Careful selection helps you avoid unnecessary spending and maximize the return on your AI investments.

Auditing Your Copilot Spending

Identifying Cost Centers

You need to know where your money goes when you invest in AI tools. Start by setting up cost centers for each department or business unit. This helps you see which teams use the most resources. Many organizations use cost center features to track spending and hold teams accountable. You can create cost centers for different programs or projects. This makes it easier to manage your budget and spot areas where you spend too much.

Here are some practical steps to identify major cost centers:

  • Use usage-based cost projections to predict future expenses.
  • Track budgets at the department level to see if you stay within limits.
  • Watch for unusual spending patterns that may signal waste.
  • Look for ways to optimize costs, such as adjusting licenses or features.

When you break down your spending, you can find hidden costs like zombie licenses or idle usage. Regular reviews help you keep your AI budget under control.
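The department-level tracking described above can be sketched in a few lines. The charge records, department names, and budget figures below are hypothetical placeholders; in practice they would come from your own billing export.

```python
from collections import defaultdict

# Hypothetical charge records: (department, category, monthly cost in USD)
charges = [
    ("Finance", "licenses", 3000.0),
    ("Finance", "usage", 450.0),
    ("Sales", "licenses", 6000.0),
    ("Sales", "usage", 2700.0),
    ("Legal", "licenses", 1500.0),
]

# Hypothetical monthly budgets per cost center
budgets = {"Finance": 4000.0, "Sales": 8000.0, "Legal": 2000.0}


def spend_by_cost_center(records):
    """Sum all charges for each department."""
    totals = defaultdict(float)
    for department, _category, amount in records:
        totals[department] += amount
    return dict(totals)


def over_budget(totals, limits):
    """Return departments whose spend exceeds their budget.

    Departments without a budget entry are treated as having no limit.
    """
    return sorted(dept for dept, spent in totals.items()
                  if spent > limits.get(dept, float("inf")))


totals = spend_by_cost_center(charges)
flagged = over_budget(totals, budgets)
for dept, spent in sorted(totals.items()):
    print(f"{dept}: ${spent:,.0f} of ${budgets[dept]:,.0f}")
print("Over budget:", flagged)
```

Keeping the category on each record means the same data can later be split into license versus usage spend without a second export.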

Tracking Usage and Value

You must measure how much value you get from your AI investment. Tracking usage and value helps you see if Copilot delivers the results you expect. Use clear metrics to compare costs and benefits.

| Metric | Description |
| --- | --- |
| Cost per hour saved | How much you spend for each hour of work that AI saves. |
| Payback period | How long it takes for benefits to match your initial investment. |
| Productivity lift | The percentage increase in efficiency from using AI. |
| License utilization rates | The share of licenses that users actually use. |
| Training completion rates | The number of users who finish AI training. |
| User satisfaction levels | Feedback from users about their experience with AI. |

Tip: Review these metrics every quarter. This helps you spot trends and make better decisions about your AI tools.
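Three of these metrics reduce to simple ratios. The sketch below shows one plausible way to compute them; the formulas are common conventions rather than definitions given in the episode, and every input figure (seat counts, hours saved, hourly value, up-front investment) is a hypothetical example.

```python
def cost_per_hour_saved(monthly_cost: float, hours_saved: float) -> float:
    """Monthly spend divided by the hours of work saved that month."""
    if hours_saved <= 0:
        raise ValueError("hours_saved must be positive")
    return monthly_cost / hours_saved


def payback_period_months(initial_investment: float,
                          monthly_net_benefit: float) -> float:
    """Months until cumulative net benefit covers the up-front investment."""
    if monthly_net_benefit <= 0:
        return float("inf")  # the investment never pays back
    return initial_investment / monthly_net_benefit


def license_utilization(active_users: int, assigned_licenses: int) -> float:
    """Share of assigned licenses that are actually in use."""
    if assigned_licenses <= 0:
        raise ValueError("assigned_licenses must be positive")
    return active_users / assigned_licenses


# Hypothetical inputs for one month
assigned, active = 200, 140
monthly_cost = assigned * 30.0   # license spend at $30 per seat
hours_saved = active * 4         # assumed 4 hours saved per active user
hourly_value = 40.0              # assumed value of one hour of work
initial_investment = 49_200.0    # assumed training and rollout cost

utilization = license_utilization(active, assigned)
cph = cost_per_hour_saved(monthly_cost, hours_saved)
net_benefit = hours_saved * hourly_value - monthly_cost
payback = payback_period_months(initial_investment, net_benefit)

print(f"License utilization: {utilization:.0%}")
print(f"Cost per hour saved: ${cph:.2f}")
print(f"Payback period:      {payback:.1f} months")
```

Hours saved is the softest input here, so it is worth stating where the figure comes from (survey, telemetry, or estimate) whenever these numbers are reported.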

Tools for Expense Auditing

You can use several tools to audit your AI spending and find waste. Many mid-sized firms lose thousands each month on inactive accounts. Automating license reclamation can save money and improve your budget health. For example, LicenseIQ offers workflows that reclaim licenses after 30 days of inactivity. This keeps your license pool fresh and reduces waste.

A professional services firm with 350 seats reviewed their Microsoft stack, including Copilot licenses. They found surprising usage patterns and saved over a million rupees in one year without changing tools. Before you add more seats, always pull usage reports to see who actually uses AI. This prevents you from paying for unused or redundant licenses.

Note: Effective auditing tools help you measure program effectiveness and avoid wasted budgets. Regular reviews and accountability measures keep your AI spending in check.

Reducing the Copilot Tax

License and Contract Negotiation

You can lower your Copilot costs by taking a strategic approach to license and contract negotiation. Start by using your organization's size as leverage. Larger companies often have more bargaining power. Microsoft Account Executives want to include Copilot in contracts, so you can use this motivation to your advantage.

Consider these proven strategies:

  • Negotiate phased implementations. This lets you evaluate Copilot's impact before committing to a full rollout.
  • Have open conversations with licensing resellers. Transparency helps you find the best deal.
  • Offer to share your success story or participate in case studies. In return, you may receive discounts or added value.
  • Review contract terms carefully. Make sure you understand all fees and renewal clauses.

When you negotiate with confidence, you gain more control over your AI budget. You also set the stage for better long-term partnerships.

Optimizing Adoption and Training

Maximizing the value of AI tools depends on strong adoption and effective training. You want your team to use Copilot in ways that boost productivity and accuracy. Companies have found creative ways to encourage adoption and reduce unnecessary spending.

  • A retail company launched a "31 Days of Copilot" campaign. Daily tips helped users try 12 new functions in one month.
  • A financial services firm used a digital adoption platform for in-app guidance. This increased usage by 36% compared to teams without support.
  • A construction company built a tiered support model. Copilot Champions, Power Users, and a Center of Excellence provided local help and kept training consistent.

You can also save money by rightsizing licenses and using analytics. Many organizations cut Copilot spending by 20-35% with this approach. Track how much time your team saves and how often they automate tasks. Some users report saving over 10 hours per month when they use AI actively.

Tip: Focus on high-value users and monitor adoption rates. This ensures you invest in the right places and avoid wasted licenses.

Governance and Deployment Discipline

Strong governance and disciplined deployment help you control AI costs and maintain efficiency. Assign clear responsibilities to different teams. Each group plays a key role in managing Copilot and related AI tools.

| Function | Responsibility |
| --- | --- |
| IT | Permissions architecture, tenant configuration, admin controls, audit logging |
| Security | Threat monitoring, DLP policies, incident response for Copilot-related events |
| Legal | AUP development, compliance framework mapping, contract and IP risk |
| HR | Employee training, AUP acknowledgment, acceptable use enforcement |
| Business leaders | Use case approval, team-level rollout decisions, shadow AI reporting |

You must also follow important frameworks to stay compliant and protect your data.

| Framework | Key Considerations |
| --- | --- |
| GDPR | Lawful basis, data rights, retention, cross-border transfers |
| HIPAA | BAA required, PHI access must follow the minimum necessary rule |
| SOC 2 | Include Copilot in scope, logs, access reviews, and DLP evidence |
| ISO 27001 | Risk assessment, supplier security, and access controls |

To keep your AI environment secure and cost-effective, take these steps:

  • Complete permission audits and fix oversharing.
  • Apply sensitivity labels for data classification.
  • Configure and test DLP policies.
  • Enforce multi-factor authentication and conditional access.
  • Review and acknowledge acceptable use policies.
  • Enable audit logs and send them to your SIEM.
  • Regularly review agents and connectors.

Note: Ongoing monitoring and prompt discipline prevent costs from spiraling. You protect your investment and support sustainable AI growth.

Reviewing AI Tool Usage

You need to review your AI tool usage regularly to keep your Copilot costs under control. Many organizations buy licenses and expect instant results. Real value comes from understanding how your team uses these tools over time. You can spot waste, improve productivity, and make smarter decisions when you track usage closely.

Start by setting a schedule for reviews. Early in your Copilot journey, focus on how many people use the tool and how often they log in. As your team grows, look for ways Copilot helps save time and boost productivity. When your organization reaches maturity, measure how Copilot supports innovation and gives you an edge over competitors. In the optimization stage, check if you get the most value for your money and if Copilot aligns with your business goals.

Here is a simple table to guide your review process at each stage:

| Stage | Focus Area |
| --- | --- |
| Early Stage | Usage rates and basic adoption metrics |
| Growth Stage | Productivity improvements and time savings |
| Maturity Stage | Innovation metrics and competitive advantage |
| Optimization Stage | Cost efficiency and strategic impact |

You should use both quantitative and qualitative data. Pull reports that show license activity, token consumption, and feature usage. Ask users for feedback about their experience. This helps you find out which features matter most and which ones go unused.

Consider these steps to make your reviews effective:

  • Set clear goals for each review period.
  • Compare actual usage to your original expectations.
  • Identify licenses that remain idle for more than 30 days.
  • Look for patterns in token consumption and workflow changes.
  • Adjust your deployment based on what you learn.
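The idle-license step in the list above is easy to automate once you have a last-activity date per user. This is a minimal sketch: the `last_activity` mapping stands in for whatever usage report you export, and the addresses, dates, and function name are hypothetical.

```python
from datetime import date, timedelta

IDLE_THRESHOLD_DAYS = 30   # review threshold from the text
COPILOT_FEE = 30.0         # USD per user per month

# Hypothetical export: user -> date of last Copilot activity (None = never)
last_activity = {
    "alice@example.com": date(2026, 5, 28),
    "bob@example.com": date(2026, 4, 2),
    "carol@example.com": None,
}


def idle_licenses(activity, today, threshold_days=IDLE_THRESHOLD_DAYS):
    """Return users with no activity for more than threshold_days."""
    cutoff = today - timedelta(days=threshold_days)
    return sorted(user for user, last_used in activity.items()
                  if last_used is None or last_used < cutoff)


idle = idle_licenses(last_activity, today=date(2026, 5, 30))
monthly_waste = len(idle) * COPILOT_FEE

print("Idle licenses:", idle)
print(f"Monthly spend on idle seats: ${monthly_waste:,.0f}")
```

Treating "never used" the same as "idle" is a deliberate choice here; you may prefer a grace period for newly assigned seats.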

Tip: Involve team leaders in the review process. They can provide insights about workflow changes and help you spot areas for improvement.

You can also benchmark your results against similar organizations. This shows if you lead or lag in AI adoption and helps you set realistic targets. Regular reviews keep your AI strategy on track and ensure you get the best return on your investment.

Building a Sustainable AI Budget

Forecasting Ongoing Costs

You need to predict your ongoing costs to keep your AI budget healthy. Start with a pilot deployment for two or three months. Measure how much your team actually uses Copilot during this time. Use these numbers to estimate future expenses, adding a safety margin for unexpected growth. You can also run a prototyping exercise. Simulate real workflows and transactions to see how Copilot fits your business. Watch how your team interacts with the tool. This helps you model expected volumes before you roll out Copilot to everyone.

  • Begin with a small pilot to measure real usage.
  • Track token consumption and compute usage.
  • Use the data to forecast costs for a larger rollout.
  • Add a buffer to your estimates for safety.

This approach gives you a clear view of your AI spending. You avoid surprises and make better decisions for your organization.
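The pilot-then-forecast approach can be sketched as a per-user average scaled to the rollout size plus a buffer. The pilot figures, the 20% buffer, and the assumption that cost scales linearly with headcount are all illustrative; real usage may grow faster or slower than headcount.

```python
from statistics import mean


def forecast_monthly_cost(pilot_monthly_costs, pilot_users, rollout_users,
                          buffer=0.20):
    """Scale average pilot cost per user to the rollout size, plus a buffer.

    Assumption: cost per user in the rollout matches the pilot average.
    """
    if not pilot_monthly_costs:
        raise ValueError("need at least one month of pilot data")
    if pilot_users <= 0:
        raise ValueError("pilot_users must be positive")
    cost_per_user = mean(pilot_monthly_costs) / pilot_users
    return cost_per_user * rollout_users * (1 + buffer)


# Hypothetical three-month pilot with 50 users (licenses plus usage charges)
pilot_costs = [2100.0, 2400.0, 2700.0]
forecast = forecast_monthly_cost(pilot_costs, pilot_users=50,
                                 rollout_users=1000)

print(f"Forecast monthly cost for 1,000 users: ${forecast:,.0f}")
```

If pilot costs are still climbing month over month, as in this sample, the average will understate steady-state usage, which is one reason to keep the buffer.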

Aligning AI Spend with Goals

You should connect your AI investments to your business goals. This ensures your spending supports growth and revenue. Focus on high-impact roles first. Assign Copilot licenses to the teams that will benefit most. Train these teams on real workflows so they can use Copilot to its full potential. Embed Copilot into daily work processes. This makes AI a natural part of your operations and drives innovation.

| Step | Description |
| --- | --- |
| 1 | Align licenses to high-impact roles to ensure effective use of Copilot. |
| 2 | Train teams on real workflows to maximize the tool's potential. |
| 3 | Embed Copilot into daily work processes for seamless integration. |
| 4 | Measure outcomes that matter to the business to assess true ROI. |

Measure the results that matter most to your business. Track how Copilot helps you save time, improve accuracy, and increase revenue. When you align your AI strategy with your goals, you get the best return on your investment.

Accountability for AI Expenses

You must hold your teams accountable for AI spending. Keep a complete inventory of all Copilot-related assets. Track key metrics like usage and policy violations. Gather feedback from users to make sure your policies work in practice. Use scheduled and dynamic reports to keep leaders informed. Monitor new AI capabilities and regulations to stay ahead of risks.

Centralized governance helps you manage AI spending as a portfolio. Build dashboards that show cost and usage data. Assign an owner for AI cost management. Set approval levels for new services and standard contract terms. Run phased rollouts with clear cost-per-outcome metrics. Optimize your models and prompts to reduce waste. Negotiate contracts that are clear and fair.

Make cost awareness part of your culture. Set up approval workflows and share budget ownership. Schedule quarterly reviews with different teams. Show cost metrics to everyone so they use AI responsibly. This discipline helps you control expenses and protect your revenue.

Tip: When you make accountability part of your process, you build a sustainable AI budget that supports growth and innovation.


You face the Copilot Tax through hidden costs like monitoring, data cleansing, and governance staffing, which can reach up to 80% of your project spend. These expenses impact your AI budget and can slow productivity gains. To control Copilot costs, you need real-time visibility, regular audits, and strong governance.

Delivering real business value requires a clear-eyed view of the hidden expenses: data readiness, security, and continuous governance.

Phased rollouts and disciplined deployment keep these costs in check. Reassess your AI strategy often to maximize productivity and ensure every dollar drives results.

FAQ

What is the Copilot Tax?

You pay more than just the license fee for Copilot. The Copilot Tax includes hidden costs like token consumption, compute usage, and operational overhead. These expenses can add up quickly if you do not monitor them.

How can I spot zombie licenses?

Check your user activity reports. Look for licenses that show no usage for 30 days or more. Remove or reassign these seats to avoid paying for unused accounts.

Does Copilot require extra Microsoft subscriptions?

Yes. You need Microsoft 365 E3/E5 or Business Standard/Premium. If you use lower-tier plans, you may need to upgrade. This adds to your overall AI spending.

How do I reduce token waste?

Train your team to use Copilot efficiently. Encourage clear prompts and avoid repeated queries. Regular audits help you find and fix unnecessary token usage.

What is the verification tax?

The verification tax is the time and resources you spend checking AI outputs for accuracy. You pay for this through extra work hours and possible delays.

Can I negotiate Copilot pricing?

You can negotiate with Microsoft, especially if you buy many licenses. Ask for phased rollouts, bundled discounts, or custom terms. Always review your contract before signing.

What tools help track Copilot costs?

| Tool Type | Example Use |
| --- | --- |
| License Auditing | Find idle licenses |
| Usage Analytics | Monitor token consumption |
| Cost Dashboards | Visualize spending trends |

Use these tools to keep your AI budget under control.

1
00:00:00,000 --> 00:00:03,100
Your CFO just asked the question nobody on your team can answer.

2
00:00:03,100 --> 00:00:05,000
What exactly are we paying for?

3
00:00:05,000 --> 00:00:06,900
Not a trick question, a real one.

4
00:00:06,900 --> 00:00:09,260
18 months into your Copilot deployment,

5
00:00:09,260 --> 00:00:12,940
the promise was two to five hours saved per week per user.

6
00:00:12,940 --> 00:00:14,700
The invoice says something different,

7
00:00:14,700 --> 00:00:16,340
and somewhere between the pilot presentation

8
00:00:16,340 --> 00:00:18,940
and the quarterly budget review three hidden cost layers

9
00:00:18,940 --> 00:00:20,860
appeared that nobody warned you about.

10
00:00:20,860 --> 00:00:22,140
By the end of this episode,

11
00:00:22,140 --> 00:00:23,980
you'll have a framework to audit every dollar

12
00:00:23,980 --> 00:00:26,260
your organization is spending on AI,

13
00:00:26,260 --> 00:00:29,100
and a clear picture of where that money is actually going.

14
00:00:29,100 --> 00:00:31,100
The entry fee nobody reads carefully.

15
00:00:31,100 --> 00:00:32,700
Let's start with the number everyone knows.

16
00:00:32,700 --> 00:00:34,220
$30 per user per month.

17
00:00:34,220 --> 00:00:35,060
It sounds simple.

18
00:00:35,060 --> 00:00:36,980
It isn't. That $30 is an add-on.

19
00:00:36,980 --> 00:00:39,460
It sits on top of whatever your organization already pays

20
00:00:39,460 --> 00:00:41,140
for Microsoft 365.

21
00:00:41,140 --> 00:00:42,980
And if you're a mid to large enterprise,

22
00:00:42,980 --> 00:00:45,860
that base is almost certainly E3 or E5.

23
00:00:45,860 --> 00:00:47,900
E3 runs roughly $36 per user per month.

24
00:00:47,900 --> 00:00:49,540
E5 runs around $57.

25
00:00:49,540 --> 00:00:52,100
So before a single Copilot prompt gets typed,

26
00:00:52,100 --> 00:00:55,500
you're already at $66 to $87 per user per month

27
00:00:55,500 --> 00:00:56,980
just to reach the starting line.
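
As a quick check of that arithmetic, here is a minimal Python sketch. The prices are the approximate list figures quoted in the episode; the function names and the 1,000-seat example are illustrative assumptions, not Microsoft tooling.

```python
# Approximate list prices quoted in the episode (USD per user per month).
COPILOT_ADDON = 30
BASE_PLANS = {"E3": 36, "E5": 57}


def monthly_per_user(base_plan: str) -> int:
    """Base Microsoft 365 plan plus the Copilot add-on, per user per month."""
    return BASE_PLANS[base_plan] + COPILOT_ADDON


def annual_cost(base_plan: str, seats: int) -> int:
    """Seat-based cost for a year, before any consumption charges."""
    return monthly_per_user(base_plan) * 12 * seats


if __name__ == "__main__":
    for plan in BASE_PLANS:
        print(plan, monthly_per_user(plan), annual_cost(plan, seats=1000))
```

This covers only the seat-based entry fee; it deliberately leaves out every consumption charge.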

28
00:00:56,980 --> 00:00:58,660
And the starting line keeps moving.

29
00:00:58,660 --> 00:01:00,700
As of July 1st, 2026,

30
00:01:00,700 --> 00:01:02,900
Microsoft has implemented list price increases

31
00:01:02,900 --> 00:01:06,340
of nine to 33% across most business and enterprise plans.

32
00:01:06,340 --> 00:01:08,900
Business Basic went from $6 to $7.50.

33
00:01:08,900 --> 00:01:11,900
The frontline F1 license went from $2.25 to $3.

34
00:01:11,900 --> 00:01:15,060
That's a 33% increase for your lowest cost users.

35
00:01:15,060 --> 00:01:17,740
E3 and E5 are up roughly nine to 10%.

36
00:01:17,740 --> 00:01:19,820
And these aren't negotiable the way they used to be

37
00:01:19,820 --> 00:01:22,420
because Microsoft eliminated enterprise agreement volume

38
00:01:22,420 --> 00:01:24,860
discounts in November 2025.

39
00:01:24,860 --> 00:01:27,420
Every customer now pays Level A list price

40
00:01:27,420 --> 00:01:29,660
regardless of how many seats they buy.

41
00:01:29,660 --> 00:01:31,300
The leverage that large enterprises used

42
00:01:31,300 --> 00:01:33,420
to have in renewal negotiations is gone.

43
00:01:33,420 --> 00:01:36,380
Now Microsoft has introduced the E7 Frontier Suite

44
00:01:36,380 --> 00:01:38,460
at $99 per user per month.

45
00:01:38,460 --> 00:01:42,100
It bundles E5, M365 Copilot, the Entra Suite,

46
00:01:42,100 --> 00:01:45,340
and Agent 365 into a single SKU, marketed

47
00:01:45,340 --> 00:01:49,100
as roughly 15% cheaper than buying each component separately.

48
00:01:49,100 --> 00:01:50,540
And that's probably accurate.

49
00:01:50,540 --> 00:01:52,220
But here's what the math doesn't show.

50
00:01:52,220 --> 00:01:54,340
You're now committing to a single vendor

51
00:01:54,340 --> 00:01:57,580
for identity, security, productivity, and AI

52
00:01:57,580 --> 00:01:58,900
in one multi-year contract.

53
00:01:58,900 --> 00:01:59,820
But that's not a discount.

54
00:01:59,820 --> 00:02:01,020
That's a dependency.

55
00:02:01,020 --> 00:02:04,380
Analysts watching Microsoft's commercial strategy

56
00:02:04,380 --> 00:02:07,980
since 2023 have summarized it in one phrase,

57
00:02:07,980 --> 00:02:09,940
charging more and giving less flexibility.

58
00:02:09,940 --> 00:02:12,740
The elimination of tiered volume pricing,

59
00:02:12,740 --> 00:02:14,860
the bundling of AI into high cost suites,

60
00:02:14,860 --> 00:02:17,660
the removal of standalone Copilot Pro for consumers.

61
00:02:17,660 --> 00:02:19,300
These aren't isolated decisions.

62
00:02:19,300 --> 00:02:21,180
They're a coherent commercial strategy

63
00:02:21,180 --> 00:02:25,420
to normalize AI spend as a fixed rising cost of doing business.

64
00:02:25,420 --> 00:02:28,020
For finance teams, the shock comes at renewal.

65
00:02:28,020 --> 00:02:31,540
Organizations are seeing 40 to 50% effective price increases

66
00:02:31,540 --> 00:02:33,500
when they account for both the list price hikes

67
00:02:33,500 --> 00:02:35,620
and the loss of historical discount structures.

68
00:02:35,620 --> 00:02:38,980
A company that was paying $45 per user per month two years ago

69
00:02:38,980 --> 00:02:41,420
can now be looking at $65 to $75

70
00:02:41,420 --> 00:02:43,820
before Copilot even enters the conversation.

71
00:02:43,820 --> 00:02:44,940
Here's the critical point.

72
00:02:44,940 --> 00:02:46,500
None of this is the Copilot Tax.

73
00:02:46,500 --> 00:02:47,500
This is just admission.

74
00:02:47,500 --> 00:02:50,500
The base license, the E5 stack, the Frontier Suite commitment,

75
00:02:50,500 --> 00:02:52,460
this is what you pay to get into the building.

76
00:02:52,460 --> 00:02:54,500
The tax is what happens once you're inside

77
00:02:54,500 --> 00:02:56,940
when the billing model shifts from predictable seats

78
00:02:56,940 --> 00:02:58,460
to something far less visible.

79
00:02:58,460 --> 00:03:00,140
The seat cost is on your invoice.

80
00:03:00,140 --> 00:03:01,980
What comes next isn't.

81
00:03:01,980 --> 00:03:04,140
Two billing systems running simultaneously.

82
00:03:04,140 --> 00:03:05,820
So you've accepted the entry fee.

83
00:03:05,820 --> 00:03:07,140
You know what the seats cost.

84
00:03:07,140 --> 00:03:09,700
And now you think you know what AI costs your organization.

85
00:03:09,700 --> 00:03:11,820
You don't, because Microsoft doesn't run one billing system

86
00:03:11,820 --> 00:03:12,660
for AI.

87
00:03:12,660 --> 00:03:15,660
It runs two and they operate on completely different logic.

88
00:03:15,660 --> 00:03:17,420
The first is licensing-based.

89
00:03:17,420 --> 00:03:19,740
This is the world most finance teams understand.

90
00:03:19,740 --> 00:03:21,940
You assign the seat, you pay a fixed monthly rate,

91
00:03:21,940 --> 00:03:24,380
you forecast it by multiplying headcount by price.

92
00:03:24,380 --> 00:03:27,180
M365 Copilot falls here, so does GitHub Copilot,

93
00:03:27,180 --> 00:03:30,460
so does Teams Premium. Predictable, structured, easy to model.

94
00:03:30,460 --> 00:03:33,180
The number on the invoice matches the number in the spreadsheet.

95
00:03:33,180 --> 00:03:35,020
The second system is consumption-based.

96
00:03:35,020 --> 00:03:36,780
This is where Azure OpenAI tokens live,

97
00:03:36,780 --> 00:03:38,180
where Copilot credits get billed,

98
00:03:38,180 --> 00:03:40,020
where security compute units get consumed.

99
00:03:40,020 --> 00:03:41,340
Nothing here is fixed.

100
00:03:41,340 --> 00:03:44,340
Everything scales with usage, with how many prompts get fired,

101
00:03:44,340 --> 00:03:45,980
how long the context windows are,

102
00:03:45,980 --> 00:03:48,700
how many agent loops complete before a task resolves.

103
00:03:48,700 --> 00:03:50,940
The billing is variable, the visibility is low,

104
00:03:50,940 --> 00:03:52,740
and the reconciliation is manual.

105
00:03:52,740 --> 00:03:54,420
Now here's where it gets complicated.

106
00:03:54,420 --> 00:03:56,100
Some of Microsoft's products don't fit

107
00:03:56,100 --> 00:03:57,660
cleanly into either system.

108
00:03:57,660 --> 00:04:00,100
Copilot Studio, for example, hits both at once.

109
00:04:00,100 --> 00:04:02,460
You pay a per user license to access the platform.

110
00:04:02,460 --> 00:04:04,180
Then the agents you build on that platform

111
00:04:04,180 --> 00:04:06,460
consume Copilot credits every time they run.

112
00:04:06,460 --> 00:04:09,340
And those credits appear not in your M365 admin center,

113
00:04:09,340 --> 00:04:10,580
but on your Azure bill.

114
00:04:10,580 --> 00:04:13,740
Two separate invoices, two separate approval chains, one product.

115
00:04:13,740 --> 00:04:15,860
The same logic applies to Security Copilot,

116
00:04:15,860 --> 00:04:19,700
which Microsoft bundled into M365 E5 in January 2026,

117
00:04:19,700 --> 00:04:22,180
with a monthly security compute unit allocation.

118
00:04:22,180 --> 00:04:24,780
Use more than your allocation and you spill into variable

119
00:04:24,780 --> 00:04:25,940
Azure consumption.

120
00:04:25,940 --> 00:04:28,340
A security analyst running a complex threat investigation

121
00:04:28,340 --> 00:04:30,100
can exhaust a month's SCU budget

122
00:04:30,100 --> 00:04:32,260
in a single afternoon of intensive queries,

123
00:04:32,260 --> 00:04:33,780
and that overage shows up somewhere

124
00:04:33,780 --> 00:04:35,940
other than where your IT team is looking.

125
00:04:35,940 --> 00:04:38,180
This fragmentation has a practical consequence

126
00:04:38,180 --> 00:04:40,220
that most organizations don't fully appreciate

127
00:04:40,220 --> 00:04:42,220
until they're sitting in a budget review trying

128
00:04:42,220 --> 00:04:43,420
to explain a spike.

129
00:04:43,420 --> 00:04:45,980
There is no unified AI consumption dashboard.

130
00:04:45,980 --> 00:04:49,460
Finance teams reconciling what AI actually cost last quarter

131
00:04:49,460 --> 00:04:52,100
have to pull reports from at least three separate places.

132
00:04:52,100 --> 00:04:55,220
The M365 admin center for seat level licensing,

133
00:04:55,220 --> 00:04:56,860
the Azure cost management console

134
00:04:56,860 --> 00:04:58,660
for token and compute consumption,

135
00:04:58,660 --> 00:05:02,340
and Copilot Studio's analytics for agent-level activity.

136
00:05:02,340 --> 00:05:04,180
These reports don't automatically align.

137
00:05:04,180 --> 00:05:06,860
The data formats differ, the attribution logic differs,

138
00:05:06,860 --> 00:05:08,420
and the billing cycles can differ.

139
00:05:08,420 --> 00:05:11,740
One user interaction can trigger all three simultaneously.

140
00:05:11,740 --> 00:05:14,340
A knowledge worker asks Copilot to summarize a document

141
00:05:14,340 --> 00:05:15,700
and draft a response.

142
00:05:15,700 --> 00:05:18,700
That request touches the M365 Copilot license.

143
00:05:18,700 --> 00:05:20,620
If the response pulls from a custom agent

144
00:05:20,620 --> 00:05:23,380
built in Copilot Studio, it consumes credits.

145
00:05:23,380 --> 00:05:25,460
If the underlying model call routes through a custom

146
00:05:25,460 --> 00:05:28,060
Azure OpenAI deployment, it burns tokens

147
00:05:28,060 --> 00:05:29,300
billed directly to Azure.

148
00:05:29,300 --> 00:05:30,940
The user sees one interaction.

149
00:05:30,940 --> 00:05:33,900
The CFO sees three cost lines on three separate invoices.
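
One way to picture the reconciliation work is a small roll-up that merges exports from the three places into a single view per month and cost center. The sketch below is a hypothetical illustration in Python: the `CostLine` shape, the source labels, and the sample dollar figures are assumptions made for the example, not an actual Microsoft export format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CostLine:
    source: str       # hypothetical labels: "m365_licenses", "azure_consumption", "copilot_studio"
    month: str        # "YYYY-MM"
    cost_center: str
    usd: float


def reconcile(lines):
    """Group cost lines by (month, cost center) and total them across sources."""
    grouped = defaultdict(lambda: defaultdict(float))
    for line in lines:
        grouped[(line.month, line.cost_center)][line.source] += line.usd
    return {
        key: {**by_source, "total": sum(by_source.values())}
        for key, by_source in grouped.items()
    }


# Made-up sample figures, purely for illustration.
sample = [
    CostLine("m365_licenses", "2026-05", "legal", 3000.0),
    CostLine("azure_consumption", "2026-05", "legal", 412.5),
    CostLine("copilot_studio", "2026-05", "legal", 187.5),
]

report = reconcile(sample)
```

In practice the hard part is upstream of this function: mapping each report's own attribution fields onto a shared month and cost-center key.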

150
00:05:33,900 --> 00:05:35,940
This isn't an accident of product design.

151
00:05:35,940 --> 00:05:38,620
It's a consequence of Microsoft building AI capabilities

152
00:05:38,620 --> 00:05:41,700
across an ecosystem that predates the AI era,

153
00:05:41,700 --> 00:05:43,340
then layering new consumption models

154
00:05:43,340 --> 00:05:45,540
on top of legacy licensing structures.

155
00:05:45,540 --> 00:05:47,380
The architecture reflects how these products

156
00:05:47,380 --> 00:05:48,900
were acquired and integrated,

157
00:05:48,900 --> 00:05:50,940
not how a coherent billing system would be designed

158
00:05:50,940 --> 00:05:51,980
from scratch.

159
00:05:51,980 --> 00:05:53,860
The practical result is straightforward.

160
00:05:53,860 --> 00:05:56,660
Most organizations believe they know what AI costs them.

161
00:05:56,660 --> 00:05:58,740
They know the seat count, they know the per-user rate.

162
00:05:58,740 --> 00:06:00,620
What they're missing is everything underneath.

163
00:06:00,620 --> 00:06:02,900
The variable consumption layer that runs in parallel,

164
00:06:02,900 --> 00:06:05,500
charges differently, and reports separately.

165
00:06:05,500 --> 00:06:07,580
That's the layer where the real money disappears.

166
00:06:07,580 --> 00:06:09,580
What a token actually costs you.

167
00:06:09,580 --> 00:06:11,020
To understand where the money goes,

168
00:06:11,020 --> 00:06:13,380
you first need to understand the unit of consumption.

169
00:06:13,380 --> 00:06:15,540
And for AI, that unit is a token.

170
00:06:15,540 --> 00:06:18,100
A token is roughly three to four characters of English text,

171
00:06:18,100 --> 00:06:20,260
not a word, not a sentence, a fragment.

172
00:06:20,260 --> 00:06:22,380
The word enterprise is about two tokens.

173
00:06:22,380 --> 00:06:25,940
A full paragraph is somewhere between 75 and 150 tokens

174
00:06:25,940 --> 00:06:27,860
depending on word length and punctuation.

175
00:06:27,860 --> 00:06:31,100
A standard page of business text, around 750 words,

176
00:06:31,100 --> 00:06:33,420
runs approximately 1,000 tokens.

177
00:06:33,420 --> 00:06:35,540
A dense technical document with longer words,

178
00:06:35,540 --> 00:06:37,900
code snippets, or specialized terminology

179
00:06:37,900 --> 00:06:40,540
can push 300 to 500 tokens per page.

180
00:06:40,540 --> 00:06:42,540
This matters because every co-pilot interaction

181
00:06:42,540 --> 00:06:45,580
has two token components, what goes in and what comes out.

182
00:06:45,580 --> 00:06:48,820
Input tokens carry the prompt, the context, any documents

183
00:06:48,820 --> 00:06:51,380
you've attached, any previous conversation history

184
00:06:51,380 --> 00:06:52,180
in the session.

185
00:06:52,180 --> 00:06:54,620
Output tokens are the response the model generates,

186
00:06:54,620 --> 00:06:56,100
and they're priced differently.

187
00:06:56,100 --> 00:06:58,500
On Azure OpenAI, where token pricing is actually visible

188
00:06:58,500 --> 00:07:02,900
to customers, GPT-5 runs $1.25 per million input tokens,

189
00:07:02,900 --> 00:07:04,980
and $10 per million output tokens.

190
00:07:04,980 --> 00:07:07,940
That's an 8x price differential between reading and writing.

191
00:07:07,940 --> 00:07:09,940
Output tokens aren't just more expensive.

192
00:07:09,940 --> 00:07:11,500
They're often where the waste concentrates

193
00:07:11,500 --> 00:07:14,220
because a vague prompt generates a long verbose response.

194
00:07:14,220 --> 00:07:15,260
More on that shortly.
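
To make the input and output split concrete, here is a minimal Python sketch. It uses the GPT-5 rates quoted in the episode and a rough four-characters-per-token rule of thumb; the function names are illustrative, and a real tokenizer will give different counts than this estimate.

```python
import math

# Azure OpenAI GPT-5 rates as quoted in the episode (USD per million tokens).
INPUT_USD_PER_MILLION = 1.25
OUTPUT_USD_PER_MILLION = 10.00


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough estimate only: about three to four characters per English token."""
    return math.ceil(len(text) / chars_per_token)


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one model call at the rates above."""
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


# The same 1,000 tokens cost eight times more as output than as input.
read_cost = call_cost(1_000, 0)
write_cost = call_cost(0, 1_000)
```

The asymmetry is the point: trimming a verbose response saves far more per token than trimming the prompt.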

195
00:07:15,260 --> 00:07:18,700
Now, M365 Copilot doesn't expose this math to you.

196
00:07:18,700 --> 00:07:21,980
The $30 monthly fee abstracts all of it into a flat rate,

197
00:07:21,980 --> 00:07:25,380
and Microsoft manages the underlying token economics internally.

198
00:07:25,380 --> 00:07:26,540
But those economics are real,

199
00:07:26,540 --> 00:07:28,620
and they determine how Microsoft makes decisions

200
00:07:28,620 --> 00:07:30,300
about model selection, feature rollout,

201
00:07:30,300 --> 00:07:32,780
and the caps built into different license tiers.

202
00:07:32,780 --> 00:07:35,700
The abstraction benefits Microsoft's simplicity narrative.

203
00:07:35,700 --> 00:07:37,580
It doesn't necessarily benefit yours.

204
00:07:37,580 --> 00:07:39,940
Here's why the hidden economics matter for your organization

205
00:07:39,940 --> 00:07:41,540
specifically.

206
00:07:41,540 --> 00:07:44,500
A light knowledge worker, someone using Copilot occasionally,

207
00:07:44,500 --> 00:07:47,780
maybe summarizing a few emails, running a quick document search,

208
00:07:47,780 --> 00:07:49,660
generating a short status update,

209
00:07:49,660 --> 00:07:53,700
consumes somewhere between 40,000 and 100,000 tokens per month.

210
00:07:53,700 --> 00:07:55,700
That's the low end of the usage curve.

211
00:07:55,700 --> 00:07:59,180
A heavy analytical user, a consultant building decks from research,

212
00:07:59,180 --> 00:08:01,740
a lawyer reviewing contracts, a financial analyst running

213
00:08:01,740 --> 00:08:04,100
iterative Excel queries with complex context,

214
00:08:04,100 --> 00:08:06,900
can reach 1.6 million tokens per month.

215
00:08:06,900 --> 00:08:10,780
That's a 16X to 40X usage gap between the lightest and heaviest users.

216
00:08:10,780 --> 00:08:14,660
And under flat per user licensing, they both pay exactly $30.
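
The gap can be restated as an effective price per million tokens under the flat fee. This is a minimal sketch using the monthly token volumes quoted in the episode; the function name is an illustrative assumption, and the result is what the customer pays per token, not what it costs Microsoft to serve.

```python
FLAT_FEE_USD = 30  # Copilot add-on, per user per month


def effective_usd_per_million(flat_fee_usd: float, tokens_per_month: int) -> float:
    """What a flat monthly fee works out to per million tokens actually used."""
    return flat_fee_usd * 1_000_000 / tokens_per_month


light_low = effective_usd_per_million(FLAT_FEE_USD, 40_000)     # lightest users
light_high = effective_usd_per_million(FLAT_FEE_USD, 100_000)   # upper end of light use
heavy = effective_usd_per_million(FLAT_FEE_USD, 1_600_000)      # heavy analytical user

# Usage gap between heavy and light users: 16x to 40x.
gap_low = 1_600_000 / 100_000
gap_high = 1_600_000 / 40_000
```

On these figures a light user pays the equivalent of $300 to $750 per million tokens, while a heavy user pays under $19.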

217
00:08:14,660 --> 00:08:17,140
What this creates is a structural cross-subsidy.

218
00:08:17,140 --> 00:08:19,620
Your light users subsidize your heavy users.

219
00:08:19,620 --> 00:08:23,180
Microsoft prices the flat license around an assumed average distribution

220
00:08:23,180 --> 00:08:24,820
of usage across the organization.

221
00:08:24,820 --> 00:08:27,860
If your workforce skews heavily toward light users,

222
00:08:27,860 --> 00:08:31,140
which is true for most organizations deploying Copilot broadly,

223
00:08:31,140 --> 00:08:33,420
you're paying frontier model prices for workloads

224
00:08:33,420 --> 00:08:35,700
that don't justify frontier model economics.

225
00:08:35,700 --> 00:08:38,620
You're covering the cost of summarizing one email thread per day

226
00:08:38,620 --> 00:08:41,380
with pricing designed for someone processing hundreds of pages

227
00:08:41,380 --> 00:08:42,460
of legal discovery.

228
00:08:42,460 --> 00:08:44,420
The inverse problem also exists.

229
00:08:44,420 --> 00:08:47,900
If your organization has a small concentration of extremely heavy users,

230
00:08:47,900 --> 00:08:49,620
developers running agentic workflows,

231
00:08:49,620 --> 00:08:51,780
analysts processing large document sets,

232
00:08:51,780 --> 00:08:53,780
researchers doing intensive retrieval,

233
00:08:53,780 --> 00:08:55,900
those users may be generating token volumes

234
00:08:55,900 --> 00:09:00,060
that cost Microsoft far more to serve than the $30 license covers.

235
00:09:00,060 --> 00:09:03,020
Which is one reason Microsoft is moving consumption-heavy products

236
00:09:03,020 --> 00:09:06,420
like GitHub Copilot and Copilot Studio to explicit metered billing.

237
00:09:06,420 --> 00:09:07,820
The flat license isn't permanent.

238
00:09:07,820 --> 00:09:11,260
It's a transitional model on the way to a consumption first architecture.

239
00:09:11,260 --> 00:09:14,420
Security Copilot already uses security compute units.

240
00:09:14,420 --> 00:09:17,580
GitHub Copilot switched to AI credits in June 2026.

241
00:09:17,580 --> 00:09:18,580
The pattern is clear.

242
00:09:18,580 --> 00:09:21,660
As Microsoft's internal token costs become more variable

243
00:09:21,660 --> 00:09:23,300
with frontier model usage,

244
00:09:23,300 --> 00:09:27,020
they push that variability towards the customer through metered tiers.

245
00:09:27,020 --> 00:09:28,900
So even if you're currently paying a flat fee

246
00:09:28,900 --> 00:09:30,860
and token costs feel invisible,

247
00:09:30,860 --> 00:09:33,580
what you're seeing now is the pricing model before full transparency,

248
00:09:33,580 --> 00:09:34,420
not after it.

249
00:09:34,420 --> 00:09:35,660
The tokens are already there.

250
00:09:35,660 --> 00:09:38,020
The bill just hasn't fully arrived yet.

251
00:09:38,020 --> 00:09:39,660
The idle token problem.

252
00:09:39,660 --> 00:09:41,900
Knowing what tokens cost is one problem,

253
00:09:41,900 --> 00:09:44,220
knowing how many you're wasting is a different one entirely.

254
00:09:44,220 --> 00:09:47,460
The term idle token waste doesn't appear on any invoice.

255
00:09:47,460 --> 00:09:48,540
It's not a line item

256
00:09:48,540 --> 00:09:50,420
Microsoft surfaces to admins,

257
00:09:50,420 --> 00:09:52,420
and there's no dashboard that calculates it for you.

258
00:09:52,420 --> 00:09:54,100
But it's real, it's measurable in aggregate,

259
00:09:54,100 --> 00:09:57,100
and it's almost certainly one of the largest controllable cost variables

260
00:09:57,100 --> 00:09:58,860
in your AI deployment.

261
00:09:58,860 --> 00:10:01,020
Idle token waste is tokens that get billed,

262
00:10:01,020 --> 00:10:02,740
but don't contribute meaningful value

263
00:10:02,740 --> 00:10:05,500
to any output the user actually sees or uses.

264
00:10:05,500 --> 00:10:06,900
Four categories drive most of it.

265
00:10:06,900 --> 00:10:09,660
The first is background completions that never reach the screen.

266
00:10:09,660 --> 00:10:11,180
The model generates a suggestion.

267
00:10:11,180 --> 00:10:13,620
The user switches tabs, closes the window,

268
00:10:13,620 --> 00:10:15,500
or types over it before it renders.

269
00:10:15,500 --> 00:10:16,940
The inference already ran.

270
00:10:16,940 --> 00:10:18,460
The tokens are already spent.

271
00:10:18,460 --> 00:10:20,020
The second is suggestions that display

272
00:10:20,020 --> 00:10:21,660
but get discarded immediately.

273
00:10:21,660 --> 00:10:23,860
The Copilot autocomplete that appears,

274
00:10:23,860 --> 00:10:25,260
gets read in half a second,

275
00:10:25,260 --> 00:10:27,180
and gets rejected with a backspace.

276
00:10:27,180 --> 00:10:29,700
Generation cost is identical whether the suggestion takes

277
00:10:29,700 --> 00:10:31,380
10 seconds to read or zero.

278
00:10:31,380 --> 00:10:33,620
The third is unnecessary context overhead.

279
00:10:33,620 --> 00:10:35,380
Large persistent instruction blocks,

280
00:10:35,380 --> 00:10:37,780
system prompts injected on every single request,

281
00:10:37,780 --> 00:10:39,980
entire repository indexes that get scanned

282
00:10:39,980 --> 00:10:42,580
even when the task only touches two files.

283
00:10:42,580 --> 00:10:44,620
This context accumulates quietly in the background

284
00:10:44,620 --> 00:10:46,420
of every interaction and charges

285
00:10:46,420 --> 00:10:48,540
at the full input token rate every time.

286
00:10:48,540 --> 00:10:51,780
The fourth, and the most expensive, is agentic chatter.

287
00:10:51,780 --> 00:10:53,580
Agent workflows that loop through planning,

288
00:10:53,580 --> 00:10:55,060
calling, evaluating, and replanning

289
00:10:55,060 --> 00:10:56,860
without converging on a better result.

290
00:10:56,860 --> 00:10:58,860
Each loop is a full token event.

291
00:10:58,860 --> 00:11:01,340
A poorly designed agent that runs six planning cycles

292
00:11:01,340 --> 00:11:03,580
when two would suffice doesn't just cost more.

293
00:11:03,580 --> 00:11:04,740
It costs three times more

294
00:11:04,740 --> 00:11:06,500
and it produces the same output.
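
One common guard against this kind of looping is a cycle budget combined with a stall check. The following Python sketch is a hypothetical illustration, not how any Copilot agent is implemented: `step`, the quality scores, and the 50,000 tokens per cycle are all assumptions made for the example.

```python
from typing import Callable, Tuple


def run_with_budget(
    step: Callable[[int], Tuple[float, int]],
    max_cycles: int,
    min_gain: float,
) -> Tuple[float, int, int]:
    """Run plan/evaluate cycles until quality gains stall or the budget is hit.

    step(i) returns (quality_score, tokens_used) for cycle i.
    Returns (best_score, total_tokens, cycles_run).
    """
    best = float("-inf")
    total_tokens = 0
    cycles = 0
    for i in range(max_cycles):
        score, tokens = step(i)
        total_tokens += tokens
        cycles += 1
        gain = score - best
        best = max(best, score)
        if gain < min_gain:
            break  # this cycle did not improve the result enough; stop paying
    return best, total_tokens, cycles


# Hypothetical agent: quality stops improving after the second cycle.
SCORES = [0.6, 0.9, 0.9, 0.9, 0.9, 0.9]


def fake_step(i: int) -> Tuple[float, int]:
    return SCORES[i], 50_000


guarded = run_with_budget(fake_step, max_cycles=6, min_gain=0.05)
unguarded = run_with_budget(fake_step, max_cycles=6, min_gain=float("-inf"))
```

Note that the guard still pays for one stalled cycle before stopping, so it runs three cycles here rather than the ideal two. It still halves the token spend for the same final score.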

295
00:11:06,500 --> 00:11:08,500
GitHub gave us a rare concrete look at this

296
00:11:08,500 --> 00:11:11,140
when it started publishing internal token consumption reports

297
00:11:11,140 --> 00:11:13,020
for Copilot-powered workflows.

298
00:11:13,020 --> 00:11:17,220
A report from March 2nd, 2026 showed 104.5 million tokens

299
00:11:17,220 --> 00:11:18,620
consumed in a single day

300
00:11:18,620 --> 00:11:22,220
across 75 workflow runs covering 41 distinct workflows.

301
00:11:22,220 --> 00:11:25,020
One workflow alone, a CI failure doctor agent,

302
00:11:25,020 --> 00:11:28,660
consumed 28.1 million tokens across just 13 runs.

303
00:11:28,660 --> 00:11:31,940
That's 2.2 million tokens per run for a diagnostic task.

304
00:11:31,940 --> 00:11:34,380
What makes that number instructive isn't the scale.

305
00:11:34,380 --> 00:11:35,580
It's what happened next.

306
00:11:35,580 --> 00:11:38,740
That same reporting period showed usage 37% lower

307
00:11:38,740 --> 00:11:40,340
than the prior measurement window.

308
00:11:40,340 --> 00:11:45,620
And 56% lower than the all-time peak of 237.8 million tokens

309
00:11:45,620 --> 00:11:48,460
recorded on February 11th, 2026.

310
00:11:48,460 --> 00:11:50,660
The work being done hadn't meaningfully decreased.

311
00:11:50,660 --> 00:11:52,460
The workflows had simply been tuned.

312
00:11:52,460 --> 00:11:53,620
No teams were laid off.

313
00:11:53,620 --> 00:11:55,100
No features were removed.

314
00:11:55,100 --> 00:11:56,820
Engineers reviewed the agent chains,

315
00:11:56,820 --> 00:11:59,660
trimmed unnecessary loops, tightened context injection

316
00:11:59,660 --> 00:12:01,820
and improved routing logic. More than half

317
00:12:01,820 --> 00:12:03,300
the token consumption evaporated.
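
The figures quoted from that report can be cross-checked with a few lines of Python. This sketch uses only the numbers stated in the episode; the implied prior-window figure is derived from the 37% claim and is not a number the report itself is quoted as giving.

```python
# Figures quoted in the episode (millions of tokens).
DAY_TOTAL = 104.5        # March 2, 2026
PEAK_TOTAL = 237.8       # February 11, 2026
CI_DOCTOR_TOTAL = 28.1   # CI failure doctor agent
CI_DOCTOR_RUNS = 13

tokens_per_run = CI_DOCTOR_TOTAL / CI_DOCTOR_RUNS       # about 2.2 million
share_of_day = CI_DOCTOR_TOTAL / DAY_TOTAL              # about 27% of the day's tokens
drop_from_peak = 1 - DAY_TOTAL / PEAK_TOTAL             # about 56%

# Derived, not quoted: what "37% lower than the prior window" implies.
implied_prior_window = DAY_TOTAL / (1 - 0.37)           # about 165.9 million
```

One agent accounting for roughly a quarter of a day's tokens is the kind of concentration that makes tuning worthwhile.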

318
00:12:03,300 --> 00:12:05,420
The closest published analogy for enterprise Copilot

319
00:12:05,420 --> 00:12:07,780
comes from Anthropic's developer tooling ecosystem.

320
00:12:07,780 --> 00:12:10,220
Claude Code, a coding assistant with usage patterns

321
00:12:10,220 --> 00:12:12,220
similar to Copilot's agentic features,

322
00:12:12,220 --> 00:12:16,380
typically runs $150 to $250 per developer per month

323
00:12:16,380 --> 00:12:19,620
before any optimization work is applied, roughly $13

324
00:12:19,620 --> 00:12:20,980
per active day.

325
00:12:20,980 --> 00:12:24,180
After systematic optimization using documented practices,

326
00:12:24,180 --> 00:12:27,180
model routing, context management, session discipline,

327
00:12:27,180 --> 00:12:30,700
tool pruning, teams report 40 to 85% reductions

328
00:12:30,700 --> 00:12:32,340
in total token consumption.

329
00:12:32,340 --> 00:12:34,300
That isn't a marginal efficiency improvement.

330
00:12:34,300 --> 00:12:35,820
That's a fundamental restructuring

331
00:12:35,820 --> 00:12:37,740
of what the system actually needs to do

332
00:12:37,740 --> 00:12:39,780
versus what it was defaulting to do.

333
00:12:39,780 --> 00:12:42,380
For enterprise Copilot, there are no formal published benchmarks

334
00:12:42,380 --> 00:12:44,060
yet on idle waste rates.

335
00:12:44,060 --> 00:12:46,180
The token-based billing model is too new

336
00:12:46,180 --> 00:12:48,580
and the telemetry most organizations have access to

337
00:12:48,580 --> 00:12:50,740
doesn't separate productive from wasted tokens

338
00:12:50,740 --> 00:12:52,300
at a granular level.

339
00:12:52,300 --> 00:12:54,220
But working from the GitHub workflow data,

340
00:12:54,220 --> 00:12:55,780
the Claude Code optimization evidence,

341
00:12:55,780 --> 00:12:57,340
and the cloud infrastructure waste patterns

342
00:12:57,340 --> 00:12:59,500
that FinOps teams have tracked for years,

343
00:12:59,500 --> 00:13:03,100
a reasonable working hypothesis puts unoptimized enterprise

344
00:13:03,100 --> 00:13:06,100
idle token waste somewhere between 30 and 60%

345
00:13:06,100 --> 00:13:07,820
of total billed tokens.

346
00:13:07,820 --> 00:13:10,020
That means somewhere between a third and more than half

347
00:13:10,020 --> 00:13:12,020
of what you're paying for AI computation right now

348
00:13:12,020 --> 00:13:14,140
isn't producing value anyone can point to.

349
00:13:14,140 --> 00:13:17,380
The billing infrastructure to make this visible is arriving.

350
00:13:17,380 --> 00:13:20,340
GitHub's switch to AI credits on June 1st, 2026

351
00:13:20,340 --> 00:13:23,500
means every unproductive token is now a direct cost line,

352
00:13:23,500 --> 00:13:24,540
not an abstraction.

353
00:13:24,540 --> 00:13:27,460
Other Copilot surfaces are moving in the same direction.

354
00:13:27,460 --> 00:13:30,500
Idle waste is a structural problem with a structural fix.

355
00:13:30,500 --> 00:13:32,620
But there's a behavioral problem sitting on top of it

356
00:13:32,620 --> 00:13:34,780
that makes the structural problem worse.

357
00:13:34,780 --> 00:13:38,820
The lazy prompting tax. The idle token problem is structural.

358
00:13:38,820 --> 00:13:41,260
It lives in the architecture of how agents loop,

359
00:13:41,260 --> 00:13:44,580
how context gets injected, how background processes fire.

360
00:13:44,580 --> 00:13:46,260
You can fix it with engineering.

361
00:13:46,260 --> 00:13:49,060
What's harder to fix is what happens at the keyboard.

362
00:13:49,060 --> 00:13:51,540
Most users interact with Copilot the way they learned

363
00:13:51,540 --> 00:13:54,700
to interact with Google: broad query, scan the results,

364
00:13:54,700 --> 00:13:56,780
refine, repeat. Type something vague,

365
00:13:56,780 --> 00:13:59,540
see what comes back, add a follow-up, narrow it down.

366
00:13:59,540 --> 00:14:01,220
This worked fine for search because the cost

367
00:14:01,220 --> 00:14:02,980
of a bad search query is zero.

368
00:14:02,980 --> 00:14:04,540
The cost of a bad prompt is not zero.

369
00:14:04,540 --> 00:14:06,620
Every clarification turn in a Copilot conversation

370
00:14:06,620 --> 00:14:07,820
is a token event.

371
00:14:07,820 --> 00:14:09,300
The first message goes in as input.

372
00:14:09,300 --> 00:14:10,700
The response comes back as output.

373
00:14:10,700 --> 00:14:12,100
The follow-up message goes in carrying

374
00:14:12,100 --> 00:14:14,700
the full prior conversation as context.

375
00:14:14,700 --> 00:14:16,820
The next response comes back longer now

376
00:14:16,820 --> 00:14:19,340
because the model is synthesizing more history.

377
00:14:19,340 --> 00:14:20,980
A five turn conversation to arrive

378
00:14:20,980 --> 00:14:24,700
at one usable paragraph can consume 10 times the tokens

379
00:14:24,700 --> 00:14:27,220
that a single well-structured prompt would have required.

380
00:14:27,220 --> 00:14:29,660
Same end result, 10 times the compute.

381
00:14:29,660 --> 00:14:32,060
Under flat licensing, this was invisible.

382
00:14:32,060 --> 00:14:33,380
The meter wasn't running.

383
00:14:33,380 --> 00:14:35,540
Under consumption billing, every one of those turns

384
00:14:35,540 --> 00:14:36,420
has a price.
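
The compounding effect of resending history can be sketched in a few lines of Python. The message sizes below are assumptions chosen for illustration, and replies are held at a constant length for simplicity, even though the episode notes they tend to grow.

```python
from typing import Tuple


def conversation_tokens(turns: int, user_tokens: int, reply_tokens: int) -> Tuple[int, int]:
    """Total (input, output) tokens when every turn resends the full history."""
    history = 0
    total_in = 0
    total_out = 0
    for _ in range(turns):
        total_in += history + user_tokens   # prior conversation plus the new message
        total_out += reply_tokens
        history += user_tokens + reply_tokens
    return total_in, total_out


# Assumed sizes: five vague 50-token messages, each answered with 400 tokens...
chat_in, chat_out = conversation_tokens(turns=5, user_tokens=50, reply_tokens=400)

# ...versus one well-structured 200-token prompt answered with 400 tokens.
single_in, single_out = conversation_tokens(turns=1, user_tokens=200, reply_tokens=400)

ratio = (chat_in + chat_out) / (single_in + single_out)
```

With these assumed sizes the five-turn path uses 6,750 tokens against 600 for the single prompt, a ratio of about 11, in line with the roughly tenfold figure above.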

385
00:14:36,420 --> 00:14:38,860
The context window problem compounds this.

386
00:14:38,860 --> 00:14:41,260
When users aren't sure what information the model needs,

387
00:14:41,260 --> 00:14:43,500
the natural instinct is to give it everything,

388
00:14:43,500 --> 00:14:46,380
paste in the entire contract, attach the full email chain

389
00:14:46,380 --> 00:14:49,140
going back six months, drop the whole codebase into the prompt

390
00:14:49,140 --> 00:14:51,860
and ask why one function is behaving unexpectedly.

391
00:14:51,860 --> 00:14:54,700
This feels thorough. It's actually expensive.

392
00:14:54,700 --> 00:14:57,420
Input tokens are the cheaper side of the pricing equation,

393
00:14:57,420 --> 00:15:00,300
but context window abuse drives them to scale in ways

394
00:15:00,300 --> 00:15:02,020
that quickly overwhelm any savings

395
00:15:02,020 --> 00:15:03,460
from using a cheaper model.

396
00:15:03,460 --> 00:15:05,740
And then the output arrives. Because the model received

397
00:15:05,740 --> 00:15:08,540
a large unstructured input, it generates a long,

398
00:15:08,540 --> 00:15:10,940
comprehensive response that addresses every dimension

399
00:15:10,940 --> 00:15:13,300
of what you gave it, including all the dimensions

400
00:15:13,300 --> 00:15:15,100
you didn't actually need.

401
00:15:15,100 --> 00:15:17,300
Output tokens cost three to 10 times more

402
00:15:17,300 --> 00:15:19,620
than input tokens depending on the model.

403
00:15:19,620 --> 00:15:21,620
The verbose response to the vague prompt

404
00:15:21,620 --> 00:15:24,340
is the single most expensive interaction pattern

405
00:15:24,340 --> 00:15:25,900
in any AI deployment.

406
00:15:25,900 --> 00:15:27,740
Reasoning models make this worse.

407
00:15:27,740 --> 00:15:30,620
When a model runs extended chain of thought processing,

408
00:15:30,620 --> 00:15:32,500
working through a problem step by step

409
00:15:32,500 --> 00:15:35,220
before producing a final answer, it generates

410
00:15:35,220 --> 00:15:36,980
what are called thinking tokens.

411
00:15:36,980 --> 00:15:39,460
These are internal reasoning steps that consume compute,

412
00:15:39,460 --> 00:15:41,420
but don't appear in the visible response.

413
00:15:41,420 --> 00:15:43,260
On some reasoning model configurations,

414
00:15:43,260 --> 00:15:46,420
effective cost per interaction can run three to nine times higher

415
00:15:46,420 --> 00:15:47,860
than the headline per token rate

416
00:15:47,860 --> 00:15:49,860
because of these hidden reasoning steps.

417
00:15:49,860 --> 00:15:51,780
A prompt that looks like it should cost a few cents

418
00:15:51,780 --> 00:15:54,660
can cost 30 to 90 cents when a reasoning model spins up

419
00:15:54,660 --> 00:15:56,940
a full deliberation chain for what was essentially

420
00:15:56,940 --> 00:15:57,900
a simple task.
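
The token arithmetic described here can be sketched in a few lines. This is a minimal illustration, not any vendor's billing logic: the prices are hypothetical, the 5x output-to-input ratio is just one point inside the three-to-10-times range mentioned above, and it assumes thinking tokens are billed at the output rate.

```python
def interaction_cost(input_tokens, output_tokens, thinking_tokens=0,
                     input_price_per_m=1.0, output_price_per_m=5.0):
    """Estimated dollar cost of one interaction.

    Prices are hypothetical dollars per million tokens.
    Assumption: hidden thinking tokens are billed like output tokens.
    """
    billed_output = output_tokens + thinking_tokens
    total = (input_tokens * input_price_per_m
             + billed_output * output_price_per_m)
    return total / 1_000_000


# Same prompt and same visible answer, with and without a reasoning chain.
plain = interaction_cost(input_tokens=1_000, output_tokens=500)
reasoning = interaction_cost(input_tokens=1_000, output_tokens=500,
                             thinking_tokens=4_000)

# How much the hidden reasoning steps inflate the effective cost.
multiplier = reasoning / plain
print(f"plain: ${plain:.4f}, reasoning: ${reasoning:.4f}, "
      f"multiplier: {multiplier:.1f}x")
```

With these made-up numbers, the visible response is identical, but the interaction costs several times more once the thinking tokens are counted.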

421
00:15:57,900 --> 00:15:59,500
None of this is a user failure.

422
00:15:59,500 --> 00:16:01,740
Users weren't trained on prompt economics

423
00:16:01,740 --> 00:16:04,060
because under the legacy flat licensing model,

424
00:16:04,060 --> 00:16:06,020
prompt economics didn't exist for them.

425
00:16:06,020 --> 00:16:07,860
There was no feedback loop, no signal

426
00:16:07,860 --> 00:16:10,940
that a poorly structured prompt cost more than a well-structured one.

427
00:16:10,940 --> 00:16:12,820
The behavior developed in an environment

428
00:16:12,820 --> 00:16:15,860
where tokens were free and it persisted into an environment

429
00:16:15,860 --> 00:16:17,220
where they aren't.

430
00:16:17,220 --> 00:16:19,300
This is the core of the lazy prompting tax.

431
00:16:19,300 --> 00:16:21,100
It's not that users are careless.

432
00:16:21,100 --> 00:16:22,580
It's that the system never taught them

433
00:16:22,580 --> 00:16:24,220
that carelessness had a cost.

434
00:16:24,220 --> 00:16:27,060
Every vague prompt, every unnecessary attachment,

435
00:16:27,060 --> 00:16:29,100
every six-turn clarification loop

436
00:16:29,100 --> 00:16:31,260
that could have been one well-scoped request:

437
00:16:31,260 --> 00:16:34,180
none of that registered as waste under the old model.

438
00:16:34,180 --> 00:16:35,980
So the behavior is a governance failure

439
00:16:35,980 --> 00:16:37,900
wearing a user behavior disguise.

440
00:16:37,900 --> 00:16:40,780
The organization deployed a consumption-sensitive tool

441
00:16:40,780 --> 00:16:43,580
without building consumption awareness into the deployment.

442
00:16:43,580 --> 00:16:46,620
No prompt guidelines, no context discipline policies,

443
00:16:46,620 --> 00:16:48,620
no training on when to use a reasoning model

444
00:16:48,620 --> 00:16:51,060
versus a standard one or when to reset a session

445
00:16:51,060 --> 00:16:53,020
that's accumulated irrelevant history.

446
00:16:53,020 --> 00:16:54,620
The prompting problem is fixable,

447
00:16:54,620 --> 00:16:57,420
but fixing it sits on a foundation that's already unstable

448
00:16:57,420 --> 00:16:59,500
because the outputs users are trying to verify

449
00:16:59,500 --> 00:17:01,220
aren't always correct.

450
00:17:01,220 --> 00:17:03,980
The verification tax. Layer the hallucination problem

451
00:17:03,980 --> 00:17:05,540
on top of everything we've just covered

452
00:17:05,540 --> 00:17:07,980
and the cost picture gets significantly darker.

453
00:17:07,980 --> 00:17:10,420
Here's what the verification tax actually is.

454
00:17:10,420 --> 00:17:12,900
The extra human time required to check, correct,

455
00:17:12,900 --> 00:17:14,660
and document AI-generated work

456
00:17:14,660 --> 00:17:16,620
before it meets professional standards.

457
00:17:16,620 --> 00:17:18,180
Not the time to generate the output.

458
00:17:18,180 --> 00:17:20,020
The time spent after the output arrives

459
00:17:20,020 --> 00:17:22,420
by someone qualified enough to know when it's wrong.

460
00:17:22,420 --> 00:17:24,420
And it is wrong with surprising frequency.

461
00:17:24,420 --> 00:17:27,100
Research benchmarking LLM accuracy

462
00:17:27,100 --> 00:17:29,020
on open-ended knowledge-heavy tasks

463
00:17:29,020 --> 00:17:31,700
puts hallucination rates between 50 and 82%

464
00:17:31,700 --> 00:17:34,300
depending on model, domain, and task structure.

465
00:17:34,300 --> 00:17:36,900
Not edge cases, not unusual prompts.

466
00:17:36,900 --> 00:17:39,740
Routine professional tasks in areas like tax research,

467
00:17:39,740 --> 00:17:41,420
compliance review, legal analysis,

468
00:17:41,420 --> 00:17:43,140
and client-facing documentation.

469
00:17:43,140 --> 00:17:45,540
The model produces fluent, confident prose

470
00:17:45,540 --> 00:17:47,100
that reads like it was written by someone

471
00:17:47,100 --> 00:17:48,460
who knows what they're talking about.

472
00:17:48,460 --> 00:17:50,380
The problem is it sometimes wasn't.

473
00:17:50,380 --> 00:17:53,060
The model pattern-matched to what a correct answer looks like

474
00:17:53,060 --> 00:17:54,340
without actually being correct.

475
00:17:54,340 --> 00:17:55,820
This creates a verification burden

476
00:17:55,820 --> 00:17:58,700
that doesn't disappear because the AI got faster at drafting.

477
00:17:58,700 --> 00:18:01,260
A tax professional reviewing an AI-generated analysis

478
00:18:01,260 --> 00:18:04,380
of a multi-jurisdiction filing still has to read every line.

479
00:18:04,380 --> 00:18:06,900
A compliance officer reviewing an AI-drafted policy

480
00:18:06,900 --> 00:18:09,100
still has to verify every citation.

481
00:18:09,100 --> 00:18:11,540
A lawyer reviewing an AI-produced contract summary

482
00:18:11,540 --> 00:18:13,180
still has to check every clause reference

483
00:18:13,180 --> 00:18:14,580
against the source document.

484
00:18:14,580 --> 00:18:16,260
The AI may have cut the drafting time

485
00:18:16,260 --> 00:18:18,180
from two hours to 15 minutes.

486
00:18:18,180 --> 00:18:20,220
The review time didn't shrink proportionally

487
00:18:20,220 --> 00:18:22,300
because the reviewer has no way to know in advance

488
00:18:22,300 --> 00:18:24,780
which parts of the output are correct and which aren't.

489
00:18:24,780 --> 00:18:25,740
They have to check all of it.

490
00:18:25,740 --> 00:18:27,780
Practitioner analysis of verification costs

491
00:18:27,780 --> 00:18:30,700
in regulated workflows puts the overhead at up to 48%

492
00:18:30,700 --> 00:18:33,220
above baseline in judgment-heavy processes.

493
00:18:33,220 --> 00:18:36,740
Meaning, if the underlying task without AI takes 10 hours,

494
00:18:36,740 --> 00:18:38,620
a workflow that uses AI for drafting

495
00:18:38,620 --> 00:18:40,740
but requires professional verification

496
00:18:40,740 --> 00:18:43,220
can take up to 14.8 hours total.

497
00:18:43,220 --> 00:18:45,060
The AI saves time on generation.

498
00:18:45,060 --> 00:18:48,020
The verification absorbs a significant portion of those savings

499
00:18:48,020 --> 00:18:50,060
and in some cases, all of them.

500
00:18:50,060 --> 00:18:53,380
The math gets uncomfortable when you apply fully loaded labor costs

501
00:18:53,380 --> 00:18:54,820
to professional review time.

502
00:18:54,820 --> 00:18:57,740
At $80 to $200 per hour for a senior knowledge worker,

503
00:18:57,740 --> 00:19:00,100
the person qualified to catch what the AI got wrong,

504
00:19:00,100 --> 00:19:02,100
two minutes of review per AI answer costs

505
00:19:02,100 --> 00:19:04,460
between $2.67 and $6.67.

506
00:19:04,460 --> 00:19:07,180
Ten minutes costs between $13 and $33.

507
00:19:07,180 --> 00:19:09,820
Scale that across a day of AI-assisted work

508
00:19:09,820 --> 00:19:13,420
in a regulated environment, across a team, across a quarter,

509
00:19:13,420 --> 00:19:16,820
and the free AI draft has a very real cost attached to it.

510
00:19:16,820 --> 00:19:18,420
It just doesn't appear on your Azure bill.
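
The review-cost arithmetic is simple enough to reproduce. The sketch below uses the $80 and $200 hourly rates quoted above. The scaling inputs (answers per day, team size, working days per quarter) are hypothetical placeholders, not figures from the episode.

```python
def review_cost(minutes, hourly_rate):
    """Dollar cost of a reviewer spending `minutes` at `hourly_rate`."""
    return minutes / 60 * hourly_rate


def quarterly_review_cost(answers_per_day, minutes_per_answer, hourly_rate,
                          team_size, working_days=65):
    """Scale per-answer review cost across a team and a quarter.

    All scaling inputs are illustrative assumptions.
    """
    per_answer = review_cost(minutes_per_answer, hourly_rate)
    return per_answer * answers_per_day * team_size * working_days


for minutes in (2, 10):
    low = review_cost(minutes, 80)
    high = review_cost(minutes, 200)
    print(f"{minutes} min of review: ${low:.2f} to ${high:.2f}")

# Hypothetical team: 10 reviewers, 20 AI answers a day each,
# 2 minutes of review per answer at $80 per hour.
team_quarter = quarterly_review_cost(20, 2, 80, team_size=10)
print(f"one quarter of review time: ${team_quarter:,.0f}")
```

Even at the low end of both ranges, the review time for a small hypothetical team adds up to tens of thousands of dollars per quarter.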

511
00:19:18,420 --> 00:19:19,740
There's a compounding problem here

512
00:19:19,740 --> 00:19:21,660
that the token economics make worse.

513
00:19:21,660 --> 00:19:23,460
Per-token pricing doesn't discriminate

514
00:19:23,460 --> 00:19:25,300
between right answers and wrong ones.

515
00:19:25,300 --> 00:19:28,180
A hallucinated response costs the same as a correct one.

516
00:19:28,180 --> 00:19:30,260
A long hallucinated reasoning chain,

517
00:19:30,260 --> 00:19:32,420
where the model doubles down on an incorrect path

518
00:19:32,420 --> 00:19:33,780
and elaborates at length,

519
00:19:33,780 --> 00:19:35,780
costs more than a short correct answer

520
00:19:35,780 --> 00:19:37,900
because it generates more output tokens.

521
00:19:37,900 --> 00:19:40,300
You are literally paying more in token terms

522
00:19:40,300 --> 00:19:41,900
for the wrong answer that then requires

523
00:19:41,900 --> 00:19:43,500
the most review time to untangle.

524
00:19:43,500 --> 00:19:46,020
What this means practically is that the use cases

525
00:19:46,020 --> 00:19:48,140
where the verification tax hits hardest

526
00:19:48,140 --> 00:19:50,580
are exactly the use cases that organizations

527
00:19:50,580 --> 00:19:53,860
tend to deploy AI first because they seem high value.

528
00:19:53,860 --> 00:19:57,940
Legal, compliance, tax, client-facing deliverables,

529
00:19:57,940 --> 00:20:00,020
regulated communications.

530
00:20:00,020 --> 00:20:02,020
These are the domains where professional judgment

531
00:20:02,020 --> 00:20:05,260
carries the most weight, where the cost of error is highest,

532
00:20:05,260 --> 00:20:07,620
and where the reviewer's time is most expensive.

533
00:20:07,620 --> 00:20:10,020
They're also the domains where AI's hallucination rate

534
00:20:10,020 --> 00:20:11,740
tends to be most consequential

535
00:20:11,740 --> 00:20:14,700
because mistakes in these areas don't just require correction,

536
00:20:14,700 --> 00:20:17,540
they require documentation that the correction happened.

537
00:20:17,540 --> 00:20:19,180
The honest framing is this.

538
00:20:19,180 --> 00:20:21,100
In high-stakes professional workflows,

539
00:20:21,100 --> 00:20:23,780
AI often shifts work rather than eliminates it.

540
00:20:23,780 --> 00:20:26,300
Time savings happen at the generation stage.

541
00:20:26,300 --> 00:20:27,900
They get partially or fully recaptured

542
00:20:27,900 --> 00:20:29,220
at the verification stage.

543
00:20:29,220 --> 00:20:30,300
The net benefit is real,

544
00:20:30,300 --> 00:20:32,940
but smaller than headline productivity numbers suggest.

545
00:20:32,940 --> 00:20:34,500
And it has a hard ceiling determined

546
00:20:34,500 --> 00:20:36,620
by how much of the task requires judgment

547
00:20:36,620 --> 00:20:39,220
that only a qualified human can apply.

548
00:20:39,220 --> 00:20:41,060
This isn't an argument against using AI

549
00:20:41,060 --> 00:20:42,420
in professional workflows.

550
00:20:42,420 --> 00:20:45,220
It's an argument for being precise about which workflows,

551
00:20:45,220 --> 00:20:46,900
which models, which risk thresholds

552
00:20:46,900 --> 00:20:49,860
and what verification process accompanies the deployment.

553
00:20:49,860 --> 00:20:51,420
The verification tax is worst

554
00:20:51,420 --> 00:20:53,860
where it was designed into the use case from the start

555
00:20:53,860 --> 00:20:56,660
and invisible until someone looks at the billing.

556
00:20:56,660 --> 00:20:58,460
Zombie seats and the adoption gap.

557
00:20:58,460 --> 00:21:01,260
The verification tax is worst in the wrong use cases,

558
00:21:01,260 --> 00:21:02,580
which raises the question,

559
00:21:02,580 --> 00:21:04,820
where is Copilot actually being deployed?

560
00:21:04,820 --> 00:21:07,980
70% of Fortune 500 companies have officially adopted

561
00:21:07,980 --> 00:21:09,980
M365 Copilot.

562
00:21:09,980 --> 00:21:11,540
That number gets cited constantly

563
00:21:11,540 --> 00:21:13,980
in Microsoft earnings calls and analyst briefings

564
00:21:13,980 --> 00:21:16,260
as evidence of enterprise AI momentum.

565
00:21:16,260 --> 00:21:17,940
What it actually describes is pilots,

566
00:21:17,940 --> 00:21:20,100
limited rollouts and licensing purchases

567
00:21:20,100 --> 00:21:22,580
that preceded any serious change management work.

568
00:21:22,580 --> 00:21:25,180
Adoption in the reporting sense means seats were bought.

569
00:21:25,180 --> 00:21:27,340
It doesn't mean those seats are producing anything.

570
00:21:27,340 --> 00:21:29,140
The conversion rate tells the real story.

571
00:21:29,140 --> 00:21:32,860
Across organizations with licensed M365 Copilot access,

572
00:21:32,860 --> 00:21:36,020
only about 35.8% of employees actively use it.

573
00:21:36,020 --> 00:21:38,180
Roughly one in three people with a paid license

574
00:21:38,180 --> 00:21:40,700
opens Copilot with any meaningful frequency.

575
00:21:40,700 --> 00:21:42,740
The other two thirds have a license assigned,

576
00:21:42,740 --> 00:21:45,340
a $30 monthly charge appearing on the invoice

577
00:21:45,340 --> 00:21:46,940
and an interaction history

578
00:21:46,940 --> 00:21:49,620
that might show two sessions in the last month.

579
00:21:49,620 --> 00:21:51,260
One to summarize an email,

580
00:21:51,260 --> 00:21:55,220
one out of curiosity after someone mentioned it in a meeting.

581
00:21:55,220 --> 00:21:57,740
Compare that to ChatGPT's voluntary conversion rate.

582
00:21:57,740 --> 00:22:01,180
Approximately 83.1% of users who have access to it

583
00:22:01,180 --> 00:22:02,620
actually use it regularly.

584
00:22:02,620 --> 00:22:05,020
That gap isn't explained by capability differences.

585
00:22:05,020 --> 00:22:07,460
Copilot has access to organizational data,

586
00:22:07,460 --> 00:22:08,820
deep Office integration,

587
00:22:08,820 --> 00:22:10,980
and the same underlying model infrastructure.

588
00:22:10,980 --> 00:22:12,460
The gap is explained by fit.

589
00:22:12,460 --> 00:22:14,900
ChatGPT meets users where they already are,

590
00:22:14,900 --> 00:22:16,740
in a frictionless consumer interface,

591
00:22:16,740 --> 00:22:18,940
doing tasks they chose for themselves.

592
00:22:18,940 --> 00:22:21,180
Copilot arrives inside a workplace context

593
00:22:21,180 --> 00:22:23,540
where nobody redesigned the workflow around it,

594
00:22:23,540 --> 00:22:25,700
nobody trained people on when to use it,

595
00:22:25,700 --> 00:22:28,540
and the default answer to "Should I use Copilot for this?"

596
00:22:28,540 --> 00:22:30,140
is still "I'm not sure."

597
00:22:30,140 --> 00:22:32,980
This is the zombie seat: a paid $30-a-month license

598
00:22:32,980 --> 00:22:35,420
assigned to a user who has it, knows it's there,

599
00:22:35,420 --> 00:22:37,820
and generates near-zero measurable value from it.

600
00:22:37,820 --> 00:22:39,780
Not because they're resistant to AI,

601
00:22:39,780 --> 00:22:42,380
but because nobody connected the license to a specific task,

602
00:22:42,380 --> 00:22:44,260
a specific process, a specific outcome

603
00:22:44,260 --> 00:22:46,020
that made using it the obvious choice.

604
00:22:46,020 --> 00:22:47,620
The scale of this problem is concrete.

605
00:22:47,620 --> 00:22:50,940
At 5,000 licensed users with 35% active usage,

606
00:22:50,940 --> 00:22:54,140
you have roughly 3,250 people in the zombie category.

607
00:22:54,140 --> 00:22:58,700
That's $97,500 per month, $1.17 million per year,

608
00:22:58,700 --> 00:23:01,700
in license spend that is producing nothing you can point to.
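
The zombie-seat figure follows directly from three inputs: licensed users, the active-usage rate, and the $30 monthly add-on price. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def zombie_seat_spend(licensed_users, active_rate, price_per_seat_month):
    """Return (idle seats, monthly spend, annual spend) on unused licenses."""
    idle_seats = round(licensed_users * (1 - active_rate))
    monthly = idle_seats * price_per_seat_month
    return idle_seats, monthly, monthly * 12


# 5,000 licensed users, 35% active, $30 per seat per month.
idle, monthly, yearly = zombie_seat_spend(5_000, 0.35, 30)
print(f"{idle:,} idle seats, ${monthly:,} per month, ${yearly:,} per year")
```

Plugging in your own tenant's license count and measured active rate gives the equivalent number for your environment.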

609
00:23:01,700 --> 00:23:03,780
And that figure only counts the Copilot add-on.

610
00:23:03,780 --> 00:23:06,220
It doesn't include the E3 or E5 base licenses

611
00:23:06,220 --> 00:23:07,700
those users are also sitting on,

612
00:23:07,700 --> 00:23:10,300
or the opportunity cost of having purchased AI access

613
00:23:10,300 --> 00:23:12,020
that isn't functioning as an asset.

614
00:23:12,020 --> 00:23:14,980
Organizations that have run formal license optimization cycles

615
00:23:14,980 --> 00:23:18,940
consistently find 10 to 20% of their total M365 costs

616
00:23:18,940 --> 00:23:21,140
are recoverable through identifying and reclaiming

617
00:23:21,140 --> 00:23:23,020
idle or redundant assignments.

618
00:23:23,020 --> 00:23:25,980
That initial pass usually surfaces the clearest waste.

619
00:23:25,980 --> 00:23:28,420
Departed employees who still hold licenses,

620
00:23:28,420 --> 00:23:30,660
users whose role changed and no longer justifies

621
00:23:30,660 --> 00:23:33,100
their current SKU, add-ons assigned broadly

622
00:23:33,100 --> 00:23:35,500
for a project that concluded months ago.

623
00:23:35,500 --> 00:23:37,860
Organizations that move from periodic audits

624
00:23:37,860 --> 00:23:40,620
to continuous automated governance push that figure

625
00:23:40,620 --> 00:23:42,660
toward 20 to 30% over time.

626
00:23:42,660 --> 00:23:46,220
For a 5,000 seat environment with a fully loaded M365 stack

627
00:23:46,220 --> 00:23:48,580
averaging $65 per user per month,

628
00:23:48,580 --> 00:23:52,620
20% waste recovery represents $780,000 per year.
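
The recovery estimate is annual M365 spend multiplied by a recovery percentage. The sketch below reproduces the 20% case and also shows the 10% and 30% ends of the ranges mentioned above. The function name is illustrative.

```python
def annual_recovery(seats, cost_per_user_month, recovery_pct):
    """Annual dollars recoverable at a given recovery percentage."""
    annual_spend = seats * cost_per_user_month * 12
    return annual_spend * recovery_pct / 100


# 5,000 seats at a fully loaded $65 per user per month.
for pct in (10, 20, 30):
    recovered = annual_recovery(5_000, 65, pct)
    print(f"{pct}% recovery: ${recovered:,.0f} per year")
```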

629
00:23:52,620 --> 00:23:55,020
That's not cost-cutting, that's a reallocation opportunity.

630
00:23:55,020 --> 00:23:57,580
Recovered budget that can fund the Copilot expansion

631
00:23:57,580 --> 00:23:59,820
for roles where it actually generates return

632
00:23:59,820 --> 00:24:02,060
or the Copilot Studio agent development

633
00:24:02,060 --> 00:24:03,540
for the two or three workflows

634
00:24:03,540 --> 00:24:06,620
where AI delivers a genuinely measurable difference.

635
00:24:06,620 --> 00:24:08,020
This is the self-funding narrative

636
00:24:08,020 --> 00:24:10,380
that the most disciplined IT organizations are running

637
00:24:10,380 --> 00:24:11,620
in 2026.

638
00:24:11,620 --> 00:24:13,980
Don't ask the board for more AI budget.

639
00:24:13,980 --> 00:24:15,660
Audit the AI budget you already have,

640
00:24:15,660 --> 00:24:17,260
recover the waste and redirect it

641
00:24:17,260 --> 00:24:20,540
toward the 35% of users who will actually use what you give them.

642
00:24:20,540 --> 00:24:21,940
The money is already in the system.

643
00:24:21,940 --> 00:24:23,900
It's just sitting in the wrong seats.

644
00:24:23,900 --> 00:24:25,020
Zombie seats are a symptom.

645
00:24:25,020 --> 00:24:26,660
The root cause is a deployment model

646
00:24:26,660 --> 00:24:28,580
that prioritized speed over fit.

647
00:24:28,580 --> 00:24:31,060
The blanket rollout problem.

648
00:24:31,060 --> 00:24:33,220
The zombie seat problem didn't happen by accident.

649
00:24:33,220 --> 00:24:34,460
It was the predictable outcome

650
00:24:34,460 --> 00:24:36,300
of a specific procurement decision

651
00:24:36,300 --> 00:24:39,460
that most organizations made in 2023 and 2024

652
00:24:39,460 --> 00:24:40,900
and many are still making today.

653
00:24:40,900 --> 00:24:42,740
Call it the license-first approach:

654
00:24:42,740 --> 00:24:44,980
buy the seats quickly, run a light training session,

655
00:24:44,980 --> 00:24:46,300
send an announcement email

656
00:24:46,300 --> 00:24:48,060
and let adoption happen organically.

657
00:24:48,060 --> 00:24:49,780
The reasoning felt sound at the time.

658
00:24:49,780 --> 00:24:51,820
Broad access creates broad familiarity.

659
00:24:51,820 --> 00:24:54,660
Familiarity creates usage, usage creates value,

660
00:24:54,660 --> 00:24:57,100
get the licenses in place and the behavior will follow.

661
00:24:57,100 --> 00:24:58,540
The data doesn't support it.

662
00:24:58,540 --> 00:25:02,180
The license-first approach consistently produces ROI below 100%,

663
00:25:02,180 --> 00:25:03,980
with payback timelines that stretch past

664
00:25:03,980 --> 00:25:05,820
any reasonable planning horizon

665
00:25:05,820 --> 00:25:07,180
and license utilization rates

666
00:25:07,180 --> 00:25:09,060
that mirror what we just described.

667
00:25:09,060 --> 00:25:11,340
Broad access didn't drive broad adoption.

668
00:25:11,340 --> 00:25:13,100
It drove broad license assignment,

669
00:25:13,100 --> 00:25:14,820
which is a different thing entirely.

670
00:25:14,820 --> 00:25:17,620
The failure isn't mysterious once you look at it structurally.

671
00:25:17,620 --> 00:25:19,460
Buying a license doesn't change a workflow.

672
00:25:19,460 --> 00:25:21,700
It doesn't redesign the process around the tool,

673
00:25:21,700 --> 00:25:24,340
identify the specific tasks where AI fits,

674
00:25:24,340 --> 00:25:26,060
build the muscle memory for using it

675
00:25:26,060 --> 00:25:28,940
or create accountability for whether it's being used effectively.

676
00:25:28,940 --> 00:25:30,060
A license is access.

677
00:25:30,060 --> 00:25:31,660
Access without a reason to use it

678
00:25:31,660 --> 00:25:33,980
produces the conversion rate we're looking at.

679
00:25:33,980 --> 00:25:35,700
One in three people actually engaging

680
00:25:35,700 --> 00:25:38,340
with a tool that costs the same whether it's touched or not.

681
00:25:38,340 --> 00:25:40,300
There's a second consequence of blanket rollout

682
00:25:40,300 --> 00:25:42,900
that gets less attention than the adoption gap,

683
00:25:42,900 --> 00:25:45,420
but it's equally damaging: governance collapses.

684
00:25:45,420 --> 00:25:48,020
When every employee in the organization has access to an AI

685
00:25:48,020 --> 00:25:50,740
that can query organizational data across SharePoint, Teams,

686
00:25:50,740 --> 00:25:53,900
and email, and nobody has defined what that AI should

687
00:25:53,900 --> 00:25:56,500
and shouldn't touch, what gets logged, what gets reviewed,

688
00:25:56,500 --> 00:25:58,700
what constitutes appropriate use,

689
00:25:58,700 --> 00:26:00,740
you haven't deployed AI responsibly.

690
00:26:00,740 --> 00:26:04,060
You've deployed AI chaotically and called it democratization.

691
00:26:04,060 --> 00:26:07,020
ROI data from 2026 deployments is consistent

692
00:26:07,020 --> 00:26:09,660
on the relationship between usage depth and return.

693
00:26:09,660 --> 00:26:12,300
Organizations where 70% or more of licensed users

694
00:26:12,300 --> 00:26:15,700
invoke Copilot daily in at least one core workflow

695
00:26:15,700 --> 00:26:17,980
see three to four times the ROI of organizations

696
00:26:17,980 --> 00:26:20,260
stuck at 30% weekly active usage.

697
00:26:20,260 --> 00:26:21,540
The license cost is identical.

698
00:26:21,540 --> 00:26:23,180
The outcome is not. What separates them

699
00:26:23,180 --> 00:26:24,420
isn't the quality of the AI.

700
00:26:24,420 --> 00:26:26,460
It's whether the deployment was built around a workflow

701
00:26:26,460 --> 00:26:28,540
transformation or just a software rollout.

702
00:26:28,540 --> 00:26:30,060
The correct model isn't complicated

703
00:26:30,060 --> 00:26:33,100
but it runs against how most procurement cycles operate.

704
00:26:33,100 --> 00:26:34,940
You start with a concentrated pilot,

705
00:26:34,940 --> 00:26:36,940
a defined set of high value roles,

706
00:26:36,940 --> 00:26:40,020
specific workflows with measurable before and after metrics,

707
00:26:40,020 --> 00:26:42,460
dedicated training, and an 8-to-12-week window

708
00:26:42,460 --> 00:26:44,300
to generate real data.

709
00:26:44,300 --> 00:26:46,860
Not a survey asking people how they feel about Copilot.

710
00:26:46,860 --> 00:26:48,820
Actual telemetry, actual cycle times,

711
00:26:48,820 --> 00:26:51,020
actual throughput numbers. Then you expand

712
00:26:51,020 --> 00:26:52,700
only where the data supports it,

713
00:26:52,700 --> 00:26:54,700
role by role, process by process,

714
00:26:54,700 --> 00:26:56,260
with governance controls in place

715
00:26:56,260 --> 00:26:59,500
before the next wave of licenses ships, not after.

716
00:26:59,500 --> 00:27:02,300
Most procurement cycles work on annual commitment timelines

717
00:27:02,300 --> 00:27:04,420
with volume-based pricing pressure at renewal.

718
00:27:04,420 --> 00:27:06,620
The incentive structure pushes toward buying more seats

719
00:27:06,620 --> 00:27:09,380
than you can absorb because the price per seat improves

720
00:27:09,380 --> 00:27:12,900
with volume and the EA negotiation is a once-a-year window.

721
00:27:12,900 --> 00:27:15,620
This means organizations often acquire two or three years

722
00:27:15,620 --> 00:27:17,780
worth of AI licensing in a single transaction

723
00:27:17,780 --> 00:27:20,180
before the deployment maturity exists to justify it.

724
00:27:20,180 --> 00:27:22,940
That's how the zombie seat population scales so fast.

725
00:27:22,940 --> 00:27:24,860
And it's why the fix isn't a technology problem.

726
00:27:24,860 --> 00:27:26,500
It's a deployment model problem.

727
00:27:26,500 --> 00:27:27,780
To build the right model though,

728
00:27:27,780 --> 00:27:29,700
you need to understand what Copilot is actually

729
00:27:29,700 --> 00:27:31,340
competing against when you're deciding

730
00:27:31,340 --> 00:27:33,940
where to spend that recovered seat budget.

731
00:27:33,940 --> 00:27:36,260
What traditional automation actually costs.

732
00:27:36,260 --> 00:27:37,780
Before you can make a rational decision

733
00:27:37,780 --> 00:27:41,020
about where AI spend belongs in your automation portfolio,

734
00:27:41,020 --> 00:27:42,340
you need an honest baseline

735
00:27:42,340 --> 00:27:44,260
for what the alternatives actually cost,

736
00:27:44,260 --> 00:27:47,380
not the marketing version, the unit economics version.

737
00:27:47,380 --> 00:27:49,020
Robotic process automation has been

738
00:27:49,020 --> 00:27:50,820
the enterprise automation workhorse

739
00:27:50,820 --> 00:27:52,300
for the better part of a decade.

740
00:27:52,300 --> 00:27:54,060
The cost structure is straightforward.

741
00:27:54,060 --> 00:27:57,540
Unattended bot licenses typically run $8,000 to $20,000

742
00:27:57,540 --> 00:28:00,540
per year depending on vendor, edition, and contract size.

743
00:28:00,540 --> 00:28:03,380
Add infrastructure, virtual machines, orchestrators,

744
00:28:03,380 --> 00:28:06,140
monitoring tools, and implementation costs,

745
00:28:06,140 --> 00:28:08,380
which can match or exceed the first year of licensing

746
00:28:08,380 --> 00:28:09,860
on complex deployments.

747
00:28:09,860 --> 00:28:11,380
Then factor in ongoing maintenance

748
00:28:11,380 --> 00:28:14,260
because RPA bots are brittle. When the UI changes,

749
00:28:14,260 --> 00:28:15,660
when the upstream system gets updated,

750
00:28:15,660 --> 00:28:18,380
when the form layout shifts, someone has to go fix the bot.

751
00:28:18,380 --> 00:28:20,380
That maintenance cost is real and recurring.

752
00:28:20,380 --> 00:28:23,860
But here's what makes RPA compelling for the right use cases.

753
00:28:23,860 --> 00:28:27,180
Once a bot is deployed against a stable, high volume process,

754
00:28:27,180 --> 00:28:30,580
the marginal cost per execution approaches zero.

755
00:28:30,580 --> 00:28:32,660
An unattended bot running invoice posting

756
00:28:32,660 --> 00:28:36,100
100,000 times a year, against a system that doesn't change,

757
00:28:36,100 --> 00:28:38,660
against a data format that stays consistent,

758
00:28:38,660 --> 00:28:41,780
that bot's effective cost per task can fall to

759
00:28:41,780 --> 00:28:44,220
a very small fraction of a cent,

760
00:28:44,220 --> 00:28:46,020
depending on how you amortize the build

761
00:28:46,020 --> 00:28:48,180
and maintenance overhead across volume.

762
00:28:48,180 --> 00:28:50,100
At scale with process stability,

763
00:28:50,100 --> 00:28:52,500
RPA's cost per transaction is nearly free.

764
00:28:52,500 --> 00:28:54,460
Deterministic scripts are even cheaper.

765
00:28:54,460 --> 00:28:56,460
A Python microservice that validates data

766
00:28:56,460 --> 00:28:59,180
against a business rule, transforms a file format,

767
00:28:59,180 --> 00:29:01,300
or routes a record based on a field value

768
00:29:01,300 --> 00:29:03,260
runs on commodity compute.

769
00:29:03,260 --> 00:29:06,380
The per-call cost is in the range of fractions of a millionth of a dollar,

770
00:29:06,380 --> 00:29:08,780
not fractions of a cent, fractions of a millionth of a dollar.

771
00:29:08,780 --> 00:29:11,140
For mechanical, well-specified logic with no ambiguity,

772
00:29:11,140 --> 00:29:13,620
this is the floor and it's a very low floor.

773
00:29:13,620 --> 00:29:16,380
Now, 2026 changed part of this equation.

774
00:29:16,380 --> 00:29:21,580
LLM API costs dropped roughly 80% between early 2025 and early 2026,

775
00:29:21,580 --> 00:29:23,260
driven by model efficiency improvements,

776
00:29:23,260 --> 00:29:26,380
hardware scale-out, and intense competition across providers.

777
00:29:26,380 --> 00:29:29,220
That price collapse made AI agents genuinely competitive

778
00:29:29,220 --> 00:29:30,500
for unstructured work,

779
00:29:30,500 --> 00:29:33,740
the tasks that were never viable for RPA in the first place.

780
00:29:33,740 --> 00:29:35,660
Email classification, contract extraction,

781
00:29:35,660 --> 00:29:37,120
document summarization, ambiguous

782
00:29:37,120 --> 00:29:38,500
routing decisions:

783
00:29:38,500 --> 00:29:40,540
these are tasks where RPA has always struggled

784
00:29:40,540 --> 00:29:43,780
because the inputs don't conform to a predictable structure.

785
00:29:43,780 --> 00:29:47,140
For those tasks, an AI call at fractions of a cent is now often cheaper

786
00:29:47,140 --> 00:29:49,780
than building and maintaining the rules-based alternative.

787
00:29:49,780 --> 00:29:52,180
But for structured work, the comparison is still brutal.

788
00:29:52,180 --> 00:29:55,700
An AI model doesn't get cheaper with volume the way an RPA bot does.

789
00:29:55,700 --> 00:29:58,380
Every prompt costs tokens, every output costs tokens.

790
00:29:58,380 --> 00:30:02,020
Scale from 10,000 invoice validations to 100,000 invoice validations

791
00:30:02,020 --> 00:30:04,140
and your AI cost scales linearly.

792
00:30:04,140 --> 00:30:06,740
The RPA cost barely moves once the bot is running.
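
The shape of that comparison, a linear AI cost against a mostly fixed RPA cost, can be sketched as follows. Every number here is a hypothetical placeholder chosen only to show the crossover: a $15,000 fixed annual bot cost, zero marginal cost per execution, and $0.25 per AI validation call.

```python
def rpa_annual_cost(volume, fixed_cost=15_000.0, marginal_cost=0.0):
    """Mostly fixed: license, infrastructure, maintenance (hypothetical)."""
    return fixed_cost + volume * marginal_cost


def ai_annual_cost(volume, cost_per_call=0.25):
    """Purely variable: every call pays for its tokens (hypothetical rate)."""
    return volume * cost_per_call


def break_even_volume(fixed_cost, ai_cost_per_call, rpa_marginal_cost=0.0):
    """Volume above which the fixed-cost bot is the cheaper option."""
    return fixed_cost / (ai_cost_per_call - rpa_marginal_cost)


for volume in (10_000, 100_000):
    print(f"{volume:>7,} validations: "
          f"AI ${ai_annual_cost(volume):,.0f} vs "
          f"RPA ${rpa_annual_cost(volume):,.0f}")

print(f"break-even at {break_even_volume(15_000, 0.25):,.0f} validations")
```

The specific crossover point depends entirely on the assumed prices. What the sketch shows is the structure: a tenfold increase in volume means a tenfold increase in AI cost, while the bot's cost stays flat.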

793
00:30:06,740 --> 00:30:09,620
For deterministic, high-volume, stable processes,

794
00:30:09,620 --> 00:30:12,660
AI's variable cost structure is a fundamental disadvantage

795
00:30:12,660 --> 00:30:15,460
that no model price reduction has yet overcome.

796
00:30:15,460 --> 00:30:18,020
This is where the strategic question sharpens.

797
00:30:18,020 --> 00:30:22,020
Organizations evaluating "AI or RPA" are asking the wrong question.

798
00:30:22,020 --> 00:30:23,300
The right question is,

799
00:30:23,300 --> 00:30:25,780
which layer of this workflow requires judgment

800
00:30:25,780 --> 00:30:27,340
and which requires execution?

801
00:30:28,300 --> 00:30:29,100
Judgment.

802
00:30:29,100 --> 00:30:32,620
Reading intent from an ambiguous email,

803
00:30:32,620 --> 00:30:35,100
extracting meaning from an unstructured document,

804
00:30:35,100 --> 00:30:38,380
deciding how to handle an exception that doesn't fit a predefined category,

805
00:30:38,380 --> 00:30:40,300
that's where LLMs earn their cost.

806
00:30:40,300 --> 00:30:43,260
They're flexible, they generalize across input variation,

807
00:30:43,260 --> 00:30:46,620
they handle the cases a rules engine would have to explicitly anticipate.

808
00:30:46,620 --> 00:30:49,980
For that layer, even today's AI prices are often justified.

809
00:30:49,980 --> 00:30:53,420
Execution: updating the CRM record, posting the journal entry,

810
00:30:53,420 --> 00:30:56,140
moving the file, triggering the downstream workflow.

811
00:30:56,140 --> 00:30:59,100
That's mechanical. It's deterministic. It follows rules.

812
00:30:59,100 --> 00:31:02,220
RPA handles it at near-zero marginal cost once it's deployed,

813
00:31:02,220 --> 00:31:04,460
and a deterministic script handles it cheaper still.

814
00:31:04,460 --> 00:31:05,820
There is no judgment involved,

815
00:31:05,820 --> 00:31:08,460
which means there's no reason to pay judgment prices for it.
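
One way to picture the judgment/execution split is a router that sends known mechanical task types to plain functions and only falls back to a model for everything else. This is a sketch under stated assumptions: the task names and handlers are made up, and `call_llm` is a stand-in stub, not a real API.

```python
from typing import Callable, Dict


def post_journal_entry(payload: dict) -> str:
    """Deterministic execution step (illustrative)."""
    return f"posted entry {payload['entry_id']}"


def move_file(payload: dict) -> str:
    """Deterministic execution step (illustrative)."""
    return f"moved {payload['path']}"


# Execution layer: mechanical tasks with fixed rules and no token cost.
DETERMINISTIC_HANDLERS: Dict[str, Callable[[dict], str]] = {
    "post_journal_entry": post_journal_entry,
    "move_file": move_file,
}


def call_llm(payload: dict) -> str:
    """Hypothetical stub for a model call; a real one would cost tokens."""
    return "llm: interpreted ambiguous input"


def route(task_type: str, payload: dict) -> str:
    """Pay judgment prices only when no deterministic handler exists."""
    handler = DETERMINISTIC_HANDLERS.get(task_type)
    if handler is not None:
        return handler(payload)
    return call_llm(payload)


print(route("move_file", {"path": "invoices/march.csv"}))
print(route("classify_email", {"body": "Can you look into this?"}))
```

The design choice is that the expensive path is the fallback, not the default, so mechanical work never reaches the model.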

816
00:31:08,460 --> 00:31:10,940
Using a reasoning engine to perform a mechanical task

817
00:31:10,940 --> 00:31:13,820
is the most expensive mistake in modern enterprise automation.

818
00:31:13,820 --> 00:31:15,980
Not because the AI is bad at the task, it can do it,

819
00:31:15,980 --> 00:31:18,620
but it costs 500 times more than the alternative

820
00:31:18,620 --> 00:31:20,460
for identical output quality.

821
00:31:20,460 --> 00:31:22,460
That cost gap becomes catastrophic

822
00:31:22,460 --> 00:31:25,580
when you move from individual tasks to automated workflows.

823
00:31:25,580 --> 00:31:27,340
And that's exactly where most organizations

824
00:31:27,340 --> 00:31:29,020
are pushing Copilot right now.

825
00:31:29,020 --> 00:31:30,620
The agentic cost explosion.

826
00:31:30,620 --> 00:31:32,300
The unit economics shift dramatically

827
00:31:32,300 --> 00:31:35,820
when you move from Copilot's chat interface to agentic workflows:

828
00:31:35,820 --> 00:31:38,300
tasks where the AI doesn't just generate a response,

829
00:31:38,300 --> 00:31:42,140
but plans multiple steps, calls tools, evaluates outcomes,

830
00:31:42,140 --> 00:31:44,380
and iterates toward a solution.

831
00:31:44,380 --> 00:31:46,620
A standard conversational interaction with Copilot

832
00:31:46,620 --> 00:31:48,540
consumes roughly 2,000 tokens.

833
00:31:48,540 --> 00:31:51,660
You ask a question, you get an answer, the conversation ends.

834
00:31:51,660 --> 00:31:54,780
A single agentic task, where the system researches a topic

835
00:31:54,780 --> 00:31:58,060
across multiple sources, drafts a comprehensive response,

836
00:31:58,060 --> 00:32:00,220
validates the output against requirements,

837
00:32:00,220 --> 00:32:02,700
identifies gaps, and refines the draft,

838
00:32:02,700 --> 00:32:06,300
consumes between 200,000 and 1,000,000 tokens.

839
00:32:06,300 --> 00:32:07,660
That's not a 10x difference.

840
00:32:07,660 --> 00:32:10,380
That's up to a 500x multiplier on token consumption

841
00:32:10,380 --> 00:32:12,460
for what appears to be a single user request.

842
00:32:12,460 --> 00:32:14,380
This massive escalation was quantified

843
00:32:14,380 --> 00:32:17,340
in a 2025 Stanford Digital Economy Lab study

844
00:32:17,340 --> 00:32:19,500
in partnership with Microsoft Research.

845
00:32:19,500 --> 00:32:21,100
The finding was stark.

846
00:32:21,100 --> 00:32:23,340
Agentic workflows consume orders of magnitude

847
00:32:23,340 --> 00:32:26,700
more tokens than interactive chat because every planning cycle,

848
00:32:26,700 --> 00:32:29,180
every tool invocation, every evaluation loop,

849
00:32:29,180 --> 00:32:31,740
represents a full inference pass through the model.

850
00:32:31,740 --> 00:32:34,540
A three-step plan becomes three separate token events.

851
00:32:34,540 --> 00:32:37,100
A five-turn refinement cycle becomes five more.

852
00:32:37,100 --> 00:32:39,580
What users perceive as one task generates

853
00:32:39,580 --> 00:32:41,260
dozens of internal computations.

854
00:32:41,260 --> 00:32:42,620
Each one billed independently.
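A rough sketch of why a multi-step agent run bills far more tokens than a single chat turn, assuming each step re-sends the base context plus every earlier step's output. The function name and all token counts here (a 1,500-token context, 500-token outputs, 20 steps) are illustrative assumptions, not figures from the study mentioned above.

```python
def agent_run_tokens(base_context: int, steps: int, output_per_step: int) -> int:
    """Total tokens billed when every step is its own inference pass.

    Assumption: each pass re-sends the base context plus all prior outputs.
    """
    total = 0
    history = 0  # tokens of earlier step outputs carried into later passes
    for _ in range(steps):
        input_tokens = base_context + history
        total += input_tokens + output_per_step
        history += output_per_step
    return total


# One chat turn versus a 20-step agent run (hypothetical sizes).
chat_tokens = agent_run_tokens(base_context=1_500, steps=1, output_per_step=500)
agent_tokens = agent_run_tokens(base_context=1_500, steps=20, output_per_step=500)

print(chat_tokens, agent_tokens, agent_tokens / chat_tokens)
```

Because the carried history grows with every pass, total tokens grow faster than the step count, which is why a task that feels like one request can land far above a single chat exchange.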

855
00:32:42,620 --> 00:32:44,300
The cost math is unforgiving.

856
00:32:44,300 --> 00:32:46,300
At Claude Sonnet 4.6 pricing,

857
00:32:46,300 --> 00:32:47,580
$3 per million input tokens,

858
00:32:47,580 --> 00:32:49,260
$15 per million output tokens,

859
00:32:49,260 --> 00:32:50,700
a mid-range agentic task

860
00:32:50,700 --> 00:32:54,460
consuming 300,000 tokens costs approximately $1.50.

861
00:32:54,460 --> 00:32:57,740
A complex agentic task at one million tokens reaches $5.40.

862
00:32:57,740 --> 00:32:59,260
These numbers are single instances.

863
00:32:59,260 --> 00:33:00,220
Scale them across a team,

864
00:33:00,220 --> 00:33:01,580
across a day, across a quarter,

865
00:33:01,580 --> 00:33:04,540
and they accumulate into material line items on your Azure bill.
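The per-task arithmetic can be sketched as follows, using the $3 input and $15 output prices per million tokens quoted above. The input/output splits and the team size, task rate, and working days are hypothetical values chosen for illustration; real splits vary by workload.

```python
INPUT_PRICE_PER_M = 3.00    # USD per million input tokens (as quoted)
OUTPUT_PRICE_PER_M = 15.00  # USD per million output tokens (as quoted)


def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one task given its input and output token counts."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical splits: most agent tokens are re-sent input, a minority is output.
mid_range = task_cost(input_tokens=250_000, output_tokens=50_000)
complex_task = task_cost(input_tokens=800_000, output_tokens=200_000)

# Hypothetical scale: 25 users, 10 complex tasks each per day, 65 working days.
quarterly = complex_task * 25 * 10 * 65

print(f"mid-range: ${mid_range:.2f}")
print(f"complex:   ${complex_task:.2f}")
print(f"quarter:   ${quarterly:,.2f}")
```

Under these assumed splits the two tasks come to about $1.50 and $5.40, and the same complex task repeated across a small team for a quarter reaches tens of thousands of dollars.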

866
00:33:04,540 --> 00:33:07,020
But here's where it gets genuinely uncomfortable.

867
00:33:07,020 --> 00:33:08,860
Nvidia executives have publicly stated

868
00:33:08,860 --> 00:33:11,580
that AI token-based pricing for enterprise automation

869
00:33:11,580 --> 00:33:14,540
is in some cases already exceeding the annualized cost

870
00:33:14,540 --> 00:33:17,180
of employee salaries for comparable workloads.

871
00:33:17,180 --> 00:33:18,300
Let that sink in.

872
00:33:18,300 --> 00:33:20,940
A single sophisticated agentic workflow running daily,

873
00:33:20,940 --> 00:33:23,660
consuming a million tokens at frontier model prices

874
00:33:23,660 --> 00:33:26,140
can cost more per year than paying someone a salary

875
00:33:26,140 --> 00:33:27,660
to do the work manually.

876
00:33:27,660 --> 00:33:29,740
Without guardrails, without limits,

877
00:33:29,740 --> 00:33:32,540
without cost awareness built into how the system decides

878
00:33:32,540 --> 00:33:35,100
which workflows get to run at which model tiers.

879
00:33:35,100 --> 00:33:38,700
The cost explosion problem isn't unique to any single vendor

880
00:33:38,700 --> 00:33:40,220
or model. It's structural.

881
00:33:40,220 --> 00:33:44,060
Agentic systems work by decomposing complex goals into sub-tasks,

882
00:33:44,060 --> 00:33:46,940
then executing those sub-tasks with AI assistance.

883
00:33:46,940 --> 00:33:49,020
Every sub-task is a prompt response cycle.

884
00:33:49,020 --> 00:33:50,540
Every prompt carries context.

885
00:33:50,540 --> 00:33:52,460
Every response generates output.

886
00:33:52,460 --> 00:33:55,180
The system's intelligence, its ability to break down problems

887
00:33:55,180 --> 00:33:57,740
and iterate towards solutions is the same property

888
00:33:57,740 --> 00:33:59,340
that makes it expensive at scale.

889
00:33:59,340 --> 00:34:02,940
Organizations deploying agentic workflows in 2026

890
00:34:02,940 --> 00:34:04,620
are discovering this in real time.

891
00:34:04,620 --> 00:34:07,660
A knowledge management agent that crawls your document repository

892
00:34:07,660 --> 00:34:09,980
to answer questions can generate hundreds of thousands

893
00:34:09,980 --> 00:34:12,940
of tokens per user per week if nobody constrained its search scope.

894
00:34:12,940 --> 00:34:16,140
A customer service agent that routes inquiries

895
00:34:16,140 --> 00:34:19,260
to appropriate teams can spawn multiple parallel workflows

896
00:34:19,260 --> 00:34:23,180
investigating the same issue if nobody built in deduplication logic.

897
00:34:23,180 --> 00:34:25,740
A compliance agent that generates regulatory summaries

898
00:34:25,740 --> 00:34:28,860
across multiple jurisdictions can run unbounded searches

899
00:34:28,860 --> 00:34:30,860
across your entire legal repository

900
00:34:30,860 --> 00:34:33,260
unless someone explicitly capped the query depth.

901
00:34:33,260 --> 00:34:35,100
These aren't failures of model capability.

902
00:34:35,100 --> 00:34:36,540
They're failures of governance.

903
00:34:36,540 --> 00:34:38,940
An agentic system without spending controls

904
00:34:38,940 --> 00:34:41,580
is an agentic system that will eventually generate a bill

905
00:34:41,580 --> 00:34:42,700
nobody expected.

906
00:34:42,700 --> 00:34:45,740
The classic incident, now becoming common enough to have patterns,

907
00:34:45,740 --> 00:34:47,660
involves sub-agent fan-out.

908
00:34:47,660 --> 00:34:51,100
One agent spawns multiple child agents to parallelize work.

909
00:34:51,100 --> 00:34:54,540
Each child agent independently generates tokens consuming compute

910
00:34:54,540 --> 00:34:57,660
and without a circuit breaker, the system scales to absurd costs.

911
00:34:57,660 --> 00:35:01,580
GitHub users in early 2026 experienced incidents

912
00:35:01,580 --> 00:35:05,500
where a single agentic workflow spike generated $47,000

913
00:35:05,500 --> 00:35:07,580
in compute charges in the course of an afternoon

914
00:35:07,580 --> 00:35:09,820
with no human intervention or oversight.

915
00:35:09,820 --> 00:35:11,420
The agent worked exactly as designed.

916
00:35:11,420 --> 00:35:13,740
It just wasn't designed with cost as a constraint.
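A minimal sketch of the kind of circuit breaker described above: a shared spend guard that every child agent must charge against before it runs. The class and function names, the $50 limit, and the per-child cost are all hypothetical; a real system would meter actual token usage rather than a flat estimate.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spend past the configured limit."""


class SpendGuard:
    """Tracks cumulative spend for one workflow and refuses overruns."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"charge of ${cost_usd:.2f} would exceed ${self.limit_usd:.2f}"
            )
        self.spent_usd += cost_usd


def run_fan_out(guard: SpendGuard, children: int, cost_per_child: float) -> int:
    """Launch child agents until the guard trips; return how many ran."""
    completed = 0
    for _ in range(children):
        try:
            guard.charge(cost_per_child)  # reserve budget before doing work
        except BudgetExceeded:
            break  # circuit breaker: stop spawning instead of scaling cost
        completed += 1
    return completed


guard = SpendGuard(limit_usd=50.0)
completed = run_fan_out(guard, children=1_000, cost_per_child=5.40)
print(completed, round(guard.spent_usd, 2))
```

The design choice is to check the budget before each child starts, so a runaway fan-out stops at the limit rather than being discovered on the invoice.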

917
00:35:13,740 --> 00:35:16,140
Token costs must be managed like labor costs,

918
00:35:16,140 --> 00:35:18,220
with budgets, with utilization targets,

919
00:35:18,220 --> 00:35:20,780
with clear cost-per-business-outcome metrics

920
00:35:20,780 --> 00:35:22,620
that force explicit trade-offs.

921
00:35:22,620 --> 00:35:25,100
Run this agent on the expensive reasoning model

922
00:35:25,100 --> 00:35:27,100
or run it on the cheaper general model?

923
00:35:27,100 --> 00:35:28,860
Route this workflow through a local model

924
00:35:28,860 --> 00:35:30,380
or call the frontier API?

925
00:35:30,380 --> 00:35:31,740
These aren't technical questions.

926
00:35:31,740 --> 00:35:33,020
They're business questions,

927
00:35:33,020 --> 00:35:35,180
and they require the same governance apparatus

928
00:35:35,180 --> 00:35:37,740
you'd build for any significant operational cost.

929
00:35:37,740 --> 00:35:40,300
The enterprise without token governance infrastructure in place

930
00:35:40,300 --> 00:35:43,660
is the enterprise that discovers a $47,000 incident

931
00:35:43,660 --> 00:35:45,100
on its Azure bill,

932
00:35:45,100 --> 00:35:47,260
and realizes nobody owns the decision

933
00:35:47,260 --> 00:35:50,460
about which agents can run at what cost for what benefit.

934
00:35:50,460 --> 00:35:52,940
Now we have the full picture of what's bleeding cash.

935
00:35:52,940 --> 00:35:55,180
The question is where the money is actually well spent.

936
00:35:55,180 --> 00:35:57,500
Where Copilot actually wins.

937
00:35:57,500 --> 00:35:59,660
The argument so far has been relentlessly negative

938
00:35:59,660 --> 00:36:00,540
and for good reason.

939
00:36:00,540 --> 00:36:01,980
The structural flaws are real.

940
00:36:01,980 --> 00:36:03,420
The waste is quantifiable.

941
00:36:03,420 --> 00:36:05,180
The governance gaps are consequential,

942
00:36:05,180 --> 00:36:06,860
but Copilot ROI is not a myth.

943
00:36:06,860 --> 00:36:07,580
It's real.

944
00:36:07,580 --> 00:36:09,180
It's just highly concentrated,

945
00:36:09,180 --> 00:36:11,180
and most organizations are deploying it

946
00:36:11,180 --> 00:36:12,780
in a way that diffuses the return

947
00:36:12,780 --> 00:36:14,780
across too many seats to make it visible.

948
00:36:14,780 --> 00:36:16,940
The break-even threshold is simple to calculate.

949
00:36:16,940 --> 00:36:19,420
At a fully loaded labor cost of $50 per hour,

950
00:36:19,420 --> 00:36:22,220
a $30 per month license becomes economically rational

951
00:36:22,220 --> 00:36:25,420
if the user captures just 36 minutes of saved time per week.

952
00:36:25,420 --> 00:36:27,260
36 minutes, not even an hour.

953
00:36:27,260 --> 00:36:28,860
For that 36-minute threshold,

954
00:36:28,860 --> 00:36:32,060
an organization has already crossed into positive ROI territory.

955
00:36:32,060 --> 00:36:34,780
The CFO doesn't need to believe in productivity magic.

956
00:36:34,780 --> 00:36:38,460
Just 36 minutes of actual time recapture per week per user.
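The break-even threshold can be worked through directly from the two inputs given above, a $30 monthly license and a $50 fully loaded hourly cost. The function name is a hypothetical label, and the weekly conversion assumes 52 weeks spread over 12 months.

```python
def break_even_minutes_per_month(license_usd_per_month: float,
                                 loaded_cost_per_hour: float) -> float:
    """Minutes of saved time per month at which the license pays for itself."""
    return license_usd_per_month * 60 / loaded_cost_per_hour


per_month = break_even_minutes_per_month(30, 50)
per_week = per_month * 12 / 52  # assumption: 52 weeks spread over 12 months

print(f"{per_month:.0f} minutes per month, about {per_week:.1f} minutes per week")
```

On these inputs the license is covered by roughly 36 minutes of saved time per month, so a user who recaptures 36 minutes every week clears break-even several times over.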

957
00:36:38,460 --> 00:36:41,180
Real-world data from organizations that deployed Copilot

958
00:36:41,180 --> 00:36:42,940
with intentional adoption programming,

959
00:36:42,940 --> 00:36:44,220
not just seat assignment,

960
00:36:44,220 --> 00:36:46,380
but structured training, champion networks,

961
00:36:46,380 --> 00:36:48,060
role-specific prompting libraries,

962
00:36:48,060 --> 00:36:50,060
and measurable success metrics,

963
00:36:50,060 --> 00:36:53,420
shows that users consistently reach three to four hours per week

964
00:36:53,420 --> 00:36:54,540
saved at maturity.

965
00:36:54,540 --> 00:36:58,380
A Forrester Total Economic Impact study commissioned by Microsoft,

966
00:36:58,380 --> 00:37:00,220
working with client data from organizations

967
00:37:00,220 --> 00:37:03,500
that actively drove adoption and process redesign around Copilot,

968
00:37:03,500 --> 00:37:07,260
reported 116% ROI in the first year alone.

969
00:37:07,260 --> 00:37:08,620
That's a floor, not a ceiling,

970
00:37:08,620 --> 00:37:10,300
but here's the critical constraint.

971
00:37:10,300 --> 00:37:12,860
ROI concentrates in specific work patterns,

972
00:37:12,860 --> 00:37:14,780
not all roles, not all tasks,

973
00:37:14,780 --> 00:37:16,700
specific roles, specific workflows where

974
00:37:16,700 --> 00:37:18,860
Copilot's strengths align with actual bottlenecks

975
00:37:18,860 --> 00:37:19,980
in how work gets done.

976
00:37:19,980 --> 00:37:21,820
The high-value profile is straightforward:

977
00:37:21,820 --> 00:37:24,140
document-heavy roles, communication-heavy roles,

978
00:37:24,140 --> 00:37:27,180
analysis-heavy roles, knowledge workers who spend hours each week

979
00:37:27,180 --> 00:37:30,060
in Outlook, Teams, Word, Excel, and PowerPoint,

980
00:37:30,060 --> 00:37:31,660
people who draft, revise,

981
00:37:31,660 --> 00:37:34,300
summarize, search, analyze, and synthesize

982
00:37:34,300 --> 00:37:35,580
as the core of their job,

983
00:37:35,580 --> 00:37:37,260
not as a marginal activity.

984
00:37:37,260 --> 00:37:39,260
The strongest use cases are concrete.

985
00:37:39,260 --> 00:37:41,660
Drafting proposals and RFP responses:

986
00:37:41,660 --> 00:37:43,420
Copilot can scaffold the structure,

987
00:37:43,420 --> 00:37:46,460
pull relevant prior language and generate first-pass content

988
00:37:46,460 --> 00:37:48,860
that a senior consultant refines into the final bid.

989
00:37:48,860 --> 00:37:52,060
Summarizing meeting transcripts and long documents:

990
00:37:52,060 --> 00:37:54,460
Copilot can extract decisions, action items,

991
00:37:54,460 --> 00:37:57,660
and key points from hours of recorded content in seconds.

992
00:37:57,660 --> 00:38:01,260
Cross-system search across email, Teams, and SharePoint:

993
00:38:01,260 --> 00:38:04,060
retrieving relevant context from organizational memory

994
00:38:04,060 --> 00:38:05,740
without manual navigation.

995
00:38:05,740 --> 00:38:08,380
First-pass Excel analysis: generating formulas,

996
00:38:08,380 --> 00:38:10,940
suggesting pivot structures, identifying outliers

997
00:38:10,940 --> 00:38:13,500
before a financial analyst fine-tunes the model.

998
00:38:13,500 --> 00:38:15,420
These aren't speculative use cases.

999
00:38:15,420 --> 00:38:17,260
Sales organizations, consulting firms,

1000
00:38:17,260 --> 00:38:19,580
legal departments, and product management teams

1001
00:38:19,580 --> 00:38:22,060
have reported consistent three to four hours per week

1002
00:38:22,060 --> 00:38:23,660
saved at deployment maturity.

1003
00:38:23,660 --> 00:38:26,460
At a $75,000 fully loaded annual salary,

1004
00:38:26,460 --> 00:38:30,060
that's $5,625 in annual time value per user.

1005
00:38:30,060 --> 00:38:32,380
The license costs $360. The math is not close.
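The annual time-value figure can be reproduced with a short calculation. This sketch assumes 2,080 paid hours per year, 52 working weeks, and the low end of the range (three hours saved per week); those three values and the function name are assumptions, while the $75,000 salary and $30 monthly license come from the discussion above.

```python
def annual_time_value(annual_loaded_salary: float,
                      hours_saved_per_week: float,
                      work_hours_per_year: int = 2_080,
                      weeks_per_year: int = 52) -> float:
    """Dollar value of the hours saved in a year at the implied hourly rate."""
    return (annual_loaded_salary * hours_saved_per_week * weeks_per_year
            / work_hours_per_year)


time_value = annual_time_value(75_000, hours_saved_per_week=3)
license_cost = 30 * 12  # USD per user per year
ratio = time_value / license_cost

print(f"time value ${time_value:,.0f} vs license ${license_cost} ({ratio:.1f}x)")
```

Under these assumptions the time value is about $5,625 per user per year against a $360 license, a ratio of roughly 15 to 1 before any adoption or enablement costs are counted.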

1006
00:38:32,380 --> 00:38:34,620
The ROI multiplier that distinguishes thriving

1007
00:38:34,620 --> 00:38:37,820
Copilot deployments from failing ones is adoption depth.

1008
00:38:37,820 --> 00:38:40,460
Organizations where 70% or more of licensed users

1009
00:38:40,460 --> 00:38:43,500
invoke Copilot daily in at least one core workflow

1010
00:38:43,500 --> 00:38:46,300
see 200 to 400% first-year ROI.

1011
00:38:46,300 --> 00:38:49,340
Organizations stuck at 30% weekly active usage

1012
00:38:49,340 --> 00:38:50,780
see near zero return.

1013
00:38:50,780 --> 00:38:53,340
Same license costs, same model, same potential,

1014
00:38:53,340 --> 00:38:55,020
three to four times the actual outcome,

1015
00:38:55,020 --> 00:38:57,820
driven entirely by whether the organization engineered usage

1016
00:38:57,820 --> 00:39:00,860
into the workflow or just assigned seats and hoped.

1017
00:39:00,860 --> 00:39:02,620
This is where the tiered access framework

1018
00:39:02,620 --> 00:39:04,220
becomes not a cost-cutting exercise

1019
00:39:04,220 --> 00:39:05,660
but a concentration strategy.

1020
00:39:05,660 --> 00:39:07,340
You're not trying to minimize Copilot spend.

1021
00:39:07,340 --> 00:39:10,300
You're trying to maximize Copilot ROI per dollar spent,

1022
00:39:10,300 --> 00:39:12,220
which means concentrating licenses

1023
00:39:12,220 --> 00:39:14,540
where the work pattern matches Copilot's strengths,

1024
00:39:14,540 --> 00:39:16,460
where process redesign is feasible

1025
00:39:16,460 --> 00:39:18,060
and where the labor cost saved

1026
00:39:18,060 --> 00:39:20,060
justifies the infrastructure cost.

1027
00:39:20,060 --> 00:39:22,860
And letting everyone else access the free or bundled tier

1028
00:39:22,860 --> 00:39:25,260
for the cases where Copilot is a utility,

1029
00:39:25,260 --> 00:39:26,700
not a critical capability.

1030
00:39:26,700 --> 00:39:28,700
The organizations seeing the best outcomes

1031
00:39:28,700 --> 00:39:30,780
are not the ones that bought the most seats.

1032
00:39:30,780 --> 00:39:32,380
They're the ones that bought the right seats

1033
00:39:32,380 --> 00:39:34,540
in the right roles backed by structured enablement

1034
00:39:34,540 --> 00:39:35,980
governed by clear metrics

1035
00:39:35,980 --> 00:39:37,820
and resourced with change management.

1036
00:39:37,820 --> 00:39:40,460
That's how you move from a deployment that bleeds cash

1037
00:39:40,460 --> 00:39:42,540
to a deployment that generates return.

1038
00:39:42,540 --> 00:39:45,340
Knowing where it wins tells you how to build the framework

1039
00:39:45,340 --> 00:39:48,380
that captures that value instead of diffusing it across the organization

1040
00:39:48,380 --> 00:39:49,820
like expensive overhead.

1041
00:39:49,820 --> 00:39:51,340
The tiered access model.

1042
00:39:51,340 --> 00:39:53,740
The structural answer to both the zombie seat problem

1043
00:39:53,740 --> 00:39:56,780
and the agentic cost explosion isn't to deploy more broadly

1044
00:39:56,780 --> 00:39:58,060
or more conservatively.

1045
00:39:58,060 --> 00:39:59,660
It's to deploy differently

1046
00:39:59,660 --> 00:40:01,660
with explicit tiers that separate

1047
00:40:01,660 --> 00:40:04,140
who gets what capability under what conditions

1048
00:40:04,140 --> 00:40:05,580
and with what governance attached.

1049
00:40:05,580 --> 00:40:06,780
This isn't a new concept.

1050
00:40:06,780 --> 00:40:09,900
Financial institutions have used tiered access for years.

1051
00:40:09,900 --> 00:40:13,100
Basic customer service reps get read-only access to accounts.

1052
00:40:13,100 --> 00:40:15,260
Supervisors get limited modification rights.

1053
00:40:15,260 --> 00:40:18,220
Senior staff get unrestricted system access with logging.

1054
00:40:18,220 --> 00:40:19,980
You're applying the same principle to AI

1055
00:40:19,980 --> 00:40:21,820
but organized around value and risk

1056
00:40:21,820 --> 00:40:23,900
rather than organizational hierarchy.

1057
00:40:23,900 --> 00:40:25,580
The framework typically works like this.

1058
00:40:25,580 --> 00:40:28,300
Tier three is broad access with minimal controls.

1059
00:40:28,300 --> 00:40:31,820
Think of this as the free Copilot tier available to all staff.

1060
00:40:31,820 --> 00:40:33,500
These users get access to public

1061
00:40:33,500 --> 00:40:35,660
or well-curated internal knowledge bases

1062
00:40:35,660 --> 00:40:38,540
but not direct access to sensitive corporate data.

1063
00:40:38,540 --> 00:40:40,540
They can use Copilot for writing, translation,

1064
00:40:40,540 --> 00:40:43,580
summarization of non-sensitive material, and generic Q&A.

1065
00:40:43,580 --> 00:40:46,380
No agent creation, no custom integrations,

1066
00:40:46,380 --> 00:40:49,100
no ability to connect to line-of-business systems.

1067
00:40:49,100 --> 00:40:51,340
The model is constrained to prevent leakage

1068
00:40:51,340 --> 00:40:53,020
of confidential information

1069
00:40:53,020 --> 00:40:54,860
and the usage footprint is monitored

1070
00:40:54,860 --> 00:40:56,220
but not heavily governed.

1071
00:40:56,220 --> 00:40:58,460
This tier is cheap, either completely free

1072
00:40:58,460 --> 00:41:00,700
or bundled into existing licenses.

1073
00:41:00,700 --> 00:41:02,620
And the expectation is that a large portion

1074
00:41:02,620 --> 00:41:04,060
of the organization lands here.

1075
00:41:04,060 --> 00:41:05,900
Tier two is for standard knowledge workers

1076
00:41:05,900 --> 00:41:08,940
whose role directly benefits from deeper AI capability.

1077
00:41:08,940 --> 00:41:11,100
They get full M365 Copilot licensing

1078
00:41:11,100 --> 00:41:12,700
with access to departmental content,

1079
00:41:12,700 --> 00:41:14,620
their email, their Teams channels,

1080
00:41:14,620 --> 00:41:16,300
their shared SharePoint sites.

1081
00:41:16,300 --> 00:41:17,900
They can ask Copilot questions

1082
00:41:17,900 --> 00:41:19,340
about organizational policies,

1083
00:41:19,340 --> 00:41:21,180
internal processes, and customer accounts

1084
00:41:21,180 --> 00:41:22,460
within their department.

1085
00:41:22,460 --> 00:41:23,980
Standard app integrations with Word,

1086
00:41:23,980 --> 00:41:25,740
Excel, PowerPoint, Outlook, and Teams.

1087
00:41:25,740 --> 00:41:27,660
They can run basic automations and workflows

1088
00:41:27,660 --> 00:41:29,180
but cannot create custom agents

1089
00:41:29,180 --> 00:41:30,620
or modify governance policies.

1090
00:41:30,620 --> 00:41:32,620
The cost is the $30-a-month add-on,

1091
00:41:32,620 --> 00:41:34,060
but it's concentrated on roles

1092
00:41:34,060 --> 00:41:37,180
where documented usage patterns show concrete ROI.

1093
00:41:37,180 --> 00:41:38,780
Sales teams, legal departments,

1094
00:41:38,780 --> 00:41:41,020
financial analysts, project managers.

1095
00:41:41,020 --> 00:41:42,540
Governance is active.

1096
00:41:42,540 --> 00:41:44,780
There are clear policies about what Copilot can

1097
00:41:44,780 --> 00:41:45,980
and cannot access

1098
00:41:45,980 --> 00:41:48,620
with periodic reviews of usage patterns.

1099
00:41:48,620 --> 00:41:51,820
Tier one is restricted access for high-value, high-risk work.

1100
00:41:51,820 --> 00:41:53,340
This is where custom agents live.

1101
00:41:53,340 --> 00:41:55,900
This is where board-level or senior executive decisions

1102
00:41:55,900 --> 00:41:57,260
get AI assisted.

1103
00:41:57,260 --> 00:41:59,340
This is where sensitive data like M&A planning,

1104
00:41:59,340 --> 00:42:00,620
proprietary research,

1105
00:42:00,620 --> 00:42:03,500
or regulated compliance work flows through Copilot.

1106
00:42:03,500 --> 00:42:05,180
Access is explicitly provisioned.

1107
00:42:05,180 --> 00:42:07,580
You don't get tier one access because of your job title.

1108
00:42:07,580 --> 00:42:10,780
You get it because a specific high-value workflow requires it

1109
00:42:10,780 --> 00:42:13,500
and the governance overhead is worth the business case.

1110
00:42:13,500 --> 00:42:16,380
These agents run under strict conditional access policies.

1111
00:42:16,380 --> 00:42:18,060
They have restricted SharePoint search scope.

1112
00:42:18,060 --> 00:42:19,020
They log everything.

1113
00:42:19,020 --> 00:42:20,780
Changes to their prompts or tool integrations

1114
00:42:20,780 --> 00:42:22,540
go through formal change management.

1115
00:42:22,540 --> 00:42:24,780
The compliance and legal review is heavyweight.

1116
00:42:24,780 --> 00:42:26,380
The cost per agent is higher

1117
00:42:26,380 --> 00:42:28,380
not because the license is more expensive

1118
00:42:28,380 --> 00:42:31,660
but because the governance infrastructure around it is more expensive.

1119
00:42:31,660 --> 00:42:33,660
The business case for tiering is straightforward.

1120
00:42:33,660 --> 00:42:36,060
Concentrate the $30 per month add-on spend

1121
00:42:36,060 --> 00:42:38,380
on the 35% of users who will actually generate

1122
00:42:38,380 --> 00:42:40,620
3-4 hours per week of time value.

1123
00:42:40,620 --> 00:42:43,660
Let the other 65% use free or bundled Copilot

1124
00:42:43,660 --> 00:42:45,900
for the cases where they benefit incidentally.
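To make the reallocation concrete, here is a small sketch comparing blanket licensing with the 35% concentration described above. The 1,000-person headcount and the function name are hypothetical; the $30 monthly add-on price and the 35% share come from the discussion.

```python
def annual_license_spend(headcount: int, licensed_share: float,
                         price_per_month: float = 30.0) -> float:
    """Annual add-on spend when only a share of staff hold a paid seat."""
    seats = round(headcount * licensed_share)
    return seats * price_per_month * 12


headcount = 1_000  # hypothetical organization size
blanket = annual_license_spend(headcount, licensed_share=1.0)
tiered = annual_license_spend(headcount, licensed_share=0.35)
freed = blanket - tiered

print(f"blanket ${blanket:,.0f}, tiered ${tiered:,.0f}, freed ${freed:,.0f}")
```

The freed amount is the budget that can be redirected to enablement and tier one governance rather than simply returned as a saving.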

1125
00:42:45,900 --> 00:42:48,300
The reallocation isn't a cost-cutting exercise.

1126
00:42:48,300 --> 00:42:50,380
It's a return optimization exercise.

1127
00:42:50,380 --> 00:42:52,300
You're moving budget from low-utilization seats

1128
00:42:52,300 --> 00:42:53,820
to high-impact workflows.

1129
00:42:53,820 --> 00:42:55,580
You're recovering the zombie seat spend

1130
00:42:55,580 --> 00:42:57,820
and redirecting it toward agents and capabilities

1131
00:42:57,820 --> 00:42:59,900
that drive measurable business outcomes.

1132
00:42:59,900 --> 00:43:02,140
Access tiers must be tied to governance controls

1133
00:43:02,140 --> 00:43:03,660
not just job titles.

1134
00:43:03,660 --> 00:43:06,460
Sales director doesn't automatically equal tier two.

1135
00:43:06,460 --> 00:43:07,500
Rather you ask,

1136
00:43:07,500 --> 00:43:10,540
does this role spend significant time in document-heavy workflows

1137
00:43:10,540 --> 00:43:12,860
where AI can materially reduce cycle time?

1138
00:43:12,860 --> 00:43:14,380
Does the team have process discipline

1139
00:43:14,380 --> 00:43:16,220
around how AI outputs get used?

1140
00:43:16,220 --> 00:43:17,980
Are there clear success metrics?

1141
00:43:17,980 --> 00:43:20,060
If yes to all three, provision tier two.

1142
00:43:20,060 --> 00:43:21,820
If not, tier three with a clear path

1143
00:43:21,820 --> 00:43:23,820
to upgrade once conditions are met.
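The three-question test can be sketched as a simple rule. The dataclass, its field names, and the function name are hypothetical labels for the three questions; tier one is left out because it is provisioned per workflow rather than per role.

```python
from dataclasses import dataclass


@dataclass
class RoleAssessment:
    """Answers to the three provisioning questions for one role or team."""
    document_heavy_workflows: bool  # AI can materially reduce cycle time
    process_discipline: bool        # team controls how AI outputs get used
    clear_success_metrics: bool     # outcomes are defined and measurable


def provision_tier(assessment: RoleAssessment) -> int:
    """Return 2 only if all three conditions hold; otherwise default to 3."""
    if (assessment.document_heavy_workflows
            and assessment.process_discipline
            and assessment.clear_success_metrics):
        return 2
    return 3


sales_director = RoleAssessment(True, True, False)
legal_team = RoleAssessment(True, True, True)
print(provision_tier(sales_director), provision_tier(legal_team))
```

Note that the job title never appears as an input: the sales director in this example lands in tier three until success metrics exist.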

1144
00:43:23,820 --> 00:43:25,580
This is a portfolio management decision

1145
00:43:25,580 --> 00:43:27,660
not an IT configuration decision.

1146
00:43:27,660 --> 00:43:30,780
You're not optimizing for how many people can we give AI to.

1147
00:43:30,780 --> 00:43:34,140
You're optimizing for how do we maximize business value per dollar

1148
00:43:34,140 --> 00:43:37,500
of AI spend while keeping governance risk under control.

1149
00:43:37,500 --> 00:43:40,140
The tiering framework is what enables that optimization

1150
00:43:40,140 --> 00:43:41,900
to actually function as a lived practice

1151
00:43:41,900 --> 00:43:44,140
instead of a theoretical ideal.

1152
00:43:44,140 --> 00:43:46,140
Token governance as a financial discipline.

1153
00:43:46,140 --> 00:43:48,140
Once you've built the tiered access framework,

1154
00:43:48,140 --> 00:43:49,580
you've solved the structural problem

1155
00:43:49,580 --> 00:43:51,260
of who gets what capability,

1156
00:43:51,260 --> 00:43:53,180
but you haven't solved the operational problem

1157
00:43:53,180 --> 00:43:55,900
of managing how much that capability costs once it's in use.

1158
00:43:55,900 --> 00:43:58,620
That requires a different kind of discipline entirely.

1159
00:43:58,620 --> 00:44:01,260
Think of token governance as FinOps applied to AI.

1160
00:44:01,260 --> 00:44:03,100
Cloud cost management emerged as a discipline

1161
00:44:03,100 --> 00:44:05,180
because cloud infrastructure is metered.

1162
00:44:05,180 --> 00:44:07,020
You get a bill for every compute hour,

1163
00:44:07,020 --> 00:44:09,820
every gigabyte stored, every terabyte transferred.

1164
00:44:09,820 --> 00:44:12,620
Organizations that didn't actively manage cloud spend

1165
00:44:12,620 --> 00:44:15,740
watched their bills grow 30 to 40% year over year,

1166
00:44:15,740 --> 00:44:18,460
even when actual workload demand stayed flat.

1167
00:44:18,460 --> 00:44:20,220
The solution wasn't to use less cloud.

1168
00:44:20,220 --> 00:44:21,900
It was to build operational practices

1169
00:44:21,900 --> 00:44:24,220
that made cloud cost visible, attributable,

1170
00:44:24,220 --> 00:44:25,900
and subject to budget discipline.

1171
00:44:25,900 --> 00:44:27,900
Token governance applies the same principle

1172
00:44:27,900 --> 00:44:29,260
to LLM consumption.

1173
00:44:29,260 --> 00:44:31,820
The first operational step is to segment your user base

1174
00:44:31,820 --> 00:44:33,180
into consumption profiles.

1175
00:44:33,180 --> 00:44:34,700
Not everyone is a heavy user.

1176
00:44:34,700 --> 00:44:36,860
Light users, those using Copilot occasionally,

1177
00:44:36,860 --> 00:44:38,300
mostly for straightforward tasks,

1178
00:44:38,300 --> 00:44:40,940
like email summarization or quick document search,

1179
00:44:40,940 --> 00:44:44,220
consume between 40,000 and 100,000 tokens per month.

1180
00:44:44,220 --> 00:44:47,420
Heavy analytical users, consultants building research-backed

1181
00:44:47,420 --> 00:44:50,300
deliverables, engineers debugging complex systems

1182
00:44:50,300 --> 00:44:53,420
with extended context, analysts processing document sets,

1183
00:44:53,420 --> 00:44:55,900
can reach 1.6 million tokens per month.

1184
00:44:55,900 --> 00:44:58,700
That's a 16 to 40x difference in consumption

1185
00:44:58,700 --> 00:45:00,220
from the same license type.

1186
00:45:00,220 --> 00:45:03,020
Understanding where your population actually falls on that curve

1187
00:45:03,020 --> 00:45:05,180
is the foundation for meaningful governance.
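A minimal sketch of segmenting users into consumption profiles. The light-user ceiling of 100,000 tokens and the 1.6 million heavy figure come from the numbers above; the "moderate" band, the one-million cut-off, and the sample users are hypothetical.

```python
from collections import Counter

LIGHT_MAX = 100_000    # upper end of the light range quoted above
HEAVY_MIN = 1_000_000  # hypothetical cut-off for the heavy profile


def consumption_profile(tokens_per_month: int) -> str:
    """Bucket a user's monthly token consumption into a profile."""
    if tokens_per_month <= LIGHT_MAX:
        return "light"
    if tokens_per_month < HEAVY_MIN:
        return "moderate"
    return "heavy"


# Hypothetical monthly usage export, keyed by user.
monthly_usage = {"ana": 45_000, "ben": 90_000, "chen": 400_000, "dee": 1_600_000}
profiles = Counter(consumption_profile(t) for t in monthly_usage.values())

# Spread between the heavy figure and the light range.
spread_low = 1_600_000 / 100_000
spread_high = 1_600_000 / 40_000

print(dict(profiles), spread_low, spread_high)
```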

1188
00:45:05,180 --> 00:45:07,260
But consumption tracking alone is sterile

1189
00:45:07,260 --> 00:45:08,700
if you're just counting tokens.

1190
00:45:08,700 --> 00:45:10,780
What matters is cost per business outcome.

1191
00:45:10,780 --> 00:45:12,140
Don't track cost per token.

1192
00:45:12,140 --> 00:45:15,580
Track cost per merged pull request, cost per proposal drafted,

1193
00:45:15,580 --> 00:45:17,580
cost per support case resolved.

1194
00:45:17,580 --> 00:45:20,140
That reorientation accomplishes two things.

1195
00:45:20,140 --> 00:45:22,540
First, it ties spending to actual value,

1196
00:45:22,540 --> 00:45:25,260
making it clear when token consumption is growing faster

1197
00:45:25,260 --> 00:45:27,500
than the outcomes it's producing.

1198
00:45:27,500 --> 00:45:29,900
Second, it creates a natural conversation

1199
00:45:29,900 --> 00:45:32,140
about whether the outcome justifies the spend,

1200
00:45:32,140 --> 00:45:35,420
which is the core question any business discipline should answer.
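Tracking cost per business outcome rather than cost per token can be sketched like this. The monthly spend and outcome counts are invented sample data, and the function name is hypothetical; the point is only the shape of the metric.

```python
def cost_per_outcome(token_spend_usd: float, outcomes: int) -> float:
    """Token spend divided by completed business outcomes."""
    if outcomes <= 0:
        return float("inf")  # spend with nothing delivered
    return token_spend_usd / outcomes


# Hypothetical data: (token spend in USD, support cases resolved) per month.
monthly = {"2026-01": (1_200.0, 300), "2026-02": (1_800.0, 320)}
unit_costs = {month: cost_per_outcome(spend, cases)
              for month, (spend, cases) in monthly.items()}

for month, cost in unit_costs.items():
    print(f"{month}: ${cost:.2f} per resolved case")
```

In this sample, spend rises 50% while resolved cases rise under 7%, so the unit cost climbs from $4.00 to about $5.63. A raw token count alone would not surface that divergence.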

1201
00:45:35,420 --> 00:45:38,220
Prompt governance is the most immediately actionable lever.

1202
00:45:38,220 --> 00:45:40,140
Large persistent instruction blocks,

1203
00:45:40,140 --> 00:45:42,460
shared system prompts, global configuration files,

1204
00:45:42,460 --> 00:45:44,620
detailed context about your organization's policies

1205
00:45:44,620 --> 00:45:47,980
and processes, get injected into every single interaction.

1206
00:45:47,980 --> 00:45:50,220
If that shared context is 2,000 tokens,

1207
00:45:50,220 --> 00:45:52,940
and it lives in every prompt, then every Copilot user

1208
00:45:52,940 --> 00:45:55,340
is paying a 2,000 token tax on all interactions,

1209
00:45:55,340 --> 00:45:58,540
whether that context is relevant to the specific query or not.

1210
00:45:58,540 --> 00:46:01,180
Trim that shared instruction set to essentials.

1211
00:46:01,180 --> 00:46:03,020
Segment context by role or department

1212
00:46:03,020 --> 00:46:05,740
so users only carry the instructions relevant to their work.
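The size of that shared-context tax on a metered API can be estimated with a short calculation. The 2,000-token block is the figure used above and $3 per million input tokens is the price quoted earlier; the user count, interaction rate, working days, and the trimmed 600-token size are hypothetical.

```python
def shared_prompt_overhead_usd(shared_tokens: int, interactions: int,
                               input_price_per_m: float = 3.0) -> float:
    """Monthly cost of re-sending a shared instruction block on every call."""
    return shared_tokens * interactions * input_price_per_m / 1_000_000


# Hypothetical volume: 500 users x 20 interactions per day x 22 working days.
interactions_per_month = 500 * 20 * 22

full_block = shared_prompt_overhead_usd(2_000, interactions_per_month)
trimmed = shared_prompt_overhead_usd(600, interactions_per_month)

print(f"full ${full_block:,.0f}/month, trimmed ${trimmed:,.0f}/month")
```

The overhead scales linearly with both block size and interaction volume, so trimming the block or scoping it by role reduces the bill in direct proportion.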

1213
00:46:05,740 --> 00:46:08,700
The Claude Code case studies document 40% cost reductions

1214
00:46:08,700 --> 00:46:11,180
from simply moving large persistent context files

1215
00:46:11,180 --> 00:46:13,900
out of active prompts and into documentation or knowledge

1216
00:46:13,900 --> 00:46:15,980
bases that get consulted when needed

1217
00:46:15,980 --> 00:46:18,060
rather than injected always.

1218
00:46:18,060 --> 00:46:20,300
Session lifecycle control compounds this.

1219
00:46:20,300 --> 00:46:22,540
An IDE session left open overnight,

1220
00:46:22,540 --> 00:46:24,220
where a developer had a rich conversation

1221
00:46:24,220 --> 00:46:26,140
about a code base earlier in the day

1222
00:46:26,140 --> 00:46:28,140
carries the full conversation history

1223
00:46:28,140 --> 00:46:30,780
into every subsequent interaction the next morning.

1224
00:46:30,780 --> 00:46:33,020
When that developer switches to a completely different code base,

1225
00:46:33,020 --> 00:46:35,100
that old context is still being carried forward,

1226
00:46:35,100 --> 00:46:37,980
still being charged as input tokens on every new request.

1227
00:46:37,980 --> 00:46:40,060
Encourage users to reset or clear sessions

1228
00:46:40,060 --> 00:46:42,380
when switching tasks. Make session reset

1229
00:46:42,380 --> 00:46:44,460
a deliberate action in workflow guidance.

1230
00:46:44,460 --> 00:46:47,340
The cognitive load on the model and the token bill

1231
00:46:47,340 --> 00:46:49,820
drop as soon as that irrelevant context is dropped.
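
The compounding effect of carried history can be sketched with a simplified model. It assumes every request resends the whole accumulated conversation as input and ignores output tokens and any provider-side caching; `input_tokens_billed` is a hypothetical helper and the turn sizes are made up.

```python
def input_tokens_billed(turn_sizes: list, reset_before: tuple = ()) -> int:
    """Total input tokens when each request resends the accumulated history.

    turn_sizes[i] is the number of new tokens added by turn i.
    reset_before lists the turn indexes at which the session is cleared first.
    """
    history = 0
    billed = 0
    for i, size in enumerate(turn_sizes):
        if i in reset_before:
            history = 0          # user cleared the session before this turn
        history += size
        billed += history        # the whole history is sent as input
    return billed


turns = [1_000] * 10             # ten turns of 1,000 new tokens each (assumed)
no_reset = input_tokens_billed(turns)
with_reset = input_tokens_billed(turns, reset_before=(5,))
print(no_reset, with_reset)
```

Under these assumptions a single reset halfway through cuts billed input tokens by almost half, because billed input grows roughly with the square of session length.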

1232
00:46:49,820 --> 00:46:52,220
Model routing is where disciplined token governance

1233
00:46:52,220 --> 00:46:54,540
becomes cost reduction theater.

1234
00:46:54,540 --> 00:46:56,380
Not every task requires a frontier model.

1235
00:46:56,380 --> 00:46:58,460
A routine code completion can run on a smaller,

1236
00:46:58,460 --> 00:46:59,660
faster, cheaper model.

1237
00:46:59,660 --> 00:47:01,580
Complex multi-step reasoning should escalate

1238
00:47:01,580 --> 00:47:03,020
to a more capable tier.

1239
00:47:03,020 --> 00:47:05,500
If 70% of your interactions are straightforward

1240
00:47:05,500 --> 00:47:08,220
and 30% are complex, route the straightforward work

1241
00:47:08,220 --> 00:47:10,060
to a cheaper tier automatically.

1242
00:47:10,060 --> 00:47:12,620
The documented case studies show that this discipline alone,

1243
00:47:12,620 --> 00:47:15,260
deliberate model selection based on task complexity,

1244
00:47:15,260 --> 00:47:18,140
rather than defaulting everything to your most capable model,

1245
00:47:18,140 --> 00:47:22,380
produces between 70 and 85% reductions in total token spend

1246
00:47:22,380 --> 00:47:25,660
while maintaining quality on the tasks that require it.
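
A minimal routing sketch follows, assuming a 70/30 split like the one described. The tier names, prices, and complexity heuristic are placeholders rather than any vendor's actual tiers, and the savings figure it prints depends entirely on the assumed price ratio, so it is not a reproduction of the case-study numbers.

```python
# Assumed prices per million tokens; not real vendor pricing.
PRICE_PER_MILLION = {"small": 0.50, "frontier": 10.00}


def pick_tier(task: dict) -> str:
    """Crude complexity heuristic; a real router would use richer signals."""
    if task["kind"] == "completion" and task["steps"] <= 1:
        return "small"
    return "frontier"


def cost(tasks: list, tier_for, tokens_per_task: int = 2_000) -> float:
    """Total cost of a batch when tier_for decides the model for each task."""
    return sum(PRICE_PER_MILLION[tier_for(t)] * tokens_per_task / 1_000_000
               for t in tasks)


# 70 routine completions and 30 multi-step reasoning tasks (illustrative mix).
tasks = ([{"kind": "completion", "steps": 1}] * 70
         + [{"kind": "reasoning", "steps": 5}] * 30)

routed = cost(tasks, pick_tier)
baseline = cost(tasks, lambda t: "frontier")   # everything on the top tier
savings = 1 - routed / baseline
print(f"routed ${routed:.2f} vs baseline ${baseline:.2f} ({savings:.1%} saved)")
```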

1247
00:47:25,660 --> 00:47:27,420
Finally, set spend limits and alerts.

1248
00:47:27,420 --> 00:47:30,300
Treat token budgets the way you treat headcount budgets.

1249
00:47:30,300 --> 00:47:32,620
Each department gets allocated X tokens per month.

1250
00:47:32,620 --> 00:47:35,020
Once they approach the limit, they get visibility

1251
00:47:35,020 --> 00:47:36,860
and a conversation with finance.

1252
00:47:36,860 --> 00:47:38,940
Individual workflows get spending thresholds.

1253
00:47:38,940 --> 00:47:40,700
Agentic systems get circuit breakers

1254
00:47:40,700 --> 00:47:42,700
that prevent runaway token consumption.

1255
00:47:42,700 --> 00:47:45,500
Without explicit limits, cost grows invisibly

1256
00:47:45,500 --> 00:47:47,580
until an invoice arrives that nobody expected.

1257
00:47:47,580 --> 00:47:50,540
Governance without measurement is policy theater.

1258
00:47:50,540 --> 00:47:53,740
Governance with measurement is operational reality.
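
One way to picture a budget with an alert threshold and a hard circuit breaker is the sketch below. `TokenBudget` and `BudgetExceeded` are hypothetical names, the 80% alert ratio is an assumption, and a production version would persist usage and notify someone rather than append to a list.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would push usage past the hard limit."""


class TokenBudget:
    def __init__(self, limit: int, alert_ratio: float = 0.8):
        self.limit = limit
        self.alert_ratio = alert_ratio   # assumed threshold for "approaching the limit"
        self.used = 0
        self.alerts = []

    def charge(self, tokens: int) -> None:
        # Circuit breaker: refuse the call instead of overspending.
        if self.used + tokens > self.limit:
            raise BudgetExceeded(
                f"would use {self.used + tokens} of {self.limit} tokens")
        self.used += tokens
        # Visibility: raise one alert when the threshold is first crossed.
        if self.used >= self.alert_ratio * self.limit and not self.alerts:
            self.alerts.append(f"{self.used}/{self.limit} tokens used")


budget = TokenBudget(limit=1_000)
budget.charge(700)
budget.charge(150)                 # crosses 80%, triggers the alert
try:
    budget.charge(200)             # would exceed the limit, so it is blocked
except BudgetExceeded as err:
    print("blocked:", err)
print(budget.alerts)
```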

1259
00:47:53,740 --> 00:47:55,500
The ROI audit framework.

1260
00:47:55,500 --> 00:47:58,540
Measurement and governance work backward from a simple fact.

1261
00:47:58,540 --> 00:48:00,460
Most organizations deployed Copilot

1262
00:48:00,460 --> 00:48:02,860
without capturing what "before" looked like.

1263
00:48:02,860 --> 00:48:06,300
You can't calculate ROI if you never measured the baseline.

1264
00:48:06,300 --> 00:48:07,740
You can't prove the time was saved

1265
00:48:07,740 --> 00:48:10,700
if you didn't document how long the task took before AI arrived.

1266
00:48:10,700 --> 00:48:13,420
This baseline problem compounds across every other dimension

1267
00:48:13,420 --> 00:48:14,620
of an audit framework.

1268
00:48:14,620 --> 00:48:18,620
A real Copilot ROI audit operates across eight interconnected dimensions.

1269
00:48:18,620 --> 00:48:20,620
Each one answers a specific question

1270
00:48:20,620 --> 00:48:22,460
that feeds into the others.

1271
00:48:22,460 --> 00:48:24,700
Dimension 1 is strategic alignment.

1272
00:48:24,700 --> 00:48:27,900
Does every licensed user have a documented relevant use case?

1273
00:48:27,900 --> 00:48:30,300
Not "maybe they'll use this eventually."

1274
00:48:30,300 --> 00:48:34,140
Documented: a concrete answer to "where does this person use Copilot?"

1275
00:48:34,140 --> 00:48:35,420
If you can't articulate that,

1276
00:48:35,420 --> 00:48:38,620
the license is speculative spend, not strategic investment.

1277
00:48:38,620 --> 00:48:41,820
What percentage of your licenses align to high-impact processes

1278
00:48:41,820 --> 00:48:44,060
where documented ROI patterns exist,

1279
00:48:44,060 --> 00:48:47,100
such as sales, legal, consulting, and financial analysis,

1280
00:48:47,100 --> 00:48:49,980
versus roles where Copilot adoption is still experimental?

1281
00:48:49,980 --> 00:48:53,740
Dimension 2 is baseline and benefit measurement.

1282
00:48:53,740 --> 00:48:55,500
This is where the actual time data lives.

1283
00:48:55,500 --> 00:48:58,140
How long did a specific task take before Copilot?

1284
00:48:58,140 --> 00:48:59,180
How long after?

1285
00:48:59,180 --> 00:49:00,140
Take the time saved,

1286
00:49:00,140 --> 00:49:01,980
multiply by fully loaded labor cost,

1287
00:49:01,980 --> 00:49:03,340
multiply by adoption rate.

1288
00:49:03,340 --> 00:49:04,620
That's your monetized benefit.

1289
00:49:04,620 --> 00:49:07,500
Compare it to license cost plus enablement overhead.

1290
00:49:07,500 --> 00:49:10,220
This is the one dimension you cannot eyeball or guess at.

1291
00:49:10,220 --> 00:49:12,300
The CFO will not accept survey responses

1292
00:49:12,300 --> 00:49:14,620
about how Copilot feels faster.

1293
00:49:14,620 --> 00:49:17,660
They will accept before and after cycle times against defined tasks.
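
The calculation described for Dimension 2 can be written down directly. The formula follows the episode (time saved × fully loaded labor cost × adoption rate, compared with license cost plus enablement overhead); the function names and all sample inputs other than the $30 seat price are assumptions.

```python
def monetized_benefit(hours_saved_per_user_per_month: float,
                      loaded_hourly_rate: float,
                      licensed_users: int,
                      adoption_rate: float) -> float:
    """Time saved x fully loaded labor cost x adoption rate, per month."""
    return (hours_saved_per_user_per_month * loaded_hourly_rate
            * licensed_users * adoption_rate)


def roi(benefit: float, license_cost: float, enablement_cost: float) -> float:
    """Net return per dollar spent on licenses and enablement."""
    total_cost = license_cost + enablement_cost
    return (benefit - total_cost) / total_cost


# Assumed inputs: 4 hours saved per month, $60/hour loaded rate,
# 1,000 licensed users, 50% adoption, $10,000 monthly enablement overhead.
benefit = monetized_benefit(4, 60, 1_000, 0.5)
monthly_roi = roi(benefit, license_cost=1_000 * 30, enablement_cost=10_000)
print(f"benefit ${benefit:,.0f}, ROI {monthly_roi:.1f}x")
```

The hours-saved input is the one that has to come from measured before-and-after cycle times; plugging in a survey estimate defeats the purpose of the dimension.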

1294
00:49:17,660 --> 00:49:19,420
Dimension 3 is adoption behavior.

1295
00:49:19,420 --> 00:49:22,060
What fraction of licensed users invoke Copilot daily

1296
00:49:22,060 --> 00:49:23,900
in at least one core workflow?

1297
00:49:23,900 --> 00:49:25,020
Weekly? Monthly?

1298
00:49:25,020 --> 00:49:27,100
This is where usage telemetry becomes ground truth.

1299
00:49:27,100 --> 00:49:29,020
The admin center usage reports show this.

1300
00:49:29,020 --> 00:49:30,380
The Graph APIs expose it.

1301
00:49:30,380 --> 00:49:31,820
You need to know the conversion rate,

1302
00:49:31,820 --> 00:49:32,700
the real number,

1303
00:49:32,700 --> 00:49:34,380
not the aspirational one.

1304
00:49:34,380 --> 00:49:37,420
Organizations with 70% or higher daily active usage

1305
00:49:37,420 --> 00:49:39,580
see fundamentally different ROI profiles

1306
00:49:39,580 --> 00:49:41,180
than organizations at 30%.

1307
00:49:41,180 --> 00:49:42,460
The metric is not controversial.

1308
00:49:42,460 --> 00:49:44,140
What organizations do with the metric

1309
00:49:44,140 --> 00:49:45,980
is where the real discipline lives.
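
As a sketch of turning usage telemetry into daily, weekly, and monthly active rates, the snippet below works on a hypothetical in-memory export of last-activity dates per licensed user. It does not call the admin center or Graph APIs, the field shape is an assumption, and "last activity within the window" is only a proxy for true daily active usage.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def active_fraction(last_active: Dict[str, Optional[date]],
                    today: date, window_days: int) -> float:
    """Share of licensed users whose last activity falls inside the window."""
    cutoff = today - timedelta(days=window_days)
    active = sum(1 for d in last_active.values() if d is not None and d > cutoff)
    return active / len(last_active)


# Hypothetical export: user -> date of last Copilot activity (None = never used).
usage = {
    "ana": date(2026, 5, 30),
    "ben": date(2026, 5, 27),
    "cho": date(2026, 5, 10),
    "dee": None,
}
today = date(2026, 5, 30)

daily = active_fraction(usage, today, 1)
weekly = active_fraction(usage, today, 7)
monthly = active_fraction(usage, today, 30)
print(f"daily {daily:.0%}, weekly {weekly:.0%}, monthly {monthly:.0%}")
```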

1310
00:49:45,980 --> 00:49:48,700
Dimension 4 is technical readiness and performance.

1311
00:49:48,700 --> 00:49:51,660
Are all users on supported M365 update channels?

1312
00:49:51,660 --> 00:49:53,340
Do they have the right base licenses?

1313
00:49:53,340 --> 00:49:55,340
Is there latency or reliability friction

1314
00:49:55,340 --> 00:49:56,940
that suppresses adoption?

1315
00:49:56,940 --> 00:49:58,860
Technical friction doesn't feel like a cost

1316
00:49:58,860 --> 00:50:01,100
until you realize it's preventing the time savings

1317
00:50:01,100 --> 00:50:02,380
you're supposed to be capturing.

1318
00:50:02,380 --> 00:50:04,380
Dimension 5 is data governance quality.

1319
00:50:04,380 --> 00:50:06,220
Can Copilot deliver useful answers

1320
00:50:06,220 --> 00:50:08,540
if your SharePoint is a maze of overshared folders

1321
00:50:08,540 --> 00:50:09,820
and unlabeled documents?

1322
00:50:09,820 --> 00:50:12,220
A cleaner data estate increases ROI

1323
00:50:12,220 --> 00:50:13,740
by improving answer quality

1324
00:50:13,740 --> 00:50:16,140
while simultaneously reducing compliance risk.

1325
00:50:16,140 --> 00:50:17,340
This isn't theoretical.

1326
00:50:17,340 --> 00:50:18,780
Organizations that skip this step

1327
00:50:18,780 --> 00:50:20,540
discover it through a security incident,

1328
00:50:20,540 --> 00:50:21,900
not a success metric.

1329
00:50:21,900 --> 00:50:24,540
Dimension 6 is cost and licensing optimization.

1330
00:50:24,540 --> 00:50:27,180
Identify which licenses are actually generating value

1331
00:50:27,180 --> 00:50:28,540
and which are occupying seats.

1332
00:50:28,540 --> 00:50:29,740
Find users whose role changed

1333
00:50:29,740 --> 00:50:31,820
and no longer justifies their current SKU.

1334
00:50:31,820 --> 00:50:34,860
Reclaim licenses from consistently low usage accounts.

1335
00:50:34,860 --> 00:50:36,540
Reallocate to high value roles.

1336
00:50:36,540 --> 00:50:37,660
This is not cost-cutting.

1337
00:50:37,660 --> 00:50:41,420
This is capital redeployment from low return to high return uses.
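
A reclaim-and-reallocate pass could look roughly like this. The `Seat` record, the 90-day window, the five-active-day threshold, and the waiting list are all illustrative assumptions; the high-value roles are the ones named in the episode.

```python
from dataclasses import dataclass


@dataclass
class Seat:
    user: str
    role: str
    active_days_last_90: int


# Roles the episode names as having documented ROI patterns.
HIGH_VALUE_ROLES = {"sales", "legal", "consulting", "financial analysis"}


def reclaim_candidates(seats, min_active_days: int = 5):
    """Users whose usage is consistently below the (assumed) threshold."""
    return [s.user for s in seats if s.active_days_last_90 < min_active_days]


def reallocate(freed: int, waiting_list):
    """Hand freed licenses to waiting users in high-value roles first."""
    ranked = sorted(waiting_list, key=lambda w: w[1] not in HIGH_VALUE_ROLES)
    return [user for user, _role in ranked[:freed]]


seats = [Seat("ana", "sales", 60), Seat("ben", "facilities", 1),
         Seat("cho", "legal", 0)]
waiting = [("eve", "facilities"), ("fay", "consulting"), ("gus", "sales")]

freed_users = reclaim_candidates(seats)
new_holders = reallocate(len(freed_users), waiting)
print(freed_users, "->", new_holders)
```

Note that a low-usage seat in a high-value role (like the legal seat above) may call for enablement before reclamation; the threshold only produces candidates for review.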

1338
00:50:41,420 --> 00:50:43,900
Dimension 7 is change management and enablement.

1339
00:50:43,900 --> 00:50:46,940
Did you run formal training beyond an email announcement?

1340
00:50:46,940 --> 00:50:48,300
Do you have champions in each department

1341
00:50:48,300 --> 00:50:49,820
who can model usage and troubleshoot?

1342
00:50:49,820 --> 00:50:51,580
Is there an ongoing enablement program

1343
00:50:51,580 --> 00:50:53,260
or just an initial onboarding?

1344
00:50:53,260 --> 00:50:55,980
Change management has a direct correlation with adoption depth

1345
00:50:55,980 --> 00:50:58,940
and adoption depth drives ROI by three to four times.

1346
00:50:58,940 --> 00:51:01,500
Dimension 8 is continuous improvement and governance.

1347
00:51:01,500 --> 00:51:03,580
How often are you reviewing ROI metrics?

1348
00:51:03,580 --> 00:51:05,740
Are you adjusting which tiers users belong to

1349
00:51:05,740 --> 00:51:07,340
based on actual usage patterns?

1350
00:51:07,340 --> 00:51:08,620
Have you built feedback loops

1351
00:51:08,620 --> 00:51:12,300
where usage data informs policy and policy informs usage allocation?

1352
00:51:12,300 --> 00:51:14,300
The quarterly license audit is not optional.

1353
00:51:14,300 --> 00:51:16,540
It's the mechanism that converts recovered spend

1354
00:51:16,540 --> 00:51:17,820
into funded expansion.

1355
00:51:17,820 --> 00:51:19,980
Without it, waste accumulates invisibly

1356
00:51:19,980 --> 00:51:23,100
and the next budget cycle perpetuates the same inefficiency.

1357
00:51:23,100 --> 00:51:24,540
The audit tells you what's broken.

1358
00:51:24,540 --> 00:51:26,380
The next step is fixing the deployment model.

1359
00:51:26,380 --> 00:51:28,860
The self-funding AI strategy.

1360
00:51:28,860 --> 00:51:30,460
The audit identifies the waste.

1361
00:51:30,460 --> 00:51:33,580
The tiered framework concentrates where you actually want to spend.

1362
00:51:33,580 --> 00:51:36,780
But where does the budget come from to fund the expansion?

1363
00:51:36,780 --> 00:51:38,860
This is where the self-funding narrative shifts

1364
00:51:38,860 --> 00:51:41,180
from cost consciousness to capital strategy.

1365
00:51:41,180 --> 00:51:42,860
The mechanism is straightforward.

1366
00:51:42,860 --> 00:51:45,900
Independent advisors and SAM specialists consistently report

1367
00:51:45,900 --> 00:51:48,140
that organizations uncover between 10

1368
00:51:48,140 --> 00:51:50,460
and 20% in unnecessary license costs

1369
00:51:50,460 --> 00:51:52,860
during their first formal optimization cycle.

1370
00:51:52,860 --> 00:51:54,140
Not through aggressive cuts,

1371
00:51:54,140 --> 00:51:56,060
through reclaiming seats that are actively assigned

1372
00:51:56,060 --> 00:51:58,460
but generating zero measurable value.

1373
00:51:58,460 --> 00:52:01,500
Users who left the organization but still hold licenses.

1374
00:52:01,500 --> 00:52:04,460
Roles that changed and no longer justify E5 level access.

1375
00:52:04,460 --> 00:52:06,300
Add-ons purchased for short-term projects

1376
00:52:06,300 --> 00:52:09,500
that concluded months ago. Continuous automated governance,

1377
00:52:09,500 --> 00:52:11,580
where systems continuously audit usage

1378
00:52:11,580 --> 00:52:13,260
and reallocate based on behavior

1379
00:52:13,260 --> 00:52:15,420
rather than waiting for annual cycles,

1380
00:52:15,420 --> 00:52:18,620
pushes that recovery range to 20% to 30% over time.

1381
00:52:18,620 --> 00:52:20,780
For a 5,000 user organization

1382
00:52:20,780 --> 00:52:22,780
where 30% of seats are genuinely idle

1383
00:52:22,780 --> 00:52:25,020
at the current per seat cost of $30 per month,

1384
00:52:25,020 --> 00:52:28,540
that idle population is carrying $540,000 per year

1385
00:52:28,540 --> 00:52:31,020
in unproductive Copilot spend alone.

1386
00:52:31,020 --> 00:52:32,940
Add the base E3 or E5 licenses

1387
00:52:32,940 --> 00:52:34,300
those users are also holding,

1388
00:52:34,300 --> 00:52:36,860
and the total recoverable cost climbs significantly higher.

1389
00:52:36,860 --> 00:52:38,540
That's not a marginal adjustment.

1390
00:52:38,540 --> 00:52:41,500
That's substantial capital sitting in the wrong allocation.
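
The arithmetic behind the $540,000 figure is easy to check. The user count, idle share, and $30 seat price come from the episode; the base-license amount in the second call is a placeholder, since the episode gives no E3 or E5 price, and base licenses are only recoverable for users who no longer need them at all.

```python
def idle_seat_cost(users: int, idle_percent: int, seat_price_per_month: int,
                   base_license_per_month: int = 0) -> int:
    """Annual spend (in dollars) carried by seats that generate no value."""
    idle_seats = users * idle_percent // 100
    return idle_seats * (seat_price_per_month + base_license_per_month) * 12


# Figures from the episode: 5,000 users, 30% idle, $30 per seat per month.
copilot_only = idle_seat_cost(5_000, 30, 30)

# Placeholder base-license price, for illustration only.
with_base = idle_seat_cost(5_000, 30, 30, base_license_per_month=40)

print(f"Copilot seats alone: ${copilot_only:,}")
print(f"with assumed base license: ${with_base:,}")
```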

1391
00:52:41,500 --> 00:52:42,860
The self-funding narrative works

1392
00:52:42,860 --> 00:52:45,180
because that recovered budget doesn't disappear.

1393
00:52:45,180 --> 00:52:47,980
It funds exactly the capabilities that generate return.

1394
00:52:47,980 --> 00:52:50,540
Targeted Copilot expansion to the high ROI roles

1395
00:52:50,540 --> 00:52:52,860
you identified in your strategic alignment audit.

1396
00:52:52,860 --> 00:52:54,860
The sales teams and consulting practices

1397
00:52:54,860 --> 00:52:58,300
where documented usage patterns show three to four hours per week saved.

1398
00:52:58,300 --> 00:53:01,100
Copilot Studio agent development

1399
00:53:01,100 --> 00:53:03,180
for the two or three specific workflows

1400
00:53:03,180 --> 00:53:05,100
where you measured concrete business outcomes.

1401
00:53:05,100 --> 00:53:06,540
The compliance research agent,

1402
00:53:06,540 --> 00:53:07,900
the CRM routing agent,

1403
00:53:07,900 --> 00:53:09,580
the proposal scaffold agent,

1404
00:53:09,580 --> 00:53:12,220
Azure OpenAI capacity for custom applications,

1405
00:53:12,220 --> 00:53:14,220
where token economics favor direct billing

1406
00:53:14,220 --> 00:53:15,820
over per user licenses.

1407
00:53:15,820 --> 00:53:17,260
The specialized domain copilot

1408
00:53:17,260 --> 00:53:18,460
that needs fine tuning,

1409
00:53:18,460 --> 00:53:21,420
the vertical specific agent built on proprietary data.

1410
00:53:21,420 --> 00:53:23,020
The reallocation works at scale

1411
00:53:23,020 --> 00:53:26,060
because concentration beats coverage in terms of ROI.

1412
00:53:26,060 --> 00:53:27,820
We've already established that organizations

1413
00:53:27,820 --> 00:53:30,700
hitting 70% or higher weekly active usage

1414
00:53:30,700 --> 00:53:32,620
see three to four times the return

1415
00:53:32,620 --> 00:53:35,900
on the same Copilot license compared to those stuck at 30%.

1416
00:53:35,900 --> 00:53:37,500
That's not because the model improved.

1417
00:53:37,500 --> 00:53:39,180
That's because the deployment changed.

1418
00:53:39,180 --> 00:53:41,180
The same $30 monthly investment produces

1419
00:53:41,180 --> 00:53:43,020
a four to one difference in outcome

1420
00:53:43,020 --> 00:53:45,580
based on whether you're spreading it thin across a population

1421
00:53:45,580 --> 00:53:47,260
or concentrating it in roles

1422
00:53:47,260 --> 00:53:50,300
where the work pattern makes AI genuinely useful.

1423
00:53:50,300 --> 00:53:53,260
The phased expansion model operationalizes this principle.

1424
00:53:53,260 --> 00:53:55,020
You don't recover zombie seat licenses

1425
00:53:55,020 --> 00:53:57,740
and immediately redeploy them to the next 1,000 people.

1426
00:53:57,740 --> 00:53:59,500
You pilot with a concentrated group

1427
00:53:59,500 --> 00:54:02,540
100 to 500 users in genuinely high value roles.

1428
00:54:02,540 --> 00:54:03,980
You measure rigorously.

1429
00:54:03,980 --> 00:54:06,940
Time data, adoption metrics, cost per outcome.

1430
00:54:06,940 --> 00:54:08,460
Only when the data supports it,

1431
00:54:08,460 --> 00:54:10,380
do you expand to the next cohort.

1432
00:54:10,380 --> 00:54:12,540
Then repeat. Each phase is a funding gate.

1433
00:54:12,540 --> 00:54:13,500
You don't get more budget

1434
00:54:13,500 --> 00:54:15,660
unless the prior phase demonstrated return.
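
The funding-gate idea can be expressed as a simple check between phases. This is a sketch under stated assumptions: `CohortResult`, the break-even return threshold, and the doubling rule are hypothetical, and the 70% activity bar simply reuses the usage level the episode cites.

```python
from dataclasses import dataclass


@dataclass
class CohortResult:
    users: int
    weekly_active_rate: float   # share of the cohort active each week
    monthly_benefit: float      # measured, monetized benefit
    monthly_cost: float         # licenses plus enablement


def passes_gate(result: CohortResult, min_active: float = 0.70,
                min_return: float = 1.0) -> bool:
    """A phase earns more budget only if adoption and return both clear the bar."""
    return (result.weekly_active_rate >= min_active
            and result.monthly_benefit / result.monthly_cost >= min_return)


def next_cohort_size(result: CohortResult, growth: int = 2) -> int:
    """Expand by an assumed growth factor if the gate passes, otherwise hold."""
    return result.users * growth if passes_gate(result) else 0


pilot = CohortResult(users=200, weekly_active_rate=0.75,
                     monthly_benefit=30_000, monthly_cost=6_000)
stalled = CohortResult(users=200, weekly_active_rate=0.40,
                       monthly_benefit=30_000, monthly_cost=6_000)
print(next_cohort_size(pilot), next_cohort_size(stalled))
```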

1435
00:54:15,660 --> 00:54:18,460
This approach inverts the risk profile of AI spending.

1436
00:54:18,460 --> 00:54:19,820
The blanket rollout model,

1437
00:54:19,820 --> 00:54:22,060
buy many seats, hope adoption follows,

1438
00:54:22,060 --> 00:54:23,900
puts all the risk up front.

1439
00:54:23,900 --> 00:54:25,820
You've committed the budget and the infrastructure

1440
00:54:25,820 --> 00:54:27,980
before you know if it will generate return.

1441
00:54:27,980 --> 00:54:29,660
The self-funding model defers the risk.

1442
00:54:29,660 --> 00:54:31,420
You prove return with a constrained group.

1443
00:54:31,420 --> 00:54:34,060
You recover waste from the broader organization.

1444
00:54:34,060 --> 00:54:36,940
You reinvest that recovered capital into expansion

1445
00:54:36,940 --> 00:54:38,460
but only in the directions

1446
00:54:38,460 --> 00:54:40,620
where data already validates the outcome.

1447
00:54:40,620 --> 00:54:43,740
This is not a cost-cutting exercise disguised as strategy.

1448
00:54:43,740 --> 00:54:44,540
It's the opposite.

1449
00:54:44,540 --> 00:54:47,580
You're preserving or expanding your AI capability footprint

1450
00:54:47,580 --> 00:54:50,700
while dramatically improving the efficiency of every dollar deployed.

1451
00:54:50,700 --> 00:54:52,940
The total AI budget doesn't necessarily shrink.

1452
00:54:52,940 --> 00:54:55,020
The return per dollar dramatically improves.

1453
00:54:55,020 --> 00:54:56,540
And the funding mechanism,

1454
00:54:56,540 --> 00:54:59,340
reallocation from low return to high return uses,

1455
00:54:59,340 --> 00:55:00,540
is internally sustainable,

1456
00:55:00,540 --> 00:55:03,500
which means you're not asking the board for more budget

1457
00:55:03,500 --> 00:55:05,180
to fund better AI strategy.

1458
00:55:05,180 --> 00:55:07,660
You're funding it by not wasting what you already have.

1459
00:55:07,660 --> 00:55:10,220
Before you can execute this reallocation effectively, though,

1460
00:55:10,220 --> 00:55:12,700
you need to address one structural prerequisite

1461
00:55:12,700 --> 00:55:15,020
that most organizations haven't actually solved.

1462
00:55:15,020 --> 00:55:17,500
The data governance prerequisite

1463
00:55:17,500 --> 00:55:19,260
Before you can execute any of this,

1464
00:55:19,260 --> 00:55:22,620
the tiered framework, the token governance, the ROI audit,

1465
00:55:22,620 --> 00:55:23,820
there's a foundational layer

1466
00:55:23,820 --> 00:55:26,300
that almost every organization underestimates.

1467
00:55:26,300 --> 00:55:28,060
Data quality and access control.

1468
00:55:28,060 --> 00:55:29,420
Copilot's value is tightly coupled

1469
00:55:29,420 --> 00:55:31,740
to how clean your data estate actually is,

1470
00:55:31,740 --> 00:55:32,780
not theoretically,

1471
00:55:32,780 --> 00:55:35,340
but operationally. When you grant Copilot access

1472
00:55:35,340 --> 00:55:37,340
to your SharePoint, Teams, and email archive,

1473
00:55:37,340 --> 00:55:39,180
you're not just enabling productivity.

1474
00:55:39,180 --> 00:55:40,460
You're exposing every permission

1475
00:55:40,460 --> 00:55:41,900
misconfiguration, every document

1476
00:55:41,900 --> 00:55:43,340
that shouldn't be broadly accessible,

1477
00:55:43,340 --> 00:55:45,660
every folder structure that nobody's touched in three years

1478
00:55:45,660 --> 00:55:47,820
because nobody remembers why it was created.

1479
00:55:47,820 --> 00:55:50,140
Copilot doesn't create data access problems.

1480
00:55:50,140 --> 00:55:53,260
It surfaces existing ones at unprecedented speed and scale.

1481
00:55:53,260 --> 00:55:55,340
A human searching SharePoint manually might dig

1482
00:55:55,340 --> 00:55:57,740
through five or six documents before giving up.

1483
00:55:57,740 --> 00:55:59,740
Copilot can ingest 50 documents

1484
00:55:59,740 --> 00:56:02,540
in the time it takes a human to open the search interface.

1485
00:56:02,540 --> 00:56:04,940
If half those documents were never intended to be accessed

1486
00:56:04,940 --> 00:56:06,540
by the person running the query,

1487
00:56:06,540 --> 00:56:08,220
if they contain M&A preparation,

1488
00:56:08,220 --> 00:56:09,500
unreleased product roadmaps,

1489
00:56:09,500 --> 00:56:11,340
or confidential client information,

1490
00:56:11,340 --> 00:56:13,740
Copilot surfaces all of it with zero friction.

1491
00:56:14,620 --> 00:56:17,020
The access control failed long before Copilot arrived.

1492
00:56:17,020 --> 00:56:18,700
Copilot just made the failure visible.

1493
00:56:18,700 --> 00:56:20,860
The pre-deployment requirements are concrete.

1494
00:56:20,860 --> 00:56:24,140
Run a permissions audit across SharePoint, OneDrive, and Teams.

1495
00:56:24,140 --> 00:56:26,460
Find folders shared with "anyone with the link."

1496
00:56:26,460 --> 00:56:28,620
Find Teams channels where guest access is active

1497
00:56:28,620 --> 00:56:31,260
for people long since departed from the organization.

1498
00:56:31,260 --> 00:56:33,180
Identify documents marked sensitive

1499
00:56:33,180 --> 00:56:34,620
but shared broadly anyway.

1500
00:56:34,620 --> 00:56:36,060
Remediate the oversharing.

1501
00:56:36,060 --> 00:56:38,940
Implement sensitivity labels that actually cover active content:

1502
00:56:38,940 --> 00:56:41,340
not the aspirational labels you defined two years ago,

1503
00:56:41,340 --> 00:56:43,340
but the labels that reflect how your organization

1504
00:56:43,340 --> 00:56:45,260
actually classifies information today.

1505
00:56:45,260 --> 00:56:46,940
Set up restricted search parameters

1506
00:56:46,940 --> 00:56:49,420
so Copilot doesn't return results from data

1507
00:56:49,420 --> 00:56:51,900
the requesting user doesn't have legitimate access to.

1508
00:56:51,900 --> 00:56:53,180
This work is not glamorous.

1509
00:56:53,180 --> 00:56:55,180
It doesn't show up in a productivity demo.

1510
00:56:55,180 --> 00:56:57,100
It's also not optional if you want Copilot

1511
00:56:57,100 --> 00:56:58,700
to function as a capability,

1512
00:56:58,700 --> 00:57:00,220
rather than a liability.
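
The pre-deployment checks listed above can be sketched as rules over a sharing inventory. Everything about the data shape here is an assumption: `SharedItem`, its `link_scope` values, and the label names stand in for whatever your own permissions export contains, and nothing below calls a real SharePoint or Graph API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SharedItem:
    path: str
    link_scope: str                  # assumed values: "anyone", "organization", "specific"
    label: Optional[str] = None      # sensitivity label, if any
    guests: list = field(default_factory=list)


SENSITIVE_LABELS = {"Confidential", "Highly Confidential"}   # assumed label names


def audit(items, departed):
    """Group item paths by the oversharing problems named in the episode."""
    findings = {"anyone_link": [], "stale_guest": [],
                "sensitive_but_broad": [], "unlabeled": []}
    for item in items:
        if item.link_scope == "anyone":
            findings["anyone_link"].append(item.path)
        if any(guest in departed for guest in item.guests):
            findings["stale_guest"].append(item.path)
        if item.label in SENSITIVE_LABELS and item.link_scope != "specific":
            findings["sensitive_but_broad"].append(item.path)
        if item.label is None:
            findings["unlabeled"].append(item.path)
    return findings


inventory = [
    SharedItem("/finance/ma-prep.xlsx", "organization", "Confidential"),
    SharedItem("/marketing/launch.pptx", "anyone"),
    SharedItem("/projects/alpha", "specific", "General", guests=["old-vendor"]),
]
report = audit(inventory, departed={"old-vendor"})
for problem, paths in report.items():
    print(problem, paths)
```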

1513
00:57:00,220 --> 00:57:01,820
Here's the practical consequence.

1514
00:57:01,820 --> 00:57:04,380
Organizations that skip the data governance prerequisite

1515
00:57:04,380 --> 00:57:07,500
and deploy Copilot into a dirty data environment experience

1516
00:57:07,500 --> 00:57:09,340
two simultaneous problems.

1517
00:57:09,340 --> 00:57:11,180
First, Copilot's quality degrades

1518
00:57:11,180 --> 00:57:13,100
because it's pulling from a messy knowledge base

1519
00:57:13,100 --> 00:57:16,060
full of outdated, redundant, and conflicting information.

1520
00:57:16,060 --> 00:57:19,020
Users ask questions and get answers that are technically accurate

1521
00:57:19,020 --> 00:57:21,420
but strategically obsolete or contradictory

1522
00:57:21,420 --> 00:57:24,780
depending on which document Copilot happened to rank higher.

1523
00:57:24,780 --> 00:57:27,420
Second, compliance and security exposure spike

1524
00:57:27,420 --> 00:57:29,900
because Copilot is surfacing sensitive information

1525
00:57:29,900 --> 00:57:32,700
that should never have been broadly accessible in the first place.

1526
00:57:32,700 --> 00:57:35,260
The security incident is how most organizations discover

1527
00:57:35,260 --> 00:57:36,300
they skipped this step,

1528
00:57:36,300 --> 00:57:37,900
not through an architecture review,

1529
00:57:37,900 --> 00:57:39,420
through a breach or an audit finding.

1530
00:57:39,420 --> 00:57:41,660
Someone accesses information through Copilot

1531
00:57:41,660 --> 00:57:43,580
that they shouldn't have had permission to see

1532
00:57:43,580 --> 00:57:46,300
and suddenly governance becomes urgent in a very different way.

1533
00:57:46,300 --> 00:57:49,820
A cleaner data estate increases ROI by improving answer quality

1534
00:57:49,820 --> 00:57:51,980
while simultaneously reducing compliance risk.

1535
00:57:51,980 --> 00:57:53,500
It's not a trade-off where you improve one

1536
00:57:53,500 --> 00:57:55,580
at the expense of the other. They move together.

1537
00:57:55,580 --> 00:57:57,500
Better classification and access control means

1538
00:57:57,500 --> 00:57:59,900
Copilot returns more relevant, more recent,

1539
00:57:59,900 --> 00:58:01,420
more trustworthy information.

1540
00:58:01,420 --> 00:58:03,900
Better access control means Copilot's search scope

1541
00:58:03,900 --> 00:58:06,140
is constrained by legitimate permission boundaries

1542
00:58:06,140 --> 00:58:07,980
rather than exploding across everything

1543
00:58:07,980 --> 00:58:09,740
the organization ever documented.

1544
00:58:09,740 --> 00:58:11,900
Data readiness is not an IT prerequisite.

1545
00:58:11,900 --> 00:58:13,100
It's a business prerequisite.

1546
00:58:13,100 --> 00:58:16,060
Your CFO needs to understand that before Copilot can work,

1547
00:58:16,060 --> 00:58:17,900
your SharePoint environment needs to function

1548
00:58:17,900 --> 00:58:19,500
like an actual information architecture

1549
00:58:19,500 --> 00:58:22,460
rather than a filing cabinet where everything eventually gets dumped.

1550
00:58:22,460 --> 00:58:23,980
Your legal team needs to confirm

1551
00:58:23,980 --> 00:58:26,220
that the content Copilot will have access to

1552
00:58:26,220 --> 00:58:27,580
is classified appropriately.

1553
00:58:27,580 --> 00:58:29,740
Your business leaders need to own the decision

1554
00:58:29,740 --> 00:58:32,220
about what data Copilot can surface to whom.

1555
00:58:32,220 --> 00:58:33,980
That's the foundation before anything else

1556
00:58:33,980 --> 00:58:35,900
in the framework actually lands cleanly.

1557
00:58:35,900 --> 00:58:37,740
The hybrid automation architecture.

1558
00:58:37,740 --> 00:58:39,980
The architecture question is no longer binary.

1559
00:58:39,980 --> 00:58:42,380
You're not choosing between "deploy Copilot everywhere"

1560
00:58:42,380 --> 00:58:44,540
and "stick with RPA."

1561
00:58:44,540 --> 00:58:47,180
The organizations seeing the best outcomes in 2026

1562
00:58:47,180 --> 00:58:49,100
are building something different entirely.

1563
00:58:49,100 --> 00:58:51,020
A layered stack where each component handles

1564
00:58:51,020 --> 00:58:53,740
the work it is cheapest and most effective at performing.

1565
00:58:53,740 --> 00:58:55,500
Think of it as four distinct layers,

1566
00:58:55,500 --> 00:58:58,060
each with its own cost profile and responsibility.

1567
00:58:58,060 --> 00:58:59,980
The interpretation layer sits at the top.

1568
00:58:59,980 --> 00:59:01,740
This is where language understanding lives.

1569
00:59:01,740 --> 00:59:04,060
You have incoming email with unstructured text

1570
00:59:04,060 --> 00:59:06,700
or a document in PDF format with inconsistent layouts

1571
00:59:06,700 --> 00:59:09,980
or a customer support chat thread mixing technical details

1572
00:59:09,980 --> 00:59:11,420
with emotional context.

1573
00:59:11,420 --> 00:59:13,740
These inputs don't fit predefined templates.

1574
00:59:13,740 --> 00:59:14,700
They require judgment.

1575
00:59:14,700 --> 00:59:16,860
They require someone or something to read intent

1576
00:59:16,860 --> 00:59:17,740
and extract meaning.

1577
00:59:17,740 --> 00:59:19,340
This is where LLMs operate.

1578
00:59:19,340 --> 00:59:20,380
Classification.

1579
00:59:20,380 --> 00:59:22,780
Is this email a complaint or a question

1580
00:59:22,780 --> 00:59:24,780
or a request for information?

1581
00:59:24,780 --> 00:59:25,660
Extraction.

1582
00:59:25,660 --> 00:59:28,940
What are the key entities and what does the customer actually need?

1583
00:59:28,940 --> 00:59:29,980
Summarization.

1584
00:59:29,980 --> 00:59:32,780
What's the one paragraph version of this 50 page contract?

1585
00:59:32,780 --> 00:59:35,180
The interpretation layer is pure AI work.

1586
00:59:35,180 --> 00:59:37,260
The cost scales with token consumption

1587
00:59:37,260 --> 00:59:38,860
but the value it produces,

1588
00:59:38,860 --> 00:59:42,540
converting messy unstructured input into clean, structured signal

1589
00:59:42,540 --> 00:59:45,260
is what makes the entire downstream workflow possible.

1590
00:59:45,260 --> 00:59:47,420
The execution layer does the mechanical work.

1591
00:59:47,420 --> 00:59:49,420
Once you've extracted meaning from that email,

1592
00:59:49,420 --> 00:59:51,500
someone needs to update the CRM record,

1593
00:59:51,500 --> 00:59:53,420
create a ticket in the helpdesk system,

1594
00:59:53,420 --> 00:59:55,660
move a file from inbox to archive folder,

1595
00:59:55,660 --> 00:59:57,580
post a transaction to the general ledger,

1596
00:59:57,580 --> 00:59:59,580
log an action in the project management system.

1597
00:59:59,580 --> 01:00:00,940
These are deterministic,

1598
01:00:00,940 --> 01:00:02,780
rules-based, predictable operations.

1599
01:00:02,780 --> 01:00:03,980
They don't require judgment.

1600
01:00:03,980 --> 01:00:06,220
They require precision and consistency.

1601
01:00:06,220 --> 01:00:08,860
This is where RPA or deterministic scripts operate.

1602
01:00:08,860 --> 01:00:11,180
The cost here approaches zero once deployed

1603
01:00:11,180 --> 01:00:12,780
because you're not reasoning about anything.

1604
01:00:12,780 --> 01:00:14,700
You're just moving data and triggering workflows.

1605
01:00:14,700 --> 01:00:17,500
The marginal cost per execution is fractions of a cent

1606
01:00:17,500 --> 01:00:18,940
or in the case of a Python script,

1607
01:00:18,940 --> 01:00:20,540
fractions of a millionth of a dollar.

1608
01:00:20,540 --> 01:00:23,740
The supervisory layer handles what neither AI nor RPA

1609
01:00:23,740 --> 01:00:24,780
can safely own.

1610
01:00:24,780 --> 01:00:25,740
Exceptions.

1611
01:00:25,740 --> 01:00:26,700
Approvals.

1612
01:00:26,700 --> 01:00:28,140
High stakes decisions.

1613
01:00:28,140 --> 01:00:30,780
The email that came in wasn't a standard request.

1614
01:00:30,780 --> 01:00:33,740
It's a special circumstance requiring human judgment.

1615
01:00:33,740 --> 01:00:36,380
The transaction looks like a duplicate, but might not be.

1616
01:00:36,380 --> 01:00:39,820
The contract contains a clause that deviates from standard language

1617
01:00:39,820 --> 01:00:41,980
and needs a lawyer's opinion before it's committed.

1618
01:00:41,980 --> 01:00:43,260
Humans handle these cases.

1619
01:00:43,260 --> 01:00:45,340
The cost is whatever the fully loaded labor rate is

1620
01:00:45,340 --> 01:00:47,100
for the person doing the review.

1621
01:00:47,100 --> 01:00:49,180
It's expensive compared to AI or RPA,

1622
01:00:49,180 --> 01:00:51,900
but it's necessary because the reputational cost

1623
01:00:51,900 --> 01:00:54,460
and legal liability of getting these cases wrong

1624
01:00:54,460 --> 01:00:57,660
far exceeds the labor cost of handling them thoughtfully.

1625
01:00:57,660 --> 01:00:59,260
The control layer is the infrastructure

1626
01:00:59,260 --> 01:01:01,180
that makes the whole stack defensible.

1627
01:01:01,180 --> 01:01:03,660
Governance, audit trails, compliance logging,

1628
01:01:03,660 --> 01:01:06,060
observability into what's happening at every stage.

1629
01:01:06,060 --> 01:01:07,500
Which AI model ran?

1630
01:01:07,500 --> 01:01:09,500
What were the input tokens and output tokens?

1631
01:01:09,500 --> 01:01:10,940
Which RPA bot executed?

1632
01:01:10,940 --> 01:01:13,500
Did a human approve this transaction before it was posted?

1633
01:01:13,500 --> 01:01:15,900
What happened when the system encountered an exception?

1634
01:01:15,900 --> 01:01:17,420
This layer doesn't execute work.

1635
01:01:17,420 --> 01:01:19,180
It certifies that work happened correctly.

1636
01:01:19,180 --> 01:01:20,300
That decisions are traceable.

1637
01:01:20,300 --> 01:01:22,060
That mistakes can be identified and corrected.

1638
01:01:22,060 --> 01:01:24,540
It's not cheap, but it's how you prove to regulators,

1639
01:01:24,540 --> 01:01:26,460
auditors and your own executive team

1640
01:01:26,460 --> 01:01:28,300
that automation is happening under control.

1641
01:01:28,300 --> 01:01:30,700
The practical mechanics are worth walking through.

1642
01:01:30,700 --> 01:01:31,980
An email arrives.

1643
01:01:31,980 --> 01:01:33,420
The interpretation layer,

1644
01:01:33,420 --> 01:01:37,580
an LLM running a classifier task, reads it, determines intent

1645
01:01:37,580 --> 01:01:39,420
and extracts structured data.

1646
01:01:39,420 --> 01:01:41,020
Cost: fractions of a cent.

1647
01:01:41,020 --> 01:01:43,340
Output: a decision about what to do next

1648
01:01:43,340 --> 01:01:45,420
and the key information needed to do it.

1649
01:01:45,420 --> 01:01:48,540
The execution layer, RPA or a deterministic script,

1650
01:01:48,540 --> 01:01:50,860
takes that structured data and executes the workflow.

1651
01:01:50,860 --> 01:01:53,180
It updates the CRM, logs the action,

1652
01:01:53,180 --> 01:01:55,660
routes the case to the appropriate queue or person.

1653
01:01:55,660 --> 01:01:57,820
Cost: near-zero marginal expense.

1654
01:01:57,820 --> 01:02:00,140
The supervisory layer reviews exceptions.

1655
01:02:00,140 --> 01:02:02,780
The 5 or 10% of cases where the interpretation

1656
01:02:02,780 --> 01:02:05,340
wasn't confident or the situation requires judgment.

1657
01:02:05,340 --> 01:02:07,180
The human makes the final call.

1658
01:02:07,180 --> 01:02:09,820
The control layer logs everything, creates an audit trail,

1659
01:02:09,820 --> 01:02:10,940
flags anomalies.
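
The four-layer flow just described can be sketched roughly as follows. This is a minimal illustration, not the episode's implementation: a keyword rule stands in for the LLM classifier so the code runs offline, and the names `interpret`, `execute`, `handle`, `AuditLog`, the queue name, and the 0.8 confidence cutoff are all hypothetical.

```python
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.8  # hypothetical cutoff for routing to a human


@dataclass
class Interpretation:
    intent: str
    fields: dict
    confidence: float


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, stage: str, detail: str) -> None:
        # Control layer: it executes no work, it only records what happened.
        self.entries.append((stage, detail))


def interpret(email_body: str) -> Interpretation:
    # Interpretation layer. A keyword rule stands in for the LLM classifier
    # so the sketch runs without any model or network access.
    if "invoice" in email_body.lower():
        return Interpretation("invoice_query", {"topic": "invoice"}, 0.95)
    return Interpretation("unknown", {}, 0.40)


def execute(result: Interpretation) -> str:
    # Execution layer: deterministic routing, no reasoning involved.
    queues = {"invoice_query": "finance-queue"}  # hypothetical queue name
    return queues[result.intent]


def handle(email_body: str, log: AuditLog) -> str:
    result = interpret(email_body)
    log.record("interpretation", f"{result.intent} ({result.confidence:.2f})")
    if result.confidence < CONFIDENCE_THRESHOLD:
        # Supervisory layer: low-confidence cases go to a person.
        log.record("supervisory", "escalated to human review")
        return "human-review"
    queue = execute(result)
    log.record("execution", f"routed to {queue}")
    return queue
```

Calling `handle("Question about invoice 123", AuditLog())` takes the cheap deterministic path, while an unrecognized message is escalated before any workflow runs, and both outcomes leave entries in the log.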

1660
01:02:10,940 --> 01:02:13,420
The cost advantage of this hybrid model is compound.

1661
01:02:13,420 --> 01:02:16,700
You pay AI prices, which are variable and token dependent,

1662
01:02:16,700 --> 01:02:18,460
only for the interpretation work,

1663
01:02:18,460 --> 01:02:20,700
where human labor would be even more expensive.

1664
01:02:20,700 --> 01:02:23,740
You pay near zero execution prices for the deterministic work,

1665
01:02:23,740 --> 01:02:26,780
where RPA or scripts are vastly cheaper than asking an AI

1666
01:02:26,780 --> 01:02:28,140
to do mechanical tasks.

1667
01:02:28,140 --> 01:02:30,540
You pay human labor rates, which are expensive,

1668
01:02:30,540 --> 01:02:34,060
but only for the exceptions and decisions that actually require human judgment.

1669
01:02:34,060 --> 01:02:37,100
You're not forcing expensive reasoning engines to do mechanical work

1670
01:02:37,100 --> 01:02:39,580
and you're not using cheap deterministic scripts for tasks

1671
01:02:39,580 --> 01:02:41,740
that require contextual understanding.

1672
01:02:41,740 --> 01:02:43,660
The result is an automation stack,

1673
01:02:43,660 --> 01:02:46,940
where cost aligns to value and risk aligns to control,

1674
01:02:46,940 --> 01:02:48,460
not hype, not hope.

1675
01:02:48,460 --> 01:02:49,980
Structural economics.
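
One way to make that cost alignment concrete is an expected cost per case. The sketch below is a simplification under stated assumptions: every case pays the interpretation and execution cost, and only the exception share pays for human review. The function name and all figures in the example call are hypothetical, not numbers from the episode.

```python
def blended_cost_per_case(ai_cost: float, execution_cost: float,
                          exception_rate: float, human_cost: float) -> float:
    """Expected cost of one case in the hybrid stack.

    ai_cost        -- interpretation cost per case (token dependent)
    execution_cost -- RPA or script cost per case (near zero)
    exception_rate -- share of cases escalated to a person, between 0 and 1
    human_cost     -- fully loaded labor cost of one human review
    """
    if not 0.0 <= exception_rate <= 1.0:
        raise ValueError("exception_rate must be between 0 and 1")
    return ai_cost + execution_cost + exception_rate * human_cost


# Hypothetical inputs: half a cent of tokens, a negligible script cost,
# 10% of cases escalated, 5.00 per human review.
hybrid = blended_cost_per_case(0.005, 0.0001, 0.10, 5.00)
all_human = blended_cost_per_case(0.0, 0.0, 1.0, 5.00)
```

With inputs like these the human review term dominates the blended figure, which is why the exception rate is the number worth measuring first.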

1676
01:02:49,980 --> 01:02:51,900
Building your Copilot business case.

1677
01:02:51,900 --> 01:02:54,380
A defensible business case in 2026

1678
01:02:54,380 --> 01:02:59,020
looks materially different from the AI business cases of 2024 and 2025.

1679
01:02:59,020 --> 01:03:00,460
The difference isn't academic.

1680
01:03:00,460 --> 01:03:02,220
It's the difference between getting budget approved

1681
01:03:02,220 --> 01:03:04,380
and getting budget revoked after the first renewal.

1682
01:03:04,380 --> 01:03:07,980
The five components that make a case defensible are baseline metrics,

1683
01:03:07,980 --> 01:03:09,980
conservative time savings assumptions,

1684
01:03:09,980 --> 01:03:13,580
pilot design, measurement methodology, and scale criteria.

1685
01:03:13,580 --> 01:03:17,020
You need all five, and they need to work together as an integrated model,

1686
01:03:17,020 --> 01:03:19,420
not as separate justifications that you stack together

1687
01:03:19,420 --> 01:03:21,820
and hope nobody asks hard questions about.

1688
01:03:21,820 --> 01:03:24,540
Baseline metrics means you've captured how long specific tasks

1689
01:03:24,540 --> 01:03:26,380
take before Copilot arrives,

1690
01:03:26,380 --> 01:03:28,220
not estimates, actual data.

1691
01:03:28,220 --> 01:03:30,380
If proposal drafting is your target use case,

1692
01:03:30,380 --> 01:03:32,620
you've timed how long a senior consultant spends

1693
01:03:32,620 --> 01:03:35,180
on a typical RFP from intake to delivery.

1694
01:03:35,180 --> 01:03:37,260
If meeting summarization is the focus,

1695
01:03:37,260 --> 01:03:40,620
you've measured how long it takes someone to review a recorded meeting

1696
01:03:40,620 --> 01:03:42,220
and produce actionable notes.

1697
01:03:42,220 --> 01:03:43,660
If email triage is the driver,

1698
01:03:43,660 --> 01:03:47,100
you've documented cycle time from message arrival to initial response.

1699
01:03:47,100 --> 01:03:48,780
This baseline is your control group.

1700
01:03:48,780 --> 01:03:50,620
Without it, you can't measure the delta.

1701
01:03:50,620 --> 01:03:52,780
Without the delta, you can't calculate ROI.

1702
01:03:52,780 --> 01:03:55,900
The CFO will accept nothing less than documented time data.

1703
01:03:55,900 --> 01:03:59,420
Conservative modeling is where you protect yourself against the optimism trap.

1704
01:03:59,420 --> 01:04:03,580
The Forrester study reported three to four hours per week saved at maturity.

1705
01:04:03,580 --> 01:04:04,700
That's the upper bound.

1706
01:04:04,700 --> 01:04:07,580
Your business case should assume 30 to 60 minutes per week

1707
01:04:07,580 --> 01:04:09,260
saved as your primary model.

1708
01:04:09,260 --> 01:04:12,860
Run sensitivity analysis up to three to five hours per week for mature users,

1709
01:04:12,860 --> 01:04:14,780
but lead with conservative assumptions.

1710
01:04:14,780 --> 01:04:17,900
If your case only works if you hit the optimistic scenario,

1711
01:04:17,900 --> 01:04:21,100
you've built a case that will fail the moment adoption underperforms

1712
01:04:21,100 --> 01:04:24,140
or task complexity turns out to be higher than expected.

1713
01:04:24,140 --> 01:04:28,060
Build the case so it works at conservative assumptions and outperforms from there.
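
A sensitivity run along these lines might look like the sketch below. The time-saved scenarios come from the figures above (30 to 60 minutes as the primary model, up to five hours for mature users). The hourly rate, the annual cost per user, and the 46 working weeks are hypothetical placeholders, not quoted prices.

```python
WORK_WEEKS_PER_YEAR = 46  # assumption: excludes holidays and leave


def annual_value_per_user(minutes_saved_per_week: float, hourly_rate: float) -> float:
    # Convert weekly minutes saved into a yearly labor value.
    return minutes_saved_per_week / 60 * hourly_rate * WORK_WEEKS_PER_YEAR


def sensitivity(scenarios: dict, hourly_rate: float, annual_cost_per_user: float) -> dict:
    # Evaluate each scenario against the same cost, so the case can be read
    # at the conservative end first and the optimistic end last.
    rows = {}
    for name, minutes in scenarios.items():
        value = annual_value_per_user(minutes, hourly_rate)
        rows[name] = {
            "value": value,
            "net": value - annual_cost_per_user,
            "covers_cost": value >= annual_cost_per_user,
        }
    return rows


scenarios = {
    "conservative_low": 30,   # minutes per week
    "conservative_high": 60,
    "optimistic": 300,        # five hours per week, mature users only
}
# Hypothetical: 60 per hour fully loaded, 500 per user per year all-in.
rows = sensitivity(scenarios, hourly_rate=60.0, annual_cost_per_user=500.0)
```

The check that matters is whether `covers_cost` holds in the `conservative_low` row, since that is the scenario the case should lead with.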

1714
01:04:28,060 --> 01:04:30,300
Pilot design is the operational linchpin.

1715
01:04:30,300 --> 01:04:34,140
You're not deploying Copilot to the entire organization and measuring what happens.

1716
01:04:34,140 --> 01:04:37,100
You're running a controlled experiment with 100 to 500 users

1717
01:04:37,100 --> 01:04:39,500
in specific roles for 8 to 12 weeks.

1718
01:04:39,500 --> 01:04:40,940
The roles are chosen explicitly,

1719
01:04:40,940 --> 01:04:43,660
sales, legal, consulting, product management, customer service,

1720
01:04:43,660 --> 01:04:46,140
because they're document heavy and communication heavy

1721
01:04:46,140 --> 01:04:49,340
and you've already hypothesized that Copilot fits their workflow.

1722
01:04:49,340 --> 01:04:51,740
The users in those roles get dedicated training,

1723
01:04:51,740 --> 01:04:53,580
not a generic email announcement.

1724
01:04:53,580 --> 01:04:55,500
You create role-specific prompting libraries,

1725
01:04:55,500 --> 01:04:57,100
so people aren't starting from scratch.

1726
01:04:57,100 --> 01:04:59,740
You establish clear success metrics per process.

1727
01:04:59,740 --> 01:05:01,340
Not "did people like it?"

1728
01:05:01,340 --> 01:05:04,860
Metrics like: did proposal cycle time drop 20%,

1729
01:05:04,860 --> 01:05:08,540
or did email response time improve 15 minutes per interaction?

1730
01:05:08,540 --> 01:05:12,540
Or did document revision cycles shrink from three rounds to two?

1731
01:05:12,540 --> 01:05:15,420
Measurement methodology is the operational discipline

1732
01:05:15,420 --> 01:05:17,500
that captures those metrics rigorously.

1733
01:05:17,500 --> 01:05:20,700
You're tracking cycle time, throughput, error rates, and adoption depth.

1734
01:05:20,700 --> 01:05:23,580
Adoption depth means daily active usage in core workflows,

1735
01:05:23,580 --> 01:05:25,020
not license assignment.

1736
01:05:25,020 --> 01:05:27,340
You're capturing this data before the pilot starts,

1737
01:05:27,340 --> 01:05:29,020
continuously during the pilot,

1738
01:05:29,020 --> 01:05:30,380
and at a defined endpoint,

1739
01:05:30,380 --> 01:05:33,180
so you can quantify the before and after gap.

1740
01:05:33,180 --> 01:05:36,060
You're not conducting surveys asking people how productive they feel.

1741
01:05:36,060 --> 01:05:38,860
The CFO will dismiss productivity feelings as anecdotes.

1742
01:05:38,860 --> 01:05:41,820
You're showing measurable change in how work actually flows.

1743
01:05:41,820 --> 01:05:44,540
Scale criteria is where you define the graduation gate.

1744
01:05:44,540 --> 01:05:47,500
You've run your pilot and gathered 12 weeks of data.

1745
01:05:47,500 --> 01:05:52,060
You've calculated ROI based on measured time savings multiplied by fully loaded labor cost

1746
01:05:52,060 --> 01:05:55,180
and compared to license costs plus enablement overhead.

1747
01:05:55,180 --> 01:05:59,260
You've set a threshold, maybe 150% ROI minimum,

1748
01:05:59,260 --> 01:06:01,100
maybe a six-month payback requirement,

1749
01:06:01,100 --> 01:06:04,060
maybe 60% adoption depth in the pilot cohort.

1750
01:06:04,060 --> 01:06:05,180
Whatever your threshold is,

1751
01:06:05,180 --> 01:06:08,220
it needs to be explicit and documented before the pilot starts.

1752
01:06:08,220 --> 01:06:10,700
Once the pilot ends, you measure against that threshold.

1753
01:06:10,700 --> 01:06:12,220
If you hit it, you expand.

1754
01:06:12,220 --> 01:06:16,140
If you don't, you diagnose why and iterate before committing to a wider rollout.
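
The graduation gate can be written down before the pilot starts. The sketch below is one possible reading, assuming ROI is defined as net benefit divided by cost and adoption depth as daily active users over pilot users. It checks the 150% ROI and 60% adoption examples from above and leaves the payback criterion out. The class and function names and the sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PilotResult:
    hours_saved: float         # measured across the cohort over the pilot
    loaded_hourly_rate: float  # fully loaded labor cost per hour
    total_cost: float          # licenses plus enablement for the pilot window
    pilot_users: int
    daily_active_users: int    # users active daily in core workflows


def roi_percent(r: PilotResult) -> float:
    # Net benefit over cost, expressed as a percentage.
    benefit = r.hours_saved * r.loaded_hourly_rate
    return (benefit - r.total_cost) / r.total_cost * 100


def adoption_depth(r: PilotResult) -> float:
    # Share of the cohort using the tool daily, not share of seats assigned.
    return r.daily_active_users / r.pilot_users


def should_expand(r: PilotResult, min_roi_percent: float = 150.0,
                  min_adoption_depth: float = 0.60) -> bool:
    # Both thresholds must be met; they are fixed before the pilot begins.
    return (roi_percent(r) >= min_roi_percent
            and adoption_depth(r) >= min_adoption_depth)


# Hypothetical pilot: 2,400 hours saved at 50 per hour against 40,000 of cost.
pilot = PilotResult(hours_saved=2400, loaded_hourly_rate=50.0,
                    total_cost=40000.0, pilot_users=200, daily_active_users=130)
```

Keeping the thresholds as explicit parameters means the pass or fail decision is mechanical once the data arrives, which is the point of documenting them up front.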

1755
01:06:16,140 --> 01:06:20,780
The critical insight that most organizations miss is that ROI tracks adoption,

1756
01:06:20,780 --> 01:06:21,740
not licenses.

1757
01:06:21,740 --> 01:06:23,500
You can buy all the seats you want.

1758
01:06:23,500 --> 01:06:25,420
ROI doesn't follow seats; it follows adoption.

1759
01:06:25,420 --> 01:06:27,420
Change management and training are not overhead.

1760
01:06:27,420 --> 01:06:31,180
They are the line items that determine whether the license actually generates return.

1761
01:06:31,180 --> 01:06:33,260
Include them in your cost model from the beginning,

1762
01:06:33,260 --> 01:06:34,540
not as an afterthought.

1763
01:06:34,540 --> 01:06:36,860
A business case built this way is defensible with the CFO

1764
01:06:36,860 --> 01:06:40,460
because it's built on the same logic finance uses to evaluate any investment.

1765
01:06:40,460 --> 01:06:43,260
Documented baseline, conservative assumptions,

1766
01:06:43,260 --> 01:06:44,460
measurable outcomes,

1767
01:06:44,460 --> 01:06:45,740
gated expansion.

1768
01:06:45,740 --> 01:06:47,420
What the next 12 months require.

1769
01:06:47,420 --> 01:06:51,020
The structural shifts happening in 2026 aren't temporary, they're directional.

1770
01:06:51,020 --> 01:06:55,740
Microsoft's consumption-first strategy is accelerating across the entire stack.

1771
01:06:55,740 --> 01:07:00,220
The hybrid model, mixing per-user licenses with metered capacity consumption,

1772
01:07:00,220 --> 01:07:03,020
is hardening into the standard rather than the exception.

1773
01:07:03,020 --> 01:07:06,860
Security Copilot, which launched bundled into M365 E5

1774
01:07:06,860 --> 01:07:09,180
with monthly security compute unit allocations,

1775
01:07:09,180 --> 01:07:13,980
is essentially a technical preview of how all Copilot consumption will eventually flow.

1776
01:07:13,980 --> 01:07:17,260
You're looking at a future where Copilot licensing works like cloud infrastructure

1777
01:07:17,260 --> 01:07:18,700
did five years ago.

1778
01:07:18,700 --> 01:07:22,060
Predictable base costs plus variable consumption on top

1779
01:07:22,060 --> 01:07:24,060
with no unified dashboard to track it all.

1780
01:07:24,060 --> 01:07:27,660
Organizations that build token governance infrastructure now,

1781
01:07:27,660 --> 01:07:31,340
the observability systems, the budget controls, the cost per outcome metrics,

1782
01:07:31,340 --> 01:07:34,220
will have a structural cost advantage as consumption billing expands.

1783
01:07:34,220 --> 01:07:35,580
You're not betting on a future model,

1784
01:07:35,580 --> 01:07:37,580
you're preparing for the one that's already here.

1785
01:07:37,580 --> 01:07:39,820
The ones still operating on hope and assumption

1786
01:07:39,820 --> 01:07:42,220
will discover that they need governance infrastructure

1787
01:07:42,220 --> 01:07:45,100
the moment an unexpected invoice arrives and nobody can explain it.

1788
01:07:45,100 --> 01:07:48,460
Three immediate actions compress the urgency.

1789
01:07:48,460 --> 01:07:51,820
First, conduct a full license and usage audit across M365

1790
01:07:51,820 --> 01:07:53,900
and any GitHub Copilot deployments.

1791
01:07:53,900 --> 01:07:56,540
Not aspirational metrics about what you think people use.

1792
01:07:56,540 --> 01:08:00,060
Actual usage data from the admin center and usage reporting APIs.

1793
01:08:00,060 --> 01:08:02,460
Identify which licenses are generating measurable value

1794
01:08:02,460 --> 01:08:03,820
and which are seats in name only.

1795
01:08:03,820 --> 01:08:06,860
This isn't theoretical, this is capital redeployment preparation.
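
Once usage data has been exported, flagging seats in name only is a small filtering step. The sketch below assumes a plain mapping of user to days with Copilot activity in the reporting window. That input shape, the function names, and the sample users are hypothetical, and no real reporting API is called here.

```python
def find_idle_seats(active_days_by_user: dict, min_active_days: int = 1) -> list:
    # A seat counts as idle when its user falls below the activity floor.
    return sorted(user for user, days in active_days_by_user.items()
                  if days < min_active_days)


def idle_share(active_days_by_user: dict, min_active_days: int = 1) -> float:
    # Fraction of licensed seats that are idle; 0.0 for an empty export.
    if not active_days_by_user:
        return 0.0
    idle = find_idle_seats(active_days_by_user, min_active_days)
    return len(idle) / len(active_days_by_user)


# Hypothetical export: days with activity over the last reporting window.
usage = {"ana": 12, "ben": 0, "chi": 3, "dev": 0}
idle = find_idle_seats(usage)
share = idle_share(usage)
```

Raising `min_active_days` tightens the definition from "never used" to "rarely used", which is a judgment call each organization would set for itself.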

1796
01:08:06,860 --> 01:08:10,220
Second, define role-based license templates.

1797
01:08:10,780 --> 01:08:16,220
Explicit mappings that say the sales role gets tier two with x capabilities.

1798
01:08:16,220 --> 01:08:19,340
Frontline staff gets tier three with y constraints.

1799
01:08:19,340 --> 01:08:22,380
Engineering leadership gets tier one with z governance.

1800
01:08:22,380 --> 01:08:23,900
These templates aren't permanent.

1801
01:08:23,900 --> 01:08:25,980
They are the framework that governs initial provisioning

1802
01:08:25,980 --> 01:08:29,260
and guides upgrade decisions as actual usage data accumulates.

1803
01:08:29,260 --> 01:08:33,740
Third, integrate Copilot usage telemetry into your quarterly business reviews.

1804
01:08:33,740 --> 01:08:37,340
Make token consumption and cost per outcome visible to the business owners

1805
01:08:37,340 --> 01:08:38,620
who control the budget.

1806
01:08:38,620 --> 01:08:39,980
Finance sees the line item.

1807
01:08:39,980 --> 01:08:41,660
Engineering sees the adoption curve.

1808
01:08:41,660 --> 01:08:44,220
Sales sees time impact on deal cycles.

1809
01:08:44,220 --> 01:08:46,060
Visibility precedes discipline.

1810
01:08:46,060 --> 01:08:50,780
The 12-month horizon is the bridge between where you are now and where you need to be.

1811
01:08:50,780 --> 01:08:53,020
12 months from today, the conversation shouldn't be

1812
01:08:53,020 --> 01:08:54,540
"We have Copilot licenses,"

1813
01:08:54,540 --> 01:08:58,300
but "We have a governed AI portfolio with measurable ROI per tier,

1814
01:08:58,300 --> 01:09:00,300
recovering 30% of idle spend,

1815
01:09:00,300 --> 01:09:03,180
and reinvesting that capital into high-impact agent development."

1816
01:09:03,180 --> 01:09:04,860
That's not aspirational.

1817
01:09:04,860 --> 01:09:08,300
That's operational reality for the organizations moving at scale right now.

1818
01:09:08,300 --> 01:09:11,740
The competitive gap is widening, but not in the direction most people assume.

1819
01:09:11,740 --> 01:09:14,860
It's not widening between organizations that have AI and those that don't.

1820
01:09:14,860 --> 01:09:16,140
Everyone has AI now.

1821
01:09:16,140 --> 01:09:18,780
The gap is widening between organizations that govern it,

1822
01:09:18,780 --> 01:09:21,420
with explicit tiers, clear metrics, token budgets,

1823
01:09:21,420 --> 01:09:22,860
and cost per outcome discipline,

1824
01:09:22,860 --> 01:09:24,540
and organizations that don't.

1825
01:09:24,540 --> 01:09:27,900
The governed ones see three to four times the return on the same license spend.

1826
01:09:27,900 --> 01:09:30,540
They're recapturing 30% of idle investment.

1827
01:09:30,540 --> 01:09:34,300
They are deploying agents confidently because they understand the cost profile

1828
01:09:34,300 --> 01:09:36,940
and have the controls in place to prevent surprises.

1829
01:09:36,940 --> 01:09:40,060
The ungoverned ones are treating Copilot like an infinite utility,

1830
01:09:40,060 --> 01:09:41,900
wondering why their invoices are climbing,

1831
01:09:41,900 --> 01:09:44,700
and unable to articulate what they're getting in return.

1832
01:09:44,700 --> 01:09:46,220
The Copilot Tax is not inevitable.

1833
01:09:46,220 --> 01:09:47,180
It's a choice.

1834
01:09:47,180 --> 01:09:49,500
Organizations that deploy without a framework,

1835
01:09:49,500 --> 01:09:52,220
that assume broad access will naturally drive adoption,

1836
01:09:52,220 --> 01:09:54,540
that don't measure baselines before deployment,

1837
01:09:54,540 --> 01:09:57,100
that don't tier access by ROI or risk,

1838
01:09:57,100 --> 01:09:59,180
that lack token governance infrastructure,

1839
01:09:59,180 --> 01:10:01,500
those organizations are choosing to bleed cash.

1840
01:10:01,500 --> 01:10:04,860
The organizations that deploy with the framework we've outlined in this episode

1841
01:10:04,860 --> 01:10:06,060
are choosing differently.

1842
01:10:06,060 --> 01:10:08,220
They're extracting return, they're managing risk,

1843
01:10:08,220 --> 01:10:11,820
they're making capital allocation decisions based on evidence rather than hope.

1844
01:10:11,820 --> 01:10:15,020
The work starts now, not after you've built the perfect solution.

1845
01:10:15,020 --> 01:10:18,940
Now. The best time to establish governance was before you deployed Copilot.

1846
01:10:18,940 --> 01:10:20,780
The second best time is today.

1847
01:10:20,780 --> 01:10:23,980
The Copilot Tax isn't just a line item on your Microsoft invoice,

1848
01:10:23,980 --> 01:10:27,660
it's the gap between the potential you bought and the value you're actually capturing.

1849
01:10:27,660 --> 01:10:31,500
It's the 48% verification burden, it's the 60% idle token waste.

1850
01:10:31,500 --> 01:10:35,900
It's the zombie seats sitting in roles that don't need reasoning engines to do mechanical work.

1851
01:10:35,900 --> 01:10:38,780
But as we've seen, this isn't an inevitable cost of doing business,

1852
01:10:38,780 --> 01:10:39,820
it's a governance choice.

1853
01:10:39,820 --> 01:10:42,380
If you build the tiers, audit the baselines,

1854
01:10:42,380 --> 01:10:44,620
and treat tokens like a financial discipline,

1855
01:10:44,620 --> 01:10:47,180
you don't just stop the bleeding, you fund the future.

1856
01:10:47,180 --> 01:10:50,220
If this episode changed how you think about your AI strategy,

1857
01:10:50,220 --> 01:10:53,180
or if it gave you the data you need for your next budget review,

1858
01:10:53,180 --> 01:10:55,260
follow me, Mirko Peters, on LinkedIn.

1859
01:10:55,260 --> 01:10:57,740
I'm always looking for the next structural flaw to deconstruct,

1860
01:10:57,740 --> 01:11:00,780
and your feedback helps me find the topics that matter most right now,

1861
01:11:00,780 --> 01:11:04,540
and if you want more deep dives into the reality of the M365 ecosystem,

1862
01:11:04,540 --> 01:11:06,540
subscribe to the M365.fm podcast.

1863
01:11:06,540 --> 01:11:07,900
Leave a review while you're there.

1864
01:11:07,900 --> 01:11:11,500
It helps more leaders find this framework before their next renewal.

1865
01:11:11,500 --> 01:11:16,300
Share this with your team, especially if you're looking at your 2026 budget right now.

Founder of m365.fm, m365.show and m365con.net

Mirko Peters is a Microsoft 365 expert, content creator, and founder of m365.fm, a platform dedicated to sharing practical insights on modern workplace technologies. His work focuses on Microsoft 365 governance, security, collaboration, and real-world implementation strategies.

Through his podcast and written content, Mirko provides hands-on guidance for IT professionals, architects, and business leaders navigating the complexities of Microsoft 365. He is known for translating complex topics into clear, actionable advice, often highlighting common mistakes and overlooked risks in real-world environments.

With a strong emphasis on community contribution and knowledge sharing, Mirko is actively building a platform that connects experts, shares experiences, and helps organizations get the most out of their Microsoft 365 investments.