May 28, 2026

Breaking the Scale Barrier: Building Multi-Tenant SaaS on Power Pages

M365 FM Podcast

Breaking the Scale Barrier explores what it really takes to build scalable multi-tenant SaaS solutions with Microsoft Power Pages and Dataverse. Instead of focusing on simple customer portals or low-code demos, the episode dives into the architectural decisions that become critical once a platform must support multiple customers, strict security boundaries, enterprise governance, and large-scale growth.

The conversation explains why many Power Platform projects struggle when they move beyond a single environment and how proper tenant isolation, identity management, API strategy, and automation can determine whether a SaaS platform succeeds or fails. It also examines the balance between low-code simplicity and the need for pro-code extensibility when scaling enterprise applications.

Topics include Dataverse design for tenant separation, authentication with Microsoft Entra External ID, governance and ALM practices, performance considerations, API limitations, and patterns for combining Azure services with Power Platform capabilities. The episode highlights both the strengths and the practical limitations of Power Pages in real-world SaaS scenarios.

The key takeaway is that Power Pages can absolutely power enterprise-grade SaaS platforms, but scalability is not automatic. Successful solutions require intentional architecture, operational discipline, and a fusion approach that combines low-code productivity with professional engineering practices.

Scaling a multi-tenant SaaS application can feel overwhelming. You want to keep every tenant’s data secure and isolated while your user base grows. Power Pages can now help you break through these limits. Elastic Tables, which are backed by Azure Cosmos DB, give you the tools to break the scale barrier: fast, reliable performance plus robust security for every tenant.

Key Takeaways

  • Multi-tenant architecture reduces costs and simplifies scaling for SaaS applications.
  • Power Pages uses Elastic Tables and Azure Cosmos DB for high scalability and tenant isolation.
  • Shared infrastructure in multi-tenant models lowers server and storage expenses.
  • Automated tenant management improves efficiency and supports self-service analytics.
  • Continuous monitoring helps maintain high performance and reliability for SaaS platforms.
  • Cost efficiency is achieved through shared infrastructure, auto-scaling, and regular spending reviews.
  • Power Pages supports strong security and compliance with SOC 2 and GDPR standards.

Why Multi-Tenant for SaaS

Single-Tenant vs. Multi-Tenant

When you build a SaaS application, you must choose between single-tenant and multi-tenant models. This choice shapes how you manage resources, security, and growth. In a single-tenant setup, each customer gets their own dedicated environment. In a multi-tenant model, you serve many customers from one shared system. The table below shows the main differences:

Aspect | Single-Tenant | Multi-Tenant
Infrastructure Cost | High - dedicated resources for each customer | Low - shared infrastructure amortizes cost
Security & Isolation | Very high - data fully isolated | High - data is logically isolated, but sharing infrastructure requires strict controls
Customization | Easy - environment can be customized per tenant | Harder - changes affect all tenants, customization must be via configuration
Scalability | Difficult - scaling means more instances | Easier - shared system scales efficiently
Updates & Maintenance | Per customer - updates applied separately | Global - updates deploy to all tenants at once
Typical Use Cases | Best for regulated apps needing isolation | Best for broad market SaaS targeting many customers

You can see that single-tenant models offer strong isolation and easy customization. However, they come with high costs and make scaling difficult. Multi-tenant models let you support many tenants on one platform. This approach lowers costs and makes it easier to grow your SaaS business.

Benefits for SaaS Scalability

You want your SaaS solution to reach more customers without running into limits. Multi-tenant architecture helps you do this. Here are some key benefits:

  • You can serve many tenants with one application. This reduces extra work and keeps your system simple.
  • Shared infrastructure means you spend less on servers and storage. You get economies of scale as your SaaS grows.
  • You only need to update and maintain one version of your application. All tenants get new features and fixes at the same time.
  • You can add new tenants without buying more hardware. This makes your costs predictable and helps you plan for growth.

Tip: Companies that use multi-tenant models often cut infrastructure costs by up to 50% compared to single-tenant setups. As you add more tenants, the cost for each one goes down.

Multi-tenant SaaS platforms give you the flexibility to expand quickly. You can focus on building features and serving your tenants, not on managing separate systems. This model supports scalability and helps you deliver value to more customers with less effort.

Multi-Tenant Architecture in Power Pages

When you build a multi-tenant SaaS solution on Power Pages, you need to choose the right architecture. The way you design your system affects how you manage tenants, keep data secure, and support growth. Power Pages gives you several options to create a flexible and secure multi-tenant architecture.

Core Architecture Overview

Power Pages uses Microsoft Dataverse as its main data source. When you add pages to your website, Power Pages stores the page definitions in Dataverse. You can display data from Dataverse on your pages and collect user data directly into Dataverse tables. This setup makes it easy to manage data for many tenants in one place.

Elastic Tables take this further. They use Azure Cosmos DB to store data in a way that supports high scalability. Elastic Tables use partition-aware storage, which means they split data into logical groups. Each group can handle its own workload. This helps you avoid performance problems when many tenants use your SaaS at the same time.

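Partition-aware storage also helps with tenant isolation. When you use the tenant identifier as the partition key, each tenant’s data stays in its own logical partition, so you can keep data separate. Partitioning controls where data is placed, so you still need access controls to decide who can read each partition. Combined, the two give you fast performance and strong security, even as your number of tenants grows.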

Note: Partitioning and Elastic Tables let you scale your multi-tenant architecture without slowing down your application.
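
To make the idea of logical partitions concrete, here is a minimal in-memory sketch. It is not the Elastic Tables API: the `PartitionedStore` class and its method names are hypothetical, and it assumes the tenant identifier is used as the partition key.

```python
from collections import defaultdict


class PartitionedStore:
    """Toy in-memory store that groups rows into logical partitions (hypothetical)."""

    def __init__(self):
        # partition key -> {row id -> row}
        self._partitions = defaultdict(dict)

    def upsert(self, partition_id, row_id, row):
        self._partitions[partition_id][row_id] = dict(row)

    def get(self, partition_id, row_id):
        # A point read touches exactly one partition.
        return self._partitions.get(partition_id, {}).get(row_id)

    def query(self, partition_id, predicate=lambda row: True):
        # Scans only the requested partition, never another tenant's rows.
        rows = self._partitions.get(partition_id, {}).values()
        return [row for row in rows if predicate(row)]


store = PartitionedStore()
store.upsert("tenant-a", "1", {"event": "login"})
store.upsert("tenant-a", "2", {"event": "export"})
store.upsert("tenant-b", "1", {"event": "login"})

tenant_a_rows = store.query("tenant-a")
```

Because every read names a partition, a heavy scan for one tenant never walks another tenant’s rows. That is the property the paragraph above describes.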

Tenant Isolation Models

You have several ways to isolate tenants in Power Pages. Each model has its own strengths and trade-offs. Your choice depends on your business needs, the number of tenants, and your goals for scalability.

Shared Database with Tenant ID

In this model, all tenants share the same database. You use a Tenant ID field in each table to keep data separate. This approach is simple and cost-effective. You can manage many tenants with one set of resources.

  • Advantages: Low cost, easy to manage, efficient use of resources.
  • Disadvantages: You must use strict controls in your code to prevent data leaks between tenants.
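
One way to apply those strict controls is to make it impossible to read the shared table without a tenant filter. The sketch below is an illustration under that assumption; `Row` and `TenantScopedRepository` are hypothetical names, not Power Pages or Dataverse types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    tenant_id: str
    record_id: str
    payload: str


class TenantScopedRepository:
    """Wraps a shared table so every read is filtered by tenant ID (hypothetical)."""

    def __init__(self, rows, tenant_id):
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._rows = rows
        self._tenant_id = tenant_id

    def list_rows(self):
        # The tenant filter lives here, so callers cannot forget it.
        return [row for row in self._rows if row.tenant_id == self._tenant_id]

    def get(self, record_id):
        for row in self.list_rows():
            if row.record_id == record_id:
                return row
        return None


shared_table = [
    Row("contoso", "1", "invoice"),
    Row("fabrikam", "2", "invoice"),
]
repo = TenantScopedRepository(shared_table, "contoso")
visible_ids = [row.record_id for row in repo.list_rows()]
```

Application code receives only the scoped repository, so a lookup for another tenant’s record ID returns nothing instead of leaking data.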

Separate Databases per Tenant

Here, each tenant gets its own database. This model gives you the highest level of data isolation. You do not have to worry about one tenant’s data mixing with another’s.

  • Advantages: Maximum data isolation, solves the noisy neighbor problem, strong security.
  • Disadvantages: Higher cost, more complex to manage, may waste resources for small tenants.

Hybrid Approaches

Hybrid models combine the best parts of both shared and separate database models. For example, you might use a shared database for most tenants and give large tenants their own database. This lets you balance cost, scalability, and security.

  • Advantages: Flexible, can meet different customer needs, supports growth.
  • Disadvantages: More complex to set up and manage, needs careful planning.
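
A hybrid model needs a routing step that decides which store a tenant uses. A minimal sketch, assuming a simple lookup of tenants that have been given dedicated storage (the store names and the `resolve_store` helper are hypothetical):

```python
SHARED_STORE = "shared-pool"


def resolve_store(tenant_id, dedicated_tenants):
    """Return the dedicated store for a listed tenant, else the shared pool."""
    return dedicated_tenants.get(tenant_id, SHARED_STORE)


# Hypothetical mapping: only large tenants get their own database.
dedicated = {"bigcorp": "dedicated-bigcorp"}

big = resolve_store("bigcorp", dedicated)
small = resolve_store("smallshop", dedicated)
```

Keeping the decision in one function means you can move a growing tenant to dedicated storage by changing the mapping, not the application code.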

Here is a table that compares these tenant isolation models:

Model Type | Advantages | Disadvantages
Shared-Everything Model | Low cost, simple, efficient resource use | Higher risk of data leaks, needs strict code controls
Schema-per-Tenant Model | Better isolation, allows tenant customizations | More complex, needs programmatic schema changes
Database-per-Tenant Model | Maximum isolation, no noisy neighbor issues | Highest cost, complex, may waste resources for small tenants
Hybrid and Tiered Strategies | Flexible, fits many customer types | Complex to manage, needs careful oversight

You can also use the Deployment Stamps pattern. This pattern creates a dedicated set of resources for each tenant or group of tenants. It gives you high isolation and supports continued scalability. You can automate deployment using tools like Bicep files or Terraform templates. This approach works well if you need strong isolation for some tenants and shared resources for others.

When you design your multi-tenant architecture in Power Pages, think about your goals for scalability, security, and cost. Power Pages, Dataverse, and Elastic Tables give you the tools to build a SaaS platform that grows with your business and keeps your tenants’ data safe.

Scalability with Elastic Tables

Cosmos DB Integration

You want your SaaS platform to grow without limits. Cosmos DB integration gives you the power to scale your multi-tenant applications on Power Pages. Cosmos DB supports horizontal scalability, which means you can add more resources as your number of tenants increases. You do not need to worry about manual scaling or downtime. Cosmos DB handles automatic partitioning, so your data and throughput scale smoothly. You get low latency access, which keeps your application responsive for every tenant. High availability ensures your SaaS stays online, even across multiple cloud regions.

Feature | Description
Horizontal Scalability | Enables easy and cost-effective scaling of resources as demand increases.
Automatic Partitioning | Allows for scaling data and throughput without manual intervention, ensuring continuous availability.
Low Latency Access | Provides ultra-low latency access, crucial for responsive applications.
High Availability | Guarantees up to 99.999% availability across multiple regions, essential for global SaaS applications.

Tip: Cosmos DB integration lets you serve tenants worldwide with reliable performance and minimal downtime.

Partition-Aware Storage

Partition-aware storage is the secret to predictable performance in multi-tenant SaaS solutions. You use partitions to keep each tenant’s data separate. This prevents the noisy-neighbor effect, where one tenant’s heavy workload slows down others. You can choose different models for partitioning. The pool model uses a shared database with a tenant_id column. You set query limits and resource groups to control performance. The silo model gives each tenant a dedicated database instance. This provides maximum isolation and helps avoid noisy neighbor issues.

Model Type | Description | Noisy Neighbor Mitigation
Pool Model | Shared database with a tenant_id column. | Requires query limits and database-level resource groups.
Silo Model | Dedicated database instance for each tenant. | Provides maximum isolation, but can still face hardware-level issues.
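
In the pool model, a per-tenant query limit is one way to contain a noisy neighbor. The following sketch uses a token bucket per tenant. It is an illustration only: the class names are hypothetical, the capacity and refill values are arbitrary, and the caller passes the current time so the behavior is deterministic.

```python
class TokenBucket:
    """Allows short bursts up to `capacity`, then `refill_per_second` requests."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now):
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TenantLimiter:
    """Keeps one bucket per tenant so tenants cannot spend each other's budget."""

    def __init__(self, capacity, refill_per_second):
        self._capacity = capacity
        self._refill = refill_per_second
        self._buckets = {}

    def allow(self, tenant_id, now):
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            bucket = TokenBucket(self._capacity, self._refill, now)
            self._buckets[tenant_id] = bucket
        return bucket.allow(now)


limiter = TenantLimiter(capacity=2, refill_per_second=1.0)
burst = [limiter.allow("tenant-a", now=0.0) for _ in range(3)]
other_tenant = limiter.allow("tenant-b", now=0.0)
after_wait = limiter.allow("tenant-a", now=1.0)
```

Here tenant-a exhausts its own budget on the third request, while tenant-b is unaffected and tenant-a recovers one second later.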

You can follow best practices to configure partition-aware storage:

  • Implement tenant-based resource isolation. This keeps performance steady for all tenants.
  • Automate tenant lifecycle management. You streamline provisioning, configuration, monitoring, and deprovisioning as your tenant base grows.
  • Regularly test and monitor your application. You find and fix bottlenecks before they affect performance.
  • Incorporate observability and governance features. Per-tenant monitoring and centralized logging help you maintain control and compliance.

Note: Partition-aware storage gives you the tools to balance scalability, security, and performance in your cloud SaaS platform.

Handling High-Volume Workloads

Your SaaS application must handle high-volume workloads as tenants grow. Auto-scaling adjusts resources to demand, but it needs guardrails. Set upper limits so costs stay bounded, raise an alert when you reach maximum capacity, and know the scaling limits and increments of each service you depend on. Trigger scaling from load metrics that actually reflect demand, and keep some unused capacity as a buffer for unexpected spikes. Set thresholds far enough apart to prevent flapping, where continuous scale-out and scale-in actions destabilize performance. Deployment stamps let you scale by adding whole units of infrastructure.

  • Use services with auto-scaling. Resources adjust automatically as demand changes.
  • Constrain auto-scaling. Set limits and alerts to control costs.
  • Understand service scaling boundaries. Know the limits and increments for each cloud service.
  • Use meaningful load metrics. Trigger scaling actions with accurate data.
  • Use a buffer. Keep extra capacity for sudden spikes in workload.
  • Prevent flapping. Set thresholds to avoid unstable scaling.
  • Use deployment stamps. Scale workloads with patterns that support growth.
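
Several of these practices can be combined in a single scaling rule. The sketch below is an assumption-laden illustration, not the behavior of any specific Azure service: `ScalePolicy`, `decide`, and every threshold value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScalePolicy:
    min_instances: int = 2            # floor doubles as a buffer for sudden spikes
    max_instances: int = 10           # ceiling constrains cost
    scale_out_above: float = 0.75     # average utilization, 0..1
    scale_in_below: float = 0.40      # gap between thresholds prevents flapping
    cooldown_seconds: float = 300.0   # minimum time between scaling actions


def decide(policy, current, utilization, seconds_since_last_change):
    """Return the target instance count for one evaluation of the policy."""
    if seconds_since_last_change < policy.cooldown_seconds:
        return current  # still cooling down: hold steady
    if utilization > policy.scale_out_above:
        return min(policy.max_instances, current + 1)
    if utilization < policy.scale_in_below:
        return max(policy.min_instances, current - 1)
    return current  # inside the dead band between the two thresholds


policy = ScalePolicy()
scale_out = decide(policy, current=4, utilization=0.90, seconds_since_last_change=600)
at_ceiling = decide(policy, current=10, utilization=0.90, seconds_since_last_change=600)
dead_band = decide(policy, current=4, utilization=0.50, seconds_since_last_change=600)
cooling = decide(policy, current=4, utilization=0.10, seconds_since_last_change=60)
```

The dead band and the cooldown are what stop the system from scaling out and in repeatedly when load hovers near a single threshold.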

Callout: Auto-scaling and load balancing help you maintain high performance for every tenant, even during peak usage.

Elastic Tables and Cosmos DB give you the foundation for scalable, high-performance multi-tenant SaaS applications. You can grow your tenant base, handle large workloads, and deliver fast, reliable experiences in the cloud.

Security and Tenant Isolation

You must make security your top priority when building scalable SaaS solutions. Power Pages gives you the tools to protect tenants and their data. You can meet compliance standards like SOC 2 and GDPR. You can also create strong tenant isolation, which keeps each tenant’s information safe in a multi-tenant environment. Let’s look at the main strategies for breaking the scale barrier while keeping security strong.

Authentication Strategies

You need robust authentication to control access for all tenants. Power Pages supports several methods to help you build tenant-aware security.

  • Use enterprise connections for authentication. This lets your customers sign in with their own federated Identity Provider (IdP) and enjoy Single Sign-On (SSO).
  • Set up tenant-specific authentication and authorization systems. This ensures privacy and helps you meet legal requirements.
  • Build virtual walls between tenants. These walls keep each tenant’s space secure and separate from others.

Azure AD B2C

Azure AD B2C works well for multi-tenant SaaS platforms. You can let tenants use their own credentials or connect with social accounts. This approach gives you flexibility and strong security. You can manage user flows, policies, and branding for each tenant. Azure AD B2C also supports SSO, which makes the login process smooth for users. Microsoft Entra External ID, the service named in the episode summary, is Microsoft’s newer offering for the same customer-identity scenarios.

Custom Identity Providers

You may want to use custom identity providers for some tenants. Power Pages lets you integrate with many IdPs. You can connect to SAML, OpenID Connect, or OAuth2 providers. This flexibility helps you support unique authentication needs. You can also enforce tenant-specific security policies.

Authorization and Role Management

Authorization controls what users can do after authentication. You can set up roles and permissions for each tenant. In Power Pages, web roles and table permissions determine which pages and records a signed-in user can reach, so you can define roles at the tenant level. You can give users access to only the data and features they need. This approach supports strong tenant isolation and helps you meet compliance goals. You can update roles as tenants grow or change.

Data Isolation Techniques

You must keep tenant data separate to maintain security. Power Pages uses Dataverse business units to help you do this. Business units create a security hierarchy. You can grant privileges at the business unit level. Each user belongs to one business unit. You can control access to include or exclude data from child business units. This structure supports a strong tenant isolation model and helps you enforce tenant-aware security.

Partition-aware routing adds another layer of protection. It directs queries to the right data partitions for each tenant. This keeps data isolated and improves performance. You can trust that tenants only access their own data, even as you scale your cloud infrastructure.

Tip: Use business units and partition-aware routing together to break the scale barrier and build scalable SaaS applications with strong security.

You can rely on Power Pages to deliver security, tenant isolation, and compliance. You can focus on growth, knowing your tenants stay protected.

Breaking the Scale Barrier: Best Practices

Scale-First Design

You should always start your SaaS journey with scale in mind. A scale-first approach helps you avoid costly redesigns later. You can use microservices to break your application into smaller parts. Each part can scale on its own. This modular design gives you flexibility and makes updates easier. Load balancing spreads traffic across servers. This keeps your system responsive and prevents overload. Automated scaling adjusts resources as demand changes. You do not need to worry about sudden spikes in usage. Multi-cloud strategies let you use more than one cloud provider. This increases reliability and gives you more options for embedded analytics and visualization. You can also focus on data security and compliance. Use encryption and access controls to protect your data. Regular audits help you meet regulations. These best practices set a strong foundation for your multi-tenant SaaS platform.

  • Use microservices for modular design and flexibility.
  • Apply load balancing for steady performance.
  • Enable automated scaling for cost efficiency.
  • Adopt multi-cloud strategies for reliability.
  • Prioritize data security and compliance.

Tip: A scale-first design supports embedded analytics, flexible embedding, and self-service analytics as your user base grows.

Automated Tenant Management

You can save time and reduce errors with automated tenant management. Power Pages lets you automate key steps for each new tenant. This process improves operational efficiency and supports best practices for embedded analytics and visualization. Here is a simple workflow you can follow:

Step | Description
1 | Create a new Business Unit for the tenant using the Dataverse Web API or Power Automate.
2 | Create a Team for the tenant’s Business Unit.
3 | Assign your application’s Security Roles to the Team.
4 | Add the customer’s users to the Team with bulk import or a self-service interface.
5 | Create seed data, such as default configuration records.
6 | Verify that licenses have been assigned to users if using ISV App License Management.
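
The six steps above can be sketched as one ordered routine. This is a minimal illustration, not Dataverse Web API code: `FakeDirectory` is a hypothetical in-memory stand-in that only records each step, and in a real system each call would be a Web API request or a Power Automate action.

```python
class FakeDirectory:
    """Hypothetical stand-in for the platform calls; it only records each step."""

    def __init__(self):
        self.log = []

    def record(self, step, **details):
        self.log.append((step, details))
        # Return a readable fake identifier for the created item.
        return f"{step}:{details.get('name', len(self.log))}"


def provision_tenant(directory, tenant_name, role_names, user_emails, seed_records):
    """Run the onboarding steps in order and return the new identifiers."""
    bu_id = directory.record("create_business_unit", name=tenant_name)
    team_id = directory.record(
        "create_team", name=f"{tenant_name} Team", business_unit=bu_id
    )
    for role in role_names:
        directory.record("assign_role", team=team_id, role=role)
    for email in user_emails:
        directory.record("add_user", team=team_id, email=email)
    for data in seed_records:
        directory.record("create_seed_record", business_unit=bu_id, data=data)
    directory.record("verify_licenses", users=list(user_emails))
    return bu_id, team_id


directory = FakeDirectory()
bu_id, team_id = provision_tenant(
    directory,
    tenant_name="Contoso",
    role_names=["App User"],
    user_emails=["a@contoso.example", "b@contoso.example"],
    seed_records=[{"setting": "default"}],
)
steps = [step for step, _ in directory.log]
```

Keeping the whole sequence in one function makes onboarding repeatable and gives you a single place to add retries or rollback later.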

You can use self-service tools to let tenants manage their own users and settings. This increases flexibility and reduces your workload. Automated tenant management also supports embedded analytics, self-service analytics, and flexible embedding. You can deliver a seamless analytics experience for every tenant.

Continuous Monitoring

You need to monitor your SaaS platform to keep it healthy. Continuous monitoring helps you spot problems before they affect users. You can use synthetic monitoring to test performance by simulating user actions. Real user monitoring tracks actual user interactions. API monitoring checks your endpoints for performance issues. Infrastructure monitoring keeps an eye on your servers and databases. These best practices help you maintain high performance and reliability for embedded analytics and visualization.

Monitoring Category | Description
Synthetic monitoring | Proactively tests application performance by simulating user actions.
Real user monitoring | Tracks actual user interactions for real-time system health insights.
API monitoring | Monitors API endpoints to identify performance issues affecting user experience.
Infrastructure monitoring | Tracks metrics of the underlying infrastructure for optimal performance.
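
A synthetic check is, at its core, a scripted action with a time budget. The sketch below shows that shape with a hypothetical `run_probe` helper. The scripted action and the clock are passed in, so the example does not depend on any particular monitoring product or network call.

```python
import time


def run_probe(name, action, max_seconds, clock=time.perf_counter):
    """Run one scripted check and report whether it passed within its budget."""
    start = clock()
    try:
        action()
        error = None
    except Exception as exc:  # a probe should report failures, not crash
        error = str(exc)
    elapsed = clock() - start
    return {
        "probe": name,
        "ok": error is None and elapsed <= max_seconds,
        "seconds": elapsed,
        "error": error,
    }


def failing_action():
    raise RuntimeError("timeout")


# Fake clocks make the example deterministic: each call returns the next tick.
ticks_ok = iter([0.0, 0.2])
passed = run_probe("home page", lambda: None, 1.0, clock=lambda: next(ticks_ok))

ticks_fail = iter([0.0, 0.1])
failed = run_probe("sign-in", failing_action, 1.0, clock=lambda: next(ticks_fail))
```

In practice the action would load a page or call an endpoint on a schedule, and the result dictionaries would feed your alerting.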

You can use tools like Dotcom-Monitor to track query response times and database uptime. UserView Monitoring analyzes page load speed and transaction success rates. LoadView Load Testing validates performance under real-world traffic. These tools support performance tuning, embedded analytics, and self-service analytics. You can deliver a reliable analytics experience and flexible embedding for all tenants.

Note: Continuous monitoring and performance tuning are essential best practices for delivering high-quality embedded analytics and data visualization in your SaaS platform.

Cost Efficiency

You want your multi-tenant platform to deliver value while keeping expenses under control. Cost efficiency starts with how you design and manage your resources. Power Pages helps you achieve this by letting you share infrastructure across tenants. This approach reduces both upfront and ongoing costs. You avoid the high expense of setting up separate environments for each customer. Instead, you use one system that serves many, making your investment go further.

To keep costs low, you should size your compute and storage carefully. Avoid paying for unused capacity by matching resources to actual demand. Auto-scaling is a powerful tool here. It lets your system add or remove resources automatically as usage changes. You only pay for what you need, which keeps idle costs down. Set sensible limits so that no single tenant can use more than their fair share. This protects your budget and ensures every tenant gets reliable service.

Tip: Regularly review your cloud spending. Look for unused resources or services you no longer need. Removing these can save you money each month.

Tracking tenant usage is another key to cost efficiency. When you know how much each tenant uses, you can allocate costs fairly. This is especially important if usage varies a lot between tenants. Transparent billing builds trust. Show tenants clear reports that break down their charges. When tenants understand their bills, they are less likely to be surprised and more likely to stay with your service.
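
Fair allocation can be as simple as splitting the shared bill in proportion to metered usage. A minimal sketch, assuming one usage number per tenant (the `allocate_cost` helper, the tenant names, and the amounts are illustrative):

```python
def allocate_cost(total_cost, usage_by_tenant):
    """Split a shared bill in proportion to each tenant's metered usage."""
    total_usage = sum(usage_by_tenant.values())
    if total_usage <= 0:
        raise ValueError("no usage recorded")
    return {
        tenant: round(total_cost * usage / total_usage, 2)
        for tenant, usage in usage_by_tenant.items()
    }


bill = allocate_cost(1000.0, {"contoso": 600, "fabrikam": 300, "tailspin": 100})
```

Rounding each share to two decimals can leave a remainder of a cent or so on uneven splits, so a production version would need a rule for assigning it.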

Here are some practical steps to optimize cost efficiency in your SaaS platform:

  • Use shared infrastructure to lower costs.
  • Right-size your compute and storage.
  • Enable auto-scaling to match resources to demand.
  • Monitor tenant usage and allocate costs fairly.
  • Set usage limits to prevent overuse.
  • Keep your databases and storage clean and efficient.
  • Review your spending regularly to find and remove waste.

Strategy | Benefit
Shared infrastructure | Reduces setup and maintenance costs
Auto-scaling | Cuts idle costs, matches demand
Usage tracking | Enables fair billing and cost allocation
Regular cost reviews | Identifies and removes waste

You can build a SaaS solution that grows with your business and keeps costs predictable. By following these best practices, you make sure your platform remains both scalable and affordable.

Real-World SaaS Examples

Success Stories with Power Pages

You can find many organizations that have built successful multi-tenant platforms using Power Pages. For example, a global logistics company wanted to serve hundreds of business clients through a single portal. They used Power Pages to create a secure, scalable solution. Each client accessed their own data and tools, while the company managed everything from one place. The team used Elastic Tables to handle large amounts of shipment data. As a result, they saw faster data entry and reporting, even as their customer base grew.

Another example comes from the healthcare industry. A provider needed to support many clinics with different needs. They chose Power Pages because it allowed them to customize features for each clinic without building separate systems. The platform’s tenant isolation features kept patient data secure. The provider also used partition-aware storage to prevent performance issues when clinics uploaded large files at the same time. This approach helped them meet strict privacy rules and scale their service to new locations.

Note: These stories show that you can use Power Pages to build flexible, secure, and high-performing SaaS platforms for many industries.

Lessons Learned

When you build a multi-tenant solution, you will face some common challenges. You can learn from others who have already solved these problems. The table below shows three frequent issues and how teams addressed them:

Challenge | Problem | Solution
Data Mapping Complexity | D365 F&SCM has complex data relationships | Start with essential entities only, expand gradually
Performance Concerns | Worried about ERP performance impact | Dual-write syncs ERP data into Dataverse in near real time, so the portal does not query the ERP directly
Customization Requirements | Unique business processes need custom features | Power Pages supports custom development alongside low-code

You may find data mapping difficult, especially if you work with systems like Dynamics 365 Finance & Supply Chain Management. Start with the most important data first. Add more as you learn what your tenants need. Performance can also be a concern, because you do not want portal traffic to slow down the ERP. Dual-write synchronizes data between finance and operations apps and Dataverse in near real time, so Power Pages can read from Dataverse instead of querying the ERP directly. This keeps your platform running smoothly. Customization is another challenge. Each tenant may want different features. Power Pages lets you add custom code or use low-code tools, so you can meet unique needs without rebuilding your whole system.

Tip: Focus on simple solutions first. Expand your features as your SaaS platform grows. This approach helps you avoid common pitfalls and deliver value to your tenants quickly.


You can build a secure, scalable multi-tenant SaaS platform on Power Pages by following a few key strategies:

  • Design for multi-tenancy from the start to support future growth.
  • Optimize performance based on real usage, not guesswork.
  • Monitor resource use to control costs as you scale.
  • Use strong data isolation and access controls to protect tenant data.
  • Support both horizontal and vertical scaling for resilience.

Elastic Tables and Cosmos DB give you the tools to handle growth and keep performance high. As Power Pages evolves, you will see even more ways to deliver modern SaaS solutions.

FAQ

How does Power Pages ensure tenant data isolation?

Power Pages uses Dataverse business units and partition-aware storage. You assign each tenant to a business unit. Partition-aware routing keeps data separate. This approach protects tenant data and supports compliance.

Can I customize features for each tenant?

Yes, you can use JSON-based schema flexibility. You add tenant-specific fields or configurations without changing the core schema. This lets you deliver custom experiences for each tenant.
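
As an illustration of that idea, the sketch below overlays a tenant’s JSON settings on shared defaults. The default values, the `merge` helper, and the setting names are all hypothetical. The only assumption is that each tenant’s overrides are stored as a JSON document.

```python
import json

# Hypothetical shared defaults that every tenant starts from.
DEFAULTS = {"theme": "light", "features": {"export": False, "reports": True}}


def merge(base, override):
    """Recursively overlay tenant overrides on the shared defaults."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def tenant_settings(tenant_json):
    """Parse a tenant's JSON overrides and return the effective settings."""
    return merge(DEFAULTS, json.loads(tenant_json))


settings = tenant_settings('{"features": {"export": true}, "logo": "contoso.png"}')
```

The core schema stays the same for everyone. Only the JSON document differs per tenant, and keys a tenant does not set fall back to the defaults.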

What authentication options do I have for tenants?

You can use Azure AD B2C or Microsoft Entra External ID for customer-facing authentication. You can also connect to custom identity providers using SAML, OpenID Connect, or OAuth2. This gives you flexibility for different tenant needs.

How do Elastic Tables help with scaling?

Elastic Tables use Azure Cosmos DB for horizontal scaling. You store data in logical partitions. This setup handles high-volume workloads and keeps performance steady as your tenant base grows.

Is Power Pages compliant with industry standards?

Power Pages supports compliance with SOC 2 and GDPR. You use built-in security features and audit logs to meet regulatory requirements.

How can I monitor tenant activity and system health?

You set up continuous monitoring with tools like synthetic monitoring, real user monitoring, and API checks. These tools help you track performance, spot issues, and keep your SaaS platform reliable.

What strategies help control costs in a multi-tenant setup?

  • Use shared infrastructure to lower expenses.
  • Enable auto-scaling to match resources to demand.
  • Track tenant usage for fair billing.
  • Review spending regularly to remove waste.


1
00:00:00,000 --> 00:00:02,060
Most developers who've worked with Power Pages

2
00:00:02,060 --> 00:00:03,540
have the same mental model.

3
00:00:03,540 --> 00:00:04,460
It's a portal builder.

4
00:00:04,460 --> 00:00:06,600
You drop in some forms, wire up a few lists,

5
00:00:06,600 --> 00:00:08,660
configure authentication and ship.

6
00:00:08,660 --> 00:00:09,380
It's useful.

7
00:00:09,380 --> 00:00:11,740
It's fast to build, and it handles the basics well.

8
00:00:11,740 --> 00:00:14,180
But enterprise-grade SaaS at real scale?

9
00:00:14,180 --> 00:00:16,260
Multi-tenant workloads across hundreds of customers,

10
00:00:16,260 --> 00:00:18,580
millions of records, audit-ready data isolation.

11
00:00:18,580 --> 00:00:21,300
That's where you reach for custom Azure infrastructure.

12
00:00:21,300 --> 00:00:22,460
That's the assumption.

13
00:00:22,460 --> 00:00:24,500
A small group of architects is quietly proving

14
00:00:24,500 --> 00:00:25,660
that assumption wrong.

15
00:00:25,660 --> 00:00:27,500
They're building global multi-tenant SaaS

16
00:00:27,500 --> 00:00:29,940
directly on Power Pages, not as a prototype,

17
00:00:29,940 --> 00:00:32,100
not as a proof of concept, but in production,

18
00:00:32,100 --> 00:00:35,060
with paying customers and enterprise security reviews.

19
00:00:35,060 --> 00:00:38,180
The engine making this possible isn't a workaround or a hack.

20
00:00:38,180 --> 00:00:40,420
It's a table type that's been sitting inside Dataverse

21
00:00:40,420 --> 00:00:43,560
for the past two years, backed by Azure Cosmos DB,

22
00:00:43,560 --> 00:00:45,880
designed specifically for the kind of scale

23
00:00:45,880 --> 00:00:48,100
that breaks everything else in the platform.

24
00:00:48,100 --> 00:00:50,180
Over the next 100 minutes, you'll understand exactly

25
00:00:50,180 --> 00:00:52,500
how this architecture works, where it breaks,

26
00:00:52,500 --> 00:00:53,860
and how to build it right.

27
00:00:53,860 --> 00:00:55,780
If that's the kind of deep technical strategy

28
00:00:55,780 --> 00:00:59,860
you want more of, subscribe to the M365 FM podcast.

29
00:00:59,860 --> 00:01:01,900
This is what we do every episode.

30
00:01:01,900 --> 00:01:04,460
Why standard Dataverse tables break at scale.

31
00:01:04,460 --> 00:01:05,700
Before we get to the solution,

32
00:01:05,700 --> 00:01:07,660
we need to be precise about the failure mode.

33
00:01:07,660 --> 00:01:09,540
Because "standard Dataverse tables don't scale"

34
00:01:09,540 --> 00:01:11,100
is too vague to be useful.

35
00:01:11,100 --> 00:01:12,940
They scale fine for a lot of things.

36
00:01:12,940 --> 00:01:14,420
The problem is specific.

37
00:01:14,420 --> 00:01:17,260
Standard Dataverse tables are backed by Azure SQL.

38
00:01:17,260 --> 00:01:18,620
That's not a weakness.

39
00:01:18,620 --> 00:01:19,900
It's a deliberate design choice

40
00:01:19,900 --> 00:01:22,140
for the workload Dataverse was built around.

41
00:01:22,140 --> 00:01:24,740
Transactional, relational, CRM-style data.

42
00:01:24,740 --> 00:01:27,100
Accounts, contacts, opportunities, cases,

43
00:01:27,100 --> 00:01:28,820
moderate volumes, complex relationships,

44
00:01:28,820 --> 00:01:30,900
rich business logic, model-driven UI.

45
00:01:30,900 --> 00:01:32,700
Azure SQL handles that extremely well.

46
00:01:32,700 --> 00:01:34,500
The ceiling becomes visible when you push it

47
00:01:34,500 --> 00:01:35,820
towards something different.

48
00:01:35,820 --> 00:01:37,260
Start ingesting telemetry.

49
00:01:37,260 --> 00:01:39,940
Start storing activity logs per user per session.

50
00:01:39,940 --> 00:01:42,100
Start building an audit trail that writes a record

51
00:01:42,100 --> 00:01:44,820
for every meaningful action across hundreds of tenants.

52
00:01:44,820 --> 00:01:46,620
The retrieval operations slow down.

53
00:01:46,620 --> 00:01:48,900
Bulk writes, creating or updating thousands

54
00:01:48,900 --> 00:01:50,580
of records in a single operation,

55
00:01:50,580 --> 00:01:53,060
don't distribute across any horizontal infrastructure

56
00:01:53,060 --> 00:01:54,940
because there is no horizontal infrastructure.

57
00:01:54,940 --> 00:01:57,660
It's a single SQL instance scaling up, not out.

58
00:01:57,660 --> 00:01:59,660
Now add multi-tenancy.

59
00:01:59,660 --> 00:02:02,340
Hundreds of tenants sharing the same relational store

60
00:02:02,340 --> 00:02:05,580
creates what infrastructure teams call noisy neighbor effects.

61
00:02:05,580 --> 00:02:07,340
One tenant runs a heavy report.

62
00:02:07,340 --> 00:02:09,180
Another tenant's users start experiencing

63
00:02:09,180 --> 00:02:10,660
degraded response times.

64
00:02:10,660 --> 00:02:13,460
The workloads compete for the same pool of SQL resources

65
00:02:13,460 --> 00:02:15,220
and there's no partition-aware mechanism

66
00:02:15,220 --> 00:02:17,300
to isolate them at the storage layer.

67
00:02:17,300 --> 00:02:19,300
You can implement row-level security

68
00:02:19,300 --> 00:02:20,900
at the Dataverse layer.

69
00:02:20,900 --> 00:02:22,700
And you should, but that doesn't change

70
00:02:22,700 --> 00:02:24,900
the underlying contention at the database level.

71
00:02:24,900 --> 00:02:26,300
Here's where it gets expensive.

72
00:02:26,300 --> 00:02:28,860
Dataverse storage isn't priced like Azure Blob Storage

73
00:02:28,860 --> 00:02:31,380
or even like a conventional managed database.

74
00:02:31,380 --> 00:02:35,100
Database capacity costs approximately $40 per gigabyte per month,

75
00:02:35,100 --> 00:02:36,860
not per terabyte, per gigabyte.

76
00:02:36,860 --> 00:02:38,780
If your SaaS platform accumulates a terabyte

77
00:02:38,780 --> 00:02:40,740
of data in standard Dataverse tables,

78
00:02:40,740 --> 00:02:41,900
which is not an aggressive number

79
00:02:41,900 --> 00:02:44,700
for a multi-tenant platform storing operational history,

80
00:02:44,700 --> 00:02:47,060
activity logs, and audit events,

81
00:02:47,060 --> 00:02:51,860
that's over $40,000 per month in add-on storage capacity alone.

82
00:02:51,860 --> 00:02:54,900
Before compute, before licenses, before anything else.

83
00:02:54,900 --> 00:02:57,100
That number tends to stop conversations quickly.
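
The arithmetic behind that figure can be sketched in a few lines. This is only a back-of-the-envelope check, assuming the approximate $40 per gigabyte per month quoted in the episode and treating a terabyte as 1,024 gigabytes; the function name is illustrative, and actual pricing depends on your licensing agreement.

```python
# Back-of-the-envelope Dataverse database capacity cost.
# PRICE_PER_GB_MONTH is the approximate figure quoted in the episode,
# not an official price list value.
PRICE_PER_GB_MONTH = 40.0


def monthly_storage_cost(gigabytes: float,
                         price_per_gb: float = PRICE_PER_GB_MONTH) -> float:
    """Return the monthly add-on capacity cost for a given data volume."""
    return gigabytes * price_per_gb


ONE_TERABYTE_GB = 1024  # assumption: binary terabyte
cost = monthly_storage_cost(ONE_TERABYTE_GB)
print(f"1 TB in standard tables: ${cost:,.0f} per month")
```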

84
00:02:57,100 --> 00:03:00,020
And here's the structural problem underneath the cost problem.

85
00:03:00,020 --> 00:03:02,060
This isn't something you can tune your way out of.

86
00:03:02,060 --> 00:03:05,140
You can optimize queries, add indexes, archive old records,

87
00:03:05,140 --> 00:03:07,060
and implement data retention policies.

88
00:03:07,060 --> 00:03:08,540
All of that helps at the margins.

89
00:03:08,540 --> 00:03:10,980
But the fundamental mismatch is architectural.

90
00:03:10,980 --> 00:03:13,620
Azure SQL optimizes for a workload pattern,

91
00:03:13,620 --> 00:03:15,980
relational, transactional, moderate volume,

92
00:03:15,980 --> 00:03:17,100
that is genuinely different

93
00:03:17,100 --> 00:03:19,020
from what a high-volume, append-heavy,

94
00:03:19,020 --> 00:03:21,300
multi-tenant SaaS platform generates.

95
00:03:21,300 --> 00:03:22,860
The ingestion patterns are different.

96
00:03:22,860 --> 00:03:24,220
The query patterns are different.

97
00:03:24,220 --> 00:03:25,860
The distribution requirements are different.

98
00:03:25,860 --> 00:03:27,900
The storage ceiling, the performance degradation

99
00:03:27,900 --> 00:03:30,620
under bulk operations, the noisy neighbor contention

100
00:03:30,620 --> 00:03:31,900
between tenants:

101
00:03:31,900 --> 00:03:34,540
these aren't bugs in Dataverse's SQL layer.

102
00:03:34,540 --> 00:03:36,580
They're correct behaviors of a relational engine

103
00:03:36,580 --> 00:03:38,700
being asked to do something it wasn't designed for.

104
00:03:38,700 --> 00:03:39,820
That's the diagnosis.

105
00:03:39,820 --> 00:03:41,700
Not "Dataverse is slow."

106
00:03:41,700 --> 00:03:43,740
More precisely, standard Dataverse tables

107
00:03:43,740 --> 00:03:44,860
are the wrong storage engine

108
00:03:44,860 --> 00:03:47,980
for the high-volume operational data layer of a multi-tenant SaaS,

109
00:03:47,980 --> 00:03:49,860
and choosing them for that workload creates cost

110
00:03:49,860 --> 00:03:52,460
and performance problems that compound over time.

111
00:03:52,460 --> 00:03:54,220
The right engine for that layer exists.

112
00:03:54,220 --> 00:03:55,580
It lives inside Dataverse.

113
00:03:55,580 --> 00:03:58,900
And it works through a completely different storage model.

114
00:03:58,900 --> 00:04:00,620
What elastic tables actually are.

115
00:04:00,620 --> 00:04:02,540
So what are elastic tables exactly?

116
00:04:02,540 --> 00:04:04,140
The name suggests something vague.

117
00:04:04,140 --> 00:04:06,020
Elastic could mean anything.

118
00:04:06,020 --> 00:04:08,180
But the technical definition is precise.

119
00:04:08,180 --> 00:04:11,540
Elastic tables are a distinct table type inside Dataverse.

120
00:04:11,540 --> 00:04:13,380
From the outside, they look almost identical

121
00:04:13,380 --> 00:04:16,060
to standard tables: same Web API endpoints,

122
00:04:16,060 --> 00:04:18,900
same security model, same Power Pages integration,

123
00:04:18,900 --> 00:04:20,380
same OData query syntax.

124
00:04:20,380 --> 00:04:22,540
You create them in the Power Apps Maker portal.

125
00:04:22,540 --> 00:04:24,700
You add columns, you assign security roles,

126
00:04:24,700 --> 00:04:27,460
you expose them through Power Pages table permissions.

127
00:04:27,460 --> 00:04:29,820
The developer experience is deliberately familiar.

128
00:04:29,820 --> 00:04:32,100
What's completely different is what happens underneath.

129
00:04:32,100 --> 00:04:34,060
Instead of Azure SQL, the storage backend is

130
00:04:34,060 --> 00:04:35,660
Azure Cosmos DB.

131
00:04:35,660 --> 00:04:37,380
That single change in the storage engine

132
00:04:37,380 --> 00:04:39,780
produces a fundamentally different performance profile.

133
00:04:39,780 --> 00:04:42,460
Not because Cosmos DB is universally better than SQL,

134
00:04:42,460 --> 00:04:45,020
but because it was designed for a different workload.

135
00:04:45,020 --> 00:04:47,420
Cosmos DB is horizontally scalable, partitioned,

136
00:04:47,420 --> 00:04:48,860
and document oriented.

137
00:04:48,860 --> 00:04:51,300
Data isn't concentrated in a single database instance

138
00:04:51,300 --> 00:04:52,180
that scales up.

139
00:04:52,180 --> 00:04:53,940
It distributes across partitions,

140
00:04:53,940 --> 00:04:55,580
each partition independently handling

141
00:04:55,580 --> 00:04:57,020
its slice of the total load.

142
00:04:57,020 --> 00:04:59,260
When traffic increases, the system scales out

143
00:04:59,260 --> 00:05:02,300
by adding partitions, not by adding capacity to a single node.

144
00:05:02,300 --> 00:05:04,460
Microsoft introduced elastic tables,

145
00:05:04,460 --> 00:05:06,180
specifically for the workload patterns

146
00:05:06,180 --> 00:05:07,860
that break the SQL-backed model.

147
00:05:07,860 --> 00:05:10,660
Telemetry ingestion, application logs, event streams,

148
00:05:10,660 --> 00:05:13,820
audit records, time series data, activity feeds.

149
00:05:13,820 --> 00:05:15,740
The defining characteristic of these workloads

150
00:05:15,740 --> 00:05:17,980
is that they're mostly writes, mostly append-only,

151
00:05:17,980 --> 00:05:19,100
and they grow continuously.

152
00:05:19,100 --> 00:05:20,460
They don't look like CRM data.

153
00:05:20,460 --> 00:05:22,020
They look like operational exhaust:

154
00:05:22,020 --> 00:05:23,660
the continuous output of a running system

155
00:05:23,660 --> 00:05:25,420
rather than the carefully maintained records

156
00:05:25,420 --> 00:05:27,180
of a business process.

157
00:05:27,180 --> 00:05:30,100
The partition ID column is the mechanism that makes this work.

158
00:05:30,100 --> 00:05:33,100
Every elastic table record has a partition ID field,

159
00:05:33,100 --> 00:05:35,460
and that value determines which logical partition

160
00:05:35,460 --> 00:05:37,380
in Cosmos DB holds the record.

161
00:05:37,380 --> 00:05:39,300
Records with the same partition ID land

162
00:05:39,300 --> 00:05:40,700
in the same logical partition.

163
00:05:40,700 --> 00:05:42,420
Records with different partition IDs

164
00:05:42,420 --> 00:05:44,300
distribute across different partitions.

165
00:05:44,300 --> 00:05:47,340
This is how Cosmos DB achieves horizontal distribution,

166
00:05:47,340 --> 00:05:49,260
not through automatic load balancing,

167
00:05:49,260 --> 00:05:51,460
but through deliberate assignment at write time.

168
00:05:51,460 --> 00:05:54,100
The partition key controls the distribution model,

169
00:05:54,100 --> 00:05:55,420
which is why choosing it correctly

170
00:05:55,420 --> 00:05:58,460
is the most important decision in any elastic table design.

171
00:05:58,460 --> 00:06:00,260
We'll come back to that in detail.

172
00:06:00,260 --> 00:06:01,900
Two more characteristics are worth understanding

173
00:06:01,900 --> 00:06:02,980
before we move on.

174
00:06:02,980 --> 00:06:06,620
First, elastic tables include built-in TTL support, time to live.

175
00:06:06,620 --> 00:06:09,980
You can set a TTL value on any record, expressed in seconds,

176
00:06:09,980 --> 00:06:12,420
and Cosmos DB will automatically delete that record

177
00:06:12,420 --> 00:06:13,940
when the TTL expires.

178
00:06:13,940 --> 00:06:16,060
For operational data that only needs to be retained

179
00:06:16,060 --> 00:06:19,460
for 30, 60, or 90 days, audit logs, session events,

180
00:06:19,460 --> 00:06:21,580
telemetry samples, this means you never

181
00:06:21,580 --> 00:06:23,420
need a manual cleanup process.

182
00:06:23,420 --> 00:06:25,700
The storage footprint stays bounded automatically,

183
00:06:25,700 --> 00:06:28,180
which directly controls your storage costs over time.
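
As a minimal sketch of what a TTL-bearing record might look like, the helper below builds a row payload with a partition value and a retention period converted to seconds. The column names `partitionid` and `ttlinseconds` reflect my understanding of the elastic table system columns and should be verified against the current Dataverse documentation; `contoso_action` and the function name are hypothetical.

```python
from datetime import timedelta


def audit_event(tenant_id: str, action: str, retention_days: int) -> dict:
    """Build a row payload that expires after retention_days.

    Assumptions: 'partitionid' and 'ttlinseconds' are the system columns
    on the elastic table; 'contoso_action' is a hypothetical custom column.
    """
    ttl_seconds = int(timedelta(days=retention_days).total_seconds())
    return {
        "contoso_action": action,
        "partitionid": tenant_id,     # which logical partition holds the row
        "ttlinseconds": ttl_seconds,  # row is removed once this elapses
    }


row = audit_event("tenant-42", "login", retention_days=30)
print(row["ttlinseconds"])  # 30 days expressed in seconds
```

Setting the value per record, rather than once per table, lets different event types in the same table carry different retention periods.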

184
00:06:28,180 --> 00:06:30,460
Second, and this matters for implementation,

185
00:06:30,460 --> 00:06:33,140
the Dataverse Web API surface is the same.

186
00:06:33,140 --> 00:06:34,580
You're not learning a new SDK.

187
00:06:34,580 --> 00:06:36,500
You're not configuring a separate Azure resource

188
00:06:36,500 --> 00:06:38,580
or managing Cosmos DB connection strings.

189
00:06:38,580 --> 00:06:41,340
The elastic table sits inside your Dataverse environment,

190
00:06:41,340 --> 00:06:43,500
responds to the same REST API calls,

191
00:06:43,500 --> 00:06:46,300
and participates in the same solution packaging and deployment model

192
00:06:46,300 --> 00:06:47,980
as any other Dataverse table.

193
00:06:47,980 --> 00:06:49,340
The abstraction is clean.

194
00:06:49,340 --> 00:06:50,740
The complexity is contained underneath,

195
00:06:50,740 --> 00:06:52,060
managed by the platform.

196
00:06:52,060 --> 00:06:53,580
What this means practically is that you get

197
00:06:53,580 --> 00:06:55,260
the developer experience of Dataverse

198
00:06:55,260 --> 00:06:57,540
with the storage characteristics of Cosmos DB.

199
00:06:57,540 --> 00:07:00,500
That combination is what makes the architecture worth understanding,

200
00:07:00,500 --> 00:07:03,340
because the performance difference between these two storage engines

201
00:07:03,340 --> 00:07:05,980
under the right workload patterns isn't incremental.

202
00:07:05,980 --> 00:07:09,620
It's the difference between a system that works and one that doesn't.

203
00:07:09,620 --> 00:07:12,940
The performance gap: elastic versus standard tables.

204
00:07:12,940 --> 00:07:15,380
Knowing what elastic tables are built on is one thing.

205
00:07:15,380 --> 00:07:17,460
Knowing how much faster they actually run,

206
00:07:17,460 --> 00:07:19,980
and under what conditions, is what determines

207
00:07:19,980 --> 00:07:21,740
whether they belong in your architecture.

208
00:07:21,740 --> 00:07:24,500
The benchmark data here is clear enough to build decisions on.

209
00:07:24,500 --> 00:07:26,700
Community testing, including detailed work

210
00:07:26,700 --> 00:07:28,380
published by Stefano Demiliani,

211
00:07:28,380 --> 00:07:30,700
shows that bulk create, update, and delete

212
00:07:30,700 --> 00:07:34,260
operations on elastic tables run between two and 10 times faster

213
00:07:34,260 --> 00:07:36,780
than equivalent operations on standard tables.

214
00:07:36,780 --> 00:07:40,220
Demiliani's own measurements landed at eight to nine times faster

215
00:07:40,220 --> 00:07:42,700
when using CreateMultiple and UpdateMultiple,

216
00:07:42,700 --> 00:07:45,860
the Dataverse bulk operation APIs that group multiple records

217
00:07:45,860 --> 00:07:47,180
into a single request.

218
00:07:47,180 --> 00:07:48,540
That's not a marginal improvement.

219
00:07:48,540 --> 00:07:51,340
At that magnitude, the difference determines whether a batch ingestion

220
00:07:51,340 --> 00:07:53,100
job finishes in minutes or hours.
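
To make the "multiple records in a single request" idea concrete, here is a rough sketch that only builds CreateMultiple request bodies; it sends nothing. The `Targets` array and the `Microsoft.Dynamics.CRM.CreateMultiple` action path follow the Dataverse Web API shape as I understand it. The organization URL, table names, custom column, API version, and batch size of 100 are all illustrative assumptions.

```python
import json

API_PATH = "/api/data/v9.2"  # assumed Web API version


def chunked(rows, size):
    """Yield consecutive slices of rows, each at most `size` long."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def create_multiple_request(org_url, entity_set, logical_name, rows):
    """Return (url, json_body) for one CreateMultiple call."""
    url = f"{org_url}{API_PATH}/{entity_set}/Microsoft.Dynamics.CRM.CreateMultiple"
    odata_type = f"Microsoft.Dynamics.CRM.{logical_name}"
    body = {"Targets": [{"@odata.type": odata_type, **row} for row in rows]}
    return url, json.dumps(body)


# Hypothetical table and column names, for illustration only.
rows = [{"partitionid": "tenant-42", "contoso_action": f"event-{i}"}
        for i in range(250)]
requests_to_send = [
    create_multiple_request("https://example.crm.dynamics.com",
                            "contoso_auditevents", "contoso_auditevent", batch)
    for batch in chunked(rows, 100)
]
print(len(requests_to_send))  # number of HTTP calls instead of 250
```

Each tuple would then be posted with an authenticated HTTP client. The right batch size is something to measure against your own environment rather than take from this sketch.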

221
00:07:53,100 --> 00:07:56,420
The mechanism behind those numbers follows directly from the architecture.

222
00:07:56,420 --> 00:07:59,500
When you issue a CreateMultiple call against an elastic table,

223
00:07:59,500 --> 00:08:01,740
Cosmos DB distributes the write operations

224
00:08:01,740 --> 00:08:04,500
across multiple physical partitions simultaneously.

225
00:08:04,500 --> 00:08:07,580
Each partition handles its slice of the batch independently.

226
00:08:07,580 --> 00:08:10,900
There's no single SQL instance processing the inserts sequentially.

227
00:08:10,900 --> 00:08:13,620
The horizontal distribution is the performance advantage.

228
00:08:13,620 --> 00:08:16,340
The same design principle that makes cloud native databases

229
00:08:16,340 --> 00:08:19,220
faster at ingestion is what elastic tables inherit

230
00:08:19,220 --> 00:08:21,420
when Cosmos DB handles the storage layer.

231
00:08:21,420 --> 00:08:25,100
But here's what you need to know before you start migrating everything.

232
00:08:25,100 --> 00:08:28,940
The tradeoff is real and ignoring it leads to bad architecture decisions.

233
00:08:28,940 --> 00:08:32,540
Single-record CRUD operations are where elastic tables lose the advantage:

234
00:08:32,540 --> 00:08:35,500
creating one record at a time, updating one record at a time,

235
00:08:35,500 --> 00:08:38,540
deleting one record at a time. In measured tests,

236
00:08:38,540 --> 00:08:43,260
elastic tables perform the same as standard tables or slightly slower for these operations.

237
00:08:43,260 --> 00:08:47,260
The overhead of routing a single operation through the Cosmos DB partition model

238
00:08:47,260 --> 00:08:51,100
doesn't disappear just because the storage engine is more capable at scale.

239
00:08:51,100 --> 00:08:54,260
For row-by-row CRUD, you're not unlocking the distributed architecture.

240
00:08:54,260 --> 00:08:56,180
You're just adding a layer of indirection.

241
00:08:56,180 --> 00:08:58,820
Retrieval is where the gap flips more significantly.

242
00:08:58,820 --> 00:09:01,660
Standard OData queries, the kind that power list views,

243
00:09:01,660 --> 00:09:04,540
filtered grids and typical user-facing data displays,

244
00:09:04,540 --> 00:09:08,380
run two to three times slower on elastic tables than on standard tables.

245
00:09:08,380 --> 00:09:10,940
This is also a direct consequence of the storage model.

246
00:09:10,940 --> 00:09:15,980
Cosmos DB's document-oriented partitioning is optimized for writing and for partition-aware reads.

247
00:09:15,980 --> 00:09:18,820
When a query doesn't include the partition key in its filter,

248
00:09:18,820 --> 00:09:22,540
Cosmos DB has to scan across physical partitions to assemble the result.

249
00:09:22,540 --> 00:09:25,700
That fan-out is expensive and it shows up in latency.

250
00:09:25,700 --> 00:09:28,500
Standard SQL tables, with their traditional index structures,

251
00:09:28,500 --> 00:09:32,540
handle these ad hoc relational queries faster at typical business data volumes.
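
The difference between a partition-aware read and a fan-out read can be illustrated by how the query URL is built. The sketch below only assembles OData URLs; it assumes the partition value is exposed as a filterable `partitionid` column, and the organization URL, entity set name, and API version are hypothetical.

```python
from typing import Optional
from urllib.parse import quote

API_PATH = "/api/data/v9.2"  # assumed Web API version


def events_query(org_url: str, entity_set: str,
                 tenant_id: Optional[str] = None, top: int = 50) -> str:
    """Build an OData query URL, scoped to one partition when possible."""
    query = f"$top={top}"
    if tenant_id is not None:
        # OData string literals escape a single quote by doubling it.
        literal = tenant_id.replace("'", "''")
        query += "&$filter=" + quote(f"partitionid eq '{literal}'")
    return f"{org_url}{API_PATH}/{entity_set}?{query}"


# Scoped to one tenant's partition: the cheap, predictable case.
scoped = events_query("https://example.crm.dynamics.com",
                      "contoso_auditevents", tenant_id="tenant-42")
# No partition filter: the engine has to look across partitions.
unscoped = events_query("https://example.crm.dynamics.com",
                        "contoso_auditevents")
print(scoped)
print(unscoped)
```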

252
00:09:32,540 --> 00:09:35,500
So the performance picture, stated plainly, looks like this.

253
00:09:35,500 --> 00:09:38,180
Elastic tables dominate at high-frequency writes:

254
00:09:38,180 --> 00:09:41,900
large batch ingestion, telemetry pipelines, audit trail accumulation,

255
00:09:41,900 --> 00:09:43,380
and activity feed updates.

256
00:09:43,380 --> 00:09:46,620
Standard tables hold the advantage for complex relational queries,

257
00:09:46,620 --> 00:09:49,140
transactional integrity across multiple entities,

258
00:09:49,140 --> 00:09:54,060
and the kind of interactive CRUD that model-driven apps and portal forms generate.

259
00:09:54,060 --> 00:09:57,740
This next part is the architectural insight that most implementations miss.

260
00:09:57,740 --> 00:10:02,620
The answer to "elastic tables or standard tables" is almost never one or the other.

261
00:10:02,620 --> 00:10:06,100
Most SaaS platforms have both workload types running simultaneously.

262
00:10:06,100 --> 00:10:09,140
There's a core business entity layer, the accounts, subscriptions,

263
00:10:09,140 --> 00:10:11,660
configurations, user profiles, product catalog,

264
00:10:11,660 --> 00:10:16,700
that is relational, transactional, and read-heavy in ways that benefit from SQL semantics.

265
00:10:16,700 --> 00:10:19,380
And surrounding that core, there's an operational data layer,

266
00:10:19,380 --> 00:10:23,300
the logs, the events, the activity feeds, the audit records, the telemetry,

267
00:10:23,300 --> 00:10:26,740
that is append-heavy, high-volume, and doesn't need relational joins.

268
00:10:26,740 --> 00:10:29,220
Those two layers have genuinely different storage requirements.

269
00:10:29,220 --> 00:10:32,540
The architecture that works uses each engine for what it's actually good at.

270
00:10:32,540 --> 00:10:35,340
Standard Dataverse tables for the core entity layer.

271
00:10:35,340 --> 00:10:37,980
Elastic tables for the operational data streams.

272
00:10:37,980 --> 00:10:40,660
The two types coexist in the same Dataverse environment,

273
00:10:40,660 --> 00:10:44,180
share the same security model, and are both accessible through the same API surface.

274
00:10:44,180 --> 00:10:46,260
You're not managing two separate data platforms.

275
00:10:46,260 --> 00:10:49,900
You're making deliberate choices about which table type serves which workload.

276
00:10:49,900 --> 00:10:53,620
Getting that decision right is the prerequisite for everything that follows.

277
00:10:53,620 --> 00:10:56,860
And the single most consequential design choice in the elastic table layer,

278
00:10:56,860 --> 00:11:01,580
the one that determines whether the whole system actually scales, is the partition key.

279
00:11:01,580 --> 00:11:04,780
Partition key design: the decision you can't undo.

280
00:11:04,780 --> 00:11:08,980
The partition key is where most elastic table implementations either succeed or fail.

281
00:11:08,980 --> 00:11:11,020
Not because it's technically complicated to set:

282
00:11:11,020 --> 00:11:14,820
you specify it when you create the table, it populates a field called partition ID

283
00:11:14,820 --> 00:11:17,940
on every record, and the Cosmos DB layer handles the rest.

284
00:11:17,940 --> 00:11:19,620
The problem is that it's irreversible.

285
00:11:19,620 --> 00:11:23,220
Once a Cosmos DB container is created with a specific partition key,

286
00:11:23,220 --> 00:11:24,940
that key cannot be changed.

287
00:11:24,940 --> 00:11:28,660
Ever. The only way to change a partition key after the fact is to create a new container

288
00:11:28,660 --> 00:11:32,340
with the correct key and migrate every record from the old one into it.

289
00:11:32,340 --> 00:11:36,340
At millions of rows across a live production system, that migration is expensive,

290
00:11:36,340 --> 00:11:40,300
disruptive, and entirely avoidable if the decision is made correctly upfront.

291
00:11:40,300 --> 00:11:42,420
This is not a minor operational footnote.

292
00:11:42,420 --> 00:11:46,300
It's the most consequential architectural decision in any elastic table design,

293
00:11:46,300 --> 00:11:50,260
and it needs to be treated as such, not as a schema detail you revisit later,

294
00:11:50,260 --> 00:11:53,300
but as a one-way door that you walk through deliberately.

295
00:11:53,300 --> 00:11:55,020
So what makes a good partition key?

296
00:11:55,020 --> 00:11:58,260
Three requirements, and all three need to be satisfied simultaneously.

297
00:11:58,260 --> 00:11:59,580
First, high cardinality.

298
00:11:59,580 --> 00:12:01,860
The partition key needs many distinct values,

299
00:12:01,860 --> 00:12:05,340
enough to spread your data across a meaningful number of logical partitions.

300
00:12:05,340 --> 00:12:07,260
A key with only a handful of possible values

301
00:12:07,260 --> 00:12:10,020
concentrates your data into a small number of partitions,

302
00:12:10,020 --> 00:12:13,860
which undermines the horizontal distribution that makes elastic tables fast.

303
00:12:13,860 --> 00:12:17,060
Think thousands of distinct values at minimum, not dozens.

304
00:12:17,060 --> 00:12:19,940
Second, even distribution of both storage and request units.

305
00:12:19,940 --> 00:12:24,180
It's not enough for a key to have high cardinality if 90% of your data

306
00:12:24,180 --> 00:12:27,860
and 90% of your traffic lands on a small subset of those values.

307
00:12:27,860 --> 00:12:30,860
The partition key needs to produce roughly balanced partitions:

308
00:12:30,860 --> 00:12:34,940
similar amounts of data per partition, similar RU consumption per partition.

309
00:12:34,940 --> 00:12:36,980
Imbalanced partitions produce hot spots,

310
00:12:36,980 --> 00:12:39,060
and a hot partition absorbs disproportionate load

311
00:12:39,060 --> 00:12:41,460
while the rest of the container sits underutilized.

312
00:12:41,460 --> 00:12:43,140
You pay for the full provisioned throughput,

313
00:12:43,140 --> 00:12:45,300
but only effectively use a fraction of it.

314
00:12:45,300 --> 00:12:47,860
Third, alignment with your most common query patterns.

315
00:12:47,860 --> 00:12:50,780
Every query that includes the partition key in its filter

316
00:12:50,780 --> 00:12:52,540
is a single-partition query:

317
00:12:52,540 --> 00:12:54,860
fast, cheap, and predictable.

318
00:12:54,860 --> 00:12:58,140
Every query that doesn't include it becomes a cross-partition scan,

319
00:12:58,140 --> 00:13:01,580
fanning out across multiple physical partitions and consuming RUs

320
00:13:01,580 --> 00:13:03,500
proportional to how many it touches.

321
00:13:03,500 --> 00:13:06,140
Your partition key should be the field that appears most naturally

322
00:13:06,140 --> 00:13:07,980
in your highest-frequency queries.

323
00:13:07,980 --> 00:13:12,460
For a multi-tenant SaaS, the instinctive answer to all three requirements is tenant ID.

324
00:13:12,460 --> 00:13:14,580
And that instinct is correct, at least partially.

325
00:13:14,580 --> 00:13:17,460
Tenant ID has high cardinality if you have many tenants.

326
00:13:17,460 --> 00:13:19,100
It aligns perfectly with query patterns

327
00:13:19,100 --> 00:13:21,900
because virtually every meaningful query in a multi-tenant system

328
00:13:21,900 --> 00:13:23,780
is scoped to a specific tenant.

329
00:13:23,780 --> 00:13:26,820
And for small and medium-sized tenants with predictable data volumes,

330
00:13:26,820 --> 00:13:28,740
it distributes storage reasonably well.

331
00:13:28,740 --> 00:13:31,900
The failure mode arrives when your tenant distribution isn't uniform.

332
00:13:31,900 --> 00:13:36,700
It almost never is. Real SaaS platforms develop what the industry calls elephant tenants:

333
00:13:36,700 --> 00:13:39,780
a small number of customers who generate a wildly disproportionate share

334
00:13:39,780 --> 00:13:41,100
of the total activity.

335
00:13:41,100 --> 00:13:43,420
One enterprise customer with a thousand users

336
00:13:43,420 --> 00:13:47,460
generates more events per day than 50 small business customers combined.

337
00:13:47,460 --> 00:13:49,580
When tenant ID is the partition key,

338
00:13:49,580 --> 00:13:51,500
and that enterprise customer's data accumulates

339
00:13:51,500 --> 00:13:55,300
on a single partition key value, you end up with a hot partition.

340
00:13:55,300 --> 00:13:58,540
One logical partition absorbs the majority of your RU consumption.

341
00:13:58,540 --> 00:13:59,900
The other partitions sit quiet.

342
00:13:59,900 --> 00:14:03,420
And because Cosmos DB enforces throughput limits per physical partition,

343
00:14:03,420 --> 00:14:07,140
that hot partition throttles, returning 429 errors,

344
00:14:07,140 --> 00:14:10,380
while the rest of the container operates well within its capacity.

345
00:14:10,380 --> 00:14:13,020
There's also a hard storage ceiling that makes this worse.

346
00:14:13,020 --> 00:14:15,460
A single logical partition in Cosmos DB,

347
00:14:15,460 --> 00:14:17,460
meaning a single partition key value,

348
00:14:17,460 --> 00:14:19,780
cannot exceed 20 gigabytes of data.

349
00:14:19,780 --> 00:14:23,260
For a large enterprise tenant generating millions of records over months,

350
00:14:23,260 --> 00:14:26,100
20 gigabytes is a real limit, not a theoretical one.

351
00:14:26,100 --> 00:14:27,940
When that ceiling is hit, new writes fail.

352
00:14:27,940 --> 00:14:29,660
The resolution requires data migration.

353
00:14:29,660 --> 00:14:30,860
There's no in-place fix.

354
00:14:30,860 --> 00:14:33,100
So tenant ID works until it doesn't.

355
00:14:33,100 --> 00:14:35,460
The question isn't whether to use tenant ID.

356
00:14:35,460 --> 00:14:39,220
It should be the foundation of your partition strategy in any multi-tenant system.

357
00:14:39,220 --> 00:14:43,100
The question is whether tenant ID alone is sufficient for the tenant distribution

358
00:14:43,100 --> 00:14:45,860
you expect to have, not just the distribution you have today.

359
00:14:45,860 --> 00:14:49,940
Planning for that limit before you go to production is what separates architectures

360
00:14:49,940 --> 00:14:52,140
that scale from architectures that get rewritten.

361
00:14:52,140 --> 00:14:55,740
And Microsoft built the answer to this problem directly into Cosmos DB.

362
00:14:55,740 --> 00:14:59,180
Hierarchical partition keys: solving the elephant tenant problem.

363
00:14:59,180 --> 00:15:01,180
Hierarchical partition keys are the answer.

364
00:15:01,180 --> 00:15:05,500
And understanding how they work changes the design model for every multi-tenant elastic table

365
00:15:05,500 --> 00:15:06,500
you'll ever build.

366
00:15:06,500 --> 00:15:08,460
The concept is straightforward once you see it.

367
00:15:08,460 --> 00:15:11,660
Instead of a single partition key field, just tenant ID,

368
00:15:11,660 --> 00:15:14,260
you define an ordered hierarchy of up to three fields.

369
00:15:14,260 --> 00:15:15,700
Level one is tenant ID.

370
00:15:15,700 --> 00:15:18,420
Level two is something that distributes data within a tenant:

371
00:15:18,420 --> 00:15:20,980
user ID, customer ID, project ID,

372
00:15:20,980 --> 00:15:22,860
depending on what your domain looks like.

373
00:15:22,860 --> 00:15:25,380
Level three is something more granular still:

374
00:15:25,380 --> 00:15:29,980
session ID, order ID, or, in many cases, just the record's own ID.

375
00:15:29,980 --> 00:15:34,020
The effective logical partition becomes the concatenation of all three values in order.

376
00:15:34,020 --> 00:15:36,580
So instead of one logical partition per tenant,

377
00:15:36,580 --> 00:15:39,940
a large tenant automatically gets thousands of sub-partitions,

378
00:15:39,940 --> 00:15:43,020
one for each unique combination of tenant ID, user ID,

379
00:15:43,020 --> 00:15:45,500
and session or order identifier.

380
00:15:45,500 --> 00:15:47,540
The elephant tenant problem disappears.

381
00:15:47,540 --> 00:15:50,620
That enterprise customer generating 80% of your traffic

382
00:15:50,620 --> 00:15:54,020
no longer concentrates everything onto a single partition key value.

383
00:15:54,020 --> 00:15:58,700
Their data distributes across as many sub-partitions as they have users and sessions.

384
00:15:58,700 --> 00:16:02,260
The 20 gigabyte ceiling we discussed doesn't apply to the tenant as a whole.

385
00:16:02,260 --> 00:16:05,140
It applies to each individual logical partition

386
00:16:05,140 --> 00:16:09,260
and no single user session's data comes close to that limit.

387
00:16:09,260 --> 00:16:12,300
The system scales with the tenant's growth automatically

388
00:16:12,300 --> 00:16:16,020
without any manual intervention and without migrating to a dedicated container.
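
The effect on an elephant tenant can be shown with a toy model. This is not a Cosmos DB API call, just a simulation under made-up numbers: one large tenant with 1,000 users producing 80 events each, and 50 small tenants with 4 users producing 100 events each. It compares how much of the total load lands on the busiest logical partition when the key is tenant ID alone versus the three-level hierarchy.

```python
from collections import Counter


def build_events():
    """Return (tenant, user, session, event_count) rows for a skewed workload."""
    rows = [("acme", f"user-{u}", "session-0", 80) for u in range(1000)]
    for t in range(50):
        for u in range(4):
            rows.append((f"small-{t}", f"user-{u}", "session-0", 100))
    return rows


def partition_load(rows, levels):
    """Sum event counts per logical partition using the first `levels` key parts."""
    load = Counter()
    for tenant, user, session, count in rows:
        load[(tenant, user, session)[:levels]] += count
    return load


def hottest_share(load):
    """Fraction of all events carried by the single busiest partition."""
    return max(load.values()) / sum(load.values())


rows = build_events()
flat = partition_load(rows, levels=1)          # tenant ID only
hierarchical = partition_load(rows, levels=3)  # tenant, user, session

print(len(flat), hottest_share(flat))
print(len(hierarchical), hottest_share(hierarchical))
```

With the flat key, the large tenant's single partition carries most of the load. With three levels, the same events spread over many small logical partitions and no single one dominates.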

389
00:16:16,020 --> 00:16:18,540
Here's what makes HPK genuinely elegant

390
00:16:18,540 --> 00:16:21,580
compared to alternative approaches like synthetic or salted keys.

391
00:16:21,580 --> 00:16:23,340
Query routing stays intelligent.

392
00:16:23,340 --> 00:16:26,860
When your application queries for all records belonging to a specific tenant,

393
00:16:26,860 --> 00:16:30,300
which is the most common query pattern in any multi-tenant system,

394
00:16:30,300 --> 00:16:33,500
Cosmos DB knows that level one is tenant ID.

395
00:16:33,500 --> 00:16:37,180
A filter on tenant ID alone fans out only across that tenant's sub-partitions,

396
00:16:37,180 --> 00:16:38,580
not the entire container.

397
00:16:38,580 --> 00:16:41,580
You're not paying the cost of a full cross-container scan.

398
00:16:41,580 --> 00:16:46,180
The query touches only the physical partitions that hold data for that tenant and nothing else.

399
00:16:46,180 --> 00:16:51,140
For a query that includes both tenant ID and user ID, the scope narrows further.

400
00:16:51,140 --> 00:16:54,900
Only the partitions holding that user's data within that tenant get touched.

401
00:16:54,900 --> 00:17:00,260
Each additional level you specify in the filter reduces the fan-out and reduces the RU cost.

402
00:17:00,260 --> 00:17:01,860
Compare that to a synthetic key,

403
00:17:01,860 --> 00:17:05,060
say tenant ID concatenated with a hash of the record ID.

404
00:17:05,060 --> 00:17:08,020
You get good distribution, but you lose the intelligent routing.

405
00:17:08,020 --> 00:17:11,780
A query filtering on tenant ID alone can't be efficiently routed

406
00:17:11,780 --> 00:17:14,820
because the partition key value isn't just tenant ID anymore.

407
00:17:14,820 --> 00:17:17,540
The system has to scan more broadly to find the matching records.

408
00:17:17,540 --> 00:17:20,980
HPKs preserve the routing semantics of a simple tenant ID key

409
00:17:20,980 --> 00:17:23,780
while adding the distribution properties of a composite key.

410
00:17:23,780 --> 00:17:25,060
You get both.

411
00:17:25,060 --> 00:17:27,380
One constraint worth naming explicitly.

412
00:17:27,380 --> 00:17:30,100
HPKs must be configured at container creation time.

413
00:17:30,100 --> 00:17:34,900
This reinforces everything said in the previous section about partition key decisions being irreversible.

414
00:17:34,900 --> 00:17:37,780
You can't add hierarchical levels to an existing container

415
00:17:37,780 --> 00:17:40,340
any more than you can change the partition key entirely.

416
00:17:40,340 --> 00:17:44,020
If you've already shipped an elastic table with a flat tenant id partition key

417
00:17:44,020 --> 00:17:47,380
and you later decide you need HPK to handle a growing enterprise customer,

418
00:17:47,380 --> 00:17:48,500
you're looking at a migration.

419
00:17:48,500 --> 00:17:50,980
That's why the design time decision matters so much.

420
00:17:50,980 --> 00:17:53,380
You're not just choosing a key for the data you have today.

421
00:17:53,380 --> 00:17:55,620
You're choosing a key for the distribution profile

422
00:17:55,620 --> 00:17:56,820
you'll have 18 months from now.

423
00:17:56,820 --> 00:18:00,900
For a B2B SaaS platform, the practical HPK layout

424
00:18:00,900 --> 00:18:03,620
that handles most tenant distribution scenarios looks like this.

425
00:18:03,620 --> 00:18:05,220
Level one is tenant id,

426
00:18:05,220 --> 00:18:07,540
Level two is user id, or customer id,

427
00:18:07,540 --> 00:18:09,620
Level three is the item's own id.

428
00:18:09,620 --> 00:18:12,500
That three level structure handles heterogeneous tenant sizes

429
00:18:12,500 --> 00:18:14,740
without any application-level routing logic.

430
00:18:14,740 --> 00:18:18,340
Small tenants with 10 users produce a modest number of sub-partitions.

431
00:18:18,340 --> 00:18:22,580
Enterprise tenants with thousands of users spread across thousands of sub-partitions automatically.

432
00:18:22,580 --> 00:18:25,940
One more thing worth confirming for architects who are planning event-driven workflows

433
00:18:25,940 --> 00:18:27,380
on top of the storage layer.

434
00:18:27,380 --> 00:18:30,740
The Cosmos DB change feed still works at the physical partition level,

435
00:18:30,740 --> 00:18:34,500
completely unaffected by whether you use HPKs or a flat key.

436
00:18:34,500 --> 00:18:38,580
If you're building real time processing pipelines that react to record creation or updates,

437
00:18:38,580 --> 00:18:40,180
HPKs don't break that pattern.

438
00:18:40,180 --> 00:18:42,100
The event architecture remains intact.

439
00:18:42,100 --> 00:18:44,420
The storage architecture is now designed for scale.

440
00:18:44,420 --> 00:18:46,660
The next question is what it actually costs to run it.

441
00:18:46,660 --> 00:18:47,700
Storage economics.

442
00:18:47,700 --> 00:18:50,020
Why elastic tables change the cost model.

443
00:18:50,020 --> 00:18:52,820
The cost structure of Dataverse storage is one of those things

444
00:18:52,820 --> 00:18:55,700
that doesn't feel real until you're staring at a bill.

445
00:18:55,700 --> 00:18:57,140
Database capacity,

446
00:18:57,140 --> 00:19:03,220
the storage type that backs standard Dataverse tables, runs at approximately $40 per gigabyte per month.

447
00:19:03,220 --> 00:19:05,540
That number is already in your head from section one,

448
00:19:05,540 --> 00:19:08,500
but what we didn't examine there is what changed in 2025

449
00:19:08,500 --> 00:19:12,020
and why that change shifts the economics of this entire architecture.

450
00:19:12,020 --> 00:19:18,020
In 2025, Microsoft moved elastic tables from database capacity billing to log capacity billing.

451
00:19:18,020 --> 00:19:21,060
Log capacity costs approximately $10 per gigabyte per month.

452
00:19:21,060 --> 00:19:24,740
That's a 75% reduction in effective storage cost for the same data

453
00:19:24,740 --> 00:19:27,220
achieved entirely by choosing the right table type.

454
00:19:27,220 --> 00:19:29,700
No infrastructure changes, no architectural compromises,

455
00:19:29,700 --> 00:19:31,940
no migration to an external service.

456
00:19:31,940 --> 00:19:34,820
You stay inside Dataverse, inside your existing environment,

457
00:19:34,820 --> 00:19:37,700
with the same API surface and the same security model.

458
00:19:37,700 --> 00:19:41,140
You just pay a fraction of what you'd pay for equivalent data in standard tables.

459
00:19:41,140 --> 00:19:44,180
At small data volumes that difference doesn't feel urgent.

460
00:19:44,180 --> 00:19:48,180
At SaaS scale it determines whether the unit economics of your platform are viable.

461
00:19:48,180 --> 00:19:50,180
Run the math on a concrete scenario.

462
00:19:50,180 --> 00:19:53,060
A SaaS platform storing 500 gigabytes of telemetry,

463
00:19:53,060 --> 00:19:54,980
activity logs and audit events,

464
00:19:54,980 --> 00:19:59,220
not an unusual volume for a multi-tenant platform with several hundred active customers

465
00:19:59,220 --> 00:20:04,820
pays roughly $5,000 per month on elastic tables at log capacity pricing.

466
00:20:04,820 --> 00:20:09,540
The same data in standard Dataverse database capacity costs around $20,000 per month.

467
00:20:09,540 --> 00:20:12,020
That $15,000 monthly gap compounds.

468
00:20:12,020 --> 00:20:16,740
Over a year, the difference is close to $180,000 in storage costs alone

469
00:20:16,740 --> 00:20:19,540
before accounting for any performance or scalability benefits.
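
The arithmetic behind those figures can be checked with a short Python sketch. The per-gigabyte prices are the approximate numbers quoted in this episode, not an official price list, and the function and variable names are chosen only for illustration.

```python
# Approximate per-GB monthly prices quoted in the episode (USD).
DATABASE_PRICE_PER_GB = 40
LOG_PRICE_PER_GB = 10


def monthly_storage_cost(gigabytes: int, price_per_gb: int) -> int:
    """Monthly storage cost in USD for a given volume and per-GB price."""
    return gigabytes * price_per_gb


volume_gb = 500
elastic_monthly = monthly_storage_cost(volume_gb, LOG_PRICE_PER_GB)
standard_monthly = monthly_storage_cost(volume_gb, DATABASE_PRICE_PER_GB)
monthly_gap = standard_monthly - elastic_monthly
annual_gap = monthly_gap * 12
reduction = 1 - LOG_PRICE_PER_GB / DATABASE_PRICE_PER_GB

print(elastic_monthly, standard_monthly, monthly_gap, annual_gap, reduction)
```

With these inputs the sketch reproduces the $5,000 and $20,000 monthly figures, the $15,000 monthly gap, the $180,000 annual gap, and the 75% reduction.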

470
00:20:19,540 --> 00:20:23,220
There's a catch worth naming directly because ignoring it creates problems later.

471
00:20:23,220 --> 00:20:26,820
Each Dataverse tenant receives only two gigabytes of log storage by default,

472
00:20:26,820 --> 00:20:29,780
regardless of how many licenses are attached to the environment.

473
00:20:29,780 --> 00:20:31,540
Once you exceed that default,

474
00:20:31,540 --> 00:20:35,380
excess log usage doesn't automatically price at $10 per gigabyte.

475
00:20:35,380 --> 00:20:37,540
If you haven't purchased log capacity add-ons,

476
00:20:37,540 --> 00:20:39,940
it reverts to database capacity pricing.

477
00:20:39,940 --> 00:20:44,180
That means the cost advantage disappears for any storage above the default threshold,

478
00:20:44,180 --> 00:20:47,540
unless you've explicitly acquired the additional log capacity entitlement.

479
00:20:47,540 --> 00:20:50,340
This isn't a gotcha buried in fine print.

480
00:20:50,340 --> 00:20:51,620
It's just capacity planning,

481
00:20:51,620 --> 00:20:54,020
the same exercise you do for any cloud resource,

482
00:20:54,020 --> 00:20:56,820
but it does mean that realizing the full cost benefit

483
00:20:56,820 --> 00:21:02,500
requires purchasing log capacity add-ons proportional to your expected elastic table data volume

484
00:21:02,500 --> 00:21:04,260
before that data accumulates.
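
To make the capacity-planning point concrete, here is a simplified Python model of the overflow behavior as described in this episode. It assumes the 2 GB default is included at no extra charge, that purchased log add-ons cost about $10 per gigabyte, and that anything beyond the combined entitlement is charged at the roughly $40 database rate. Actual Microsoft billing rules may differ, so treat this as a sketch of the reasoning only.

```python
DEFAULT_LOG_GB = 2           # default log entitlement quoted in the episode
LOG_PRICE_PER_GB = 10        # approximate add-on price (USD per GB per month)
DATABASE_PRICE_PER_GB = 40   # approximate overflow price (USD per GB per month)


def monthly_cost(data_gb: int, addon_gb: int) -> int:
    """Simplified monthly cost: add-ons at the log rate, overflow at the database rate."""
    entitlement_gb = DEFAULT_LOG_GB + addon_gb
    overflow_gb = max(0, data_gb - entitlement_gb)
    return addon_gb * LOG_PRICE_PER_GB + overflow_gb * DATABASE_PRICE_PER_GB


no_addons = monthly_cost(500, 0)        # everything above 2 GB overflows
half_covered = monthly_cost(500, 250)   # partial add-on purchase
fully_covered = monthly_cost(500, 498)  # add-ons sized to the data volume
```

Under these assumptions, the savings only materialize when the add-on purchase is sized to the expected data volume ahead of time.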

485
00:21:04,260 --> 00:21:09,300
TTL-based automatic deletion interacts with the cost model in a way that compounds the savings over time.

486
00:21:09,300 --> 00:21:12,340
Data that expires automatically never accumulates in your storage bill.

487
00:21:12,340 --> 00:21:15,940
A platform that retains telemetry for 90 days and audit events for one year

488
00:21:15,940 --> 00:21:18,820
with TTL policies enforcing those retention windows

489
00:21:18,820 --> 00:21:21,460
maintains a predictable and bounded storage footprint.
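
A small Python sketch shows why TTL bounds the footprint. The retention windows (90 days and one year) come from the example above; the daily ingest rates are hypothetical numbers picked for illustration, and the helper names are not part of any platform API. The TTL values are expressed in seconds, which is the unit a time-to-live setting typically uses.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ttl_seconds(retention_days: int) -> int:
    """Convert a retention window in days to a TTL value in seconds."""
    return retention_days * SECONDS_PER_DAY


def steady_state_gb(daily_ingest_gb: float, retention_days: int) -> float:
    """Upper bound on stored volume once expiry keeps pace with ingestion."""
    return daily_ingest_gb * retention_days


telemetry_ttl = ttl_seconds(90)
audit_ttl = ttl_seconds(365)

# Hypothetical ingest rates, not taken from the episode.
telemetry_footprint_gb = steady_state_gb(2.0, 90)
audit_footprint_gb = steady_state_gb(0.5, 365)
```

The point of the sketch is that the footprint depends on the ingest rate and the retention window, not on how long the platform has been running.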

490
00:21:21,460 --> 00:21:24,900
The alternative, manual archiving processes, scheduled cleanup jobs,

491
00:21:24,900 --> 00:21:27,460
periodic bulk deletes, consumes engineering time,

492
00:21:27,460 --> 00:21:29,060
introduces operational risk,

493
00:21:29,060 --> 00:21:32,340
and still tends to allow data to accumulate beyond intended retention windows.

494
00:21:33,780 --> 00:21:38,180
TTL makes retention enforcement an architectural property rather than an operational discipline.

495
00:21:38,180 --> 00:21:41,860
The cost model also creates a design constraint that functions as a benefit.

496
00:21:41,860 --> 00:21:43,780
It forces you to categorize your data.

497
00:21:43,780 --> 00:21:46,100
Not all data belongs in the same storage tier.

498
00:21:46,100 --> 00:21:47,780
Hot transactional records,

499
00:21:47,780 --> 00:21:49,060
active subscriptions,

500
00:21:49,060 --> 00:21:50,580
current user configurations,

501
00:21:50,580 --> 00:21:51,860
live business entities,

502
00:21:51,860 --> 00:21:56,420
belong in standard tables where relational query performance justifies the higher storage cost.

503
00:21:56,420 --> 00:21:58,740
High volume operational data,

504
00:21:58,740 --> 00:22:02,340
the streams of events, logs, and telemetry that document system activity,

505
00:22:02,340 --> 00:22:06,020
belongs in elastic tables where the cost and performance profile fits the workload.

506
00:22:06,020 --> 00:22:09,940
That categorization discipline produces cleaner architectures.

507
00:22:09,940 --> 00:22:14,660
It prevents the common failure mode where everything lands in the same table type because it's the default

508
00:22:14,660 --> 00:22:18,100
and cost and performance problems appear gradually as data accumulates.

509
00:22:18,100 --> 00:22:19,700
The economics are compelling.

510
00:22:19,700 --> 00:22:23,540
But lower costs and higher throughput don't matter if a tenant can see another tenant's data.

511
00:22:23,540 --> 00:22:27,220
Multi-tenant security foundations: isolation by design.

512
00:22:27,220 --> 00:22:31,300
Security in a multi-tenant system isn't a feature you add after the architecture is working.

513
00:22:31,300 --> 00:22:34,500
It's a constraint that shapes every design decision from the beginning.

514
00:22:34,500 --> 00:22:40,020
And in 2026 enterprise customers arrive with detailed security questionnaires before they sign anything.

515
00:22:40,020 --> 00:22:44,260
SOC 2, ISO 27001, GDPR,

516
00:22:44,260 --> 00:22:48,660
and the security review processes of large enterprise buyers all share a common focus.

517
00:22:48,660 --> 00:22:53,940
Can you demonstrate that one tenant's data is completely inaccessible to every other tenant,

518
00:22:53,940 --> 00:22:57,380
in a way that's provably structural rather than just procedurally enforced?

519
00:22:57,380 --> 00:22:59,620
The distinction matters.

520
00:22:59,620 --> 00:23:02,500
"We believe tenants are isolated" doesn't pass an audit.

521
00:23:02,500 --> 00:23:08,180
"Here is the technical mechanism at the storage layer that makes cross-tenant access architecturally impossible"

522
00:23:08,180 --> 00:23:10,660
does. That bar is achievable with this architecture,

523
00:23:10,660 --> 00:23:15,700
but it requires understanding how Dataverse's security model actually works on elastic tables,

524
00:23:15,700 --> 00:23:18,420
and where the gaps are if you implement it carelessly.

525
00:23:18,420 --> 00:23:22,580
Start with the foundational principle that every multi-tenant security model depends on.

526
00:23:22,580 --> 00:23:26,340
Tenant context must be derived from authenticated identity.

527
00:23:26,340 --> 00:23:30,260
Not from a query parameter the client sends, not from a field in the request body,

528
00:23:30,260 --> 00:23:33,300
but from the identity token that was issued when the user authenticated.

529
00:23:33,300 --> 00:23:37,780
In a Power Pages implementation, that means the logged-in contact record and the web roles

530
00:23:37,780 --> 00:23:40,420
assigned to that contact are the source of tenant context.

531
00:23:40,420 --> 00:23:44,180
Any data access decision the portal makes flows from that identity.

532
00:23:44,180 --> 00:23:49,300
A user cannot change their tenant context by modifying a URL or manipulating an API call.
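
The principle can be sketched in Python as a small resolver that trusts only the authenticated identity. The `AuthenticatedContact` type, its fields, and the `tenant_id` request parameter are hypothetical names for this illustration; Power Pages resolves this through the contact record and web roles rather than through code like this.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AuthenticatedContact:
    """Hypothetical identity produced by the authentication step."""
    contact_id: str
    tenant_id: str  # resolved server-side at sign-in, never supplied by the client


def resolve_tenant_id(contact: AuthenticatedContact,
                      request_params: Mapping[str, str]) -> str:
    """Return the tenant from the identity; reject client attempts to override it."""
    requested = request_params.get("tenant_id")
    if requested is not None and requested != contact.tenant_id:
        raise PermissionError("tenant_id in request does not match authenticated identity")
    # The client-supplied value is never used as the source of truth.
    return contact.tenant_id
```

A request that omits the parameter still resolves to the caller's own tenant, and a request that names a different tenant is rejected instead of silently honored.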

533
00:23:49,300 --> 00:23:51,780
Dataverse's security model applies fully to elastic tables.

534
00:23:51,780 --> 00:23:54,580
Every security primitive that works on standard tables,

535
00:23:54,580 --> 00:23:57,860
security roles, business units, team membership, record ownership,

536
00:23:57,860 --> 00:23:59,540
works the same way on elastic tables.

537
00:23:59,540 --> 00:24:03,220
This is not a simplified or reduced security model for the new table type.

538
00:24:03,220 --> 00:24:04,340
It's the same framework.

539
00:24:04,340 --> 00:24:09,220
For multi-tenant SaaS specifically, the pattern that scales is business unit-based security.

540
00:24:09,220 --> 00:24:12,100
Each tenant maps to a business unit inside Dataverse.

541
00:24:12,100 --> 00:24:15,300
Users belonging to that tenant are assigned to that business unit.

542
00:24:15,300 --> 00:24:19,780
Security roles grant read access at the business unit scope for the elastic table,

543
00:24:19,780 --> 00:24:24,260
meaning a user sees only records owned by their business unit and nothing outside it.

544
00:24:24,260 --> 00:24:27,620
The query enforcement happens at the Dataverse layer automatically.

545
00:24:27,620 --> 00:24:31,700
There's no application-level filter that a developer needs to remember to include in every query.

546
00:24:31,700 --> 00:24:33,220
The platform enforces the boundary.

547
00:24:33,220 --> 00:24:36,100
Organization-owned elastic tables work differently

548
00:24:36,100 --> 00:24:38,980
and are worth addressing directly because they're often the default choice

549
00:24:38,980 --> 00:24:41,620
when developers aren't thinking carefully about isolation.

550
00:24:41,620 --> 00:24:45,700
An organization-owned table grants read access at the organization scope,

551
00:24:45,700 --> 00:24:49,780
meaning any user with the right security role can read any record in the table,

552
00:24:49,780 --> 00:24:52,100
regardless of which tenant they belong to.

553
00:24:52,100 --> 00:24:54,820
For internal administrative use that might be appropriate.

554
00:24:54,820 --> 00:24:59,460
For a multi-tenant SaaS where tenant isolation is a requirement, it's the wrong ownership model.

555
00:24:59,460 --> 00:25:04,340
Organization ownership means no row-level boundary between tenants at the Dataverse layer.

556
00:25:04,340 --> 00:25:08,660
User-owned or team-owned tables with business unit scoping are the correct foundation.

557
00:25:08,660 --> 00:25:12,340
The ownership model determines whether per-row isolation is possible at all.

558
00:25:12,340 --> 00:25:15,540
Business unit scoping determines the granularity of that isolation.

559
00:25:15,540 --> 00:25:19,300
One feature deserves specific attention because it solves a real operational problem

560
00:25:19,300 --> 00:25:21,460
in multi-tenant ingestion architectures.

561
00:25:21,460 --> 00:25:24,420
The record ownership across business units capability,

562
00:25:24,420 --> 00:25:27,620
enabled in the Power Platform admin center under environment settings,

563
00:25:27,620 --> 00:25:30,260
decouples record ownership from the owning business unit.

564
00:25:30,260 --> 00:25:34,020
Without this feature, a record owned by a service account in the root business unit

565
00:25:34,020 --> 00:25:37,060
gets scoped to that business unit, which breaks tenant isolation

566
00:25:37,060 --> 00:25:40,420
if the service account is responsible for ingesting data for all tenants.

567
00:25:40,420 --> 00:25:44,580
With this feature enabled, a central service account can own records

568
00:25:44,580 --> 00:25:48,020
while the owning business unit field is set to the correct tenant's business unit.

569
00:25:48,580 --> 00:25:51,620
Users in tenant A's business unit see tenant A's records;

570
00:25:51,620 --> 00:25:54,340
users in tenant B's business unit see tenant B's records.

571
00:25:54,340 --> 00:25:57,540
The service account that created all of them sees everything it's permitted to see

572
00:25:57,540 --> 00:25:58,980
through its own role assignment.

573
00:25:58,980 --> 00:26:02,180
Isolation is preserved without requiring per-tenant service accounts.
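
A minimal Python sketch of that ingestion pattern follows. The tenant-to-business-unit lookup, the service account name, and the field names `owner_id` and `owning_business_unit_id` are placeholders invented for this example, not the actual Dataverse column names or Web API payload shape. It only illustrates one shared identity stamping each record with the correct tenant's business unit before the write.

```python
from typing import Dict

# Hypothetical lookup from tenant id to that tenant's business unit id.
TENANT_BUSINESS_UNITS: Dict[str, str] = {
    "tenant-a": "bu-a",
    "tenant-b": "bu-b",
}
SERVICE_ACCOUNT_ID = "svc-ingest"  # hypothetical shared application identity


def build_record(tenant_id: str, payload: Dict[str, str]) -> Dict[str, str]:
    """Stamp ownership fields on a record before it is written."""
    # Unknown tenants raise KeyError, so nothing is written without a business unit.
    business_unit_id = TENANT_BUSINESS_UNITS[tenant_id]
    record = dict(payload)
    record["owner_id"] = SERVICE_ACCOUNT_ID
    record["owning_business_unit_id"] = business_unit_id
    return record
```

Failing on an unknown tenant, rather than defaulting to some fallback business unit, is a deliberate choice in this sketch: a record with the wrong owning business unit would be visible to the wrong tenant.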

574
00:26:02,180 --> 00:26:05,940
This capability is essential for any ingestion pipeline that operates centrally.

575
00:26:05,940 --> 00:26:10,420
Power Automate flows, Azure Functions, and plug-ins running under a shared application identity.

576
00:26:10,420 --> 00:26:15,380
Without it, the architecture forces you to choose between operational simplicity and security.

577
00:26:15,380 --> 00:26:19,700
With it, you don't have to choose. The Dataverse security layer is now correctly configured,

578
00:26:19,700 --> 00:26:24,820
but isolation on elastic tables has a second dimension that operates below the platform layer

579
00:26:24,820 --> 00:26:28,580
at the storage level itself. Partition keys as a security boundary.

580
00:26:28,580 --> 00:26:32,980
The partition key does two jobs. The first job, distributing data across physical

581
00:26:32,980 --> 00:26:35,940
partitions for performance, is what most architects focus on.

582
00:26:35,940 --> 00:26:40,180
The second job is less discussed, but in a multi-tenant context, it's equally important.

583
00:26:40,180 --> 00:26:44,100
The partition key functions as a logical security boundary at the storage layer.

584
00:26:44,100 --> 00:26:49,460
Here's how that works in practice. When tenant id is the first level of the partition key hierarchy,

585
00:26:49,460 --> 00:26:53,780
every query that includes tenant id in its filter is automatically routed to that tenant's

586
00:26:53,780 --> 00:26:58,900
partitions by Cosmos DB. The routing isn't enforced by application code, it isn't enforced by

587
00:26:58,900 --> 00:27:03,700
a Dataverse security role. It's a structural property of how Cosmos DB resolves queries against

588
00:27:03,700 --> 00:27:07,620
the partition map. A query with a tenant id filter physically cannot reach another tenant's

589
00:27:07,620 --> 00:27:12,340
partitions. There's nothing in that tenant's partition range to return. The storage layer itself

590
00:27:12,340 --> 00:27:17,460
enforces the boundary. This is what provably unbypassable isolation looks like at the storage layer,

591
00:27:17,460 --> 00:27:20,900
and it's the property that makes this architecture capable of satisfying the kind of

592
00:27:20,900 --> 00:27:24,820
audit scrutiny that enterprise customers apply to multi-tenant SaaS platforms.

593
00:27:24,820 --> 00:27:29,060
When a SOC 2 auditor asks how you prevent cross-tenant data access, the answer isn't

594
00:27:29,060 --> 00:27:33,700
"our application always includes a tenant filter." That answer depends on developers getting

595
00:27:33,700 --> 00:27:38,580
it right in every query, in every code path, forever. The answer that passes is

596
00:27:38,580 --> 00:27:44,020
"the partition key is tenant id at level one, and Cosmos DB cannot return data across partition

597
00:27:44,020 --> 00:27:48,420
boundaries for a query that specifies the tenant." So the combination of Dataverse business

598
00:27:48,420 --> 00:27:54,340
unit security and HPK with tenant id at level one gives you defense in depth. Dataverse enforces

599
00:27:54,340 --> 00:27:58,740
the tenant boundary at the platform layer. A user authenticated as tenant A's contact cannot be

600
00:27:58,740 --> 00:28:03,140
assigned to tenant B's web role, cannot own tenant B's records, cannot see through the business

601
00:28:03,140 --> 00:28:08,820
unit scope to rows outside their partition. Cosmos DB enforces the tenant boundary at the storage

602
00:28:08,820 --> 00:28:13,540
layer: queries scoped to tenant id physically touch only that tenant's data. Two independent

603
00:28:13,540 --> 00:28:17,780
mechanisms operating at different layers, both enforcing the same boundary. Now let's name

604
00:28:17,780 --> 00:28:21,700
the failure mode because it happens more often than it should. The risk of misalignment appears

605
00:28:21,700 --> 00:28:26,580
when developers treat elastic tables the way they treat standard Dataverse tables: as a data store

606
00:28:26,580 --> 00:28:30,740
where you write queries based on what you want to see, without thinking explicitly about partition

607
00:28:30,740 --> 00:28:35,380
scope. On a standard table a query that doesn't include a tenant filter might still produce correct

608
00:28:35,380 --> 00:28:39,780
results because Dataverse role-based security filters the output before it reaches the user.

609
00:28:39,780 --> 00:28:45,300
On an elastic table a query that doesn't include tenant id in its filter becomes a cross-partition

610
00:28:45,300 --> 00:28:49,780
scan. Cosmos DB doesn't know what tenant context means unless your query expresses it through

611
00:28:49,780 --> 00:28:55,220
the partition key. The query fans out across physical partitions, the RU cost multiplies, performance

612
00:28:55,220 --> 00:29:00,020
degrades, and depending on your security role configuration, records from multiple tenants might

613
00:29:00,020 --> 00:29:04,420
appear in the result set. That's not a theoretical vulnerability. It's a practical design failure that

614
00:29:04,420 --> 00:29:08,420
shows up when developers build their first elastic table feature by copying query patterns from

615
00:29:08,420 --> 00:29:12,500
existing standard table implementations without accounting for the fact that the storage model

616
00:29:12,500 --> 00:29:16,500
is different. The security role might still enforce the correct output in many cases,

617
00:29:16,500 --> 00:29:20,420
but the cross-partition scan has already happened, the RU cost has already been paid,

618
00:29:20,420 --> 00:29:25,460
and you've introduced a query pattern that could become a security gap if role configuration changes.

619
00:29:25,460 --> 00:29:30,900
The defensive practice is simple but requires consistency. Every query against an elastic table must

620
00:29:30,900 --> 00:29:36,580
include the partition key at level one in its filter. Every query, without exception. This isn't an

621
00:29:36,580 --> 00:29:41,620
optional optimization. It's the mechanism that keeps both performance and security working correctly.
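
One way to enforce that rule in code is a guard that refuses to build a query without the level-one key. The Python sketch below is an assumption-laden illustration: the `tenant_id` filter name and the dictionary-based filter shape are hypothetical, and a real implementation would sit wherever your data access layer constructs elastic table queries.

```python
from typing import Dict

PARTITION_KEY_LEVEL_ONE = "tenant_id"  # hypothetical filter name for level one


def scoped_filter(filters: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the filters, refusing any query that lacks the tenant scope."""
    if not filters.get(PARTITION_KEY_LEVEL_ONE):
        raise ValueError("elastic table queries must filter on tenant_id")
    return dict(filters)
```

Routing every elastic table query through a single guard like this turns the "every query, without exception" rule into something a code review or test can check, rather than something each developer has to remember.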

622
00:29:41,620 --> 00:29:46,660
When the filter is present the query is fast, cheap, and inherently scoped. When it's absent you're

623
00:29:46,660 --> 00:29:50,900
relying on the platform layer alone to enforce tenant boundaries, which breaks the

624
00:29:50,900 --> 00:29:56,020
defense-in-depth model and produces expensive queries in the process. The architecture relies on both layers

625
00:29:56,020 --> 00:30:00,740
working together. The next question is how you actually implement that at the row level across

626
00:30:00,740 --> 00:30:06,100
millions of records and hundreds of tenants. Implementing row-level security on elastic tables.

627
00:30:06,100 --> 00:30:10,580
Row-level security in Dataverse isn't a single switch you flip. It's the combined output of four

628
00:30:10,580 --> 00:30:15,620
separate mechanisms operating simultaneously. Record ownership, business unit hierarchy,

629
00:30:15,620 --> 00:30:20,020
security role scopes, and team membership. Each one contributes to the access decision that

630
00:30:20,020 --> 00:30:24,900
Dataverse makes when a user queries a table. Understanding how they interact on elastic tables,

631
00:30:24,900 --> 00:30:29,780
specifically at the volumes that make elastic tables necessary, determines whether your implementation

632
00:30:29,780 --> 00:30:34,180
actually holds under production load. The first thing to get clear is what doesn't work at scale.

633
00:30:34,180 --> 00:30:38,820
Explicit per-record sharing, where you share individual records with specific users or teams,

634
00:30:38,820 --> 00:30:42,260
is a viable pattern for transactional business data with moderate record counts.

635
00:30:42,260 --> 00:30:46,900
It breaks completely on elastic tables where you might have millions of rows accumulating across

636
00:30:46,900 --> 00:30:52,020
hundreds of tenants. The overhead of maintaining sharing relationships at that volume is prohibitive.

637
00:30:52,020 --> 00:30:56,660
Every explicit share is a record in the system. Millions of shares across millions of rows

638
00:30:56,660 --> 00:31:00,420
creates its own performance problem, independent of the underlying elastic table.

639
00:31:00,420 --> 00:31:04,900
Per-record sharing is not the mechanism for this workload. The scalable pattern looks different.

640
00:31:04,900 --> 00:31:09,380
The business unit hierarchy does the heavy lifting and the key is that it operates at query time

641
00:31:09,380 --> 00:31:13,540
without per-record configuration. When a security role grants read access at the business unit

642
00:31:13,540 --> 00:31:19,300
scope for an elastic table, every query that user executes is automatically filtered to records

643
00:31:19,300 --> 00:31:24,500
owned by their business unit. No application-level filter required, no per-record permission entry

644
00:31:24,500 --> 00:31:29,540
required. The scope is a property of the role applied uniformly across every record in the table.

645
00:31:29,540 --> 00:31:33,940
That's the mechanism that makes millions of rows manageable. The security boundary is enforced

646
00:31:33,940 --> 00:31:38,500
structurally, not record by record. Practical implementation follows a consistent pattern:

647
00:31:38,500 --> 00:31:43,620
create one business unit per tenant in your Dataverse environment, assign all users belonging to

648
00:31:43,620 --> 00:31:47,940
that tenant to their corresponding business unit, create a security role that grants read at

649
00:31:47,940 --> 00:31:52,980
business unit scope for the elastic table. Add write and create if users need to submit records,

650
00:31:52,980 --> 00:31:58,500
but read is always the critical access control for isolation. Assign that role to users in each tenant's

651
00:31:58,500 --> 00:32:03,060
business unit. The isolation is now structural: users in tenant A's business unit see tenant A's

652
00:32:03,060 --> 00:32:07,860
records; users in tenant B's business unit see tenant B's records. Neither group can reach the

653
00:32:07,860 --> 00:32:12,020
other's data regardless of how they query. The ingestion pipeline is where implementation

654
00:32:12,020 --> 00:32:16,180
discipline matters most because this is where the owning business unit field gets set,

655
00:32:16,180 --> 00:32:20,340
and getting it wrong at write time means the security model doesn't work correctly at read time.

656
00:32:20,340 --> 00:32:24,660
Whether you're ingesting through Power Automate flows, Azure Functions calling the

657
00:32:24,660 --> 00:32:29,860
Dataverse Web API, or a plug-in executing in the platform, the pipeline must derive the correct

658
00:32:29,860 --> 00:32:34,500
tenant business unit from the tenant context established during authentication, then set the owning

659
00:32:34,500 --> 00:32:39,060
business unit field on every elastic table record before it's written. Not as an afterthought,

660
00:32:39,060 --> 00:32:44,420
not as a batch correction after the fact, but at creation time, because that field drives every subsequent

661
00:32:44,420 --> 00:32:48,980
access decision. For implementations where a central service account handles ingestion for multiple

662
00:32:48,980 --> 00:32:52,740
tenants, which is the common pattern when you're using a single Azure Function or a shared Power

663
00:32:52,740 --> 00:32:58,020
Automate flow to process events from all tenants, the record ownership across business units feature

664
00:32:58,020 --> 00:33:02,580
is what makes this clean. The service account owns the record, the owning business unit field is set

665
00:33:02,580 --> 00:33:07,540
to the tenant's business unit, and Dataverse enforces access based on the owning business unit, not the

666
00:33:07,540 --> 00:33:11,860
owner's business unit. The central ingestion account can write for every tenant without needing a

667
00:33:11,860 --> 00:33:16,580
separate application identity per tenant, and users still see only records scoped to their own business

668
00:33:16,580 --> 00:33:22,420
unit. Field-level security extends this model to column granularity. On an elastic table, you can

669
00:33:22,420 --> 00:33:27,540
mark specific columns as secured, applying field security profiles that restrict read, create,

670
00:33:27,540 --> 00:33:32,820
and update access independently of the role-level permissions. If certain columns carry information

671
00:33:32,820 --> 00:33:38,420
that only specific roles should ever see, internal cost data, flagged risk scores, system-generated

672
00:33:38,420 --> 00:33:42,980
identifiers used in downstream integrations, those columns can be restricted without affecting

673
00:33:42,980 --> 00:33:47,860
access to the rest of the record. The role-level boundary handles tenant isolation, field-level

674
00:33:47,860 --> 00:33:52,340
security handles sensitivity within the tenant's own data. The security model is now enforced at

675
00:33:52,340 --> 00:33:58,180
the data layer. The architecture needs a front end. Power Pages as a SaaS front end: what actually

676
00:33:58,180 --> 00:34:02,980
works. The security model is enforced at the data layer. Now the architecture needs a front end,

677
00:34:02,980 --> 00:34:06,580
and this is where most implementations run into a constraint they didn't plan for.

678
00:34:06,580 --> 00:34:11,140
Power Pages is not a data ingestion gateway. It's not designed to handle bulk write operations,

679
00:34:11,140 --> 00:34:15,780
high-frequency telemetry streams, or heavy back-end processing. What it is designed for,

680
00:34:15,780 --> 00:34:20,660
and does well, is serving authenticated users a fast, secure, permission-aware interface to

681
00:34:20,660 --> 00:34:26,100
Dataverse data. Those are different jobs, and confusing them produces architectures that fail

682
00:34:26,100 --> 00:34:30,580
under load in ways that are difficult to debug after the fact. Here's the constraint you're working

683
00:34:30,580 --> 00:34:36,260
within. Authenticated Power Pages users receive approximately 200 to 400 Dataverse requests per day

684
00:34:36,260 --> 00:34:41,220
as part of their portal capacity entitlement. Anonymous users receive approximately 80.
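
For a sense of scale, the per-user range can be turned into a site-wide daily budget with a few lines of Python. This assumes the per-user allowances simply add up into one shared pool; the range comes from the approximate figures quoted here, and the user count in the example is illustrative.

```python
from typing import Tuple

# Approximate per-user daily range quoted in the episode.
AUTHENTICATED_REQUESTS_PER_DAY: Tuple[int, int] = (200, 400)


def pooled_daily_requests(authenticated_users: int) -> Tuple[int, int]:
    """Low and high estimates of the shared daily Dataverse request pool."""
    low, high = AUTHENTICATED_REQUESTS_PER_DAY
    return authenticated_users * low, authenticated_users * high


low_estimate, high_estimate = pooled_daily_requests(1000)
```

For 1,000 authenticated users this gives a pool of somewhere between 200,000 and 400,000 requests per day, which every page load and client-side call then draws down.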

685
00:34:41,220 --> 00:34:46,180
Those numbers aren't per session or per hour. They are per 24-hour period, pooled across the site

686
00:34:46,180 --> 00:34:51,300
based on licensed capacity. A site with 1,000 authenticated users has up to roughly 400,000

687
00:34:51,300 --> 00:34:55,540
Dataverse requests per day available across the entire portal. Every page load that queries

688
00:34:55,540 --> 00:35:00,580
Dataverse, every form submission, every client-side web API call from JavaScript, all of it draws

689
00:35:00,580 --> 00:35:04,580
from that pool. This is the constraint that shapes every front end design decision, not as a

690
00:35:04,580 --> 00:35:08,500
limitation to work around, but as an engineering discipline that produces better performing

691
00:35:08,500 --> 00:35:14,020
portals than you'd build without it. The Power Pages Web API is the right tool for what users

692
00:35:14,020 --> 00:35:18,900
directly interact with. When a user loads a dashboard, the web API retrieves their records client-side

693
00:35:18,900 --> 00:35:23,380
via JavaScript, enforcing table permissions and web roles automatically without any additional

694
00:35:23,380 --> 00:35:28,100
security configuration. When a user submits a form, the web API creates or updates a record,

695
00:35:28,100 --> 00:35:32,180
and the permission check happens at the platform layer. The developer doesn't write authorization

696
00:35:32,180 --> 00:35:36,740
logic. The portal runtime handles it based on the web role assignments established in the security

697
00:35:36,740 --> 00:35:41,620
model. This is the correct use of the Web API: user-initiated reads and writes,

698
00:35:41,620 --> 00:35:45,700
scoped to what the authenticated contact is permitted to see. For operations that require

699
00:35:45,700 --> 00:35:50,740
throughput, complexity, or cross-entity processing, such as bulk writes, validation logic that spans multiple

700
00:35:50,740 --> 00:35:56,180
tables, calculations that aggregate data across tenant records, the Web API is the wrong layer.

701
00:35:56,180 --> 00:36:00,980
Power Pages server logic is where those operations belong. Server logic executes within the Power

702
00:36:00,980 --> 00:36:05,460
Platform infrastructure, closer to Dataverse, without the browser-to-portal-to-Dataverse round trip

703
00:36:05,460 --> 00:36:10,980
that adds latency to every client-side call. Architecture guidance from Microsoft and from practitioners

704
00:36:10,980 --> 00:36:15,700
who've built production implementations consistently identifies server logic as the lowest-latency

705
00:36:15,700 --> 00:36:20,580
path for backend operations in a Power Pages solution. You're not bypassing the platform;

706
00:36:20,580 --> 00:36:24,740
you're using the part of it designed for this exact purpose. The architectural principle that

707
00:36:24,740 --> 00:36:29,060
follows from this is worth stating explicitly because it determines how you decompose features.

708
00:36:29,060 --> 00:36:33,620
Anything the user directly sees or interacts with belongs in the web API layer. Anything that

709
00:36:33,620 --> 00:36:38,180
requires backend processing, queuing, or high-volume data movement belongs in server logic or in

710
00:36:38,180 --> 00:36:42,980
services operating outside the portal runtime entirely. Those aren't competing patterns.

711
00:36:42,980 --> 00:36:47,380
They're complementary layers of the same architecture, each handling what it's actually suited for.

712
00:36:47,380 --> 00:36:51,140
There's one more control mechanism that belongs in every production Power Pages deployment

713
00:36:51,140 --> 00:36:57,540
handling elastic table data: web application firewall rate limiting on the Web API endpoint.

714
00:36:57,540 --> 00:37:01,860
The WAF sits in front of the portal and can enforce per-client request limits before those requests

715
00:37:01,860 --> 00:37:07,060
reach Dataverse. Without it, a poorly written client-side script, a bot, or a single user running an

716
00:37:07,060 --> 00:37:12,100
automated process can drain the site's daily request pool before legitimate users get their share.

717
00:37:12,100 --> 00:37:17,060
With a rate limit configured, 50 calls per minute per client, for example, that scenario becomes

718
00:37:17,060 --> 00:37:21,940
impossible. The WAF rule fires, the abusive client gets blocked, and the request budget remains

719
00:37:21,940 --> 00:37:26,500
available for actual users. It's a straightforward configuration with a significant protective effect,

720
00:37:26,500 --> 00:37:30,820
and skipping it is the kind of oversight that produces capacity incidents in production.
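
The arithmetic behind the request pool, and the effect of a per-client limit, can be sketched in a few lines of Python. This is an illustration only: the real control is a WAF rule configured in front of the portal, not application code. The class name, the sliding-window approach, and the use of the upper 400-requests-per-user figure are assumptions made for the example.

```python
from collections import defaultdict, deque

# Figures quoted in the episode (upper end of the 200-400 range); treat as approximate.
REQUESTS_PER_USER_PER_DAY = 400
LICENSED_USERS = 1_000
DAILY_POOL = REQUESTS_PER_USER_PER_DAY * LICENSED_USERS  # pooled across the whole site


class PerClientRateLimiter:
    """Sliding-window limiter: at most `limit` calls per `window_s` seconds per client."""

    def __init__(self, limit=50, window_s=60.0):
        self.limit = limit
        self.window_s = window_s
        self._calls = defaultdict(deque)  # client id -> timestamps of accepted calls

    def allow(self, client_id, now):
        calls = self._calls[client_id]
        # Forget calls that have aged out of the window.
        while calls and now - calls[0] >= self.window_s:
            calls.popleft()
        if len(calls) >= self.limit:
            return False  # the point where a WAF rule would block the client
        calls.append(now)
        return True


def max_daily_calls_per_client(limit=50, window_s=60.0):
    """Worst case a single client can consume in 24 hours under the limit."""
    return int(limit * (86_400 / window_s))
```

Under these assumptions, a 50-calls-per-minute rule caps any single client at 72,000 calls in a day, well under the 400,000-request pool, so one runaway script can no longer exhaust the budget by itself.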

721
00:37:30,820 --> 00:37:34,660
The request model is understood. Now the question is how to make the data retrieval side of the

722
00:37:34,660 --> 00:37:39,140
front end genuinely fast within those boundaries. Designing high-performance data retrieval from

723
00:37:39,140 --> 00:37:43,620
elastic tables. The request model defines your budget; how you spend it determines whether

724
00:37:43,620 --> 00:37:48,420
users experience a fast portal or a frustrating one. And on elastic tables specifically, the retrieval

725
00:37:48,420 --> 00:37:54,500
decisions you make at design time have compounding effects. They determine latency, RU consumption,

726
00:37:54,500 --> 00:37:58,660
and whether the partition key architecture you've built actually delivers the performance it's

727
00:37:58,660 --> 00:38:04,020
capable of. Start with the most consequential choice: server-side rendering via Liquid versus

728
00:38:04,020 --> 00:38:09,220
asynchronous client-side loading via the Web API. For small data sets, a few hundred records or fewer,

729
00:38:09,220 --> 00:38:14,260
this decision barely matters. Community testing comparing Liquid and the Web API on the same Power

730
00:38:14,260 --> 00:38:18,580
Pages page with the same underlying data found roughly one second of load time for both approaches

731
00:38:18,580 --> 00:38:22,900
at that scale. Neither has a meaningful advantage when the data volume is low enough that the page

732
00:38:22,900 --> 00:38:27,860
assembles quickly regardless of where rendering happens. The picture changes dramatically as data

733
00:38:27,860 --> 00:38:33,620
volume grows. At 10,000 records, Liquid rendering takes between 22 and 30 seconds before the page

734
00:38:33,620 --> 00:38:38,020
becomes usable. The user sits looking at a blank or partially loaded page waiting for the server to

735
00:38:38,020 --> 00:38:43,540
fetch all the data, build the HTML, and return the entire rendered response. The Web API path

736
00:38:43,540 --> 00:38:49,460
loads the page shell almost instantly. The user sees the layout, the navigation, the loading indicator,

737
00:38:49,460 --> 00:38:54,180
and the data arrives asynchronously in six to eight seconds. Perceived performance is not the same

738
00:38:54,180 --> 00:38:59,460
as actual data load time, and the Web API approach exploits that gap deliberately. Users tolerate

739
00:38:59,460 --> 00:39:04,340
waiting for content to appear inside a page they can already see far better than they tolerate waiting

740
00:39:04,340 --> 00:39:08,340
for a page that hasn't loaded at all. The architectural pattern that follows is clear.

741
00:39:08,340 --> 00:39:13,220
Render the page shell immediately, then load elastic table data asynchronously

742
00:39:13,220 --> 00:39:18,020
via the Web API, with the loading indicator visible to the user from the first moment the page

743
00:39:18,020 --> 00:39:22,660
appears. This isn't a workaround for a performance limitation. It's the correct architecture for

744
00:39:22,660 --> 00:39:29,140
any data-heavy portal view, and it happens to align perfectly with how the Web API is designed to be used.

745
00:39:29,140 --> 00:39:33,620
Now, within that asynchronous loading pattern, query discipline determines whether you're spending

746
00:39:33,620 --> 00:39:39,780
your request budget efficiently or wastefully. Three OData query options do most of the work. Use

747
00:39:39,780 --> 00:39:44,740
$select to retrieve only the columns your UI actually displays, not star selects that return every

748
00:39:44,740 --> 00:39:49,460
field on the record. Use $filter to narrow the result set before it leaves the server, not after it

749
00:39:49,460 --> 00:39:54,100
arrives in the browser. Use $top combined with pagination to load records in pages of manageable

750
00:39:54,100 --> 00:39:59,140
size rather than pulling the full data set in a single call. These three together reduce payload

751
00:39:59,140 --> 00:40:03,700
size, reduce server-side processing, and reduce RU consumption on the Cosmos DB side. Each one

752
00:40:03,700 --> 00:40:08,420
independently helps. All three together compound the benefit. For elastic tables specifically,

753
00:40:08,420 --> 00:40:13,140
there's a fourth requirement that overrides everything else. Every query must include the partition

754
00:40:13,140 --> 00:40:18,820
key in the filter. Every single one. The partition ID field or the tenant ID that maps to level one

755
00:40:18,820 --> 00:40:23,780
of your HPK hierarchy must appear in the filter expression on every call. Without it, the query

756
00:40:23,780 --> 00:40:28,420
becomes a cross-partition scan. Cosmos DB fans out across physical partitions looking for matching

757
00:40:28,420 --> 00:40:33,300
records. RU consumption multiplies in proportion to how many partitions exist, and response time increases

758
00:40:33,300 --> 00:40:38,420
accordingly. With it, the query routes directly to the relevant partitions, executes in bounded time

759
00:40:38,420 --> 00:40:43,700
with predictable RU cost, and returns quickly. This is the elastic table query discipline that separates

760
00:40:43,700 --> 00:40:48,500
implementations that perform well from implementations that look fine in development, where data sets

761
00:40:48,500 --> 00:40:52,740
are small, and collapse under production load when data accumulates across many tenants.
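
The query discipline above can be sketched as a small Python helper that refuses to build a query without a partition filter. The `/_api/` path and the `partitionid` column follow the Power Pages Web API and elastic table conventions as I understand them; the function name, the `contoso_` table and column names, and the default page size are hypothetical.

```python
from urllib.parse import quote


def build_scoped_query(entity_set, tenant_id, columns, page_size=50, extra_filter=None):
    """Return a relative Web API URL that always filters on the partition key."""
    if not tenant_id:
        raise ValueError("tenant_id is required; an unscoped query becomes a cross-partition scan")
    if not columns:
        raise ValueError("name the columns explicitly instead of selecting everything")

    escaped_tenant = tenant_id.replace("'", "''")  # OData escapes a quote by doubling it
    filter_expr = f"partitionid eq '{escaped_tenant}'"
    if extra_filter:
        filter_expr += f" and ({extra_filter})"

    params = [
        ("$select", ",".join(columns)),  # only the columns the UI displays
        ("$filter", filter_expr),        # narrowed on the server, partition first
        ("$top", str(page_size)),        # one page at a time
    ]
    safe_chars = ",'()"
    query = "&".join(f"{name}={quote(value, safe=safe_chars)}" for name, value in params)
    return f"/_api/{entity_set}?{query}"
```

For example, `build_scoped_query("contoso_activities", "tenant-a", ["contoso_name", "createdon"], page_size=25)` yields a URL whose `$filter` starts with the partition key. Centralizing URL construction in one helper like this is one way to make "every single query" enforceable rather than a convention.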

762
00:40:52,740 --> 00:40:58,900
Server-side caching is the final lever worth pulling aggressively. The Power Pages Web API supports

763
00:40:58,900 --> 00:41:03,860
caching repeated identical requests server-side, which means data that many users request in the same

764
00:41:03,860 --> 00:41:09,300
form (a product catalog, a shared configuration view, a list of available options) can be served

765
00:41:09,300 --> 00:41:13,940
from cache rather than triggering a fresh Dataverse query for each user session. For read-heavy

766
00:41:13,940 --> 00:41:18,260
portal views where the underlying data changes infrequently, caching converts what would be dozens of

767
00:41:18,260 --> 00:41:23,940
individual Dataverse requests into a single cached response. Against a constrained daily request

768
00:41:23,940 --> 00:41:29,220
budget, that multiplier matters considerably. Fast retrieval handles what users see. The write

769
00:41:29,220 --> 00:41:33,780
side of the architecture operates on completely different principles. Write architecture:

770
00:41:33,780 --> 00:41:38,580
getting data into elastic tables at scale. The retrieval architecture is designed, the security

771
00:41:38,580 --> 00:41:43,460
model is enforced. Now the question is how data actually gets into elastic tables at the volumes

772
00:41:43,460 --> 00:41:47,700
that justify using them, because the write path is where the most common architectural mistakes

773
00:41:47,700 --> 00:41:51,380
happen, and they tend to be invisible until the system is under real load.

774
00:41:51,380 --> 00:41:56,580
The starting point is a clear distinction between two categories of writes that a SaaS portal

775
00:41:56,580 --> 00:42:01,060
handles, because they have fundamentally different architectural requirements. The first category is

776
00:42:01,060 --> 00:42:06,260
user-initiated writes. A user submits a form, a user saves a configuration change, a user creates a

777
00:42:06,260 --> 00:42:11,220
record through a portal interaction. These are synchronous, low-frequency, single-record operations

778
00:42:11,220 --> 00:42:15,300
triggered by deliberate human action. The Power Pages Web API handles these correctly:

779
00:42:15,300 --> 00:42:20,500
a POST to the Web API endpoint creates the record, table permissions enforce that the user is allowed

780
00:42:20,500 --> 00:42:25,780
to write to that table, the owning business unit gets set based on the tenant context, and the operation

781
00:42:25,780 --> 00:42:30,900
completes. This is exactly the workload the Web API is designed for, and it works well within it.

782
00:42:30,900 --> 00:42:35,700
The second category is where the architecture diverges: high-volume, system-generated writes,

783
00:42:35,700 --> 00:42:40,820
telemetry events, clickstream records, audit trail entries, activity feed updates generated by

784
00:42:40,820 --> 00:42:45,540
background processes. These don't originate from user button clicks; they originate from the system's

785
00:42:45,540 --> 00:42:50,660
own operation, and the volume profile is completely different. A single user session might generate dozens

786
00:42:50,660 --> 00:42:55,220
or hundreds of events. Across a platform with thousands of active users, that's millions of records

787
00:42:55,220 --> 00:42:59,780
per day accumulating in your elastic tables. The portal is not the ingestion point for this category;

788
00:42:59,780 --> 00:43:04,820
it's the display layer. Attempting to write high-volume operational data directly from the browser

789
00:43:04,820 --> 00:43:10,420
through the Power Pages Web API burns the daily request entitlement described in the previous

790
00:43:10,420 --> 00:43:15,780
section, creates contention between ingestion traffic and user interaction traffic, and doesn't use the

791
00:43:15,780 --> 00:43:21,060
API that actually delivers the performance advantage. That API is CreateMultiple. CreateMultiple

792
00:43:21,060 --> 00:43:26,260
is the mechanism that makes the two to ten times bulk performance improvement real. It groups multiple

793
00:43:26,260 --> 00:43:31,380
records into a single Dataverse request, and Cosmos DB processes those writes with the horizontal

794
00:43:31,380 --> 00:43:36,260
distribution across partitions that gives elastic tables their throughput advantage. Single-record

795
00:43:36,260 --> 00:43:40,660
writes via the portal Web API don't unlock that distribution; you're just writing one record

796
00:43:40,660 --> 00:43:45,780
per round trip, which is effectively the same throughput profile as a standard table. CreateMultiple

797
00:43:45,780 --> 00:43:51,300
writes dozens or hundreds of records in one call, and the Cosmos DB storage layer distributes them

798
00:43:51,300 --> 00:43:56,260
across partitions simultaneously. That's where the performance gap becomes a practical system

799
00:43:56,260 --> 00:44:01,220
property rather than a benchmark number. The correct write architecture for high-volume data

800
00:44:01,220 --> 00:44:06,740
looks like this. User actions in Power Pages trigger lightweight events: small API calls that acknowledge

801
00:44:06,740 --> 00:44:11,220
an action occurred and queue it for processing. A back-end service, either an Azure Function or a

802
00:44:11,220 --> 00:44:16,020
Power Automate flow with a premium trigger, pulls from that queue, assembles records into batches,

803
00:44:16,020 --> 00:44:20,660
and issues CreateMultiple calls against the elastic table. The back-end service runs under a

804
00:44:20,660 --> 00:44:25,300
service principal account, which means its Dataverse request consumption comes from tenant-level

805
00:44:25,300 --> 00:44:30,900
limits rather than the per-user portal entitlements. The portal stays fast for users. The ingestion

806
00:44:30,900 --> 00:44:35,220
pipeline operates independently, without competing for the same request budget. Service principal

807
00:44:35,220 --> 00:44:40,660
accounts are worth a brief note on configuration. They authenticate against Dataverse using application

808
00:44:40,660 --> 00:44:45,540
identity rather than user identity, which means they're not subject to the 200 to 400 daily

809
00:44:45,540 --> 00:44:50,340
request ceiling that portal users face. Their consumption counts against the environment's tenant

810
00:44:50,340 --> 00:44:55,940
level limit, which at typical Dynamics 365 or Power Apps licensing scales is orders of magnitude

811
00:44:55,940 --> 00:45:01,060
larger. Isolating high-volume back-end ingestion behind a service principal is therefore both a

812
00:45:01,060 --> 00:45:05,940
throughput decision and a capacity management decision. The partition key assignment happens in

813
00:45:05,940 --> 00:45:10,340
this back-end pipeline, and it cannot be delegated to a later step. When a record is created in an

814
00:45:10,340 --> 00:45:15,780
elastic table, the partition ID is set at that moment and becomes immutable. The ingestion pipeline

815
00:45:15,780 --> 00:45:21,540
must derive the correct partition key value, typically tenant ID at level one of the HPK hierarchy,

816
00:45:21,540 --> 00:45:26,580
from the tenant context established upstream in the processing chain before the CreateMultiple call

817
00:45:26,580 --> 00:45:31,220
is issued. Getting this right in the pipeline means the security model and the query performance model

818
00:45:31,220 --> 00:45:35,620
both work correctly from the moment the record exists. The write architecture is designed around

819
00:45:35,620 --> 00:45:39,940
batching and back-end services, but a third data pattern requires a different approach entirely.
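
The batching step of the ingestion pipeline described above can be sketched in Python as follows. It only assembles request bodies; sending them (a POST to the table's `Microsoft.Dynamics.CRM.CreateMultiple` action under a service principal token, as I understand the Dataverse Web API) is left out. The event shape, the function names, the `contoso_activity` table name, and the batch size of 100 are assumptions for illustration.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def build_create_multiple_payloads(events, table_logical_name, batch_size=100):
    """Group queued events into CreateMultiple request bodies.

    Each event is assumed to look like
    {"tenant_id": "...", "columns": {...}} (a hypothetical queue message shape).
    """
    payloads = []
    for batch in chunked(events, batch_size):
        targets = []
        for event in batch:
            record = {"@odata.type": f"Microsoft.Dynamics.CRM.{table_logical_name}"}
            record.update(event["columns"])
            # The partition key is set at creation and cannot be changed later,
            # so it is derived here from the tenant context, never left to a later step.
            record["partitionid"] = event["tenant_id"]
            targets.append(record)
        payloads.append({"Targets": targets})
    return payloads
```

With 250 queued events and the default batch size, this produces three request bodies instead of 250 single-record writes, which is the trade the episode is describing.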

820
00:45:39,940 --> 00:45:46,340
JSON columns: schema flexibility without schema chaos. Every Dataverse column you define is a

821
00:45:46,340 --> 00:45:51,380
contract. The data type is fixed, the field length is fixed, the schema is locked into your solution,

822
00:45:51,380 --> 00:45:56,100
deployed through an ALM pipeline, and shared across every tenant in the environment. That rigidity is

823
00:45:56,100 --> 00:46:00,820
a feature when your data model is stable and your tenants are homogeneous. It becomes a liability when

824
00:46:00,820 --> 00:46:06,100
different customers need different data shapes, and in B2B SaaS they almost always do. Tenant A wants

825
00:46:06,100 --> 00:46:11,220
to track five custom attributes on their activity records. Tenant B needs 12 different ones, none of

826
00:46:11,220 --> 00:46:15,700
which overlap with Tenant A's. Tenant C has requirements that won't be defined until their onboarding

827
00:46:15,700 --> 00:46:20,740
call next month. In a standard Dataverse schema, accommodating this means either adding dozens of columns

828
00:46:20,740 --> 00:46:25,540
to cover every possible combination across every tenant, most of which are empty for most tenants,

829
00:46:25,540 --> 00:46:30,740
or building a separate metadata layer that maps custom fields to generic value columns. Both

830
00:46:30,740 --> 00:46:36,020
approaches work. Neither is clean. Both create ongoing schema management overhead that compounds as

831
00:46:36,020 --> 00:46:41,380
your tenant count grows. Elastic tables offer a third option. String columns formatted as JSON can

832
00:46:41,380 --> 00:46:46,660
sit alongside typed columns in the same table, holding semi-structured, tenant-specific data without

833
00:46:46,660 --> 00:46:51,940
requiring a schema change every time a tenant's requirements evolve. The typed columns, such as tenant ID,

834
00:46:51,940 --> 00:46:56,900
timestamp, records data, the fields you query against, stay first class. The flexible metadata that

835
00:46:56,900 --> 00:47:02,180
varies by tenant lives in the JSON column, readable and writable through normal Dataverse API calls,

836
00:47:02,180 --> 00:47:06,980
queryable through Cosmos SQL when you need to reach inside the structure. The performance case

837
00:47:06,980 --> 00:47:11,860
for this pattern goes beyond schema flexibility. Consider what it costs at the API layer to store and

838
00:47:11,860 --> 00:47:16,980
retrieve a user's activity feed as individual records versus a single document. If each activity

839
00:47:16,980 --> 00:47:23,060
item is its own elastic table row, reading a user's last 500 activities requires a query that returns

840
00:47:23,060 --> 00:47:29,860
500 records: 500 items in the result set, 500 rows deserialized and transmitted. Storing those 500 items

841
00:47:29,860 --> 00:47:35,860
as a JSON array in a single row means one record retrieved, one API call, one round trip. The 365

842
00:47:35,860 --> 00:47:40,900
on training implementation demonstrates exactly this: a single elastic table row holding an entire

843
00:47:40,900 --> 00:47:46,740
user activity feed as a JSON array, stored and retrieved in a single operation. At the request

844
00:47:46,740 --> 00:47:51,220
budget constraints described in the previous sections, the difference between 500 API calls and

845
00:47:51,220 --> 00:47:55,860
one isn't a micro-optimization. It determines whether a feature is viable within the portal's daily

846
00:47:55,860 --> 00:48:01,380
entitlement. Updating a JSON document follows the same arithmetic: one write to update the array,

847
00:48:01,380 --> 00:48:06,100
regardless of how many items it contains. Compared to individual row creates and updates at that volume,

848
00:48:06,100 --> 00:48:11,140
the reduction in API consumption is significant. But the security risk of JSON on elastic tables is

849
00:48:11,140 --> 00:48:15,460
concrete and needs to be understood before you adopt the pattern. Dataverse column-level security

850
00:48:15,460 --> 00:48:19,860
operates at the column boundary. You can mark a column as secured, apply a field security profile

851
00:48:19,860 --> 00:48:24,500
to it, and restrict which roles can read or write that column. What you cannot do is apply that

852
00:48:24,500 --> 00:48:29,860
restriction selectively to individual keys inside a JSON payload stored in that column. The

853
00:48:29,860 --> 00:48:34,820
security model sees the column as a single unit. A user with read access to the JSON column sees

854
00:48:34,820 --> 00:48:39,620
the entire JSON content: every key, every value, everything stored in that structure. There's no

855
00:48:39,620 --> 00:48:44,500
mechanism within Dataverse to say that this user can see the preferences key but not the internal

856
00:48:44,500 --> 00:48:49,300
risk score key. The column is either readable or it isn't. This means the design boundary between

857
00:48:49,300 --> 00:48:53,940
typed columns and JSON columns is not a matter of preference. It's a security requirement. JSON

858
00:48:53,940 --> 00:49:00,180
columns are for non-sensitive, flexible metadata: configuration preferences, display settings, tenant

859
00:49:00,180 --> 00:49:04,980
specific feature flags, optional attributes that vary in structure across customers. Everything

860
00:49:04,980 --> 00:49:09,620
sensitive, such as identifiers used for compliance, financial data, personal information, anything governed by

861
00:49:09,620 --> 00:49:14,740
field-level security requirements, belongs in first-class typed columns, where column-level security

862
00:49:14,740 --> 00:49:19,380
can be applied with precision. The discipline this requires isn't burdensome. It's a simple

863
00:49:19,380 --> 00:49:24,420
categorization rule: if you'd want to restrict who can see it, it's not JSON. If it's tenant-specific

864
00:49:24,420 --> 00:49:29,380
metadata that any user of that tenant can appropriately access, JSON is the right home for it. That

865
00:49:29,380 --> 00:49:34,740
boundary, consistently enforced, gives you schema flexibility without the governance exposure that

866
00:49:34,740 --> 00:49:39,060
comes from treating JSON columns as a general-purpose container for everything that doesn't fit

867
00:49:39,060 --> 00:49:45,300
the relational model. ExecuteCosmosSqlQuery: the analytical layer. JSON columns solve the schema

868
00:49:45,300 --> 00:49:50,180
flexibility problem, but storing data in JSON creates an immediate query problem. Standard

869
00:49:50,180 --> 00:49:55,380
Dataverse queries don't understand what's inside them. FetchXML, OData, Advanced Find: every

870
00:49:55,380 --> 00:50:00,340
conventional Dataverse query mechanism treats a JSON column as an opaque text string. You can

871
00:50:00,340 --> 00:50:05,220
retrieve the column's content. You can filter on whether the string is null or not null. What you

872
00:50:05,220 --> 00:50:10,420
cannot do is filter on a property inside the JSON, sort by a nested value, or project specific

873
00:50:10,420 --> 00:50:15,060
keys out of the document. The query engine sees text. It has no concept of structure within that

874
00:50:15,060 --> 00:50:19,940
text, which means if your JSON column contains a user's activity feed and you want to find all

875
00:50:19,940 --> 00:50:25,300
activities of a specific type within that feed, you can't express that filter in standard Dataverse

876
00:50:25,300 --> 00:50:30,580
query terms. You fetch the entire column and parse it client-side, which defeats most of the performance

877
00:50:30,580 --> 00:50:34,980
advantage of storing the data in JSON in the first place. ExecuteCosmosSqlQuery is the

878
00:50:34,980 --> 00:50:40,980
mechanism that resolves this. It's a Dataverse action that passes Cosmos DB SQL syntax directly

879
00:50:40,980 --> 00:50:46,100
to the underlying storage layer, allowing queries that reach inside JSON column structures and return

880
00:50:46,100 --> 00:50:51,940
results based on properties nested within the document. The syntax is Cosmos SQL, familiar to anyone

881
00:50:51,940 --> 00:50:56,340
who's worked with Cosmos DB directly, and it operates against the actual document model that elastic

882
00:50:56,340 --> 00:51:01,860
tables use internally. Properties inside JSON columns become queryable fields. Arrays can be iterated

883
00:51:01,860 --> 00:51:07,060
and filtered. Nested objects can be projected and compared. The performance profile flips at scale.

884
00:51:07,060 --> 00:51:12,180
Community testing comparing ExecuteCosmosSqlQuery against the standard Dataverse LINQ provider

885
00:51:12,180 --> 00:51:17,540
for the same data found that Cosmos SQL querying 5,000 records was faster than the standard provider

886
00:51:17,540 --> 00:51:22,020
for that same data set. The standard provider is optimized for the relational query patterns that

887
00:51:22,020 --> 00:51:26,900
work well on SQL-backed tables. Cosmos SQL is optimized for the document model that elastic tables

888
00:51:26,900 --> 00:51:32,580
actually use. As data set size grows, that alignment with the underlying storage engine compounds into

889
00:51:32,580 --> 00:51:37,860
a meaningful throughput advantage for analytical and flexible filtering scenarios. The two-mode query

890
00:51:37,860 --> 00:51:42,820
architecture that emerges from this is explicit and intentional. Use standard Dataverse queries,

891
00:51:42,820 --> 00:51:48,580
OData through the Web API, FetchXML in flows and plug-ins, for operational CRUD and user-facing

892
00:51:48,580 --> 00:51:52,980
filtering on typed columns. Those queries are fast, familiar, and integrate cleanly with every

893
00:51:52,980 --> 00:51:58,260
Dataverse tooling layer. Use ExecuteCosmosSqlQuery for queries that need to reach inside JSON

894
00:51:58,260 --> 00:52:04,020
structures: analytical queries over document contents, flexible filtering on tenant-specific attributes,

895
00:52:04,020 --> 00:52:08,820
searches across array elements within a JSON feed. The two modes aren't in competition. They serve

896
00:52:08,820 --> 00:52:13,140
different query patterns against the same table, and knowing when to reach for each one is what makes

897
00:52:13,140 --> 00:52:18,660
the full capability of elastic tables accessible. One constraint applies to Cosmos SQL queries with

898
00:52:18,660 --> 00:52:23,700
the same force it applies to standard Dataverse queries: partition scoping. A Cosmos SQL query

899
00:52:23,700 --> 00:52:27,860
that doesn't include the partition ID in its WHERE clause fans out across every physical partition in

900
00:52:27,860 --> 00:52:33,620
the container, consuming request units proportional to the entire data set. At small data volumes, that's

901
00:52:33,620 --> 00:52:38,980
inconvenient. At the scale where ExecuteCosmosSqlQuery becomes genuinely useful, with millions of

902
00:52:38,980 --> 00:52:44,260
records distributed across hundreds of tenant partitions, an unscoped query is expensive enough to

903
00:52:44,260 --> 00:52:49,380
produce throttling and latency that users notice. The partition filter belongs in every Cosmos

904
00:52:49,380 --> 00:52:54,580
SQL query, same as it belongs in every standard query. For workloads that go beyond what portal-side

905
00:52:54,580 --> 00:52:59,780
analytical queries can reasonably handle, such as aggregate dashboards over large historical data sets, trend

906
00:52:59,780 --> 00:53:04,820
analysis across months of activity data, or cross-entity reporting that joins elastic table data with

907
00:53:04,820 --> 00:53:10,740
standard table records, Power BI DirectQuery is the recommended exit. It offloads query execution

908
00:53:10,740 --> 00:53:15,300
entirely from the portal runtime, running against the elastic table data through the Dataverse connector

909
00:53:15,300 --> 00:53:20,580
without consuming portal request entitlements. Heavy analytical work belongs outside the portal layer,

910
00:53:20,580 --> 00:53:25,460
and DirectQuery gives you a supported, integrated path to get it there. The query architecture is now

911
00:53:25,460 --> 00:53:30,020
complete on both the operational and analytical sides. What remains is the blueprint that connects

912
00:53:30,020 --> 00:53:36,100
every component described so far into a coherent, deployable system. The reference architecture: all

913
00:53:36,100 --> 00:53:41,620
components connected. Everything described across the previous 14 sections is a component. This section

914
00:53:41,620 --> 00:53:46,580
is the blueprint that shows how those components connect into a system that actually runs in production.

915
00:53:46,580 --> 00:53:50,580
The architecture has three distinct layers. They're worth naming explicitly before going deeper, because

916
00:53:50,580 --> 00:53:55,220
the separation is not organizational convenience. It's the boundary that determines what each layer is

917
00:53:55,220 --> 00:54:00,020
responsible for and what it's allowed to do. The front-end experience layer lives in Power Pages.

918
00:54:00,020 --> 00:54:05,380
The data processing layer runs across server logic, Power Automate, and Azure Functions. The storage

919
00:54:05,380 --> 00:54:10,180
layer splits between standard Dataverse tables and elastic tables, each holding the data it's suited

920
00:54:10,180 --> 00:54:15,380
for. Start at the top. Power Pages manages the authenticated user session. When a contact logs in,

921
00:54:15,380 --> 00:54:19,860
their identity establishes the tenant context. The web roles assigned to that contact determine which

922
00:54:19,860 --> 00:54:24,820
tables they can see, which records within those tables are visible to them, and what operations they

923
00:54:24,820 --> 00:54:30,580
can perform. Table permissions enforce tenant scoping at the portal layer automatically. The Web

924
00:54:30,580 --> 00:54:35,860
API serves user-facing reads and writes through the client-side JavaScript that renders dashboards,

925
00:54:35,860 --> 00:54:41,380
feeds, and interactive components. Every Web API call includes the tenant ID filter at level one

926
00:54:41,380 --> 00:54:46,260
of the partition key hierarchy. That's the query discipline from section 11, applied consistently

927
00:54:46,260 --> 00:54:52,740
across every page in the portal. WAF rate limiting on the API endpoint sits in front of all of this,

928
00:54:52,740 --> 00:54:57,220
enforcing per-client request limits before traffic reaches the Dataverse service layer

929
00:54:57,220 --> 00:55:01,940
and protecting the daily request pool for legitimate users. The processing layer handles everything

930
00:55:01,940 --> 00:55:07,220
that requires throughput, orchestration, or cross-entity logic. Server logic runs synchronous operations

931
00:55:07,220 --> 00:55:12,260
that need low latency and tight platform integration: validation rules that span multiple tables, pricing

932
00:55:12,260 --> 00:55:17,700
calculations, authorization checks that go beyond what table permissions express. These run server

933
00:55:17,700 --> 00:55:22,420
side, close to Dataverse, without the browser-to-portal round-trip overhead that adds latency to

934
00:55:22,420 --> 00:55:27,380
client-side calls. Power Automate and Azure Functions handle the asynchronous and batch workloads:

935
00:55:27,380 --> 00:55:32,900
queue processing, bulk ingestion via CreateMultiple, scheduled aggregations, event-driven workflows

936
00:55:32,900 --> 00:55:37,300
triggered by record changes. Service principal accounts authenticate these back-end services

937
00:55:37,300 --> 00:55:42,100
against Dataverse, keeping high-volume write traffic out of the portal's per-user entitlement pool

938
00:55:42,100 --> 00:55:46,580
and under the environment's tenant-level capacity instead. The storage layer reflects the data

939
00:55:46,580 --> 00:55:51,060
classification discipline established in section six standard Dataverse tables hold the core

940
00:55:51,060 --> 00:55:56,180
business entities that the platform is built on tenant configuration records subscription data

941
00:55:56,180 --> 00:56:00,900
user profiles contact records anything that participates in relational queries has lookups

942
00:56:00,900 --> 00:56:05,460
to other entities or requires the transactional consistency that SQL backed storage provides

943
00:56:05,460 --> 00:56:11,140
these tables are not replaced by elastic tables they're complemented by them elastic tables hold the

944
00:56:11,140 --> 00:56:17,220
high volume operational data that surrounds those business entities activity logs telemetry streams

945
00:56:17,220 --> 00:56:22,580
audit events user feed documents clickstream records the two table types coexist in the same

946
00:56:22,580 --> 00:56:27,940
environment accessed through the same API surface governed by the same security model but optimized

947
00:56:27,940 --> 00:56:32,980
for completely different workload profiles tenant isolation runs through all three layers in a way

948
00:56:32,980 --> 00:56:38,660
that's consistent demonstrable and layered the HPK with tenant ID at level one enforces the boundary

949
00:56:38,660 --> 00:56:43,380
at the Cosmos DB storage layer queries scoped to a tenant physically can't reach another tenant's

950
00:56:43,380 --> 00:56:48,180
partitions business unit ownership enforces the boundary at the Dataverse platform layer users

951
00:56:48,180 --> 00:56:52,900
in tenant A's business unit can't see records owned by tenant B's business unit table permissions

952
00:56:52,900 --> 00:56:58,180
scoped to web roles enforce the boundary at the portal layer authenticated contacts only see data

953
00:56:58,180 --> 00:57:03,780
that their role assignments permit three independent mechanisms operating at three different layers

954
00:57:03,780 --> 00:57:08,180
all enforcing the same tenant boundary that's what layered isolation looks like when it's built into

955
00:57:08,180 --> 00:57:12,420
the architecture rather than bolted on afterward the monitoring stack connects the operational

956
00:57:12,420 --> 00:57:17,380
reality to the architecture's theoretical properties Power Platform admin center shows request

957
00:57:17,380 --> 00:57:23,700
consumption and storage growth by environment the Power Pages monitor hub available in the 2026 wave

958
00:57:23,700 --> 00:57:30,420
one release surfaces page load time session duration error rates and the ability to trace API

959
00:57:30,420 --> 00:57:36,580
throttling events back to specific pages or user actions Cosmos DB metrics show RU consumption

960
00:57:36,580 --> 00:57:41,300
by partition key range and flag hot partitions before they produce latency that reaches the user

961
00:57:41,300 --> 00:57:46,500
every component in this stack is generally available as of 2026 the integration points are documented

962
00:57:46,500 --> 00:57:51,300
the APIs are stable and the production track record on each individual piece is established this

963
00:57:51,300 --> 00:57:56,100
isn't an experimental combination of preview features it's a supported architecture built from

964
00:57:56,100 --> 00:58:00,180
mature platform capabilities migrating to it from where most organizations are today is a different

965
00:58:00,180 --> 00:58:05,060
challenge entirely migrating legacy portal solutions to elastic architecture the blueprint is clear

966
00:58:05,060 --> 00:58:09,540
for greenfield implementations but most organizations reading this already have something running

967
00:58:09,540 --> 00:58:15,060
a Power Pages solution built six months ago or two years ago on standard Dataverse tables that

968
00:58:15,060 --> 00:58:19,780
made sense at the time the question isn't just whether to adopt this architecture it's how to get

969
00:58:19,780 --> 00:58:24,420
there from where you already are without breaking production migration starts with classification not

970
00:58:24,420 --> 00:58:28,420
with tooling before you touch a single table you need a complete inventory of what your existing

971
00:58:28,420 --> 00:58:32,900
solution stores and how it's accessed this is the analytical step that determines everything else

972
00:58:32,900 --> 00:58:37,700
and skipping it to move faster is the category error that produces migrations you have to redo

973
00:58:37,700 --> 00:58:42,980
go through your existing tables and assign each one to one of two buckets high volume append

974
00:58:42,980 --> 00:58:47,860
heavy data the stuff that accumulates continuously that users don't typically update after the fact

975
00:58:47,860 --> 00:58:52,660
that grows linearly with platform usage belongs in elastic tables core business entities with

976
00:58:52,660 --> 00:58:57,620
complex relationships active update patterns and relational dependencies stay exactly where they are

977
00:58:57,620 --> 00:59:02,260
you're not migrating your entire data model you're extracting a specific workload category from the

978
00:59:02,260 --> 00:59:06,900
relational store and placing it in the engine that's actually suited for it once classification is

979
00:59:06,900 --> 00:59:11,540
complete partition key selection becomes the decision that determines the quality of the migration

980
00:59:11,540 --> 00:59:16,340
this is where the immutability constraint from section four bites hardest in a migration context

981
00:59:16,340 --> 00:59:20,420
because you're choosing based on historical query patterns from the existing solution patterns

982
00:59:20,420 --> 00:59:24,900
that may not perfectly reflect how the new elastic table will be queried going forward pull your

983
00:59:24,900 --> 00:59:29,300
actual usage data before deciding look at which fields appear most frequently in filters across

984
00:59:29,300 --> 00:59:33,540
your existing queries against the tables you're migrating look at which tenants generate the most

985
00:59:33,540 --> 00:59:37,380
data volume and whether any of them are approaching a scale that would stress a single level tenant

986
00:59:37,380 --> 00:59:42,260
ID partition key that analysis takes a few hours and prevents a migration you'd otherwise have to

987
00:59:42,260 --> 00:59:47,860
redo in 18 months when an elephant tenant emerges two migration patterns cover the range of scenarios

988
00:59:47,860 --> 00:59:52,740
you'll actually encounter for systems that cannot tolerate downtime platforms with continuous

989
00:59:52,740 --> 00:59:58,260
ingestion live user traffic integrations that write records around the clock the online migration

990
00:59:58,260 --> 01:00:03,380
pattern using Cosmos DB change feed processor is the right approach stand up the new elastic table

991
01:00:03,380 --> 01:00:08,260
alongside the existing standard table write new records to both simultaneously run the change

992
01:00:08,260 --> 01:00:12,980
feed processor to capture ongoing changes from the source and replicate them to the destination

993
01:00:12,980 --> 01:00:18,660
backfill historical data asynchronously into the new table while the dual write keeps both current

994
01:00:18,660 --> 01:00:23,700
when your validation process confirms that the destination contains complete and accurate data

995
01:00:23,700 --> 01:00:28,580
redirect reads to the elastic table then cut over writes then decommission the source the platform

996
01:00:28,580 --> 01:00:35,060
never goes down users experience nothing for smaller data sets or for solutions with a defined maintenance

997
01:00:35,060 --> 01:00:40,740
window available internal portals early stage SaaS platforms environments where scheduled outages

998
01:00:40,740 --> 01:00:46,660
are acceptable offline migration is simpler and lower risk pause writes copy data validate switch the

999
01:00:46,660 --> 01:00:51,700
operational complexity of dual write synchronization introduces its own failure modes when you can

1000
01:00:51,700 --> 01:00:56,260
avoid it you should after cut over the most common post migration failure isn't data integrity

1001
01:00:56,260 --> 01:01:00,820
it's query patterns development teams update the table configuration update the security roles validate

1002
01:01:00,820 --> 01:01:05,620
the data and then forget to update every query in the application to include the partition ID filter

1003
01:01:05,620 --> 01:01:10,660
the queries that previously ran fine on a standard table now execute as cross partition scans on

1004
01:01:10,660 --> 01:01:15,220
the elastic table performance degrades the team can't immediately identify the source because

1005
01:01:15,220 --> 01:01:19,860
the data is correct and the security model is intact the issue is invisible in functional testing

1006
01:01:19,860 --> 01:01:24,900
and only appears under production data volumes audit every query path in the application layer

1007
01:01:24,900 --> 01:01:29,460
before go live make the partition ID filter a code review requirement not an afterthought
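
The query audit described above can be partly automated. The following is a minimal sketch, not a definitive implementation: it assumes the application's Web API request URLs can be collected as strings, and that a query counts as partition scoped when it either passes a `partitionId` query parameter or filters on a `partitionid` column. The function names and the example table `contoso_activitylogs` are hypothetical, and the exact parameter your portal or backend uses should be confirmed against the Dataverse documentation.

```python
from urllib.parse import urlsplit, parse_qs


def has_partition_scope(url: str, partition_column: str = "partitionid") -> bool:
    """Return True if the query URL appears to be scoped to one partition.

    Assumption: scoping is expressed either as a `partitionId` query
    parameter or as an `eq` comparison on the partition column in $filter.
    """
    query = parse_qs(urlsplit(url).query)
    params = {key.lower(): values for key, values in query.items()}
    if "partitionid" in params:
        return True
    filters = " ".join(params.get("$filter", [])).lower()
    return f"{partition_column.lower()} eq" in filters


def unscoped_queries(urls: list[str]) -> list[str]:
    """Return the URLs that would run without a partition filter."""
    return [url for url in urls if not has_partition_scope(url)]


# Hypothetical query paths gathered from the application layer.
collected = [
    "/_api/contoso_activitylogs?$select=contoso_name&$filter=partitionid eq 'tenant-a'",
    "/_api/contoso_activitylogs?$select=contoso_name&$top=50",
]
flagged = unscoped_queries(collected)
```

Running a check like this in CI is one way to make the filter a review requirement rather than something a reviewer has to remember; a string check cannot prove the filter value is the right tenant, so it complements code review rather than replacing it.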

1008
01:01:29,460 --> 01:01:33,940
security role reconfiguration rounds out the migration checklist business unit assignments for

1009
01:01:33,940 --> 01:01:38,260
all records in the migrated table must be verified web role mappings must be confirmed against

1010
01:01:38,260 --> 01:01:43,940
the new table's permission configuration ownership fields set during the migration backfill must reflect

1011
01:01:43,940 --> 01:01:48,900
the correct tenant business unit for every record throughput provisioning and RU management

1012
01:01:48,900 --> 01:01:53,620
the migration is complete the architecture is running now you're responsible for keeping it fast

1013
01:01:53,620 --> 01:01:59,060
and that responsibility has a specific currency inside Cosmos DB request units every operation that

1014
01:01:59,060 --> 01:02:04,500
touches an elastic table consumes RUs a point read costs a fraction of an RU a write costs more

1015
01:02:04,500 --> 01:02:10,100
depending on document size a query costs proportionally to the number of documents scanned and the complexity

1016
01:02:10,100 --> 01:02:15,540
of the filter expression Cosmos DB doesn't charge you by time or by connection it charges by the

1017
01:02:15,540 --> 01:02:21,380
computational work each operation requires normalized into a unit that accounts for CPU memory

1018
01:02:21,380 --> 01:02:27,300
and IO simultaneously when you provision throughput you're reserving a rate of RUs a ceiling on how

1019
01:02:27,300 --> 01:02:32,500
much work the system can do per second before it starts returning throttle responses throughput can

1020
01:02:32,500 --> 01:02:37,380
be provisioned at two levels per container or shared across a database for critical elastic tables

1021
01:02:37,380 --> 01:02:42,580
in a multi tenant SaaS the choice is straightforward shared database throughput is a pool that all

1022
01:02:42,580 --> 01:02:47,940
containers draw from simultaneously with no guarantees for any individual table when one container

1023
01:02:47,940 --> 01:02:52,740
in the database has a spike it draws from the same pool as every other container your elastic table

1024
01:02:52,740 --> 01:02:57,220
gets whatever the pool has left which may be less than it needs per container throughput eliminates

1025
01:02:57,220 --> 01:03:02,100
that ambiguity the RUs you provision for the elastic table belong to that table other database

1026
01:03:02,100 --> 01:03:06,740
objects don't compete for it for workloads where consistent performance matters and it always

1027
01:03:06,740 --> 01:03:12,340
matters in production SaaS dedicated container throughput is the right provisioning model auto scale is

1028
01:03:12,340 --> 01:03:17,300
the default recommendation for SaaS workloads because most SaaS traffic patterns aren't flat usage

1029
01:03:17,300 --> 01:03:22,100
peaks during business hours drops overnight spikes when a new feature ships or a large tenant imports

1030
01:03:22,100 --> 01:03:27,140
historical data manual provisioning forces you to set throughput at the peak and pay for it around

1031
01:03:27,140 --> 01:03:32,420
the clock whether the system needs it or not auto scale sets a maximum RU as the ceiling your

1032
01:03:32,420 --> 01:03:38,260
workload can reach and Cosmos DB scales actual provisioned throughput between 10% of that maximum

1033
01:03:38,260 --> 01:03:43,140
and the full amount in response to real demand you pay for the highest level reached in each hourly

1034
01:03:43,140 --> 01:03:48,500
billing period during quiet periods cost drops automatically during load spikes capacity expands

1035
01:03:48,500 --> 01:03:53,060
without manual intervention burst capacity extends this model for microspikes that auto scale isn't

1036
01:03:53,060 --> 01:03:57,780
designed to catch each physical partition accumulates up to five minutes of idle throughput

1037
01:03:57,780 --> 01:04:02,820
unused RU capacity that didn't get consumed during low traffic periods when a sudden spike

1038
01:04:02,820 --> 01:04:07,540
arrives the physical partition draws on that accumulated headroom rather than immediately throttling

1039
01:04:07,540 --> 01:04:12,820
requests the spike gets absorbed users don't experience the 429 responses that would otherwise

1040
01:04:12,820 --> 01:04:17,620
appear when instantaneous demand briefly exceeds the provisioned rate for workloads with short sharp

1041
01:04:17,620 --> 01:04:23,300
bursts a user exporting a large data set a background job processing a queue batch a scheduled

1042
01:04:23,300 --> 01:04:28,820
aggregation firing at the top of the hour burst capacity is what makes the system feel consistently

1043
01:04:28,820 --> 01:04:35,060
responsive without requiring you to permanently over provision the baseline large ingestion events

1044
01:04:35,060 --> 01:04:39,540
a tenant onboarding with historical data a partner integration sending a year's worth of records

1045
01:04:39,540 --> 01:04:45,940
in a single job benefit from a different operational pattern scale RUs up before the batch runs run the

1046
01:04:45,940 --> 01:04:51,300
ingestion at the elevated throughput level then scale back down when the job completes Cosmos DB

1047
01:04:51,300 --> 01:04:56,260
supports this as a standard operation the cost of the elevated RUs for a two hour batch window is

1048
01:04:56,260 --> 01:05:01,060
negligible compared to the cost of running at that level permanently this is operationally simple

1049
01:05:01,060 --> 01:05:05,540
and significantly faster than throttling through a large ingestion under conservative baseline

1050
01:05:05,540 --> 01:05:10,820
provisioning ongoing monitoring comes down to three metrics normalized RU consumption by partition

1051
01:05:10,820 --> 01:05:15,460
key range it tells you whether throughput is distributed evenly or concentrated on a small number

1052
01:05:15,460 --> 01:05:20,340
of physical partitions data storage by physical partition tells you whether your data distribution is

1053
01:05:20,340 --> 01:05:25,140
balanced throttled requests by partition range tell you where the system is hitting its ceiling when

1054
01:05:25,140 --> 01:05:29,540
these three metrics show concentration on a small number of partition ranges over sustained periods

1055
01:05:29,540 --> 01:05:34,580
the correct response is not to raise the RU ceiling indefinitely it's to revisit the partition key

1056
01:05:34,580 --> 01:05:39,620
design because a throughput problem that originates from a hot partition can't be solved by adding more

1057
01:05:39,620 --> 01:05:45,140
throughput alone it requires the same migration decision discussed in section 16 applied now to

1058
01:05:45,140 --> 01:05:50,500
the partition strategy rather than the table type governance audit standards and compliance readiness

1059
01:05:50,500 --> 01:05:55,220
the architecture performs the security model holds the throughput is provisioned correctly none of

1060
01:05:55,220 --> 01:05:59,540
that matters to an enterprise procurement team if you can't demonstrate it in a form that satisfies

1061
01:05:59,540 --> 01:06:05,220
an audit enterprise SaaS customers in 2026 don't arrive at a security review with open minds and

1062
01:06:05,220 --> 01:06:10,500
good faith assumptions they arrive with questionnaires long ones prepared by security teams who've

1063
01:06:10,500 --> 01:06:14,980
seen enough multi tenant incidents to know exactly where the weak points are the questions about

1064
01:06:14,980 --> 01:06:19,700
data isolation aren't asking whether you believe tenants are separated they're asking for the technical

1065
01:06:19,700 --> 01:06:24,180
evidence that demonstrates separation is enforced by the architecture itself evidence that exists

1066
01:06:24,180 --> 01:06:29,460
independently of any developer's correct behavior on any given day that distinction between assertion

1067
01:06:29,460 --> 01:06:34,420
and evidence is the governance gap that kills deals SOC 2 auditors work through a defined

1068
01:06:34,420 --> 01:06:40,020
set of concerns for multi tenant SaaS logical access controls how roles and permissions are structured

1069
01:06:40,020 --> 01:06:45,140
whether they're tenant scoped and whether they can be audited over time change management how

1070
01:06:45,140 --> 01:06:50,900
modifications to the low-code configuration security roles table permissions web role assignments

1071
01:06:50,900 --> 01:06:55,700
move through review and approval before reaching production system operations whether cross tenant

1072
01:06:55,700 --> 01:07:00,420
access is detectable when it occurs and what monitoring generates that detection for low-code

1073
01:07:00,420 --> 01:07:05,620
platforms specifically auditors increasingly probe whether user created flows and integrations

1074
01:07:05,620 --> 01:07:10,580
are capable of bypassing the isolation controls the platform vendor claims are in place the layered

1075
01:07:10,580 --> 01:07:14,660
architecture described across the previous sections generates the answers to those questions as

1076
01:07:14,660 --> 01:07:19,380
structural properties rather than documentation claims HPK with tenant ID at level one is the storage

1077
01:07:19,380 --> 01:07:24,420
layer evidence you can pull the container definition show the partition key hierarchy and demonstrate

1078
01:07:24,420 --> 01:07:28,740
mathematically that a query scoped to tenant A's key range cannot return documents from tenant B's key

1079
01:07:28,740 --> 01:07:33,460
range that's not a policy claim it's a constraint baked into the storage topology business unit

1080
01:07:33,460 --> 01:07:38,180
security is the platform layer evidence you can export the security role configurations show

1081
01:07:38,180 --> 01:07:42,980
the read scope set to business unit for every elastic table in scope and trace a specific user's

1082
01:07:42,980 --> 01:07:47,700
access path from their contact record through their web role assignment to the business units that

1083
01:07:47,700 --> 01:07:53,460
define their data boundary WAF logs are the network layer evidence rate limit events blocked requests

1084
01:07:53,460 --> 01:07:57,940
per client call patterns all captured all time stamped all attributable to specific clients and

1085
01:07:57,940 --> 01:08:03,140
endpoints three independent evidence streams each auditable through different tooling each enforcing

1086
01:08:03,140 --> 01:08:08,420
the same boundary at a different layer that's what provably unbypassable isolation looks like in

1087
01:08:08,420 --> 01:08:14,260
an audit package JSON columns require a specific governance chapter because they're the place where

1088
01:08:14,260 --> 01:08:19,700
the architecture's audit posture is most easily eroded the security rule from section 13 sensitive

1089
01:08:19,700 --> 01:08:24,900
fields stay in typed columns JSON holds only non sensitive metadata becomes a formal governance

1090
01:08:24,900 --> 01:08:29,620
requirement at the audit layer not just a design preference auditors examining a Dataverse

1091
01:08:29,620 --> 01:08:34,660
environment with JSON columns will ask what schema validation exists how you know sensitive data

1092
01:08:34,660 --> 01:08:39,620
hasn't migrated into JSON payloads over time and what controls prevent a developer from storing

1093
01:08:39,620 --> 01:08:44,900
a personal identifier or credential inside a JSON field that isn't captured by column level

1094
01:08:44,900 --> 01:08:50,900
security or DLP policies the governance response has three components first document the JSON schema

1095
01:08:50,900 --> 01:08:55,620
formally maintain a written specification of what keys are permitted in each JSON column

1096
01:08:55,620 --> 01:09:01,540
what types they carry and what they're explicitly prohibited from containing second make that schema

1097
01:09:01,540 --> 01:09:06,660
a code review gate any change that writes new keys into a JSON column requires a review against the

1098
01:09:06,660 --> 01:09:13,380
specification before it ships third run periodic scans of JSON payloads in production export samples

1099
01:09:13,380 --> 01:09:18,020
parse them programmatically and validate that no key pattern matches sensitive field signatures
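
The periodic payload scan can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed tool: it assumes exported samples arrive as raw JSON strings keyed by record ID, and the key patterns in `SENSITIVE_KEY_PATTERN` are placeholder examples that would need to be replaced with the prohibited keys from your own written schema specification. All names here are hypothetical.

```python
import json
import re

# Placeholder signatures; substitute the prohibited keys from your schema spec.
SENSITIVE_KEY_PATTERN = re.compile(
    r"(ssn|password|secret|token|email|iban)", re.IGNORECASE
)


def find_sensitive_keys(payload, path=""):
    """Walk a parsed JSON value and return paths of keys matching the pattern."""
    hits = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            key_path = f"{path}.{key}" if path else key
            if SENSITIVE_KEY_PATTERN.search(key):
                hits.append(key_path)
            hits.extend(find_sensitive_keys(value, key_path))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_sensitive_keys(item, f"{path}[{index}]"))
    return hits


def scan_samples(raw_samples):
    """Map record ID to offending key paths for every sample with a hit."""
    report = {}
    for record_id, raw in raw_samples.items():
        hits = find_sensitive_keys(json.loads(raw))
        if hits:
            report[record_id] = hits
    return report


# Hypothetical exported samples from a JSON column.
samples = {
    "rec-1": '{"theme": "dark", "prefs": {"api_token": "abc"}}',
    "rec-2": '{"theme": "light", "widgets": [{"size": "small"}]}',
}
report = scan_samples(samples)
```

Keeping the dated report from each run gives the auditor the execution evidence the control needs. Matching on key names will not catch sensitive values stored under innocuous keys, so this is a baseline rather than complete coverage.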

1100
01:09:18,020 --> 01:09:22,340
this doesn't need to be continuous quarterly is sufficient for most audit frameworks the point is

1101
01:09:22,340 --> 01:09:26,660
that you can demonstrate the control exists and has been executed which is what an auditor needs to

1102
01:09:26,660 --> 01:09:31,540
tick the box tenant data deletion is where this architecture's compliance posture pays an unexpected

1103
01:09:31,540 --> 01:09:36,420
dividend GDPR and similar regulations require that a departing tenant's data be removed completely

1104
01:09:36,420 --> 01:09:41,060
and demonstrably on this architecture that requirement has a clean technical answer TTL policies

1105
01:09:41,060 --> 01:09:45,540
on elastic tables handle automatic expiration for operational data records that were always

1106
01:09:45,540 --> 01:09:50,740
intended to be time bounded are gone when their TTL elapses without manual intervention business

1107
01:09:50,740 --> 01:09:55,700
unit scoped deletion removes all records tied to a departing tenant's business unit in a single

1108
01:09:55,700 --> 01:10:00,180
administrative operation the deletion is auditable the scope is defined by the same structure that

1109
01:10:00,180 --> 01:10:04,740
enforced isolation during active tenancy and the evidence of completion is the absence of records

1110
01:10:04,740 --> 01:10:08,820
in a scoped query the platform that can be trusted is the platform that can be audited

1111
01:10:08,820 --> 01:10:14,260
scaling the architecture from hundreds to thousands of tenants the architecture that works for

1112
01:10:14,260 --> 01:10:19,700
50 tenants doesn't automatically work for 5,000 not because the components change the HPK structure

1113
01:10:19,700 --> 01:10:23,940
the business unit model the web API patterns are all still correct but because the operational

1114
01:10:23,940 --> 01:10:27,940
assumptions that held at smaller scale stop holding when the tenant population grows large enough

1115
01:10:27,940 --> 01:10:33,060
to introduce genuine heterogeneity a platform with 50 tenants probably has a relatively uniform

1116
01:10:33,060 --> 01:10:38,980
distribution 5,000 tenants almost certainly doesn't some are tiny some are enormous and the occasional

1117
01:10:38,980 --> 01:10:43,300
one becomes an enterprise customer whose usage profile is categorically different from the rest

1118
01:10:43,300 --> 01:10:48,420
of the base this is the tiered tenant reality that mature SaaS platforms eventually confront

1119
01:10:48,420 --> 01:10:52,580
and it's worth planning for explicitly rather than discovering it under pressure when your

1120
01:10:52,580 --> 01:10:58,180
largest customer starts generating latency complaints for small and medium tenants the majority of

1121
01:10:58,180 --> 01:11:04,020
your base at any realistic scale the shared container model with HPK works exactly as designed the partition

1122
01:11:04,020 --> 01:11:09,460
key distributes their data automatically across logical sub partitions their combined data volume

1123
01:11:09,460 --> 01:11:14,500
never approaches the 20 GB ceiling for any individual partition key combination the RU distribution

1124
01:11:14,500 --> 01:11:18,580
is naturally balanced because no single small tenant concentrates enough traffic to saturate a

1125
01:11:18,580 --> 01:11:23,220
physical partition you onboard them provision their business unit assign their users to the correct

1126
01:11:23,220 --> 01:11:28,100
roles and the architecture handles their growth without manual intervention that's the benefit of

1127
01:11:28,100 --> 01:11:32,740
building the partitioning model correctly at the start the system scales passively for the typical

1128
01:11:32,740 --> 01:11:38,100
tenant freeing operational attention for the atypical ones large enterprise tenants require a

1129
01:11:38,100 --> 01:11:42,580
deliberate decision at the point where a single tenant's data volume or traffic pattern represents

1130
01:11:42,580 --> 01:11:47,140
a meaningful fraction of the shared container's total load the economics and isolation guarantees of

1131
01:11:47,140 --> 01:11:52,020
shared infrastructure no longer match what that customer expects dedicated containers give that

1132
01:11:52,020 --> 01:11:56,420
tenant their own throughput pool independent of what every other tenant on the platform is doing

1133
01:11:56,420 --> 01:12:01,380
dedicated Cosmos DB accounts go further providing complete infrastructure separation independent

1134
01:12:01,380 --> 01:12:06,660
SLA guarantees and the ability to provision scale and monitor that tenant's environment without any

1135
01:12:06,660 --> 01:12:11,540
interaction with the shared tenant population that level of isolation also unlocks commercial

1136
01:12:11,540 --> 01:12:16,180
conversations enterprise customers often pay for dedicated infrastructure explicitly and the

1137
01:12:16,180 --> 01:12:20,260
architecture makes that tier commercially viable without requiring a fundamentally different

1138
01:12:20,260 --> 01:12:25,060
technical approach the transition between tiers is where most platforms stumble and the

1139
01:12:25,060 --> 01:12:29,780
stumbling point is almost always identifier consistency if your tenant identifiers are stable and

1140
01:12:29,780 --> 01:12:34,900
portable across tiers the same tenant ID value used in the shared container is used in the dedicated

1141
01:12:34,900 --> 01:12:39,620
container the same business unit structure applies then migrating a tenant from shared to

1142
01:12:39,620 --> 01:12:44,500
dedicated is a data movement operation the API surface doesn't change the portal doesn't change the

1143
01:12:44,500 --> 01:12:48,820
security model doesn't change users experience nothing different on the day their organization moves

1144
01:12:48,820 --> 01:12:53,860
to a dedicated tier if identifiers were generated differently per tier or if the partition key

1145
01:12:53,860 --> 01:12:58,820
structure varies between shared and dedicated infrastructure the migration becomes an application

1146
01:12:58,820 --> 01:13:04,420
rewrite not a data migration design for portability from the beginning because retrofitting it is

1147
01:13:04,420 --> 01:13:08,740
painful tenant provisioning automation is the operational threshold that determines whether growth is

1148
01:13:08,740 --> 01:13:14,180
manageable manual provisioning creating business units configuring security roles assigning partition

1149
01:13:14,180 --> 01:13:19,460
key values works at dozens of tenants it becomes error prone in the hundreds and genuinely dangerous

1150
01:13:19,460 --> 01:13:24,820
at a thousand where a single misconfigured business unit assignment or an incorrectly set partition

1151
01:13:24,820 --> 01:13:30,100
key could expose one tenant's records to another the automation threshold that most practitioners

1152
01:13:30,100 --> 01:13:35,220
identify is somewhere around a thousand tenants get automated provisioning working before you reach

1153
01:13:35,220 --> 01:13:40,660
it not after the security risk of a provisioning error scales with tenant count and the operational overhead

1154
01:13:40,660 --> 01:13:45,220
of diagnosing and correcting manual configuration mistakes at that scale is substantial the monitoring

1155
01:13:45,220 --> 01:13:49,460
architecture grows with the tenant population in a way that's worth naming per environment dashboards

1156
01:13:49,460 --> 01:13:53,540
in Power Platform admin center give you the environment wide view per partition metrics in

1157
01:13:53,540 --> 01:13:58,580
Cosmos DB tell you which tenants are approaching limits before those limits produce incidents application

1158
01:13:58,580 --> 01:14:03,860
level telemetry tagged with tenant ID gives you the tenant specific signal that neither platform

1159
01:14:03,860 --> 01:14:09,300
dashboard surfaces on its own all three layers are required simultaneously any one of them alone

1160
01:14:09,300 --> 01:14:14,660
leaves a blind spot that shows up at the worst possible moment real world performance what the numbers

1161
01:14:14,660 --> 01:14:19,140
actually look like the architecture has been described in components and principles what it actually

1162
01:14:19,140 --> 01:14:23,700
delivers in production deserves a direct accounting not aspirational language about what's

1163
01:14:23,700 --> 01:14:28,340
theoretically possible but a grounded picture of what the numbers look like when everything is

1164
01:14:28,340 --> 01:14:33,380
built correctly and what happens when it isn't start with the read side because that's what users

1165
01:14:33,380 --> 01:14:38,660
experience directly sub-second latency for user facing reads is achievable not as a best case

1166
01:14:38,660 --> 01:14:43,460
outcome under ideal conditions as a consistent operational characteristic when the three requirements

1167
01:14:43,460 --> 01:14:49,060
from earlier sections are met simultaneously the query includes the partition filter the result set

1168
01:14:49,060 --> 01:14:54,820
is shaped with select and top and repeated data is served from the web API server side cache rather

1169
01:14:54,820 --> 01:15:00,180
than triggering a fresh Cosmos DB query for each session when those conditions hold the response time

1170
01:15:00,180 --> 01:15:04,580
from the portal's perspective is bounded by network latency and browser rendering not by database

1171
01:15:04,580 --> 01:15:08,900
processing that's what fast looks like at this layer when any of those conditions breaks down the

1172
01:15:08,900 --> 01:15:14,740
degradation is steep and measurable a cross partition scan on an elastic table holding 50 million records

1173
01:15:14,740 --> 01:15:19,540
across several hundred tenants doesn't return in one second it returns when Cosmos DB finishes

1174
01:15:19,540 --> 01:15:23,540
scanning every physical partition which scales with data set size in ways that produce the kind of

1175
01:15:23,540 --> 01:15:29,940
latency users describe as the page doesn't load the gap between a partition scoped query and an

1176
01:15:29,940 --> 01:15:34,340
unscoped one isn't a percentage improvement it's the difference between a sub-second response

1177
01:15:34,340 --> 01:15:38,820
and a multi-second time out this is why the query discipline from section 11 is an architectural

1178
01:15:38,820 --> 01:15:43,700
constraint not a performance tip the write pipeline behaves differently the two to ten times bulk

1179
01:15:43,700 --> 01:15:47,460
throughput improvement that elastic tables deliver over standard tables is not a property of the

1180
01:15:47,460 --> 01:15:53,380
table type alone it's a property of the combination of table type and batching discipline single

1181
01:15:53,380 --> 01:15:57,060
record writes issued one at a time through the portal web API produce essentially the same

1182
01:15:57,060 --> 01:16:02,020
throughput on an elastic table as on a standard table the cosmos DB storage engine's distribution

1183
01:16:02,020 --> 01:16:06,980
advantage only materializes when CreateMultiple assembles enough records in a single request to

1184
01:16:06,980 --> 01:16:11,540
actually distribute across partitions in parallel the performance gain is real it requires the

1185
01:16:11,540 --> 01:16:16,500
backend architecture described in section 12 to realize it teams that migrate to elastic tables

1186
01:16:16,500 --> 01:16:21,380
without changing their write patterns observe no throughput improvement and conclude the tables

1187
01:16:21,380 --> 01:16:25,460
didn't help they just didn't change the part of the architecture that needed to change
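
A minimal sketch of the batching discipline described above, assuming the backend collects rows and sends them as CreateMultiple requests. It only builds request bodies and makes no network call. The table name `contoso_activitylog`, the column `contoso_name`, the tenant partition values, and the batch size of 100 are hypothetical and do not come from the episode; the `Targets` payload shape should be checked against the current Dataverse documentation before use.

```python
from typing import Iterator

# Hypothetical values for illustration only; not taken from the episode.
TABLE_LOGICAL_NAME = "contoso_activitylog"
BATCH_SIZE = 100


def chunk(records: list[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` records."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(records), size):
        yield records[start:start + size]


def build_create_multiple_body(batch: list[dict], logical_name: str) -> dict:
    """Wrap one batch in a Targets collection, tagging each row with its type."""
    odata_type = f"Microsoft.Dynamics.CRM.{logical_name}"
    return {"Targets": [{"@odata.type": odata_type, **row} for row in batch]}


# 250 sample rows spread over three illustrative tenant partitions.
rows = [
    {"contoso_name": f"event-{i}", "partitionid": f"tenant-{i % 3}"}
    for i in range(250)
]

# One request body per batch instead of one request per row.
bodies = [
    build_create_multiple_body(batch, TABLE_LOGICAL_NAME)
    for batch in chunk(rows, BATCH_SIZE)
]

print(len(bodies), [len(body["Targets"]) for body in bodies])
```

With these sample values, 250 single-row writes collapse into three requests. The right batch size for a real workload is something to measure rather than assume.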

1188
01:16:26,340 --> 01:16:31,140
API request consumption remains the practical ceiling that shapes every design decision in this

1189
01:16:31,140 --> 01:16:36,340
architecture a portal with 1000 authenticated users operates against a daily request pool of roughly

1190
01:16:36,340 --> 01:16:41,060
400,000 dataverse interactions that sounds large until you calculate what a single dashboard

1191
01:16:41,060 --> 01:16:46,500
page costs if it's poorly designed eight queries on page load each returning full row sets without

1192
01:16:46,500 --> 01:16:52,500
select none of them cached all of them running on every visit a hundred users accessing that page

1193
01:16:52,500 --> 01:16:57,700
across a workday consumes a substantial fraction of the daily pool before any other portal functionality

1194
01:16:57,700 --> 01:17:02,900
runs the arithmetic is what makes disciplined page design a business requirement not an engineering

1195
01:17:02,900 --> 01:17:07,540
preference storage cost at the scale this architecture is designed for is worth stating concretely

1196
01:17:07,540 --> 01:17:12,340
because the gap is large enough to affect SaaS unit economics a platform storing one terabyte

1197
01:17:12,340 --> 01:17:18,180
of elastic table data at log capacity pricing pays approximately $10,000 per month the same volume

1198
01:17:18,180 --> 01:17:23,860
in standard dataverse database capacity would cost over $40,000 that's $30,000 per month

1199
01:17:23,860 --> 01:17:28,100
in structural cost difference not from any optimization not from compression or cleanup just from

1200
01:17:28,100 --> 01:17:32,260
choosing the right table type and purchasing the log capacity add-ons that make the pricing tier

1201
01:17:32,260 --> 01:17:37,700
apply cross region performance requires an explicit decision rather than a default assumption

1202
01:17:37,700 --> 01:17:42,660
cosmos DB multi-region replication exists and works well but it adds cost that scales linearly

1203
01:17:42,660 --> 01:17:46,820
with the number of regions configured a three region deployment costs roughly three times the

1204
01:17:46,820 --> 01:17:51,460
single region storage cost that's appropriate when your tenant base is genuinely globally distributed

1205
01:17:51,460 --> 01:17:56,100
and latency to a distant primary region is measurable in user experience it's unnecessary overhead when

1206
01:17:56,100 --> 01:18:01,140
your users are concentrated in one geography and the second region provides only theoretical resilience

1207
01:18:01,140 --> 01:18:06,100
that nobody is actually paying for let actual user geography drive the replication decision

1208
01:18:06,100 --> 01:18:10,980
not aspirational global availability language in a product brief the honest performance picture is

1209
01:18:10,980 --> 01:18:15,780
this the architecture handles millions of records across hundreds of tenants with consistent fast

1210
01:18:15,780 --> 01:18:20,900
reads and high throughput batch writes when it's built to spec team capabilities and organizational

1211
01:18:20,900 --> 01:18:25,620
readiness the architecture is sound the numbers are real the governance story holds up under

1212
01:18:25,620 --> 01:18:30,900
audit scrutiny and yet teams fail to implement this correctly all the time not because the technology is

1213
01:18:30,900 --> 01:18:36,660
too complex but because they misclassify what kind of problem they're solving and staff accordingly

1214
01:18:36,660 --> 01:18:40,340
this architecture lives at the boundary between two disciplines that rarely share a team

1215
01:18:40,340 --> 01:18:44,580
low-code platform development and cloud infrastructure engineering have different mental models

1216
01:18:44,580 --> 01:18:49,060
different tooling vocabularies and different instincts about where risk originates most

1217
01:18:49,060 --> 01:18:53,220
organizations have people who are strong in one of those disciplines very few have people who

1218
01:18:53,220 --> 01:18:58,580
are fluent in both that gap is where implementations break down and naming it explicitly is more useful

1219
01:18:58,580 --> 01:19:03,460
than pretending the learning curve doesn't exist power platform developers the people who build

1220
01:19:03,460 --> 01:19:07,700
dataverse data models configure security roles design web roles and table permissions

1221
01:19:07,700 --> 01:19:12,660
write liquid and power fx need to internalize cosmos DB partitioning principles even though

1222
01:19:12,660 --> 01:19:18,580
they'll never open a cosmos DB blade directly the partition id they assign on an elastic table

1223
01:19:18,580 --> 01:19:24,420
behaves exactly like a cosmos DB partition key because it is one the 20 gigabyte logical partition

1224
01:19:24,420 --> 01:19:30,340
ceiling the hot partition failure mode the requirement that every query include the partition filter

1225
01:19:30,340 --> 01:19:34,980
these aren't dataverse concepts dressed in new language they're cosmos DB constraints

1226
01:19:34,980 --> 01:19:39,860
surfaced through the dataverse API layer a developer who understands elastic tables as faster

1227
01:19:39,860 --> 01:19:44,340
standard tables will make partition key decisions with the same casualness they apply to a dataverse

1228
01:19:44,340 --> 01:19:48,740
column default and produce an architecture that degrades at exactly the moment scale arrives

1229
01:19:48,740 --> 01:19:53,460
cloud architects who approach this from the Azure infrastructure side face the inverse problem

1230
01:19:53,460 --> 01:19:58,100
they understand cosmos DB partitioning RU provisioning and multi-region replication

1231
01:19:58,100 --> 01:20:03,380
intuitively what they consistently underestimate is the power platform security model as the actual

1232
01:20:03,380 --> 01:20:07,940
isolation mechanism business units aren't a lightweight approximation of VNets web roles aren't

1233
01:20:07,940 --> 01:20:11,940
an approximate substitute for RBAC these are the constructs that enforce tenant boundaries in

1234
01:20:11,940 --> 01:20:16,580
this architecture and an architect who routes around them because they look unfamiliar or tries to

1235
01:20:16,580 --> 01:20:21,380
replicate their function using Azure native access controls sitting outside the dataverse layer

1236
01:20:21,380 --> 01:20:26,260
ends up with a system where the platform's built-in isolation mechanisms and the custom-built isolation

1237
01:20:26,260 --> 01:20:30,740
mechanisms are out of sync that inconsistency is exactly where security gaps appear under audit

1238
01:20:30,740 --> 01:20:35,060
the governance layer requires a third capability that neither developer profile naturally covers

1239
01:20:35,940 --> 01:20:42,260
translating SOC 2 common criteria requirements or ISO 27001 annex a controls into dataverse security

1240
01:20:42,260 --> 01:20:46,980
role configurations field security profiles and audit evidence packages requires someone who

1241
01:20:46,980 --> 01:20:50,820
reads compliance frameworks and understands the technical architecture well enough to map one

1242
01:20:50,820 --> 01:20:55,940
to the other that's a rare combination in a single person on most teams it's a structured collaboration

1243
01:20:55,940 --> 01:21:00,420
a compliance specialist who owns the framework requirements working closely with the power platform

1244
01:21:00,420 --> 01:21:04,740
architect who owns the configuration the collaboration only works when both sides understand

1245
01:21:04,740 --> 01:21:09,540
enough of the other's language to have a productive conversation operational monitoring sits at the

1246
01:21:09,540 --> 01:21:15,220
same intersection the signals that matter live in two systems simultaneously power platform admin

1247
01:21:15,220 --> 01:21:20,260
center for request consumption and storage trends cosmos DB metrics for partition level throughput

1248
01:21:20,260 --> 01:21:24,900
and hotspot detection a monitoring function that only watches one of those systems is blind to the

1249
01:21:24,900 --> 01:21:29,380
failure modes that originate in the other the team model that consistently works is three roles

1250
01:21:29,380 --> 01:21:34,420
with clear ownership boundaries a power platform architect who owns the data model and security design

1251
01:21:34,420 --> 01:21:39,140
a cloud engineer who owns throughput provisioning and partition health monitoring and a governance lead

1252
01:21:39,140 --> 01:21:43,700
who owns the audit evidence and compliance documentation the roles can overlap in a small team

1253
01:21:43,700 --> 01:21:49,460
what can't overlap is the accountability the future of enterprise low code SaaS on power pages

1254
01:21:49,460 --> 01:21:54,020
the architecture works today the question worth spending time on before closing is where it goes

1255
01:21:54,020 --> 01:21:58,420
from here because the trajectory of the platform determines how much of what you build now remains

1256
01:21:58,420 --> 01:22:04,980
relevant in three years the 2026 wave one release for power pages is not a cosmetic update server

1257
01:22:04,980 --> 01:22:10,340
logic is expanding in scope and capability ALM tooling is maturing and the monitor hub gives teams

1258
01:22:10,340 --> 01:22:16,340
visibility into portal performance that previously required custom instrumentation each of those changes

1259
01:22:16,340 --> 01:22:21,060
directly strengthens the architecture described across the previous sections better server logic

1260
01:22:21,060 --> 01:22:26,020
means more business logic can run close to dataverse without requiring an azure function as an intermediary

1261
01:22:26,740 --> 01:22:31,620
better ALM means the security roles table permission configurations and partition key decisions

1262
01:22:31,620 --> 01:22:36,980
that form the core of this architecture can move through environments with the same rigor as code

1263
01:22:36,980 --> 01:22:41,300
better monitoring means hot partitions and throttled requests surface faster before they become

1264
01:22:41,300 --> 01:22:45,700
production incidents Microsoft's strategic direction is legible in those investments more

1265
01:22:45,700 --> 01:22:50,740
capability is moving server side more platform intelligence is moving closer to dataverse the distance

1266
01:22:50,740 --> 01:22:55,060
between where the low code platform operates and where enterprise grade infrastructure operates

1267
01:22:55,060 --> 01:22:59,940
is shrinking deliberately this architecture is positioned exactly where that convergence is heading

1268
01:22:59,940 --> 01:23:04,020
the AI dimension is arriving in a form that's practically relevant rather than aspirational

1269
01:23:04,020 --> 01:23:09,460
copilot integration with power pages is developing awareness of web roles table permissions and web

1270
01:23:09,460 --> 01:23:13,940
API patterns the configuration surfaces that determine whether this architecture is implemented

1271
01:23:13,940 --> 01:23:18,900
correctly AI-assisted configuration won't replace the architectural decisions described in this

1272
01:23:18,900 --> 01:23:23,700
episode choosing a partition key correctly structuring a business unit hierarchy to match a tenant

1273
01:23:23,700 --> 01:23:28,420
model deciding which data belongs in json and which belongs in typed columns those remain judgment

1274
01:23:28,420 --> 01:23:33,220
calls that require understanding the workload what AI assistance does is lower the barrier to

1275
01:23:33,220 --> 01:23:37,540
implementing the configuration that executes those decisions correctly teams who understand

1276
01:23:37,540 --> 01:23:42,180
the architecture but struggle with the configuration details get faster to a working implementation

1277
01:23:42,180 --> 01:23:46,180
that changes who can build this even if it doesn't change what they're building elastic tables are

1278
01:23:46,180 --> 01:23:50,180
fully generally available the preview risk that might have justified hesitation two years ago is

1279
01:23:50,180 --> 01:23:54,420
gone production deployments are running the storage pricing model shifted to log capacity

1280
01:23:54,420 --> 01:23:58,820
billing the partitioning capabilities are stable and ExecuteCosmosSqlQuery is documented

1281
01:23:58,820 --> 01:24:03,380
and supported the platform is ready for the workload this architecture targets the honest limitation

1282
01:24:03,380 --> 01:24:08,500
belongs in this conversation too this architecture is not the right answer for every SaaS product if

1283
01:24:08,500 --> 01:24:13,700
your data model is deeply relational complex joins across many entities hierarchical rollups

1284
01:24:13,700 --> 01:24:18,820
transactional workflows that span multiple tables and require ACID guarantees across all of them

1285
01:24:18,820 --> 01:24:24,660
the SQL-backed standard table model is still the correct foundation if your analytical workloads

1286
01:24:24,660 --> 01:24:29,860
require OLAP query patterns over large historical data sets a dedicated data warehouse remains the

1287
01:24:29,860 --> 01:24:34,020
right tool if your product requires infrastructure level customization that the power platform simply

1288
01:24:34,020 --> 01:24:38,580
doesn't expose custom networking container orchestration workload isolation below the platform

1289
01:24:38,580 --> 01:24:43,460
abstraction a custom Azure stack is still the right choice platform constraints in exchange for

1290
01:24:43,460 --> 01:24:48,740
managed security compliance and integration is a trade that makes sense for specific product profiles

1291
01:24:48,740 --> 01:24:53,780
not universally the opportunity this architecture opens is specific for SaaS products built on

1292
01:24:53,780 --> 01:25:01,540
Microsoft ecosystem data dynamics 365 integration Microsoft 365 identity power platform automation

1293
01:25:01,540 --> 01:25:07,140
as a first class feature the ceiling that most architects assumed existed was never a platform limit

1294
01:25:07,140 --> 01:25:11,940
it was a design assumption the tools to build past it have been available most teams just hadn't

1295
01:25:11,940 --> 01:25:16,980
assembled them in the right order power pages with elastic tables and hierarchical partition keys

1296
01:25:16,980 --> 01:25:21,860
is a legitimate enterprise SaaS engine the ceiling was always a design assumption not a platform

1297
01:25:21,860 --> 01:25:27,380
constraint here's where to start identify one high volume table in your current dataverse solution

1298
01:25:27,380 --> 01:25:33,220
activity logs audit records telemetry clickstream events and evaluate whether it belongs in an

1299
01:25:33,220 --> 01:25:38,740
elastic table with a tenant aware partition key that single decision made correctly is the

1300
01:25:38,740 --> 01:25:43,300
beginning of this architecture if this changed how you think about the power platform's architectural

1301
01:25:43,300 --> 01:25:49,780
range subscribe to the m365 fm podcast every episode operates at this level of technical depth

1302
01:25:49,780 --> 01:25:54,980
across the Microsoft ecosystem connect with me Mirko Peters on LinkedIn share what you're building

1303
01:25:54,980 --> 01:26:00,020
and help shape what comes next and if this episode delivered value leave a review it helps more

1304
01:26:00,020 --> 01:26:04,020
architects find the show
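
The request-budget and storage figures quoted in the episode can be worked through as a rough back-of-the-envelope sketch. The daily pool of 400,000 requests, the eight queries per page load, the 100 users, and the two monthly storage prices come from the episode. The number of page loads per user per day and the number of queries that still reach Dataverse after caching are assumptions made here for illustration.

```python
# Figures quoted in the episode.
DAILY_POOL = 400_000          # rough daily request pool for 1,000 authenticated users
QUERIES_PER_LOAD = 8          # uncached queries on one poorly designed dashboard load
USERS = 100                   # users accessing that dashboard across a workday
ELASTIC_MONTHLY_USD = 10_000  # approx. 1 TB at log capacity pricing
STANDARD_MONTHLY_USD = 40_000 # episode says "over $40,000", so this is a lower bound

# Assumptions for illustration; neither is stated in the episode.
LOADS_PER_USER_PER_DAY = 50
QUERIES_PER_LOAD_WITH_CACHE = 2


def daily_requests(users: int, loads_per_user: int, queries_per_load: int) -> int:
    """Requests one page consumes per day for the given usage pattern."""
    return users * loads_per_user * queries_per_load


def pool_share(requests: int, pool: int = DAILY_POOL) -> float:
    """Fraction of the daily request pool consumed."""
    return requests / pool


uncached = daily_requests(USERS, LOADS_PER_USER_PER_DAY, QUERIES_PER_LOAD)
cached = daily_requests(USERS, LOADS_PER_USER_PER_DAY, QUERIES_PER_LOAD_WITH_CACHE)
storage_gap = STANDARD_MONTHLY_USD - ELASTIC_MONTHLY_USD

print(f"uncached: {uncached} requests, {pool_share(uncached):.1%} of the pool")
print(f"cached:   {cached} requests, {pool_share(cached):.1%} of the pool")
print(f"storage gap: at least ${storage_gap:,} per month")
```

Under these assumptions a single dashboard used by a tenth of the user base takes a tenth of the daily pool, and caching most of its queries cuts that to a quarter of the original share. Different usage assumptions shift the numbers, but the shape of the calculation stays the same.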

Mirko Peters

Founder of m365.fm, m365.show and m365con.net

Mirko Peters is a Microsoft 365 expert, content creator, and founder of m365.fm, a platform dedicated to sharing practical insights on modern workplace technologies. His work focuses on Microsoft 365 governance, security, collaboration, and real-world implementation strategies.

Through his podcast and written content, Mirko provides hands-on guidance for IT professionals, architects, and business leaders navigating the complexities of Microsoft 365. He is known for translating complex topics into clear, actionable advice, often highlighting common mistakes and overlooked risks in real-world environments.

With a strong emphasis on community contribution and knowledge sharing, Mirko is actively building a platform that connects experts, shares experiences, and helps organizations get the most out of their Microsoft 365 investments.