The Pro-Code Edge: Architecting Copilot Plugins with Azure Functions for Developers


This episode of The Pro-Code Edge explores how developers can extend Microsoft 365 Copilot with custom plugins powered by Azure Functions. The discussion focuses on moving beyond out-of-the-box capabilities to create tailored enterprise solutions that connect Copilot with business systems, APIs, and proprietary data.
Azure Functions are presented as an ideal platform for Copilot extensibility due to their serverless nature, scalability, and cost efficiency. By exposing business logic through secure APIs, developers can enable Copilot to retrieve information, execute processes, and interact with external applications using natural language.
The hosts emphasize that successful Copilot plugins require strong architectural foundations. Key considerations include authentication with Microsoft Entra ID, authorization, security, monitoring, error handling, and governance. Enterprise-grade solutions must be designed with reliability, maintainability, and compliance in mind from the beginning.
The episode also highlights the value of a pro-code approach. While low-code tools are useful for simpler scenarios, developers can use Azure Functions to implement advanced logic, integrate complex systems, and deliver highly customized user experiences. This flexibility allows organizations to unlock business-specific use cases that standard Copilot capabilities cannot address.
Additional insights cover API design, scalability, observability, and operational best practices. The conversation demonstrates how thoughtful architecture ensures that plugins remain secure, performant, and adaptable as business requirements evolve.
Overall, the episode shows how combining Microsoft 365 Copilot with Azure Functions enables organizations to create intelligent, conversational experiences that streamline workflows, surface critical information, and connect users directly with enterprise systems through AI-driven interactions.
Imagine you need to automate a business process in your organization. You try a low-code platform but hit a wall when you need advanced customization, strong security, and complex validation. Here’s why:
| Limitation | Description | Impact on Enterprises |
|---|---|---|
| Customization Constraints | Limits unique UI or algorithm development | 40% of managers face this challenge |
| Vendor Lock-In | Hard to migrate applications | 62% of IT leaders worry about this |
| Scalability Questions | Struggles with heavy loads or logic | Performance issues can arise |
| Shadow IT Risk | Apps built without IT oversight | 42% of managers see this as a problem |
| Security Blind Spots | Security gaps from lack of oversight | Increased enterprise risk |
You unlock the pro-code edge with Microsoft 365 Copilot Plugins powered by Azure Functions. This platform lets you combine the orchestration of Power Platform with the extensibility of pro-code, giving you the tools to build secure, scalable AI solutions that Microsoft Copilot can execute.
Key Takeaways
- Pro-code development offers advanced customization and control, allowing you to tailor solutions to your specific business needs.
- Using Microsoft 365 Copilot Plugins with Azure Functions enhances automation capabilities, enabling real-time data processing and integration.
- The Flex Consumption model in Azure Functions allows for efficient scaling, ensuring your plugins perform well under varying loads.
- Implementing strong security measures, such as HTTPS and Azure Key Vault, protects your data and maintains compliance with industry standards.
- Modular function design simplifies code management, making it easier to test, update, and scale your plugins effectively.
- Local testing with tools like Postman helps catch errors early, ensuring your plugins meet requirements before deployment.
- Regular monitoring and diagnostics with Azure Application Insights help you identify performance issues and improve reliability.
- Following a structured architecture checklist ensures you cover all essential aspects of plugin development, leading to successful enterprise automation.
Pro-Code Edge for Developers
Beyond Low-Code: Why Pro-Code Matters
You want to build solutions that go beyond the basics. Low-code platforms help you move fast, but they can limit your options when you need full control. The pro-code edge gives you the power to design, customize, and scale your automation exactly how you want.
Consider these differences:
- Customization and Control: You can create unique logic and interfaces that fit your business needs. Low-code tools offer speed, but pro-code lets you fine-tune every detail.
- Developmental Cost: Pro-code solutions may take more time and resources, but they deliver long-term flexibility. Low-code gets you started quickly, but complex needs often require deeper investment.
- Integration and Interfaces: You can connect to any system or build from scratch. Low-code works well with existing apps, but pro-code opens up more possibilities.
- Data and Infrastructure: You have full control over data flow and infrastructure. Low-code may need extra backend work for heavy data tasks.
With the pro-code edge, you can solve problems that low-code cannot address. You can orchestrate complex workflows, validate data across multiple systems, and ensure your solution meets enterprise standards.
Developer Empowerment with Copilot Plugins
You gain new capabilities when you use Copilot Plugins with Azure Functions. You can implement custom logic, automate specialized tasks, and connect to external systems in real time. This means you can build plugins that respond to natural language requests and deliver instant results.
Here is how Copilot Plugins empower you:
- You can create plugins that perform unique tasks, tailored to your organization.
- You can retrieve and process data from many sources, all within a secure environment.
- You can use administrator controls to manage which plugins are active, improving security and predictability.
- You can connect security tools and automate security programs, making your environment safer and more efficient.
To build robust Copilot Plugins, you need to blend low-code and pro-code tools. You also need to work with business users to understand their needs and translate them into technical solutions. By mastering these skills, you become the bridge between business goals and technical execution.
Enterprise Automation Use Cases
Copilot Plugins built with Azure Functions address a wide range of enterprise automation needs. You can use them in sales, marketing, HR, finance, IT, and operations. The table below shows common use cases:
| Business Function | Automation Use Cases |
|---|---|
| Sales | CRM insights, sales deck creation |
| Marketing | Campaign calendar, follow-up emails |
| HR | Onboarding, policy validation |
| Finance | Procurement approvals, risk reviews |
| IT | Helpdesk support, cross-system coordination |
| Operations | Document automation, process visibility |
Organizations see real results with Copilot Plugins. For example, Jamf reached over 70% adoption. Mercari cut IT tickets by 74%. Hearst resolved over 50% of support issues. These numbers show how the pro-code edge transforms business operations.
You can also boost efficiency with advanced orchestration and multi-system validation. Companies using multi-agent systems complete tasks 35% faster and need less human intervention. You can manage complex workflows and coordinate specialized AI agents, which gives you a clear advantage over low-code solutions.
By choosing the pro-code edge, you unlock the full potential of Microsoft Copilot Plugins and Azure Functions. You can build solutions that scale, adapt, and deliver real value to your organization.
Azure Functions Architecture

Serverless Design for Plugins
You can use Azure Functions to power your Copilot Plugins with a true serverless approach. This means you do not need to manage servers or worry about infrastructure. You write your code, deploy it, and let the platform handle the rest. Azure Functions scale automatically based on demand. If your plugin receives many requests, the system creates more instances to handle the load. When demand drops, it reduces resources to save costs.
This architecture gives you flexibility. You can focus on building business logic instead of maintaining hardware. Azure Functions work well for plugins that need to run code in response to events, such as HTTP requests or messages from other systems. You can connect your plugins to many services and APIs, making your solutions more powerful.
Azure Functions provide on-demand backend logic. You can use them for heavy calculations, secure data processing, and third-party API integrations. This makes them a strong choice for enterprise automation.
Flex Consumption Model
The Flex Consumption model changes how you use cloud resources. You can scale each function independently. This means you get better resource use and faster response times. You can adjust instance sizes and concurrency settings to match your workload. The 'always ready' feature keeps a set number of instances running, so your plugins can respond without waiting for a cold start.
Here are some important points about the Flex Consumption model:
- You can scale each function as needed, which improves performance.
- You can choose different instance sizes and set how many requests each instance handles.
- The 'always ready' feature reduces cold start delays by keeping instances active.
- Enabling VNet injection adds very little delay, only about 37 milliseconds at the 50th percentile.
- Lower concurrency means more instances, which helps your app scale quickly when needed.
You can use these features to build plugins that handle large workloads without slowdowns. The Flex Consumption model helps you deliver a smooth experience for users, even during busy times.
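The link between per-instance concurrency and instance count can be sketched with simple arithmetic. The snippet below is a simplified model, not the platform's actual scaling algorithm. The function name and the sample numbers are assumptions chosen for illustration.

```python
import math


def estimated_instances(concurrent_requests: int, per_instance_concurrency: int) -> int:
    """Rough lower bound on the instances needed to serve a load.

    Simplified model: assumes every instance handles exactly
    per_instance_concurrency requests at once. The real platform
    weighs more signals than this.
    """
    if per_instance_concurrency < 1:
        raise ValueError("per_instance_concurrency must be at least 1")
    if concurrent_requests <= 0:
        return 0
    return math.ceil(concurrent_requests / per_instance_concurrency)


# 200 in-flight requests: lower concurrency spreads them over more instances.
high_concurrency = estimated_instances(200, 16)  # 13 instances
low_concurrency = estimated_instances(200, 4)    # 50 instances
```

Under this model, lowering concurrency from 16 to 4 raises the estimate from 13 to 50 instances for the same load, which is why the setting affects both responsiveness and cost.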
OpenAPI and Integration Patterns
You can connect your Azure Functions to external systems using OpenAPI. OpenAPI describes your API in a standard way, so other tools and services can understand how to call your functions. This makes integration easier and more reliable.
Here is a table to help you decide when to use an asynchronous request-reply pattern, in which the client polls for a result instead of waiting for a callback:
| When to use this pattern | When not to use this pattern |
|---|---|
| You work with client-side code, like browser applications, where callback endpoints are difficult to provide. | You can use a service built for asynchronous notifications instead, like Azure Event Grid. |
| You call a service that uses only the HTTP protocol and the return service can't send callbacks due to firewall restrictions. | Responses must stream in real time to the client. Consider using Server-Sent Events (SSEs). |
| You integrate with workloads that don't support modern callback mechanisms like WebSockets or webhooks. | The client needs to collect many results, and the latency of those results is important. Consider using a message broker instead. |
| | Server-side persistent network connections like WebSockets or SignalR are available. |
| | The network design supports open ports to receive asynchronous callbacks or webhooks. |
You can also use Azure API Management to add a secure layer to your architecture. This lets you control access, monitor usage, and apply security rules. A common flow looks like this: Power Apps send requests through a custom connector, which goes to API Management, then to your Azure Function, and finally to the external system. This setup gives you a low-code front end with a pro-code backend.
You can use Azure Functions for scenarios that need secure server-side logic, heavy processing, or integration with third-party APIs. This approach helps you build scalable and secure plugins for your organization.
Plugin Development Setup
Prerequisites and Tools for Developers
Before you start building plugins, you need the right tools and environment. Install Visual Studio Code and add the Azure Functions extension. Make sure you have an active Azure subscription. You also need Azure Functions Core Tools. If you plan to use C#, install the C# extension. For Node.js development, install Node.js; for Python development, install Python 3.11 or higher. PowerShell 7.2 is useful for scripting tasks. You can use the Azurite storage emulator to simulate Azure Storage locally. This setup helps you test and debug your code before deploying it.
Tip: Azurite lets you develop and test storage features without connecting to the cloud. You can find more information in Microsoft’s official documentation.
Creating Azure Functions
You can create Azure Functions to handle different tasks in your plugin. Keep your functions small and focused. This approach improves scalability and makes maintenance easier. Small functions scale faster and reduce cold start times. You can update them without affecting other parts of your application.
Choosing Triggers
You must select the right trigger for your function. Common triggers include HTTP requests, timers, and messages from queues. For Copilot Plugins, HTTP triggers are popular because they allow direct communication with external systems. You can respond to user requests or integrate with APIs. Choose a trigger that matches your business scenario.
Structuring Function Apps
Organize your function apps for clarity and efficiency. Treat your infrastructure as code to ensure consistent deployments. Use folders to separate functions by purpose. Implement comprehensive logging to monitor activity and troubleshoot issues. Optimize execution time to improve performance and reduce costs. Proper error handling prevents unnecessary retries and keeps your plugin reliable.
| Best Practice | Benefit |
|---|---|
| Small, focused functions | Easier scaling and maintenance |
| Infrastructure as code | Consistent deployments |
| Logging | Better monitoring |
| Error handling | Fewer retries |
Local Testing Best Practices
You should test your functions locally before deploying them. Use Postman to check if your APIs work as expected. Verify that plugin metadata is available at http://localhost:7071/.well-known/ai-plugin.json. Make sure the OpenAPI document loads at http://localhost:7071/swagger.json or http://localhost:7071/openapi.yaml. Confirm that your application logo can be retrieved from http://localhost:7071/logo.png.
Note: Local testing helps you catch errors early and ensures your plugin meets requirements. You can iterate quickly and deliver a reliable solution.
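If you prefer a script over manual Postman requests, the same checks can be sketched in a few lines of Python. This is a minimal example that assumes your local host serves the paths listed above. The helper names `is_reachable` and `run_checks` are hypothetical.

```python
import urllib.error
import urllib.request

BASE_URL = "http://localhost:7071"  # local address used in the checks above

# Paths that must always respond.
REQUIRED_PATHS = ["/.well-known/ai-plugin.json", "/logo.png"]
# The OpenAPI document may be served at either of these paths.
OPENAPI_PATHS = ["/swagger.json", "/openapi.yaml"]


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def run_checks(base_url: str = BASE_URL, probe=is_reachable) -> dict:
    """Probe each endpoint and report which checks passed."""
    results = {path: probe(base_url + path) for path in REQUIRED_PATHS}
    # One working OpenAPI path is enough.
    results["openapi"] = any(probe(base_url + path) for path in OPENAPI_PATHS)
    return results
```

Calling `run_checks()` while your function app runs locally returns a dictionary of check names and pass/fail flags. The `probe` parameter lets you swap in a fake for unit tests, so the script itself can be verified without a running host.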
By following these steps, you set up a strong foundation for plugin development. You gain control over your environment and build plugins that scale and perform well.
Copilot Plugin Integration

Manifest and API Schema
You start building agents for Microsoft 365 Copilot by defining a manifest and an API schema. The manifest tells Copilot what your plugin can do. You use the Microsoft 365 Agents Toolkit to create a new agent or app. You select a declarative agent, add an action, and start with a new API. For authentication, you choose OAuth. You pick TypeScript as your language and set up your project directory and name.
The API schema uses OpenAPI to describe your endpoints. This schema acts as a contract between your plugin and Copilot. It lists the endpoints, the parameters they accept, and the values they return. This setup helps Copilot understand how to interact with your plugin. You ensure that your AI-powered apps follow a clear structure, making integration smooth and reliable.
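To make the idea of a contract concrete, here is a minimal sketch of an OpenAPI 3.0 document for a single endpoint, built as a Python dictionary and serialized to JSON. The path `/api/orders/{orderId}`, the operation ID `getOrderStatus`, and the response fields are hypothetical; your own schema will list your real endpoints.

```python
import json

# Hypothetical contract for one endpoint; every name here is illustrative.
OPENAPI_DOC = {
    "openapi": "3.0.1",
    "info": {"title": "Order Status API", "version": "1.0.0"},
    "paths": {
        "/api/orders/{orderId}": {
            "get": {
                "operationId": "getOrderStatus",
                "summary": "Returns the status of a single order.",
                "parameters": [
                    {
                        "name": "orderId",
                        "in": "path",
                        "required": True,
                        "schema": {"type": "string"},
                    }
                ],
                "responses": {
                    "200": {
                        "description": "The order was found.",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "type": "object",
                                    "properties": {
                                        "orderId": {"type": "string"},
                                        "status": {"type": "string"},
                                    },
                                }
                            }
                        },
                    }
                },
            }
        }
    },
}

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(doc: dict) -> list:
    """Return (method, path, operationId) for every operation in the document."""
    operations = []
    for path, item in doc.get("paths", {}).items():
        for method, operation in item.items():
            if method in HTTP_METHODS:
                operations.append((method.upper(), path, operation.get("operationId", "")))
    return operations


# Serialize the document so it can be served as a JSON file.
openapi_json = json.dumps(OPENAPI_DOC, indent=2)
```

Clear summaries and operation IDs matter here because they are the text a caller reads to decide which endpoint matches a request.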
Connecting Azure Functions
You connect Azure Functions to your plugin by following a few steps. You create a project in Azure AI Foundry and select Agents. You customize your agent by assigning an ID and writing instructions. You test your agent in the playground to see how it responds. You publish your Azure Function using sample code from GitHub. You update the connection string to link your agent to your function.
You set up role-based access control (RBAC) by enabling Managed Identity and granting the right roles. You build a flow in Power Platform to connect your Copilot Studio solution to your Azure Function App. You create your Copilot Studio agent and configure it to call the flow you built. This process lets you use cloud-based AI solutions that scale and stay secure.
Tip: Use the lifecycle section in the toolkit sidebar to provision and deploy your resources. You can customize your resource group and region for better control.
Handling Requests and Responses
When a user enters a prompt in Microsoft 365 Copilot, the platform preprocesses the input. It may access Microsoft Graph or other platforms if enabled. If web-grounding is on, Copilot gathers information from the Bing Index. Copilot sends the grounded prompt to the LLM, which generates a contextually relevant response. The response returns to the app and the user. Both the prompt and results are logged for admin review.
You keep all data encrypted in transit. Administrators control how Copilot interacts with the Bing Index. Data privacy and security commitments apply to all web-grounding interactions. This approach ensures that your apps stay reliable and secure.
You follow the Copilot extensibility roadmap to keep your plugins up to date. You use the toolkit to manage your functions and monitor performance. By connecting Microsoft Copilot to Azure Functions, you unlock new possibilities for building agents and delivering value to your organization.
Data and Security for Pro-Code Developers
State Management
You need to manage state carefully when building Copilot Plugins with Azure Functions. Stateless design works best for most scenarios. Each function should process requests independently. This approach improves scalability and reliability. When you must store state, use secure and scalable services like Azure Table Storage, Cosmos DB, or Azure SQL Database. These services help you persist data between function calls. You can also use Azure Cache for Redis for temporary state or session data.
Tip: Always encrypt sensitive data before storing it. Use managed identities to access storage securely without hard-coded credentials.
You should avoid storing secrets or user data in function code or environment variables. Instead, use Azure Key Vault to manage secrets and connection strings. This practice reduces risk and simplifies secret rotation.
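A stateless handler can be sketched by passing every piece of state through an external store instead of module-level variables. In the example below, `StateStore`, `InMemoryStore`, and `record_step` are hypothetical names, and the in-memory class is only a local stand-in for a real service such as the ones listed above.

```python
from typing import Optional, Protocol


class StateStore(Protocol):
    """Minimal interface an external store would need to provide."""

    def get(self, key: str) -> Optional[dict]: ...

    def put(self, key: str, value: dict) -> None: ...


class InMemoryStore:
    """Local stand-in for a real store; suitable for tests only."""

    def __init__(self) -> None:
        self._items = {}

    def get(self, key: str) -> Optional[dict]:
        return self._items.get(key)

    def put(self, key: str, value: dict) -> None:
        self._items[key] = value


def record_step(store: StateStore, conversation_id: str, step: str) -> dict:
    """Append a step to a conversation's history.

    The function keeps nothing between calls, so any instance
    can serve any request for the same conversation.
    """
    state = store.get(conversation_id) or {"steps": []}
    updated = {"steps": [*state["steps"], step]}
    store.put(conversation_id, updated)
    return updated
```

Because the handler depends only on the store interface, you can test it locally with the in-memory class and swap in a durable store for deployment.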
Securing Data and Endpoints
You must protect your data and endpoints from unauthorized access. Azure Functions provides several built-in security features. Follow these best practices:
- Require HTTPS for all connections to ensure data stays encrypted in transit.
- Use access keys to restrict who can call your function endpoints.
- Enable App Service Authentication/Authorization to verify client identity.
- Deploy your function app inside a virtual network for extra protection.
- Utilize Azure API Management to authenticate and control requests.
- Disable administrative endpoints to prevent unwanted access.
You can strengthen your security posture by following these steps:
- Implement positive authentication for every client that accesses your functions.
- Use Defender for Cloud to assess security and enable advanced protection.
- Monitor and log all activities with Application Insights and Azure Monitor Logs.
Managed identities help you connect to other Azure services securely. Private endpoints keep your traffic inside your network. API Management adds a layer of control and visibility. These features give pro-code developers the tools to build secure and compliant solutions.
Compliance and Privacy
You must meet strict compliance and privacy standards when integrating Copilot Plugins with enterprise systems. Different industries require different controls. The table below summarizes key standards and requirements:
| Compliance Standard | Key Requirements |
|---|---|
| GDPR | Control access, document consent, manage cross-border data transfers. |
| HIPAA | Validate outputs, protect patient data, maintain audit trails. |
| SOX | Map financial records to approved logs for integrity. |
| CCPA | Provide opt-out and deletion options for consumer data. |
You should document how your plugin handles data. Always validate outputs before sharing sensitive information. Maintain audit trails for all actions. These steps help you meet legal requirements and build trust with users.
Note: Regularly review your compliance posture. Update your processes as regulations change to stay ahead of new requirements.
Performance and Scalability in Azure
Cold Start Mitigation
When you deploy Azure Functions for Copilot Plugins, you want your solutions to respond quickly every time. Cold starts can slow down your plugin, especially if your function has not run for a while. You can use several strategies to reduce cold start delays:
- Choose a hosting plan such as Premium or Dedicated (App Service). The Premium plan keeps prewarmed instances available, and a Dedicated plan can keep the app loaded with the Always On setting.
- Optimize your function code. Keep initialization logic light and avoid heavy processing during startup.
- Set up warm-up routines. Schedule periodic requests to your function so it stays active.
- Move to the Flex Consumption plan. This plan offers always-ready instances and lets you adjust settings for faster responses.
Tip: You can schedule a timer-triggered function or use an external service to ping your function at regular intervals. This keeps your function from going idle and helps maintain fast response times.
Monitoring and Diagnostics
You need to monitor your Azure Functions to ensure they perform well and stay reliable. Azure Monitor's Application Insights gives you a unified view of your Copilot Plugin deployments. With Application Insights, you can:
- Track performance and spot slowdowns.
- Troubleshoot errors using real-time logs and traces.
- View dashboards that show key metrics like latency, error rates, and token usage.
You can also use the Azure portal to diagnose issues. Open your Flex Consumption app, select "Diagnose and solve problems," and search for deployment details. This helps you find and fix problems quickly.
Note: Monitoring tools help you catch issues before they affect users. You can use built-in dashboards to keep an eye on your plugins and make data-driven improvements.
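Metrics such as latency and error rate are only available if your code emits them. The following sketch wraps an operation in a decorator that logs its duration and outcome through Python's standard `logging` module. It assumes your hosting environment forwards standard log output to your monitoring backend, and the names `timed` and `lookup_order` are hypothetical.

```python
import functools
import logging
import time

logger = logging.getLogger("plugin.telemetry")


def timed(operation: str):
    """Decorator that logs the duration and outcome of each call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            outcome = "success"
            try:
                return func(*args, **kwargs)
            except Exception:
                outcome = "error"
                raise  # the caller still sees the original exception
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                logger.info(
                    "operation=%s outcome=%s duration_ms=%.1f",
                    operation, outcome, elapsed_ms,
                )

        return wrapper

    return decorator


@timed("lookup_order")
def lookup_order(order_id: str) -> dict:
    """Hypothetical operation used to demonstrate the decorator."""
    if not order_id:
        raise ValueError("order_id is required")
    return {"orderId": order_id, "status": "shipped"}
```

Using consistent key=value fields in the log message makes it easier to query and chart the entries later.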
Scaling for Enterprise Workloads
Azure Functions scale automatically to handle large workloads. You can design your plugins with a decoupled architecture, which makes it easier to add or remove components as needed. Azure Functions support both real-time and asynchronous processing, so you can handle many requests at once.
You expose secure APIs through Azure Functions, which helps protect your data. You can add resilient error handling, so your plugins respond with fallback messages if something goes wrong. Enterprise-ready monitoring and logging let you track activity and spot trends.
When you build plugins that scale, you boost productivity across your organization. Your solutions can handle busy periods without slowing down, and you can focus on delivering value instead of managing infrastructure.
Advanced Patterns and Best Practices
Modular Function Design
You can make your Copilot Plugins easier to manage by using modular function design. This approach breaks your code into small, focused parts. Each function does one job well. When you use modular design, you help Copilot generate cleaner and more predictable code. You also make your plugins easier to test and update.
Here is a table that shows the main benefits of modular function design:
| Benefit | Description |
|---|---|
| Improved Code Predictability | A structured architecture allows Copilot to generate cleaner and more predictable code. |
| Reduced Complexity | Hiding complexity behind stable abstractions simplifies the coding environment for AI. |
| Better Alignment | Establishing conventions helps Copilot follow rules and recognize patterns effectively. |
You should group related logic together and use clear naming for your functions. This helps you and your team understand the code. When you follow these patterns, you make your plugins more reliable and easier to scale.
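As an illustration of this structure, the sketch below splits one request into three small steps, each of which can be tested on its own. The policy data and all function names are hypothetical, and the in-memory dictionary stands in for a real back-end system.

```python
from typing import Optional

POLICIES = {  # hypothetical data standing in for a real system
    "travel": "Book economy class for flights under six hours.",
    "expenses": "Submit receipts within 30 days.",
}


def parse_topic(params: dict) -> str:
    """Extract and normalize the requested topic."""
    return str(params.get("topic", "")).strip().lower()


def find_policy(topic: str) -> Optional[str]:
    """Look up the policy text, or None when the topic is unknown."""
    return POLICIES.get(topic)


def build_response(topic: str, policy: Optional[str]) -> tuple:
    """Turn a lookup result into an HTTP-style (status, body) pair."""
    if policy is None:
        return 404, {"error": f"No policy found for '{topic}'."}
    return 200, {"topic": topic, "policy": policy}


def handle_policy_request(params: dict) -> tuple:
    """Entry point that composes the three steps."""
    topic = parse_topic(params)
    return build_response(topic, find_policy(topic))
```

If the data source changes later, only `find_policy` needs to change; parsing and response formatting stay untouched.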
Error Handling and Resilience
Building resilient plugins means planning for things that can go wrong. You need to handle errors in a way that keeps your plugin running smoothly. Good error handling helps you recover from problems and gives users helpful feedback.
The table below lists important strategies for error handling and resilience:
| Strategy Type | Description |
|---|---|
| Timeout Handling | Manage scenarios where external models take longer than expected, implementing user options for retries or skips. |
| API Failure Recovery | Handle various API errors (e.g., 401, 500) with defined recovery steps. |
| Partial Success Strategies | Allow workflows to continue even if some agents fail, ensuring overall process resilience. |
| User Cancellation | Implement graceful handling of user-initiated cancellations, preserving partial results. |
| Missing Tools | Address scenarios where required tools are not installed, providing clear error messages. |
| Out of Credits | Manage payment or quota errors effectively to inform users. |
| Retry Strategies | Use techniques like exponential backoff to manage retries for failed operations. |
You should always log errors and let users know what happened. If an API call fails, try again or offer a way to skip the step. When users cancel a task, save any progress made. These steps help your plugin stay strong, even when things do not go as planned.
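The retry and fallback ideas above can be sketched as follows. This is a minimal example of exponential backoff with jitter, not a production-ready policy. `TransientError`, `call_with_backoff`, and `with_fallback` are hypothetical names, and the default attempt count and delay are assumptions.

```python
import random
import time


class TransientError(Exception):
    """Raised by an operation when retrying might help (for example, HTTP 500)."""


def call_with_backoff(operation, max_attempts: int = 4, base_delay: float = 0.5,
                      sleep=time.sleep):
    """Call operation, retrying transient failures with exponential backoff.

    The delay doubles after each failed attempt (base_delay, 2x, 4x, ...)
    plus a small random jitter. The last failure is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 10))


def with_fallback(operation, fallback_message: str, **retry_options) -> dict:
    """Return the operation's result, or a fallback message if retries run out."""
    try:
        return {"ok": True, "result": call_with_backoff(operation, **retry_options)}
    except TransientError:
        return {"ok": False, "message": fallback_message}
```

Only errors marked as transient are retried. Failures such as a 401 should surface immediately, because repeating the same unauthorized call will not succeed.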
Versioning and Deployment
You need to manage versions and deployments carefully to keep your Copilot Plugins stable.
Here are some best practices for versioning and deployment:
- Review all AI-generated responses for correctness and applicability.
- Never save application secrets or credentials in source code.
- Use good prompt engineering techniques for effective results.
When you create prompts, follow these steps:
- Be clear and specific.
- Set expectations.
- Add context about your scenario.
- Break down your requests.
- Customize your code.
- Use Azure terminology.
- Use the feedback loop.
By following these patterns and best practices, you build plugins that are easier to maintain, more reliable, and ready for enterprise use.
Practical Examples for Developers
Sample Azure Function Plugin
You can build a sample Azure Function Plugin to automate tasks and connect your systems. The table below shows the key features you should consider when designing your plugin:
| Feature | Description |
|---|---|
| Automated Event Response | Enables automated responses to events, enhancing productivity through webhooks and alerts. |
| Trigger-based executions | Allows for automation of tasks, reducing the need for manual intervention by administrators. |
| Platform Agnostic | Can trigger events across different cloud services and run on various platforms, including containers. |
| Support for Major Languages | Supports popular programming languages, providing a comprehensive toolset for developers. |
| Integration with Azure Services | Natively integrates with many Azure services, enhancing functionality and ease of use. |
| Integration with 3rd Party Apps | Commonly integrates with external applications for notifications and data management. |
You can use these features to create plugins that fit your business needs. For example, you might set up a function that sends alerts when a new file appears in storage or updates records in a database when a form is submitted.
End-to-End Integration Example
You can see the value of Copilot Plugins with Azure Functions by following a real-world integration example. Here is how the process works:
- You reduce time spent on repetitive coding tasks by automating them with plugins.
- You focus on connecting agents to the right logic, user interface, and data sources.
- You use Copilot Plugins as supportive tools that enhance coding efficiency and reliability in cloud environments.
For instance, Copilot can help a hotel booking team retrieve real-time hotel availability from an Azure Storage Table or a third-party API. A support team can access customer data directly from an external CRM system through Copilot in Teams. This integration allows you to interact with external data sources in real time, which boosts productivity and ensures your apps deliver timely information.
Note: Seamless integration with Azure Functions means you can connect to many systems and automate complex workflows without extra overhead.
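The hotel availability scenario might look like the sketch below. The sample rows, the `RoomOffer` type, and the `find_availability` function are all hypothetical. The in-memory list stands in for a storage table or third-party API so that the lookup logic can run on its own.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RoomOffer:
    hotel: str
    city: str
    available_from: date
    available_to: date
    rooms_left: int


# Hypothetical sample rows standing in for a storage table or external API.
OFFERS = [
    RoomOffer("Harbor Inn", "Lisbon", date(2025, 6, 1), date(2025, 6, 30), 3),
    RoomOffer("Old Town Suites", "Lisbon", date(2025, 6, 10), date(2025, 6, 20), 0),
    RoomOffer("Alpine Lodge", "Zurich", date(2025, 6, 1), date(2025, 6, 30), 5),
]


def find_availability(city: str, check_in: date, check_out: date, offers=OFFERS) -> list:
    """Return hotels in the city with rooms free for the whole stay."""
    if check_out <= check_in:
        raise ValueError("check_out must be after check_in")
    return [
        {"hotel": offer.hotel, "roomsLeft": offer.rooms_left}
        for offer in offers
        if offer.city.lower() == city.lower()
        and offer.rooms_left > 0
        and offer.available_from <= check_in
        and check_out <= offer.available_to
    ]
```

In a real plugin, an HTTP-triggered function would parse the city and dates from the request, call a function like this against live data, and return the list as JSON.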
Debugging and Troubleshooting
You need strong debugging and troubleshooting skills to keep your plugins running smoothly. GitHub Copilot in Visual Studio can help you resolve issues faster and understand your codebase better. It gives you insights for fixing syntax errors, refactoring code, and troubleshooting unexpected behavior.
Common challenges include:
- Debugging agents that do not return expected results.
- No debug card returned if the orchestrator does not require Microsoft 365 data.
- Debug cards not returned due to capacity throttling.
You can use these techniques to improve your debugging process:
- Use GitHub Copilot for debugging applications.
- Analyze exceptions with AI assistance.
- Debug unit tests and multithreaded code with AI tools.
- Inspect exceptions using repository context.
- Troubleshoot breakpoints with AI assistance.
- Use conditional breakpoints and tracepoints.
Tip: Regular debugging and monitoring help you catch problems early and keep your plugins reliable.
Common Pitfalls and Solutions
Security Misconfigurations
You may face security misconfigurations when building Copilot Plugins with Azure Functions. These mistakes can expose your data or allow unauthorized access. One common issue is forgetting to require HTTPS. If you do not enforce HTTPS, attackers can intercept sensitive information. Another risk comes from leaving function keys or secrets in your code. This practice makes it easy for others to find and misuse your credentials.
You should always use managed identities to connect to other Azure services. This method removes the need for hard-coded secrets. You can also use Azure Key Vault to store secrets safely. Make sure you enable authentication and authorization for every endpoint. You should check your network settings and use private endpoints when possible. These steps help you protect your data and keep your plugins secure.
Tip: Review your security settings often. Use tools like Defender for Cloud to scan for risks and get alerts.
Performance Bottlenecks
Performance bottlenecks can slow down your plugins and frustrate users. You might see delays if your functions take too long to start or process requests. Large dependencies or heavy initialization code can cause cold starts. If you do not optimize your code, your plugin may use more resources than needed.
You can reduce cold starts by choosing the Flex Consumption plan or using Premium hosting. Keep your functions small and focused. Avoid loading large libraries unless you need them. Monitor your plugins with Application Insights to spot slowdowns. You should also test your plugins under different loads to see how they perform.
Note: Schedule regular warm-up requests to keep your functions ready for action.
Integration Challenges
You may run into integration challenges when connecting your plugins to other systems. Sometimes, APIs change or become unavailable. If you do not handle errors, your plugin may fail without warning. Data formats can also cause problems if they do not match what your function expects.
You should use clear error handling and retry logic. Always validate the data you receive before processing it. Use OpenAPI specifications to describe your endpoints. This practice helps other systems understand how to call your functions. Test your integrations often to catch issues early.
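Validating incoming data before processing can be as simple as the sketch below. The ticket payload and its fields (`title`, `priority`, `attachments`) are hypothetical, as is the function name. The point is to collect every problem and report them together rather than failing on the first one.

```python
def validate_ticket(payload) -> list:
    """Return a list of problems; an empty list means the payload is valid."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]

    errors = []

    title = payload.get("title")
    if not isinstance(title, str) or not title.strip():
        errors.append("title must be a non-empty string")

    priority = payload.get("priority")
    if priority not in ("low", "medium", "high"):
        errors.append("priority must be one of: low, medium, high")

    # bool is a subclass of int in Python, so reject it explicitly.
    attachments = payload.get("attachments", 0)
    if isinstance(attachments, bool) or not isinstance(attachments, int) or attachments < 0:
        errors.append("attachments must be a non-negative integer")

    return errors
```

A handler can return the error list with a 400 status, which gives the caller enough detail to correct the request in one round trip.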
You can find many solutions by following best practices and using the right tools. When you plan ahead and monitor your plugins, you build reliable and secure automation for your organization.
Actionable Recommendations
Architecture Checklist
You need a clear checklist to build Copilot Plugins with Azure Functions. This checklist helps you create a strong architecture that supports enterprise automation. Review each item before you start your project.
- Define your business goals and automation needs.
- Choose the right triggers for your Azure Functions, such as HTTP or timer.
- Structure your function apps for modularity and easy maintenance.
- Use managed identities for secure access to Azure services.
- Store secrets in Azure Key Vault, not in your code.
- Require HTTPS for all endpoints.
- Set up logging and monitoring with Application Insights.
- Use Azure API Management to control and secure API access.
- Test your functions locally and in the cloud.
- Document your OpenAPI schema for clear integration.
- Review compliance requirements for your industry.
- Plan for scaling and cold start mitigation.
- Implement error handling and retry strategies.
- Use versioning for your plugins and APIs.
Tip: You can use this checklist as your technical architecture guide. It helps you avoid common mistakes and ensures your plugins meet enterprise standards.
The checklist gives you a step-by-step path. You can check off each item as you build your solution. This process helps you stay organized and deliver a reliable plugin.
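The checklist item about documenting your OpenAPI schema can be made concrete with a small sketch. It builds a minimal OpenAPI 2.0 (Swagger) document as a Python dictionary and checks that every operation carries an `operationId` and a `description`. The host, path, and operation names are hypothetical, and the check is only an illustration of the idea, not an official validator.

```python
import json

# Hypothetical host, path, and operation names, chosen only for illustration.
SPEC = {
    "swagger": "2.0",
    "info": {"title": "Invoice Tools", "version": "1.0.0"},
    "host": "invoices.contoso.example",
    "basePath": "/api",
    "schemes": ["https"],
    "paths": {
        "/invoices/{invoiceId}": {
            "get": {
                "operationId": "GetInvoiceStatus",
                "summary": "Get the status of an invoice",
                "description": (
                    "Returns the payment status of one invoice. Use this when "
                    "the user asks whether a specific invoice has been paid."
                ),
                "produces": ["application/json"],
                "parameters": [
                    {
                        "name": "invoiceId",
                        "in": "path",
                        "required": True,
                        "type": "string",
                        "description": "The invoice number, for example INV-1001.",
                    }
                ],
                "responses": {
                    "200": {
                        "description": "The invoice status.",
                        "schema": {
                            "type": "object",
                            "properties": {"status": {"type": "string"}},
                        },
                    }
                },
            }
        }
    },
}

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def operations_missing_text(document: dict) -> list:
    """List 'METHOD path' entries that lack an operationId or a description."""
    missing = []
    for path, item in document.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            if not operation.get("operationId") or not operation.get("description"):
                missing.append(f"{method.upper()} {path}")
    return missing


if __name__ == "__main__":
    print(json.dumps(SPEC, indent=2))
    print(operations_missing_text(SPEC))
```

A check like this could run in your build so that an endpoint without a description never reaches Copilot.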
Further Learning Resources
You can find many resources to help you master Copilot Plugins and Azure Functions. Explore these links and tools to deepen your knowledge.
| Resource Type | Description | Link |
|---|---|---|
| Microsoft Docs | Official documentation for Azure Functions | Azure Functions Documentation |
| GitHub Samples | Example code and plugin templates | Copilot Plugins Samples |
| Tutorials | Step-by-step guides for plugin development | Copilot Studio Tutorials |
| Community Forums | Ask questions and share experiences | Microsoft Q&A |
| Architecture Guides | Best practices for enterprise solutions | Azure Architecture Center |
Note: You can join the Microsoft developer community to stay updated. You will find webinars, blogs, and events that cover new features and best practices.
You can use these resources to solve problems, learn new skills, and connect with other developers. Keep exploring and improving your plugins. You will build solutions that scale and deliver real value.
You gain the pro-code edge when you use Azure Functions and Copilot Plugins for enterprise automation. Microsoft 365 Copilot lets you build AI-powered apps that boost productivity. You can create apps with advanced orchestration and secure integration. Pro-code developers use the toolkit and technical architecture guide to streamline development. Microsoft Copilot supports building agents that connect to Azure services. You unlock new possibilities for your apps and increase productivity. Start your journey with Microsoft 365 Copilot and explore Microsoft documentation, GitHub, and Azure resources.
You can transform your development process and deliver solutions that scale.
FAQ
What are Copilot Plugins?
Copilot Plugins are extensions you build to add new features to Microsoft 365 Copilot. You use them to connect Copilot to external data, APIs, or custom logic. These plugins help you automate tasks and solve complex business problems.
Why should I use Azure Functions with Copilot Plugins?
Azure Functions let you run your code in the cloud without managing servers. You can scale your plugins easily and keep them secure. This setup gives you flexibility and power for enterprise automation.
How do I secure my Copilot Plugin endpoints?
You secure your endpoints by requiring HTTPS, enabling authentication with Microsoft Entra ID, using managed identities for calls to other Azure services, and storing secrets in Azure Key Vault. You can also use Azure API Management to control access and monitor usage.
Can I test my plugins locally before deploying?
Yes, you can test your plugins on your computer. Run your functions locally with Azure Functions Core Tools, then use tools like Postman and Azurite to check your APIs and storage. Local testing helps you find and fix errors before you deploy to Azure.
What programming languages can I use for Azure Functions?
You can write Azure Functions in languages including C#, JavaScript, TypeScript, Python, PowerShell, and Java. Choose the language that fits your skills and project needs.
How do I handle errors in my plugins?
You should log all errors and provide clear messages to users. Use retry logic for failed operations. Application Insights helps you track and diagnose issues.
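A small wrapper can separate what you log from what you show the user. The sketch below is one possible shape, using only the standard library. It assumes your hosting environment forwards standard `logging` output to your monitoring tool. The logger name, error codes, and response fields are hypothetical.

```python
import json
import logging
import uuid

logger = logging.getLogger("invoice_plugin")  # hypothetical logger name


def run_safely(handler, payload):
    """Call handler(payload) and return a (status_code, json_body) pair."""
    try:
        return 200, json.dumps(handler(payload))
    except ValueError as exc:
        # Caller mistakes: safe to echo back so the user can fix the request.
        return 400, json.dumps({"error": "invalid_request", "message": str(exc)})
    except Exception:
        # Unexpected faults: log the full details, return only a reference id.
        error_id = str(uuid.uuid4())
        logger.exception("Unhandled error %s", error_id)
        body = {
            "error": "internal_error",
            "errorId": error_id,
            "message": "Something went wrong. Please try again later.",
        }
        return 500, json.dumps(body)
```

Returning a reference id instead of the raw exception keeps internal details out of the conversation while still letting you find the matching log entry.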
What is the Flex Consumption plan?
The Flex Consumption plan gives you more control over scaling and performance. You can keep instances always ready, reduce cold starts, and adjust settings for your workload.
Where can I find more resources to learn about Copilot Plugins?
Visit Microsoft Docs, GitHub Samples, and Copilot Studio Tutorials for guides, code, and best practices.
1
00:00:00,000 --> 00:00:02,680
Your Copilot can talk, it understands your context,
2
00:00:02,680 --> 00:00:04,680
and it can even reason through multi-step problems
3
00:00:04,680 --> 00:00:06,800
to give you a response that feels human.
4
00:00:06,800 --> 00:00:09,000
But the moment you ask it to do something computationally
5
00:00:09,000 --> 00:00:11,000
heavy, everything falls apart.
6
00:00:11,000 --> 00:00:13,760
If you needed to validate data across three legacy systems,
7
00:00:13,760 --> 00:00:16,640
transform that data, and apply specific business rules
8
00:00:16,640 --> 00:00:18,960
before giving an answer, it hits a wall.
9
00:00:18,960 --> 00:00:20,400
That wall is the connector architecture.
10
00:00:20,400 --> 00:00:22,800
Standard out-of-the-box connectors are great
11
00:00:22,800 --> 00:00:24,560
for moving data from point A to point B,
12
00:00:24,560 --> 00:00:26,760
because that is exactly what they were designed to do.
13
00:00:26,760 --> 00:00:28,800
However, they were never built for heavy computation
14
00:00:28,800 --> 00:00:30,240
or complex orchestration.
15
00:00:30,240 --> 00:00:32,600
They definitely weren't built to handle the deep business
16
00:00:32,600 --> 00:00:34,880
logic that separates a demo-friendly chatbot
17
00:00:34,880 --> 00:00:36,920
from a system that actually gets work done.
18
00:00:36,920 --> 00:00:39,680
The real gap here isn't between what Copilot can ask
19
00:00:39,680 --> 00:00:41,320
and what your data can answer.
20
00:00:41,320 --> 00:00:43,880
In reality, the gap is between what the bot can say
21
00:00:43,880 --> 00:00:45,720
and what your business actually needs to do.
22
00:00:45,720 --> 00:00:47,360
We are solving that problem today.
23
00:00:47,360 --> 00:00:49,320
We are moving away from a bot that just talks
24
00:00:49,320 --> 00:00:51,600
and building a bot that actually executes.
25
00:00:51,600 --> 00:00:54,720
The bridge to get there is pro-code Azure Functions,
26
00:00:54,720 --> 00:00:57,720
but they have to be architected the right way to work.
27
00:00:57,720 --> 00:00:59,640
Why standard connectors fail at scale?
28
00:00:59,640 --> 00:01:01,800
Most teams don't see this problem until it is too late
29
00:01:01,800 --> 00:01:02,880
to easily fix it.
30
00:01:02,880 --> 00:01:05,040
You start with a use case that looks simple on the surface,
31
00:01:05,040 --> 00:01:07,440
like a user asking about a specific invoice.
32
00:01:07,440 --> 00:01:10,040
To answer, the bot has to check three different systems,
33
00:01:10,040 --> 00:01:12,640
validate the information and apply your business rules.
34
00:01:12,640 --> 00:01:15,560
Naturally, you try to build this using standard connectors.
35
00:01:15,560 --> 00:01:17,440
The first issue is that these connectors
36
00:01:17,440 --> 00:01:18,840
are built for simple data movement
37
00:01:18,840 --> 00:01:20,200
rather than complex logic.
38
00:01:20,200 --> 00:01:22,160
They are perfectly fine if you just need to get a record
39
00:01:22,160 --> 00:01:23,280
or update a single field.
40
00:01:23,280 --> 00:01:25,320
They are not fine when you need to transform data
41
00:01:25,320 --> 00:01:26,760
and check it against multiple systems
42
00:01:26,760 --> 00:01:28,360
to return a structured result.
43
00:01:28,360 --> 00:01:30,440
When your Copilot needs to perform real work,
44
00:01:30,440 --> 00:01:33,040
these standard connectors quickly become a massive bottleneck.
45
00:01:33,040 --> 00:01:34,600
You find yourself chaining them together
46
00:01:34,600 --> 00:01:36,240
where one connector calls another
47
00:01:36,240 --> 00:01:38,880
and then that one branches off based on a dozen conditions.
48
00:01:38,880 --> 00:01:41,120
You end up nesting logic deeper and deeper,
49
00:01:41,120 --> 00:01:44,200
and every single new connection adds more latency to the process.
50
00:01:44,200 --> 00:01:45,800
This is what I call the latency tax.
51
00:01:45,800 --> 00:01:47,680
Every time a connector makes a call,
52
00:01:47,680 --> 00:01:49,320
you pay for the network round trip,
53
00:01:49,320 --> 00:01:52,160
the authentication and the overhead of moving that data.
54
00:01:52,160 --> 00:01:53,760
If you have a complex workflow,
55
00:01:53,760 --> 00:01:56,840
it can easily take 10 seconds or more for the user to get an answer.
56
00:01:56,840 --> 00:01:58,840
To the person waiting, it feels like the bot has crashed
57
00:01:58,840 --> 00:02:00,920
and usually they just give up and leave.
58
00:02:00,920 --> 00:02:03,600
Then you have to deal with governance and DLP policies.
59
00:02:03,600 --> 00:02:05,200
These rules are designed to stop data
60
00:02:05,200 --> 00:02:07,520
from moving between approved and blocked connectors,
61
00:02:07,520 --> 00:02:09,200
which is security doing its job.
62
00:02:09,200 --> 00:02:10,640
But this also creates silos
63
00:02:10,640 --> 00:02:13,240
where your finance tools can't talk to your HR tools
64
00:02:13,240 --> 00:02:14,600
without breaking a policy.
65
00:02:14,600 --> 00:02:15,840
You are constantly hitting walls
66
00:02:15,840 --> 00:02:17,200
that make sense for protection,
67
00:02:17,200 --> 00:02:19,920
but absolutely poison your technical architecture.
68
00:02:19,920 --> 00:02:22,400
At the same time, the low-code expressions in Power Automate
69
00:02:22,400 --> 00:02:24,560
have very real limits on what they can execute.
70
00:02:24,560 --> 00:02:27,120
You can handle basic string changes or simple logic,
71
00:02:27,120 --> 00:02:28,640
but they aren't built for heavy lifting
72
00:02:28,640 --> 00:02:30,760
or consuming external APIs efficiently.
73
00:02:30,760 --> 00:02:32,680
They simply weren't designed for the high-level work
74
00:02:32,680 --> 00:02:35,040
that your business actually requires to function.
75
00:02:35,040 --> 00:02:36,200
Because of these limits,
76
00:02:36,200 --> 00:02:38,480
teams start building messy workarounds.
77
00:02:38,480 --> 00:02:40,480
They create nested flows that call other flows
78
00:02:40,480 --> 00:02:42,960
or hide data in queues to try and delay the work.
79
00:02:42,960 --> 00:02:44,320
These integrations are brittle
80
00:02:44,320 --> 00:02:46,960
and they break the second your business requirements change.
81
00:02:46,960 --> 00:02:49,440
You are essentially trying to solve a connector problem
82
00:02:49,440 --> 00:02:51,840
by stacking one workaround on top of another.
83
00:02:51,840 --> 00:02:54,880
The reality is that none of this is maintainable or scalable
84
00:02:54,880 --> 00:02:57,320
because it isn't a real engineering solution.
85
00:02:57,320 --> 00:02:59,920
The core problem is that you are trying to do pro-code work
86
00:02:59,920 --> 00:03:01,400
using low-code tools.
87
00:03:01,400 --> 00:03:02,280
Connectors aren't bad,
88
00:03:02,280 --> 00:03:04,920
but they are simply the wrong tool for this specific job.
89
00:03:04,920 --> 00:03:07,160
What you actually need is total control over your environment.
90
00:03:07,160 --> 00:03:09,800
You need type safety, enterprise libraries,
91
00:03:09,800 --> 00:03:12,920
and the ability to use caching strategies or connection pooling.
92
00:03:12,920 --> 00:03:15,400
You need retry logic that actually understands your business
93
00:03:15,400 --> 00:03:18,520
and monitoring that catches a bug before a user ever sees it.
94
00:03:18,520 --> 00:03:19,960
That is why we use Azure Functions.
95
00:03:19,960 --> 00:03:22,520
It gives you a blank canvas where you use your own language
96
00:03:22,520 --> 00:03:23,840
and your own libraries.
97
00:03:23,840 --> 00:03:26,000
Your logic sits right next to the Power Platform
98
00:03:26,000 --> 00:03:28,840
but it stays connected through a hardened, secure gateway.
99
00:03:28,840 --> 00:03:30,520
On paper, this architecture looks simple
100
00:03:30,520 --> 00:03:33,120
but it is often very hard to execute in practice.
101
00:03:33,120 --> 00:03:35,960
That is exactly what we are going to fix right now.
102
00:03:35,960 --> 00:03:39,200
The architecture choice, where the rubber meets the road.
103
00:03:39,200 --> 00:03:40,920
So what is the actual alternative here?
104
00:03:40,920 --> 00:03:43,400
It is not about replacing the Power Platform entirely
105
00:03:43,400 --> 00:03:46,760
which is the exact mistake most teams make the moment they hit a wall.
106
00:03:46,760 --> 00:03:48,840
They assume the only answer is to rip everything out
107
00:03:48,840 --> 00:03:51,000
and rewrite the whole system in .NET.
108
00:03:51,000 --> 00:03:52,560
But that is not the right move.
109
00:03:52,560 --> 00:03:54,800
The real answer is to extend the Power Platform
110
00:03:54,800 --> 00:03:58,200
with the very things it was never designed to handle in the first place.
111
00:03:58,200 --> 00:04:01,680
Azure Functions sit entirely outside that low-code perimeter.
112
00:04:01,680 --> 00:04:03,680
They are not part of the Power Platform ecosystem.
113
00:04:03,680 --> 00:04:06,080
They are not governed by those same restrictive rules
114
00:04:06,080 --> 00:04:08,720
and they are not throttled by those frustrating execution limits.
115
00:04:08,720 --> 00:04:11,240
Think of them as a blank C# canvas
116
00:04:11,240 --> 00:04:13,640
where you can write real code that is type-safe,
117
00:04:13,640 --> 00:04:17,080
fully testable, and connected to your existing enterprise libraries.
118
00:04:17,080 --> 00:04:20,800
They provide everything that Power Automate expressions simply cannot.
119
00:04:20,800 --> 00:04:22,880
But here is the critical part of the strategy.
120
00:04:22,880 --> 00:04:25,720
These functions are not disconnected from the Power Platform either.
121
00:04:25,720 --> 00:04:28,080
They plug right back into it through a clean interface
122
00:04:28,080 --> 00:04:30,840
using OpenAPI specs and custom connectors,
123
00:04:30,840 --> 00:04:33,720
which acts as the handshake between these two different worlds.
124
00:04:33,720 --> 00:04:36,520
When your Copilot asks the Power Platform to perform a task,
125
00:04:36,520 --> 00:04:39,160
the platform does not struggle to do it with standard connectors.
126
00:04:39,160 --> 00:04:40,600
Instead it calls your function.
127
00:04:40,600 --> 00:04:42,840
Your function handles the heavy computation
128
00:04:42,840 --> 00:04:44,400
and returns structured data
129
00:04:44,400 --> 00:04:47,320
and then the Power Platform passes that result back to Copilot.
130
00:04:47,320 --> 00:04:50,560
The entire process feels completely seamless to the person using it.
131
00:04:50,560 --> 00:04:52,640
This is what we call the fusion developer model
132
00:04:52,640 --> 00:04:55,440
and it is quickly becoming the standard for enterprise AI
133
00:04:55,440 --> 00:04:59,200
because it is the only architecture that actually works when you try to scale.
134
00:04:59,200 --> 00:05:01,600
Now, Azure Functions come in a few different flavors,
135
00:05:01,600 --> 00:05:04,640
and choosing the right one is the first real decision you have to make.
136
00:05:04,640 --> 00:05:07,720
You have the Consumption plan, which is the original serverless model
137
00:05:07,720 --> 00:05:10,080
where the system scales all the way down to zero.
138
00:05:10,080 --> 00:05:11,720
You only pay for what you actually use,
139
00:05:11,720 --> 00:05:13,760
but the cold starts are absolutely brutal
140
00:05:13,760 --> 00:05:16,640
and we will come back to that specific problem in just a minute.
141
00:05:16,640 --> 00:05:18,080
Then you have the Premium plan,
142
00:05:18,080 --> 00:05:20,240
which uses always-on instances that stay warm,
143
00:05:20,240 --> 00:05:21,840
so cold starts are never an issue.
144
00:05:21,840 --> 00:05:24,240
The downside is that you are paying a baseline cost
145
00:05:24,240 --> 00:05:26,240
regardless of how much traffic you actually have.
146
00:05:26,240 --> 00:05:28,320
If you are building a Copilot plugin
147
00:05:28,320 --> 00:05:30,960
that only gets called during normal business hours,
148
00:05:30,960 --> 00:05:33,360
that becomes a very expensive way to operate.
149
00:05:33,360 --> 00:05:35,040
Finally, you have Flex Consumption,
150
00:05:35,040 --> 00:05:37,600
which is the new default as of 2026
151
00:05:37,600 --> 00:05:40,480
and the right choice for almost every Copilot architecture.
152
00:05:40,480 --> 00:05:42,160
Flex gives you the best of both worlds
153
00:05:42,160 --> 00:05:44,640
because it is still serverless and scales dynamically,
154
00:05:44,640 --> 00:05:48,400
but you can configure always ready instances that stay warm.
155
00:05:48,400 --> 00:05:50,200
This creates a small pool of workers
156
00:05:50,200 --> 00:05:53,400
that are always initialized and ready to serve requests immediately.
157
00:05:53,400 --> 00:05:57,920
The baseline cost for an always ready instance is about $0.000004
158
00:05:57,920 --> 00:05:59,240
per gigabyte second,
159
00:05:59,240 --> 00:06:02,480
which is six and a half times cheaper than on-demand execution time.
160
00:06:02,480 --> 00:06:05,240
You get those warm starts without the massive premium price tag.
161
00:06:05,240 --> 00:06:07,640
Why does a warm start matter so much for a Copilot?
162
00:06:07,640 --> 00:06:10,880
It matters because cold start latency is catastrophic for interactive AI.
163
00:06:10,880 --> 00:06:14,480
When a user types a question, Copilot processes it and calls your function,
164
00:06:14,480 --> 00:06:16,040
but then your function has to wake up.
165
00:06:16,040 --> 00:06:18,960
It has to load the runtime, initialize every dependency,
166
00:06:18,960 --> 00:06:22,120
and deserialize the request before it even starts running your code.
167
00:06:22,120 --> 00:06:25,680
This entire process can take five seconds or more from a cold start,
168
00:06:25,680 --> 00:06:27,680
leaving the user staring at a loading spinner
169
00:06:27,680 --> 00:06:30,520
until they eventually switch their mental context.
170
00:06:30,520 --> 00:06:33,040
By the time your function finally returns a result,
171
00:06:33,040 --> 00:06:34,920
they have already moved on to something else.
172
00:06:34,920 --> 00:06:37,760
Flex with always ready completely eliminates that friction.
173
00:06:37,760 --> 00:06:39,720
Because your instances are already warm,
174
00:06:39,720 --> 00:06:40,960
the runtime is loaded,
175
00:06:40,960 --> 00:06:43,200
and your dependencies are already initialized.
176
00:06:43,200 --> 00:06:46,120
When a request comes in, your code runs and the response goes out
177
00:06:46,120 --> 00:06:48,680
with sub-100 millisecond latency.
178
00:06:48,680 --> 00:06:50,560
The user perceives the interaction as instant,
179
00:06:50,560 --> 00:06:52,200
which is exactly what you want.
180
00:06:52,200 --> 00:06:55,640
Using C# in the isolated worker model gives you a level of control
181
00:06:55,640 --> 00:06:58,760
you just cannot find anywhere else in the Power Platform ecosystem.
182
00:06:58,760 --> 00:07:00,720
You get type safety from the compiler,
183
00:07:00,720 --> 00:07:02,720
testability through dependency injection,
184
00:07:02,720 --> 00:07:05,040
and access to any NuGet package you might need.
185
00:07:05,040 --> 00:07:06,840
You can use the exact same libraries
186
00:07:06,840 --> 00:07:09,480
your backend teams are already using, like Entity Framework
187
00:07:09,480 --> 00:07:11,960
for database queries, Polly for resilience,
188
00:07:11,960 --> 00:07:14,080
or OpenTelemetry for instrumentation.
189
00:07:14,080 --> 00:07:16,160
You can even manage your secrets through Key Vault,
190
00:07:16,160 --> 00:07:18,120
providing everything a real production system
191
00:07:18,120 --> 00:07:19,600
actually needs to survive.
192
00:07:19,600 --> 00:07:21,960
The connection between your Copilot and your function
193
00:07:21,960 --> 00:07:24,080
lives and dies by the OpenAPI spec.
194
00:07:24,080 --> 00:07:25,560
This is a critical piece of the puzzle
195
00:07:25,560 --> 00:07:27,600
that we will spend a lot of time on later.
196
00:07:27,600 --> 00:07:30,480
The OpenAPI spec is not just a piece of documentation,
197
00:07:30,480 --> 00:07:33,080
but rather the semantic contract that tells the LLM
198
00:07:33,080 --> 00:07:35,760
when your function is relevant and exactly how to call it.
199
00:07:35,760 --> 00:07:36,880
If you write a poor spec,
200
00:07:36,880 --> 00:07:39,440
Copilot will not use your function even when it should,
201
00:07:39,440 --> 00:07:42,520
but a well-written spec ensures Copilot calls it reliably
202
00:07:42,520 --> 00:07:45,640
and passes the correct parameters every single time.
203
00:07:45,640 --> 00:07:47,880
Your function lives in Azure while your Copilot lives
204
00:07:47,880 --> 00:07:48,920
in the Power Platform,
205
00:07:48,920 --> 00:07:52,000
and they communicate through OpenAPI and custom connectors.
206
00:07:52,000 --> 00:07:54,560
You write all the complex logic you need in C#
207
00:07:54,560 --> 00:07:56,720
while the Power Platform handles the orchestration
208
00:07:56,720 --> 00:07:58,160
and the conversation flow.
209
00:07:58,160 --> 00:07:59,960
This clear division of labor is what separates
210
00:07:59,960 --> 00:08:02,400
a working professional plugin from a collection
211
00:08:02,400 --> 00:08:03,760
of janky workarounds.
212
00:08:03,760 --> 00:08:05,280
It might not be the sexiest solution,
213
00:08:05,280 --> 00:08:08,080
and it is not purely low-code or purely pro-code,
214
00:08:08,080 --> 00:08:10,480
but it is the real answer to the problem.
215
00:08:10,480 --> 00:08:13,480
The cold start problem and why Flex Consumption wins.
216
00:08:13,480 --> 00:08:15,960
Cold start is not just a minor latency issue,
217
00:08:15,960 --> 00:08:18,120
but rather a massive invisible overhead
218
00:08:18,120 --> 00:08:20,440
that triggers every single time your function wakes up
219
00:08:20,440 --> 00:08:21,360
from a nap.
220
00:08:21,360 --> 00:08:22,760
To understand why this happens,
221
00:08:22,760 --> 00:08:25,480
we have to look at what occurs when a function goes cold.
222
00:08:25,480 --> 00:08:27,440
Because nobody has called the function recently,
223
00:08:27,440 --> 00:08:30,240
Azure has scaled your instance all the way down to zero
224
00:08:30,240 --> 00:08:31,600
to save resources.
225
00:08:31,600 --> 00:08:33,120
When a new request finally arrives,
226
00:08:33,120 --> 00:08:35,560
the platform has to allocate a brand new worker,
227
00:08:35,560 --> 00:08:39,280
start the .NET runtime and load your assemblies from the disk.
228
00:08:39,280 --> 00:08:41,640
It then has to perform JIT compilation on your code
229
00:08:41,640 --> 00:08:44,480
and initialize your entire dependency injection container.
230
00:08:44,480 --> 00:08:46,760
It still has to establish connections to your database,
231
00:08:46,760 --> 00:08:48,720
your cache, and any other external services
232
00:08:48,720 --> 00:08:50,560
before a single line of your business logic
233
00:08:50,560 --> 00:08:52,120
even starts to run.
234
00:08:52,120 --> 00:08:53,480
On the original Consumption plan,
235
00:08:53,480 --> 00:08:55,840
this entire sequence happens on every single request
236
00:08:55,840 --> 00:08:57,560
whenever your traffic is sparse.
237
00:08:57,560 --> 00:08:59,120
You are paying for compute time
238
00:08:59,120 --> 00:09:01,200
from the very moment the request arrives
239
00:09:01,200 --> 00:09:03,440
until your code actually finishes executing.
240
00:09:03,440 --> 00:09:06,720
For a typical .NET function using a few enterprise libraries,
241
00:09:06,720 --> 00:09:09,560
that overhead usually lasts between one and five seconds.
242
00:09:09,560 --> 00:09:12,680
If your application has heavy initialization requirements,
243
00:09:12,680 --> 00:09:14,160
like loading machine learning models
244
00:09:14,160 --> 00:09:16,000
or building large dependency graphs,
245
00:09:16,000 --> 00:09:19,080
that wait time can easily stretch to 10 seconds or more.
246
00:09:19,080 --> 00:09:21,080
The timeline for the user is just brutal.
247
00:09:21,080 --> 00:09:23,280
They type a question, Copilot processes it,
248
00:09:23,280 --> 00:09:26,000
and then your function takes two full seconds just to wake up.
249
00:09:26,000 --> 00:09:29,320
Even if your business logic runs in a lightning fast 100 milliseconds,
250
00:09:29,320 --> 00:09:32,000
the total perceived latency is still over two seconds.
251
00:09:32,000 --> 00:09:34,320
The user notices that delay, the bot feels sluggish,
252
00:09:34,320 --> 00:09:38,280
and the frustrating part is that nothing in your actual code is slow.
253
00:09:38,280 --> 00:09:41,160
The problem is entirely tied to the initialization process.
254
00:09:41,160 --> 00:09:43,680
This is the exact point where Flex Consumption diverges
255
00:09:43,680 --> 00:09:45,200
from the old way of doing things.
256
00:09:45,200 --> 00:09:48,000
Flex introduces the concept of always ready instances,
257
00:09:48,000 --> 00:09:50,040
where you configure a minimum number of instances
258
00:09:50,040 --> 00:09:51,800
that stay perpetually initialized.
259
00:09:51,800 --> 00:09:53,480
These instances never scale to zero,
260
00:09:53,480 --> 00:09:56,560
so the runtime stays loaded, your dependencies stay ready,
261
00:09:56,560 --> 00:09:58,640
and your database connections stay pooled.
262
00:09:58,640 --> 00:10:01,080
When a request arrives, there's no startup overhead at all
263
00:10:01,080 --> 00:10:02,720
because your code runs immediately.
264
00:10:02,720 --> 00:10:05,960
We start measuring latency in tens of milliseconds instead of seconds.
265
00:10:05,960 --> 00:10:08,640
The cost structure is what makes this approach economically sensible
266
00:10:08,640 --> 00:10:10,080
for most businesses.
267
00:10:10,080 --> 00:10:12,600
Always ready instances are billed at a baseline rate
268
00:10:12,600 --> 00:10:16,600
of approximately $0.000004 per gigabyte-second,
269
00:10:16,600 --> 00:10:17,960
even when they are sitting idle.
270
00:10:17,960 --> 00:10:18,920
That is incredibly cheap.
271
00:10:18,920 --> 00:10:20,680
If you run a single always ready instance
272
00:10:20,680 --> 00:10:22,720
with 512 megabytes of memory,
273
00:10:22,720 --> 00:10:25,480
you are looking at a bill of roughly $15 per month.
274
00:10:25,480 --> 00:10:27,600
When you compare that to the Premium plan,
275
00:10:27,600 --> 00:10:29,720
where the smallest tier keeps instances running
276
00:10:29,720 --> 00:10:31,520
at a much higher baseline cost,
277
00:10:31,520 --> 00:10:34,760
Flex becomes the obvious choice for anyone who is cost-conscious.
278
00:10:34,760 --> 00:10:36,840
Here is the key insight you need to remember.
279
00:10:36,840 --> 00:10:39,760
Always ready instances do not eliminate the cold start problem
280
00:10:39,760 --> 00:10:41,360
for every single scenario,
281
00:10:41,360 --> 00:10:43,800
but they do eliminate it for your baseline traffic.
282
00:10:43,800 --> 00:10:46,160
If your Copilot generates 10 concurrent calls
283
00:10:46,160 --> 00:10:48,240
and you have two always ready instances,
284
00:10:48,240 --> 00:10:50,840
those first two requests will get warm performance.
285
00:10:50,840 --> 00:10:52,920
The remaining eight requests will trigger new instances
286
00:10:52,920 --> 00:10:54,520
that have to go through a cold start.
287
00:10:54,520 --> 00:10:56,880
In the real world, most concurrent Copilot calls
288
00:10:56,880 --> 00:10:58,600
stay within your always ready capacity
289
00:10:58,600 --> 00:10:59,760
and feel instantaneous.
290
00:10:59,760 --> 00:11:01,440
You only see minor start-up latency
291
00:11:01,440 --> 00:11:04,920
when your demand genuinely exceeds that provisioned baseline.
292
00:11:04,920 --> 00:11:06,840
The Premium plan takes a much more aggressive approach
293
00:11:06,840 --> 00:11:09,400
by keeping minimum instances running at all times
294
00:11:09,400 --> 00:11:11,480
without any scale-to-zero behavior.
295
00:11:11,480 --> 00:11:14,080
Cold start is eliminated across all of your traffic
296
00:11:14,080 --> 00:11:16,200
because those instances never go idle,
297
00:11:16,200 --> 00:11:17,960
but you definitely pay for that guarantee.
298
00:11:17,960 --> 00:11:19,480
You are stuck paying an hourly rate
299
00:11:19,480 --> 00:11:21,520
for reserved compute regardless of whether your function
300
00:11:21,520 --> 00:11:23,640
handles zero requests or a thousand.
301
00:11:23,640 --> 00:11:25,600
If you are running a use case that only happens
302
00:11:25,600 --> 00:11:27,640
during business hours, that pricing model
303
00:11:27,640 --> 00:11:29,800
simply does not make sense for your budget.
304
00:11:29,800 --> 00:11:32,320
Flex with one or two always ready instances
305
00:11:32,320 --> 00:11:35,400
hits the perfect sweet spot for most enterprise teams.
306
00:11:35,400 --> 00:11:37,280
Your baseline Copilot traffic gets handled
307
00:11:37,280 --> 00:11:39,600
with sub-100-millisecond startup latency
308
00:11:39,600 --> 00:11:41,320
while your burst traffic scales dynamically
309
00:11:41,320 --> 00:11:43,360
without you having to pay a massive premium.
310
00:11:43,360 --> 00:11:45,520
Your monthly cost stays reasonable
311
00:11:45,520 --> 00:11:47,160
and you have successfully eliminated
312
00:11:47,160 --> 00:11:48,920
the single biggest performance problem
313
00:11:48,920 --> 00:11:51,600
that plagues serverless AI applications today.
314
00:11:51,600 --> 00:11:53,560
This is the very first technical decision
315
00:11:53,560 --> 00:11:55,880
that separates a real production architecture
316
00:11:55,880 --> 00:11:57,440
from a simple prototype.
317
00:11:57,440 --> 00:12:00,640
OpenAPI specs, the bridge between code and AI.
318
00:12:00,640 --> 00:12:01,720
Now we get to the craft
319
00:12:01,720 --> 00:12:04,440
that actually separates a working plugin from a broken one.
320
00:12:04,440 --> 00:12:06,040
Your Azure Function might be fast,
321
00:12:06,040 --> 00:12:08,040
your always ready instances are warm
322
00:12:08,040 --> 00:12:09,600
and your C# code is solid,
323
00:12:09,600 --> 00:12:11,640
but none of that matters if Copilot never calls
324
00:12:11,640 --> 00:12:13,040
your function in the first place.
325
00:12:13,040 --> 00:12:14,680
That is the job of OpenAPI.
326
00:12:14,680 --> 00:12:16,760
It tells the AI when your function is relevant
327
00:12:16,760 --> 00:12:18,240
and exactly how to invoke it.
328
00:12:18,240 --> 00:12:19,320
But here is the problem.
329
00:12:19,320 --> 00:12:22,120
Most developers treat OpenAPI like documentation
330
00:12:22,120 --> 00:12:23,840
or something you generate after the fact
331
00:12:23,840 --> 00:12:26,120
just to keep the API docs in sync with the code.
332
00:12:26,120 --> 00:12:29,280
They see it as a boring checkbox on a deployment checklist.
333
00:12:29,280 --> 00:12:31,560
But that is not what OpenAPI is anymore.
334
00:12:31,560 --> 00:12:34,120
In reality, OpenAPI is the semantic contract
335
00:12:34,120 --> 00:12:37,040
that the language model uses to reason about your function.
336
00:12:37,040 --> 00:12:38,800
It is the specification the AI reads
337
00:12:38,800 --> 00:12:40,520
to decide whether to call you at all
338
00:12:40,520 --> 00:12:42,360
and it serves as the instruction manual
339
00:12:42,360 --> 00:12:45,000
for filling in parameters and interpreting results.
340
00:12:45,000 --> 00:12:45,840
The distinction matters
341
00:12:45,840 --> 00:12:47,880
because documentation is written for humans,
342
00:12:47,880 --> 00:12:49,920
while OpenAPI specs for Copilot
343
00:12:49,920 --> 00:12:51,280
need to be written for models.
344
00:12:51,280 --> 00:12:52,560
Those are not the same thing.
345
00:12:52,560 --> 00:12:55,880
Power Platform currently standardizes on OpenAPI 2.0,
346
00:12:55,880 --> 00:12:59,200
which you might know as Swagger, though support for 3.0
347
00:12:59,200 --> 00:13:01,360
is expanding through 2026.
348
00:13:01,360 --> 00:13:03,280
If you are building a production plugin today,
349
00:13:03,280 --> 00:13:06,000
you should target 2.0 for broad compatibility.
350
00:13:06,000 --> 00:13:08,280
The concepts we are discussing apply to both versions,
351
00:13:08,280 --> 00:13:11,200
but the tooling landscape still leans heavily on Swagger.
352
00:13:11,200 --> 00:13:12,680
It starts with the operationId.
353
00:13:12,680 --> 00:13:15,480
This is the identifier for each endpoint in your spec
354
00:13:15,480 --> 00:13:17,640
and it needs to be stable and verb-based.
355
00:13:17,640 --> 00:13:19,600
It should describe what actually happens
356
00:13:19,600 --> 00:13:21,480
rather than what the internal code does.
357
00:13:21,480 --> 00:13:24,480
If you compare validateInvoice against processV2,
358
00:13:24,480 --> 00:13:26,480
one tells the model exactly what to expect
359
00:13:26,480 --> 00:13:28,160
while the other tells the model nothing.
360
00:13:28,160 --> 00:13:30,760
The model will try to choose correctly based on context,
361
00:13:30,760 --> 00:13:33,600
but if you force it to guess, it will eventually guess wrong.
362
00:13:33,600 --> 00:13:36,280
When that happens, your entire plugin fails silently.
363
00:13:36,280 --> 00:13:38,200
Descriptions are where most specs fall apart
364
00:13:38,200 --> 00:13:40,680
because developers write descriptions for other developers.
365
00:13:40,680 --> 00:13:42,120
They explain how the endpoint works,
366
00:13:42,120 --> 00:13:44,760
the response format or the parameter validation rules,
367
00:13:44,760 --> 00:13:47,040
but the model does not read like a developer reads.
368
00:13:47,040 --> 00:13:49,560
The model needs to know when to use this operation,
369
00:13:49,560 --> 00:13:53,200
what user intent triggers it, and what specific problem it solves.
370
00:13:53,200 --> 00:13:55,440
Compare two descriptions of the same function.
371
00:13:55,440 --> 00:13:58,560
One says, "Returns invoice data from the accounting system,"
372
00:13:58,560 --> 00:14:01,120
while the other says, "Use this when the user asks
373
00:14:01,120 --> 00:14:03,560
about a specific invoice to retrieve details
374
00:14:03,560 --> 00:14:07,040
including amount, date, vendor, and payment status."
375
00:14:07,040 --> 00:14:09,120
The first tells the model nothing useful,
376
00:14:09,120 --> 00:14:12,080
but the second tells the model exactly when to call this operation.
377
00:14:12,080 --> 00:14:13,880
Parameter descriptions work the same way.
378
00:14:13,880 --> 00:14:16,280
You are not writing for developers who can infer the intent,
379
00:14:16,280 --> 00:14:18,920
but for a model that will parse your description to fill in values.
380
00:14:18,920 --> 00:14:21,280
You need to be explicit about format constraints.
381
00:14:21,280 --> 00:14:24,000
Instead of just saying "date," you should say "invoice date
382
00:14:24,000 --> 00:14:27,360
in ISO 8601 format; do not use relative terms
383
00:14:27,360 --> 00:14:28,680
like yesterday or last quarter."
384
00:14:28,680 --> 00:14:34,560
The model will try to send relative terms
385
00:14:34,560 --> 00:14:36,680
if you do not explicitly forbid them.
386
00:14:36,680 --> 00:14:39,000
And you want to prevent that failure before it happens.
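The parameter description above can be paired with a server-side check, so that a relative term is rejected with a clear message even if the model sends one. This is a minimal sketch in Python for brevity (the episode's functions are written in C#). The parameter name `invoiceDate`, the description text, and the helper `parse_invoice_date` are illustrative assumptions, not taken from any real connector.

```python
import re
from datetime import date

# Hypothetical OpenAPI 2.0 parameter definition, written as a Python dict.
INVOICE_DATE_PARAM = {
    "name": "invoiceDate",
    "in": "query",
    "type": "string",
    "format": "date",
    "required": True,
    "description": (
        "Invoice date in ISO 8601 format (YYYY-MM-DD), for example 2024-01-15. "
        "Do not use relative terms like 'yesterday' or 'last quarter'."
    ),
}

ISO_DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_invoice_date(raw: str) -> date:
    """Accept only strict YYYY-MM-DD input and reject everything else."""
    if not ISO_DATE_PATTERN.fullmatch(raw):
        raise ValueError(f"invoiceDate must be YYYY-MM-DD, got {raw!r}")
    # fromisoformat also rejects impossible dates such as 2024-02-30.
    return date.fromisoformat(raw)
```

Calling `parse_invoice_date("2024-01-15")` returns a `date`, while `parse_invoice_date("yesterday")` raises `ValueError`, which the function can turn into a 400 response.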
387
00:14:39,000 --> 00:14:40,720
You also need to include examples.
388
00:14:40,720 --> 00:14:43,200
Do not just provide one, but use multiple examples
389
00:14:43,200 --> 00:14:45,480
showing realistic inputs and outputs.
390
00:14:45,480 --> 00:14:48,440
The model uses these as demonstrations of correct usage
391
00:14:48,440 --> 00:14:51,600
and a well-crafted example is worth 1,000 words of explanation.
392
00:14:51,600 --> 00:14:53,440
Response schemas also need to be complete.
393
00:14:53,440 --> 00:14:56,480
You should define every status code your function can return,
394
00:14:56,480 --> 00:15:00,000
including 200 for success, 400 for validation errors,
395
00:15:00,000 --> 00:15:02,400
404 when the resource is missing,
396
00:15:02,400 --> 00:15:05,160
and 429 when rate limits are hit.
397
00:15:05,160 --> 00:15:07,560
For each response, you must describe what it means.
398
00:15:07,560 --> 00:15:10,360
The model uses error schemas to understand failure modes.
399
00:15:10,360 --> 00:15:12,880
So if you do not define what a 400 error looks like,
400
00:15:12,880 --> 00:15:14,800
the model cannot learn to anticipate failure.
401
00:15:14,800 --> 00:15:16,400
It cannot adjust its behavior
402
00:15:16,400 --> 00:15:19,120
and it cannot explain to the user why something went wrong.
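To make the points about descriptions and response schemas concrete, here is one possible shape for a single OpenAPI 2.0 operation, expressed as a Python dict, with a small check that every status code mentioned above is described. The operation name `getInvoiceDetails`, the `#/definitions/...` references, and the helper `missing_responses` are assumptions made for illustration only.

```python
# Hypothetical OpenAPI 2.0 (Swagger) operation, written as a Python dict.
# All names and schema references here are illustrative assumptions.
GET_INVOICE_OPERATION = {
    "operationId": "getInvoiceDetails",
    "summary": "Get invoice details",
    "description": (
        "Use this when the user asks about a specific invoice to retrieve "
        "details including amount, date, vendor, and payment status."
    ),
    "parameters": [
        {
            "name": "invoiceNumber",
            "in": "path",
            "required": True,
            "type": "string",
            "description": "Exact invoice number, for example INV-1001.",
        }
    ],
    "responses": {
        "200": {
            "description": "Invoice found. Returns amount, date, vendor, and payment status.",
            "schema": {"$ref": "#/definitions/Invoice"},
        },
        "400": {
            "description": "The request failed validation. Ask the user to restate the invoice number.",
            "schema": {"$ref": "#/definitions/Error"},
        },
        "404": {
            "description": "No invoice exists with that number.",
            "schema": {"$ref": "#/definitions/Error"},
        },
        "429": {
            "description": "Rate limit reached. Retry after a short wait.",
            "schema": {"$ref": "#/definitions/Error"},
        },
    },
}

REQUIRED_STATUS_CODES = {"200", "400", "404", "429"}


def missing_responses(operation: dict) -> set:
    """Return the required status codes that lack a non-empty description."""
    responses = operation.get("responses", {})
    return {
        code
        for code in REQUIRED_STATUS_CODES
        if not responses.get(code, {}).get("description")
    }
```

A check like `missing_responses` could run in a build pipeline so that an undocumented status code fails the build instead of reaching Copilot.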
403
00:15:19,120 --> 00:15:21,600
This is the bridge between your code and the AI.
404
00:15:21,600 --> 00:15:24,440
If you get it right, Copilot will use your function reliably,
405
00:15:24,440 --> 00:15:26,960
but if you get it wrong, your function will sit unused
406
00:15:26,960 --> 00:15:29,040
even when it is the perfect tool for the job.
407
00:15:29,040 --> 00:15:31,280
The irony is that this spec is often more important
408
00:15:31,280 --> 00:15:33,000
than the code itself.
409
00:15:33,000 --> 00:15:35,160
Writing OpenAPI for LLM reasoning.
410
00:15:35,160 --> 00:15:37,320
Most teams skip this step because they assume
411
00:15:37,320 --> 00:15:40,320
that having a technically correct OpenAPI spec is enough.
412
00:15:40,320 --> 00:15:41,000
It isn't.
413
00:15:41,000 --> 00:15:44,040
The model reads your description like a user reads documentation.
414
00:15:44,040 --> 00:15:47,040
So if your language is vague, the model makes vague inferences.
415
00:15:47,040 --> 00:15:49,840
It misses context and calls your operation when it shouldn't
416
00:15:49,840 --> 00:15:51,600
or worse, it doesn't call your operation
417
00:15:51,600 --> 00:15:53,120
when it absolutely should.
418
00:15:53,120 --> 00:15:56,280
One level deeper, we have the problem of operation collisions.
419
00:15:56,280 --> 00:15:57,920
This happens when you have two endpoints
420
00:15:57,920 --> 00:15:59,960
that sound similar from the model's perspective,
421
00:15:59,960 --> 00:16:02,520
like getCustomerByName and searchCustomers.
422
00:16:02,520 --> 00:16:04,360
Both retrieve customer information
423
00:16:04,360 --> 00:16:06,120
and both accept a name parameter.
424
00:16:06,120 --> 00:16:08,240
So the model has to choose which one to call
425
00:16:08,240 --> 00:16:09,920
based on your descriptions.
426
00:16:09,920 --> 00:16:12,000
If those descriptions are interchangeable,
427
00:16:12,000 --> 00:16:14,480
the model will pick randomly and half your queries
428
00:16:14,480 --> 00:16:16,000
will go to the wrong operation.
429
00:16:16,000 --> 00:16:17,600
The fix isn't to rename everything,
430
00:16:17,600 --> 00:16:20,640
but to be explicit about the scenario each operation handles.
431
00:16:20,640 --> 00:16:22,720
getCustomerByName should say, "Use this
432
00:16:22,720 --> 00:16:24,760
when the user provides an exact customer name
433
00:16:24,760 --> 00:16:26,760
and you need to retrieve a specific record."
434
00:16:26,760 --> 00:16:29,000
Meanwhile, searchCustomers should say,
435
00:16:29,000 --> 00:16:31,040
"Use this when the user is asking about customers
436
00:16:31,040 --> 00:16:33,160
but hasn't specified a particular name,
437
00:16:33,160 --> 00:16:34,840
such as when they ask to see customers
438
00:16:34,840 --> 00:16:36,600
from a specific region."
439
00:16:36,600 --> 00:16:39,520
Now the model can differentiate and it will choose correctly.
440
00:16:39,520 --> 00:16:41,560
Parameter examples are also non-negotiable.
441
00:16:41,560 --> 00:16:44,120
While standard documentation might include examples,
442
00:16:44,120 --> 00:16:46,880
OpenAPI specs for Copilot must include them.
443
00:16:46,880 --> 00:16:50,640
The model uses examples as demonstrations of correct usage.
444
00:16:50,640 --> 00:16:54,640
When you write an example like 2024-01-15 for a date parameter,
445
00:16:54,640 --> 00:16:57,640
you are teaching the model that this is the format you expect.
446
00:16:57,640 --> 00:16:59,400
You are preventing it from sending "yesterday"
447
00:16:59,400 --> 00:17:02,040
or "last month" or any other natural language expression.
448
00:17:02,040 --> 00:17:03,720
A single well-chosen example prevents
449
00:17:03,720 --> 00:17:05,480
entire categories of failures.
450
00:17:05,480 --> 00:17:07,200
Error schemas get overlooked constantly
451
00:17:07,200 --> 00:17:09,440
because developers focus on the happy path.
452
00:17:09,440 --> 00:17:12,000
But the model needs to understand failure paths too,
453
00:17:12,000 --> 00:17:14,120
and it needs structured error information to do that.
454
00:17:14,120 --> 00:17:17,240
A consistent error schema with a code, a message,
455
00:17:17,240 --> 00:17:20,200
and details tells the model how to interpret failures.
456
00:17:20,200 --> 00:17:22,560
It learns to recognize when a call failed and why,
457
00:17:22,560 --> 00:17:24,360
which allows it to adjust its behavior
458
00:17:24,360 --> 00:17:26,120
or explain the failure to the user.
459
00:17:26,120 --> 00:17:28,200
Without that structure, errors become opaque
460
00:17:28,200 --> 00:17:29,920
and the model cannot learn from them.
461
00:17:29,920 --> 00:17:32,520
Consider a practical example where your invoice validation
462
00:17:32,520 --> 00:17:35,240
endpoint returns a 400 status when the invoice number
463
00:17:35,240 --> 00:17:36,120
doesn't exist.
464
00:17:36,120 --> 00:17:38,720
If your error response is just a message saying "not found,"
465
00:17:38,720 --> 00:17:41,480
that is technically correct, but semantically useless.
466
00:17:41,480 --> 00:17:44,280
The model sees a 400 error and knows something went wrong,
467
00:17:44,280 --> 00:17:46,600
but it doesn't know to ask the user for clarification
468
00:17:46,600 --> 00:17:48,400
or suggest trying a different number.
469
00:17:48,400 --> 00:17:50,000
Compare that to a structured response
470
00:17:50,000 --> 00:17:52,120
that includes an invoice-not-found code
471
00:17:52,120 --> 00:17:54,680
and a suggestion to check the number and try again.
472
00:17:54,680 --> 00:17:56,520
Now the model understands the failure mode
473
00:17:56,520 --> 00:17:58,080
and can recover intelligently.
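A structured error of the kind described above, with a code, a message, and details, might look like the following. This is a sketch in Python rather than the episode's C#. The class `ApiError`, the code string `INVOICE_NOT_FOUND`, and the field names inside `details` are hypothetical choices, not a documented Copilot or Power Platform format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ApiError:
    """Consistent error envelope: machine-readable code, message, details."""

    code: str
    message: str
    details: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Wrap in a top-level "error" key so every failure has the same shape.
        return json.dumps({"error": asdict(self)})


def invoice_not_found(invoice_number: str) -> ApiError:
    """Build the error for an invoice number that matches no record."""
    return ApiError(
        code="INVOICE_NOT_FOUND",
        message=f"No invoice exists with number {invoice_number}.",
        details={
            "invoiceNumber": invoice_number,
            "suggestion": "Check the invoice number and try again.",
        },
    )
```

The `suggestion` field gives the model wording it can relay to the user, instead of leaving it to guess what a bare "not found" means.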
474
00:17:58,080 --> 00:18:00,080
Finally, you must curate your surface area.
475
00:18:00,080 --> 00:18:02,520
The temptation is to expose every endpoint you have built
476
00:18:02,520 --> 00:18:03,680
because it feels comprehensive,
477
00:18:03,680 --> 00:18:05,440
but in reality that is a disaster.
478
00:18:05,440 --> 00:18:08,520
Every operation you expose adds cognitive load to the model
479
00:18:08,520 --> 00:18:10,560
and every extra operation increases the chance
480
00:18:10,560 --> 00:18:11,680
of a wrong selection.
481
00:18:11,680 --> 00:18:14,720
Every unused endpoint in your spec is just technical debt.
482
00:18:14,720 --> 00:18:16,640
You should remove operations that aren't high value
483
00:18:16,640 --> 00:18:18,760
and hide internal administrative endpoints.
484
00:18:18,760 --> 00:18:21,360
Only expose what Copilot actually needs to do its job.
485
00:18:21,360 --> 00:18:23,360
This curation process is where your architecture
486
00:18:23,360 --> 00:18:25,400
becomes intentional instead of accidental.
487
00:18:25,400 --> 00:18:27,560
C# Azure Functions, the engine.
488
00:18:27,560 --> 00:18:29,080
Now let's talk about the actual code
489
00:18:29,080 --> 00:18:30,680
that powers this architecture.
490
00:18:30,680 --> 00:18:34,000
Your Copilot is fast and your OpenAPI spec is semantic
491
00:18:34,000 --> 00:18:37,040
and clear, but the function itself is where precision matters.
492
00:18:37,040 --> 00:18:38,560
This is the code that actually runs.
493
00:18:38,560 --> 00:18:40,400
This is where the theory meets the work.
494
00:18:40,400 --> 00:18:42,800
HTTP-triggered functions are your standard entry point.
495
00:18:42,800 --> 00:18:45,280
Copilot calls an endpoint, your function receives the request,
496
00:18:45,280 --> 00:18:47,480
it processes data and then it returns a response.
497
00:18:47,480 --> 00:18:49,840
This request-response pattern is fundamental.
498
00:18:49,840 --> 00:18:52,200
Everything else in this section is about making that cycle
499
00:18:52,200 --> 00:18:53,640
as efficient as possible.
500
00:18:53,640 --> 00:18:55,960
The isolated worker model in .NET 6 and above
501
00:18:55,960 --> 00:18:58,520
gives you control that you don't get in earlier versions.
502
00:18:58,520 --> 00:19:00,720
You own the startup sequence and manage the middleware,
503
00:19:00,720 --> 00:19:02,680
which means you configure dependency injection
504
00:19:02,680 --> 00:19:05,240
at the application level rather than the function level.
505
00:19:05,240 --> 00:19:06,840
You control what gets initialized
506
00:19:06,840 --> 00:19:10,120
once per application lifetime versus once per request.
507
00:19:10,120 --> 00:19:13,320
This control is exactly how you eliminate startup waste.
508
00:19:13,320 --> 00:19:14,640
Here's the critical pattern.
509
00:19:14,640 --> 00:19:17,440
Use async and await throughout your entire stack.
510
00:19:17,440 --> 00:19:18,960
This shouldn't just be in your function handler,
511
00:19:18,960 --> 00:19:20,760
but in every layer from database queries
512
00:19:20,760 --> 00:19:23,240
to HTTP calls and cache operations.
513
00:19:23,240 --> 00:19:24,360
The reason is simple.
514
00:19:24,360 --> 00:19:27,120
Blocking calls waste your always-ready capacity.
515
00:19:27,120 --> 00:19:29,360
When your code blocks while waiting for a database query,
516
00:19:29,360 --> 00:19:31,240
that thread just sits there idle.
517
00:19:31,240 --> 00:19:32,720
A thread is a scarce resource
518
00:19:32,720 --> 00:19:34,720
and your instance can only handle a limited number
519
00:19:34,720 --> 00:19:36,880
of concurrent requests before blocking starts
520
00:19:36,880 --> 00:19:39,360
queuing up the work. With async and await,
521
00:19:39,360 --> 00:19:40,760
that thread returns to the pool,
522
00:19:40,760 --> 00:19:43,000
so your instance can handle the next request.
523
00:19:43,000 --> 00:19:44,360
Your always-ready capacity suddenly
524
00:19:44,360 --> 00:19:45,880
serves 10 times the throughput.
525
00:19:45,880 --> 00:19:47,640
This isn't just a performance nicety.
526
00:19:47,640 --> 00:19:49,040
It is architectural.
527
00:19:49,040 --> 00:19:51,200
Without async, your cold-start latency
528
00:19:51,200 --> 00:19:54,160
won't even matter because your warm latency will be terrible anyway.
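The effect of awaiting I/O instead of blocking on it can be sketched in a few lines. The example uses Python's `asyncio` as a stand-in, which relies on an event loop rather than the .NET thread pool, but the principle is the same: while one request waits, the worker is free to serve another. The function names and the 0.1 second delay are assumptions chosen for the demonstration.

```python
import asyncio
import time


async def query_database(invoice_id: int) -> dict:
    """Stand-in for an awaitable I/O call (database, HTTP, or cache)."""
    await asyncio.sleep(0.1)  # simulated network wait; nothing is blocked
    return {"invoiceId": invoice_id, "status": "paid"}


async def handle_requests(count: int) -> list:
    """Serve many requests concurrently while each one awaits its I/O."""
    return await asyncio.gather(*(query_database(i) for i in range(count)))


def timed_run(count: int) -> tuple:
    """Run the batch and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(handle_requests(count))
    return results, time.perf_counter() - start
```

Ten simulated requests finish in roughly the time of one wait, whereas ten blocking waits of the same length would take about ten times as long.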
529
00:19:54,160 --> 00:19:55,280
Next, let's look at singletons.
530
00:19:55,280 --> 00:19:57,480
Your HTTP client, your database connection pool,
531
00:19:57,480 --> 00:19:59,400
and your caching client should all be initialized
532
00:19:59,400 --> 00:20:01,160
once when your function app starts.
533
00:20:01,160 --> 00:20:03,640
Most developers get this right for the HTTP client,
534
00:20:03,640 --> 00:20:05,320
but they miss it for other clients.
535
00:20:05,320 --> 00:20:07,640
A database connection takes time to establish,
536
00:20:07,640 --> 00:20:09,440
and a Redis client needs to authenticate.
537
00:20:09,440 --> 00:20:12,600
So these operations are expensive on every new initialization.
538
00:20:12,600 --> 00:20:14,120
They are only cheap on the second call
539
00:20:14,120 --> 00:20:16,320
because pooling handles the heavy lifting.
540
00:20:16,320 --> 00:20:18,200
Dependency injection handles this elegantly
541
00:20:18,200 --> 00:20:20,240
if you register your clients as singletons
542
00:20:20,240 --> 00:20:22,120
and inject them into your function handler.
543
00:20:22,120 --> 00:20:24,720
Every invocation then reuses the same instance,
544
00:20:24,720 --> 00:20:26,640
but there's a counter-intuitive pattern here.
545
00:20:26,640 --> 00:20:29,720
Use lazy initialization for expensive dependencies.
546
00:20:29,720 --> 00:20:32,360
If your function loads a heavy ML model on startup,
547
00:20:32,360 --> 00:20:35,920
that model consumes memory and CPU while it sits there unused.
548
00:20:35,920 --> 00:20:38,520
If only half your invocations actually need that model,
549
00:20:38,520 --> 00:20:40,720
you're wasting always ready capacity on something
550
00:20:40,720 --> 00:20:41,760
that isn't helping you.
551
00:20:41,760 --> 00:20:43,920
The solution is the Lazy<T> pattern in C#.
552
00:20:43,920 --> 00:20:46,320
The dependency initializes on the first actual use
553
00:20:46,320 --> 00:20:47,840
rather than at app startup.
554
00:20:47,840 --> 00:20:50,200
Subsequent uses hit the cached instance instantly.
555
00:20:50,200 --> 00:20:51,960
This is how you keep cold starts low
556
00:20:51,960 --> 00:20:53,920
while still having access to heavy dependencies
557
00:20:53,920 --> 00:20:55,000
when they are needed.
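The lazy pattern described above can be illustrated as follows. This is a Python approximation of the idea, not the C# type itself. The `Lazy` class, `load_heavy_model`, and the `load_calls` list are hypothetical names introduced for the sketch.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class Lazy(Generic[T]):
    """Create the wrapped value on first access, then reuse it."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._created = False
        self._value: Optional[T] = None

    @property
    def value(self) -> T:
        if not self._created:
            with self._lock:
                # Re-check inside the lock so concurrent callers
                # cannot both run the expensive factory.
                if not self._created:
                    self._value = self._factory()
                    self._created = True
        return self._value


load_calls = []  # records each time the expensive load actually runs


def load_heavy_model() -> dict:
    """Stand-in for an expensive startup cost such as loading an ML model."""
    load_calls.append("loaded")
    return {"name": "hypothetical-model"}


# Registered once at app level; nothing is loaded until .value is read.
MODEL = Lazy(load_heavy_model)
```

Invocations that never read `MODEL.value` never pay the load cost, and those that do share a single instance.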
558
00:20:55,000 --> 00:20:57,400
ReadyToRun publishing is another overlooked lever.
559
00:20:57,400 --> 00:20:59,440
It's a simple flag you set during the build process.
560
00:20:59,440 --> 00:21:02,000
Instead of your .NET assemblies being JIT-compiled
561
00:21:02,000 --> 00:21:03,560
at runtime on first use,
562
00:21:03,560 --> 00:21:05,040
they're pre-compiled ahead of time.
563
00:21:05,040 --> 00:21:07,480
Your function starts and all the code is already ready
564
00:21:07,480 --> 00:21:08,480
to execute.
565
00:21:08,480 --> 00:21:11,400
There is no JIT overhead and no compilation delay.
566
00:21:11,400 --> 00:21:13,080
For latency-critical functions,
567
00:21:13,080 --> 00:21:15,000
ReadyToRun saves hundreds of milliseconds
568
00:21:15,000 --> 00:21:16,160
on a cold start.
569
00:21:16,160 --> 00:21:18,120
It also provides faster warm execution.
570
00:21:18,120 --> 00:21:19,640
It's a pure win and the trade-off
571
00:21:19,640 --> 00:21:21,240
of a slightly larger deployment package
572
00:21:21,240 --> 00:21:23,600
is irrelevant compared to the latency gain.
573
00:21:23,600 --> 00:21:25,400
These patterns are not optional for production
574
00:21:25,400 --> 00:21:26,800
Copilot plugins.
575
00:21:26,800 --> 00:21:29,360
Using async throughout, singletons for clients,
576
00:21:29,360 --> 00:21:31,320
lazy initialization for expensive work,
577
00:21:31,320 --> 00:21:32,920
and ReadyToRun publishing is the difference
578
00:21:32,920 --> 00:21:34,920
between a function that takes two seconds
579
00:21:34,920 --> 00:21:37,720
and one that takes 100 milliseconds every time.
580
00:21:37,720 --> 00:21:40,160
The actual business logic is usually straightforward.
581
00:21:40,160 --> 00:21:42,160
You validate invoices, query databases,
582
00:21:42,160 --> 00:21:43,880
or call downstream APIs.
583
00:21:43,880 --> 00:21:46,360
But the infrastructure around that logic determines
584
00:21:46,360 --> 00:21:48,760
whether your plugin feels instant or sluggish.
585
00:21:48,760 --> 00:21:51,240
This is where developers who understand serverless architecture
586
00:21:51,240 --> 00:21:52,760
separate themselves from developers
587
00:21:52,760 --> 00:21:54,840
who just run code in the cloud.
588
00:21:54,840 --> 00:21:57,600
Securing the function identity and network.
589
00:21:57,600 --> 00:21:59,600
A fast function that leaks data is worse
590
00:21:59,600 --> 00:22:01,520
than a slow function that stays secure.
591
00:22:01,520 --> 00:22:04,200
Performance is meaningless without a hardened perimeter.
592
00:22:04,200 --> 00:22:06,160
Your function lives in Azure, handles requests
593
00:22:06,160 --> 00:22:08,840
from power platform and calls downstream systems.
594
00:22:08,840 --> 00:22:10,840
It touches sensitive business data.
595
00:22:10,840 --> 00:22:12,640
Every connection point is an attack surface.
596
00:22:12,640 --> 00:22:14,600
Every decision you make about authentication
597
00:22:14,600 --> 00:22:17,080
and network topology determines whether your plugin
598
00:22:17,080 --> 00:22:18,440
is secure or vulnerable.
599
00:22:18,440 --> 00:22:19,680
Start with identity.
600
00:22:19,680 --> 00:22:22,400
Managed identities are the foundational security pattern
601
00:22:22,400 --> 00:22:23,760
for Azure workloads.
602
00:22:23,760 --> 00:22:25,920
Instead of storing API keys or connection strings
603
00:22:25,920 --> 00:22:27,080
in your configuration,
604
00:22:27,080 --> 00:22:28,960
your function app gets an identity issued
605
00:22:28,960 --> 00:22:30,840
by Microsoft Entra ID.
606
00:22:30,840 --> 00:22:32,480
When your function needs to authenticate
607
00:22:32,480 --> 00:22:34,400
to Azure SQL or Azure Storage,
608
00:22:34,400 --> 00:22:36,520
it proves its identity cryptographically.
609
00:22:36,520 --> 00:22:39,520
There are no secrets to rotate and no keys embedded in code.
610
00:22:39,520 --> 00:22:41,480
There are no credential leaks waiting to happen
611
00:22:41,480 --> 00:22:43,680
because the Azure platform handles all of it.
612
00:22:43,680 --> 00:22:46,120
Your only job is to request the identity
613
00:22:46,120 --> 00:22:48,760
and assign it the minimum permissions it needs.
614
00:22:48,760 --> 00:22:51,120
This pattern extends beyond Azure services.
615
00:22:51,120 --> 00:22:52,880
If your function needs to call a REST API
616
00:22:52,880 --> 00:22:54,320
that supports Entra ID,
617
00:22:54,320 --> 00:22:56,720
your function simply authenticates as itself
618
00:22:56,720 --> 00:22:58,600
using that managed identity.
619
00:22:58,600 --> 00:23:00,440
It requests a token from Entra ID
620
00:23:00,440 --> 00:23:02,680
and passes that token in the authorization header.
621
00:23:02,680 --> 00:23:04,480
The downstream service validates the token
622
00:23:04,480 --> 00:23:06,560
and grants access based on the permissions assigned
623
00:23:06,560 --> 00:23:07,840
to your function's identity.
624
00:23:07,840 --> 00:23:09,480
The entire handshake is cryptographic
625
00:23:09,480 --> 00:23:11,400
so no secrets ever touch your code.
626
00:23:11,400 --> 00:23:13,880
Now we have to consider who can call your function.
627
00:23:13,880 --> 00:23:16,360
By default, an HTTP-triggered function in Azure
628
00:23:16,360 --> 00:23:17,960
is protected by a function key.
629
00:23:17,960 --> 00:23:19,720
It's a simple shared secret that you pass
630
00:23:19,720 --> 00:23:20,920
as a query parameter.
631
00:23:20,920 --> 00:23:23,560
Azure validates it and while this works for internal tools,
632
00:23:23,560 --> 00:23:25,840
it doesn't work for production Copilot plugins.
633
00:23:25,840 --> 00:23:28,680
A key is a secret and secrets get leaked or shared.
634
00:23:28,680 --> 00:23:31,080
Eventually, secrets stop being secret.
635
00:23:31,080 --> 00:23:34,120
Microsoft Entra ID authentication is the production pattern you need.
636
00:23:34,120 --> 00:23:37,320
You register your function app as an application in Entra ID
637
00:23:37,320 --> 00:23:40,320
and configure it to accept tokens issued for that application.
638
00:23:40,320 --> 00:23:41,800
Power Platform's custom connector
639
00:23:41,800 --> 00:23:43,880
creates an app registration in your tenant
640
00:23:43,880 --> 00:23:45,840
and you grant that app permission to your function.
641
00:23:45,840 --> 00:23:47,320
When the connector calls your function,
642
00:23:47,320 --> 00:23:51,280
it includes a JWT that proves it's authorized.
643
00:23:51,280 --> 00:23:53,040
Azure validates the token signature,
644
00:23:53,040 --> 00:23:55,120
which is cryptographically bound to your function.
645
00:23:55,120 --> 00:23:58,000
Only calls from Power Platform with a valid token can reach your code.
646
00:23:58,000 --> 00:24:00,280
Unauthorized callers get rejected at the platform level
647
00:24:00,280 --> 00:24:01,960
before your code even runs.
648
00:24:01,960 --> 00:24:03,760
Network topology matters just as much.
649
00:24:03,760 --> 00:24:05,680
By default, your function is publicly accessible
650
00:24:05,680 --> 00:24:06,920
and the internet can reach it.
651
00:24:06,920 --> 00:24:10,320
That's convenient for testing, but it's a disaster for production.
652
00:24:10,320 --> 00:24:12,240
Private endpoints change this equation.
653
00:24:12,240 --> 00:24:14,800
You configure a private endpoint for your function
654
00:24:14,800 --> 00:24:17,320
so traffic routes through Azure's internal network.
655
00:24:17,320 --> 00:24:19,600
Your function has no public IP address,
656
00:24:19,600 --> 00:24:22,040
which means there's no way to reach it from the internet.
657
00:24:22,040 --> 00:24:23,800
It exists only inside your VNet
658
00:24:23,800 --> 00:24:25,520
and is accessible only to resources
659
00:24:25,520 --> 00:24:27,240
that have authorized access to that VNet.
660
00:24:27,240 --> 00:24:29,840
This private topology creates a hardened perimeter.
661
00:24:29,840 --> 00:24:32,840
Power Platform reaches your function through a private connection
662
00:24:32,840 --> 00:24:35,000
and the downstream systems your function calls
663
00:24:35,000 --> 00:24:37,040
are also behind private endpoints.
664
00:24:37,040 --> 00:24:39,160
Traffic never leaves the Microsoft network.
665
00:24:39,160 --> 00:24:41,200
There's no exposure window and no internet route
666
00:24:41,200 --> 00:24:42,960
where packets could be intercepted.
667
00:24:42,960 --> 00:24:45,800
The on-behalf-of flow is a subtler but crucial pattern.
668
00:24:45,800 --> 00:24:47,560
When a user interacts with Copilot,
669
00:24:47,560 --> 00:24:49,200
they're authenticated to Power Platform
670
00:24:49,200 --> 00:24:50,320
under their own identity.
671
00:24:50,320 --> 00:24:52,760
When your function calls a downstream system,
672
00:24:52,760 --> 00:24:56,160
it should act on behalf of that user rather than itself.
673
00:24:56,160 --> 00:24:58,760
The OBO flow handles this by receiving a token
674
00:24:58,760 --> 00:25:01,160
that represents the authenticated user.
675
00:25:01,160 --> 00:25:03,800
It exchanges that token for a downstream token
676
00:25:03,800 --> 00:25:05,640
that preserves the user's identity.
677
00:25:05,640 --> 00:25:08,680
The downstream system logs the action under the user's account,
678
00:25:08,680 --> 00:25:10,440
which is how you maintain audit trails
679
00:25:10,440 --> 00:25:12,560
and enforce user-level authorization.
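The token exchange at the heart of the on-behalf-of flow is an HTTP POST to the Microsoft identity platform token endpoint. The sketch below only builds that request so the parameters are visible, and it sends nothing. The tenant, client, and scope values are placeholders, the helper name `build_obo_request` is an assumption, and production code would normally use an authentication library and a certificate or federated credential instead of a hand-built request with a client secret.

```python
from urllib.parse import urlencode


def build_obo_request(
    tenant_id: str,
    client_id: str,
    client_secret: str,
    user_token: str,
    downstream_scope: str,
) -> tuple:
    """Return (url, form_body) for an OAuth 2.0 on-behalf-of token exchange."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "client_id": client_id,
        "client_secret": client_secret,
        # The incoming token that represents the signed-in user.
        "assertion": user_token,
        # The downstream API the function wants to call as that user.
        "scope": downstream_scope,
        "requested_token_use": "on_behalf_of",
    }
    return url, urlencode(form)
```

The response to this request carries an access token for the downstream API that still identifies the original user, which is what keeps the audit trail intact.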
680
00:25:12,560 --> 00:25:14,760
API management sits in front of your function
681
00:25:14,760 --> 00:25:17,080
to apply rate limiting and throttling.
682
00:25:17,080 --> 00:25:19,440
A single misbehaving flow can't overwhelm your backend
683
00:25:19,440 --> 00:25:22,200
because APIM enforces quotas at the platform level.
684
00:25:22,200 --> 00:25:24,440
Your function doesn't need to implement this itself
685
00:25:24,440 --> 00:25:25,600
or check rate limits.
686
00:25:25,600 --> 00:25:27,920
It doesn't need to decide whether to accept or reject
687
00:25:27,920 --> 00:25:29,440
a request because APIM does that
688
00:25:29,440 --> 00:25:31,560
before traffic ever reaches your function.
689
00:25:31,560 --> 00:25:33,000
Logging completes the picture.
690
00:25:33,000 --> 00:25:35,640
Every request should be logged with a correlation ID
691
00:25:35,640 --> 00:25:37,400
that traces it through your system.
692
00:25:37,400 --> 00:25:39,800
User context and operation outcomes matter
693
00:25:39,800 --> 00:25:42,160
and application insights is where this telemetry lives.
694
00:25:42,160 --> 00:25:45,320
It's queryable, it's retainable, and it's auditable.
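Correlation-ID logging of the kind described above can be sketched with the Python standard library alone. The logger name, the `log_request` helper, and the field names `correlation_id` and `user_id` are assumptions. How these records reach Application Insights depends on how the function app's logging is configured, which this sketch does not cover.

```python
import logging
import uuid
from typing import Optional

logger = logging.getLogger("copilot_plugin")


def log_request(
    operation: str,
    user_id: str,
    outcome: str,
    correlation_id: Optional[str] = None,
) -> str:
    """Log one request with user context, outcome, and a correlation ID."""
    # Reuse the caller's ID when one is supplied; otherwise mint a new one.
    correlation_id = correlation_id or str(uuid.uuid4())
    logger.info(
        "operation=%s outcome=%s",
        operation,
        outcome,
        # 'extra' attaches these values as attributes on the log record,
        # so a handler or exporter can emit them as structured properties.
        extra={"correlation_id": correlation_id, "user_id": user_id},
    )
    return correlation_id
```

Returning the ID lets the function echo it in a response header, so a failed call reported by a user can be traced back to a single log entry.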
695
00:25:45,320 --> 00:25:46,600
These patterns are the foundation
696
00:25:46,600 --> 00:25:48,360
of a secure production plugin.
697
00:25:48,360 --> 00:25:51,000
Managed identities, Entra ID authentication,
698
00:25:51,000 --> 00:25:53,600
private endpoints, OBO flows, and comprehensive logging
699
00:25:53,600 --> 00:25:57,160
aren't optional if you want to build something professional.
700
00:25:57,160 --> 00:26:00,240
The DLP bypass risk and how to mitigate it.
701
00:26:00,240 --> 00:26:03,440
Your function is secure, your custom connector is authenticated,
702
00:26:03,440 --> 00:26:04,680
and your network is hardened.
703
00:26:04,680 --> 00:26:07,080
But there is a structural vulnerability sitting right
704
00:26:07,080 --> 00:26:09,320
in front of you that most teams simply do not see
705
00:26:09,320 --> 00:26:11,080
until it becomes a massive problem.
706
00:26:11,080 --> 00:26:12,520
Custom connectors can be weaponized
707
00:26:12,520 --> 00:26:14,320
to bypass your DLP policies.
708
00:26:14,320 --> 00:26:17,200
This is not just a theoretical risk or a what-if scenario.
709
00:26:17,200 --> 00:26:18,040
It is documented.
710
00:26:18,040 --> 00:26:20,280
Security research has shown that a custom connector
711
00:26:20,280 --> 00:26:23,000
can internally reach a service that is explicitly blocked
712
00:26:23,000 --> 00:26:25,360
by your organization's DLP policies.
713
00:26:25,360 --> 00:26:27,720
The blocked connector stays blocked at the policy level,
714
00:26:27,720 --> 00:26:29,920
but the functionality becomes accessible anyway,
715
00:26:29,920 --> 00:26:31,640
because it is being proxied through something
716
00:26:31,640 --> 00:26:33,400
that is not on the block list.
717
00:26:33,400 --> 00:26:36,320
Here is how that actually happens in a real environment.
718
00:26:36,320 --> 00:26:38,520
Your organization has sensitive data in Salesforce,
719
00:26:38,520 --> 00:26:40,160
and your CFO makes a policy decision
720
00:26:40,160 --> 00:26:42,560
that Salesforce cannot connect to public cloud storage.
721
00:26:42,560 --> 00:26:46,040
A DLP rule is created, Salesforce moves to the blocked category
722
00:26:46,040 --> 00:26:47,720
and suddenly nobody can create flows
723
00:26:47,720 --> 00:26:50,840
that move data from Salesforce to Dropbox or OneDrive.
724
00:26:50,840 --> 00:26:53,840
The policy is enforced, and on the surface, everything looks fine.
725
00:26:53,840 --> 00:26:55,680
But then someone builds a custom connector
726
00:26:55,680 --> 00:26:58,640
that calls a webhook hosted in their personal Azure account.
727
00:26:58,640 --> 00:27:01,240
The custom connector sends Salesforce data to that webhook
728
00:27:01,240 --> 00:27:02,920
and the webhook writes it to Dropbox.
729
00:27:02,920 --> 00:27:05,200
The entire path goes around the DLP rule
730
00:27:05,200 --> 00:27:07,320
because the blocked connector never appears
731
00:27:07,320 --> 00:27:08,520
in the flow definition.
732
00:27:08,520 --> 00:27:09,760
The policy audit shows nothing,
733
00:27:09,760 --> 00:27:12,120
the custom connector is marked as business approved,
734
00:27:12,120 --> 00:27:13,440
and everything looks legitimate.
735
00:27:13,440 --> 00:27:16,040
Now, your sensitive data is sitting in a storage account
736
00:27:16,040 --> 00:27:18,640
that is completely outside of your organization's control.
737
00:27:18,640 --> 00:27:20,760
This is the structural vulnerability you have to face.
738
00:27:20,760 --> 00:27:22,560
Custom connectors are integration points
739
00:27:22,560 --> 00:27:25,440
that can proxy traffic to restricted destinations
740
00:27:25,440 --> 00:27:27,720
and unless you treat them as privileged assets,
741
00:27:27,720 --> 00:27:31,000
they become escape hatches from your entire governance model.
742
00:27:31,000 --> 00:27:34,120
Your first instinct might be to ban custom connectors entirely,
743
00:27:34,120 --> 00:27:35,520
but that is the wrong move,
744
00:27:35,520 --> 00:27:37,280
because it kills legitimate use cases
745
00:27:37,280 --> 00:27:38,960
and stops your developers from integrating
746
00:27:38,960 --> 00:27:42,400
with systems that Power Platform does not natively support.
747
00:27:42,400 --> 00:27:44,280
Banning them does not solve the problem.
748
00:27:44,280 --> 00:27:45,720
It just creates pressure for people
749
00:27:45,720 --> 00:27:47,720
to find even more dangerous workarounds.
750
00:27:47,720 --> 00:27:49,320
The right answer is architecture.
751
00:27:49,320 --> 00:27:51,360
The custom connector should never have direct access
752
00:27:51,360 --> 00:27:52,760
to your downstream systems.
753
00:27:52,760 --> 00:27:54,880
Instead, it should route through a hardened gateway
754
00:27:54,880 --> 00:27:56,200
like API Management.
755
00:27:56,200 --> 00:27:58,640
When you put APIM in front of your Azure Functions,
756
00:27:58,640 --> 00:28:01,280
it enforces your policies before any traffic ever reaches
757
00:28:01,280 --> 00:28:02,480
your function code.
758
00:28:02,480 --> 00:28:04,520
APIM becomes your central control point.
759
00:28:04,520 --> 00:28:06,240
It validates that the caller is authorized,
760
00:28:06,240 --> 00:28:07,520
it checks rate limits,
761
00:28:07,520 --> 00:28:09,640
and it logs every single request that comes through.
762
00:28:09,640 --> 00:28:12,320
But critically, it can enforce endpoint filtering.
763
00:28:12,320 --> 00:28:15,360
You define exactly which domains the API is allowed to reach
764
00:28:15,360 --> 00:28:16,960
and everything else gets blocked.
765
00:28:16,960 --> 00:28:19,760
Your custom connector can call your function a thousand times,
766
00:28:19,760 --> 00:28:22,000
but if that function tries to reach a blocked service,
767
00:28:22,000 --> 00:28:23,360
APIM rejects it.
768
00:28:23,360 --> 00:28:25,280
The policy is enforced at the gateway level
769
00:28:25,280 --> 00:28:27,400
rather than at the individual connector.
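The endpoint filtering idea comes down to an allowlist check on the destination host, wherever it ends up being enforced. Here is a minimal Python sketch of that check. The host names and the `is_destination_allowed` helper are hypothetical, and in a real deployment this rule would live in gateway policy and network configuration rather than in application code.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; these host names are placeholders for illustration.
ALLOWED_HOSTS = {"api.contoso.example", "erp.contoso.example"}


def is_destination_allowed(url: str, allowed_hosts: set = ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose host is explicitly allowlisted."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Deny by default: anything not on the list is rejected.
    return parts.scheme == "https" and host in allowed_hosts
```

The important design choice is the default: the check names what is allowed and rejects everything else, instead of trying to list every destination that is forbidden.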
770
00:28:27,400 --> 00:28:29,680
This requires a fundamental mindset shift.
771
00:28:29,680 --> 00:28:31,600
You have to stop thinking of the custom connector
772
00:28:31,600 --> 00:28:33,920
as a black box that you trust to do the right thing,
773
00:28:33,920 --> 00:28:36,120
and start thinking of it as a potential attack surface
774
00:28:36,120 --> 00:28:39,000
that is privileged, restricted, and constantly monitored.
775
00:28:39,000 --> 00:28:41,840
Operationally, this means you need real governance
776
00:28:41,840 --> 00:28:44,280
around who can create these connectors.
777
00:28:44,280 --> 00:28:46,280
You cannot let every developer build them.
778
00:28:46,280 --> 00:28:48,560
You need a small, specialized team.
779
00:28:48,560 --> 00:28:50,600
You also need to limit who can modify them
780
00:28:50,600 --> 00:28:52,000
and require formal approval
781
00:28:52,000 --> 00:28:53,840
before anything is deployed to production.
782
00:28:53,840 --> 00:28:56,440
Every connector must be treated as a tracked asset.
783
00:28:56,440 --> 00:28:58,120
The OpenAPI spec for your function
784
00:28:58,120 --> 00:29:00,080
becomes part of your enforcement strategy.
785
00:29:00,080 --> 00:29:01,640
You do not expose operations
786
00:29:01,640 --> 00:29:03,920
that would let someone call restricted services
787
00:29:03,920 --> 00:29:06,240
or upload files to arbitrary storage.
788
00:29:06,240 --> 00:29:09,440
You do not expose operations that bypass authentication.
789
00:29:09,440 --> 00:29:11,040
You design your function's contract
790
00:29:11,040 --> 00:29:14,040
with the assumption that the caller might be hostile.
791
00:29:14,040 --> 00:29:15,600
This might sound paranoid to some,
792
00:29:15,600 --> 00:29:17,440
but it is actually just realistic.
793
00:29:17,440 --> 00:29:20,000
Most people do not intentionally try to break the rules,
794
00:29:20,000 --> 00:29:22,640
but they do find creative solutions to business problems
795
00:29:22,640 --> 00:29:25,720
without understanding the security implications.
796
00:29:25,720 --> 00:29:28,080
A hardened perimeter prevents those good intentions
797
00:29:28,080 --> 00:29:30,680
from turning into serious security incidents.
798
00:29:30,680 --> 00:29:33,320
API Management as the security front door.
799
00:29:33,320 --> 00:29:35,800
This is where your architecture starts to get sophisticated.
800
00:29:35,800 --> 00:29:37,200
API Management is the component
801
00:29:37,200 --> 00:29:39,600
that takes a collection of scattered security concerns
802
00:29:39,600 --> 00:29:42,360
and transforms them into a unified enforcement layer.
803
00:29:42,360 --> 00:29:44,160
In the operational reality of the setup,
804
00:29:44,160 --> 00:29:47,080
API Management sits directly between Power Platform
805
00:29:47,080 --> 00:29:48,480
and your Azure Functions.
806
00:29:48,480 --> 00:29:51,400
It is the only thing your custom connector can actually reach.
807
00:29:51,400 --> 00:29:53,320
Your function does not have a public endpoint
808
00:29:53,320 --> 00:29:55,680
and there is no direct route from the connector
809
00:29:55,680 --> 00:29:56,840
to the function code.
810
00:29:56,840 --> 00:29:59,120
Everything has to flow through APIM first.
811
00:29:59,120 --> 00:30:00,760
This is a decisive architectural choice
812
00:30:00,760 --> 00:30:03,200
because it makes APIM your single point of control.
813
00:30:03,200 --> 00:30:05,200
Every request is inspected, logged,
814
00:30:05,200 --> 00:30:07,240
and evaluated against your policies
815
00:30:07,240 --> 00:30:09,000
before it ever touches your back end.
816
00:30:09,000 --> 00:30:11,200
Think about how this simplifies authentication.
817
00:30:11,200 --> 00:30:13,400
Your custom connector might use API keys,
818
00:30:13,400 --> 00:30:15,120
while your downstream services use
819
00:30:15,120 --> 00:30:16,720
Entra ID with certificates
820
00:30:16,720 --> 00:30:19,040
and your internal APIs use OAuth.
821
00:30:19,040 --> 00:30:21,360
APIM can accept all three of these types at the edge,
822
00:30:21,360 --> 00:30:23,000
normalize them into a standard format
823
00:30:23,000 --> 00:30:25,080
and then pass a clean request to your function.
824
00:30:25,080 --> 00:30:26,640
Your function does not need to understand
825
00:30:26,640 --> 00:30:29,920
the complexity of certificate validation or OAuth flows.
826
00:30:29,920 --> 00:30:31,960
It just receives a header saying the request
827
00:30:31,960 --> 00:30:35,040
is from a trusted caller and proceeds with the work.
828
00:30:35,040 --> 00:30:37,080
This decoupling makes your entire architecture
829
00:30:37,080 --> 00:30:39,000
much easier to maintain over time.
830
00:30:39,000 --> 00:30:41,960
The real power of APIM emerges when you start using policies.
831
00:30:41,960 --> 00:30:44,280
A policy is just a rule that runs on inbound
832
00:30:44,280 --> 00:30:45,520
or outbound traffic.
833
00:30:45,520 --> 00:30:48,400
You can use them to inject headers, validate JWT tokens
834
00:30:48,400 --> 00:30:50,640
or transform request bodies by stripping out fields
835
00:30:50,640 --> 00:30:52,040
your function does not need.
836
00:30:52,040 --> 00:30:54,080
You can log every request to a security system
837
00:30:54,080 --> 00:30:56,280
for auditing or enforce schema validation
838
00:30:56,280 --> 00:30:58,680
so your function never sees malformed data.
839
00:30:58,680 --> 00:31:00,480
This is what we call preventive security.
840
00:31:00,480 --> 00:31:02,120
Your function does not get garbage input
841
00:31:02,120 --> 00:31:03,720
because APIM rejects it at the door.
842
00:31:03,720 --> 00:31:05,680
It does not get called by unauthorized users
843
00:31:05,680 --> 00:31:08,840
because APIM validates credentials before forwarding the request.
844
00:31:08,840 --> 00:31:10,800
You do not have to write basic security checks
845
00:31:10,800 --> 00:31:13,840
into your code because the gateway has already handled them.
846
00:31:13,840 --> 00:31:17,040
Rate limiting and quota management also live at this layer.
847
00:31:17,040 --> 00:31:19,280
You can define exactly how many requests per minute
848
00:31:19,280 --> 00:31:21,360
an API key can send, which prevents
849
00:31:21,360 --> 00:31:23,840
a single misbehaving flow in Power Platform
850
00:31:23,840 --> 00:31:25,400
from overwhelming your backend.
851
00:31:25,400 --> 00:31:27,520
If a flow starts sending millions of requests,
852
00:31:27,520 --> 00:31:30,840
APIM throttles it with a 429 Too Many Requests response.
853
00:31:30,840 --> 00:31:32,720
This forces the flow to back off
854
00:31:32,720 --> 00:31:35,640
and ensures your function stays available for everyone else.
855
00:31:35,640 --> 00:31:37,600
Without this, a simple bug in a flow
856
00:31:37,600 --> 00:31:39,120
becomes a denial-of-service attack
857
00:31:39,120 --> 00:31:40,640
against your own infrastructure.
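To make the throttling behavior concrete, here is a small Python sketch of a fixed-window limiter keyed by API key. It only illustrates the logic a gateway applies. The class name, the parameters, and the injectable clock are assumptions for the example, and APIM itself expresses this as policy configuration, not Python.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` calls per key in each `window_seconds` window."""

    def __init__(self, limit: int, window_seconds: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behavior can be tested without waiting
        self._windows = defaultdict(lambda: (0.0, 0))  # key -> (window start, count)

    def check(self, key: str) -> int:
        """Return the HTTP status a gateway would answer with: 200 or 429."""
        now = self.clock()
        start, count = self._windows[key]
        if count == 0 or now - start >= self.window_seconds:
            start, count = now, 0  # open a fresh window for this key
        if count >= self.limit:
            return 429  # Too Many Requests: the caller is expected to back off
        self._windows[key] = (start, count + 1)
        return 200
```

Each key gets its own counter, so one runaway flow exhausts only its own budget and other callers keep getting served.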
858
00:31:40,640 --> 00:31:43,160
APIM also allows you to route traffic intelligently.
859
00:31:43,160 --> 00:31:44,680
You can set up rules to send traffic
860
00:31:44,680 --> 00:31:48,240
from specific IP addresses to a canary endpoint for testing.
861
00:31:48,240 --> 00:31:50,680
You can run A/B tests by sending a small percentage
862
00:31:50,680 --> 00:31:52,640
of traffic to a new version of your function
863
00:31:52,640 --> 00:31:54,600
while the rest stays on the stable version.
864
00:31:54,600 --> 00:31:56,920
This gives you granular control over rollouts.
865
00:31:56,920 --> 00:31:58,600
Instead of switching versions all at once
866
00:31:58,600 --> 00:32:01,720
and hoping for the best, you can move slowly and reduce risk.
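A percentage-based rollout can be sketched in a few lines of Python. This version hashes a request or user identifier into a stable bucket, so the same caller always lands on the same version. The function name and the "canary" and "stable" labels are made up for the example, and a real gateway would express the split in its routing policy.

```python
import hashlib


def pick_backend(request_id: str, canary_percent: int) -> str:
    """Route a stable `canary_percent` share of identifiers to the canary backend."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"
```

Raising `canary_percent` step by step moves more traffic to the new version without reshuffling the callers that were already on it.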
867
00:32:01,720 --> 00:32:04,720
You should always publish your OpenAPI spec from APIM
868
00:32:04,720 --> 00:32:06,280
rather than the function itself.
869
00:32:06,280 --> 00:32:09,280
Your function's actual URL might be hidden deep in Azure
870
00:32:09,280 --> 00:32:11,080
and Power Platform should never see it.
871
00:32:11,080 --> 00:32:12,560
When Power Platform imports the spec,
872
00:32:12,560 --> 00:32:14,360
it only sees the APIM endpoint.
873
00:32:14,360 --> 00:32:17,080
This ensures the spec always matches the enforced behavior.
874
00:32:17,080 --> 00:32:18,840
You cannot accidentally bypass the gateway
875
00:32:18,840 --> 00:32:21,000
because the connector has no idea the function endpoint
876
00:32:21,000 --> 00:32:21,960
even exists.
877
00:32:21,960 --> 00:32:24,000
This architecture finally answers the big question
878
00:32:24,000 --> 00:32:27,200
of how to stop custom connectors from becoming bypass mechanisms.
879
00:32:27,200 --> 00:32:28,760
You eliminate the bypass entirely.
880
00:32:28,760 --> 00:32:31,360
The connector cannot reach anything except the gateway
881
00:32:31,360 --> 00:32:33,440
and the gateway enforces policies
882
00:32:33,440 --> 00:32:35,440
that are immutable to the average user.
883
00:32:35,440 --> 00:32:37,400
There is no side channel and no workaround.
884
00:32:37,400 --> 00:32:39,160
APIM is the enforcement surface
885
00:32:39,160 --> 00:32:42,120
that turns your governance intentions into a technical reality.
886
00:32:42,120 --> 00:32:44,600
It is the difference between having a policy on paper
887
00:32:44,600 --> 00:32:46,960
and actually preventing data from leaving your environment.
888
00:32:46,960 --> 00:32:49,360
This is the point where security stops being a document
889
00:32:49,360 --> 00:32:52,240
and starts being part of your infrastructure.
890
00:32:52,240 --> 00:32:54,360
Durable Functions for stateful workflows.
891
00:32:54,360 --> 00:32:56,320
Most of the patterns we have discussed so far
892
00:32:56,320 --> 00:32:59,160
follow a very simple logic where a request comes in,
893
00:32:59,160 --> 00:33:02,880
the function processes it and a response goes out immediately.
894
00:33:02,880 --> 00:33:04,960
That single execution cycle works perfectly
895
00:33:04,960 --> 00:33:07,600
for things like data transformation or basic validation
896
00:33:07,600 --> 00:33:10,920
but real enterprise workflows almost never fit into that tiny box.
897
00:33:10,920 --> 00:33:13,240
Imagine your Copilot user needs a multi-step approval
898
00:33:13,240 --> 00:33:14,320
for a new project.
899
00:33:14,320 --> 00:33:17,160
First, a request is created and routed to a manager
900
00:33:17,160 --> 00:33:19,600
who then takes a few hours to approve or reject it.
901
00:33:19,600 --> 00:33:22,200
If they say yes, the request moves to the finance department
902
00:33:22,200 --> 00:33:24,840
for a second review and only after that final decision
903
00:33:24,840 --> 00:33:27,320
is the resource actually provisioned or archived.
904
00:33:27,320 --> 00:33:29,800
This entire sequence could easily take three days to finish
905
00:33:29,800 --> 00:33:32,400
and no HTTP request on Earth is going to stay open
906
00:33:32,400 --> 00:33:33,680
for 72 hours.
907
00:33:33,680 --> 00:33:36,120
Functions aren't designed to sit idle and wait for days
908
00:33:36,120 --> 00:33:37,680
and if you try to force them to,
909
00:33:37,680 --> 00:33:39,600
the system will simply time out and fail.
910
00:33:39,600 --> 00:33:42,360
This is the exact point where standard HTTP-triggered functions
911
00:33:42,360 --> 00:33:45,360
hit a wall because they are completely stateless by design.
912
00:33:45,360 --> 00:33:47,920
Every time a function runs, it starts with a blank slate
913
00:33:47,920 --> 00:33:50,000
and has no memory of what happened five minutes ago
914
00:33:50,000 --> 00:33:52,120
or what the previous step accomplished.
915
00:33:52,120 --> 00:33:54,120
When your workflow needs to pause at step two
916
00:33:54,120 --> 00:33:55,960
to wait for a human to click a button,
917
00:33:55,960 --> 00:33:58,840
a standard function has nowhere to store that progress.
918
00:33:58,840 --> 00:34:00,520
You could try to build your own state management
919
00:34:00,520 --> 00:34:02,800
using external databases and complex timers,
920
00:34:02,800 --> 00:34:05,480
but that approach is usually brittle, hard to maintain
921
00:34:05,480 --> 00:34:07,880
and prone to breaking under pressure.
922
00:34:07,880 --> 00:34:10,320
Durable Functions were built to solve this specific problem
923
00:34:10,320 --> 00:34:12,440
by using a collection of specialized functions
924
00:34:12,440 --> 00:34:14,240
that work together as a team.
925
00:34:14,240 --> 00:34:16,720
The orchestrator function acts as the brain of the operation
926
00:34:16,720 --> 00:34:19,400
and defines the entire workflow like a state machine.
927
00:34:19,400 --> 00:34:22,560
It decides which step comes next, waits for external conditions
928
00:34:22,560 --> 00:34:25,960
to be met, and handles retries if a specific task fails.
929
00:34:25,960 --> 00:34:27,680
Because the orchestrator has to be deterministic
930
00:34:27,680 --> 00:34:30,880
and free of side effects, it doesn't make API calls itself,
931
00:34:30,880 --> 00:34:34,080
but instead coordinates the entire execution from start to finish.
932
00:34:34,080 --> 00:34:36,720
The actual heavy lifting is handled by activity functions
933
00:34:36,720 --> 00:34:39,760
which behave like the standard functions you are already used to.
934
00:34:39,760 --> 00:34:42,360
These activities are the ones that call downstream systems,
935
00:34:42,360 --> 00:34:45,800
validate your data or send out notification emails to the team.
936
00:34:45,800 --> 00:34:48,240
If an activity function fails or times out,
937
00:34:48,240 --> 00:34:50,120
the orchestrator sees that failure immediately
938
00:34:50,120 --> 00:34:52,840
and decides whether to try again or trigger a rollback.
939
00:34:52,840 --> 00:34:54,440
Everything starts with a client function,
940
00:34:54,440 --> 00:34:57,480
which is the entry point that triggers the entire orchestration.
941
00:34:57,480 --> 00:34:59,520
When a user asks for that approval workflow,
942
00:34:59,520 --> 00:35:01,280
the client function kicks off the process
943
00:35:01,280 --> 00:35:03,280
and hands back an instance ID right away.
944
00:35:03,280 --> 00:35:05,520
The user sees a message saying their request was submitted
945
00:35:05,520 --> 00:35:08,440
with a specific tracking number, giving them instant feedback
946
00:35:08,440 --> 00:35:11,360
while the actual work happens asynchronously in the background.
947
00:35:11,360 --> 00:35:13,600
This architecture completely flips the traditional model
948
00:35:13,600 --> 00:35:14,440
on its head.
949
00:35:14,440 --> 00:35:17,200
Instead of making a user wait for a complex process to finish,
950
00:35:17,200 --> 00:35:19,200
the workflow moves forward on its own timeline
951
00:35:19,200 --> 00:35:21,000
without blocking the interface.
952
00:35:21,000 --> 00:35:23,040
The user can check the status whenever they want
953
00:35:23,040 --> 00:35:25,000
or the system can send a webhook callback
954
00:35:25,000 --> 00:35:26,640
once the final step is complete.
955
00:35:26,640 --> 00:35:28,840
One of the biggest advantages is that the orchestrator
956
00:35:28,840 --> 00:35:32,000
treats the entire workflow as a piece of data that it can track.
957
00:35:32,000 --> 00:35:33,920
It knows exactly which activities finished
958
00:35:33,920 --> 00:35:36,320
and where the process stands at any given second.
959
00:35:36,320 --> 00:35:38,040
If the function app crashes or Azure
960
00:35:38,040 --> 00:35:40,120
moves your instance to a different server,
961
00:35:40,120 --> 00:35:42,560
the orchestrator has a full record of the state
962
00:35:42,560 --> 00:35:44,360
and can pick up right where it left off.
963
00:35:44,360 --> 00:35:47,280
You never lose work and you never have to restart a three-day process
964
00:35:47,280 --> 00:35:50,160
from the very beginning just because of a minor system update.
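The way an orchestrator picks up where it left off can be illustrated with a plain-Python simulation. This is not the Durable Functions SDK. It is a sketch of the replay idea, assuming a generator as the orchestrator, a list as the saved history, and hypothetical activity names (`manager_review`, `finance_review`, `provision`).

```python
from typing import Any, Callable, Dict, Generator, List, Tuple

Call = Tuple[str, Any]  # (activity name, input)


def approval_workflow(request: str) -> Generator[Call, Any, str]:
    """Orchestrator: decides what happens next; the real work is in activities."""
    manager_ok = yield ("manager_review", request)
    if not manager_ok:
        return "rejected by manager"
    finance_ok = yield ("finance_review", request)
    if not finance_ok:
        return "rejected by finance"
    yield ("provision", request)
    return "provisioned"


def run(
    orchestrator: Callable[[str], Generator[Call, Any, str]],
    request: str,
    activities: Dict[str, Callable[[Any], Any]],
    history: List[Any],
) -> str:
    """Drive the orchestrator, replaying recorded results before doing new work."""
    gen = orchestrator(request)
    step = 0
    try:
        call = next(gen)
        while True:
            if step < len(history):
                result = history[step]  # replay: the activity is not run again
            else:
                name, arg = call
                result = activities[name](arg)  # first execution of this step
                history.append(result)  # checkpoint before moving on
            step += 1
            call = gen.send(result)
    except StopIteration as done:
        return done.value
```

If `run` is interrupted halfway and called again with the same `history`, the completed steps are replayed from the record and only the unfinished ones execute. That is also why the orchestrator code has to be deterministic: replay only works if it makes the same decisions every time it sees the same results.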
965
00:35:50,160 --> 00:35:52,120
Resilience is also much easier to manage
966
00:35:52,120 --> 00:35:54,880
because retries are handled automatically by the platform.
967
00:35:54,880 --> 00:35:57,400
If a service is temporarily down or hitting a rate limit,
968
00:35:57,400 --> 00:35:59,680
you can set a retry policy with exponential backoff
969
00:35:59,680 --> 00:36:01,040
at the orchestration level.
970
00:36:01,040 --> 00:36:02,600
This means every single activity gets
971
00:36:02,600 --> 00:36:04,520
that same level of protection without you
972
00:36:04,520 --> 00:36:06,280
having to write custom error handling code
973
00:36:06,280 --> 00:36:08,400
for every individual function.
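For reference, this is roughly what a retry policy with exponential backoff does under the hood. The Python sketch below is a generic helper, not the Durable Functions retry API. The function name, the default values, and the injectable `sleep` are assumptions made for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    action: Callable[[], T],
    max_attempts: int = 4,
    first_delay: float = 1.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `action`, multiplying the wait by `backoff` after each failure."""
    delay = first_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(delay)
            delay *= backoff
    raise ValueError("max_attempts must be at least 1")
```

With the defaults, a call that keeps failing waits 1, 2, and then 4 seconds before giving up on the fourth attempt, which gives a rate-limited or briefly unavailable service room to recover.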
974
00:36:08,400 --> 00:36:10,280
Durable Functions really show their value
975
00:36:10,280 --> 00:36:13,480
when you need to handle fan-out and fan-in scenarios.
976
00:36:13,480 --> 00:36:16,120
If your workflow needs to run five different validation
977
00:36:16,120 --> 00:36:18,160
checks at the same time, the orchestrator
978
00:36:18,160 --> 00:36:21,000
can spawn all five activities simultaneously
979
00:36:21,000 --> 00:36:23,400
rather than waiting for them to finish one by one.
980
00:36:23,400 --> 00:36:25,880
Once every parallel task reports back with the result,
981
00:36:25,880 --> 00:36:28,880
the orchestrator fans back in and moves to the next phase of the project.
982
00:36:28,880 --> 00:36:31,720
This type of parallel execution cuts down on total latency
983
00:36:31,720 --> 00:36:34,640
and makes your enterprise workflows feel significantly faster.
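The fan-out and fan-in shape is easy to see in a short Python sketch using `asyncio`. It stands in for the pattern only, not for the Durable Functions API. The check names and the simulated delays are invented for the example.

```python
import asyncio
from typing import Dict, Tuple


async def run_check(name: str, delay: float) -> Tuple[str, bool]:
    """Stand-in for one validation activity; the sleep simulates I/O."""
    await asyncio.sleep(delay)
    return name, True


async def validate_all(checks: Dict[str, float]) -> Dict[str, bool]:
    """Fan out every check at once, then fan in when all have reported back."""
    results = await asyncio.gather(*(run_check(n, d) for n, d in checks.items()))
    return dict(results)
```

Calling `asyncio.run(validate_all({...}))` with five checks takes about as long as the slowest single check, not the sum of all five, which is where the latency saving comes from.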
984
00:36:34,640 --> 00:36:36,760
Status tracking is built directly into the runtime
985
00:36:36,760 --> 00:36:38,240
so you don't need to build custom tables
986
00:36:38,240 --> 00:36:39,760
to see if a process is still running.
987
00:36:39,760 --> 00:36:42,560
You can simply query the system for a specific instance ID
988
00:36:42,560 --> 00:36:45,400
to see which activities are done and what the results were.
989
00:36:45,400 --> 00:36:48,600
This is the professional way to handle the 80% of business logic
990
00:36:48,600 --> 00:36:51,200
that doesn't fit into a simple request response window.
991
00:36:51,200 --> 00:36:53,680
You stop fighting against the limitations of HTTP
992
00:36:53,680 --> 00:36:55,640
and start using a tool that was actually designed
993
00:36:55,640 --> 00:36:58,960
for long-running, stateful work. Latency optimization:
994
00:36:58,960 --> 00:37:00,440
Every millisecond counts.
995
00:37:00,440 --> 00:37:03,440
While Durable Functions are great for those long-running background tasks,
996
00:37:03,440 --> 00:37:06,240
most of your Copilot interactions need to happen in real time.
997
00:37:06,240 --> 00:37:08,240
When a user types a question into a chat box,
998
00:37:08,240 --> 00:37:10,760
they expect the response to appear almost instantly.
999
00:37:10,760 --> 00:37:12,840
In these synchronous moments, your ability
1000
00:37:12,840 --> 00:37:14,960
to shave off every possible millisecond
1001
00:37:14,960 --> 00:37:17,080
becomes your biggest competitive advantage.
1002
00:37:17,080 --> 00:37:20,360
For simple operations like a quick database lookup or a validation check,
1003
00:37:20,360 --> 00:37:24,240
you should be aiming for a warm invocation latency of under 100 milliseconds.
1004
00:37:24,240 --> 00:37:27,080
If the task is more complex and involves calling multiple systems
1005
00:37:27,080 --> 00:37:28,880
or aggregating large amounts of data,
1006
00:37:28,880 --> 00:37:31,600
you should still try to stay under 300 milliseconds.
1007
00:37:31,600 --> 00:37:33,280
Once you cross the half-second mark,
1008
00:37:33,280 --> 00:37:34,800
users start to feel the lag
1009
00:37:34,800 --> 00:37:37,200
and if a response takes longer than a full second,
1010
00:37:37,200 --> 00:37:39,800
you have likely lost their attention entirely.
1011
00:37:39,800 --> 00:37:41,400
These numbers aren't just guesses,
1012
00:37:41,400 --> 00:37:45,320
but are actually based on how the human brain perceives the system as being responsive.
1013
00:37:45,320 --> 00:37:49,040
We expect an immediate acknowledgement of our input within about a tenth of a second
1014
00:37:49,040 --> 00:37:51,960
and anything slower than that creates a feeling of friction.
1015
00:37:51,960 --> 00:37:56,240
If your architecture can't deliver that sub-100 millisecond response on the happy path,
1016
00:37:56,240 --> 00:37:59,520
your users will start to doubt the reliability of the entire tool.
1017
00:37:59,520 --> 00:38:02,480
The foundation of a fast system is smart connection management.
1018
00:38:02,480 --> 00:38:05,720
As we discussed earlier, you should never create a brand new database connection
1019
00:38:05,720 --> 00:38:10,480
for every single request because the overhead of handshakes and authentication is far too high.
1020
00:38:10,480 --> 00:38:13,480
By using a pooled connection that stays alive between requests,
1021
00:38:13,480 --> 00:38:16,200
your code can start running the actual query immediately
1022
00:38:16,200 --> 00:38:18,720
without waiting for the network setup.
1023
00:38:18,720 --> 00:38:21,000
The same logic applies to your HTTP clients,
1024
00:38:21,000 --> 00:38:26,320
where reusing a single instance for all outbound calls eliminates the need for repeated TLS handshakes.
1025
00:38:26,320 --> 00:38:29,560
Where you put your infrastructure matters just as much as the code you write.
1026
00:38:29,560 --> 00:38:31,800
If your function is sitting in a data center in Virginia,
1027
00:38:31,800 --> 00:38:33,440
but your database is located in London,
1028
00:38:33,440 --> 00:38:37,240
every single query has to travel through undersea cables across the Atlantic.
1029
00:38:37,240 --> 00:38:41,080
That physical distance adds 50 to 100 milliseconds of pure travel time
1030
00:38:41,080 --> 00:38:43,400
before your database even sees the request.
1031
00:38:43,400 --> 00:38:46,840
By co-locating your functions, storage, and databases in the same region,
1032
00:38:46,840 --> 00:38:49,400
you can drop that round-trip time down to single digits
1033
00:38:49,400 --> 00:38:51,800
and save a massive amount of time for free.
1034
00:38:51,800 --> 00:38:56,080
Caching is another way to amplify your speed by avoiding the network entirely.
1035
00:38:56,080 --> 00:38:59,520
Reference data and configuration settings that don't change often should be stored
1036
00:38:59,520 --> 00:39:01,920
in memory as static fields within your function code,
1037
00:39:01,920 --> 00:39:04,480
because accessing a variable in memory takes microseconds,
1038
00:39:04,480 --> 00:39:08,080
which is thousands of times faster than making a round trip to a database.
1039
00:39:08,080 --> 00:39:11,160
For data that needs to be shared across multiple instances,
1040
00:39:11,160 --> 00:39:14,160
a distributed cache like Redis is a great middle ground
1041
00:39:14,160 --> 00:39:17,200
that is still much faster than a traditional disk-based query.
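The in-memory half of that caching advice can be sketched like this in Python. Module-level state generally survives between invocations on a warm instance, so a small time-to-live cache avoids repeated lookups. The helper name, the five-minute default, and the injectable clock are assumptions for the example. A shared cache such as Redis would replace the dictionary when several instances need to see the same data.

```python
import time
from typing import Any, Callable, Dict, Tuple

# Module-level state: kept for as long as this worker process stays warm.
_cache: Dict[str, Tuple[float, Any]] = {}


def get_cached(
    key: str,
    loader: Callable[[], Any],
    ttl_seconds: float = 300.0,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Return the cached value for `key`, calling `loader` only when it is stale."""
    now = clock()
    entry = _cache.get(key)
    if entry is not None and now - entry[0] < ttl_seconds:
        return entry[1]  # cache hit: no network round trip
    value = loader()  # cache miss: fetch from the slow source
    _cache[key] = (now, value)
    return value
```

The expiry keeps reference data from going stale forever, at the cost of one slow lookup per key each time the window runs out.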
1042
00:39:17,200 --> 00:39:20,360
It is important to remember that latency is never a single,
1043
00:39:20,360 --> 00:39:22,400
consistent number for every user.
1044
00:39:22,400 --> 00:39:24,800
Some requests will hit the cache and be lightning fast,
1045
00:39:24,800 --> 00:39:28,800
while others will miss and have to fetch fresh data from a slow downstream service.
1046
00:39:28,800 --> 00:39:30,760
This is why you cannot rely on averages,
1047
00:39:30,760 --> 00:39:34,720
because an average of 300 milliseconds might be hiding a few requests
1048
00:39:34,720 --> 00:39:36,360
that take five seconds to finish.
1049
00:39:36,360 --> 00:39:38,360
Users don't remember the average experience,
1050
00:39:38,360 --> 00:39:41,920
but they definitely remember the one time the system felt broken and slow.
1051
00:39:41,920 --> 00:39:45,120
You need to focus your optimization efforts on tail latency,
1052
00:39:45,120 --> 00:39:48,600
specifically looking at your P95 and P99 metrics.
1053
00:39:48,600 --> 00:39:50,800
If your P99 latency is five seconds,
1054
00:39:50,800 --> 00:39:54,520
it means one out of every hundred requests is failing the user experience,
1055
00:39:54,520 --> 00:39:57,400
which happens about once per conversation for an active user.
1056
00:39:57,400 --> 00:39:58,960
When a user hits that slow path,
1057
00:39:58,960 --> 00:40:01,800
they don't care that the other 99 requests were fast.
1058
00:40:01,800 --> 00:40:04,000
They just feel like the system is unreliable.
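A few lines of Python show how an average hides the tail. The sample data is made up for illustration: 96 requests at 100 milliseconds and 4 at five seconds. The `percentile` helper uses the simple nearest-rank definition, which is one of several conventions and may differ slightly from what a monitoring tool reports.

```python
import math
from statistics import mean
from typing import List


def percentile(samples: List[float], p: int) -> float:
    """Nearest-rank percentile: the value at rank ceil(p% of n) in sorted order."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Illustrative data only: mostly fast requests plus a handful of very slow ones.
latencies_ms = [100] * 96 + [5000] * 4

average_ms = mean(latencies_ms)
p95_ms = percentile(latencies_ms, 95)
p99_ms = percentile(latencies_ms, 99)
```

On this data the average comes out just under 300 milliseconds and P95 still looks healthy at 100, while P99 lands on the five-second requests. That gap is the reason to track the tail and not just the mean.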
1059
00:40:04,000 --> 00:40:05,880
To fix this, you need deep instrumentation
1060
00:40:05,880 --> 00:40:09,040
that shows you exactly how long each dependency and code path is taking.
1061
00:40:09,040 --> 00:40:13,240
Once you can see which specific API call or database index is causing the slowdown,
1062
00:40:13,240 --> 00:40:15,120
you can actually do something about it.
1063
00:40:15,120 --> 00:40:18,280
Without that visibility, you are just guessing and hoping for the best.
1064
00:40:18,280 --> 00:40:21,320
This level of detail is what separates a basic function
1065
00:40:21,320 --> 00:40:25,520
that technically works from a Copilot experience that feels truly instant.
1066
00:40:25,520 --> 00:40:28,400
Monitoring and observability.
1067
00:40:28,400 --> 00:40:30,280
Blind optimization is a waste of time.
1068
00:40:30,280 --> 00:40:31,640
You need eyes inside your system.
1069
00:40:31,640 --> 00:40:33,760
Application Insights is where those eyes live.
1070
00:40:33,760 --> 00:40:37,040
It serves as the standard telemetry platform for Azure workloads
1071
00:40:37,040 --> 00:40:39,680
and it becomes your primary diagnostic tool for understanding
1072
00:40:39,680 --> 00:40:42,680
how your Copilot plugin actually behaves in production.
1073
00:40:42,680 --> 00:40:44,000
Not how you think it behaves,
1074
00:40:44,000 --> 00:40:45,720
not how it behaves in your test environment.
1075
00:40:45,720 --> 00:40:49,040
You need to see how it actually behaves when real users interact with it.
1076
00:40:49,040 --> 00:40:51,960
Send execution time data to Application Insights.
1077
00:40:51,960 --> 00:40:55,000
How long does each function invocation take from start to finish?
1078
00:40:55,000 --> 00:40:58,800
And how long does the database query or the downstream API call take?
1079
00:40:58,800 --> 00:41:01,600
The more granular you go, the better your visibility.
1080
00:41:01,600 --> 00:41:04,600
You are building a map of where time actually gets spent.
1081
00:41:04,600 --> 00:41:09,040
This level of detail is how you find the bottleneck that is limiting your whole system.
1082
00:41:09,040 --> 00:41:12,200
Sometimes the problem is obvious like a database query that takes two seconds
1083
00:41:12,200 --> 00:41:16,720
but other times it is subtle, like a cache miss that cascades into five sequential lookups.
1084
00:41:16,720 --> 00:41:18,640
Without instrumentation, you are just guessing.
1085
00:41:18,640 --> 00:41:20,400
With it, you see the actual behavior.
1086
00:41:20,400 --> 00:41:24,640
Correlation IDs are how you track a single user request through your entire system.
1087
00:41:24,640 --> 00:41:26,520
Your Copilot user types a question
1088
00:41:26,520 --> 00:41:30,400
and that question generates a unique correlation ID that travels through Power Platform
1089
00:41:30,400 --> 00:41:31,680
and into your custom connector.
1090
00:41:31,680 --> 00:41:35,760
That ID enters your function and your function logs that ID with every operation it performs.
1091
00:41:35,760 --> 00:41:39,280
When your function calls your database, that query includes the correlation ID
1092
00:41:39,280 --> 00:41:42,360
and when it calls a downstream API, that request includes it too.
1093
00:41:42,360 --> 00:41:45,440
Later when you are investigating why a user request failed,
1094
00:41:45,440 --> 00:41:47,880
you search for that correlation ID across all your logs.
1095
00:41:47,880 --> 00:41:50,520
You see the entire journey, you see exactly where it broke.
1096
00:41:50,520 --> 00:41:54,720
This is how you move from "something went wrong" to "this specific API call
1097
00:41:54,720 --> 00:41:58,400
returned a 500 at 14:32:17 UTC."
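Carrying a correlation ID through every log line can be sketched with Python's standard `logging` and `contextvars` modules. The header name `x-correlation-id` and the `handle_request` function are assumptions for the example, and the actual header depends on what your connector and gateway forward.

```python
import contextvars
import logging
import uuid

# Holds the ID for the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


def handle_request(headers: dict, logger: logging.Logger) -> str:
    """Reuse the caller's ID if present, otherwise mint one, and log with it."""
    cid = headers.get("x-correlation-id") or str(uuid.uuid4())
    token = correlation_id.set(cid)
    try:
        logger.info("request received")
        logger.info("calling downstream API")  # forward `cid` on outbound calls too
        return cid
    finally:
        correlation_id.reset(token)  # do not leak the ID into the next request
```

With the filter attached to the logger and `%(correlation_id)s` in the log format, every line for a request shares one searchable ID.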
1098
00:41:58,400 --> 00:42:01,080
Custom metrics go beyond raw performance data.
1099
00:42:01,080 --> 00:42:04,160
You care about business outcomes, not just technical metrics,
1100
00:42:04,160 --> 00:42:06,200
so track invoices validated per minute,
1101
00:42:06,200 --> 00:42:08,440
or approval workflows completed today,
1102
00:42:08,440 --> 00:42:10,720
or even your cache hit rate percentage.
1103
00:42:10,720 --> 00:42:14,800
These metrics answer the question of whether your system is actually working for your users.
1104
00:42:14,800 --> 00:42:16,760
A 100 millisecond response time is great,
1105
00:42:16,760 --> 00:42:20,360
but if only 60% of validation requests succeed, your system is failing.
1106
00:42:20,360 --> 00:42:22,840
Business metrics surface the problems that matter.
1107
00:42:22,840 --> 00:42:26,600
Alerting strategy separates operational maturity from reactive firefighting.
1108
00:42:26,600 --> 00:42:29,640
Alert on P95 and P99 latency spikes,
1109
00:42:29,640 --> 00:42:33,520
or when your error rate exceeds a threshold, or even on cold start frequency.
1110
00:42:33,520 --> 00:42:34,920
But do not alert on averages.
1111
00:42:34,920 --> 00:42:38,240
An average latency of 300 milliseconds tells you nothing,
1112
00:42:38,240 --> 00:42:42,240
because it masks the fact that half your requests take 100 milliseconds,
1113
00:42:42,240 --> 00:42:44,160
and the other half take 500.
1114
00:42:44,160 --> 00:42:45,320
Alert on the tail.
1115
00:42:45,320 --> 00:42:47,720
Alert when something changes unexpectedly.
1116
00:42:47,720 --> 00:42:50,040
Alert when your system violates a known pattern.
1117
00:42:50,040 --> 00:42:54,480
This is how you catch problems early instead of discovering them when users start to complain.
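To make the point about averages concrete, here is a short Python sketch using the numbers above. The nearest-rank percentile helper and the 400 millisecond threshold are assumptions for illustration.

```python
import statistics

# Half the requests take 100 ms and half take 500 ms.
latencies_ms = [100] * 50 + [500] * 50

mean_ms = statistics.mean(latencies_ms)


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * pct // 100))  # ceiling division
    return ordered[rank - 1]


p95_ms = percentile(latencies_ms, 95)

P95_THRESHOLD_MS = 400  # hypothetical alert threshold
should_alert = p95_ms > P95_THRESHOLD_MS
```

The mean comes out at 300 milliseconds while the P95 sits at 500, so an alert on the tail fires where an alert on the average would stay quiet.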
1118
00:42:54,480 --> 00:42:58,400
Diagnostic settings on your function app export logs to Log Analytics.
1119
00:42:58,400 --> 00:43:01,160
Application Insights captures application telemetry,
1120
00:43:01,160 --> 00:43:05,200
while Log Analytics captures platform-level logs for when instances scale,
1121
00:43:05,200 --> 00:43:07,840
when deployments happen, or when quotas are hit.
1122
00:43:07,840 --> 00:43:10,160
Together they give you complete visibility.
1123
00:43:10,160 --> 00:43:12,320
Logs can be retained in Log Analytics for years,
1124
00:43:12,320 --> 00:43:16,200
which means you can query historical data or build long term trend analysis.
1125
00:43:16,200 --> 00:43:19,840
You can prove that a change you made six months ago improved stability.
1126
00:43:19,840 --> 00:43:23,080
Azure Monitor Workbooks tie this together into dashboards.
1127
00:43:23,080 --> 00:43:26,160
A workbook is a canvas where you build custom visualizations.
1128
00:43:26,160 --> 00:43:30,080
You add charts showing latency trends, gauges showing current error rates,
1129
00:43:30,080 --> 00:43:32,320
and tables listing your slowest operations.
1130
00:43:32,320 --> 00:43:35,560
You can even add heat maps showing traffic patterns by time of day.
1131
00:43:35,560 --> 00:43:38,560
These workbooks become the operational brain of your system.
1132
00:43:38,560 --> 00:43:40,560
When something feels off you check the workbook,
1133
00:43:40,560 --> 00:43:42,760
you see the actual behavior, you adjust.
1134
00:43:42,760 --> 00:43:44,920
The discipline here is measurement without obsession.
1135
00:43:44,920 --> 00:43:47,760
You need visibility, but you do not need to track everything.
1136
00:43:47,760 --> 00:43:51,760
Start with latency per operation, error rate and cold start frequency.
1137
00:43:51,760 --> 00:43:54,760
Add business metrics relevant to your domain and refine from there.
1138
00:43:54,760 --> 00:43:57,360
The goal is to catch degradation before it becomes an outage.
1139
00:43:57,360 --> 00:44:00,960
This is how you move from hoping your system works to knowing it works.
1140
00:44:00,960 --> 00:44:03,960
Testing the integration: from unit to end-to-end.
1141
00:44:03,960 --> 00:44:06,560
Theory breaks on contact with reality.
1142
00:44:06,560 --> 00:44:10,760
Everything we have built, from the function and the OpenAPI spec to the security infrastructure,
1143
00:44:10,760 --> 00:44:12,560
needs to survive actual usage.
1144
00:44:12,560 --> 00:44:13,560
That is what testing does.
1145
00:44:13,560 --> 00:44:16,760
It bridges the gap between "this should work" and "this actually works."
1146
00:44:16,760 --> 00:44:18,160
Start with unit tests.
1147
00:44:18,160 --> 00:44:20,560
Your C# business logic needs full coverage.
1148
00:44:20,560 --> 00:44:23,560
Validate invoices, transform data, and apply business rules.
1149
00:44:23,560 --> 00:44:25,760
These tests run fast and they run locally.
1150
00:44:25,760 --> 00:44:27,360
They mock external dependencies,
1151
00:44:27,360 --> 00:44:30,960
so you are not actually calling your database or hitting external APIs
1152
00:44:30,960 --> 00:44:33,760
every time you run the test suite. This is standard practice,
1153
00:44:33,760 --> 00:44:37,760
but for serverless functions, unit tests become your primary validation tool.
1154
00:44:37,760 --> 00:44:40,560
You cannot easily debug a cold start issue in production
1155
00:44:40,560 --> 00:44:42,560
and you cannot step through your code with a debugger
1156
00:44:42,560 --> 00:44:44,960
while it is processing a live Copilot request.
1157
00:44:44,960 --> 00:44:47,560
Unit tests catch logic errors before they escape.
1158
00:44:47,560 --> 00:44:50,160
Integration tests are where most teams stumble.
1159
00:44:50,160 --> 00:44:53,160
They test the contract between your code and the external world.
1160
00:44:53,160 --> 00:44:54,160
Here is what matters.
1161
00:44:54,160 --> 00:44:58,560
Call your function via HTTP and invoke it exactly the way the custom connector will invoke it.
1162
00:44:58,560 --> 00:45:04,360
Send a real HTTP request with real headers and a real payload, then capture the response.
1163
00:45:04,360 --> 00:45:07,560
Validate that the response schema matches your OpenAPI spec.
1164
00:45:07,560 --> 00:45:11,560
This is crucial because your OpenAPI spec defines what downstream systems expect.
1165
00:45:11,560 --> 00:45:15,160
If your function returns a field with the wrong type or misses a required field
1166
00:45:15,160 --> 00:45:19,760
or includes extra fields that the spec does not document, the custom connector might break.
1167
00:45:19,760 --> 00:45:21,560
Integration tests catch this.
1168
00:45:21,560 --> 00:45:24,360
They validate the contract, not the internal implementation.
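A minimal sketch of a contract check in Python. The field names and types in `EXPECTED_FIELDS` are hypothetical, and a real integration test would feed this the JSON body of an actual HTTP response and derive the expectations from the OpenAPI spec rather than hard-coding them.

```python
# Hypothetical response contract for an invoice validation operation.
EXPECTED_FIELDS = {"invoiceId": str, "isValid": bool, "amount": float}


def contract_violations(payload, expected=EXPECTED_FIELDS):
    """Return a list of contract violations; an empty list means the payload conforms."""
    problems = []
    for name, expected_type in expected.items():
        if name not in payload:
            problems.append(f"missing required field: {name}")
        elif not isinstance(payload[name], expected_type):
            problems.append(f"wrong type for field: {name}")
    for name in payload:
        if name not in expected:
            problems.append(f"undocumented field: {name}")
    return problems


good = {"invoiceId": "INV-1", "isValid": True, "amount": 120.5}
bad = {"invoiceId": 42, "amount": 120.5, "debug": "x"}
```

The check covers the three failure modes named above: a wrong type, a missing required field, and an extra field the spec does not document.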
1169
00:45:24,360 --> 00:45:26,560
Load testing reveals what unit tests hide.
1170
00:45:26,560 --> 00:45:29,160
Your function handles one concurrent request perfectly.
1171
00:45:29,160 --> 00:45:30,960
But what about 50 or 500?
1172
00:45:30,960 --> 00:45:33,360
Load testing tools like k6 or Azure
1173
00:45:33,360 --> 00:45:36,160
Load Testing simulate realistic concurrent load.
1174
00:45:36,160 --> 00:45:41,560
You define a scenario where 50 users each make 10 requests with realistic think time between them
1175
00:45:41,560 --> 00:45:43,360
and then the tool hammers your function.
1176
00:45:43,360 --> 00:45:45,360
It shows you where your latency degrades.
1177
00:45:45,360 --> 00:45:46,960
It shows you where you hit resource limits.
1178
00:45:46,960 --> 00:45:49,160
It shows you whether your rate limiting works.
1179
00:45:49,160 --> 00:45:54,760
It shows you your actual system behavior under stress, not imaginary behavior in a lab environment.
1180
00:45:54,760 --> 00:45:55,960
Here is the critical part.
1181
00:45:55,960 --> 00:45:58,360
Test with real user language, not API syntax.
1182
00:45:58,360 --> 00:46:02,160
Do not test by calling your function with a perfectly formatted JSON payload.
1183
00:46:02,160 --> 00:46:04,960
Test by simulating how Copilot actually invokes your function.
1184
00:46:04,960 --> 00:46:09,360
Copilot sends natural language, and the LLM interprets it and fills in parameters.
1185
00:46:09,360 --> 00:46:12,960
Sometimes it interprets ambiguously and sometimes it misses context.
1186
00:46:12,960 --> 00:46:15,760
You want to find these problems in testing, not in production.
1187
00:46:15,760 --> 00:46:21,360
A Copilot plugin that works perfectly with structured input but fails with natural language variations is broken.
1188
00:46:21,360 --> 00:46:24,560
Load testing that does not capture this reality is just theater.
1189
00:46:24,560 --> 00:46:27,560
The handshake between Power Platform and Azure is a specific test.
1190
00:46:27,560 --> 00:46:29,760
Your custom connector maps to your function.
1191
00:46:29,760 --> 00:46:32,360
It transforms input and it handles errors.
1192
00:46:32,360 --> 00:46:34,560
You need to validate that this mapping works.
1193
00:46:34,560 --> 00:46:38,360
Create a test flow in Power Automate or Power Apps, call your custom connector,
1194
00:46:38,360 --> 00:46:39,960
and verify that the call succeeds.
1195
00:46:39,960 --> 00:46:43,160
Verify that the response is processed correctly by the low-code layer.
1196
00:46:43,160 --> 00:46:45,960
This test is part integration test and part sanity check.
1197
00:46:45,960 --> 00:46:48,760
It catches configuration problems that code tests miss.
1198
00:46:48,760 --> 00:46:53,360
A firewall rule that blocks your connector from reaching your function will not show up in a unit test
1199
00:46:53,360 --> 00:46:55,160
but it will show up in a connector test.
1200
00:46:55,160 --> 00:46:58,360
Deployment slots let you test in production without affecting users.
1201
00:46:58,360 --> 00:47:00,560
You deploy a new version to a staging slot.
1202
00:47:00,560 --> 00:47:03,560
You run your tests against staging and you validate the behavior.
1203
00:47:03,560 --> 00:47:08,360
You do load testing against staging and then you flip the production slot to the staged version.
1204
00:47:08,360 --> 00:47:10,760
Now your production traffic routes to the new code.
1205
00:47:10,760 --> 00:47:14,160
If something breaks you flip back to the previous version in seconds.
1206
00:47:14,160 --> 00:47:15,560
This is continuous deployment.
1207
00:47:15,560 --> 00:47:17,160
It is how you iterate safely.
1208
00:47:17,160 --> 00:47:20,560
The testing discipline separates teams that ship stable Copilot plugins
1209
00:47:20,560 --> 00:47:23,560
from teams that debug issues in production at 2 in the morning.
1210
00:47:23,560 --> 00:47:25,160
Testing is not about passing tests.
1211
00:47:25,160 --> 00:47:26,160
It is about confidence.
1212
00:47:26,160 --> 00:47:30,160
It is about knowing that your plugin will work when real users depend on it.
1213
00:47:30,160 --> 00:47:31,760
Versioning and evolution.
1214
00:47:31,760 --> 00:47:34,160
Your first Copilot plugin is finally in production
1215
00:47:34,160 --> 00:47:36,560
and everything is working exactly as planned.
1216
00:47:36,560 --> 00:47:38,360
Users are starting to depend on it
1217
00:47:38,360 --> 00:47:41,160
and teams have already built their daily workflows around the tool.
1218
00:47:41,160 --> 00:47:42,760
But now you need to add a new feature
1219
00:47:42,760 --> 00:47:44,760
or maybe you found a bug that needs fixing.
1220
00:47:44,760 --> 00:47:46,560
This is the moment where versioning discipline
1221
00:47:46,560 --> 00:47:48,760
separates a professional maintainable system
1222
00:47:48,760 --> 00:47:51,160
from one that becomes fragile and broken.
1223
00:47:51,160 --> 00:47:53,160
Here is the hard truth about software.
1224
00:47:53,160 --> 00:47:54,760
Once your function is in production,
1225
00:47:54,760 --> 00:47:56,360
it doesn't really belong to you anymore.
1226
00:47:56,360 --> 00:47:58,360
It belongs to the flows and systems that call it.
1227
00:47:58,360 --> 00:48:00,560
Those flows have specific dependencies
1228
00:48:00,560 --> 00:48:04,360
and they expect your parameters to mean the exact same thing every single time.
1229
00:48:04,360 --> 00:48:07,360
You cannot casually change those expectations without consequences.
1230
00:48:07,360 --> 00:48:10,360
Every update you make either respects those existing dependencies
1231
00:48:10,360 --> 00:48:11,760
or it completely breaks them.
1232
00:48:11,760 --> 00:48:14,160
The operation ID is your contract with the rest of the world.
1233
00:48:14,160 --> 00:48:16,760
When you define an operation in your OpenAPI spec
1234
00:48:16,760 --> 00:48:18,360
with an ID like validateInvoice,
1235
00:48:18,360 --> 00:48:20,160
that name becomes your permanent anchor point.
1236
00:48:20,160 --> 00:48:22,960
Power Platform flows reference that specific ID to function
1237
00:48:22,960 --> 00:48:25,160
so if you change it, those flows will break silently.
1238
00:48:25,160 --> 00:48:27,360
They will try to call an operation that no longer exists
1239
00:48:27,360 --> 00:48:29,960
and the users will only see a failure they don't understand.
1240
00:48:29,960 --> 00:48:32,760
They will blame the system and stop trusting your tool,
1241
00:48:32,760 --> 00:48:35,360
which is why you should never change an operation ID.
1242
00:48:35,360 --> 00:48:38,160
It must stay stable for the entire life of that operation.
1243
00:48:38,160 --> 00:48:40,160
If the behavior changes fundamentally,
1244
00:48:40,160 --> 00:48:41,960
you aren't modifying an old tool.
1245
00:48:41,960 --> 00:48:43,560
You are building a new one with a new ID.
1246
00:48:43,560 --> 00:48:45,760
This is where semantic versioning becomes your best friend.
1247
00:48:45,760 --> 00:48:49,560
Your API needs a version number starting at 1.0.0.
1248
00:48:49,560 --> 00:48:51,560
The first number represents the major version,
1249
00:48:51,560 --> 00:48:53,760
the second is the minor version and the third is for patches.
1250
00:48:53,760 --> 00:48:55,160
When you add a new operation,
1251
00:48:55,160 --> 00:48:57,960
you bump that minor version to 1.1.0.
1252
00:48:57,960 --> 00:48:59,960
Clients can upgrade whenever they feel like it
1253
00:48:59,960 --> 00:49:03,360
because nothing broke since you only added features instead of removing them.
1254
00:49:03,360 --> 00:49:06,160
When you fix a bug in the code without changing the contract,
1255
00:49:06,160 --> 00:49:07,960
you just update the patch version.
1256
00:49:07,960 --> 00:49:11,360
However, if you break the contract or change how a flow needs to behave,
1257
00:49:11,360 --> 00:49:13,960
you must move to version 2.0.0.
1258
00:49:13,960 --> 00:49:16,560
This tells everyone downstream that something big changed
1259
00:49:16,560 --> 00:49:18,560
and they need to review their implementation.
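The bump rules above can be sketched as a small Python helper. The function name and the change labels are assumptions for illustration.

```python
def bump(version, change):
    """Return the next semantic version for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":  # the contract was broken
        return f"{major + 1}.0.0"
    if change == "minor":  # a new operation was added, nothing removed
        return f"{major}.{minor + 1}.0"
    if change == "patch":  # a bug fix that leaves the contract unchanged
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")


after_new_operation = bump("1.0.0", "minor")
after_bug_fix = bump(after_new_operation, "patch")
after_breaking_change = bump(after_bug_fix, "major")
```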
1260
00:49:18,560 --> 00:49:20,760
Using separate function apps or deployment slots
1261
00:49:20,760 --> 00:49:23,560
lets you run multiple API versions at the exact same time.
1262
00:49:23,560 --> 00:49:25,160
Your version 1 app stays stable
1263
00:49:25,160 --> 00:49:26,960
and serves your current production traffic
1264
00:49:26,960 --> 00:49:29,360
while your version 2 app gets all the new features.
1265
00:49:29,360 --> 00:49:32,560
You can deploy the new version to a staging slot to validate it
1266
00:49:32,560 --> 00:49:35,360
before running both versions side by side in production.
1267
00:49:35,360 --> 00:49:37,960
The connector for version 1 keeps talking to the old app
1268
00:49:37,960 --> 00:49:40,560
and the connector for version 2 reaches the new one.
1269
00:49:40,560 --> 00:49:42,760
This lets teams migrate at their own pace
1270
00:49:42,760 --> 00:49:45,760
and nobody is forced to upgrade until they are actually ready.
1271
00:49:45,760 --> 00:49:49,160
This gradual transition is the only way to move the platform forward
1272
00:49:49,160 --> 00:49:51,160
without breaking the systems people rely on.
1273
00:49:51,160 --> 00:49:53,560
The OpenAPI spec itself is a piece of history
1274
00:49:53,560 --> 00:49:55,360
that needs proper version control.
1275
00:49:55,360 --> 00:49:57,960
You should store it in Git right alongside your function code
1276
00:49:57,960 --> 00:49:59,960
so you can track every change in your commits.
1277
00:49:59,960 --> 00:50:01,960
When you release a new version of your API,
1278
00:50:01,960 --> 00:50:05,160
make sure to tag the commit that contains that specific spec.
1279
00:50:05,160 --> 00:50:08,560
Later on, when a teammate asks what changed between the old and new versions,
1280
00:50:08,560 --> 00:50:09,960
you will actually have the answer.
1281
00:50:09,960 --> 00:50:13,760
You can compare the files and see exactly which operations were added or modified.
1282
00:50:13,760 --> 00:50:16,160
This is how you communicate clearly with other teams
1283
00:50:16,160 --> 00:50:18,760
and maintain a record of how the system evolved.
1284
00:50:18,760 --> 00:50:21,360
Deprecation paths are where most teams start to struggle.
1285
00:50:21,360 --> 00:50:23,560
Eventually you will have an operation
1286
00:50:23,560 --> 00:50:25,760
that is no longer the best way to solve a problem
1287
00:50:25,760 --> 00:50:27,360
because a better version exists.
1288
00:50:27,360 --> 00:50:29,360
You might want to delete the old one immediately
1289
00:50:29,360 --> 00:50:31,560
but you have to remember that people are still using it.
1290
00:50:31,560 --> 00:50:32,760
Instead of deleting it,
1291
00:50:32,760 --> 00:50:35,360
mark it as deprecated in your spec and explain why.
1292
00:50:35,360 --> 00:50:36,560
Suggest a new alternative
1293
00:50:36,560 --> 00:50:38,160
and give everyone a clear timeline
1294
00:50:38,160 --> 00:50:39,960
for when the old version will disappear.
1295
00:50:39,960 --> 00:50:42,560
If you tell teams the operation will be gone by December,
1296
00:50:42,560 --> 00:50:44,560
they have months to plan their migration.
1297
00:50:44,560 --> 00:50:47,360
When that date finally arrives, they are ready for the change
1298
00:50:47,360 --> 00:50:49,360
and the transition happens without any drama.
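Here is a small Python sketch of what a deprecation marker can look like, with the spec fragment written as a plain dictionary. OpenAPI operations accept a boolean `deprecated` field; the paths, operation IDs, and description text below are hypothetical.

```python
# A fragment of an OpenAPI document as a Python dict (hypothetical content).
spec = {
    "paths": {
        "/invoices/validate": {
            "post": {
                "operationId": "validateInvoice",
                "deprecated": True,
                "description": "Deprecated: use validateInvoiceV2. "
                               "Removal planned for December.",
            }
        },
        "/v2/invoices/validate": {
            "post": {"operationId": "validateInvoiceV2"}
        },
    }
}


def deprecated_operations(spec):
    """List the operation IDs flagged with the `deprecated` field."""
    found = []
    for methods in spec["paths"].values():
        for operation in methods.values():
            if operation.get("deprecated", False):
                found.append(operation["operationId"])
    return sorted(found)


flagged = deprecated_operations(spec)
```

A check like this could run in a pipeline to remind the team which operations are on a removal timeline.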
1299
00:50:49,360 --> 00:50:51,360
This level of discipline turns your API
1300
00:50:51,360 --> 00:50:54,760
from a source of constant frustration into a reliable foundation.
1301
00:50:54,760 --> 00:50:57,560
It is the difference between being afraid to touch your code
1302
00:50:57,560 --> 00:50:59,360
and having the confidence to evolve.
1303
00:50:59,360 --> 00:51:01,760
You can move faster because you have a strategy
1304
00:51:01,760 --> 00:51:03,760
that respects the people using your work.
1305
00:51:03,760 --> 00:51:05,760
Your function is never truly finished
1306
00:51:05,760 --> 00:51:09,160
but it should evolve with purpose instead of moving in a state of chaos.
1307
00:51:09,160 --> 00:51:11,360
That is what makes a system professional.
1308
00:51:11,360 --> 00:51:13,560
Cost optimization and right sizing.
1309
00:51:13,560 --> 00:51:16,160
Performance and security are always the top priorities
1310
00:51:16,160 --> 00:51:18,560
but they have to live inside a real-world budget.
1311
00:51:18,560 --> 00:51:20,160
Every choice we have made so far
1312
00:51:20,160 --> 00:51:21,760
from using always-ready instances
1313
00:51:21,760 --> 00:51:24,160
to setting up comprehensive logging comes with a price tag.
1314
00:51:24,160 --> 00:51:26,760
The real question isn't whether you can afford this setup
1315
00:51:26,760 --> 00:51:29,760
but whether you are paying for it in the most efficient way possible.
1316
00:51:29,760 --> 00:51:32,560
Flex Consumption billing is calculated using two main factors:
1317
00:51:32,560 --> 00:51:34,760
how long your code runs and how many times it gets called.
1318
00:51:34,760 --> 00:51:36,960
You are essentially paying for gigabyte-seconds.
1319
00:51:36,960 --> 00:51:40,560
If you run a function with 512 megabytes of memory for two seconds
1320
00:51:40,560 --> 00:51:42,560
that counts as one gigabyte-second.
1321
00:51:42,560 --> 00:51:45,760
You also pay a small fee for every million times the function is triggered
1322
00:51:45,760 --> 00:51:47,560
which is usually around 40 cents.
1323
00:51:47,560 --> 00:51:50,760
If you optimize both of these areas, you can keep your bill under control
1324
00:51:50,760 --> 00:51:53,760
but if you ignore them, your costs will start to spiral.
1325
00:51:53,760 --> 00:51:57,560
Always-ready instances provide a baseline cost that you need to account for.
1326
00:51:57,560 --> 00:52:00,160
Let's look at the actual numbers to see how this works.
1327
00:52:00,160 --> 00:52:02,360
A single instance with two gigabytes of memory
1328
00:52:02,360 --> 00:52:04,560
costs a tiny fraction of a cent per second
1329
00:52:04,560 --> 00:52:06,560
but that idle time adds up fast.
1330
00:52:06,560 --> 00:52:08,560
Keeping one instance warm all day and night
1331
00:52:08,560 --> 00:52:11,360
costs about $288 every month.
1332
00:52:11,360 --> 00:52:14,760
If you need two instances that jumps to $576.
1333
00:52:14,760 --> 00:52:18,760
For that same price you could almost run a premium plan with a dedicated instance.
1334
00:52:18,760 --> 00:52:21,760
If you are constantly running multiple always-ready instances
1335
00:52:21,760 --> 00:52:24,760
the math changes and premium might actually be the cheaper path.
1336
00:52:24,760 --> 00:52:28,760
Most teams make a huge mistake by picking their memory size based on a gut feeling.
1337
00:52:28,760 --> 00:52:32,160
They assume a small function only needs 512 megabytes
1338
00:52:32,160 --> 00:52:34,560
because the data it processes is tiny.
1339
00:52:34,560 --> 00:52:38,960
What they miss is that Azure ties your CPU power directly to how much memory you allocate.
1340
00:52:38,960 --> 00:52:41,960
That small instance might take three full seconds to finish a task
1341
00:52:41,960 --> 00:52:43,760
because it has very little processing power.
1342
00:52:43,760 --> 00:52:45,560
If you bump that up to two gigabytes,
1343
00:52:45,560 --> 00:52:49,160
the extra CPU might finish the same task in 600 milliseconds.
1344
00:52:49,160 --> 00:52:51,960
Even though the larger instance costs more per second,
1345
00:52:51,960 --> 00:52:54,960
the faster finish time can actually save you money on your baseline costs.
1346
00:52:54,960 --> 00:52:58,960
You end up with a lower bill and a response time that is five times faster.
1347
00:52:58,960 --> 00:53:01,560
This is the strange reality of serverless economics
1348
00:53:01,560 --> 00:53:04,160
where the more expensive option is sometimes the cheaper one.
1349
00:53:04,160 --> 00:53:06,560
Scaling up to a bigger instance can be more cost effective
1350
00:53:06,560 --> 00:53:08,560
than scaling out to many small ones.
1351
00:53:08,560 --> 00:53:12,760
You have to run the actual math for your specific workload instead of just guessing.
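Running that math can be as simple as the following Python sketch. It uses the memory sizes and durations from the example above; the per-gigabyte-second rate is deliberately left as a caller-supplied parameter, because actual prices depend on your plan and region.

```python
def gb_seconds(memory_mb, duration_s):
    """Memory in gigabytes multiplied by execution time in seconds."""
    return (memory_mb / 1024) * duration_s


# 512 MB for two seconds is one gigabyte-second.
baseline = gb_seconds(512, 2.0)

# The two configurations from the example above.
small = gb_seconds(512, 3.0)   # small instance, slow finish
large = gb_seconds(2048, 0.6)  # larger instance, fast finish


def execution_cost(memory_mb, duration_s, calls, rate_per_gb_s):
    """Execution cost for a number of calls at a caller-supplied rate."""
    return gb_seconds(memory_mb, duration_s) * calls * rate_per_gb_s
```

Under these numbers the 512 megabyte run consumes 1.5 gigabyte-seconds and the two gigabyte run consumes 1.2, so the larger instance is cheaper per call as well as faster.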
1352
00:53:12,760 --> 00:53:16,160
Azure Cost Management is the best tool for doing this math correctly.
1353
00:53:16,160 --> 00:53:19,560
It lets you break down your spending by specific operations or time periods
1354
00:53:19,560 --> 00:53:21,760
so you can see exactly where the money is going.
1355
00:53:21,760 --> 00:53:23,760
You can find the functions that run all the time,
1356
00:53:23,760 --> 00:53:25,960
but finish quickly, which are usually very cheap.
1357
00:53:25,960 --> 00:53:29,160
You can also spot the rare functions that take forever to finish
1358
00:53:29,160 --> 00:53:31,160
and cost a fortune every time they run.
1359
00:53:31,160 --> 00:53:33,160
Once you identify those expensive operations,
1360
00:53:33,160 --> 00:53:34,960
you can target them for optimization.
1361
00:53:34,960 --> 00:53:39,360
Faster code reduces your bill and better caching means you don't have to run the code as often.
1362
00:53:39,360 --> 00:53:42,560
Batching is your most powerful weapon when it comes to saving money.
1363
00:53:42,560 --> 00:53:45,360
If your Copilot needs to check 10 different invoices,
1364
00:53:45,360 --> 00:53:47,560
do not make 10 separate calls to your function.
1365
00:53:47,560 --> 00:53:50,160
You should send all 10 invoices in one single request
1366
00:53:50,160 --> 00:53:52,160
so the function only has to run once.
1367
00:53:52,160 --> 00:53:56,160
This means you only deal with one cold start and one unit of billing instead of 10.
1368
00:53:56,160 --> 00:54:01,160
A single batch operation is almost always faster than 10 separate round trips over the network.
1369
00:54:01,160 --> 00:54:06,160
It pays for itself immediately by cutting down on invocation charges and execution time.
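A minimal Python sketch of a batch-shaped operation. The validation rule and the field names are hypothetical stand-ins for real business logic.

```python
def validate_invoice(invoice):
    """Hypothetical rule: valid if the invoice has an ID and a positive amount."""
    return bool(invoice.get("id")) and invoice.get("amount", 0) > 0


def validate_batch(invoices):
    """Validate many invoices in a single invocation, keyed by invoice ID."""
    return {invoice.get("id"): validate_invoice(invoice) for invoice in invoices}


# Ten invoices travel in one request instead of ten separate calls.
batch = [{"id": f"INV-{n}", "amount": n * 10} for n in range(10)]
results = validate_batch(batch)
```

Returning results keyed by invoice ID lets the caller match each answer to its invoice without relying on ordering.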
1370
00:54:06,160 --> 00:54:10,160
Reserved capacity is something you should consider once your traffic becomes predictable.
1371
00:54:10,160 --> 00:54:13,560
If you know for a fact that you will run 5,000 calls every day,
1372
00:54:13,560 --> 00:54:16,360
you can stop paying the standard on demand rates.
1373
00:54:16,360 --> 00:54:20,760
Azure savings plans allow you to commit to a certain amount of usage in exchange for a 20% discount.
1374
00:54:20,760 --> 00:54:23,760
You are trading away some flexibility for a lower price,
1375
00:54:23,760 --> 00:54:26,160
so the math only works if you actually use what you promised.
1376
00:54:26,160 --> 00:54:29,360
If you overestimate, you are paying for air, but if you get it right,
1377
00:54:29,360 --> 00:54:31,360
it is the cheapest way to run at scale.
1378
00:54:31,360 --> 00:54:34,560
Optimizing your costs isn't something you do once and then forget about.
1379
00:54:34,560 --> 00:54:38,360
It is a constant cycle of measuring your results and making small adjustments.
1380
00:54:38,360 --> 00:54:41,360
Improvements in performance almost always lead to a lower bill
1381
00:54:41,360 --> 00:54:44,160
and looking at your costs often reveals where your code is slow.
1382
00:54:44,160 --> 00:54:45,760
These two goals actually work together.
1383
00:54:45,760 --> 00:54:48,560
Usually the fastest function is also the cheapest one to run,
1384
00:54:48,560 --> 00:54:50,160
so you should always be chasing both.
1385
00:54:50,160 --> 00:54:51,960
Deployment and CI/CD.
1386
00:54:51,960 --> 00:54:55,160
Getting code from your laptop to production safely is only half the battle
1387
00:54:55,160 --> 00:54:58,960
because the other half is making sure you don't break everything once it's actually live.
1388
00:54:58,960 --> 00:55:02,360
This is where deployment discipline separates the high performers from the teams
1389
00:55:02,360 --> 00:55:05,960
that only ship once a quarter because they are terrified of a system crash.
1390
00:55:05,960 --> 00:55:08,360
Everything starts with infrastructure as code.
1391
00:55:08,360 --> 00:55:12,160
Your function apps, your key vaults and your storage accounts are all defined as code
1392
00:55:12,160 --> 00:55:13,960
rather than being clicked into existence.
1393
00:55:13,960 --> 00:55:16,560
You might use Bicep if you are staying native to Microsoft
1394
00:55:16,560 --> 00:55:19,160
or you might choose Terraform for a multi-cloud approach,
1395
00:55:19,160 --> 00:55:20,960
but the core principle never changes.
1396
00:55:20,960 --> 00:55:24,360
You aren't clicking buttons in the Azure portal to provision resources anymore.
1397
00:55:24,360 --> 00:55:27,160
Instead, you declare your infrastructure in code,
1398
00:55:27,160 --> 00:55:30,560
check it into Git, and version it just like your application logic.
1399
00:55:30,560 --> 00:55:33,360
When you deploy, that code drives the environment,
1400
00:55:33,360 --> 00:55:36,560
which makes your deployments repeatable across dev, test and production.
1401
00:55:36,560 --> 00:55:38,760
By using the same template for every stage,
1402
00:55:38,760 --> 00:55:41,360
you ensure that if a feature works in your test environment,
1403
00:55:41,360 --> 00:55:44,960
it will work in production because the underlying infrastructure is identical.
1404
00:55:44,960 --> 00:55:47,760
Automation is what makes this whole process reliable.
1405
00:55:47,760 --> 00:55:51,360
When you push code to Git, a pipeline in GitHub Actions or Azure DevOps
1406
00:55:51,360 --> 00:55:53,760
triggers automatically to run your unit tests.
1407
00:55:53,760 --> 00:55:55,960
If those tests fail, the deployment stops immediately,
1408
00:55:55,960 --> 00:55:57,960
so you never ship broken code to your users.
1409
00:55:57,960 --> 00:56:01,960
If they pass, the machine runs integration tests against a staging environment
1410
00:56:01,960 --> 00:56:03,360
and builds your deployment package.
1411
00:56:03,360 --> 00:56:06,360
It validates your infrastructure code and checks for common mistakes
1412
00:56:06,360 --> 00:56:08,160
before anything touches a live server.
1413
00:56:08,160 --> 00:56:11,160
This entire flow is automatic, repeatable and auditable.
1414
00:56:11,160 --> 00:56:15,160
A deployment isn't a person running manual commands on a Friday afternoon,
1415
00:56:15,160 --> 00:56:17,960
but a machine following a strictly defined process.
1416
00:56:17,960 --> 00:56:20,960
This shift to automation is actually where your risk decreases.
1417
00:56:20,960 --> 00:56:23,160
Manual deployments are where mistakes happen
1418
00:56:23,160 --> 00:56:25,160
like when someone misses a configuration step
1419
00:56:25,160 --> 00:56:28,160
or accidentally deploys the wrong version of a file.
1420
00:56:28,160 --> 00:56:31,360
Automated pipelines remove that human element entirely.
1421
00:56:31,360 --> 00:56:33,760
The same process runs every single time,
1422
00:56:33,760 --> 00:56:36,160
so you know exactly what is happening under the hood.
1423
00:56:36,160 --> 00:56:38,760
Deployment slots give you an extra layer of safety.
1424
00:56:38,760 --> 00:56:40,760
Your production slot handles the live traffic
1425
00:56:40,760 --> 00:56:44,360
while your staging slot acts as a perfect mirror where you can test new code.
1426
00:56:44,360 --> 00:56:46,160
You deploy the new version to staging,
1427
00:56:46,160 --> 00:56:47,760
run your full load tests,
1428
00:56:47,760 --> 00:56:49,360
and validate that the system is stable.
1429
00:56:49,360 --> 00:56:51,360
Once you are ready, you perform a swap.
1430
00:56:51,360 --> 00:56:55,960
Staging becomes production and production becomes staging in an instantaneous transition.
1431
00:56:55,960 --> 00:56:58,960
If something goes wrong after the swap, you just flip it back.
1432
00:56:58,960 --> 00:57:01,560
The old code is still sitting there in the staging slot,
1433
00:57:01,560 --> 00:57:03,760
so you can revert in seconds without an outage.
1434
00:57:03,760 --> 00:57:05,360
This is how you ship with confidence.
1435
00:57:05,360 --> 00:57:06,960
Security is non-negotiable,
1436
00:57:06,960 --> 00:57:09,360
which means secrets never go in your code.
1437
00:57:09,360 --> 00:57:11,360
If a database password or a connection string
1438
00:57:11,360 --> 00:57:13,160
ends up in your Git repository,
1439
00:57:13,160 --> 00:57:14,760
your entire system is compromised.
1440
00:57:14,760 --> 00:57:17,560
Anyone with access to that repo now has the keys to your data.
1441
00:57:17,560 --> 00:57:19,760
You should keep those secrets in Azure Key Vault
1442
00:57:19,760 --> 00:57:22,560
and let your function authenticate using a managed identity.
1443
00:57:22,560 --> 00:57:24,560
The secret never touches your source code
1444
00:57:24,560 --> 00:57:25,960
and it never appears in your logs.
1445
00:57:25,960 --> 00:57:28,760
It stays protected by Azure's encryption and access controls
1446
00:57:28,760 --> 00:57:29,760
at all times.
1447
00:57:29,760 --> 00:57:31,160
Environment variables are different
1448
00:57:31,160 --> 00:57:33,760
because they represent configuration that is safe to share.
1449
00:57:33,760 --> 00:57:36,160
Your function needs to know which database to talk to,
1450
00:57:36,160 --> 00:57:38,760
but that answer changes depending on where the code is running.
1451
00:57:38,760 --> 00:57:40,760
In dev, the variable points to a test database,
1452
00:57:40,760 --> 00:57:42,960
but in production it points to the real thing.
1453
00:57:42,960 --> 00:57:44,760
The code stays exactly the same,
1454
00:57:44,760 --> 00:57:47,960
but the behavior changes based on the environment variables you've set.
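A short Python sketch of environment-driven configuration. The variable names `INVOICE_DB_HOST` and `APP_ENVIRONMENT` and the host values are hypothetical, and only non-secret settings belong here.

```python
import os


def load_settings(env=os.environ):
    """Read non-secret, per-environment settings (hypothetical variable names)."""
    return {
        "database_host": env.get("INVOICE_DB_HOST", "localhost"),
        "environment": env.get("APP_ENVIRONMENT", "dev"),
    }


# The same code yields different behavior depending on what the environment sets.
dev = load_settings({"INVOICE_DB_HOST": "test-db.internal"})
prod = load_settings(
    {"INVOICE_DB_HOST": "prod-db.internal", "APP_ENVIRONMENT": "production"}
)
```

Passing the environment in as a parameter keeps the function easy to unit test without touching the real process environment.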
1455
00:57:47,960 --> 00:57:50,960
Your rollback procedure is more important than the deployment itself.
1456
00:57:50,960 --> 00:57:53,960
If you use slots, a rollback is just a 30-second swap,
1457
00:57:53,960 --> 00:57:55,560
but if you don't have a tested plan,
1458
00:57:55,560 --> 00:57:57,360
you'll end up improvising under pressure.
1459
00:57:57,360 --> 00:57:59,360
When the team is stressed during an outage,
1460
00:57:59,360 --> 00:58:01,760
they make mistakes that make the downtime even longer.
1461
00:58:01,760 --> 00:58:04,760
You need to document your rollback steps and run a dry run every quarter.
1462
00:58:04,760 --> 00:58:07,760
You have to know exactly what to do before a crisis actually hits.
1463
00:58:07,760 --> 00:58:11,360
This level of discipline turns deployment from a high-stakes gamble
1464
00:58:11,360 --> 00:58:13,360
into a boring routine operation.
1465
00:58:13,360 --> 00:58:15,960
You end up shipping more often while breaking things less
1466
00:58:15,960 --> 00:58:18,960
and when something does go wrong, you recover before anyone notices.
1467
00:58:18,960 --> 00:58:21,960
Real-world scenario: invoice validation plugin.
1468
00:58:21,960 --> 00:58:23,960
Architectural principles can feel abstract
1469
00:58:23,960 --> 00:58:25,960
until they meet a real business problem.
1470
00:58:25,960 --> 00:58:29,360
Let's look at how a finance team builds an invoice validation plugin
1471
00:58:29,360 --> 00:58:31,960
to see how these different pieces actually work together.
1472
00:58:31,960 --> 00:58:35,160
The finance team usually spends hours answering the same basic questions
1473
00:58:35,160 --> 00:58:37,360
about vendor payments and approval thresholds.
1474
00:58:37,360 --> 00:58:39,360
Employees want to know if an invoice is valid
1475
00:58:39,360 --> 00:58:41,760
or if a specific vendor has been paid yet.
1476
00:58:41,760 --> 00:58:45,760
Right now, answering those questions means logging into an old accounting system
1477
00:58:45,760 --> 00:58:47,160
and digging through spreadsheets,
1478
00:58:47,160 --> 00:58:49,360
which takes about 10 minutes per request.
1479
00:58:49,360 --> 00:58:52,960
The team is essentially drowning in manual, low-value work.
1480
00:58:52,960 --> 00:58:55,560
A Copilot plugin changes that dynamic entirely.
1481
00:58:55,560 --> 00:58:58,960
An employee can simply ask Copilot if a specific invoice is valid
1482
00:58:58,960 --> 00:59:01,560
and the system routes that request to the validation plugin.
1483
00:59:01,560 --> 00:59:06,160
The plugin returns a clear, structured answer in less than 500 milliseconds.
1484
00:59:06,160 --> 00:59:09,960
The employee gets what they need instantly without bothering the finance department.
1485
00:59:09,960 --> 00:59:13,360
The OpenAPI spec for this plugin defines three specific operations.
1486
00:59:13,360 --> 00:59:16,160
You have validate invoice to check status, get invoice details
1487
00:59:16,160 --> 00:59:19,960
to pull record information, and flag invoice for review for manual audits.
1488
00:59:19,960 --> 00:59:22,760
These names are chosen carefully as verb-noun pairs
1489
00:59:22,760 --> 00:59:25,360
so the LLM can easily tell them apart.
1490
00:59:25,360 --> 00:59:29,160
When a user asks for information, the model routes to the details operation
1491
00:59:29,160 --> 00:59:32,360
but when they ask for a status check, it hits the validation logic.
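To make those three operations concrete, here is a rough sketch of how they might appear in an OpenAPI fragment, written as a Python dictionary. The paths, HTTP methods and camelCase `operationId` values are assumptions for illustration; the episode only gives the spoken names.

```python
# Hypothetical OpenAPI fragment. Paths, methods and operationId casing
# are assumptions; only the three operation names come from the episode.
INVOICE_SPEC = {
    "openapi": "3.0.3",
    "info": {"title": "Invoice Validation Plugin", "version": "1.0.0"},
    "paths": {
        "/invoices/{invoiceId}/validate": {
            "post": {
                "operationId": "validateInvoice",
                "summary": "Check the status of an invoice.",
            }
        },
        "/invoices/{invoiceId}": {
            "get": {
                "operationId": "getInvoiceDetails",
                "summary": "Pull the record information for an invoice.",
            }
        },
        "/invoices/{invoiceId}/flag": {
            "post": {
                "operationId": "flagInvoiceForReview",
                "summary": "Send an invoice to a manual audit.",
            }
        },
    },
}


def operation_ids(spec):
    """Collect every operationId in the spec, sorted for stable output."""
    return sorted(
        operation["operationId"]
        for methods in spec["paths"].values()
        for operation in methods.values()
    )
```

Listing the IDs side by side is a quick way to check that no two names could be mistaken for each other.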
1492
00:59:32,360 --> 00:59:35,760
The engine behind this is a C# function running on Flex Consumption
1493
00:59:35,760 --> 00:59:37,560
with always-ready instances.
1494
00:59:37,560 --> 00:59:39,560
Instead of running all the logic in one place,
1495
00:59:39,560 --> 00:59:42,360
it uses Durable Functions to coordinate the workflow.
1496
00:59:42,360 --> 00:59:46,760
An orchestrator function starts by calling three separate activities at the same time.
1497
00:59:46,760 --> 00:59:50,160
One checks the accounting system, another verifies the vendor credentials
1498
00:59:50,160 --> 00:59:52,360
and the third looks for duplicate submissions.
1499
00:59:52,360 --> 00:59:53,760
Because these checks run in parallel,
1500
00:59:53,760 --> 00:59:56,960
the orchestrator only waits for the slowest one to finish before moving on.
1501
00:59:56,960 --> 01:00:00,360
If the invoice passes every check, the system calculates the approval threshold
1502
01:00:00,360 --> 01:00:01,560
and returns the result.
1503
01:00:01,560 --> 01:00:04,360
If it fails anywhere, the process stops and reports the error immediately.
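The fan-out and fan-in shape described here can be sketched in a few lines. This is not Durable Functions code; it uses plain Python `asyncio` as a stand-in, and the three check functions, their sleep times and the "DUP" rule are all made up for the demo. The approval threshold calculation is left out because the episode doesn't say how it works.

```python
import asyncio


async def check_accounting_system(invoice_id):
    await asyncio.sleep(0.05)  # stands in for a call to the accounting system
    return ("accounting", True)


async def verify_vendor_credentials(invoice_id):
    await asyncio.sleep(0.05)  # stands in for a vendor credential lookup
    return ("vendor", True)


async def check_duplicate_submission(invoice_id):
    await asyncio.sleep(0.05)
    # Hypothetical rule for the demo: IDs starting with "DUP" are duplicates.
    return ("duplicate", not invoice_id.startswith("DUP"))


async def validate_invoice(invoice_id):
    # Fan out: start all three checks at once, then wait for all of them.
    # The total wait is roughly the slowest check, not the sum of the three.
    results = await asyncio.gather(
        check_accounting_system(invoice_id),
        verify_vendor_credentials(invoice_id),
        check_duplicate_submission(invoice_id),
    )
    # Fan in: collect the names of any checks that did not pass.
    failed = [name for name, passed in results if not passed]
    return {"invoiceId": invoice_id, "valid": not failed, "failedChecks": failed}
```

Calling `asyncio.run(validate_invoice("INV-1001"))` returns a structured result, and the `failedChecks` list is what lets the caller explain why an invoice was rejected instead of returning a bare yes or no.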
1504
01:00:04,360 --> 01:00:06,960
The orchestrator packages all of this into structured data
1505
01:00:06,960 --> 01:00:08,560
that flows back to Copilot.
1506
01:00:08,560 --> 01:00:11,160
The user doesn't just see a yes or no,
1507
01:00:11,160 --> 01:00:13,760
but a clear explanation of why an invoice was rejected
1508
01:00:13,760 --> 01:00:15,160
or why it needs a manual review.
1509
01:00:15,160 --> 01:00:18,360
API Management sits in front of the whole operation to keep things stable.
1510
01:00:18,360 --> 01:00:21,760
The custom connector talks to APIM instead of hitting the function directly,
1511
01:00:21,760 --> 01:00:23,360
which allows you to enforce rate limits.
1512
01:00:23,360 --> 01:00:26,960
This prevents a single user from overwhelming the back end with too many requests.
1513
01:00:26,960 --> 01:00:29,560
APIM also handles the heavy lifting of security
1514
01:00:29,560 --> 01:00:33,360
by validating tokens and injecting correlation IDs into every request.
1515
01:00:33,360 --> 01:00:36,160
It even checks the data coming back from the function
1516
01:00:36,160 --> 01:00:38,360
to make sure it matches the expected schema.
1517
01:00:38,360 --> 01:00:40,160
If the function returns something messy,
1518
01:00:40,160 --> 01:00:42,760
APIM catches it before Copilot tries to read it.
1519
01:00:42,760 --> 01:00:43,960
Speed is the main goal here,
1520
01:00:43,960 --> 01:00:46,360
so the team uses several optimization tricks.
1521
01:00:46,360 --> 01:00:48,760
Since vendor credentials don't change very often,
1522
01:00:48,760 --> 01:00:51,960
they are cached directly in memory on the always-ready instance.
1523
01:00:51,960 --> 01:00:53,960
The first request might be a bit slower,
1524
01:00:53,960 --> 01:00:57,760
but every request after that hits the cache for near instant results.
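As a rough illustration of that in-memory cache, here is a small time-to-live cache in Python. The class name, the five-minute lifetime used in the usage note and the injectable clock are assumptions; the episode only says that vendor credentials are cached in memory.

```python
import time


class TtlCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries = {}   # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no downstream call
        value = loader(key)  # cache miss: pay for the slow lookup once
        self._entries[key] = (now + self._ttl, value)
        return value
```

A module-level `TtlCache(ttl_seconds=300)` would survive between requests only for as long as the instance stays warm, which is why this trick pairs naturally with always-ready instances.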
1525
01:00:57,760 --> 01:00:59,960
Connection pooling keeps the database links open,
1526
01:00:59,960 --> 01:01:03,360
so the system doesn't have to waste time creating new ones for every query.
1527
01:01:03,360 --> 01:01:05,560
By running the three main checks in parallel,
1528
01:01:05,560 --> 01:01:09,560
the total latency drops from three seconds down to about 300 milliseconds.
1529
01:01:09,560 --> 01:01:12,360
Monitoring keeps a close eye on three specific areas.
1530
01:01:12,360 --> 01:01:15,360
If the validation failure rate climbs above 5%,
1531
01:01:15,360 --> 01:01:18,160
an alert tells the team that the accounting system might be down.
1532
01:01:18,160 --> 01:01:20,560
If the P99 latency goes over one second,
1533
01:01:20,560 --> 01:01:23,560
the team knows they have a cache issue or a system slowdown.
1534
01:01:23,560 --> 01:01:27,560
They also track cold starts to make sure the always ready instances are doing their job.
1535
01:01:27,560 --> 01:01:29,160
If startup time starts to creep up,
1536
01:01:29,160 --> 01:01:31,960
it means a deployment changed something that made the app too heavy.
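The two numeric alert rules can be sketched like this. The 5% and one-second thresholds come from the episode; the function names and the nearest-rank percentile method are assumptions, and in practice you would more likely define these as alert rules in your monitoring tool than compute them in your own code.

```python
def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer math
    return ordered[rank - 1]


def evaluate_alerts(outcomes, latencies_ms):
    """outcomes is a list of booleans, True when a validation succeeded."""
    alerts = []
    failure_rate = outcomes.count(False) / len(outcomes)
    if failure_rate > 0.05:
        alerts.append("validation failure rate above 5%")
    if p99(latencies_ms) > 1000:
        alerts.append("P99 latency above 1 second")
    return alerts
```

Cold-start tracking is left out of the sketch because it depends on how your platform reports startup time.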
1537
01:01:31,960 --> 01:01:34,360
The result is a system with total traceability.
1538
01:01:34,360 --> 01:01:37,760
Every single request has a correlation ID that follows it through APIM,
1539
01:01:37,760 --> 01:01:39,560
the function, and the database.
1540
01:01:39,560 --> 01:01:42,760
If a user asks why their invoice was denied three weeks later,
1541
01:01:42,760 --> 01:01:45,960
the finance team can pull the logs and see exactly which check failed.
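One way to picture the function's part of that trail is a logger that stamps every record with the incoming ID. This is a sketch using Python's standard `logging` module; the header name `x-correlation-id` and the logger name are hypothetical, so use whatever your gateway policy actually sets.

```python
import logging
import uuid

# Hypothetical header name. Real HTTP headers are case-insensitive, so a
# production version would normalize the keys before looking this up.
CORRELATION_HEADER = "x-correlation-id"


def correlation_id_from(headers):
    """Reuse the caller's correlation ID, or mint one if it is missing."""
    return headers.get(CORRELATION_HEADER) or str(uuid.uuid4())


def logger_for_request(headers):
    """Return a logger adapter that attaches the ID to every log record."""
    base = logging.getLogger("invoice_validation")
    return logging.LoggerAdapter(base, {"correlation_id": correlation_id_from(headers)})
```

With a formatter that includes `%(correlation_id)s`, every line written through the adapter can be searched by that one value weeks later.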
1542
01:01:45,960 --> 01:01:47,360
This is a plugin built to scale.
1543
01:01:47,360 --> 01:01:48,960
It is fast, it is auditable,
1544
01:01:48,960 --> 01:01:52,160
and it is easy to maintain because every component has a specific job.
1545
01:01:52,160 --> 01:01:56,360
When you combine always-ready instances with Durable Functions and APIM governance,
1546
01:01:56,360 --> 01:01:59,360
the whole system becomes much stronger than its individual parts.
1547
01:01:59,360 --> 01:02:01,760
Common pitfalls and how to avoid them.
1548
01:02:01,760 --> 01:02:04,760
Most teams don't fail because they don't understand the architecture,
1549
01:02:04,760 --> 01:02:07,760
but rather because they skip steps that seem optional
1550
01:02:07,760 --> 01:02:10,360
until the moment they become catastrophic.
1551
01:02:10,360 --> 01:02:14,160
These pitfalls are essentially the tax you pay for cutting corners during development.
1552
01:02:14,160 --> 01:02:17,760
The first major mistake is writing your OpenAPI spec for the wrong audience.
1553
01:02:17,760 --> 01:02:20,360
You're a developer who understands your own code,
1554
01:02:20,360 --> 01:02:22,960
so you write clear documentation for other humans,
1555
01:02:22,960 --> 01:02:24,960
but the model doesn't read like a programmer.
1556
01:02:24,960 --> 01:02:26,160
It reads like a user.
1557
01:02:26,160 --> 01:02:29,360
When you write "Returns customer records filtered by status,"
1558
01:02:29,360 --> 01:02:30,960
you've said something technically correct,
1559
01:02:30,960 --> 01:02:34,760
but the model just sees a generic operation for retrieving data.
1560
01:02:34,760 --> 01:02:37,760
When three different operations in your spec all do similar things like search,
1561
01:02:37,760 --> 01:02:41,760
get and fetch, the model gets confused and starts picking between them at random.
1562
01:02:41,760 --> 01:02:44,560
Your copilot calls the wrong operation, the flow fails,
1563
01:02:44,560 --> 01:02:46,760
and the user blames the bot for being broken.
1564
01:02:46,760 --> 01:02:50,760
What actually happened is your spec was too ambiguous for the LLM to navigate.
1565
01:02:50,760 --> 01:02:55,160
You need to write descriptions that explain when to use an operation instead of just what it does.
1566
01:02:55,160 --> 01:02:59,160
If you write "Use this when you need to find customers by their name or company,"
1567
01:02:59,160 --> 01:03:02,360
the model finally has the language it needs to route the request correctly.
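If you want a cheap guardrail for this, a small check can flag descriptions that say what an operation does without saying when to use it. The sketch below is only a heuristic: the phrase list and the `searchCustomers` and `getCustomers` names in the usage note are assumptions, and passing the check doesn't prove the model will route correctly.

```python
# Heuristic phrases that signal "when to use this". The list is an assumption.
ROUTING_HINTS = ("use this when", "use this to", "call this when")


def describes_when_to_use(description):
    """True if the description tells the model when to pick the operation."""
    text = description.lower()
    return any(hint in text for hint in ROUTING_HINTS)


def ambiguous_operations(operations):
    """Take a dict of operationId -> description and return IDs to rewrite."""
    return sorted(
        op_id
        for op_id, description in operations.items()
        if not describes_when_to_use(description)
    )
```

Given `{"searchCustomers": "Use this when you need to find customers by their name or company.", "getCustomers": "Returns customer records filtered by status."}`, only `getCustomers` comes back as needing a rewrite.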
1568
01:03:02,360 --> 01:03:06,160
Choosing not to use Flex always-ready is like leaving your car running in the driveway all day
1569
01:03:06,160 --> 01:03:08,360
just to avoid a two second startup delay.
1570
01:03:08,360 --> 01:03:09,960
You might be solving the cold start problem,
1571
01:03:09,960 --> 01:03:12,960
but you're also burning money and wasting resources for no reason.
1572
01:03:12,960 --> 01:03:15,160
Classic consumption scales to zero, which means
1573
01:03:15,160 --> 01:03:18,160
if your plugin sits idle for an hour and a user finally makes a request,
1574
01:03:18,160 --> 01:03:20,160
the function has to spin up from scratch.
1575
01:03:20,160 --> 01:03:24,160
The user experiences a massive lag spike while they wait for the system to wake up.
1576
01:03:24,160 --> 01:03:29,160
Flex with always ready keeps a small instance warm at a fraction of the cost of a premium plan,
1577
01:03:29,160 --> 01:03:31,160
making it the obvious choice for any professional setup.
1578
01:03:31,160 --> 01:03:34,160
Teams often skip this because they think they'll optimize the performance later,
1579
01:03:34,160 --> 01:03:36,160
but later never actually comes.
1580
01:03:36,160 --> 01:03:40,160
They end up living with five second startup delays that compound into thousands of lost conversations
1581
01:03:40,160 --> 01:03:44,160
all while an always ready budget would have only cost about $200 a month.
1582
01:03:44,160 --> 01:03:48,160
Embedding secrets directly in your code is a fast way to lose your job.
1583
01:03:48,160 --> 01:03:50,160
Your function needs a database password,
1584
01:03:50,160 --> 01:03:52,160
so you hard-code it and check that code into Git,
1585
01:03:52,160 --> 01:03:55,160
which allows anyone who clones the repository to read your credentials.
1586
01:03:55,160 --> 01:03:59,160
They can connect to your production database and export your customer data to the internet
1587
01:03:59,160 --> 01:04:01,160
before you even realize there's a leak.
1588
01:04:01,160 --> 01:04:05,160
You didn't mean to expose the data, but you took a shortcut that bypassed basic security.
1589
01:04:05,160 --> 01:04:09,160
Azure Key Vault exists for this exact reason, and you need to use it.
1590
01:04:09,160 --> 01:04:13,160
Your function should authenticate to Key Vault using a managed identity,
1591
01:04:13,160 --> 01:04:17,160
so the secret is returned securely, without ever touching your source code.
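In Python, that pattern might look roughly like the sketch below. It assumes the `azure-identity` and `azure-keyvault-secrets` packages are installed and that the function app has a managed identity with permission to read secrets; the secret name `invoice-db-password` is hypothetical.

```python
def build_secret_client(vault_url):
    """Create a Key Vault client that signs in without any stored password.

    Requires the azure-identity and azure-keyvault-secrets packages.
    DefaultAzureCredential picks up a managed identity when one is available.
    """
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    return SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())


def get_database_password(client, secret_name="invoice-db-password"):
    """Fetch the secret at runtime. The secret name is a hypothetical example."""
    return client.get_secret(secret_name).value
```

The vault URL itself is not a secret, so it can live in an ordinary environment variable. Nothing in the repository contains the password, and the returned value should never be written to a log.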
1592
01:04:17,160 --> 01:04:20,160
Never hard-code your secrets because this isn't just a best practice.
1593
01:04:20,160 --> 01:04:22,160
It's a fundamental requirement for staying employed.
1594
01:04:22,160 --> 01:04:26,160
Blocking calls in your functions will completely waste your investment in always ready.
1595
01:04:26,160 --> 01:04:30,160
When your function calls an external API and the thread sits there waiting for a response,
1596
01:04:30,160 --> 01:04:34,160
that always ready instance is being consumed while doing zero actual work.
1597
01:04:34,160 --> 01:04:37,160
If the next request comes in while the thread is blocked,
1598
01:04:37,160 --> 01:04:40,160
there's no available instance to handle it, so a new one has to spin up.
1599
01:04:40,160 --> 01:04:46,160
This creates a cold start and a latency spike that defeats the entire purpose of keeping the instance warm in the first place.
1600
01:04:46,160 --> 01:04:51,160
You need to use async and await to make your API calls asynchronously so the thread can yield.
1601
01:04:51,160 --> 01:04:56,160
This allows another request to use that same thread while the first one waits for its data to come back.
1602
01:04:56,160 --> 01:05:01,160
Concurrency is what multiplies your capacity and blocking code just throws that multiplier away.
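A small Python sketch shows the multiplier at work. The function names are made up, and `asyncio.sleep` stands in for a real network call; the sketch records how many requests are waiting at the same moment on a single thread.

```python
import asyncio


async def call_external_api(request_id, in_flight, peak):
    in_flight.append(request_id)
    peak.append(len(in_flight))  # how many requests are waiting right now
    await asyncio.sleep(0.05)    # yields the thread while "waiting on the network"
    in_flight.remove(request_id)
    return f"response for {request_id}"


async def handle_requests(request_ids):
    in_flight, peak = [], []
    responses = await asyncio.gather(
        *(call_external_api(request_id, in_flight, peak) for request_id in request_ids)
    )
    return responses, max(peak)
```

With three requests the peak is three, all sharing one thread. Swap the `await asyncio.sleep` for a blocking `time.sleep` and each request would run to completion before the next one starts, so the peak would never rise above one.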
1603
01:05:01,160 --> 01:05:04,160
Running without monitoring or logging is like flying a plane blind.
1604
01:05:04,160 --> 01:05:08,160
Your function might work perfectly in a test environment and the deployment might go fine,
1605
01:05:08,160 --> 01:05:12,160
but two weeks later, users start reporting that requests are failing silently.
1606
01:05:12,160 --> 01:05:16,160
Your function isn't throwing any obvious errors, it's just returning empty results and without logs,
1607
01:05:16,160 --> 01:05:17,160
you have no way to know why.
1608
01:05:17,160 --> 01:05:22,160
You'll spend three days guessing at the cause when you could have seen the answer in 30 seconds.
1609
01:05:22,160 --> 01:05:27,160
With proper logs, you would immediately see that a downstream dependency changed its response format and broke your parsing logic.
1610
01:05:27,160 --> 01:05:32,160
Monitoring is not an optional add-on, it's insurance against spending your entire week debugging problems
1611
01:05:32,160 --> 01:05:34,160
that should have been visible from the start.
1612
01:05:34,160 --> 01:05:39,160
If you don't version your OpenAPI spec, your evolution will eventually turn into total chaos.
1613
01:05:39,160 --> 01:05:45,160
The moment you modify an operation or change a parameter type, you break every existing flow that depends on the old structure.
1614
01:05:45,160 --> 01:05:49,160
The person who created that flow has no idea what changed or why their automation suddenly stopped working,
1615
01:05:49,160 --> 01:05:52,160
so they file a support ticket and wait for an apology.
1616
01:05:52,160 --> 01:05:55,160
You end up reverting the change and realizing you haven't actually made any progress.
1617
01:05:55,160 --> 01:05:58,160
You need to version your spec and track every change you make.
1618
01:05:58,160 --> 01:06:02,160
When you modify a feature, bump the version number and communicate that shift to your users.
1619
01:06:02,160 --> 01:06:07,160
This gives teams the time they need to adapt so you can evolve the system without breaking everything in your path.
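One way to make that discipline mechanical is to compare the old and new spec before you publish. The sketch below assumes a simplified shape, a dictionary of operation IDs mapped to parameter types, and it uses a semantic-versioning-style bump; that convention is my assumption, since the episode only says to bump the version number. It also treats added parameters as optional and therefore non-breaking.

```python
def breaking_changes(old_ops, new_ops):
    """Compare {operationId: {param: type}} maps and list breaking changes."""
    problems = []
    for op_id, old_params in old_ops.items():
        if op_id not in new_ops:
            problems.append(f"{op_id}: operation removed")
            continue
        for name, old_type in old_params.items():
            new_type = new_ops[op_id].get(name)
            if new_type is None:
                problems.append(f"{op_id}.{name}: parameter removed")
            elif new_type != old_type:
                problems.append(f"{op_id}.{name}: type changed {old_type} -> {new_type}")
    return problems


def next_version(current, old_ops, new_ops):
    """Bump the major version on a breaking change, otherwise the minor."""
    major, minor, _patch = (int(part) for part in current.split("."))
    if breaking_changes(old_ops, new_ops):
        return f"{major + 1}.0.0"
    return f"{major}.{minor + 1}.0"
```

The list returned by `breaking_changes` doubles as the change note you send to the people whose flows depend on the old structure.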
1620
01:06:07,160 --> 01:06:10,160
These aren't subtle problems that only show up for experts.
1621
01:06:10,160 --> 01:06:14,160
They are common, preventable and incredibly expensive if you choose to ignore them.
1622
01:06:14,160 --> 01:06:16,160
Scaling from pilot to enterprise.
1623
01:06:16,160 --> 01:06:21,160
Your invoice validation plugin is finally live and stable and the finance teams are actually using it.
1624
01:06:21,160 --> 01:06:28,160
But now the CRM team wants something similar, the HR team wants a bot for employee queries and the logistics team is asking for shipment tracking.
1625
01:06:28,160 --> 01:06:33,160
One single plugin proves that your architecture works but now you suddenly need 10 of them.
1626
01:06:33,160 --> 01:06:38,160
If you don't have a scaling strategy, you're going to spend the next year reinventing the same wheels over and over again.
1627
01:06:38,160 --> 01:06:44,160
You'll keep making the same decisions about APIM configurations and function structures while making different mistakes in every single implementation.
1628
01:06:44,160 --> 01:06:48,160
By the time you get to plugin number five, you'll be drowning in maintenance.
1629
01:06:48,160 --> 01:06:53,160
This is the point where pattern libraries separate the mature organizations from the chaotic ones.
1630
01:06:53,160 --> 01:06:58,160
A pattern library is really just a collection of reusable templates that keep you from starting at zero every time.
1631
01:06:58,160 --> 01:07:03,160
Since you've already standardized the invoice validation plugin, you should extract that logic into a template.
1632
01:07:03,160 --> 01:07:10,160
Create a generic scaffolding that includes your HTTP triggers, your Durable Functions orchestrators and your standard error handling.
1633
01:07:10,160 --> 01:07:15,160
When a developer needs to build a new plugin, they start from this template instead of a blank screen.
1634
01:07:15,160 --> 01:07:19,160
They can focus entirely on the business logic because all the plumbing is already handled and correct.
1635
01:07:19,160 --> 01:07:28,160
Your OpenAPI templates should follow the same logic. Since you've already learned what the LLM understands, you can create a canonical structure for all new operations.
1636
01:07:28,160 --> 01:07:35,160
This ensures your specs stay consistent, your naming conventions remain the same, and the LLM never runs into any surprises.
1637
01:07:35,160 --> 01:07:40,160
Bicep modules allow you to encapsulate your infrastructure patterns so they can be deployed reliably.
1638
01:07:40,160 --> 01:07:46,160
Instead of every developer trying to write their own Key Vault setup or APIM configuration, they should just call a pre-made module.
1639
01:07:46,160 --> 01:07:50,160
The module handles the complex details while the developer just provides the parameters.
1640
01:07:50,160 --> 01:07:58,160
If a dev needs a new plugin that handles a thousand requests a day, the module creates the right instance size and applies the necessary security policies automatically.
1641
01:07:58,160 --> 01:08:04,160
The developer doesn't have to write a single line of infrastructure code because they just declared what they needed and the system delivered it.
1642
01:08:04,160 --> 01:08:09,160
True consistency only emerges when you make reuse the easiest path forward.
1643
01:08:09,160 --> 01:08:15,160
Centralizing your authentication and authorization removes a massive amount of friction from the development process.
1644
01:08:15,160 --> 01:08:25,160
You shouldn't have every single plugin managing its own identity system. Instead, you should create a shared authentication policy within APIM so that when a request arrives, the security check happens exactly once.
1645
01:08:25,160 --> 01:08:29,160
The request then carries a trusted identity token to every downstream service.
1646
01:08:29,160 --> 01:08:35,160
This centralization makes your governance much simpler because you have one single place to audit access or revoke credentials.
1647
01:08:35,160 --> 01:08:40,160
If compliance suddenly requires multi-factor authentication, you only have to implement it in one spot.
1648
01:08:40,160 --> 01:08:49,160
Teams won't try to bypass the security layer because you've made it the most convenient way to build. Implementing governance for custom connectors is the only way to prevent shadow integrations from taking over.
1649
01:08:49,160 --> 01:08:55,160
Without a process in place, a developer might create a custom connector for a new API and deploy it to production without anyone ever seeing it.
1650
01:08:55,160 --> 01:09:01,160
You'll have no idea it exists, even though it might be reaching a blocked service or exposing sensitive company data.
1651
01:09:01,160 --> 01:09:06,160
You need a simple approval workflow where developers submit their connector requests for a quick security review.
1652
01:09:06,160 --> 01:09:11,160
Once the connector is approved, it gets added to the pattern library for other teams to reuse.
1653
01:09:11,160 --> 01:09:19,160
This doesn't slow development down; it actually speeds things up because developers can build on top of approved patterns instead of making up ad hoc solutions on the fly.
1654
01:09:19,160 --> 01:09:25,160
A self-service portal can let developers request new functions without having to wait for your team to manually intervene.
1655
01:09:25,160 --> 01:09:31,160
The portal is just a simple form where a user describes what the function needs to validate and how much traffic they expect.
1656
01:09:31,160 --> 01:09:38,160
That form feeds directly into a pipeline that builds the function from your template, generates the spec, and deploys it to a dev environment.
1657
01:09:38,160 --> 01:09:43,160
This gives the developers instant scaffolding to work with while giving your team total visibility into what's being built.
1658
01:09:43,160 --> 01:09:47,160
It allows everyone to move faster without sacrificing the quality of the underlying system.
1659
01:09:47,160 --> 01:09:51,160
Runbooks are your operational insurance for when things go wrong in the middle of the night.
1660
01:09:51,160 --> 01:09:57,160
When a plug-in fails at 2 in the morning, the engineer on call shouldn't have to understand every detail of the architecture to fix it.
1661
01:09:57,160 --> 01:10:02,160
The runbook should tell them exactly what to do if latency spikes or if a cache hit rate drops below a certain level.
1662
01:10:02,160 --> 01:10:07,160
If the vendor credential cache expires, the instructions tell them to restart the always ready instances.
1663
01:10:07,160 --> 01:10:12,160
The engineer follows the steps and fixes the problem in 10 minutes instead of spending 2 hours investigating the root cause.
1664
01:10:12,160 --> 01:10:20,160
Runbooks capture the collective knowledge of everyone who has ever debugged the system and they are the difference between reactive firefighting and a confident response.
1665
01:10:20,160 --> 01:10:24,160
This is the path for scaling from a single pilot plug-in to a full enterprise platform.
1666
01:10:24,160 --> 01:10:28,160
You don't get there by hoping every team figures out the same lessons on their own.
1667
01:10:28,160 --> 01:10:33,160
You get there by building those lessons directly into the infrastructure and the processes that scale with you.
1668
01:10:33,160 --> 01:10:37,160
The strategic shift from low-code toys to enterprise engines.
1669
01:10:37,160 --> 01:10:39,160
We have walked through the entire architecture together.
1670
01:10:39,160 --> 01:10:44,160
You understand the hosting choices, the security perimeter, the monitoring discipline and the deployment rigor.
1671
01:10:44,160 --> 01:10:46,160
But these are not just abstract concepts.
1672
01:10:46,160 --> 01:10:50,160
They represent a fundamental shift in how enterprise AI actually gets built.
1673
01:10:50,160 --> 01:10:54,160
Low-code platforms changed the game by democratizing automation.
1674
01:10:54,160 --> 01:10:59,160
Before Power Automate existed, you needed enterprise architects to design integration solutions at astronomical hourly rates.
1675
01:10:59,160 --> 01:11:02,160
Low-code put that power into the hands of business teams.
1676
01:11:02,160 --> 01:11:05,160
It was genuine innovation. It reduced friction and accelerated delivery.
1677
01:11:05,160 --> 01:11:08,160
But it also created a dangerous assumption that low-code can do everything.
1678
01:11:08,160 --> 01:11:12,160
The assumption is that you can automate anything with enough Power Automate flows.
1679
01:11:12,160 --> 01:11:16,160
That assumption breaks the moment you hit sophisticated business logic.
1680
01:11:16,160 --> 01:11:18,160
This is where the fusion developer model emerges.
1681
01:11:18,160 --> 01:11:21,160
It is not about low-code replacing professionals.
1682
01:11:21,160 --> 01:11:24,160
And it is not about professionals dismissing low-code.
1683
01:11:24,160 --> 01:11:27,160
Instead, professionals use low-code where it excels.
1684
01:11:27,160 --> 01:11:29,160
For orchestration, coordination and connecting systems.
1685
01:11:29,160 --> 01:11:32,160
Then they use pro-code where it actually matters.
1686
01:11:32,160 --> 01:11:36,160
For heavy computation, complex transformations and strict business rules.
1687
01:11:36,160 --> 01:11:40,160
The Copilot plugin architecture you have learned is the physical manifestation of that fusion.
1688
01:11:40,160 --> 01:11:42,160
Power Platform handles what it was designed for.
1689
01:11:42,160 --> 01:11:45,160
Azure Functions handles what requires real engineering.
1690
01:11:45,160 --> 01:11:49,160
Teams that master this architecture build things their competitors cannot replicate.
1691
01:11:49,160 --> 01:11:53,160
A competitor might see your Copilot plugin working smoothly and decide they want one too.
1692
01:11:53,160 --> 01:11:56,160
They can build a basic version in a few weeks using standard connectors.
1693
01:11:56,160 --> 01:12:01,160
But if your plug-in handles edge cases through sophisticated C# logic, they cannot copy that quickly.
1694
01:12:01,160 --> 01:12:05,160
If it maintains sub-second latency through Flex always-ready configurations
1695
01:12:05,160 --> 01:12:10,160
or governs data access through APIM policies and managed identities, it will take them months to catch up.
1696
01:12:10,160 --> 01:12:13,160
By the time they figure it out, you have already iterated.
1697
01:12:13,160 --> 01:12:16,160
You have added new features. You have become reliable.
1698
01:12:16,160 --> 01:12:19,160
You have gained a competitive advantage that compounds over time.
1699
01:12:19,160 --> 01:12:23,160
This advantage does not come from raw intelligence or some secret technology.
1700
01:12:23,160 --> 01:12:24,160
It comes from discipline.
1701
01:12:24,160 --> 01:12:28,160
It comes from understanding that every architectural decision serves a specific purpose.
1702
01:12:28,160 --> 01:12:34,160
Choosing always-ready over cold starts or using Durable Functions instead of nested flows is intentional design.
1703
01:12:34,160 --> 01:12:35,160
It is not accidental emergence.
1704
01:12:35,160 --> 01:12:38,160
The investment multiplies across every plug-in you build.
1705
01:12:38,160 --> 01:12:42,160
When you build your first plug-in with proper infrastructure, you invest heavily in OpenAPI spec design,
1706
01:12:42,160 --> 01:12:45,160
APIM configuration and monitoring. The initial bill is substantial.
1707
01:12:45,160 --> 01:12:47,160
But the second plug-in reuses everything.
1708
01:12:47,160 --> 01:12:49,160
The infrastructure is already there. The patterns are documented.
1709
01:12:49,160 --> 01:12:51,160
And the team understands the playbook.
1710
01:12:51,160 --> 01:12:53,160
The second plug-in costs a fraction of the first.
1711
01:12:53,160 --> 01:12:59,160
By the time you reach the tenth plug-in, you are deploying at a cadence that smaller, less disciplined teams simply cannot match.
1712
01:12:59,160 --> 01:13:02,160
Infrastructure investments compound.
1713
01:13:02,160 --> 01:13:05,160
Security and governance are not afterthoughts in this model.
1714
01:13:05,160 --> 01:13:09,160
They are built in. You are not trying to bolt on compliance after the architecture already exists.
1715
01:13:09,160 --> 01:13:12,160
You are designing policies into APIM from the very start.
1716
01:13:12,160 --> 01:13:14,160
You are not wondering who has access to what.
1717
01:13:14,160 --> 01:13:19,160
You are using managed identities so every access point is auditable and revocable.
1718
01:13:19,160 --> 01:13:21,160
You are not hoping someone remembers to log important events.
1719
01:13:21,160 --> 01:13:26,160
You are capturing correlation IDs and streaming them to Application Insights automatically.
1720
01:13:26,160 --> 01:13:29,160
The cost of these controls is low when they are part of the foundation.
1721
01:13:29,160 --> 01:13:32,160
The cost of retrofitting them later is astronomical.
1722
01:13:32,160 --> 01:13:36,160
Cost optimization becomes a continuous discipline rather than a crisis event.
1723
01:13:36,160 --> 01:13:40,160
You are not discovering six months down the line that your baseline costs are too high.
1724
01:13:40,160 --> 01:13:45,160
You are tracking costs weekly. You are right-sizing memory based on actual execution patterns.
1725
01:13:45,160 --> 01:13:48,160
You are using cached data to reduce expensive downstream calls.
1726
01:13:48,160 --> 01:13:54,160
These micro-optimizations compound. Over a year they save tens of thousands of dollars in infrastructure spend.
1727
01:13:54,160 --> 01:14:00,160
The real value is not any individual plug-in. It is the reusable infrastructure and the patterns that enable rapid innovation across dozens of plug-ins.
1728
01:14:00,160 --> 01:14:05,160
You have built a platform. New capabilities that used to take three months to deliver now take three weeks
1729
01:14:05,160 --> 01:14:09,160
because the foundation is solid. The foundation handles the scaling, the security and the monitoring.
1730
01:14:09,160 --> 01:14:13,160
Teams building on top can finally focus on business logic instead of infrastructure.
1731
01:14:13,160 --> 01:14:19,160
This is what separates organizations that just use AI tools from organizations that build AI platforms.
1732
01:14:19,160 --> 01:14:23,160
You have the blueprint. You understand how Flex Consumption eliminates cold starts.
1733
01:14:23,160 --> 01:14:27,160
You know how to write OpenAPI specs that the LLM actually understands.
1734
01:14:27,160 --> 01:14:30,160
You have learned to secure your work through APIM and managed identities.
1735
01:14:30,160 --> 01:14:34,160
You have seen how to scale from one plug-in to an entire enterprise platform.
1736
01:14:34,160 --> 01:14:38,160
Start with one high value workflow. Do not pick the easiest problem.
1737
01:14:38,160 --> 01:14:42,160
Pick the one that matters most to your business. Validate the architecture on something real.
1738
01:14:42,160 --> 01:14:45,160
That first plug-in will teach you what works and what does not.
1739
01:14:45,160 --> 01:14:49,160
It will show you the operational reality behind these design principles.
1740
01:14:49,160 --> 01:14:52,160
Invest in OpenAPI specs and API Management early.
1741
01:14:52,160 --> 01:14:55,160
They might feel like overhead when you are shipping your first plug-in.
1742
01:14:55,160 --> 01:14:59,160
By your third plug-in they will have paid for themselves through reduced maintenance and faster iteration.
1743
01:14:59,160 --> 01:15:03,160
Monitor everything obsessively. You cannot optimize what you cannot measure.
1744
01:15:03,160 --> 01:15:07,160
Latency, errors, cold starts and cache hit rates all matter.
1745
01:15:07,160 --> 01:15:09,160
If it matters operationally you need to measure it.
1746
01:15:09,160 --> 01:15:12,160
Let the data guide your next move. Connect with the community.
1747
01:15:12,160 --> 01:15:15,160
Share your patterns and learn from the mistakes of others.
1748
01:15:15,160 --> 01:15:19,160
The Fusion developer model is still emerging. Teams are figuring this out in parallel right now.
1749
01:15:19,160 --> 01:15:21,160
What you learn becomes valuable to others.
1750
01:15:21,160 --> 01:15:24,160
And what they learn will accelerate your next iteration.
1751
01:15:24,160 --> 01:15:30,160
Subscribe to the M365.FM podcast for the latest on Copilot, Azure and the future of enterprise AI.

Founder of m365.fm, m365.show and m365con.net
Mirko Peters is a Microsoft 365 expert, content creator, and founder of m365.fm, a platform dedicated to sharing practical insights on modern workplace technologies. His work focuses on Microsoft 365 governance, security, collaboration, and real-world implementation strategies.
Through his podcast and written content, Mirko provides hands-on guidance for IT professionals, architects, and business leaders navigating the complexities of Microsoft 365. He is known for translating complex topics into clear, actionable advice, often highlighting common mistakes and overlooked risks in real-world environments.
With a strong emphasis on community contribution and knowledge sharing, Mirko is actively building a platform that connects experts, shares experiences, and helps organizations get the most out of their Microsoft 365 investments.









