Most organizations think “HR automation” means a chatbot glued to a SharePoint folder full of PDFs. They’re wrong. That setup doesn’t automate HR. It accelerates confident nonsense — without evidence, without control, and without a defensible decision trail. Meanwhile the real costs compound quietly:
- Screening bias you can’t explain
- Ticket backlogs that never shrink
- Onboarding that drags for weeks
- Audits that turn into archaeology
No “prompt better” optimism. We’re building governed workflows — screening, triage, onboarding — using Copilot Studio as the brain, Logic Apps as the muscle, and evidence captured by default. If it can’t survive compliance, scale, and scrutiny — it doesn’t ship.

Subscribe + Episode Contract

If you’re scaling HR agents without turning your tenant into a policy crime scene, subscribe to M365 FM. That’s the contract here: Production-grade architecture.
Repeatable patterns.
Defensible design.

This is not a feature tour.
Not legal advice.
And definitely not “prompt engineering theater.”

We’ll walk three governed use cases end-to-end:

• Candidate screening with bias and escalation controls
• HR ticket triage with measurable deflection
• Onboarding orchestration that survives retries and long-running state

But first — we need to redefine what an HR agent actually is. Because it’s not a chatbot.

HR Agents Aren’t Chatbots

A chatbot answers questions. An HR agent makes decisions. Screen or escalate.
Route or resolve.
Approve or reject.
Provision or pause.

The moment an LLM executes decisions without a controlled action-space and an evidence trail, you don’t have automation. You have conditional chaos. The lever isn’t “smarter AI.” The lever is determinism:
- What actions are allowed
- Under which identity
- With which inputs
- With which guardrails
- Logged how
Logic Apps Standard = Muscle
MCP = Tool contract
Dataverse = Durable memory
Azure Monitor = Operational truth
Entra = Identity boundary

Conversation reasons.
Tools enforce.
State persists.
Logs prove.

If you collapse those layers, you lose governance. If you separate them, you get scale.

Governance = Action Control

Governance in agentic HR isn’t a committee. It’s action control. Action-space is everything the agent can do. Not say.
Do. Every tool must have:
- Identity
- Policy gates
- Telemetry
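A minimal sketch of that per-tool contract: identity bound, policy gates checked, telemetry emitted on every call. All names here (ToolDefinition, invokeTool, the gate shape) are hypothetical illustrations, not a real Copilot Studio or Logic Apps API:

```typescript
// Hypothetical sketch: a tool is only callable through a contract that binds
// identity, checks policy gates, and emits correlated telemetry on every call.
interface ToolDefinition {
  name: string;
  identity: string; // managed identity / service principal this tool runs as
  gates: Array<(input: unknown) => string | null>; // return a deny reason, or null to pass
  execute: (input: unknown) => Promise<unknown>;
}

async function invokeTool(tool: ToolDefinition, input: unknown, correlationId: string): Promise<unknown> {
  for (const gate of tool.gates) {
    const denyReason = gate(input);
    if (denyReason) {
      // Telemetry on denial: the constraint itself leaves evidence.
      console.log(JSON.stringify({ correlationId, tool: tool.name, outcome: "denied", denyReason }));
      throw new Error(`Policy gate denied ${tool.name}: ${denyReason}`);
    }
  }
  const output = await tool.execute(input);
  // Telemetry on success: who (identity), what (tool), correlated to the case.
  console.log(JSON.stringify({ correlationId, tool: tool.name, identity: tool.identity, outcome: "success" }));
  return output;
}
```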
No policy → no constraint.
No telemetry → no defensibility.

HR doesn’t run on hope.

Human-in-the-Loop = Circuit Breaker

Human-in-the-loop isn’t humility. It’s a circuit breaker. Confidence drops?
Policy risk triggered?
Irreversible action pending?

Stop. Create an approval artifact.
Package evidence.
Record reason code.
Proceed only after decision.

If the workflow keeps running, it isn’t HITL. It’s a notification.

Observability

If someone asks what happened, you should not investigate. You should retrieve. Audit-grade observability means:
- Prompt context captured
- Retrieval sources logged
- Tool calls correlated
- State transitions recorded
- Human overrides documented
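A minimal sketch of what a correlated evidence record could look like, so retrieval is a filter, not an investigation. EvidenceRecord and caseHistory are hypothetical names, not a Dataverse or Azure Monitor API:

```typescript
// Hypothetical evidence record: one row per tool call, correlated across the
// whole chain so "show me exactly what happened" is a lookup, not a hunt.
interface EvidenceRecord {
  correlationId: string;        // survives across every layer
  sessionId: string;            // Copilot Studio conversation session
  mcpTool: string;              // MCP tool name invoked
  logicAppRunId: string;        // Logic Apps run identifier
  dataverseTransactionId?: string;
  retrievedSources: Array<{ documentId: string; versionId: string }>;
  humanAction?: { approver: string; decision: "approve" | "reject" | "override"; reasonCode: string };
  timestamp: string;            // ISO 8601, so string sort is chronological
}

// Retrieval, not investigation: the full case history is a filter plus a sort.
function caseHistory(store: EvidenceRecord[], correlationId: string): EvidenceRecord[] {
  return store
    .filter(r => r.correlationId === correlationId)
    .sort((a, b) => a.timestamp.localeCompare(b.timestamp));
}
```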
1. Candidate Screening

Proxy minimization.
Confidence gates.
Recorded approvals.
Defensible shortlist.

2. HR Ticket Triage

High-volume operational system. Deterministic classification.
Scoped knowledge retrieval.
Tier 1 auto-resolution.
Escalation with context package.
Measurable deflection.

3. Intelligent Onboarding

Long-running orchestration system. Offer accepted event.
Durable state in Dataverse.
Provisioning via managed identity.
Idempotent workflows.
Milestone tracking to Day-30.

No double provisioning.
No silent failure.
No ritual automation.
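A minimal sketch of what “no double provisioning” means mechanically: read durable state first, act only if the milestone is still pending. The helpers (readCase, markDone, createAccount) are hypothetical stand-ins for Dataverse reads/writes and the provisioning call:

```typescript
// Hypothetical idempotent provisioning step: durable state, not chat memory,
// decides whether the action already happened, so retries and replays are safe.
interface OnboardingCase {
  caseId: string;
  milestones: Record<string, "pending" | "done">;
}

async function provisionAccount(
  caseId: string,
  readCase: (id: string) => Promise<OnboardingCase>,          // stand-in for a Dataverse read
  markDone: (id: string, milestone: string) => Promise<void>, // stand-in for a Dataverse write
  createAccount: () => Promise<void>,                         // the irreversible side effect
): Promise<void> {
  const current = await readCase(caseId);                 // always read current state first
  if (current.milestones["account"] === "done") return;   // replay-safe: already provisioned
  await createAccount();
  await markDone(caseId, "account");                      // record the transition durably
}
```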
Reliability Reality

Agentic HR fails because distributed systems fail. So you design for:

Idempotency — safe retries
Dead-letter paths — visible failure
State ownership — not chat memory
Versioned rubrics — controlled change
Kill switch — fast disable

Reliability isn’t uptime. It’s controlled repetition.

ROI That Actually Matters

Scale doesn’t come from smarter AI. Scale comes from fewer exceptions. Measure what matters:

Ticket triage:
- Deflection rate
- Auto-resolve percent
- Reopen rate
- Human touches per case
Onboarding:

- Day-one ready rate
- Provisioning retry count
- Milestone completion time
Screening:

- Review time per candidate
- Borderline rate
- Override frequency
- Consistency across rubric versions
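A minimal sketch of how the triage numbers above fall out of structured case records; TicketCase and triageMetrics are hypothetical names, not a real reporting API:

```typescript
// Hypothetical metric rollup over structured ticket cases.
interface TicketCase {
  autoResolved: boolean;   // closed by the agent at Tier 1
  reopened: boolean;       // came back after "resolution"
  humanTouches: number;    // manual interventions on the case
}

function triageMetrics(cases: TicketCase[]) {
  const n = cases.length || 1; // avoid division by zero on an empty set
  return {
    deflectionRate: cases.filter(c => c.autoResolved && !c.reopened).length / n,
    autoResolvePercent: (100 * cases.filter(c => c.autoResolved).length) / n,
    reopenRate: cases.filter(c => c.reopened).length / n,
    humanTouchesPerCase: cases.reduce((sum, c) => sum + c.humanTouches, 0) / n,
  };
}
```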
Rollout order:

- Start with Ticket Triage
- Add Onboarding Orchestration
- Deploy Candidate Screening last
High-risk automation last.

Dev → Test → Prod with policy parity.

Per-tool managed identities.
Scoped permissions.
Minimal PII in prompts.
Structured evidence in Dataverse.

Final Message

Most companies try to scale HR with smarter prompts. The ones that succeed scale it with safer systems. Fewer exceptions.
Fewer hidden permissions.
Fewer invisible overrides. Scale is not smarter AI. Scale is controlled action-space. If you want architectures that survive production — not demos — subscribe to M365 FM. And if your HR agent failed in a spectacular way, connect with Mirko Peters on LinkedIn and send it. We’ll dissect it.
Become a supporter of this podcast: https://www.spreaker.com/podcast/m365-fm-modern-work-security-and-productivity-with-microsoft-365--6704921/support.
If this clashes with how you’ve seen it play out, I’m always curious. I use LinkedIn for the back-and-forth.
Transcript

1
00:00:00,000 --> 00:00:05,200
Most organizations think HR automation means a chatbot glued to a SharePoint folder full of PDFs.
2
00:00:05,200 --> 00:00:10,000
They are wrong. That setup doesn't automate HR, it just speeds up the production of confident nonsense,
3
00:00:10,000 --> 00:00:14,700
and it does it without evidence, without controls, and without a defensible decision trail.
4
00:00:14,700 --> 00:00:17,100
Meanwhile, the real costs pile up quietly.
5
00:00:17,100 --> 00:00:22,200
Screening bias you can't explain, ticket backlogs that never shrink, onboarding that drags for weeks
6
00:00:22,200 --> 00:00:24,100
and audits that turn into archaeology.
7
00:00:24,100 --> 00:00:29,600
This episode is about moving from passive HR data to deterministic HR decisions.
8
00:00:29,600 --> 00:00:33,300
Not magical thinking, governed use cases, a reproducible stack,
9
00:00:33,300 --> 00:00:37,000
and a design that survives contact with compliance, security, and scale.
10
00:00:37,000 --> 00:00:40,800
We're going to build three workflows, screening, triage, onboarding,
11
00:00:40,800 --> 00:00:46,100
using Copilot Studio as the brain and Logic Apps as the muscle, with evidence captured by default.
12
00:00:46,100 --> 00:00:50,800
If you're trying to scale HR agents without turning your tenant into a policy crime scene,
13
00:00:50,800 --> 00:00:56,100
subscribe to M365 FM. That's the contract here: practical architecture that holds up in production.
14
00:00:56,100 --> 00:00:57,500
This episode is a blueprint.
15
00:00:57,500 --> 00:01:00,900
Control plane thinking, repeatable patterns you can apply in your own environment.
16
00:01:00,900 --> 00:01:06,900
It is not a feature tour, it is not legal advice, and it is definitely not "just prompt better" dressed up as engineering.
17
00:01:06,900 --> 00:01:10,200
We'll walk three governed use cases end to end.
18
00:01:10,200 --> 00:01:13,500
A candidate screening agent with bias and escalation controls,
19
00:01:13,500 --> 00:01:17,200
an HR ticket triage agent designed for measurable deflection,
20
00:01:17,200 --> 00:01:21,700
and an onboarding orchestrator that survives long running state and retries.
21
00:01:21,700 --> 00:01:27,200
Now we need to define what an HR agent actually is in system terms.
22
00:01:27,200 --> 00:01:31,200
The foundational misunderstanding, HR agents aren't chatbots.
23
00:01:31,200 --> 00:01:34,900
The foundational mistake is treating an HR agent like a conversation experience.
24
00:01:34,900 --> 00:01:39,900
A chatbot is a user interface, it answers questions, it might search a knowledge base,
25
00:01:39,900 --> 00:01:45,900
it might summarize a document, and in most organizations it lives in the same conceptual bucket as self-service.
26
00:01:45,900 --> 00:01:48,200
Helpful, optional, low stakes.
27
00:01:48,200 --> 00:01:52,800
An HR agent is not that, an HR agent is a distributed decision engine with tool access.
28
00:01:52,800 --> 00:01:56,200
That distinction matters because HR work isn't mainly about talking.
29
00:01:56,200 --> 00:01:58,000
It's about decisions and actions.
30
00:01:58,000 --> 00:02:02,800
Screen or escalate, route or resolve, approve or reject, provision or pause.
31
00:02:02,800 --> 00:02:07,600
And the moment an LLM touches those decisions without a controlled action space and an evidence trail,
32
00:02:07,600 --> 00:02:11,200
you have converted your HR operations into conditional chaos.
33
00:02:11,200 --> 00:02:14,200
Most people try to make the model smarter, but that's the wrong lever.
34
00:02:14,200 --> 00:02:17,800
Smarter isn't safer. Smarter just fails in more creative ways.
35
00:02:17,800 --> 00:02:20,900
The lever that matters is determinism, what actions are allowed,
36
00:02:20,900 --> 00:02:25,000
under which identities, with which inputs, with which guardrails and with which logs.
37
00:02:25,000 --> 00:02:30,600
If the system can't prove what it did and why it did it, it didn't do HR work, it generated text.
38
00:02:30,600 --> 00:02:34,500
Here's the uncomfortable truth, HR is where entropy wins by default.
39
00:02:34,500 --> 00:02:37,000
Why? Because HR policy is never a single policy.
40
00:02:37,000 --> 00:02:40,900
It's a patchwork: regional rules, union constraints, job families,
41
00:02:40,900 --> 00:02:44,100
exceptions for executives, accommodations, immigration timelines,
42
00:02:44,100 --> 00:02:47,900
and the one-off "we did it this way last time" that never got documented.
43
00:02:47,900 --> 00:02:52,900
Every exception is an entropy generator. Every undocumented exception becomes folklore.
44
00:02:52,900 --> 00:02:57,200
Over time, policy drifts away from intent. A passive chatbot amplifies that drift.
45
00:02:57,200 --> 00:03:01,900
It learns the wrong thing from the wrong artifact, and it presents it with a confident tone.
46
00:03:01,900 --> 00:03:06,500
And then humans defer to it because it sounds consistent and consistency feels like authority.
47
00:03:06,500 --> 00:03:11,300
That's how bias creeps in, too. Not from bad prompts, from systems behavior.
48
00:03:11,300 --> 00:03:17,600
A screening flow that uses unstructured criteria, vague scoring and inconsistent overrides will converge on proxies.
49
00:03:17,600 --> 00:03:21,600
It will favor the attributes that correlate with historical outcomes, not job relevance.
50
00:03:21,600 --> 00:03:25,000
If you can't show the rubric, the rationale, and the override reason codes,
51
00:03:25,000 --> 00:03:29,000
you can't defend the process. You also can't improve it because you have no instrumentation.
52
00:03:29,000 --> 00:03:32,500
So in architectural terms, HR agents are not probabilistic assistants.
53
00:03:32,500 --> 00:03:36,600
They are controlled workflow compilers. Copilot Studio should capture intent.
54
00:03:36,600 --> 00:03:40,000
What the user is trying to do, what the case context is, what the constraints are,
55
00:03:40,000 --> 00:03:44,100
and what the next safe action is. Then it should call tools that execute deterministically.
56
00:03:44,100 --> 00:03:47,200
The model can reason, but the tools must enforce.
57
00:03:47,200 --> 00:03:49,500
The system must separate "decide" from "do."
58
00:03:49,500 --> 00:03:52,100
If you collapse those layers, you get the worst outcome.
59
00:03:52,100 --> 00:03:57,200
A probabilistic system taking irreversible actions, and then you retroactively try to explain it.
60
00:03:57,200 --> 00:03:59,600
That's not governance, that's storytelling after the fact.
61
00:03:59,600 --> 00:04:02,200
This is why the action space matters more than the prompt.
62
00:04:02,200 --> 00:04:06,000
Action space is the total set of operations the agent can perform.
63
00:04:06,000 --> 00:04:10,500
Read these sources, write these records, send these messages, schedule these meetings,
64
00:04:10,500 --> 00:04:12,600
trigger these workflows, and nothing else.
65
00:04:12,600 --> 00:04:17,200
And every single action in that space needs three things, identity, policy and telemetry.
66
00:04:17,200 --> 00:04:20,400
Identity means the action happens under a real boundary.
67
00:04:20,400 --> 00:04:24,500
A managed identity, a service principal, or an on-behalf-of user context,
68
00:04:24,500 --> 00:04:27,500
something you can constrain with Entra and Conditional Access.
69
00:04:27,500 --> 00:04:30,900
If the agent can act as whoever, then nobody owns the outcome.
70
00:04:30,900 --> 00:04:33,300
Policy means the action has preconditions.
71
00:04:33,300 --> 00:04:38,800
Confidence thresholds, required approvals, allowed fields, and explicit deny rules.
72
00:04:38,800 --> 00:04:41,200
Not "please be careful." Real gates.
73
00:04:41,200 --> 00:04:45,500
Telemetry means you capture the evidence, what prompt came in, what sources were retrieved,
74
00:04:45,500 --> 00:04:51,500
what tool calls were executed, what data changed, who approved overrides, and correlation IDs across the chain.
75
00:04:51,500 --> 00:04:55,300
Without that, your audit posture is hope, and HR doesn't get to run on hope.
76
00:04:55,300 --> 00:04:59,000
Now you might be thinking, "But we just want to reduce tickets and speed up onboarding."
77
00:04:59,000 --> 00:05:00,800
Good, those are exactly the right goals.
78
00:05:00,800 --> 00:05:04,800
But to get there, you have to stop building chatbots and start building governed agents,
79
00:05:04,800 --> 00:05:08,100
systems where the conversational layer is the control surface,
80
00:05:08,100 --> 00:05:11,300
and the automation layer is the enforcement mechanism.
81
00:05:11,300 --> 00:05:14,200
Once you internalize that, the architecture becomes obvious.
82
00:05:14,200 --> 00:05:19,000
Copilot Studio orchestrates, Logic Apps executes, Dataverse holds state, Monitor holds truth,
83
00:05:19,000 --> 00:05:22,300
and now we can build something that scales without lying to you.
84
00:05:22,300 --> 00:05:24,800
The Target Architecture: Copilot Studio
85
00:05:24,800 --> 00:05:26,700
as Brain, Logic Apps as Muscle.
86
00:05:26,700 --> 00:05:29,400
Here's the Target Architecture stripped of marketing language.
87
00:05:29,400 --> 00:05:32,400
Copilot Studio is the Brain. Logic Apps Standard is the Muscle.
88
00:05:32,400 --> 00:05:38,000
The MCP server is the contract that stops the Brain from hallucinating capabilities it shouldn't have.
89
00:05:38,000 --> 00:05:39,700
Dataverse is memory with structure.
90
00:05:39,700 --> 00:05:44,400
Azure Monitor and Log Analytics are the part nobody wants to pay for right until the first incident.
91
00:05:44,400 --> 00:05:45,900
Start with Copilot Studio.
92
00:05:45,900 --> 00:05:47,900
Its job isn't answering questions.
93
00:05:47,900 --> 00:05:50,700
Its job is intent capture and orchestration.
94
00:05:50,700 --> 00:05:55,900
It runs the conversation, asks for missing parameters and decides which tool call is allowed next based on context.
95
00:05:55,900 --> 00:05:58,700
Think of it like a control surface for a workflow compiler.
96
00:05:58,700 --> 00:06:00,200
The user describes what they want.
97
00:06:00,200 --> 00:06:03,200
The agent translates that into a constrained plan,
98
00:06:03,200 --> 00:06:06,500
and then it delegates execution to deterministic tools.
99
00:06:06,500 --> 00:06:07,600
That last part matters.
100
00:06:07,600 --> 00:06:10,500
Copilot Studio should not be the place where you do HR.
101
00:06:10,500 --> 00:06:14,800
It should be the place where you decide what to do next and where you enforce the conversational policy,
102
00:06:14,800 --> 00:06:19,900
what information you're allowed to request, what you must disclose, and when you must stop and escalate.
103
00:06:19,900 --> 00:06:23,000
This is also where you group actions into governed tool sets.
104
00:06:23,000 --> 00:06:27,500
Candidate screening actions are not the same risk profile as ticket triage actions.
105
00:06:27,500 --> 00:06:31,900
If you blend them into one giant HR super agent, you've built an entropy engine.
106
00:06:31,900 --> 00:06:33,900
Then Logic Apps Standard.
107
00:06:33,900 --> 00:06:35,800
Logic Apps is where actions happen.
108
00:06:35,800 --> 00:06:43,000
Calling Microsoft Graph, writing to Dataverse, sending approval emails, routing tickets, generating documents, and triggering downstream systems.
109
00:06:43,000 --> 00:06:44,300
And the keyword is standard.
110
00:06:44,300 --> 00:06:48,000
For HR, you want the single tenant posture and enterprise controls,
111
00:06:48,000 --> 00:06:52,100
managed identities, network isolation, and predictable monitoring.
112
00:06:52,100 --> 00:06:54,400
Not because the multi-tenant model is bad,
113
00:06:54,400 --> 00:06:58,100
but because HR data makes you pay for shared assumptions.
114
00:06:58,100 --> 00:07:02,600
Logic Apps Standard also forces a discipline most HR automations never have.
115
00:07:02,600 --> 00:07:04,600
Explicit workflow boundaries.
116
00:07:04,600 --> 00:07:06,200
Each workflow becomes a tool.
117
00:07:06,200 --> 00:07:07,700
Each tool has an input schema.
118
00:07:07,700 --> 00:07:09,200
Each tool has a permission model.
119
00:07:09,200 --> 00:07:13,100
That's exactly what you need if you want a system that behaves deterministically under pressure.
120
00:07:13,100 --> 00:07:15,300
Now, drop the MCP server into the middle.
121
00:07:15,300 --> 00:07:19,500
In the demo architecture, the MCP server is effectively Logic Apps exposing tools.
122
00:07:19,500 --> 00:07:20,700
That's the right mental model.
123
00:07:20,700 --> 00:07:22,500
MCP is your tool interface contract.
124
00:07:22,500 --> 00:07:26,700
It's how you describe in plain terms what a tool does and what payload it expects.
125
00:07:26,700 --> 00:07:30,900
So the model can map natural language into structured calls, but the trick is not the convenience.
126
00:07:30,900 --> 00:07:33,200
The trick is that MCP gives you a choke point.
127
00:07:33,200 --> 00:07:36,800
You decide which tools exist, how they're named, what parameters they accept,
128
00:07:36,800 --> 00:07:38,600
and which identity can invoke them.
129
00:07:38,600 --> 00:07:40,600
You stop letting the model invent actions.
130
00:07:40,600 --> 00:07:43,000
You force it to choose from a menu you own.
131
00:07:43,000 --> 00:07:46,600
Dataverse is next, and it's not optional if you want audit-grade operations.
132
00:07:46,600 --> 00:07:51,300
You need structured state, candidate records, screening rubrics, score artifacts, ticket cases,
133
00:07:51,300 --> 00:07:54,600
onboarding milestones, approval objects, override reason codes.
134
00:07:54,600 --> 00:07:58,600
If the state lives in chat transcripts and email threads, you don't have state.
135
00:07:58,600 --> 00:08:00,500
You have vibes.
136
00:08:00,500 --> 00:08:04,000
Dataverse also gives you something critical for long-running workflows.
137
00:08:04,000 --> 00:08:05,000
Durability.
138
00:08:05,000 --> 00:08:07,600
Onboarding is not a single request-response.
139
00:08:07,600 --> 00:08:10,300
It's days of waiting on people, systems, and failures.
140
00:08:10,300 --> 00:08:14,900
You need a system of record that survives retries and replays without duplicating actions.
141
00:08:14,900 --> 00:08:17,800
Then observability: Azure Monitor and Log Analytics.
142
00:08:17,800 --> 00:08:20,900
This is where you capture operational truth across the chain.
143
00:08:20,900 --> 00:08:25,100
Not just "the agent responded." You need the conversation session ID,
144
00:08:25,100 --> 00:08:27,700
the MCP tool call, the Logic App Run ID,
145
00:08:27,700 --> 00:08:30,000
the Dataverse transaction, and the outcome.
146
00:08:30,000 --> 00:08:33,000
Correlate them, store them, decide retention deliberately.
147
00:08:33,000 --> 00:08:37,800
Because without correlation IDs, you cannot answer the only question auditors ask that matters.
148
00:08:37,800 --> 00:08:39,400
Show me exactly what happened.
149
00:08:39,400 --> 00:08:41,600
Finally, Entra sits across everything as enforcement.
150
00:08:41,600 --> 00:08:44,100
The agent is not a person, it's an identity boundary.
151
00:08:44,100 --> 00:08:47,400
Managed identity for Logic Apps, service principal for tool access,
152
00:08:47,400 --> 00:08:50,700
conditional access where appropriate, least privilege on Graph scopes.
153
00:08:50,700 --> 00:08:55,200
You don't secure the agent by writing stern instructions in the system prompt.
154
00:08:55,200 --> 00:08:58,100
You secure it by making unauthorized actions impossible.
155
00:08:58,100 --> 00:08:59,300
That's the architecture.
156
00:08:59,300 --> 00:09:01,500
Copilot Studio reasons and routes.
157
00:09:01,500 --> 00:09:03,300
MCP defines capability.
158
00:09:03,300 --> 00:09:05,700
Logic Apps executes with identity and isolation.
159
00:09:05,700 --> 00:09:08,200
Dataverse holds state, Monitor proves behavior,
160
00:09:08,200 --> 00:09:09,700
Entra enforces boundaries.
161
00:09:09,700 --> 00:09:13,700
Now we can talk about why Logic App Standard is the default for HR.
162
00:09:13,700 --> 00:09:15,500
Why Logic App Standard for HR?
163
00:09:15,500 --> 00:09:17,800
Isolation, identity and auditability.
164
00:09:17,800 --> 00:09:19,900
Logic App Standard is not a premium option.
165
00:09:19,900 --> 00:09:24,200
For HR, it's the baseline if you intend to keep PII inside controllable boundaries
166
00:09:24,200 --> 00:09:25,600
and still sleep at night.
167
00:09:25,600 --> 00:09:29,100
Most people pick consumption because it's easy, serverless, and it demos well.
168
00:09:29,100 --> 00:09:31,200
But HR workflows don't fail in the demo.
169
00:09:31,200 --> 00:09:35,200
They fail in production, under load, during audits after someone forwards a URL
170
00:09:35,200 --> 00:09:38,300
or when a maker copies a workflow into the wrong environment
171
00:09:38,300 --> 00:09:40,300
and nobody notices for six months.
172
00:09:40,300 --> 00:09:43,400
Standard changes the operating model in three ways that matter.
173
00:09:43,400 --> 00:09:46,900
Isolation, identity and auditability. Start with isolation.
174
00:09:46,900 --> 00:09:50,300
HR data has a habit of escaping through convenience.
175
00:09:50,300 --> 00:09:53,000
Consumption runs in a shared multi-tenant model.
176
00:09:53,000 --> 00:09:55,100
That doesn't mean it's insecure by definition.
177
00:09:55,100 --> 00:09:57,800
It means you are accepting shared infrastructure assumptions
178
00:09:57,800 --> 00:10:01,200
at the exact moment you're processing resumes, employee records,
179
00:10:01,200 --> 00:10:04,900
and hiring decisions that regulators now classify as high risk.
180
00:10:04,900 --> 00:10:08,300
You are choosing probabilistic comfort over deterministic control.
181
00:10:08,300 --> 00:10:09,700
Standard is single-tenant.
182
00:10:09,700 --> 00:10:11,100
That changes the blast radius.
183
00:10:11,100 --> 00:10:13,100
You get a dedicated runtime you can lock down
184
00:10:13,100 --> 00:10:15,500
and you can move the workflow behind private network boundaries.
185
00:10:15,500 --> 00:10:16,500
And yes, that matters.
186
00:10:16,500 --> 00:10:18,500
Even if your users sit in Teams all day,
187
00:10:18,500 --> 00:10:20,700
your agent doesn't care about your user experience,
188
00:10:20,700 --> 00:10:22,500
it cares about its callable endpoints.
189
00:10:22,500 --> 00:10:25,900
HR workflows should not be public endpoints with hopes attached.
190
00:10:25,900 --> 00:10:28,100
Private endpoints and vnet integration
191
00:10:28,100 --> 00:10:31,000
let you make the MCP tool surface non-public by design.
192
00:10:31,000 --> 00:10:32,800
You can front it with controlled ingress.
193
00:10:32,800 --> 00:10:35,600
You can restrict who can call it and you can stop pretending
194
00:10:35,600 --> 00:10:39,300
that a secret URL parameter is an authentication strategy.
195
00:10:39,300 --> 00:10:40,800
Consumption gives you fewer options here.
196
00:10:40,800 --> 00:10:43,600
Standard gives you an architecture that can actually be constrained.
197
00:10:43,600 --> 00:10:44,800
Now identity.
198
00:10:44,800 --> 00:10:48,100
HR automation fails when everything runs under one shared connection
199
00:10:48,100 --> 00:10:49,300
that nobody owns.
200
00:10:49,300 --> 00:10:51,600
Standard plays well with managed identities.
201
00:10:51,600 --> 00:10:53,500
That means your logic app can authenticate
202
00:10:53,500 --> 00:10:56,900
to downstream services without embedding secrets in connection strings
203
00:10:56,900 --> 00:11:00,200
and without spreading credentials across environments like confetti.
204
00:11:00,200 --> 00:11:01,900
Managed identity isn't a nice to have.
205
00:11:01,900 --> 00:11:04,800
It's the only honest answer to "who performed this action,"
206
00:11:04,800 --> 00:11:07,500
because in HR, "the system did it" is not an explanation.
207
00:11:07,500 --> 00:11:09,300
It's an evasion.
208
00:11:09,300 --> 00:11:11,900
With standard, you can run each workflow as a tool
209
00:11:11,900 --> 00:11:13,600
with its own identity posture.
210
00:11:13,600 --> 00:11:15,800
You can separate "read candidate profile"
211
00:11:15,800 --> 00:11:18,600
from "write scoring artifact" from "send offer letter"
212
00:11:18,600 --> 00:11:20,100
and assign different permissions.
213
00:11:20,100 --> 00:11:22,400
That's least privilege in practice, not a slide deck.
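A minimal sketch of that separation, with hypothetical tool names, identities, and scopes (the scope strings are illustrative, not real Microsoft Graph permissions):

```typescript
// Hypothetical least-privilege map: each workflow-as-tool gets its own
// identity and only the scopes that action needs. Read and write never share.
const toolPermissions: Record<string, { identity: string; scopes: string[] }> = {
  "read-candidate-profile": { identity: "mi-screening-read",  scopes: ["candidates.read"] },
  "write-scoring-artifact": { identity: "mi-screening-write", scopes: ["scores.write"] },
  "send-offer-letter":      { identity: "mi-offer-send",      scopes: ["mail.send", "offers.write"] },
};
```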
214
00:11:22,400 --> 00:11:23,800
And once you have identity,
215
00:11:23,800 --> 00:11:26,800
Entra becomes an enforcement layer instead of a directory.
216
00:11:26,800 --> 00:11:29,600
Conditional access can constrain who can invoke what
217
00:11:29,600 --> 00:11:31,800
from where and under what risk conditions.
218
00:11:31,800 --> 00:11:35,800
But it only works if your tool calls have real identities to bind to.
219
00:11:35,800 --> 00:11:38,000
Then there's auditability, which is where most
220
00:11:38,000 --> 00:11:40,000
agentic HR projects die quietly.
221
00:11:40,000 --> 00:11:43,800
If your run history captures PII and you don't configure guardrails,
222
00:11:43,800 --> 00:11:45,800
you've built a compliance incident logger.
223
00:11:45,800 --> 00:11:48,000
Logic apps will happily store inputs and outputs.
224
00:11:48,000 --> 00:11:49,600
That's useful when you're debugging.
225
00:11:49,600 --> 00:11:53,000
It's catastrophic when those inputs include candidate resumes,
226
00:11:53,000 --> 00:11:55,800
medical accommodation notes or anything that triggers retention
227
00:11:55,800 --> 00:11:56,800
requirements.
228
00:11:56,800 --> 00:11:58,800
Standard doesn't magically fix that,
229
00:11:58,800 --> 00:12:01,800
but it gives you the enterprise posture to handle it properly.
230
00:12:01,800 --> 00:12:04,600
Secure inputs and outputs, controlled storage,
231
00:12:04,600 --> 00:12:07,600
and predictable diagnostic routing into Log Analytics.
232
00:12:07,600 --> 00:12:11,200
You choose what evidence you keep, where it lives and how long it survives.
233
00:12:11,200 --> 00:12:14,600
You don't discover your retention policy during an incident review.
234
00:12:14,600 --> 00:12:16,200
And here's the unpleasant part.
235
00:12:16,200 --> 00:12:18,200
Logic Apps doesn't give you native DLP.
236
00:12:18,200 --> 00:12:20,200
That's not a bug. It's the design reality.
237
00:12:20,200 --> 00:12:21,600
So you compensate with architecture.
238
00:12:21,600 --> 00:12:23,800
You enforce data minimization by design.
239
00:12:23,800 --> 00:12:25,600
You avoid putting raw PII into prompts.
240
00:12:25,600 --> 00:12:27,600
You pass identifiers, not payloads.
241
00:12:27,600 --> 00:12:31,400
You retrieve the sensitive data inside the tool boundary under identity
242
00:12:31,400 --> 00:12:35,400
and you store structured artifacts in Dataverse with explicit access controls.
243
00:12:35,400 --> 00:12:37,400
You treat the chat transcript as a control surface,
244
00:12:37,400 --> 00:12:38,600
not as a data lake.
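A minimal sketch of the identifiers-not-payloads pattern, assuming hypothetical helpers (fetchResume, scoreAgainstRubric) that run inside the tool's identity boundary:

```typescript
// Hypothetical boundary: the prompt layer only ever sees an identifier.
// The tool resolves PII under its own identity and returns a structured,
// minimal artifact instead of the raw document.
interface ScreeningArtifact { candidateId: string; rubricScore: number; gaps: string[] }

async function scoreCandidate(
  candidateId: string,                                  // identifier in, never a resume payload
  fetchResume: (id: string) => Promise<string>,         // runs under the tool's managed identity
  scoreAgainstRubric: (resume: string) => { score: number; gaps: string[] },
): Promise<ScreeningArtifact> {
  const resume = await fetchResume(candidateId);        // PII stays inside the tool boundary
  const { score, gaps } = scoreAgainstRubric(resume);
  return { candidateId, rubricScore: score, gaps };     // structured artifact out, stored with access controls
}
```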
245
00:12:38,600 --> 00:12:42,200
You also put Purview boundaries where they actually apply: on the data sources
246
00:12:42,200 --> 00:12:46,000
and repositories that Copilot can retrieve from, and on the storage locations
247
00:12:46,000 --> 00:12:47,600
where outputs land.
248
00:12:47,600 --> 00:12:49,200
The agent inherits your data hygiene.
249
00:12:49,200 --> 00:12:51,000
If your SharePoint permissions are a mess,
250
00:12:51,000 --> 00:12:53,400
Copilot will simply surface your mess faster.
251
00:12:53,400 --> 00:12:56,400
Standard is also where you get predictable operational telemetry.
252
00:12:56,400 --> 00:12:59,000
Consumption will scale, but you're still living inside a model
253
00:12:59,000 --> 00:13:03,400
where runs are scattered and visibility can degrade into "we think it happened."
254
00:13:03,400 --> 00:13:07,600
Standard makes it easier to treat this like any other production integration workload.
255
00:13:07,600 --> 00:13:12,800
Baselines, alerts, diagnostics, correlation IDs, and incident response.
256
00:13:12,800 --> 00:13:14,000
So the rule is simple.
257
00:13:14,000 --> 00:13:18,400
If the workflow touches candidate screening, onboarding, provisioning, or employee case data,
258
00:13:18,400 --> 00:13:21,000
standard is the default, not because it's fancy,
259
00:13:21,000 --> 00:13:24,800
because it's the only model that lets you enforce isolation, bind identity,
260
00:13:24,800 --> 00:13:27,400
and produce audit grade evidence without duct tape.
261
00:13:27,400 --> 00:13:30,600
And once you accept that, the governance layer stops being optional.
262
00:13:30,600 --> 00:13:34,400
It becomes the thing that keeps the agent from becoming a shadow admin with good grammar.
263
00:13:34,400 --> 00:13:38,400
Governance model, boundaries, roles, and the action space.
264
00:13:38,400 --> 00:13:39,800
Governance is not a committee.
265
00:13:39,800 --> 00:13:40,600
It's not a PDF.
266
00:13:40,600 --> 00:13:44,400
It's not an annual training module everyone clicks through while eating lunch.
267
00:13:44,400 --> 00:13:47,800
Governance is the set of constraints that make bad behavior impossible at scale.
268
00:13:47,800 --> 00:13:51,600
And in an agentic HR system, the governance unit is not the agent.
269
00:13:51,600 --> 00:13:53,400
It's the action space.
270
00:13:53,400 --> 00:13:57,200
Action space means the exact set of operations the agent can perform,
271
00:13:57,200 --> 00:13:59,800
which tools exist, what parameters they accept,
272
00:13:59,800 --> 00:14:02,400
what systems they can touch, and what they can change.
273
00:14:02,400 --> 00:14:05,800
Not what it can say, what it can do, because in HR words are cheap,
274
00:14:05,800 --> 00:14:08,400
actions create liability.
275
00:14:08,400 --> 00:14:10,600
This is where most deployments decay.
276
00:14:10,600 --> 00:14:15,400
They start with intent, help employees, reduce tickets, speed up hiring,
277
00:14:15,400 --> 00:14:17,400
and they end with capability sprawl.
278
00:14:17,400 --> 00:14:21,000
Somebody adds one more connector, one more action group, one more exception.
279
00:14:21,000 --> 00:14:24,200
Then six months later, you have an HR agent that can read too much,
280
00:14:24,200 --> 00:14:26,200
write too much, and nobody remembers why.
281
00:14:26,200 --> 00:14:29,600
So the governance model starts with boundaries. First boundary: environments.
282
00:14:29,600 --> 00:14:33,200
You don't test HR agents in production because you're "just trying something."
283
00:14:33,200 --> 00:14:36,800
You build in dev, validate and test, and promote to prod with policy parity.
284
00:14:36,800 --> 00:14:40,000
Otherwise, you're not shipping an agent, you're shipping drift.
285
00:14:40,000 --> 00:14:42,000
Second boundary: tool groups.
286
00:14:42,000 --> 00:14:45,600
Candidate screening tools don't live next to onboarding provisioning tools.
287
00:14:45,600 --> 00:14:48,400
Ticket triage tools don't live next to offer letter generation.
288
00:14:48,400 --> 00:14:50,000
Separation is not bureaucracy.
289
00:14:50,000 --> 00:14:52,000
Separation is blast radius management.
290
00:14:52,000 --> 00:14:54,400
Third boundary: data surfaces.
291
00:14:54,400 --> 00:14:58,400
SharePoint folders full of resumes with random naming and no metadata discipline
292
00:14:58,400 --> 00:14:59,400
are not a data source.
293
00:14:59,400 --> 00:15:00,400
They're an entropy source.
294
00:15:00,400 --> 00:15:04,400
If the agent retrieves from it, it will amplify whatever chaos you stored there.
295
00:15:04,400 --> 00:15:10,000
Governance means you curate sources, enforce permissions, and decide what grounding actually means for HR.
296
00:15:10,000 --> 00:15:11,000
Now roles.
297
00:15:11,000 --> 00:15:15,600
In most companies, HR automation fails because everyone shares superpowers.
298
00:15:15,600 --> 00:15:18,600
The recruiter can change workflows, the maker can publish agents,
299
00:15:18,600 --> 00:15:21,800
the platform admin can see everything, then everyone can do everything.
300
00:15:21,800 --> 00:15:23,000
And nobody owns anything.
301
00:15:23,000 --> 00:15:26,200
So define roles like the system will punish you if you get them wrong.
302
00:15:26,200 --> 00:15:29,200
Recruiters and hiring teams, they should initiate screening workflows,
303
00:15:29,200 --> 00:15:31,000
request summaries and trigger approvals.
304
00:15:31,000 --> 00:15:33,800
They should not edit the scoring rubric logic in production.
305
00:15:33,800 --> 00:15:35,200
They should not add new tools.
306
00:15:35,200 --> 00:15:36,800
They should not bypass gates.
307
00:15:36,800 --> 00:15:41,000
HR operations, they own case processes, ticket routing, onboarding milestones
308
00:15:41,000 --> 00:15:42,600
and service delivery metrics.
309
00:15:42,600 --> 00:15:45,000
They can tune routing rules and knowledge boundaries.
310
00:15:45,000 --> 00:15:48,000
They do not get to expand the action space without review
311
00:15:48,000 --> 00:15:53,200
because "just one more connector" is how you end up with payroll data in the wrong place.
312
00:15:53,200 --> 00:15:59,000
Platform admins, they own environments, identity plumbing, connector governance and logging.
313
00:15:59,000 --> 00:16:03,600
They should not be the people deciding hiring criteria or writing screening rubrics.
314
00:16:03,600 --> 00:16:08,200
When the same group controls policy and implementation, exceptions become invisible.
315
00:16:08,200 --> 00:16:12,400
And invisible exceptions are how bias becomes a systems feature. Security and compliance:
316
00:16:12,400 --> 00:16:14,000
They don't approve the agent here.
317
00:16:14,000 --> 00:16:16,600
They approve the action space and the evidence plan.
318
00:16:16,600 --> 00:16:17,600
What gets logged?
319
00:16:17,600 --> 00:16:19,800
Where it's retained, how correlation works,
320
00:16:19,800 --> 00:16:23,000
and how fast you can disable pathways when something goes wrong.
321
00:16:23,000 --> 00:16:26,000
Now enforce those roles with Entra, not good intentions.
322
00:16:26,000 --> 00:16:29,000
Every tool call should bind to an identity boundary.
323
00:16:29,000 --> 00:16:31,400
For Logic Apps, that means managed identities.
324
00:16:31,400 --> 00:16:35,800
For agent access, that means explicit permissions and conditional access where it matters.
325
00:16:35,800 --> 00:16:40,600
For Copilot Studio actions, it means no "everyone can use this agent" default.
326
00:16:40,600 --> 00:16:45,000
You publish to audiences deliberately and you treat broad access as a privileged event.
327
00:16:45,000 --> 00:16:47,200
And you build least privilege at the tool level.
328
00:16:47,200 --> 00:16:50,000
Read tools and write tools are not the same.
329
00:16:50,000 --> 00:16:53,400
"Get candidate profile" is not the same as "create candidate record."
330
00:16:53,400 --> 00:16:56,600
"Draft offer letter" is not the same as "send offer letter."
331
00:16:56,600 --> 00:17:01,200
If your action space doesn't separate these, you are relying on the model to self-regulate.
332
00:17:01,200 --> 00:17:05,600
It won't, it can't, it doesn't have incentives, it has probabilities.
333
00:17:05,600 --> 00:17:09,200
Then there's the part nobody wants but everyone needs the kill switch.
334
00:17:09,200 --> 00:17:12,600
You design every agent workflow with the assumption that you will disable it,
335
00:17:12,600 --> 00:17:16,400
not because you expect failure, but because entropy always finds a path.
336
00:17:16,400 --> 00:17:19,800
A kill switch is the ability to shut down tool invocation fast.
337
00:17:19,800 --> 00:17:23,400
Disable the Logic Apps workflow, revoke the managed identity permissions,
338
00:17:23,400 --> 00:17:27,000
unpublish the agent, or block access via conditional access.
339
00:17:27,000 --> 00:17:31,000
Pick at least two layers because relying on one control is how outages become incidents.
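A minimal sketch of a two-layer kill switch; disableWorkflow and revokePermissions are hypothetical stand-ins for the platform operations named above:

```typescript
// Hypothetical two-layer kill switch: disable the workflow AND revoke the
// identity's permissions, so one failed control doesn't leave the path open.
async function killSwitch(
  disableWorkflow: (name: string) => Promise<void>,        // e.g., disable the Logic Apps workflow
  revokePermissions: (identity: string) => Promise<void>,  // e.g., strip the managed identity's roles
  workflowName: string,
  identityId: string,
): Promise<void> {
  await Promise.all([
    disableWorkflow(workflowName),
    revokePermissions(identityId),
  ]);
  // The shutdown itself is evidence: log when and what was disabled.
  console.log(JSON.stringify({ event: "kill-switch", workflowName, identityId, at: new Date().toISOString() }));
}
```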
340
00:17:31,000 --> 00:17:34,200
And when you hit the kill switch, you don't fix forward in production.
341
00:17:34,200 --> 00:17:37,000
You restore capability only after review.
342
00:17:37,000 --> 00:17:38,000
What happened?
343
00:17:38,000 --> 00:17:39,400
What evidence exists?
344
00:17:39,400 --> 00:17:40,600
What control failed?
345
00:17:40,600 --> 00:17:42,400
And what boundary needs tightening?
346
00:17:42,400 --> 00:17:44,000
This is the core governance truth.
347
00:17:44,000 --> 00:17:48,800
Makers will build, teams will copy, exceptions will accumulate, policies will drift.
348
00:17:48,800 --> 00:17:51,800
Your job is to make drift visible, constrainable and reversible.
349
00:17:51,800 --> 00:17:53,800
That's what action space governance actually is.
350
00:17:53,800 --> 00:17:55,800
Compliance hooks without legal theater.
351
00:17:55,800 --> 00:17:57,600
Oversight, notice, evidence.
352
00:17:57,600 --> 00:18:01,200
Most teams hear "AI compliance" and immediately do one of two things.
353
00:18:01,200 --> 00:18:06,400
They either panic and stop, or they paste a disclaimer into the chat window and call it governance.
354
00:18:06,400 --> 00:18:08,200
Neither works.
355
00:18:08,200 --> 00:18:11,200
Compliance in an agentic HR system is not a statement.
356
00:18:11,200 --> 00:18:12,800
It's an operating capability.
357
00:18:12,800 --> 00:18:17,400
Oversight that can intervene, notice that is consistent, and evidence that survives scrutiny.
358
00:18:17,400 --> 00:18:19,400
Start with oversight.
359
00:18:19,400 --> 00:18:23,000
The laws people reference, NYC Local Law 144 in hiring,
360
00:18:23,000 --> 00:18:26,000
Colorado's AI Act for high-risk systems,
361
00:18:26,000 --> 00:18:27,800
are not asking you to become a lawyer.
362
00:18:27,800 --> 00:18:31,200
They are forcing you to behave like an operator of a decision system.
363
00:18:31,200 --> 00:18:33,000
Which means you must be able to show:
364
00:18:33,000 --> 00:18:37,800
who reviewed the outcome, when humans took over, and how the system prevented harm when it was uncertain.
365
00:18:37,800 --> 00:18:40,000
That's why human in the loop isn't just a safety net.
366
00:18:40,000 --> 00:18:41,200
It's the governance primitive.
367
00:18:41,200 --> 00:18:45,800
If your candidate screening agent can score a resume and shortlist a person without a gate,
368
00:18:45,800 --> 00:18:50,400
You've effectively built an automated employment decision tool with no supervision story.
369
00:18:50,400 --> 00:18:53,000
And the moment a candidate challenges the outcome,
370
00:18:53,000 --> 00:18:56,000
your "but the recruiter had final say" argument collapses,
371
00:18:56,000 --> 00:18:59,200
unless you can prove the recruiter actually did something meaningful.
372
00:18:59,200 --> 00:19:00,600
So oversight has to be real.
373
00:19:00,600 --> 00:19:01,400
It has to be measurable.
374
00:19:01,400 --> 00:19:03,000
It has to leave artifacts.
375
00:19:03,000 --> 00:19:05,600
Notice is next and this is where most teams get weird.
376
00:19:05,600 --> 00:19:09,600
They either over-lawyer it and turn the employee experience into a compliance pop-up parade
377
00:19:09,600 --> 00:19:12,200
or they avoid it entirely and hope nobody asks.
378
00:19:12,200 --> 00:19:15,400
But notice is not optional in the direction the world is moving.
379
00:19:15,400 --> 00:19:17,400
The practical approach is simple.
380
00:19:17,400 --> 00:19:21,600
Define when the system is assisting versus influencing.
381
00:19:21,600 --> 00:19:26,400
Assisting means it's drafting, summarizing, retrieving policy or preparing an information package.
382
00:19:26,400 --> 00:19:32,800
Influencing means it's scoring, ranking, recommending, routing, or changing access and records.
383
00:19:32,800 --> 00:19:34,200
Those are different risk profiles.
384
00:19:34,200 --> 00:19:35,400
Treat them differently.
385
00:19:35,400 --> 00:19:39,200
When the system influences, you implement consistent user-facing disclosure.
386
00:19:39,200 --> 00:19:42,800
In screening, that might be candidate-facing in regulated jurisdictions.
387
00:19:42,800 --> 00:19:47,400
In internal HR ops, it might be employee-facing when a ticket is classified and auto-resolved.
388
00:19:47,400 --> 00:19:48,600
The point isn't the exact wording.
389
00:19:48,600 --> 00:19:53,000
The point is that you can demonstrate you had a repeatable notice mechanism tied to the workflow,
390
00:19:53,000 --> 00:19:55,200
not the personality of whoever wrote the prompt.
391
00:19:55,200 --> 00:19:56,200
Now evidence.
392
00:19:56,200 --> 00:19:59,000
This is the part everyone promises and nobody actually builds.
393
00:19:59,000 --> 00:20:02,600
An auditor does not care that your agent usually does the right thing.
394
00:20:02,600 --> 00:20:04,000
They ask for a single case.
395
00:20:04,000 --> 00:20:08,400
Show me what happened end to end for this candidate on this date under this policy version
396
00:20:08,400 --> 00:20:10,200
with this approver and this final outcome.
397
00:20:10,200 --> 00:20:12,200
So your evidence model has to be explicit.
398
00:20:12,200 --> 00:20:15,200
At minimum, you need five things captured and correlated.
399
00:20:15,200 --> 00:20:19,400
One, the user request and the system prompt context that drove the decision,
400
00:20:19,400 --> 00:20:22,400
not for performance theater, for accountability.
401
00:20:22,400 --> 00:20:24,400
Two, the retrieved sources.
402
00:20:24,400 --> 00:20:26,600
If the agent grounded on a SharePoint document,
403
00:20:26,600 --> 00:20:29,000
store the document ID, version, and metadata.
404
00:20:29,000 --> 00:20:33,000
If it used people data, store what directory attributes it referenced.
405
00:20:33,000 --> 00:20:36,000
Otherwise, you can't answer the basic challenge: "based on what?"
406
00:20:36,000 --> 00:20:43,200
Three, the tool calls: MCP tool name, input payload, output payload, and the Logic Apps run ID.
407
00:20:43,200 --> 00:20:46,000
The system did not "decide." It called a tool.
408
00:20:46,000 --> 00:20:48,400
Tools leave traces, capture them.
409
00:20:48,400 --> 00:20:55,200
Four, the state transitions in Dataverse: candidate created, score artifact written, HITL request issued,
410
00:20:55,200 --> 00:20:59,000
approval received, override reason code recorded, shortlist updated.
411
00:20:59,000 --> 00:21:00,000
These are not chat events.
412
00:21:00,000 --> 00:21:01,200
They are business events.
413
00:21:01,200 --> 00:21:05,400
Five, the human actions: who approved, who rejected, who overrode,
414
00:21:05,400 --> 00:21:08,200
what reason they gave, and whether they had the right role to do it.
415
00:21:08,200 --> 00:21:13,800
Then you do all of this with correlation IDs that survive across Copilot Studio, MCP, Logic Apps, and Dataverse.
416
00:21:13,800 --> 00:21:16,200
If you can't stitch it together, you don't have evidence.
417
00:21:16,200 --> 00:21:17,200
You have scattered logs.
418
00:21:17,200 --> 00:21:19,400
A lot of teams worry they're logging too much.
419
00:21:19,400 --> 00:21:21,400
Good. That means they finally understand the risk.
420
00:21:21,400 --> 00:21:22,400
So be deliberate.
421
00:21:22,400 --> 00:21:25,000
Don't dump raw resumes and medical notes into logs.
422
00:21:25,000 --> 00:21:27,200
Store references and structured artifacts.
423
00:21:27,200 --> 00:21:29,200
Use secure inputs and outputs in workflows
424
00:21:29,200 --> 00:21:31,200
so run history doesn't become a data leak.
425
00:21:31,200 --> 00:21:34,800
Define retention as a policy decision, not an accident of defaults.
426
00:21:34,800 --> 00:21:38,000
Evidence that can't be retained safely is just future liability.
427
00:21:38,000 --> 00:21:39,400
And here's the final constraint.
428
00:21:39,400 --> 00:21:41,800
Never claim compliance. Claim alignment.
429
00:21:41,800 --> 00:21:46,800
You are implementing oversight, notice, and evidence patterns that map to emerging requirements.
430
00:21:46,800 --> 00:21:51,000
You're building a system that can be audited, challenged and improved without rewriting history.
431
00:21:51,000 --> 00:21:54,000
That's what compliance hooks actually means in architecture terms.
432
00:21:54,000 --> 00:21:57,000
Now we can implement the first concrete control pattern.
433
00:21:57,000 --> 00:21:59,800
HITL confidence gates as deterministic stops.
434
00:21:59,800 --> 00:22:03,200
HITL pattern: confidence gates as deterministic stops.
435
00:22:03,200 --> 00:22:05,400
Human in the loop is not a moral preference.
436
00:22:05,400 --> 00:22:06,800
It's an architectural circuit breaker.
437
00:22:06,800 --> 00:22:09,200
The point of HITL isn't to make the agent humble.
438
00:22:09,200 --> 00:22:14,600
The point is to make irreversible actions impossible when the system's certainty drops below a defined bar.
439
00:22:14,600 --> 00:22:18,800
That bar can be a confidence threshold, a risk category, a policy condition or all three.
440
00:22:18,800 --> 00:22:21,200
But it has to be explicit, enforced and logged.
441
00:22:21,200 --> 00:22:23,000
So here's the pattern. Confidence gates.
442
00:22:23,000 --> 00:22:26,600
A confidence gate is a deterministic stop in an otherwise autonomous workflow.
443
00:22:26,600 --> 00:22:28,800
The agent can reason, retrieve, draft, and propose.
444
00:22:28,800 --> 00:22:32,400
But if the gate conditions are met, it must pause and request a human decision
445
00:22:32,400 --> 00:22:35,000
before any further tool calls that change state.
446
00:22:35,000 --> 00:22:40,400
Notice what that does. It converts a probabilistic model output into a controlled workflow step with ownership.
447
00:22:40,400 --> 00:22:44,200
Now teams usually implement this backward. They do HITL "when it feels risky."
448
00:22:44,200 --> 00:22:47,600
That translates to HITL never, until after the incident.
449
00:22:47,600 --> 00:22:49,200
You need mechanical triggers.
450
00:22:49,200 --> 00:22:51,600
The most obvious trigger is a numeric confidence score.
451
00:22:51,600 --> 00:22:54,600
For example, anything below 75 escalates.
452
00:22:54,600 --> 00:22:56,200
That doesn't mean the model is measuring truth.
453
00:22:56,200 --> 00:23:00,600
It means you're defining a threshold where the system must stop pretending it can decide.
454
00:23:00,600 --> 00:23:05,600
And you pick that number by testing: run historical cases, measure how often the agent gets it wrong,
455
00:23:05,600 --> 00:23:09,800
and set the gate where the cost of a mistake exceeds the cost of a review.
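A minimal sketch of that gate as a deterministic function; the 75 threshold comes from the episode's example and would be tuned against historical cases:

```typescript
// Hypothetical confidence gate: below the tested threshold, the workflow must
// stop and raise a HITL request instead of taking any state-changing action.
type GateResult = { proceed: true } | { proceed: false; gateReason: string };

function confidenceGate(confidence: number, threshold = 75): GateResult {
  return confidence >= threshold
    ? { proceed: true }
    : { proceed: false, gateReason: `confidence ${confidence} below threshold ${threshold}` };
}

// Usage: confidenceGate(62) -> { proceed: false, gateReason: "confidence 62 below threshold 75" }
```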
456
00:23:09,800 --> 00:23:13,200
But confidence scores alone are fragile, because models can be confidently wrong.
457
00:23:13,200 --> 00:23:14,800
So you add policy-based gates.
458
00:23:14,800 --> 00:23:17,200
In candidate screening, you gate on:
459
00:23:17,200 --> 00:23:21,600
incomplete evidence, missing rubric fields, conflicting criteria, or detected proxy risk.
460
00:23:21,600 --> 00:23:27,600
In ticket triage, you gate on: requests that involve pay, leave disputes, medical accommodations,
461
00:23:27,600 --> 00:23:30,800
or anything that triggers an employee relations workflow.
462
00:23:30,800 --> 00:23:35,200
In onboarding, you gate on license assignment, group membership, access to sensitive systems,
463
00:23:35,200 --> 00:23:38,000
or any action that can create access you can't easily unwind.
464
00:23:38,000 --> 00:23:41,200
That's how you build a risk-tiered HITL model without pretending to be a lawyer.
465
00:23:41,200 --> 00:23:42,800
Next is the approval object.
466
00:23:42,800 --> 00:23:46,400
A HITL pause is useless if it's just "ask a human" in chat.
467
00:23:46,400 --> 00:23:49,600
You need a structured approval artifact that records who approved,
468
00:23:49,600 --> 00:23:52,400
what they saw, what decision they made, and why.
469
00:23:52,400 --> 00:23:55,200
That artifact lives in Dataverse, not in a Teams message.
470
00:23:55,200 --> 00:23:57,000
Teams is the notification channel.
471
00:23:57,000 --> 00:23:58,000
Dataverse is the record.
472
00:23:58,000 --> 00:24:03,400
So define an approval object with fields that force accountability, case ID, decision type,
473
00:24:03,400 --> 00:24:09,600
proposed action, evidence links, confidence score, gate reason, approver identity, time stamp,
474
00:24:09,600 --> 00:24:13,600
outcome, override reason code, and a free text justification field,
475
00:24:13,600 --> 00:24:19,200
but only after the structured reason code, because otherwise humans will write "looks good" and call it oversight.
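A minimal sketch of that approval object as a structured record; the shape is an illustration of the fields listed above, not a real Dataverse schema:

```typescript
// Hypothetical approval object: every HITL pause produces one of these,
// and the structured reason code is required before any free text.
interface ApprovalObject {
  caseId: string;
  decisionType: "screening" | "triage" | "onboarding";
  proposedAction: string;
  evidenceLinks: string[];       // links to sources, versions, artifacts
  confidenceScore: number;
  gateReason: string;            // why the workflow paused
  approverIdentity: string;      // who decided, under which role
  timestamp: string;
  outcome: "approved" | "rejected" | "overridden";
  overrideReasonCode?: string;   // mandatory when outcome is "overridden"
  justification?: string;        // free text, only after the structured code
}
```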
476
00:24:19,200 --> 00:24:20,600
What does the approver see?
477
00:24:20,600 --> 00:24:22,000
Not the entire chat transcript.
478
00:24:22,000 --> 00:24:24,200
That's how you leak data and drown the reviewer.
479
00:24:24,200 --> 00:24:29,000
The reviewer gets a context package, the structured rubric, the retrieved sources with links and versions,
480
00:24:29,000 --> 00:24:31,600
and the specific recommendation the agent proposes.
481
00:24:31,600 --> 00:24:34,400
If the agent can't produce that package, it hasn't earned autonomy.
482
00:24:34,400 --> 00:24:36,000
It escalates by default.
483
00:24:36,000 --> 00:24:38,400
Now define borderline case like you mean it.
484
00:24:38,400 --> 00:24:40,600
A borderline case is not "I'm not sure".
485
00:24:40,600 --> 00:24:42,800
It's a set of conditions you can test.
486
00:24:42,800 --> 00:24:44,000
Example for screening.
487
00:24:44,000 --> 00:24:49,000
Candidate meets 70 to 85% of required rubric points, or fails one must-have
488
00:24:49,000 --> 00:24:53,600
but exceeds in compensating skills, or the job requirement text maps to multiple interpretations.
489
00:24:53,600 --> 00:24:56,400
You don't need perfection, you need repeatability.
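A minimal sketch of a testable borderline definition using the example conditions above (RubricResult and its fields are hypothetical):

```typescript
// Hypothetical, repeatable borderline test: mid-band rubric score, or one
// failed must-have offset by compensating skills.
interface RubricResult { pointsPct: number; failedMustHaves: number; compensatingSkills: number }

function isBorderline(r: RubricResult): boolean {
  const midBand = r.pointsPct >= 70 && r.pointsPct <= 85;
  const offsetMustHave = r.failedMustHaves === 1 && r.compensatingSkills > 0;
  return midBand || offsetMustHave;
}
```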
490
00:24:56,400 --> 00:25:01,000
And when a human overrides the agent, approve a borderline candidate, reject a recommended one,
491
00:25:01,000 --> 00:25:02,800
that override becomes a signal.
492
00:25:02,800 --> 00:25:05,600
It's evidence of drift, bias, or rubric failure.
493
00:25:05,600 --> 00:25:08,200
You capture it, aggregate it, and review it monthly.
494
00:25:08,200 --> 00:25:09,600
Overrides are not noise.
495
00:25:09,600 --> 00:25:12,600
Overrides are where the system tells you it's misaligned with intent.
496
00:25:12,600 --> 00:25:15,200
Now here's where most people accidentally kill the value.
497
00:25:15,200 --> 00:25:16,400
Rubber stamping.
498
00:25:16,400 --> 00:25:21,400
If the approval UI makes approve the easy button and reject the annoying one,
499
00:25:21,400 --> 00:25:25,200
humans will approve everything and your HITEL becomes compliance theatre.
500
00:25:25,200 --> 00:25:28,600
So you enforce friction, require reason codes on approval and override,
501
00:25:28,600 --> 00:25:31,600
require selection of which rubric criterion justified the decision,
502
00:25:31,600 --> 00:25:34,800
require a note when the decision contradicts the recommendation,
503
00:25:34,800 --> 00:25:36,600
and limit who can approve which actions,
504
00:25:36,600 --> 00:25:38,600
hiring managers approve shortlist additions,
505
00:25:38,600 --> 00:25:40,800
HR-Ops approves policy exceptions,
506
00:25:40,800 --> 00:25:42,800
platform admins approve tool expansion,
507
00:25:42,800 --> 00:25:44,200
nobody approves everything.
508
00:25:44,200 --> 00:25:46,600
Finally, treat HITL as a stop, not a detour.
509
00:25:46,600 --> 00:25:49,600
When the system pauses, it should freeze the workflow state and wait.
510
00:25:49,600 --> 00:25:52,600
No parallel tool calls, no "keep going and we'll review later."
511
00:25:52,600 --> 00:25:55,200
If you let actions proceed, you didn't implement HITL.
512
00:25:55,200 --> 00:25:57,800
You implemented a notification. So the payoff is simple.
513
00:25:57,800 --> 00:26:01,400
Autonomy where mistakes are reversible, friction where mistakes are irreversible,
514
00:26:01,400 --> 00:26:05,400
and a decision trail that survives scrutiny because it's made of records, not stories.
515
00:26:05,400 --> 00:26:06,400
Now we prove it.
516
00:26:06,400 --> 00:26:11,200
Observability and audit evidence across prompts, actions, overrides and state.
517
00:26:11,200 --> 00:26:15,200
Observability and audit evidence prompts actions, overrides and state.
518
00:26:15,200 --> 00:26:18,000
If you remember nothing else from this episode, remember this.
519
00:26:18,000 --> 00:26:22,600
An HR agent without observability is just a liability generator with a chat window
520
00:26:22,600 --> 00:26:25,600
because the failure mode isn't "the agent answered wrong."
521
00:26:25,600 --> 00:26:29,400
The failure mode is "the agent acted and you can't reconstruct why."
522
00:26:29,400 --> 00:26:32,200
And in HR, reconstruction is not a technical hobby.
523
00:26:32,200 --> 00:26:33,600
It's your defensibility story.
524
00:26:33,600 --> 00:26:37,400
So what does audit-grade observability actually mean in an agentic workflow?
525
00:26:37,400 --> 00:26:40,600
It means you can produce a complete chain of evidence for any case,
526
00:26:40,600 --> 00:26:44,600
from the original user intent to the sources retrieved, to the tools invoked,
527
00:26:44,600 --> 00:26:49,000
to the data that changed, to the human approvals and overrides that allowed it to proceed.
528
00:26:49,000 --> 00:26:50,400
Not "we have some logs."
529
00:26:50,400 --> 00:26:51,400
A chain.
530
00:26:51,400 --> 00:26:53,000
Start with prompts and context.
531
00:26:53,000 --> 00:26:56,400
You need to capture the user's input, plus the relevant system instructions
532
00:26:56,400 --> 00:26:58,600
that shape the agent's behavior at that moment.
533
00:26:58,600 --> 00:27:00,800
Not because you love storing chat transcripts,
534
00:27:00,800 --> 00:27:04,400
because when someone challenges an outcome, the first question is always,
535
00:27:04,400 --> 00:27:07,200
what did the user ask and what did the system assume?
536
00:27:07,200 --> 00:27:08,400
But don't get sloppy.
537
00:27:08,400 --> 00:27:11,600
Prompts can contain PII, so you don't default to raw dumps.
538
00:27:11,600 --> 00:27:16,000
You store what's necessary for reconstruction and you redact what creates unnecessary exposure.
539
00:27:16,000 --> 00:27:20,400
This is where secure inputs and outputs in logic apps stop being a checkbox
540
00:27:20,400 --> 00:27:22,400
and start being a survival feature.
541
00:27:22,400 --> 00:27:24,400
Next, retrieval evidence.
542
00:27:24,400 --> 00:27:28,000
If the agent grounded its response or decision on SharePoint resumes,
543
00:27:28,000 --> 00:27:30,800
policy documents or people directory data,
544
00:27:30,800 --> 00:27:34,200
you must store the references, not just the generated text.
545
00:27:34,200 --> 00:27:37,800
That means document IDs, version IDs, and metadata
546
00:27:37,800 --> 00:27:41,600
that lets you prove what the agent saw, otherwise you get into the worst kind of argument.
547
00:27:41,600 --> 00:27:44,600
"The policy changed after the decision," and you can't prove it didn't.
548
00:27:44,600 --> 00:27:46,400
Now tool calls.
549
00:27:46,400 --> 00:27:48,600
Your model doesn't do onboarding.
550
00:27:48,600 --> 00:27:51,400
It calls a tool that provisions accounts.
551
00:27:51,400 --> 00:27:53,000
Your model doesn't route tickets.
552
00:27:53,000 --> 00:27:54,800
It calls a tool that assigns a case.
553
00:27:54,800 --> 00:27:57,200
Those tool calls are where reality happens.
554
00:27:57,200 --> 00:28:02,000
So log each tool call with tool name, input payload shape, output summary,
555
00:28:02,000 --> 00:28:05,000
and the execution identifiers from your downstream platform.
556
00:28:05,000 --> 00:28:08,800
In our stack, that means correlating Copilot Studio session identifiers
557
00:28:08,800 --> 00:28:13,600
to MCP tool invocations, to Logic Apps run IDs, to Dataverse transaction IDs.
558
00:28:13,600 --> 00:28:15,600
Correlation IDs aren't nice to have.
559
00:28:15,600 --> 00:28:18,400
They are the only way you can stitch a conversation to an action.
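Concretely, one link in that chain can be a single structured record per tool call. A minimal Python sketch follows; the field names are illustrative, not a product schema, and in production the run and transaction IDs come from Logic Apps and Dataverse rather than being generated locally:

```python
# One link in the evidence chain: a structured record per tool call.
# Field names are illustrative, not a product schema.
import json
import uuid
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class ToolCallEvidence:
    case_id: str               # the Dataverse case this action belongs to
    copilot_session_id: str    # the conversation that requested the action
    mcp_invocation_id: str     # the MCP tool invocation
    logic_app_run_id: str      # the Logic Apps run that executed it
    tool_name: str
    input_shape: dict          # payload shape, not raw PII
    output_summary: str
    timestamp: str

def log_tool_call(case_id: str, session_id: str, tool_name: str,
                  input_shape: dict, output_summary: str) -> ToolCallEvidence:
    """Emit one queryable link; real platform IDs replace the stand-ins."""
    record = ToolCallEvidence(
        case_id=case_id,
        copilot_session_id=session_id,
        mcp_invocation_id=str(uuid.uuid4()),
        logic_app_run_id=str(uuid.uuid4()),   # stand-in for the real run ID
        tool_name=tool_name,
        input_shape=input_shape,
        output_summary=output_summary,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    print(json.dumps(asdict(record)))          # stand-in for an audit sink
    return record

log_tool_call("case-001", "sess-42", "assign_case",
              {"ticket_id": "str", "queue": "str"}, "assigned to payroll queue")
```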
560
00:28:18,400 --> 00:28:21,400
Now let's talk about overrides because overrides are where most teams
561
00:28:21,400 --> 00:28:23,200
accidentally delete their own credibility.
562
00:28:23,200 --> 00:28:26,600
A human override must be treated as a first class event, not a footnote.
563
00:28:26,600 --> 00:28:30,600
Store who overrode, what they overrode, the reason code, and the justification.
564
00:28:30,600 --> 00:28:34,000
And crucially, store whether the override contradicted the agent recommendation
565
00:28:34,000 --> 00:28:35,200
or simply confirmed it.
566
00:28:35,200 --> 00:28:37,800
Why? Because override rates are how you detect drift.
567
00:28:37,800 --> 00:28:41,400
A rising override rate means the rubric is failing, the knowledge sources are stale,
568
00:28:41,400 --> 00:28:43,400
or the agent is routing edge cases badly.
569
00:28:43,400 --> 00:28:46,400
A near zero override rate might mean the agent is perfect.
570
00:28:46,400 --> 00:28:48,200
Or it might mean humans are rubber stamping.
571
00:28:48,200 --> 00:28:51,400
You don't get to assume which one. You instrument. Then, state.
572
00:28:51,400 --> 00:28:54,600
The system needs a durable case record that captures progression over time,
573
00:28:54,600 --> 00:28:57,800
candidate screening from intake to shortlist, ticket triage
574
00:28:57,800 --> 00:29:02,800
from submission to resolution, onboarding from offer accepted to Day-30 milestones.
575
00:29:02,800 --> 00:29:06,800
Chat transcripts don't model state, they model conversation.
576
00:29:06,800 --> 00:29:11,400
Dataverse models state: case objects, milestones, approvals and artifacts
577
00:29:11,400 --> 00:29:14,200
that survive retries, restarts and long-running waits.
578
00:29:14,200 --> 00:29:17,000
That's what makes your workflows deterministic at scale because your agent
579
00:29:17,000 --> 00:29:18,600
doesn't need to remember.
580
00:29:18,600 --> 00:29:21,000
It can read current state and act accordingly.
581
00:29:21,000 --> 00:29:23,200
Now, retention. Logs aren't free.
582
00:29:23,200 --> 00:29:26,400
They aren't free operationally and they definitely aren't free legally.
583
00:29:26,400 --> 00:29:28,200
Decide retention deliberately.
584
00:29:28,200 --> 00:29:31,400
What evidence must be retained, where it lives, and for how long?
585
00:29:31,400 --> 00:29:33,600
Tie it to your HR and compliance policies.
586
00:29:33,600 --> 00:29:37,800
Don't accidentally retain sensitive prompts forever because nobody set a policy
587
00:29:37,800 --> 00:29:39,600
and the default was keep everything.
588
00:29:39,600 --> 00:29:41,400
Then the metrics that expose failure.
589
00:29:41,400 --> 00:29:44,800
Track escalation rate, HITL trigger rate, override frequency,
590
00:29:44,800 --> 00:29:47,400
unresolved sessions and reopen rates for tickets.
591
00:29:47,400 --> 00:29:51,400
In screening, track borderline volume and decision variance across reviewers.
592
00:29:51,400 --> 00:29:54,800
In onboarding, track retry counts and idempotency violations
593
00:29:54,800 --> 00:29:58,400
because provisioning twice is not an error you want to discover in an audit.
594
00:29:58,400 --> 00:30:02,000
And yes, track latency because slow agents cause humans to bypass them
595
00:30:02,000 --> 00:30:03,400
and bypass is the real enemy.
596
00:30:03,400 --> 00:30:06,800
The point of all of this is audit readiness as an operational posture.
597
00:30:06,800 --> 00:30:08,800
When someone asks what happened, you don't go digging.
598
00:30:08,800 --> 00:30:10,400
You produce the record on demand.
599
00:30:10,400 --> 00:30:14,200
And once you can do that, you can scale the agent without losing control
600
00:30:14,200 --> 00:30:16,800
because the system tells you what it's doing, where it's drifting
601
00:30:16,800 --> 00:30:19,600
and where humans are compensating for design omissions.
602
00:30:19,600 --> 00:30:22,600
Now we can apply this control plane to the three use cases
603
00:30:22,600 --> 00:30:24,800
without changing the architecture every time.
604
00:30:24,800 --> 00:30:28,000
Use case map: three workflows, one control plane.
605
00:30:28,000 --> 00:30:32,000
Now the three use cases, not because they're the only HR automations worth doing
606
00:30:32,000 --> 00:30:35,000
but because they represent three different failure modes.
607
00:30:35,000 --> 00:30:39,400
High risk decisions, high volume operations and long running orchestration.
608
00:30:39,400 --> 00:30:42,000
And the point is that the control plane stays the same.
609
00:30:42,000 --> 00:30:43,600
Candidate screening is the volatile one.
610
00:30:43,600 --> 00:30:45,600
It's bias-sensitive, it's legally sensitive,
611
00:30:45,600 --> 00:30:48,600
and it's where people most want to let the model just decide
612
00:30:48,600 --> 00:30:50,000
because the volume is painful.
613
00:30:50,000 --> 00:30:51,600
That's exactly why it fails first.
614
00:30:51,600 --> 00:30:54,400
So the control plane here is strict: structured rubric,
615
00:30:54,400 --> 00:30:55,800
constrained tool calls,
616
00:30:55,800 --> 00:30:58,600
HITL gates on borderline scores and an evidence trail
617
00:30:58,600 --> 00:31:02,400
that links the score artifact to the exact resume source version.
618
00:31:02,400 --> 00:31:06,000
The agent can summarize and propose, it can't silently rank and move on.
619
00:31:06,000 --> 00:31:10,000
Screening is where you prove your agent isn't an automated discrimination machine
620
00:31:10,000 --> 00:31:11,000
with good UX.
621
00:31:11,000 --> 00:31:13,400
Ticket triage is the safer, higher ROI starter.
622
00:31:13,400 --> 00:31:15,800
It's operationally ugly but it's not usually irreversible.
623
00:31:15,800 --> 00:31:18,000
If the agent mis-routes a ticket, you can correct it.
624
00:31:18,000 --> 00:31:21,000
If it auto-resolves a tier-one request with the wrong answer,
625
00:31:21,000 --> 00:31:25,000
you can reopen it, fix the response, and improve the knowledge boundary.
626
00:31:25,000 --> 00:31:27,400
That makes it ideal for scaling early
627
00:31:27,400 --> 00:31:31,800
because you can instrument deflection without gambling on high stakes decisions.
628
00:31:31,800 --> 00:31:35,400
And yes, this is where measurable reductions like the ticket volume drop happen,
629
00:31:35,400 --> 00:31:39,000
not from smarter language but from deterministic classification,
630
00:31:39,000 --> 00:31:44,400
routing and scripted resolution paths that don't require three humans to touch the same case.
631
00:31:44,400 --> 00:31:46,800
Onboarding orchestration is the one that exposes
632
00:31:46,800 --> 00:31:50,000
whether you actually built an agentic workflow system
633
00:31:50,000 --> 00:31:53,400
or just a chat layer with some connectors because onboarding isn't one flow.
634
00:31:53,400 --> 00:31:57,000
It's a sequence of dependent actions across days, accounts, groups, licenses,
635
00:31:57,000 --> 00:32:00,000
equipment requests, training assignments, manager check-ins, and exceptions
636
00:32:00,000 --> 00:32:02,200
that arrive late and break assumptions.
637
00:32:02,200 --> 00:32:05,800
This is where state, idempotency and retries stop being theoretical.
638
00:32:05,800 --> 00:32:09,800
If you can't survive a connector timeout without double provisioning a new hire,
639
00:32:09,800 --> 00:32:11,800
you haven't automated onboarding.
640
00:32:11,800 --> 00:32:15,600
You've created a denial of service against your own identity platform.
641
00:32:15,600 --> 00:32:16,600
So those are the three.
642
00:32:16,600 --> 00:32:18,400
Now here's what ties them together.
643
00:32:18,400 --> 00:32:22,400
The event, reasoning, orchestration, evidence pattern.
644
00:32:22,400 --> 00:32:24,200
Every workflow starts with an event.
645
00:32:24,200 --> 00:32:27,800
In screening, it's a resume intake or a recruiter request to shortlist.
646
00:32:27,800 --> 00:32:31,600
In triage, it's a ticket submission through Teams, email or a portal.
647
00:32:31,600 --> 00:32:35,000
In onboarding, it's an offer accepted signal from your ATS or HRIS.
648
00:32:35,000 --> 00:32:37,000
The event creates a case record in Dataverse.
649
00:32:37,000 --> 00:32:38,600
That case ID becomes your spine.
650
00:32:38,600 --> 00:32:39,600
No case, no control.
651
00:32:39,600 --> 00:32:40,600
Then reasoning.
652
00:32:40,600 --> 00:32:43,200
Copilot Studio takes the event, asks for missing parameters,
653
00:32:43,200 --> 00:32:47,000
applies the conversational policy and chooses the next allowed tool.
654
00:32:47,000 --> 00:32:49,600
Reasoning does not mean freeform thinking.
655
00:32:49,600 --> 00:32:51,600
It means selecting from constrained options.
656
00:32:51,600 --> 00:32:55,000
If the agent can't bind the request to a known rubric, a known routing map
657
00:32:55,000 --> 00:32:57,200
or a known onboarding stage, it escalates.
658
00:32:57,200 --> 00:32:59,200
Because an unknown state is where models improvise.
659
00:32:59,200 --> 00:33:01,400
Improvisation is not an HR strategy.
660
00:33:01,400 --> 00:33:04,200
Then orchestration: Logic Apps executes the tool paths:
661
00:33:04,200 --> 00:33:06,800
classify, route, provision, notify, request approval.
662
00:33:06,800 --> 00:33:08,000
Each workflow is a tool.
663
00:33:08,000 --> 00:33:09,200
Each tool has a schema.
664
00:33:09,200 --> 00:33:11,600
Each tool runs under an identity boundary.
665
00:33:11,600 --> 00:33:14,800
This is where you enforce the difference between read and write operations
666
00:33:14,800 --> 00:33:20,200
and where you front-load safety: validate inputs, enforce preconditions and stop on gates.
667
00:33:20,200 --> 00:33:22,200
The model doesn't get to try things.
668
00:33:22,200 --> 00:33:26,400
It gets to invoke permitted operations or it gets blocked. And finally, evidence.
669
00:33:26,400 --> 00:33:31,400
Every tool call, state transition, approval and override writes an audit trail.
670
00:33:31,400 --> 00:33:38,400
Correlation IDs across Copilot session, MCP invocation, Logic Apps run and Dataverse records.
671
00:33:38,400 --> 00:33:42,400
That's how you survive audits and incident reviews without inventing a narrative.
672
00:33:42,400 --> 00:33:45,600
The common misunderstanding is thinking these are three separate builds.
673
00:33:45,600 --> 00:33:46,200
They're not.
674
00:33:46,200 --> 00:33:50,400
They are three applications of the same control plane with different risk tolerances
675
00:33:50,400 --> 00:33:55,200
and we start with candidate screening because it's the one that punishes weak governance immediately.
676
00:33:55,200 --> 00:34:00,400
Use case one: candidate screening, intake to shortlist. Candidate screening is where people
677
00:34:00,400 --> 00:34:04,200
quietly give the model too much authority because the volume feels unbearable
678
00:34:04,200 --> 00:34:07,400
and then they act surprised when they can't defend outcomes.
679
00:34:07,400 --> 00:34:09,800
So the first rule in this workflow is simple.
680
00:34:09,800 --> 00:34:11,400
Resumes are not documents.
681
00:34:11,400 --> 00:34:13,400
They are regulated data objects.
682
00:34:13,400 --> 00:34:14,600
Treat them like objects.
683
00:34:14,600 --> 00:34:20,000
Resume intake starts in SharePoint but not in a folder named Resumes Final Final 2.
684
00:34:20,000 --> 00:34:27,200
You need metadata discipline: requisition ID, role family, region, submission date, source channel
685
00:34:27,200 --> 00:34:29,400
and a stable candidate identifier.
686
00:34:29,400 --> 00:34:30,900
The reason is not organization.
687
00:34:30,900 --> 00:34:32,400
The reason is retrieval control.
688
00:34:32,400 --> 00:34:40,400
If Copilot can retrieve from everything, it will, and you've just turned your whole tenant into an unreviewed training set for your screening logic.
689
00:34:40,400 --> 00:34:43,600
So the tool boundary is: the agent never pulls all resumes.
690
00:34:43,600 --> 00:34:49,600
It calls a Logic Apps tool that queries SharePoint by metadata filters and returns a bounded set of candidates and references.
691
00:34:49,600 --> 00:34:50,600
References, not payloads.
692
00:34:50,600 --> 00:34:53,200
You don't pipe raw resumes through the chat layer.
693
00:34:53,200 --> 00:34:59,400
You return candidate IDs, file links and extracted structured fields if you've done parsing inside the tool boundary.
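As a sketch, the bounded retrieval tool can be this small. The function and field names are hypothetical; the point is explicit metadata filters, a hard result cap, and references instead of document payloads:

```python
# Sketch of a bounded retrieval tool: metadata-filtered, capped,
# returning references instead of resume payloads. Names are hypothetical.
from dataclasses import dataclass

@dataclass
class CandidateRef:
    candidate_id: str
    file_link: str            # SharePoint link; the document never enters chat
    extracted_fields: dict    # structured fields parsed inside the tool boundary

MAX_RESULTS = 25              # hard cap: the agent never gets "all resumes"

def query_candidates(index: list, requisition_id: str, region: str) -> list:
    """Filter by explicit metadata only; no free-text search over the tenant."""
    hits = [
        CandidateRef(r["candidate_id"], r["file_link"], r["extracted_fields"])
        for r in index
        if r["requisition_id"] == requisition_id and r["region"] == region
    ]
    if len(hits) > MAX_RESULTS:
        raise ValueError("Result set exceeds bound; narrow the filters")
    return hits
```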
694
00:34:59,400 --> 00:35:03,400
Next is criteria parsing, and this is where most teams sabotage themselves.
695
00:35:03,400 --> 00:35:07,000
They take a job description, throw it into the model and ask for a score out of 10.
696
00:35:07,000 --> 00:35:08,000
That's not screening.
697
00:35:08,000 --> 00:35:09,800
That's astrology with better typography.
698
00:35:09,800 --> 00:35:13,600
Instead, you translate job requirements into a structured rubric.
699
00:35:13,600 --> 00:35:16,200
Must-haves, should-haves and disqualifiers.
700
00:35:16,200 --> 00:35:19,600
Each rubric item has a label, a weight and a required evidence type.
701
00:35:19,600 --> 00:35:23,000
Evidence type matters because it forces specificity.
702
00:35:23,000 --> 00:35:32,000
Years of experience in X, certification Y, experience with tool Z, portfolio link, clearance level, whatever is job related.
703
00:35:32,000 --> 00:35:36,000
If it can't be expressed as a rubric item, it doesn't belong in automated scoring.
704
00:35:36,000 --> 00:35:39,800
That's how you keep the system from drifting into proxies. Now, bias-filtered scoring.
705
00:35:39,800 --> 00:35:42,000
This is not about pretending the system is moral.
706
00:35:42,000 --> 00:35:45,200
It's about preventing the system from using irrelevant correlates.
707
00:35:45,200 --> 00:35:47,400
Resumes contain proxy landmines.
708
00:35:47,400 --> 00:35:51,000
Names, graduation years, addresses, gaps, extracurriculars.
709
00:35:51,000 --> 00:35:54,000
Some of these are legitimate signals in specific contexts.
710
00:35:54,000 --> 00:35:57,200
Many are not, so you use a two-stage scoring pattern. Stage one:
711
00:35:57,200 --> 00:36:00,000
Redact or ignore obvious proxy fields before scoring.
712
00:36:00,000 --> 00:36:02,000
The agent doesn't need names to assess skills.
713
00:36:02,000 --> 00:36:04,400
It doesn't need addresses to assess qualifications.
714
00:36:04,400 --> 00:36:07,200
It doesn't need a graduation year to assess role fit.
715
00:36:07,200 --> 00:36:11,200
If you want location for relocation eligibility, that's a separate explicit field,
716
00:36:11,200 --> 00:36:13,000
not whatever the resume implies.
717
00:36:13,000 --> 00:36:17,600
Stage two: require rationale per rubric item. Not a narrative essay, a structured statement of
718
00:36:17,600 --> 00:36:21,400
which excerpt or evidence supported the score, linked back to the source reference.
719
00:36:21,400 --> 00:36:25,000
That gives you explainability without turning the workflow into a philosophy seminar.
720
00:36:25,000 --> 00:36:28,600
And because you are not trusting vibes, you store the scoring artifact in Dataverse.
721
00:36:28,600 --> 00:36:32,600
Candidate ID, requisition ID, rubric version, per-item scores,
722
00:36:32,600 --> 00:36:35,000
rationale pointers and a final recommendation status.
723
00:36:35,000 --> 00:36:36,400
That artifact is the output.
724
00:36:36,400 --> 00:36:38,000
The chat response is just the UI.
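A minimal sketch of that artifact as data, with illustrative names; the real thing lives as Dataverse tables, but the shape is the same: rubric items carry weights and required evidence types, and every per-item score points at evidence:

```python
# Sketch of the scoring artifact: rubric items with weights and required
# evidence, per-item scores bound to evidence references. Illustrative schema.
from dataclasses import dataclass, field

@dataclass
class RubricItem:
    label: str
    weight: float
    kind: str                 # "must_have" | "should_have" | "disqualifier"
    evidence_type: str        # e.g. "certification", "years_experience"

@dataclass
class ItemScore:
    item_label: str
    score: float
    evidence_ref: str         # excerpt/document reference, never "the model thinks"

@dataclass
class ScoringArtifact:
    candidate_id: str
    requisition_id: str
    rubric_version: str
    item_scores: list = field(default_factory=list)
    recommendation: str = "pending"

rubric = [
    RubricItem("PowerShell experience", 0.4, "must_have", "years_experience"),
    RubricItem("Security certification", 0.3, "should_have", "certification"),
]
artifact = ScoringArtifact("cand-17", "req-203", "rubric-v3")
artifact.item_scores.append(ItemScore("PowerShell experience", 0.8, "resume.pdf#p2"))
```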
725
00:36:38,000 --> 00:36:40,400
Now here's where most people mess up: borderline cases.
726
00:36:40,400 --> 00:36:44,600
They either escalate everything which kills throughput or they escalate nothing,
727
00:36:44,600 --> 00:36:46,000
which kills defensibility.
728
00:36:46,000 --> 00:36:47,800
So define borderline explicitly.
729
00:36:47,800 --> 00:36:51,600
A practical definition is candidates that land in a scoring band
730
00:36:51,600 --> 00:36:55,400
where small rubric interpretation differences change the outcome.
731
00:36:55,400 --> 00:37:00,000
Or candidates missing one must-have, but scoring strongly on multiple should-haves.
732
00:37:00,000 --> 00:37:01,800
Your rubric can label these conditions.
733
00:37:01,800 --> 00:37:04,000
The workflow can detect them deterministically.
734
00:37:04,000 --> 00:37:05,600
Then you apply the confidence gate.
735
00:37:05,600 --> 00:37:09,800
If the case meets borderline conditions or the model confidence falls below your threshold,
736
00:37:09,800 --> 00:37:13,800
the agent stops and generates a review package. Not the chat transcript, a package.
737
00:37:13,800 --> 00:37:16,800
Rubric results, evidence links and the proposed decision.
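Here's what a deterministic borderline check plus confidence gate can look like; the band and threshold values are illustrative policy choices, not defaults from any product:

```python
# Deterministic borderline detection + confidence gate, as described above.
# The band, thresholds and reason codes are illustrative policy choices.
def needs_review(total_score: float, missing_must_haves: int,
                 strong_should_haves: int, model_confidence: float):
    BAND = (0.55, 0.70)           # band where interpretation flips outcomes
    CONFIDENCE_FLOOR = 0.75
    if BAND[0] <= total_score <= BAND[1]:
        return True, "borderline_scoring_band"
    if missing_must_haves == 1 and strong_should_haves >= 2:
        return True, "missing_must_have_with_strong_should_haves"
    if model_confidence < CONFIDENCE_FLOOR:
        return True, "low_model_confidence"
    return False, "auto_proceed"

# A True result stops the workflow and generates the review package.
print(needs_review(0.62, 0, 1, 0.9))   # (True, 'borderline_scoring_band')
```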
738
00:37:16,800 --> 00:37:18,200
Then HITL triggers.
739
00:37:18,200 --> 00:37:21,600
In the demo scenario, the hiring manager approval happens via email.
740
00:37:21,600 --> 00:37:24,800
That's fine as a channel as long as the approval becomes a record.
741
00:37:24,800 --> 00:37:27,800
The approval object in Dataverse captures approver identity,
742
00:37:27,800 --> 00:37:30,000
decision, timestamp and reason code.
743
00:37:30,000 --> 00:37:33,200
If the manager overrides the recommendation, they must choose why.
744
00:37:33,200 --> 00:37:34,600
Not because you want to punish them,
745
00:37:34,600 --> 00:37:37,200
because you want to measure drift and prevent rubber stamping.
746
00:37:37,200 --> 00:37:40,200
Once approved, the agent can trigger the next actions.
747
00:37:40,200 --> 00:37:42,800
Schedule an interview, generate interview questions,
748
00:37:42,800 --> 00:37:45,000
or move the candidate to the next stage.
749
00:37:45,000 --> 00:37:47,000
Again, those actions should be tools.
750
00:37:47,000 --> 00:37:48,800
Invite candidate is a tool.
751
00:37:48,800 --> 00:37:50,000
Create questions is a tool.
752
00:37:50,000 --> 00:37:51,600
Update status is a tool.
753
00:37:51,600 --> 00:37:54,000
Each one has an input schema and a permission boundary.
754
00:37:54,000 --> 00:37:55,800
And the audit entry is automatic.
755
00:37:55,800 --> 00:37:59,400
Every time a candidate gets scored, escalated, approved or rejected,
756
00:37:59,400 --> 00:38:01,600
you write an evidence record.
757
00:38:01,600 --> 00:38:05,400
Correlation IDs from Copilot session to MCP tool call to Logic Apps
758
00:38:05,400 --> 00:38:06,800
run to Dataverse transaction.
759
00:38:06,800 --> 00:38:09,800
This is the difference between explainability and storytelling.
760
00:38:09,800 --> 00:38:12,000
If someone asks, why did we reject this candidate?
761
00:38:12,000 --> 00:38:13,600
You don't answer with a paragraph.
762
00:38:13,600 --> 00:38:14,800
You answer with a record.
763
00:38:14,800 --> 00:38:17,600
So the outcome of this workflow is not a short list.
764
00:38:17,600 --> 00:38:20,000
The outcome is a short list you can defend.
765
00:38:20,000 --> 00:38:22,600
Bias, fairness and explainability.
766
00:38:22,600 --> 00:38:24,400
What you can actually defend.
767
00:38:24,400 --> 00:38:26,800
Bias doesn't show up as an evil line of code.
768
00:38:26,800 --> 00:38:29,200
It shows up as a system choosing shortcuts.
769
00:38:29,200 --> 00:38:32,000
And hiring systems love shortcuts because the data is messy
770
00:38:32,000 --> 00:38:35,000
and the pressure is high and everyone wants the queue to disappear.
771
00:38:35,000 --> 00:38:38,000
The thing most people miss is that bias is usually a proxy problem,
772
00:38:38,000 --> 00:38:39,400
not a prompt problem.
773
00:38:39,400 --> 00:38:43,200
The model rarely uses race; it uses variables that correlate with race.
774
00:38:43,200 --> 00:38:47,000
Zip codes, school names, employment gaps, graduation years,
775
00:38:47,000 --> 00:38:50,600
even certain role titles that are historically skewed by demographic patterns.
776
00:38:50,600 --> 00:38:53,000
If you let unstructured text drive scoring,
777
00:38:53,000 --> 00:38:57,200
the system will learn these correlations faster than you can write a policy memo about them.
778
00:38:57,200 --> 00:38:59,000
And yes, it also shows up as drift.
779
00:38:59,000 --> 00:39:00,000
Your rubric starts clean.
780
00:39:00,000 --> 00:39:02,000
Then a recruiter adds an exception.
781
00:39:02,000 --> 00:39:04,400
Then a hiring manager insists we need culture fit.
782
00:39:04,400 --> 00:39:06,800
Then someone adds a field for communication style.
783
00:39:06,800 --> 00:39:09,000
Then a new region uses different signals.
784
00:39:09,000 --> 00:39:12,200
Over time, the scoring criteria stops being job related
785
00:39:12,200 --> 00:39:15,600
and starts being an organizational mirror of whatever bias already exists.
786
00:39:15,600 --> 00:39:17,000
So what can you actually defend?
787
00:39:17,000 --> 00:39:19,400
You can defend repeatable criteria tied to the job.
788
00:39:19,400 --> 00:39:22,000
You can defend consistent treatment across candidates.
789
00:39:22,000 --> 00:39:24,600
You can defend oversight with recorded rationale.
790
00:39:24,600 --> 00:39:29,800
And you can defend monitoring that detects when the system starts behaving differently than intended.
791
00:39:29,800 --> 00:39:31,200
You cannot defend vibes.
792
00:39:31,200 --> 00:39:32,800
Start with structured scoring.
793
00:39:32,800 --> 00:39:35,200
The rubric is your defensibility artifact.
794
00:39:35,200 --> 00:39:36,400
It doesn't need to be perfect.
795
00:39:36,400 --> 00:39:37,800
It needs to be explicit.
796
00:39:37,800 --> 00:39:40,600
Must-haves, should-haves, disqualifiers and weights.
797
00:39:40,600 --> 00:39:42,400
And every score has to be tied to evidence.
798
00:39:42,400 --> 00:39:43,600
Not "the model thinks."
799
00:39:43,600 --> 00:39:44,400
Evidence.
800
00:39:44,400 --> 00:39:45,600
An excerpt reference.
801
00:39:45,600 --> 00:39:48,600
A portfolio link, a certification, a documented project.
802
00:39:48,600 --> 00:39:52,600
Something you can point at later without rerunning the model and getting a different answer.
803
00:39:52,600 --> 00:39:59,800
That alone knocks out a large class of bias because it forces the system to operate on job related signals instead of socially correlated noise.
804
00:39:59,800 --> 00:40:01,400
Now test outcomes, not intentions.
805
00:40:01,400 --> 00:40:04,200
A practical method here is disparate impact screening.
806
00:40:04,200 --> 00:40:08,200
The simplest lens people use is the 80% rule as a quick check.
807
00:40:08,200 --> 00:40:13,800
Compare selection rates across groups and flag when one group's rate drops below 80% of the highest group's rate.
808
00:40:13,800 --> 00:40:15,000
It's not a legal conclusion.
809
00:40:15,000 --> 00:40:16,000
It's a smoke alarm.
810
00:40:16,000 --> 00:40:19,800
If it triggers, you investigate the rubric, the data, the workflow and the human overrides.
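The check itself is a few lines. A sketch, assuming you already track selection counts per group; again, a smoke alarm, not a legal test:

```python
# The 80% rule as a smoke alarm: flag any group whose selection rate
# falls below 0.8x the highest group's rate. Not a legal conclusion.
def disparate_impact_flags(selected: dict, total: dict) -> list:
    rates = {g: selected[g] / total[g] for g in total if total[g] > 0}
    top = max(rates.values())
    return [g for g, r in rates.items() if r < 0.8 * top]

flags = disparate_impact_flags(
    selected={"group_a": 30, "group_b": 12},   # illustrative counts
    total={"group_a": 100, "group_b": 80},
)
print(flags)   # ['group_b'] -> investigate rubric, data, workflow, overrides
```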
811
00:40:19,800 --> 00:40:21,400
You also run counterfactual checks.
812
00:40:21,400 --> 00:40:27,400
This clicked for a lot of teams when they realized you can change one attribute that shouldn't matter and see if the score shifts.
813
00:40:27,400 --> 00:40:31,800
Swap a name, remove a graduation year, replace an address with a neutral placeholder.
814
00:40:31,800 --> 00:40:35,000
If the scoring moves materially, you just found a proxy pathway.
815
00:40:35,000 --> 00:40:37,400
That doesn't mean the system is racist.
816
00:40:37,400 --> 00:40:41,000
It means your design allowed correlated inputs to influence outcomes.
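A sketch of the probe, where `score_fn` stands in for whatever your scoring tool exposes; the toy scorer below deliberately leaks a graduation-year proxy so the probe has something to catch:

```python
# Counterfactual probe: perturb one attribute that shouldn't matter and
# measure the score shift. `score_fn` is a stand-in for your scoring tool.
def counterfactual_shift(score_fn, candidate: dict, attr: str,
                         neutral_value: str, tolerance: float = 0.05) -> bool:
    baseline = score_fn(candidate)
    perturbed = dict(candidate, **{attr: neutral_value})
    shift = abs(score_fn(perturbed) - baseline)
    return shift > tolerance        # True means you found a proxy pathway

# Toy scorer that (badly) rewards a graduation year -- the probe catches it.
def toy_score(c: dict) -> float:
    return 0.7 + (0.1 if c.get("grad_year", "") == "2020" else 0.0)

print(counterfactual_shift(toy_score, {"grad_year": "2020"}, "grad_year", ""))  # True
```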
817
00:40:41,000 --> 00:40:43,000
Then consistency verification.
818
00:40:43,000 --> 00:40:47,800
Take a sample of resumes and score them across rubric versions and across reviewers.
819
00:40:47,800 --> 00:40:53,400
If small wording changes in the job description cause large ranking shifts, your rubric is underspecified.
820
00:40:53,400 --> 00:40:59,800
If different reviewers override in opposite directions with no consistent reason codes, your human oversight is theater.
821
00:40:59,800 --> 00:41:03,400
And if override rate is near zero, assume deference, not perfection.
822
00:41:03,400 --> 00:41:05,000
Because the human factor is brutal.
823
00:41:05,000 --> 00:41:09,400
When you put an AI recommendation next to a human in a busy workflow, you create authority bias.
824
00:41:09,400 --> 00:41:11,400
People defer. They rubber-stamp.
825
00:41:11,400 --> 00:41:16,200
That's why you require reason codes and justification on overrides and approvals.
826
00:41:16,200 --> 00:41:20,600
Not because it's fun. Because it forces cognitive engagement and it gives you signals you can measure.
827
00:41:20,600 --> 00:41:23,800
Now explainability. Explainability is not the model's chain of thought.
828
00:41:23,800 --> 00:41:27,800
So don't store it. Don't rely on it and don't pretend it's the rationale.
829
00:41:27,800 --> 00:41:31,000
Explainability is: what rubric items drove the outcome.
830
00:41:31,000 --> 00:41:32,600
What evidence supported each item.
831
00:41:32,600 --> 00:41:35,800
What sources were used and what human interventions changed the path.
832
00:41:35,800 --> 00:41:40,600
That's it. If the agent can produce that, you can defend the process as a controlled decision workflow.
833
00:41:40,600 --> 00:41:44,200
If it can't, it's just generating text and hoping the audience trusts it.
834
00:41:44,200 --> 00:41:47,400
And vendor reality: responsibility stays with the deployer.
835
00:41:47,400 --> 00:41:52,200
Even if the resume parsing comes from a third party, even if the scoring model is industry standard,
836
00:41:52,200 --> 00:41:54,600
you own the outcome because you operationalized it.
837
00:41:54,600 --> 00:41:59,000
Contracts don't absorb liability. They just spread blame and blame is not a control.
838
00:41:59,000 --> 00:42:01,200
So the defensible posture is a cycle.
839
00:42:01,200 --> 00:42:06,200
Structured rubric, proxy minimization, outcome monitoring and periodic sampling reviews,
840
00:42:06,200 --> 00:42:09,000
not annual panic, a rhythm.
841
00:42:09,000 --> 00:42:12,600
And once you have that rhythm, you can safely move to the lower risk,
842
00:42:12,600 --> 00:42:16,600
higher ROI deployment pattern, HR ticket triage.
843
00:42:16,600 --> 00:42:20,200
Use case two: HR ticket triage, deflection without chaos.
844
00:42:20,200 --> 00:42:23,400
Ticket triage is where most HR teams actually bleed time.
845
00:42:23,400 --> 00:42:26,200
Not because the questions are hard, because the intake is messy,
846
00:42:26,200 --> 00:42:29,400
the routing is emotional and the same request gets touched by three people
847
00:42:29,400 --> 00:42:31,400
before anyone decides who owns it.
848
00:42:31,400 --> 00:42:34,600
So the goal here isn't build a helpful HR chatbot.
849
00:42:34,600 --> 00:42:36,800
The goal is deterministic deflection.
850
00:42:36,800 --> 00:42:40,000
The system takes a request, classifies it into a lane you control,
851
00:42:40,000 --> 00:42:44,200
resolves what it can safely resolve and escalates the rest with an evidence package.
852
00:42:44,200 --> 00:42:48,400
Start with intake. You need one entry point, even if you accept many channels.
853
00:42:48,400 --> 00:42:53,600
Teams message, form, portal, email, fine, but every pathway must normalize into a case object.
854
00:42:53,600 --> 00:42:57,600
A single schema: ticket ID, requester identity, candidate categories,
855
00:42:57,600 --> 00:43:00,600
region, urgency and whatever minimal context you allow.
856
00:43:00,600 --> 00:43:03,600
If you don't normalize, you can't measure. If you can't measure, you can't scale.
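A sketch of that normalization, with hypothetical field names; every channel adapter produces the same object before classification runs:

```python
# Sketch of channel normalization: every intake path produces the same
# case object before classification runs. Field names are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class HRCase:
    ticket_id: str
    requester_id: str
    channel: str                    # "teams" | "portal" | "email" | ...
    region: str
    urgency: str
    context: str                    # minimal, policy-bounded free text
    category: Optional[str] = None  # filled later, from a category set you own

def normalize_teams_message(msg: dict) -> HRCase:
    """Example adapter: a Teams message becomes the same case as a portal form."""
    return HRCase(
        ticket_id=msg["id"],
        requester_id=msg["from"],
        channel="teams",
        region=msg.get("region", "unknown"),
        urgency=msg.get("urgency", "normal"),
        context=msg["text"][:500],   # bound the free text you carry forward
    )
```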
857
00:43:03,600 --> 00:43:06,000
Copilot Studio then does what it's actually good at.
858
00:43:06,000 --> 00:43:09,600
It runs the conversation to fill missing parameters and prevent garbage in.
859
00:43:09,600 --> 00:43:13,600
If the user writes benefits, the agent asks one clarifying question, not six.
860
00:43:13,600 --> 00:43:15,800
If it's time off, it asks jurisdiction.
861
00:43:15,800 --> 00:43:18,600
If it's payroll, it asks pay period.
862
00:43:18,600 --> 00:43:21,200
This is where you reduce entropy before you automate.
863
00:43:21,200 --> 00:43:22,400
Then classification.
864
00:43:22,400 --> 00:43:26,000
Most teams create a general HR bucket because it feels flexible.
865
00:43:26,000 --> 00:43:28,200
It isn't. It's where triage goes to die.
866
00:43:28,200 --> 00:43:31,200
Instead you map topics to tool groups and lanes.
867
00:43:31,200 --> 00:43:35,000
Benefits questions go to benefits knowledge and benefits workflows.
868
00:43:35,000 --> 00:43:37,800
Time off requests go to leave policies and leave tools.
869
00:43:37,800 --> 00:43:41,200
Payroll issues go to payroll routing and a higher default HITL profile.
870
00:43:41,200 --> 00:43:46,800
Employee relations items get flagged as high sensitivity and routed to humans by design.
871
00:43:46,800 --> 00:43:49,200
The agent shouldn't be allowed to improvise categories.
872
00:43:49,200 --> 00:43:54,400
It chooses from the set you own and it must attach a confidence score and a rationale for the classification.
873
00:43:54,400 --> 00:43:56,400
Once you nail that, everything else clicks.
874
00:43:56,400 --> 00:43:58,200
Routing becomes deterministic.
875
00:43:58,200 --> 00:44:01,200
Logic Apps takes the classified case and applies routing rules.
876
00:44:01,200 --> 00:44:03,200
Queue assignment by category and region.
877
00:44:03,200 --> 00:44:05,800
Priority rules for specific keywords or metadata.
878
00:44:05,800 --> 00:44:09,000
SLA timers, escalation targets. And here's the key:
879
00:44:09,000 --> 00:44:11,000
The routing is not AI magic.
880
00:44:11,000 --> 00:44:12,600
It's a mapping table you can review.
881
00:44:12,600 --> 00:44:16,000
If the agent predicts category A, the workflow routes to queue A.
882
00:44:16,000 --> 00:44:18,800
If category is uncertain or the case hits a risk trigger,
883
00:44:18,800 --> 00:44:22,200
the workflow routes to a human triage queue with a clear reason code.
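As a sketch, the whole routing layer can be a dictionary plus one function; queue names, categories and the confidence floor are illustrative:

```python
# Routing as a reviewable mapping table, not model improvisation.
# Queue names, categories and thresholds are illustrative.
ROUTING = {
    "benefits":  {"queue": "benefits-ops", "hitl": False},
    "time_off":  {"queue": "leave-ops",    "hitl": False},
    "payroll":   {"queue": "payroll-ops",  "hitl": True},   # higher default HITL
    "relations": {"queue": "human-triage", "hitl": True},   # humans by design
}
CONFIDENCE_FLOOR = 0.8

def route(category: str, confidence: float):
    if category not in ROUTING or confidence < CONFIDENCE_FLOOR:
        return "human-triage", "low_confidence_or_unknown_category"
    lane = ROUTING[category]
    reason = "hitl_required_by_category" if lane["hitl"] else "deterministic_route"
    return lane["queue"], reason

print(route("payroll", 0.92))   # ('payroll-ops', 'hitl_required_by_category')
print(route("benefits", 0.55))  # ('human-triage', 'low_confidence_or_unknown_category')
```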
884
00:44:22,200 --> 00:44:27,400
Tier one auto resolution is where you get the ROI and it's also where people accidentally create chaos.
885
00:44:27,400 --> 00:44:31,600
Auto resolution doesn't mean the model writes a confident paragraph and closes the ticket.
886
00:44:31,600 --> 00:44:35,800
It means use grounded knowledge plus deterministic actions where allowed.
887
00:44:35,800 --> 00:44:40,200
Reset a password? That's an IT flow, not HR, but you get the idea. Update an address?
888
00:44:40,200 --> 00:44:43,000
That might be a controlled HRIS write with approvals.
889
00:44:43,000 --> 00:44:44,400
Where do I find the policy?
890
00:44:44,400 --> 00:44:47,000
That's a link plus the policy version reference.
891
00:44:47,000 --> 00:44:48,600
How many leave days do I have?
892
00:44:48,600 --> 00:44:52,200
That's a read operation that returns a number, not a hallucinated answer.
893
00:44:52,200 --> 00:44:53,200
So the play is simple.
894
00:44:53,200 --> 00:44:56,000
For tier one categories, your agent can do two things.
895
00:44:56,000 --> 00:44:59,000
Provide an answer grounded in approved knowledge sources.
896
00:44:59,000 --> 00:45:03,200
Or execute a bounded action through a tool when the action is low-risk and reversible.
897
00:45:03,200 --> 00:45:04,400
Anything else pauses.
898
00:45:04,400 --> 00:45:07,800
This is where the HITL pattern becomes operational, not philosophical.
899
00:45:07,800 --> 00:45:11,400
Complex cases don't get escalated by dumping chat history into an email.
900
00:45:11,400 --> 00:45:15,800
They get escalated with a context package: the normalized case object, classification confidence,
901
00:45:15,800 --> 00:45:18,800
the knowledge sources consulted, the proposed next action,
902
00:45:18,800 --> 00:45:20,800
and the exact reason it couldn't auto resolve.
903
00:45:20,800 --> 00:45:24,000
That package gets written to Dataverse and routed to HR ops,
904
00:45:24,000 --> 00:45:26,200
so the human starts with structure, not archaeology.
905
00:45:26,200 --> 00:45:27,800
Now, the uncomfortable truth.
906
00:45:27,800 --> 00:45:30,600
Triage will expose your knowledge boundaries fast.
907
00:45:30,600 --> 00:45:33,000
If your knowledge base contains outdated policies,
908
00:45:33,000 --> 00:45:35,000
the agent will confidently cite them.
909
00:45:35,000 --> 00:45:37,000
If your SharePoint permissions are sloppy,
910
00:45:37,000 --> 00:45:40,200
the agent will surface sensitive content to the wrong people.
911
00:45:40,200 --> 00:45:43,400
If your categories are vague, the agent will route unpredictably.
912
00:45:43,400 --> 00:45:45,800
And if your workflows can't log safely,
913
00:45:45,800 --> 00:45:49,400
you'll end up with PII sitting in run history like a self-inflicted breach,
914
00:45:49,400 --> 00:45:52,200
so you treat ticket triage as a controlled system.
915
00:45:52,200 --> 00:45:56,200
Bounded inputs, bounded tool calls, bounded outputs, and full evidence.
916
00:45:56,200 --> 00:45:58,600
You measure it with metrics that matter.
917
00:45:58,600 --> 00:46:01,400
Deflection rate versus resolution rate, escalation rate,
918
00:46:01,400 --> 00:46:03,800
re-open rate, and human touches per case.
919
00:46:03,800 --> 00:46:06,400
And you review overrides: when humans reclassify,
920
00:46:06,400 --> 00:46:08,400
when they reopen, when they change the lane.
921
00:46:08,400 --> 00:46:11,400
Those are drift signals: fix the mapping, tighten the knowledge scope,
922
00:46:11,400 --> 00:46:12,600
or add a gate.
923
00:46:12,600 --> 00:46:15,000
That's how you get deflection without chaos,
924
00:46:15,000 --> 00:46:16,400
not by trusting the model.
925
00:46:16,400 --> 00:46:19,600
By enforcing the workflow. Measuring ticket reduction:
926
00:46:19,600 --> 00:46:22,000
what actually drives the 44%.
927
00:46:22,000 --> 00:46:29,200
When someone claims we reduced HR tickets by 44%, the first job is to ask what they mean by reduced.
928
00:46:29,200 --> 00:46:31,800
Because there are two numbers that get confused on purpose.
929
00:46:31,800 --> 00:46:33,400
Deflection and resolution.
930
00:46:33,400 --> 00:46:36,200
Deflection means the employee never created a ticket,
931
00:46:36,200 --> 00:46:38,400
or the ticket never reached a human queue.
932
00:46:38,400 --> 00:46:41,800
Resolution means a ticket existed and eventually got closed.
933
00:46:41,800 --> 00:46:44,400
You can inflate one while the other quietly gets worse.
934
00:46:44,400 --> 00:46:47,400
If the agent answers in chat and employees still open tickets
935
00:46:47,400 --> 00:46:50,000
because they don't trust it, you didn't reduce anything.
936
00:46:50,000 --> 00:46:52,400
You just added a new front door to the same backlog.
937
00:46:52,400 --> 00:46:55,400
So the measurement model needs to start with a strict funnel.
938
00:46:55,400 --> 00:46:56,600
Intake volume.
939
00:46:56,600 --> 00:47:01,000
Normalized cases created, cases routed to a human lane, cases auto resolved,
940
00:47:01,000 --> 00:47:03,200
cases reopened, cases escalated,
941
00:47:03,200 --> 00:47:07,200
and then the one metric that exposes operational truth, human touches per case.
942
00:47:07,200 --> 00:47:12,400
If your AI triage still requires two humans to read, interpret, and forward the same request,
943
00:47:12,400 --> 00:47:14,800
you just replaced manual work with slower manual work.
944
00:47:14,800 --> 00:47:18,600
Now, what actually drives a ticket reduction claim in a system like this?
945
00:47:18,600 --> 00:47:20,800
It's not the language model being smarter.
946
00:47:20,800 --> 00:47:22,000
It's three levers.
947
00:47:22,000 --> 00:47:26,000
Classification quality, knowledge boundaries, and action capability.
948
00:47:26,000 --> 00:47:30,000
Classification quality comes first because it determines everything downstream.
949
00:47:30,000 --> 00:47:34,000
If the agent can reliably map requests into a small set of well-owned lanes
950
00:47:34,000 --> 00:47:35,800
you stop burning time on triage ping pong.
951
00:47:35,800 --> 00:47:39,400
You also reduce rework because the right team gets the case the first time.
952
00:47:39,400 --> 00:47:41,400
This is where you measure precision, not vibes.
953
00:47:41,400 --> 00:47:45,400
How often did the initial category match the final category after human review?
954
00:47:45,400 --> 00:47:47,000
How often did it hit the right queue?
955
00:47:47,000 --> 00:47:53,000
How often did the agent ask for one clarifying parameter instead of dumping the request into "Other"?
956
00:47:53,000 --> 00:47:54,800
Knowledge boundaries come next.
957
00:47:54,800 --> 00:47:57,400
People love to talk about RAG like it's a strategy.
958
00:47:57,400 --> 00:47:59,400
It isn't. RAG is a retrieval mechanism.
959
00:47:59,400 --> 00:48:03,600
The strategy is what sources are allowed, how they're filtered, and how they're versioned.
960
00:48:03,600 --> 00:48:07,800
If the agent can only ground answers in curated HR policy sources,
961
00:48:07,800 --> 00:48:11,000
by region, by employee type, by current effective date,
962
00:48:11,000 --> 00:48:16,000
you reduce the number of tickets created for "where is the policy" and "what's the process" questions.
963
00:48:16,000 --> 00:48:20,000
But if it retrieves from random SharePoint sites, you increase tickets,
964
00:48:20,000 --> 00:48:23,400
because employees will immediately escalate when they see conflicting answers.
965
00:48:23,400 --> 00:48:24,600
That's not an AI problem.
966
00:48:24,600 --> 00:48:27,200
That's your information architecture showing up in public.
967
00:48:27,200 --> 00:48:31,600
Action capability is the third lever, and it's the one that creates real deflection.
968
00:48:31,600 --> 00:48:33,600
Not answers. Actions.
969
00:48:33,600 --> 00:48:38,000
When the agent can complete a bounded low-risk task, submit a request, update a field,
970
00:48:38,000 --> 00:48:41,000
create a case with the right metadata, route it correctly,
971
00:48:41,000 --> 00:48:43,400
employees stop opening tickets as a workaround.
972
00:48:43,400 --> 00:48:48,400
This is also why Logic Apps and MCP tools matter. They turn chat into completion.
973
00:48:48,400 --> 00:48:50,400
Now here's where most people mess up the measurement.
974
00:48:50,400 --> 00:48:55,400
They report automation rate using the agent's activity logs, which mostly measure sessions.
975
00:48:55,400 --> 00:48:59,400
Sessions don't equal outcomes. You want outcome metrics that punish bad automation.
976
00:48:59,400 --> 00:49:04,400
So track deflection rate as percentage of intents that end without a human ticket touch.
977
00:49:04,400 --> 00:49:08,800
Track auto-resolve rate as percentage of cases closed with no human intervention.
978
00:49:08,800 --> 00:49:13,600
Track escalation rate as percentage that hit HITL gates or high-risk categories.
979
00:49:13,600 --> 00:49:17,800
Track reopen rate as percentage reopened within a defined window.
980
00:49:17,800 --> 00:49:23,000
Track satisfaction separately because low satisfaction with high deflection is just silent failure.
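A sketch of that funnel as code, with illustrative counts; note the inputs are case records, not session logs:

```python
# Outcome metrics that punish bad automation. Counts come from case
# records, not session logs; the numbers below are illustrative.
def triage_metrics(intents: int, cases: int, auto_resolved: int,
                   escalated: int, reopened: int, human_touches: int) -> dict:
    return {
        "deflection_rate": 1 - cases / intents,           # intents ending with no ticket
        "auto_resolve_rate": auto_resolved / cases,       # closed with no human intervention
        "escalation_rate": escalated / cases,             # hit HITL gates / high-risk lanes
        "reopen_rate": reopened / cases,                  # reopened within the defined window
        "human_touches_per_case": human_touches / cases,  # the operational-truth metric
    }

# With these toy counts, deflection_rate happens to come out at 0.44.
print(triage_metrics(intents=1000, cases=560, auto_resolved=310,
                     escalated=90, reopened=25, human_touches=410))
```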
981
00:49:23,000 --> 00:49:25,600
And then track the failure signals that predict collapse.
982
00:49:25,600 --> 00:49:31,000
If escalations rise week over week, classification is drifting or knowledge is stale.
983
00:49:31,000 --> 00:49:36,200
If satisfaction drops while deflection rises, employees are getting wrong answers but giving up.
984
00:49:36,200 --> 00:49:41,200
If retries and repeat questions increase, the agent is not completing tasks. It's looping.
985
00:49:41,200 --> 00:49:45,200
And if override frequency is near zero, you probably trained humans to defer,
986
00:49:45,200 --> 00:49:48,200
which means you're accumulating risk not reducing workload.
987
00:49:48,200 --> 00:49:52,200
When you connect those metrics back to design choices, the story becomes boring.
988
00:49:52,200 --> 00:49:54,200
Good, boring means deterministic.
989
00:49:54,200 --> 00:49:58,200
Confidence gates reduce wasted human time by routing only edge cases.
990
00:49:58,200 --> 00:50:03,200
Deterministic routing rules reduce triage ping-pong; knowledge scoping reduces conflicting guidance.
991
00:50:03,200 --> 00:50:08,200
Scripted tier-one actions eliminate "please forward this" workflows.
992
00:50:08,200 --> 00:50:12,800
The 44% is not magic. It's what happens when you stop treating HR support as a conversation
993
00:50:12,800 --> 00:50:14,800
and start treating it as a controlled system.
994
00:50:14,800 --> 00:50:17,800
And if you can't measure it with those levers, you don't have a reduction.
995
00:50:17,800 --> 00:50:21,200
You have marketing. Use case three: intelligent onboarding.
996
00:50:21,200 --> 00:50:23,200
Offer accepted to Day-30.
997
00:50:23,200 --> 00:50:27,200
Onboarding is where most agents go to die because onboarding is not a question and answer problem.
998
00:50:27,200 --> 00:50:31,800
It's a long running orchestration problem with dependencies, delays and irreversible side effects.
999
00:50:31,800 --> 00:50:36,800
And the trigger matters. If your onboarding starts because someone posted "new hire starting Monday" in Teams,
1000
00:50:36,800 --> 00:50:38,800
You don't have an onboarding workflow.
1001
00:50:38,800 --> 00:50:41,800
You have a ritual. The trigger needs to be an actual event.
1002
00:50:41,800 --> 00:50:46,800
Offer accepted in your ATS, a row created in your HRIS, a status change in a hiring system.
1003
00:50:46,800 --> 00:50:49,800
Something durable, something you can replay, something you can audit.
1004
00:50:49,800 --> 00:50:55,800
That event creates a case record in Dataverse with a datetime stamp, owner, region, role family and a stage checklist.
1005
00:50:55,800 --> 00:50:58,800
From there, Copilot Studio does the conversational work.
1006
00:50:58,800 --> 00:51:02,800
It fills in the missing parameters without turning onboarding into a form.
1007
00:51:02,800 --> 00:51:07,800
Start date, manager, location, equipment needs, any conditional parts like contractor versus employee.
1008
00:51:07,800 --> 00:51:11,800
And any regional policy flags that change what standard onboarding even means.
1009
00:51:11,800 --> 00:51:14,800
Then it hands execution to Logic Apps through MCP tools.
1010
00:51:14,800 --> 00:51:17,800
And this is where you stop being cute and start being reliable.
1011
00:51:17,800 --> 00:51:25,800
First, provisioning via Microsoft Graph: account creation, group membership, license assignment, mailbox provisioning, Teams enablement, whatever your org does.
1012
00:51:25,800 --> 00:51:30,800
The architectural point is not that Graph exists. The point is that these are write operations with blast radius.
1013
00:51:30,800 --> 00:51:34,800
So they are gated, idempotent and least privileged.
1014
00:51:34,800 --> 00:51:36,800
Provision identity is not one tool.
1015
00:51:36,800 --> 00:51:45,800
It's a tool group with separation: one tool to create the account, one tool to assign baseline groups, one tool to request elevated access, one tool to validate the final state.
1016
00:51:45,800 --> 00:51:49,800
Each tool runs under a managed identity with scoped permissions.
1017
00:51:49,800 --> 00:51:57,800
No shared connections, no "HR automation" service account that quietly becomes the most privileged identity in the tenant.
1018
00:51:57,800 --> 00:52:03,800
And yes, you put approvals where the damage is expensive. Assigning a standard M365 license might be automatic.
1019
00:52:03,800 --> 00:52:06,800
Assigning access to finance systems or HR systems should not be.
1020
00:52:06,800 --> 00:52:11,800
That's a HITL gate by risk category, not by feelings. Second, training assignments.
1021
00:52:11,800 --> 00:52:14,800
Most organizations handle this with an email and hope.
1022
00:52:14,800 --> 00:52:17,800
That's not onboarding, that's outsourcing accountability to Outlook.
1023
00:52:17,800 --> 00:52:24,800
So the agent creates structured training tasks, role-based learning paths, compliance modules and deadlines.
1024
00:52:24,800 --> 00:52:29,800
It stores each assignment as a stateful milestone in Dataverse: assignee, due date, completion status, evidence link.
1025
00:52:29,800 --> 00:52:37,800
The completion signal can come from your LMS connector, a file submission or a simple acknowledgement flow, but it must update state, not just send reminders.
1026
00:52:37,800 --> 00:52:44,800
Because if you can't query who is behind on onboarding milestones, you don't have an onboarding system, you have noise.
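A sketch of milestones as queryable state, with an illustrative schema; completion updates the record, and "who is behind?" becomes a one-line query:

```python
# Milestones as queryable state, not reminders. A completion signal
# updates the record; the schema and values are illustrative.
from dataclasses import dataclass
from datetime import date

@dataclass
class Milestone:
    case_id: str
    name: str                  # e.g. "compliance_training"
    owner: str
    due: date
    status: str = "assigned"   # "assigned" | "complete"
    evidence_link: str = ""

def mark_complete(m: Milestone, evidence_link: str) -> None:
    m.status = "complete"
    m.evidence_link = evidence_link   # LMS record, file submission, acknowledgement

def behind(milestones: list, today: date) -> list:
    return [m for m in milestones if m.status != "complete" and m.due < today]

ms = [Milestone("case-7", "compliance_training", "new.hire@contoso.com", date(2025, 1, 15))]
print([m.name for m in behind(ms, date(2025, 2, 1))])   # ['compliance_training']
```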
1027
00:52:44,800 --> 00:52:49,800
Third, scheduling check-ins. This is where agents look impressive in demos and disappointing in reality.
1028
00:52:49,800 --> 00:52:52,800
Scheduling isn't hard, scheduling with constraints is hard.
1029
00:52:52,800 --> 00:52:55,800
So you treat check-ins as tasks, not calendar magic.
1030
00:52:55,800 --> 00:53:02,800
The agent creates manager tasks: schedule Day-7 check-in, schedule Day-30 review, confirm equipment received, confirm access works.
1031
00:53:02,800 --> 00:53:06,800
If you can automate calendar invites through Graph, fine.
1032
00:53:06,800 --> 00:53:11,800
But the system still tracks completion as state, because calendar invites do not equal completed onboarding.
1033
00:53:11,800 --> 00:53:17,800
And now the part people get wrong: adaptive nudges. This is not surveillance. It is not "monitor their Teams messages."
1034
00:53:17,800 --> 00:53:24,800
You don't need psychometrics, you need basic workflow signals: training incomplete, no check-in scheduled, provisioning failed, equipment ticket unresolved.
1035
00:53:24,800 --> 00:53:27,800
Those are objective indicators that onboarding is drifting.
1036
00:53:27,800 --> 00:53:31,800
So the agent watches milestone state. If a milestone is late, it nudges the owner.
1037
00:53:31,800 --> 00:53:35,800
The manager, HR ops, IT onboarding, whoever owns that step.
1038
00:53:35,800 --> 00:53:39,800
And it nudges with context: what's missing, what tool call failed, what the next action is.
1039
00:53:39,800 --> 00:53:44,800
Not "hey, just checking in." That's how you create notification fatigue, and then everyone turns it off.
1040
00:53:44,800 --> 00:53:48,800
Now the reason this use case matters is that it exposes orchestration reality.
1041
00:53:48,800 --> 00:53:51,800
Provisioning actions can't run twice without causing damage.
1042
00:53:51,800 --> 00:53:55,800
Training assignments can't get duplicated because the employee changes managers.
1043
00:53:55,800 --> 00:53:58,800
Check-ins can't be best effort if you want consistent time to productivity.
1044
00:53:58,800 --> 00:54:02,800
So the workflow must be durable across days. It must survive connector timeouts.
1045
00:54:02,800 --> 00:54:06,800
It must resume after human approvals. It must log every state transition.
1046
00:54:06,800 --> 00:54:11,800
And it must be able to answer at any time. What stage is this person in? What has been done?
1047
00:54:11,800 --> 00:54:15,800
What is blocked and who owns the block? That's what offer accepted to Day-30 actually means.
1048
00:54:15,800 --> 00:54:23,800
Not a chatbot that says "welcome to the company." A governed system that turns onboarding into a deterministic sequence of actions with evidence.
1049
00:54:23,800 --> 00:54:27,800
And if you can do that, you've proven the platform can handle the worst kind of HR automation.
1050
00:54:27,800 --> 00:54:30,800
Long running, cross system and full of edge cases.
1051
00:54:30,800 --> 00:54:33,800
Now we deal with the uncomfortable mechanics that make it survive.
1052
00:54:33,800 --> 00:54:36,800
State, retries, identity and drift.
1053
00:54:36,800 --> 00:54:40,800
Orchestration reality: reliability patterns for agentic HR.
1054
00:54:40,800 --> 00:54:42,800
Now it gets unglamorous.
1055
00:54:42,800 --> 00:54:47,800
Onboarding, ticket routing and screening workflows don't fail because the model hallucinates.
1056
00:54:47,800 --> 00:54:50,800
They fail because distributed systems behave like distributed systems.
1057
00:54:50,800 --> 00:54:54,800
Partial failure, duplicate events, timeouts, retries and human delays.
1058
00:54:54,800 --> 00:54:58,800
And the moment you let an LLM sit on top of that without hard reliability patterns,
1059
00:54:58,800 --> 00:55:02,800
you've built a probabilistic control plane for deterministic work.
1060
00:55:02,800 --> 00:55:06,800
So start with idempotency, because HR workflows love doing the same damage twice.
1061
00:55:06,800 --> 00:55:08,800
Provisioning is the obvious example.
1062
00:55:08,800 --> 00:55:12,800
If create user runs twice, you don't get two users. You get a mess.
1063
00:55:12,800 --> 00:55:17,800
Conflicting objects, partial licenses, inconsistent group membership and cleanup work that becomes a shadow project.
1064
00:55:17,800 --> 00:55:23,800
The fix is not "be careful." The fix is to design every write tool to be idempotent by default.
1065
00:55:23,800 --> 00:55:27,800
Accept a stable key. Check current state first and only apply deltas.
1066
00:55:27,800 --> 00:55:32,800
That stable key should be something like a candidate ID, employee ID or a composite key you control.
1067
00:55:32,800 --> 00:55:37,800
Not whatever name came from the resume. Names are not identifiers; they're noise.
1068
00:55:37,800 --> 00:55:42,800
So each tool call needs a precondition check. Does the record already exist? Is it in the expected state?
1069
00:55:42,800 --> 00:55:45,800
And is this transition allowed? If the answer is no, you stop and escalate.
1070
00:55:45,800 --> 00:55:49,800
The agent doesn't improvise "close enough" state transitions. It can't see the consequences.
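A sketch of an idempotent-by-default write tool; the state store and allowed transitions are illustrative stand-ins for reads against the real system:

```python
# Idempotent-by-default write tool: stable key, read state first, apply
# deltas only, refuse disallowed transitions. STATE is an illustrative
# stand-in for current-state lookups against the real system.
STATE = {}

ALLOWED = {("absent", "created"), ("created", "licensed")}

def provision(stable_key: str, target_state: str) -> str:
    current = STATE.get(stable_key, "absent")
    if current == target_state:
        return "no_op"                       # safe retry: nothing to apply
    if (current, target_state) not in ALLOWED:
        return "blocked_escalate"            # don't improvise "close enough"
    STATE[stable_key] = target_state         # apply only the delta
    return "applied"

print(provision("employee-1043", "created"))   # 'applied'
print(provision("employee-1043", "created"))   # 'no_op' -- duplicate event is harmless
```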
1071
00:55:49,800 --> 00:55:52,800
Next is retries and dead lettering. Retries are not reliability.
1072
00:55:52,800 --> 00:55:56,800
Retries are how you turn transient failures into duplicate side effects.
1073
00:55:56,800 --> 00:56:02,800
That's why you pair retries with idempotency and why you distinguish transient errors from business rule failures.
1074
00:56:02,800 --> 00:56:08,800
A connector timeout? Retry with backoff. A 429 throttle? Retry with backoff and jitter.
1075
00:56:08,800 --> 00:56:12,800
A validation failure, like a missing manager ID or a region policy mismatch?
1076
00:56:12,800 --> 00:56:17,800
Don't retry. Create a deterministic failure state, route to HITL and wait for human input.
1077
00:56:17,800 --> 00:56:20,800
And when retries finally give up, you need a dead letter path.
1078
00:56:20,800 --> 00:56:32,800
Not "send an email to the builder." An actual queue or case state that says: this workflow instance is blocked, here's the error, here's the context package, and here's who owns the fix. HR doesn't need heroics. It needs predictable recovery.
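A sketch of that separation; the error classes and owner fields are illustrative, and in the real stack the retry policy would typically live in Logic Apps configuration rather than application code:

```python
# Transient vs. business failures: back off and retry the first,
# dead-letter the second with a context package. Names are illustrative.
import random
import time

class TransientError(Exception): ...     # timeout, 429 throttle
class BusinessRuleError(Exception): ...  # missing manager ID, policy mismatch

DEAD_LETTER = []   # stand-in for a real dead-letter queue or case state

def run_with_recovery(step, case_id: str, max_attempts: int = 4):
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except TransientError:
            if attempt == max_attempts:
                break
            time.sleep(min(30, 2 ** attempt) + random.random())  # backoff + jitter
        except BusinessRuleError as e:
            DEAD_LETTER.append({"case_id": case_id, "error": str(e),
                                "owner": "hr-ops", "state": "blocked_awaiting_human"})
            return None                  # deterministic failure state, no retry
    DEAD_LETTER.append({"case_id": case_id, "error": "retries_exhausted",
                        "owner": "platform-team", "state": "blocked"})
    return None
```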
1079
00:56:32,800 --> 00:56:36,800
Now here's where most people mess up the architecture. They couple protocol to processing.
1080
00:56:36,800 --> 00:56:44,800
They build HTTP request-response flows that do heavy work inline because the demo looks clean. But HR work isn't always fast and it's rarely reliable end to end.
1081
00:56:44,800 --> 00:56:52,800
So split it. Use request-response only to acknowledge receipt and create the durable case record. Then push the heavy work into asynchronous processing.
1082
00:56:52,800 --> 00:57:02,800
Queues, long-running orchestrations and stepwise state transitions. That's how you avoid timeouts, how you absorb spikes and how you stop the client from becoming the reliability boundary. And yes, HR has spikes.
1083
00:57:02,800 --> 00:57:11,800
Monday mornings. Open enrollment. New hire waves, policy changes. The system doesn't care that your workload is seasonal. It will fail anyway. Design for it.
1084
00:57:11,800 --> 00:57:21,800
Tool throttles and connector limits are the other silent killer. Every connector has quotas, latency variation and failure modes that show up only when you scale. Your demo will work.
1085
00:57:21,800 --> 00:57:30,800
Your production run will hit rate limits and stall mid-process, leaving partially completed onboarding or half-routed tickets. So you need an explicit throttling strategy.
1086
00:57:30,800 --> 00:57:40,800
Limit concurrency, batch where appropriate, and prefer event-driven triggers over constant polling when the source supports it. Where it doesn't, you still design for duplicate events, because polling will deliver them.
1087
00:57:40,800 --> 00:57:48,800
Now, state ownership. Copilot Studio is not a system of record. The chat history is not a system of record. Logic Apps run history is not a system of record.
1088
00:57:48,800 --> 00:58:03,800
Dataverse is your durable workflow state. The case, the milestones, the approval objects, the artifacts, the correlation IDs. That means every long running workflow reads state before acting, writes state after acting and treats state transitions as the source of truth.
1089
00:58:03,800 --> 00:58:14,800
The agent restarts, it doesn't remember, it rehydrates from Dataverse and continues. That's what makes it survivable. And finally, drift. The workflow you build today is not the workflow you'll run in six months.
1090
00:58:14,800 --> 00:58:22,800
Someone will change a rubric, someone will add a ticket category, someone will update onboarding steps. Entropy accumulates. So make drift explicit.
1091
00:58:22,800 --> 00:58:33,800
Version your rubrics, version your tool schemas and store the version used on every case record. If you can't say "this decision used rubric v3," you can't explain behavior changes without rewriting history.
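A sketch of what "store the version used" means at the record level; names are illustrative:

```python
# Drift made explicit: every case records the rubric and tool-schema
# versions it ran under. A behavior change then has a traceable cause.
from dataclasses import dataclass

@dataclass(frozen=True)
class DecisionRecord:
    case_id: str
    rubric_version: str        # e.g. "rubric-v3"
    tool_schema_version: str   # e.g. "screening-tools-v1.4"
    outcome: str

rec = DecisionRecord("case-88", "rubric-v3", "screening-tools-v1.4", "shortlisted")
print(f"{rec.case_id}: decided under {rec.rubric_version}")
```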
1092
00:58:33,800 --> 00:58:44,800
Reliability in agentic HR is not uptime. It's controlled repetition: safe retries, safe duplicates, safe delays and recoverable failure. Without that, your agent doesn't scale, it just fails faster.
1093
00:58:44,800 --> 00:59:00,800
Reproducible blueprint: build order and guardrails. Now the build plan. Not a journey, a sequence that produces working outcomes, with guardrails that stop you from creating a clever prototype that collapses the first time HR hits a busy Monday. Start with build order and accept the risk gradient.
1094
00:59:00,800 --> 00:59:14,800
Ticket triage goes first. It's high volume, mostly reversible, and it forces you to build the control plane: case normalization, deterministic routing, knowledge boundaries and HITL for sensitive categories. If your stack can't survive triage, it won't survive hiring.
1095
00:59:14,800 --> 00:59:24,800
Onboarding goes second. It teaches durability: long-running state, retries without double provisioning, approval waits and the unpleasant reality that connectors fail more often in production than in demos.
1096
00:59:24,800 --> 00:59:38,800
If you can't do idempotency and state ownership here, you're not ready to let the agent touch identity. Candidate screening goes last, not because it's hard technically but because it's hard operationally. It's where bias and explainability get externalized into audits, complaints and litigation risk.
1097
00:59:38,800 --> 00:59:49,800
You build it after you already know your logging, gating and identity boundaries work. Now, the environment model. If you build in one environment and publish to prod by exporting a zip file, you don't have an architecture. You have a hobby.
1098
00:59:49,800 --> 00:59:59,800
You need dev, test and prod with policy parity: same connectors, same network constraints, same Key Vault patterns, same logging settings. Different data, same controls.
1099
00:59:59,800 --> 01:00:08,800
That's the only way you stop "it worked in my tenant" from becoming your post-incident slogan. And yes, that includes Copilot Studio environments and Power Platform governance, not just Azure.
1100
01:00:08,800 --> 01:00:17,800
Agents drift when environments drift; you want drift to be intentional, versioned and reviewed. Identity model next, because this is where HR agents quietly become security debt.
1101
01:00:17,800 --> 01:00:28,800
Every workflow tool gets its own managed identity. Not one identity for "HR automation," not one shared connection that accumulates privileges over time. Per tool, per workflow, scoped permissions.
1102
01:00:28,800 --> 01:00:36,800
And where you need user context, you use on-behalf-of patterns deliberately, not accidentally. Then you separate roles across the human side.
1103
01:00:36,800 --> 01:00:48,800
Recruiters, HR ops and platform admins: recruiters can initiate and review, HR ops can approve policy exceptions, platform admins can expand tool access. Nobody gets the combined power to change the tool set and approve the outcomes.
1104
01:00:48,800 --> 01:00:56,800
That's not governance, that's a future incident with a single name on it. Now, the data model. This is where teams sabotage themselves by stuffing PII into prompts because it's convenient.
Don't. Minimal PII in prompts. Use IDs and retrieval with permissions. The agent should reason over structured fields and reference documents, not raw resumes and medical notes pasted into a chat window. In practice, that means Dataverse holds case state and structured artifacts: rubric scores, ticket categories, onboarding milestones, approval objects, and evidence links. SharePoint holds documents with metadata. Logic Apps tools retrieve what's needed, when it's needed, under an identity boundary. The chat layer stays thin.
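A sketch of what a thin, ID-based prompt context can look like; the case store and field names are illustrative stand-ins for Dataverse:

```python
# Sketch: the prompt carries IDs and structured fields, never the raw documents.
# Retrieval happens tool-side, under the tool's identity, only when needed.
def build_prompt_context(case_id: str, case_store: dict) -> str:
    case = case_store[case_id]
    # Structured, minimal fields only: no resume text, no free-form notes.
    return (
        f"case_id: {case_id}\n"
        f"category: {case['category']}\n"
        f"rubric_scores: {case['rubric_scores']}\n"
        f"document_ids: {case['document_ids']}  (resolved by tools on demand)\n"
    )

cases = {"CASE-9": {"category": "screening",
                    "rubric_scores": {"experience": 4, "skills": 3},
                    "document_ids": ["DOC-123"]}}
print(build_prompt_context("CASE-9", cases))
```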
Guardrails come next: the non-negotiables that prevent capability sprawl. One: define the action space upfront. List the allowed tools, the allowed operations, and the prohibited actions. "Can read policies" is not the same as "can update HRIS records." "Can schedule interviews" is not the same as "can generate and send offer letters without review." If you don't define the action space, the agent will expand it through requests, exceptions, and maker creativity. Entropy always wins.
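A minimal sketch of a closed action space, assuming illustrative tool names; the point is that anything unlisted is prohibited by construction:

```python
# Sketch: the action space is a closed allow-list checked before any tool runs.
ACTION_SPACE = {
    "read_policy":        {"writes": False, "needs_approval": False},
    "schedule_interview": {"writes": True,  "needs_approval": False},
    "send_offer_letter":  {"writes": True,  "needs_approval": True},  # never unreviewed
}

def gate_action(action: str, approved: bool = False) -> bool:
    spec = ACTION_SPACE.get(action)
    if spec is None:
        raise PermissionError(f"'{action}' is outside the defined action space")
    if spec["needs_approval"] and not approved:
        raise PermissionError(f"'{action}' requires a recorded approval first")
    return True

gate_action("read_policy")                       # fine
gate_action("send_offer_letter", approved=True)  # only after a human decision
```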
Two: implement confidence gates and risk gates as defaults. Low-confidence classifications route to humans. High-risk categories route to humans. Any write to identity or access routes through approvals. You don't negotiate this in production. You implement it as the system's behavior.
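A sketch of those defaults as code; the threshold and category list are policy inputs I've made up for illustration, which is exactly why they should be explicit, reviewable values:

```python
# Sketch of default gating: thresholds and category lists are policy inputs,
# not model behavior, so they are explicit values you can review and version.
HIGH_RISK = {"terminations", "medical", "compensation"}
CONFIDENCE_FLOOR = 0.80

def route(category: str, confidence: float, writes_identity: bool) -> str:
    if writes_identity:
        return "approval_required"   # any write to identity/access goes to humans
    if category in HIGH_RISK:
        return "human_review"        # high-risk categories never auto-resolve
    if confidence < CONFIDENCE_FLOOR:
        return "human_review"        # low confidence routes to humans
    return "auto_resolve"

print(route("benefits", 0.92, writes_identity=False))  # auto_resolve
print(route("medical", 0.99, writes_identity=False))   # human_review
print(route("benefits", 0.92, writes_identity=True))   # approval_required
```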
Three: observability is mandatory before scale. Correlation IDs across Copilot Studio, MCP, Logic Apps, and Dataverse. Secured inputs and outputs where PII exists. Retention as a policy decision. If you can't reconstruct a case end to end, you don't scale. You stop.
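A sketch of the correlation pattern: one ID minted per case and attached to every structured log line, so reconstruction becomes a query on one key. The layer names and fields are illustrative:

```python
# Sketch: one correlation ID attached to every log line across layers.
import json
import uuid
from datetime import datetime, timezone

def new_correlation_id() -> str:
    return str(uuid.uuid4())

def log_event(correlation_id: str, layer: str, event: str, **fields) -> None:
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "correlation_id": correlation_id,   # the join key across all layers
        "layer": layer,                     # copilot | mcp | logic_apps | dataverse
        "event": event,
        **fields,                           # IDs and versions only, never raw PII
    }
    print(json.dumps(record))               # stand-in for the real log sink

cid = new_correlation_id()
log_event(cid, "copilot", "classified", category="benefits", confidence=0.91)
log_event(cid, "logic_apps", "tool_called", tool="resolve_ticket", rubric_version="v3")
```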
And now the governance rhythm, because governance is not a document. It's a calendar. Monthly: review KPIs that expose failure, not vanity adoption metrics. Escalation rates, override frequency, reopen rates, unresolved sessions, and latency. Quarterly: sample decisions for fairness and consistency in screening, and run outcome checks where you have the data to do it. And always: incident playbooks that include a kill switch. Disable tools fast, restore after review. That's how you prevent a bad change from becoming a month-long cleanup.
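A kill-switch sketch: tool availability as a flag read on every call, so disabling a tool is a data change, not a redeployment. The flag store here is a stand-in for wherever you hold runtime configuration:

```python
# Kill-switch sketch: the flag dict stands in for a runtime configuration store.
TOOL_ENABLED = {"resolve_ticket": True, "provision_access": True}

def call_tool(tool: str, payload: dict) -> dict:
    if not TOOL_ENABLED.get(tool, False):      # default-deny unknown tools
        return {"status": "halted", "reason": f"{tool} disabled pending review"}
    return {"status": "ok", "tool": tool}      # placeholder for the real call

TOOL_ENABLED["provision_access"] = False       # incident: disable fast
print(call_tool("provision_access", {}))       # halted, not failing silently
```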
If you implement this blueprint, you get something rare: an HR agent system that behaves like an enterprise system. Deterministic where it must be, probabilistic only where humans can correct it. Governed by design, not by hope.
ROI and KPIs: The Only Numbers That Matter
Now the numbers. Not because spreadsheets are fun, but because agentic HR without measurement becomes a budget line item with no defensibility.
And the numbers that matter aren't the ones vendors put on slides. They're the ones that tie directly to throughput, risk, and human time. Start with the ROI components. In this architecture, value comes from three places: fewer human touches per case, faster cycle times, and fewer exception loops. Ticket triage drives the first. Onboarding drives the second. Candidate screening drives the third.
But only if you treat it as structured scoring plus review, not an automated decision engine. Then the cost components. Licensing is the obvious line, but it's never the dominant cost in production. Build effort matters, but it's one-time. Governance and monitoring are the ongoing costs people forget to price in: Log Analytics retention, audit evidence storage, quarterly sampling, bias checks, and incident response practice.
If you don't fund those, you aren't saving money. You're deferring it into a compliance event. So the model is simple. Value is reclaimed hours times loaded cost, plus any measurable reduction in external spend (like agency costs or contractor triage support), plus any cycle-time improvement that has a business impact. Costs are licenses, build, run, and governance overhead. Nothing else counts. Here's the math framing you can actually use. Pick one workflow, measure the baseline, then measure after. For ticket triage: average handle time per ticket, total ticket volume, percent auto-resolved, and reopen rate.
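A worked sketch of that model for ticket triage. Every number below is a made-up example; only the structure (reclaimed hours times loaded cost versus license, build, run, and governance costs) comes from the text:

```python
# Worked ROI sketch; all figures are illustrative placeholders.
baseline_touches, after_touches = 3.0, 1.2     # human touches per case
cases_per_month = 2_000
minutes_per_touch = 9
loaded_cost_per_hour = 55.0                    # fully loaded HR ops cost

reclaimed_hours = (baseline_touches - after_touches) * cases_per_month * minutes_per_touch / 60
monthly_value = reclaimed_hours * loaded_cost_per_hour

monthly_costs = sum([
    1_500.0,   # licensing
    2_000.0,   # build effort, amortized
    800.0,     # run: hosting, retries, storage
    1_200.0,   # governance: log retention, sampling, bias checks, IR practice
])

print(f"reclaimed hours/month: {reclaimed_hours:.0f}")       # 540
print(f"net monthly value: {monthly_value - monthly_costs:,.0f}")  # 24,200
```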
If you reduce human touches, you reclaim hours. If you reduce the reopen rate, you reduce second-pass work. If you reduce time to first response, you reduce escalations and executive noise. That's ROI without fantasy. For onboarding, measure time from offer accepted to Day-1 ready, plus the number of provisioning failures per hire, plus the number of manual follow-ups required to complete milestones.
If you cut missed access and reduce rework, new hires become productive sooner. That's the only onboarding ROI that matters: time to productivity, not the number of welcome emails sent. For candidate screening, measure screening time per candidate, the percentage of borderline escalations, and review variance. If your structured rubric cuts screening time while increasing consistency, you get measurable savings and a defensibility upgrade.
If it just moves work into "review the AI score," you didn't automate anything. You added a new step. Now the KPIs. Keep them ruthless. For ticket triage: deflection rate, auto-resolved percent, time to first response, reopen rate, escalation rate, and human touches per case. Track override frequency too, because it tells you when classification is drifting or when the knowledge boundary is wrong. For onboarding: percent of hires Day-1 ready, provisioning retry counts, idempotency violations, milestone completion times, and number of late milestones per hire. Late milestones are the real indicator of systemic failure, because they correlate with manager frustration and new-hire churn.
For screening: rubric completion rate, borderline rate, approval turnaround time, override rate by reviewer, and consistency across rubric versions. And you need at least one fairness signal. Not a legal conclusion. A monitoring indicator that tells you "selection rates changed materially when we changed rubric v2 to v3."
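A monitoring-indicator sketch, not a legal test: compare selection rates per group across rubric versions and flag material shifts for human review. The group labels and the 20-point threshold are illustrative policy inputs:

```python
# Fairness-signal sketch: a drift detector for review, not a legal conclusion.
def selection_rates(decisions: list[tuple[str, bool]]) -> dict[str, float]:
    totals, selected = {}, {}
    for group, was_selected in decisions:
        totals[group] = totals.get(group, 0) + 1
        selected[group] = selected.get(group, 0) + int(was_selected)
    return {g: selected[g] / totals[g] for g in totals}

def material_shifts(v2: dict, v3: dict, threshold: float = 0.20) -> list[str]:
    return [g for g in v2 if g in v3 and abs(v3[g] - v2[g]) > threshold]

v2 = selection_rates([("A", True), ("A", True), ("B", True), ("B", False)])
v3 = selection_rates([("A", True), ("A", True), ("B", False), ("B", False)])
print(material_shifts(v2, v3))  # ["B"] -> route to the quarterly fairness review
```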
If you can't detect that, you can't claim you're managing bias. Finally, the executive narrative. Scale doesn't come from smarter AI. Scale comes from fewer exceptions. Every exception is an entropy generator: a one-off rule, a manual override, a hidden permission, a side channel. Your agent gets better when your system gets tighter. That's the uncomfortable truth behind every successful deployment.
So when someone asks "what's the ROI?", the answer is not "the model is amazing." The answer is: we reduced human touches, shortened cycle times, and made high-risk decisions auditable by design. So that's the transformation.
Governed, agentic workflows turn HR from a reactive queue into a controlled system that can actually scale. Here's the challenge. Next week, pick one workflow you already have pain around. Ticket triage is usually the best start. Add one confidence gate that forces a human decision when risk crosses your line. And add one evidence record in Dataverse that lets you reconstruct what happened without digging through chats.
So: for more blueprints like this, Copilot Studio as the brain, Logic Apps Standard as the muscle, and governance as the control plane, subscribe to the M365 FM podcast. And if you've got a real HR agent failure mode you want dissected, connect with Mirko Peters on LinkedIn and send it.