Dotnet - Add support for Foundry Adaptive evals #6267
Open: alliscode wants to merge 12 commits into microsoft:main from alliscode:dotnet-adaptive-evals (+2,001 −38).

Commits (all by alliscode):

- `ea1c00d` .NET: feat(evals): RubricScore type + EvalScoreResult.Dimensions
- `e29eae8` .NET: feat(evals): GeneratedEvaluatorRef + assertion helpers
- `c01b392` .NET: feat(foundry-evals): accept GeneratedEvaluatorRef in evaluators=
- `55e829a` .NET: feat(foundry-evals): parse rubric dimension_scores into RubricS…
- `33f064d` .NET: feat(samples): Evaluation_FoundryRubric end-to-end sample
- `3a63442` fix(foundry-evals): harden FoundryEvals public surface for review
- `d23c2d9` fix(sample): set ExitCode=1 when rubric dimension gate trips
- `501af85` fix(foundry-evals): search typed Sample directly for rubric scores
- `db25a71` test(evals): cover assert_score_at_least and assert_no_failed_items
- `984967f` docs(samples): remove dead rubric-evaluator doc link from FoundryRubr…
- `25b40a3` Potential fix for pull request finding
- `3c4e20d` Address PR 6267 review nits
`...samples/05-end-to-end/Evaluation/Evaluation_FoundryRubric/Evaluation_FoundryRubric.csproj` (new file, 15 additions, 0 deletions)

```xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFrameworks>net10.0</TargetFrameworks>

    <Nullable>enable</Nullable>
    <ImplicitUsings>enable</ImplicitUsings>
  </PropertyGroup>

  <ItemGroup>
    <ProjectReference Include="..\..\..\..\src\Microsoft.Agents.AI.Foundry\Microsoft.Agents.AI.Foundry.csproj" />
  </ItemGroup>

</Project>
```
`dotnet/samples/05-end-to-end/Evaluation/Evaluation_FoundryRubric/Program.cs` (new file, 141 additions, 0 deletions)

```csharp
// Copyright (c) Microsoft. All rights reserved.

// This sample evaluates a pre-existing Azure AI Foundry agent against a rubric evaluator
// that was authored in the Foundry portal.
//
// Rubric evaluators are LLM-as-judge evaluators with custom scoring dimensions you define
// for your domain. agent-framework consumes pre-existing rubric evaluators: they are
// authored in the Foundry portal (or via the dedicated SDK / REST surface) and referenced
// here by name and version.
//
// Prerequisites:
//   - An Azure AI Foundry project with a deployed model.
//   - A registered Foundry agent in that project (the rubric was created against this agent).
//   - A rubric evaluator already created in the Foundry portal.
//   - The FOUNDRY_* environment variables below set in your shell (this sample reads them
//     with Environment.GetEnvironmentVariable; it does not load a .env file itself).
//
// IMPORTANT: FOUNDRY_PROJECT_ENDPOINT must be the project-scoped URL
//   https://<resource>.services.ai.azure.com/api/projects/<project>
// With a bare Azure OpenAI endpoint, eval submission fails with a generic HTTP 500
// instead of a descriptive error.

using Azure.AI.Projects;
using Azure.AI.Projects.Agents;
using Azure.Identity;
using Microsoft.Agents.AI;
using Microsoft.Agents.AI.Foundry;
using FoundryEvals = Microsoft.Agents.AI.Foundry.FoundryEvals;

string projectEndpoint = Environment.GetEnvironmentVariable("FOUNDRY_PROJECT_ENDPOINT")
    ?? throw new InvalidOperationException("FOUNDRY_PROJECT_ENDPOINT is not set.");
string model = Environment.GetEnvironmentVariable("FOUNDRY_MODEL")
    ?? throw new InvalidOperationException("FOUNDRY_MODEL is not set.");
string agentName = Environment.GetEnvironmentVariable("FOUNDRY_AGENT_NAME")
    ?? throw new InvalidOperationException("FOUNDRY_AGENT_NAME is not set.");
string? agentVersion = Environment.GetEnvironmentVariable("FOUNDRY_AGENT_VERSION");
string rubricName = Environment.GetEnvironmentVariable("FOUNDRY_RUBRIC_NAME")
    ?? throw new InvalidOperationException("FOUNDRY_RUBRIC_NAME is not set.");
string? rubricVersion = Environment.GetEnvironmentVariable("FOUNDRY_RUBRIC_VERSION");

// WARNING: DefaultAzureCredential is convenient for development but requires careful
// consideration in production. Prefer ManagedIdentityCredential (or a specific credential)
// to avoid latency, unintended credential probing, and fallback security risks.
AIProjectClient projectClient = new(new Uri(projectEndpoint), new DefaultAzureCredential());

// 1. Connect to the pre-existing Foundry agent the rubric was created against.
FoundryAgent agent;
if (agentVersion is null)
{
    ProjectsAgentRecord agentRecord = await projectClient.AgentAdministrationClient.GetAgentAsync(agentName);
    agent = projectClient.AsAIAgent(agentRecord);
}
else
{
    ProjectsAgentVersion versionRecord = await projectClient.AgentAdministrationClient.GetAgentVersionAsync(agentName, agentVersion);
    agent = projectClient.AsAIAgent(versionRecord);
}

// 2. Reference the pre-existing rubric evaluator by name + version.
//    Always pin a version for reproducible CI runs; a versionless ref resolves to the
//    current version at run time and emits a Trace.TraceWarning on each criterion build.
GeneratedEvaluatorRef rubric = rubricVersion is null
    ? GeneratedEvaluatorRef.Latest(rubricName)
    : new GeneratedEvaluatorRef(rubricName, rubricVersion);

// 3. Mix the rubric with built-in evaluators in a single FoundryEvals config.
//    The implicit conversion lets you pass strings and refs interchangeably.
FoundryEvals evals = new(
    projectClient,
    model,
    rubric,
    FoundryEvals.Relevance,
    FoundryEvals.Coherence);

// 4. Run two example queries against the agent and evaluate the outputs in one call.
string[] queries =
[
    "What's the weather like in Seattle?",
    "Should I bring an umbrella to London tomorrow?",
];

Console.WriteLine(new string('=', 60));
Console.WriteLine($"Evaluating '{agent.Name}' with rubric '{rubricName}' (version {rubricVersion ?? "latest"})");
Console.WriteLine(new string('=', 60));

AgentEvaluationResults results = await agent.EvaluateAsync(queries, evals);

Console.WriteLine($"Status: {results.Status}");
Console.WriteLine($"Results: {results.Passed}/{results.Total} passed");
if (results.ReportUrl is not null)
{
    Console.WriteLine($"Portal: {results.ReportUrl}");
}

Console.WriteLine(results.Passed == results.Total ? "[PASS] All passed" : $"[FAIL] {results.Failed} failed");

// 5. Print the per-dimension breakdown for each evaluated item: this is the unique value
//    of a rubric evaluator over the built-in numeric ones.
Console.WriteLine();
Console.WriteLine(new string('=', 60));
Console.WriteLine("Per-dimension scores");
Console.WriteLine(new string('=', 60));

if (results.DetailedItems is { Count: > 0 })
{
    for (int i = 0; i < results.DetailedItems.Count; i++)
    {
        EvalItemResult item = results.DetailedItems[i];
        Console.WriteLine($"Item {i + 1}{(i < queries.Length ? $" — \"{queries[i]}\"" : string.Empty)}");

        foreach (EvalScoreResult score in item.Scores)
        {
            Console.WriteLine($"  {score.Name}: {score.Score:F1}{(score.Passed is bool p ? (p ? " (pass)" : " (fail)") : string.Empty)}");
            if (score.Dimensions is { Count: > 0 } dims)
            {
                foreach (RubricScore d in dims)
                {
                    string scoreStr = d.Score is int s ? s.ToString() : "n/a";
                    Console.WriteLine($"    - {d.Id}: {scoreStr} (weight={d.Weight}, applicable={d.Applicable})");
                }
            }
        }

        Console.WriteLine();
    }
}

// 6. CI quality gate: fail the build if a critical dimension drops below threshold.
//    Replace "general_quality" with whatever dimension id your rubric actually defines.
Console.WriteLine(new string('=', 60));
Console.WriteLine("Per-dimension quality gate");
Console.WriteLine(new string('=', 60));

try
{
    results.AssertDimensionScoreAtLeast("general_quality", minScore: 3.0, evaluator: rubricName, requireApplicable: true);
    Console.WriteLine($"[PASS] {results.ProviderName}: general_quality >= 3 on every item");
}
catch (InvalidOperationException ex)
{
    Console.WriteLine($"[FAIL] {results.ProviderName}: dimension gate tripped: {ex.Message}");
    Environment.ExitCode = 1;
}
```
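Step 6 gates CI on `AssertDimensionScoreAtLeast("general_quality", minScore: 3.0, evaluator: rubricName, requireApplicable: true)`. To make the threshold easier to reason about, here is a minimal, library-free sketch of the kind of check that call performs. It is a hypothetical restatement, not the agent-framework implementation, and `AssertDimensionAtLeast` is an illustrative name. Assumptions: each item's rubric output is flattened into `(Evaluator, DimensionId, Score, Applicable)` tuples; a missing or unscored applicable dimension counts as a failure; a non-applicable dimension fails only when `requireApplicable` is true; and all failures are reported together in one `InvalidOperationException`, the exception type the sample catches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, library-free restatement of a per-dimension gate (NOT the agent-framework
// implementation). Each item is a list of (Evaluator, DimensionId, Score, Applicable) rows.
static void AssertDimensionAtLeast(
    IReadOnlyList<IReadOnlyList<(string Evaluator, string DimensionId, int? Score, bool Applicable)>> items,
    string dimensionId,
    double minScore,
    string evaluator,
    bool requireApplicable)
{
    var failures = new List<string>();

    for (int i = 0; i < items.Count; i++)
    {
        // Look up this evaluator's row for the requested dimension on the current item.
        // FirstOrDefault returns a default tuple (null strings, null score, false) when nothing matches.
        var row = items[i].FirstOrDefault(r => r.Evaluator == evaluator && r.DimensionId == dimensionId);

        if (row.DimensionId is null)
        {
            failures.Add($"item {i + 1}: dimension '{dimensionId}' not reported by '{evaluator}'");
        }
        else if (!row.Applicable)
        {
            // Assumption: a non-applicable dimension only fails when the caller requires applicability.
            if (requireApplicable)
            {
                failures.Add($"item {i + 1}: dimension '{dimensionId}' marked not applicable");
            }
        }
        else if (row.Score is not int score || score < minScore)
        {
            // Assumption: an applicable dimension with no score is treated as a failure.
            string shown = row.Score?.ToString() ?? "n/a";
            failures.Add($"item {i + 1}: {dimensionId} = {shown}, expected >= {minScore}");
        }
    }

    // Report every failing item at once so the CI log shows the full picture.
    if (failures.Count > 0)
    {
        throw new InvalidOperationException(string.Join("; ", failures));
    }
}
```

Collecting all failures before throwing, rather than stopping at the first one, keeps a single CI run informative when several items regress at once; whether the real helper behaves this way is not confirmed here.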
`dotnet/samples/05-end-to-end/Evaluation/Evaluation_FoundryRubric/README.md` (new file, 55 additions, 0 deletions)

````markdown
# Evaluation — Foundry Rubric

This sample evaluates a pre-existing Azure AI Foundry agent against a **rubric evaluator**
authored in the Foundry portal. Rubric evaluators are LLM-as-judge evaluators with custom
scoring dimensions you define for your domain; agent-framework references them by name and
version, mixes them with built-in evaluators, and exposes per-dimension scores you can gate
CI on.

## What this sample demonstrates

- Connecting to a pre-existing Foundry agent (`AgentAdministrationClient.GetAgentAsync`, or
  `GetAgentVersionAsync` when `FOUNDRY_AGENT_VERSION` is set).
- Referencing a pre-existing rubric evaluator via `GeneratedEvaluatorRef(name, version)`.
- Mixing the rubric with built-in evaluators (`Relevance`, `Coherence`) in one
  `FoundryEvals` run.
- Reading per-dimension breakdowns from `EvalScoreResult.Dimensions`.
- Gating CI on a per-dimension threshold via
  `AgentEvaluationResults.AssertDimensionScoreAtLeast(...)`.

## Prerequisites

- .NET 10 SDK or later.
- Azure CLI installed and authenticated (`az login`).
- An Azure AI Foundry project with a deployed model.
- A registered Foundry agent in that project (the agent the rubric was created against).
- A rubric evaluator created in the Foundry portal. Creating a rubric in the portal
  currently requires picking a Foundry agent as the generation context, so if you already
  have a rubric, you already meet the agent prerequisite above.

> [!IMPORTANT]
> `FOUNDRY_PROJECT_ENDPOINT` **must** be the project-scoped URL
> `https://<resource>.services.ai.azure.com/api/projects/<project>`. With a bare Azure OpenAI
> endpoint, eval submission fails with a generic HTTP 500 instead of a descriptive error.

> [!NOTE]
> An **Eval Definition** (a saved bundle of `testing_criteria` with `"object": "eval"`) is
> not the same as a **Rubric Evaluator** (a standalone evaluator with dimensions, weights,
> and a version). `GeneratedEvaluatorRef` points at the latter.

## Environment variables

```powershell
$env:FOUNDRY_PROJECT_ENDPOINT="https://your-resource.services.ai.azure.com/api/projects/your-project"
$env:FOUNDRY_MODEL="gpt-4o-mini"
$env:FOUNDRY_AGENT_NAME="your-agent-name"
$env:FOUNDRY_AGENT_VERSION="1"        # optional; omit for latest
$env:FOUNDRY_RUBRIC_NAME="your-rubric-name"
$env:FOUNDRY_RUBRIC_VERSION="1"       # optional; omit for latest (CI: pin this)
```

## Run the sample

```powershell
cd dotnet/samples/05-end-to-end/Evaluation
dotnet run --project .\Evaluation_FoundryRubric
```
````
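Both the README and the sample's header comment stress that `FOUNDRY_PROJECT_ENDPOINT` must be project-scoped, and that a bare Azure OpenAI endpoint only fails later, at eval submission. A cheap pre-flight check can surface that mistake before any agent call. The sketch below is a hypothetical helper, `IsProjectScopedEndpoint`, that is not part of agent-framework or Azure.AI.Projects. It checks only the URL shape quoted in the README (`https://<resource>.services.ai.azure.com/api/projects/<project>`) and assumes no other endpoint form is valid for eval submission.

```csharp
using System;

// Hypothetical pre-flight check (not a library API): true only for URLs shaped like
// https://<resource>.services.ai.azure.com/api/projects/<project>.
static bool IsProjectScopedEndpoint(string endpoint)
{
    // Must parse as an absolute HTTPS URL.
    if (!Uri.TryCreate(endpoint, UriKind.Absolute, out Uri? uri) || uri.Scheme != Uri.UriSchemeHttps)
    {
        return false;
    }

    // Host must be <resource>.services.ai.azure.com with a non-empty resource label.
    const string hostSuffix = ".services.ai.azure.com";
    if (!uri.Host.EndsWith(hostSuffix, StringComparison.OrdinalIgnoreCase) || uri.Host.Length <= hostSuffix.Length)
    {
        return false;
    }

    // Path must be exactly /api/projects/<project>; a trailing slash is tolerated.
    string[] segments = uri.AbsolutePath.Trim('/').Split('/');
    return segments.Length == 3
        && segments[0].Equals("api", StringComparison.OrdinalIgnoreCase)
        && segments[1].Equals("projects", StringComparison.OrdinalIgnoreCase)
        && segments[2].Length > 0;
}
```

In the sample, a check like this could run right after `FOUNDRY_PROJECT_ENDPOINT` is read and throw an `InvalidOperationException` when it returns false, matching how missing variables are already reported. It validates shape only; it cannot tell whether the project exists or whether the credential can reach it.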