.Net: Support for configuring dimensions in Google AI embeddings generation #10489
rogerbarreto merged 11 commits into microsoft:main
Hi team,
@microsoft-github-policy-service agree
dotnet/src/SemanticKernel.Abstractions/Services/AIServiceExtensions.cs
...nnectors/Connectors.Google.UnitTests/Services/GoogleAITextEmbeddingGenerationServiceTests.cs
Hi @ArieSLV, thanks for your contributions; most of it is looking good so far.
…ration (PR comments)
Hi @rogerbarreto, thanks for the thorough review. I've addressed all your feedback in the latest changes:
Regarding the spell check errors, I believe those might have been resolved during the merge, as they weren't directly related to my code changes.
dotnet/src/Connectors/Connectors.Google/Core/GoogleAI/GoogleAIEmbeddingRequest.cs
…mbedding requests
…ration (microsoft#10489)
Co-authored-by: Roger Barreto <19890735+RogerBarreto@users.noreply.github.com>
Motivation and Context
In the current implementation of the Google AI embeddings generation service in Semantic Kernel, users cannot configure the output dimensionality of the embeddings, even though the underlying Google AI API supports specifying the number of dimensions via the `output_dimensionality` parameter.
Why is this change required?
Making the dimensions configurable lets users tailor the embeddings to their specific use case, whether that means reducing memory usage, improving performance, or matching downstream systems that expect a particular embedding size.
What problem does it solve?
It exposes a `dimensions` parameter in the service constructors, builder methods, and API request payloads, so developers can use this capability of the Google API instead of being limited to the default embedding size.
What scenario does it contribute to?
This feature is particularly useful in scenarios where:
- users need to reduce storage or computational costs;
- downstream tasks or integrations require embeddings of a specific dimensionality;
- the output size must be tuned for performance or compatibility reasons.
Resolves .Net: New Feature: Support for configuring dimensions in Google AI embeddings generation #10488
Description
This PR introduces support for specifying the output dimensionality in the Google AI embeddings generation workflow. The main changes include:
Service Constructor Update:
The `GoogleAITextEmbeddingGenerationService` constructor now accepts an optional `dimensions` parameter, which is forwarded to the lower-level client implementations.
Builder and Extension Methods:
Extension methods such as `AddGoogleAIEmbeddingGeneration` have been updated to accept a `dimensions` parameter, so developers can configure the embedding dimensions through the builder pattern.
Request Payload Enhancement:
The `GoogleAIEmbeddingRequest` class now includes a new optional property, `Dimensions` (serialized as `output_dimensionality`). When provided, this value is included in the JSON payload sent to the Google AI API.
Metadata and Attributes Update:
The service’s metadata now includes the provided dimensions, so the configured embedding size is recorded alongside the rest of the service configuration.
Unit Testing:
New unit tests have been added to confirm that:
- When a `dimensions` value is provided, it is correctly included in the JSON request.
- When it is not provided, the default behavior remains unchanged.
This enhancement maintains backward compatibility because the new parameter is optional: existing code that does not specify dimensions continues to work as before.
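To make the optional-field behavior concrete, here is a minimal sketch that uses `System.Text.Json` directly rather than the connector's own `GoogleAIEmbeddingRequest` type. `BuildEmbeddingPayload` is a hypothetical helper, and the `content`/`parts` shape of the body is a simplified assumption. Only the rule that `output_dimensionality` appears when a value is supplied and is omitted otherwise comes from the description above. Omitting the field when it is null is what leaves requests from existing callers unchanged.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Skip null-valued properties during serialization, so an unset
// dimensions value never appears in the request body.
var options = new JsonSerializerOptions
{
    DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
};

// Hypothetical helper: builds a simplified embedding request body.
// The content/parts shape is an assumption for illustration; only the
// optional output_dimensionality field mirrors the behavior described in the PR.
string BuildEmbeddingPayload(string text, int? dimensions) =>
    JsonSerializer.Serialize(new
    {
        content = new { parts = new[] { new { text } } },
        output_dimensionality = dimensions
    }, options);

// With dimensions: the payload carries an output_dimensionality field.
Console.WriteLine(BuildEmbeddingPayload("hello", 256));

// Without dimensions: the field is omitted, as it was before the change.
Console.WriteLine(BuildEmbeddingPayload("hello", null));
```

An attribute-based alternative is `[JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]` on a nullable property. That form scopes the omission rule to that single field instead of the whole payload.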
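Contribution Checklist
- [x] The code builds clean without any errors or warnings.
- [x] The PR follows the [SK Contribution Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts) raises no violations.
- [x] All unit tests pass, and I have added new tests where possible.
- [x] I didn't break anyone 😄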