(GH-538) Define newtypes for tags field#1382
(GH-538) Define newtypes for tags field#1382michaeltlombardi wants to merge 3 commits intoPowerShell:mainfrom
tags field#1382Conversation
6840532 to
c3026f6
Compare
There was a problem hiding this comment.
Pull request overview
This PR introduces two newtypes (Tag and TagList) to replace the previous Option<Vec<String>> representation for tags in resource and extension manifests. The change implements the "parse, don't validate" pattern by moving tag validation into the constructor and enables case-insensitive tag comparisons while providing canonical JSON schemas for these types.
Changes:
- Defines
Tagnewtype wrappingStringwith regex validation (^\w+$) and case-insensitive comparison/hashing - Defines
TagListnewtype wrappingHashSet<Tag>to ensure uniqueness and provide reusable schema - Updates
ResourceManifestandExtensionManifestto useTagListinstead ofOption<Vec<String>> - Adds comprehensive integration tests following the repository's testing patterns
- Updates tag filtering logic in
list_resourcesto leverage case-insensitive comparison
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| lib/dsc-lib/src/types/tag.rs | Defines Tag newtype with validation pattern ^\w+$, case-insensitive comparison/hashing, and extensive trait implementations |
| lib/dsc-lib/src/types/tag_list.rs | Defines TagList newtype wrapping HashSet<Tag> with ergonomic trait implementations |
| lib/dsc-lib/src/types/mod.rs | Exports new Tag and TagList types |
| lib/dsc-lib/src/extensions/extension_manifest.rs | Changes tags field from Option<Vec<String>> to TagList with #[serde(default)] |
| lib/dsc-lib/src/dscresources/resource_manifest.rs | Changes tags field from Option<Vec<String>> to TagList with #[serde(default)] |
| lib/dsc-lib/src/dscerror.rs | Adds InvalidTag error variant for tag validation failures |
| lib/dsc-lib/locales/en-us.toml | Adds localized error message keys for invalid tag errors |
| lib/dsc-lib/locales/schemas.definitions.yaml | Adds comprehensive documentation for tag and tags schema definitions |
| dsc/src/subcommand.rs | Updates tag filtering to use is_empty() and case-insensitive comparison via Tag type |
| lib/dsc-lib/tests/integration/types/tag.rs | Comprehensive integration tests for Tag covering methods, schema, serde, and traits |
| lib/dsc-lib/tests/integration/types/tag_list.rs | Comprehensive integration tests for TagList covering schema, serde, and traits |
| lib/dsc-lib/tests/integration/types/mod.rs | Registers new test modules for tag and tag_list |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
c3026f6 to
329dde3
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
329dde3 to
7f56f8c
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Prior to this change, the representation for both resource and extension tags was defined as `Option<Vec<String>>`. With this construction, we can't define a reusable schema for tag lists (because we don't own the `JsonSchema` trait or the `Option<T>` type). The existing implementation also doesn't adhere to the validation from the canonical JSON Schema for tags, which defines the regex pattern `^\w+$` for tags. This change addresses these limitations by defining two newtypes: - `Tag` as a wrapper around `String` that follows the "parse, don't validate" pattern to move pattern validation into the constructor. It also treats comparisons with tags case-insensitively. The defined JSON Schema for this type is marked as `inline` because it is always and only used to validate items in the `tags` field for various structs. - `TagList` as a transparent wrapper around `HashSet<Tag>`. It provides no additional functionality, it just enables us to define a JSON Schema for this reusable type. The representation here uses `HashSet` rather than `Vec` because the items in a tag list must be unique (JSON Schema keyword `uniqueItems` defined as `true`). We can still skip serializing when the `HashSet` is empty, like we can skip serializing when an `Option` is `None`.
Prior to this change, the resource manifest struct defined the `tags` field as `Option<Vec<String>>`. This change updates the field to use the `TagList` newtype, which wraps `HashSet<Tag>` to provide better semantics and JSON Schema for the type. This change ensures that references to the `tags` field are updated through the code base.
Prior to this change, the extension manifest struct defined the `tags` field as `Option<Vec<String>>`. This change updates the field to use the `TagList` newtype, which wraps `HashSet<Tag>` to provide better semantics and JSON Schema for the type. The `tags` field isn't directly accessed in any existing code.
7f56f8c to
7434453
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #[derive(Debug, Default, Clone, PartialEq, Eq, Serialize, Deserialize, JsonSchema, DscRepoSchema)] | ||
| #[schemars( | ||
| title = t!("schemas.definitions.tags.title"), | ||
| description = t!("schemas.definitions.tags.description"), | ||
| extend( | ||
| "markdownDescription" = t!("schemas.definitions.tags.markdownDescription"), | ||
| ) | ||
| )] | ||
| #[dsc_repo_schema( | ||
| base_name = "tags", | ||
| folder_path = "definitions", | ||
| )] | ||
| pub struct TagList(HashSet<Tag>); | ||
|
|
There was a problem hiding this comment.
TagList derives Serialize over a HashSet<Tag>, which means JSON/YAML serialization order will be non-deterministic (HashSet iteration order varies between runs/platforms). If manifests/resources are ever serialized (e.g., CLI JSON output), this can cause unstable output and flaky tests. Consider implementing a custom Serialize for TagList that emits a sorted array (you already have Ord on Tag), or switch the backing collection to BTreeSet<Tag> for deterministic iteration/serialization.
There was a problem hiding this comment.
Updated to serialize to Vec<Tag>, which is implemented to return a pre-sorted vector to accomodate this problem. We can't guarantee fully deterministic roundtripping without switching to a BTreeSet, but prefer performance for these lookups (many lookups over relatively small sets) over fully deterministic round tripping, since we rarely round-trip tags (arguably only for exporting manifests to disk) but regularly serialize to informational CLI views.
This way we can document that tag lists always serialize in lexicographic ordering on the lowercased tags and recommend defining tags in lexicographic order for reliable round-tripping.
|
|
||
| impl Hash for Tag { | ||
| fn hash<H: std::hash::Hasher>(&self, state: &mut H) { | ||
| self.0.to_lowercase().hash(state) |
There was a problem hiding this comment.
Tag hashing lowercases by allocating a new String (to_lowercase()) on every hash call. Because Tag is used in HashSet/HashMap, this can become a hot path (multiple hashes per insert/lookup). Consider hashing by iterating over self.0.chars() and feeding each character’s lowercase mapping into the hasher to avoid allocating, while keeping behavior consistent with your case-insensitive equality.
| self.0.to_lowercase().hash(state) | |
| for ch in self.0.chars() { | |
| for lower in ch.to_lowercase() { | |
| lower.hash(state); | |
| } | |
| } |
| #[error("{t} '{0}' - {t2}: '{1}'", t = t!("dscerror.invalidTagPrefix"), t2 = t!("dscerror.invalidTagSuffix"))] | ||
| InvalidTag(String, String), | ||
|
|
||
| #[error("{t} '{0}' - {t2}: '{1}'", t = t!("dscerror.invalidTypeNamePrefix"), t2 = t!("dscerror.InvalidTypeNameSuffix"))] |
There was a problem hiding this comment.
The i18n key for the InvalidTypeName suffix uses dscerror.InvalidTypeNameSuffix (capital I), but the locale file defines dscerror.invalidTypeNameSuffix. This will likely cause the error message to fall back to the raw key instead of the translation. Update the key to match the locale entry (or add the missing key in the locale file).
| #[error("{t} '{0}' - {t2}: '{1}'", t = t!("dscerror.invalidTypeNamePrefix"), t2 = t!("dscerror.InvalidTypeNameSuffix"))] | |
| #[error("{t} '{0}' - {t2}: '{1}'", t = t!("dscerror.invalidTypeNamePrefix"), t2 = t!("dscerror.invalidTypeNameSuffix"))] |
| #[test_case("abc_123_DEF" => matches Ok(_); "mixed valid characters string is valid")] | ||
| #[test_case("a.b" => matches Err(_); "string with invalid character raises error")] | ||
| #[test_case("" => matches Err(_); "empty string raises error")] | ||
| fn str(text: &str)-> Result<Tag, DscError> { |
There was a problem hiding this comment.
Minor formatting nit: keep spacing consistent with the rest of the tests (e.g., add a space before -> in the return type).
| fn str(text: &str)-> Result<Tag, DscError> { | |
| fn str(text: &str) -> Result<Tag, DscError> { |
PR Summary
This change addresses limitation in representing the
tagsfield by defining two newtypes:Tagas a wrapper aroundStringthat follows the "parse, don't validate" pattern to move pattern validation into the constructor. It also treats comparisons with tags case-insensitively. The defined JSON Schema for this type is marked asinlinebecause it is always and only used to validate items in thetagsfield for various structs.TagListas a transparent wrapper aroundHashSet<Tag>. It provides no additional functionality, it just enables us to define a JSON Schema for this reusable type. The representation here usesHashSetrather thanVecbecause the items in a tag list must be unique (JSON Schema keyworduniqueItemsdefined astrue).This change modifies the code for the
ResourceManifestandExtensionManifeststructs to useTagListinstead ofOption<Vec<String>>and updates other code as necessary to address the change in type.PR Context
Prior to this change, the representation for both resource and extension tags was defined as
Option<Vec<String>>. With this construction, we can't define a reusable schema for tag lists (because we don't own theJsonSchematrait or theOption<T>type).The existing implementation also doesn't adhere to the validation from the canonical JSON Schema for tags, which defines the regex pattern
^\w+$for tags.