Skip to content

Conversation

@inteon
Copy link
Member

@inteon inteon commented Jun 16, 2025

Replaces the github.com/json-iterator/go dependency with encoding/json/v2.

# based on 2a9ee08f99c36a53079be4e9be570f3bac18418b
$ go test -benchmem -run=^$ -bench ^BenchmarkFieldSet/serialize.*$ sigs.k8s.io/structured-merge-diff/v6/fieldpath -count=6 > pr.txt
$ go test -benchmem -run=^$ -bench ^BenchmarkFieldSet/serialize.*$ sigs.k8s.io/structured-merge-diff/v6/fieldpath -count=6 > master.txt
$ benchstat master.txt pr.txt

goos: linux
goarch: amd64
pkg: sigs.k8s.io/structured-merge-diff/v6/fieldpath
cpu: Intel(R) Core(TM) Ultra 7 165H
                            │  master.txt  │                pr.txt                │
                            │    sec/op    │    sec/op      vs base               │
FieldSet/serialize-20-8       7.974µ ± 13%   11.061µ ± 12%  +38.72% (p=0.002 n=6)
FieldSet/deserialize-20-8     21.13µ ±  5%    24.60µ ± 22%  +16.43% (p=0.002 n=6)
FieldSet/serialize-50-8       22.37µ ± 17%    33.22µ ±  4%  +48.48% (p=0.002 n=6)
FieldSet/deserialize-50-8     48.55µ ± 15%    63.67µ ±  7%  +31.14% (p=0.002 n=6)
FieldSet/serialize-100-8      64.62µ ±  7%   113.10µ ± 11%  +75.02% (p=0.002 n=6)
FieldSet/deserialize-100-8    117.6µ ±  5%    202.1µ ±  4%  +71.77% (p=0.002 n=6)
FieldSet/serialize-500-8      335.7µ ±  6%    591.3µ ± 28%  +76.16% (p=0.002 n=6)
FieldSet/deserialize-500-8    586.9µ ±  5%   1013.9µ ±  4%  +72.77% (p=0.002 n=6)
FieldSet/serialize-1000-8     740.6µ ±  7%   1288.0µ ± 26%  +73.92% (p=0.002 n=6)
FieldSet/deserialize-1000-8   1.293m ±  9%    2.567m ± 41%  +98.64% (p=0.002 n=6)
geomean                       110.1µ          174.4µ        +58.40%

                            │  master.txt   │               pr.txt                │
                            │     B/op      │     B/op      vs base               │
FieldSet/serialize-20-8         2318.0 ± 0%     847.0 ± 0%  -63.46% (p=0.002 n=6)
FieldSet/deserialize-20-8     10.494Ki ± 0%   7.130Ki ± 0%  -32.05% (p=0.002 n=6)
FieldSet/serialize-50-8        6.685Ki ± 0%   2.379Ki ± 0%  -64.40% (p=0.002 n=6)
FieldSet/deserialize-50-8      22.12Ki ± 0%   19.92Ki ± 0%   -9.91% (p=0.002 n=6)
FieldSet/serialize-100-8      20.272Ki ± 0%   7.625Ki ± 0%  -62.39% (p=0.002 n=6)
FieldSet/deserialize-100-8     66.85Ki ± 0%   68.85Ki ± 0%   +2.98% (p=0.002 n=6)
FieldSet/serialize-500-8      112.15Ki ± 0%   42.70Ki ± 0%  -61.92% (p=0.002 n=6)
FieldSet/deserialize-500-8     323.8Ki ± 0%   351.7Ki ± 0%   +8.62% (p=0.002 n=6)
FieldSet/serialize-1000-8     225.21Ki ± 0%   86.01Ki ± 0%  -61.81% (p=0.002 n=6)
FieldSet/deserialize-1000-8    706.0Ki ± 0%   776.3Ki ± 0%   +9.96% (p=0.002 n=6)
geomean                        44.05Ki        26.11Ki       -40.72%

                            │ master.txt  │               pr.txt               │
                            │  allocs/op  │  allocs/op   vs base               │
FieldSet/serialize-20-8        9.000 ± 0%    3.000 ± 0%  -66.67% (p=0.002 n=6)
FieldSet/deserialize-20-8      241.0 ± 0%    247.0 ± 0%   +2.49% (p=0.002 n=6)
FieldSet/serialize-50-8       14.000 ± 0%    8.000 ± 0%  -42.86% (p=0.002 n=6)
FieldSet/deserialize-50-8      707.0 ± 0%    720.5 ± 0%   +1.91% (p=0.002 n=6)
FieldSet/serialize-100-8       31.00 ± 0%    25.00 ± 0%  -19.35% (p=0.002 n=6)
FieldSet/deserialize-100-8    2.369k ± 0%   2.417k ± 0%   +2.00% (p=0.002 n=6)
FieldSet/serialize-500-8       138.0 ± 0%    126.0 ± 1%   -8.70% (p=0.002 n=6)
FieldSet/deserialize-500-8    12.16k ± 0%   12.42k ± 0%   +2.10% (p=0.002 n=6)
FieldSet/serialize-1000-8      295.0 ± 0%    277.0 ± 0%   -6.10% (p=0.002 n=6)
FieldSet/deserialize-1000-8   26.86k ± 0%   27.49k ± 0%   +2.32% (p=0.002 n=6)
geomean                        340.5         281.0       -17.47%

closes #202

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 16, 2025
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jun 16, 2025
@dims
Copy link
Member

dims commented Jun 17, 2025

xref: kubernetes/kubernetes#132312

@dims
Copy link
Member

dims commented Jun 18, 2025

For the pull-structured-merge-diff-test failure, please add this to fix the error @inteon

diff --git a/internal/cli/main_test.go b/internal/cli/main_test.go
index 3e409ede0673..8beed5c6cf55 100644
--- a/internal/cli/main_test.go
+++ b/internal/cli/main_test.go
@@ -21,6 +21,7 @@ import (
        "encoding/json"
        "io/ioutil"
        "path/filepath"
+       "strings"
        "testing"
 )

@@ -135,7 +136,7 @@ func (tt *testCase) checkOutput(t *testing.T, got []byte) {
                t.Fatalf("couldn't read expected output %q: %v", tt.expectedOutputPath, err)
        }

-       if a, e := string(got), string(want); a != e {
+       if a, e := strings.TrimSpace(string(got)), strings.TrimSpace(string(want)); a != e {
                t.Errorf("output didn't match expected output: got:\n%v\nwanted:\n%v\n", a, e)
        }
 }

@inteon inteon force-pushed the use_json_v2 branch 2 times, most recently from a4b6871 to bdce391 Compare June 18, 2025 10:14
@inteon
Copy link
Member Author

inteon commented Jun 18, 2025

For the pull-structured-merge-diff-test failure, please add this to fix the error @inteon
...

I fixed the test failure.

@dims
Copy link
Member

dims commented Jun 18, 2025

/assign @BenTheElder @liggitt

@liggitt
Copy link
Contributor

liggitt commented Jun 19, 2025

/assign @jpbetz
who is the primary apimachinery approver on this bit and was deeply involved in the initial performance-driven use of json-iterator in these bits

@liggitt
Copy link
Contributor

liggitt commented Jun 19, 2025

For the pull-structured-merge-diff-test failure, please add this to fix the error @inteon

I suspect using a json marshal function (like MarshalWrite) that doesn't append a newline would be a more efficient way to accomplish that

return nil, fmt.Errorf("parsing JSON: %v", err)
}

k := rawKey.String()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is rawKey.String() the same as decoding to a string, in terms of interpreting escape sequences, etc?

Comment on lines 330 to 339
{
JSON: `1.0`,
IntoType: reflect.TypeOf(json.Number("")),
Want: json.Number("1.0"),
},
{
JSON: `1`,
IntoType: reflect.TypeOf(json.Number("")),
Want: json.Number("1"),
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious if it's ok to drop these... were they added to try to catch a specific issue?

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: inteon
Once this PR has been reviewed and has the lgtm label, please ask for approval from jpbetz. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@liggitt
Copy link
Contributor

liggitt commented Jun 23, 2025

The .../deserialize... benchmarks actually don't look terrible now... I'd be willing to accept that performance drop in pursuit of correctness / safety.

The serialize benchmarks still look pretty rough. Need to see what we can improve there.

@liggitt
Copy link
Contributor

liggitt commented Jun 24, 2025

did you run the full set of benchmarks to see how we looked across all of them?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 28, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 30, 2025
@liggitt
Copy link
Contributor

liggitt commented Jul 1, 2025

Thanks for the updates, how are the overall benchmarks looking (not just the subset in the description)?

As you were adjusting the implementation, were there any unit tests it would make sense to add to catch edges the previous implementations handled we want to ensure the new one does as well? I'm thinking specifically of things like:

  • handling of extra data in the input bytes/buffer when decoding/deserializing (e.g. "somevalue"extrastuff or {"key":"value"}extrastuff)
  • handling of special characters in strings that need escaping where the raw bytes would not be the same as the decoded or encoded/escaped bytes
  • handling of ignorable whitespace when decoding

@jpbetz
Copy link
Contributor

jpbetz commented Jul 1, 2025

First off- it's amazing to see this happening and the benchmarks are VERY promising. Thanks @inteon!

To get this to the finish line, and merge, what should our criteria be?

I chatted offline with @liggitt briefly and some of the criteria we discussed was:

  • Golang releases a stable json/v2 (The alternative would be to add an internal copy of json-experiment to this repo like kube-openapi has, but I don't know if it's worth it given how close json/v2 is to stable)
  • github.com/kubernetes/kubernetes CI test stability is not negatively impacted
  • This passes a scale test (SIG instrumentation)
  • We are confident on the correctness (triple check the implementation, shore up with additional functional tests)

Intuitively, it seems like the deserialization is already sufficiently fast. I suspect we need to optimize serialization a bit further since we serialize managed fields on all updates (not just patches). That said, I'm willing to be data driven here. If we can show downstream scale and performance is acceptable, I'm willing to accept a higher serialization perf regression in order to migrate to json/v2.

Thoughts, concerns?

@inteon
Copy link
Member Author

inteon commented Aug 30, 2025

Update check the new numbers in my PR description, upgrading encoding/json/v2 did improve performance!

@inteon
Copy link
Member Author

inteon commented Aug 30, 2025

Did some further tuning and got the # allocations lower than on master.

@BenTheElder
Copy link
Member

Golang releases a stable json/v2 (The alternative would be to add an internal copy of json-experiment to this repo like kube-openapi has, but I don't know if it's worth it given how close json/v2 is to stable

even with stable json/v2, we might need to temporarily use a fork, otherwise we kubernetes branches that aren't on that go version yet can't update SMD.

but we should encapsulate it and plan to eliminate it when we're ready to require that minimum go version

even kubernetes master isn't on 1.25 yet

@liggitt
Copy link
Contributor

liggitt commented Sep 2, 2025

Did some further tuning and got the # allocations lower than on master.

Am I reading correctly that B/op and allocs/op are ~equivalent or better than master on pretty much all benchmarks? If so, that's amazing progress!

Paired with a close review and functional/correctness test coverage to make sure the new approach behaves identically to the old version (especially in terms of what it accepts/rejects/produces in edge cases like leading/trailing/non-normalized/invalid inputs), this looks really promising.

@lalitc375
Copy link
Contributor

Amazing work @inteon in reducing the number of allocs per operation to 1. I did a similar analysis over your change, and saw similar performance. The change should have zero to negligible impact on Kube API server performance. We just have to make sure this new library behaves the same as the existing implementation functionally, which I think existing tests should be able to do(?).

@liggitt
Copy link
Contributor

liggitt commented Sep 4, 2025

We just have to make sure this new library behaves the same as the existing implementation functionally, which I think existing tests should be able to do(?).

I'm not sure how detailed the existing tests are at all the edge cases of valid and invalid variants on input (handling of escaped values in keys, whitespace before/after/between tokens, valid and invalid syntax, etc), and byte-for-byte assertions about output. Since this needed to effectively rewrite some of the encoding/decoding paths, we need to make sure we have test coverage for those things.

@lalitc375
Copy link
Contributor

We just have to make sure this new library behaves the same as the existing implementation functionally, which I think existing tests should be able to do(?).

I'm not sure how detailed the existing tests are at all the edge cases of valid and invalid variants on input (handling of escaped values in keys, whitespace before/after/between tokens, valid and invalid syntax, etc), and byte-for-byte assertions about output. Since this needed to effectively rewrite some of the encoding/decoding paths, we need to make sure we have test coverage for those things.

There are not enough tests for unicode and escape characters . I have added those tests in #300. Including these new tests, We should detect regression in Serialization and Deserialization code in future.

@liggitt
Copy link
Contributor

liggitt commented Sep 9, 2025

Excellent, #300 looks like a great step forward for test coverage of normalized encoding. We'll probably want similar additions for:

  1. decoding of valid-but-non-normalized values working properly (insignificant leading / trailing / interspersed whitespace, or non-canonical escaped values, etc) and capturing what in-memory values are produced by the decoding
  2. decoding of invalid values erroring properly

@inteon
Copy link
Member Author

inteon commented Sep 11, 2025

After upgrading github.com/go-json-experiment/json and rerunning the benchmarks, all benchmarks now outperform the benchmarks on master.

@liggitt
Copy link
Contributor

liggitt commented Sep 11, 2025

Huh… did something change on s-m-d master? The latest benchmark update looks like some of the relative improvement came from master getting worse...

@liggitt
Copy link
Contributor

liggitt commented Sep 11, 2025

oh, maybe the test changes in #300 impacted the master benchmark numbers

@dims
Copy link
Member

dims commented Sep 17, 2025

k/k master is at golang v1.25.1 fyi

@liggitt liggitt moved this from Dependencies to replace/remove to Dependency review complete in [sig-architecture] Dependency management Oct 1, 2025
@dims
Copy link
Member

dims commented Nov 29, 2025

@inteon do we want to pick this up when k8s 1.36 cycle reopens

@inteon
Copy link
Member Author

inteon commented Dec 2, 2025

@inteon do we want to pick this up when k8s 1.36 cycle reopens

Yes. I updated the code, fixed some bugs and updated the benchmark results.

@liggitt
Copy link
Contributor

liggitt commented Dec 2, 2025

Looks like we can drop the "Performance is not yet great" comment from the description now... this looks ~neutral or an improvement for memory use at this point 🎉

@liggitt
Copy link
Contributor

liggitt commented Dec 2, 2025

@inteon do we want to pick this up when k8s 1.36 cycle reopens

We want to pick it up as soon as possible, but we need to wait until json/v2 lands in stdlib in a non-experimental way. I'm not sure if that's happening in Go 1.26 or not (the status of golang/go#76406 and json/v2 working group (view) makes me think not).

api-machinery folks have a detailed code review of this PR scheduled for later this week. Once we're happy with the shape of it, and do a final round of benchmarks, we'll submit this as positive community feedback on the current shape of json/v2, and then wait patiently for json/v2 to land in stdlib 🤞

}

// DeserializePathElement parses a serialized path element
func DeserializePathElement(s string) (PathElement, error) {
Copy link
Contributor

@liggitt liggitt Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpbetz @lalitc375: let's augment the unit tests that exercise successful DeserializePathElement calls (TestPathElementRoundTrip?) to also assert the in-memory state returned, not just the round-trip

fields.Sort()
return PathElement{Key: &fields}, iter.Error
var fields value.FieldList
if err := json.UnmarshalRead(strings.NewReader(s[2:]), &fields); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpbetz @lalitc375: can we add test-cases to exercise

  • zero-length
  • EOF
  • leading / trailing whitespace in value
  • multi-token should fail identically

v, err := readJSONIter(iter)
if err != nil {
var v any
if err := json.UnmarshalRead(strings.NewReader(s[2:]), &v); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpbetz @lalitc375: can we add test-cases to ensure behave identically between readJSONIter / json.UnmarshalRead

  • zero-length - json-iter masks, json/v2 returns EOF error? https://go.dev/play/p/D80jALZ7ZRq
  • non-zero length with EOF (not sure this is possible)
  • leading / trailing whitespace in value
  • multi-token should fail identically

return fmt.Errorf("parsing JSON: %v", err)
}

fields = append(fields, Field{Name: k, Value: NewValueInterface(v)})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks it would accept duplicate key entries. Did the original ReadObjectCB prevent duplicates or accept them? Is there a test case to make sure these behave identically to json-iterator?

cc @jpbetz @lalitc375

} else if err != nil {
return fmt.Errorf("parsing JSON: %v", err)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to encounter a token other than EndObject or String here? will ReadToken() error in that case?

@jpbetz @lalitc375: can we make sure there's a test that some other token type instead of a string key errors correctly here?

break
}

k := rawKey.String()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpbetz @lalitc375: can we ensure we have test cases that hit this point to compare behavior between json-iterator and json/v2 and we end up with the same in-memory values:

  • escaped/encoded keys ("\u... \n ...")
  • multi-byte keys ("Iñtërnâtiônàlizætiøn,💝🐹🌇⛔")
  • keys containing bytes that are invalid unicode characters

return children, isMember, nil
}

// FromJSON clears s and reads a JSON formatted set structure.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move this back up to where FromJSON was previously to minimize the diff

func readIterV1(iter *jsoniter.Iterator) (children *Set, isMember bool) {
iter.ReadMapCB(func(iter *jsoniter.Iterator, key string) bool {
if key == "." {
func (s *setContentsV1) readIterV1(parser *jsontext.Decoder) (children *Set, isMember bool, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the receiver isn't used, change this back to a function:

Suggested change
func (s *setContentsV1) readIterV1(parser *jsontext.Decoder) (children *Set, isMember bool, err error) {
func readIterV1(parser *jsontext.Decoder) (children *Set, isMember bool, err error) {


// Append the member to the members list, we will sort it later
m := &children.Members.members
// Since we expect that most of the time these will have been
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's leave the insert-in-sorted-order code as it was here and below if there's not a reason it has to be connected to this PR

@jpbetz, let's separately consider whether to switch this to a post-accumulation sort

if children == nil {
children = &Set{}
}
// Since we expect that most of the time these will have been
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this bit as well

})
}

// Sort the members and children
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert the post-accumulation sort

@liggitt
Copy link
Contributor

liggitt commented Dec 4, 2025

@inteon, thanks so much for this PR, we spent several hours today reviewing the details deeply now that the performance has converged to something we could actually merge.

There were four main categories of feedback:

  • splitting unrelated changes out of this PR to keep this limited just to the json/v2 switch
  • test coverage to ensure equivalent behavior in lots of edges (which we'll want to make pass against this PR and against master, and merge to master ahead of this PR)
  • small tweaks (error checking, relocating to minimize diff, helper function around writing an empty object, etc)
  • comments with some possibilities for simplification and correctness improvements – some were questions we need to actually run through benchmarks to know whether we can simplify without impacting performance

Don't feel like you have to be the one to resolve all the comments, @jpbetz and @lalitc375 will be jumping in to help as well.

We can gradually work on this and continue getting this ready for when json/v2 lands in stdlib (which looks like it will now be Go 1.27 at the earliest)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

remove use of json-iterator / reflect2

7 participants