Skip to content

Conversation

@Vladilen
Copy link
Collaborator

@Vladilen Vladilen commented Dec 5, 2025

Changelog entry

...

Changelog category

  • Not for changelog (changelog entry is not required)

Description for reviewers

...

Copilot AI review requested due to automatic review settings December 5, 2025 17:51
@Vladilen Vladilen requested review from a team as code owners December 5, 2025 17:51
@Vladilen Vladilen requested review from XJIE6 and dorooleg December 5, 2025 17:52
@ydbot
Copy link
Collaborator

ydbot commented Dec 5, 2025

Run Extra Tests

Run additional tests for this PR. You can customize:

  • Test Size: small, medium, large (default: all)
  • Test Targets: any directory path (default: ydb/)
  • Sanitizers: ASAN, MSAN, TSAN
  • Coredumps: enable for debugging (default: off)
  • Additional args: custom ya make arguments

▶  Run tests

@github-actions
Copy link

github-actions bot commented Dec 5, 2025

2025-12-05 17:55:36 UTC Pre-commit check linux-x86_64-relwithdebinfo for cc62a54 has started.
2025-12-05 17:55:53 UTC Artifacts will be uploaded here
2025-12-05 17:58:05 UTC ya make is running...
🟡 2025-12-05 20:09:18 UTC Some tests failed, follow the links below. Going to retry failed tests...

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
39469 36519 0 3 2919 28

2025-12-05 20:09:30 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-12-05 20:20:50 UTC Tests successful.

Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
72 (only retried tests) 60 0 0 0 12

🟢 2025-12-05 20:20:57 UTC Build successful.
🟡 2025-12-05 20:21:16 UTC ydbd size 2.3 GiB changed* by +234.4 KiB, which is >= 100.0 KiB vs main: Warning

ydbd size dash main: fe2b581 merge: cc62a54 diff diff %
ydbd size 2 466 152 008 Bytes 2 466 392 024 Bytes +234.4 KiB +0.010%
ydbd stripped size 524 843 136 Bytes 524 865 152 Bytes +21.5 KiB +0.004%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@github-actions
Copy link

github-actions bot commented Dec 5, 2025

2025-12-05 17:55:43 UTC Pre-commit check linux-x86_64-release-asan for cc62a54 has started.
2025-12-05 17:56:00 UTC Artifacts will be uploaded here
2025-12-05 17:58:06 UTC ya make is running...
🟡 2025-12-05 19:42:09 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
12063 11998 0 48 7 10

🟢 2025-12-05 19:42:19 UTC Build successful.
🟡 2025-12-05 19:42:43 UTC ydbd size 3.8 GiB changed* by +331.0 KiB, which is >= 100.0 KiB vs main: Warning

ydbd size dash main: fe2b581 merge: cc62a54 diff diff %
ydbd size 4 127 809 296 Bytes 4 128 148 208 Bytes +331.0 KiB +0.008%
ydbd stripped size 1 532 692 760 Bytes 1 532 767 704 Bytes +73.2 KiB +0.005%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@github-actions
Copy link

github-actions bot commented Dec 5, 2025

🟢 2025-12-05 17:55:48 UTC The validation of the Pull Request description is successful.

Copilot finished reviewing on behalf of Vladilen December 5, 2025 17:56
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes JSON path handling by implementing a proper JSONPath parser and accessor system, replacing the previous workaround that manually stripped $. prefixes from paths. The changes introduce a new TJsonPathAccessor and TJsonPathAccessorTrie infrastructure for correctly parsing and navigating JSON paths according to RFC 9535 standards.

Key Changes:

  • Introduced new JSON path processing infrastructure using YQL's jsonpath parser instead of manual string manipulation
  • Removed the workaround in query optimization that stripped quotes from simple JSON paths
  • Updated JSON subcolumn extraction to properly handle quoted keys and special characters in member names

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 14 comments.

Show a summary per file
File Description
json_value_path.h/cpp New core JSON path parsing and accessor infrastructure
kqp_opt_phy_olap_filter.cpp Removed manual JSON path quote-stripping workaround
kernel_logic.h/cpp Updated to use new JSON path extraction with proper error handling
json_ut.cpp Added comprehensive tests for special characters and path parsing
ut_sub_columns.cpp Added unit tests for JSON path trie and accessor functionality
stats.h Updated key lookup to use JSON path trie instead of string comparison
partial.h/cpp Changed to return TJsonPathAccessor with remaining path support
others_storage.h/cpp Updated accessor methods to use new JSON path infrastructure
columns_storage.h Modified GetPathAccessor to use trie-based lookup
accessor.h/cpp Refactored TJsonRestorer to use proper JSON path parsing
direct_builder.cpp Updated key string building to use QuoteJsonItem for proper escaping
data_extractor.cpp Restricted to only support top-level JSON objects
json_extractors.cpp Added handling for empty objects in JSON extraction
written.cpp Added debug logging (appears to be temporary)
composite_serial/accessor.cpp Fixed incorrect TConclusion boolean check
binary_json_value_view.h/cpp Exposed JsonNumberToString as public static method
ya.make Added jsonpath dependency

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +268 to +269
// TODO: IDK why it is needed, re-check
// return std::make_shared<TSparsedArray>(nullptr, arrow::binary(), recordsCount);
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO comment with uncertainty and lacking tracking information. The comment "IDK why it is needed, re-check" suggests the author is unsure about the commented-out code's purpose. Either remove the commented code if it's not needed, or properly document why it might be needed in the future.

Suggested change
// TODO: IDK why it is needed, re-check
// return std::make_shared<TSparsedArray>(nullptr, arrow::binary(), recordsCount);

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Перепроверить

Comment on lines +22 to +28
TConclusion<std::shared_ptr<TJsonPathAccessor>> GetPathAccessor(const std::string_view path) const {
auto jsonPathAccessorTrie = std::make_shared<NKikimr::NArrow::NAccessor::NSubColumns::TJsonPathAccessorTrie>();
for (ui32 i = 0; i < Stats.GetColumnsCount(); ++i) {
auto insertResult = jsonPathAccessorTrie->Insert(ToSubcolumnName(Stats.GetColumnName(i)), Records->GetColumnVerified(i));
AFL_VERIFY(insertResult.IsSuccess())("error", insertResult.GetErrorMessage());
}
return jsonPathAccessorTrie->GetAccessor(path);
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar performance issue: A new TJsonPathAccessorTrie is created and populated on every call to GetPathAccessor. This is inefficient for repeated lookups. Consider caching the trie as a member variable or using a more efficient lookup structure.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dorooleg как думаешь что тут лучше? Мне кажется тут сокрее будут одноразовые акции по вызову GetPathAccessor, поэтому, чтобы лишний раз не строить на каждое создание объекта это дерево, сделал это здесь.

@github-actions
Copy link

github-actions bot commented Dec 5, 2025

2025-12-05 23:29:55 UTC Pre-commit check linux-x86_64-release-asan for fb907a4 has started.
2025-12-05 23:30:04 UTC Artifacts will be uploaded here
2025-12-05 23:31:46 UTC ya make is running...
🟡 2025-12-06 00:49:18 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
11924 11869 0 39 7 9

🟢 2025-12-06 00:49:27 UTC Build successful.
🟡 2025-12-06 00:49:50 UTC ydbd size 3.8 GiB changed* by +327.5 KiB, which is >= 100.0 KiB vs main: Warning

ydbd size dash main: 435a374 merge: fb907a4 diff diff %
ydbd size 4 127 839 576 Bytes 4 128 174 928 Bytes +327.5 KiB +0.008%
ydbd stripped size 1 532 698 520 Bytes 1 532 774 872 Bytes +74.6 KiB +0.005%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@github-actions
Copy link

github-actions bot commented Dec 5, 2025

2025-12-05 23:34:40 UTC Pre-commit check linux-x86_64-relwithdebinfo for fb907a4 has started.
2025-12-05 23:34:53 UTC Artifacts will be uploaded here
2025-12-05 23:36:23 UTC ya make is running...
🟡 2025-12-06 01:17:42 UTC Some tests failed, follow the links below. Going to retry failed tests...

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
39330 36378 0 3 2928 21

2025-12-06 01:17:53 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-12-06 01:30:29 UTC Tests successful.

Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
75 (only retried tests) 59 0 0 0 16

🟢 2025-12-06 01:30:35 UTC Build successful.
🟡 2025-12-06 01:30:57 UTC ydbd size 2.3 GiB changed* by +229.4 KiB, which is >= 100.0 KiB vs main: Warning

ydbd size dash main: 435a374 merge: fb907a4 diff diff %
ydbd size 2 466 176 648 Bytes 2 466 411 552 Bytes +229.4 KiB +0.010%
ydbd stripped size 524 848 192 Bytes 524 869 376 Bytes +20.7 KiB +0.004%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@dorooleg dorooleg requested a review from Copilot December 8, 2025 13:22
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@github-actions
Copy link

github-actions bot commented Dec 8, 2025

2025-12-08 15:05:47 UTC Pre-commit check linux-x86_64-relwithdebinfo for 940e5ef has started.
2025-12-08 15:06:05 UTC Artifacts will be uploaded here
2025-12-08 15:08:14 UTC ya make is running...

@github-actions
Copy link

github-actions bot commented Dec 8, 2025

2025-12-08 15:07:04 UTC Pre-commit check linux-x86_64-release-asan for 940e5ef has started.
2025-12-08 15:07:20 UTC Artifacts will be uploaded here
2025-12-08 15:09:28 UTC ya make is running...

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +21 to +40
TConclusion<std::shared_ptr<NSubColumns::TJsonPathAccessor>> TSubColumnsPartialArray::GetPathAccessor(const std::string_view svPath, const ui32 recordsCount) const {
auto jsonPathAccessorTrie = std::make_shared<NKikimr::NArrow::NAccessor::NSubColumns::TJsonPathAccessorTrie>();
auto headerStats = Header.GetColumnStats();
for (ui32 i = 0; i < headerStats.GetDataNamesCount(); ++i) {
if (PartialColumnsData.HasColumn(i)) {
auto insertResult = jsonPathAccessorTrie->Insert(NSubColumns::ToJsonPath(headerStats.GetColumnName(i)), PartialColumnsData.GetAccessorVerified(i));
AFL_VERIFY(insertResult.IsSuccess())("error", insertResult.GetErrorMessage());
}
}

auto accessorResult = jsonPathAccessorTrie->GetAccessor(svPath);
if (accessorResult.IsSuccess() && accessorResult.GetResult()->IsValid()) {
return accessorResult;
}

if (OthersData) {
return OthersData->GetPathAccessor(svPath, recordsCount);
} else {
AFL_VERIFY(!Header.GetOtherStats().GetKeyIndexOptional(svPath));
return std::make_shared<TTrivialArray>(TThreadSimpleArraysCache::GetNull(arrow::binary(), recordsCount));
}

return std::shared_ptr<NSubColumns::TJsonPathAccessor>{};
Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new TJsonPathAccessorTrie is created on every call to GetPathAccessor. Consider caching the trie to avoid rebuilding it for each call, which could improve performance significantly when this method is called frequently.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants