LCORE-579 Anonymize user ID in transcripts by max-svistunov · Pull Request #673 · lightspeed-core/lightspeed-stack

max-svistunov · 2025-10-14T12:41:19Z

Description

Type of change

Related Tickets & Documents

Related Issue # LCORE-579
Closes # LCORE-579

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Refactor
- Enhanced privacy by replacing raw user identifiers with hashed identifiers in transcript storage paths and stored metadata, reducing exposure of personal data without changing external behavior.
Tests
- Updated unit tests to validate transcript path construction and stored metadata now use hashed identifiers, ensuring consistency with the privacy improvement.

coderabbitai · 2025-10-14T12:41:31Z

Walkthrough

Hash user_id with SHA-256 for transcript path construction and stored metadata; tests updated to expect hashed user_id. No public API or function signatures changed.

Changes

Cohort / File(s)	Summary
Transcript utils logic `src/utils/transcripts.py`	Import `hashlib`; add private `_hash_user_id(user_id)` helper; compute SHA-256 hex digest and use `hashed_user_id` for path construction and to replace `user_id` in stored metadata; internal-only change.
Unit tests for transcripts `tests/unit/utils/test_transcripts.py`	Update tests to import `hashlib` and assert transcript path and stored metadata use SHA-256 hashed `user_id` instead of raw `user_id`; other assertions unchanged.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller
  participant Utils as transcripts.py
  participant Hash as hashlib
  participant Storage as Filesystem/Store

  Caller->>Utils: store_transcript(user_id, content, metadata)
  Utils->>Hash: _hash_user_id(user_id) — sha256 → hex
  Hash-->>Utils: hashed_user_id
  Utils->>Utils: construct path with hashed_user_id<br/>replace metadata.user_id with hashed_user_id
  Utils->>Storage: write transcript(content, metadata, path)
  Storage-->>Utils: ack/success
  Utils-->>Caller: result/status

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I nibble bytes and hop with glee,
A hash now guards identity.
Paths obscured, neat as a burrow,
Tests aligned — no need to furrow.
SHA-256 keeps secrets snug and small,
I hop, I guard, I watch them all. 🐇🔐

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title succinctly and accurately describes the primary change of anonymizing user IDs in transcripts and aligns directly with the implemented functionality. It is a single, clear sentence that highlights the main feature without extraneous details.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8258cff and 436755b.

📒 Files selected for processing (1)

src/utils/transcripts.py (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/utils/transcripts.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: build-pr
GitHub Check: e2e_tests (ci)
GitHub Check: e2e_tests (azure)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a71a42 and 8258cff.

📒 Files selected for processing (2)

src/utils/transcripts.py (3 hunks)
tests/unit/utils/test_transcripts.py (3 hunks)

🧰 Additional context used

📓 Path-based instructions (4)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

tests/unit/utils/test_transcripts.py
src/utils/transcripts.py

tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

tests/unit/utils/test_transcripts.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

tests/unit/utils/test_transcripts.py

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

src/utils/transcripts.py

🧬 Code graph analysis (1)

tests/unit/utils/test_transcripts.py (1)

src/utils/transcripts.py (1)

construct_transcripts_path (22-32)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: build-pr
GitHub Check: e2e_tests (ci)
GitHub Check: e2e_tests (azure)

🔇 Additional comments (5)

src/utils/transcripts.py (2)

12-12: LGTM!

The hashlib import is appropriate for implementing SHA-256 hashing to anonymize user IDs.

26-27: LGTM!

The SHA-256 hashing correctly anonymizes the user ID in the transcripts path. UTF-8 encoding and hexdigest conversion are appropriate choices.

tests/unit/utils/test_transcripts.py (3)

3-3: LGTM!

The hashlib import is correctly added to compute expected hashed values in the test assertions.

43-50: LGTM!

The test correctly computes the expected hashed user ID and validates that the path construction uses the hashed value instead of the raw user ID.

103-111: LGTM!

The test correctly validates that the stored transcript metadata contains the hashed user ID instead of the raw value, ensuring proper anonymization.

src/utils/transcripts.py

max-svistunov · 2025-10-14T12:54:04Z

@tisnik Could you PTAL?

tisnik

LGTM, minimal + crystal clear

Anonymize user ID in transcripts

8258cff

coderabbitai bot reviewed Oct 14, 2025

View reviewed changes

src/utils/transcripts.py Outdated Show resolved Hide resolved

Use helper function for hashing user id

436755b

tisnik approved these changes Oct 14, 2025

View reviewed changes

tisnik merged commit 0896a60 into lightspeed-core:main Oct 14, 2025
18 of 20 checks passed

coderabbitai bot mentioned this pull request Feb 9, 2026

LCORE-1070: Docstrings for transcripts.py #1124

Merged

16 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LCORE-579 Anonymize user ID in transcripts#673

LCORE-579 Anonymize user ID in transcripts#673
tisnik merged 2 commits intolightspeed-core:mainfrom
max-svistunov:lcore-579-anonymize-user-id

max-svistunov commented Oct 14, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Oct 14, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

max-svistunov commented Oct 14, 2025

Uh oh!

tisnik left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

max-svistunov commented Oct 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

max-svistunov commented Oct 14, 2025

Uh oh!

tisnik left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

max-svistunov commented Oct 14, 2025 •

edited

Loading

coderabbitai bot commented Oct 14, 2025 •

edited

Loading