LCORE-579 Anonymize user ID in transcripts #673
Walkthrough
Hash user_id with SHA-256 for transcript path construction and stored metadata; tests updated to expect the hashed user_id. No public API or function signatures changed.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor Caller
participant Utils as transcripts.py
participant Hash as hashlib
participant Storage as Filesystem/Store
Caller->>Utils: store_transcript(user_id, content, metadata)
Utils->>Hash: _hash_user_id(user_id) — sha256 → hex
Hash-->>Utils: hashed_user_id
Utils->>Utils: construct path with hashed_user_id<br/>replace metadata.user_id with hashed_user_id
Utils->>Storage: write transcript(content, metadata, path)
Storage-->>Utils: ack/success
Utils-->>Caller: result/status
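The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `_hash_user_id` and `store_transcript` are named in the diagram, but the `base_dir` parameter, the `transcript.json` file name, and the JSON layout are assumptions made only to keep the example self-contained.

```python
import hashlib
import json
from pathlib import Path
from typing import Any


def _hash_user_id(user_id: str) -> str:
    """Return the SHA-256 hex digest of the UTF-8 encoded user ID."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


def store_transcript(
    base_dir: Path, user_id: str, content: str, metadata: dict[str, Any]
) -> Path:
    """Write a transcript under a directory named after the hashed user ID.

    The base_dir argument and the file layout are hypothetical.
    """
    hashed_user_id = _hash_user_id(user_id)
    # Build a new dict so the caller's metadata is not mutated in place.
    stored_metadata = {**metadata, "user_id": hashed_user_id}
    target_dir = base_dir / hashed_user_id
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "transcript.json"
    payload = {"metadata": stored_metadata, "content": content}
    target.write_text(json.dumps(payload), encoding="utf-8")
    return target
```

In this sketch the raw user ID never reaches the filesystem: it appears neither in the directory name nor in the stored metadata.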
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/utils/transcripts.py (3 hunks)
tests/unit/utils/test_transcripts.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
tests/unit/utils/test_transcripts.py
src/utils/transcripts.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/utils/test_transcripts.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/utils/test_transcripts.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/utils/transcripts.py
🧬 Code graph analysis (1)
tests/unit/utils/test_transcripts.py (1)
src/utils/transcripts.py (1)
construct_transcripts_path(22-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-pr
- GitHub Check: e2e_tests (ci)
- GitHub Check: e2e_tests (azure)
🔇 Additional comments (5)
src/utils/transcripts.py (2)
12-12: LGTM! The hashlib import is appropriate for implementing SHA-256 hashing to anonymize user IDs.
26-27: LGTM! The SHA-256 hashing correctly anonymizes the user ID in the transcripts path. UTF-8 encoding and hexdigest conversion are appropriate choices.
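To illustrate why the UTF-8 encoding and hexdigest steps matter, the following sketch hashes a non-ASCII user ID. The helper name `hash_user_id` and the sample ID are hypothetical; only the `hashlib.sha256(...).hexdigest()` call pattern is taken from the review comment.

```python
import hashlib


def hash_user_id(user_id: str) -> str:
    """Hash a user ID; encoding to UTF-8 first handles non-ASCII input."""
    # hashlib operates on bytes, so the string must be encoded explicitly.
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


# A made-up ID containing non-ASCII characters.
digest = hash_user_id("użytkownik@example.com")

# hexdigest() yields 64 lowercase hexadecimal characters for SHA-256,
# so the result is usable as a single path component.
assert len(digest) == 64
assert set(digest) <= set("0123456789abcdef")

# The same input gives the same digest, so a user's transcripts
# keep landing in the same directory.
assert digest == hash_user_id("użytkownik@example.com")
```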
tests/unit/utils/test_transcripts.py (3)
3-3: LGTM! The hashlib import is correctly added to compute expected hashed values in the test assertions.
43-50: LGTM! The test correctly computes the expected hashed user ID and validates that the path construction uses the hashed value instead of the raw user ID.
103-111: LGTM! The test correctly validates that the stored transcript metadata contains the hashed user ID instead of the raw value, ensuring proper anonymization.
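The test pattern described in these comments might look like the pytest-style sketch below. The function `construct_path_for_user` is a hypothetical stand-in, since the real `construct_transcripts_path` signature is not shown here; the base directory and conversation ID are also made up.

```python
import hashlib


def construct_path_for_user(base: str, user_id: str, conversation_id: str) -> str:
    """Hypothetical stand-in for the path helper under test."""
    hashed = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return f"{base}/{hashed}/{conversation_id}"


def test_path_uses_hashed_user_id() -> None:
    """The path should contain the hashed user ID, never the raw one."""
    user_id = "mock_user_id"
    # Compute the expected value independently in the test.
    expected = hashlib.sha256(user_id.encode("utf-8")).hexdigest()

    path = construct_path_for_user("/tmp/transcripts", user_id, "conv-1")

    assert expected in path
    assert user_id not in path
```

Asserting that the raw ID is absent, and not only that the hash is present, guards against a regression where both values end up in the path.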
@tisnik Could you PTAL?
tisnik left a comment
LGTM, minimal + crystal clear