Skip to content

feat: add Chromium extract methods and source mapping#521

Merged
moonD4rk merged 8 commits intofeat/v2-hbd-architecturefrom
feat/v2-extract-sources
Mar 29, 2026
Merged

feat: add Chromium extract methods and source mapping#521
moonD4rk merged 8 commits intofeat/v2-hbd-architecturefrom
feat/v2-extract-sources

Conversation

@moonD4rk
Copy link
Copy Markdown
Owner

@moonD4rk moonD4rk commented Mar 28, 2026

Summary

Closes #520 (partial — Chromium complete, Firefox extract methods in follow-up PR)

Implements per-category data extraction for Chromium browsers as standalone typed functions, with Firefox source mapping included.

New files: browser/chromium/

File Type Description
source.go Data mapping dataSource struct, chromiumSources, yandexSources, yandexQueryOverrides
decrypt.go Crypto decryptValue() — Chromium AES → DPAPI fallback
extract_password.go SQLite + decrypt []LoginEntry
extract_cookie.go SQLite + decrypt []CookieEntry
extract_creditcard.go SQLite + decrypt []CreditCardEntry
extract_history.go SQLite []HistoryEntry
extract_download.go SQLite []DownloadEntry
extract_bookmark.go JSON recursive []BookmarkEntry
extract_extension.go JSON []ExtensionEntry
extract_storage.go LevelDB []StorageEntry (localStorage + sessionStorage)

New file: browser/firefox/

File Description
source.go firefoxSources mapping (7 categories)

Testing

  • testutil_test.go — Real Chrome table schemas + insertLogin(), insertCookie(), etc. helpers
  • 16 test cases covering field mapping, sort order, Yandex query override, folder tracking, invalid paths
  • All tests create real SQLite/JSON/LevelDB fixtures with t.TempDir()

Design notes

  • Functions are standalone (not methods on Chromium struct) — Phase 8 will wire them in
  • Each extract function takes explicit masterKey, path, query parameters
  • Yandex support: query parameter defaults to standard SQL, callers pass override for Yandex
  • v20 App-Bound Encryption deferred

Test plan

  • go test ./browser/chromium/ — 16 tests pass
  • go test ./... — all tests pass
  • GOOS=darwin/windows/linux go build ./... — cross-compiles
  • golangci-lint run — 0 issues

Implement per-category data extraction for Chromium browsers as typed
standalone functions, preparing for Phase 8 wiring into the new
Chromium struct.

New files:
- source.go: dataSource struct, chromiumSources/yandexSources maps,
  yandexQueryOverrides for Yandex action_url variant
- decrypt.go: decryptValue() wrapping platform-specific decryption
- extract_password.go: SQLite + decrypt → []LoginEntry
- extract_cookie.go: SQLite + decrypt → []CookieEntry
- extract_creditcard.go: SQLite + decrypt → []CreditCardEntry
- extract_history.go: SQLite → []HistoryEntry
- extract_download.go: SQLite → []DownloadEntry
- extract_bookmark.go: JSON recursive → []BookmarkEntry
- extract_extension.go: JSON → []ExtensionEntry
- extract_storage.go: LevelDB → []StorageEntry (local + session)
- firefox/source.go: firefoxSources map

Tests use real Chrome table schemas for SQLite fixtures, with INSERT
helpers to keep test data readable and self-documenting.

Ref #520
@moonD4rk moonD4rk marked this pull request as ready for review March 28, 2026 14:51
Copilot AI review requested due to automatic review settings March 28, 2026 14:51
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 28, 2026

Codecov Report

❌ Patch coverage is 81.01852% with 41 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.55%. Comparing base (b444a09) to head (7d7ea9b).

Files with missing lines Patch % Lines
browser/chromium/decrypt.go 23.07% 10 Missing ⚠️
browser/chromium/source.go 0.00% 7 Missing ⚠️
browser/chromium/extract_cookie.go 87.09% 2 Missing and 2 partials ⚠️
browser/chromium/extract_download.go 80.00% 2 Missing and 2 partials ⚠️
browser/chromium/extract_password.go 82.60% 2 Missing and 2 partials ⚠️
browser/chromium/extract_storage.go 86.20% 2 Missing and 2 partials ⚠️
browser/chromium/extract_bookmark.go 92.85% 1 Missing and 1 partial ⚠️
browser/chromium/extract_creditcard.go 85.71% 1 Missing and 1 partial ⚠️
browser/chromium/extract_extension.go 93.54% 1 Missing and 1 partial ⚠️
browser/chromium/extract_history.go 90.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@                     Coverage Diff                      @@
##           feat/v2-hbd-architecture     #521      +/-   ##
============================================================
+ Coverage                     41.60%   43.55%   +1.94%     
============================================================
  Files                            23       35      +12     
  Lines                           959     1318     +359     
============================================================
+ Hits                            399      574     +175     
- Misses                          504      676     +172     
- Partials                         56       68      +12     
Flag Coverage Δ
unittests 43.55% <81.01%> (+1.94%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds new standalone, typed extraction helpers under browser/chromium/ (plus source mapping for browser/firefox/) as part of the ongoing refactor away from the extractor registry, with accompanying fixtures/tests and lint configuration updates.

Changes:

  • Introduce Chromium category→source mappings (including Yandex overrides) and per-category extract functions for SQLite/JSON/LevelDB.
  • Add Firefox category→source mappings for the refactor path.
  • Add Chromium-focused test utilities plus unit tests for storage/password/history/download/credit-card/cookie/bookmark/extension extraction.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
browser/firefox/source.go Adds Firefox category-to-file source mapping.
browser/chromium/source.go Adds Chromium source mapping + Yandex source/query overrides.
browser/chromium/decrypt.go Adds shared decrypt helper used by Chromium extractors.
browser/chromium/extract_password.go Implements login extraction + sorting + optional query override.
browser/chromium/extract_password_test.go Tests password mapping/sort order + Yandex override behavior.
browser/chromium/extract_cookie.go Implements cookie extraction + sorting + decryption.
browser/chromium/extract_cookie_test.go Tests cookie mapping/sort/flags.
browser/chromium/extract_creditcard.go Implements credit card extraction + decryption.
browser/chromium/extract_creditcard_test.go Tests credit card field mapping.
browser/chromium/extract_history.go Implements history extraction + sorting.
browser/chromium/extract_history_test.go Tests history mapping/sort + custom query + missing file.
browser/chromium/extract_download.go Implements download extraction + sorting.
browser/chromium/extract_download_test.go Tests download mapping/sort.
browser/chromium/extract_bookmark.go Implements bookmark JSON traversal + sorting + folder tracking.
browser/chromium/extract_bookmark_test.go Tests bookmark mapping/sort + folder handling.
browser/chromium/extract_extension.go Implements extension parsing from Secure Preferences JSON.
browser/chromium/extract_extension_test.go Tests extension filtering/mapping + invalid JSON behavior.
browser/chromium/extract_storage.go Implements LevelDB iteration for local/session storage.
browser/chromium/extract_storage_test.go Tests storage parsing and invalid path behavior.
browser/chromium/testutil_test.go Adds fixture builders and real schema strings for tests.
.golangci.yml Temporarily excludes unused linter for new extract files pending wiring.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +47 to +51
func parseStorageKey(key, separator []byte) (url, name string) {
parts := bytes.SplitN(key, separator, 2)
if len(parts) != 2 {
return "", ""
}
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extractSessionStorage/extractLocalStorage key parsing is currently too simplistic: parseStorageKey only SplitN’s into 2 parts and doesn’t handle the key formats the existing extractors support (e.g. leading '_' + trailing '\x01' trimming, and 3-part sessionStorage keys like "map-..." / "namespace-..."). This will produce incorrect URL/key values on real Chromium LevelDB data; consider porting the key parsing logic from the current browserdata/localstorage and browserdata/sessionstorage implementations.

Copilot uses AI. Check for mistakes.
Comment on lines +40 to +45
Value: string(iter.Value()),
})
}
return entries, iter.Error()
}

Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LevelDB values are currently converted with string(iter.Value()) and there’s no max-length guard. The existing localStorage/sessionStorage extractors cap values at 2KB and normalize bytes (e.g., UTF-8 split handling) to avoid huge/binary payloads in output. Consider applying similar normalization/size limits here to prevent large memory usage and unreadable output when encountering real storage blobs.

Suggested change
Value: string(iter.Value()),
})
}
return entries, iter.Error()
}
Value: normalizeLevelDBValue(iter.Value(), 2048),
})
}
return entries, iter.Error()
}
// normalizeLevelDBValue truncates the given byte slice to at most maxLen bytes,
// avoiding splitting a UTF-8 multi-byte sequence when truncating.
func normalizeLevelDBValue(v []byte, maxLen int) string {
if len(v) == 0 {
return ""
}
if maxLen <= 0 || len(v) <= maxLen {
return string(v)
}
end := maxLen
if end > len(v) {
end = len(v)
}
// Move back from the tentative end to the start of the UTF-8 code point,
// so we don't cut in the middle of a multi-byte sequence. UTF-8 continuation
// bytes have the form 10xxxxxx (0x80 to 0xBF).
for end > 0 && (v[end] & 0xC0) == 0x80 {
end--
}
if end == 0 {
// Fallback: if we couldn't find a safe boundary, just cut at maxLen.
end = maxLen
if end > len(v) {
end = len(v)
}
}
return string(v[:end])
}

Copilot uses AI. Check for mistakes.
Comment on lines +36 to +46
value, _ := decryptValue(masterKey, encryptedValue)
return types.CookieEntry{
Name: name,
Host: host,
Path: cookiePath,
Value: string(value),
IsSecure: isSecure != 0,
IsHTTPOnly: isHTTPOnly != 0,
ExpireAt: typeutil.TimeEpoch(expireAt),
CreatedAt: typeutil.TimeEpoch(createdAt),
}, nil
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cookie decryption is missing the Chromium-specific post-processing that the existing extractor applies: after Chromium AES decryption, some cookie values include a 32-byte prefix that needs to be stripped (see browserdata/cookie/cookie.go handling of value[32:]). Without this, extracted cookie values may be incorrect even when decryption succeeds.

Copilot uses AI. Check for mistakes.
Comment on lines +30 to +33
Username: username,
Password: string(password),
CreatedAt: typeutil.TimeEpoch(created),
}, nil
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Password CreatedAt is always derived via typeutil.TimeEpoch(created). The existing Chromium password extractor uses a heuristic (epoch vs unix timestamp) based on whether the value is greater than time.Now().Unix(). If date_created can be stored as a unix timestamp in some cases, this change would regress timestamps; consider preserving that heuristic or otherwise validating the time base before converting.

Copilot uses AI. Check for mistakes.
if err := rows.Scan(&url, &username, &pwd, &created); err != nil {
return types.LoginEntry{}, err
}
password, _ := decryptValue(masterKey, pwd)
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decryptValue errors are discarded (e.g., password, _ := decryptValue(...)), so decryption failures become silent empty strings. The previous extractors logged decryption failures at debug level; consider at least logging the error (or returning it to QueryRows so it can be logged/row-skipped) to preserve observability when masterKey is present but decryption fails.

Suggested change
password, _ := decryptValue(masterKey, pwd)
password, err := decryptValue(masterKey, pwd)
if err != nil {
return types.LoginEntry{}, err
}

Copilot uses AI. Check for mistakes.
leveldb.OpenFile creates the directory on Windows instead of returning
an error, causing TestExtractLocalStorage_InvalidPath to fail in CI.
This test was verifying LevelDB behavior, not our extraction logic.
Only extractPasswords needs the query override (Yandex action_url).
The other 7 SQLite extract functions always use their default query,
so remove the unnecessary query parameter from their signatures.
Replace try-then-fallback pattern with explicit version detection using
crypto.DetectVersion. Routes v10 to DecryptWithChromium, DPAPI to
DecryptWithDPAPI, and adds a TODO placeholder for v20 App-Bound
Encryption.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread browser/chromium/decrypt.go Outdated
Comment on lines +17 to +27
switch crypto.DetectVersion(ciphertext) {
case crypto.CipherV10:
return crypto.DecryptWithChromium(masterKey, ciphertext)
case crypto.CipherV20:
// TODO: implement App-Bound Encryption (Chrome 127+)
return nil, fmt.Errorf("v20 App-Bound Encryption not yet supported")
case crypto.CipherDPAPI:
return crypto.DecryptWithDPAPI(ciphertext)
default:
return nil, fmt.Errorf("unsupported cipher version: %s", crypto.DetectVersion(ciphertext))
}
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: crypto.DetectVersion(ciphertext) is called twice (switch + default error). Store the detected version in a local variable and reuse it to avoid duplicate work and ensure the error message matches the switch decision.

Copilot uses AI. Check for mistakes.
Comment on lines +16 to +21
var name, month, year string
var encNumber []byte
if err := rows.Scan(&name, &month, &year, &encNumber); err != nil {
return types.CreditCardEntry{}, err
}
number, _ := decryptValue(masterKey, encNumber)
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In extractCreditCards, expiration_month / expiration_year are declared as INTEGER in the real Chromium schema (and in this repo's test schema), but the code scans them into string variables. With database/sql, scanning an int64 column into *string typically fails at runtime. Scan into an integer type (e.g. int64/int) and then format to ExpMonth/ExpYear strings (or change the struct fields to ints if desired).

Copilot uses AI. Check for mistakes.
Comment on lines +11 to +17
func extractLocalStorage(path string) ([]types.StorageEntry, error) {
return extractLevelDB(path, []byte("\x00"))
}

func extractSessionStorage(path string) ([]types.StorageEntry, error) {
return extractLevelDB(path, []byte("-"))
}
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extractLocalStorage/extractSessionStorage currently assume LevelDB keys are simple url<sep>key pairs and values are plain UTF-8 strings. The existing implementation in this repo parses Chromium LocalStorage/SessionStorage keys with additional prefixes (e.g. leading '_' / META records, '\x01' suffixes, and session namespace/map formats) and also normalizes value bytes (see browserdata/localstorage and browserdata/sessionstorage). As-is, this will produce incorrect URL/key/value output against real browser LevelDB data. Port the key/value parsing logic (including META handling and UTF-8 splitting) into these new extractors, or delegate to shared helpers.

Copilot uses AI. Check for mistakes.
Comment on lines +31 to +42

value, _ := decryptValue(masterKey, encryptedValue)
return types.CookieEntry{
Name: name,
Host: host,
Path: cookiePath,
Value: string(value),
IsSecure: isSecure != 0,
IsHTTPOnly: isHTTPOnly != 0,
ExpireAt: typeutil.TimeEpoch(expireAt),
CreatedAt: typeutil.TimeEpoch(createdAt),
}, nil
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cookie decryption in the existing Chromium extractor trims a 32-byte prefix from some decrypted cookie values (after AES decryption) to handle Chrome's cookie payload format on certain versions. This new extractor returns the raw decrypted bytes as-is, which can lead to cookie values containing extra leading metadata bytes. Consider replicating the prefix-trimming behavior (only when applicable) so Value matches what users expect.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +30 to +42
}

value, _ := decryptValue(masterKey, encryptedValue)
return types.CookieEntry{
Name: name,
Host: host,
Path: cookiePath,
Value: string(value),
IsSecure: isSecure != 0,
IsHTTPOnly: isHTTPOnly != 0,
ExpireAt: typeutil.TimeEpoch(expireAt),
CreatedAt: typeutil.TimeEpoch(createdAt),
}, nil
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decryptValue() errors are ignored (value, _ := decryptValue...). That means unsupported/encrypted cookies can be returned with empty Value and no error/log signal. Consider propagating or logging the decryption error so callers can tell decryption failed (QuerySQLite will log and continue if the scan function returns an error).

Copilot uses AI. Check for mistakes.
Comment on lines +18 to +27
if err := rows.Scan(&name, &month, &year, &encNumber); err != nil {
return types.CreditCardEntry{}, err
}
number, _ := decryptValue(masterKey, encNumber)
return types.CreditCardEntry{
Name: name,
Number: string(number),
ExpMonth: month,
ExpYear: year,
}, nil
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decryptValue() errors are ignored (number, _ := decryptValue...). This can silently produce empty card numbers (including the explicit v20 unsupported error) without any logging/return error. Consider handling the error (propagate to the scan function or log it explicitly).

Copilot uses AI. Check for mistakes.
Comment on lines +21 to +26
func extractLevelDB(path string, separator []byte) ([]types.StorageEntry, error) {
db, err := leveldb.OpenFile(path, nil)
if err != nil {
return nil, err
}
defer db.Close()
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leveldb.OpenFile will create a new empty DB directory if the path is missing, which can mask “not found” conditions and causes unintended filesystem side effects during extraction. Consider checking the directory exists first and/or opening with options that error if missing (so callers get a proper error instead of an empty result).

Copilot uses AI. Check for mistakes.
assert.Empty(t, got)
}

func TestExtractExtensions_InvalidJSON(t *testing.T) {
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test name is misleading: the JSON in this test is valid; the failure condition is that the expected extensions/settings path is missing. Renaming the test would make intent clearer (and leave “invalid JSON” for truly malformed content).

Suggested change
func TestExtractExtensions_InvalidJSON(t *testing.T) {
func TestExtractExtensions_MissingExtensionsSettingsPath(t *testing.T) {

Copilot uses AI. Check for mistakes.
Comment on lines +10 to +15
const defaultCreditCardQuery = `SELECT name_on_card, expiration_month, expiration_year,
card_number_encrypted FROM credit_cards`

func extractCreditCards(masterKey []byte, path string) ([]types.CreditCardEntry, error) {
return sqliteutil.QueryRows(path, false, defaultCreditCardQuery,
func(rows *sql.Rows) (types.CreditCardEntry, error) {
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extractCreditCards returns results in whatever order SQLite happens to return rows (no ORDER BY and no post-sort), unlike the other extractors which enforce a deterministic sort. Consider adding an ORDER BY (e.g., date_modified desc) or sorting the slice before returning to keep output stable and avoid flaky assumptions in callers/tests.

Copilot uses AI. Check for mistakes.
- Store DetectVersion result in local variable to avoid duplicate call
- Scan credit card expiration_month/year as int then convert to string
  (matches INTEGER column type in real Chrome schema)
- Add os.Stat check before leveldb.OpenFile to prevent creating empty
  directories for non-existent paths
- Rename TestExtractExtensions_InvalidJSON to
  TestExtractExtensions_MissingSettingsPath (JSON is valid, path is missing)
modernc.org/sqlite handles INTEGER→string conversion automatically.
Scanning into string is safer for nullable columns — NULL becomes ""
instead of "0" which would be an invalid month/year.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +51 to +57
// parseStorageKey splits a LevelDB key into (url, name) by the given separator.
func parseStorageKey(key, separator []byte) (url, name string) {
parts := bytes.SplitN(key, separator, 2)
if len(parts) != 2 {
return "", ""
}
return string(parts[0]), string(parts[1])
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LevelDB key parsing is too simplistic for real Chromium storage formats. Existing extractors handle prefixes/sentinels like "META:", leading "_" on the URL part, and trim trailing "\x01" from the key (browserdata/localstorage/localstorage.go:79-97). As written, extractLocalStorage will include these raw delimiters in URL/Key (or skip entries), leading to incorrect output. Consider porting the established parsing logic (and associated trimming) into parseStorageKey / extractLevelDB.

Suggested change
// parseStorageKey splits a LevelDB key into (url, name) by the given separator.
func parseStorageKey(key, separator []byte) (url, name string) {
parts := bytes.SplitN(key, separator, 2)
if len(parts) != 2 {
return "", ""
}
return string(parts[0]), string(parts[1])
// parseStorageKey splits a LevelDB key into (url, name) by the given separator,
// applying Chromium-specific cleanup (skipping META: entries, stripping leading
// "_" from the URL part, and trimming trailing "\x01" from the name).
func parseStorageKey(key, separator []byte) (url, name string) {
// Skip metadata entries like "META:" which do not represent actual storage.
if bytes.HasPrefix(key, []byte("META:")) {
return "", ""
}
parts := bytes.SplitN(key, separator, 2)
if len(parts) != 2 {
return "", ""
}
urlBytes := parts[0]
nameBytes := parts[1]
// Chromium local/session storage keys may have a leading "_" on the URL part.
if len(urlBytes) > 0 && urlBytes[0] == '_' {
urlBytes = urlBytes[1:]
}
// Trim any trailing 0x01 sentinel bytes from the name/key part.
nameBytes = bytes.TrimRight(nameBytes, "\x01")
return string(urlBytes), string(nameBytes)

Copilot uses AI. Check for mistakes.
Comment on lines +17 to +19
func extractSessionStorage(path string) ([]types.StorageEntry, error) {
return extractLevelDB(path, []byte("-"))
}
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SessionStorage LevelDB keys are not simply "-"; the existing implementation splits on '-' and supports multiple key shapes (e.g., "META:", "_-", and 3-part keys like "map--" / "namespace--" in browserdata/sessionstorage/sessionstorage.go:79-95). Using SplitN(..., 2) will mis-parse many real keys (and can break on URLs containing '-'). Please align extractSessionStorage parsing with the existing sessionstorage key format handling.

Copilot uses AI. Check for mistakes.
Comment on lines +11 to +16
dir := createTestLevelDB(t, map[string]string{
"https://example.com\x00token": "abc123",
"https://example.com\x00theme": "dark",
"https://other.com\x00session_id": "xyz789",
"noseparator": "should-be-skipped",
})
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LevelDB fixtures here use simplified keys (e.g., "https://example.com\x00token" and "https://example.com-token") that don't cover real Chromium key formats handled by the existing extractors (META: entries, leading '_' URL prefixes, trailing '\x01' key sentinels, and 3-part SessionStorage keys like "namespace-..." / "map-..."; see browserdata/localstorage/localstorage.go:79-97 and browserdata/sessionstorage/sessionstorage.go:79-95). As a result, these tests can pass even when parsing is incorrect for real browser data. Please add test cases with realistic keys to validate trimming/splitting behavior.

Copilot uses AI. Check for mistakes.
return types.CookieEntry{}, err
}

value, _ := decryptValue(masterKey, encryptedValue)
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decryptValue can return an error for unsupported/unknown cipher versions (e.g., v20 App-Bound Encryption), but extractCookies discards that error. This can silently produce empty/incorrect cookie values without alerting the caller. Please handle the decrypt error (return it, or skip the row while preserving a non-nil error signal) so unsupported encryption is observable.

Suggested change
value, _ := decryptValue(masterKey, encryptedValue)
value, err := decryptValue(masterKey, encryptedValue)
if err != nil {
return types.CookieEntry{}, err
}

Copilot uses AI. Check for mistakes.
Comment on lines +27 to +33
password, _ := decryptValue(masterKey, pwd)
return types.LoginEntry{
URL: url,
Username: username,
Password: string(password),
CreatedAt: typeutil.TimeEpoch(created),
}, nil
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decryptValue errors are ignored here. In particular, v20 App-Bound Encryption currently returns an error (decrypt.go:21-23); dropping it means passwords can silently become empty strings. Please propagate or otherwise surface decrypt errors so callers can distinguish "no password" from "failed to decrypt".

Copilot uses AI. Check for mistakes.
Comment on lines +33 to +36
dir := createTestLevelDB(t, map[string]string{
"https://example.com-token": "abc123",
"https://example.com-user": "alice",
})
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These SessionStorage test keys ("https://example.com-token") assume a single '-' separator, but real Chromium SessionStorage keys can contain multiple '-' segments and multiple structural formats (browserdata/sessionstorage/sessionstorage.go:79-95). Add fixtures for those formats (including URLs containing '-') to ensure extractSessionStorage doesn't split incorrectly.

Copilot uses AI. Check for mistakes.
@moonD4rk moonD4rk merged commit 1f8185b into feat/v2-hbd-architecture Mar 29, 2026
11 checks passed
moonD4rk added a commit that referenced this pull request Apr 3, 2026
* feat: add Chromium extract methods, source mapping, and tests

Implement per-category data extraction for Chromium browsers as typed
standalone functions, preparing for Phase 8 wiring into the new
Chromium struct.

New files:
- source.go: dataSource struct, chromiumSources/yandexSources maps,
  yandexQueryOverrides for Yandex action_url variant
- decrypt.go: decryptValue() wrapping platform-specific decryption
- extract_password.go: SQLite + decrypt → []LoginEntry
- extract_cookie.go: SQLite + decrypt → []CookieEntry
- extract_creditcard.go: SQLite + decrypt → []CreditCardEntry
- extract_history.go: SQLite → []HistoryEntry
- extract_download.go: SQLite → []DownloadEntry
- extract_bookmark.go: JSON recursive → []BookmarkEntry
- extract_extension.go: JSON → []ExtensionEntry
- extract_storage.go: LevelDB → []StorageEntry (local + session)
- firefox/source.go: firefoxSources map

Tests use real Chrome table schemas for SQLite fixtures, with INSERT
helpers to keep test data readable and self-documenting.

Ref #520

* fix: remove LevelDB invalid path test (Windows compatibility)

leveldb.OpenFile creates the directory on Windows instead of returning
an error, causing TestExtractLocalStorage_InvalidPath to fail in CI.
This test was verifying LevelDB behavior, not our extraction logic.

* refactor: remove unused query parameter from extract functions

Only extractPasswords needs the query override (Yandex action_url).
The other 7 SQLite extract functions always use their default query,
so remove the unnecessary query parameter from their signatures.

* refactor: use DetectVersion in decryptValue instead of blind fallback

Replace try-then-fallback pattern with explicit version detection using
crypto.DetectVersion. Routes v10 to DecryptWithChromium, DPAPI to
DecryptWithDPAPI, and adds a TODO placeholder for v20 App-Bound
Encryption.

* chore: relax gocognit and gocritic linters for test files

* revert: restore strict gocognit and gocritic linters for test files

* fix: address review feedback on extract methods

- Store DetectVersion result in local variable to avoid duplicate call
- Scan credit card expiration_month/year as int then convert to string
  (matches INTEGER column type in real Chrome schema)
- Add os.Stat check before leveldb.OpenFile to prevent creating empty
  directories for non-existent paths
- Rename TestExtractExtensions_InvalidJSON to
  TestExtractExtensions_MissingSettingsPath (JSON is valid, path is missing)

* fix: revert creditcard scan to string type for NULL safety

modernc.org/sqlite handles INTEGER→string conversion automatically.
Scanning into string is safer for nullable columns — NULL becomes ""
instead of "0" which would be an invalid month/year.
@moonD4rk moonD4rk deleted the feat/v2-extract-sources branch April 7, 2026 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants