Context
In PR #1895 (Add a dedicated OpenAI-compatible LLM adapter), the _record_usage method in unstract/sdk1/src/unstract/sdk1/llm.py was updated to fall back to prompt_tokens=0 when:
- The provider does not return usage.prompt_tokens in its response, and
- LiteLLM's token_counter() raises (e.g., for unmapped custom/OpenAI-compatible models).
A warning is logged in this case, but the recorded usage payload contains prompt_llm_token_count=0 with no flag to distinguish "estimation failed / data unavailable" from "the prompt genuinely consumed zero tokens".
This was acknowledged as a known limitation and deferred from PR #1895 because adding a provenance field requires a wider end-to-end contract change across the usage pipeline beyond sdk1.
Note (post-#1929): The usage path has changed — _record_usage no longer calls Audit().push_usage_data. Usage now flows through self._pending_usage → worker bulk_create_usage. The core problem remains unchanged.
Relevant discussion: #1895 (comment)
Problem
Downstream consumers of the usage data (cost attribution, analytics, billing) cannot distinguish:
- Missing data — token estimation failed for an unmapped model
- Genuinely zero — the prompt actually consumed zero tokens
This silently understates prompt-token consumption in cost attribution and analytics for long-running workloads against OpenAI-compatible endpoints that do not return usage.prompt_tokens.
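To make the ambiguity concrete, here is a minimal sketch of how a downstream consumer might aggregate usage rows. The UsageRecord class, the model names, and the token values are hypothetical and only illustrate the problem; prompt_llm_token_count is the only field name taken from the actual payload, and prompt_tokens_source is the field proposed in this issue, not something that exists today.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class UsageRecord:
    """Hypothetical, simplified usage row used only for illustration."""

    model: str
    prompt_llm_token_count: int
    # Proposed field (does not exist today); None models the current payload.
    prompt_tokens_source: Optional[str] = None


def naive_total(records: List[UsageRecord]) -> int:
    """Sum prompt tokens the way a consumer without provenance would."""
    return sum(r.prompt_llm_token_count for r in records)


def summarize(records: List[UsageRecord]) -> Tuple[int, int]:
    """Return (tokens from known sources, number of rows with unknown usage)."""
    known = [r for r in records if r.prompt_tokens_source != "unknown"]
    total_known = sum(r.prompt_llm_token_count for r in known)
    return total_known, len(records) - len(known)


records = [
    UsageRecord("mapped-model", 1200, "provider"),
    # Estimation failed: the 0 is a placeholder, not a measurement.
    UsageRecord("custom-model", 0, "unknown"),
    UsageRecord("custom-model", 0, "unknown"),
]
```

With only the naive sum, the two failed rows are indistinguishable from free calls. With a provenance value, the consumer can at least report that two rows carry unknown prompt usage.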
Proposed Solution
Add a provenance / sentinel field to the usage payload, for example:
- prompt_tokens_source: an enum/string such as "provider", "estimated", or "unknown", or
- estimation_failed: a boolean flag
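As a rough sketch of the first option, the resolution logic could return the count together with its provenance. This is not the actual _record_usage implementation: PromptTokensSource, resolve_prompt_tokens, and the injected estimate callable are hypothetical names, and the callable merely stands in for whatever wraps LiteLLM's token_counter().

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class PromptTokensSource(str, Enum):
    """Hypothetical enum mirroring the values suggested above."""

    PROVIDER = "provider"
    ESTIMATED = "estimated"
    UNKNOWN = "unknown"


def resolve_prompt_tokens(
    provider_prompt_tokens: Optional[int],
    estimate: Callable[[], int],
) -> Tuple[int, PromptTokensSource]:
    """Pick a prompt-token count and record where it came from.

    `estimate` is an assumed stand-in for a token-counting call that may
    raise for unmapped models.
    """
    if provider_prompt_tokens is not None:
        # A provider-reported 0 stays distinguishable from a failed estimate.
        return provider_prompt_tokens, PromptTokensSource.PROVIDER
    try:
        return estimate(), PromptTokensSource.ESTIMATED
    except Exception:
        # Same fallback value as today (0), but now explicitly flagged.
        return 0, PromptTokensSource.UNKNOWN
```

Subclassing str keeps the enum values directly serializable if the usage payload is sent as JSON; whether that fits the existing pipeline contract is an open question for this issue.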
This would require changes to (re-anchored against the new shape post-#1929):
- unstract/sdk1/src/unstract/sdk1/llm.py: populate the provenance field in _record_usage before appending to self._pending_usage
- The OSS Usage model: extend the schema to include the provenance field
- The worker bulk_create_usage path: consume and persist the new field
- Downstream platform/usage-record schema: surface and store the new field
Optionally, increment an ops metric/counter when the fallback path is triggered so operations teams can detect silent drift without parsing logs.
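The optional counter could look roughly like the sketch below. It uses an in-process collections.Counter purely as a placeholder; fallback_counter and note_prompt_token_fallback are hypothetical names, and a real deployment would presumably route this through whatever metrics client the platform already uses.

```python
import logging
from collections import Counter

logger = logging.getLogger(__name__)

# Hypothetical in-process counter keyed by model name (placeholder for a
# real metrics backend).
fallback_counter: Counter = Counter()


def note_prompt_token_fallback(model_name: str) -> None:
    """Record one occurrence of the prompt_tokens=0 fallback for a model."""
    fallback_counter[model_name] += 1
    logger.warning(
        "Prompt token estimation failed for model %s; recording 0",
        model_name,
    )
```

Keying the counter by model name would let operations teams see which endpoints trigger the fallback most often without parsing warning logs.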
References
- Usage path change: Audit().push_usage_data to self._pending_usage → bulk_create_usage