This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
llm_eval_ruby is a Ruby gem that provides LLM evaluation functionality through two main features:
- Prompt Management: Fetch and compile prompts using Liquid templating
- Tracing: Track LLM calls with traces, spans, and generations
The gem supports two backend adapters:
- Langfuse: Cloud-based prompt and trace management via API
- Local: File-based storage for prompts and traces
bundle exec rspec # Run all tests
bundle exec rspec spec/path_spec.rb # Run specific test file
bundle exec rubocop # Run RuboCop linter
bundle exec rubocop -a # Auto-correct offenses
bundle exec rake build # Build the gem
bundle exec rake install # Install locally
bundle exec rake release # Build, tag, and push to RubyGems
bundle exec rake # Runs both spec and rubocop
Configuration (lib/llm_eval_ruby/configuration.rb)
- Global configuration via LlmEvalRuby.configure
- Attributes: adapter (:langfuse or :local), langfuse_options, local_options
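The block-style configuration described above can be sketched as follows. This is a stand-alone illustration, not the gem's code: the DemoEval module, the :local default, the validation in adapter=, and the host option key are all assumptions made for the example.

```ruby
# Stand-alone sketch of block-style configuration.
# DemoEval is a hypothetical stand-in, not the gem's implementation.
module DemoEval
  class Configuration
    ADAPTERS = %i[langfuse local].freeze

    attr_reader :adapter
    attr_accessor :langfuse_options, :local_options

    def initialize
      @adapter = :local # assumed default, chosen only for this sketch
      @langfuse_options = {}
      @local_options = {}
    end

    # Reject unknown adapters early instead of failing at first use.
    def adapter=(value)
      raise ArgumentError, "unknown adapter: #{value.inspect}" unless ADAPTERS.include?(value)

      @adapter = value
    end
  end

  # Memoized global configuration object.
  def self.config
    @config ||= Configuration.new
  end

  # Yield the global configuration so callers can set attributes in a block.
  def self.configure
    yield config
  end
end

DemoEval.configure do |config|
  config.adapter = :langfuse
  config.langfuse_options = { host: "https://example.invalid" } # option key is an assumption
end
```

Keeping the configuration in one memoized object means adapters can read it lazily at call time rather than at load time.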
Adapter Pattern
The gem uses an adapter pattern to support multiple backends:
- Prompt Adapters: PromptAdapters::Base → PromptAdapters::Langfuse / PromptAdapters::Local
- Trace Adapters: TraceAdapters::Base → TraceAdapters::Langfuse / TraceAdapters::Local
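A minimal sketch of the adapter pattern, assuming a single fetch_prompt method on the base class. The DemoPromptAdapters module, the Remote class, the build factory, and the lambda client are hypothetical names used for illustration; the gem's actual adapter interfaces may differ.

```ruby
# Stand-alone sketch of an adapter hierarchy with runtime selection.
# All names here are hypothetical and do not mirror the gem's internals.
module DemoPromptAdapters
  class Base
    def fetch_prompt(name:, version: nil)
      raise NotImplementedError, "#{self.class} must implement fetch_prompt"
    end
  end

  # File-like backend: version is accepted for interface parity but ignored.
  class Local < Base
    def initialize(prompts)
      @prompts = prompts
    end

    def fetch_prompt(name:, version: nil)
      @prompts.fetch(name)
    end
  end

  # API-like backend: delegates to an injected client and forwards the version.
  class Remote < Base
    def initialize(client)
      @client = client
    end

    def fetch_prompt(name:, version: nil)
      @client.call(name, version)
    end
  end

  # Pick the adapter class at runtime from a symbol.
  def self.build(adapter, **options)
    case adapter
    when :local then Local.new(options.fetch(:prompts))
    when :langfuse then Remote.new(options.fetch(:client))
    else raise ArgumentError, "unsupported adapter: #{adapter.inspect}"
    end
  end
end

local = DemoPromptAdapters.build(:local, prompts: { "greeting" => "Hello {{ name }}" })
fake_client = ->(name, version) { "#{name}@v#{version}" } # stands in for an API client
remote = DemoPromptAdapters.build(:langfuse, client: fake_client)
```

Because both adapters share one method signature, calling code does not need to know which backend is active.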
Prompt Repositories (lib/llm_eval_ruby/prompt_repositories/)
- Text: Single text prompts
- Chat: Multi-message chat prompts (system, user, assistant roles)
- Methods: fetch(name:, version:) and fetch_and_compile(name:, variables:, version:)
Prompt Types (lib/llm_eval_ruby/prompt_types/)
- Base: Abstract base class with role and content
- System, User, Assistant: Role-specific prompt types
- Compiled: Rendered prompt with Liquid variables substituted
Liquid Templating
All prompts support Liquid template syntax for variable interpolation. Variables are deep stringified before rendering.
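The sketch below illustrates why deep stringification matters, assuming it refers to converting nested hash keys to strings (Liquid looks variables up by string key). To stay dependency-free, render_placeholders is a simplified stand-in that only handles {{ dotted.path }} lookups; it is not Liquid and not the gem's code, and both helper names are hypothetical.

```ruby
# Recursively convert hash keys to strings, descending into arrays and hashes.
def deep_stringify_keys(value)
  case value
  when Hash
    value.each_with_object({}) { |(key, val), out| out[key.to_s] = deep_stringify_keys(val) }
  when Array
    value.map { |item| deep_stringify_keys(item) }
  else
    value
  end
end

# Simplified stand-in for template rendering: replaces {{ a.b }} with a nested lookup.
# Real Liquid supports filters, tags, and much more.
def render_placeholders(template, variables)
  template.gsub(/\{\{\s*([\w.]+)\s*\}\}/) do
    path = Regexp.last_match(1).split(".")
    variables.dig(*path).to_s
  end
end

variables = deep_stringify_keys({ user: { name: "Ada" }, topic: :tracing })
rendered = render_placeholders("Hi {{ user.name }}, let's discuss {{topic}}.", variables)
```

Without the stringification step, a lookup for "user" would miss the :user symbol key and the placeholder would render empty.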
Tracer (lib/llm_eval_ruby/tracer.rb)
- Class methods: trace(...), span(...), generation(...), update_generation(...)
- Each method instantiates a Tracer with the configured adapter and delegates to it
- Supports block syntax for automatic timing and result capture
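One way the block syntax could capture timing and results is sketched below. DemoTracer, its Record struct, and the name:/input: keywords are assumptions for illustration; the gem's Tracer signatures are not shown in this document and may differ.

```ruby
# Stand-alone sketch of block-based timing and result capture.
# DemoTracer is hypothetical and stores records in memory instead of using an adapter.
class DemoTracer
  Record = Struct.new(:type, :name, :input, :output, :duration_s, keyword_init: true)

  attr_reader :records

  def initialize
    @records = []
  end

  # Define trace, span, and generation with the same block-aware behavior.
  %i[trace span generation].each do |type|
    define_method(type) do |name:, input: nil, &block|
      measure(type, name, input, &block)
    end
  end

  private

  # Run the block (if any), time it with a monotonic clock, and record the result.
  def measure(type, name, input)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    output = block_given? ? yield : nil
    duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @records << Record.new(type: type, name: name, input: input, output: output, duration_s: duration)
    output
  end
end

tracer = DemoTracer.new
answer = tracer.generation(name: "summarize", input: "long text") { "short text" }
```

The block's return value is passed through unchanged, so wrapping existing code in a tracer call does not alter its result.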
Trace Hierarchy
- Trace: Top-level container (e.g., a user request)
- Span: A step within a trace (e.g., data preprocessing)
- Generation: An LLM API call within a trace or span
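The hierarchy above can be modeled as records that share a trace ID and optionally point at a parent span. The Observation struct, its field names, and both helper methods are illustrative assumptions, not the gem's data model.

```ruby
require "securerandom"

# Hypothetical record type; field names are assumptions made for this sketch.
Observation = Struct.new(:kind, :name, :id, :trace_id, :parent_id, keyword_init: true)

# A trace is the root: its trace_id is its own id and it has no parent.
def new_trace(name)
  id = SecureRandom.uuid
  Observation.new(kind: :trace, name: name, id: id, trace_id: id, parent_id: nil)
end

# A span or generation inherits the trace_id and records a parent only when nested in a span.
def new_child(kind, name, parent)
  Observation.new(kind: kind, name: name, id: SecureRandom.uuid,
                  trace_id: parent.trace_id,
                  parent_id: parent.kind == :trace ? nil : parent.id)
end

request = new_trace("user_request")
preprocess = new_child(:span, "data_preprocessing", request)
llm_call = new_child(:generation, "chat_completion", preprocess)
direct_call = new_child(:generation, "title_suggestion", request)
```

Here llm_call is a generation inside a span, while direct_call is a generation attached directly to the trace.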
Observable Module (lib/llm_eval_ruby/observable.rb)
Include this module in classes to automatically trace methods via the observe decorator:
- observe :method_name → wraps as trace
- observe :method_name, type: :span → wraps as span
- observe :method_name, type: :generation → wraps as generation
- Requires instance variable @trace_id to link traces
- Automatically deep copies and sanitizes inputs (truncates base64 images)
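A method-wrapping decorator of this kind can be sketched as below. DemoObservable, its in-memory events list, and the Summarizer class are hypothetical; in this sketch observe must be called after the method is defined, which may not match how the gem's observe works.

```ruby
# Stand-alone sketch of an observe-style decorator.
# DemoObservable is hypothetical; it records events in memory instead of sending traces.
module DemoObservable
  def self.included(base)
    base.extend(ClassMethods)
  end

  def self.events
    @events ||= []
  end

  module ClassMethods
    # Replace an existing instance method with a wrapper that records each call.
    def observe(method_name, type: :trace)
      original = instance_method(method_name)
      define_method(method_name) do |*args, **kwargs, &block|
        result = original.bind_call(self, *args, **kwargs, &block)
        DemoObservable.events << { trace_id: @trace_id, type: type, name: method_name,
                                   input: args, output: result }
        result
      end
    end
  end
end

class Summarizer
  include DemoObservable

  def initialize(trace_id)
    @trace_id = trace_id # links every observed call to one trace
  end

  def call(text)
    shorten(text).upcase
  end
  observe :call

  def shorten(text)
    text[0, 5]
  end
  observe :shorten, type: :generation
end

result = Summarizer.new("trace-123").call("hello world")
```

The inner shorten call finishes first, so its generation event is recorded before the outer trace event.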
API Client (lib/llm_eval_ruby/api_clients/langfuse.rb)
- HTTParty-based client for Langfuse API
- Endpoints: fetch_prompt, get_prompts, create_trace, create_span, create_generation, etc.
- All trace operations use the /ingestion endpoint with batched events
- Traces support upsert by ID (create or update based on ID presence)
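The batched-event idea can be sketched without any HTTP call. The envelope shape (id, type, timestamp, body), the "trace-create" and "generation-create" type names, and the traceId field are assumptions based on Langfuse's public ingestion API; the gem's serializers and client may build the payload differently. Both helper names are hypothetical.

```ruby
require "json"
require "securerandom"
require "time"

# Wrap a body in an event envelope. The envelope fields are an assumption, not the gem's code.
def ingestion_event(type, body, now: Time.now.utc)
  { id: SecureRandom.uuid, type: type, timestamp: now.iso8601(3), body: body }
end

# Several events travel in one request body under a single "batch" key.
def ingestion_payload(events)
  JSON.generate({ batch: events })
end

trace_id = SecureRandom.uuid
payload = ingestion_payload([
  ingestion_event("trace-create", { id: trace_id, name: "user_request" }),
  ingestion_event("generation-create",
                  { id: SecureRandom.uuid, traceId: trace_id, name: "chat_completion" })
])
```

Under this model, sending another trace event whose body reuses the same trace ID is what makes the operation behave as an upsert.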
Serializers (lib/serializers/)
- PromptSerializer: Converts prompt objects for API
- TraceSerializer: Converts trace objects for API
- GenerationSerializer: Converts generation objects with usage metadata
File Structure
Prompts are stored in directories named after the prompt:
app/prompts/
├── my_chat_prompt/
│   ├── system.txt
│   ├── user.txt
│   └── assistant.txt (optional)
└── my_text_prompt/
    └── user.txt
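Reading a prompt from this layout could look like the sketch below. The load_local_prompt helper, the role ordering, and the hash shape it returns are assumptions for illustration; the example builds a temporary directory so it runs without touching app/prompts.

```ruby
require "tmpdir"
require "fileutils"

# Read whichever role files exist under <root>/<prompt_name>/, in a fixed role order.
# Hypothetical helper, not the gem's local adapter.
def load_local_prompt(root, prompt_name)
  dir = File.join(root, prompt_name)
  %w[system user assistant].filter_map do |role|
    path = File.join(dir, "#{role}.txt")
    { role: role, content: File.read(path) } if File.exist?(path)
  end
end

# Build a throwaway prompt directory that mirrors the layout above.
messages = Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "my_chat_prompt"))
  File.write(File.join(root, "my_chat_prompt", "system.txt"), "You are terse.")
  File.write(File.join(root, "my_chat_prompt", "user.txt"), "Summarize {{ text }}")
  load_local_prompt(root, "my_chat_prompt")
end
```

Missing files such as the optional assistant.txt are skipped rather than treated as errors.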
- Adapter Selection: Determined at runtime based on LlmEvalRuby.config.adapter
- Custom Client Support: Langfuse adapters support custom client injection via the client: parameter
  - LlmEvalRuby::Tracer.new(adapter: :langfuse, client: custom_client)
  - LlmEvalRuby::PromptRepositories::Text.new(adapter: :langfuse, client: custom_client)
  - If no client is provided, uses default from langfuse_options config
  - Local adapter does not use clients
- Prompt Versioning: Only supported by Langfuse adapter; local adapter ignores version parameter
- Trace IDs: Must be manually managed when using Observable pattern via @trace_id
- Deep Copy: Observable module deep copies inputs to prevent mutation; handles Marshal-incompatible objects gracefully
- Base64 Sanitization: Automatically truncates base64-encoded images in traced inputs to 30 characters
- Ruby Version: Requires Ruby >= 3.3.0
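The Deep Copy and Base64 Sanitization notes above can be sketched together. This is an illustration under stated assumptions: falling back to the original object is one possible reading of "gracefully", the 30-character limit is applied to the payload after the data URI prefix, and the helper names and regex are hypothetical rather than taken from the gem.

```ruby
# Deep copy via Marshal; fall back to the original object when it cannot be dumped
# (for example a Proc). The fallback choice is an assumption for this sketch.
def safe_deep_copy(value)
  Marshal.load(Marshal.dump(value))
rescue TypeError
  value
end

# Truncate base64 image payloads anywhere in a nested structure.
# Returns new objects and leaves the argument untouched.
def sanitize(value, limit: 30)
  base64_image = %r{(data:image/[\w.+-]+;base64,)([A-Za-z0-9+/=]+)}
  case value
  when String
    value.gsub(base64_image) { "#{Regexp.last_match(1)}#{Regexp.last_match(2)[0, limit]}" }
  when Hash
    value.transform_values { |item| sanitize(item, limit: limit) }
  when Array
    value.map { |item| sanitize(item, limit: limit) }
  else
    value
  end
end

input = { image: "data:image/png;base64,#{'A' * 200}", note: "keep me" }
copy = sanitize(safe_deep_copy(input))
```

Copying before sanitizing keeps the caller's data intact while the traced version stays small.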
- httparty (~> 0.22.0): HTTP client for Langfuse API
- liquid (~> 5.5.0): Template rendering engine