
YNU-768: Add Clearnode Metrics#562

Merged
dimast-x merged 1 commit into main from
feat-v1/metrics
Feb 17, 2026

@dimast-x dimast-x commented Feb 17, 2026

Summary by CodeRabbit

  • New Features

    • Enhanced system observability with comprehensive metrics tracking across RPC operations, app sessions, blockchain actions, and connection activity.
    • Added connection origin tracking for improved network monitoring.
  • Bug Fixes & Improvements

    • Improved quorum verification with application context integration.
  • Deprecations

    • Removed the CloseAppSession RPC method; applications should use alternative session management approaches.

@dimast-x dimast-x requested a review from a team as a code owner February 17, 2026 13:09

coderabbitai bot commented Feb 17, 2026

Warning

Rate limit exceeded

@dimast-x has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 10 minutes and 43 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Walkthrough

Comprehensive metrics instrumentation system introduced across API handlers, RPC routing, and blockchain components. New metrics package provides runtime and store metric exporters with Prometheus integration. Handler constructors updated to accept metrics exporters; quorum verification enhanced with application context; RPC routing refactored with observability middleware and connection origin tracking; database store extended with metric aggregation methods.
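A minimal Go sketch of the exporter-interface-plus-no-op pattern described above. The names RuntimeMetricExporter, IncRPCRequest, ObserveRPCDuration, and NewNoopRuntimeMetricExporter appear in this PR, but the two-method interface, the parameter types, and the Handler shape here are illustrative assumptions, not the actual clearnode code.

```go
package main

import (
	"fmt"
	"time"
)

// RuntimeMetricExporter is a hypothetical two-method subset of the PR's
// interface; the real one has many more methods and may use other types.
type RuntimeMetricExporter interface {
	IncRPCRequest(method, path string, success bool)
	ObserveRPCDuration(method, path string, success bool, d time.Duration)
}

// noopRuntimeMetricExporter discards every observation, so tests can build
// handlers without touching a Prometheus registry.
type noopRuntimeMetricExporter struct{}

func (noopRuntimeMetricExporter) IncRPCRequest(string, string, bool)                     {}
func (noopRuntimeMetricExporter) ObserveRPCDuration(string, string, bool, time.Duration) {}

// Compile-time check that the no-op type satisfies the interface.
var _ RuntimeMetricExporter = noopRuntimeMetricExporter{}

// NewNoopRuntimeMetricExporter returns the no-op implementation.
func NewNoopRuntimeMetricExporter() RuntimeMetricExporter {
	return noopRuntimeMetricExporter{}
}

// Handler is an illustrative API handler that receives its exporter through
// the constructor instead of reaching for a global.
type Handler struct {
	metrics RuntimeMetricExporter
}

func NewHandler(m RuntimeMetricExporter) *Handler {
	return &Handler{metrics: m}
}

func main() {
	h := NewHandler(NewNoopRuntimeMetricExporter())
	h.metrics.IncRPCRequest("example.method", "default", true)
	fmt.Println("handler wired with no-op metrics")
}
```

Injecting the interface lets production wiring pass the Prometheus-backed exporter while test fixtures pass the no-op, which is the pattern the updated tests follow.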

Changes

Cohort / File(s) | Summary
Metrics Infrastructure
clearnode/metrics/interface.go, clearnode/metrics/exporter.go
New metrics package introduces RuntimeMetricExporter and StoreMetricExporter interfaces with Prometheus implementations. Provides no-op implementation for testing. Extensive metric collection across RPC, app sessions, channels, and blockchain operations.
App Session API Metrics
clearnode/api/app_session_v1/handler.go, clearnode/api/app_session_v1/create_app_session.go, clearnode/api/app_session_v1/rebalance_app_sessions.go, clearnode/api/app_session_v1/submit_app_state.go, clearnode/api/app_session_v1/submit_deposit_state.go
Updated verifyQuorum signature to include application parameter across all call sites. Handler constructor now accepts RuntimeMetricExporter and instruments signature validation with metrics callbacks.
App Session API Tests
clearnode/api/app_session_v1/*_test.go
Updated all test fixtures to inject metrics.NewNoopRuntimeMetricExporter() into NewHandler calls, wiring metrics support across all test scenarios.
Channel API Metrics
clearnode/api/channel_v1/handler.go, clearnode/api/channel_v1/request_creation.go, clearnode/api/channel_v1/submit_state.go
Handler constructor extended to accept RuntimeMetricExporter. Signature validation instrumented with IncChannelStateSigValidation metrics on both success and failure paths.
Channel API Tests
clearnode/api/channel_v1/*_test.go
Updated test fixtures to initialize Handler with metrics.NewNoopRuntimeMetricExporter().
RPC Router & Observability
clearnode/api/rpc_router.go, clearnode/api/metric_store.go, clearnode/api/utils.go
Renamed LoggerMiddleware to ObservabilityMiddleware with integrated metric recording. NewRPCRouter accepts RuntimeMetricExporter. Introduced metricStore wrapper for transactional metric buffering and getMethodPath helper for RPC path extraction.
Database Store Extensions
clearnode/store/database/interface.go, clearnode/store/database/app_session.go, clearnode/store/database/channel.go
Added CountAppSessionsByStatus and CountChannelsByStatus methods with corresponding data types for aggregated metric reporting.
Blockchain & Event Handling
clearnode/blockchain_worker.go, pkg/blockchain/evm/reactor.go
Added MetricsExporter interface and metrics field to BlockchainWorker with IncBlockchainAction instrumentation. Introduced SetOnEventProcessed callback on Reactor for blockchain event metric tracking.
Type String Representations
pkg/app/session_key_v1.go, pkg/core/channel_signer.go, pkg/core/types.go, pkg/rpc/message.go
Added String() methods to AppSessionSignerTypeV1, ChannelSignerType, ChannelStatus, and MsgType for human-readable metric labels.
RPC Connection & Origin Tracking
pkg/rpc/connection.go, pkg/rpc/connection_hub.go, pkg/rpc/node.go
Added Origin field and method to Connection interface and WebsocketConnection. Introduced ObserveConnectionsFn callback type replacing four discrete lifecycle callbacks. Updated ConnectionHub to track connections by origin and invoke metrics callback.
RPC Methods & Constants
pkg/rpc/methods.go
Removed AppSessionsV1CloseAppSessionMethod constant from exported methods.
Runtime & Integration
clearnode/runtime.go, clearnode/main.go
Extended Backbone with RuntimeMetrics and StoreMetrics fields. Added metrics initialization and wiring in InitBackbone. Main now creates metrics exporters, registers callbacks, and launches background store metrics exporter goroutine.
Deprecated Metrics
clearnode/metrics/prometheus/metrics.go
Removed legacy Metrics struct and associated UpdateChannelMetrics/UpdateAppSessionMetrics methods in favor of new metrics architecture.
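The background store-metrics goroutine launched from main, fed by the new count queries, can be sketched as a ticker loop that copies grouped counts into labelled gauges. This assumes a hypothetical StatusCount row type, a map-backed gauge in place of Prometheus, and fmt.Println in place of the real logger; only the CountChannelsByStatus name comes from the PR, and its real signature may differ.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// StatusCount is a hypothetical stand-in for the PR's ChannelCount rows.
type StatusCount struct {
	Asset  string
	Status string
	Count  uint64
}

// countStore is a hypothetical one-method slice of DatabaseStore.
type countStore interface {
	CountChannelsByStatus() ([]StatusCount, error)
}

// staticStore returns fixed rows and serves as a test double.
type staticStore []StatusCount

func (s staticStore) CountChannelsByStatus() ([]StatusCount, error) { return s, nil }

// gaugeVec keeps the latest value per (asset, status), like a labelled gauge.
type gaugeVec struct {
	mu     sync.Mutex
	values map[[2]string]uint64
}

func newGaugeVec() *gaugeVec { return &gaugeVec{values: map[[2]string]uint64{}} }

func (g *gaugeVec) Set(asset, status string, v uint64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.values[[2]string{asset, status}] = v
}

func (g *gaugeVec) Get(asset, status string) uint64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.values[[2]string{asset, status}]
}

// exportOnce runs a single aggregation pass.
func exportOnce(s countStore, g *gaugeVec) error {
	rows, err := s.CountChannelsByStatus()
	if err != nil {
		return err
	}
	for _, r := range rows {
		g.Set(r.Asset, r.Status, r.Count)
	}
	return nil
}

// runStoreMetricsExporter repeats exportOnce every interval until ctx is
// cancelled; a failed pass is logged and retried on the next tick.
func runStoreMetricsExporter(ctx context.Context, interval time.Duration, s countStore, g *gaugeVec) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := exportOnce(s, g); err != nil {
				fmt.Println("store metrics export failed:", err)
			}
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	g := newGaugeVec()
	runStoreMetricsExporter(ctx, 10*time.Millisecond, staticStore{{"usdc", "open", 3}}, g)
	fmt.Println(g.Get("usdc", "open"))
}
```

One caveat worth checking in the real exporter: a GROUP BY query omits statuses whose count has dropped to zero, so a gauge that is only ever Set keeps its last non-zero value unless it is cleared before each pass (for example with the Reset method on a Prometheus GaugeVec).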

Sequence Diagram

sequenceDiagram
    participant Client
    participant RPCRouter
    participant ObservabilityMiddleware
    participant Handler
    participant MetricsExporter
    participant Database

    Client->>RPCRouter: RPC Request
    RPCRouter->>ObservabilityMiddleware: Process Request
    Note over ObservabilityMiddleware: Extract method & path<br/>Record RPC message
    
    ObservabilityMiddleware->>Handler: Invoke Handler
    alt Handler Success
        Handler->>Database: Store Operation (metricStore)
        Database-->>Handler: Result
        Handler->>MetricsExporter: IncRPCRequest(method, path, true)
        Handler->>MetricsExporter: ObserveRPCDuration(method, path, true, duration)
    else Handler Error
        Handler->>MetricsExporter: IncRPCRequest(method, path, false)
        Handler->>MetricsExporter: ObserveRPCDuration(method, path, false, duration)
    end
    
    Handler-->>RPCRouter: Response
    RPCRouter-->>Client: RPC Response
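The diagram draws the metric calls on the Handler lifeline; a minimal sketch of the same flow with the recording done inside the middleware, around the handler call, could look like this. The RPCHandler signature, the pathOf function (standing in for getMethodPath), and the in-memory recorder are hypothetical, not the pkg/rpc API.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// RPCHandler is a hypothetical handler signature.
type RPCHandler func(method string, payload []byte) error

// exporter is the subset of metric calls the middleware needs.
type exporter interface {
	IncRPCRequest(method, path string, success bool)
	ObserveRPCDuration(method, path string, success bool, d time.Duration)
}

// recorder is an in-memory exporter for illustration (not goroutine-safe).
type recorder struct {
	requests  map[string]int
	durations []time.Duration
}

func newRecorder() *recorder { return &recorder{requests: map[string]int{}} }

func (r *recorder) IncRPCRequest(method, path string, success bool) {
	r.requests[fmt.Sprintf("%s|%s|%t", method, path, success)]++
}

func (r *recorder) ObserveRPCDuration(_, _ string, _ bool, d time.Duration) {
	r.durations = append(r.durations, d)
}

// errHandlerFailed is a sample error for demos and tests.
var errHandlerFailed = errors.New("handler failed")

// ObservabilityMiddleware times next and records the outcome with the same
// method/path/success labels on both the counter and the duration.
func ObservabilityMiddleware(m exporter, pathOf func(method string, payload []byte) string, next RPCHandler) RPCHandler {
	return func(method string, payload []byte) error {
		start := time.Now()
		err := next(method, payload)
		path := pathOf(method, payload)
		success := err == nil
		m.IncRPCRequest(method, path, success)
		m.ObserveRPCDuration(method, path, success, time.Since(start))
		return err
	}
}

func main() {
	rec := newRecorder()
	h := ObservabilityMiddleware(rec,
		func(string, []byte) string { return "default" },
		func(string, []byte) error { return errHandlerFailed })
	fmt.Println(h("example.method", nil), rec.requests)
}
```

Recording both metrics in one place after next returns keeps the counter and the duration in agreement on the success label; the per-message count from the Note box would be recorded before next runs.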

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

Possibly related PRs

Suggested reviewers

  • nksazonov
  • philanton

🐰 Whiskers twitching with metric glee,
Observability metrics flow so free,
Quorum checks now app-aware and bright,
Prometheus counters shining in the night,
From RPC hops to blockchain blocks divine, ✨📊

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 9.21%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'YNU-768: Add Clearnode Metrics' clearly summarizes the main change, adding a comprehensive metrics system to Clearnode, which is the central theme across all modified and new files.


Summary of Changes

Hello @dimast-x, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the observability of the Clearnode system by integrating a new Prometheus-based metrics framework. It introduces granular tracking for various operational aspects, including RPC interactions, blockchain activities, and application state changes, alongside aggregate metrics derived from the database. This refactoring provides a more robust and detailed monitoring capability, allowing for better insights into system performance and health.

Highlights

  • Comprehensive Metrics Integration: Introduced a new, comprehensive metrics system using Prometheus, replacing the previous implementation. This includes RuntimeMetricExporter for real-time operational metrics and StoreMetricExporter for database-derived aggregate metrics.
  • RPC Observability: Implemented an ObservabilityMiddleware in the RPC router to capture detailed metrics on RPC messages, requests, and their durations, categorized by method, path, and success status. Connection lifecycle events are also now observed.
  • Transactional Metric Buffering: Developed a metricStore wrapper that buffers metric callbacks during database transactions, ensuring that metrics related to data persistence are only recorded upon successful transaction commit.
  • Blockchain and Application-Specific Metrics: Integrated metrics into blockchain workers to track action success/failure, into EVM event reactors to monitor event processing, and into application session handlers to record state updates and signature validations.
  • Enhanced Data for Metrics: Added applicationID to verifyQuorum calls and Origin tracking for RPC connections, providing richer context for metric labeling. New String() methods were added to various enums (AppSessionSignerTypeV1, ChannelSignerType, ChannelStatus, MsgType) for more descriptive metric labels.
  • Database-Driven Aggregate Metrics: Added new database queries (CountAppSessionsByStatus, CountChannelsByStatus) to enable the StoreMetricExporter to periodically report aggregate counts of application sessions and channels by their status.
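The transactional buffering idea in the third highlight can be sketched as follows, assuming a simplified metricStore that only holds callbacks and a hypothetical runInTx helper in place of useChannelV1StoreInTx and useAppSessionV1StoreInTx; the real wrapper also forwards the DatabaseStore methods.

```go
package main

import (
	"errors"
	"fmt"
)

// metricStore collects metric callbacks while a transaction is in flight.
type metricStore struct {
	callbacks []func()
}

// enqueue records fn to be run only if the surrounding transaction commits.
func (s *metricStore) enqueue(fn func()) {
	s.callbacks = append(s.callbacks, fn)
}

// flush runs the buffered callbacks once and then clears them, so an
// accidental second flush is a no-op.
func (s *metricStore) flush() {
	for _, cb := range s.callbacks {
		cb()
	}
	s.callbacks = nil
}

var errRollback = errors.New("rolled back")

// runInTx is a hypothetical transaction wrapper: an error from fn stands in
// for a rollback, and a nil error stands in for a successful commit.
func runInTx(fn func(s *metricStore) error) error {
	s := &metricStore{}
	if err := fn(s); err != nil {
		return err // buffered metrics are dropped with the transaction
	}
	s.flush()
	return nil
}

func main() {
	emitted := 0
	_ = runInTx(func(s *metricStore) error { s.enqueue(func() { emitted++ }); return errRollback })
	_ = runInTx(func(s *metricStore) error { s.enqueue(func() { emitted++ }); return nil })
	fmt.Println("emitted:", emitted) // only the committed transaction counts
}
```

In a real wrapper the flush would sit after the database commit returns nil, so a failed commit also drops the buffered metrics.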

Changelog
  • clearnode/api/app_session_v1/create_app_session.go
    • Updated the verifyQuorum call to pass applicationID for metric tracking.
  • clearnode/api/app_session_v1/create_app_session_test.go
    • Imported metrics package and injected metrics.NewNoopRuntimeMetricExporter() into handler initializations for testing.
  • clearnode/api/app_session_v1/get_app_definition_test.go
    • Imported metrics package and injected metrics.NewNoopRuntimeMetricExporter() into handler initializations for testing.
  • clearnode/api/app_session_v1/get_app_sessions_test.go
    • Imported metrics package and injected metrics.NewNoopRuntimeMetricExporter() into handler initializations for testing.
  • clearnode/api/app_session_v1/handler.go
    • Imported the new metrics package.
    • Added a metrics.RuntimeMetricExporter field to the Handler struct.
    • Updated the NewHandler constructor to accept and initialize the new metrics exporter.
    • Modified the verifyQuorum function signature to include applicationID.
    • Integrated calls to h.metrics.IncAppSessionUpdateSigValidation for signature validation metrics.
  • clearnode/api/app_session_v1/rebalance_app_sessions.go
    • Updated the verifyQuorum call to pass applicationID for metric tracking.
  • clearnode/api/app_session_v1/rebalance_app_sessions_test.go
    • Imported metrics package and injected metrics.NewNoopRuntimeMetricExporter() into handler initializations for testing.
  • clearnode/api/app_session_v1/submit_app_state.go
    • Updated the verifyQuorum call to pass applicationID for metric tracking.
  • clearnode/api/app_session_v1/submit_app_state_test.go
    • Imported metrics package and injected metrics.NewNoopRuntimeMetricExporter() into handler initializations for testing.
  • clearnode/api/app_session_v1/submit_deposit_state.go
    • Added calls to h.metrics.IncChannelStateSigValidation for signature validation metrics.
    • Updated the verifyQuorum call to pass applicationID for metric tracking.
  • clearnode/api/app_session_v1/submit_deposit_state_test.go
    • Imported metrics package and injected metrics.NewNoopRuntimeMetricExporter() into handler initializations for testing.
  • clearnode/api/channel_v1/get_escrow_channel_test.go
    • Imported metrics package and injected metrics.NewNoopRuntimeMetricExporter() into handler initializations for testing.
  • clearnode/api/channel_v1/get_home_channel_test.go
    • Imported metrics package and injected metrics.NewNoopRuntimeMetricExporter() into handler initializations for testing.
  • clearnode/api/channel_v1/get_latest_state_test.go
    • Imported metrics package and injected metrics.NewNoopRuntimeMetricExporter() into handler initializations for testing.
  • clearnode/api/channel_v1/handler.go
    • Imported the new metrics package.
    • Added a metrics.RuntimeMetricExporter field to the Handler struct.
    • Updated the NewHandler constructor to accept and initialize the new metrics exporter.
  • clearnode/api/channel_v1/request_creation.go
    • Added calls to h.metrics.IncChannelStateSigValidation for signature validation metrics.
  • clearnode/api/channel_v1/request_creation_test.go
    • Imported metrics package and injected metrics.NewNoopRuntimeMetricExporter() into handler initializations for testing.
  • clearnode/api/channel_v1/submit_state.go
    • Added calls to h.metrics.IncChannelStateSigValidation for signature validation metrics.
  • clearnode/api/channel_v1/submit_state_test.go
    • Imported metrics package and injected metrics.NewNoopRuntimeMetricExporter() into handler initializations for testing.
  • clearnode/api/metric_store.go
    • Added a new file metric_store.go which defines a metricStore wrapper for DatabaseStore.
    • Implemented methods (RecordTransaction, StoreUserState, UpdateAppSession, StoreChannelSessionKeyState, StoreAppSessionKeyState) to buffer metric callbacks during database transactions.
    • Added a flush() method to execute buffered callbacks upon successful transaction commit.
  • clearnode/api/rpc_router.go
    • Imported the new metrics package.
    • Added a runtimeMetrics field to the RPCRouter struct.
    • Updated NewRPCRouter to accept and initialize the runtimeMetrics exporter.
    • Replaced LoggerMiddleware with ObservabilityMiddleware to capture RPC metrics.
    • Modified transaction wrapper functions (useChannelV1StoreInTx, useAppSessionV1StoreInTx) to utilize the new metricStore for transactional metric buffering.
    • Updated handler initializations to pass the runtimeMetrics exporter.
  • clearnode/api/utils.go
    • Added a new file utils.go containing the getMethodPath helper function.
    • Implemented getMethodPath to extract specific method paths from RPC requests for metric labeling.
  • clearnode/blockchain_worker.go
    • Defined a MetricsExporter interface for blockchain actions.
    • Added a metrics field to the BlockchainWorker struct.
    • Updated NewBlockchainWorker to accept and initialize the MetricsExporter.
    • Integrated calls to w.metrics.IncBlockchainAction to record blockchain action success/failure.
  • clearnode/main.go
    • Imported prometheus and metrics packages.
    • Initialized runtimeMetrics and storeMetrics exporters at startup.
    • Passed runtimeMetrics to api.NewRPCRouter and NewBlockchainWorker.
    • Configured evm.NewReactor to use runtimeMetrics.IncBlockchainEvent as a callback for event processing.
    • Started a new goroutine runStoreMetricsExporter to periodically export store-related metrics.
  • clearnode/metrics/exporter.go
    • Added a new file exporter.go which defines runtimeMetricExporter and storeMetricExporter.
    • Implemented NewRuntimeMetricExporter to initialize various Prometheus counters, gauges, and histograms for runtime operations (user states, transactions, RPC, app sessions, blockchain actions/events, session keys).
    • Implemented NewStoreMetricExporter to initialize Prometheus gauges for aggregate store metrics (app sessions and channels by status).
    • Provided concrete methods for incrementing and observing these metrics.
  • clearnode/metrics/interface.go
    • Added a new file interface.go defining RuntimeMetricExporter and StoreMetricExporter interfaces.
    • Included a noopRuntimeMetricExporter for testing purposes.
  • clearnode/metrics/prometheus/metrics.go
    • Removed the old Prometheus metrics implementation file.
  • clearnode/store/database/app_session.go
    • Added AppSessionCount struct to represent app session counts.
    • Implemented CountAppSessionsByStatus method to query and return app session counts grouped by application and status.
  • clearnode/store/database/channel.go
    • Added ChannelCount struct to represent channel counts.
    • Implemented CountChannelsByStatus method to query and return channel counts grouped by asset and status.
  • clearnode/store/database/interface.go
    • Added CountAppSessionsByStatus and CountChannelsByStatus methods to the DatabaseStore interface.
  • pkg/app/session_key_v1.go
    • Added a String() method for the AppSessionSignerTypeV1 enum to provide human-readable labels for metrics.
  • pkg/blockchain/evm/reactor.go
    • Added an onEventProcessed field and SetOnEventProcessed method to allow external callbacks for event processing.
    • Integrated a call to onEventProcessed after each event is handled.
  • pkg/core/channel_signer.go
    • Added a String() method for the ChannelSignerType enum to provide human-readable labels for metrics.
  • pkg/core/types.go
    • Added a String() method for the ChannelStatus enum to provide human-readable labels for metrics.
  • pkg/rpc/connection.go
    • Added an Origin() method to the Connection interface and WebsocketConnection struct to retrieve connection source information.
    • Added an Origin field to WebsocketConnectionConfig for tracking connection sources.
  • pkg/rpc/connection_hub.go
    • Defined ObserveConnectionsFn type for connection observation callbacks.
    • Added observeConnections field to ConnectionHub and updated NewConnectionHub to accept it.
    • Modified Add and Remove methods to call observeConnections and track connection sources.
  • pkg/rpc/message.go
    • Added a String() method for the MsgType enum to provide human-readable labels for metrics.
  • pkg/rpc/methods.go
    • Removed the AppSessionsV1CloseAppSessionMethod RPC method.
  • pkg/rpc/node.go
    • Removed several old connection lifecycle handlers (OnConnectHandler, OnDisconnectHandler, OnMessageSentHandler, OnAuthenticatedHandler) from WebsocketNodeConfig.
    • Added ObserveConnections to WebsocketNodeConfig and passed it to NewConnectionHub for centralized connection monitoring.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/v1-push.yml

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a comprehensive metrics system using Prometheus, which is a great addition for observability. The changes are well-structured, with a new metrics package, injection of the metrics exporter into relevant components, and a clever metricStore wrapper to ensure metrics are only recorded for successful database transactions. The new middleware for RPC metrics and the background worker for store-based metrics are also well-implemented.

I have a couple of suggestions for improvement:

  • Using descriptive string labels for action_type metrics in the blockchain worker.
  • Ensuring consistency in the metric types (Counter vs. Gauge) for session key metrics.

Overall, this is a solid implementation of a crucial feature.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
pkg/rpc/node.go (1)

188-201: ⚠️ Potential issue | 🟡 Minor

Stale doc comment references removed callbacks.

Line 194 still mentions "Invokes lifecycle callbacks (OnConnect, OnDisconnect, etc.)" but these callbacks no longer exist. Update the godoc to reflect the new ObserveConnections-based mechanism.

📝 Suggested doc update
-//  4. Invokes lifecycle callbacks (OnConnect, OnDisconnect, etc.)
+//  4. Tracks connection lifecycle via ObserveConnections
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/rpc/node.go` around lines 188 - 201, Update the ServeHTTP godoc to remove
references to the removed lifecycle callbacks (OnConnect, OnDisconnect) and
instead describe the new ObserveConnections-based mechanism: explain that
ServeHTTP upgrades HTTP to WebSocket, creates/manages connection IDs and state,
spawns per-connection goroutines for read/process/write/health, and notifies
connection lifecycle via ObserveConnections; reference the ServeHTTP method and
ObserveConnections so readers can locate the new flow.
clearnode/api/channel_v1/submit_state.go (1)

100-102: ⚠️ Potential issue | 🟡 Minor

Pre-existing: err is stale in the "missing user signature" error message.

Not introduced by this PR, but worth noting: on line 101, err is the result of the prior PackState call (line 94), which succeeded if execution reached here. The %v will format <nil>, producing a misleading message like "missing incoming state user signature: <nil>".

Suggested fix
-		return rpc.Errorf("missing incoming state user signature: %v", err)
+		return rpc.Errorf("missing incoming state user signature")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/api/channel_v1/submit_state.go` around lines 100 - 102, The error
message uses a stale variable `err` (from the earlier PackState call) when
checking `incomingState.UserSig == nil`, causing misleading "<nil>" output;
update the `rpc.Errorf` in the check for `incomingState.UserSig == nil` to not
reference `err` (either remove the "%v" and err entirely or include a relevant
value like an identifier from `incomingState`) so the message clearly states
"missing incoming state user signature" (locate the check around the
`incomingState.UserSig == nil` branch in submit_state.go and update the
rpc.Errorf call accordingly).
clearnode/api/app_session_v1/handler.go (1)

61-73: ⚠️ Potential issue | 🔴 Critical

Add length guard before accessing signature byte at index 0.

Line 67 accesses sigBytes[0] without checking the slice length. When hexutil.Decode("0x") returns an empty slice (without error), this causes an index-out-of-bounds panic. Although Recover() includes a length check, it's called after line 67, so the panic occurs before it's reached.

🐛 Proposed fix
 		sigBytes, err := hexutil.Decode(sigHex)
 		if err != nil {
 			return rpc.Errorf("failed to decode signature: %v", err)
 		}
+		if len(sigBytes) == 0 {
+			return rpc.Errorf("empty signature")
+		}
 
 		sigType := app.AppSessionSignerTypeV1(sigBytes[0])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/api/app_session_v1/handler.go` around lines 61 - 73, The loop over
signatures accesses sigBytes[0] before ensuring sigBytes is non-empty, which can
panic when hexutil.Decode returns an empty slice; add a length guard immediately
after decoding in the handler (the loop that decodes sigHex) to check
len(sigBytes) > 0 and return a descriptive rpc.Errorf (or mark validation
failed) if it's empty, before constructing app.AppSessionSignerTypeV1 or calling
appSessionSignerValidator.Recover; keep the existing metrics calls to
h.metrics.IncAppSessionUpdateSigValidation around Recover as-is but ensure the
early-return on empty sigBytes happens first.
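A stand-alone sketch of the guard, using the standard library's encoding/hex in place of go-ethereum's hexutil.Decode (which likewise returns an empty slice and no error for "0x"). The signerType helper is hypothetical; it only illustrates checking the length before reading the type byte.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// errEmptySignature is returned when the hex string decodes to zero bytes.
var errEmptySignature = errors.New("empty signature")

// signerType decodes a 0x-prefixed hex signature and returns its first byte,
// which the handler treats as the signer type. Hypothetical helper.
func signerType(sigHex string) (byte, error) {
	if !strings.HasPrefix(sigHex, "0x") {
		return 0, errors.New("signature must be 0x-prefixed")
	}
	sigBytes, err := hex.DecodeString(sigHex[2:])
	if err != nil {
		return 0, fmt.Errorf("failed to decode signature: %w", err)
	}
	// "0x" decodes to an empty slice with no error, so check before indexing.
	if len(sigBytes) == 0 {
		return 0, errEmptySignature
	}
	return sigBytes[0], nil
}

func main() {
	for _, s := range []string{"0x", "0x01ab"} {
		t, err := signerType(s)
		fmt.Println(s, t, err)
	}
}
```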
🧹 Nitpick comments (12)
pkg/blockchain/evm/reactor.go (1)

58-123: Callback not invoked on the early-return path for truly unknown events (line 63–65).

When eventMapping doesn't contain the event ID, HandleEvent returns at line 65 without reaching the callback at line 121. This means events that aren't part of the contract ABI won't be reflected in metrics at all — no success or failure is recorded.

If this is intentional (only track ABI-known events), a brief comment at the early return would clarify intent. If you do want full observability, move the callback (or add a second invocation) before that early return.

Option: track unknown-ABI events as failures
 	eventName, ok := eventMapping[eventID]
 	if !ok {
 		logger.Warn("unknown event ID", "eventID", eventID.Hex(), "blockNumber", l.BlockNumber, "txHash", l.TxHash.String(), "logIndex", l.Index)
+		if r.onEventProcessed != nil {
+			r.onEventProcessed(r.blockchainID, false)
+		}
 		return
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/blockchain/evm/reactor.go` around lines 58 - 123, HandleEvent currently
returns early when eventMapping lacks the eventID, skipping the metrics
callback; update HandleEvent to invoke the onEventProcessed callback before the
early return so unknown-ABI events are recorded as failures (call
r.onEventProcessed(r.blockchainID, false) immediately before returning when
!ok), or if the omission was intentional, add a clarifying comment at that
early-return explaining why unknown events must not be tracked; reference
symbols: HandleEvent, eventMapping, eventID, onEventProcessed, and
r.blockchainID.
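A pared-down sketch of the first option, reporting unknown-ABI events as failures so every event produces exactly one callback. The Reactor fields, the string event IDs, the uint64 blockchain ID, and the "Deposited" event name are hypothetical simplifications of pkg/blockchain/evm/reactor.go.

```go
package main

import "fmt"

// Reactor is a hypothetical, simplified event reactor.
type Reactor struct {
	blockchainID     uint64
	eventMapping     map[string]string       // event ID -> event name
	handlers         map[string]func() error // event name -> handler
	onEventProcessed func(blockchainID uint64, success bool)
}

// SetOnEventProcessed registers the metrics callback.
func (r *Reactor) SetOnEventProcessed(fn func(blockchainID uint64, success bool)) {
	r.onEventProcessed = fn
}

// report invokes the callback if one is registered.
func (r *Reactor) report(success bool) {
	if r.onEventProcessed != nil {
		r.onEventProcessed(r.blockchainID, success)
	}
}

// HandleEvent reports exactly once per event, including the early-return path.
func (r *Reactor) HandleEvent(eventID string) {
	eventName, ok := r.eventMapping[eventID]
	if !ok {
		fmt.Println("unknown event ID", eventID)
		r.report(false) // unknown-ABI events are counted as failures
		return
	}
	handler, ok := r.handlers[eventName]
	if !ok {
		r.report(false)
		return
	}
	r.report(handler() == nil)
}

func main() {
	r := &Reactor{
		blockchainID: 1,
		eventMapping: map[string]string{"0xabc": "Deposited"},
		handlers:     map[string]func() error{"Deposited": func() error { return nil }},
	}
	r.SetOnEventProcessed(func(id uint64, ok bool) { fmt.Println("chain", id, "success", ok) })
	r.HandleEvent("0xabc")
	r.HandleEvent("0xdead")
}
```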
clearnode/api/metric_store.go (1)

64-68: Consider clearing callbacks after flush to prevent accidental double-invocation.

If flush() is ever called more than once (e.g., due to a code path change), all metric callbacks would fire again. Clearing the slice after iteration is a low-cost defensive measure.

♻️ Proposed fix
 func (s *metricStore) flush() {
 	for _, cb := range s.callbacks {
 		cb()
 	}
+	s.callbacks = nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/api/metric_store.go` around lines 64 - 68, The flush() method on
metricStore currently iterates and invokes each callback (s.callbacks) but
leaves the slice intact; change flush (the metricStore.flush method) to clear
the callbacks slice after calling them (e.g., set s.callbacks = nil or assign an
empty slice) so subsequent calls won't re-invoke the same callbacks; keep the
invocation loop as-is but add the slice-clearing step immediately after
iteration to defensively prevent double-invocation.
clearnode/api/utils.go (1)

5-28: Payload is deserialized solely to extract one field for metric labeling.

This function re-deserializes the entire request payload (marshal → unmarshal round-trip via Translate) just to extract a single sub-field. On the hot path, this adds unnecessary allocation. Consider whether the handler could pass the extracted path downstream instead, or cache the deserialized payload on the context.

That said, the current approach is pragmatic and functionally correct. The break on error correctly falls through to return "default".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/api/utils.go` around lines 5 - 28, getMethodPath currently
re-deserializes the entire request payload just to read a single field, causing
extra allocations on the hot path; update call sites or context handling so the
needed metric label is provided directly instead of round-tripping Translate:
either have the handlers that already decode the payload pass the extracted path
string into the rpc.Context (or into a new field like MethodPath) before metrics
are recorded, or cache the parsed request struct on the context so getMethodPath
can read the specific fields (AppStateUpdate.Intent, State.Transition.Type)
without calling Translate; change getMethodPath to read the pre-extracted/cached
value and fall back to "default" only if missing.
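One way to act on this, sketched under assumptions: a hypothetical MethodPath field on a stand-in Context that the handler fills after it has already decoded the payload, so getMethodPath no longer round-trips through Translate. Context, appStateUpdate, and submitAppState here are illustrative, not the rpc package's types.

```go
package main

import "fmt"

// Context is a hypothetical stand-in for rpc.Context with one extra field.
type Context struct {
	Method     string
	MethodPath string // filled by the handler once the payload is decoded
}

// appStateUpdate is a hypothetical decoded request carrying an intent.
type appStateUpdate struct {
	Intent string
}

// submitAppState already has the decoded request, so it records the label
// instead of forcing the metrics path to decode the payload again.
func submitAppState(c *Context, req appStateUpdate) error {
	c.MethodPath = req.Intent
	// ... state validation and persistence would happen here ...
	return nil
}

// getMethodPath reads the cached label and falls back to "default".
func getMethodPath(c *Context) string {
	if c.MethodPath == "" {
		return "default"
	}
	return c.MethodPath
}

func main() {
	c := &Context{Method: "example.submit_app_state"}
	fmt.Println(getMethodPath(c))
	_ = submitAppState(c, appStateUpdate{Intent: "withdraw"})
	fmt.Println(getMethodPath(c))
}
```

The trade-off is coupling: every handler that wants a finer label must remember to set it, whereas the current getMethodPath keeps that knowledge in one place.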
pkg/rpc/connection_hub.go (1)

72-74: observeConnections callback invoked while holding hub.mu lock.

If the callback (or anything it triggers) ever calls back into the hub (e.g., Get, Add, Remove), it will deadlock. Consider invoking the callback after releasing the lock, or document this constraint clearly.

♻️ Proposed fix for Add (similar pattern for Remove)
 func (hub *ConnectionHub) Add(conn Connection) error {
 	if conn == nil {
 		return fmt.Errorf("connection cannot be nil")
 	}
 
 	connID := conn.ConnectionID()
+	var totalConns uint32
+	var origin string
 
 	hub.mu.Lock()
-	defer hub.mu.Unlock()
 
 	if _, exists := hub.connections[connID]; exists {
+		hub.mu.Unlock()
 		return fmt.Errorf("connection with ID %s already exists", connID)
 	}
 
 	hub.connections[connID] = conn
 
 	sourceID := getSourceID(conn.Origin())
 	hub.sourceMap[sourceID]++
-	hub.observeConnections(defaultConnectionRegion, conn.Origin(), uint32(len(hub.connections)))
+	totalConns = uint32(len(hub.connections))
+	origin = conn.Origin()
+	hub.mu.Unlock()
+
+	hub.observeConnections(defaultConnectionRegion, origin, totalConns)
 
 	return nil
 }

Also applies to: 114-118

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/rpc/connection_hub.go` around lines 72 - 74, observeConnections is being
called while holding hub.mu which can deadlock if the callback calls back into
the hub (Get/Add/Remove); to fix, avoid calling hub.observeConnections under the
lock: in methods that currently call observeConnections (e.g., Add and Remove
where you compute sourceID, update hub.sourceMap and then call
hub.observeConnections), capture any necessary values (sourceID, connection
origin, connection count/region) while holding hub.mu, release the lock, then
invoke hub.observeConnections with those captured values; ensure no hub state is
mutated after unlocking before the callback runs and apply the same pattern
wherever observeConnections is invoked (including the other occurrence similar
to 114-118).
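A self-contained sketch of the capture-then-unlock pattern for both `Add` and `Remove`, assuming a reduced `hub` type in place of `ConnectionHub`. It also passes the per-origin count rather than the global total, which is the separate per-origin gauge finding further down this review. `defaultRegion`, `hub`, and `Len` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// defaultRegion is a hypothetical stand-in for defaultConnectionRegion.
const defaultRegion = "default"

// hub is a reduced stand-in for ConnectionHub: it tracks connection IDs,
// a per-origin count, and the observeConnections-style callback.
type hub struct {
	mu        sync.Mutex
	conns     map[string]string // connection ID -> origin
	perOrigin map[string]uint32
	observe   func(region, origin string, count uint32)
}

func newHub(observe func(region, origin string, count uint32)) *hub {
	return &hub{conns: map[string]string{}, perOrigin: map[string]uint32{}, observe: observe}
}

// Add mutates state under the lock, captures what the callback needs, and
// invokes the callback only after the lock is released.
func (h *hub) Add(id, origin string) error {
	h.mu.Lock()
	if _, exists := h.conns[id]; exists {
		h.mu.Unlock()
		return fmt.Errorf("connection with ID %s already exists", id)
	}
	h.conns[id] = origin
	h.perOrigin[origin]++
	count := h.perOrigin[origin] // per-origin count, not len(h.conns)
	h.mu.Unlock()

	h.observe(defaultRegion, origin, count)
	return nil
}

// Remove follows the same capture-then-unlock pattern.
func (h *hub) Remove(id string) {
	h.mu.Lock()
	origin, ok := h.conns[id]
	if !ok {
		h.mu.Unlock()
		return
	}
	delete(h.conns, id)
	h.perOrigin[origin]--
	count := h.perOrigin[origin]
	if count == 0 {
		delete(h.perOrigin, origin)
	}
	h.mu.Unlock()

	h.observe(defaultRegion, origin, count)
}

// Len takes the lock; calling it from the callback would deadlock if the
// callback were still invoked while Add or Remove held h.mu.
func (h *hub) Len() int {
	h.mu.Lock()
	defer h.mu.Unlock()
	return len(h.conns)
}
```

Because `observe` now runs unlocked, a callback that calls back into the hub (here via `Len`) no longer deadlocks; the trade-off is that two concurrent callbacks may report slightly out-of-order counts.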
clearnode/blockchain_worker.go (1)

187-187: Add String() method to BlockchainActionType for readable metric labels.

Metric labels currently use raw numeric values (fmt.Sprintf("%d", action.Type)) like "1", "10", etc. Add a String() method to BlockchainActionType to emit human-readable labels like "checkpoint", "initiateEscrowDeposit", similar to how ActionResult implements it in clearnode/metrics/exporter.go. This improves dashboard interpretability.

Applies to: Lines 187, 195

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/blockchain_worker.go` at line 187, Add a String() method on the
BlockchainActionType type that returns human-readable labels (e.g.,
"checkpoint", "initiateEscrowDeposit") similar to ActionResult's implementation
in clearnode/metrics/exporter.go, then replace uses of fmt.Sprintf("%d",
action.Type) in blockchain_worker.go (calls to w.metrics.IncBlockchainAction)
with action.Type.String() so metrics use readable labels; ensure the String()
covers all enum values and has a sensible default for unknown values.
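A sketch of such a `String()` method. Only the labels `"checkpoint"` and `"initiateEscrowDeposit"` come from the comment; the constant names and their numeric values are hypothetical and must be aligned with the real enum. Unknown values collapse to a single `"unknown"` label so they cannot create new time series.

```go
package main

// BlockchainActionType is a stand-in for the clearnode enum. The constant
// names and numeric values below are hypothetical.
type BlockchainActionType uint8

const (
	ActionTypeCheckpoint BlockchainActionType = iota + 1
	ActionTypeInitiateEscrowDeposit
)

// String returns a readable metric label; unknown values share one label.
func (t BlockchainActionType) String() string {
	switch t {
	case ActionTypeCheckpoint:
		return "checkpoint"
	case ActionTypeInitiateEscrowDeposit:
		return "initiateEscrowDeposit"
	default:
		return "unknown"
	}
}
```

The call site would then pass `action.Type.String()` instead of `fmt.Sprintf("%d", action.Type)` to `w.metrics.IncBlockchainAction`.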
clearnode/main.go (2)

160-195: Typo in parameter name; otherwise a clean implementation.

metricExported on line 167 appears to be a typo for metricExporter.

Otherwise, the store metrics exporter implementation is clean — proper ticker lifecycle, error handling, and graceful shutdown path.

Fix typo
-	metricExported metrics.StoreMetricExporter, logger log.Logger) {
+	metricExporter metrics.StoreMetricExporter, logger log.Logger) {

And update the usage on lines 179 and 187 accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/main.go` around lines 160 - 195, Rename the parameter
metricExported to metricExporter in runStoreMetricsExporter and update all its
usages (calls to SetAppSessions and SetChannels) to use metricExporter instead
of metricExported; ensure the function signature and every reference inside the
function body (the assignments inside the ticker case) are consistently renamed
to avoid the typo.

106-106: Store metrics exporter goroutine won't receive cancellation on shutdown.

The ctx passed here is context.Background() (line 27), which is never cancelled. The goroutine's <-ctx.Done() path (line 190) will never trigger, so it leaks until process exit. This is consistent with the existing pattern for listeners/workers (line 156 TODO), but worth noting for a future cleanup where all background goroutines are wired to a cancellable context.

Suggested fix: use a cancellable context for all background goroutines
-	ctx := context.Background()
+	ctx, cancelCtx := context.WithCancel(context.Background())
+	defer cancelCtx()

Then the existing shutdown signal handling would call cancelCtx() before shutting down the servers, ensuring all background goroutines (listeners, workers, store metrics exporter) receive the cancellation signal.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/main.go` at line 106, The store metrics exporter is started with a
non-cancellable root context (ctx from context.Background()), so its
<-ctx.Done() branch in runStoreMetricsExporter never triggers; change
initialization to derive a cancellable context (e.g., ctx, cancel :=
context.WithCancel(context.Background())) and pass that cancellable ctx into go
runStoreMetricsExporter(...), then ensure the existing shutdown signal handler
calls cancel() (or the shared cancelCtx()) before shutting down servers so
runStoreMetricsExporter and other background goroutines receive cancellation and
exit cleanly.
clearnode/metrics/interface.go (1)

24-24: Minor: durationSecs parameter name is misleading for a time.Duration type.

time.Duration is natively in nanoseconds. The parameter name durationSecs implies the caller should pass a value in seconds, which could lead to confusion. Consider renaming to just duration.

Rename suggestion
-	ObserveRPCDuration(method, path string, success bool, durationSecs time.Duration) // +
+	ObserveRPCDuration(method, path string, success bool, duration time.Duration) // +
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/metrics/interface.go` at line 24, The ObserveRPCDuration interface
method uses a misleading parameter name durationSecs for a time.Duration; rename
the parameter to duration in the ObserveRPCDuration(method, path string, success
bool, duration time.Duration) signature and update all implementations and
callers (methods/functions that implement ObserveRPCDuration and any calls to
ObserveRPCDuration) to use the new parameter name to avoid implying the value
should be seconds while preserving the time.Duration type.
clearnode/metrics/exporter.go (4)

278-310: Remove or relocate planning comments.

These trailing comments (lines 278–310) appear to be task-tracking or design notes rather than code documentation. They'll become stale as the implementation evolves and add noise. Consider moving them to the PR description, a design doc, or removing them entirely now that the implementation is in place.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/metrics/exporter.go` around lines 278 - 310, Remove the trailing
planning/comment block (the lines listing metrics like
`channel_state_validations_total{sig_type, result}`,
`rpc_messages_total{msg_type, method}`, `app_state_updates{application}`,
`blockchain_actions_total{asset, blockchain_id, action_type, result}`, and the
"metric worker" list) from clearnode/metrics/exporter.go and either delete it or
relocate it to the PR description or a separate design doc; ensure the file
contains only implementation-level comments (or convert to a short TODO with a
link to the external design doc if you must keep a pointer).

267-276: Consider adding a helper constructor for ActionResult from bool.

The pattern res := ActionResultFailed; if b { res = ActionResultSuccess } is repeated six times. A small helper would reduce boilerplate:

Suggested helper
+func toActionResult(ok bool) ActionResult {
+	if ok {
+		return ActionResultSuccess
+	}
+	return ActionResultFailed
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/metrics/exporter.go` around lines 267 - 276, The code repeats the
pattern `res := ActionResultFailed; if b { res = ActionResultSuccess }` across
the codebase; add a small helper function ActionResultFromBool(b bool)
ActionResult (or NewActionResultFromBool) that returns ActionResultSuccess when
b is true and ActionResultFailed otherwise, keep the existing ActionResult type
and String() method unchanged, and replace the six call sites with calls to this
new helper (referencing ActionResult and ActionResultFromBool) to remove
boilerplate and centralize the logic.

27-34: Watch for label cardinality on application, origin, and similar dynamic labels.

Labels like applicationID (lines 30, 138, and 142) and origin (line 132) are user/client-controlled and could grow without bound. High-cardinality labels are a well-known Prometheus anti-pattern that can cause memory pressure and slow queries. Ensure these dimensions are bounded in practice, or consider using an allowlist/bucketing strategy.

Also applies to: 129-132, 135-142

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/metrics/exporter.go` around lines 27 - 34, The prometheus metrics
appSessionsTotal and channelsTotal (and other metrics using labels like
"application", "origin", "asset") use user-controlled labels which can create
unbounded cardinality; update the metric design to bound those dimensions by
replacing raw IDs with a safe representation (e.g., use an allowlist of known
application names, bucketized/hashed categories, or map IDs to a limited set of
enums) before calling prometheus.NewGaugeVec or when observing values, and add
validation or mapping helpers (e.g., sanitizeApplicationLabel,
bucketOriginLabel) referenced from the metric update sites so only bounded label
values are emitted.
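A minimal allowlist sketch, assuming the set of known application IDs is available (for example from configuration). The function name follows the `sanitizeApplicationLabel` example in the prompt; the `"other"` catch-all value is a hypothetical choice. Every ID not explicitly allowed shares one series, so client-supplied values cannot grow cardinality without bound.

```go
package main

// otherLabel is the catch-all value for anything not on the allowlist.
const otherLabel = "other"

// sanitizeApplicationLabel emits the raw ID only when it is explicitly
// allowed; everything else is folded into otherLabel.
func sanitizeApplicationLabel(allowed map[string]bool, applicationID string) string {
	if allowed[applicationID] {
		return applicationID
	}
	return otherLabel
}
```

The same shape works for `origin` and `asset`; the sanitized value, not the raw one, is what gets passed to `WithLabelValues`.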

37-44: MustRegister panics on failure, but the constructor returns error — consider using Register instead.

Both constructors return (T, error), yet they use MustRegister which panics on duplicate or conflicting metric registration. This is inconsistent: callers expect errors to be surfaced through the return value, not via panics.

Suggested approach: use non-panicking Register
-	if reg != nil {
-		reg.MustRegister(
-			m.userStatesTotal,
-			// ...
-		)
-	} else {
-		return nil, fmt.Errorf("prometheus registerer not provided")
-	}
+	if reg == nil {
+		return nil, fmt.Errorf("prometheus registerer not provided")
+	}
+	collectors := []prometheus.Collector{
+		m.userStatesTotal,
+		// ...all collectors...
+	}
+	for _, c := range collectors {
+		if err := reg.Register(c); err != nil {
+			return nil, fmt.Errorf("registering metric: %w", err)
+		}
+	}

Also applies to: 157-176

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/metrics/exporter.go` around lines 37 - 44, The constructors
currently call reg.MustRegister which panics on registration failures; change to
use the non-panicking reg.Register and handle the returned error(s). Replace
calls to MustRegister for metrics like m.appSessionsTotal and m.channelsTotal
(and the other registration block later in the file) with Register calls, check
the error returned, and return a wrapped/formatted error from the constructor if
registration fails so failures surface via the constructor's error return
instead of panicking.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/v1-push.yml:
- Line 5: The push workflow's branches trigger currently contains the feature
branch entry "feat-v1/metrics" alongside "release/v1.0.0"; remove
"feat-v1/metrics" from the branches array in the v1-push workflow so only
release/v1.0.0 remains (i.e., change the branches list from [ release/v1.0.0,
feat-v1/metrics ] to just [ release/v1.0.0 ]) and commit that change before
merging this PR to avoid accidental RC tagging and overwriting latest-rc images.

In `@clearnode/blockchain_worker.go`:
- Around line 191-195: The code currently returns early when
w.store.Complete(action.ID, txHash) fails, so the successful on-chain action
isn't recorded by w.metrics.IncBlockchainAction; update the logic so
IncBlockchainAction(state.Asset, w.blockchainID, fmt.Sprintf("%d", action.Type),
true) is called regardless of Complete() failing (i.e., call the metric before
the early return or call it in both the success and the error branch), and still
log the Complete() error via logger.Error (use w.store.Complete, logger.Error,
w.metrics.IncBlockchainAction, action.ID, txHash, action.Type, state.Asset,
w.blockchainID to locate the spots to change).

In `@clearnode/metrics/exporter.go`:
- Around line 82-83: The two metric fields channelSessionKeysTotal and
appSessionKeysTotal have a type/semantic mismatch: change both field types from
prometheus.Counter to prometheus.Gauge and ensure their initializers use
prometheus.NewGauge with prometheus.GaugeOpts (and keep the "_active" names) so
the types match the semantics; alternatively, if you decide they must be
monotonic counters, rename the metrics to use "_total" and use
prometheus.NewCounter with prometheus.CounterOpts for both—update the field
declarations and their constructors for channelSessionKeysTotal and
appSessionKeysTotal to be consistent.

In `@pkg/rpc/connection_hub.go`:
- Around line 72-74: observeConnections is being called with the global total
(uint32(len(hub.connections))) which causes SetRPCConnections' per-origin gauge
(labels region/origin) to show the global count; instead pass the per-origin
count from hub.sourceMap. Change calls to observeConnections in the Add (where
sourceID is computed and hub.sourceMap[sourceID]++) and Remove methods to use
uint32(hub.sourceMap[sourceID]) (after increment/decrement) rather than
uint32(len(hub.connections)) so the metric reflects per-origin connection
counts.

---

Outside diff comments:
In `@clearnode/api/app_session_v1/handler.go`:
- Around line 61-73: The loop over signatures accesses sigBytes[0] before
ensuring sigBytes is non-empty, which can panic when hexutil.Decode returns an
empty slice; add a length guard immediately after decoding in the handler (the
loop that decodes sigHex) to check len(sigBytes) > 0 and return a descriptive
rpc.Errorf (or mark validation failed) if it's empty, before constructing
app.AppSessionSignerTypeV1 or calling appSessionSignerValidator.Recover; keep
the existing metrics calls to h.metrics.IncAppSessionUpdateSigValidation around
Recover as-is but ensure the early-return on empty sigBytes happens first.

In `@clearnode/api/channel_v1/submit_state.go`:
- Around line 100-102: The error message uses a stale variable `err` (from the
earlier PackState call) when checking `incomingState.UserSig == nil`, causing
misleading "<nil>" output; update the `rpc.Errorf` in the check for
`incomingState.UserSig == nil` to not reference `err` (either remove the "%v"
and err entirely or include a relevant value like an identifier from
`incomingState`) so the message clearly states "missing incoming state user
signature" (locate the check around the `incomingState.UserSig == nil` branch in
submit_state.go and update the rpc.Errorf call accordingly).

In `@pkg/rpc/node.go`:
- Around line 188-201: Update the ServeHTTP godoc to remove references to the
removed lifecycle callbacks (OnConnect, OnDisconnect) and instead describe the
new ObserveConnections-based mechanism: explain that ServeHTTP upgrades HTTP to
WebSocket, creates/manages connection IDs and state, spawns per-connection
goroutines for read/process/write/health, and notifies connection lifecycle via
ObserveConnections; reference the ServeHTTP method and ObserveConnections so
readers can locate the new flow.

---

Nitpick comments:
In `@clearnode/api/metric_store.go`:
- Around line 64-68: The flush() method on metricStore currently iterates and
invokes each callback (s.callbacks) but leaves the slice intact; change flush
(the metricStore.flush method) to clear the callbacks slice after calling them
(e.g., set s.callbacks = nil or assign an empty slice) so subsequent calls won't
re-invoke the same callbacks; keep the invocation loop as-is but add the
slice-clearing step immediately after iteration to defensively prevent
double-invocation.
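A sketch of a `flush` that cannot re-emit metrics, assuming `metricStore` queues `func()` callbacks that are later flushed once. The `record` method and the mutex are hypothetical additions; the mutex can be dropped if the store is strictly request-scoped. Swapping the slice out before running the callbacks also means a callback that records new metrics does not mutate the slice being iterated.

```go
package main

import "sync"

// metricStore is a stand-in for clearnode/api/metric_store.go.
type metricStore struct {
	mu        sync.Mutex
	callbacks []func()
}

// record queues a metric callback to run on the next flush.
func (s *metricStore) record(cb func()) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.callbacks = append(s.callbacks, cb)
}

// flush takes the queued callbacks, clears the slice, then runs them,
// so a second flush cannot invoke the same callbacks again.
func (s *metricStore) flush() {
	s.mu.Lock()
	cbs := s.callbacks
	s.callbacks = nil
	s.mu.Unlock()
	for _, cb := range cbs {
		cb()
	}
}
```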

In `@clearnode/api/utils.go`:
- Around line 5-28: getMethodPath currently re-deserializes the entire request
payload just to read a single field, causing extra allocations on the hot path;
update call sites or context handling so the needed metric label is provided
directly instead of round-tripping Translate: either have the handlers that
already decode the payload pass the extracted path string into the rpc.Context
(or into a new field like MethodPath) before metrics are recorded, or cache the
parsed request struct on the context so getMethodPath can read the specific
fields (AppStateUpdate.Intent, State.Transition.Type) without calling Translate;
change getMethodPath to read the pre-extracted/cached value and fall back to
"default" only if missing.

In `@clearnode/blockchain_worker.go`:
- Line 187: Add a String() method on the BlockchainActionType type that returns
human-readable labels (e.g., "checkpoint", "initiateEscrowDeposit") similar to
ActionResult's implementation in clearnode/metrics/exporter.go, then replace
uses of fmt.Sprintf("%d", action.Type) in blockchain_worker.go (calls to
w.metrics.IncBlockchainAction) with action.Type.String() so metrics use readable
labels; ensure the String() covers all enum values and has a sensible default
for unknown values.

In `@clearnode/main.go`:
- Around line 160-195: Rename the parameter metricExported to metricExporter in
runStoreMetricsExporter and update all its usages (calls to SetAppSessions and
SetChannels) to use metricExporter instead of metricExported; ensure the
function signature and every reference inside the function body (the assignments
inside the ticker case) are consistently renamed to avoid the typo.
- Line 106: The store metrics exporter is started with a non-cancellable root
context (ctx from context.Background()), so its <-ctx.Done() branch in
runStoreMetricsExporter never triggers; change initialization to derive a
cancellable context (e.g., ctx, cancel :=
context.WithCancel(context.Background())) and pass that cancellable ctx into go
runStoreMetricsExporter(...), then ensure the existing shutdown signal handler
calls cancel() (or the shared cancelCtx()) before shutting down servers so
runStoreMetricsExporter and other background goroutines receive cancellation and
exit cleanly.

In `@clearnode/metrics/exporter.go`:
- Around line 278-310: Remove the trailing planning/comment block (the lines
listing metrics like `channel_state_validations_total{sig_type, result}`,
`rpc_messages_total{msg_type, method}`, `app_state_updates{application}`,
`blockchain_actions_total{asset, blockchain_id, action_type, result}`, and the
"metric worker" list) from clearnode/metrics/exporter.go and either delete it or
relocate it to the PR description or a separate design doc; ensure the file
contains only implementation-level comments (or convert to a short TODO with a
link to the external design doc if you must keep a pointer).
- Around line 267-276: The code repeats the pattern `res := ActionResultFailed;
if b { res = ActionResultSuccess }` across the codebase; add a small helper
function ActionResultFromBool(b bool) ActionResult (or NewActionResultFromBool)
that returns ActionResultSuccess when b is true and ActionResultFailed
otherwise, keep the existing ActionResult type and String() method unchanged,
and replace the six call sites with calls to this new helper (referencing
ActionResult and ActionResultFromBool) to remove boilerplate and centralize the
logic.
- Around line 27-34: The prometheus metrics appSessionsTotal and channelsTotal
(and other metrics using labels like "application", "origin", "asset") use
user-controlled labels which can create unbounded cardinality; update the metric
design to bound those dimensions by replacing raw IDs with a safe representation
(e.g., use an allowlist of known application names, bucketized/hashed
categories, or map IDs to a limited set of enums) before calling
prometheus.NewGaugeVec or when observing values, and add validation or mapping
helpers (e.g., sanitizeApplicationLabel, bucketOriginLabel) referenced from the
metric update sites so only bounded label values are emitted.
- Around line 37-44: The constructors currently call reg.MustRegister which
panics on registration failures; change to use the non-panicking reg.Register
and handle the returned error(s). Replace calls to MustRegister for metrics like
m.appSessionsTotal and m.channelsTotal (and the other registration block later
in the file) with Register calls, check the error returned, and return a
wrapped/formatted error from the constructor if registration fails so failures
surface via the constructor's error return instead of panicking.

In `@clearnode/metrics/interface.go`:
- Line 24: The ObserveRPCDuration interface method uses a misleading parameter
name durationSecs for a time.Duration; rename the parameter to duration in the
ObserveRPCDuration(method, path string, success bool, duration time.Duration)
signature and update all implementations and callers (methods/functions that
implement ObserveRPCDuration and any calls to ObserveRPCDuration) to use the new
parameter name to avoid implying the value should be seconds while preserving
the time.Duration type.

In `@pkg/blockchain/evm/reactor.go`:
- Around line 58-123: HandleEvent currently returns early when eventMapping
lacks the eventID, skipping the metrics callback; update HandleEvent to invoke
the onEventProcessed callback before the early return so unknown-ABI events are
recorded as failures (call r.onEventProcessed(r.blockchainID, false) immediately
before returning when !ok), or if the omission was intentional, add a clarifying
comment at that early-return explaining why unknown events must not be tracked;
reference symbols: HandleEvent, eventMapping, eventID, onEventProcessed, and
r.blockchainID.
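A sketch of the first option, assuming a reduced `reactor` whose event IDs are strings and whose handlers are plain funcs; the real code likely keys on an ABI event hash and logs errors, so the types here are assumptions. The unknown-event path reports a failure before returning.

```go
package main

// reactor is a minimal stand-in for pkg/blockchain/evm/reactor.go; field
// names follow the review comment, types are assumptions.
type reactor struct {
	blockchainID     uint64
	eventMapping     map[string]func(data []byte) error
	onEventProcessed func(blockchainID uint64, success bool)
}

// HandleEvent records unknown events as failures before the early return,
// so events with an unrecognized ABI still appear in the metrics.
func (r *reactor) HandleEvent(eventID string, data []byte) {
	handler, ok := r.eventMapping[eventID]
	if !ok {
		r.onEventProcessed(r.blockchainID, false)
		return
	}
	err := handler(data)
	r.onEventProcessed(r.blockchainID, err == nil)
}
```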

In `@pkg/rpc/connection_hub.go`:
- Around line 72-74: observeConnections is being called while holding hub.mu
which can deadlock if the callback calls back into the hub (Get/Add/Remove); to
fix, avoid calling hub.observeConnections under the lock: in methods that
currently call observeConnections (e.g., Add and Remove where you compute
sourceID, update hub.sourceMap and then call hub.observeConnections), capture
any necessary values (sourceID, connection origin, connection count/region)
while holding hub.mu, release the lock, then invoke hub.observeConnections with
those captured values; ensure no hub state is mutated after unlocking before the
callback runs and apply the same pattern wherever observeConnections is invoked
(including the other occurrence similar to 114-118).

@codecov

codecov bot commented Feb 17, 2026

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
clearnode/api/app_session_v1/handler.go (1)

62-67: ⚠️ Potential issue | 🟡 Minor

Potential panic if sigBytes is empty.

hexutil.Decode("0x") returns an empty byte slice without error. Accessing sigBytes[0] on line 67 would panic with an index-out-of-range error. Add a length check after decode.

🛡️ Proposed fix
 		sigBytes, err := hexutil.Decode(sigHex)
 		if err != nil {
 			return rpc.Errorf("failed to decode signature: %v", err)
 		}
+		if len(sigBytes) == 0 {
+			return rpc.Errorf("empty signature bytes")
+		}
 
 		sigType := app.AppSessionSignerTypeV1(sigBytes[0])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/api/app_session_v1/handler.go` around lines 62 - 67, After decoding
sigHex into sigBytes in the handler (where sigBytes is created and later used to
form app.AppSessionSignerTypeV1), add a guard that verifies len(sigBytes) > 0
and return a descriptive rpc.Errorf if it's empty to avoid an index-out-of-range
panic; locate the decode block that sets sigBytes and the subsequent use sigType
:= app.AppSessionSignerTypeV1(sigBytes[0]) and perform the length check between
those lines, returning an appropriate error when sigBytes is empty.
pkg/rpc/node.go (1)

188-201: ⚠️ Potential issue | 🟡 Minor

Stale doc comment references removed callbacks.

Line 194 still mentions "Invokes lifecycle callbacks (OnConnect, OnDisconnect, etc.)" but these callbacks have been replaced by ObserveConnections. Update the doc comment to reflect the current design.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/rpc/node.go` around lines 188 - 201, Update the ServeHTTP doc comment to
remove references to the old lifecycle callbacks and describe the current
connection observation model: mention that ServeHTTP upgrades requests to
WebSocket, creates a unique connection ID and manages state, spawns goroutines
for concurrent message processing, and uses ObserveConnections to monitor
connection lifecycle and events (instead of OnConnect/OnDisconnect). Ensure the
comment references ServeHTTP and ObserveConnections so future readers can locate
the implementation and understand the design change.
🧹 Nitpick comments (10)
pkg/rpc/connection_hub.go (1)

152-155: getSourceID is a trivial pass-through.

Currently just returns its input. If there's no planned transformation, consider inlining the call to reduce indirection.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/rpc/connection_hub.go` around lines 152 - 155, getSourceID is a no-op
that simply returns its input; remove this indirection by deleting the
getSourceID function and replacing all calls to getSourceID(origin) with the
origin string directly (or, if you intend to preserve a hook for future
transformations, keep the function but add a clear TODO comment and unit test).
Update any callers referencing getSourceID in connection_hub.go (and elsewhere)
to use the raw origin value to eliminate the trivial wrapper.
clearnode/api/utils.go (1)

5-28: Double deserialization of the request payload for metrics labeling.

Translate re-marshals/unmarshals the payload just to extract a sub-label. This runs on every request matching these three methods, even though the actual handler will deserialize the same payload again downstream. For a metrics utility this is acceptable overhead, but if request volume grows, consider caching the result or extracting the label post-handler.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/api/utils.go` around lines 5 - 28, getMethodPath currently
re-deserializes the request payload for the three RPC methods
(rpc.AppSessionsV1SubmitAppStateMethod, rpc.ChannelsV1RequestCreationMethod,
rpc.ChannelsV1SubmitStateMethod) which duplicates work the handler will do;
instead, persist the extracted label once when the handler first deserializes
the payload and read that cached value in getMethodPath (e.g., add a
parsed-label field on rpc.Context or set a context key like "methodPathLabel"
during handler deserialization), or use a concurrent request-scoped cache keyed
by request ID to store results of translating into
rpc.AppSessionsV1SubmitAppStateRequest / rpc.ChannelsV1RequestCreationRequest /
rpc.ChannelsV1SubmitStateRequest so getMethodPath returns the cached label
rather than calling Translate again.
clearnode/metrics/exporter.go (4)

294-326: Remove leftover planning/spec comments.

These trailing comments appear to be design notes or a checklist. They should be removed or moved to a design doc / issue tracker before merging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/metrics/exporter.go` around lines 294 - 326, Remove the leftover
planning/spec comments block in exporter.go (the multi-line commented checklist
referencing metrics like `user_states_total`, `transactions_total`,
`channel_state_validations_total`, `rpc_messages_total`, `app_state_updates`,
`blockchain_actions_total`, etc.). Delete that entire commented section from the
file or move it into a design doc/issue tracker, and ensure no stray TODO/spec
comments remain in the `metrics/exporter.go` file before merging.

25-49: Consider checking reg != nil before allocating metric objects.

Both constructors create all metric objects before checking if the registerer is nil. Moving the nil check upfront avoids unnecessary allocations on the error path.

Early nil check
 func NewStoreMetricExporter(reg prometheus.Registerer) (StoreMetricExporter, error) {
+	if reg == nil {
+		return nil, fmt.Errorf("prometheus registerer not provided")
+	}
+
 	m := &storeMetricExporter{
 		// ...
 	}
 
-	if reg != nil {
-		reg.MustRegister(
-			m.appSessionsTotal,
-			m.channelsTotal,
-		)
-	} else {
-		return nil, fmt.Errorf("prometheus registerer not provided")
-	}
+	reg.MustRegister(
+		m.appSessionsTotal,
+		m.channelsTotal,
+	)
 
 	return m, nil
 }

Same pattern applies to NewRuntimeMetricExporter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/metrics/exporter.go` around lines 25 - 49, Move the prometheus
registerer nil-check to the top of NewStoreMetricExporter (and similarly
NewRuntimeMetricExporter) so you return an error immediately if reg == nil,
before constructing any prometheus.GaugeVecs; specifically, in
NewStoreMetricExporter check reg != nil first, return fmt.Errorf("prometheus
registerer not provided") if nil, and only then create m.appSessionsTotal and
m.channelsTotal and call reg.MustRegister on them.

102-116: Metric naming: _active suffix on monotonic counters is misleading.

channel_session_keys_active and app_session_keys_active use the _active suffix, which by Prometheus naming conventions implies a gauge (current snapshot). Since the interface only supports Inc(), these are monotonically increasing counters and should use the _total suffix per Prometheus naming best practices.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/metrics/exporter.go` around lines 102 - 116, The metric names use
an `_active` suffix which implies a gauge but the code treats them as monotonic
counters; update channelSessionKeysTotal (prometheus.NewCounter) to use the
metric name "channel_session_keys_total" and adjust its Help text to "Total
number of channel session keys issued"; change appSessionKeysTotal (currently
prometheus.NewGauge) to a prometheus.NewCounter, rename its metric to
"app_session_keys_total" and update its Help text to "Total number of app
session keys issued" so both follow Prometheus `_total` naming for counters.

203-206: InexactFloat64() for transaction amounts — acceptable precision trade-off?

amount.InexactFloat64() on Line 205 will lose precision for very large decimal values. This is inherent to Prometheus counters (which use float64), so it's likely acceptable for metrics purposes, but worth noting if exact financial tracking via metrics is expected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/metrics/exporter.go` around lines 203 - 206, In
runtimeMetricExporter.RecordTransaction the call to amount.InexactFloat64() can
lose precision for very large decimal amounts; update the method to explicitly
handle overflow/precision by (1) converting with a guarded conversion: compute
the maximum representable decimal for float64 and if amount > max then set value
to math.MaxFloat64 (or math.Inf(+1)) or clamp to a sensible cap, (2) optionally
round or scale the decimal (e.g., amount.Round(scale)) before conversion if you
want a defined precision, and (3) add a short comment on RecordTransaction
explaining that Prometheus counters use float64 so exact financial precision is
not preserved; reference the function name
runtimeMetricExporter.RecordTransaction and the fields transactionsAmountTotal
to locate where to apply this change.
clearnode/metrics/interface.go (2)

12-36: Well-structured metrics interface with clean subsystem grouping.

The interface is well-organized by subsystem, and the noop implementation makes it test-friendly.

One minor naming nit: the durationSecs parameter on Line 24 is typed time.Duration (not raw seconds). The concrete implementation in exporter.go calls .Seconds() internally, so the caller passes a time.Duration — consider renaming to just duration to avoid confusion.

Proposed rename

```diff
-	ObserveRPCDuration(method, path string, success bool, durationSecs time.Duration) // +
+	ObserveRPCDuration(method, path string, success bool, duration time.Duration) // +
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/metrics/interface.go` around lines 12 - 36, Rename the
ObserveRPCDuration parameter to avoid implying it's raw seconds: change the
parameter name durationSecs in the RuntimeMetricExporter interface to duration
(or simply dur) so it correctly reflects its type time.Duration; update the
interface method ObserveRPCDuration(method, path string, success bool, duration
time.Duration) and adjust any implementations (e.g., ObserveRPCDuration in
exporter.go) and all callers to use the new parameter name (ObserveRPCDuration)
to keep the semantics (callers pass a time.Duration and implementations may call
.Seconds()).

57-61: No noop provided for StoreMetricExporter.

If tests or non-production code paths ever need a StoreMetricExporter, there's no noop equivalent. If it's only used in the metric worker (not in test-wired handlers), this is fine for now — but consider adding one for symmetry if needed later.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/metrics/interface.go` around lines 57 - 61, Add a no-op
implementation of the StoreMetricExporter interface so tests and non-production
paths can use it: create an exported type (e.g., NoopStoreMetricExporter) that
implements SetAppSessions(applicationID string, status app.AppSessionStatus,
count uint64) and SetChannels(asset string, status core.ChannelStatus, count
uint64) with empty method bodies; use this noop where a metric exporter is
optional to avoid nil checks and to satisfy the interface in test wiring.
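Here is a sketch of the no-op exporter. It assumes that the two methods listed above make up the whole `StoreMetricExporter` interface. `AppSessionStatus` and `ChannelStatus` are local stand-ins for `app.AppSessionStatus` and `core.ChannelStatus`, and their underlying types are assumed.

```go
package main

import "fmt"

// Stand-ins for app.AppSessionStatus and core.ChannelStatus; the real types
// live in the clearnode packages and their underlying kinds are assumed here.
type AppSessionStatus uint8
type ChannelStatus uint8

// StoreMetricExporter mirrors the two methods named in the review.
type StoreMetricExporter interface {
	SetAppSessions(applicationID string, status AppSessionStatus, count uint64)
	SetChannels(asset string, status ChannelStatus, count uint64)
}

// NoopStoreMetricExporter discards every update.
type NoopStoreMetricExporter struct{}

func (NoopStoreMetricExporter) SetAppSessions(string, AppSessionStatus, uint64) {}

func (NoopStoreMetricExporter) SetChannels(string, ChannelStatus, uint64) {}

// Compile-time check that the noop satisfies the interface.
var _ StoreMetricExporter = NoopStoreMetricExporter{}

func main() {
	var exp StoreMetricExporter = NoopStoreMetricExporter{}
	exp.SetAppSessions("app-1", 0, 3)
	exp.SetChannels("asset-1", 0, 7)
	fmt.Println("noop exporter accepted updates")
}
```

`NoopStoreMetricExporter` is an empty struct, so its zero value is ready to use and needs no constructor. The `var _` line turns any future drift between the interface and the noop into a compile error.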
clearnode/api/metric_store.go (1)

64-68: flush() does not clear callbacks after execution.

If flush() is ever called more than once on the same metricStore instance, callbacks would fire again. This is likely safe if the store is created per-transaction, but a defensive clear would prevent accidental double-emission.

Defensive clear after flush

```diff
 func (s *metricStore) flush() {
 	for _, cb := range s.callbacks {
 		cb()
 	}
+	s.callbacks = nil
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/api/metric_store.go` around lines 64 - 68, The flush method on
metricStore currently invokes each callback but leaves s.callbacks intact,
allowing callbacks to fire again; modify metricStore.flush to clear the
callbacks after iterating (e.g., set s.callbacks = nil or s.callbacks =
s.callbacks[:0]) so subsequent calls do nothing, referencing the
metricStore.flush method and the s.callbacks field when making the change.
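The following runnable sketch shows the effect of the defensive clear. `metricStore` here is a hypothetical reduction of the type in `clearnode/api/metric_store.go`. `record` is an assumed name for whatever method appends to `s.callbacks`.

```go
package main

import "fmt"

// metricStore is a reduced, hypothetical version of the type in
// clearnode/api/metric_store.go: it buffers metric callbacks and runs them
// once the surrounding operation has completed.
type metricStore struct {
	callbacks []func()
}

// record queues a metric emission (assumed method name).
func (s *metricStore) record(cb func()) {
	s.callbacks = append(s.callbacks, cb)
}

// flush runs the buffered callbacks and then drops them, so a second flush
// is a no-op instead of double-emitting metrics.
func (s *metricStore) flush() {
	for _, cb := range s.callbacks {
		cb()
	}
	s.callbacks = nil
}

func main() {
	emitted := 0
	s := &metricStore{}
	s.record(func() { emitted++ })
	s.flush()
	s.flush() // second call does nothing
	fmt.Println("emitted:", emitted) // emitted: 1
}
```

Setting `s.callbacks = nil` rather than `s.callbacks[:0]` also releases the backing array, so the captured closures can be garbage-collected. With `[:0]`, they would stay reachable until overwritten.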
clearnode/api/rpc_router.go (1)

113-139: Metrics are recorded for ping requests but logging is skipped — verify this is intentional.

Lines 125-128 emit metrics for all requests (including ping), then lines 130-133 short-circuit logging for ping. If ping requests are high-frequency, the metrics cardinality is fine (same method label each time), but confirm this is the desired behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/api/rpc_router.go` around lines 113 - 139, ObservabilityMiddleware
currently records metrics for all requests then returns early to skip logging
for ping; decide whether ping should be excluded from metrics or just from
logging and implement accordingly: either move the ping check (c.Request.Method
== rpc.NodeV1PingMethod.String()) to the top of ObservabilityMiddleware before
calling runtimeMetrics.* to skip both metrics and logging, or keep metrics but
explicitly guard each metric call (runtimeMetrics.IncRPCMessage,
runtimeMetrics.IncRPCRequest, runtimeMetrics.ObserveRPCDuration) with a
condition to skip them for ping while leaving the existing early return for
logging; update only the ObservabilityMiddleware function to reflect the chosen
behavior.
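Suppose the chosen behavior is to skip ping for both metrics and logging (the first option above). Then the check simply moves to the top of the middleware. This sketch uses hypothetical stand-ins: `request` for `rpc.Context`, `counters` for the runtime metric exporter, and `pingMethod` for `rpc.NodeV1PingMethod.String()`.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical stand-ins: pingMethod replaces rpc.NodeV1PingMethod.String(),
// request replaces rpc.Context, counters replaces the runtime exporter.
const pingMethod = "ping"

type request struct{ Method string }

type counters struct {
	requests map[string]int
}

// observability returns a middleware that skips ping entirely (no metrics,
// no logs) and records a request counter plus a log line for everything else.
func observability(m *counters, logf func(format string, args ...any)) func(req request, next func() error) error {
	return func(req request, next func() error) error {
		if req.Method == pingMethod {
			return next()
		}
		start := time.Now()
		err := next()
		m.requests[req.Method]++
		logf("rpc %s done in %s (err=%v)", req.Method, time.Since(start), err)
		return err
	}
}

func main() {
	m := &counters{requests: map[string]int{}}
	mw := observability(m, func(format string, args ...any) { fmt.Printf(format+"\n", args...) })
	_ = mw(request{Method: pingMethod}, func() error { return nil })
	_ = mw(request{Method: "example.method"}, func() error { return nil })
	fmt.Println(m.requests)
}
```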
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@clearnode/main.go`:
- Line 26: The current ctx := context.Background() is not cancellable so the
goroutine started by runStoreMetricsExporter never observes shutdown; change
main to create a cancellable context via ctx, ctxCancel :=
context.WithCancel(context.Background()), defer ctxCancel(), and pass that ctx
into runStoreMetricsExporter; optionally call ctxCancel() explicitly just before
initiating the shutdown sequence to ensure the store metrics goroutine sees
ctx.Done() and exits cleanly (verify runStoreMetricsExporter is selecting on
ctx.Done()).
- Line 156: Rename the misspelled parameter metricExported to metricExporter in
the function signature and update all references to that parameter (e.g., where
metricExported is used to register/export metrics) so they now use
metricExporter; ensure you update the symbol in the same function body and any
calls that pass this parameter name (search for metricExported in
clearnode/main.go and replace with metricExporter) and run a build to verify no
remaining references.

In `@clearnode/runtime.go`:
- Around line 139-142: The call to rpc.NewWebsocketNode in runtime.go assigns
(rpcNode, err) but never checks err; add an error check immediately after that
call (checking err and handling it appropriately — e.g., log via logger or
processLogger and return the error or exit) to avoid proceeding with a
nil/invalid rpcNode; locate the rpc.NewWebsocketNode invocation and ensure you
validate err before using rpcNode anywhere in the surrounding function.

In `@pkg/rpc/connection_hub.go`:
- Around line 104-118: ConnectionHub.Remove currently reports the global
connection total and leaves zero-count entries in hub.sourceMap; update Remove
to (1) decrement the per-origin count via
hub.sourceMap[getSourceID(conn.Origin())], and if the resulting count == 0
delete that key from hub.sourceMap to avoid leaks, and (2) call
hub.observeConnections with the per-origin region (use the sourceID as the
region identifier, not defaultConnectionRegion) and pass the updated per-origin
count (uint32 of the remaining count) rather than uint32(len(hub.connections));
reference ConnectionHub.Remove, hub.sourceMap, getSourceID, and
observeConnections when making the change.
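Here is a reduced sketch of the per-origin bookkeeping that both `connection_hub.go` comments ask for. Several details are assumptions for illustration, not the actual `ConnectionHub` API: the `connectionHub` type, its `observe` field (standing in for `observeConnections`), and the use of the raw origin string as the label (that is, `getSourceID` inlined).

```go
package main

import (
	"fmt"
	"sync"
)

// connectionHub is a reduced, hypothetical model of pkg/rpc ConnectionHub that
// tracks only per-origin counts; observe stands in for observeConnections.
type connectionHub struct {
	mu        sync.Mutex
	sourceMap map[string]int
	observe   func(origin string, count uint32)
}

func newConnectionHub(observe func(string, uint32)) *connectionHub {
	return &connectionHub{sourceMap: make(map[string]int), observe: observe}
}

func (h *connectionHub) Add(origin string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.sourceMap[origin]++
	// Report this origin's count, not the global connection total.
	h.observe(origin, uint32(h.sourceMap[origin]))
}

func (h *connectionHub) Remove(origin string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	n, ok := h.sourceMap[origin]
	if !ok {
		return
	}
	n--
	if n <= 0 {
		// Drop empty entries so short-lived origins do not accumulate forever.
		delete(h.sourceMap, origin)
		n = 0
	} else {
		h.sourceMap[origin] = n
	}
	h.observe(origin, uint32(n))
}

func main() {
	gauges := map[string]uint32{}
	hub := newConnectionHub(func(o string, c uint32) { gauges[o] = c })
	hub.Add("a.example")
	hub.Add("a.example")
	hub.Add("b.example")
	hub.Remove("b.example")
	fmt.Println(gauges, len(hub.sourceMap)) // map[a.example:2 b.example:0] 1
}
```

Reporting 0 before deleting the map entry lets the gauge fall to zero instead of keeping its last value. An alternative is to delete that label series from the gauge vector entirely.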

---

Outside diff comments:
In `@clearnode/api/app_session_v1/handler.go`:
- Around line 62-67: After decoding sigHex into sigBytes in the handler (where
sigBytes is created and later used to form app.AppSessionSignerTypeV1), add a
guard that verifies len(sigBytes) > 0 and return a descriptive rpc.Errorf if
it's empty to avoid an index-out-of-range panic; locate the decode block that
sets sigBytes and the subsequent use sigType :=
app.AppSessionSignerTypeV1(sigBytes[0]) and perform the length check between
those lines, returning an appropriate error when sigBytes is empty.
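This sketch shows the guard. It assumes the signature arrives as a hex string that may carry a `0x` prefix. `signerType` is a hypothetical helper, and the real handler would return `rpc.Errorf(...)` rather than a plain `error`.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// signerType is a hypothetical helper mirroring the handler's decode step: it
// decodes a hex signature and returns its first byte (the value the real code
// passes to app.AppSessionSignerTypeV1), refusing empty input.
func signerType(sigHex string) (byte, error) {
	sigBytes, err := hex.DecodeString(strings.TrimPrefix(sigHex, "0x"))
	if err != nil {
		return 0, fmt.Errorf("invalid signature hex: %w", err)
	}
	if len(sigBytes) == 0 {
		// Without this guard, sigBytes[0] panics with index out of range.
		return 0, errors.New("signature is empty")
	}
	return sigBytes[0], nil
}

func main() {
	for _, s := range []string{"0xa1b2", "0x", "zz"} {
		b, err := signerType(s)
		fmt.Printf("%q -> type=%#x err=%v\n", s, b, err)
	}
}
```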

In `@pkg/rpc/node.go`:
- Around line 188-201: Update the ServeHTTP doc comment to remove references to
the old lifecycle callbacks and describe the current connection observation
model: mention that ServeHTTP upgrades requests to WebSocket, creates a unique
connection ID and manages state, spawns goroutines for concurrent message
processing, and uses ObserveConnections to monitor connection lifecycle and
events (instead of OnConnect/OnDisconnect). Ensure the comment references
ServeHTTP and ObserveConnections so future readers can locate the implementation
and understand the design change.

---

Duplicate comments:
In `@clearnode/metrics/exporter.go`:
- Around line 83-86: The two metrics have inconsistent types/names:
appSessionKeysTotal is declared as prometheus.Counter but initialized with
prometheus.NewGauge, and channelSessionKeysTotal uses a name with "_active" but
is declared a Counter; make both consistent counters with _total naming. Change
the initialization of appSessionKeysTotal to use prometheus.NewCounter with
prometheus.CounterOpts and the name "app_session_keys_total", and update
channelSessionKeysTotal initialization to prometheus.NewCounter with the name
"channel_session_keys_total"; ensure the field declarations
(channelSessionKeysTotal, appSessionKeysTotal) remain prometheus.Counter so
types match the NewCounter returns and keep using Inc() where used.

In `@pkg/rpc/connection_hub.go`:
- Around line 72-74: The metric call to observeConnections is passing the global
connection count (uint32(len(hub.connections))) while labeling by origin,
causing each origin gauge to show the global total; update the two sites (the
creation path that uses getSourceID(conn.Origin()) and the Remove path) to pass
the per-origin count hub.sourceMap[sourceID] instead so
observeConnections(origin, ...) receives the per-origin
uint32(hub.sourceMap[sourceID]) value.

---

Nitpick comments:
In `@clearnode/api/metric_store.go`:
- Around line 64-68: The flush method on metricStore currently invokes each
callback but leaves s.callbacks intact, allowing callbacks to fire again; modify
metricStore.flush to clear the callbacks after iterating (e.g., set s.callbacks
= nil or s.callbacks = s.callbacks[:0]) so subsequent calls do nothing,
referencing the metricStore.flush method and the s.callbacks field when making
the change.

In `@clearnode/api/rpc_router.go`:
- Around line 113-139: ObservabilityMiddleware currently records metrics for all
requests then returns early to skip logging for ping; decide whether ping should
be excluded from metrics or just from logging and implement accordingly: either
move the ping check (c.Request.Method == rpc.NodeV1PingMethod.String()) to the
top of ObservabilityMiddleware before calling runtimeMetrics.* to skip both
metrics and logging, or keep metrics but explicitly guard each metric call
(runtimeMetrics.IncRPCMessage, runtimeMetrics.IncRPCRequest,
runtimeMetrics.ObserveRPCDuration) with a condition to skip them for ping while
leaving the existing early return for logging; update only the
ObservabilityMiddleware function to reflect the chosen behavior.

In `@clearnode/api/utils.go`:
- Around line 5-28: getMethodPath currently re-deserializes the request payload
for the three RPC methods (rpc.AppSessionsV1SubmitAppStateMethod,
rpc.ChannelsV1RequestCreationMethod, rpc.ChannelsV1SubmitStateMethod) which
duplicates work the handler will do; instead, persist the extracted label once
when the handler first deserializes the payload and read that cached value in
getMethodPath (e.g., add a parsed-label field on rpc.Context or set a context
key like "methodPathLabel" during handler deserialization), or use a concurrent
request-scoped cache keyed by request ID to store results of translating into
rpc.AppSessionsV1SubmitAppStateRequest / rpc.ChannelsV1RequestCreationRequest /
rpc.ChannelsV1SubmitStateRequest so getMethodPath returns the cached label
rather than calling Translate again.
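Here is a sketch of the first option: persist the label once and read it later. `requestContext`, `setMethodPath`, and the `methodPathLabel` key are hypothetical, because the review does not show what per-request storage `rpc.Context` provides.

```go
package main

import "fmt"

// Hypothetical request context; a plain map stands in for whatever
// per-request storage rpc.Context actually offers.
type requestContext struct {
	Method  string
	storage map[string]string
}

const methodPathLabelKey = "methodPathLabel" // hypothetical key name

// setMethodPath is called by the handler right after it decodes the payload.
func (c *requestContext) setMethodPath(label string) {
	if c.storage == nil {
		c.storage = make(map[string]string)
	}
	c.storage[methodPathLabelKey] = label
}

// getMethodPath reads the cached label instead of decoding the payload again.
// It returns "" when the handler never set one (e.g. for other methods).
func getMethodPath(c *requestContext) string {
	return c.storage[methodPathLabelKey]
}

func main() {
	c := &requestContext{Method: "example.submit_state"}
	fmt.Printf("before handler: %q\n", getMethodPath(c))
	// Inside the handler, after unmarshalling the request once:
	c.setMethodPath("example-label")
	fmt.Printf("after handler: %q\n", getMethodPath(c))
}
```

This only works if the middleware reads the label after the handler has run, that is, if the metric calls happen after `next()` returns. For requests rejected before decoding, the label stays empty.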

In `@clearnode/metrics/exporter.go`:
- Around line 294-326: Remove the leftover planning/spec comments block in
exporter.go (the multi-line commented checklist referencing metrics like
`user_states_total`, `transactions_total`, `channel_state_validations_total`,
`rpc_messages_total`, `app_state_updates`, `blockchain_actions_total`, etc.).
Delete that entire commented section from the file or move it into a design
doc/issue tracker, and ensure no stray TODO/spec comments remain in the
`metrics/exporter.go` file before merging.
- Around line 25-49: Move the prometheus registerer nil-check to the top of
NewStoreMetricExporter (and similarly NewRuntimeMetricExporter) so you return an
error immediately if reg == nil, before constructing any prometheus.GaugeVecs;
specifically, in NewStoreMetricExporter check reg != nil first, return
fmt.Errorf("prometheus registerer not provided") if nil, and only then create
m.appSessionsTotal and m.channelsTotal and call reg.MustRegister on them.
- Around line 102-116: The metric names use an `_active` suffix which implies a
gauge but the code treats them as monotonic counters; update
channelSessionKeysTotal (prometheus.NewCounter) to use the metric name
"channel_session_keys_total" and adjust its Help text to "Total number of
channel session keys issued"; change appSessionKeysTotal (currently
prometheus.NewGauge) to a prometheus.NewCounter, rename its metric to
"app_session_keys_total" and update its Help text to "Total number of app
session keys issued" so both follow Prometheus `_total` naming for counters.
- Around line 203-206: In runtimeMetricExporter.RecordTransaction the call to
amount.InexactFloat64() can lose precision for very large decimal amounts;
update the method to explicitly handle overflow/precision by (1) converting with
a guarded conversion: compute the maximum representable decimal for float64 and
if amount > max then set value to math.MaxFloat64 (or math.Inf(+1)) or clamp to
a sensible cap, (2) optionally round or scale the decimal (e.g.,
amount.Round(scale)) before conversion if you want a defined precision, and (3)
add a short comment on RecordTransaction explaining that Prometheus counters use
float64 so exact financial precision is not preserved; reference the function
name runtimeMetricExporter.RecordTransaction and the fields
transactionsAmountTotal to locate where to apply this change.

In `@clearnode/metrics/interface.go`:
- Around line 12-36: Rename the ObserveRPCDuration parameter to avoid implying
it's raw seconds: change the parameter name durationSecs in the
RuntimeMetricExporter interface to duration (or simply dur) so it correctly
reflects its type time.Duration; update the interface method
ObserveRPCDuration(method, path string, success bool, duration time.Duration)
and adjust any implementations (e.g., ObserveRPCDuration in exporter.go) and all
callers to use the new parameter name (ObserveRPCDuration) to keep the semantics
(callers pass a time.Duration and implementations may call .Seconds()).
- Around line 57-61: Add a no-op implementation of the StoreMetricExporter
interface so tests and non-production paths can use it: create an exported type
(e.g., NoopStoreMetricExporter) that implements SetAppSessions(applicationID
string, status app.AppSessionStatus, count uint64) and SetChannels(asset string,
status core.ChannelStatus, count uint64) with empty method bodies; use this noop
where a metric exporter is optional to avoid nil checks and to satisfy the
interface in test wiring.

In `@pkg/rpc/connection_hub.go`:
- Around line 152-155: getSourceID is a no-op that simply returns its input;
remove this indirection by deleting the getSourceID function and replacing all
calls to getSourceID(origin) with the origin string directly (or, if you intend
to preserve a hook for future transformations, keep the function but add a clear
TODO comment and unit test). Update any callers referencing getSourceID in
connection_hub.go (and elsewhere) to use the raw origin value to eliminate the
trivial wrapper.

@dimast-x force-pushed the feat-v1/metrics branch 2 times, most recently from 9e07df7 to 3cb3a07, on February 17, 2026 14:55
Co-authored-by: Anton Filonenko <philanton@users.noreply.github.com>
@dimast-x merged commit cc8c276 into main on Feb 17, 2026
9 of 10 checks passed
@dimast-x deleted the feat-v1/metrics branch on February 17, 2026 15:00

2 participants