Skip to content

Conversation

@0xBison
Copy link
Contributor

@0xBison 0xBison commented May 7, 2025

Resolves the following issues:

#523
#696

Synopsis:

Consolidates most of the config related and error related code into a single config folder. We now have a singleton that is created and validated at startup. Some initial type validations are performed at startup and there are another set of validations that run after which themself need the config values. We can add validations as we wish to the validations file. On startup failed validations will show like this:

issues

Potential improvements:

  • Currently logs aren't very prominent due to them being console.log statements. we dont have access to a ponder logger as they dont expose it. i've asked them about this and will follow up. for now it may be ok or we can add some color to validations with the chalk package to make it clear.

@0xBison 0xBison requested a review from a team as a code owner May 7, 2025 12:45
@changeset-bot
Copy link

changeset-bot bot commented May 7, 2025

🦋 Changeset detected

Latest commit: f4c48bb

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages
Name Type
ensindexer Minor
ensadmin Minor
ensrainbow Minor
@ensnode/ens-deployments Minor
@ensnode/utils Minor
@ensnode/ensrainbow-sdk Minor
@ensnode/ponder-metadata Minor
@ensnode/ensnode-schema Minor
@ensnode/ponder-subgraph Minor
@ensnode/shared-configs Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link
Contributor

vercel bot commented May 7, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
admin.ensnode.io ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 22, 2025 5:20pm
ensnode.io ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 22, 2025 5:20pm
ensrainbow.io ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 22, 2025 5:20pm

Copy link
Collaborator

@shrugs shrugs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we use zod transforms to make the schema a bit tighter? in particular where preprocess is used or where a custom error function that includes validation is used (which seems like an antipattern)? https://v4.zod.dev/api#transforms

this feels heavily llm-generated, which is fine in general but i'm worried i had to spend a lot of time reviewing these changes, in particular zod-related behavior where it's not using zod4 changes/features

Copy link
Member

@lightwalker-eth lightwalker-eth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@0xBison Hey excited to see this PR, thank you! I've only partially been able to complete a review of this PR so far, but sharing initial feedback 🫡

// Add selectedEnsDeployment to the config after as it's a derived value
.transform((config) => ({
...config,
selectedEnsDeployment: ENSDeployments[config.ensDeploymentChain] as ENSDeploymentGlobalType,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see this spot as a nice place to include active plugin projection onto ENSIndexerConfig object.
In this very spot, we already have access to the parsed config value. This enables us to read config.plugins and availableDatasourceNames (which value is returned from Object.keys(config.selectedEnsDeployment) as DatasourceName[]; call):

const activePlugins = getActivePlugins(AVAILABLE_PLUGINS, config.plugins, availableDatasourceNames);

This approach would let us to move the declaration of const AVAILABLE_PLUGINS from ponder.config.ts into config.schema.ts.

Alternatively, if using this transform callback on the ENSIndexerConfigSchema turns out not to be the best place (however, I think it's a place for the purpose), we can move the transformation logic into buildConfigFromEnvironment function call of app-config.ts file.

Copy link
Collaborator

@shrugs shrugs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks GREAT to me, well done. config.schema.ts feels lovely as well and reads so cleanly. excited to have this in the codebase

* indexedChains object which is ensured by the validateChainConfigs
* function in `validations.ts` and not part of the schema validation here.
*/
indexedChains: Record<number, ChainConfig>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: chainConfigs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like this, @lightwalker-eth what are your thoughts, as this was previously chains and you suggested we change it to indexedChains.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question -- my understanding is that each ChainConfig in this field is actively being indexed based on the plugins / ensDeploymentChain / ensDeploymentChain.

In other words, let's imagine environment variables were defined with RPC endpoints for mainnet and also for base, however based on the configuration of plugins / ensDeploymentChain / ensDeploymentChain, only the subgraph plugin is activated and therefore the only indexedChain would be mainnet and that base would be excluded from this field.

This distinction is why I was thinking it would be best to name this indexedChains as each of these ChainConfigs should be guaranteed to be actively indexed by the ENSIndexer. This is important for cases like the indexing status page we're building in ENSAdmin, etc.

Appreciate your advice and also if we should refine the docs / validations for this field at all based on this idea 👍

Copy link
Collaborator

@shrugs shrugs May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently this property in the config defines "the set of chain configurations (rpcEndpoint & maxRequests) provided by the environment" and isn't filtered by those that are explicitly indexed.

to derive "the set of all actively indexed chains" in ensadmin, for example, the networks property of the ponder config should be used, as that's the actual set of chains ponder is indexing at runtime. i.e. the plugins determine which networks they require by specifying as such in the networks property of their ponder config.

i don't think it's important for this property in the ENSIndexerConfig (current: indexedChains, proposed: chainConfigs) to only represent explicitly indexed chains, and i think the fact that it represents how the user is configuring ENSIndexer is ideal, actually. the decision making of which chains are actively indexed is done by each of the active plugins, not the ENSIndexerConfig, which is just providing the configuration options for each of the possible networks ensindexer can index from.

so i'm still in favor of chainConfigs or rpcConfigs (and possibly even changing the object to { url: string, maxRequestsPerSecond: number } for minimalism) and don't think any additional re-architecture of this parameter is necessary. and the documentation should be updated to read more like "Configuration options for each indexable chain's rpc"...

Copy link
Member

@lightwalker-eth lightwalker-eth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@0xBison Really super work here! 🚀 This is a partial review but about to hop on some long calls and didn't want to further delay sharing some feedback 🫡

* indexedChains object which is ensured by the validateChainConfigs
* function in `validations.ts` and not part of the schema validation here.
*/
indexedChains: Record<number, ChainConfig>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question -- my understanding is that each ChainConfig in this field is actively being indexed based on the plugins / ensDeploymentChain / ensDeploymentChain.

In other words, let's imagine environment variables were defined with RPC endpoints for mainnet and also for base, however based on the configuration of plugins / ensDeploymentChain / ensDeploymentChain, only the subgraph plugin is activated and therefore the only indexedChain would be mainnet and that base would be excluded from this field.

This distinction is why I was thinking it would be best to name this indexedChains as each of these ChainConfigs should be guaranteed to be actively indexed by the ENSIndexer. This is important for cases like the indexing status page we're building in ENSAdmin, etc.

Appreciate your advice and also if we should refine the docs / validations for this field at all based on this idea 👍

rpcMaxRequestsPerSecond: number;
}

export interface GlobalBlockrange {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about an invariant if both startBlock and endBlock are defined? Should endBlock be enforced to be >= startBlock, or more strictly > startBlock?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe that's handled on apps/ensindexer/src/config/config.schema.ts line 56

.refine(
      (val) =>
        val.startBlock === undefined ||
        val.endBlock === undefined ||
        val.startBlock <= val.endBlock,
      { error: "END_BLOCK must be greater than or equal to START_BLOCK." },
    );

BUT i just tested this with ponder and if start and end blocks are the same, nothing is indexed, even though it feels like just that exact block should be indexed. will log bug with ponder but in the meantime we should switch this comparison to be just < and update the error message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok cool. Will be good to also refine the docs we put on this type to fully define all the expected invariants.

rpcMaxRequestsPerSecond: number;
}

export interface GlobalBlockrange {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can just call this Blockrange since it looks to be a generic idea modeled by this type. What do you think?

To avoid a possible misunderstanding, I am not suggesting to rename the globalBlockrange field on ENSIndexerConfig, only asking about the GlobalBlockrange interface.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me, actually we should use re-use the Blockrange type from apps/ensindexer/src/lib/types.ts here

/**
* Configuration for a single blockchain network (chain) used by ENSIndexer.
*
* Invariants:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest remove the documentation for this key / chainId out of the ChainConfig interface. I see it is already documented on the indexedChains field 👍


/**
* The maximum number of RPC requests per second allowed for this chain.
* This is used to avoid rate limiting by the RPC provider. This value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest moving the docs for the following idea out of the ChainConfig interface and instead to wherever we are building a ChainConfig instead.

This value is optional and defaults to DEFAULT_RPC_RATE_LIMIT if not specified.

It can cause confusion to document this idea here because the field on this interface is not optional.

return clonedObj;
}

// Default, non-exported mock configuration template
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest building this from the environment interface you created for a config.

In other words, all instances of an ENSIndexerConfig, including this mock instance, should be guaranteed to always go through the validation logic we're building.

This will be helpful as over time we'll continue evolving the invariants and schema. It's good that our unit tests fail here if we forgot something.

@@ -93,7 +74,15 @@ export function getActivePlugins<PLUGIN extends ENSIndexerPlugin>(

if (!hasRequiredDatasources) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my understanding, this check should happen as part of building and validating an ENSIndexerConfig and not here.

It should be impossible to have an ENSIndexerConfig that fails this test anywhere in our code.

Please see several other related comments I've shared on this PR.

availablePlugins: readonly PLUGIN[],
requestedPluginNames: string[],
requestedPluginNames: Set<PluginName>,
availableDatasourceNames: DatasourceName[],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include these DatasourceName values as a field in the config object too? I feel like the config we build should include a data model that's aligned with the plugin -> data source relationship.

Appreciate if this is too complex for this PR. If it is, but you think this is a nice idea, can we please create a follow up GitHub Issue to remember the idea for the future? Thanks

// all instead of throwing an error here which will only flag a single missing RPC URL.
// The code which does that validation is the `validateChainConfigs` function in
// `src/config/validations.ts`.
transport: http(config.indexedChains[chainId]?.rpcEndpointUrl),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should first validate that config.indexedChains[chainId] is defined. If it isn't, it should throw an exception.

After validating that, there should be no need for the ? here.

// throw an error so we don't need to handle undefined here with a default.
// That is already handled in our schema validation so when the RPC url is added
// this will not be undefined.
maxRequestsPerSecond: config.indexedChains[chainId]?.rpcMaxRequestsPerSecond,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar feedback here as above.

Copy link
Member

@lightwalker-eth lightwalker-eth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@0xBison Once again nice work here 👍 Finished my review with additional feedback. Appreciate your advice! Thanks 🫡

import { parse as parseConnectionString } from "pg-connection-string";
import { z } from "zod/v4";

export const DEFAULT_RPC_RATE_LIMIT = 50;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest we update this to something like 500. As I understand, 50 will cause quite a slow indexing result.

return `\n` + issues.map((issue) => issue.message).join("\n");
}

// loads the relevant environment variables in the shape of the zod schema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// loads the relevant environment variables in the shape of the zod schema
// loads the relevant environment variables in the shape of a ENSIndexerEnvironment

Is that fair?

*
* This function returns a raw chain config which is not yet validated against the zod schema.
*/
export function getChainsFromEnv(): Record<number, RawChainConfig> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe no need to export?

* This function then validates the config against the zod schema ensuring that the config
* meets all type checks and invariants.
*
* **Note:** This function does **not** perform deep validation of the configuration values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate your advice for what we would need to do so that this function would perform this deep validation?

/**
* Builds the ENSIndexer configuration object from an ENSIndexerEnvironment object
*
* This function then validates the config against the zod schema ensuring that the config
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand zod is an implementation detail of this function and therefore it would be best for us not to mention it in the comments here.

In other words, in theory we should be able to swap out zod for a different parsing / validation framework without it impacting "external" code. As I understand it, zod is an internal implementation detail for how an ENSIndexerInvironment is converted into an ENSIndexerConfig.

Copy link
Member

@lightwalker-eth lightwalker-eth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@0xBison @shrugs Looks great! Appreciate all of the special efforts here! 🫡 🚀 🚀 Let's ship it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants