Thanks for considering contributing to Soda Core's library of supported data sources!
To make a data source available to our user community, we require that you provide the following:
- a working data source
  - locally - provide a command to launch a Docker container with the data source, using either a Dockerfile or a docker-compose file.
  - in the cloud - provide a service account to connect to.
- a Python library for the data source connection. Usually, you need to install an existing official connector library.
- a data source package that handles the following:
  - get connection properties
  - connect to the data source
  - access any data source-specific code to ensure full support
Datasource file and folder structure
- The package goes to `soda/xy`, following the same structure as other data source packages.
- The main file is `soda/xy/soda/data_sources/xy_data_source.py` with an `XyDataSource(DataSource)` class.
Basic code in the data source class
- Implement the `__init__` method to retrieve and save connection properties.
- Implement the `connect` method that returns a PEP 249-compatible connection object.
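The two methods above can be sketched roughly as follows. This is a minimal illustration, not the actual Soda Core code: the `DataSource` class here is a stand-in so the snippet runs on its own, the constructor signature and the `database` and `schema` property names are assumptions, and `sqlite3` stands in for the official connector library of the hypothetical `xy` data source.

```python
import sqlite3


class DataSource:
    """Stand-in for Soda Core's base class (assumed shape, for illustration only)."""

    def __init__(self, data_source_name: str, data_source_properties: dict):
        self.data_source_name = data_source_name
        self.data_source_properties = data_source_properties
        self.connection = None


class XyDataSource(DataSource):
    def __init__(self, data_source_name: str, data_source_properties: dict):
        super().__init__(data_source_name, data_source_properties)
        # Retrieve and save the connection properties for later use.
        # The property names below are hypothetical.
        self.database = data_source_properties.get("database", ":memory:")
        self.schema = data_source_properties.get("schema")

    def connect(self):
        # sqlite3 is used only because it ships with Python and follows PEP 249;
        # a real package would call its own connector library here.
        self.connection = sqlite3.connect(self.database)
        return self.connection


data_source = XyDataSource("my_xy", {"database": ":memory:"})
connection = data_source.connect()
cursor = connection.cursor()
cursor.execute("SELECT 1")
row = cursor.fetchone()
```

Because the returned object follows PEP 249, the calling code can rely on `cursor()`, `execute()`, and `fetchone()` regardless of which connector sits underneath.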
Required overrides
- Type mappings; refer to the base `DataSource` class comments for more detail.
  - `SCHEMA_CHECK_TYPES_MAPPING`
  - `SQL_TYPE_FOR_CREATE_TABLE_MAP`
  - `SQL_TYPE_FOR_SCHEMA_CHECK_MAP`
  - `NUMERIC_TYPES_FOR_PROFILING`
  - `TEXT_TYPES_FOR_PROFILING`
- `safe_connection_data()` method
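As a rough illustration of what these overrides might look like, the sketch below fills in a few of the attributes with made-up values. The dictionary keys, the SQL type names, and the idea that `safe_connection_data()` returns only non-sensitive connection details are all assumptions; the comments in the base `DataSource` class remain the authoritative description of the expected shapes.

```python
class XyDataSource:
    # Hypothetical mapping from generic type names to xy column types.
    SQL_TYPE_FOR_CREATE_TABLE_MAP = {
        "text": "VARCHAR(255)",
        "integer": "INT",
        "decimal": "FLOAT",
        "date": "DATE",
    }

    # Hypothetical lists of xy types treated as numeric or text when profiling.
    NUMERIC_TYPES_FOR_PROFILING = ["INT", "FLOAT"]
    TEXT_TYPES_FOR_PROFILING = ["VARCHAR(255)"]

    def __init__(self, properties: dict):
        self.type = "xy"
        self.host = properties.get("host")
        self.port = properties.get("port")
        self.database = properties.get("database")
        self.password = properties.get("password")

    def safe_connection_data(self) -> list:
        # Assumption: only values that are safe to log are returned,
        # so credentials such as the password are left out.
        return [self.type, self.host, self.port, self.database]


data_source = XyDataSource(
    {"host": "localhost", "port": 5432, "database": "sales", "password": "secret"}
)
safe_data = data_source.safe_connection_data()
```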
Optional overrides, frequent
- `sql_get_table_names_with_count()` - SQL query to retrieve all tables and their respective counts. This is usually data source-specific.
- `default_casify_*()` - indicates any default case manipulation that a data source does when retrieving respective identifiers.
- Table/column metadata methods
  - `column_metadata_columns()`
  - `column_metadata_catalog_column()`
  - `sql_get_table_names_with_count()`
- Regex support
  - `escape_regex()` or `escape_string()` to ensure correct regex formatting.
  - `regex_replace_flags()` - for data sources that support regex replace flags; for example, `g` for `global`.
- Identifier quoting - `quote_*()` methods handle identifier quoting; `qualified_table_name()` creates a fully-qualified table name.
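To make the casing, quoting, and regex overrides more concrete, here is a small sketch for an imaginary `xy` data source. The concrete method names `default_casify_table_name` and `quote_table` are assumed expansions of the `default_casify_*()` and `quote_*()` patterns, and the behaviors (upper-case folding, double-quote quoting, backslash doubling, schema prefixing) are invented for illustration; a real data source should follow its own SQL dialect.

```python
from typing import Optional


class XyDataSource:
    def __init__(self, schema: Optional[str] = None):
        self.schema = schema

    def default_casify_table_name(self, identifier: str) -> str:
        # Assumption: xy folds unquoted identifiers to upper case.
        return identifier.upper()

    def quote_table(self, table_name: str) -> str:
        # Assumption: xy quotes identifiers with double quotes.
        return f'"{table_name}"'

    def qualified_table_name(self, table_name: str) -> str:
        # Prefix the schema when it cannot be set globally on the connection.
        quoted = self.quote_table(table_name)
        if self.schema:
            return f"{self.schema}.{quoted}"
        return quoted

    def escape_regex(self, value: str) -> str:
        # Assumption: xy treats a backslash in a string literal as an escape
        # character, so each backslash in the pattern is doubled.
        return value.replace("\\", "\\\\")


data_source = XyDataSource(schema="analytics")
casified = data_source.default_casify_table_name("orders")
qualified = data_source.qualified_table_name("orders")
escaped = data_source.escape_regex(r"\d+")
```

Keeping the schema prefix inside `qualified_table_name()` means the generic queries do not need to know whether the data source supports a connection-wide default schema.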
Optional overrides, infrequent
- Any of the `sql_*` methods when a particular data source needs a specific query to get a desired result.
Further considerations
- How are schemas (or the equivalent) handled? Can they be set globally for the connection, or do they need to be prefixed in all the queries?
Required tests
- Create a `soda/xy/tests/test_xy.py` file with a `test_xy()` method. Use this file for any data source-specific tests.
- Implement `XyDataSourceFixture` for everything related to tests:
  - `_build_configuration_dict()` - connection configuration the tests use
  - `_create_schema_if_not_exists_sql()` / `_drop_schema_if_exists_sql()` - DDL to create or drop a new schema or database
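A fixture along these lines might look like the sketch below. The `DataSourceFixture` base class is a stand-in so the snippet runs on its own, and the environment variable names, configuration keys, schema name, and DDL syntax are all hypothetical; existing fixtures in the repository are the better guide to the real shape.

```python
import os
from typing import Optional


class DataSourceFixture:
    """Stand-in for the shared test fixture base class (assumed shape)."""

    def __init__(self, test_data_source: str):
        self.data_source_name = test_data_source
        self.schema_name = "soda_test"


class XyDataSourceFixture(DataSourceFixture):
    def _build_configuration_dict(self, schema_name: Optional[str] = None) -> dict:
        # Connection configuration used by the tests; values come from
        # hypothetical environment variables with local defaults.
        return {
            f"data_source {self.data_source_name}": {
                "type": "xy",
                "host": os.getenv("XY_HOST", "localhost"),
                "username": os.getenv("XY_USERNAME", "soda"),
                "password": os.getenv("XY_PASSWORD"),
                "schema": schema_name or self.schema_name,
            }
        }

    def _create_schema_if_not_exists_sql(self) -> str:
        # DDL syntax is illustrative; adjust it to the xy dialect.
        return f"CREATE SCHEMA IF NOT EXISTS {self.schema_name}"

    def _drop_schema_if_exists_sql(self) -> str:
        return f"DROP SCHEMA IF EXISTS {self.schema_name} CASCADE"


fixture = XyDataSourceFixture("xy")
configuration = fixture._build_configuration_dict()
create_sql = fixture._create_schema_if_not_exists_sql()
drop_sql = fixture._drop_schema_if_exists_sql()
```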
To test the data source
- Create an `.env` file based on `.env.example` and add the appropriate variables for the data source.
- Change the `test_data_source` variable to the data source you are testing.
- Run the tests using `pytest`.