diff --git a/README.md b/README.md
index feb093b..c33c814 100644
--- a/README.md
+++ b/README.md
@@ -2,11 +2,13 @@
 
 Automated data quality review for Pull Requests and Merge Requests.
 
-When a developer opens or updates a PR/MR touching dbt models, this action:
-1. Detects changed models via `git diff`
-2. Queries the Elementary MCP server for test results, active incidents, and downstream lineage
+When a developer opens or updates a PR/MR, this action:
+1. Sends the repository name and branch to the Elementary API
+2. Elementary fetches the diff, analyses test results, active incidents, and downstream lineage
 3. Posts a summary comment to the PR/MR (updates it on reruns - no spam)
 
+**Prerequisites:** Connect your code repository in the Elementary Cloud UI.
+
 ---
 
 ## GitHub Actions
@@ -19,42 +21,27 @@ name: Elementary Data Quality Review
 
 on:
   pull_request:
-    paths:
-      - "models/**/*.sql"
-      - "models/**/*.yml"
-      - "dbt_project.yml"
 
 jobs:
   elementary-review:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
-        with:
-          fetch-depth: 0  # required for git diff across branches
-
       - uses: elementary-data/elementary-ci@v1
         with:
-          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
+          elementary-api-key: ${{ secrets.ELEMENTARY_API_KEY }}
 ```
 
 ### Inputs
 
 | Input | Default | Description |
 |---|---|---|
-| `anthropic-api-key` | required | Anthropic API key for Claude |
-| `models-path` | `models/` | Path to dbt models directory |
-| `diff-filter` | `ACM` | git diff filter (A=added, C=copied, M=modified) |
-| `edr-version` | latest | Pin to a specific `elementary-data` version |
-| `claude-model` | `claude-haiku-4-5-20251001` | Claude model ID |
-| `mcp-config-path` | `.mcp.json` | Path to MCP config file |
-| `base-ref` | PR base branch | Branch to diff against |
+| `elementary-api-key` | required | Elementary Cloud API key |
 
 ### Required secrets
 
 | Secret | Description |
 |---|---|
-| `ANTHROPIC_API_KEY` | Anthropic API key |
-| Warehouse credentials | Whatever `edr` needs to connect (e.g. `SNOWFLAKE_PASSWORD`) |
+| `ELEMENTARY_API_KEY` | Elementary Cloud API key |
 
 `GITHUB_TOKEN` is provided automatically by GitHub Actions.
 
@@ -70,68 +57,12 @@ include:
   - component: gitlab.com/elementary-data/ci-components/mr-review@v1
 ```
 
-That's it. Override inputs only if needed:
-
-```yaml
-include:
-  - component: gitlab.com/elementary-data/ci-components/mr-review@v1
-    inputs:
-      models_path: "dbt/models/"
-      edr_version: "0.15.0"
-      claude_model: "claude-sonnet-4-6"
-      stage: "data-quality"
-```
-
-### Inputs
-
-| Input | Default | Description |
-|---|---|---|
-| `stage` | `test` | Pipeline stage |
-| `models_path` | `models/` | Path to dbt models directory |
-| `diff_filter` | `ACM` | git diff filter |
-| `edr_version` | latest | Pin to a specific `elementary-data` version |
-| `claude_model` | `claude-haiku-4-5-20251001` | Claude model ID |
-| `mcp_config_path` | `.mcp.json` | Path to MCP config file |
-| `allow_failure` | `true` | Whether to block the MR on job failure |
-
 ### Required CI/CD variables
 
 Set these in **Settings > CI/CD > Variables** (mark sensitive ones as masked):
 
 | Variable | Description |
 |---|---|
-| `ANTHROPIC_API_KEY` | Anthropic API key |
-| `GITLAB_API_TOKEN` | Project/group token with `api` scope |
-| Warehouse credentials | Whatever `edr` needs (e.g. `SNOWFLAKE_PASSWORD`) |
-
----
-
-## MCP config
-
-Both integrations require a `.mcp.json` file checked into your repo:
-
-```json
-{
-  "mcpServers": {
-    "elementary": {
-      "command": "edr",
-      "args": ["run-mcp"]
-    }
-  }
-}
-```
+| `ELEMENTARY_API_KEY` | Elementary Cloud API key |
 
-Adjust the `command` and `args` to match your `edr` version. Run `edr --help` to confirm the MCP subcommand name.
-
----
-
-## Model selection
-
-The `claude-model` / `claude_model` input accepts any Anthropic model ID.
-See the [Anthropic models documentation](https://docs.anthropic.com/en/docs/about-claude/models) for available options.
-
-| Model | When to use |
-|---|---|
-| `claude-haiku-4-5-20251001` | Default - fast and cost-efficient for routine reviews |
-| `claude-sonnet-4-6` | Richer analysis, better reasoning about complex lineage |
-| `claude-opus-4-6` | Deep investigation on critical models |
+`CI_JOB_TOKEN` is provided automatically. Optionally set `GITLAB_API_TOKEN` (project/group token with `api` scope) if `CI_JOB_TOKEN` lacks comment permissions.
diff --git a/action.yml b/action.yml
index 2e44fdc..c1b3930 100644
--- a/action.yml
+++ b/action.yml
@@ -1,6 +1,6 @@
 name: Elementary Data Quality Review
 description: >
-  Queries the Elementary MCP server for data quality context on changed dbt models
+  Queries the Elementary API for data quality context on changed dbt models
   and posts a summary comment to the GitHub Pull Request.
 author: Elementary Data
 
@@ -9,84 +9,39 @@ branding:
   color: blue
 
 inputs:
-  anthropic-api-key:
-    description: Anthropic API key for Claude
-    required: true
   elementary-api-key:
     description: Elementary Cloud API key
     required: true
-  models-path:
-    description: Path to the dbt models directory relative to repo root
-    default: "models/"
-  diff-filter:
-    description: >
-      git diff --diff-filter value.
-      ACMR = Added, Copied, Modified, Renamed (excludes deleted models).
-    default: "ACMR"
-  claude-model:
-    description: >
-      Claude model ID to use.
-      See https://docs.anthropic.com/en/docs/about-claude/models for available models.
-    default: "claude-haiku-4-5"
-  base-ref:
-    description: Base branch to diff against. Defaults to the PR base branch.
-    default: ${{ github.base_ref }}
-  mcp-config-path:
-    description: >
-      Optional path to a custom MCP config file. When set, overrides the built-in
-      Elementary MCP config. Use this only for non-standard or self-hosted setups.
-    default: ""
+  elementary-api-url:
+    description: Elementary API base URL (defaults to production)
+    required: false
+    default: "https://prod.api.elementary-data.com"
 
 outputs:
   review-status:
     description: >
       Whether the review completed successfully.
-      "success" if the review ran, "skipped" if no model changes were detected.
+      "success" if the review ran.
     value: ${{ steps.review.outputs.status }}
 
 runs:
   using: composite
   steps:
-    - name: Install Claude CLI
-      shell: bash
-      run: npm install -g @anthropic-ai/claude-code
-
-    - name: Generate MCP config
-      shell: bash
-      run: |
-        if [ -n "${{ inputs.mcp-config-path }}" ]; then
-          echo "MCP_CONFIG_PATH=${{ inputs.mcp-config-path }}" >> $GITHUB_ENV
-        else
-          printf '{"mcpServers":{"elementary":{"command":"npx","args":["-y","mcp-remote@latest","https://prod.api.elementary-data.com/mcp/","--header","Authorization:${AUTH_HEADER}"],"env":{"AUTH_HEADER":"Bearer %s"}}}}\n' \
-            "${{ inputs.elementary-api-key }}" > /tmp/elementary-mcp.json
-          echo "MCP_CONFIG_PATH=/tmp/elementary-mcp.json" >> $GITHUB_ENV
-        fi
-
     - name: Run Elementary review
       id: review
       shell: bash
       env:
-        ANTHROPIC_API_KEY: ${{ inputs.anthropic-api-key }}
+        ELEMENTARY_API_KEY: ${{ inputs.elementary-api-key }}
+        ELEMENTARY_API_URL: ${{ inputs.elementary-api-url }}
+        REPOSITORY: ${{ github.repository }}
+        BRANCH: ${{ github.head_ref }}
       run: |
-        DIFF=$(git diff \
-          --diff-filter=${{ inputs.diff-filter }} \
-          origin/${{ inputs.base-ref }}...HEAD \
-          -- ${{ inputs.models-path }})
-
-        if [ -z "${DIFF}" ]; then
-          echo "status=skipped" >> $GITHUB_OUTPUT
-          echo "No dbt model changes detected, skipping Elementary review."
-          exit 0
-        fi
-
-        export DIFF
-        export COMMENT_MARKER=""
+        export COMMENT_MARKER=""
         export POST_COMMENT_URL="https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.pull_request.number }}/comments"
         export LIST_COMMENTS_URL="https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.pull_request.number }}/comments"
         export UPDATE_COMMENT_URL_TPL="https://api.github.com/repos/${{ github.repository }}/issues/comments/{id}"
         export AUTH_HEADER_NAME="Authorization"
         export AUTH_HEADER_VALUE="Bearer ${{ github.token }}"
-        export CLAUDE_MODEL="${{ inputs.claude-model }}"
 
         bash "${{ github.action_path }}/scripts/review.sh"
         echo "status=success" >> $GITHUB_OUTPUT
diff --git a/scripts/review.sh b/scripts/review.sh
index 70fdb6a..ca23de7 100644
--- a/scripts/review.sh
+++ b/scripts/review.sh
@@ -3,75 +3,47 @@
 # Called by both the GitHub Action and GitLab CI component.
 #
 # Required environment variables (set by the wrapper):
-#   DIFF                    git diff output of changed dbt models
-#   COMMENT_MARKER          HTML marker for idempotency, e.g.
+#   REPOSITORY              Repository identifier (e.g. "owner/repo")
+#   BRANCH                  Branch name to review
+#   ELEMENTARY_API_KEY      Elementary account API key
+#   COMMENT_MARKER          HTML marker for idempotency, e.g.
 #   POST_COMMENT_URL        API URL to POST a new comment
 #   LIST_COMMENTS_URL       API URL to GET existing comments (for idempotency check)
 #   UPDATE_COMMENT_URL_TPL  URL template for updating a comment, with {id} placeholder
 #   AUTH_HEADER_NAME        Header name: "Authorization" (GitHub) | "PRIVATE-TOKEN" or "JOB-TOKEN" (GitLab)
 #   AUTH_HEADER_VALUE       Header value: "Bearer " (GitHub) | "" (GitLab)
-#   MCP_CONFIG_PATH         Path to .mcp.json
-#   CLAUDE_MODEL            Claude model ID to use (default: claude-haiku-4-5)
 
 set -euo pipefail
 
-if [ -z "${DIFF:-}" ]; then
-  echo "No dbt model changes detected, skipping Elementary review."
-  exit 0
+if [ -z "${ELEMENTARY_API_KEY:-}" ]; then
+  echo "ERROR: ELEMENTARY_API_KEY is not set." >&2
+  exit 1
 fi
 
-if [ -z "${MCP_CONFIG_PATH:-}" ]; then
-  echo "ERROR: MCP_CONFIG_PATH is not set." >&2
+ELEMENTARY_API_URL="${ELEMENTARY_API_URL:-https://prod.api.elementary-data.com}"
+
+if [ -z "${REPOSITORY:-}" ] || [ -z "${BRANCH:-}" ]; then
+  echo "ERROR: REPOSITORY and BRANCH must be set." >&2
   exit 1
 fi
 
-CLAUDE_MODEL="${CLAUDE_MODEL:-claude-haiku-4-5}"
 export COMMENT_FILE="/tmp/elementary-comment.md"
 
-# Step 1: Claude generates the comment — output captured directly from stdout
-claude -p "
-You are a data quality reviewer for a pull/merge request.
-
-Git diff of changed dbt models:
-\`\`\`diff
-${DIFF}
-\`\`\`
-
-Using the Elementary MCP tools available to you, follow these steps:
-
-0. Discover the working environment using get_environments, then scope all
-   subsequent queries to the relevant environment.
-1. For each changed model, use get_table_asset to retrieve its metadata and
-   associated tests. Then use get_tests and get_test_execution_history to
-   fetch test results and recent execution patterns.
-2. Use get_asset_incidents_history to check for active or recent data quality
-   incidents affecting these models.
-3. Use get_downstream_assets (depth 2) to assess the blast radius of changes.
-   For any renamed or removed columns, also use get_column_downstream_columns
-   to identify column-level impact on downstream models and BI tools.
-4. Summarize overall model health, test coverage, and change risk.
-
-Output ONLY the Markdown comment — no tool calls, no explanation, just the comment.
+# Step 1: Elementary API generates the comment
+RESPONSE=$(curl -sf --max-time 120 \
+  -X POST \
+  -H "Authorization: Bearer ${ELEMENTARY_API_KEY}" \
+  -H "Content-Type: application/json" \
+  --data-raw "{\"repository\": \"${REPOSITORY}\", \"branch\": \"${BRANCH}\"}" \
+  "${ELEMENTARY_API_URL}/ci/review") || {
+  echo "ERROR: Elementary API request failed." >&2
+  exit 1
+}
 
-Format requirements:
-- First line must be exactly: ${COMMENT_MARKER}
-- Use proper Markdown with newlines between each section and list item
-- Use ## headings for each section
-- Use bullet points with a blank line between groups
-- Include a test pass/fail table per model where data is available
-- Call out downstream impact clearly
-- If a model has no Elementary history yet, say so explicitly
-- If the MCP server is unreachable, say so rather than omitting the section
-- End with: _Posted by [Elementary CI](https://www.elementary-data.com)_
-" \
-  --mcp-config "${MCP_CONFIG_PATH}" \
-  --model "${CLAUDE_MODEL}" \
-  --allowedTools "mcp__elementary__*" \
-  --output-format text \
-  > "${COMMENT_FILE}"
+printf '%s' "${RESPONSE}" | jq -r '.comment' > "${COMMENT_FILE}"
 
 if [ ! -s "${COMMENT_FILE}" ]; then
-  echo "ERROR: Claude produced empty output." >&2
+  echo "ERROR: Elementary API returned empty comment." >&2
   exit 1
 fi
 
@@ -98,7 +70,7 @@ function api(method, reqUrl, data) {
       hostname: parsed.hostname,
       port: parsed.port,
       path: parsed.pathname + parsed.search,
-      headers: { [authName]: authValue, "Content-Type": "application/json" },
+      headers: { [authName]: authValue, "Content-Type": "application/json", "User-Agent": "elementary-ci" },
     };
     const req = mod.request(opts, (res) => {
       let chunks = [];
@@ -109,7 +81,12 @@ function api(method, reqUrl, data) {
           console.error(`HTTP ${res.statusCode} ${method} ${reqUrl}: ${text}`);
           return reject(new Error(`HTTP ${res.statusCode}`));
         }
-        resolve(text ? JSON.parse(text) : {});
+        try {
+          resolve(text ? JSON.parse(text) : {});
+        } catch (e) {
+          console.error(`Failed to parse JSON from ${method} ${reqUrl}: ${text.slice(0, 500)}`);
+          reject(new Error(`Invalid JSON response from ${reqUrl}`));
+        }
       });
     });
     req.on("error", reject);
diff --git a/templates/mr-review.yml b/templates/mr-review.yml
index fc9eb61..90abfaf 100644
--- a/templates/mr-review.yml
+++ b/templates/mr-review.yml
@@ -3,41 +3,20 @@
 elementary-mr-review:
   rules:
     - if: $CI_PIPELINE_SOURCE == "merge_request_event"
-      changes:
-        - "models/**/*.sql"
-        - "models/**/*.yml"
-        - "dbt_project.yml"
       when: always
     - when: never
 
   allow_failure: true
 
   before_script:
-    - apt-get update -qq && apt-get install -y -qq git curl
-    - npm install -g @anthropic-ai/claude-code
-    - |
-      if [ -n "${ELEMENTARY_CI_MCP_CONFIG_PATH:-}" ]; then
-        export MCP_CONFIG_PATH="${ELEMENTARY_CI_MCP_CONFIG_PATH}"
-      else
-        printf '{"mcpServers":{"elementary":{"command":"npx","args":["-y","mcp-remote@latest","https://prod.api.elementary-data.com/mcp/","--header","Authorization:${AUTH_HEADER}"],"env":{"AUTH_HEADER":"Bearer %s"}}}}\n' \
-          "${ELEMENTARY_API_KEY}" > /tmp/elementary-mcp.json
-        export MCP_CONFIG_PATH="/tmp/elementary-mcp.json"
-      fi
+    - apt-get update -qq && apt-get install -y -qq curl jq
 
   script:
     - |
-      MODELS_PATH="${ELEMENTARY_CI_MODELS_PATH:-models/}"
-      DIFF_FILTER="${ELEMENTARY_CI_DIFF_FILTER:-ACMR}"
-
-      git fetch origin "${CI_MERGE_REQUEST_TARGET_BRANCH_NAME}"
-
-      DIFF=$(git diff \
-        --diff-filter=${DIFF_FILTER} \
-        origin/${CI_MERGE_REQUEST_TARGET_BRANCH_NAME}...HEAD \
-        -- ${MODELS_PATH})
-
-      export DIFF
-      export COMMENT_MARKER=""
+      export REPOSITORY="${CI_PROJECT_PATH}"
+      export BRANCH="${CI_MERGE_REQUEST_SOURCE_BRANCH_NAME}"
+      export ELEMENTARY_API_URL="${ELEMENTARY_API_URL:-https://prod.api.elementary-data.com}"
+      export COMMENT_MARKER=""
       export POST_COMMENT_URL="${CI_SERVER_URL}/api/v4/projects/${CI_PROJECT_ID}/merge_requests/${CI_MERGE_REQUEST_IID}/notes"
       export LIST_COMMENTS_URL="${CI_SERVER_URL}/api/v4/projects/${CI_PROJECT_ID}/merge_requests/${CI_MERGE_REQUEST_IID}/notes"
       export UPDATE_COMMENT_URL_TPL="${CI_SERVER_URL}/api/v4/projects/${CI_PROJECT_ID}/merge_requests/${CI_MERGE_REQUEST_IID}/notes/{id}"
@@ -48,7 +27,6 @@ elementary-mr-review:
         export AUTH_HEADER_NAME="JOB-TOKEN"
         export AUTH_HEADER_VALUE="${CI_JOB_TOKEN}"
       fi
-      export CLAUDE_MODEL="${ELEMENTARY_CI_CLAUDE_MODEL:-claude-haiku-4-5}"
 
       SCRIPT_TAG="${ELEMENTARY_CI_SCRIPT_TAG:-v1}"
       curl -fsSL \