LCORE-493: auth e2e tests hardening by radofuchs · Pull Request #494 · lightspeed-core/lightspeed-stack

radofuchs · 2025-09-03T08:59:39Z

Description

Hardens the auth e2e tests by adding a health check after the Docker container restart.

Type of change

Related Tickets & Documents

Related Issue #
Closes #

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Tests
- Improved end-to-end test stability by waiting for container health instead of fixed delays.
- Added container-aware utilities with optional cleanup to streamline configuration switching during test runs.
- Introduced retry logic and timeouts to reduce flakiness when containers start slowly.
- Enhanced diagnostics with clearer messages on health check failures to aid investigation.

coderabbitai · 2025-09-03T08:59:47Z

Walkthrough

Adds wait_for_container_health(container_name, max_attempts=3) to E2E test utilities and updates switch_config_and_restart to accept container_name and cleanup; after restart it now polls Docker health until "healthy" (with timeouts/retries and error handling) instead of a fixed sleep.

Changes

Cohort / File(s)	Summary
E2E Test Utilities `tests/e2e/utils/utils.py`	Added `wait_for_container_health(container_name: str, max_attempts: int = 3) -> None` which polls `docker inspect` with a 10s subprocess timeout, retries with 5s sleeps, and handles `CalledProcessError`/`TimeoutExpired` while logging attempts. Updated `switch_config_and_restart(original_file: str, replacement_file: str, container_name: str, cleanup: bool = False) -> str` to accept `container_name` and `cleanup` and to call `wait_for_container_health` after restarting the container instead of sleeping.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Tester
  participant Switch as switch_config_and_restart
  participant Docker as Docker Engine
  participant Health as wait_for_container_health

  Tester->>Switch: switch_config_and_restart(orig, repl, container_name, cleanup?)
  Switch->>Docker: Copy replacement config & restart container
  Switch->>Health: wait_for_container_health(container_name, max_attempts=3)
  loop up to 3 attempts
    Health->>Docker: docker inspect --format {{.State.Health.Status}} (10s timeout)
    alt status == "healthy"
      Health-->>Switch: healthy → return
    else status != "healthy" or error/timeout
      Health-->>Health: log attempt, sleep 5s, retry
    end
  end
  Note over Health,Switch: After final failed attempt, logs timeout/error and returns
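
The polling flow in the diagram can be tried out without Docker by injecting the status lookup. This is a minimal sketch, not the code from the PR: `poll_until_healthy`, `get_status`, and `sleep` are hypothetical names, and unlike the helper described in the diagram it raises on the final failed attempt instead of logging and returning.

```python
import time
from typing import Callable


def poll_until_healthy(
    get_status: Callable[[], str],
    max_attempts: int = 3,
    interval_sec: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll get_status() until it returns "healthy".

    Returns the number of attempts used; raises TimeoutError if the status
    never becomes "healthy" within max_attempts.
    """
    last_status = "unknown"
    for attempt in range(1, max_attempts + 1):
        last_status = get_status()
        if last_status == "healthy":
            return attempt
        if attempt < max_attempts:
            # Sleep only between attempts, never after the last one.
            sleep(interval_sec)
    raise TimeoutError(
        f"not healthy after {max_attempts} attempts (last status: {last_status})"
    )


# Simulated container that reports "starting" twice, then "healthy".
statuses = iter(["starting", "starting", "healthy"])
slept = []  # records requested sleep durations instead of really sleeping
attempts_used = poll_until_healthy(lambda: next(statuses), sleep=slept.append)
print(attempts_used, slept)
```

Injecting `get_status` and `sleep` keeps the retry logic testable in isolation; in a real helper `get_status` would wrap the `docker inspect` call shown in the diagram.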

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

LCORE-166: add healthcheck for llama stack and lightspeed stack to docker compose #416 — Adds Docker healthcheck definitions for services, which pairs with these tests that poll container health.

Suggested reviewers

tisnik

Poem

hop hop — I wait for the green,
I poke the shell, I listen keen.
A tiny timeout, a patient sniff,
health checks pass — the tests can whiff.
Config swapped, the pipeline's seen. 🐇✨


coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

tests/e2e/utils/utils.py (2)

68-74: Fix backup semantics; align docstring and logic. Always create backup; let cleanup only delete it.

Skipping backup creation when cleanup=True risks losing the ability to restore and contradicts the docstring.

 def switch_config_and_restart(
     original_file: str,
     replacement_file: str,
     container_name: str,
     cleanup: bool = False,
 ) -> str:
@@
-        cleanup: If True, remove the backup file after restoration (default: False)
+        cleanup: If True, remove the backup file at the end of this call (default: False)
@@
-    if not cleanup and not os.path.exists(backup_file):
+    if not os.path.exists(backup_file):
         try:
             shutil.copy(original_file, backup_file)

Also applies to: 80-81, 87-93
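
The backup semantics proposed above (always create the backup, let `cleanup` only delete it) can be sketched on temporary files. This is an illustration under assumptions: `swap_config` and the `.backup` suffix are hypothetical and are not taken from the repository, and the container restart is left out.

```python
import os
import shutil
import tempfile


def swap_config(original_file: str, replacement_file: str, cleanup: bool = False) -> str:
    """Replace original_file with replacement_file, always backing it up first.

    Hypothetical helper: cleanup only controls whether the backup is deleted
    at the end of the call, never whether it is created.
    """
    backup_file = f"{original_file}.backup"  # assumed naming scheme
    if not os.path.exists(backup_file):
        shutil.copy(original_file, backup_file)  # back up before overwriting
    shutil.copy(replacement_file, original_file)
    if cleanup:
        os.remove(backup_file)
    return backup_file


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "config.yaml")
    replacement = os.path.join(tmp, "config-auth.yaml")
    with open(original, "w") as f:
        f.write("auth: off\n")
    with open(replacement, "w") as f:
        f.write("auth: on\n")

    backup = swap_config(original, replacement)
    with open(backup) as f:
        backup_content = f.read()
    with open(original) as f:
        active_content = f.read()
    print(repr(backup_content), repr(active_content))
```

With the backup created unconditionally, the original configuration stays recoverable regardless of the `cleanup` flag passed by the caller.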

100-111: Add a timeout to docker restart and separate exception handling.

Without a timeout, tests can hang indefinitely; combine with the new health wait for robust readiness.

     # Restart container
     try:
         subprocess.run(
             ["docker", "restart", container_name],
             capture_output=True,
             text=True,
-            check=True,
+            check=True,
+            timeout=60,
         )
-    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as e:
-        print(f"Failed to restart container {container_name}: {e.stderr}")
+    except subprocess.CalledProcessError as e:
+        print(f"Failed to restart container {container_name}: {e.stderr}")
+        raise
+    except subprocess.TimeoutExpired as e:
+        print(f"Timed out restarting container {container_name} after {e.timeout}s")
         raise
@@
-    # Wait for container to be healthy
+    # Wait for container to be healthy (raises on timeout)
     wait_for_container_health(container_name)

Also applies to: 112-114
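
To see why the `timeout` argument matters, the two failure modes can be reproduced with the Python interpreter standing in for `docker restart`. This is a sketch only: `run_with_timeout` is a hypothetical helper that returns a label where the suggested code would log and re-raise.

```python
import subprocess
import sys


def run_with_timeout(cmd: list[str], timeout: float) -> str:
    """Run cmd and classify the outcome as "ok", "failed", or "timeout"."""
    try:
        subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=timeout)
    except subprocess.CalledProcessError:
        # Raised because check=True and the command exited non-zero.
        return "failed"
    except subprocess.TimeoutExpired:
        # Only reachable because a timeout was passed to subprocess.run.
        return "timeout"
    return "ok"


slow = [sys.executable, "-c", "import time; time.sleep(5)"]
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
quick = [sys.executable, "-c", "pass"]

results = (
    run_with_timeout(slow, timeout=0.5),
    run_with_timeout(failing, timeout=10),
    run_with_timeout(quick, timeout=10),
)
print(results)
```

Separate `except` clauses let each failure mode get its own message, which is the point of the suggested split.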

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ddbe64d and e277521.

📒 Files selected for processing (1)

tests/e2e/utils/utils.py (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build-pr
GitHub Check: e2e_tests

tests/e2e/utils/utils.py

tisnik

LGTM, nice one

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/e2e/utils/utils.py (1)

70-75: Add missing container_name (and optional cleanup) to all switch_config_and_restart calls

The invocations in tests/e2e/features/environment.py at lines 90 and 100 are still using the old two-arg signature and must be updated to include the required container_name parameter (and cleanup if needed).
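
A short sketch of why the old call sites break: the stub below copies only the signature described in this review, its body and return value are placeholders, and `my-service` is a made-up container name.

```python
def switch_config_and_restart(
    original_file: str,
    replacement_file: str,
    container_name: str,
    cleanup: bool = False,
) -> str:
    """Stub with the reviewed signature; the body is a placeholder."""
    return f"{original_file}.backup"


try:
    switch_config_and_restart("a.yaml", "b.yaml")  # old two-argument call
    old_call_error = None
except TypeError as exc:
    old_call_error = str(exc)

# Updated call with the now-required container name (hypothetical value).
new_call_result = switch_config_and_restart(
    "a.yaml", "b.yaml", container_name="my-service"
)
print(old_call_error)
print(new_call_result)
```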

♻️ Duplicate comments (1)

tests/e2e/utils/utils.py (1)

35-68: Fail fast on unhealthy/timeout; fix double-sleep; support containers without HEALTHCHECK.

Current loop swallows failures, never raises on timeout, and sleeps twice per attempt (once inside try, once after), making timing/logs inaccurate. Tests may proceed while the container isn’t ready.

Apply this hardened rewrite:

-def wait_for_container_health(container_name: str, max_attempts: int = 3) -> None:
-    """Wait for container to be healthy."""
-    for attempt in range(max_attempts):
-        try:
-            result = subprocess.run(
-                [
-                    "docker",
-                    "inspect",
-                    "--format={{.State.Health.Status}}",
-                    container_name,
-                ],
-                capture_output=True,
-                text=True,
-                check=True,
-                timeout=10,
-            )
-            if result.stdout.strip() == "healthy":
-                break
-            else:
-                if attempt < max_attempts - 1:
-                    time.sleep(5)
-                else:
-                    print(
-                        f"{container_name} not healthy after {max_attempts * 5} seconds"
-                    )
-        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
-            pass
-
-        if attempt < max_attempts - 1:
-            print(f"⏱ Attempt {attempt + 1}/{max_attempts} - waiting...")
-            time.sleep(5)
-        else:
-            print(f"Could not check health status for {container_name}")
+def wait_for_container_health(
+    container_name: str,
+    max_attempts: int = 6,
+    interval_sec: int = 5,
+    inspect_timeout_sec: int = 10,
+) -> None:
+    """Wait for container to be healthy (or running if no HEALTHCHECK); raise on timeout."""
+    start = time.monotonic()
+    last_status = "unknown"
+    for attempt in range(1, max_attempts + 1):
+        try:
+            result = subprocess.run(
+                [
+                    "docker",
+                    "inspect",
+                    "--format={{if .State.Health}}{{.State.Health.Status}}{{else}}no-healthcheck{{end}}",
+                    container_name,
+                ],
+                capture_output=True,
+                text=True,
+                check=True,
+                timeout=inspect_timeout_sec,
+            )
+            status = result.stdout.strip()
+            last_status = status
+            if status == "healthy":
+                return
+            if status == "no-healthcheck":
+                # Fallback: consider "running" as ready when no HealthCheck is defined.
+                state = subprocess.run(
+                    ["docker", "inspect", "--format={{.State.Status}}", container_name],
+                    capture_output=True,
+                    text=True,
+                    check=True,
+                    timeout=inspect_timeout_sec,
+                )
+                if state.stdout.strip() == "running":
+                    return
+        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
+            # ignore and retry
+            pass
+
+        if attempt < max_attempts:
+            print(f"⏱ Attempt {attempt}/{max_attempts} - waiting...")
+            time.sleep(interval_sec)
+        else:
+            elapsed = int(time.monotonic() - start)
+            raise TimeoutError(f"{container_name} not ready after {elapsed}s (last status: {last_status})")
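
The double sleep flagged in this comment can be checked by replaying the control flow of the removed loop for a container that never becomes healthy. This sketch records sleeps instead of performing them; `original_loop_sleeps` is a hypothetical name and the `docker inspect` call is replaced by a fixed status.

```python
def original_loop_sleeps(max_attempts: int = 3) -> list[int]:
    """Replay the removed loop's branching and record each 5s sleep."""
    sleeps = []
    for attempt in range(max_attempts):
        status = "starting"  # simulated docker inspect output, never healthy
        if status == "healthy":
            break
        if attempt < max_attempts - 1:
            sleeps.append(5)  # sleep inside the try block
        if attempt < max_attempts - 1:
            sleeps.append(5)  # second sleep after the try block
    return sleeps


sleeps = original_loop_sleeps()
reported_seconds = 3 * 5  # what the original log message computes
print(len(sleeps), sum(sleeps), reported_seconds)  # prints: 4 20 15
```

With the default of three attempts the loop sleeps four times (20 seconds) while its log message reports `max_attempts * 5`, that is 15 seconds, which is the inaccuracy the rewrite removes.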

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between e277521 and 0a946cd.

📒 Files selected for processing (1)

tests/e2e/utils/utils.py (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build-pr
GitHub Check: e2e_tests

coderabbitai · 2025-09-03T09:19:16Z

tests/e2e/utils/utils.py

+    # Wait for container to be healthy
+    wait_for_container_health(container_name)



🛠️ Refactor suggestion

Surface restart failures with a timeout and clearer error reporting.

TimeoutExpired won’t fire without a timeout; also bubble up a concise message.

     # Restart container
     try:
         subprocess.run(
             ["docker", "restart", container_name],
             capture_output=True,
             text=True,
             check=True,
+            timeout=60,
         )
-    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as e:
-        print(f"Failed to restart container {container_name}: {e.stderr}")
+    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as e:
+        err = getattr(e, "stderr", None) or str(e)
+        print(f"Failed to restart container {container_name}: {err}")
         raise
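
The `getattr(e, "stderr", None) or str(e)` fallback in this suggestion can be exercised without running any process by constructing the two exception types directly. A minimal sketch: `describe_failure` is a hypothetical name, and the command and stderr text are invented for illustration.

```python
import subprocess


def describe_failure(exc: Exception) -> str:
    """Prefer captured stderr when present; otherwise fall back to str(exc)."""
    return getattr(exc, "stderr", None) or str(exc)


cmd = ["docker", "restart", "demo"]  # never executed, used only for the message
failed = subprocess.CalledProcessError(1, cmd, stderr="No such container: demo")
timed_out = subprocess.TimeoutExpired(cmd, 60)  # stderr defaults to None

failed_msg = describe_failure(failed)
timeout_msg = describe_failure(timed_out)
print(failed_msg)
print(timeout_msg)
```

A `TimeoutExpired` raised before any output is captured carries `stderr=None`, so printing `e.stderr` directly would log `None`; the fallback yields the exception's own description instead.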

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

    # Wait for container to be healthy
    wait_for_container_health(container_name)

    # Restart container
    try:
        subprocess.run(
            ["docker", "restart", container_name],
            capture_output=True,
            text=True,
            check=True,
            timeout=60,
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as e:
        err = getattr(e, "stderr", None) or str(e)
        print(f"Failed to restart container {container_name}: {err}")
        raise

🤖 Prompt for AI Agents

In tests/e2e/utils/utils.py around lines 114 to 116, the call to wait_for_container_health(container_name) currently can hang indefinitely and any TimeoutExpired won't be raised because no timeout is provided; update the call to pass a reasonable timeout (e.g., timeout_seconds or a constant) and wrap the call in a try/except that catches TimeoutExpired (and optionally subprocess.TimeoutExpired) and re-raises a concise, informative exception (or raise RuntimeError) that includes the container name and that the container failed to become healthy within the timeout; ensure the timeout value is configurable or clearly documented.

coderabbitai bot reviewed Sep 3, 2025

add health check after restarting container

0a946cd

radofuchs force-pushed the Auth-w2w-test-hardening branch from dfcbed0 to 0a946cd Compare September 3, 2025 09:12

tisnik approved these changes Sep 3, 2025

coderabbitai bot reviewed Sep 3, 2025

tisnik merged commit fda810e into lightspeed-core:main Sep 3, 2025
19 checks passed

tisnik merged 1 commit into lightspeed-core:main from radofuchs:Auth-w2w-test-hardening
