feat: add shell tool by akihikokuroda · Pull Request #1107 · generative-computing/mellea

akihikokuroda · 2026-05-20T16:18:18Z

Pull Request

Issue

Fix #1024

Description

This PR does not address the issues raised in #1087.

Testing

Tests added to the respective file if code was changed
New code has 100% coverage if code was added
Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

AI coding assistants used: Claude Code

Adding a new component, requirement, sampling strategy, or tool?

If your PR adds or modifies one of the types below, check the matching box. A checklist of type-specific review items will be posted as a comment.

Component
Requirement
Sampling Strategy
Tool

NOTE: Please ensure you have an issue that has been acknowledged by a core contributor and routed you to open a pull request against this repository. Otherwise, please open an issue before continuing with this pull request.

github-actions · 2026-05-20T16:19:28Z

This comment is managed by a bot. Editing it is fine — checking off boxes, adding notes — but please leave the HTML comment marker on the first line alone, otherwise checklist updates will break.

Tool PR Checklist

Use this checklist when adding or modifying tools in mellea/stdlib/tools/.

Protocol Compliance

Ensure compatibility with existing backends and providers
- For most tools being added as functions, this means that calling convert_function_to_tool works

Integration

Tool exported in mellea/stdlib/tools/__init__.py or, if you are adding a library of tools, from your sub-module

planetf1

Overall the design is solid — subprocess.run(shell=False) as the foundation, shlex.split() for tokenisation, and working-dir restriction are the right primitives. Two confirmed bugs block merge; a third issue means the _bash_patterns.py test suite gives false assurance about what's actually blocked at runtime.
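A minimal sketch of how those three primitives fit together. It assumes a hypothetical helper named run_tokenised, which is not from the PR. shlex.split() honours quotes but never treats shell metacharacters as operators, and subprocess.run(shell=False) executes the resulting argv directly without a shell. A trailing "; rm -rf /" therefore becomes nothing more than extra literal arguments.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def run_tokenised(command: str, working_dir: Path, timeout: int = 60) -> subprocess.CompletedProcess:
    """Hypothetical helper: tokenise with shlex, then run without a shell."""
    argv = shlex.split(command)  # honours quotes; ';', '|', '&&' are not operators here
    return subprocess.run(
        argv,
        shell=False,  # no /bin/sh, so nothing in argv is interpreted by a shell
        cwd=working_dir,  # the child process starts in working_dir
        capture_output=True,
        text=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    # The would-be command separator stays glued to its word as a literal argument.
    print(shlex.split("echo hi; rm -rf /"))  # ['echo', 'hi;', 'rm', '-rf', '/']
    cmd = f"{shlex.quote(sys.executable)} -c 'import sys; print(sys.argv[1:])' 'a; b'"
    print(run_tokenised(cmd, Path.cwd()).stdout.strip())  # ['a; b']
```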

planetf1 · 2026-05-26T11:49:44Z

+            if cmd not in SAFE_WRAPPER_COMMANDS or arg_cmd not in (
+                "bash",
+                "sh",
+                "zsh",


env bash -c <payload> bypasses the denylist

The code-execution check (lines 176–182) only fires when argv[0] is the interpreter. When argv[0] is env, that path is never reached — and lines 225–231 explicitly allow shells as nested arguments inside safe wrappers, so -c is never re-checked.

_is_dangerous_command(["env", "bash", "-c", "id"])  # (False, '')
_is_dangerous_command(["timeout", "60", "bash", "-c", "id"])  # (False, '')

Fix: re-check for -c/-e on the nested interpreter, or restrict the shell allowlist here to script-file invocations only (no -c/-e in the remaining argv).
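A sketch of the first option under stated assumptions. The contents of SAFE_WRAPPER_COMMANDS and INTERPRETERS below are illustrative, and check_wrapped_interpreter is a hypothetical helper rather than the PR's code. The helper locates the first interpreter after a wrapper and rejects the call if any -c/-e style flag follows it. This is deliberately conservative: it also catches option clusters like -lc and wrapper options such as nice -n 10.

```python
import os

# Assumed contents for illustration; the PR's real sets may differ.
SAFE_WRAPPER_COMMANDS = {"env", "timeout", "nice", "nohup"}
INTERPRETERS = {"bash", "sh", "zsh", "dash", "python", "python3", "perl", "ruby", "node"}


def _is_inline_code_flag(token: str) -> bool:
    """True for -c/-e or a short-option cluster containing them (e.g. -lc, -xe).

    Note: bash's -e means errexit rather than inline code; blocking it anyway
    is a deliberate over-approximation.
    """
    if not token.startswith("-") or token.startswith("--"):
        return False
    return "c" in token[1:] or "e" in token[1:]


def check_wrapped_interpreter(argv: list[str]) -> tuple[bool, str]:
    """Flag a safe wrapper that launches an interpreter with inline code."""
    if not argv or os.path.basename(argv[0]) not in SAFE_WRAPPER_COMMANDS:
        return False, ""
    for i, token in enumerate(argv[1:], start=1):
        if os.path.basename(token) in INTERPRETERS:
            # Any -c/-e style flag after the interpreter counts as inline code,
            # even if it is really an argument to a script file.
            if any(_is_inline_code_flag(t) for t in argv[i + 1:]):
                return True, f"{argv[0]} launches {token} with inline code"
            return False, ""  # script-file invocation only
    return False, ""
```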

An additional check was added, along with tests.

Verified fixed — env/timeout/nice bash -c bypass is now blocked correctly. LGTM.

planetf1 · 2026-05-26T11:50:49Z

+        with self._storage_lock:
+            results = self._violations[:]
+
+        for violation in results:


Classic iterate-and-remove bug — filter results will be incomplete

results.remove() inside for violation in results causes Python's iterator to skip the element immediately after each removal. Filters will silently return the wrong set.

if session_id:
    results = [v for v in results if v.session_id == session_id]
if pattern:
    results = [v for v in results if v.pattern == pattern]
if category:
    results = [v for v in results if v.category == category]
if severity:
    results = [v for v in results if v.severity == severity]

planetf1 · 2026-05-26T11:52:43Z

+    Returns:
+        A tuple of (has_dangerous_paths, reason_message).
+    """
+    write_commands = {"rm", "touch", "cp", "mv", "mkdir", "mkfifo", "mknod", "tee"}


write_commands is missing several filesystem-write commands

chmod, chown, ln, and dd aren't in this set, so the dangerous-path checks never fire for them. A few things that currently slip through:

chmod 777 /etc/passwd

ln -sf /etc/passwd /allowed/path/link (symlink escape out of the allowed fence)

dd if=/dev/urandom of=/boot/vmlinuz

Same set is duplicated on line 415 — both need updating.

Additional write commands were added, along with handling for the "ln" command.

Verified fixed — WRITE_COMMANDS now covers chmod, chown, chgrp, ln, dd, install, truncate. All four cases blocked. LGTM.
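A minimal sketch of the consolidated check. It assumes a single module-level WRITE_COMMANDS constant shared by both call sites, plus a simplified operand parser; write_targets and outside_allowed are hypothetical helpers. It covers the three cases from the review. chmod's mode operand is skipped, dd is checked through its of= operand, and both ln operands are checked, so a link pointing outside the fence is rejected.

```python
from pathlib import Path

# Single shared constant so the two call sites cannot drift apart.
WRITE_COMMANDS = frozenset({
    "rm", "touch", "cp", "mv", "mkdir", "mkfifo", "mknod", "tee",
    "chmod", "chown", "chgrp", "ln", "dd", "install", "truncate",
})


def write_targets(argv: list[str]) -> list[str]:
    """Return the path operands a write command may modify (simplified parser)."""
    if not argv or Path(argv[0]).name not in WRITE_COMMANDS:
        return []
    cmd, args = Path(argv[0]).name, argv[1:]
    if cmd == "dd":
        # dd uses key=value operands; of= names the file it writes.
        return [a[len("of="):] for a in args if a.startswith("of=")]
    operands = [a for a in args if not a.startswith("-")]
    if cmd in {"chmod", "chown", "chgrp"}:
        operands = operands[1:]  # the first operand is a mode or owner, not a path
    # For ln, both the link source and the link name are returned, so a symlink
    # pointing outside the fence is caught as well as one created outside it.
    return operands


def outside_allowed(argv: list[str], allowed: Path) -> list[str]:
    """List write targets that resolve outside `allowed` (relative paths start from it)."""
    root = allowed.resolve()
    escaped = []
    for target in write_targets(argv):
        path = Path(target)
        if not path.is_absolute():
            path = root / path
        # resolve() follows any existing symlinks before the containment test.
        if not path.resolve().is_relative_to(root):
            escaped.append(target)
    return escaped
```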

planetf1 · 2026-05-26T11:54:03Z

+]
+
+
+def check_all_patterns(


This framework isn't connected to the execution path

shell.py only imports record_bash_violation from _bash_audit — check_all_patterns() is never called from bash_executor(). The tests in test_bash_guardrails.py all pass, but none of it affects whether a command is actually blocked at runtime.

If this is intended as the extensible pattern layer, wire check_all_patterns(argv) into _is_dangerous_command(). If it's documentation of the threat model, the disconnect from the execution path should be made explicit.

check_all_patterns is now wired into the execution path: _is_dangerous_command was updated to use the pattern framework.

Verified fixed — check_all_patterns is now imported and called at the top of _is_dangerous_command. LGTM.

planetf1 · 2026-05-26T11:56:13Z

+
+
+def example_3_with_working_dir() -> None:
+    """Example 3: Restrict write validation and execution cwd to a directory."""


Two functions are both labelled "Example 3" — example_3_llm_with_forced_tool_use (line 85) already has that number. This one should be Example 4, with the subsequent examples renumbered.

akihikokuroda · 2026-06-04T17:01:45Z

After investigating, sharing CapabilityPolicy may not be a good fit, so I propose a separate policy for shell execution.

from dataclasses import dataclass
from pathlib import Path
from typing import ClassVar


@dataclass
class BashExecutionPolicy:
    """Execution constraints for bash commands.

    Args:
        timeout (int): Max seconds before subprocess.run timeout (enforced).
        stdout_max_bytes (int | None): Truncate stdout to this size (enforced).
        stderr_max_bytes (int | None): Truncate stderr to this size (enforced).
        allowed_paths (list[Path] | None): Write allowlist for filesystem safety check.
        working_dir (Path | None): Host directory for command execution.
    """

    timeout: int = 60
    stdout_max_bytes: int | None = 10 * 1024  # 10KB (matches current default)
    stderr_max_bytes: int | None = 10 * 1024
    allowed_paths: list[Path] | None = None
    working_dir: Path | None = None

    ENFORCED_timeout: ClassVar[bool] = True
    ENFORCED_stdout_max_bytes: ClassVar[bool] = True
    ENFORCED_stderr_max_bytes: ClassVar[bool] = True
    ENFORCED_allowed_paths: ClassVar[bool] = False  # Denylist override
    ENFORCED_working_dir: ClassVar[bool] = False  # Declarative

Benefits:

Minimal, focused on enforced constraints
Maintains the ENFORCED_* pattern for transparency
Parallel to Python's CapabilityPolicy without forcing a bad fit
Users understand it's bash-specific, not a general execution policy

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

planetf1

Second pass. Clarification replies added to the A, C, and D threads — none of those claimed fixes are reflected in the current code. Inline comments below cover the two unaddressed findings without author replies (B and E) and one new issue surfaced by this review.

planetf1 · 2026-06-04T18:28:15Z

+        results = [
+            v
+            for v in results
+            if (session_id is None or v.session_id == session_id)
+            and (pattern is None or v.pattern == pattern)
+            and (category is None or v.category == category)
+            and (severity is None or v.severity == severity)
+        ]
+


Verified fixed — filter is now a list comprehension, returns correct counts. LGTM.

planetf1 · 2026-06-04T18:28:15Z

+        print(f"  Error: {exec_result.stderr}")
+    print()
+
+


Verified fixed — renamed to example_4_with_working_dir with correct numbering; __main__ calls the LLM example. LGTM.

Would you check this comment? There may be some confusion.

planetf1 · 2026-06-04T18:28:15Z

+        is_dangerous, reason = pattern.check(argv)
+        if is_dangerous:
+            pattern_name = type(pattern).__name__
+            category = getattr(pattern, "category", "unknown")


New finding: even if check_all_patterns were wired into the execution path, every violation would still be logged with category='unknown' and severity='MEDIUM'. The function reads these via getattr(pattern, 'category', 'unknown') and getattr(pattern, 'severity', 'MEDIUM'), but no concrete pattern class defines category or severity attributes — those live in _bash_guardrails.COMMAND_RULES, not on the pattern objects themselves.

To fix this, either add category and severity class attributes to each concrete pattern (populated from COMMAND_RULES), or return them as part of a richer result object from check() instead of a plain tuple[bool, str].

Verified fixed in the latest commit — each pattern class now declares category and severity as class attributes, so the getattr defaults never fire. Violations are logged with correct metadata. LGTM.

Correction to my previous reply — on closer inspection this is only partially fixed. The class-level attributes prevent the getattr defaults, but they apply a blanket value per pattern class rather than per command. For example DangerousCommandPattern hardcodes severity='HIGH', so a sudo attempt (which COMMAND_RULES defines as CRITICAL/PRIVILEGE_ESCALATION) gets logged as HIGH/dangerous_command. The granular data is already in COMMAND_RULES — the fix is to look up the matched command there rather than using a class-level constant.
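A sketch of the suggested lookup. It assumes COMMAND_RULES maps a command name to a dict with category and severity keys; the real structure may differ. Only the sudo values come from this thread; the other entries and the metadata_for method are illustrative. The class-level constants remain as a fallback for commands that have no rule.

```python
import os

# Assumed shape: command name -> {"category": ..., "severity": ...}.
# The sudo entry mirrors the values quoted in the review; the others are illustrative.
COMMAND_RULES = {
    "sudo": {"category": "PRIVILEGE_ESCALATION", "severity": "CRITICAL"},
    "su": {"category": "PRIVILEGE_ESCALATION", "severity": "CRITICAL"},
    "shutdown": {"category": "SYSTEM_CONTROL", "severity": "HIGH"},
}


class DangerousCommandPattern:
    # Class-level fallback, used only when the command has no entry in COMMAND_RULES.
    category = "dangerous_command"
    severity = "HIGH"

    def check(self, argv: list[str]) -> tuple[bool, str]:
        if argv and os.path.basename(argv[0]) in COMMAND_RULES:
            return True, f"{argv[0]} is not allowed"
        return False, ""

    def metadata_for(self, argv: list[str]) -> tuple[str, str]:
        """Look up per-command (category, severity), falling back to the class constants."""
        rule = COMMAND_RULES.get(os.path.basename(argv[0]), {}) if argv else {}
        return rule.get("category", self.category), rule.get("severity", self.severity)
```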

Thanks, I've made the changes.

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

akihikokuroda requested a review from a team as a code owner May 20, 2026 16:18

akihikokuroda requested review from markstur and nrfulton May 20, 2026 16:18

github-actions Bot added the enhancement New feature or request label May 20, 2026

planetf1 requested changes May 26, 2026

planetf1 reviewed May 26, 2026

akihikokuroda requested a review from planetf1 May 26, 2026 15:38

akihikokuroda added 17 commits June 4, 2026 13:07

shell executor

2e49ae2

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

shell executor

de4775c

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

update an example

11ed4f4

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

review comments

d6e53c2

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

review comments

6994227

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

review comments

06390ef

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

fix CI issue

9ec3dba

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

review comments

47d56ec

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

review comments

9573819

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

review comment

86c704a

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

review comment

1b401f8

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

review comment

e5f1208

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

review comments

ae35cfb

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

Local by Default + Opt-In Sandboxing

e3300f0

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

bash guardrails

3f2d61c

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

add bash audit

fd95ed5

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

remove unsafe_local_bash_executor() references

414701c

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

akihikokuroda added 3 commits June 4, 2026 13:15

remove sandbox=true

7ed72b2

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

review comments

3f9da6e

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

rebase main

c9494f7

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

akihikokuroda force-pushed the shell branch from b984cdd to c9494f7 June 4, 2026 17:27

planetf1 requested changes Jun 4, 2026

akihikokuroda added 2 commits June 4, 2026 17:04

review comments

66cac4c

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

review comment

38099e8

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

akihikokuroda requested a review from planetf1 June 4, 2026 22:07

2 participants