Handing Your Filesystem to an Agent Using a Local Shell

February 19, 2026 · By Jay Wengrow

In the previous post, we explored the idea of having our agent write and execute its own code inside a shell. There, we specifically used a shell hosted by OpenAI. This sandboxes our code execution so that the agent can't do anything bad to our computer.

However, there are times when you'll want to give your agent access to your local filesystem. Say, for example, that we're building an agent that helps a user navigate their own computer. The user could ask the agent to help locate a specific file, or to find large files that are taking up too much space on the hard drive.

Of course, this wades into DANGEROUS territory. What if the agent decides on its own to clear up space on your computer and delete important files? I guess I'm about to risk everything just to create this blog post. FOLLOW ALONG AT YOUR OWN RISK.

The Local Shell "Tool"

As with the hosted shell, OpenAI treats the local shell as a tool the agent can call. We can include the local shell in our tool schema like so:

TOOLS = [
    {"type": "shell", "environment": {"type": "local"}}
]

Here, the local shell is the agent's only tool.

If the agent decides it wants to see all the files in the current directory, the LLM will output something like this:

ResponseFunctionShellToolCall(
    id='sh_0fd5660d0fcad219c96620bb6d0037088',
    action=Action(
        commands=['pwd', 'ls -la'],
        max_output_length=None,
        timeout_ms=10000),
    call_id='call_nqYUo07aoxyt893rvOxBK8',
    environment=None,
    status='completed',
    type='shell_call',
    created_by=None)

The real action inside this ResponseFunctionShellToolCall is the list of shell commands: ['pwd', 'ls -la']. This indicates that the agent wants to run both of these commands back to back.
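What "back to back" means in practice depends on how the commands get executed. If they are joined into a single newline-separated script and handed to one shell (which is what the code in this post does), a few behaviors follow. The sketch below uses a hypothetical run_script helper built on subprocess.run as a stand-in for the executor, and assumes a POSIX sh:

```python
import subprocess


def run_script(commands: list[str]) -> subprocess.CompletedProcess:
    # Hypothetical helper: one command per line, all fed to ONE shell process.
    script = "\n".join(commands)
    return subprocess.run(script, shell=True, capture_output=True, text=True, timeout=10)


# Shell state (like the working directory) carries over from one line to the next.
moved = run_script(["cd /tmp", "pwd"])

# A failing command does NOT stop later ones; the exit status is the last command's.
kept_going = run_script(["false", "echo still runs"])

# Prepending `set -e` makes the shell stop at the first failing command instead.
stopped = run_script(["set -e", "false", "echo skipped"])

print(repr(moved.stdout), repr(kept_going.stdout), kept_going.returncode, stopped.returncode)
```

So a returncode of 0 only tells you the last command succeeded; the model has to read stderr to notice that an earlier command failed.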

Also note that the call's type is shell_call. This is important for the agent code we'll write below.

Now, with the hosted shell, OpenAI automatically ran these commands for us. However, with the local shell, we need to execute these commands ourselves. To do this, I adapted some code from the OpenAI local shell docs which can execute a string of shell commands:

import subprocess

# Holds everything we learn from running one shell script.
class CmdResult:
    def __init__(self, stdout, stderr, returncode, timed_out):
        self.stdout = stdout
        self.stderr = stderr
        self.returncode = returncode
        self.timed_out = timed_out

class ShellExecutor:
    def __init__(self, default_timeout: float = 60):
        self.default_timeout = default_timeout

    def run(self, cmd: str, timeout: float | None = None) -> CmdResult:
        t = timeout or self.default_timeout
        # shell=True lets us pass a whole multi-line script as one string.
        p = subprocess.Popen(
            cmd,
            shell=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
        try:
            out, err = p.communicate(timeout=t)
            return CmdResult(out, err, p.returncode, False)
        except subprocess.TimeoutExpired:
            # Kill the process, then collect whatever output it produced.
            p.kill()
            out, err = p.communicate()
            return CmdResult(out, err, p.returncode, True)

Here's how we can use this thing to process the command list ['pwd', 'ls -la']:

commands = ['pwd', 'ls -la']

# convert list of commands into string with each command separated by a newline:
shell_script = "\n".join(commands)

# Execute the commands in the shell:
executor = ShellExecutor()
result = executor.run(shell_script)

The result variable now holds an instance of CmdResult, which stores the important data about running our shell script: stdout, stderr, returncode, and timed_out.

The stdout is particularly important, as it contains the output of the shell commands. When I tested the ShellExecutor in my current directory, the stdout looked like this:

"stdout": "/app\\ntotal 168\\ndrwxr-xr-x 12 root root    384 Feb 19 19:44 .\\ndrwxr-xr-x  1 root root   4096 Feb 19 19:44 ..\\n-rw-r--r--  1 root root   1045 Feb 12 19:03 .env\\n-rw-rw-r--  1 root root    250 Jan  7 22:08 Dockerfile\\ndrwxr-xr-x  8 root root    256 Feb 12 20:05 architecting_agentic_workflows\\ndrwxr-xr-x  3 root root     96 Feb 17 19:39 blog\\n-rw-rw-r--  1 root root    221 Jan  6 17:29 chatbot_00.py\\n-rw-rw-r--  1 root root     93 Jan  7 22:11 docker-compose.yml\\ndrwxr-xr-x  3 root root     96 Feb 19 19:44 old_stuff\\n-rw-r--r--  1 root root    381 Jan 19 19:03 pyproject.toml\\ndrwxr-xr-x 10 root root    320 Feb 12 19:22 running_the_agent_loop\\n-rw-r--r--  1 root root 145730 Jan 19 19:03 uv.lock\\n

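First, we can see the result of pwd, namely that the current directory is /app. Everything after that was produced by the ls -la command. Note that its first line, total 168, is not a file count: it's the total disk space used by the listed entries, measured in blocks (1 KiB each by default in GNU ls). The rest is the list of the directory's contents: ten entries, plus the . and .. links to the directory itself and its parent.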
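To make the structure of this output concrete, here's a small sketch of a hypothetical parse_ls_la function that turns the stdout above (with its escaped \n sequences turned back into real newlines) into entries. It assumes the usual nine-column ls -la layout and skips the pwd line, the total line, and the . and .. entries. Symlink lines (name -> target) would need extra handling that this sketch doesn't do.

```python
LS_OUTPUT = """/app
total 168
drwxr-xr-x 12 root root    384 Feb 19 19:44 .
drwxr-xr-x  1 root root   4096 Feb 19 19:44 ..
-rw-r--r--  1 root root   1045 Feb 12 19:03 .env
-rw-rw-r--  1 root root    250 Jan  7 22:08 Dockerfile
drwxr-xr-x  8 root root    256 Feb 12 20:05 architecting_agentic_workflows
drwxr-xr-x  3 root root     96 Feb 17 19:39 blog
-rw-rw-r--  1 root root    221 Jan  6 17:29 chatbot_00.py
-rw-rw-r--  1 root root     93 Jan  7 22:11 docker-compose.yml
drwxr-xr-x  3 root root     96 Feb 19 19:44 old_stuff
-rw-r--r--  1 root root    381 Jan 19 19:03 pyproject.toml
drwxr-xr-x 10 root root    320 Feb 12 19:22 running_the_agent_loop
-rw-r--r--  1 root root 145730 Jan 19 19:03 uv.lock
"""


def parse_ls_la(stdout: str) -> list[dict]:
    entries = []
    for line in stdout.splitlines():
        # perms, links, owner, group, size, month, day, time, name
        fields = line.split(maxsplit=8)
        # Skip the `pwd` line, the "total" line, and anything else that isn't a listing row.
        if len(fields) < 9 or fields[0] == "total":
            continue
        name = fields[8]
        if name in (".", ".."):
            continue
        entries.append({"name": name, "is_dir": fields[0].startswith("d"), "size": int(fields[4])})
    return entries


entries = parse_ls_la(LS_OUTPUT)
files = [e for e in entries if not e["is_dir"]]
largest = max(files, key=lambda e: e["size"])
print(len(entries), largest["name"], largest["size"])
```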

A Local-Shell Agent

Let's now put this all together to create an agent that helps a user navigate their own filesystem:

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
import subprocess

load_dotenv()

llm = OpenAI()

TOOLS = [
    {"type": "shell", "environment": {"type": "local"}}
]

class CmdResult:
    def __init__(self, stdout, stderr, returncode, timed_out):
        self.stdout = stdout
        self.stderr = stderr
        self.returncode = returncode
        self.timed_out = timed_out

class ShellExecutor:
    def __init__(self, default_timeout: float = 60):
        self.default_timeout = default_timeout

    def run(self, cmd: str, timeout: float | None = None) -> CmdResult:
        t = timeout or self.default_timeout
        p = subprocess.Popen(
            cmd,
            shell=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
        try:
            out, err = p.communicate(timeout=t)
            return CmdResult(out, err, p.returncode, False)
        except subprocess.TimeoutExpired:
            p.kill()
            out, err = p.communicate()
            return CmdResult(out, err, p.returncode, True)

def llm_response(history):
    response = llm.responses.create(
        model="gpt-5.2",
        input=history,
        tools=TOOLS
    )
    return response

def agent_loop(history):
    while True:
        response = llm_response(history)
        history += response.output

        tool_calls = [obj for obj in response.output if getattr(obj, "type", None) == "function_call"]
        shell_calls = [obj for obj in response.output if getattr(obj, "type", None) == "shell_call"]

        if not (tool_calls or shell_calls):
            break

        # for tool_call in tool_calls:
            # placeholder

        for shell_call in shell_calls:
            shell_script = "\n".join(shell_call.action.commands)
            executor = ShellExecutor()
            result = executor.run(shell_script)
            history += [{"type": "local_shell_call_output",
                        "call_id": shell_call.call_id,
                        "output": json.dumps(result.__dict__)}]
    return response

def system_prompt():
    return """You are a friendly AI assistant. You focus on helping users navigate their
    own filesystem."""

assistant_message = "How can I help?"
user_input = input(f"\nAssistant: {assistant_message}\n\nUser: ")

history = [
    {"role": "developer", "content": system_prompt()},
    {"role": "assistant", "content": assistant_message},
    {"role": "user", "content": user_input}
]

while user_input != "exit":
    response = agent_loop(history)
    print(f"\nAssistant: {response.output_text}")
    user_input = input("\nUser: ")
    history += [{"role": "user", "content": user_input}]

print("****HISTORY****")
print(history)

You can see that we treat shell tool calls quite similarly to standard tool calls. When we see that the agent outputs a response type of shell_call, we use our ShellExecutor to execute the shell script within the response.

(Since we don't have any tools other than the shell tool, I've left some placeholder code to accommodate custom tools we might add in the future.)

We then wrap the script's result in an object whose type is local_shell_call_output, serializing all the fields of the CmdResult (stdout, stderr, returncode, and timed_out) to JSON in its output field. We append this output to the conversation history so the agent can see the result of the shell script execution.
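The agent loop ignores two hints the model can include in a call's action: timeout_ms (10000 in the first call we looked at) and max_output_length. Here's a hedged sketch of a hypothetical shell_call_to_output helper that honors them while building the same local_shell_call_output item the loop appends. How max_output_length is meant to be applied is my assumption; this sketch simply truncates stdout and stderr to that many characters. The executor only needs a run(cmd, timeout=...) method, so the post's ShellExecutor fits, but the demo uses SimpleNamespace stand-ins and a FakeExecutor so it runs without an API key.

```python
import json
from types import SimpleNamespace


def shell_call_to_output(shell_call, executor) -> dict:
    action = shell_call.action
    # Assumption: timeout_ms is a hint in milliseconds; None lets the executor use its default.
    timeout = action.timeout_ms / 1000 if action.timeout_ms else None
    result = executor.run("\n".join(action.commands), timeout=timeout)
    payload = dict(vars(result))  # copy of stdout/stderr/returncode/timed_out
    # Assumption: max_output_length caps the characters we send back per stream.
    limit = action.max_output_length
    if limit:
        payload["stdout"] = payload["stdout"][:limit]
        payload["stderr"] = payload["stderr"][:limit]
    return {
        "type": "local_shell_call_output",  # same item type the post's loop uses
        "call_id": shell_call.call_id,
        "output": json.dumps(payload),
    }


class FakeExecutor:
    # Hypothetical stand-in that records what it was asked to run.
    def run(self, cmd, timeout=None):
        self.last = (cmd, timeout)
        return SimpleNamespace(stdout="/app\n" + "x" * 100, stderr="", returncode=0, timed_out=False)


call = SimpleNamespace(
    call_id="call_123",
    action=SimpleNamespace(commands=["pwd", "ls -la"], timeout_ms=10000, max_output_length=10),
)
executor = FakeExecutor()
item = shell_call_to_output(call, executor)
print(executor.last, item["output"])
```

In the agent loop, the body of the `for shell_call in shell_calls:` loop would then shrink to something like `history += [shell_call_to_output(shell_call, ShellExecutor())]`.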

Here are some sample conversations I had with this agent:

Assistant: How can I help?

User: what's the largest file in the current directory?

A: The largest file in the current directory (`/app`) is **`uv.lock`**, size **143K**.

Under the hood, here was the shell tool call and output:

ResponseFunctionShellToolCall = {
    "id": "sh_044aeea028a2920955f196dd3c1ecd7",
    "type": "shell_call",
    "status": "completed",
    "call_id": "call_F6cHxTCCkfn6RwBeOKL",
    "created_by": None,
    "environment": None,
    "action": {
        "commands": [
            "pwd",
            "ls -lahS | head -n 20",
        ],
        "max_output_length": None,
        "timeout_ms": None,
    },
}

ShellCallOutput = {
    "type": "local_shell_call_output",
    "call_id": "call_F6cHxTCCkfn6RwBeOKL",
    "stdout": (
        "/app\n"
        "total 168K\n"
        "-rw-r--r--  1 root root 143K Jan 19 19:03 uv.lock\n"
        "drwxr-xr-x  1 root root 4.0K Feb 19 19:44 ..\n"
        "-rw-r--r--  1 root root 1.1K Feb 12 19:03 .env\n"
        "drwxr-xr-x 12 root root  384 Feb 19 19:44 .\n"
        "-rw-r--r--  1 root root  381 Jan 19 19:03 pyproject.toml\n"
        "drwxr-xr-x 10 root root  320 Feb 12 19:22 running_the_agent_loop\n"
        "drwxr-xr-x  8 root root  256 Feb 12 20:05 architecting_agentic_workflows\n"
        "-rw-rw-r--  1 root root  250 Jan  7 22:08 Dockerfile\n"
        "-rw-rw-r--  1 root root  221 Jan  6 17:29 chatbot_00.py\n"
        "drwxr-xr-x  3 root root   96 Feb 17 19:39 blog\n"
        "drwxr-xr-x  3 root root   96 Feb 19 19:44 old_stuff\n"
        "-rw-rw-r--  1 root root   93 Jan  7 22:11 docker-compose.yml\n"
    ),
    "stderr": "",
    "returncode": 0,
    "timed_out": False,
}
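The agent reported uv.lock as 143K, while the plain ls -la listing gave its size as 145730 bytes. That's because of the -h flag: as far as I know, GNU ls -h divides by powers of 1024 and rounds up, which is also why the 1045-byte .env shows as 1.1K instead of 1.0K. Here's a hypothetical human_size helper that approximates that formatting (one decimal place below 10, whole numbers above), checked against sizes from the listings in this post:

```python
import math

UNITS = ["K", "M", "G", "T", "P"]


def human_size(size: int) -> str:
    """Approximate GNU `ls -h`: base 1024, always rounding UP."""
    if size < 1024:
        return str(size)
    value = float(size)
    for unit in UNITS:
        value /= 1024
        if value < 10:
            tenths = math.ceil(value * 10)  # round up to one decimal place
            if tenths < 100:
                return f"{tenths / 10:.1f}{unit}"
            return f"10{unit}"
        whole = math.ceil(value)  # round up to a whole number
        if whole < 1024:
            return f"{whole}{unit}"
    return f"{math.ceil(value)}{UNITS[-1]}"


for n in (381, 1045, 4096, 145730, 1205725):
    print(n, human_size(n))
```

The last size, 1205725 bytes, is the headshot found in the next conversation, which the agent describes as "about 1.2 MB".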

I then tried a more interesting conversation:

User: I'm pretty sure I have a headshot picture somewhere on my computer, but I can't find it. Can you help me?

A: Yes. If you tell me roughly where it might be (e.g., your home folder, Downloads, Pictures, Desktop), I can search for likely image files and narrow it down.

In the meantime, I can start by scanning this project directory (`/app`) for common image types. Do you want me to search:

1) **Just here (`/app`)**, or
2) **Some other folder** (tell me the path), or
3) **Broadly** (e.g., your whole home directory—slower and may return lots of results)?

Also, any clues help: filename (even partial), approximate date, file type (jpg/png/heic), or whether it’s a LinkedIn-style headshot.

User: I have no idea - just help me

A: Found one likely match: **`/app/old_stuff/jay_headshot_2.jpg`** (about **1.2 MB**).

Here's the shell tool call and output:

ResponseFunctionShellToolCall = {
    "id": "sh_044aeea028a292f40069976893d20881a0a6bc0141514c8def",
    "type": "shell_call",
    "status": "completed",
    "call_id": "call_lK7Z6BFZHO6s97LGwAyJ92SK",
    "created_by": None,
    "environment": None,
    "action": {
        "commands": [
            "ls -la",
            "find /app -type f \\( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' -o -iname '*.heic' -o -iname '*.webp' \\) "
            "-printf '%s\\t%p\\n' 2>/dev/null | sort -nr | head -n 50",
        ],
        "max_output_length": None,
        "timeout_ms": None,
    },
}

ShellCallOutput = {
    "type": "local_shell_call_output",
    "call_id": "call_lK7Z6BFZHO6s97LGwAyJ92SK",
    "stdout": (
        "total 168\n"
        "drwxr-xr-x 12 root root    384 Feb 19 19:44 .\n"
        "drwxr-xr-x  1 root root   4096 Feb 19 19:44 ..\n"
        "-rw-r--r--  1 root root   1045 Feb 12 19:03 .env\n"
        "-rw-rw-r--  1 root root    250 Jan  7 22:08 Dockerfile\n"
        "drwxr-xr-x  8 root root    256 Feb 12 20:05 architecting_agentic_workflows\n"
        "drwxr-xr-x  3 root root     96 Feb 17 19:39 blog\n"
        "-rw-rw-r--  1 root root    221 Jan  6 17:29 chatbot_00.py\n"
        "-rw-rw-r--  1 root root     93 Jan  7 22:11 docker-compose.yml\n"
        "drwxr-xr-x  3 root root     96 Feb 19 19:44 old_stuff\n"
        "-rw-r--r--  1 root root    381 Jan 19 19:03 pyproject.toml\n"
        "drwxr-xr-x 10 root root    320 Feb 12 19:22 running_the_agent_loop\n"
        "-rw-r--r--  1 root root 145730 Jan 19 19:03 uv.lock\n"
        "1205725\t/app/old_stuff/jay_headshot_2.jpg\n"
    ),
    "stderr": "",
    "returncode": 0,
    "timed_out": False,
}
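The agent's find pipeline packs a lot into one line: match regular files whose names end in one of five image extensions (case-insensitively, via -iname), print each file's size and path, silence permission errors, sort by size descending, and keep the top 50. To unpack it, here's a rough Python analogue written as a hypothetical find_images function. It's an illustration of what the command does, not something the agent runs, and it's exercised on a throwaway temporary directory:

```python
import os
import stat
import tempfile

IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".heic", ".webp")


def find_images(root: str, limit: int = 50) -> list[tuple[int, str]]:
    matches = []
    # os.walk skips directories it can't read by default, much like 2>/dev/null.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # -iname '*.jpg' -o ...: case-insensitive suffix match
            if not name.lower().endswith(IMAGE_SUFFIXES):
                continue
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: don't follow symlinks
            except OSError:
                continue
            if stat.S_ISREG(st.st_mode):  # -type f: regular files only
                matches.append((st.st_size, path))  # -printf '%s\t%p\n'
    matches.sort(reverse=True)  # sort -nr: largest first
    return matches[:limit]  # head -n 50


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "old_stuff"))
    for rel, size in [("a.JPG", 3), ("notes.txt", 50),
                      ("old_stuff/headshot.png", 10), ("old_stuff/c.webp", 5)]:
        with open(os.path.join(root, rel), "wb") as f:
            f.write(b"x" * size)
    found = [(size, os.path.relpath(path, root)) for size, path in find_images(root)]

print(found)
```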

Although the above examples only have the agent read the filesystem, there's nothing stopping us from having the agent write to the filesystem as well. Again, here we enter treacherous terrain.

At the same time, the ability to both read and write to the filesystem would let an agent like this do a lot more. In fact, we pretty much have all the pieces in place to make a coding agent, but I'll save that for a future post.
