Using an AI Assistant to Script Tools
December 8, 2025 · 681 words · 4 min
LLMs are now quite good at transforming data. For example, we were recently working with output from a linting tool that generates big arrays of code violations, one entry per issue it finds.

During this session with our AI assistant, we decided it would be helpful to create a database and insert the data into it, to make it easier for the AI to analyze (LLMs are very good at writing SQL). As is now our habit, we wrote a quick prompt to see whether the assistant could generate the SQL. LLMs are obviously good at tasks of this kind, and this was no exception. Our prompt engine had been augmented with a function to read local files but, beyond that, this was a pretty straightforward prompt (we used GPT-3.5). The LLM responded with the correct INSERT statements.

We're starting to get accustomed to this kind of capability, so the result wasn't too surprising. However, what about the context window? It really doesn't make sense to pass all of this data to the LLM, especially if this is a task we'll need to do continuously. It's also not how a programmer would have solved the problem: programmers write programs. So, instead of asking the LLM to do a thing, we should try asking it to write a program that does that same thing.

Starting with the same prompt, let's prefix it with "Please write a JavaScript program to …". In other words, let's ask the LLM to describe how it would accomplish the task in JavaScript and, hopefully, automate itself out of the loop. Current LLMs can do tasks of this kind, and GPT-3.5 responded with a complete program.
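The sketch below gives a sense of what such a program looks like. The details are illustrative assumptions rather than the literal output from the session: it assumes the violations sit in a local `violations.json` file, that each entry has `path`, `line`, `symbol`, and `message` fields, and that the `better-sqlite3` package is available in the runtime.

```javascript
// Sketch of the kind of script the LLM generates: read the linter's
// violations from a JSON file and load them into a SQLite database.
// Assumptions: a local violations.json input and the better-sqlite3 package.
const fs = require("fs");
const Database = require("better-sqlite3");

// Each entry is assumed to look roughly like:
// { "path": "src/app.py", "line": 42, "symbol": "unused-import", "message": "Unused import os" }
const violations = JSON.parse(fs.readFileSync("violations.json", "utf8"));

const db = new Database("violations.db");
db.exec(`CREATE TABLE IF NOT EXISTS violations (
  path    TEXT,
  line    INTEGER,
  symbol  TEXT,
  message TEXT
)`);

const insert = db.prepare(
  "INSERT INTO violations (path, line, symbol, message) VALUES (?, ?, ?, ?)"
);

// A single transaction keeps large arrays fast to load.
const insertAll = db.transaction((rows) => {
  for (const row of rows) {
    insert.run(row.path, row.line, row.symbol, row.message);
  }
});

insertAll(violations);
console.log(`Inserted ${violations.length} violations into violations.db`);
```

Once the data is in `violations.db`, the assistant can answer questions about it by writing small SQL queries instead of rereading the raw array on every turn.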
A quick scan will convince many of you that this is probably going to do the trick. And if our prompt engine is already running in Docker, we might as well just run it. You may not have noticed, but at the end of our prompt we added one final instruction: "Now execute the JavaScript code in a container". This is a nice addition to the session, because it means we get to see the results.

This is also where tool calling comes back into the picture. To give our AI the capacity to try running the program it has just written, we defined a new function that creates an isolated runtime sandbox for trying out our new tool. The agent sees it as one more tool definition.
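Expressed as a function declaration the model can call, the definition looks roughly like the sketch below; the function name, parameter, and container image are illustrative choices, not the exact definition from our prompt engine.

```javascript
// Sketch of a tool definition the agent can call to execute generated code.
// The name, parameter, and container image are illustrative.
const runJavascriptSandbox = {
  name: "run-javascript-sandbox",
  description:
    "Run a JavaScript program inside an isolated container and return its output.",
  parameters: {
    type: "object",
    properties: {
      javascript: {
        type: "string",
        description: "The JavaScript source code to execute.",
      },
    },
    required: ["javascript"],
  },
};

// The backing function only needs a small runtime: write the source to a
// file and run it in a minimal Node.js container, for example
//   docker run --rm -v "$PWD":/work -w /work node:alpine node program.js
```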
We've asked the AI assistant to generate a tool from a description of that tool. As long as the tool's description doesn't change, the workflow won't have to go back to the AI to ask it to build a new version. Docker's role in this pattern is to provide the sandbox the generated code runs in; the function really doesn't need much of a runtime, so we give it a pretty small one.

The ability of one tool to create another tool is not just a trick. It has very practical implications for the kinds of workflows we can build, because it gives us a way to control the volume of data sent to LLMs, and it gives the assistant a way to "automate" itself out of the loop.

This example was a bit abstract, but in our next post we'll describe the practical scenarios that have driven us to look at this idea of prompts generating new tools. Most of the workflows we're exploring still use off-the-shelf tools like Pylint, SQLite, and tree_sitter (which we embed using Docker, of course!). However, you'll also see that part of being able to author workflows of this kind is recognizing when you just need to add a custom tool to the mix.

Read the rest of the series to see more of what we've been working on.