Exploring AI for Non-Trivial Software Development
By Brian Tol
There’s a lot of noise around AI and software development these days. The internet is awash with demos of AI assistants writing todo apps and implementing basic algorithms. But what happens when we push beyond these toy examples? Can AI tools actually help with substantial software projects? These questions have been rolling around in my head for months, and I decided it was time to put them to the test.
Most developers I talk to are skeptical of AI coding tools, and with good reason. They’ve tried them, watched them produce nonsensical code, and concluded they’re just not ready for real work. The complaints are consistent: the code is poorly structured, full of basic errors, and often just plain wrong. But I’ve started wondering if we’re looking at this the wrong way. Maybe the issue isn’t with the AI tools themselves, but with how we’re using them.
Background
I needed a project that would push these tools beyond trivial examples while staying within reasonable bounds. My choice? Implementing the Gremlin API on top of SQLite, effectively turning it into a graph database. It’s complex enough to be interesting but not so novel that an AI would be completely lost. The core technologies—SQLite, Python, and Gremlin—are mature and well-documented, making them likely candidates for solid representation in the training data.
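To make that concrete, here's a minimal sketch of how a property graph can be laid out on top of SQLite. It's illustrative only: the table names, JSON-encoded properties, and indexes are my assumptions for this post, not the schema the project actually uses.

```python
import sqlite3

# Hypothetical layout: one table for vertices, one for directed edges.
# Properties are stored as JSON text for simplicity; a real implementation
# might normalize them or tune the indexes to its most common traversals.
conn = sqlite3.connect("graph.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS vertices (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL,                 -- e.g. 'person'
    props TEXT NOT NULL DEFAULT '{}'     -- JSON blob of properties
);

CREATE TABLE IF NOT EXISTS edges (
    id     INTEGER PRIMARY KEY,
    label  TEXT NOT NULL,                -- e.g. 'knows'
    out_id INTEGER NOT NULL REFERENCES vertices(id),
    in_id  INTEGER NOT NULL REFERENCES vertices(id),
    props  TEXT NOT NULL DEFAULT '{}'
);

CREATE INDEX IF NOT EXISTS idx_edges_out ON edges(out_id, label);
CREATE INDEX IF NOT EXISTS idx_edges_in  ON edges(in_id, label);
""")
conn.commit()
```

Two tables and a couple of covering indexes are enough to start walking a graph; the real work is translating Gremlin's traversal steps into queries against them.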
For those unfamiliar with the space, graph databases are specialized systems designed to store and process interconnected data. Unlike traditional databases that excel at storing rows and columns, graph databases are built for handling complex relationships—think social networks, recommendation engines, or fraud detection systems. Gremlin is a graph traversal language that lets developers write queries to navigate these relationships. It’s like SQL for graphs, but instead of thinking in tables, you think in vertices (nodes) and edges (connections). The language is powerful but complex, with features for everything from simple “find all friends of friends” queries to sophisticated path-finding algorithms.
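For a flavor of the language, here is roughly what a "friends of friends" traversal looks like in Gremlin's Python syntax, next to the kind of SQL it might translate to against a vertex/edge layout like the one above. This uses the standard gremlinpython client pointed at a hypothetical server endpoint purely for illustration; it is not the API of my implementation.

```python
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Illustration only: a traversal source backed by a stock Gremlin server.
# (The endpoint is hypothetical; this is not my SQLite implementation's API.)
g = traversal().withRemote(
    DriverRemoteConnection("ws://localhost:8182/gremlin", "g")
)

# "Friends of friends": start at Alice, follow 'knows' edges two hops out,
# drop duplicates, and return the resulting names.
friends_of_friends = (
    g.V().has("person", "name", "Alice")
         .out("knows")
         .out("knows")
         .dedup()
         .values("name")
         .toList()
)

# Roughly the same question asked in SQL against a vertex/edge layout
# like the one sketched above -- workable, but the intent is buried in joins.
FRIENDS_OF_FRIENDS_SQL = """
SELECT DISTINCT json_extract(v2.props, '$.name') AS name
FROM vertices alice
JOIN edges e1    ON e1.out_id = alice.id AND e1.label = 'knows'
JOIN edges e2    ON e2.out_id = e1.in_id AND e2.label = 'knows'
JOIN vertices v2 ON v2.id = e2.in_id
WHERE json_extract(alice.props, '$.name') = 'Alice';
"""
```

The Gremlin version reads like the question being asked, while the SQL version hides the same intent inside joins—which is a big part of the appeal of putting a traversal language on top of SQLite.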
This wasn’t just an academic exercise for me. I’ve been fascinated by graph databases for years, and the idea of building a lightweight, SQLite-based implementation has been on my project wishlist for ages. It seemed like the perfect opportunity to scratch that itch while exploring the capabilities of modern AI tools.
My goal was ambitious: to push these AI coding tools to their limits by letting them generate as much of the implementation as possible. Rather than falling back to manual coding when things got tough, I wanted to see how far I could get by treating the AI as a true development partner. This meant being patient when outputs weren’t perfect, learning to refine my prompts, and developing techniques to guide the AI toward better solutions. After all, the only way to really understand these tools’ capabilities is to rely on them for substantive work.
With no looming deadlines or pressure to ship, I could focus on the process itself. This freedom to experiment, to try different approaches and even fail, turned out to be crucial. It allowed me to develop and refine my methodology without the constraints of a production environment.
Process: A Modified Test-Driven Development (TDD) Approach
Like any good engineering endeavor, success with AI-assisted development demands a structured approach. Through trial and error, I developed a process that builds on traditional TDD practices while adapting to the unique challenges and opportunities of working with AI. Here’s how it evolved:
- Fairly quickly, I realized automation was going to be critical. Copying and pasting code by hand was error-prone and time-consuming, so I created a simple bash script that handled the preparation of inputs, maintaining a project description as a prefix and adding specific task details as a suffix (a rough sketch of the idea follows this list). This standardization helped the AI maintain context and produce more reliable results across sessions. The script evolved over time as I discovered what information yielded the best results.
- Generating code became an exercise in clear communication. Rather than dumping the entire project scope on the AI at once, I broke it down into focused, manageable chunks. Each prompt included specific requirements, constraints, and context from related components. This approach yielded much better results than attempting to generate large sections of code at once.
- Code review turned out to be the linchpin of the entire process. I treated the AI as I would a pair programming partner, carefully reviewing each piece of generated code. This wasn’t just about catching bugs—it was about understanding the AI’s approach and providing feedback that would improve future generations. The review process often sparked insights into better ways to structure my prompts.
- I used the tools to create unit tests, too. I found that requesting 5-10 tests per feature provided enough coverage without overwhelming the review process. Experience showed that about 30% of generated tests contained errors or made invalid assumptions, so careful review was essential. This error rate remained surprisingly consistent across different features and complexity levels.
- Integration followed an incremental pattern. Each new batch of tests was added to the suite one at a time, with failures fed back to the AI for resolution. This methodical approach helped identify interaction issues early and kept the feedback loop tight. It also prevented the accumulation of broken tests that could obscure real issues.
- The refinement phase became a dialogue between developer and AI. Code was refined through multiple iterations until it met my standards. This wasn’t about blindly accepting or rejecting generated code—it was about working with the AI to shape the output into something that fit the project’s needs. The process often yielded insights that improved both the code and my prompting strategy.
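To give a sense of what that prompt-assembly step looked like, here is a rough sketch of the idea. My actual tool was a bash script; this Python version, with hypothetical file and directory names, just shows the shape of it: a stable project preamble first, any relevant source files as context, and the specific task appended at the end.

```python
#!/usr/bin/env python3
"""Assemble a prompt: stable project preamble + source context + task suffix.

Illustrative sketch only -- the file names and layout are hypothetical.
"""
from pathlib import Path
import sys

PROJECT_PREAMBLE = Path("prompts/project_overview.md")  # architecture, conventions, patterns
CONTEXT_DIR = Path("src")                                # source files offered as context


def build_prompt(task_file: str, context_files: list[str]) -> str:
    """Concatenate the preamble, each context file, and the task description."""
    parts = [PROJECT_PREAMBLE.read_text()]
    for name in context_files:
        parts.append(f"--- {name} ---\n" + (CONTEXT_DIR / name).read_text())
    parts.append("--- TASK ---\n" + Path(task_file).read_text())
    return "\n\n".join(parts)


if __name__ == "__main__":
    # usage: build_prompt.py task.md traversal.py steps.py
    print(build_prompt(sys.argv[1], sys.argv[2:]))
```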
Advice and Observations
While the process provides a framework, the daily reality of working with AI tools surfaced numerous insights that weren’t immediately obvious at the start. Some of these challenged my preconceptions about software development, while others reinforced fundamental engineering principles. Here’s what I learned along the way:
- Code review is non-negotiable with AI-generated code. Every line needs to be scrutinized as carefully as code from a junior developer. The AI can produce elegant solutions, but it can also make subtle mistakes that only become apparent under close inspection. This review process isn’t just about finding bugs—it’s about understanding the AI’s approach and improving your ability to guide it effectively.
- Over-engineering is a constant problem that needs active resistance. The AI often proposes complex solutions to simple problems, perhaps drawing on patterns it’s seen in its training data. When this happens, don’t hesitate to push back and ask for justification. I’ve found that explicitly requesting simpler solutions often yields better results.
- Version control becomes even more crucial when working with AI. Create branches liberally and don’t be afraid to experiment. Many generated solutions will end up being discarded, and that’s fine. The ability to easily throw away code without fear of losing work encourages more experimentation and ultimately leads to better solutions.
- Contrary to traditional wisdom about DRY principles, flatter code often works better when collaborating with AI. Accept some duplication in the initial implementation, focusing on getting the core logic right. Refactoring for reuse can come later, once you have a better understanding of the patterns that naturally emerge from the implementation.
- Different AI tools excel at different tasks. In-editor tools like GitHub Copilot are fantastic for small, context-aware completions, while larger language models like Claude excel at generating broader architectural patterns. Learning when to use each tool is key to maintaining productivity.
- Short, focused conversations often yield better results than long ones that strain the context window. Don't hesitate to reset the context when starting work on a new component. The clarity gained is worth the minor overhead of reestablishing context.
- Early pattern establishment pays dividends throughout the project. Take the time to set clear coding patterns at the start, and be consistent in enforcing them. The AI will pick up on these patterns and begin to mirror them in its generated code.
Conclusion
After several weeks of after-hours and weekend development, I’ve implemented roughly 30% of the Gremlin API. The results have frankly exceeded my expectations. The code is surprisingly Pythonic, well-structured, and maintainable. It’s not perfect—no code ever is—but it’s a solid first draft. I’m going to keep noodling on it, and one day might even release it.
Here are the metrics: approximately 1,500 lines of implementation code backed by 4,400 lines of test code. The test suite includes 310 unit tests with 97% coverage. These numbers reflect both the complexity of the project and the thoroughness of the development process.
While this experiment has convinced me that AI won’t be replacing developers anytime soon, it’s also shown that these tools have earned a place in modern development workflows. The key lies in understanding their strengths and limitations, and developing processes that leverage the former while mitigating the latter. In the end, AI is just another tool in our toolkit—but it’s one that, when used properly, can significantly accelerate development while maintaining high quality standards.