AI Code Generation System Development
An AI code generation system autonomously creates production-ready code from textual descriptions or specifications. Its work covers understanding requirements, generating code that fits the existing codebase, running tests, and improving the result iteratively. This makes it architecturally more complex than a single LLM call: it requires code context management, result verification, and CI/CD integration.
System Architecture
Context Manager: collects relevant context such as the database schema, API interfaces, existing models, and the code style guide.
Generation Engine: an LLM agent with tools for reading files, executing tests, and searching the codebase.
Verification Layer: syntax checking, test execution, and linting.
Feedback Loop: feeds verification errors back to the Generation Engine and retries until the checks pass or the iteration limit is reached.
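The interaction between the Generation Engine, Verification Layer, and Feedback Loop can be sketched in plain Python. This is a minimal illustration, not the system's actual implementation: `Attempt`, `verify`, `generate_with_feedback`, and `fake_generator` are hypothetical names, the verifier checks syntax only, and the generator is a stub standing in for the LLM.

```python
import ast
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Attempt:
    code: str
    errors: list[str] = field(default_factory=list)
    iterations: int = 0


def verify(code: str) -> list[str]:
    """Verification Layer stand-in: syntax check only; tests and linting would go here."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return [f"Syntax error: {exc}"]
    return []


def generate_with_feedback(
    task: str,
    context: str,
    generate: Callable[[str, str, list[str]], str],
    max_iterations: int = 3,
) -> Attempt:
    """Generate, verify, and feed errors back until clean or out of iterations."""
    attempt = Attempt(code="")
    errors: list[str] = []
    for iteration in range(1, max_iterations + 1):
        code = generate(task, context, errors)  # Generation Engine
        errors = verify(code)                   # Verification Layer
        attempt = Attempt(code, errors, iteration)
        if not errors:                          # Feedback Loop stops on success
            break
    return attempt


def fake_generator(task: str, context: str, errors: list[str]) -> str:
    """Hypothetical stand-in for the LLM: broken at first, fixed once it sees errors."""
    if not errors:
        return "def add(a, b) return a + b"
    return "def add(a, b):\n    return a + b\n"


result = generate_with_feedback("add two numbers", "", fake_generator)
print(result.iterations, result.errors)  # 2 []
```

The iteration cap matters as much as the loop itself: without it, a task the model cannot solve would retry indefinitely.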
Code Generation Agent
import ast
import operator
import subprocess
from typing import Annotated, Optional, TypedDict

from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool
from langgraph.graph import StateGraph, END

# Claude models are served through ChatAnthropic (langchain-anthropic package), not ChatOpenAI
llm = ChatAnthropic(model="claude-opus-4-5", temperature=0.1)

class CodeGenState(TypedDict):
    task_description: str
    existing_code_context: str
    generated_code: Optional[str]
    test_results: Annotated[list, operator.add]
    iteration: int
    max_iterations: int
    errors: Annotated[list, operator.add]
    final_code: Optional[str]

@tool
def read_file(file_path: str) -> str:
    """Read a file from codebase for context."""
    try:
        with open(file_path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return f"File {file_path} not found"


@tool
def search_codebase(query: str, directory: str = "./src") -> str:
    """Search the codebase using grep to find similar code."""
    # -e keeps a query that starts with "-" from being parsed as a grep option
    result = subprocess.run(
        ["grep", "-r", "--include=*.py", "-n", "-e", query, directory],
        capture_output=True, text=True
    )
    # Cap the output so a broad query does not flood the model's context
    return result.stdout[:3000] if result.stdout else "Nothing found"


@tool
def run_python_syntax_check(code: str) -> str:
    """Check Python code syntax."""
    try:
        ast.parse(code)
        return "Syntax is correct"
    except SyntaxError as e:
        return f"Syntax error: {e}"


@tool
def run_tests(test_file_path: str) -> str:
    """Run pytest tests and return results."""
    try:
        result = subprocess.run(
            ["python", "-m", "pytest", test_file_path, "-v", "--tb=short"],
            capture_output=True, text=True, timeout=60
        )
    except subprocess.TimeoutExpired:
        # Report the timeout to the agent instead of crashing the tool call
        return "Test run timed out after 60 seconds"
    output = result.stdout + result.stderr
    return output[-3000:]  # Last 3000 chars, where pytest prints the summary


@tool
def write_file(file_path: str, content: str) -> str:
    """Write code to a file."""
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(content)
    return f"File {file_path} written ({len(content)} characters)"

CODE_GEN_SYSTEM = """You are a Senior Software Engineer. Generate production-quality code.
Principles:
- Follow existing codebase patterns
- Write typed code (type hints)
- Each function — one level of abstraction
- Handle errors explicitly
- Minimal external dependencies if standard alternatives exist
Process:
1. Read existing code for context
2. Generate code matching that style
3. Check syntax
4. Run tests
5. Fix errors iteratively"""

from langgraph.prebuilt import create_react_agent

tools = [read_file, search_codebase, run_python_syntax_check, run_tests, write_file]

# create_react_agent binds the tools to the model itself, so the bare llm is passed.
# Recent langgraph releases take the system prompt as `prompt`; older ones used `state_modifier`.
code_gen_agent = create_react_agent(
    llm,
    tools=tools,
    prompt=CODE_GEN_SYSTEM,
)
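The `Annotated[list, operator.add]` fields in `CodeGenState` declare a reducer: when a graph node returns a partial update, the new list is appended to the stored one instead of replacing it, so test results and errors accumulate across iterations. The sketch below imitates that merge rule in plain Python to make the behavior visible. It is not LangGraph's implementation; `apply_update` is a hypothetical helper, and the state class is a trimmed copy of the one above.

```python
import operator
from typing import Annotated, Optional, TypedDict, get_type_hints


class CodeGenState(TypedDict):
    generated_code: Optional[str]
    test_results: Annotated[list, operator.add]
    iteration: int


def apply_update(state: dict, update: dict) -> dict:
    """Merge a partial update, using the annotated reducer where one is declared."""
    hints = get_type_hints(CodeGenState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        if metadata and callable(metadata[0]) and key in merged:
            merged[key] = metadata[0](merged[key], value)  # reducer: old + new
        else:
            merged[key] = value                            # no reducer: overwrite
    return merged


state = {"generated_code": None, "test_results": ["run 1: 2 failed"], "iteration": 1}
update = {"generated_code": "x = 1", "test_results": ["run 2: passed"], "iteration": 2}
merged = apply_update(state, update)
print(merged["test_results"])  # ['run 1: 2 failed', 'run 2: passed']
print(merged["iteration"])     # 2
```

Fields without a reducer, such as `iteration`, are simply overwritten, which is why a node has to return the incremented value itself.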
Generation with Codebase Context

class ContextAwareCodeGenerator:
    # identify_relevant_files, file_exists, extract_test_status and the module-level
    # read_file_async helper are referenced below but not shown in this section.

    def __init__(self, project_root: str):
        self.project_root = project_root
        self.context_cache = {}

    async def gather_context(self, task: str) -> str:
        """Gather relevant context for a task."""
        # Find related files through the LLM
        relevant_files = await self.identify_relevant_files(task)
        context_parts = []

        # Read the data models (database schema)
        if await self.file_exists("models.py"):
            models = await read_file_async(f"{self.project_root}/models.py")
            context_parts.append(f"## Data Models\n{models[:2000]}")

        # Read base classes and interfaces from the three most relevant files
        for file_path in relevant_files[:3]:
            content = await read_file_async(file_path)
            context_parts.append(f"## {file_path}\n{content[:1500]}")

        # Add the code style guide
        if await self.file_exists(".codestyle.md"):
            style = await read_file_async(f"{self.project_root}/.codestyle.md")
            context_parts.append(f"## Code Style\n{style[:1000]}")

        return "\n\n".join(context_parts)

    async def generate(self, task: str, output_file: str) -> dict:
        context = await self.gather_context(task)
        result = await code_gen_agent.ainvoke({
            "messages": [{
                "role": "user",
                "content": f"""Task: {task}
Codebase context:
{context}
Output file: {output_file}
Generate code, verify it, and write to file."""
            }]
        })
        return {
            "task": task,
            "output_file": output_file,
            # The prebuilt agent state has no iteration counter, so this falls back
            # to 1 unless a custom state schema adds one
            "iterations": result.get("iteration", 1),
            "tests_passed": self.extract_test_status(result),
        }
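The `identify_relevant_files` step is not shown above; the source comment says it selects files through the LLM. As a cheap pre-filter or fallback, a keyword-overlap ranking can narrow the candidates first. The sketch below swaps in that simpler technique, so it is not the LLM-based selection itself, and `rank_files_by_keyword_overlap` is a hypothetical name.

```python
import re
from pathlib import Path


def rank_files_by_keyword_overlap(task: str, project_root: str, limit: int = 3) -> list[str]:
    """Return up to `limit` Python files that mention words from the task most often."""
    # Words of three or more letters from the task description act as keywords
    keywords = set(re.findall(r"[a-z_]{3,}", task.lower()))
    scored = []
    for path in Path(project_root).rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
        except OSError:
            continue  # unreadable file: skip it rather than fail the whole scan
        score = sum(text.count(word) for word in keywords)
        if score:
            scored.append((score, str(path)))
    # Highest score first; path as a tie-breaker keeps the order deterministic
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, path in scored[:limit]]
```

A call such as `rank_files_by_keyword_overlap("Add invoice total endpoint", "./src")` would return the files that mention "invoice", "total", and similar words most often, which can then be passed to the LLM for a final selection.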
Template-based Generation with LLM Filling

import json


class CRUDGenerator:
    """Generates CRUD modules from an entity schema."""

    CRUD_TEMPLATE = """
# Module for working with entity {entity_name}
from sqlalchemy import Column, Integer, String, DateTime, func
from sqlalchemy.orm import Session
from pydantic import BaseModel
from typing import Optional, List
from datetime import datetime
# PLACEHOLDERS FOR LLM REPLACEMENT:
# COLUMNS - list of SQLAlchemy columns
# PYDANTIC_FIELDS - Pydantic schema fields
# BUSINESS_LOGIC - business-specific logic
"""

    async def generate_crud_module(self, entity_spec: dict) -> str:
        """entity_spec: {name, fields, business_rules, relationships}"""
        # LLM fills in the specific parts (the three generate_* helpers are not shown here)
        columns = await self.generate_sqlalchemy_columns(entity_spec["fields"])
        schemas = await self.generate_pydantic_schemas(entity_spec["fields"])
        business_logic = await self.generate_business_logic(entity_spec.get("business_rules", []))
        skeleton = self.CRUD_TEMPLATE.format(entity_name=entity_spec["name"])

        # Assemble the final module from the skeleton and the generated parts
        result = await llm.ainvoke(f"""Create a complete CRUD module for entity {entity_spec['name']}.
Specification: {json.dumps(entity_spec, ensure_ascii=False)}
Module skeleton:
{skeleton}
COLUMNS:
{columns}
PYDANTIC_FIELDS:
{schemas}
BUSINESS_LOGIC:
{business_logic}
Stack: FastAPI + SQLAlchemy 2.0 + Pydantic v2
Include: model, pydantic schemas, CRUD functions, FastAPI router with dependency injection
Code standards: async/await, type hints, docstrings""")
        return result.content
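The core idea of template-based generation is that a fixed skeleton carries the structure while only small fragments are generated, and the filled result is validated before it is accepted. The sketch below shows that pattern with the standard library only. It is a simplified illustration under stated assumptions: it renders a plain Python class rather than a SQLAlchemy model, the fragments are built deterministically where the real system would use LLM output, and `MODEL_TEMPLATE` and `render_model` are hypothetical names.

```python
import ast
from string import Template

# Fixed skeleton; only the ${...} slots vary between entities
MODEL_TEMPLATE = Template('''\
class ${entity_name}:
    """Plain data holder for ${entity_name}."""

    def __init__(self, ${init_args}):
${assignments}
''')


def render_model(entity_name: str, fields: dict[str, str]) -> str:
    """Fill the skeleton and reject the result if it is not valid Python."""
    init_args = ", ".join(f"{name}: {type_}" for name, type_ in fields.items())
    assignments = "\n".join(f"        self.{name} = {name}" for name in fields)
    code = MODEL_TEMPLATE.substitute(
        entity_name=entity_name,
        init_args=init_args,
        assignments=assignments,
    )
    ast.parse(code)  # raises SyntaxError if a fragment broke the module
    return code


code = render_model("Invoice", {"id": "int", "amount": "float"})
print(code)
```

Because the skeleton is fixed, every generated module shares the same layout, and a bad fragment is caught by the `ast.parse` check instead of reaching code review.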
Practical Case: Automation in a FinTech Startup
Task: a team of 4 developers created 3–5 new API endpoints per week. Each CRUD endpoint with tests took 4–6 hours.
AI Code Generation:
- CRUD module generation from OpenAPI specifications
- Automatic pytest test generation for endpoints
- Alembic migration generation on model changes
- Code review suggestions through AI Code Review agent
Results:
- Time to create standard CRUD endpoint: 5h → 50 min (15 min generation + 35 min review)
- Test coverage of new endpoints: 45% → 82%
- Code uniformity: significantly improved (all follow one pattern)
- Rework rate: 14% of PRs required substantial business-logic rework
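The time figures above translate into a weekly saving for the team. The calculation below is a rough sketch that takes the 5-hour baseline and the 3 to 5 endpoints per week at face value and ignores the 14% of PRs that needed substantial rework; the function names are illustrative.

```python
BASELINE_MIN = 5 * 60   # "5h" per endpoint before automation
GENERATION_MIN = 15     # generation time after automation
REVIEW_MIN = 35         # human review time after automation


def minutes_saved_per_endpoint() -> int:
    """Difference between the manual baseline and generation plus review."""
    return BASELINE_MIN - (GENERATION_MIN + REVIEW_MIN)


def weekly_hours_saved(endpoints_per_week: int) -> float:
    """Team-wide hours saved per week for a given endpoint count."""
    return minutes_saved_per_endpoint() * endpoints_per_week / 60


reduction = minutes_saved_per_endpoint() / BASELINE_MIN
print(f"Per endpoint: {minutes_saved_per_endpoint()} min saved ({reduction:.0%})")
for n in (3, 5):
    print(f"{n} endpoints/week: {weekly_hours_saved(n):.1f} h saved")
# Per endpoint: 250 min saved (83%)
# 3 endpoints/week: 12.5 h saved
# 5 endpoints/week: 20.8 h saved
```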
Timeline
- Basic generator with context: 2–3 weeks
- Agent cycle with tests and iterations: 2–3 weeks
- CI/CD and IDE integration: 2–3 weeks
- Total: 6–9 weeks







