Integration of a Copilot-like IDE Assistant
An AI IDE assistant isn't just code completion on steroids. It's a system that keeps the entire project in context: open files, change history, database schema, tests. A properly built assistant understands that you're writing a user registration function in a Django project with PostgreSQL, and suggests code compatible with your models and conventions — not an abstract Stack Overflow example.
IDE Assistant Architecture
A full-featured Copilot-like assistant consists of several layers:
- Context Collector: gathers the relevant context, including the current file, imports, related files, cursor position, selected code, and clipboard.
- LSP Bridge: queries a language server over the Language Server Protocol for symbols, types, and definitions. Tree-sitter complements it by parsing code into a syntax tree without running a compiler.
- Retrieval Engine: semantic search across the codebase, built on code embeddings (CodeBERT, text-embedding-3-small) and a vector store.
- LLM Gateway: request routing, with a fast model for inline completion and a more powerful one for chat and refactoring.
- Response Renderer: output formatting, with a diff for refactoring, ghost text for completion, and markdown for chat.
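The LLM Gateway layer can be pictured as a small routing table. The sketch below is a minimal illustration, not Continue.dev's implementation: `RequestKind`, `Route`, `ROUTES`, and `route` are hypothetical names, and the model identifiers are simply the ones used in this article's configuration examples.

```python
from dataclasses import dataclass
from enum import Enum


class RequestKind(Enum):
    COMPLETION = "completion"   # inline ghost text, latency-sensitive
    CHAT = "chat"               # conversational questions
    REFACTOR = "refactor"       # multi-line edits returned as a diff


@dataclass(frozen=True)
class Route:
    provider: str
    model: str
    max_tokens: int


# Hypothetical routing table: fast local model for completion,
# stronger hosted model for chat and refactoring.
ROUTES = {
    RequestKind.COMPLETION: Route("ollama", "qwen2.5-coder:1.5b", 150),
    RequestKind.CHAT: Route("anthropic", "claude-sonnet-4-5", 4096),
    RequestKind.REFACTOR: Route("anthropic", "claude-sonnet-4-5", 4096),
}


def route(kind: RequestKind) -> Route:
    """Pick the provider and model for a request kind."""
    return ROUTES[kind]
```

Keeping the table in one place means a model can be swapped without touching the code that collects context or renders responses.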
Continue.dev — Open-Source Foundation
Continue.dev is one of the most mature open-source alternatives to GitHub Copilot. It supports VS Code and JetBrains IDEs and is configured via ~/.continue/config.json.
```json
{
  "models": [
    {
      "title": "Claude Sonnet 4.5",
      "provider": "anthropic",
      "model": "claude-sonnet-4-5",
      "apiKey": "$ANTHROPIC_API_KEY"
    },
    {
      "title": "Ollama Qwen2.5-Coder",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Autocomplete",
    "provider": "ollama",
    "model": "qwen2.5-coder:1.5b"
  },
  "contextProviders": [
    {"name": "code", "params": {}},
    {"name": "docs", "params": {}},
    {"name": "diff", "params": {}},
    {"name": "terminal", "params": {}},
    {"name": "problems", "params": {}},
    {"name": "folder", "params": {}},
    {"name": "codebase", "params": {}}
  ],
  "slashCommands": [
    {"name": "edit", "description": "Edit highlighted code"},
    {"name": "comment", "description": "Write comments for the code"},
    {"name": "tests", "description": "Write unit tests"},
    {"name": "share", "description": "Export the chat session"}
  ]
}
```
Key feature: tabAutocompleteModel points at a fast local model (1.5B parameters), while chat uses a more powerful cloud model. Inline completion latency is 80–150 ms on Qwen2.5-Coder 1.5B via Ollama.
Custom Context Provider
Continue.dev allows writing custom context providers for specific data sources:
```typescript
// ~/.continue/config.ts
import { ContinueConfig, IContextProvider, ContextProviderDescription } from "@continuedev/core";

// fetchDatabaseSchema() and fetchJiraTicket() stand for your own API clients

class DatabaseSchemaProvider implements IContextProvider {
  get description(): ContextProviderDescription {
    return {
      title: "db",
      displayTitle: "Database Schema",
      description: "Current database schema",
      type: "normal",
    };
  }

  async getContextItems(query: string, extras: any) {
    const schema = await fetchDatabaseSchema(); // your API
    return [{
      name: "Database Schema",
      description: "Current DB schema",
      content: schema,
    }];
  }
}

class JiraContextProvider implements IContextProvider {
  get description(): ContextProviderDescription {
    return {
      title: "jira",
      displayTitle: "Jira Ticket",
      description: "Fetch Jira ticket by ID",
      type: "query",
    };
  }

  async getContextItems(query: string, extras: any) {
    // query = "PROJ-123"
    const ticket = await fetchJiraTicket(query);
    return [{
      name: `Jira ${query}`,
      description: ticket.summary,
      content: `**${ticket.summary}**\n\n${ticket.description}\n\nAcceptance Criteria:\n${ticket.acceptance_criteria}`,
    }];
  }
}

export function modifyConfig(config: ContinueConfig): ContinueConfig {
  config.contextProviders = [
    ...(config.contextProviders || []),
    new DatabaseSchemaProvider(),
    new JiraContextProvider(),
  ];
  return config;
}
```
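What `fetchDatabaseSchema()` returns is left to the integrator. As one possible shape, the sketch below renders a schema as compact plain text that a provider could pass through as `content`. It is an illustration under stated assumptions: Python's built-in `sqlite3` stands in for the project's real database, and `dump_schema` is a hypothetical helper name.

```python
import sqlite3


def dump_schema(conn: sqlite3.Connection) -> str:
    """Render every user table as 'table(col TYPE, ...)', one table per line."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    lines = []
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        rendered = ", ".join(f"{col[1]} {col[2]}" for col in columns)
        lines.append(f"{table}({rendered})")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    print(dump_schema(demo))
```

A one-line-per-table format keeps the schema cheap in tokens, which matters once a project has hundreds of models.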
Inline Completion via Language Server Protocol
For embedding into any LSP-compatible editor (Neovim, Emacs, Helix):
```python
from anthropic import AsyncAnthropic
from lsprotocol.types import (
    TEXT_DOCUMENT_COMPLETION,
    CompletionItem,
    CompletionItemKind,
    CompletionList,
    CompletionParams,
)
from pygls.server import LanguageServer

server = LanguageServer("ai-completion-server", "v0.1")
# Async client, so a slow API call does not block the server's event loop
anthropic_client = AsyncAnthropic()


@server.feature(TEXT_DOCUMENT_COMPLETION)
async def completions(params: CompletionParams) -> CompletionList:
    document = server.workspace.get_document(params.text_document.uri)
    lines = document.lines  # each entry keeps its trailing newline
    cursor_line = params.position.line
    # LSP columns count UTF-16 code units; treated here as plain string indices
    cursor_col = params.position.character

    # Split the cursor line so the prefix ends exactly at the cursor
    current = lines[cursor_line] if cursor_line < len(lines) else ""
    # Context: up to 50 lines before the cursor, up to 9 lines after it
    prefix = "".join(lines[max(0, cursor_line - 50):cursor_line]) + current[:cursor_col]
    suffix = current[cursor_col:] + "".join(lines[cursor_line + 1:cursor_line + 10])

    # Fill-in-the-middle style prompt; for a chat model the tags act as plain delimiters
    prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

    # Use a fast model for completion
    response = await anthropic_client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=150,
        messages=[{
            "role": "user",
            "content": f"Complete this code (return only the completion, no explanation):\n{prompt}",
        }],
    )
    completion_text = response.content[0].text.strip()
    if not completion_text:
        return CompletionList(is_incomplete=False, items=[])

    # Shorten long completions for the popup label only
    label = (completion_text[:50] + "...") if len(completion_text) > 50 else completion_text
    return CompletionList(
        is_incomplete=False,
        items=[CompletionItem(
            label=label,
            kind=CompletionItemKind.Snippet,
            insert_text=completion_text,
            detail="AI Suggestion",
        )],
    )


if __name__ == "__main__":
    server.start_io()
```
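Chat-tuned models sometimes wrap their answer in a markdown fence even when asked for code only, and that fence would then be inserted into the buffer. A post-processing step along the following lines can guard against it. This is a sketch under that assumption; `clean_completion` is a hypothetical helper, not part of pygls or the Anthropic SDK.

```python
import re

FENCE = "`" * 3  # a markdown code fence, built this way to keep the listing readable

# Matches an answer that consists of exactly one fenced block, with an optional language tag
_FENCED = re.compile(
    r"^" + FENCE + r"[\w+-]*\n(.*?)\n?" + FENCE + r"\s*$",
    re.DOTALL,
)


def clean_completion(text: str) -> str:
    """Strip a surrounding markdown code fence from a model answer, if present."""
    stripped = text.strip()
    match = _FENCED.match(stripped)
    return match.group(1) if match else stripped
```

The function would be applied to the model's text before it is placed into `insert_text`; answers without a fence pass through unchanged apart from whitespace trimming.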
Codebase Indexing with Tree-sitter
Semantic search across the codebase is the foundation of context-aware suggestions:
```python
from pathlib import Path

import chromadb
from openai import OpenAI
from tree_sitter_languages import get_language, get_parser

client = OpenAI()
chroma_client = chromadb.PersistentClient(path="./.codebase_index")
collection = chroma_client.get_or_create_collection("code_chunks")


def extract_functions(file_path: str, language: str) -> list[dict]:
    """Extracts functions/methods via Tree-sitter AST"""
    parser = get_parser(language)
    # Tree-sitter reports byte offsets, so keep the source as bytes
    source = Path(file_path).read_bytes()
    tree = parser.parse(source)

    # Tree-sitter query for Python functions and classes
    lang = get_language(language)
    query = lang.query("""
        (function_definition
            name: (identifier) @func_name
            body: (block) @func_body) @func_def
        (class_definition
            name: (identifier) @class_name
            body: (block) @class_body) @class_def
    """)

    # With the older py-tree-sitter releases that tree_sitter_languages targets,
    # captures() returns (node, capture_name) pairs
    captures = query.captures(tree.root_node)
    functions = []
    for node, capture_name in captures:
        # Only whole function definitions are indexed; class captures are skipped
        if capture_name == "func_def":
            func_text = source[node.start_byte:node.end_byte].decode("utf-8", errors="replace")
            functions.append({
                "file": file_path,
                "type": "function",
                "code": func_text,
                "start_line": node.start_point[0],  # zero-based
            })
    return functions


def index_codebase(project_root: str):
    """Indexes entire codebase"""
    project = Path(project_root)
    all_chunks = []
    for py_file in project.rglob("*.py"):
        if "migrations" in str(py_file) or "__pycache__" in str(py_file):
            continue
        chunks = extract_functions(str(py_file), "python")
        all_chunks.extend(chunks)

    # Batch embedding
    batch_size = 100
    for i in range(0, len(all_chunks), batch_size):
        batch = all_chunks[i:i + batch_size]
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=[chunk["code"][:2000] for chunk in batch],
        )
        # upsert instead of add, so re-indexing does not collide with existing IDs
        collection.upsert(
            ids=[f"{c['file']}:{c['start_line']}" for c in batch],
            embeddings=[e.embedding for e in response.data],
            documents=[c["code"] for c in batch],
            metadatas=[{"file": c["file"], "line": c["start_line"]} for c in batch],
        )
    print(f"Indexed {len(all_chunks)} code chunks")


def find_relevant_code(query: str, k: int = 5) -> list[str]:
    """Finds similar code for context"""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=query,
    )
    results = collection.query(
        query_embeddings=[response.data[0].embedding],
        n_results=k,
    )
    return results["documents"][0]
```
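Re-embedding every chunk on each run costs time and API calls. One common way to avoid that is to fingerprint each chunk and embed only the ones whose fingerprint changed. The sketch below assumes chunks shaped like the dictionaries returned by `extract_functions`; `chunk_fingerprint`, `chunk_id`, and `select_stale_chunks` are hypothetical helpers, and where the previous fingerprints are stored (for example in the vector store's metadata) is left open.

```python
import hashlib


def chunk_fingerprint(chunk: dict) -> str:
    """Stable hash of a chunk's code."""
    return hashlib.sha256(chunk["code"].encode("utf-8")).hexdigest()


def chunk_id(chunk: dict) -> str:
    """Same ID scheme as the indexing code: '<file>:<start_line>'."""
    return f"{chunk['file']}:{chunk['start_line']}"


def select_stale_chunks(chunks: list[dict], known: dict[str, str]) -> list[dict]:
    """Return chunks that are new or whose code changed since the last run.

    `known` maps chunk IDs to the fingerprint stored on the previous run.
    """
    return [c for c in chunks if known.get(chunk_id(c)) != chunk_fingerprint(c)]
```

Only the returned chunks would then be sent to the embeddings endpoint. Note that line-based IDs shift when code is inserted above a function, so such a function is re-embedded even if its body is unchanged.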
Chat Mode with Project Context
```python
from anthropic import AsyncAnthropic

# index_codebase() and find_relevant_code() are the indexing helpers defined earlier


class IDEChatAssistant:
    """Full-featured chat assistant with project context"""

    def __init__(self, project_root: str):
        self.project_root = project_root
        # Async client, so chat() can await the API call
        self.client = AsyncAnthropic()
        # Note: this re-indexes the whole project on every start
        index_codebase(project_root)

    async def chat(
        self,
        user_message: str,
        current_file: str,
        selected_code: str | None = None,
        conversation_history: list | None = None,
    ) -> str:
        # Gather context
        context_parts = []

        # Current file (first 3000 characters)
        if current_file:
            with open(current_file, encoding="utf-8") as f:
                file_content = f.read()
            context_parts.append(f"## Current file: {current_file}\n```python\n{file_content[:3000]}\n```")

        # Selected code
        if selected_code:
            context_parts.append(f"## Selected code\n```python\n{selected_code}\n```")

        # Similar code from codebase
        relevant = find_relevant_code(user_message, k=3)
        if relevant:
            context_parts.append("## Similar code from project\n" + "\n\n".join(relevant))

        context = "\n".join(context_parts)
        system_prompt = (
            "You are an experienced engineer working on the project.\n"
            f"{context}\n"
            "Rules:\n"
            "- Write code in the style of the existing codebase\n"
            "- Use the same dependencies already in the project\n"
            "- Explain architectural decisions briefly\n"
            "- If modifying existing code, show a diff"
        )

        # Copy the history so the caller's list is not mutated
        messages = list(conversation_history or [])
        messages.append({"role": "user", "content": user_message})

        response = await self.client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=4096,
            system=system_prompt,
            messages=messages,
        )
        return response.content[0].text
```
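The chat method caps the current file at 3000 characters, but the selected code and the retrieved snippets are unbounded. A simple way to keep the whole system prompt within a budget is to add context parts in priority order until the budget runs out. The following is a minimal sketch; `fit_to_budget` and the character-based budget are assumptions, and a token-based count would be more precise.

```python
def fit_to_budget(parts: list[str], max_chars: int) -> list[str]:
    """Keep parts in priority order, truncating the first one that overflows."""
    kept: list[str] = []
    remaining = max_chars
    for part in parts:
        if remaining <= 0:
            break
        if len(part) > remaining:
            # Truncate rather than drop, so the budget is used in full
            part = part[:remaining]
        kept.append(part)
        remaining -= len(part)
    return kept
```

Called as `fit_to_budget(context_parts, 12000)` before the parts are joined, this would favor the current file and selection over retrieved snippets, because they are appended first.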
Practical Case: Deployment in a 12-person Team
Starting state: the team used GitHub Copilot ($19/month per developer) and complained about irrelevant suggestions, because Copilot did not know the internal patterns of a Django project with 800+ models.
Solution: Continue.dev, a local Ollama instance for autocomplete, Claude via API for chat and refactoring, and a custom context provider backed by a codebase index.
Infrastructure: a server with an RTX 4090 (Qwen2.5-Coder 7B for autocomplete) and the Claude API for complex requests.
Results after 2 months:
- Inline suggestion acceptance: 23% (Copilot) → 41% (custom)
- Average time to write a typical CRUD endpoint: 52 min → 31 min
- Tasks like "write a test for this function": fully manual → 70% automated
- Cost: $228/month (Copilot) → ~$85/month (amortized Ollama server + Claude API)
Key improvement factor: the context provider with a codebase index gave the model real project examples rather than abstract code.
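The figures reported above can be turned into relative changes with a few lines of arithmetic. The snippet only restates the numbers from the results list; the helper name `relative_change` is illustrative, and the $85 figure is the approximate value given in the text.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from before to after, as a percentage."""
    return (after - before) / before * 100


team_size = 12
copilot_cost = team_size * 19                 # $228/month, as reported

acceptance = relative_change(23, 41)          # suggestion acceptance rate, percent
crud_time = relative_change(52, 31)           # minutes per typical CRUD endpoint
cost = relative_change(copilot_cost, 85)      # monthly cost in dollars (approximate)

print(f"Acceptance: {acceptance:+.0f}%")
print(f"CRUD endpoint time: {crud_time:+.0f}%")
print(f"Monthly cost: {cost:+.0f}%")
```

In relative terms this is roughly a 78% increase in accepted suggestions, a 40% reduction in time per endpoint, and a 63% reduction in monthly cost.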
Local Models for Completion
Teams with code confidentiality requirements can run a fully local stack:
| Model | Parameters | Latency (RTX 3080) | Quality |
|---|---|---|---|
| Qwen2.5-Coder 1.5B | 1.5B | 50–80 ms | Basic |
| Qwen2.5-Coder 7B | 7B | 150–250 ms | Good |
| DeepSeek-Coder 6.7B | 6.7B | 140–230 ms | Good |
| CodeLlama 13B | 13B | 350–500 ms | High |
For inline completion, latency under 200 ms is critical, because users notice longer delays. For that reason, FIM (fill-in-the-middle) completion typically relies on models of up to 7B parameters.
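For reference, FIM-trained models expect the prefix and suffix wrapped in special tokens. The sketch below builds such a prompt using the token spelling documented for Qwen2.5-Coder (`<|fim_prefix|>`, `<|fim_suffix|>`, `<|fim_middle|>`). Other model families use different tokens, so the constants should be treated as an assumption to verify against the model card. `build_fim_prompt` and its character limits are hypothetical.

```python
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(
    prefix: str,
    suffix: str,
    max_prefix_chars: int = 4000,
    max_suffix_chars: int = 1000,
) -> str:
    """Build a prefix-suffix-middle prompt; the model generates the middle."""
    # Keep the text closest to the cursor: the end of the prefix, the start of the suffix
    prefix = prefix[-max_prefix_chars:] if max_prefix_chars > 0 else ""
    suffix = suffix[:max_suffix_chars] if max_suffix_chars > 0 else ""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"
```

Trimming the context this way also helps with the latency budget, since prompt length contributes to time-to-first-token on small local models.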
Timeline
- Continue.dev + model configuration + basic context providers: 2–3 days
- Custom context providers (DB, Jira, documentation): 1 week
- Codebase indexing + semantic search: 1–2 weeks
- LSP server for a non-standard editor: 2–3 weeks
- Total with team onboarding: 3–5 weeks







