Copilot-Like Assistant Integration for IDE

Integration of Copilot-like IDE Assistant

An AI IDE assistant isn't just code completion on steroids. It's a system that keeps the entire project in context: open files, change history, database schema, tests. A properly built assistant understands that you're writing a user registration function in a Django project with PostgreSQL, and suggests code compatible with your models and conventions — not an abstract Stack Overflow example.

IDE Assistant Architecture

A full-featured Copilot-like assistant consists of several layers:

Context Collector: gathers the context relevant to the request, such as the current file, its imports, related files, the cursor position, the selected code and the clipboard.

LSP Bridge: queries language servers over the Language Server Protocol for types, definitions, references and diagnostics. Tree-sitter complements it by parsing source files into a syntax tree without running a compiler.

Retrieval Engine: semantic search across the codebase, built from code embeddings (CodeBERT, text-embedding-3-small) and a vector store.

LLM Gateway: routes requests by type, sending inline completion to a fast model and chat or refactoring to a more powerful one.

Response Renderer: formats the output as a diff for refactoring, ghost text for completion, or Markdown for chat.
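The routing rule of the LLM Gateway can be sketched in a few lines of Python. This is a minimal illustration, not a production gateway: the `Route` class, the `ROUTES` table, the request kinds and the token and timeout values are all hypothetical, and the model names are simply the ones used in the configuration example later in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str        # model identifier passed to the provider
    max_tokens: int   # response budget for this kind of request
    timeout_s: float  # how long the IDE is willing to wait


# Hypothetical routing table: request kind -> model settings.
ROUTES = {
    "inline": Route(model="qwen2.5-coder:1.5b", max_tokens=64, timeout_s=0.3),
    "chat": Route(model="claude-sonnet-4-5", max_tokens=4096, timeout_s=60.0),
    "refactor": Route(model="claude-sonnet-4-5", max_tokens=4096, timeout_s=60.0),
}


def pick_route(kind: str) -> Route:
    """Return the route for a request kind; unknown kinds fall back to chat."""
    return ROUTES.get(kind, ROUTES["chat"])
```

Keeping the table as data rather than branching logic makes it easy to swap a model for one request kind without touching the rest of the gateway.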

Continue.dev — Open-Source Foundation

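Continue.dev is one of the most mature open-source alternatives to GitHub Copilot. It supports VS Code and JetBrains IDEs and is configured via ~/.continue/config.json (recent releases have moved to a YAML config file, so check the format your installed version expects).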

{
  "models": [
    {
      "title": "Claude Sonnet 4.5",
      "provider": "anthropic",
      "model": "claude-sonnet-4-5",
      "apiKey": "$ANTHROPIC_API_KEY"
    },
    {
      "title": "Ollama Qwen2.5-Coder",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Autocomplete",
    "provider": "ollama",
    "model": "qwen2.5-coder:1.5b"
  },
  "contextProviders": [
    {"name": "code", "params": {}},
    {"name": "docs", "params": {}},
    {"name": "diff", "params": {}},
    {"name": "terminal", "params": {}},
    {"name": "problems", "params": {}},
    {"name": "folder", "params": {}},
    {"name": "codebase", "params": {}}
  ],
  "slashCommands": [
    {"name": "edit", "description": "Edit highlighted code"},
    {"name": "comment", "description": "Write comments for the code"},
    {"name": "tests", "description": "Write unit tests"},
    {"name": "share", "description": "Export the chat session"}
  ]
}

Key feature: tabAutocompleteModel points to a fast local model (1.5B parameters), while chat uses a more powerful cloud model. Inline completion latency with Qwen2.5-Coder 1.5B via Ollama is 80–150 ms.

Custom Context Provider

Continue.dev allows writing custom context providers for specific data sources:

// ~/.continue/config.ts
import { ContinueConfig, IContextProvider, ContextProviderDescription } from "@continuedev/core";

class DatabaseSchemaProvider implements IContextProvider {
  get description(): ContextProviderDescription {
    return {
      title: "db",
      displayTitle: "Database Schema",
      description: "Current database schema",
      type: "normal",
    };
  }

  async getContextItems(query: string, extras: any) {
    const schema = await fetchDatabaseSchema(); // your API

    return [{
      name: "Database Schema",
      description: "Current DB schema",
      content: schema,
    }];
  }
}

class JiraContextProvider implements IContextProvider {
  get description(): ContextProviderDescription {
    return {
      title: "jira",
      displayTitle: "Jira Ticket",
      description: "Fetch Jira ticket by ID",
      type: "query",
    };
  }

  async getContextItems(query: string, extras: any) {
    // query = "PROJ-123"
    const ticket = await fetchJiraTicket(query);

    return [{
      name: `Jira ${query}`,
      description: ticket.summary,
      content: `**${ticket.summary}**\n\n${ticket.description}\n\nAcceptance Criteria:\n${ticket.acceptance_criteria}`,
    }];
  }
}

export function modifyConfig(config: ContinueConfig): ContinueConfig {
  config.contextProviders = [
    ...(config.contextProviders || []),
    new DatabaseSchemaProvider(),
    new JiraContextProvider(),
  ];
  return config;
}

Inline Completion via Language Server Protocol

For embedding into any LSP-compatible editor (Neovim, Emacs, Helix):

from pygls.server import LanguageServer
from lsprotocol.types import (
    TEXT_DOCUMENT_COMPLETION,
    CompletionParams,
    CompletionList,
    CompletionItem,
    CompletionItemKind,
)
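from anthropic import AsyncAnthropic

server = LanguageServer("ai-completion-server", "v0.1")
# Async client, so the API call does not block the server's event loop.
# It reads ANTHROPIC_API_KEY from the environment.
anthropic_client = AsyncAnthropic()

@server.feature(TEXT_DOCUMENT_COMPLETION)
async def completions(params: CompletionParams):
    document = server.workspace.get_document(params.text_document.uri)

    # document.lines keeps the line endings, so join with "" rather than "\n"
    lines = document.lines
    row, col = params.position.line, params.position.character
    current = lines[row] if row < len(lines) else ""

    # Context: up to 50 lines before the cursor and 9 lines after it,
    # with the cursor line split at the cursor column
    prefix = "".join(lines[max(0, row - 50):row]) + current[:col]
    suffix = current[col:] + "".join(lines[row + 1:row + 10])

    # Fill-in-the-middle style prompt. Claude is a chat model without native
    # FIM tokens, so the tags only mark the prefix and suffix inside the prompt.
    prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

    # Use a fast model for completion
    response = await anthropic_client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=150,
        messages=[{
            "role": "user",
            "content": f"Complete this code (return only the completion, no explanation):\n{prompt}"
        }]
    )

    completion_text = response.content[0].text.strip()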

    return CompletionList(
        is_incomplete=False,
        items=[CompletionItem(
            label=completion_text[:50] + "..." if len(completion_text) > 50 else completion_text,
            kind=CompletionItemKind.Snippet,
            insert_text=completion_text,
            detail="AI Suggestion",
        )]
    )

if __name__ == "__main__":
    server.start_io()
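Chat models sometimes wrap the answer in a Markdown fence or repeat code that already follows the cursor, so the raw completion usually needs cleaning before it is inserted. The helper below is a hedged sketch of such post-processing: `clean_completion`, its `min_overlap` threshold and the overlap heuristic are assumptions for illustration, not part of pygls or the Anthropic SDK.

```python
FENCE = "`" * 3  # Markdown code fence marker


def clean_completion(raw: str, suffix: str, min_overlap: int = 4) -> str:
    """Strip a wrapping Markdown fence and a tail that repeats the suffix."""
    lines = raw.strip("\n").split("\n")

    # Drop the opening and closing fence lines if the model added them.
    if len(lines) >= 2 and lines[0].startswith(FENCE) and lines[-1].strip() == FENCE:
        lines = lines[1:-1]
    text = "\n".join(lines)

    # Heuristic: if the completion ends with text that already follows the
    # cursor, cut the longest such overlap (at least min_overlap characters).
    head = suffix.lstrip("\n")
    for size in range(min(len(text), len(head)), min_overlap - 1, -1):
        if text.endswith(head[:size]):
            return text[:-size].rstrip("\n")
    return text
```

In the server above, this would be applied to `completion_text` together with the `suffix` computed from the document. The minimum overlap keeps short coincidences, such as a single shared character, from being trimmed by mistake.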

Codebase Indexing with Tree-sitter

Semantic search across the codebase is the foundation of context-aware suggestions:

from tree_sitter_languages import get_language, get_parser
from openai import OpenAI
import chromadb
from pathlib import Path

client = OpenAI()
chroma_client = chromadb.PersistentClient(path="./.codebase_index")
collection = chroma_client.get_or_create_collection("code_chunks")

def extract_functions(file_path: str, language: str) -> list[dict]:
    """Extracts functions/methods via Tree-sitter AST"""
    parser = get_parser(language)

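    # Read as bytes: Tree-sitter reports byte offsets, which differ from
    # str indices as soon as the file contains non-ASCII characters
    source = Path(file_path).read_bytes()

    tree = parser.parse(source)

    # Tree-sitter query for Python functions and classes
    # (only functions are indexed below)
    lang = get_language(language)
    query = lang.query("""
        (function_definition
            name: (identifier) @func_name
            body: (block) @func_body) @func_def

        (class_definition
            name: (identifier) @class_name
            body: (block) @class_body) @class_def
    """)

    # Assumes a py-tree-sitter release where captures() returns
    # (node, capture_name) pairs; some newer releases return a dict instead
    captures = query.captures(tree.root_node)
    functions = []

    for node, capture_name in captures:
        if capture_name == "func_def":
            func_text = source[node.start_byte:node.end_byte].decode("utf-8", errors="replace")
            functions.append({
                "file": file_path,
                "type": "function",
                "code": func_text,
                "start_line": node.start_point[0],
            })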

    return functions

def index_codebase(project_root: str):
    """Indexes entire codebase"""
    project = Path(project_root)

    all_chunks = []
    for py_file in project.rglob("*.py"):
        if "migrations" in str(py_file) or "__pycache__" in str(py_file):
            continue

        chunks = extract_functions(str(py_file), "python")
        all_chunks.extend(chunks)

    # Batch embedding
    batch_size = 100
    for i in range(0, len(all_chunks), batch_size):
        batch = all_chunks[i:i + batch_size]

        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=[chunk["code"][:2000] for chunk in batch]
        )

        # upsert instead of add, so re-running the indexer overwrites existing IDs
        collection.upsert(
            ids=[f"{c['file']}:{c['start_line']}" for c in batch],
            embeddings=[e.embedding for e in response.data],
            documents=[c["code"] for c in batch],
            metadatas=[{"file": c["file"], "line": c["start_line"]} for c in batch],
        )

    print(f"Indexed {len(all_chunks)} code chunks")

def find_relevant_code(query: str, k: int = 5) -> list[str]:
    """Finds similar code for context"""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=query
    )

    results = collection.query(
        query_embeddings=[response.data[0].embedding],
        n_results=k,
    )

    return results["documents"][0]
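What the vector store does at query time can be shown without any external service. The sketch below ranks chunks by cosine similarity over toy three-dimensional vectors. The chunk IDs and vectors are made up for illustration, and a real store may use a different distance metric and an approximate index rather than a full scan.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=2):
    """indexed: list of (chunk_id, vector). Returns the k most similar chunk ids."""
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]


# Toy index: ids follow the "file:start_line" scheme used above.
toy_index = [
    ("users/models.py:10", [0.9, 0.1, 0.0]),
    ("billing/views.py:42", [0.0, 0.2, 0.9]),
    ("users/forms.py:5", [0.7, 0.3, 0.1]),
]

nearest = top_k([1.0, 0.0, 0.0], toy_index, k=2)
# nearest == ["users/models.py:10", "users/forms.py:5"]
```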

Chat Mode with Project Context

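from anthropic import Anthropic

# Relies on index_codebase() and find_relevant_code() from the indexing example above
class IDEChatAssistant: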
    """Full-featured chat assistant with project context"""

    def __init__(self, project_root: str):
        self.project_root = project_root
        self.client = Anthropic()
        index_codebase(project_root)

    async def chat(
        self,
        user_message: str,
        current_file: str,
        selected_code: str = None,
        conversation_history: list = None,
    ) -> str:

        # Gather context
        context_parts = []

        # Current file
        if current_file:
            with open(current_file) as f:
                file_content = f.read()
            context_parts.append(f"## Current file: {current_file}\n```python\n{file_content[:3000]}\n```")

        # Selected code
        if selected_code:
            context_parts.append(f"## Selected code\n```python\n{selected_code}\n```")

        # Similar code from codebase
        relevant = find_relevant_code(user_message, k=3)
        if relevant:
            context_parts.append("## Similar code from project\n" + "\n\n".join(relevant))

        system_prompt = f"""You are an experienced engineer working on the project.

{chr(10).join(context_parts)}

Rules:
- Write code in the style of the existing codebase
- Use the same dependencies already in the project
- Explain architectural decisions briefly
- If modifying existing code — show diff"""

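        # Copy the history so the caller's list is not mutated between calls
        messages = list(conversation_history or [])
        messages.append({"role": "user", "content": user_message})

        # Note: this synchronous client call blocks the event loop while it runs;
        # AsyncAnthropic with an awaited call avoids that if the IDE must stay responsive
        response = self.client.messages.create(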
            model="claude-sonnet-4-5",
            max_tokens=4096,
            system=system_prompt,
            messages=messages,
        )

        return response.content[0].text
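The class above caps the current file at a fixed 3000 characters. One possible refinement, shown here only as a sketch, is to give the whole context a single budget and fill it in priority order. The `pack_context` helper and its priority numbers are hypothetical, and a real implementation would more likely count tokens than characters.

```python
def pack_context(parts, budget_chars):
    """Pack context parts into a character budget, most important first.

    parts: list of (priority, text); a lower number means higher priority.
    The last part that does not fit completely is truncated. The blank-line
    separators are not counted against the budget.
    """
    packed, remaining = [], budget_chars
    for _, text in sorted(parts, key=lambda part: part[0]):
        if remaining <= 0:
            break
        piece = text[:remaining]
        packed.append(piece)
        remaining -= len(piece)
    return "\n\n".join(packed)


# Example: selected code (1) outranks the current file (2) and retrieved code (3).
example = pack_context([(2, "b" * 10), (1, "a" * 5), (3, "c" * 10)], budget_chars=12)
# example == "aaaaa\n\nbbbbbbb"
```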

Practical Case: Deployment in a 12-person Team

Starting state: the team used GitHub Copilot ($19/month per developer) and complained about irrelevant suggestions, because Copilot did not know the internal patterns of a Django project with 800+ models.

Solution: Continue.dev, a local Ollama instance for autocomplete, Claude via API for chat and refactoring, and a custom context provider backed by a codebase index.

Infrastructure: a server with an RTX 4090 running Qwen2.5-Coder 7B for autocomplete, plus the Claude API for complex requests.

Results after 2 months:

  • Inline suggestion acceptance: 23% (Copilot) → 41% (custom)
  • Average time to write typical CRUD endpoint: 52 min → 31 min
  • Tasks like "write test for this function": 100% manual → 70% automatic
  • Cost: $228/month (Copilot) → ~$85/month (Ollama server amortized + Claude API)

Key improvement factor: the context provider with a codebase index gave the model real examples from the project instead of abstract code.
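The figures above can be cross-checked with simple arithmetic. The snippet below only derives percentages from the numbers already quoted in this case; the variable names are illustrative, and the custom-stack cost is the approximate $85/month stated above.

```python
team_size = 12
copilot_per_seat = 19                            # USD per developer per month
copilot_monthly = team_size * copilot_per_seat   # 228, matches the quoted $228/month
custom_monthly = 85                              # approximate figure from the case

monthly_saving = copilot_monthly - custom_monthly                # 143
saving_pct = round(100 * monthly_saving / copilot_monthly, 1)    # 62.7

# Relative changes in the other two metrics
acceptance_lift_pct = round(100 * (41 - 23) / 23, 1)   # 78.3 (23% -> 41% acceptance)
crud_time_cut_pct = round(100 * (52 - 31) / 52, 1)     # 40.4 (52 min -> 31 min)
```

In relative terms, the acceptance rate rose by roughly 78%, the time per CRUD endpoint fell by about 40%, and the monthly bill dropped by about 63%.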

Local Models for Completion

For teams with code confidentiality requirements, a fully local stack is an option:

Model               | Size | Latency (RTX 3080) | Quality
--------------------|------|--------------------|--------
Qwen2.5-Coder 1.5B  | 1.5B | 50–80 ms           | Basic
Qwen2.5-Coder 7B    | 7B   | 150–250 ms         | Good
DeepSeek-Coder 6.7B | 6.7B | 140–230 ms         | Good
CodeLlama 13B       | 13B  | 350–500 ms         | High

For inline completion, latency under 200 ms is critical, because users notice longer delays. For that reason, FIM (fill-in-the-middle) completion typically uses models of up to 7B parameters.
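The 200 ms rule can be applied to the latency ranges from the table above. The sketch below is illustrative only: `MODELS` copies the table's figures, and the `fits_budget` helper with its best-case and worst-case modes is a hypothetical way of reading them.

```python
# (model, best-case latency in ms, worst-case latency in ms) on an RTX 3080
MODELS = [
    ("Qwen2.5-Coder 1.5B", 50, 80),
    ("Qwen2.5-Coder 7B", 150, 250),
    ("DeepSeek-Coder 6.7B", 140, 230),
    ("CodeLlama 13B", 350, 500),
]


def fits_budget(budget_ms, worst_case=False):
    """Return the models whose best-case (or worst-case) latency is under the budget."""
    column = 2 if worst_case else 1
    return [row[0] for row in MODELS if row[column] < budget_ms]


best_case_fit = fits_budget(200)
# best_case_fit == ["Qwen2.5-Coder 1.5B", "Qwen2.5-Coder 7B", "DeepSeek-Coder 6.7B"]
worst_case_fit = fits_budget(200, worst_case=True)
# worst_case_fit == ["Qwen2.5-Coder 1.5B"]
```

Read this way, the 7B-class models meet the budget only in their best case, while the 1.5B model stays under it across its whole range.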

Timeline

  • Continue.dev + model configuration + basic context providers: 2–3 days
  • Custom context providers (DB, Jira, documentation): 1 week
  • Codebase indexing + semantic search: 1–2 weeks
  • LSP server for non-standard editor: 2–3 weeks
  • Total with team onboarding: 3–5 weeks