Integrating Neural Networks with Bitrix24
ChatGPT is text. But neural networks can work with images, voice, classification, and predictions. When a marketer manually draws banners, when a supervisor re-listens to 40-minute calls instead of reading conversation summaries, when leads are distributed by departments "by eye" — these are tasks for specialized ML models connected to Bitrix24 via REST API.
General Architecture
The scheme is the same as when integrating ChatGPT, but with an expanded set of models:
Bitrix24 REST API → middleware server → API of neural networks (OpenAI, Replicate, custom models)
The middleware receives an event or call from Bitrix24, determines the task type, selects the model, forms a request, gets the result, and returns it to Bitrix24. One middleware can work with a dozen different models — routing by task type.
Call Transcription: Whisper
The most in-demand scenario after text generation. A manager spoke with a client for 20 minutes — the recording is stored in Bitrix24 as an audio file. A supervisor won't re-listen to every call. Whisper converts audio to text, and GPT makes a structured summary from the text.
Technical chain:
- Call completed → event
ONVOXIMPLANTCALLENDis sent to middleware. - Middleware gets the call recording via
voximplant.statistic.getand downloads the audio file. - Audio is sent to Whisper API (
POST /v1/audio/transcriptions). Parameters:model=whisper-1,language=ru,response_format=verbose_json(with timecodes). - Transcription is passed to GPT with prompt: "Extract key agreements, client questions, next steps. Format: JSON".
- Result is written to the call comment or to a custom deal field via
crm.timeline.comment.add.
Cost: Whisper — $0.006 per minute of audio. 20-minute call — $0.12. With 50 calls per day — about $6 per day. For most companies, this is many times cheaper than a supervisor's time to listen.
Image Generation: DALL-E and Stable Diffusion
A marketer needs a banner for a mailing, an illustration for a social media post, or a visual for a product card. Instead of a brief to a designer and waiting 2 days — a neural network request from the Bitrix24 interface.
Implementation:
-
Via chatbot. Marketer writes to bot: "Generate a banner for a -20% winter collection sale, minimalism style, 1200x628 format". Bot sends prompt to DALL-E 3 (
POST /v1/images/generations) or Stable Diffusion via Replicate API. Gets image URL, downloads, uploads to Bitrix24 Disk viadisk.file.uploadtofolder, sends preview in chat. - Via business process. When creating a marketing activity in CRM, visual options are automatically generated based on text and campaign parameters.
DALL-E 3 works well with conceptual images and illustrations. For photorealistic images and style control, we use Stable Diffusion XL via Replicate — more flexible in settings, supports ControlNet and LoRA models.
Classification and Sentiment Analysis
Text models solve tasks that are not just generation:
Lead classification by topic. A lead contains arbitrary request text. The model determines the category: "website development", "support", "integration", "hosting". Based on the category, middleware via crm.lead.update sets the appropriate direction and assigns the responsible department.
Sentiment analysis of inquiries. Emails and messages from open channels pass through a sentiment analysis model. If sentiment is negative — the inquiry gets higher priority, the supervisor is notified. Implemented via fine-tuned BERT-based model or via GPT prompt with classification instructions on a scale "positive / neutral / negative".
Entity extraction (NER). Structured data is extracted from email or form text: company name, INN, required timeline, budget. Lead fields are filled automatically via crm.lead.update. Saves the manager's time on manual card filling.
ML Pipelines via REST API
For complex scenarios, we build processing chains:
- Incoming email → text extraction → topic classification → sentiment determination → qualification → routing to responsible person.
- Completed call → transcription (Whisper) → summarization (GPT) → task extraction from conversation → automatic task creation in Bitrix24.
-
New product in catalog → description generation (GPT) → image generation (DALL-E) → publication in CRM catalog via
crm.product.update.
Each pipeline step is a separate API call with error handling. If Whisper returns an error — retry with exponential backoff. If GPT exceeds token limit — fallback to a smaller model.
Model Selection by Task
| Task | Model | API | Cost |
|---|---|---|---|
| Call transcription | Whisper | OpenAI | $0.006/min |
| Conversation summary | GPT-4o-mini | OpenAI | ~$0.01 per summary |
| Lead qualification | GPT-4o-mini | OpenAI | ~$0.005 per lead |
| Image generation | DALL-E 3 | OpenAI | $0.04–0.08 per image |
| Photorealistic images | SDXL | Replicate | ~$0.01 per image |
| Sentiment classification | BERT fine-tuned | Own server | Hosting cost |
| Entity extraction | GPT-4o-mini / spaCy | OpenAI / own server | $0.005 per request / hosting |
Implementation Timeline
| Scale | Includes | Timeline |
|---|---|---|
| Single scenario | Call transcription or image generation | 3–5 days |
| Complex | 2–3 scenarios, classification + transcription + generation | 1–2 weeks |
| ML pipeline | Full processing chain, custom models, monitoring | 3–5 weeks |
What We Implement
- Middleware server with request routing by task type and model
- Call transcription via Whisper with summary recording in CRM
- Marketing image generation from chat or business process
- Lead classification and routing by topic and sentiment
- Structured data extraction from emails and forms
- ML pipelines: data processing chains with automatic actions in Bitrix24
- Cost and quality monitoring: token spending dashboard, result logging







