AI Assistant Development in Mobile Application Based on Gemini (Google)
Gemini is the only top-tier model with a native Android SDK from Google. The Generative AI SDK (google-ai-android) integrates via Gradle without a server proxy, which simplifies getting started. But for production, direct mobile access is a mistake: the API key can be extracted from the APK. This tension between convenience and security should be resolved at the start of the project.
Google AI SDK: Android and iOS
On Android, the official path:
// build.gradle.kts
implementation("com.google.ai.client.generativeai:generativeai:0.9.0")

val model = GenerativeModel(
    modelName = "gemini-1.5-pro",
    apiKey = BuildConfig.GEMINI_API_KEY,
    generationConfig = generationConfig {
        temperature = 0.7f
        maxOutputTokens = 2048
        topK = 40
        topP = 0.95f
    },
    safetySettings = listOf(
        SafetySetting(HarmCategory.HARASSMENT, BlockThreshold.MEDIUM_AND_ABOVE)
    )
)
On iOS — GoogleGenerativeAI via Swift Package Manager. The API is identical; only the syntax differs.
For Flutter, the google_generative_ai package covers both platforms.
Multimodality: Gemini's Native Advantage
Gemini 1.5 Pro processes text, images, audio, video, and PDF in a single request, with a context window of up to 1 million tokens. For a mobile assistant this opens scenarios unavailable to other models: pass in a 30-minute video and request a summary, or upload a meeting recording for transcription with a summary.
Passing an image via the Android SDK:
val image = BitmapFactory.decodeResource(resources, R.drawable.photo)
val content = content {
    image(image)
    text("Describe what's happening in this photo")
}
val response = model.generateContent(content) // suspend — call from a coroutine
Files over 20 MB must be uploaded via the File API (POST https://generativelanguage.googleapis.com/upload/v1beta/files) rather than passed inline as base64. The File API stores a file for 48 hours and returns a file_uri to use in subsequent requests.
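The SDK does not wrap the File API, so the upload goes over REST. Below is a minimal sketch of the resumable upload flow as described in the REST docs, using only HttpURLConnection; the uploadFile helper and the regex-based JSON extraction are illustrative, not an official client, so verify headers and response shape against the current File API documentation.

```kotlin
import java.io.File
import java.net.HttpURLConnection
import java.net.URL

// Sketch of the File API resumable upload: start a session, then send bytes.
// Returns the file.uri to reference in later generateContent requests.
fun uploadFile(apiKey: String, file: File, mimeType: String): String {
    // Step 1: start a resumable upload session, sending metadata only.
    val start = URL("https://generativelanguage.googleapis.com/upload/v1beta/files?key=$apiKey")
        .openConnection() as HttpURLConnection
    start.requestMethod = "POST"
    start.setRequestProperty("X-Goog-Upload-Protocol", "resumable")
    start.setRequestProperty("X-Goog-Upload-Command", "start")
    start.setRequestProperty("X-Goog-Upload-Header-Content-Length", file.length().toString())
    start.setRequestProperty("X-Goog-Upload-Header-Content-Type", mimeType)
    start.setRequestProperty("Content-Type", "application/json")
    start.doOutput = true
    start.outputStream.use {
        it.write("""{"file": {"display_name": "${file.name}"}}""".toByteArray())
    }
    val uploadUrl = start.getHeaderField("X-Goog-Upload-URL")
        ?: error("No upload URL in response (HTTP ${start.responseCode})")

    // Step 2: send the bytes and finalize in a single request.
    val upload = URL(uploadUrl).openConnection() as HttpURLConnection
    upload.requestMethod = "POST"
    upload.setRequestProperty("X-Goog-Upload-Command", "upload, finalize")
    upload.setRequestProperty("X-Goog-Upload-Offset", "0")
    upload.doOutput = true
    file.inputStream().use { input -> upload.outputStream.use { input.copyTo(it) } }

    // The response JSON contains file.uri; a real client should parse JSON properly.
    val body = upload.inputStream.bufferedReader().readText()
    return Regex("\"uri\"\\s*:\\s*\"([^\"]+)\"").find(body)?.groupValues?.get(1)
        ?: error("file.uri not found in response")
}
```

Run the upload off the main thread (e.g., in Dispatchers.IO), and remember the 48-hour retention: a cached file_uri older than that must be re-uploaded.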
Streaming and Native Kotlin Coroutines
For streaming, the Gemini Android SDK returns Flow<GenerateContentResponse> — a natural fit for Kotlin coroutines:
viewModelScope.launch {
    model.generateContentStream(prompt).collect { chunk ->
        val text = chunk.text ?: return@collect
        _uiState.update { it + text }
    }
}
This is cleaner than manual SSE parsing. On iOS the analogue is AsyncThrowingStream<GenerateContentResponse, Error>.
Gemini vs Vertex AI: Production Choice
Google offers two paths:
- Google AI (Gemini API) — direct access and a simple start; for MVPs and small apps
- Vertex AI — the enterprise option with extras: fine-tuning, a corporate SLA, data not used for training, Google Cloud IAM integration
For a mobile app handling user data — Vertex AI behind a server proxy. For a prototype or a B2B tool without sensitive data, the Gemini API is sufficient.
The Vertex AI SDK for the JVM is com.google.cloud:google-cloud-aiplatform, but it requires service-account authentication, which implies a server layer.
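With that server layer in place, the app never holds a model API key at all — it authenticates the user against its own backend, which in turn calls Vertex AI with the service account. A minimal client-side sketch, where the /v1/assistant/chat endpoint and the session token are hypothetical names for your own backend:

```kotlin
import java.net.HttpURLConnection
import java.net.URL

// Sketch: the app sends the prompt to its own backend with a user session token
// (not a model API key); the backend forwards the request to Vertex AI.
fun askAssistant(backendBaseUrl: String, sessionToken: String, prompt: String): String {
    val conn = URL("$backendBaseUrl/v1/assistant/chat").openConnection() as HttpURLConnection
    conn.requestMethod = "POST"
    conn.setRequestProperty("Authorization", "Bearer $sessionToken") // user session, not an API key
    conn.setRequestProperty("Content-Type", "application/json")
    conn.doOutput = true
    conn.outputStream.use { it.write("""{"prompt": ${jsonString(prompt)}}""".toByteArray()) }
    return conn.inputStream.bufferedReader().readText()
}

// Minimal JSON string escaping, sufficient for this sketch.
fun jsonString(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""
```

The design point is that revoking a leaked credential now means invalidating one user's session, not rotating a key baked into every installed APK.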
Safety Settings and Content Filtering
Gemini has a built-in blocking system with four categories: HARASSMENT, HATE_SPEECH, SEXUALLY_EXPLICIT, DANGEROUS_CONTENT. The default threshold, BLOCK_MEDIUM_AND_ABOVE, is fairly aggressive. For medical or legal apps that need to discuss sensitive topics, the threshold can be lowered to BLOCK_ONLY_HIGH or BLOCK_NONE for specific categories.
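A sketch of per-category relaxation via the Android SDK's safetySettings (the medical-app framing is an illustrative assumption; the enum names follow the Kotlin SDK, where the REST thresholds BLOCK_ONLY_HIGH and BLOCK_MEDIUM_AND_ABOVE appear as ONLY_HIGH and MEDIUM_AND_ABOVE):

```kotlin
import com.google.ai.client.generativeai.GenerativeModel
import com.google.ai.client.generativeai.type.BlockThreshold
import com.google.ai.client.generativeai.type.HarmCategory
import com.google.ai.client.generativeai.type.SafetySetting

// Sketch: relax filtering for a hypothetical medical assistant that must
// discuss anatomy and medication, while keeping harassment at the default.
val medicalModel = GenerativeModel(
    modelName = "gemini-1.5-pro",
    apiKey = BuildConfig.GEMINI_API_KEY,
    safetySettings = listOf(
        SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, BlockThreshold.ONLY_HIGH),
        SafetySetting(HarmCategory.DANGEROUS_CONTENT, BlockThreshold.ONLY_HIGH),
        SafetySetting(HarmCategory.HARASSMENT, BlockThreshold.MEDIUM_AND_ABOVE)
    )
)
```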
Blocked content comes back as finishReason: SAFETY, not as an HTTP error — the field must be checked explicitly, otherwise the user gets an empty response with no explanation.
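A sketch of that check, distinguishing a safety block from a normal reply; showAnswer/showError are hypothetical UI hooks, and the candidate/finishReason property names follow the Kotlin SDK, so verify them against your SDK version:

```kotlin
import com.google.ai.client.generativeai.GenerativeModel
import com.google.ai.client.generativeai.type.FinishReason

// Hypothetical UI hooks for the sketch.
fun showAnswer(text: String) = println(text)
fun showError(message: String) = println("Error: $message")

// Sketch: surface a safety block to the user instead of an empty screen.
suspend fun askWithSafetyCheck(model: GenerativeModel, prompt: String) {
    val response = model.generateContent(prompt)
    val candidate = response.candidates.firstOrNull()
    when {
        candidate == null ->
            showError("Request was blocked before generation (check promptFeedback)")
        candidate.finishReason == FinishReason.SAFETY ->
            showError("The answer was blocked by the safety filter — try rephrasing")
        else -> showAnswer(response.text.orEmpty())
    }
}
```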
Timeline Estimates
A text assistant with the native SDK — about 1 week. A multimodal assistant with the File API, streaming, and a server proxy — 3–4 weeks.