Implementing AI-Powered Note-Taking (Summarization) in Mobile Applications
Note-taking looks simple at first: give LLM long text, get summary. In practice, text can be 10x longer than context window, arrive in chunks (from streaming source), or require structured output, not just paragraphs.
Long documents: chunking and map-reduce summarization
gpt-4o supports 128k token context, but pumping entire documents costs and slows. Standard pattern: map-reduce:
- Split document into 2000–3000 token chunks with ~200 token overlap
- Summarize each chunk independently (map)
- Summarize list of summaries into final summary (reduce)
// iOS
func summarizeDocument(_ text: String) async throws -> String {
let chunks = chunkText(text, maxTokens: 2500, overlap: 200)
// Parallel chunk summarization
let partialSummaries = try await withThrowingTaskGroup(of: String.self) { group in
for chunk in chunks {
group.addTask { try await self.summarizeChunk(chunk) }
}
var results = [String]()
for try await result in group { results.append(result) }
return results
}
// Final reduce
let combined = partialSummaries.joined(separator: "\n\n")
return try await summarizeChunk(combined, isFinal: true)
}
func chunkText(_ text: String, maxTokens: Int, overlap: Int) -> [String] {
// ~4 chars = 1 token for English (approximate)
let chunkSize = maxTokens * 3
let overlapSize = overlap * 3
var chunks = [String]()
var start = text.startIndex
while start < text.endIndex {
let end = text.index(start, offsetBy: chunkSize, limitedBy: text.endIndex) ?? text.endIndex
chunks.append(String(text[start..<end]))
guard let nextStart = text.index(start, offsetBy: chunkSize - overlapSize, limitedBy: text.endIndex) else { break }
start = nextStart
}
return chunks
}
withThrowingTaskGroup runs each chunk in parallel. For 10 chunks, 5–7x faster than sequential.
Output formats
Summaries can be several types. Prompts for each:
| Type | Prompt instruction |
|---|---|
| Brief summary | "Summarize in 3-5 sentences. Key points only." |
| Bullets | "Extract 5-8 key points as bullet list. Each point = one idea." |
| Mind-map JSON | "Return JSON: {title, branches: [{topic, subtopics: []}]}" |
| Q&A | "Generate 5 questions and answers based on the text." |
| Action items | "Extract only action items and deadlines. Format: - [Task]: [Deadline/Owner]" |
For structured output, use response_format: { type: "json_object" } in OpenAI API—model must return valid JSON without markdown wrapper.
let requestBody: [String: Any] = [
"model": "gpt-4o-mini",
"messages": messages,
"response_format": ["type": "json_object"],
"temperature": 0.2
]
Real-time summarization: lectures and meetings
If source is mic (lecture recording, meeting), summary builds on transcription. Flow:
AVAudioEngine → 30-sec fragments → SpeechRecognizer (Whisper API or native SFSpeechRecognizer) → accumulated transcript → summarization with rolling window.
// Summarize every 5 minutes of transcript with overlap
class LiveSummaryEngine {
private var transcript = ""
private var lastSummaryLength = 0
func onNewTranscript(_ chunk: String) {
transcript += " " + chunk
// Summarize new block when ~2000 words accumulated
let wordCount = transcript.split(separator: " ").count
if wordCount - lastSummaryLength > 2000 {
Task { await summarizeNewBlock() }
lastSummaryLength = wordCount
}
}
private func summarizeNewBlock() async {
let newContent = transcript.components(separatedBy: " ")
.dropFirst(max(0, lastSummaryLength - 200)) // 200-word overlap
.joined(separator: " ")
let summary = try? await llmService.summarize(newContent)
await MainActor.run { appendToNotes(summary ?? "") }
}
}
Android equivalent via SpeechRecognizer + MediaRecorder with chunking by RECOGNIZER_RESULT_STABILITY.
Storage and search of summaries
Summaries must be available offline and support search. iOS: Core Data or SwiftData with FTS via NSPersistentStoreDescription with SQLite FTS5. Android: Room with @Fts4 or @Fts5 annotation.
Semantic search (by meaning, not words): via vector embeddings stored locally in SQLite-VSS or server-side with pgvector. For mobile, server search with result caching suffices.
Timeline estimates
Basic document summarization via API—2–3 days. Map-reduce for long documents + multiple output formats—1.5 weeks. Live summarization with transcription—3–4 weeks.







