Implementing Data Anonymization/Pseudonymization in Mobile Apps
Anonymization and pseudonymization are two different tools with different legal consequences. Anonymized data falls outside the GDPR entirely: if a record cannot be linked back to a person with reasonable effort, the regulation no longer applies to it. Pseudonymized data is still personal data under the GDPR, although pseudonymization is recognized as a safeguard and can ease some obligations. Confusing the two is a typical design error.
Anonymization: When and How
True anonymization is rare in production systems, because fully anonymizing data usually destroys its value. It is justified in two scenarios: analytics aggregates (DAU, retention by cohort) and archived data whose retention period has expired.
k-anonymity is the basic method: a data set is k-anonymous if each record is indistinguishable from at least k-1 others on its quasi-identifiers (age, region, device). With k=5, if the "iOS, Moscow, 25-30" cohort contains fewer than 5 users, don't publish it.
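The cohort rule above can be sketched as a small suppression step that runs before aggregates are published. This is a minimal illustration, not a complete k-anonymity implementation: the `Cohort` type and the `publishableCohorts` function are hypothetical names, and the sketch assumes the quasi-identifiers have already been generalized.

```kotlin
// Quasi-identifiers of one analytics record. Field names are illustrative.
data class Cohort(val platform: String, val city: String, val ageRange: String)

/**
 * Counts users per cohort and drops every cohort smaller than k,
 * so each published row covers at least k indistinguishable users.
 */
fun publishableCohorts(records: List<Cohort>, k: Int): Map<Cohort, Int> {
    require(k >= 1) { "k must be at least 1" }
    return records
        .groupingBy { it }                      // group identical quasi-identifier tuples
        .eachCount()                            // cohort -> number of users
        .filterValues { count -> count >= k }   // suppress small cohorts
}
```

Suppression is the simplest option. An alternative is to generalize further (for example, widen the age range) until the cohort reaches k users.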
For mobile analytics, when exporting raw events to the data warehouse: generalize IP addresses to the /24 subnet, convert ages to ranges, drop precise coordinates, and replace device_id with a hashed ID that rotates daily.
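The generalization steps for the export could look roughly like the helpers below. All function names are hypothetical, and the bucket width and coordinate precision are assumptions chosen for illustration. Where the text says to remove precise coordinates, this sketch coarsens them instead, which is a weaker alternative when some location signal is still needed.

```kotlin
import kotlin.math.pow
import kotlin.math.round

// Zeroes the last octet of an IPv4 address, i.e. keeps only the /24 network part.
fun generalizeIpv4(ip: String): String {
    val octets = ip.split(".")
    val valid = octets.size == 4 && octets.all { part ->
        val n = part.toIntOrNull()
        n != null && n in 0..255
    }
    require(valid) { "not an IPv4 address: $ip" }
    return octets.take(3).joinToString(".") + ".0"
}

// Maps an exact age to a fixed-width range, e.g. 27 -> "25-29" for width 5.
fun ageRange(age: Int, width: Int = 5): String {
    require(age >= 0 && width > 0) { "age must be >= 0 and width > 0" }
    val lower = age / width * width
    return "$lower-${lower + width - 1}"
}

// Rounds a latitude or longitude to a few decimals (2 decimals is roughly 1 km).
fun coarsenCoordinate(value: Double, decimals: Int = 2): Double {
    val factor = 10.0.pow(decimals)
    return round(value * factor) / factor
}
```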
Pseudonymization in Practice
Pseudonymization replaces direct identifiers (email, phone, name) with reversible surrogates, while the key needed to reverse them is stored separately. In a backend context:
Data storage:
-- Main table: pseudonymized data only
CREATE TABLE users (
    id UUID PRIMARY KEY,
    pseudonym_id VARCHAR(32) NOT NULL UNIQUE  -- e.g. 'usr_a8f3c91d'; resolved to real data only via the vault
);
-- Vault table: lives in a separate DB, with keys held in a KMS
CREATE TABLE user_identity_vault (
    pseudonym_id VARCHAR(32) PRIMARY KEY,
    email_encrypted BYTEA,
    phone_encrypted BYTEA,
    name_encrypted BYTEA,
    encryption_key_id VARCHAR(255) NOT NULL  -- reference to a key in a KMS (AWS KMS, HashiCorp Vault)
);
The main DB handles 99% of operations using pseudonym_id alone. Real data is fetched from the vault only when necessary (sending an email, showing the name in the profile).
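To make the split concrete, here is a minimal in-memory sketch of the vault side, assuming AES-GCM field encryption with the JVM's standard `javax.crypto` API. `IdentityVault`, `encryptField`, `decryptField`, and `newLocalTestKey` are hypothetical names. In a real deployment the key would come from the KMS referenced by encryption_key_id rather than being generated locally, and the map would be the user_identity_vault table.

```kotlin
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.KeyGenerator
import javax.crypto.SecretKey
import javax.crypto.spec.GCMParameterSpec

const val IV_BYTES = 12   // recommended IV size for GCM
const val TAG_BITS = 128  // authentication tag length

// Encrypts one field; the random IV is prepended to the ciphertext.
fun encryptField(plaintext: String, key: SecretKey): ByteArray {
    val iv = ByteArray(IV_BYTES).also { SecureRandom().nextBytes(it) }
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, GCMParameterSpec(TAG_BITS, iv))
    return iv + cipher.doFinal(plaintext.toByteArray(Charsets.UTF_8))
}

// Reads the IV from the first bytes of the blob and decrypts the rest.
fun decryptField(blob: ByteArray, key: SecretKey): String {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, GCMParameterSpec(TAG_BITS, blob, 0, IV_BYTES))
    return String(cipher.doFinal(blob, IV_BYTES, blob.size - IV_BYTES), Charsets.UTF_8)
}

// Local stand-in for a KMS-managed key; for experiments only.
fun newLocalTestKey(): SecretKey =
    KeyGenerator.getInstance("AES").apply { init(256) }.generateKey()

// In-memory stand-in for user_identity_vault.
class IdentityVault(private val key: SecretKey) {
    private val emails = HashMap<String, ByteArray>()

    fun store(pseudonymId: String, email: String) {
        emails[pseudonymId] = encryptField(email, key)
    }

    // Called only on the rare paths that need the real value, e.g. sending an email.
    fun resolveEmail(pseudonymId: String): String? =
        emails[pseudonymId]?.let { decryptField(it, key) }
}
```

Everything outside the vault keeps passing pseudonym_id around. Only the code path that actually sends the email calls `resolveEmail`.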
Technical Methods at Application Level
Tokenization is used for card numbers and other sensitive financial data: the real value is replaced with a token, and the mapping is kept in an isolated token vault (which remains in PCI DSS scope). The mobile app works only with the token.
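The idea can be illustrated with a toy token vault. This is a sketch under simplifying assumptions: `TokenVault` and `maskedPan` are hypothetical names, the mapping lives in memory, and a real vault would be a separate, access-controlled service, usually run by the payment provider.

```kotlin
import java.security.SecureRandom

// In-memory stand-in for an isolated token vault.
class TokenVault {
    private val random = SecureRandom()
    private val tokenToPan = HashMap<String, String>()

    // Returns a random token with no mathematical relation to the card number.
    fun tokenize(pan: String): String {
        val bytes = ByteArray(16).also { random.nextBytes(it) }
        val token = "tok_" + bytes.joinToString("") { "%02x".format(it) }
        tokenToPan[token] = pan
        return token
    }

    // Only the payment backend inside the PCI DSS scope should be able to call this.
    fun detokenize(token: String): String? = tokenToPan[token]
}

// What the mobile app may display: the last four digits only.
fun maskedPan(pan: String): String = "**** " + pan.takeLast(4)
```

Because the token is random rather than derived from the card number, a leaked token reveals nothing without access to the vault.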
Hashing with salt for analytics IDs:
// Android: generate an anonymous analytics ID
import android.util.Base64
import java.security.MessageDigest

fun getAnalyticsId(userId: String, dailySalt: String): String {
    val input = "$userId:$dailySalt".toByteArray(Charsets.UTF_8)
    val digest = MessageDigest.getInstance("SHA-256").digest(input)
    // Keep the first 16 Base64 characters (96 bits of the hash)
    return Base64.encodeToString(digest, Base64.NO_WRAP).take(16)
}
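The daily salt means a user's analytics ID cannot be linked across days without knowing the salt, so the salt must stay secret and be discarded after use. This limits cross-day tracking, which is in the spirit of Apple ATT and the Android Privacy Sandbox, but it does not replace checking each platform's actual requirements.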
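A variant of the same idea, shown here as a sketch rather than as the author's method, swaps the plain salted hash for HMAC-SHA256 keyed with a secret and mixes the calendar date into the message. The function name `dailyAnalyticsId` and the `secret` parameter are assumptions. The sketch uses `java.util.Base64`, which is available on the JVM and on Android API 26+.

```kotlin
import java.time.LocalDate
import java.util.Base64
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

/**
 * Derives an analytics ID that is stable within one day and changes the next.
 * Without the secret, IDs from different days cannot be recomputed or linked.
 */
fun dailyAnalyticsId(userId: String, secret: ByteArray, day: LocalDate): String {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(SecretKeySpec(secret, "HmacSHA256"))
    // LocalDate.toString() yields the ISO date (yyyy-MM-dd), so the ID rotates daily.
    val digest = mac.doFinal("$userId:$day".toByteArray(Charsets.UTF_8))
    return Base64.getUrlEncoder().withoutPadding().encodeToString(digest).take(16)
}
```

A keyed construction matters when user IDs are guessable (sequential integers, for example): with a plain hash, anyone who learns the salt can enumerate IDs and rebuild the mapping.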
Data Retention and Auto-Deletion
Pseudonymization without a deletion policy is a half-measure. Each data category needs an explicit retention period recorded in the schema:
-- Set the retention deadline when the row is created
INSERT INTO user_events (user_id, event_type, data, delete_after)
VALUES (?, 'page_view', ?, NOW() + INTERVAL '90 days');
-- Background task (cron), option 1: hard-delete expired rows
DELETE FROM user_events WHERE delete_after < NOW();
-- Option 2, instead of DELETE: anonymize expired rows by nulling user_id
UPDATE user_events SET user_id = NULL WHERE delete_after < NOW() AND user_id IS NOT NULL;
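Anonymizing instead of deleting preserves statistics (event counts by type) while removing the link to the user. This holds only if the remaining columns, such as data, contain nothing that identifies the person.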
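The effect of the nulling approach can be checked with a small in-memory model of the user_events rows. The `UserEvent` type and the helper functions are hypothetical and only mirror the SQL logic. They are not meant to replace the database job.

```kotlin
import java.time.Duration
import java.time.Instant

// Simplified row of user_events; userId is nullable so it can be anonymized.
data class UserEvent(val userId: String?, val type: String, val deleteAfter: Instant)

// Mirrors NOW() + INTERVAL: the deadline is fixed when the event is created.
fun deleteAfter(createdAt: Instant, retention: Duration): Instant = createdAt.plus(retention)

// Mirrors UPDATE ... SET user_id = NULL WHERE delete_after < NOW().
fun anonymizeExpired(events: List<UserEvent>, now: Instant): List<UserEvent> =
    events.map { if (it.deleteAfter.isBefore(now)) it.copy(userId = null) else it }

// The aggregate that should survive anonymization: event count by type.
fun countByType(events: List<UserEvent>): Map<String, Int> =
    events.groupingBy { it.type }.eachCount()
```

Running `countByType` before and after `anonymizeExpired` gives the same result, which is the property the nulling approach relies on.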
Mobile Client Specifics
Never cache decrypted personal data in UserDefaults or SharedPreferences on the client. If the profile must be available offline, store it encrypted with AES-GCM, with the key held in the Keychain or Android Keystore (hardware-backed, in the TEE, where the device supports it). On logout, delete the encrypted blob.
Pseudonymization on the client: analytics events should carry only analytics_id (hashed, rotating), never user_id. If an analytics SDK (Firebase, Amplitude) requires a user_id, send the pseudonym, not the real identifier.
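One way to enforce this rule is a single choke point that every event passes through before it reaches the SDK. The sketch below is an assumption about how such a guard might look: `AnalyticsEvent`, `buildEvent`, and the list of forbidden keys are hypothetical and not part of any analytics SDK.

```kotlin
// Property keys that must never reach an analytics SDK (illustrative list).
val FORBIDDEN_KEYS = setOf("user_id", "email", "phone", "name")

// The only identifier on an event is the rotating analytics ID.
data class AnalyticsEvent(
    val name: String,
    val analyticsId: String,
    val properties: Map<String, String>
)

// Builds an event and strips direct identifiers from its properties.
fun buildEvent(
    name: String,
    analyticsId: String,
    rawProperties: Map<String, String>
): AnalyticsEvent =
    AnalyticsEvent(
        name = name,
        analyticsId = analyticsId,
        properties = rawProperties.filterKeys { it.lowercase() !in FORBIDDEN_KEYS }
    )
```

A key-based filter only catches identifiers stored under known names. Free-text values still need review.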
Timeline: 2–3 days, covering pseudonymization schema design, vault layer implementation, retention job setup, and the analytics layer update on the client.







