normogen/thoughts/research/2026-02-14-mongodb-schema-decision.md
goose 4dca44dbbe Research: MongoDB schema design complete
- Zero-knowledge encryption for ALL sensitive data + metadata
- Blood pressure example: value + type + unit ALL encrypted
- 9 collections: users, families, profiles, health_data, lab_results, medications, appointments, shares, refresh_tokens
- Client-side encryption (AES-256-GCM, PBKDF2)
- Server NEVER decrypts data
- Privacy-preserving queries (plaintext fields: userId, profileId, familyId, date, tags)
- Tagging system for encrypted data search
- Date range queries (plaintext dates)

Key principle:
- Both value AND metadata encrypted (e.g., "blood_pressure" + "120/80")
- No plaintext metadata leaks
- Server stores ONLY encrypted data

Updated tech stack decisions with MongoDB schema

All major research complete (Rust, Mobile, Web, State, Auth, Database)

Next: Backend development (Axum + MongoDB)
2026-02-14 13:39:57 -03:00

MongoDB Schema Design Decision Summary

Date: 2026-02-14
Decision: Zero-Knowledge Encryption for All Sensitive Data + Metadata


Core Principle

ALL sensitive data AND metadata must be encrypted client-side before reaching MongoDB.

Example: Blood Pressure Reading

Before encryption (client-side):

{
  value: "120/80",
  type: "blood_pressure",
  unit: "mmHg",
  date: "2026-02-14T10:30:00Z"
}

After encryption (stored in MongoDB):

{
  healthDataId: "health-123",
  userId: "user-456",
  profileId: "profile-789",
  familyId: "family-012",
  
  // Encrypted (value + metadata)
  healthData: [
    {
      encrypted: true,
      data: "a1b2c3d4...",
      iv: "e5f6g7h8...",
      authTag: "i9j0k1l2..."
    }
  ],
  
  // Non-sensitive fields kept in plaintext for sorting and analytics
  createdAt: ISODate("2026-02-14T10:30:00Z"),
  updatedAt: ISODate("2026-02-14T10:30:00Z"),
  dataSource: "healthKit"
}

Collections Summary

| Collection | Purpose | Encrypted Fields | Plaintext Fields |
|---|---|---|---|
| users | Authentication | encryptedRecoveryPhrase | userId, email, passwordHash, tokenVersion, familyId, familyRole, permissions |
| families | Family structure | familyName, familyMetadata | familyId, members[*].userId, members[*].profileId, members[*].role |
| profiles | Person profiles | profileName, profileMetadata | profileId, userId, familyId, profileType |
| health_data | Health records | healthData[*] (value + metadata) | healthDataId, userId, profileId, familyId, createdAt, updatedAt, dataSource |
| lab_results | Lab tests | labData (value + metadata), labMetadata | labResultId, userId, profileId, familyId, createdAt, updatedAt, dataSource |
| medications | Medication tracking | medicationData (value + metadata), reminderSchedule | medicationId, userId, profileId, familyId, active, createdAt, updatedAt |
| appointments | Medical appointments | appointmentData (value + metadata), reminderSettings | appointmentId, userId, profileId, familyId, createdAt, updatedAt |
| shares | Shared data | encryptedData (share-specific password) | shareId, userId, documentId, collectionName, createdAt, expiresAt, accessCount, isRevoked |
| refresh_tokens | JWT tokens | None | jti, userId, createdAt, expiresAt, revoked |
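
Because the server can only filter on the plaintext columns above, any index has to be built on those fields. The following is a minimal sketch of what an index plan might look like, not a decided schema: the specific key combinations, the unique constraints, and the TTL index on refresh_tokens.expiresAt are all assumptions for illustration. It uses the Node.js driver style `db.collection(name).createIndex(keys, options)`; in mongosh the equivalent call would be `db.<collection>.createIndex(keys, options)`.

```javascript
// Hypothetical index plan: every key is a plaintext field from the table above.
// The chosen combinations and options are assumptions, not part of the decision.
const INDEX_PLAN = [
  { collection: 'users', keys: { email: 1 }, options: { unique: true } },
  { collection: 'health_data', keys: { userId: 1, profileId: 1, createdAt: -1 }, options: {} },
  { collection: 'lab_results', keys: { userId: 1, profileId: 1, createdAt: -1 }, options: {} },
  { collection: 'medications', keys: { userId: 1, profileId: 1, active: 1 }, options: {} },
  { collection: 'shares', keys: { shareId: 1 }, options: { unique: true } },
  { collection: 'refresh_tokens', keys: { jti: 1 }, options: { unique: true } },
  // Assumed TTL index: MongoDB removes each token once its expiresAt date has passed.
  { collection: 'refresh_tokens', keys: { expiresAt: 1 }, options: { expireAfterSeconds: 0 } },
];

// Applies the plan against any object exposing the driver-style
// db.collection(name).createIndex(keys, options) call.
async function createIndexes(db, plan = INDEX_PLAN) {
  const names = [];
  for (const { collection, keys, options } of plan) {
    names.push(await db.collection(collection).createIndex(keys, options));
  }
  return names;
}
```

Keeping the plan as data makes it easy to review that no index references an encrypted field.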

Encryption Strategy

Client-Side Encryption

Encryption Flow:

  1. User enters health data
  2. Client derives encryption key from password (PBKDF2)
  3. Client encrypts health data (AES-256-GCM)
  4. Client sends encrypted data to server
  5. Server stores encrypted data in MongoDB
  6. Server NEVER decrypts data

What Must Be Encrypted

  • Health data values (e.g., "120/80")
  • Health data metadata (e.g., "blood_pressure", "mmHg")
  • Lab test results (e.g., "cholesterol", "200", "LabCorp")
  • Medication data (e.g., "Aspirin", "100mg", "daily")
  • Appointment data (e.g., "checkup", "Dr. Smith")
  • Profile data (e.g., "John Doe", "1990-01-01")
  • Family data (e.g., "Smith Family", "123 Main St")

What Can Be Plaintext

  • Identifiers (userId, profileId, familyId) - for queries
  • Email addresses - for authentication
  • Dates (createdAt, updatedAt) - for sorting and range queries
  • Data sources (healthKit, googleFit) - for analytics
  • Tags (cardio, daily) - for server-side filtering of encrypted records
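
One way to enforce the two lists above is an allowlist: only fields explicitly named as plaintext stay readable, and everything else goes into the encrypted payload by default. The sketch below is hypothetical. The `PLAINTEXT_FIELDS` list, the `toStoredDocument` name, and the choice to pass the encryption routine in as an `encrypt` callback are assumptions, and it treats the reading's own date as sensitive while stamping createdAt and updatedAt separately.

```javascript
// Fields this document allows in plaintext; anything not listed is treated as sensitive.
const PLAINTEXT_FIELDS = ['healthDataId', 'userId', 'profileId', 'familyId', 'dataSource', 'tags'];

// Splits a client-side record into plaintext query fields and one encrypted payload.
// `encrypt` is any function that turns an object into an encrypted envelope.
function toStoredDocument(record, encrypt, now = new Date()) {
  const doc = {};
  const sensitive = {};
  for (const [field, value] of Object.entries(record)) {
    if (PLAINTEXT_FIELDS.includes(field)) {
      doc[field] = value;
    } else {
      sensitive[field] = value; // default: encrypt unless explicitly allowed
    }
  }
  doc.healthData = [encrypt(sensitive)];
  doc.createdAt = now;
  doc.updatedAt = now;
  return doc;
}
```

With an allowlist, a new field added to a record later is encrypted unless someone deliberately marks it as plaintext, which fails in the safe direction.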

Privacy-Preserving Queries

Query by plaintext fields only:

const healthData = await db.health_data.find({
  userId: 'user-123',    // Plaintext ✅
  profileId: 'profile-456',  // Plaintext ✅
  familyId: 'family-789'     // Plaintext ✅
}).toArray();

// Client decrypts healthData[i].healthData[j]

The client attaches plaintext tags to each encrypted record so the server can filter without decrypting anything:

const healthData = await db.health_data.find({
  userId: 'user-123',
  tags: { $in: ['cardio', 'daily'] }  // Plaintext tags ✅
}).toArray();

// Client decrypts healthData[i].healthData[j]

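Date Range Queries (Plaintext Dates)

Store dates as plaintext so the server can evaluate range queries. The filter below uses a date field, while the stored health_data example lists createdAt and updatedAt, so the field name should match whichever plaintext date the collection actually stores: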

const healthData = await db.health_data.find({
  userId: 'user-123',
  date: {
    $gte: ISODate("2026-02-01"),
    $lte: ISODate("2026-02-28")
  }
}).toArray();

// Client decrypts healthData[i].healthData[j]
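
The `// Client decrypts healthData[i].healthData[j]` step that follows each query can be sketched as below. Because type, value, and unit are encrypted, any filter on them (for example "only blood pressure readings") has to run on the client after decryption. The function names and the `decrypt` callback are hypothetical; `decrypt` stands for whatever routine reverses the client-side encryption.

```javascript
// Decrypts every envelope in every returned document and flattens the result.
// `docs` is the array returned by the query; `decrypt` reverses the client-side encryption.
function decryptHealthData(docs, decrypt) {
  return docs.flatMap((doc) =>
    doc.healthData.map((envelope) => ({
      ...decrypt(envelope),
      // Plaintext fields are copied last so the server-stored identifiers win.
      healthDataId: doc.healthDataId,
      createdAt: doc.createdAt,
    }))
  );
}

// Filtering on an encrypted field is only possible after decryption, on the client.
function filterByType(readings, type) {
  return readings.filter((reading) => reading.type === type);
}
```

This keeps the server limited to coarse filtering on plaintext fields, at the cost of transferring and decrypting records the client may then discard.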

Technology Stack

Backend (Axum + MongoDB)

  • Axum 0.7.x: Web framework
  • MongoDB 6.0+: Database
  • Rust: Server language

Client (React Native + React)

  • AES-256-GCM: Encryption algorithm
  • PBKDF2: Key derivation function
  • Crypto API: Node.js crypto / react-native-quick-crypto

Implementation Timeline

  • Week 1: Create MongoDB indexes
  • Week 1-2: Implement client-side encryption (React Native + React)
  • Week 2-3: Implement server-side API (Axum + MongoDB)
  • Week 3: Test encryption flow
  • Week 3-4: Test data migration (key rotation)
  • Week 4: Test privacy-preserving queries
  • Week 4-5: Performance testing

Total: 4-5 weeks

Next Steps

  1. Create MongoDB indexes for all collections
  2. Implement client-side encryption (React Native + React)
  3. Implement server-side API (Axum + MongoDB)
  4. Test encryption flow (end-to-end)
  5. Test data migration (key rotation)
  6. Test privacy-preserving queries
  7. Performance testing
  8. Create API documentation
