normogen/thoughts/research/2026-02-14-mongodb-schema-decision.md
goose 4dca44dbbe Research: MongoDB schema design complete
- Zero-knowledge encryption for ALL sensitive data + metadata
- Blood pressure example: value + type + unit ALL encrypted
- 9 collections: users, families, profiles, health_data, lab_results, medications, appointments, shares, refresh_tokens
- Client-side encryption (AES-256-GCM, PBKDF2)
- Server NEVER decrypts data
- Privacy-preserving queries (plaintext fields: userId, profileId, familyId, date, tags)
- Tagging system for encrypted data search
- Date range queries (plaintext dates)

Key principle:
- Both value AND metadata encrypted (e.g., "blood_pressure" + "120/80")
- No plaintext metadata leaks
- Server stores ONLY encrypted data

Updated tech stack decisions with MongoDB schema

All major research complete (Rust, Mobile, Web, State, Auth, Database)

Next: Backend development (Axum + MongoDB)
2026-02-14 13:39:57 -03:00


# MongoDB Schema Design Decision Summary
**Date**: 2026-02-14

**Decision**: **Zero-Knowledge Encryption for All Sensitive Data + Metadata**

---
## Core Principle
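**All sensitive data and its descriptive metadata (type, unit, names) must be encrypted client-side before it reaches MongoDB.** Only identifiers, timestamps, and a few non-sensitive fields needed for queries stay in plaintext.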
### Example: Blood Pressure Reading
**Before encryption** (client-side):
```javascript
{
  value: "120/80",
  type: "blood_pressure",
  unit: "mmHg",
  date: "2026-02-14T10:30:00Z"
}
```
**After encryption** (stored in MongoDB):
```javascript
{
  healthDataId: "health-123",
  userId: "user-456",
  profileId: "profile-789",
  familyId: "family-012",
  // Encrypted (value + metadata)
  healthData: [
    {
      encrypted: true,
      data: "a1b2c3d4...",
      iv: "e5f6g7h8...",
      authTag: "i9j0k1l2..."
    }
  ],
  // Non-sensitive bookkeeping fields (plaintext)
  createdAt: ISODate("2026-02-14T10:30:00Z"),
  updatedAt: ISODate("2026-02-14T10:30:00Z"),
  dataSource: "healthKit"
}
```
---
## Collections Summary
| Collection | Purpose | Encrypted Fields | Plaintext Fields |
|-----------|---------|------------------|-----------------|
| **users** | Authentication | encryptedRecoveryPhrase | userId, email, passwordHash, tokenVersion, familyId, familyRole, permissions |
| **families** | Family structure | familyName, familyMetadata | familyId, members[*].userId, members[*].profileId, members[*].role |
| **profiles** | Person profiles | profileName, profileMetadata | profileId, userId, familyId, profileType |
| **health_data** | Health records | healthData[*] (value + metadata) | healthDataId, userId, profileId, familyId, date, tags, createdAt, updatedAt, dataSource |
| **lab_results** | Lab tests | labData (value + metadata), labMetadata | labResultId, userId, profileId, familyId, createdAt, updatedAt, dataSource |
| **medications** | Medication tracking | medicationData (value + metadata), reminderSchedule | medicationId, userId, profileId, familyId, active, createdAt, updatedAt |
| **appointments** | Medical appointments | appointmentData (value + metadata), reminderSettings | appointmentId, userId, profileId, familyId, createdAt, updatedAt |
| **shares** | Shared data | encryptedData (share-specific password) | shareId, userId, documentId, collectionName, createdAt, expiresAt, accessCount, isRevoked |
| **refresh_tokens** | JWT tokens | None | jti, userId, createdAt, expiresAt, revoked |
---
## Encryption Strategy
### Client-Side Encryption
**Encryption Flow**:
1. User enters health data
2. Client derives encryption key from password (PBKDF2)
3. Client encrypts health data (AES-256-GCM)
4. Client sends encrypted data to server
5. Server stores encrypted data in MongoDB
6. Server NEVER decrypts data
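
Steps 2 to 4 of this flow can be sketched with Node's built-in `crypto` module. This is a minimal illustration, not the project's implementation: the PBKDF2 hash (SHA-256), the iteration count, the salt and IV sizes, the hex encoding, and the helper names `deriveKey` and `encryptRecord` are all assumptions, since this document does not specify them.

```javascript
const crypto = require('node:crypto');

// Assumed parameters (not specified in this document): SHA-256 as the PBKDF2
// hash, 600,000 iterations, a 16-byte per-user salt, and a 12-byte GCM IV.
const PBKDF2_ITERATIONS = 600000;
const KEY_LENGTH_BYTES = 32; // 256-bit key for AES-256-GCM

// Step 2: derive the encryption key from the user's password.
function deriveKey(password, salt, iterations = PBKDF2_ITERATIONS) {
  return crypto.pbkdf2Sync(password, salt, iterations, KEY_LENGTH_BYTES, 'sha256');
}

// Step 3: encrypt the whole reading (value AND metadata) as one JSON blob.
function encryptRecord(record, key) {
  const iv = crypto.randomBytes(12); // must never repeat under the same key
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(JSON.stringify(record), 'utf8'),
    cipher.final(),
  ]);
  return {
    encrypted: true,
    data: ciphertext.toString('hex'),
    iv: iv.toString('hex'),
    authTag: cipher.getAuthTag().toString('hex'),
  };
}

// Step 4: this envelope is what the client would send to the server.
const salt = crypto.randomBytes(16); // not secret; stored so the key can be re-derived
const key = deriveKey('correct horse battery staple', salt);
const envelope = encryptRecord(
  { value: '120/80', type: 'blood_pressure', unit: 'mmHg' },
  key
);
console.log(envelope);
```

Because `type` and `unit` are serialized into the same blob as `value`, the server only ever sees the `data`, `iv`, and `authTag` strings, which matches the stored-document shape shown earlier.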
### What Must Be Encrypted
- **Health data values** (e.g., "120/80")
- **Health data metadata** (e.g., "blood_pressure", "mmHg")
- **Lab test results** (e.g., "cholesterol", "200", "LabCorp")
- **Medication data** (e.g., "Aspirin", "100mg", "daily")
- **Appointment data** (e.g., "checkup", "Dr. Smith")
- **Profile data** (e.g., "John Doe", "1990-01-01")
- **Family data** (e.g., "Smith Family", "123 Main St")
### What Can Be Plaintext
- **User IDs** (userId, profileId, familyId) - for queries
- **Email addresses** - for authentication
- **Dates** (date, createdAt, updatedAt) - for sorting and range queries
- **Data sources** (healthKit, googleFit) - for analytics
- **Tags** (cardio, daily) - for server-side filtering of encrypted records
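
One way to keep the two lists above from drifting apart is a client-side check that runs before a document is sent. The sketch below is illustrative only: the allow-list is copied from the plaintext fields this document names for `health_data` (including the `date` and `tags` fields used by the queries), and the helper names `isEnvelope` and `findPlaintextLeaks` are hypothetical.

```javascript
// Hypothetical allow-list for health_data documents, taken from the plaintext
// fields named in this document. It runs before insert, so `_id` is not listed.
const HEALTH_DATA_PLAINTEXT_FIELDS = new Set([
  'healthDataId', 'userId', 'profileId', 'familyId',
  'date', 'tags', 'createdAt', 'updatedAt', 'dataSource',
]);
const ENCRYPTED_FIELD = 'healthData';

// True if the value has the { encrypted, data, iv, authTag } envelope shape.
function isEnvelope(x) {
  return x !== null && typeof x === 'object' && x.encrypted === true &&
    typeof x.data === 'string' && typeof x.iv === 'string' &&
    typeof x.authTag === 'string';
}

// Returns a list of problems; an empty list means nothing unexpected is exposed.
function findPlaintextLeaks(doc) {
  const problems = [];
  for (const [field, value] of Object.entries(doc)) {
    if (field === ENCRYPTED_FIELD) {
      if (!Array.isArray(value) || !value.every(isEnvelope)) {
        problems.push(`${field}: expected an array of encrypted envelopes`);
      }
    } else if (!HEALTH_DATA_PLAINTEXT_FIELDS.has(field)) {
      problems.push(`${field}: not on the plaintext allow-list`);
    }
  }
  return problems;
}

// A document that accidentally carries the reading type in the clear.
const leaky = { userId: 'user-456', type: 'blood_pressure', healthData: [] };
console.log(findPlaintextLeaks(leaky)); // reports the `type` field
```

A check like this only validates field names and envelope shape; it cannot tell whether a value inside an allowed plaintext field (for example a tag) is itself sensitive.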
---
## Privacy-Preserving Queries
### 1. Plaintext Queries (Recommended)
**Query by plaintext fields only**:
```javascript
const healthData = await db.health_data.find({
  userId: 'user-123',       // Plaintext ✅
  profileId: 'profile-456', // Plaintext ✅
  familyId: 'family-789'    // Plaintext ✅
}).toArray();
// Client decrypts healthData[i].healthData[j]
```
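The client-side decryption step mentioned in the comment above might look like the following sketch. It assumes the envelope fields are hex-encoded and that each ciphertext is a JSON-serialized reading; neither detail is fixed by this document, and `decryptEnvelope` and `decryptHealthData` are hypothetical helper names.

```javascript
const crypto = require('node:crypto');

// Decrypt one { data, iv, authTag } envelope (hex encoding is an assumption).
function decryptEnvelope(envelope, key) {
  const decipher = crypto.createDecipheriv(
    'aes-256-gcm',
    key,
    Buffer.from(envelope.iv, 'hex')
  );
  decipher.setAuthTag(Buffer.from(envelope.authTag, 'hex'));
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(envelope.data, 'hex')),
    decipher.final(), // throws if the ciphertext or tag was modified
  ]);
  return JSON.parse(plaintext.toString('utf8'));
}

// Flatten query results into plaintext readings, keeping the plaintext
// identifier and timestamp next to the decrypted fields.
function decryptHealthData(documents, key) {
  return documents.flatMap((doc) =>
    doc.healthData.map((envelope) => ({
      healthDataId: doc.healthDataId,
      createdAt: doc.createdAt,
      ...decryptEnvelope(envelope, key),
    }))
  );
}
```

Since GCM authenticates the ciphertext, a record that was altered on the server fails in `decipher.final()` instead of producing garbage, so the client can surface tampering explicitly.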
### 2. Tagging System (Encrypted Search)
**The client attaches plaintext tags to each encrypted record, and the server filters on them without decrypting anything**:
```javascript
const healthData = await db.health_data.find({
  userId: 'user-123',
  tags: { $in: ['cardio', 'daily'] } // Plaintext tags ✅
}).toArray();
// Client decrypts healthData[i].healthData[j]
```
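Plaintext tags such as `cardio` reveal the category of a record to the server. If that is judged too revealing, one possible variation (not part of this decision, shown only as a sketch) is to store a keyed hash of each tag instead, so the server can still match tags by equality without seeing their names. The helpers `blindTag` and `buildTagFilter`, and the separate client-held `tagKey`, are hypothetical.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: turn a tag into an opaque, deterministic token.
// The same tag and key always give the same token, so equality queries work.
function blindTag(tag, tagKey) {
  return crypto
    .createHmac('sha256', tagKey)
    .update(tag.trim().toLowerCase(), 'utf8')
    .digest('hex');
}

// Build the same filter shape as the query above, with blinded tags.
function buildTagFilter(userId, tags, tagKey) {
  return { userId, tags: { $in: tags.map((tag) => blindTag(tag, tagKey)) } };
}

// Assumption: tagKey is a client-side secret distinct from the data key.
const tagKey = crypto.randomBytes(32);
console.log(buildTagFilter('user-123', ['cardio', 'daily'], tagKey));
```

The trade-off is that the server still learns which records share a tag and how often each token occurs; it only loses the human-readable label.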
### 3. Date Range Queries (Plaintext Dates)
**Store dates as plaintext** (for range queries):
```javascript
const healthData = await db.health_data.find({
  userId: 'user-123',
  date: {
    $gte: ISODate("2026-02-01"),
    $lt: ISODate("2026-03-01") // Exclusive upper bound, so all of Feb 28 is included
  }
}).toArray();
// Client decrypts healthData[i].healthData[j]
```
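The three query shapes above would typically be backed by indexes on the plaintext fields. The following sketch uses the Node.js driver's `createIndex`; the specific key orders, the TTL index on `refresh_tokens.expiresAt`, and the names `INDEX_SPECS` and `ensureIndexes` are assumptions for illustration, not decisions recorded in this document.

```javascript
// Hypothetical index plan derived from the queries in this section.
const INDEX_SPECS = [
  // Plaintext-ID lookups and date-range scans per user/profile.
  { collection: 'health_data', keys: { userId: 1, profileId: 1, date: -1 }, options: {} },
  // Tag filtering per user (multikey index, since `tags` is an array).
  { collection: 'health_data', keys: { userId: 1, tags: 1 }, options: {} },
  // Token lookup by JWT ID.
  { collection: 'refresh_tokens', keys: { jti: 1 }, options: { unique: true } },
  // TTL index: MongoDB removes a token once its `expiresAt` Date has passed.
  { collection: 'refresh_tokens', keys: { expiresAt: 1 }, options: { expireAfterSeconds: 0 } },
];

// `db` is expected to be a connected Db handle from the official `mongodb` driver.
async function ensureIndexes(db) {
  await Promise.all(
    INDEX_SPECS.map(({ collection, keys, options }) =>
      db.collection(collection).createIndex(keys, options)
    )
  );
}
```

`createIndex` is a no-op when an identical index already exists, so a function like this can run at every server start. The TTL index only works if `expiresAt` is stored as a BSON date.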
---
## Technology Stack
### Backend (Axum + MongoDB)
- **Axum 0.7.x**: Web framework
- **MongoDB 6.0+**: Database
- **Rust**: Server language
### Client (React Native + React)
- **AES-256-GCM**: Encryption algorithm
- **PBKDF2**: Key derivation function
- **Crypto API**: Node.js crypto / react-native-quick-crypto
---
## Implementation Timeline
- **Week 1**: Create MongoDB indexes
- **Week 1-2**: Implement client-side encryption (React Native + React)
- **Week 2-3**: Implement server-side API (Axum + MongoDB)
- **Week 3**: Test encryption flow
- **Week 3-4**: Test data migration (key rotation)
- **Week 4**: Test privacy-preserving queries
- **Week 4-5**: Performance testing

**Total**: 4-5 weeks

---
## Next Steps
1. Create MongoDB indexes for all collections
2. Implement client-side encryption (React Native + React)
3. Implement server-side API (Axum + MongoDB)
4. Test encryption flow (end-to-end)
5. Test data migration (key rotation)
6. Test privacy-preserving queries
7. Performance testing
8. Create API documentation
---
## References
- [Comprehensive MongoDB Schema Research](./2026-02-14-mongodb-schema-design-research.md)
- [Normogen Encryption Guide](../encryption.md)
- [JWT Authentication Research](./2026-02-14-jwt-authentication-research.md)
- [Technology Stack Decisions](./2026-02-14-tech-stack-decision.md)