# Learning Information Collection: Overall Design

## 1. Overview

The M8 milestone closes the loop for collecting learning-behavior data, from the iOS client (via the Rust document runtime) to the API server.

### Data flow

```
iOS App
  → Rust zx_document_core (ReadingEventV2)
  → iOS adapter layer (adds readingTargetType/platform/appVersion/timezone)
  → POST /reading/events (batch upload)
  → ReadingEventProcessorService (validate / deduplicate / aggregate)
  → LearningSession / MaterialReadingProgress / DailyLearningActivity / LearningRecord
  → Query endpoints (progress / continue learning / summary / trend / heatmap / history)
```

## 2. readingTargetType

The Rust side does not store `readingTargetType`; the iOS adapter layer adds it at upload time.

| readingTargetType | materialId maps to | knowledgeBaseId |
|---|---|---|
| `knowledge_source` | `KnowledgeSource.id` | `KnowledgeSource.knowledgeBaseId` |
| `temporary_file` | `TemporaryReadingMaterial.id` | `null` (may be filled in later) |

### Fields added by iOS at upload time

```typescript
// When building the upload request, the iOS adapter layer does:
const item = {
  eventId: rustEvent.eventId,
  clientSessionId: rustEvent.clientSessionId,
  materialId: rustEvent.materialId,
  eventType: rustEvent.eventType,
  position: rustEvent.position,
  activeSecondsDelta: rustEvent.activeSecondsDelta,
  clientTimestampMs: rustEvent.timestampMs,
  sequence: rustEvent.sequence,
  // Fields added by iOS:
  readingTargetType: resolveTargetType(rustEvent.materialId), // 'knowledge_source' | 'temporary_file'
  platform: 'ios',
  appVersion: getAppVersion(),
  clientTimezoneOffsetMinutes: getTimezoneOffset(),
};
```

## 3. Entity mapping

### 3.1 New tables

#### ReadingEvent (raw event log)

```prisma
model ReadingEvent {
  id                          String    @id @default(cuid())
  userId                      String
  eventId                     String
  clientSessionId             String
  readingTargetType           String    @db.VarChar(32)
  materialId                  String
  knowledgeBaseId             String?
  eventType                   String    @db.VarChar(32)
  position                    Json?
  activeSecondsDelta          Int       @default(0)
  clientTimestampMs           BigInt
  clientTimezoneOffsetMinutes Int?
  sequence                    Int
  platform                    String?   @db.VarChar(16)
  appVersion                  String?   @db.VarChar(32)
  status                      String    @default("pending") @db.VarChar(32)
  errorCode                   String?   @db.VarChar(32)
  warningCodes                Json?
  serverReceivedAt            DateTime  @default(now())
  processedAt                 DateTime?
  createdAt                   DateTime  @default(now())

  user User @relation(fields: [userId], references: [id])

  @@unique([userId, eventId])
  @@index([userId, clientSessionId])
  @@index([userId, readingTargetType, materialId, clientTimestampMs])
  @@index([status, createdAt])
  @@index([userId, createdAt])
}
```

#### MaterialReadingProgress (per-material reading progress)

```prisma
model MaterialReadingProgress {
  id                  String    @id @default(cuid())
  userId              String
  materialId          String    // ID of the associated material
  readingTargetType   String    @db.VarChar(32)
  knowledgeBaseId     String?   // looked up from KnowledgeSource
  lastClientSessionId String?
  lastPosition        Json?     // camelCase ReadingPosition
  lastProgress        Float?    // normalized progress, 0 to 1
  totalActiveSeconds  Int       @default(0) // cumulative active reading seconds
  sessionCount        Int       @default(0) // number of reading sessions
  status              String    @default("not_started") @db.VarChar(32)
  firstOpenedAt       DateTime?
  lastOpenedAt        DateTime?
  lastReadAt          DateTime?
  isMarkedRead        Boolean   @default(false)
  markedReadAt        DateTime?
  createdAt           DateTime  @default(now())
  updatedAt           DateTime  @updatedAt

  user User @relation(fields: [userId], references: [id])

  @@unique([userId, materialId])
  @@index([userId])
  @@index([knowledgeBaseId])
  @@index([status])
}
```

#### TemporaryReadingMaterial (temporary reading material)

```prisma
model TemporaryReadingMaterial {
  id               String    @id @default(cuid())
  userId           String
  title            String?   @db.VarChar(255)
  originalFilename String?   @db.VarChar(255)
  mimeType         String?   @db.VarChar(100)
  sizeBytes        BigInt    @default(0)
  storageKey       String?   @db.VarChar(500)
  sourceStatus     String    @default("active") @db.VarChar(32)
  expiresAt        DateTime?
  deletedAt        DateTime?
  createdAt        DateTime  @default(now())
  updatedAt        DateTime  @updatedAt

  user User @relation(fields: [userId], references: [id])

  @@index([userId])
  @@index([expiresAt])
}
```

### 3.2 Extending existing tables

#### LearningSession (new fields)

Add the following to the existing `LearningSession`:

```prisma
model LearningSession {
  // ... existing fields ...

  // Fields added in M8:
  clientSessionId    String?   // Rust client_session_id (links the uploaded events)
  materialId         String?   // materialId of the material being read
  readingTargetType  String?   @db.VarChar(32)
  totalActiveSeconds Int       @default(0) // cumulative active seconds from Rust
  lastPosition       Json?     // last reading position
  lastEventAt        DateTime? // time of the last event
}
```

> The existing `mode` field is kept; the new `readingTargetType` does not conflict with it. For `durationSeconds` compatibility, prefer `totalActiveSeconds` (from the Rust tracker); when there is no Rust data, keep the old logic.

#### DailyLearningActivity (new fields)

```prisma
model DailyLearningActivity {
  // ... existing fields (durationSeconds, sessionsCount, activeRecallCount,
  //     reviewCount, aiAnalysisCount, completedLoopCount, activityLevel) ...

  // Fields added in M8:
  readingSeconds     Int @default(0) // reading time for the day, in seconds
  materialsReadCount Int @default(0) // number of materials read that day
  markedReadCount    Int @default(0) // number of materials marked as read that day
}
```

### 3.3 Reusing existing tables

#### LearningRecord (no schema change)

`recordType` gains two values:

- `reading`: reading record (new)
- `read_completed`: reading completed (new)

The `metadata` JSON gains these fields:

```json
{
  "materialId": "...",
  "readingTargetType": "knowledge_source",
  "knowledgeBaseId": "...",
  "totalActiveSeconds": 120,
  "lastPosition": {...}
}
```

## 4. Core aggregation pipeline

```
POST /reading/events (batch upload)
  │
  ▼
ReadingEventProcessorService.processBatch(events)
  │
  ├─ 1. Idempotent deduplication (eventId unique)
  ├─ 2. Validation (activeSecondsDelta >= 0; values > 300 are capped to 300)
  ├─ 3. Write to the ReadingEvent table (status=pending→processed)
  │
  ├─ 4. Aggregate → LearningSession
  │     - Look up an existing session by clientSessionId
  │     - Found: update lastPosition / totalActiveSeconds / lastEventAt
  │     - Not found (MaterialOpened): create a new LearningSession
  │     - MaterialClosed: end the session (status=ended)
  │
  ├─ 5. Aggregate → MaterialReadingProgress
  │     - UPSERT (userId, materialId)
  │     - Accumulate totalActiveSeconds / sessionCount
  │     - Update lastPosition / lastProgress
  │     - Update timestamps: firstOpenedAt / lastReadAt / markedReadAt
  │
  ├─ 6. Aggregate → DailyLearningActivity
  │     - UPSERT (userId, activityDate)
  │     - Accumulate readingSeconds / materialsReadCount
  │
  └─ 7. Write LearningRecord (on MarkedAsRead / MaterialClosed / first open)
```

### Aggregation timing

**Synchronous aggregation** (completed while the request is being handled):

- Write to ReadingEvent immediately after validation passes
- Aggregate immediately into LearningSession / MaterialReadingProgress / DailyLearningActivity
- No worker or queue for now

### Special cases

| Scenario | Handling |
|------|------|
| Duplicate eventId | status=duplicate, aggregation skipped |
| activeSecondsDelta < 0 | status=failed, errorCode=INVALID_ACTIVE_SECONDS |
| activeSecondsDelta > 300 | Capped to 300 (a single tick never exceeds 5 minutes), warning `ACTIVE_SECONDS_CAPPED` |
| activeSecondsDelta = 0 | Valid (MaterialOpened/PositionChanged/MarkedAsRead) |
| MaterialClosed without position | Existing position is not overwritten |
| Out-of-order events (time goes backwards) | Not rejected; processed normally (tolerates client clock drift) |

## 5. Error codes and warning codes

### Error codes (event rejected, status=failed)

| Code | Meaning |
|----|------|
| `MATERIAL_NOT_FOUND` | The knowledge_source does not exist |
| `TEMPORARY_MATERIAL_NOT_FOUND` | The temporary_file does not exist |
| `MATERIAL_ACCESS_DENIED` | The material does not belong to the current user |
| `TEMPORARY_MATERIAL_EXPIRED` | The temporary file has expired |
| `INVALID_TARGET_TYPE` | Unknown readingTargetType |
| `INVALID_EVENT_TYPE` | Unknown eventType |
| `INVALID_TIMESTAMP` | Malformed timestamp |
| `INVALID_POSITION` | Malformed position JSON |
| `INVALID_ACTIVE_SECONDS` | activeSecondsDelta < 0 |
| `BATCH_LIMIT_EXCEEDED` | Batch size limit (100) exceeded |
| `MISSING_CLIENT_SESSION` | clientSessionId is missing |
| `MISSING_MATERIAL_ID` | materialId is missing |

### Warning codes (event accepted but flagged)

| Code | Meaning |
|----|------|
| `ACTIVE_SECONDS_CAPPED` | delta > 300, capped |
| `CLIENT_TIMESTAMP_SKEWED` | Clock skew > 5 min |
| `POSITION_IGNORED` | position is present but not valid for this eventType |
| `DUPLICATE_EVENT` | Idempotent replay |
| `OUT_OF_ORDER_EVENT` | Out-of-order event |
| `SOURCE_DELETED` | The source material has been deleted |

## 6. Permission checks

### Upload endpoint

- `readingTargetType=knowledge_source`: verify that the `KnowledgeSource` exists and belongs to the current user
- `readingTargetType=temporary_file`: verify that the `TemporaryReadingMaterial` exists and belongs to the current user
- Unknown materialId: record a warning and still accept the event (to avoid losing data)

### Query endpoints

- `GET /reading/progress/:materialId`: verify the user's access to the material
- `GET /reading/continue-learning`: returns the current user's materials
- All query endpoints obtain userId through the JWT guard

## 7. Endpoint list

| Method | Path | Description |
|------|------|------|
| POST | `/reading/events` | Batch upload of reading events |
| GET | `/reading/progress/:materialId` | Reading progress for a single material |
| GET | `/reading/continue-learning` | "Continue learning" on the home page |
| GET | `/reading/summary` | Learning summary |
| GET | `/reading/trend` | Trend (data only) |
| GET | `/reading/heatmap` | Heatmap data |
| GET | `/reading/history` | Learning history |
| POST | `/reading/events/replay` | Event replay / repair |

## 8. Acceptance checklist

- [x] `docs/learning-info-design.md` exists
- [x] readingTargetType defined: knowledge_source / temporary_file
- [x] materialId mapping: → KnowledgeSource.id / TemporaryReadingMaterial.id
- [x] Permission checks: JWT guard + userId + resource ownership check
- [x] Field mapping from Rust ReadingEventV2 to API ReadingEvent
- [x] Core aggregation pipeline: ReadingEvent → LearningSession → MaterialReadingProgress → DailyLearningActivity → LearningRecord
- [x] Error codes defined: 12
- [x] Synchronous aggregation strategy
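
The deduplication, validation, and special-case rules from section 4 can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the actual `ReadingEventProcessorService`: the `IncomingEvent`, `Outcome`, and `Progress` types and the `classifyEvent` / `applyToProgress` functions are hypothetical names, an in-memory map stands in for the `(userId, eventId)` unique constraint, and keeping the previous position for any event that carries none generalizes the rule stated for `MaterialClosed`.

```typescript
// Hypothetical, simplified shapes: only the fields this sketch needs.
type Position = Record<string, unknown>;

interface IncomingEvent {
  eventId: string;
  activeSecondsDelta: number;
  position?: Position; // may be absent, e.g. on MaterialClosed
}

type Outcome =
  | { status: "duplicate" }
  | { status: "failed"; errorCode: string }
  | { status: "processed"; activeSeconds: number; warningCodes: string[] };

interface Progress {
  totalActiveSeconds: number;
  lastPosition: Position | null;
}

// Cap taken from the special-cases table: one tick never exceeds 5 minutes.
const MAX_DELTA_SECONDS = 300;

// `seen` is an in-memory stand-in for the (userId, eventId) unique constraint.
// Assumption: a failed event still occupies its eventId, so replays of it
// are reported as duplicates.
function classifyEvent(ev: IncomingEvent, seen: Record<string, boolean>): Outcome {
  if (seen[ev.eventId] === true) {
    return { status: "duplicate" };
  }
  seen[ev.eventId] = true;

  if (ev.activeSecondsDelta < 0) {
    return { status: "failed", errorCode: "INVALID_ACTIVE_SECONDS" };
  }

  const warningCodes: string[] = [];
  let activeSeconds = ev.activeSecondsDelta;
  if (activeSeconds > MAX_DELTA_SECONDS) {
    // Oversized deltas are accepted but capped and flagged.
    activeSeconds = MAX_DELTA_SECONDS;
    warningCodes.push("ACTIVE_SECONDS_CAPPED");
  }
  return { status: "processed", activeSeconds, warningCodes };
}

// Folds one classified event into the running progress.
// Duplicate and failed events leave the progress unchanged.
function applyToProgress(p: Progress, ev: IncomingEvent, outcome: Outcome): Progress {
  if (outcome.status !== "processed") {
    return p;
  }
  return {
    totalActiveSeconds: p.totalActiveSeconds + outcome.activeSeconds,
    // An event without a position does not overwrite the stored one.
    lastPosition: ev.position !== undefined ? ev.position : p.lastPosition,
  };
}
```

A batch handler would call `classifyEvent` once per event and pass each result to `applyToProgress`; in a real service the same outcome would also drive the `LearningSession` and `DailyLearningActivity` updates, and the map would be replaced by the database constraint.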