api-server/docs/learning-info-design.md
wangdl 38a8629e42
feat: M8 complete implementation of the learning information collection system
Phase 1-2: design doc + database (ReadingEvent/MaterialReadingProgress/TemporaryReadingMaterial/LearningSession extension/DailyLearningActivity extension/LearningRecord)
Phase 3: batch upload + validation and deduplication + ReadingEventProcessorService
Phase 4: four-table aggregation pipeline (LearningSession/MaterialReadingProgress/DailyLearningActivity/LearningRecord)
Phase 5: query endpoints (progress/continue/summary/trend/heatmap/history/reprocess)
Phase 6: permission checks + cleanup of interrupted sessions + API docs

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-08 21:09:13 +08:00


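Learning Information Collection: Overall Design

1. Overview

The M8 milestone closes the loop for collecting learning-behavior data, from the iOS client (via the Rust document runtime) to the API server.

Data flow

iOS App → Rust zx_document_core (ReadingEventV2)
       → iOS adapter layer (adds readingTargetType/platform/appVersion/timezone)
       → POST /reading/events (batch upload)
       → ReadingEventProcessorService (validation / deduplication / aggregation)
       → LearningSession / MaterialReadingProgress / DailyLearningActivity / LearningRecord
       → query endpoints (progress / continue learning / summary / trend / heatmap / history)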

2. readingTargetType

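The Rust side does not store readingTargetType; the iOS adapter layer adds it at upload time.

| readingTargetType | materialId maps to | knowledgeBaseId |
| --- | --- | --- |
| knowledge_source | KnowledgeSource.id | KnowledgeSource.knowledgeBaseId |
| temporary_file | TemporaryReadingMaterial.id | null (may be filled in later) |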

Fields added by iOS at upload time

// The iOS adapter layer builds each upload item as follows:
const item = {
  eventId: rustEvent.eventId,
  clientSessionId: rustEvent.clientSessionId,
  materialId: rustEvent.materialId,
  eventType: rustEvent.eventType,
  position: rustEvent.position,
  activeSecondsDelta: rustEvent.activeSecondsDelta,
  clientTimestampMs: rustEvent.timestampMs,
  sequence: rustEvent.sequence,
  // Fields added by iOS:
  readingTargetType: resolveTargetType(rustEvent.materialId), // 'knowledge_source' | 'temporary_file'
  platform: 'ios',
  appVersion: getAppVersion(),
  clientTimezoneOffsetMinutes: getTimezoneOffset(),
};
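
resolveTargetType is not defined in this document. The sketch below shows one way it might work, assuming (hypothetically) that the adapter keeps a local set of the temporary material IDs it has opened. makeResolveTargetType and the sample IDs are illustrative names, not part of the actual adapter.

```typescript
type ReadingTargetType = 'knowledge_source' | 'temporary_file';

// Hypothetical helper: builds a resolver from the set of temporary material IDs
// known to the adapter. Anything not in the set is treated as a knowledge source.
function makeResolveTargetType(
  temporaryMaterialIds: ReadonlySet<string>,
): (materialId: string) => ReadingTargetType {
  return (materialId) =>
    temporaryMaterialIds.has(materialId) ? 'temporary_file' : 'knowledge_source';
}

// Example wiring with a made-up temporary ID.
const resolveTargetType = makeResolveTargetType(new Set(['tmp_abc']));
```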

3. Entity mapping

3.1 New tables

ReadingEvent (raw event log)

model ReadingEvent {
  id                        String   @id @default(cuid())
  userId                    String
  eventId                   String
  clientSessionId           String
  readingTargetType         String   @db.VarChar(32)
  materialId                String
  knowledgeBaseId           String?
  eventType                 String   @db.VarChar(32)
  position                  Json?
  activeSecondsDelta        Int      @default(0)
  clientTimestampMs         BigInt
  clientTimezoneOffsetMinutes Int?
  sequence                  Int
  platform                  String?  @db.VarChar(16)
  appVersion                String?  @db.VarChar(32)
  status                    String   @default("pending") @db.VarChar(32)
  errorCode                 String?  @db.VarChar(32)
  warningCodes              Json?
  serverReceivedAt          DateTime @default(now())
  processedAt               DateTime?
  createdAt                 DateTime @default(now())

  user User @relation(fields: [userId], references: [id])

  @@unique([userId, eventId])
  @@index([userId, clientSessionId])
  @@index([userId, readingTargetType, materialId, clientTimestampMs])
  @@index([status, createdAt])
  @@index([userId, createdAt])
}

MaterialReadingProgress (material reading progress)

model MaterialReadingProgress {
  id                  String   @id @default(cuid())
  userId              String
  materialId          String               // associated materialId
  readingTargetType   String   @db.VarChar(32)
  knowledgeBaseId     String?              // looked up from KnowledgeSource
  lastClientSessionId String?
  lastPosition        Json?                // camelCase ReadingPosition
  lastProgress        Float?               // normalized progress, 0 to 1
  totalActiveSeconds  Int      @default(0) // cumulative active reading seconds
  sessionCount        Int      @default(0) // number of reading sessions
  status              String   @default("not_started") @db.VarChar(32)
  firstOpenedAt       DateTime?
  lastOpenedAt        DateTime?
  lastReadAt          DateTime?
  isMarkedRead        Boolean  @default(false)
  markedReadAt        DateTime?
  createdAt           DateTime @default(now())
  updatedAt           DateTime @updatedAt

  user User @relation(fields: [userId], references: [id])

  @@unique([userId, materialId])
  @@index([userId])
  @@index([knowledgeBaseId])
  @@index([status])
}

TemporaryReadingMaterial (temporary reading material)

model TemporaryReadingMaterial {
  id              String    @id @default(cuid())
  userId          String
  title           String?   @db.VarChar(255)
  originalFilename String?  @db.VarChar(255)
  mimeType        String?   @db.VarChar(100)
  sizeBytes       BigInt    @default(0)
  storageKey      String?   @db.VarChar(500)
  sourceStatus    String    @default("active") @db.VarChar(32)
  expiresAt       DateTime?
  deletedAt       DateTime?
  createdAt       DateTime  @default(now())
  updatedAt       DateTime  @updatedAt

  user User @relation(fields: [userId], references: [id])

  @@index([userId])
  @@index([expiresAt])
}

3.2 Extended existing tables

LearningSession (extended fields)

Added on top of the existing LearningSession:

model LearningSession {
  // ... existing fields ...

  // Fields added in M8:
  clientSessionId    String?              // Rust client_session_id, links the session to uploaded events
  materialId         String?              // materialId of the material being read
  readingTargetType  String?  @db.VarChar(32)
  totalActiveSeconds Int      @default(0) // cumulative active seconds reported by Rust
  lastPosition       Json?                // last reading position
  lastEventAt        DateTime?            // time of the last event
}

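The existing mode field is kept; the new readingTargetType does not conflict with it. durationSeconds stays compatible: totalActiveSeconds (from the Rust tracker) takes precedence, and the old logic is kept when there is no Rust data.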
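
The precedence rule for durationSeconds can be sketched as a small pure function. This is a minimal sketch, not the service's code: it assumes a session "has Rust data" when clientSessionId is set, which the document does not state, and SessionDurationFields and effectiveDurationSeconds are hypothetical names.

```typescript
interface SessionDurationFields {
  durationSeconds: number | null; // existing field, written by the old logic
  totalActiveSeconds: number;     // M8 field, defaults to 0
  clientSessionId: string | null; // M8 field, set when events were uploaded
}

// Assumption: a non-null clientSessionId means the Rust tracker reported data.
function effectiveDurationSeconds(session: SessionDurationFields): number {
  if (session.clientSessionId !== null) {
    return session.totalActiveSeconds;
  }
  // No Rust data: fall back to the value produced by the old logic.
  return session.durationSeconds ?? 0;
}
```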

DailyLearningActivity (extended fields)

model DailyLearningActivity {
  // ... existing fields (durationSeconds, sessionsCount, activeRecallCount, reviewCount, aiAnalysisCount, completedLoopCount, activityLevel) ...

  // Fields added in M8:
  readingSeconds     Int @default(0)  // reading time for the day (seconds)
  materialsReadCount Int @default(0)  // number of materials read that day
  markedReadCount    Int @default(0)  // number of materials marked as read that day
}
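
Daily counters need each event assigned to a calendar day, and ReadingEvent carries both clientTimestampMs and clientTimezoneOffsetMinutes. The sketch below shows one way to derive the user's local date from those two fields. The document does not state the sign convention of the offset; this sketch assumes minutes east of UTC (+480 for UTC+8), which is the opposite sign of JavaScript's Date.prototype.getTimezoneOffset. localActivityDate is a hypothetical name, and the timestamp is taken as a number rather than the BigInt stored in the database.

```typescript
// Returns the local calendar date (YYYY-MM-DD) for an event.
// Assumption: offsetMinutes is minutes east of UTC; null falls back to UTC.
function localActivityDate(
  clientTimestampMs: number,
  offsetMinutes: number | null,
): string {
  const shiftedMs = clientTimestampMs + (offsetMinutes ?? 0) * 60 * 1000;
  // toISOString() always formats in UTC, so shifting first yields the local date.
  return new Date(shiftedMs).toISOString().slice(0, 10);
}
```

With this convention, an event at 17:00 UTC falls on the next calendar day for a user at UTC+8.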

3.3 Reused existing tables

LearningRecord (no schema change)

New recordType values:

  • reading: a reading record (new)
  • read_completed: reading completed (new)

Additional fields in the metadata JSON:

{
  "materialId": "...",
  "readingTargetType": "knowledge_source",
  "knowledgeBaseId": "...",
  "totalActiveSeconds": 120,
  "lastPosition": {...}
}

4. Core aggregation pipeline

POST /reading/events (batch upload)
  │
  ▼
ReadingEventProcessorService.processBatch(events)
  │
  ├─ 1. Idempotent deduplication (eventId is unique)
  ├─ 2. Validation (activeSecondsDelta >= 0 and <= 300)
  ├─ 3. Write to the ReadingEvent table (status=pending→processed)
  │
  ├─ 4. Aggregate → LearningSession
  │     - Look up an existing session by clientSessionId
  │     - Found: update lastPosition / totalActiveSeconds / lastEventAt
  │     - Not found (MaterialOpened): create a new LearningSession
  │     - MaterialClosed: end the session (status=ended)
  │
  ├─ 5. Aggregate → MaterialReadingProgress
  │     - UPSERT (userId, materialId)
  │     - Accumulate totalActiveSeconds / sessionCount
  │     - Update lastPosition / lastProgress
  │     - Update timestamps (firstOpenedAt / lastReadAt / completedAt)
  │
  ├─ 6. Aggregate → DailyLearningActivity
  │     - UPSERT (userId, activityDate)
  │     - Accumulate readingSeconds / materialsReadCount
  │
  └─ 7. Write a LearningRecord (on MarkedAsRead / MaterialClosed / first open)
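
The first steps of the pipeline can be sketched with in-memory maps standing in for the database tables. This is only an illustration of the ordering (deduplicate, validate, then aggregate); InMemoryReadingProcessor and its fields are hypothetical, and the real ReadingEventProcessorService is not shown in this document.

```typescript
type EventStatus = 'processed' | 'duplicate' | 'failed';

interface IncomingEvent {
  eventId: string;
  clientSessionId: string;
  materialId: string;
  activeSecondsDelta: number;
}

interface EventResult {
  eventId: string;
  status: EventStatus;
}

// In-memory stand-in for the tables; names are illustrative only.
class InMemoryReadingProcessor {
  // Plays the role of @@unique([userId, eventId]).
  private readonly seenEventKeys = new Set<string>();
  readonly sessionSeconds = new Map<string, number>();  // key: userId:clientSessionId
  readonly materialSeconds = new Map<string, number>(); // key: userId:materialId

  processBatch(userId: string, events: IncomingEvent[]): EventResult[] {
    return events.map((event): EventResult => {
      // Step 1: idempotent deduplication.
      const eventKey = `${userId}:${event.eventId}`;
      if (this.seenEventKeys.has(eventKey)) {
        return { eventId: event.eventId, status: 'duplicate' };
      }
      this.seenEventKeys.add(eventKey);

      // Step 2: validation; negative deltas fail, large ones are capped at 300.
      if (event.activeSecondsDelta < 0) {
        return { eventId: event.eventId, status: 'failed' };
      }
      const delta = Math.min(event.activeSecondsDelta, 300);

      // Steps 4 and 5: accumulate active seconds per session and per material.
      this.add(this.sessionSeconds, `${userId}:${event.clientSessionId}`, delta);
      this.add(this.materialSeconds, `${userId}:${event.materialId}`, delta);
      return { eventId: event.eventId, status: 'processed' };
    });
  }

  private add(totals: Map<string, number>, key: string, delta: number): void {
    totals.set(key, (totals.get(key) ?? 0) + delta);
  }
}
```

Returning one result per event lets the client see which events were duplicates or failures without failing the whole batch.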

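When aggregation runs

Aggregation is synchronous (it completes within request handling):

  • An event is written to ReadingEvent as soon as it passes validation
  • It is aggregated immediately into LearningSession / MaterialReadingProgress / DailyLearningActivity
  • No worker or queue is used for now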

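Special cases

| Scenario | Handling |
| --- | --- |
| Duplicate eventId | status=duplicate, aggregation skipped |
| activeSecondsDelta < 0 | status=failed, errorCode=INVALID_ACTIVE_SECONDS |
| activeSecondsDelta > 300 | Capped at 300 (a single tick never exceeds 5 minutes) |
| activeSecondsDelta = 0 | Valid (MaterialOpened / PositionChanged / MarkedAsRead) |
| MaterialClosed without position | Existing position is not overwritten |
| Out-of-order events (time goes backwards) | Not rejected; processed normally (client clock drift is tolerated) |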
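
The delta and position rules above could be expressed as two small helpers. This is a sketch under assumptions: the code names are taken from the error and warning code tables in section 5, while normalizeActiveSeconds, mergePosition, and the result shape are hypothetical.

```typescript
type DeltaResult =
  | { status: 'failed'; errorCode: 'INVALID_ACTIVE_SECONDS' }
  | { status: 'ok'; delta: number; warningCodes: string[] };

const MAX_DELTA_SECONDS = 300; // a single tick never exceeds 5 minutes

function normalizeActiveSeconds(delta: number): DeltaResult {
  if (delta < 0) {
    return { status: 'failed', errorCode: 'INVALID_ACTIVE_SECONDS' };
  }
  if (delta > MAX_DELTA_SECONDS) {
    // Accepted, but capped and flagged with a warning.
    return { status: 'ok', delta: MAX_DELTA_SECONDS, warningCodes: ['ACTIVE_SECONDS_CAPPED'] };
  }
  // Zero is valid (MaterialOpened / PositionChanged / MarkedAsRead).
  return { status: 'ok', delta, warningCodes: [] };
}

// An event without a position (for example MaterialClosed) keeps the stored one.
function mergePosition<T>(stored: T | null, incoming: T | null | undefined): T | null {
  return incoming ?? stored;
}
```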

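5. Error codes and warning codes

Error codes (event is rejected, status=failed)

| Code | Meaning |
| --- | --- |
| MATERIAL_NOT_FOUND | knowledge_source does not exist |
| TEMPORARY_MATERIAL_NOT_FOUND | temporary_file does not exist |
| MATERIAL_ACCESS_DENIED | Material does not belong to the current user |
| TEMPORARY_MATERIAL_EXPIRED | Temporary file has expired |
| INVALID_TARGET_TYPE | Unknown readingTargetType |
| INVALID_EVENT_TYPE | Unknown eventType |
| INVALID_TIMESTAMP | Malformed timestamp |
| INVALID_POSITION | Malformed position JSON |
| INVALID_ACTIVE_SECONDS | activeSecondsDelta < 0 |
| BATCH_LIMIT_EXCEEDED | Batch exceeds the limit (100) |
| MISSING_CLIENT_SESSION | clientSessionId is missing |
| MISSING_MATERIAL_ID | materialId is missing |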

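Warning codes (event is accepted but flagged)

| Code | Meaning |
| --- | --- |
| ACTIVE_SECONDS_CAPPED | delta > 300, capped |
| CLIENT_TIMESTAMP_SKEWED | Clock skew > 5 min |
| POSITION_IGNORED | position is present but not valid for the eventType |
| DUPLICATE_EVENT | Idempotent replay |
| OUT_OF_ORDER_EVENT | Out-of-order event |
| SOURCE_DELETED | Source material has been deleted |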
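
The two timestamp-related warnings might be computed as below. The document does not say what the client timestamp is compared against; this sketch assumes skew is measured against the server receive time and that "out of order" means earlier than the latest timestamp already seen in the same session. timestampWarnings is a hypothetical name.

```typescript
const MAX_CLOCK_SKEW_MS = 5 * 60 * 1000; // 5 minutes

function timestampWarnings(
  clientTimestampMs: number,
  serverReceivedAtMs: number,
  latestSessionTimestampMs: number | null, // null when the session has no events yet
): string[] {
  const warnings: string[] = [];
  // Assumption: skew is the absolute difference from the server receive time.
  if (Math.abs(serverReceivedAtMs - clientTimestampMs) > MAX_CLOCK_SKEW_MS) {
    warnings.push('CLIENT_TIMESTAMP_SKEWED');
  }
  // Assumption: an event is out of order when its time goes backwards within a session.
  if (latestSessionTimestampMs !== null && clientTimestampMs < latestSessionTimestampMs) {
    warnings.push('OUT_OF_ORDER_EVENT');
  }
  return warnings;
}
```

Both conditions only add warning codes; the event is still accepted, in line with the tolerance for client clock drift.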

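6. Permission checks

Upload endpoint

  • readingTargetType=knowledge_source: verify that the KnowledgeSource exists and belongs to the current user
  • readingTargetType=temporary_file: verify that the TemporaryReadingMaterial exists and belongs to the current user
  • Unknown materialId: record a warning but still accept the event, to avoid losing data

Query endpoints

  • GET /reading/progress/:materialId: verify the user's permission
  • GET /reading/continue-learning: returns the current user's materials
  • All query endpoints obtain userId through the JWT guard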
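
The ownership check can be sketched as a pure function over an already-loaded record. It follows the error-code table in section 5, so a missing material is rejected here; the upload rules above also describe accepting an unknown materialId with a warning, and which policy applies is left to the service. MaterialRecord and checkMaterialAccess are hypothetical names, and the sketch assumes both material types expose a userId.

```typescript
type AccessResult = { ok: true } | { ok: false; errorCode: string };

interface MaterialRecord {
  userId: string;            // assumed owner field on both material types
  expiresAt?: Date | null;   // only meaningful for temporary files
}

function checkMaterialAccess(
  readingTargetType: string,
  material: MaterialRecord | null, // null when the lookup found nothing
  currentUserId: string,
  now: Date,
): AccessResult {
  if (readingTargetType !== 'knowledge_source' && readingTargetType !== 'temporary_file') {
    return { ok: false, errorCode: 'INVALID_TARGET_TYPE' };
  }
  const isTemporary = readingTargetType === 'temporary_file';
  if (material === null) {
    return {
      ok: false,
      errorCode: isTemporary ? 'TEMPORARY_MATERIAL_NOT_FOUND' : 'MATERIAL_NOT_FOUND',
    };
  }
  if (material.userId !== currentUserId) {
    return { ok: false, errorCode: 'MATERIAL_ACCESS_DENIED' };
  }
  if (isTemporary && material.expiresAt != null && material.expiresAt.getTime() <= now.getTime()) {
    return { ok: false, errorCode: 'TEMPORARY_MATERIAL_EXPIRED' };
  }
  return { ok: true };
}
```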

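7. Endpoint list

| Method | Path | Description |
| --- | --- | --- |
| POST | /reading/events | Batch upload of reading events |
| GET | /reading/progress/:materialId | Reading progress for a single material |
| GET | /reading/continue-learning | Continue learning (home page) |
| GET | /reading/summary | Learning summary |
| GET | /reading/trend | Trend (data only) |
| GET | /reading/heatmap | Heatmap data |
| GET | /reading/history | Learning history |
| POST | /reading/events/replay | Event replay / repair |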
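
Because BATCH_LIMIT_EXCEEDED caps a batch at 100 events, a client with a larger backlog would need to split it before calling POST /reading/events. A minimal sketch of that split follows; chunkEvents is a hypothetical helper, and the request itself is left out because its body format is not specified here.

```typescript
const BATCH_LIMIT = 100; // from BATCH_LIMIT_EXCEEDED

// Splits a backlog into consecutive batches of at most batchSize events,
// preserving the original order.
function chunkEvents<T>(events: readonly T[], batchSize: number = BATCH_LIMIT): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches: T[][] = [];
  for (let start = 0; start < events.length; start += batchSize) {
    batches.push(events.slice(start, start + batchSize));
  }
  return batches;
}
```

Since eventId deduplication is idempotent, a batch that fails in transit can be resent as-is without double counting.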

8. Acceptance checklist

  • docs/learning-info-design.md exists
  • readingTargetType is defined (knowledge_source / temporary_file)
  • materialId mapping: → KnowledgeSource.id / TemporaryReadingMaterial.id
  • Permission checks: JWT guard + userId + resource ownership check
  • Field mapping from Rust ReadingEventV2 to API ReadingEvent
  • Core aggregation pipeline: ReadingEvent → LearningSession → MaterialReadingProgress → DailyLearningActivity → LearningRecord
  • Error code definitions (8 kinds)
  • Synchronous aggregation strategy