Download status: Version 2 is saved in data/version_2_friend_ios/. Both main SQLite databases pass PRAGMA quick_check.
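The integrity check can be run with Python's standard `sqlite3` module, as in the minimal sketch below. `PRAGMA quick_check` returns the single row `ok` when it finds no problems. The memo does not name the two database files, so the `*.db` glob is a hypothetical placeholder.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def quick_check(db_path: str) -> list[str]:
    """Run PRAGMA quick_check and return its messages; a healthy file gives ["ok"]."""
    # Open read-only through a file: URI so the check cannot modify the database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        return [row[0] for row in conn.execute("PRAGMA quick_check")]


if __name__ == "__main__":
    # Hypothetical file pattern: the actual database names are not given in the memo.
    for db in sorted(Path("data/version_2_friend_ios").glob("*.db")):
        print(db.name, quick_check(str(db)))
```

`quick_check` skips some of the slower index-consistency checks. Use `PRAGMA integrity_check` when a full verification is needed.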
Parallel Trend / Event Study Verdict
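The object of judgment here is the event-study path before the cutoff: is the treated-control gap roughly flat from 2025-06 through 2025-11? By this standard, current main is believable and V2 full is not. After resampling V2 to match current's exact 100M-bucket developer counts, the pre-period path flattens noticeably. It should still be read as a robustness / diagnostic check, not as a replacement for current main as the primary identification sample.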
| Sample | Pre-slope | Last 28 pre minus early pre | Max monthly deviation vs June | Read |
|---|---|---|---|---|
| current main | -0.024 apps/day | -0.9 apps/day (-3.1%) | 4.3 apps/day | passes visual pretrend |
| V2 full | +0.174 apps/day | +27.2 apps/day (+46.6%) | 26.3 apps/day | fails visual pretrend |
| V2 matched buckets | +0.038 apps/day | +6.5 apps/day (+19.5%) | 5.2 apps/day | improved, but not fully clean |
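Conclusion: if the goal is a main result that makes readers believe in parallel trends, I would use current main. V2 full can serve as an external / alternative-sample robustness check, but it cannot be used directly as the primary identification sample. Its pre-period event study already rises from June/July toward Sep/Oct/Nov before treatment.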
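The three pretrend columns in the table above can be sketched from a daily treated-minus-control gap, aligned by day-in-cycle over the pre-period. This is a sketch under stated assumptions, not the memo's code. The memo does not define "early pre", so the sketch assumes it means the first 28 pre-period days. "Max monthly deviation vs June" is read as the largest absolute difference between a month's mean gap and June's mean gap.

```python
from datetime import date, timedelta
from statistics import mean


def pre_days(start: date, cutoff: date) -> list[date]:
    """All days from the cycle start up to (not including) the cutoff."""
    return [start + timedelta(days=i) for i in range((cutoff - start).days)]


def ols_slope(y: list[float]) -> float:
    """OLS slope of y on the day index 0, 1, ..., n-1."""
    n = len(y)
    x_bar, y_bar = (n - 1) / 2, mean(y)
    sxy = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(y))
    sxx = sum((i - x_bar) ** 2 for i in range(n))
    return sxy / sxx


def pretrend_metrics(days: list[date], treated: list[float], control: list[float],
                     window: int = 28) -> dict[str, float]:
    """Pretrend diagnostics for the daily treated-minus-control gap (pre-period only).

    Assumes days are sorted and aligned by day-in-cycle, and that 'early pre'
    means the first `window` days (an assumption; the memo does not define it).
    """
    gap = [t - c for t, c in zip(treated, control)]
    change = mean(gap[-window:]) - mean(gap[:window])
    # Mean gap per calendar month, in date order; the first month is June.
    by_month: dict[tuple[int, int], list[float]] = {}
    for d, g in zip(days, gap):
        by_month.setdefault((d.year, d.month), []).append(g)
    monthly = [mean(v) for v in by_month.values()]
    return {
        "pre_slope": ols_slope(gap),
        "last_minus_early": change,
        "pct_of_control_early": 100 * change / mean(control[:window]),
        "max_dev_vs_first_month": max(abs(m - monthly[0]) for m in monthly),
    }
```

For the Nov.24 cutoff, `pre_days(date(2025, 6, 1), date(2025, 11, 24))` yields the 176 pre-period days from Jun.1 through Nov.23.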
Sample Size
Main Result Comparison
Coverage table
| Dataset | Software artists | Portfolio artists | Unique apps | Portfolio rows | Review RSS rows |
|---|---|---|---|---|---|
| current main | 75,266 | 44,074 | 84,985 | 2,115,242 | 486,923 |
| V2 200M | 110,903 | 61,351 | 144,907 | 5,264,767 | missing |
| V2 250M final | 151,251 | 86,172 | 202,706 | 7,333,268 | missing |
Overlap table
| Comparison | ID type | Left count | Right count | Overlap | Overlap / left count |
|---|---|---|---|---|---|
| current vs V2 final | artist_id | 44,074 | 86,172 | 3,872 | 8.8% |
| current vs V2 final | track_id | 84,985 | 202,706 | 7,482 | 8.8% |
| V2 200M vs V2 final | track_id | 144,907 | 202,706 | 103,154 | 71.2% |
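The overlap rows reduce to set operations on IDs. The last column is the overlap divided by the left-hand count, which reproduces all three rows: 3,872 / 44,074 ≈ 8.8%, 7,482 / 84,985 ≈ 8.8%, and 103,154 / 144,907 ≈ 71.2%. The function names below are illustrative. `split_segments` gives the current-only / overlap / V2-only partition that the later pretrend decomposition relies on.

```python
def overlap_row(left: set[int], right: set[int]) -> dict[str, float]:
    """Counts for one row of the overlap table; the share is relative to the left set."""
    both = left & right
    return {
        "left": len(left),
        "right": len(right),
        "overlap": len(both),
        "share_of_left": len(both) / len(left),
    }


def split_segments(current_ids: set[int], v2_ids: set[int]) -> dict[str, set[int]]:
    """Partition IDs into current-only, overlapping, and V2-only segments."""
    return {
        "current_only": current_ids - v2_ids,
        "overlap": current_ids & v2_ids,
        "v2_only": v2_ids - current_ids,
    }


# Reproduce the published shares from the counts alone.
for overlap, left in [(3_872, 44_074), (7_482, 84_985), (103_154, 144_907)]:
    print(f"{overlap:,} / {left:,} = {overlap / left:.1%}")
```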
Why the Pretrends Differ
Pretrend diagnostics table
| Dataset | Outcome | Last 28 pre days minus early pre | % of control early mean |
|---|---|---|---|
| current main | unique global apps | -0.9 apps/day | -3.0% |
| current main | local country rows | -74.2 rows/day | -10.2% |
| V2 final | unique global apps | +27.2 apps/day | +46.6% |
| V2 final | local country rows | +992.5 rows/day | +49.3% |
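Interpretation
- V2's positive treatment effect is not the problem. The problem is that the treated-control gap is already widening before treatment, so a strict pretrend test is harder to pass.
- The sampling designs differ. The current main database comes mainly from bucket-aware sampling rounds, while V2 R200 is a uniform full-namespace sample. This changes the artistId cohort mix that enters the sample.
- Inconsistent event dates amplify the pretrend problem. The friend's audit says Dec.15, but the code actually uses Dec.1, and the paper's main cutoff is Nov.24. With a Dec.1 or Dec.15 cutoff, late-November movement gets pulled into the apparent pre-period.
- V2's pre-ramp is concentrated in specific artistId segments. It is strongest in 1.4B-1.6B, with additional positive movement in the newest dense 1.7B-1.8B segment.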
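Counting the dates shows how much post-Nov.24 data the later cutoffs relabel as pre-period. The year 2025 is taken from the treated cycle (2025-06-01 to 2026-04-26); the function name is illustrative.

```python
from datetime import date, timedelta

MAIN_CUTOFF = date(2025, 11, 24)  # the paper's main cutoff (Nov.24)


def days_mislabeled_as_pre(cutoff: date, main_cutoff: date = MAIN_CUTOFF) -> list[date]:
    """Days on or after the main cutoff that a later cutoff would treat as pre-period."""
    n = max((cutoff - main_cutoff).days, 0)
    return [main_cutoff + timedelta(days=i) for i in range(n)]


if __name__ == "__main__":
    for label, cutoff in [("code: Dec.1", date(2025, 12, 1)), ("audit: Dec.15", date(2025, 12, 15))]:
        days = days_mislabeled_as_pre(cutoff)
        print(f"{label}: {len(days)} days ({days[0]} to {days[-1]})")
```

A Dec.1 cutoff moves 7 days (Nov.24-Nov.30) into the pre-period. A Dec.15 cutoff moves 21 days (Nov.24-Dec.14). These are the days that end up in the "last 28 pre" window.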
Can the Review Data Be Rerun?
Not directly. The full review stream / review text / bottleneck attention analyses cannot be rerun, because V2 lacks reviews, review_scrape_status, app_country_meta, and app_privacy.
V2 can support a weaker review proxy based on the current rating_count snapshot. To build it, change the hardcoded SCRAPE_DT from 2026-04-27 to the V2 scrape date 2026-05-13, and keep the same minimum-age horizon.
| Metric | current treated-post | V2 final treated-post |
|---|---|---|
| Zero-review share | 67.2% | 57.0% |
| >=10-review share | 3.9% | 5.4% |
| Existing bottleneck label coverage | 41.3% | 1.5% |
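A minimal sketch of the snapshot proxy follows. It assumes each app is available as a `(release_date, rating_count)` pair. The memo does not state the minimum-age horizon, so `min_age_days` is a hypothetical required parameter with no default. The point of keeping it fixed is that every included app gets the same minimum exposure time before its scrape date.

```python
from datetime import date

SCRAPE_DT_CURRENT = date(2026, 4, 27)  # value hardcoded in the current pipeline
SCRAPE_DT_V2 = date(2026, 5, 13)       # V2 scrape date


def review_proxy(apps: list[tuple[date, int]], scrape_dt: date,
                 min_age_days: int) -> dict[str, float]:
    """Snapshot review proxy from (release_date, rating_count) pairs.

    Keeps only apps at least `min_age_days` old at scrape time, so every
    included app had the same minimum exposure window before the snapshot.
    """
    counts = [rc for released, rc in apps if (scrape_dt - released).days >= min_age_days]
    n = len(counts)
    if n == 0:
        raise ValueError("no apps satisfy the minimum-age horizon")
    return {
        "n_apps": n,
        "zero_review_share": sum(rc == 0 for rc in counts) / n,
        "ge10_review_share": sum(rc >= 10 for rc in counts) / n,
    }
```

Running V2 data against `SCRAPE_DT_CURRENT` would understate app ages by 16 days. That would shift which apps pass the horizon and bias the zero-review share upward.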
Sampling Rule Deep Dive
V1/current and V2 use different sampling rules. Current main merges multiple bucket-aware / importance-weighted rounds. V2 final is a single uniform full-namespace sample with 250M probes and seed=200020, in which every probed ID has the same constant sampling probability.
| Dataset | Sampling rule | 0-1.3B SW artists | 1.4B-1.6B SW artists | 1.7B-1.8B SW artists |
|---|---|---|---|---|
| current main | pooled importance-weighted / bucket-aware rounds | 4,494 | 29,424 | 41,348 |
| V2 final | uniform full-namespace R200 sample | 41,114 | 46,376 | 63,761 |
Key point: the bucket mix does differ, but it is not the only cause. After reweighting V2 to current's bucket shares, the pretrend does not disappear. It moves from +27.2 apps/day to between +29.8 and +32.8 apps/day.
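The memo does not say how the reweighting weights were built. The sketch below uses one common choice, inverse-share weights, where each 100M bucket gets weight (target share) / (sample share). A target bucket with no V2 developers cannot be reproduced by weighting. The bucket width is a parameter you could swap for the three coarse bucket groups; all names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

BUCKET_WIDTH = 100_000_000  # 100M artistId buckets


def bucket_of(artist_id: int) -> int:
    """Index of the 100M-wide artistId bucket (e.g. 1_450_000_000 -> 14)."""
    return artist_id // BUCKET_WIDTH


def bucket_weights(sample_devs: list[int], target_devs: list[int]) -> dict[int, float]:
    """Weights that make the sample's developer mix match the target's bucket shares.

    weight_b = target_share_b / sample_share_b; a bucket missing from the target gets 0.
    """
    s = Counter(bucket_of(a) for a in sample_devs)
    t = Counter(bucket_of(a) for a in target_devs)
    n_s, n_t = sum(s.values()), sum(t.values())
    return {b: (t[b] / n_t) / (s[b] / n_s) for b in s}


def weighted_daily_apps(apps: list[tuple[date, int]], weights: dict[int, float]) -> dict[date, float]:
    """Sum developer-bucket weights over (release_date, artist_id) app records."""
    out: dict[date, float] = defaultdict(float)
    for day, artist_id in apps:
        out[day] += weights.get(bucket_of(artist_id), 0.0)
    return dict(out)
```

Feeding the weighted daily counts into the same pretrend metrics gives a reweighted pretrend. Reweighting keeps every V2 app, so it cannot remove composition differences within a bucket.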
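Account Cohort and Bucket Weight
Bucket weight has an economic meaning: it captures the account-vintage mix. A high bucket is not just an ordinary "weight difference". It corresponds to younger developer accounts and fewer apps per developer, and it is more likely to be exposed to the 2025 AI-tooling ramp.
| Dataset | Bucket group | Dev share | Median entry year | Entry >=2023 share | Apps / dev |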
|---|---|---|---|---|---|
| current | 0-1.3B early/mid | 4.3% | 2017 | 12.8% | 3.80 |
| current | 1.4B-1.6B mid/late | 42.2% | 2022 | 33.5% | 2.22 |
| current | 1.7B-1.8B latest/dense | 53.5% | 2025 | 96.5% | 1.55 |
| V2 | 0-1.3B early/mid | 20.2% | 2017 | 13.0% | 3.86 |
| V2 | 1.4B-1.6B mid/late | 34.5% | 2021 | 30.1% | 2.46 |
| V2 | 1.7B-1.8B latest/dense | 45.3% | 2025 | 96.9% | 1.60 |
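A sketch of how the account-vintage table could be summarized follows, under two assumptions. First, each developer is available as `artist_id -> (entry_year, n_apps)`, with entry year taken as the year of the first observed release. Second, the three groups map to 100M bucket indices 0-13, 14-16, and 17-18. The `GROUPS` mapping is an assumption, not the memo's definition.

```python
from statistics import median

# Assumed mapping from 100M bucket index to the memo's three bucket groups.
GROUPS = {
    "0-1.3B early/mid": range(0, 14),
    "1.4B-1.6B mid/late": range(14, 17),
    "1.7B-1.8B latest/dense": range(17, 19),
}


def cohort_table(devs: dict[int, tuple[int, int]]) -> dict[str, dict[str, float]]:
    """Summarize developers by bucket group.

    devs maps artist_id -> (entry_year, n_apps), where entry_year is the year of
    the developer's first observed release and n_apps is their portfolio size.
    """
    total = len(devs)
    rows = {}
    for name, buckets in GROUPS.items():
        members = [v for aid, v in devs.items() if aid // 100_000_000 in buckets]
        if not members:
            continue
        years = [year for year, _ in members]
        rows[name] = {
            "dev_share": len(members) / total,
            "median_entry": median(years),
            "entry_ge_2023": sum(y >= 2023 for y in years) / len(members),
            "apps_per_dev": sum(n for _, n in members) / len(members),
        }
    return rows
```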
Matched V2 Sample
Resampling V2 to match current's exact 100M-bucket developer counts reduces the pretrend substantially, but does not eliminate it.
| Sample / scheme | Last 28 pre minus early | Pre-slope p |
|---|---|---|
| full current main | -0.9 apps/day | 0.671 |
| full V2 final | +27.2 apps/day | 0.137 |
| V2 reweighted to current exact-bucket shares | +12.5 apps/day | 0.605 |
| V2 random subsample matched to current exact-bucket counts | +6.5 apps/day | 0.605 |
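Across 200 matched subsamples, the pretrend ranges from +2.4 to +11.3 apps/day, with a mean of +6.5. In other words, the sampling distribution explains part of the gap, but the release-date distribution of the V2-only apps still shows a positive pre-ramp.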
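The matched-subsample procedure can be sketched as stratified sampling without replacement: draw exactly the current developer count from each 100M bucket of the V2 pool, then repeat. `statistic` stands in for whatever pretrend measure is computed on the chosen developers' apps. The default `seed=0` is a hypothetical choice, unrelated to V2's probe seed.

```python
import random
from collections import defaultdict
from statistics import mean
from typing import Callable


def matched_subsample(pool: list[int], target_counts: dict[int, int],
                      rng: random.Random) -> list[int]:
    """Draw exactly target_counts[b] developers from each 100M bucket b, without replacement."""
    by_bucket: dict[int, list[int]] = defaultdict(list)
    for artist_id in pool:
        by_bucket[artist_id // 100_000_000].append(artist_id)
    chosen: list[int] = []
    for bucket, k in sorted(target_counts.items()):
        # random.sample raises ValueError if V2 has fewer developers than required.
        chosen.extend(rng.sample(by_bucket[bucket], k))
    return chosen


def replicate(pool: list[int], target_counts: dict[int, int],
              statistic: Callable[[list[int]], float], reps: int = 200,
              seed: int = 0) -> tuple[float, float, float]:
    """Min, mean, and max of `statistic` across `reps` matched subsamples."""
    rng = random.Random(seed)
    draws = [statistic(matched_subsample(pool, target_counts, rng)) for _ in range(reps)]
    return min(draws), mean(draws), max(draws)
```

The (min, mean, max) triple corresponds to the reported range and average over 200 draws. A fixed seed makes the range reproducible.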
Where the V2 Pretrend Comes From
| Segment | Unique apps | Last 28 pre minus early pre | Pre-slope p |
|---|---|---|---|
| V2-only track IDs | 195,097 | +27.6 apps/day | 0.115 |
| V2/current overlapping track IDs, measured in V2 | 7,599 | -0.5 apps/day | 0.600 |
| current-only track IDs | 77,544 | -0.5 apps/day | 0.690 |
| current/V2 overlapping track IDs, measured in current | 7,599 | -0.4 apps/day | 0.595 |
Bucket-Date Pattern
- In current main, the newest and densest 1.7B-1.8B bucket declines during the pre-period: +1,059/+1,148 in June/July, falling to +519 in Nov.1-Nov.23.
- In V2 final, the 1.7B-1.8B bucket shows no such decline: +1,862/+2,038 in June/July, +2,136 in October, and still +1,601 in Nov.1-Nov.23.
- In V2 final, the 1.4B-1.6B bucket also moves from a negative gap in June/July to a positive gap in Oct/Nov.
- So V2's problem lies in the release-date distribution of the V2-only sample, which already shows a treated-cycle ramp before Nov.24.
Economic Interpretation
The sampling rule is not itself an economic mechanism. It matters because artistId buckets proxy for developer-account vintage and platform cohort. Changing the sampling rule changes which account cohorts enter the sample, and the untreated trends of those cohorts are not necessarily parallel.
How to read V2: V2 is a uniform artistId sample, so it does not deliberately target dense buckets. Dense buckets contain many Software Artists because those buckets really are denser in Apple's namespace.
Why this affects dates: newer account cohorts are more likely to be exposed to the 2025 AI-development ramp, indie experimentation, Apple catalog churn, and the coding-agent shocks before Nov.24.
| Dataset | Segment | Last 28 pre minus early pre | Pre-slope p |
|---|---|---|---|
| current main | new developer first-day apps | -0.9 apps/day | 0.613 |
| current main | later apps by cycle-new developers | +0.6 apps/day | 0.818 |
| current main | incumbent apps | -0.6 apps/day | 0.677 |
| V2 final | new developer first-day apps | +12.3 apps/day | 0.094 |
| V2 final | later apps by cycle-new developers | +7.4 apps/day | <0.001 |
| V2 final | incumbent apps | +7.5 apps/day | 0.578 |
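This decomposition shows that V2's pre-ramp is broad-based. New-developer entry contributes the most, but incumbent apps and later apps by cycle-new developers also contribute. It therefore looks more like an account-cohort / market-ramp phenomenon in the V2-only sample than a simple download error or a single sampling bug.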
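The memo does not define the three segments precisely. The sketch below encodes one plausible reading of the table labels, so treat the definitions as assumptions. A developer is "cycle-new" if their first observed release falls on or after the cycle start. Their apps released on that first date are "first-day" apps, and their apps on later dates are "later" apps. Everyone else is an incumbent.

```python
from datetime import date


def classify_apps(apps: list[tuple[int, int, date]], cycle_start: date) -> dict[int, str]:
    """Label each (track_id, artist_id, release_date) record with a segment.

    Assumed definitions, inferred from the table labels rather than stated in the memo:
      incumbent          developer's first observed release predates cycle_start
      new_dev_first_day  cycle-new developer, app released on their first release date
      later_cycle_new    cycle-new developer, app released on a later date
    """
    first_release: dict[int, date] = {}
    for _, artist_id, released in apps:
        if artist_id not in first_release or released < first_release[artist_id]:
            first_release[artist_id] = released
    segments = {}
    for track_id, artist_id, released in apps:
        first = first_release[artist_id]
        if first < cycle_start:
            segments[track_id] = "incumbent"
        elif released == first:
            segments[track_id] = "new_dev_first_day"
        else:
            segments[track_id] = "later_cycle_new"
    return segments
```

"First observed release" depends on which portfolio rows were scraped. A developer whose older apps are missing from the data would be misclassified as cycle-new.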
How To Believe It
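- Define V2 as a robustness sample under a different sampling lens, not as a direct replacement for current main.
- Use Nov.24 as the single main date, and report Dec.1/Dec.15 as timing sensitivity.
- Report both the 2023-24 and 2022-23 control cycles; V2 remains positive under both.
- Whenever V2 is reported, attach the artistId bucket / account-vintage diagnostics.
- The overlap sample can serve as a sanity check: the overlapping apps show no pre-ramp. However, it is too small and too selected to serve as the main estimand.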
How To Reduce The Concern
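- Use bucket-by-day fixed effects to absorb the common time path of each account-vintage cohort.
- Run a clean-pre / gap-clean design that drops Sep.29-Nov.23, because this window likely already contains earlier AI-tool shocks.
- Compute pretrend-adjusted same-day differences that explicitly subtract the pre-treatment slope.
- If the paper uses V2, state the estimand as a uniform artistId-sample robustness check, and do not pool V2 with current main into a single sample.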
| Dataset | Spec | Unique apps |
|---|---|---|
| current main | baseline global | +48.9% |
| current main | drop Sep.29-Nov.23 gap | +47.3% |
| V2 final | baseline global | +54.1% |
| V2 final | drop Sep.29-Nov.23 gap | +61.9% |
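The pretrend-adjusted difference and the gap-clean drop can be combined in one sketch. The procedure fits a linear trend to the pre-cutoff same-day gaps, optionally skipping the Sep.29-Nov.23 window. It then extrapolates that trend into the post period and subtracts it. Linear extrapolation is an assumption; the memo only says the pre-treatment slope is subtracted. Bucket-by-day fixed effects are not shown here.

```python
from datetime import date
from typing import Optional


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """OLS slope and intercept of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    return slope, y_bar - slope * x_bar


def pretrend_adjusted_gaps(days: list[date], gap: list[float], cutoff: date,
                           drop: Optional[tuple[date, date]] = None) -> list[float]:
    """Post-cutoff same-day gaps minus the linearly extrapolated pre-cutoff trend.

    days are sorted; gap[i] is treated minus control on the same day-in-cycle.
    Pre-cutoff days inside the inclusive `drop` window are excluded from the fit.
    """
    t = [(d - days[0]).days for d in days]
    pre = [(ti, g) for d, ti, g in zip(days, t, gap)
           if d < cutoff and not (drop and drop[0] <= d <= drop[1])]
    slope, intercept = fit_line([ti for ti, _ in pre], [g for _, g in pre])
    return [g - (intercept + slope * ti) for d, ti, g in zip(days, t, gap) if d >= cutoff]
```

With `drop=(date(2025, 9, 29), date(2025, 11, 23))`, the trend is estimated from Jun.1-Sep.28 only. That makes the adjustment robust to late-pre shocks but extrapolates over a longer horizon.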
Control Cycle Sensitivity
I fixed the treated cycle at 2025-06-01 to 2026-04-26 and the cutoff at Nov.24, and varied only the control cycle. Switching to 2022-23 makes V2's pretrend somewhat cleaner but does not fully resolve it.
| Dataset | Control cycle | Unique apps | Unique new devs | Unique new companies | Pretrend: last 28 minus early | Pretrend scale |
|---|---|---|---|---|---|---|
| current main | 2023-24 | +48.9% | +51.4% | +30.6% | -0.9 apps/day | -3.1% |
| current main | 2022-23 | +43.3% | +45.2% | +27.0% | +3.4 apps/day | +15.6% |
| V2 final | 2023-24 | +54.1% | +64.9% | +47.9% | +27.2 apps/day | +46.6% |
| V2 final | 2022-23 | +50.7% | +57.0% | +42.4% | +16.9 apps/day | +40.3% |
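Conclusion: the choice of control cycle does affect V2's pretrend, and 2022-23 is somewhat better than 2023-24. But V2 still has a clear pre-treatment ramp. The problem is therefore not simply that "the control dates are wrong"; it also involves V2's uniform sampling design and the composition ramp in the treated cycle.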
Date / Seasonality / Window Checks
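This is not simply a matter of a missing date fixed effect. A strict calendar-date fixed effect is not identified in this two-cycle design, because each real date belongs to either the treated cycle or the control cycle, never both. The usable controls are seasonality controls such as day-in-cycle, weekday, week/month-in-cycle, and genre-by-day. The baseline is already a same-day-in-cycle comparison.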
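The design point can be checked mechanically by pairing treated and control dates by day-in-cycle. In the sketch, no calendar date appears in both cycles, so a calendar-date dummy would be collinear with the treated/control contrast. The paired weekdays differ, which is why a separate weekday control still has bite. Aligning by day offset is an assumption. Under it, the 2023-24 control window ends on Apr.25, 2024 rather than Apr.26, because of Feb.29, 2024.

```python
from datetime import date, timedelta


def align_by_day_in_cycle(treated_start: date, treated_end: date,
                          control_start: date) -> list[dict]:
    """Pair each treated-cycle date with the control date at the same day offset.

    Offset alignment is an assumption; matching on month-day instead would
    differ around Feb.29 2024.
    """
    pairs = []
    for k in range((treated_end - treated_start).days + 1):
        t = treated_start + timedelta(days=k)
        c = control_start + timedelta(days=k)
        pairs.append({"day_in_cycle": k, "treated": t, "control": c,
                      "treated_weekday": t.weekday(), "control_weekday": c.weekday()})
    return pairs


if __name__ == "__main__":
    pairs = align_by_day_in_cycle(date(2025, 6, 1), date(2026, 4, 26), date(2023, 6, 1))
    treated_dates = {p["treated"] for p in pairs}
    control_dates = {p["control"] for p in pairs}
    # Each calendar date carries observations from only one cycle, so calendar-date
    # dummies cannot be separated from the treated-versus-control contrast.
    print(len(pairs), treated_dates.isdisjoint(control_dates), pairs[-1]["control"])
```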
Seasonality controls
| Spec | current unique apps | V2 unique apps |
|---|---|---|
| day-in-cycle FE | +48.9% | +54.1% |
| day-in-cycle + weekday | +46.9% | +52.0% |
| week-in-cycle + weekday | +46.5% | +51.6% |
| month-in-cycle + weekday | +42.1% | +47.0% |
| genre-by-day FE | +48.9% | +54.1% |
Window sensitivity
| Window | current unique apps | V2 unique apps |
|---|---|---|
| baseline Jun.1-Apr.26 | +48.9% | +54.1% |
| longer pre Jan.1-Apr.26 | +49.0% | +71.6% |
| spring pre Mar.1-Apr.26 | +53.0% | +67.9% |
| late pre Aug.1-Jan.31 | +21.9% | +22.0% |
| short post Jun.1-Jan.31 | +20.2% | +27.0% |
| long post Jun.1-May.13 | +35.4% | +60.2% |
| Dataset | Pretrend window | Last 28 pre minus early | Pre-slope p |
|---|---|---|---|
| current | baseline Jun.1-Apr.26 | -0.9 apps/day | 0.671 |
| current | longer pre Jan.1-Apr.26 | -5.5 apps/day | 0.474 |
| current | spring pre Mar.1-Apr.26 | +5.2 apps/day | 0.609 |
| current | late pre Aug.1-Jan.31 | +0.8 apps/day | 0.951 |
| V2 | baseline Jun.1-Apr.26 | +27.2 apps/day | 0.137 |
| V2 | longer pre Jan.1-Apr.26 | +43.5 apps/day | 0.0003 |
| V2 | spring pre Mar.1-Apr.26 | +48.0 apps/day | 0.0004 |
| V2 | late pre Aug.1-Jan.31 | +20.0 apps/day | 0.323 |
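Conclusion: after adding weekday / week / month / genre-by-day seasonality controls, V2 still shows +47% to +54%. What is truly sensitive is the window. Late-pre or short-post windows compress the effect to roughly +22% to +27%, while lengthening the pre-period makes V2's pretrend more pronounced. This indicates that V2's treated cycle was already rising relative to the control cycle from June to November 2025, rather than reflecting uncontrolled seasonality.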
Recommendation
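Version 2 should serve as a robustness / alternative sample rather than a direct replacement for the current main database. The direction of the main conclusion is consistent. If V2 goes into the paper, it needs to:
- unify the cutoff at Nov.24;
- report the 2023-24 and 2022-23 control cycles;
- add date-window sensitivity;
- place the sampling-design, artistId-bucket, and account-vintage diagnostics next to the main table;
- for review-related analyses, either re-scrape the review stream or state explicitly that they are snapshot proxies only.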