Download status: Version 2 is saved in data/version_2_friend_ios/. Both main SQLite databases pass PRAGMA quick_check.
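A minimal sketch of the integrity check mentioned above, using Python's standard sqlite3 module. The helper name `quick_check` and the in-memory demo table are illustrative assumptions; the actual database filenames under data/version_2_friend_ios/ are not stated here, so the demo does not open them.

```python
import sqlite3

def quick_check(conn: sqlite3.Connection) -> list[str]:
    """Run PRAGMA quick_check and return the result rows as strings."""
    rows = conn.execute("PRAGMA quick_check").fetchall()
    return [row[0] for row in rows]

# Demo on a throwaway in-memory database (hypothetical table, not the real schema).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE apps (track_id INTEGER PRIMARY KEY, artist_id INTEGER)")
demo.execute("INSERT INTO apps VALUES (1, 100)")

result = quick_check(demo)
passed = result == ["ok"]  # SQLite reports a single 'ok' row when no problems are found
print("quick_check passed:", passed)
```

For a real file, the same helper could be called on `sqlite3.connect(path)` for each of the two databases.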
How To Read The Two-Node Model
We are no longer estimating a single treatment date. A two-node event-study / stacked DID model is more appropriate:
Y = FE + beta1 * Treated x AfterMay22 + beta2 * Treated x AfterNov24
Coefficient meaning
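- beta1 is the May22 first-stage treatment effect: how much additional production the 2025 treated cycle shows relative to the control cycle after May22.
- beta2 is the Nov24 second-stage incremental effect: it is not an effect measured from zero, but the additional increase from the Nov24 late-capability stack given that the May22 agentic-coding regime is already in place.
- Only beta1 + beta2 gives the total post-Nov24 effect relative to a world with neither May22 nor Nov24.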
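To make the coefficient meanings concrete, here is a minimal sketch that recovers beta1 and beta2 from cell means. It assumes a saturated version of the model with one fixed effect per cycle and per period (pre = before May22, mid = May22 to Nov23, post = Nov24 onward); the function name and the daily counts are invented for illustration and are not taken from either dataset.

```python
from statistics import mean

def two_node_did(treated, control):
    """Two-node DID from period means.

    treated / control: dict with keys 'pre', 'mid', 'post', each a list of
    daily app counts for that period.
    """
    t = {k: mean(v) for k, v in treated.items()}
    c = {k: mean(v) for k, v in control.items()}
    # beta1: change at the first node, net of the control cycle's change.
    beta1 = (t["mid"] - t["pre"]) - (c["mid"] - c["pre"])
    # beta2: additional change at the second node, on top of the mid regime.
    beta2 = (t["post"] - t["mid"]) - (c["post"] - c["mid"])
    return beta1, beta2, beta1 + beta2

# Hypothetical daily counts, for illustration only.
control = {"pre": [50, 52], "mid": [60, 62], "post": [70, 72]}
treated = {"pre": [55, 57], "mid": [85, 87], "post": [155, 157]}

beta1, beta2, total = two_node_did(treated, control)
print(beta1, beta2, total)
```

With day-level fixed effects the regression estimates are weighted versions of these differences, but the reading is the same: beta2 is measured from the mid-period regime, not from the pre-May22 baseline.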
Diagnostics labels
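- ParallelTran indicates whether the pre-treatment event study is flat: the treated-control gap should not rise systematically before the cutoff.
- PrePass / PubPass mark whether a spec can serve as a paper-facing causal spec: it needs not only a positive effect but also control-year robustness and a believable pretrend.
- A spec can show a large treatment effect, but if it fails ParallelTran it cannot serve as a clean first-treatment design.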
| Spec | PubPass | ParallelTran | Effect read | Co-author interpretation |
|---|---|---|---|---|
| V1 May22 | weak / no | borderline across 2024/2023/2022 controls | beta1 = -7% to +1% | V1 does not strongly see the May22 broad-adoption shock. |
| V1 Nov24 | yes for late shock | passes across 2024/2023/2022 controls | beta2 = +39% to +50%; total = +30% to +46% | V1 is credible for the late-Nov capability/intensity shock. |
| V2 May22 | yes for broad adoption | passes across 2024/2023/2022 controls | beta1 = +22% to +27% | V2 cleanly identifies a May22 broad agentic-coding adoption shock. |
| V2 Nov24 as first treatment | no | fails across 2024/2023/2022 controls | beta2 = +45% to +56%; total = +73% to +95% | Nov24 is interpretable only as a second-stage increment, not as first exposure. |
Co-author Brief
The clean causal story is likely two-stage: May22 is broad agentic-coding adoption; Nov24 is a later capability / intensity shock. V2 fails the Nov24 pretrend because the market was already treated between May22 and Nov23.
Why May22 matters: as of Feb24, Claude Code was a limited research preview. On May22, Anthropic made Claude Code generally available and moved it into terminal, IDE workflows, background tasks, and the Claude Code SDK. The days just before also include the OpenAI Codex cloud agent (May 16) and the GitHub Copilot coding agent (May 19), so this is a market-wide agentic-coding adoption cluster.
2025 Coding-Agent Timeline
Two-Node Double DID Visualization
Use a stacked treatment model: Y = FE + beta1 * Treated x AfterMay22 + beta2 * Treated x AfterNov24. Here beta1 is the May22 first-stage adoption effect. beta2 is the Nov24 incremental effect on top of the May22 regime. The post-Nov24 total effect is beta1 + beta2. Percent effects are reported against the relevant counterfactual treated mean, not against raw observed apps/day.
V2 story: May22 first-stage effect is +22 to +25 apps/day across all control years, with May22 pretrend passing. Nov24 then adds another +60 to +70 apps/day. Total post-Nov24 effect is therefore about +82 to +95 apps/day relative to the no-May22 baseline.
V1 story: May22 first-stage effect is near zero, while Nov24 adds +25 to +30 apps/day with Nov24 pretrend passing. V1 is therefore more credible for the late-stage shock than for the broad May22 adoption shock.
| Dataset | May22 first-stage effect beta1 | Nov24 incremental effect beta2 | Post-Nov24 total beta1 + beta2 | Interpretation |
|---|---|---|---|---|
| V1/current | -4.1 to +0.5/day (-7% to +1%) | +24.7 to +29.5/day (+39% to +50%) | +20.6 to +28.0/day (+30% to +46%) | No robust May22 effect in V1; Nov24 looks like the first visible jump in this sample. |
| V2 full | +21.9 to +25.4/day (+22% to +27%) | +60.2 to +69.5/day (+45% to +56%) | +82.1 to +94.9/day (+73% to +95%) | V2 sees both stages: broad May22 adoption first, then a much larger Nov24 intensification. |
| Dataset | Control year | May22 pretrend | Nov24 pretrend | May22 early DID | May22 % | Nov24 late increment | Nov24 % | Post-Nov24 total | Total % |
|---|---|---|---|---|---|---|---|---|---|
| V1/current | 2024 | borderline -9.9/day | pass -5.0/day | -4.1/day | -6.7% | +24.7/day | +38.5% | +20.6/day | +30.2% |
| V1/current | 2023 | borderline -10.0/day | pass -0.9/day | -1.5/day | -2.6% | +29.5/day | +49.8% | +28.0/day | +46.1% |
| V1/current | 2022 | borderline -9.7/day | pass +3.4/day | +0.5/day | +0.9% | +27.3/day | +44.5% | +27.8/day | +45.6% |
| V2 full | 2024 | pass -1.9/day | fail +24.3/day | +21.9/day | +22.3% | +60.2/day | +44.8% | +82.1/day | +73.0% |
| V2 full | 2023 | pass -4.8/day | fail +27.2/day | +25.4/day | +26.7% | +69.5/day | +55.7% | +94.9/day | +95.2% |
| V2 full | 2022 | pass -3.6/day | fail +16.9/day | +22.2/day | +22.7% | +66.9/day | +52.4% | +89.1/day | +84.5% |
How to interpret beta2: The Nov24 coefficient is not the full effect of agentic coding from zero. It is the marginal effect of the late-Nov capability stack conditional on the market already being in the May22 agentic-coding regime. In a level-count model, the total post-Nov24 effect is additive: beta1 + beta2. In a log or PPML model, the same idea is multiplicative, but the causal interpretation is still “second-stage increment on top of first-stage adoption.”
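The additive and multiplicative readings can be sketched as follows. The level rows are copied from the control-year table above; the log coefficients at the end are hypothetical placeholders, since no log or PPML estimates are reported here.

```python
import math

# (beta1, beta2, reported total) in apps/day, copied from the control-year table.
LEVEL_ROWS = {
    ("V1/current", 2024): (-4.1, 24.7, 20.6),
    ("V1/current", 2023): (-1.5, 29.5, 28.0),
    ("V1/current", 2022): (0.5, 27.3, 27.8),
    ("V2 full", 2024): (21.9, 60.2, 82.1),
    ("V2 full", 2023): (25.4, 69.5, 94.9),
    ("V2 full", 2022): (22.2, 66.9, 89.1),
}

def additive_total(beta1, beta2):
    """Level-count model: the post-Nov24 total is the sum of the two stages."""
    return beta1 + beta2

def multiplicative_total_pct(beta1_log, beta2_log):
    """Log / PPML model: stages compound, total % = exp(b1 + b2) - 1."""
    return (math.exp(beta1_log + beta2_log) - 1.0) * 100.0

# Largest discrepancy between beta1 + beta2 and the reported total.
max_gap = max(
    abs(additive_total(b1, b2) - total) for b1, b2, total in LEVEL_ROWS.values()
)
print("max additive gap (apps/day):", round(max_gap, 3))

# Hypothetical log-scale coefficients, for illustration only.
b1_log, b2_log = 0.20, 0.40
total_pct = multiplicative_total_pct(b1_log, b2_log)
print("compounded total %:", round(total_pct, 1))
```

In the multiplicative case the stage percentages compound, (1 + p1)(1 + p2) - 1, rather than add, which is why beta2 still reads as an increment on top of the first stage.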
Across control years: V2 May22 is the most robust first-stage story. Whether the control is 2024, 2023, or 2022, May22 pretrend passes and beta1 stays in a narrow positive band: +22% to +27%.
Why percentages move: 2024, 2023, and 2022 have different baseline app-production levels. Percent effects therefore change because the counterfactual denominator changes, not because the qualitative treatment story changes.
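As a rough illustration of the denominator point, the sketch below backs out the counterfactual mean implied by each reported pair of level and percent effects, assuming percent = level effect / counterfactual treated mean as stated earlier. The inputs are the V2 post-Nov24 totals from the control-year table; because those figures are rounded, the implied means are approximate.

```python
# (post-Nov24 total in apps/day, total %) for V2 full, by control year.
V2_TOTALS = {2024: (82.1, 73.0), 2023: (94.9, 95.2), 2022: (89.1, 84.5)}

def implied_counterfactual_mean(level_effect, pct_effect):
    """Counterfactual apps/day implied by a level effect and its percent effect."""
    return level_effect / (pct_effect / 100.0)

implied = {
    year: implied_counterfactual_mean(level, pct)
    for year, (level, pct) in V2_TOTALS.items()
}
for year, base in sorted(implied.items(), reverse=True):
    print(year, round(base, 1))
```

The implied baseline differs by control year, so percent effects spread out more than the level effects alone would suggest.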
Where The May22 V2 Effect Comes From
| Component | V2 May22 early DID | V2 Nov24 late increment |
|---|---|---|
| New developer first apps | +13 to +18/day | +33 to +37/day |
| Young repeat apps | +5 to +6/day | +16 to +17/day |
| Older incumbent apps | +2/day | +10 to +17/day |
The May22 first stage comes mainly from new entrants and young developers; the Nov24 second stage is broader, with clear increases across new devs, repeat apps, and incumbents.
Product Evidence
- Feb 24: Claude 3.7 Sonnet + Claude Code research preview
- May 16: OpenAI Codex cloud agent preview
- May 19: GitHub Copilot coding agent
- May 22: Claude 4 + Claude Code generally available
- Jun 4: Cursor Background Agent generally available
- Stack Overflow 2025: AI agents and coding-tool adoption
Co-author wording: “The V2 sample suggests that the broad adoption shock begins around the May 2025 agentic-coding release cluster. The November 2025 cutoff should be interpreted as an incremental late-capability shock, not as the first exposure to agentic coding.”
Sample Size
Main Results Comparison
Coverage Table
| Dataset | Software artists | Portfolio artists | Unique apps | Portfolio rows | Review RSS rows |
|---|---|---|---|---|---|
| current main | 75,266 | 44,074 | 84,985 | 2,115,242 | 486,923 |
| V2 200M | 110,903 | 61,351 | 144,907 | 5,264,767 | missing |
| V2 250M final | 151,251 | 86,172 | 202,706 | 7,333,268 | missing |
Overlap Table
| Comparison | ID type | Left count | Right count | Overlap | Overlap share of left count |
|---|---|---|---|---|---|
| current vs V2 final | artist_id | 44,074 | 86,172 | 3,872 | 8.8% |
| current vs V2 final | track_id | 84,985 | 202,706 | 7,482 | 8.8% |
| V2 200M vs V2 final | track_id | 144,907 | 202,706 | 103,154 | 71.2% |
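The shares in the overlap table are consistent with overlap divided by the left-hand count. A small sketch, assuming IDs are available as plain Python collections; the function names and the toy ID lists are illustrative only.

```python
def overlap_share(left_ids, right_ids):
    """Return (overlap count, overlap as a fraction of the left set)."""
    left, right = set(left_ids), set(right_ids)
    overlap = len(left & right)
    return overlap, overlap / len(left)

def share_pct(overlap_count, left_count):
    """Overlap as a percent of the left count, rounded to one decimal."""
    return round(100 * overlap_count / left_count, 1)

# Toy IDs to show the set logic.
toy_count, toy_share = overlap_share([1, 2, 3, 4], [3, 4, 5])

# Recompute the table's shares from its own counts.
artist_share = share_pct(3872, 44074)
track_share = share_pct(7482, 84985)
v2_rounds_share = share_pct(103154, 144907)
print(artist_share, track_share, v2_rounds_share)
```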
Why The Pretrend Differs
Pretrend Diagnostics Table
| Dataset | Outcome | Last 28 pre days minus early pre | % of control early mean |
|---|---|---|---|
| current main | unique global apps | -0.9 apps/day | -3.0% |
| current main | local country rows | -74.2 rows/day | -10.2% |
| V2 final | unique global apps | +27.2 apps/day | +46.6% |
| V2 final | local country rows | +992.5 rows/day | +49.3% |
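One plausible reading of the two diagnostic columns, sketched under stated assumptions: the daily treated-control gap is averaged over the last 28 pre-cutoff days and over all earlier pre-cutoff days, and the difference is scaled by the control cycle's early-pre mean. The exact definition behind the table is not spelled out here, and the series below are invented.

```python
from statistics import mean

def pretrend_gap_change(treated_pre, control_pre, last_n=28):
    """Last-n pre-days gap minus early-pre gap, plus % of control early mean.

    Both inputs are daily series aligned by day-in-cycle and restricted to
    days before the cutoff.
    """
    if len(treated_pre) != len(control_pre) or len(treated_pre) <= last_n:
        raise ValueError("need aligned series longer than last_n days")
    gap = [t - c for t, c in zip(treated_pre, control_pre)]
    diff = mean(gap[-last_n:]) - mean(gap[:-last_n])
    control_early_mean = mean(control_pre[:-last_n])
    return diff, 100 * diff / control_early_mean

# Hypothetical series: the gap widens by 10 apps/day in the last 28 pre days.
control_pre = [50] * 100
treated_pre = [60] * 72 + [70] * 28

diff, pct = pretrend_gap_change(treated_pre, control_pre)
print(diff, pct)
```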
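Interpretation
- V2's positive treatment effect is not the problem. The problem is that the treated-control gap was already widening before treatment, so a strict pretrend test is harder to pass.
- The sampling designs differ. The current main database comes mostly from bucket-aware sampling rounds; V2 R200 is a uniform full-namespace sample, which changes the artistId cohort mix entering the sample.
- Inconsistent event dates amplify the pretrend problem. The friend's audit states Dec.15, the code actually uses Dec.1, and the current paper's main cutoff is Nov.24. With Dec.1 or Dec.15, late-November movement is counted in the apparent pre-period.
- V2's pre-ramp is concentrated in specific artistId ranges: mainly 1.4B-1.6B, plus positive movement in the latest, dense 1.7B-1.8B range.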
Can The Review Analysis Be Rerun
The full review stream / review text / bottleneck attention analysis cannot be rerun directly. V2 is missing reviews, review_scrape_status, app_country_meta, and app_privacy.
V2 can support a weaker review proxy based on the current rating_count snapshot, but the hardcoded SCRAPE_DT must be changed from 2026-04-27 to the V2 scrape date, 2026-05-13, while keeping the same minimum-age horizon.
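A sketch of how the scrape date and the minimum-age horizon interact in a snapshot proxy. `MIN_AGE_DAYS`, the helper names, and the toy app list are assumptions for illustration; the actual horizon and the code around SCRAPE_DT are not shown in this note.

```python
from datetime import date, timedelta

SCRAPE_DT_CURRENT = date(2026, 4, 27)  # hardcoded value in the existing code
SCRAPE_DT_V2 = date(2026, 5, 13)       # V2 scrape date
MIN_AGE_DAYS = 28                      # hypothetical horizon, not from the source

def eligible(release_date, scrape_dt, min_age_days=MIN_AGE_DAYS):
    """An app counts only if it was at least min_age_days old at scrape time."""
    return release_date <= scrape_dt - timedelta(days=min_age_days)

def zero_review_share(apps, scrape_dt):
    """apps: list of (release_date, rating_count) pairs."""
    pool = [count for released, count in apps if eligible(released, scrape_dt)]
    if not pool:
        raise ValueError("no apps meet the minimum-age horizon")
    return sum(1 for count in pool if count == 0) / len(pool)

# Toy snapshot, for illustration only.
apps = [
    (date(2026, 1, 10), 0),
    (date(2026, 3, 1), 12),
    (date(2026, 4, 10), 0),
    (date(2026, 5, 1), 0),
]
share_with_old_date = zero_review_share(apps, SCRAPE_DT_CURRENT)
share_with_v2_date = zero_review_share(apps, SCRAPE_DT_V2)
print(share_with_old_date, share_with_v2_date)
```

Leaving the old scrape date in place would silently drop V2 apps released in the 16 days between the two scrapes' eligibility cutoffs, which is why the constant needs to change while the horizon stays fixed.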
| Metric | current treated-post | V2 final treated-post |
|---|---|---|
| Zero-review share | 67.2% | 57.0% |
| >=10-review share | 3.9% | 5.4% |
| Existing bottleneck label coverage | 41.3% | 1.5% |
Sampling Rule Deep Dive
V1/current and V2 use different sampling rules. Current main pools multiple bucket-aware / importance-weighted rounds; V2 final is a single uniform full-namespace sample with 250M probes and seed=200020, so every probed ID has the same constant sampling probability.
| Dataset | Sampling rule | 0-1.3B SW artists | 1.4B-1.6B SW artists | 1.7B-1.8B SW artists |
|---|---|---|---|---|
| current main | pooled importance-weighted / bucket-aware rounds | 4,494 | 29,424 | 41,348 |
| V2 final | uniform full-namespace R200 sample | 41,114 | 46,376 | 63,761 |
Key point: the bucket mix does differ, but it is not the only cause. After reweighting V2 to current's bucket shares, the pretrend does not disappear; it goes from +27.2 apps/day to between +29.8 and +32.8 apps/day.
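A minimal sketch of bucket-share reweighting. It uses the three-group Dev share figures from the account-cohort table as the shares; which shares and bucket granularity the actual reweighting used is not stated, so treat both the weights and the toy daily counts as assumptions.

```python
# Dev share (%) by bucket group, from the account-cohort table.
CURRENT_SHARE = {"0-1.3B": 4.3, "1.4B-1.6B": 42.2, "1.7B-1.8B": 53.5}
V2_SHARE = {"0-1.3B": 20.2, "1.4B-1.6B": 34.5, "1.7B-1.8B": 45.3}

# Weight each V2 bucket so that its share matches current's share.
weights = {b: CURRENT_SHARE[b] / V2_SHARE[b] for b in V2_SHARE}

def reweighted_daily_count(counts_by_bucket, bucket_weights=weights):
    """Weighted apps/day for one day, given that day's count per bucket."""
    return sum(bucket_weights[b] * n for b, n in counts_by_bucket.items())

# Hypothetical counts for one day, for illustration only.
one_day = {"0-1.3B": 10, "1.4B-1.6B": 40, "1.7B-1.8B": 50}
raw = sum(one_day.values())
reweighted = reweighted_daily_count(one_day)
print(raw, round(reweighted, 1))
```

The early/mid bucket is down-weighted and the two later buckets are up-weighted, so a pre-ramp that lives in the later buckets can grow rather than shrink after reweighting.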
Account Cohort 和 Bucket Weight
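The economic meaning of bucket weight is account vintage mix: a high bucket is not just a "weighting difference"; it corresponds to younger developer accounts, fewer apps per developer, and a higher likelihood of exposure to the 2025 AI tooling ramp.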
| Dataset | Bucket group | Dev share | Median entry | Entry >=2023 | Apps / dev |
|---|---|---|---|---|---|
| current | 0-1.3B early/mid | 4.3% | 2017 | 12.8% | 3.80 |
| current | 1.4B-1.6B mid/late | 42.2% | 2022 | 33.5% | 2.22 |
| current | 1.7B-1.8B latest/dense | 53.5% | 2025 | 96.5% | 1.55 |
| V2 | 0-1.3B early/mid | 20.2% | 2017 | 13.0% | 3.86 |
| V2 | 1.4B-1.6B mid/late | 34.5% | 2021 | 30.1% | 2.46 |
| V2 | 1.7B-1.8B latest/dense | 45.3% | 2025 | 96.9% | 1.60 |
Matched V2 Sample
If V2 is resampled to match current's exact 100M-bucket developer counts, the pretrend falls substantially but does not disappear entirely.
| Sample / scheme | Last 28 pre minus early | Pre-slope p |
|---|---|---|
| full current main | -0.9 apps/day | 0.671 |
| full V2 final | +27.2 apps/day | 0.137 |
| V2 reweighted to current exact-bucket shares | +12.5 apps/day | 0.605 |
| V2 random subsample matched to current exact-bucket counts | +6.5 apps/day | 0.605 |
Across 200 matched subsamples the range is +2.4 to +11.3 apps/day, with a mean of +6.5. In other words, the sampling distribution explains part of the gap, but the V2-only release-date distribution still shows a positive pre-ramp.
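The matched-subsample exercise can be sketched as a stratified draw without replacement, repeated 200 times. The developer list, bucket labels, target counts, and seeds below are all hypothetical; only the idea of matching per-bucket developer counts comes from the text.

```python
import random
from collections import Counter

def matched_subsample(devs, target_counts, seed):
    """Draw developers so each bucket has exactly target_counts[bucket] members.

    devs: list of (developer_id, bucket) pairs from the larger sample.
    """
    rng = random.Random(seed)
    by_bucket = {}
    for dev_id, bucket in devs:
        by_bucket.setdefault(bucket, []).append(dev_id)
    sample = []
    for bucket, n in target_counts.items():
        pool = by_bucket.get(bucket, [])
        if n > len(pool):
            raise ValueError(f"bucket {bucket}: need {n}, have {len(pool)}")
        sample.extend((dev_id, bucket) for dev_id in rng.sample(pool, n))
    return sample

# Hypothetical population: 300 developers spread evenly over 3 buckets.
devs = [(i, i % 3) for i in range(300)]
target_counts = {0: 5, 1: 40, 2: 55}

# 200 independent draws; the pretrend statistic would be computed on each.
draws = [matched_subsample(devs, target_counts, seed) for seed in range(200)]
first_counts = Counter(bucket for _, bucket in draws[0])
print(len(draws), dict(first_counts))
```

Reporting the range and mean across draws, as above, separates sampling noise in the subsample from the part of the pre-ramp that survives matching.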
Where The V2 Pretrend Comes From
| Segment | Unique apps | Last 28 pre minus early pre | Pre-slope p |
|---|---|---|---|
| V2-only track IDs | 195,097 | +27.6 apps/day | 0.115 |
| V2/current overlapping track IDs, measured in V2 | 7,599 | -0.5 apps/day | 0.600 |
| current-only track IDs | 77,544 | -0.5 apps/day | 0.690 |
| current/V2 overlapping track IDs, measured in current | 7,599 | -0.4 apps/day | 0.595 |
Bucket-Date Pattern
- In current main, the latest, dense 1.7B-1.8B bucket declines during the pre-period: June/July is +1,059/+1,148, falling to +519 in Nov.1-Nov.23.
- In V2 final, the 1.7B-1.8B bucket shows no comparable decline: June/July is +1,862/+2,038, October is +2,136, and Nov.1-Nov.23 is still +1,601.
- In V2 final, the 1.4B-1.6B bucket also moves from a negative gap in June/July to a positive gap in Oct/Nov.
- So the V2 issue lies in the release-date distribution of the V2-only sample, which already shows a treated-cycle ramp before Nov.24.
Economic Interpretation
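The sampling rule is not itself an economic mechanism. It matters because the artistId bucket proxies for developer-account vintage and platform cohort. Changing the sampling rule changes which account cohorts enter the sample, and the untreated trends of those cohorts are not necessarily parallel.
How to read V2: V2 is a uniform artistId sample, so it does not deliberately target dense buckets; dense buckets contain many Software Artists because those buckets really are denser in Apple's namespace.
Why this affects dates: newer account cohorts are more likely to be exposed to the 2025 AI-development ramp, indie experimentation, Apple catalog churn, and coding-agent shocks before Nov.24.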
| Dataset | Segment | Last 28 pre minus early pre | Pre-slope p |
|---|---|---|---|
| current main | new developer first-day apps | -0.9 apps/day | 0.613 |
| current main | later apps by cycle-new developers | +0.6 apps/day | 0.818 |
| current main | incumbent apps | -0.6 apps/day | 0.677 |
| V2 final | new developer first-day apps | +12.3 apps/day | 0.094 |
| V2 final | later apps by cycle-new developers | +7.4 apps/day | <0.001 |
| V2 final | incumbent apps | +7.5 apps/day | 0.578 |
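This decomposition shows that V2's pre-ramp is broad-based: new-developer entry contributes the most, but incumbents and later apps by cycle-new developers also contribute. It therefore looks more like an account-cohort / market-ramp phenomenon in the V2-only sample than a simple download error or a single-point sampling bug.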
How To Believe It
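- Define V2 as a robustness sample under a different sampling lens, not as a direct replacement for current main.
- Use Nov.24 as the single main date, and also report Dec.1/Dec.15 as timing sensitivity.
- Report both the 2023-24 and 2022-23 control cycles; V2 stays positive under both.
- Attach artistId bucket / account-vintage diagnostics every time V2 is reported.
- The overlap sample can serve as a sanity check: overlapping apps show no pre-ramp, but the sample is too small and too selected to be the main estimand.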
How To Reduce The Concern
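- Use bucket-by-day fixed effects to control for the common time path of different account-vintage cohorts.
- Run a clean-pre / gap-clean design that drops Sep.29-Nov.23, since this stretch very likely already contains prior AI-tool shocks.
- Compute pretrend-adjusted same-day differences that explicitly subtract the pre-treatment slope.
- If the paper uses V2, state the estimand as a uniform artistId-sample robustness check and do not pool it with current main as one sample.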
| Dataset | Spec | Unique apps |
|---|---|---|
| current main | baseline global | +48.9% |
| current main | drop Sep.29-Nov.23 gap | +47.3% |
| V2 final | baseline global | +54.1% |
| V2 final | drop Sep.29-Nov.23 gap | +61.9% |
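The pretrend-adjusted same-day difference listed above can be sketched as: fit a straight line to the pre-cutoff treated-control gap, extrapolate it past the cutoff, and average what remains. A linear pre-trend is an assumption of this sketch, and the gap series is synthetic.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def pretrend_adjusted_effect(gap, cutoff_idx):
    """gap[d] is treated minus control on day-in-cycle d; cutoff_idx is the first post day."""
    slope, intercept = fit_line(list(range(cutoff_idx)), gap[:cutoff_idx])
    residuals = [
        gap[d] - (intercept + slope * d) for d in range(cutoff_idx, len(gap))
    ]
    return sum(residuals) / len(residuals)

# Synthetic gap: a steady pre-ramp of 0.5 per day plus a true jump of 10 at day 100.
gap = [2 + 0.5 * d for d in range(100)] + [12 + 0.5 * d for d in range(100, 150)]

naive = sum(gap[100:]) / 50 - sum(gap[:100]) / 100
adjusted = pretrend_adjusted_effect(gap, 100)
print(round(naive, 2), round(adjusted, 2))
```

The naive post-minus-pre difference absorbs the ramp, while the adjusted figure recovers only the jump; the cost is that the answer depends on the extrapolated slope being right.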
Control Cycle Sensitivity
I hold the treated cycle fixed at 2025-06-01 to 2026-04-26 and the cutoff fixed at Nov.24, and swap only the control cycle. Result: switching to 2022-23 makes V2's pretrend somewhat cleaner but does not fully resolve it.
| Dataset | Control cycle | Unique apps | Unique new devs | Unique new companies | Pretrend: last 28 minus early | Pretrend scale |
|---|---|---|---|---|---|---|
| current main | 2023-24 | +48.9% | +51.4% | +30.6% | -0.9 apps/day | -3.1% |
| current main | 2022-23 | +43.3% | +45.2% | +27.0% | +3.4 apps/day | +15.6% |
| V2 final | 2023-24 | +54.1% | +64.9% | +47.9% | +27.2 apps/day | +46.6% |
| V2 final | 2022-23 | +50.7% | +57.0% | +42.4% | +16.9 apps/day | +40.3% |
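Conclusion: the choice of control cycle does affect V2's pretrend; 2022-23 is somewhat better than 2023-24. But V2 still shows a clear pre-treatment ramp, so the problem is not simply "the wrong control dates"; it also involves V2's uniform sampling design and the treated-cycle composition ramp.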
Date / Seasonality / Window Checks
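This is not simply a matter of a missing date fixed effect. A strict calendar-date fixed effect cannot be identified directly in this two-cycle design, because each real date belongs to either the treated cycle or the control cycle, never both; what can be used are seasonality controls such as day-in-cycle, weekday, week/month-in-cycle, and genre-by-day. The baseline is already a same-day-in-cycle comparison.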
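A short sketch of the day-in-cycle alignment and why weekday enters as a separate control. The treated-cycle start of 2025-06-01 is the one used in the control-cycle section; the control-cycle start of 2023-06-01 is an assumption (same calendar day, two years earlier), and the helper name is illustrative.

```python
from datetime import date

TREATED_START = date(2025, 6, 1)
CONTROL_START = date(2023, 6, 1)  # assumed start of the 2023-24 control cycle

def day_in_cycle(d, cycle_start):
    """Days elapsed since the start of the cycle that contains d."""
    return (d - cycle_start).days

treated_cutoff = date(2025, 11, 24)
control_cutoff = date(2023, 11, 24)

treated_idx = day_in_cycle(treated_cutoff, TREATED_START)
control_idx = day_in_cycle(control_cutoff, CONTROL_START)

# Same day-in-cycle index, but different weekdays (Monday = 0 ... Sunday = 6).
treated_weekday = treated_cutoff.weekday()
control_weekday = control_cutoff.weekday()
print(treated_idx, control_idx, treated_weekday, control_weekday)
```

Matching on day-in-cycle lines the two cycles up on the calendar, but the same index falls on different weekdays in different years, which is what the weekday and week-in-cycle specifications address. A calendar-date dummy, by contrast, would be specific to one cycle and so would absorb the treated indicator.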
Seasonality controls
| Spec | current unique apps | V2 unique apps |
|---|---|---|
| day-in-cycle FE | +48.9% | +54.1% |
| day-in-cycle + weekday | +46.9% | +52.0% |
| week-in-cycle + weekday | +46.5% | +51.6% |
| month-in-cycle + weekday | +42.1% | +47.0% |
| genre-by-day FE | +48.9% | +54.1% |
Window sensitivity
| Window | current | V2 |
|---|---|---|
| baseline Jun.1-Apr.26 | +48.9% | +54.1% |
| longer pre Jan.1-Apr.26 | +49.0% | +71.6% |
| spring pre Mar.1-Apr.26 | +53.0% | +67.9% |
| late pre Aug.1-Jan.31 | +21.9% | +22.0% |
| short post Jun.1-Jan.31 | +20.2% | +27.0% |
| long post Jun.1-May.13 | +35.4% | +60.2% |
| Dataset | Pretrend window | Last 28 pre minus early | Pre-slope p |
|---|---|---|---|
| current | baseline Jun.1-Apr.26 | -0.9 apps/day | 0.671 |
| current | longer pre Jan.1-Apr.26 | -5.5 apps/day | 0.474 |
| current | spring pre Mar.1-Apr.26 | +5.2 apps/day | 0.609 |
| current | late pre Aug.1-Jan.31 | +0.8 apps/day | 0.951 |
| V2 | baseline Jun.1-Apr.26 | +27.2 apps/day | 0.137 |
| V2 | longer pre Jan.1-Apr.26 | +43.5 apps/day | 0.0003 |
| V2 | spring pre Mar.1-Apr.26 | +48.0 apps/day | 0.0004 |
| V2 | late pre Aug.1-Jan.31 | +20.0 apps/day | 0.323 |
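Conclusion: after adding weekday / week / month / genre-by-day seasonality controls, V2 is still +47% to +54%. What the estimate is really sensitive to is the window: late-pre or short-post windows compress the effect to roughly +22% to +27%, while lengthening the pre-period makes V2's pretrend more pronounced. This indicates that V2's treated cycle was already rising relative to the control cycle from June to November 2025, rather than reflecting uncontrolled seasonality.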
Recommendation
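Version 2 should serve as a robustness / alternative sample rather than a direct replacement for the current main database. The main conclusions point the same way, but if V2 goes into the paper, it needs to be explicit about the following: unify the cutoff at Nov.24; report the 2023-24 and 2022-23 control cycles; add date-window sensitivity; place the sampling design, artistId bucket, and account-vintage diagnostics next to the main table; and for review-related analysis, either re-scrape the review stream or declare it a snapshot proxy only.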