Version 2 Friend iOS Dataset Audit

May22 and Nov24 are two distinct treatment nodes

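Core update for co-authors: V2 full is not "bad data". It is better read as evidence that the broad coding-agent treatment had already started around 2025-05-22. Nov24 can still be a late-stage ability shock, but it can no longer be interpreted as the first treatment.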

V2 pretrend before May22: passes. V2 full passes the pre-May22 pretrend under all three control years (2024, 2023, 2022).
V2 pretrend before Nov24: fails. V2 full has already risen during May22-Nov23, so using Nov24 as the first treatment leaves a pre-period that is already treated.
V1 and V2 capture different margins. V1 looks more like a Nov24 late shock; V2 looks more like May22 broad adoption plus Nov24 second-stage intensification.
V2 final unique apps: 202,706 (current main: 84,985)
V2 final software artists: 151,251 (current main: 75,266)
Current-V2 app overlap: 8.8% (7,482 / 84,985 current apps)
Review RSS availability: 0 (V2 lacks review stream tables)

Download status: Version 2 is stored in data/version_2_friend_ios/. Both main SQLite databases pass PRAGMA quick_check.
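The integrity check mentioned above can be reproduced with a few lines of Python. This is a minimal sketch: the `*.sqlite` file pattern and the helper names are assumptions for illustration, since the memo does not name the database files.

```python
import sqlite3
from pathlib import Path


def quick_check(db_path):
    """Return True if SQLite's PRAGMA quick_check reports a single 'ok' row."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("PRAGMA quick_check").fetchall()
    finally:
        conn.close()
    return rows == [("ok",)]


def check_directory(data_dir, pattern="*.sqlite"):
    """Run quick_check on every file matching `pattern` (hypothetical naming)."""
    return {p.name: quick_check(str(p)) for p in sorted(Path(data_dir).glob(pattern))}
```

Calling `check_directory("data/version_2_friend_ios/")` would return one True/False entry per matching database file; adjust `pattern` to the actual file extension.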

How To Read The Two-Node Model

We are no longer running a single treatment date. A two-node event-study / stacked DID model is more appropriate:

Y = FE + beta1 * Treated x AfterMay22 + beta2 * Treated x AfterNov24

Coefficient meaning

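  • beta1 is the first-stage May22 treatment effect: how much additional production the 2025 treated cycle shows relative to the control cycle after May22.
  • beta2 is the second-stage Nov24 incremental effect. It is not an effect measured from zero; it is what the Nov24 late-capability stack adds once the May22 agentic-coding regime is already in place.
  • beta1 + beta2 is the total post-Nov24 effect relative to a world with neither May22 nor Nov24.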
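As a rough illustration of how the two coefficients relate to period means, the sketch below computes them from cell means in a balanced three-period design. It assumes both cycles cover the same days in each period; the function name and the period keys (`pre`, `mid`, `late`) are hypothetical, and the memo's actual estimator with fixed effects may differ.

```python
from statistics import mean


def two_node_did(treated, control):
    """Cell-mean version of the two-node model.

    treated / control: dict mapping period -> list of daily outcomes, with
    'pre' (before May22), 'mid' (May22-Nov23), and 'late' (Nov24 onward).
    Returns (beta1, beta2, beta1 + beta2).
    """
    gap = {p: mean(treated[p]) - mean(control[p]) for p in ("pre", "mid", "late")}
    beta1 = gap["mid"] - gap["pre"]    # first-stage effect after May22
    beta2 = gap["late"] - gap["mid"]   # increment on top of the May22 regime
    return beta1, beta2, beta1 + beta2
```

Because AfterMay22 stays switched on after Nov24, the late-period gap contains both terms, which is why beta2 is read as an increment rather than a stand-alone effect.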

Diagnostics labels

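  • ParallelTran asks whether the pre-treatment event study is flat: the treated-control gap should not rise systematically before the cutoff.
  • PrePass / PubPass flags whether a spec can serve as a paper-facing causal spec: it needs not only a positive effect but also control-year robustness and a believable pretrend.
  • A spec can show a large treatment effect, but if it fails ParallelTran it cannot serve as a clean first-treatment design.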

Spec | PubPass | ParallelTran | Effect read | Co-author interpretation
V1 May22 | weak / no | borderline across 2024/2023/2022 controls | beta1 = -7% to +1% | V1 does not strongly see the May22 broad-adoption shock.
V1 Nov24 | yes for late shock | passes across 2024/2023/2022 controls | beta2 = +39% to +50%; total = +30% to +46% | V1 is credible for the late-Nov capability/intensity shock.
V2 May22 | yes for broad adoption | passes across 2024/2023/2022 controls | beta1 = +22% to +27% | V2 cleanly identifies a May22 broad agentic-coding adoption shock.
V2 Nov24 as first treatment | no | fails across 2024/2023/2022 controls | beta2 = +45% to +56%; total = +73% to +95% | Nov24 is interpretable only as a second-stage increment, not as first exposure.

Co-author Brief

The clean causal story is likely two-stage: May22 is broad agentic-coding adoption; Nov24 is a later capability / intensity shock. V2 fails the Nov24 pretrend because the market is already treated by May22-Nov23.

What V2 is telling us: V2 full passes the May22 pretrend under all three control years, then jumps by +22 to +25 apps/day in May22-Nov23. It fails the Nov24 pretrend because the early-agent period is no longer untreated.
What V1 is telling us: V1/current has weaker May22 evidence but passes Nov24 pretrend robustly. It is better suited to the late-stage Nov24 shock than to the broad May22 adoption shock.
Paper implication: If the estimand is agentic coding adoption, use May22 as the main treatment and treat Nov24 as a second-stage incremental shock. If the estimand is Opus 4.5 / late-Nov stack only, keep Nov24 but do not call V2 full a clean untreated pre-period.

Why May22 matters: on Feb24, Claude Code was only a limited research preview. On May22, Anthropic made Claude Code generally available and moved it into terminal, IDE workflows, background tasks, and the Claude Code SDK. The same week also includes OpenAI Codex cloud agent and GitHub Copilot coding agent, so this is a market-wide agentic-coding adoption cluster.

2025 Coding-Agent Timeline

Periods:

  • Baseline: Jan 1 - May 21
  • Early Agentic-Coding Period: May 22 - Nov 23
  • Late Capability / Intensity Shock: Nov 24 - Apr 25

Events:

  • Feb 24: Claude Code research preview
  • May 16: OpenAI Codex cloud preview
  • May 19-22: Copilot coding agent; Claude Code GA
  • Jun 4: Cursor 1.0 Background Agent GA
  • Sep 29: Claude Agent SDK subagents / hooks
  • Nov 24: late-stage ability shock

Interpretation: May22 is not a single-company news item. It is the week when agentic coding tools become broadly usable in real workflows.

Two-Node Double DID Visualization

Use a stacked treatment model: Y = FE + beta1 * Treated x AfterMay22 + beta2 * Treated x AfterNov24. Here beta1 is the May22 first-stage adoption effect. beta2 is the Nov24 incremental effect on top of the May22 regime. The post-Nov24 total effect is beta1 + beta2. Percent effects are reported against the relevant counterfactual treated mean, not against raw observed apps/day.

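[Figure: V1/current and V2 full across three periods (Jan1-May21 baseline, May22-Nov23 early agent, Nov24-Apr25 late shock), with the May22 and Nov24 cutoffs marked; y-axis 0 to 140. Plotted values in extracted order: 24.8, 23.1, 50.2, 36.3, 59.5, 125.0.]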

V2 story: May22 first-stage effect is +22 to +25 apps/day across all control years, with May22 pretrend passing. Nov24 then adds another +60 to +70 apps/day. Total post-Nov24 effect is therefore about +82 to +95 apps/day relative to the no-May22 baseline.

V1 story: May22 first-stage effect is near zero, while Nov24 adds +25 to +30 apps/day with Nov24 pretrend passing. V1 is therefore more credible for the late-stage shock than for the broad May22 adoption shock.

Dataset | May22 first-stage effect beta1 | Nov24 incremental effect beta2 | Post-Nov24 total beta1 + beta2 | Interpretation
V1/current | -4.1 to +0.5/day (-7% to +1%) | +24.7 to +29.5/day (+39% to +50%) | +20.6 to +28.0/day (+30% to +46%) | No robust May22 effect in V1; Nov24 looks like the first visible jump in this sample.
V2 full | +21.9 to +25.4/day (+22% to +27%) | +60.2 to +69.5/day (+45% to +56%) | +82.1 to +94.9/day (+73% to +95%) | V2 sees both stages: broad May22 adoption first, then a much larger Nov24 intensification.

Dataset | Control year | May22 pretrend | Nov24 pretrend | May22 early DID | May22 % | Nov24 late increment | Nov24 % | Post-Nov24 total | Total %
V1/current | 2024 | borderline, -9.9/day | pass, -5.0/day | -4.1/day | -6.7% | +24.7/day | +38.5% | +20.6/day | +30.2%
V1/current | 2023 | borderline, -10.0/day | pass, -0.9/day | -1.5/day | -2.6% | +29.5/day | +49.8% | +28.0/day | +46.1%
V1/current | 2022 | borderline, -9.7/day | pass, +3.4/day | +0.5/day | +0.9% | +27.3/day | +44.5% | +27.8/day | +45.6%
V2 full | 2024 | pass, -1.9/day | fail, +24.3/day | +21.9/day | +22.3% | +60.2/day | +44.8% | +82.1/day | +73.0%
V2 full | 2023 | pass, -4.8/day | fail, +27.2/day | +25.4/day | +26.7% | +69.5/day | +55.7% | +94.9/day | +95.2%
V2 full | 2022 | pass, -3.6/day | fail, +16.9/day | +22.2/day | +22.7% | +66.9/day | +52.4% | +89.1/day | +84.5%
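A quick arithmetic check on the table above: in the level model the post-Nov24 total should equal beta1 + beta2 row by row. The sketch below copies the apps/day figures from the table; the function name and tolerance are illustrative choices.

```python
# (dataset, control_year, beta1, beta2, reported_total), all in apps/day,
# copied from the control-year table above.
ROWS = [
    ("V1/current", 2024, -4.1, 24.7, 20.6),
    ("V1/current", 2023, -1.5, 29.5, 28.0),
    ("V1/current", 2022, 0.5, 27.3, 27.8),
    ("V2 full", 2024, 21.9, 60.2, 82.1),
    ("V2 full", 2023, 25.4, 69.5, 94.9),
    ("V2 full", 2022, 22.2, 66.9, 89.1),
]


def additive_total_mismatches(rows, tol=0.05):
    """Return rows whose reported total differs from beta1 + beta2 by more than tol."""
    return [r for r in rows if abs((r[2] + r[3]) - r[4]) > tol + 1e-9]
```

An empty result from `additive_total_mismatches(ROWS)` means every reported total matches the sum of the two stage effects to the rounding shown.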

How to interpret beta2: The Nov24 coefficient is not the full effect of agentic coding from zero. It is the marginal effect of the late-Nov capability stack conditional on the market already being in the May22 agentic-coding regime. In a level-count model, the total post-Nov24 effect is additive: beta1 + beta2. In a log or PPML model, the same idea is multiplicative, but the causal interpretation is still “second-stage increment on top of first-stage adoption.”
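The multiplicative case can be written out explicitly. This is a generic sketch for an exponential-mean (log or PPML) specification using the same regressors as the level model; it is not taken from the memo's code. The percent columns in the tables are reported against counterfactual means from the level model, so they are not expected to compose exactly this way.

```latex
\begin{align*}
\mathbb{E}[Y] &= \exp\bigl(\mathrm{FE}
  + \beta_1\,\mathrm{Treated}\times\mathrm{AfterMay22}
  + \beta_2\,\mathrm{Treated}\times\mathrm{AfterNov24}\bigr) \\
\text{first-stage effect (\%)} &= e^{\beta_1} - 1 \\
\text{Nov24 increment (\%)} &= e^{\beta_2} - 1 \\
\text{post-Nov24 total (\%)} &= e^{\beta_1 + \beta_2} - 1
  = e^{\beta_1}\, e^{\beta_2} - 1
\end{align*}
```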

Across control years: V2 May22 is the most robust first-stage story. Whether the control is 2024, 2023, or 2022, May22 pretrend passes and beta1 stays in a narrow positive band: +22% to +27%.

Why percentages move: 2024, 2023, and 2022 have different baseline app-production levels. Percent effects therefore change because the counterfactual denominator changes, not because the qualitative treatment story changes.
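To see how the level and percent columns relate, one can back out the denominator each pair implies. The sketch below uses the V2 full figures from the control-year table; the helper names are hypothetical, and the results are only as precise as the rounded inputs.

```python
def implied_denominator(level_effect, pct_effect):
    """Counterfactual mean (apps/day) implied by a level effect and its percent effect."""
    return level_effect / (pct_effect / 100.0)


# control year -> (level effect in apps/day, percent effect), V2 full rows
V2_MAY22 = {2024: (21.9, 22.3), 2023: (25.4, 26.7), 2022: (22.2, 22.7)}
V2_TOTAL = {2024: (82.1, 73.0), 2023: (94.9, 95.2), 2022: (89.1, 84.5)}


def implied_table(effects):
    """Map each control year to its implied counterfactual mean, rounded to 0.1."""
    return {year: round(implied_denominator(lvl, pct), 1)
            for year, (lvl, pct) in effects.items()}
```

Comparing `implied_table(V2_MAY22)` with `implied_table(V2_TOTAL)` shows how much of the movement in a percent column comes from the denominator and how much from the level effect itself.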

Where The May22 V2 Effect Comes From

Component | V2 May22 early DID | V2 Nov24 late increment
New developer first apps | +13 to +18/day | +33 to +37/day
Young repeat apps | +5 to +6/day | +16 to +17/day
Older incumbent apps | +2/day | +10 to +17/day

The May22 first stage is driven mainly by new entrants and young developers. The Nov24 second stage is broader: new devs, repeat apps, and incumbents all rise markedly.

Product Evidence

Co-author wording: “The V2 sample suggests that the broad adoption shock begins around the May 2025 agentic-coding release cluster. The November 2025 cutoff should be interpreted as an incremental late-capability shock, not as the first exposure to agentic coding.”

Sample Size

Metric | current main | V2 final
Software artists | 75,266 | 151,251
Portfolio artists | 44,074 | 86,172
Unique apps | 84,985 | 202,706

Main Results Comparison

Outcome | current main | V2 final
Unique apps | +48.0% | +54.1%
New devs | +50.4% | +64.9%
New companies | +29.2% | +47.9%

Coverage Table

Dataset | Software artists | Portfolio artists | Unique apps | Portfolio rows | Review RSS rows
current main | 75,266 | 44,074 | 84,985 | 2,115,242 | 486,923
V2 200M | 110,903 | 61,351 | 144,907 | 5,264,767 | missing
V2 250M final | 151,251 | 86,172 | 202,706 | 7,333,268 | missing

Overlap Table

Comparison | ID type | Left count | Right count | Overlap | Overlap as share of left
current vs V2 final | artist_id | 44,074 | 86,172 | 3,872 | 8.8%
current vs V2 final | track_id | 84,985 | 202,706 | 7,482 | 8.8%
V2 200M vs V2 final | track_id | 144,907 | 202,706 | 103,154 | 71.2%
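The overlap shares are the overlap count divided by the left-hand count. A minimal sketch, with a hypothetical helper for computing the overlap from raw ID lists and a check against the counts in the table:

```python
def overlap_share(left_ids, right_ids):
    """Return (overlap count, overlap as a fraction of the left-hand ID set)."""
    left, right = set(left_ids), set(right_ids)
    overlap = len(left & right)
    return overlap, overlap / len(left)


def share_pct(overlap, left_count):
    """Overlap as a percentage of the left count, rounded to one decimal."""
    return round(100 * overlap / left_count, 1)


# (overlap, left count) pairs copied from the overlap table
TABLE_ROWS = {
    "current vs V2 final, artist_id": (3872, 44074),
    "current vs V2 final, track_id": (7482, 84985),
    "V2 200M vs V2 final, track_id": (103154, 144907),
}
```

`share_pct` applied to each entry of `TABLE_ROWS` reproduces the percentage column, which confirms the denominator in every row is the left-hand count.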

Why The Pretrends Differ

[Figure: treated-control gap by month (Jun, Jul, Aug, Sep, Oct, Nov 1-23) for current main and V2 final; y-axis 0 to 2,250.]

Pretrend Diagnostics Table

Dataset | Outcome | Last 28 pre days minus early pre | % of control early mean
current main | unique global apps | -0.9 apps/day | -3.0%
current main | local country rows | -74.2 rows/day | -10.2%
V2 final | unique global apps | +27.2 apps/day | +46.6%
V2 final | local country rows | +992.5 rows/day | +49.3%
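The "last 28 pre days minus early pre" diagnostic can be sketched as follows. The definitions are assumptions inferred from the column headers: "early pre" is taken to be every pre-cutoff day before the final 28, and the percent column is the change divided by the control cycle's early-pre mean. The memo's own implementation may define these differently.

```python
from statistics import mean


def pretrend_gap_change(treated_daily, control_daily, window=28):
    """Change in the treated-control gap over the last `window` pre-cutoff days.

    Both inputs are daily counts for the pre-period only, aligned by
    day-in-cycle, and must be longer than `window`.
    Returns (change in gap, change as % of the control early-pre mean).
    """
    gap = [t - c for t, c in zip(treated_daily, control_daily)]
    early_gap, late_gap = gap[:-window], gap[-window:]
    change = mean(late_gap) - mean(early_gap)
    pct_of_control_early = 100 * change / mean(control_daily[:-window])
    return change, pct_of_control_early
```

A flat pre-period gives a change near zero; a positive value means the treated-control gap was already widening before the cutoff.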

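Interpretation

  • V2's positive treatment effect is not the problem. The problem is that the treated-control gap is already widening before treatment, which makes a strict pretrend test harder to pass.
  • The sampling designs differ. The current main database comes mostly from bucket-aware sampling rounds; V2 R200 is a uniform full-namespace sample, which changes the artistId cohort mix entering the sample.
  • Inconsistent event dates amplify the pretrend problem. The friend's audit states Dec.15, the code actually uses Dec.1, and the current paper's main cutoff is Nov.24. With Dec.1 or Dec.15, late-November movement is counted in the apparent pre-period.
  • V2's pre-ramp is concentrated in specific artistId ranges: mainly 1.4B-1.6B, plus positive movement in the latest dense 1.7B-1.8B range.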

Can Review Data Be Rerun

The full review stream / review text / bottleneck attention analysis cannot be rerun directly. V2 lacks these tables: reviews, review_scrape_status, app_country_meta, app_privacy.

V2 can support a weaker review proxy based on the current rating_count snapshot, but the hardcoded SCRAPE_DT must be changed from 2026-04-27 to the V2 scrape date 2026-05-13, keeping the same minimum-age horizon.
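One way to avoid the hardcoded date is to pass the scrape date as a parameter while holding the age horizon fixed. In the sketch below only the two scrape dates come from the text; `MIN_AGE_DAYS = 90` and the function name are placeholders, since the memo does not state the actual horizon.

```python
from datetime import date, timedelta

CURRENT_SCRAPE_DT = date(2026, 4, 27)  # hardcoded SCRAPE_DT mentioned in the text
V2_SCRAPE_DT = date(2026, 5, 13)       # V2 scrape date mentioned in the text
MIN_AGE_DAYS = 90                      # hypothetical horizon; keep identical across datasets


def eligible_for_review_proxy(release_date, scrape_dt, min_age_days=MIN_AGE_DAYS):
    """True if the app was at least `min_age_days` old when the snapshot was taken."""
    return release_date <= scrape_dt - timedelta(days=min_age_days)
```

With the same horizon, a later scrape date admits apps released later, so the V2 proxy sample extends further into 2026 than the current one.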

Metric | current treated-post | V2 final treated-post
Zero-review share | 67.2% | 57.0%
>=10-review share | 3.9% | 5.4%
Existing bottleneck label coverage | 41.3% | 1.5%

Sampling Rule Deep Dive

V1/current and V2 use different sampling rules. Current main pools several bucket-aware / importance-weighted rounds. V2 final is a single uniform full-namespace sample with 250M probes and seed=200020, in which every probed ID has the same sampling probability.

Dataset | Sampling rule | 0-1.3B SW artists | 1.4B-1.6B SW artists | 1.7B-1.8B SW artists
current main | pooled importance-weighted / bucket-aware rounds | 4,494 | 29,424 | 41,348
V2 final | uniform full-namespace R200 sample | 41,114 | 46,376 | 63,761

Key point: the bucket mix does differ, but it is not the only cause. Reweighting V2 to current's bucket shares does not remove the pretrend; it moves from +27.2 apps/day to between +29.8 and +32.8 apps/day.
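A bucket-share reweighting of this kind can be sketched with the software-artist counts from the table above. The three coarse bucket groups and the function names are illustrative assumptions; the memo's reweighting may use finer buckets or a different unit.

```python
from collections import defaultdict

# Software-artist counts per bucket group, copied from the sampling-rule table
CURRENT_COUNTS = {"0-1.3B": 4494, "1.4B-1.6B": 29424, "1.7B-1.8B": 41348}
V2_COUNTS = {"0-1.3B": 41114, "1.4B-1.6B": 46376, "1.7B-1.8B": 63761}


def bucket_weights(target_counts, sample_counts):
    """Weight per bucket = target share / sample share."""
    t_total = sum(target_counts.values())
    s_total = sum(sample_counts.values())
    return {b: (target_counts[b] / t_total) / (sample_counts[b] / s_total)
            for b in sample_counts}


def weighted_daily_apps(app_rows, weights):
    """app_rows: iterable of (release_day, bucket_group), one entry per app.

    Returns a dict mapping each day to its weighted app count.
    """
    daily = defaultdict(float)
    for day, bucket in app_rows:
        daily[day] += weights[bucket]
    return dict(daily)
```

Under these counts the weights shrink the 0-1.3B group and scale up the two later groups, so a pretrend that survives reweighting cannot be attributed to bucket shares alone.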

Account Cohort 和 Bucket Weight

The economic meaning of a bucket weight is the account vintage mix. A high bucket is not an ordinary "weight difference": it corresponds to younger developer accounts, fewer apps per developer, and greater exposure to the 2025 AI tooling ramp.

Dataset | Bucket group | Dev share | Median entry | Entry >=2023 | Apps / dev
current | 0-1.3B early/mid | 4.3% | 2017 | 12.8% | 3.80
current | 1.4B-1.6B mid/late | 42.2% | 2022 | 33.5% | 2.22
current | 1.7B-1.8B latest/dense | 53.5% | 2025 | 96.5% | 1.55
V2 | 0-1.3B early/mid | 20.2% | 2017 | 13.0% | 3.86
V2 | 1.4B-1.6B mid/late | 34.5% | 2021 | 30.1% | 2.46
V2 | 1.7B-1.8B latest/dense | 45.3% | 2025 | 96.9% | 1.60

Matched V2 Sample

Resampling V2 to match current's exact 100M-bucket developer counts lowers the pretrend substantially but does not eliminate it.

Sample / scheme | Last 28 pre minus early | Pre-slope p
full current main | -0.9 apps/day | 0.671
full V2 final | +27.2 apps/day | 0.137
V2 reweighted to current exact-bucket shares | +12.5 apps/day | 0.605
V2 random subsample matched to current exact-bucket counts | +6.5 apps/day | 0.605

Across 200 matched subsamples the range is +2.4 to +11.3 apps/day, with a mean of +6.5. The sampling distribution therefore explains part of the gap, but the V2-only release-date distribution still shows a positive pre-ramp.
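The matched-subsample exercise can be sketched as stratified sampling without replacement, repeated over many draws. Function names, the seed, and the `statistic` callback are hypothetical; the memo does not describe its implementation.

```python
import random
from collections import defaultdict


def matched_subsample(v2_devs, target_counts, rng):
    """Draw target_counts[bucket] developers per bucket, without replacement.

    v2_devs: list of (dev_id, bucket) pairs. rng.sample raises ValueError if a
    bucket holds fewer developers than the target asks for.
    """
    by_bucket = defaultdict(list)
    for dev_id, bucket in v2_devs:
        by_bucket[bucket].append(dev_id)
    sample = []
    for bucket, n in target_counts.items():
        sample.extend(rng.sample(by_bucket[bucket], n))
    return sample


def repeat_draws(v2_devs, target_counts, statistic, n_draws=200, seed=0):
    """Apply `statistic` to each of n_draws matched subsamples."""
    rng = random.Random(seed)
    return [statistic(matched_subsample(v2_devs, target_counts, rng))
            for _ in range(n_draws)]
```

Passing a pretrend statistic as `statistic` yields a distribution across draws, from which a range and mean like those quoted above can be read off.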

Where The V2 Pretrend Comes From

Segment | Unique apps | Last 28 pre minus early pre | Pre-slope p
V2-only track IDs | 195,097 | +27.6 apps/day | 0.115
V2/current overlapping track IDs, measured in V2 | 7,599 | -0.5 apps/day | 0.600
current-only track IDs | 77,544 | -0.5 apps/day | 0.690
current/V2 overlapping track IDs, measured in current | 7,599 | -0.4 apps/day | 0.595

Bucket-Date Pattern

  • In current main, the latest dense 1.7B-1.8B bucket declines during the pre-period: June/July are +1,059/+1,148, falling to +519 for Nov.1-Nov.23.
  • In V2 final, the 1.7B-1.8B bucket shows no comparable decline: June/July are +1,862/+2,038, October is +2,136, and Nov.1-Nov.23 is still +1,601.
  • In V2 final, the 1.4B-1.6B bucket also moves from a negative gap in June/July to a positive gap in Oct/Nov.
  • The V2 issue is therefore the release-date distribution of the V2-only sample, which already shows a treated-cycle ramp before Nov.24.

Economic Interpretation

The sampling rule is not itself an economic mechanism. It matters because the artistId bucket proxies for developer-account vintage and platform cohort. Changing the sampling rule changes which account cohorts enter the sample, and the untreated trends of those cohorts are not necessarily parallel.

How to read V2: V2 is a uniform artistId sample, so it does not deliberately target dense buckets. Dense buckets contain many Software Artists because those buckets really are denser in Apple's namespace.

Why this affects dates: newer account cohorts are more exposed to the 2025 AI-development ramp, indie experimentation, Apple catalog churn, and the coding-agent shocks that precede Nov.24.

Dataset | Segment | Last 28 pre minus early pre | Pre-slope p
current main | new developer first-day apps | -0.9 apps/day | 0.613
current main | later apps by cycle-new developers | +0.6 apps/day | 0.818
current main | incumbent apps | -0.6 apps/day | 0.677
V2 final | new developer first-day apps | +12.3 apps/day | 0.094
V2 final | later apps by cycle-new developers | +7.4 apps/day | <0.001
V2 final | incumbent apps | +7.5 apps/day | 0.578

This decomposition shows that V2's pre-ramp is broad-based: new developer entry contributes most, but incumbents and later apps by cycle-new developers also contribute. It therefore looks like an account-cohort / market-ramp phenomenon in the V2-only sample rather than a download error or a single sampling bug.

How To Believe It

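  • Define V2 as a robustness sample under a different sampling lens, not as a direct replacement for current main.
  • Use Nov.24 as the single main date, and report Dec.1 / Dec.15 as timing sensitivity.
  • Report both the 2023-24 and 2022-23 control cycles; V2 stays positive under both.
  • Attach artistId bucket / account-vintage diagnostics every time V2 is reported.
  • The overlap sample can serve as a sanity check: overlapping apps show no pre-ramp, but the sample is too small and too selected to be the main estimand.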

How To Reduce The Concern

  • Use bucket-by-day fixed effects to absorb the common time path of each account-vintage cohort.
  • Run a clean-pre / gap-clean design that drops Sep.29-Nov.23, since this window very likely already contains prior AI-tool shocks.
  • Compute pretrend-adjusted same-day differences that explicitly subtract the pre-treatment slope.
  • If the paper uses V2, state the estimand as a uniform artistId-sample robustness check and do not pool it with current main as one sample.

Dataset | Spec | Unique apps
current main | baseline global | +48.9%
current main | drop Sep.29-Nov.23 gap | +47.3%
V2 final | baseline global | +54.1%
V2 final | drop Sep.29-Nov.23 gap | +61.9%
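One reading of the pretrend-adjusted idea from the list above is to fit a straight line to the pre-cutoff treated-control gap, extrapolate it past the cutoff, and measure the post-period gap relative to that line. The sketch below is a simplified linear-trend version under that assumption, not the memo's specification.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def pretrend_adjusted_effect(gap, cutoff_index):
    """Mean post-cutoff gap after subtracting the extrapolated pre-period trend.

    gap: daily treated-minus-control differences indexed by day-in-cycle.
    cutoff_index: index of the first treated day (needs at least 2 pre days).
    """
    pre_x = list(range(cutoff_index))
    slope, intercept = fit_line(pre_x, gap[:cutoff_index])
    residual = [gap[i] - (intercept + slope * i)
                for i in range(cutoff_index, len(gap))]
    return sum(residual) / len(residual)
```

Linear extrapolation is a strong assumption over a long post-period, so this adjustment is better read as a sensitivity check than as a replacement estimate.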

Control Cycle Sensitivity

I hold the treated cycle fixed at 2025-06-01 to 2026-04-26 and the cutoff fixed at Nov.24, and swap only the control cycle. Switching to 2022-23 makes V2's pretrend somewhat cleaner but does not fully resolve it.

Dataset | Control cycle | Unique apps | Unique new devs | Unique new companies | Pretrend: last 28 minus early | Pretrend scale
current main | 2023-24 | +48.9% | +51.4% | +30.6% | -0.9 apps/day | -3.1%
current main | 2022-23 | +43.3% | +45.2% | +27.0% | +3.4 apps/day | +15.6%
V2 final | 2023-24 | +54.1% | +64.9% | +47.9% | +27.2 apps/day | +46.6%
V2 final | 2022-23 | +50.7% | +57.0% | +42.4% | +16.9 apps/day | +40.3%

Conclusion: the choice of control cycle does affect V2's pretrend, and 2022-23 is somewhat better than 2023-24. V2 still shows a clear pre-treatment ramp, so the problem is not simply a wrong control date; it also involves V2's uniform sampling design and the treated-cycle composition ramp.

Date / Seasonality / Window Checks

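This is not simply a missing date fixed effect. A strict calendar-date fixed effect cannot be identified in this two-cycle design, because each real date belongs to either the treated cycle or the control cycle, never both. What can be used are seasonality controls such as day-in-cycle, weekday, week/month-in-cycle, and genre-by-day. The baseline is already a same-day-in-cycle comparison.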
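The day-in-cycle alignment can be illustrated with a small sketch. The treated start date comes from the text; the control start date is an assumption (the same month and day two years earlier), and the helper name is hypothetical.

```python
from datetime import date

TREATED_START = date(2025, 6, 1)  # treated cycle start, from the text
CONTROL_START = date(2023, 6, 1)  # assumed start of the 2023-24 control cycle


def day_in_cycle(d, cycle_start):
    """Number of days elapsed since the cycle start (0 on the first day)."""
    return (d - cycle_start).days


# Nov.24 falls on the same day-in-cycle index in both cycles, so a
# day-in-cycle fixed effect compares like with like. A calendar-date fixed
# effect would instead absorb each date entirely, because no calendar date
# appears in both cycles.
cutoff_treated = day_in_cycle(date(2025, 11, 24), TREATED_START)
cutoff_control = day_in_cycle(date(2023, 11, 24), CONTROL_START)
```

One detail to watch under this assumed alignment: the 2023-24 cycle contains Feb 29, 2024, so from March onward the same month-day sits one index later in the control cycle than in the treated cycle.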

Seasonality controls

Spec | current unique apps | V2 unique apps
day-in-cycle FE | +48.9% | +54.1%
day-in-cycle + weekday | +46.9% | +52.0%
week-in-cycle + weekday | +46.5% | +51.6%
month-in-cycle + weekday | +42.1% | +47.0%
genre-by-day FE | +48.9% | +54.1%

Window sensitivity

Window | current | V2
baseline Jun.1-Apr.26 | +48.9% | +54.1%
longer pre Jan.1-Apr.26 | +49.0% | +71.6%
spring pre Mar.1-Apr.26 | +53.0% | +67.9%
late pre Aug.1-Jan.31 | +21.9% | +22.0%
short post Jun.1-Jan.31 | +20.2% | +27.0%
long post Jun.1-May.13 | +35.4% | +60.2%

Dataset | Pretrend window | Last 28 pre minus early | Pre-slope p
current | baseline Jun.1-Apr.26 | -0.9 apps/day | 0.671
current | longer pre Jan.1-Apr.26 | -5.5 apps/day | 0.474
current | spring pre Mar.1-Apr.26 | +5.2 apps/day | 0.609
current | late pre Aug.1-Jan.31 | +0.8 apps/day | 0.951
V2 | baseline Jun.1-Apr.26 | +27.2 apps/day | 0.137
V2 | longer pre Jan.1-Apr.26 | +43.5 apps/day | 0.0003
V2 | spring pre Mar.1-Apr.26 | +48.0 apps/day | 0.0004
V2 | late pre Aug.1-Jan.31 | +20.0 apps/day | 0.323

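Conclusion: after adding weekday / week / month / genre-by-day seasonality controls, V2 remains at +47% to +54%. The real sensitivity is to the window: late-pre or short-post windows compress the effect to roughly +22% to +27%, while lengthening the pre-period makes V2's pretrend more pronounced. V2's treated cycle was therefore already rising relative to the control cycle from June through November 2025, which is not just uncontrolled seasonality.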

Recommendation

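Version 2 should serve as a robustness / alternative sample, not as a direct replacement for the current main database. The main conclusions point the same way, but putting V2 in the paper requires: unifying the cutoff at Nov.24; reporting the 2023-24 and 2022-23 control cycles; adding date-window sensitivity; placing sampling design, artistId bucket, and account vintage diagnostics next to the main table; and, for review-related analysis, either re-scraping the review stream or labeling the result as a snapshot proxy only.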