iOS Version 2 Data Audit

V2 final unique apps: 202,706 (current main: 84,985)

V2 final software artists: 151,251 (current main: 75,266)

Current-V2 app overlap: 8.8% (7,482 / 84,985 current apps)

Review RSS availability: 0 (V2 lacks review stream tables)

Download status: Version 2 is stored in data/version_2_friend_ios/. Both main SQLite databases pass PRAGMA quick_check.
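
The integrity check can be repeated with a short script. This is a minimal sketch using Python's standard sqlite3 module; the function name quick_check is illustrative, and the database file names inside data/version_2_friend_ios/ are not given in this audit, so the path has to be supplied by the caller.

```python
import sqlite3


def quick_check(db_path: str) -> list[str]:
    """Run PRAGMA quick_check on one SQLite file and return the reported rows.

    SQLite reports a single row containing 'ok' when it finds no problems.
    Note: sqlite3.connect creates an empty database if db_path does not exist.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("PRAGMA quick_check").fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

A database passes when the returned list is exactly ["ok"]; anything else lists the problems SQLite found.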

How To Read The Two-Node Model

We are no longer running a single treatment date. A two-node event-study / stacked DID model is the more sensible specification:

Y = FE + beta1 * Treated x AfterMay22 + beta2 * Treated x AfterNov24

Coefficient meaning

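beta1 is the May22 first-stage treatment effect: how much additional production the 2025 treated cycle shows relative to the control cycle after May22.
beta2 is the Nov24 second-stage incremental effect: it is not an effect measured from zero, but the additional increase from the Nov24 late-capability stack once the May22 agentic-coding regime is already in place.
beta1 + beta2 is the total post-Nov24 effect relative to a world with neither May22 nor Nov24.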
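
The coefficient definitions can be made concrete with a cell-means sketch. It assumes a simplified version of the model in which the fixed effects are only a cycle effect and three period effects (pre = before May22, mid = May22 to Nov23, late = from Nov24); the actual specification may use finer fixed effects. Under that assumption beta1 and beta2 are differences in the treated-control gap across periods. The function name and the toy numbers are illustrative, not audit output.

```python
from statistics import mean


def two_node_did(rows):
    """Estimate (beta1, beta2, total) from rows of (cycle, period, y).

    cycle is "treated" or "control"; period is "pre", "mid", or "late".
    With only cycle and period effects, the stacked model is saturated, so
    each coefficient is a change in the treated-control gap between periods.
    """
    cells = {}
    for cycle, period, y in rows:
        cells.setdefault((cycle, period), []).append(y)
    cell_mean = {key: mean(values) for key, values in cells.items()}
    gap = {
        p: cell_mean[("treated", p)] - cell_mean[("control", p)]
        for p in ("pre", "mid", "late")
    }
    beta1 = gap["mid"] - gap["pre"]    # May22 first-stage effect
    beta2 = gap["late"] - gap["mid"]   # Nov24 increment on top of May22
    return beta1, beta2, beta1 + beta2


# Toy data: control is flat at 100/day, treated steps up twice.
toy = (
    [("control", p, 100.0) for p in ("pre", "mid", "late")]
    + [("treated", "pre", 100.0), ("treated", "mid", 122.0), ("treated", "late", 182.0)]
)
print(two_node_did(toy))  # (22.0, 60.0, 82.0)
```

The toy output shows why beta2 alone understates the post-Nov24 total: the late-period gap of 82 is split into 22 from the first node and 60 from the second.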

Diagnostics labels

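ParallelTran indicates whether the pre-treatment event study is flat: the treated-control gap should not rise systematically before the cutoff.
PrePass / PubPass marks whether a spec can serve as a paper-facing causal spec: it needs not only a positive effect but also control-year robustness and a believable pretrend.
A spec can show a large treatment effect, but if it fails ParallelTran it cannot serve as a clean first-treatment design.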

Spec	PubPass	ParallelTran	Effect read	Co-author interpretation
V1 May22	weak / no	borderline across 2024/2023/2022 controls	beta1 = -7% to +1%	V1 does not strongly see the May22 broad-adoption shock.
V1 Nov24	yes for late shock	passes across 2024/2023/2022 controls	beta2 = +39% to +50%; total = +30% to +46%	V1 is credible for the late-Nov capability/intensity shock.
V2 May22	yes for broad adoption	passes across 2024/2023/2022 controls	beta1 = +22% to +27%	V2 cleanly identifies a May22 broad agentic-coding adoption shock.
V2 Nov24 as first treatment	no	fails across 2024/2023/2022 controls	beta2 = +45% to +56%; total = +73% to +95%	Nov24 is interpretable only as a second-stage increment, not as first exposure.

Co-author Brief

The clean causal story is likely two-stage: May22 is broad agentic-coding adoption; Nov24 is a later capability / intensity shock. V2 fails the Nov24 pretrend because the market is already treated during the May22-Nov23 window.

What V2 is telling us: V2 full passes the May22 pretrend under all three control years, then jumps by +22 to +25 apps/day in May22-Nov23. It fails the Nov24 pretrend because the early-agent period is no longer untreated.

What V1 is telling us: V1/current has weaker May22 evidence but passes the Nov24 pretrend robustly. It is better suited to the late-stage Nov24 shock than to the broad May22 adoption shock.

Paper implication: If the estimand is agentic coding adoption, use May22 as the main treatment and treat Nov24 as a second-stage incremental shock. If the estimand is Opus 4.5 / late-Nov stack only, keep Nov24 but do not call V2 full a clean untreated pre-period.

Why May22 matters: On Feb24, Claude Code was only a limited research preview. On May22, Anthropic made Claude Code generally available and moved it into terminal, IDE workflows, background tasks, and the Claude Code SDK. The same period also includes the OpenAI Codex cloud agent and the GitHub Copilot coding agent, so this is a market-wide agentic-coding adoption cluster.

2025 Coding-Agent Timeline

Two-Node Double DID Visualization

Use a stacked treatment model: Y = FE + beta1 * Treated x AfterMay22 + beta2 * Treated x AfterNov24. Here beta1 is the May22 first-stage adoption effect. beta2 is the Nov24 incremental effect on top of the May22 regime. The post-Nov24 total effect is beta1 + beta2. Percent effects are reported against the relevant counterfactual treated mean, not against raw observed apps/day.

V2 story: May22 first-stage effect is +22 to +25 apps/day across all control years, with May22 pretrend passing. Nov24 then adds another +60 to +70 apps/day. Total post-Nov24 effect is therefore about +82 to +95 apps/day relative to the no-May22 baseline.

V1 story: May22 first-stage effect is near zero, while Nov24 adds +25 to +30 apps/day with Nov24 pretrend passing. V1 is therefore more credible for the late-stage shock than for the broad May22 adoption shock.

Dataset	May22 first-stage effect beta1	Nov24 incremental effect beta2	Post-Nov24 total beta1 + beta2	Interpretation
V1/current	-4.1 to +0.5/day (-7% to +1%)	+24.7 to +29.5/day (+39% to +50%)	+20.6 to +28.0/day (+30% to +46%)	No robust May22 effect in V1; Nov24 looks like the first visible jump in this sample.
V2 full	+21.9 to +25.4/day (+22% to +27%)	+60.2 to +69.5/day (+45% to +56%)	+82.1 to +94.9/day (+73% to +95%)	V2 sees both stages: broad May22 adoption first, then a much larger Nov24 intensification.

Dataset	Control year	May22 pretrend	Nov24 pretrend	May22 early DID	May22 %	Nov24 late increment	Nov24 %	Post-Nov24 total	Total %
V1/current	2024	borderline -9.9/day	pass -5.0/day	-4.1/day	-6.7%	+24.7/day	+38.5%	+20.6/day	+30.2%
V1/current	2023	borderline -10.0/day	pass -0.9/day	-1.5/day	-2.6%	+29.5/day	+49.8%	+28.0/day	+46.1%
V1/current	2022	borderline -9.7/day	pass +3.4/day	+0.5/day	+0.9%	+27.3/day	+44.5%	+27.8/day	+45.6%
V2 full	2024	pass -1.9/day	fail +24.3/day	+21.9/day	+22.3%	+60.2/day	+44.8%	+82.1/day	+73.0%
V2 full	2023	pass -4.8/day	fail +27.2/day	+25.4/day	+26.7%	+69.5/day	+55.7%	+94.9/day	+95.2%
V2 full	2022	pass -3.6/day	fail +16.9/day	+22.2/day	+22.7%	+66.9/day	+52.4%	+89.1/day	+84.5%

How to interpret beta2: The Nov24 coefficient is not the full effect of agentic coding from zero. It is the marginal effect of the late-Nov capability stack conditional on the market already being in the May22 agentic-coding regime. In a level-count model, the total post-Nov24 effect is additive: beta1 + beta2. In a log or PPML model, the same idea is multiplicative, but the causal interpretation is still “second-stage increment on top of first-stage adoption.”
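
The additive versus multiplicative reading can be checked numerically. The sketch below uses the V2 2024-control level estimates from the table above for the additive case; the log-scale coefficients 0.20 and 0.40 are purely hypothetical, since this audit does not report log or PPML estimates.

```python
import math


def total_effect_levels(beta1: float, beta2: float) -> float:
    """Level-count model: the post-Nov24 total is the sum of the two stages."""
    return beta1 + beta2


def total_effect_log(beta1: float, beta2: float) -> float:
    """Log or PPML model: coefficients add on the log scale, so the
    proportional effects compound. Returns the total proportional effect."""
    return math.exp(beta1 + beta2) - 1.0


# Additive case, V2 with the 2024 control: +21.9/day and +60.2/day.
print(round(total_effect_levels(21.9, 60.2), 1))  # 82.1

# Multiplicative case with hypothetical log coefficients.
stage1 = math.exp(0.20) - 1.0   # first-stage proportional effect
stage2 = math.exp(0.40) - 1.0   # second-stage proportional increment
compounded = (1.0 + stage1) * (1.0 + stage2) - 1.0
print(math.isclose(compounded, total_effect_log(0.20, 0.40)))  # True
```

In both cases the second coefficient is read the same way: as the increment conditional on the first stage already being in place.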

Across control years: V2 May22 is the most robust first-stage story. Whether the control is 2024, 2023, or 2022, May22 pretrend passes and beta1 stays in a narrow positive band: +22% to +27%.

Why percentages move: 2024, 2023, and 2022 have different baseline app-production levels. Percent effects therefore change because the counterfactual denominator changes, not because the qualitative treatment story changes.
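
The denominator point can be illustrated by backing out the counterfactual mean that each reported pair of numbers implies. This is a rough sketch: it assumes percent effect = level effect / counterfactual treated mean, as stated earlier, and the implied means are approximate because the table values are rounded. The function names are illustrative.

```python
def pct_effect(effect_per_day: float, counterfactual_mean: float) -> float:
    """Percent effect relative to the counterfactual treated mean."""
    return 100.0 * effect_per_day / counterfactual_mean


def implied_counterfactual(effect_per_day: float, pct: float) -> float:
    """Counterfactual mean implied by a reported level effect and percent effect."""
    return effect_per_day / (pct / 100.0)


# V2 May22 early DID rows from the control-year table: (apps/day, percent).
v2_may22 = {"2024": (21.9, 22.3), "2023": (25.4, 26.7), "2022": (22.2, 22.7)}
for year, (effect, pct) in v2_may22.items():
    print(year, round(implied_counterfactual(effect, pct), 1))
```

Each control year implies a somewhat different counterfactual apps/day level, which is why the same qualitative effect shows up as a different percentage.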

Where The May22 V2 Effect Comes From

Component	V2 May22 early DID	V2 Nov24 late increment
New developer first apps	+13 to +18/day	+33 to +37/day
Young repeat apps	+5 to +6/day	+16 to +17/day
Older incumbent apps	+2/day	+10 to +17/day

The May22 first stage is driven mainly by new entrants and young developers; the Nov24 second stage is broader, with clear increases across new developers, repeat apps, and incumbents.

Product Evidence

Co-author wording: “The V2 sample suggests that the broad adoption shock begins around the May 2025 agentic-coding release cluster. The November 2025 cutoff should be interpreted as an incremental late-capability shock, not as the first exposure to agentic coding.”

Sample Size

[Figure: sample size, current main vs. V2 final]

Main Results Comparison

[Figure: main results, current main vs. V2 final]

Coverage Table

Dataset	Software artists	Portfolio artists	Unique apps	Portfolio rows	Review RSS rows
current main	75,266	44,074	84,985	2,115,242	486,923
V2 200M	110,903	61,351	144,907	5,264,767	missing
V2 250M final	151,251	86,172	202,706	7,333,268	missing

Overlap Table

Comparison	ID type	Left count	Right count	Overlap	Overlap share of left count
current vs V2 final	artist_id	44,074	86,172	3,872	8.8%
current vs V2 final	track_id	84,985	202,706	7,482	8.8%
V2 200M vs V2 final	track_id	144,907	202,706	103,154	71.2%
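
The overlap shares are plain set intersections divided by the left-hand count. A minimal sketch, assuming the IDs have already been loaded into Python collections (the loading step and the function name are hypothetical):

```python
def overlap_summary(left_ids, right_ids) -> dict:
    """Count distinct IDs on each side, their intersection, and the
    intersection as a share of the left side."""
    left, right = set(left_ids), set(right_ids)
    both = left & right
    share = len(both) / len(left) if left else 0.0
    return {
        "left": len(left),
        "right": len(right),
        "overlap": len(both),
        "share_of_left": share,
    }


# Arithmetic check against the track_id row for current vs V2 final.
print(round(100 * 7482 / 84985, 1))  # 8.8
```

The same function applies to artist_id and track_id; only the input columns change.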

Why The Pretrends Differ

[Figure: treated-control gap, current main vs. V2 final]

Pretrend Diagnostics Table

Dataset	Outcome	Last 28 pre days minus early pre	% of control early mean
current main	unique global apps	-0.9 apps/day	-3.0%
current main	local country rows	-74.2 rows/day	-10.2%
V2 final	unique global apps	+27.2 apps/day	+46.6%
V2 final	local country rows	+992.5 rows/day	+49.3%
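
The two columns of this table can be computed from daily counts roughly as follows. This is a sketch under one assumption not spelled out in the audit: "early pre" means every pre-cutoff day before the last 28. Inputs are daily series for the pre-period only, aligned by day-in-cycle, oldest day first.

```python
from statistics import mean


def pretrend_gap(treated, control, last_n=28):
    """Return (diff, pct) for the pre-period.

    diff = mean treated-control gap over the last `last_n` pre days
           minus the mean gap over the earlier pre days.
    pct  = diff as a percent of the control mean over the earlier pre days.
    """
    if len(treated) != len(control):
        raise ValueError("series must be aligned day by day")
    if len(treated) <= last_n:
        raise ValueError("need at least one early pre day")
    gap = [t - c for t, c in zip(treated, control)]
    diff = mean(gap[-last_n:]) - mean(gap[:-last_n])
    pct = 100.0 * diff / mean(control[:-last_n])
    return diff, pct


# Toy series: the gap opens by 10/day in the last 28 pre days.
toy_treated = [50.0] * 30 + [60.0] * 28
toy_control = [50.0] * 58
print(pretrend_gap(toy_treated, toy_control))  # (10.0, 20.0)
```

A flat pretrend gives a diff near zero; the V2 rows above correspond to a gap that opens before the cutoff.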

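Interpretation

V2's positive treatment effect is not the problem. The problem is that the treated-control gap is already widening before treatment, which makes a strict pretrend test harder to pass.
The sampling designs are inconsistent. The current main database comes mostly from bucket-aware sampling rounds; V2 R200 is a uniform full-namespace sample, which changes the artistId cohort mix entering the sample.
Confusion over the event date amplifies the pretrend problem. The friend's audit says Dec.15, the code actually uses Dec.1, and the current paper's main cutoff is Nov.24. With Dec.1 or Dec.15, late-November movement is placed in the apparent pre-period.
V2's pre-ramp is concentrated in specific artistId ranges: mainly 1.4B-1.6B, plus positive movement in the newest, densest range, 1.7B-1.8B.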

Can The Review Analysis Be Rerun

No: the full review stream / review text / bottleneck attention analysis cannot be rerun directly. V2 is missing reviews, review_scrape_status, app_country_meta, and app_privacy.

V2 can support a weak review proxy based on the current rating_count snapshot, but the hardcoded SCRAPE_DT must be changed from 2026-04-27 to the V2 scrape date, 2026-05-13, while keeping the same minimum-age horizon.
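
One way to avoid the hardcoded date is to pass the scrape date as a parameter. The sketch below is hypothetical: the existing code is not shown in this audit, the function name is invented, and the 30-day horizon is a placeholder because the actual minimum-age horizon is not stated here.

```python
from datetime import date

CURRENT_SCRAPE_DT = date(2026, 4, 27)  # current main snapshot date
V2_SCRAPE_DT = date(2026, 5, 13)       # V2 snapshot date
MIN_AGE_DAYS = 30                      # placeholder; use the paper's horizon


def eligible_for_review_proxy(release_date: date, scrape_date: date,
                              min_age_days: int = MIN_AGE_DAYS) -> bool:
    """An app enters the snapshot proxy only if it was at least
    `min_age_days` old on the day its rating_count was scraped."""
    return (scrape_date - release_date).days >= min_age_days


# The same app can be eligible under one scrape date and not the other.
release = date(2026, 4, 10)
print(eligible_for_review_proxy(release, CURRENT_SCRAPE_DT))  # False (17 days old)
print(eligible_for_review_proxy(release, V2_SCRAPE_DT))       # True (33 days old)
```

Keeping the horizon fixed and changing only the scrape date is what makes the zero-review and >=10-review shares comparable across the two snapshots.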

Metric	current treated-post	V2 final treated-post
Zero-review share	67.2%	57.0%
>=10-review share	3.9%	5.4%
Existing bottleneck label coverage	41.3%	1.5%

Sampling Rule Deep Dive

V1/current and V2 use different sampling rules. Current main pools several bucket-aware / importance-weighted rounds; V2 final is a single uniform full-namespace sample with 250M probes and seed=200020, so every probed ID has the same sampling probability.

Dataset	Sampling rule	0-1.3B SW artists	1.4B-1.6B SW artists	1.7B-1.8B SW artists
current main	pooled importance-weighted / bucket-aware rounds	4,494	29,424	41,348
V2 final	uniform full-namespace R200 sample	41,114	46,376	63,761

Key point: the bucket mix does differ, but it is not the only cause. After reweighting V2 to current's bucket shares, the pretrend does not disappear; it moves from +27.2 apps/day to +29.8 to +32.8 apps/day.
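
Reweighting to another sample's bucket shares amounts to post-stratification weights. The sketch below uses the three coarse bucket groups and the software-artist counts from the table above; the audit's own reweighting may use finer buckets, so this only illustrates the mechanics.

```python
def bucket_weights(source_counts: dict, target_counts: dict) -> dict:
    """Weight per bucket = target share / source share.

    Applying these weights to source units reproduces the target bucket mix
    while leaving the total weighted count equal to the source total.
    """
    source_total = sum(source_counts.values())
    target_total = sum(target_counts.values())
    return {
        bucket: (target_counts[bucket] / target_total)
        / (source_counts[bucket] / source_total)
        for bucket in source_counts
    }


v2_final = {"0-1.3B": 41_114, "1.4B-1.6B": 46_376, "1.7B-1.8B": 63_761}
current_main = {"0-1.3B": 4_494, "1.4B-1.6B": 29_424, "1.7B-1.8B": 41_348}

weights = bucket_weights(v2_final, current_main)
for bucket, w in weights.items():
    print(bucket, round(w, 3))
```

Early-bucket V2 developers get a weight well below 1 and latest-bucket developers a weight above 1, reflecting that current main is tilted toward the newest accounts.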

Account Cohort 和 Bucket Weight

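The economic meaning of the bucket weight is the account-vintage mix: a high bucket is not an ordinary weighting difference. It corresponds to younger developer accounts, fewer apps per developer, and greater exposure to the 2025 AI-tooling ramp.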

Dataset	Bucket group	Dev share	Median entry	Entry >=2023	Apps / dev
current	0-1.3B early/mid	4.3%	2017	12.8%	3.80
current	1.4B-1.6B mid/late	42.2%	2022	33.5%	2.22
current	1.7B-1.8B latest/dense	53.5%	2025	96.5%	1.55
V2	0-1.3B early/mid	20.2%	2017	13.0%	3.86
V2	1.4B-1.6B mid/late	34.5%	2021	30.1%	2.46
V2	1.7B-1.8B latest/dense	45.3%	2025	96.9%	1.60

Matched V2 Sample

Resampling V2 to match current's exact developer counts per 100M bucket lowers the pretrend substantially but does not eliminate it.

Sample / scheme	Last 28 pre minus early	Pre-slope p
full current main	-0.9 apps/day	0.671
full V2 final	+27.2 apps/day	0.137
V2 reweighted to current exact-bucket shares	+12.5 apps/day	0.605
V2 random subsample matched to current exact-bucket counts	+6.5 apps/day	0.605

Across 200 matched subsamples the pretrend ranges from +2.4 to +11.3 apps/day, with a mean of +6.5. The sampling distribution therefore explains part of the gap, but the V2-only release-date distribution still shows a positive pre-ramp.
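
The matched-subsample exercise can be sketched as a stratified draw without replacement, repeated many times. The names below are illustrative; `statistic` stands in for whatever computes the pretrend from a drawn developer list, and the seed is arbitrary rather than the one used for the audit.

```python
import random


def matched_subsample(devs_by_bucket: dict, target_counts: dict,
                      rng: random.Random) -> list:
    """Draw, within each bucket, exactly the target number of developers
    without replacement."""
    sample = []
    for bucket, n in target_counts.items():
        pool = devs_by_bucket[bucket]
        if n > len(pool):
            raise ValueError(f"bucket {bucket}: need {n}, have {len(pool)}")
        sample.extend(rng.sample(pool, n))
    return sample


def repeat_draws(devs_by_bucket: dict, target_counts: dict, statistic,
                 n_draws: int = 200, seed: int = 0):
    """Return (min, max, mean) of `statistic` over repeated matched draws."""
    rng = random.Random(seed)
    values = [
        statistic(matched_subsample(devs_by_bucket, target_counts, rng))
        for _ in range(n_draws)
    ]
    return min(values), max(values), sum(values) / len(values)
```

Reporting the range across draws, as in the paragraph above, separates what the bucket mix explains from what remains in every draw.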

Where The V2 Pretrend Comes From

Segment	Unique apps	Last 28 pre minus early pre	Pre-slope p
V2-only track IDs	195,097	+27.6 apps/day	0.115
V2/current overlapping track IDs, measured in V2	7,599	-0.5 apps/day	0.600
current-only track IDs	77,544	-0.5 apps/day	0.690
current/V2 overlapping track IDs, measured in current	7,599	-0.4 apps/day	0.595

Bucket-Date Pattern

In current main, the newest, densest bucket (1.7B-1.8B) declines during the pre-period: June/July are +1,059/+1,148, falling to +519 in Nov.1-Nov.23.
In V2 final, the 1.7B-1.8B bucket shows no comparable decline: June/July are +1,862/+2,038, October is +2,136, and Nov.1-Nov.23 is still +1,601.
The V2 final 1.4B-1.6B bucket also moves from a negative gap in June/July to a positive gap in Oct/Nov.
So the V2 problem lies in the release-date distribution of the V2-only sample, which already shows a treated-cycle ramp before Nov.24.

Economic Interpretation

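The sampling rule is not itself an economic mechanism. It matters because the artistId bucket proxies for developer-account vintage and platform cohort. Changing the sampling rule changes which account cohorts enter the sample, and the untreated trends of those cohorts are not necessarily parallel.

How to read V2: V2 is a uniform artistId sample, so it does not deliberately target dense buckets; dense buckets contain many Software Artists because those buckets really are denser in Apple's namespace.

Why this affects dates: newer account cohorts are more likely to be exposed to the 2025 AI-development ramp, indie experimentation, Apple catalog churn, and coding-agent shocks before Nov.24.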

Dataset	Segment	Last 28 pre minus early pre	Pre-slope p
current main	new developer first-day apps	-0.9 apps/day	0.613
current main	later apps by cycle-new developers	+0.6 apps/day	0.818
current main	incumbent apps	-0.6 apps/day	0.677
V2 final	new developer first-day apps	+12.3 apps/day	0.094
V2 final	later apps by cycle-new developers	+7.4 apps/day	<0.001
V2 final	incumbent apps	+7.5 apps/day	0.578

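This decomposition shows that the V2 pre-ramp is broad-based: new-developer entry contributes the most, but incumbent apps and later apps by cycle-new developers also contribute. It therefore looks more like an account-cohort / market-ramp phenomenon in the V2-only sample than a simple download error or a single-point sampling bug.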

How To Believe It

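Define V2 as a robustness sample under a different sampling lens, not as a direct replacement for current main.
Use Nov.24 as the single main date, and report Dec.1/Dec.15 as timing sensitivity.
Report both the 2023-24 and 2022-23 control cycles; V2 stays positive under both.
Attach artistId bucket / account-vintage diagnostics whenever V2 is reported.
The overlap sample works as a sanity check: overlapping apps show no pre-ramp, but the sample is too small and too selected to be the main estimand.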

How To Reduce The Concern

Use bucket-by-day fixed effects to absorb the common time path of each account-vintage cohort.
Run a clean-pre / gap-clean design that drops Sep.29-Nov.23, since this stretch very likely already contains prior AI-tool shocks.
Run pretrend-adjusted same-day differences that explicitly subtract the pre-treatment slope.
If the paper uses V2, state the estimand as uniform artistId-sample robustness and do not pool it with current main into one sample.
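
Of the steps above, the gap-clean design is the simplest to sketch: drop the Sep.29-Nov.23 window before estimating. The code assumes the same calendar window is dropped in every cycle, identified by the year in which the cycle starts; that symmetry is an assumption of this sketch, not something the audit states. Rows are (date, value) pairs and the function names are illustrative.

```python
from datetime import date


def in_gap(d: date, cycle_start_year: int) -> bool:
    """True if d falls in the Sep.29-Nov.23 window of the given cycle."""
    return date(cycle_start_year, 9, 29) <= d <= date(cycle_start_year, 11, 23)


def drop_gap(rows, cycle_start_year: int) -> list:
    """Keep only (date, value) rows outside the gap window."""
    return [row for row in rows if not in_gap(row[0], cycle_start_year)]


treated_rows = [
    (date(2025, 9, 28), 120),   # kept: last day before the gap
    (date(2025, 10, 15), 140),  # dropped: inside the gap
    (date(2025, 11, 24), 200),  # kept: the cutoff day itself
]
print(drop_gap(treated_rows, 2025))
```

The remaining pre-period then ends on Sep.28, so the pre/post comparison no longer uses the weeks suspected of containing earlier AI-tool shocks.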

Dataset	Spec	Unique apps
current main	baseline global	+48.9%
current main	drop Sep.29-Nov.23 gap	+47.3%
V2 final	baseline global	+54.1%
V2 final	drop Sep.29-Nov.23 gap	+61.9%

Control Cycle Sensitivity

I fix the treated cycle at 2025-06-01 to 2026-04-26 and the cutoff at Nov.24, and swap only the control cycle. The result: switching to 2022-23 makes the V2 pretrend somewhat cleaner but does not fully resolve it.

Dataset	Control cycle	Unique apps	Unique new devs	Unique new companies	Pretrend: last 28 minus early	Pretrend scale
current main	2023-24	+48.9%	+51.4%	+30.6%	-0.9 apps/day	-3.1%
current main	2022-23	+43.3%	+45.2%	+27.0%	+3.4 apps/day	+15.6%
V2 final	2023-24	+54.1%	+64.9%	+47.9%	+27.2 apps/day	+46.6%
V2 final	2022-23	+50.7%	+57.0%	+42.4%	+16.9 apps/day	+40.3%

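Conclusion: the choice of control cycle does affect the V2 pretrend; 2022-23 is somewhat better than 2023-24. But V2 still has a clear pre-treatment ramp, so the problem is not simply that the control dates are wrong; it also involves V2's uniform sampling design and the treated-cycle composition ramp.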

Date / Seasonality / Window Checks

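This is not simply a missing date fixed effect. A strict calendar-date fixed effect cannot be identified directly in this two-cycle design, because each real date belongs to either the treated cycle or the control cycle, never both; what is available are seasonality controls such as day-in-cycle, weekday, week/month-in-cycle, and genre-by-day. The baseline is already a same-day-in-cycle comparison.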
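
The identification point is easy to see by mapping dates to a day-in-cycle index. The sketch assumes each cycle starts on June 1, matching the baseline Jun.1-Apr.26 window, and uses 2023-06-01 as the start of the 2023-24 control cycle; the function name is illustrative.

```python
from datetime import date


def day_in_cycle(d: date, cycle_start: date) -> int:
    """Days elapsed since the start of the cycle (cycle start = day 0)."""
    return (d - cycle_start).days


treated_cutoff = date(2025, 11, 24)
control_cutoff = date(2023, 11, 24)

# Same position in the cycle, so a day-in-cycle effect compares like with like.
print(day_in_cycle(treated_cutoff, date(2025, 6, 1)))  # 176
print(day_in_cycle(control_cutoff, date(2023, 6, 1)))  # 176

# Different weekdays (Monday=0 ... Sunday=6), hence the separate weekday control.
print(treated_cutoff.weekday(), control_cutoff.weekday())  # 0 4
```

A calendar-date effect would have one level per real date, each observed in only one cycle, so it would absorb the treated-control contrast entirely; the day-in-cycle index is shared by both cycles and does not.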

Seasonality controls

Spec	current unique apps	V2 unique apps
day-in-cycle FE	+48.9%	+54.1%
day-in-cycle + weekday	+46.9%	+52.0%
week-in-cycle + weekday	+46.5%	+51.6%
month-in-cycle + weekday	+42.1%	+47.0%
genre-by-day FE	+48.9%	+54.1%

Window sensitivity

Window	current	V2
baseline Jun.1-Apr.26	+48.9%	+54.1%
longer pre Jan.1-Apr.26	+49.0%	+71.6%
spring pre Mar.1-Apr.26	+53.0%	+67.9%
late pre Aug.1-Jan.31	+21.9%	+22.0%
short post Jun.1-Jan.31	+20.2%	+27.0%
long post Jun.1-May.13	+35.4%	+60.2%

Dataset	Pretrend window	Last 28 pre minus early	Pre-slope p
current	baseline Jun.1-Apr.26	-0.9 apps/day	0.671
current	longer pre Jan.1-Apr.26	-5.5 apps/day	0.474
current	spring pre Mar.1-Apr.26	+5.2 apps/day	0.609
current	late pre Aug.1-Jan.31	+0.8 apps/day	0.951
V2	baseline Jun.1-Apr.26	+27.2 apps/day	0.137
V2	longer pre Jan.1-Apr.26	+43.5 apps/day	0.0003
V2	spring pre Mar.1-Apr.26	+48.0 apps/day	0.0004
V2	late pre Aug.1-Jan.31	+20.0 apps/day	0.323

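Conclusion: after adding weekday / week / month / genre-by-day seasonality controls, V2 is still +47% to +54%. What is truly sensitive is the window: late-pre or short-post windows compress the effect to roughly +22% to +27%, while extending the pre-period makes the V2 pretrend more pronounced. This indicates that the V2 treated cycle was already rising relative to the control cycle from June to November 2025, rather than reflecting uncontrolled seasonality.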

Recommendation

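Version 2 should serve as a robustness / alternative sample, not a direct replacement for the current main database. The main conclusions point the same way, but putting V2 in the paper requires the following: unify the cutoff at Nov.24; report the 2023-24 and 2022-23 control cycles; add date-window sensitivity; place the sampling design, artistId bucket, and account-vintage diagnostics next to the main table; and for review-related analysis, either re-scrape the review stream or present it only as a snapshot proxy.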