Agentic SDLC: tín hiệu mạnh nhưng dữ liệu xã hội thiếu Reddit/Facebook
171 candidates quét; X=32, YouTube=20, HN/dev=30, GitHub=64, Papers/Product=25; Reddit=0 do 403, Facebook=0 do fallback không có link dùng được. Vì vậy không claim PASS.
1. Executive Snapshot
1) Context rot là blocker thật
Why now: HN có thread mới về codex /goal fail với 1 pts/0 comments trong 24h; thấp engagement nhưng đúng pain point harness. Quyết định: trial guardrail context snapshot.
2) Agent security benchmark nổi lên
AgentToolBench-Code xuất hiện 2026-05-26; metric 1 pts nhưng chủ đề security benchmark trực tiếp liên quan agent reliability.
3) Terminal-Bench tạo pressure OSS
Dirac đạt 393 pts/148 comments trên HN; ForgeCode gắn Terminal-Bench 2.0. Cần benchmark nội bộ 20 task.
4) Repo momentum phân mảnh
GitHub 64 signals; top sample: multica 33,146 stars/3,984 forks nhưng 753 issues → adoption cao, vận hành rủi ro.
5) Workflow automation gần business hơn model news
Kampala 100 pts/83 comments về reverse-engineer app→API; hợp NEXA/DOMUS hơn agent IDE thuần.
2. KPI Dashboard
Blocker: Reddit 5 lần HTTP 403; Facebook public fallback 0 link; X dùng search fallback, direct unauthenticated unavailable.
3. KOL/OG Feed Watch
| Nền tảng | Signal/link | Metric | Timestamp |
|---|---|---|---|
| HN | Why codex /goal fails on complex workflows | 1 pts / 0 comments | 2026-05-26T06:33Z |
| HN | Show HN: AgentToolBench-Code | 1 pts / 0 comments | 2026-05-26T03:45Z |
| HN | What ClickHouse learned from a year of coding with AI agents | 2 pts / 0 comments | 2026-05-25T17:36Z |
| HN | Ask HN: What do you do at work while the coding agent is working? | 5 pts / 6 comments | 2026-05-25T16:55Z |
| HN | Zero – Programming Language for Agents | 3 pts / 0 comments | 2026-05-23T11:13Z |
| HN | Launch HN: Kampala – Reverse-Engineer Apps into APIs | 100 pts / 83 comments | 2026-04-16T15:19Z |
| HN | Dirac topped TerminalBench on Gemini-3-flash-preview | 393 pts / 148 comments | 2026-04-27T12:35Z |
| GitHub | gadievron/raptor | 2,755 stars / 435 forks / 17 issues | 2026-05-26T07:33Z |
| GitHub | getpaseo/paseo | 6,726 stars / 637 forks / 456 issues | 2026-05-26T07:33Z |
| GitHub | multica-ai/multica | 33,146 stars / 3,984 forks / 753 issues | 2026-05-26T07:33Z |
| GitHub | codeaholicguy/ai-devkit | 1,214 stars / 198 forks / 5 issues | 2026-05-26T07:31Z |
| GitHub | MayDay-wpf/snow-cli | 828 stars / 68 forks / 5 issues | 2026-05-26T07:31Z |
| GitHub | RDI-Foundation/amber | HN 1 pts / 0 comments | 2026-04-13T07:48Z |
| Product/Bench | ForgeCode Terminal-Bench 2.0 | 4 HN pts / benchmark claim | 2026-04-29T18:16Z |
| Product | DAAF Claude Code research workflow | 1 HN pts / 0 comments | 2026-05-25T22:52Z |
4. Trend Radar
Terminal-Bench/SWE harness: 2 benchmark-linked signals, quyết định trial.
Agent runtime/language: Zero, Amber, Salacia; N/A delta → watch.
Vibe-coded app showcases: 3+ HN items, low direct Fabbi ROI → ignore/monitor.
5. Repo Watch
Repo có adoption cao nhưng issue risk: multica 753 issues; paseo 456 issues; raptor 17 issues. Không adopt production khi chưa có maintainer/activity audit 7 ngày.
6. Paper / Benchmark Watch
Benchmark/product candidates=25. Ưu tiên Terminal-Bench + AgentToolBench-Code; papers cụ thể từ harness chưa expose đủ direct metadata trong run → DATA_HEALTH impact -15% confidence.
7. Product / Business Watch
Claude Code/Codex/Cursor/Devin/OpenCode/Copilot/Gemini CLI được theo dõi qua social/product fallback; direct product changelog links chưa đủ trong sample → dùng quyết định watch/trial có kiểm soát, không adopt rộng.
8. Impact Coverage
| Domain | 0-2w | 1-2m | 3-6m | Decision |
|---|---|---|---|---|
| FARE | Harness 20 task cho bugfix CRUD/API | Agent QA regression | Governed SDLC | trial |
| NEXA | App→API discovery từ Kampala pattern | Legacy integration copilots | Workflow mining | trial |
| DOMUS | Agent code review policy | Multi-agent backlog triage | SDLC telemetry | watch |
| Japan/Vietnam/Global | PoC nội bộ 2 tuần | JP enterprise security story | Reusable Fabbi accelerator | adopt gated |
9. CTO Evaluation Matrix
| Top signal | Luận điểm / why now | Bằng chứng | Phản tín hiệu / rủi ro | Fabbi implication | Decision | Conf. | Next validation 1-2w |
|---|---|---|---|---|---|---|---|
| Context rot | Agent dài phiên fail vì compaction/context | HN 1/0 | Engagement thấp, anecdote | FARE/SYNCA: cần context contract | trial | 62% | Đo pass@1 trên 20 tasks trước/sau snapshot |
| Security benchmark | Tool-use attack surface thành gating criterion | AgentToolBench-Code | Gist mới, chưa peer review | Japan/global: security narrative | trial | 58% | 5 malicious-tool tests trong CI |
| Terminal-Bench OSS | Benchmark tạo leaderboard pressure | 393/148, ForgeCode | Leaderboard gaming; task mismatch | NEXA/DOMUS: internal task bench | adopt | 70% | Build 20 Fabbi tasks, compare 3 agents |
| High-star repos | OSS adoption có tín hiệu nhưng maintenance risk | 33,146 stars/753 issues | Stars ≠ production readiness | Global: watch, not vendor lock | watch | 64% | Issue close velocity + release cadence 30d |
| App→API automation | Legacy modernization ROI gần business | 100 pts/83 comments | YC launch hype, not OSS | NEXA/Japan: presales demo | trial | 68% | 1 legacy screen→API spec demo |
10. CTO Recommendations
1) Dựng Fabbi Agent Bench v0
ROI/time-saving: 15-25%; risk 2/5; owner: Tech Lead AI SDLC; TTV: 10 ngày; validation: pass@1/pass@3 trên 20 tasks.
2) Context Contract cho coding agents
ROI/time-saving: 10-18%; risk 2/5; owner: Platform Eng; TTV: 7 ngày; validation: giảm retry ≥20% trên task dài.
3) Security mini-suite cho tool-use
ROI/time-saving: 8-12% avoided rework; risk 3/5; owner: AppSec + AI Champion; TTV: 14 ngày; validation: 5 malicious prompt/tool tests pass CI.
4) NEXA legacy app→API demo
ROI/time-saving: 20-30% discovery; risk 3/5; owner: Solution Architect; TTV: 2 tuần; validation: 1 JP-style legacy flow → API spec + test stub.
11. Source Appendix
Coverage: 171 scanned / cited 30 possible. Missing: Reddit 403, Facebook 0 usable public links. DATA_HEALTH_PARTIAL.
| Nền tảng | Signal/link | Metric | Timestamp |
|---|---|---|---|
| HN | Why codex /goal fails on complex workflows | 1 pts / 0 comments | 2026-05-26T06:33Z |
| HN | Show HN: AgentToolBench-Code | 1 pts / 0 comments | 2026-05-26T03:45Z |
| HN | What ClickHouse learned from a year of coding with AI agents | 2 pts / 0 comments | 2026-05-25T17:36Z |
| HN | Ask HN: What do you do at work while the coding agent is working? | 5 pts / 6 comments | 2026-05-25T16:55Z |
| HN | Zero – Programming Language for Agents | 3 pts / 0 comments | 2026-05-23T11:13Z |
| HN | Launch HN: Kampala – Reverse-Engineer Apps into APIs | 100 pts / 83 comments | 2026-04-16T15:19Z |
| HN | Dirac topped TerminalBench on Gemini-3-flash-preview | 393 pts / 148 comments | 2026-04-27T12:35Z |
| GitHub | gadievron/raptor | 2,755 stars / 435 forks / 17 issues | 2026-05-26T07:33Z |
| GitHub | getpaseo/paseo | 6,726 stars / 637 forks / 456 issues | 2026-05-26T07:33Z |
| GitHub | multica-ai/multica | 33,146 stars / 3,984 forks / 753 issues | 2026-05-26T07:33Z |
| GitHub | codeaholicguy/ai-devkit | 1,214 stars / 198 forks / 5 issues | 2026-05-26T07:31Z |
| GitHub | MayDay-wpf/snow-cli | 828 stars / 68 forks / 5 issues | 2026-05-26T07:31Z |
| GitHub | RDI-Foundation/amber | HN 1 pts / 0 comments | 2026-04-13T07:48Z |
| Product/Bench | ForgeCode Terminal-Bench 2.0 | 4 HN pts / benchmark claim | 2026-04-29T18:16Z |
| Product | DAAF Claude Code research workflow | 1 HN pts / 0 comments | 2026-05-25T22:52Z |