Overview
The large language model landscape in mid-February 2026 is defined by three converging forces: the commoditization of GPT-4-class intelligence, the rise of agentic autonomy as the primary differentiator, and the rapid closure of the gap between open-weight and proprietary models. GPT-4-level performance now costs roughly 1/100th of what it did two years ago [s1], and frontier models have pushed well beyond that bar — with Gemini 3 Pro, Claude Opus 4.6, GPT-5.2, and GLM-5 competing for the top slots across reasoning, coding, and real-world task benchmarks [s2][s3][s23].
Mixture-of-Experts (MoE) architecture has become the default for frontier-scale models, enabling massive total parameter counts while keeping inference costs manageable by activating only a fraction of parameters per token [s4]. Context windows have exploded: Llama 4 Scout offers 10 million tokens, and multiple models now support 1 million tokens as standard [s5]. The week of February 3–5 alone saw the release of Claude Sonnet 5, Claude Opus 4.6, and GPT-5.3 Codex, while Chinese labs followed with GLM-5 (Feb 11) and Qwen 3.5 (Feb 16). Every major release emphasizes agentic workflows — AI that can decompose goals, execute multi-step plans across tools, and recover from errors autonomously [s6].
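The sparse activation at the heart of MoE can be illustrated with a minimal routing sketch: a gate scores every expert, but only the top-k are run and mixed, so per-token compute scales with k rather than with the total expert count. This is a simplified illustration under stated assumptions (real routers add noise, load-balancing losses, and capacity limits, and operate on learned logits, not hand-picked scores):

```python
import math

def top_k_route(scores, k=2):
    """Pick the k highest-scoring experts and softmax-normalize
    their scores into mixing weights; all other experts get 0."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = {i: math.exp(scores[i]) for i in top}
    z = sum(exps.values())
    return {i: exps[i] / z for i in top}

# A router over 8 experts activates only 2 per token, so only
# 2 experts' weights ever touch this token's computation.
weights = top_k_route([0.1, 2.0, -0.5, 1.2, 0.0, 0.3, -1.0, 0.7], k=2)
```

This is why a 685B-parameter model can bill like a 37B one: the inactive experts simply never run for a given token.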
Frontier Models
Frontier models represent the highest-capability systems from the leading AI labs. As of February 2026, the top tier includes offerings from Anthropic, OpenAI, Google DeepMind, and xAI, with Chinese labs closing fast.
Anthropic — Claude
Claude Opus 4.6, released February 5, 2026, takes first place on the Artificial Analysis Intelligence Index v4 — a composite of 10 evaluations including GDPval-AA, Terminal-Bench Hard, SciCode, Humanity's Last Exam, GPQA Diamond, and CritPT [s3]. Key specs: 1M token context window (beta), 128K max output tokens (doubled from Opus 4.5), with tiered pricing — $5/$25 per million input/output tokens for standard prompts (≤200K context) and $10/$37.50 for extended prompts (>200K) [s3][s17]. Its standout feature is Adaptive Thinking Mode, which replaces fixed extended thinking with dynamic effort allocation. Opus 4.6 leads on GDPval-AA at 1,606 Elo (144 points ahead of GPT-5.2), Terminal-Bench 2.0 coding at 65.4%, Humanity's Last Exam, and BrowseComp [s3]. On ARC-AGI-2, reasoning nearly doubled from 37.6% to 68.8%, and MRCR v2 long-context retrieval jumped from 18.5% to 76% [s17].
Claude Sonnet 5, codenamed "Fennec," launched on February 3, 2026 and immediately set a new SWE-Bench Verified record of 82.1% — surpassing both Opus 4.5 (80.9%) and GPT-5.1 (76.3%) [s18]. At $3/$15 per million input/output tokens with a 1M token context window, Sonnet 5 delivers frontier coding performance at roughly 40% less cost than Opus 4.5's $5/$25. Claude Sonnet 4.5 continues to serve production workloads, while the Claude Opus 4.5 (thinking) variant leads the Chatbot Arena coding leaderboard at 1510 Elo [s7].
OpenAI — GPT
OpenAI's GPT-5.2 sits at the frontier, with GPT-5.2-high ranked #5 on the Chatbot Arena at 1465 Elo [s7]. The flagship GPT-5 costs approximately $10/$30 per million input/output tokens, while the premium GPT-5.2 Pro tier commands $21/$168 [s8]. GPT-5.3 Codex, released February 5, 2026, is the standout performer for coding: it scores 77.3% on Terminal-Bench 2.0, 56.8% on SWE-Bench Pro, and 64.7% on OSWorld — setting a new industry high on agentic coding benchmarks and running 25% faster than its predecessor [s19]. OpenAI describes it as the first model "instrumental in creating itself," having been used to debug its own training and diagnose evaluation results [s19]. GPT-5 models support 400K token context windows [s5].
Google DeepMind — Gemini
Gemini 3 Pro holds the #1 position on the Chatbot Arena at 1492 Elo, with its sibling Gemini 3 Flash close behind at #3 with 1470 Elo [s7]. Gemini 3 Pro remains in preview status as of mid-February 2026, with GA expected imminently [s20]. It delivers significant improvements in reasoning, complex instruction following, tool use, and long-context capabilities over Gemini 2.5 Pro [s20]. Gemini 2.5 Pro remains widely deployed, offering a 1M token context window with native multimodal processing at $1.25/$10 per million tokens [s8]. Gemini 2.5 Flash-Lite leads the speed category at 499 tokens/second [s2], while Gemini 3 Pro supports up to 10M tokens of context at $12 per million input tokens [s5].
xAI — Grok
Grok-4.1-Thinking has surged to #2 on the Chatbot Arena at 1482 Elo, with the base Grok-4.1 at #7 (1463 Elo) [s7]. The Grok 4.1 family marks a major step up in complex reasoning and delivers strong coding performance at competitive prices. Grok 4.1 Fast offers a 2M token context window [s2]. xAI founder Elon Musk announced on February 15 that Grok 4.20 will launch the following week, describing it as a "significant improvement" with enhanced multimodal capabilities and reduced hallucinations [s21].
Open-Weight Models
The open-weight ecosystem has matured dramatically. In 2026, models like DeepSeek V3.2, Llama 4, Qwen 3, and Mistral 3 regularly match or exceed the previous generation of proprietary models on standard benchmarks, at a fraction of the deployment cost [s11][s12].
Meta — Llama 4
Llama 4 launched as Meta's most ambitious open-source release. Llama 4 Maverick features 400B total parameters with 17B active (128 experts), natively multimodal via early fusion, and a 1M token context window. It matches or exceeds GPT-4o and Gemini 2.0 on coding, reasoning, and multilingual benchmarks [s13]. Llama 4 Scout uses a leaner 109B total / 17B active (16 experts) configuration but delivers an industry-leading 10M token context window — enough to process approximately 7,500 pages of text [s13][s5]. Both are fully open source.
DeepSeek
DeepSeek V3.2, released December 2025, has 685B total parameters but activates only 37B per token, yielding a remarkably efficient MoE architecture [s11]. It was trained with a fraction of the compute budget of its competitors, demonstrating that near-frontier performance is achievable with far more efficient training methods. Pricing is among the most competitive at $0.14–$0.25 per million input tokens and $0.28–$0.38 per million output tokens [s8][s14]. DeepSeek-V3.2-Speciale, the reasoning variant, surpasses GPT-5 and approaches Gemini-3.0-Pro-level reasoning on benchmarks such as AIME and HMMT [s1].
Alibaba — Qwen 3
Qwen 3.5, released February 16, 2026, is Alibaba's latest flagship: a 397B total parameter MoE model activating 17B per token, with native multimodal capabilities fusing text, image, and video from pretraining [s22]. It employs a hybrid linear attention and sparse MoE architecture, delivering 8.6–19x faster decoding throughput than Qwen3-Max while scoring 87.8% on MMLU-Pro [s22]. Alibaba claims it outperforms GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro across 80% of evaluated categories at 60% lower cost. The earlier Qwen 3 (235B/22B active, 119 languages) remains widely deployed, and the Qwen3-Coder-Next model (80B total, 3B active) achieved SWE-Bench Pro performance on par with Claude Sonnet 4.5 [s12][s15].
Mistral — Mistral 3
Mistral Large 3 employs a 675B total / 41B active MoE architecture with 256K token context, trained with emphasis on non-English languages [s16]. The entire Mistral 3 family is released under the Apache 2.0 license, permitting unrestricted commercial use. The smaller Ministral models beat Google's and Microsoft's similarly sized alternatives on most benchmarks, while the flagship model is competitive with DeepSeek V3.1 [s12].
Other Notable Open-Weight Models
GLM-5, released February 11, 2026 by Zhipu AI, features 744B total parameters with 44B active per token, more than double the parameter count of its predecessor GLM-4.5 [s23]. Released under the MIT license, it was trained entirely on domestically produced Huawei Ascend chips, marking China's push toward self-reliant AI infrastructure. GLM-5 surpassed rival offerings to claim the top spot among open-source models on Artificial Analysis and ranks among the top four intelligence leaders alongside Claude Opus 4.6, GPT-5.2, and Claude Opus 4.5 [s2][s23]. At approximately $0.80/$2.56 per million input/output tokens, it is roughly six times cheaper than proprietary competitors [s23]. Moonshot AI's Kimi K2.5 (January 27, 2026) is an open-source model with a GPQA score of 0.9 [s9]. MiniMax M2.5, released February 12, 2026, is a lightweight open-source model from Chinese developer MiniMax [s9].
Benchmarks & Rankings
Model evaluation in 2026 relies on a multi-faceted approach combining human preference voting with standardized benchmarks. No single metric tells the full story; the most respected leaderboards synthesize multiple signals [s10].
Chatbot Arena (Human Preference)
Chatbot Arena, now at arena.ai, uses over 5.3 million crowdsourced human pairwise comparisons across 306 models to compute Elo ratings [s7]. As of February 2026, the top overall rankings are:
1. Gemini-3-Pro — 1492 Elo
2. Grok-4.1-Thinking — 1482 Elo
3. Gemini-3-Flash — 1470 Elo
4. Claude Opus 4.5 (thinking-32k) — 1466 Elo
5. GPT-5.2-high — 1465 Elo
6. GPT-5.1-high — 1464 Elo
7. Grok-4.1 — 1463 Elo
8. Claude Opus 4.5 — 1462 Elo
9. ERNIE-5.0 — 1461 Elo
10. Gemini-2.5-Pro — 1460 Elo
For the coding-specific leaderboard, Claude Opus 4.5 (thinking) leads at 1510 Elo [s7].
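The ratings above are derived from millions of pairwise preference votes. A simplified online Elo update illustrates the mechanics, though the leaderboard itself fits ratings statistically over all votes at once (a Bradley-Terry-style fit) rather than updating one comparison at a time:

```python
def elo_update(r_a, r_b, winner_a, k=32.0):
    """One online Elo step for a single pairwise comparison.
    Expected score follows the logistic curve on a 400-point scale."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    score_a = 1.0 if winner_a else 0.0
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# If a 1492-rated model loses a vote to a 1482-rated one,
# the two ratings move toward each other; total rating is conserved.
a, b = elo_update(1492.0, 1482.0, winner_a=False)
```

The small gaps in the table (1460–1492 across ten models) translate to near-coin-flip expected win rates, which is why rank order shifts week to week.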
Academic & Composite Benchmarks
The Artificial Analysis Intelligence Index v4 (AAII v4) incorporates 10 evaluations across four equally-weighted categories (Agents, Coding, General, Scientific Reasoning): GDPval-AA, τ²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, and CritPT [s10]. Top composite scores as of February 2026:
1. Claude Opus 4.6 (Adaptive) — 53
2. GPT-5.2 (xhigh) — 51
3. Claude Opus 4.5 (Reasoning) — 50
4. GLM-5 — frontier-tier (exact score pending)
Claude Opus 4.6 leads on GDPval-AA (agentic tasks at 1,606 Elo), Humanity's Last Exam, and BrowseComp, and posts 65.4% on Terminal-Bench 2.0; the coding-specialized GPT-5.3 Codex surpasses that at 77.3%, the highest agentic coding score recorded [s3][s19]. On MMLU-Pro, Gemini 3 Pro Preview leads at 89.8%, followed by Claude Opus 4.5 (Reasoning) at 89.5% and Qwen 3.5 at 87.8% [s10][s22].
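Because AAII v4 weights its four categories equally, the composite is a mean of category means rather than a mean of all ten raw evaluations, so categories with more evaluations do not dominate. A sketch with invented placeholder scores (not real AAII data):

```python
def composite(category_scores):
    """Average each category first, then average the category means,
    so a category with three evals counts the same as one with two."""
    means = [sum(v) / len(v) for v in category_scores.values()]
    return sum(means) / len(means)

# Hypothetical per-evaluation scores grouped into AAII v4's four categories.
scores = {
    "Agents":   [60.0, 55.0],        # e.g. GDPval-AA, tau^2-Bench Telecom
    "Coding":   [65.0, 40.0],        # e.g. Terminal-Bench Hard, SciCode
    "General":  [50.0, 45.0, 55.0],  # e.g. AA-LCR, AA-Omniscience, IFBench
    "Science":  [30.0, 70.0, 48.0],  # e.g. HLE, GPQA Diamond, CritPT
}
idx = composite(scores)
```

With this weighting, a model can lose the composite despite winning more individual evaluations, if its losses cluster in a small category.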
Pricing Comparison
Output tokens consistently cost 3–10x more than input tokens across all major providers, making output-heavy workloads (such as code generation) disproportionately expensive [s8]. The table below summarizes API pricing per million tokens as of February 2026:
Frontier Tier
GPT-5.2 Pro — $21 input / $168 output (8x multiplier)
GPT-5 — ~$10 input / ~$30 output
Gemini 3 Pro — $12 input (10M context tier)
Claude Opus 4.6 — $5 input / $25 output (≤200K); $10/$37.50 (>200K)
Claude Opus 4.5 — $5 input / $25 output
Mid Tier
Claude Sonnet 5 — $3 input / $15 output
Claude Sonnet 4.5 — $3 input / $15 output
Gemini 2.5 Pro — $1.25 input / $10 output
GPT-4o — ~$2.50 input / ~$10 output
Budget & Open-Weight Tier
DeepSeek V3.2 — $0.14–$0.25 input / $0.28–$0.38 output
Gemma 3n E4B — $0.03 per million tokens
DeepSeek-OCR — $0.05 per million tokens
Llama 4 Scout/Maverick — Free weights; hosted API pricing varies by provider
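The Opus 4.6 tier boundary at 200K input tokens makes per-request cost piecewise. A minimal cost sketch using the rates quoted above, under the assumption that the whole request bills at the tier its input length falls into (actual billing granularity may differ by provider):

```python
def opus_46_cost(input_tokens, output_tokens):
    """USD cost of one request under the tiered prices above:
    $5/$25 per 1M in/out at <=200K input, $10/$37.50 beyond."""
    if input_tokens <= 200_000:
        rate_in, rate_out = 5.00, 25.00
    else:
        rate_in, rate_out = 10.00, 37.50
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

# A 150K-token prompt with a 4K-token answer:
# 0.15 * $5 + 0.004 * $25 = $0.85
standard = opus_46_cost(150_000, 4_000)
# The same answer on a 300K-token prompt crosses the tier:
# 0.30 * $10 + 0.004 * $37.50 = $3.15
extended = opus_46_cost(300_000, 4_000)
```

Crossing the 200K boundary roughly doubles the marginal rate, so trimming long prompts just under the tier line can matter more than any model choice in the same family.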
Cost Optimization
Both OpenAI and Anthropic offer significant discounts for non-real-time workloads. OpenAI's Batch API provides a 50% discount across all models. Anthropic's prompt caching can reduce costs by up to 90% on cached tokens [s8]. For production workloads, self-hosting open-weight models like DeepSeek V3.2 or Llama 4 Maverick can reduce per-token costs by an additional order of magnitude, depending on hardware and utilization.
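Back-of-envelope, the two discount types compose multiplicatively on the tokens they touch. The sketch below is illustrative only: whether batch and cache discounts stack, and at what rates, varies by provider, and the 90% figure applies to cached tokens only.

```python
def effective_input_cost(tokens, rate_per_m, cached_frac=0.0,
                         cache_discount=0.90, batch_discount=0.0):
    """Approximate input cost: cached tokens get the cache discount,
    then the batch discount (if any) applies to the whole job."""
    cached = tokens * cached_frac
    fresh = tokens - cached
    cost = (fresh + cached * (1 - cache_discount)) * rate_per_m / 1_000_000
    return cost * (1 - batch_discount)

# 1M input tokens at $5/M with 80% of the prompt cached (90% off)
# and a hypothetical 50% batch discount:
# ((0.2 + 0.8 * 0.1) * $5) * 0.5 = $0.70, versus $5.00 list price.
cost = effective_input_cost(1_000_000, 5.0, cached_frac=0.8,
                            batch_discount=0.5)
```

The headline takeaway survives the hedging: for non-interactive workloads with a stable system prompt, effective input cost can fall by most of an order of magnitude before self-hosting even enters the picture.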
Context Windows & Capabilities
Context windows have grown by orders of magnitude, but raw window size is only part of the story. The AA-LCR (Long Context Retrieval) benchmark tests whether models can actually find and use information from anywhere in their window [s5][s10].
Context Windows by Size
10M tokens: Llama 4 Scout, Gemini 3 Pro
2M tokens: Grok 4.1 Fast
1M tokens: Gemini 2.5 Pro, Gemini 2.5 Flash, Claude Opus 4.6 (beta), Claude Sonnet 5, Llama 4 Maverick, Claude Sonnet 4
400K tokens: GPT-5 family
256K tokens: Mistral Large 3
200K tokens: Claude Opus 4.5
164K tokens: DeepSeek V3.2
Multimodal & Agentic Capabilities
Multimodality is now table stakes for frontier models. Gemini 3 and Llama 4 both use "early fusion" — integrating vision, audio, and text at the architecture level rather than bolting on separate encoders [s1][s13]. Agentic capabilities are the primary differentiator in 2026: models are evaluated on their ability to use tools, browse the web, write and execute code, and operate across multi-step workflows. Benchmarks like GDPval-AA, Terminal-Bench, and τ²-Bench specifically measure these capabilities [s3][s10].
Speed & Latency
For latency-sensitive applications, the speed leaders are Gemini 2.5 Flash-Lite at 499 tokens/second and Granite 3.3 8B at 472 tokens/second [s2]. DeepSeek-OCR achieves the lowest time-to-first-token at 0.18 seconds. Reasoning models (thinking variants) trade speed for accuracy, often taking 10–60 seconds of "thinking time" before generating output, which is reflected in significantly higher output token costs [s3].
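End-to-end latency is roughly time-to-first-token plus output length over decode throughput, with any "thinking" phase added on top. A quick estimate using the figures above (a crude steady-state model; real serving adds queueing and variable per-token speed):

```python
def latency_seconds(output_tokens, tokens_per_sec, ttft_sec, thinking_sec=0.0):
    """Rough wall-clock time for one response: TTFT, plus any hidden
    reasoning time, plus steady-state decoding of the visible output."""
    return ttft_sec + thinking_sec + output_tokens / tokens_per_sec

# 2,000 output tokens at 499 tok/s with an assumed 0.3s TTFT: ~4.3s.
fast = latency_seconds(2_000, 499.0, ttft_sec=0.3)
# The same output behind 30s of thinking time: ~34.3s.
reasoning = latency_seconds(2_000, 499.0, ttft_sec=0.3, thinking_sec=30.0)
```

This is why thinking variants dominate accuracy leaderboards but rarely power chat-completion endpoints where sub-5-second responses are expected.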
Recent Releases
The pace of releases in early 2026 has been relentless, with major model drops occurring multiple times per week. Key releases in January–February 2026:
February 16 — Qwen 3.5: Alibaba's 397B MoE native multimodal model, 87.8% MMLU-Pro, 8.6–19x faster decoding than Qwen3-Max [s22].
February 12 — MiniMax M2.5: lightweight open-source model from Chinese AI developer [s9].
February 11 — GLM-5: Zhipu AI's 744B open-source model under MIT license, trained on Huawei Ascend chips [s23].
February 5 — Claude Opus 4.6: Anthropic's new flagship with adaptive thinking, 1M context (beta), #1 on AAII v4 composite [s3].
February 5 — GPT-5.3 Codex: OpenAI's agentic coding model, Terminal-Bench 77.3%, 25% faster, first model to help create itself [s19].
February 5 — LongCat-Flash-Lite: fast open-source model from Meituan (GPQA: 0.7) [s9].
February 3 — Claude Sonnet 5 ("Fennec"): 82.1% SWE-Bench record at $3/$15 per million tokens [s18].
January 27 — Kimi K2.5: open-source model from Moonshot AI with GPQA of 0.9 [s9].
January 14 — GPT-5.2 Codex: OpenAI's coding variant [s9].
January 14 — LongCat-Flash-Thinking-2601: fast reasoning model from Meituan (GPQA: 0.8) [s9].
Industry-wide, three dominant themes define early 2026: first, the proliferation of coding-specialized variants (GPT-5.3 Codex, Qwen3-Coder-Next, Claude Code, Claude Sonnet 5), reflecting enormous commercial demand for AI-assisted software engineering; second, the continued surge of Chinese AI labs (DeepSeek, Qwen, Moonshot, Zhipu, Meituan, MiniMax), releasing competitive open-source models at a rapid clip under permissive licenses; and third, the convergence toward agentic autonomy, with models evaluated not on isolated benchmarks but on their ability to plan, decompose, and execute multi-step real-world tasks [s6][s12][s19].
References
- [s1] 10 Best LLMs of February 2026: Performance, Pricing & Use Cases — Azumo, Feb 2026
- [s2] LLM Leaderboard — Comparison of over 100 AI models — Artificial Analysis, Feb 2026
- [s3] Opus 4.6 — Everything you need to know — Artificial Analysis, Feb 2026
- [s4] 2026 Frontier LLM Architectures Compared — largo.dev, 2026
- [s5] Context Length Comparison: Leading AI Models in 2026 — Elvex, 2026
- [s6] AI Trends 2026 — LLM Statistics & Industry Insights — llm-stats.com, 2026
- [s7] LMSYS Chatbot Arena Leaderboard Current Top Models: Feb 2026 Rankings — AI Dev Day India, Feb 2026
- [s8] Complete LLM Pricing Comparison 2026 — CloudIDR, 2026
- [s9] AI Updates Today (February 2026) — Latest AI Model Releases — llm-stats.com, Feb 2026
- [s10] LLM Benchmarks 2026 — Complete Evaluation Suite — llm-stats.com, 2026
- [s11] Top 10 Open Source LLMs 2026: DeepSeek Revolution Guide — o-mega.ai, 2026
- [s12] Best Open Source LLMs: Complete 2026 Guide — Contabo, 2026
- [s13] The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation — Meta AI, 2025
- [s14] DeepSeek V3.2 vs Llama 4 Maverick: Model Comparison — Artificial Analysis, 2026
- [s15] Qwen3: Think Deeper, Act Faster — Qwen Team, 2025
- [s16] Mistral launches Mistral 3 — VentureBeat, 2025
- [s17] Claude Opus 4.6 Deep Dive: Benchmarks, Agent Teams, and the Writing Controversy — Claude 5 Hub, Feb 2026
- [s18] Claude Sonnet 5 + Opus 4.6 Released: Benchmarks, Pricing, Agent Teams — ByteBot, Feb 2026
- [s19] Introducing GPT-5.3-Codex — OpenAI, Feb 2026
- [s20] Gemini 3 Pro — Generative AI on Vertex AI — Google Cloud, Feb 2026
- [s21] XAI Grok 4.20 Releasing Next Week — NextBigFuture, Feb 2026
- [s22] Alibaba Launches Qwen 3.5 AI Model — TechStartups, Feb 2026
- [s23] GLM-5 achieves record low hallucination rate — VentureBeat, Feb 2026