In partnership with

Welcome back, Insider. Google dropped Gemini 3.1 Pro claiming it leads 13 of 16 benchmarks — but Claude and GPT-5.3 still own the categories that matter most for real work.

Today: the model war's latest scorecard, a supply chain attack that weaponized an AI coding tool, and Apple's quiet bet on on-device agents.

This Friday: a hands-on deep dive into AI-native browsers. Different format, fully practical. Stay tuned.

🔒 The edge your competition doesn't have (yet)

Claude Co-Work can change the way you work. In 10 minutes you'll know exactly how. Not for sale — only unlocked by referring 1 person to AI Edge.

The Essentials

1. A hacker turned Cline's AI triage bot into a supply chain weapon: Security researcher Adnan Khan disclosed a prompt injection flaw in Cline's Claude-powered GitHub issue triage. Eight days later, an unknown attacker exploited it to publish a poisoned npm package (v2.3.0) that silently installed OpenClaw on developers' machines. The compromised version was live for roughly 8 hours on February 17 before Cline shipped a fix. OpenClaw isn't malicious by itself, but Snyk's ToxicSkills study found 36% of AI agent skills on platforms like ClawHub contain active security flaws. Microsoft Threat Intelligence flagged an installation spike. The entry point wasn't code — it was a GitHub issue title. When AI tools have system access, even natural language becomes an attack vector.

2. Google puts an AI music studio inside Gemini: Google rolled out Lyria 3, its most advanced music generation model, directly into the Gemini app. Users describe a genre, mood, or memory — or upload a photo — and get a 30-second track with auto-generated lyrics and cover art. Available globally in beta across 8 languages to all users 18+. With 750 million monthly active Gemini users, this instantly becomes the largest AI music tool by reach. All tracks carry SynthID watermarks, and Google expanded YouTube's Dream Track feature worldwide alongside the launch.

3. Apple's 3B-parameter model outperforms agents 24x its size: Apple researchers published Ferret-UI Lite, a 3 billion parameter on-device AI agent that can see app screens, reason about UI elements, and execute multi-step tasks — all without cloud processing. On the ScreenSpot-Pro benchmark, it scored 53.3%, beating UI-TARS-1.5 (a 7B model) by over 15 points. The model generates its own training data through a multi-agent pipeline that simulates real-world app interactions. With WWDC 2026 in June, this could preview the biggest Siri upgrade since 2011.

The AI-NAtive CRM

Attio is the AI CRM for modern teams.

Connect your email and calendar, and Attio instantly builds your CRM. Every contact, every company, every conversation, all organized in one place.

Then Ask Attio anything:

  • Prep for meetings in seconds with full context from across your business

  • Know what’s happening across your entire pipeline instantly

  • Spot deals going sideways before they do

No more digging and no more data entry. Just answers.

The Headline
Gemini 3.1 Pro: the benchmark king that still can't dethrone Claude where it counts

Three frontier models in sixteen days. February 2026 is the most competitive month in AI history. Claude Opus 4.6 launched February 4, GPT-5.3-Codex arrived February 5, and now Gemini 3.1 Pro dropped on February 19. Google's first-ever ".1" increment more than doubled its predecessor's reasoning score — 77.1% on ARC-AGI-2, up from 31.1%. It claims 13 of 16 benchmark wins, hits 94.3% on GPQA Diamond (PhD-level science), and tops the APEX-Agents leaderboard for autonomous professional tasks at 33.5%.

The gaps that matter. Benchmarks don't tell the full story. Claude Opus 4.6 leads GDPval-AA — the expert-task benchmark measuring real office work like financial modeling — with 1,606 Elo vs Gemini's 1,317. That's a 289-point gap on the tasks closest to how knowledge workers actually use these models. Claude also wins when tools enter the picture: 53.1% vs 51.4% on Humanity's Last Exam with search and code. GPT-5.3-Codex still dominates terminal coding (77.3% on Terminal-Bench 2.0 vs Gemini's 68.5%). Each model was built for a different paradigm.

The price changes everything. Here's where it gets interesting. Gemini 3.1 Pro costs $2 per million input tokens — Claude Opus 4.6 charges $15. That's 7.5x cheaper for a model that ties or leads on most raw reasoning benchmarks. Add a 1M token context window (5x Claude's standard 200K) and configurable thinking levels, and Google just made the strongest case yet that frontier intelligence doesn't have to come at frontier prices. For teams processing large codebases or running high-volume API calls, the math is hard to ignore.

No single winner. The real takeaway from February 2026 isn't that any model "won." It's that the frontier has fragmented. Google optimized for breadth and price. Anthropic optimized for depth and expert precision. OpenAI doubled down on specialized coding speed. The best setup is probably a mix — and that's exactly what tools like model routers are built for. The era of one model to rule them all is over.

The Edge
Turn novelty into your competitive advantage: How to create Chrome extensions using Claude Opus 4.6

  • Log in to Claude and select Opus 4.6

  • Generate the code for the extension. Use the prompt below.

Sample prompt: "Develop an extension that [how your extension should work, its features and how you can use it]. Generate the complete code, organize all the files into a folder, and create a downloadable folder named [Name of extension]"

  • Download the files created by Claude.

  • Extract the zip file and open Chrome's extensions manager.

  • Enable "Developer mode" in the top right corner.

  • Click "Load unpacked" and select your project folder.

  • Activate the extension.

Productivity
5 Tools Worth Your Time

  • 🚀 Vanta: Stop losing deals over security questions.

  • 📂 TalkBI: Talk to your data and get insights using AI.

  •  GPT For Work: AI for Sheets and Excel that does your work.

  • 🤖 SideKicker: Humanize AI-generated content to maintain clarity, trust, and brand tone.

  • ✍️ Findable: Monitor competitors, improve your content, turn traffic into sales.

Ready to Use
Build an online presence to attract remote jobs

Prompt: You are an experienced Remote Work Strategist. Help me build a strong online presence on [insert platform] that attracts remote employers organically.

Provide platform-specific, actionable advice on:

How to optimize my profile (headline, bio, keywords, proof of work) to signal I’m remote-ready

What type of content to post to showcase skills and expertise

How to demonstrate async communication, ownership, and reliability

How to engage with remote-first companies and hiring managers

How to find remote job openings and stand out when applying

A simple 60–90 day action plan with weekly steps

Keep the advice practical, structured, and tailored specifically to the chosen platform.

Monday Fun

Which one is AI generated?

Login or Subscribe to participate

Whenever you’re ready to take the next step

AI won't replace you. Someone who knows how to use it will. This free 5-day course teaches you the essentials: effective prompts, automation fundamentals, and when agents beat workflows. Ready to level up?

Would you like a free, practical guide with 5 lessons to make AI work for you — not the other way around?

Login or Subscribe to participate

Login or Subscribe to participate

Keep Reading