AI Weekly #3 — chips, agents, and a liability wake-up call
OpenAI ships a custom inference chip and security tooling, DeepMind adds computer use to Gemini, and the AI liability question gets real.
This week the hardware layer got serious: OpenAI and Broadcom announced a custom LLM inference chip, while ASML’s next-gen chipmaking machine got a deep look. On the software side, DeepMind shipped computer-use capabilities in Gemini 3.5 Flash and OpenAI launched a security-focused tooling suite under the Daybreak brand. Underneath all of it, the question of who’s legally responsible when AI gets things wrong moved from background noise to front page.
OpenAI and Broadcom unveil Jalapeño, a custom LLM inference chip
OpenAI and Broadcom have jointly developed a chip called Jalapeño purpose-built for LLM inference workloads. The announcement signals OpenAI’s intent to own more of its own silicon stack rather than relying entirely on third-party GPUs. Few architectural details were disclosed, but the focus is on performance per watt and scale across OpenAI’s inference fleet.
Why it matters: Custom inference silicon tends to change cost curves in ways that eventually ripple into API pricing and throughput limits—worth tracking if your workloads are inference-heavy or cost-sensitive. (OpenAI News)
DeepMind ships computer use in Gemini 3.5 Flash
Google DeepMind has added computer-use capabilities to Gemini 3.5 Flash, letting the model observe and interact with desktop and web interfaces. This puts Gemini in direct competition with Anthropic’s computer-use offering on Claude. Flash’s latency and cost profile makes this more immediately practical for agent pipelines than heavier models.
Why it matters: Computer use in a fast, cheaper model lowers the bar for building agents that can operate real software—useful if you’re prototyping anything that needs to navigate UIs without custom scraping. (DeepMind Blog)
OpenAI’s Daybreak suite brings AI-assisted vulnerability scanning and patching
OpenAI launched Daybreak, a set of security tools including Codex Security and GPT-5.5-Cyber, aimed at helping organizations find and patch vulnerabilities at scale. A companion initiative called Patch the Planet targets open-source maintainers specifically, using AI plus expert review to validate and fix security issues. The tooling is positioned for both enterprise security teams and the open-source ecosystem.
Why it matters: If the tooling delivers on automated vulnerability validation, it could meaningfully reduce the backlog of unpatched CVEs in open-source dependencies—something every engineering team carries. (OpenAI News)
Anthropic’s clash with the US government raises concrete AI liability questions
MIT Technology Review outlines three fault lines to watch following Anthropic’s dispute with the US government over its Mythos model, disclosed in April. The piece frames the conflict as an early stress-test of how AI liability, oversight, and disclosure obligations might actually be enforced. The outcome could set informal precedent well before any formal regulatory framework exists.
Why it matters: How liability gets assigned when a frontier model causes harm will shape every enterprise deployment contract and terms-of-service agreement engineers work within—this case is one of the first real data points. (MIT Technology Review AI)
Simon Willison reframes prompt injection as role confusion
Willison published a conceptual piece arguing that prompt injection is best understood as a role-confusion attack rather than a purely syntactic exploit: the model fails to distinguish between the developer-defined role and attacker-supplied content. The framing has practical implications for how you structure system prompts and validate tool outputs in agent pipelines. It complements existing defenses without replacing them.
Why it matters: If you’re building anything that feeds untrusted content into a model context—scraped web pages, user uploads, tool call results—this framing gives you a cleaner mental model for where the attack surface actually lives. (Simon Willison)
Subquadratic claims a fix for the attention bottleneck that has constrained LLMs for years
Miami-based startup Subquadratic emerged from stealth claiming to have solved the quadratic scaling problem in transformer attention, which limits how efficiently models handle long contexts. The company has begun sharing technical receipts after initial skepticism, though independent verification is still limited. If the claims hold, it would be a foundational change to how LLMs are trained and served.
Why it matters: Quadratic attention cost is a real constraint on context length and inference efficiency today; a genuine fix would affect model architecture choices and infrastructure planning across the industry. (MIT Technology Review AI)