每日 AI 简报

2026-06-04(内容获取于 06/04 20:09)

Headroom:LLM 输入压缩工具,节省 60-95% Token

GitHub Trending

Headroom 是一个开源库,可压缩工具输出、日志、文件及 RAG 切片后再送入 LLM,在答案质量不变的前提下减少 60-95% Token 消耗。

推荐理由:开发者可直接使用的开源工具,能显著降低 LLM 调用成本,实用性极高。

Anthropic 分享跨产品 Claude 安全管控方案

Anthropic Engineering

Anthropic 工程博客详解如何在 Claude.ai、Claude Code 和 Cowork 中实现 Agent 能力的限制与隔离,应对能力越强、潜在危害越大的挑战。

推荐理由:多家厂商面临 Agent 安全难题,Anthropic 的实践经验对 AI 安全从业者极具参考价值。

微软发布 MAI-Thinking-1 模型,35B 参数 MoE 架构

Smol AI News

微软在 Build 大会发布 MAI-Thinking-1,35B 参数 MoE 模型,支持 256K 上下文,AIME 2025 得分 97%,在人类偏好测试中超越 Sonnet 4.6。(多家报道)

推荐理由:微软重磅模型发布,性能对标一线模型,值得所有 AI 从业者关注。

NVIDIA 提出任务种子合成数据生成方法

Hugging Face Blog

NVIDIA 发布博客,介绍 Task-Seeded Synthetic Q&A Generation 方法,用于 Nemotron 预训练数据生成,提升合成数据质量。

推荐理由:合成数据是当前大模型训练的核心议题,NVIDIA 的方法有实操参考价值。

AI 生成诉讼激增,美国法院应对挑战

MIT Tech Review AI

美国法院正面临大量由 AI 生成的法律文件,缺乏律师的当事人滥用 AI 工具导致诉讼质量下降,法官审阅压力增大。

推荐理由:AI 滥用的现实案例,对于法律科技和 AI 监管领域有重要警示意义。

Hermes-Agent:可生长的 Agent 框架

GitHub Trending

NousResearch 开源 Hermes-Agent 框架,定位为「与你一同成长」的 Agent,具备可扩展和持续学习能力。

推荐理由:开源 Agent 框架,适合开发者快速搭建和定制自己的 Agent 应用。

Claude 官方展示团队思考可视化功能

Claude (YouTube)

Claude 官方发布视频,演示其团队思考可视化功能,将多个 Agent 的协作推理过程以图形方式呈现。

推荐理由:展示了 AI 协作过程的透明化趋势,对 Agent 设计和用户体验有启发。

Claude Code 更新 v2.1.114,修复权限对话框崩溃

Claude Code Changelog

Claude Code 发布 v2.1.114 版本,修复 Agent 团队协作时权限对话框崩溃的问题。

推荐理由:对于广大的 Claude Code 使用者来说,修复稳定性问题有助于提升日常开发体验。

Anthropic 研究:基于评分 RL 中的奖励作弊问题

HuggingFace Trending Papers

该论文分析了基于 LLM 评分的强化学习中策略模型如何利用评审偏差进行奖励作弊,并提出了检测方法。

推荐理由:揭示了 LLM 对齐训练中的新兴安全问题,对研究者有启示。

谷歌在亚太启动 DeepMind 加速器,应对环境风险

DeepMind Blog

谷歌 DeepMind 宣布在亚太地区启动加速器计划,旨在利用 AI 技术应对气候变化等环境风险。

推荐理由:AI 应用于环境领域的战略布局,适合关注 AI+ESG 的读者。

chopratejas/headroom

Python · ★ 11,391 · 🍴 741 · 📈 3,530 stars today

Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

中文介绍 在将工具输出、日志、文件或 RAG 切片送入 LLM 之前,先压缩它们。可减少 60-95% 的 token 数而答案质量不变。提供 Library、Proxy 与 MCP Server 三种集成方式,适合需要大量调用 LLM 以降低成本的应用。

NousResearch/hermes-agent

Python · ★ 180,151 · 🍴 30,878 · 📈 1,735 stars today

The agent that grows with you

中文介绍 一个随着使用不断成长的 AI Agent 框架。核心思路是让智能体在交互中积累经验、持续进化,而非一次性固定行为。适合追求个性化、长期陪伴型 AI 助手场景。

affaan-m/ECC

JavaScript · ★ 206,563 · 🍴 31,711 · 📈 2,141 stars today

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

中文介绍 针对 Claude Code、Codex、Opencode、Cursor 等 AI 编码工具的 Agent 性能优化系统。提供技能、本能、记忆、安全等模块,并以研究为先的方式进行开发,旨在提升编码 Agent 的效率和可靠性。

PaddlePaddle/PaddleOCR

Python · ★ 79,570 · 🍴 10,584 · 📈 105 stars today

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

中文介绍 将任何 PDF 或图片文档转化为结构化数据以供 AI 使用。基于 PaddlePaddle 的高效、轻量级 OCR 工具包,支持 100+ 语言。尤其适合需要批量提取文档内容喂给 LLM 的流程,打通图像到 LLM 的通道。

github/spec-kit

Python · ★ 108,259 · 🍴 9,579 · 📈 311 stars today

💫 Toolkit to help you get started with Spec-Driven Development

中文介绍 帮助团队上手 Spec-Driven Development(规范驱动开发)的工具包。专注于在编码之前先定义清晰规范,减少后续返工,提升协作效率。适合采用 API 优先、契约优先开发流程的团队。

NVIDIA/cosmos

Jupyter Notebook · ★ 8,771 · 🍴 569 · 📈 138 stars today

NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.

中文介绍 NVIDIA 推出的开放世界模型、数据集与工具平台,面向机器人、自动驾驶、智慧基础设施等 Physical AI 开发。提供预训练的世界模型和基准数据,降低构建具身智能体的门槛。

lfnovo/open-notebook

TypeScript · ★ 24,414 · 🍴 2,857 · 📈 227 stars today

An Open Source implementation of Notebook LM with more flexibility and features

中文介绍 Notebook LM 的开源实现,提供更灵活的功能与定制能力。支持导入文档、生成笔记、智能问答,适合个人知识管理、研究笔记整理等场景。完全本地可控,隐私友好。

Open-LLM-VTuber/Open-LLM-VTuber

Python · ★ 9,283 · 🍴 1,133 · 📈 693 stars today

Talk to any LLM with hands-free voice interaction, voice interruption, and Live2D taking face running locally across platforms

中文介绍 跨平台、本地的虚拟主播(VTuber)应用,支持语音免提交互、语音打断以及 Live2D 面部捕捉。可对接任意大语言模型,实现实时对话。适合桌面端虚拟助理、角色扮演、直播互动等场景。

jwasham/coding-interview-university

★ 349,424 · 🍴 83,199 · 📈 330 stars today

A complete computer science study plan to become a software engineer.

中文介绍 一份完整的计算机科学自学计划,从基础到系统设计,旨在帮助学习者掌握面试所需知识体系,成为合格的软件工程师。涵盖数据结构、算法、操作系统、网络等内容,适合求职者在数月内系统性准备。

github/copilot-sdk

Java · ★ 8,816 · 🍴 1,201 · 📈 25 stars today

Multi-platform SDK for integrating GitHub Copilot Agent into apps and services

中文介绍 GitHub 官方推出的多平台 SDK,用于将 Copilot Agent 集成到应用中。支持在自有产品中嵌入 Copilot 的代码补全、对话能力,加速 AI 编码功能的开发。适合 IDE 插件、自动化工具等场景。

aquasecurity/trivy

Go · ★ 35,558 · 🍴 426 · 📈 24 stars today

Find vulnerabilities, misconfigurations, secrets, SBOM in containers, Kubernetes, code repositories, clouds and more

中文介绍 全能安全扫描器,可发现容器、Kubernetes、代码仓库、云环境中的漏洞、错误配置、密钥泄露和 SBOM 生成。覆盖全面,易于集成到 CI/CD 流水线中,是 DevSecOps 的常用工具。

openclaw/openclaw-windows-node

C# · ★ 1,157 · 🍴 156 · 📈 331 stars today

Windows companion suite for OpenClaw - System Tray app, Shared library, Node, and PowerToys Command Palette extension

中文介绍 OpenClaw 的 Windows 伴生套件,包含系统托盘应用、共享库、Node 绑定和 PowerToys 命令面板扩展。用于增强 Windows 上的剪贴板管理、快捷操作等系统效率功能。

reconurge/flowsint

TypeScript · ★ 5,065 · 🍴 626 · 📈 503 stars today

A modern platform for visual, flexible, and extensible graph-based investigations. For cybersecurity analysts and investigators.

中文介绍 面向网络安全分析师和调查人员的现代化图形化调查平台。支持通过可视化节点图构建灵活、可扩展的调查流程,分析威胁情报、追踪攻击路径。适用安全运营、威胁追踪等场景。

mvanhorn/last30days-skill

Python · ★ 27,262 · 🍴 2,331 · 📈 173 stars today

AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary

中文介绍 AI Agent 技能,可跨 Reddit、X、YouTube、Hacker News、Polymarket 等平台研究任意话题,然后将结果综合成有依据的摘要。适合舆情追踪、趋势分析、快速信息核查等场景。

Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning

👍 27

Rubric-based reinforcement learning (RL) uses an LLM-as-a-Judge (LaaJ) to score model outputs according to rubrics as rewards. However, policy models may exploit latent biases in the judge, leading to reward hacking and ineffective or unsafe training outcomes. In real-world rubric-based RL, such hac

中文介绍 该研究探讨了在基于评分标准的强化学习中,策略模型可能利用法官的潜在偏见导致奖励破解。通过复现和分析,提出检测方法以避免无效或不安全的训练结果。

STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations

👍 2

Training Data Attribution (TDA) seeks to trace a model's predictions back to its training data. The gold standard for TDA relies on causal interventions, observing how a model changes when data is added or removed, but repeated retraining is computationally challenging for Large Language Models (LLM

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

👍 6

Scientific and engineering progress is fundamentally a long-horizon iterative process: proposing changes, running experiments, measuring outcomes, and continuously refining artifacts. Yet existing benchmarks for frontier models primarily evaluate either single-turn responses or short-horizon agent t

M^3Eval: Multi-Modal Memory Evaluation through Cognitively-Grounded Video Tasks

👍 23

As multi-modal models advance towards long-form video understanding, memory emerges as a critical capability. Despite substantial efforts in developing video datasets and benchmarks, existing works primarily focus on perception and reasoning, without systematically evaluating memory: what models ret

MapAgent: An Industrial-Grade Agentic Framework for City-scale Lane-level Map Generation

👍 10

Lane-level maps are critical infrastructure for autonomous driving and lane-level navigation, yet constructing and maintaining standardized lane networks for hundreds of cities remains highly labor-intensive. Recent end-to-end vectorized mapping methods can predict lane geometry and topology directl

MeshWeaver: Sparse-Voxel-Guided Surface Weaving for Autoregressive Mesh Generation

👍 1

Autoregressive mesh generation has gained attention by tokenizing meshes into sequences and training models in a language-modeling fashion. However, existing approaches suffer from two fundamental limitations: (i) low tokenization efficiency, which yields long token sequences and prevents scaling to

Streaming Communication in Multi-Agent Reasoning

👍 19

Multi-agent reasoning systems adopt a "generate-then-transfer" paradigm that forces end-to-end latency to scale linearly with pipeline depth. We introduce StreamMA, a multi-agent reasoning system that streams each reasoning step to downstream agents as soon as it is generated, pipelining adjacent ag

GRAIL: Gradient-Reweighted Advantages for Reinforcement Learning with Verifiable Rewards

👍 3

Reinforcement learning with verifiable rewards (e.g. GRPO) is now a common way to improve mathematical reasoning in Large Language Models (LLMs). However, current methods usually broadcast one sequence-level advantage to all tokens, or use costly process reward models (PRMs) for step-level supervisi

Stateful Visual Encoders for Vision-Language Models

👍 5

Vision-language models (VLMs) are increasingly used in multi-image, multi-turn agentic settings where decisions depend on visual changes. However, in existing open-weight VLMs, visual comparisons happen only inside the language model, while the visual encoder itself remains stateless: each image is

The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development?

👍 2

Current AI benchmarks evaluate agents on task execution within human-designed workflows. These evaluations fundamentally fail to measure a critical next-level capability: whether models can autonomously develop agent systems. We introduce the Meta-Agent Challenge (MAC), an evaluation framework desig

ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning

👍 19

Large Reasoning Models (LRMs) have achieved remarkable progress thanks to Reinforcement Learning with Verifiable Rewards (RLVR) on Chain-of-Thoughts (CoTs). However, since long CoTs naturally contain trial and errors and mainstream RLVR approaches choose outcome-correct CoT trajectories for memoriza

WebRISE: Requirement-Induced State Evaluation for MLLM-Generated Web Artifacts

👍 8

Existing benchmarks for MLLM-generated web artifacts assess interaction through local evidence and miss the requirement-induced states and transitions that determine whether a page works. We introduce WebRISE, which compiles task requirements into Interaction Contract Graphs (ICGs) of observable sta

OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs

👍 23

Multimodal agents in robotics, AR, and autonomous driving must reason about places and layouts from continuous egocentric streams, often using evidence outside the current view. Existing benchmarks either evaluate offline over full videos or target events rather than spatial structure. We introduce

BraveGuard: From Open-World Threats to Safer Computer-Use Agents

👍 5

Computer-use agents extend language models from text generation to sustained interaction with files, terminals, browsers, and external tools. This shift creates safety risks that are difficult to detect from isolated prompts or final responses, because harm often emerges only through multi-step exec

MemTrain: Self-Supervised Context Memory Training

👍 12

Memory is an indispensable capability for long-horizon LLM agents, enabling them to preserve and utilize information accumulated across extended interactions. Existing memory-agent approaches are typically trained end-to-end with reinforcement learning on downstream tasks. However, collecting high-q

Eliciting Complex Spatial Reasoning in MLLMs through Wide-Baseline Matching

👍 12

Wide-baseline matching (WBM) requires integrating geometric understanding, viewpoint changes, fine-grained perception, and occlusion reasoning, making it a challenging testbed for spatial reasoning in multimodal large language models (MLLMs) deployed in physical environments. However, current MLLMs

AgentCL: Toward Rigorous Evaluation of Continual Learning in Language Agents

👍 2

Language agents spend substantial inference time solving individual tasks, yet the experience acquired in one episode is often underutilized in future episodes. Continual learning expects an agent to accumulate reusable experience across a stream of tasks, improve over time, and avoid interference f

SynCred-Bench: Benchmarking Synthetic Credibility in AI-Generated Visual Misinformation

👍 2

Recent generative models can now produce visual artifacts with realistic embedded text and layouts, creating a new misinformation threat: synthetic credibility. We introduce SYNCRED-Bench, a benchmark of 600 AI-generated misinformation images balanced across six credible-form categories and seven fi

MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

👍 12

Abundant procedural knowledge on the Web holds great potential for helping agents solve long-horizon tasks. However, such knowledge is often multimodal, heterogeneous, noisy, and implicitly assumes human executors, making it difficult to use directly as the skills required by agents. To bridge the g

Economy of Minds: Emerging Multi-Agent Intelligence with Economic Interactions

👍 1

How can a population of agents self-orchestrate and self-adapt into stronger collective intelligence without centralized control? Inspired by Friedrich Hayek's economic theory of decentralized coordination in markets, we study this question through an agent economy in which agents compete via auctio

Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation

👍 9

On-Policy distillation (OPD) in large language models is shifting from full-trace KL supervision toward more selective training paradigms. Recent OPD methods increasingly focus on selecting which trajectories to learn from, which tokens are most informative, and which supervision signals are most re

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

👍 35

Deep-research agents solve tasks through long trajectories of search, tool use, evidence inspection, and answer synthesis. Evaluation based on final answers shows whether an agent succeeds, but not which parts of the trajectory make the answer unreliable. We study span-level error localization for d

Cosmos 3: Omnimodal World Models for Physical AI

👍 50

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies critica

BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution

👍 4

The rapid progress of frontier large language models has led to widespread benchmark saturation, limiting the ability of existing datasets to differentiate model capabilities or provide useful training signal. For instance, on LiveCodeBench, frontier models achieve over 99% Pass@1 on easy splits and

Semi-Supervised Noise Adaptation: Transferring Knowledge from Noise Domain

👍 1

Transfer learning aims to facilitate the learning of a target domain by transferring knowledge from a source domain. The source domain typically contains semantically meaningful samples (*e.g.*, images) to facilitate effective knowledge transfer. However, a recent study observes that the noise domai

Do Text Edits Generalize to Visual Generation? Benchmarking Cross-Modal Knowledge Editing in UMMs

👍 0

Unified multimodal models (UMMs) have emerged as a promising paradigm for general-purpose multimodal intelligence. As they are deployed in real-world applications, effectively updating internal knowledge becomes critical. While knowledge editing has matured for text-only models, it remains unclear w

SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes

👍 2

Humans can effortlessly perceive spatial layouts, form cognitive representations, reason about spatial relations, and translate such reasoning into actions in everyday 3D environments. Although recent vision-language models (VLMs) have shown promising performance on observation-conditioned spatial p

Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging

👍 4

Weight-space model merging is usually formulated as an algebraic operation on checkpoints, yet at LLM scale the limiting resource is often the set of expert weights that must be read. We introduce MergePipe, a budget-aware execution layer that casts LLM merging as an expert access-set problem: given

Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems

👍 17

LLM agents are rapidly evolving from coding assistants into autonomous software engineering systems. However, existing evaluation methodologies remain largely centered on static, isolated, and short-horizon benchmarks that fail to capture the dynamic complexity of real-world production workflows. As

HOW ONE $2,999 NVIDIA BOX MADE ME $22,000 IN A YEAR

@w1nklerr · 44.2K 粉丝 · 17.7M 阅 · 1.4K 赞 · 161 转

Nobody told me about this for months. I'm telling you now so you don't lose the year I lost. Let me start with the number that made me angry. Last quarter my cloud GPU spend was sitting at $1,900 a

中文介绍 分享用 NVIDIA 盒子($2,999)替代云 GPU 后年省 $22,000 的真实经历。指出云 GPU 开销高达 $1,900/月,而自购硬件可大幅节省成本。

Context as Topology: Why Your Agent's Memory Forgets, and How Structure Escapes It

@elpresidank · 116 粉丝 · 2.9M 阅 · 543 赞 · 35 转

Most AI agent memory is built on embeddings. And there's now a proof that this entire class of system is going to forget what you stored in it — and confidently make up things you never stored at all.

中文介绍 论证基于嵌入向量的 AI 代理记忆系统必然遗忘且会自信编造内容,提出用结构(而非向量)解决记忆问题。

Range and Depth on Demand

@1salman · 363 粉丝 · 2.0M 阅 · 682 赞 · 45 转

Everyone keeps asking whether AI favors specialists or generalists. I think that is the wrong question. AI does not pick a side. It changes the tradeoff. The old world forced a choice. You could go

中文介绍 认为 AI 不会偏向专才或通才,而是改变了传统的技能取舍权衡。提出「按需广度与深度」的观点,打破旧有的二选一困境。

How to build a 4-agent team, that ships a feature while you sleep (Exact Setup Inside)

@zodchiii · 20.0K 粉丝 · 743.3K 阅 · 509 赞 · 55 转

Four AI agents can ship a feature while you sleep. Most people never wire them up. They fire a reviewer here, a test generator there, by hand, one at a time, each forgetting what the last one did.

中文介绍 详解搭建 4 个 AI 代理团队的方法,实现自动化开发流程:一个负责编码、一个负责审核、一个负责测试,可让功能在你睡觉时交付。

30 Obsidian Workflows, Plugins, and Setups That Most Users Don't Know

@eng_khairallah1 · 61.9K 粉丝 · 693.5K 阅 · 511 赞 · 71 转

Obsidian has 2,700+ community plugins. Over 100 of them are AI-related. Save this :) And the CEO of Obsidian personally published official Claude Skills for the platform - 12,900+ GitHub stars in

中文介绍 盘点 Obsidian 中 30 个不为人知的高效工作流、插件和配置,包括 100+ 个 AI 相关插件及官方 Claude Skills 集成(GitHub 12,900+ star)。

What an Enterprise Context Layer Actually Is

@prukalpa · 23.1K 粉丝 · 583.2K 阅 · 506 赞 · 80 转

A field guide to what it is, what it is not, and where it fits in your AI architecture. I have had some version of the same conversation with a CIO almost every day this year. Their team has read

中文介绍 为企业架构师提供企业上下文层的实用指南,澄清其定义、非定义以及在企业 AI 架构中的位置。

I Searched the Whole Claude Skills Ecosystem - These Are the Ones That Matter [Full GitHub Links]

@polydao · 18.1K 粉丝 · 559.5K 阅 · 505 赞 · 55 转

Most people are still using Claude like a smarter chatbot That is not the game anymore You’re competing against people who treat Claude like an operating system > While you’re typing one-off

中文介绍 筛选出 Claude Skills 生态中真正有价值的部分,提供完整 GitHub 链接。指出多数人仍把 Claude 当聊天机器人用,而高手已将其视为操作系统。

hacking pewdiepie's AI agent harness using an evil cocomelon website (then helping protect it)

@theonejvo · 22.1K 粉丝 · 504.3K 阅 · 861 赞 · 1 转

Over the past year, @pewdiepie, has been turning into one of the most visible champions of private, self-hosted computing, and it has been a genuine pleasure to watch. What began in late 2025 as an

中文介绍 演示如何利用恶意网站入侵 PewDiePie 的自托管 AI 代理系统,并随后帮助加固安全。强调自宿主设备的风险与防护。

Generative UI Is the New Frontend

@Saboo_Shubham_ · 116.2K 粉丝 · 263.3K 阅 · 517 赞 · 74 转

The frontend used to be a fixed thing. Designers drew it. Engineers built it. Users got what shipped. That's over. The interfaces shipping in 2026 are drawn partly by the agent itself, in real time,

中文介绍 宣告前端开发范式转变:2026 年的界面由 AI 代理实时生成,而非设计师固定绘制。生成式 UI 将取代传统前端。

Claude Code + NotebookLM + Obsidian: Research Monster That Gets Smarter Every Time You Use It

@monokern · 1.2K 粉丝 · 263.1K 阅 · 505 赞 · 72 转

Most people treat research as a manual task. You open 10 tabs. You watch videos. You read articles. You take notes somewhere. An hour later you have a pile of information you're not sure what to do

中文介绍 介绍一套研究流程:Claude Code 负责代码分析,NotebookLM 整合资料,Obsidian 做知识管理,三者联动形成越用越智能的研究系统。

Stop building Foxconn factories for your agents

@garrytan · 853.3K 粉丝 · 180.6K 阅 · 503 赞 · 43 转

In January I got back into coding and I built Garry's List. Over five hundred thousand lines of Rails and the tests to police it. I was proud of it. I shouldn't have been. The thing worth being proud

中文介绍 反思自己用传统方式写 50 万行 Rails 代码,认为这是「富士康式」开发。提倡停止为代理建造巨型工厂,转向更优雅的编码方式。

The Agentic Economy Is Here

@base · 1.3M 粉丝 · 97.3K 阅 · 519 赞 · 74 转

TL;DR: Agents are becoming the internet’s newest paying customers, and the economy serving them is moving fast. On Base, agents already use wallets and stablecoins to pay for inference, live search,

中文介绍 宣布 Agentic Economy 到来:AI 代理已成为互联网的新付费客户,在 Base 上使用钱包和稳定币支付推理、搜索等服务。

🥇Top AI Papers of the Week

@dair_ai · 124.6K 粉丝 · 84.0K 阅 · 504 赞 · 83 转

1. SkillOpt Microsoft Research treats a compact natural-language skill document as the trainable state of a frozen agent, then learns that document through rollouts, reflection, and bounded edits

中文介绍 本周最佳 AI 论文精选:微软的 SkillOpt 用自然语言技能文档作为冻结代理的可训练状态,通过回滚、反思和边界编辑进行学习。

My Agent Stack For Automating My Personal Life

@nicbstme · 23.7K 粉丝 · 84.0K 阅 · 530 赞 · 35 转

My agent manages my emails, SMS, Whatsapp, Telegram and pretty much everything to automate my personal life. People keep asking me how I use agents in real life. I mean the actual boring things that

中文介绍 分享个人生活自动化代理栈:统一管理邮件、短信、WhatsApp、Telegram 等日常通信,聚焦真正无聊却高频的维度。

State of Memory in Agent Harness

@mem0ai · 17.6K 粉丝 · 82.8K 阅 · 520 赞 · 60 转

Agent harnesses are where AI software actually runs. Cursor, Devin, Claude Code, Codex: these environments handle context, orchestrate tools, coordinate agents, and increasingly, manage memory. The

中文介绍 盘点当前主流代理平台(Cursor、Devin、Claude Code、Codex)的内存管理现状,对比它们如何处理上下文、编排工具和管理记忆。

Robotics: The Next AI Frontier

@ParadisLabs · 48.9K 粉丝 · 82.0K 阅 · 501 赞 · 60 转

AI's next frontier will be Robotics and Humanoids. The past decade has seen rapid AI adoption in the structured digital world. Those LLM breakthroughs now enable more general-purpose learning and more

中文介绍 断言 AI 下一个前沿是机器人与人形机器人。LLM 的突破使通用机器人在物理世界中自主学习成为可能。

How to build your own agent harness???

@mfpiccolo · 7.4K 粉丝 · 81.9K 阅 · 607 赞 · 56 转

Most agent teams don't build a harness. They adopt one. LangChain, LangGraph, OpenAI Agents SDK, Anthropic SDK, CrewAI, AutoGen, the loop, the tools, the memory, and the orchestration are picked off

中文介绍 教程:如何构建自己的代理框架(harness),而非直接采用 LangChain、CrewAI 等现成方案。深入讲解循环、工具、记忆和编排的自主实现。

A harness for every task: dynamic workflows in Claude Code

@trq212 · 263.1K 粉丝 · 75.7K 阅 · 542 赞 · 36 转

Last week, we released dynamic workflows in Claude Code. Claude can now write its own harness on the fly, custom-built for the task at hand. While the default Claude Code harness is built for coding,

中文介绍 Claude Code 发布动态工作流功能:能够根据任务实时编写自己的框架,不再局限于固定编码场景,适配更多任务。

A Functional Taxonomy of World Models

@drfeifei · 738.0K 粉丝 · 72.2K 阅 · 699 赞 · 144 转

“The world is everything that is the case.” — Ludwig Wittgenstein, Tractatus Logico-Philosophicus, 1921 The world is not made of words. In an earlier essay, we argued that spatial intelligence is AI’s

中文介绍 李飞飞探讨世界模型的功能分类:世界不是由词构成,空间智能是 AI 的下一阶段,需要超越语言的物理世界认知。

How To Fix AI Slop (Using Hermes)

@EXM7777 · 115.1K 粉丝 · 70.1K 阅 · 520 赞 · 47 转

There's a reason some people seem to be constantly shipping the best software, writing incredible content, or generating insane images... They adopted the eval loop, while you... You've tried better

中文介绍 揭示为何有人能持续产出优质成果——他们采用了评估循环(eval loop),而非仅仅调优提示词。以 Hermes 为例讲解如何用评估机制消除 AI 输出堆砌(slop)。

Team thinking, visualized by Claude

中文介绍 Claude 通过可视化团队思考过程,帮助成员更直观地理解协作中的思维流动和决策逻辑。

Team thinking, visualized by Claude

中文介绍 Claude 通过可视化团队思考过程,帮助成员更直观地理解协作中的思维流动和决策逻辑。

Claude Opus 4.8: Lying Machine No More?

中文介绍 Claude Opus 4.8 版本在诚实性方面取得显著改进,减少了生成虚假信息的倾向,提升了模型可靠性。

How Endava is redesigning software delivery around AI agents

Learn how Endava is using AI agents, ChatGPT Enterprise, and Codex to accelerate software delivery, automate workflows, and build an AI-native culture across the enterprise.

中文介绍 Endava 利用 AI 智能体、ChatGPT Enterprise 和 Codex 加速软件交付、自动化工作流,并在企业内构建 AI 原生文化。

How courts are coping with a flood of AI-generated lawsuits

Most days in her chambers, Judge Maritza Braswell, a federal magistrate judge in Colorado, sifts through stacks of documents written by people without a lawyer. Many of them can’t afford to hire a lawyer, and others have cases too weak or too small to interest one. She reads each one carefully, mind

中文介绍 美国科罗拉多州联邦治安法官 Maritza Braswell 每天处理大量由无律师人士撰写的案件文件,法院正应对日益增多的 AI 生成诉讼。

Introducing new capabilities to GPT-Rosalind

GPT-Rosalind advances life sciences research with enhanced biological reasoning, medicinal chemistry expertise, genomics analysis, and experimental workflow capabilities.

中文介绍 GPT-Rosalind 新增生物推理、药物化学、基因组学分析和实验工作流功能,推动生命科学研究。

How Wasmer used Codex to build a Node.js runtime for the edge

See how Wasmer used Codex with GPT-5.5 to build a Node.js runtime for the edge, accelerating development 10x to 20x and shipping in weeks instead of months.

中文介绍 Wasmer 利用 Codex 和 GPT-5.5 构建边缘 Node.js 运行时,开发速度提升 10 至 20 倍,数周内完成交付。

A blueprint for democratic governance of frontier AI

OpenAI outlines a blueprint for U.S. governance of frontier AI, proposing a federal framework for safety, resilience, and national security.

中文介绍 OpenAI 提出美国前沿 AI 治理蓝图,建议建立联邦框架以保障安全、韧性和国家安全。

OpenAI public policy agenda

OpenAI outlines its public policy agenda for AI, including safety, youth protection, workforce transition, and global standards to ensure AI benefits society.

中文介绍 OpenAI 发布公共政策议程,涵盖安全、青少年保护、劳动力转型和全球标准。

君的公益 停机迁移公告

服务器将在今晚七点进行迁移升级,预计需要半小时至一小时,请大家稍作等待 88 个帖子 - 88 位参与者 阅读完整话题

服务器给我干冒烟了

我合计合计得买个新服务器了 极其的夸张 131 个帖子 - 123 位参与者 阅读完整话题

Ramp报告:为省成本,大量美国公司直接购买中国DeepSeek官方API

美国企业支出管理平台 Ramp 发布的 2026 年 6 月报告显示,中国 AI 公司 DeepSeek 登上了热门软件榜首。尽管美国官方先前高度防范中国大模型,但真实的商业交易数据却揭示了相反的现状。Ramp 分析了平台上 5 万多家企业的信用卡消费记录,发现许多美国公司并未在本地部署开源模型,而是直接掏钱购买 DeepSeek 官方的托管 API 服务。这意味着,大量美国企业的数据正直接发送并存储在位于中国的服务器上。 真实的资金流向与一年多前美国社会对 DeepSeek R1 刚发布时的警惕态度形成了强烈反差。当时出于对泄密和安全的担忧,美国大公司和政府机构普遍限制使用中国模型。然而,面

【CHY公益站】迈向稳定的第一步

近期将底层数据库从 SQLite 迁移到了 MySQL,并配置了每小时一次的完整备份。 这是 CHY 公益站迈向稳定的第一步,后续会持续优化。感谢佬友们的支持与测试。 35 个帖子 - 34 位参与者 阅读完整话题

写了个脚本,推广给了同学,有些激动,但是……

长文预警,没办法,一写就写多写杂了,佬们当个唠嗑看看吧。 如题,因为某课程要看1000多张ppt,实在不想用手滑才想写的。都是用codex做的,看似很简单的一个刷课脚本,从功能实现上的确也很容易,就只有ppt,等每页状态变为已读之后再转到下一张,全部读完之后返回上一级,并且跳过ppt里可能的习题和测试,如此往复就行了。 不过我为什么说是「看似」呢?因为实际上涉及到了非常多的细节问题。比如在不同的地方卡住、退回上一级页面时候多退了一级、网络不好加载不出来、反复进入一个内容陷入死循环等等。不过好在最后还是比较妥善地解决了。 好了,上面是技术分析,看起来也确实是很多很经典的问题,那我为何还要反复开个

我又幻想了......

你说我在互联网展露了这么多痕迹,会不会有人在某处看到了我的一块痕迹对我感兴趣,然后开始不断的开我和视奸我,拾起我一片片碎片,拼凑出真正的我,然后被我吸引,最后和我喜结连理(bushi 简直就是幻想中的幻想哈哈 25 个帖子 - 21 位参与者 阅读完整话题

OpenAI会倒闭吗?

最近刷短视频总能看到有人预测OpenAI快倒闭了 对此各位佬对此有什么看法? 91 个帖子 - 74 位参与者 阅读完整话题

一上午啥也没干

到公司打开L站,玩玩手机+刷帖子,一上午啥也没干,也不想干,事已至此,想想等会吃啥吧 31 个帖子 - 22 位参与者 阅读完整话题

ESP32-S31

316 points · 169 comments