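Daily AI Brief

2026-06-06 (content retrieved 06/06 18:59)

headroom: an open-source tool for compressing LLM input tokens

GitHub Trending

headroom is an open-source library, proxy, and MCP server that compresses tool outputs, logs, files, and RAG chunks before they reach the LLM. The project claims 60-95% fewer tokens with the same answers, which would substantially lower inference cost.

Why it matters: A ready-to-use open-source tool that can significantly cut LLM usage costs, with clear practical value for developers.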

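Meta's AI support agent used to steal Instagram accounts

MIT Tech Review AI

Attackers used Meta's AI customer support agent to take over Instagram accounts: they simply asked the agent to link an account to an email address they controlled, and the agent complied. The incident has drawn wide attention to the security boundaries and design flaws of AI agents. (Reported by multiple outlets.)

Why it matters: A recent, textbook case of AI agent security risk, and a direct warning for product designers and security practitioners.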

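hermes-agent: an AI agent that grows with you

GitHub Trending

NousResearch has released hermes-agent, an AI agent project that emphasizes adaptive growth. The code is open source on GitHub.

Why it matters: A new agent framework from a well-known team, worth experimenting with for AI developers and researchers.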

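Claude Code fixes crash in team permission dialog

Claude Code Changelog

Claude Code v2.1.114 fixes a bug that crashed the permission dialog when an Agent Team member requested tool permissions.

Why it matters: The fix is operationally relevant for teams already using Claude Code.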

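How to become an AI engineer in 2026 (without a CS degree)

X post (AttentionVC)

An X post lays out a path to becoming an AI engineer in 2026 with no computer science degree, no bootcamp, and no prior knowledge of transformers, focusing on what companies are actually hiring for.

Why it matters: Practical guidance for readers looking to move into AI engineering; the advice is specific and actionable.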

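Anthropic publishes its agent safety control practices

Anthropic Engineering

Anthropic engineers share what they learned building agent safety controls in products such as claude.ai, Claude Code, and Cowork, and discuss how to limit the potential impact of growing agent capabilities.

Why it matters: Engineering practice from a front-line team, directly useful to developers building safe agent systems.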

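Anthropic research: making Claude a chemist

Anthropic Research

Anthropic has published new research exploring how to apply Claude to chemistry, giving the model chemical reasoning and experiment-assistance capabilities.

Why it matters: Shows the potential of AI in a vertical scientific domain, with ideas for both researchers and AI application developers.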

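Multi-agent economic simulation on a 3B model

Hugging Face Blog

A Hugging Face blog post presents the project 「千言木」, showing how to run a multi-agent economic simulation on a small model with only 3B parameters and testing whether small models are viable in complex scenarios.

Why it matters: A practical small-model, multi-agent case study and a useful reference for developers working under resource constraints.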

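ChatGPT and Codex merger signals platform consolidation

Riley Brown (YouTube)

Riley Brown's video analyzes how merging ChatGPT and Codex could affect AI development paradigms, developer workflows, and the platform ecosystem.

Why it matters: The consolidation trend affects developers' everyday toolchains and is worth tracking in order to adapt early.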

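Anthropic salesperson rebuilds team workflows with Claude Code

Claude Blog

An Anthropic salesperson describes using Claude Code to rebuild the team's workflows and improve GTM (go-to-market) engineering efficiency.

Why it matters: A real example of a non-technical role using AI tooling, with takeaways for business staff.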

NousResearch/hermes-agent

Python · ★ 184,117 · 🍴 31,546 · 📈 1,845 stars today

The agent that grows with you

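Summary: Hermes Agent is an adaptive AI agent that grows with its user. It refines its behavior through continued learning and feedback, and suits long-running human-AI collaboration and personalized task execution, such as personal assistants and automated workflows.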

chopratejas/headroom

Python · ★ 15,072 · 🍴 962 · 📈 2,473 stars today

Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

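Summary: Headroom compresses tool outputs, logs, files, RAG chunks, and similar text before it reaches the LLM, claiming 60-95% fewer tokens while preserving answer quality. It ships as a library, a proxy, and an MCP server, and targets lower inference cost and faster responses.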
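
To make the idea concrete, here is a minimal sketch of one kind of pre-LLM compression: collapsing runs of identical log lines. This is not Headroom's API or algorithm (neither is described in this brief); `collapse_repeats` and `approx_tokens` are hypothetical helpers, and the word count is only a rough stand-in for a real tokenizer.

```python
from itertools import groupby


def collapse_repeats(text: str) -> str:
    """Collapse each run of identical consecutive lines into one line plus a count."""
    out = []
    for line, run in groupby(text.splitlines()):
        n = sum(1 for _ in run)
        out.append(line if n == 1 else f"{line}  [repeated {n}x]")
    return "\n".join(out)


def approx_tokens(text: str) -> int:
    """Rough token estimate: whitespace-separated words (an assumption, not a tokenizer)."""
    return len(text.split())


# Hypothetical log: a noisy health check with one real error in the middle.
log = "\n".join(
    ["GET /health 200"] * 50 + ["ERROR db timeout"] + ["GET /health 200"] * 49
)
compressed = collapse_repeats(log)
saving = 1 - approx_tokens(compressed) / approx_tokens(log)
print(compressed)
print(f"approximate token saving: {saving:.0%}")
```

On highly repetitive input like this, the saving is large because the repeated lines carry no new information; on varied prose the same trick would save almost nothing, which is why real tools need more than deduplication.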

CopilotKit/CopilotKit

TypeScript · ★ 32,931 · 🍴 4,213 · 📈 366 stars today

The Frontend Stack for Agents & Generative UI. React + Angular. Makers of the AG-UI Protocol

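Summary: CopilotKit is a frontend framework for agents and generative UI, supporting React and Angular. Its makers created the AG-UI protocol, and it helps developers quickly build interfaces with AI collaboration features, suited to frontends for intelligent apps and chatbots.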

lfnovo/open-notebook

TypeScript · ★ 26,298 · 🍴 3,019 · 📈 1,152 stars today

An Open Source implementation of Notebook LM with more flexibility and features

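Summary: Open Notebook is an open-source alternative to Notebook LM with more flexibility and features. It lets users customize how notes and AI interactions work, and suits scenarios that need a local, customizable smart note-taking and Q&A system.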

affaan-m/ECC

JavaScript · ★ 208,703 · 🍴 32,011 · 📈 1,361 stars today

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

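Summary: ECC is an agent harness performance optimization system that provides skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor, and other tools. It aims to improve the efficiency and reliability of AI agents in coding-assistant and automated development environments.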

Panniantong/Agent-Reach

Python · ★ 21,901 · 🍴 1,884 · 📈 148 stars today

Give your AI agent eyes to see the entire internet. Read & search Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu — one CLI, zero API fees.

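Summary: Agent-Reach gives AI agents access to the internet through a single CLI, covering Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu, and more, with no API fees. It suits information gathering, public-opinion monitoring, and social media data analysis.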

NVIDIA/cosmos

Jupyter Notebook · ★ 9,539 · 🍴 606 · 📈 479 stars today

NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.

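Summary: NVIDIA Cosmos is an open platform of world models, datasets, and tools that helps developers build Physical AI for robots, autonomous vehicles, smart infrastructure, and related fields. It provides simulation and perception capabilities to speed up development of physical-world applications.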

666ghj/MiroFish

Python · ★ 64,915 · 🍴 10,097 · 📈 320 stars today

A Simple and Universal Swarm Intelligence Engine, Predicting Anything. 简洁通用的群体智能引擎,预测万物

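Summary: MiroFish is a simple, general-purpose engine built on swarm intelligence principles for predicting and analyzing complex systems. It targets scenarios that need collective decision-making, trend prediction, and optimization, such as financial markets and resource scheduling.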

mvanhorn/last30days-skill

Python · ★ 28,395 · 🍴 2,406 · 📈 731 stars today

AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary

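Summary: last30days-skill is an AI agent skill that researches a topic across Reddit, X, YouTube, Hacker News, Polymarket, and the web, then synthesizes a grounded summary. It suits news analysis, trend tracking, and quick research.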

PaddlePaddle/PaddleOCR

Python · ★ 80,711 · 🍴 10,641 · 📈 747 stars today

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

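Summary: PaddleOCR is a powerful, lightweight OCR toolkit supporting 100+ languages. It converts PDFs and image documents into structured data, bridging images/PDFs and LLMs, and suits document digitization, invoice recognition, and information extraction.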

openai/plugins

JavaScript · ★ 1,623 · 🍴 252 · 📈 49 stars today

OpenAI Plugins

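Summary: OpenAI Plugins is OpenAI's plugin repository, which lets models such as ChatGPT extend their capabilities through third-party plugins, for example to access real-time information or perform external actions. It suits development work that integrates AI with external services.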

MemPalace/mempalace

Python · ★ 54,034 · 🍴 7,092 · 📈 227 stars today

The best-benchmarked open-source AI memory system. And it's free.

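Summary: MemPalace describes itself as the best-benchmarked open-source AI memory system, and it is free to use. It helps AI agents persist and manage long-term memory, and suits conversational systems and assistants that need context retention and personalized interaction.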

withastro/flue

TypeScript · ★ 4,650 · 🍴 246 · 📈 126 stars today

The sandbox agent framework.

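Summary: Flue is a sandbox agent framework that provides an isolated execution environment for running untrusted code or AI agents. It suits controlled testing, plugin systems, and safe execution of third-party scripts.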

openclaw/openclaw-windows-node

C# · ★ 1,662 · 🍴 187 · 📈 326 stars today

Windows companion suite for OpenClaw - System Tray app, Shared library, Node, and PowerToys Command Palette extension

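Summary: A Windows companion suite for OpenClaw, comprising a System Tray app, a shared library, a Node component, and a PowerToys Command Palette extension, which together improve OpenClaw's integration on Windows. It suits users who need desktop-side control and extensions.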

aquasecurity/trivy

Go · ★ 35,907 · 🍴 448 · 📈 207 stars today

Find vulnerabilities, misconfigurations, secrets, SBOM in containers, Kubernetes, code repositories, clouds and more

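Summary: Trivy is a comprehensive security scanner that finds vulnerabilities, misconfigurations, and secrets, and handles SBOMs, across containers, Kubernetes, code repositories, cloud environments, and more. It suits CI/CD pipelines and security audits.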

jwasham/coding-interview-university

★ 350,573 · 🍴 83,308 · 📈 745 stars today

A complete computer science study plan to become a software engineer.

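Summary: A complete computer science study plan that helps self-learners systematically cover algorithms, data structures, and related topics in preparation for software engineering interviews. It offers a detailed curriculum roadmap and resource recommendations for job seekers and career changers.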

github/copilot-sdk

Java · ★ 9,340 · 🍴 1,233 · 📈 309 stars today

Multi-platform SDK for integrating GitHub Copilot Agent into apps and services

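Summary: GitHub Copilot SDK is a multi-platform SDK for integrating GitHub Copilot Agent into your own apps and services, so developers can quickly build AI coding-assistant features. It suits IDE plugins and automated code-assistant scenarios.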

Regret Minimization with Adaptive Opponents in Repeated Games

👍 1

In this paper, we study regret minimization in repeated games with adaptive opponents who can respond based on histories of play. The standard metric of external regret in online learning is known to fail to capture such adaptivity. To account for players' counterfactual reasoning, we introduce {\tt

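Summary: The paper studies regret minimization against adaptive opponents in repeated games, and notes that external regret, the standard metric in online learning, fails to capture the opponent's adaptivity.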

Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models

👍 3

Multimodal Large Language Models (MLLMs) excel at 2D semantic understanding but lack intrinsic 3D awareness, resulting in representations that fail to maintain geometric and spatial consistency across video frames. Given the scarcity of large-scale 3D data, we present GeoVR, a novel framework that l

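Summary: GeoVR learns geometric representations from videos to strengthen the 3D spatial awareness of multimodal large language models, addressing their lack of geometric and spatial consistency across video frames.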

AffordanceVLA: A Vision-Language-Action Model Empowering Action Generation through Affordance-Aware Understanding

👍 5

Vision-Language-Action (VLA) models leverage the rich world knowledge of pretrained vision-language models (VLMs) to enable instruction-following robotic manipulation. However, the structural mismatch between VLM semantic spaces and embodied control policies often hinders the learning of precise per

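Summary: AffordanceVLA bridges vision-language models and embodied control policies, using affordance-aware understanding to improve instruction following and action generation in robotic manipulation.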

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

👍 57

Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing methods inject this knowledge as long inputs (retrieved through RAG or dependency analysis) or through per-repository fine-tuning and LoRA -- costly at repository scale and brittle to evolv

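Summary: Code2LoRA uses a hypernetwork to generate adapters, so that code language models can track software evolution without per-repository fine-tuning or long-context retrieval, lowering the cost of repository-level adaptation.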

AURA: Intent-Directed Probing for Implicit-Need Surfacing in Situated LLM Agents

👍 1

A situated query like "where is Lin Wei?" often encodes more than its literal content: the user may also want to know whether Lin Wei is free, in a good mood, or worth interrupting now. Standard tool-use agents answer the literal question and stop. AURA inserts an inference step between scene percep

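Summary: AURA adds intent-directed inference to situated LLM agents to surface a user's implicit needs, such as whether the person being asked about is free, going beyond answering the literal question.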

Benchmark Everything Everywhere All at Once

👍 2

Benchmarks are fundamental for evaluating and advancing LLMs and MLLMs by providing standardized and explicit measures of performance. However, their construction is labor-intensive and hard to reuse, raising concerns about sustainability and scalability. Moreover, existing benchmarks often quickly

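Summary: The paper proposes a general benchmark framework to address the labor-intensive construction and poor reusability of existing LLM and MLLM benchmarks, improving sustainability and scalability.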

Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions

👍 2

Large language models are increasingly used to simulate social media users and infer how individuals may respond to online discussions. However, it remains unclear whether these simulations reflect precise user-specific beliefs or whether they are highly sensitive to semantically independent changes

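Summary: The study audits whether LLM simulations of social media users' stances are highly sensitive to semantically independent changes in context, which would undermine the accuracy of the simulated results.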

ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment

👍 0

AI research often requires decisions before future evidence exists: which bottleneck to attack, which direction to pursue, or where a project should be positioned. We introduce ForeSci, a temporally controlled benchmark for evaluating whether LLM agents can make such forward-looking research judgeme

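Summary: ForeSci is a temporally controlled benchmark for evaluating whether LLM agents can make forward-looking AI research judgments, such as which bottleneck or direction to pursue, before future evidence exists.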

Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

👍 15

Video generation models have made impressive strides in synthesizing visually compelling content, yet their outputs remain confined to the virtual domain. A natural question follows: how well do these models reflect the physical world when their generated videos leave the screen and enter reality? W

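Summary: Dream.exe examines whether videos produced by video generation models can be used directly to execute robot manipulation tasks, as a test of how well the models reflect the physical world.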

Towards One-to-Many Temporal Grounding

👍 4

Temporal Grounding (TG) aims to localize video segments corresponding to a textual query. Prior research predominantly focuses on single-segment retrieval. Real-world scenarios, however, often require localizing multiple disjoint segments for a single query -- a setting we term One-to-Many Temporal

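Summary: The paper introduces the One-to-Many Temporal Grounding task: localizing multiple disjoint video segments for a single text query, beyond the single-segment retrieval that prior work focuses on.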

LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs

👍 7

Large language models can reproduce training data, but existing memorization evaluations mostly measure whether models can be forced to do so, rather than whether they do so under ordinary use. We introduce PropMe, a propensity-aware framework for memorization evaluation that contrasts prefix-based

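Summary: PropMe is a propensity-aware framework for memorization evaluation that measures whether LLMs leak training data under ordinary use, rather than only whether they can be forced to.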

AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints

👍 35

Planning for real-world problems by language models often involves both world and user constraints, which may not be fully specified upfront and are progressively disclosed through interaction. However, existing benchmarks still underexplore adaptive planning under such progressively revealed dual c

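Summary: AdaPlanBench evaluates how well LLM agents plan adaptively under world and user constraints that are revealed progressively through interaction.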

ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?

👍 42

Role-playing language agents (RPLAs) should play characters whose values and behavior evolve as the story progresses, not maintain a fixed persona. Existing benchmarks measure factual recall at a given chapter, not whether responses align with the character's psychological trajectory, especially in

Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation

👍 23

Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To tra

TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration

👍 36

Agents are widely deployed as assistants over documents, tools, and code. However, they typically act only on explicit user requests, which surface only the problems the user has noticed, while many other important problems coexist, hidden in plain sight, within the broader user context, with their

MAOAM: Unified Object and Material Selection with Vision-Language Models

👍 7

Selection is a core operation in interactive image editing. To be practical, a user should be able to specify and disambiguate the desired selection region through either text or click-based interactions, and the system should support selecting not only objects but also other criteria, such as mater

SEAOTTER: Sensor Embedded Autoencoding with One-Time Transcode for Efficient Reconstruction

👍 4

In robotics systems, vast amounts of visual data are easily captured at high resolution using low-cost, low-power hardware. Yet, limited bandwidth and on-device compute resources prevent full utilization when transmitted via conventional codecs like JPEG/MPEG. Newer codecs, like AV1/AVIF, improve th

RobotValues: Evaluating Household Robots When Human Values Conflict

👍 23

While household robots are often evaluated based on task completion, everyday domestic environments involve value-conflicting situations in which robots are expected to choose actions that prioritize other values than task success, such as human autonomy, efficiency, or social appropriateness. Yet,

EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management

👍 2

Recent progress in Large Language Model (LLM) agents has enabled promising advances in automated data science. However, existing approaches remain fundamentally limited by their static action sets and lack of principled long-horizon context management, hindering their ability to accumulate reusable

LLM Anonymization Against Agentic Re-Identification

👍 1

Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become cross-referenceable evidence for re-identification, yet those same details also carry downstream analytic value of the text. Existing defenses either remove explicit identifiers, perturb text

Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents

👍 1

Financial AI agents often fail for a simple reason: they make users carry the complexity. A user must repeatedly restate goals, risk preferences, portfolio context, past judgments, and shifting market assumptions, while the agent answers, retrieves, acts, and forgets. In finance, this is not just in

AdaCodec: A Predictive Visual Code for Video MLLMs

👍 4

Video is temporally redundant: adjacent frames usually share most objects, background, and layout. Yet existing video multimodal large language models (video MLLMs) usually encode each sampled frame as an independent RGB image, causing visual tokens to repeat content already present in earlier frame

SABER: Benchmarking Operational Safety of LLM Coding Agents in Stateful Project Workspaces

👍 0

Large language models are increasingly deployed as coding agents, shifting safety from individual responses to action sequences. Existing benchmarks, however, primarily assess whether models refuse unsafe prompts, leaving impacts on stateful workspaces largely unexamined. We present SABER, a benchma

The Shape of Addition: Geometric Structures of Arithmetic in Large Language Models

👍 2

Large Language Models exhibit paradoxical fragility in fundamental arithmetic, implying a disconnect between internal computation and discrete output. By analyzing the residual stream geometry during multi-operand addition, we identify the Iso-Raw-Sum Trajectory (IRST), a geometric structure where r

MechVQA: Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding

👍 3

Multimodal Large Language Models (MLLMs) have demonstrated significant achievements in general visual question answering (VQA) tasks. However, they remain brittle on mechanical engineering drawings, where high annotation density and weak domain knowledge, compounded by unreliable spatial relation re

Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination

👍 1

Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as the cornerstone for shaping the remarkable coding abilities of Large Language Models (LLMs). However, the scalability of RLVR is severely constrained by the scarcity of sufficiently challenging verifiable code tasks that t

Multimodal Music Recommendation System using LLMs

👍 1

Music recommendation systems typically treat songs as opaque tokens, relying on collaborative interaction histories which overlooks semantic or acoustic content. Prior work has explored LLM-augmented, multimodal, and text-enhanced approaches to sequential recommendation, and while some methods parti

Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents

👍 5

Memory-augmented LLM agents tackle complex long-horizon tasks by recursively summarizing interaction trajectories into compact memory. However, existing approaches typically train these memory policies using outcome-based reinforcement learning, failing to localize where intermediate memory quality

Trust Region Q Adjoint Matching

👍 2

Off-policy reinforcement learning of pretrained flow policies remains challenging due to the instability of optimization arising from the multi-step sampling process. Recently, Q-learning with Adjoint Matching (QAM) addressed this issue by reformulating into a memoryless stochastic optimal control (

Is This Edit Correct? A Multi-Dimensional Benchmark for Reasoning-Aware Image Editing

👍 1

Diffusion-based image editing has achieved strong visual fidelity under natural language instructions, yet most existing systems still operate at the level of surface instruction following, without reasoning about the implicit contextual constraints embedded in real user requests. This often leads t

Context as Topology: Why Your Agent's Memory Forgets, and How Structure Escapes It

@elpresidank · 116 followers · 2.9M views · 543 likes · 35 reposts

Most AI agent memory is built on embeddings. And there's now a proof that this entire class of system is going to forget what you stored in it — and confidently make up things you never stored at all.

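Summary: Presents a proof that embedding-based AI memory systems have an inherent flaw: they forget what was stored and confidently invent things that never were. Rethinks memory from a topological perspective as a route to more reliable agent memory.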

SpaceX IPOs in 7 days. I Fed the S1 Doc Into Claude. Here Is What It Found Buried in 300 Pages.

@DamiDefi · 96.5K followers · 2.3M views · 584 likes · 80 reposts

The number that stopped me was not the $2 trillion valuation. It was $791 million. That is what SpaceX made in net income in 2024. A profitable, growing aerospace company with a genuine moat in launch

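Summary: The author fed SpaceX's S1 filing into Claude and highlights one key figure: $791 million in net income for 2024. Only the opening of the post is available here, so details of the analysis are limited.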

Range and Depth on Demand

@1salman · 363 followers · 2.0M views · 682 likes · 45 reposts

Everyone keeps asking whether AI favors specialists or generalists. I think that is the wrong question. AI does not pick a side. It changes the tradeoff. The old world forced a choice. You could go

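Summary: Examines the specialist-versus-generalist question in the AI era, arguing that AI does not pick a side but changes the tradeoff. The old world forced a choice between the two; AI now makes both attainable, so individuals should rethink how they build their skills.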

How To Become An AI Engineer in 2026 (Without a CS Degree)

@sairahul1 · 110.7K followers · 710.8K views · 509 likes · 97 reposts

How To Become An AI Engineer in 2026. Without a CS degree. Without a bootcamp. Without knowing what a transformer is today. Here's what nobody tells you: The companies hiring right now don't need

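Summary: How to become an AI engineer in 2026 without a CS degree. The post argues that companies hiring now do not need you to understand transformers; what matters is the ability to solve real problems. It lays out a concrete path that stresses hands-on work over credentials.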

30 Obsidian Workflows, Plugins, and Setups That Most Users Don't Know

@eng_khairallah1 · 61.9K followers · 693.5K views · 511 likes · 71 reposts

Obsidian has 2,700+ community plugins. Over 100 of them are AI-related. Save this :) And the CEO of Obsidian personally published official Claude Skills for the platform - 12,900+ GitHub stars in

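Summary: A roundup of 30 little-known Obsidian workflows, plugins, and setups. The post notes that Obsidian has 2,700+ community plugins, more than 100 of them AI-related, and that Obsidian's CEO published official Claude Skills for the platform, which have 12,900+ GitHub stars. A practical guide to working more efficiently.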

How to master Dynamic Workflows in Claude Code: 6 patterns and 14 steps Anthropic engineers actually

@0xCodez · 3.3K followers · 637.2K views · 510 likes · 59 reposts

Most Claude Code users still write their workflows by hand. They chain prompts, copy outputs, paste them into the next prompt, fix what went wrong, repeat. 9 out of 10 builders haven’t tried Dynamic

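Summary: A walkthrough of 6 patterns and 14 steps for Dynamic Workflows in Claude Code. Most users still chain prompts by hand; dynamic workflows automate that process and improve efficiency.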

What an Enterprise Context Layer Actually Is

@prukalpa · 23.1K followers · 583.2K views · 506 likes · 80 reposts

A field guide to what it is, what it is not, and where it fits in your AI architecture. I have had some version of the same conversation with a CIO almost every day this year. Their team has read

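Summary: Defines what an enterprise context layer is, what it is not, and where it fits in an AI architecture. Drawing on conversations with many CIOs, it clears up common misconceptions and offers a practical guide to building enterprise-grade AI systems.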

I Searched the Whole Claude Skills Ecosystem - These Are the Ones That Matter [Full GitHub Links]

@polydao · 18.1K followers · 559.5K views · 505 likes · 55 reposts

Most people are still using Claude like a smarter chatbot That is not the game anymore You’re competing against people who treat Claude like an operating system > While you’re typing one-off

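Summary: The author searched the whole Claude Skills ecosystem and picked out the skills that are actually useful, with GitHub links. The post argues that most people still treat Claude as a chatbot while their competitors treat it as an operating system.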

hacking pewdiepie's AI agent harness using an evil cocomelon website (then helping protect it)

@theonejvo · 22.1K followers · 504.3K views · 861 likes · 1 repost

Over the past year, @pewdiepie, has been turning into one of the most visible champions of private, self-hosted computing, and it has been a genuine pleasure to watch. What began in late 2025 as an

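Summary: Demonstrates attacking PewDiePie's self-hosted AI agent harness through a malicious website, then helping to harden it. The case shows the security risks of self-hosted AI and what attack and defense look like in practice.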

Generative UI Is the New Frontend

@Saboo_Shubham_ · 116.2K followers · 263.3K views · 517 likes · 74 reposts

The frontend used to be a fixed thing. Designers drew it. Engineers built it. Users got what shipped. That's over. The interfaces shipping in 2026 are drawn partly by the agent itself, in real time,

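Summary: Argues that generative UI is reshaping the frontend in 2026: interfaces are drawn partly by the agent in real time rather than fixed in advance. The traditional model, in which designers draw, engineers build, and users get what ships, is over.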

Claude Code + NotebookLM + Obsidian: Research Monster That Gets Smarter Every Time You Use It

@monokern · 1.2K followers · 263.1K views · 505 likes · 72 reposts

Most people treat research as a manual task. You open 10 tabs. You watch videos. You read articles. You take notes somewhere. An hour later you have a pile of information you're not sure what to do

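Summary: Combines Claude Code, NotebookLM, and Obsidian into a research setup that accumulates knowledge each time it is used. It automates extraction of key information and integrates the tools to improve efficiency.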

Stop building Foxconn factories for your agents

@garrytan · 853.3K followers · 180.6K views · 503 likes · 43 reposts

In January I got back into coding and I built Garry's List. Over five hundred thousand lines of Rails and the tests to police it. I was proud of it. I shouldn't have been. The thing worth being proud

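Summary: A reflection on why you should not build "Foxconn"-style code factories for your agents. Drawing on the author's experience with more than 500,000 lines of Rails code, the post advocates a lightweight approach focused on the core.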

Building cloud agent infrastructure: what's different, and what we learned

@intuitiveml · 6.4K followers · 171.3K views · 524 likes · 70 reposts

Most agent frameworks today assume a desktop. One user, one machine, one process. The agent runs while the laptop is open, writes to a local filesystem, holds API keys in environment variables, and

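Summary: Lessons from building cloud agent infrastructure. Most agent frameworks assume a desktop environment, whereas cloud deployments need multiple users, multiple processes, and persistent storage. The post summarizes the key differences and what the team learned.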

A guide to /goal 🥅

@dkundel · 19.3K followers · 116.9K views · 523 likes · 40 reposts

We launched the goal mode (or /goal) as a way to help you have Codex drive towards a concrete outcome. When you set a goal Codex will continue to work until the goal is achieved, whether that takes

Summary: A detailed guide to goal mode, a new Codex feature. Once a goal is set, Codex keeps working until it is achieved, with no need to push each step by hand.

🥇Top AI Papers of the Week

@dair_ai · 124.6K followers · 84.0K views · 504 likes · 83 reposts

1. SkillOpt Microsoft Research treats a compact natural-language skill document as the trainable state of a frozen agent, then learns that document through rollouts, reflection, and bounded edits

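Summary: A roundup of the week's top AI papers. Microsoft's SkillOpt treats a compact natural-language skill document as the trainable state of a frozen agent and learns it through rollouts, reflection, and bounded edits; other recent work is also covered.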

State of Memory in Agent Harness

@mem0ai · 17.6K followers · 82.8K views · 520 likes · 60 reposts

Agent harnesses are where AI software actually runs. Cursor, Devin, Claude Code, Codex: these environments handle context, orchestrate tools, coordinate agents, and increasingly, manage memory. The

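Summary: Examines the state of memory management in mainstream agent harnesses (Cursor, Devin, Claude Code, Codex). These environments handle context and orchestrate tools, but memory remains a bottleneck.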

A harness for every task: dynamic workflows in Claude Code

@trq212 · 263.1K followers · 75.7K views · 542 likes · 36 reposts

Last week, we released dynamic workflows in Claude Code. Claude can now write its own harness on the fly, custom-built for the task at hand. While the default Claude Code harness is built for coding,

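Summary: An update on dynamic workflows in Claude Code: Claude can now write its own harness on the fly, custom-built for the task at hand, rather than relying only on the default harness, which is built for coding.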

A Functional Taxonomy of World Models

@drfeifei · 738.0K followers · 72.2K views · 699 likes · 144 reposts

“The world is everything that is the case.” — Ludwig Wittgenstein, Tractatus Logico-Philosophicus, 1921 The world is not made of words. In an earlier essay, we argued that spatial intelligence is AI’s

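Summary: Proposes a functional taxonomy of world models. The essay opens with a Wittgenstein quotation and builds on the author's earlier argument that spatial intelligence is AI's next frontier, presenting the taxonomy as a foundation for understanding AI cognition.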

How To Fix AI Slop (Using Hermes)

@EXM7777 · 115.1K followers · 70.1K views · 520 likes · 47 reposts

There's a reason some people seem to be constantly shipping the best software, writing incredible content, or generating insane images... They adopted the eval loop, while you... You've tried better

Summary: Uses Hermes to fix AI "slop". By adopting an eval loop, you can produce high-quality output consistently rather than by luck.

How Anthropic uses Claude in GTM Engineering

Summary: In a YouTube video, Anthropic shows how it uses Claude in GTM (go-to-market) engineering, including automating sales and customer-interaction workflows.

Team thinking, visualized by Claude

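Summary: A video on Claude's YouTube channel demonstrates how to visualize a team's thinking process, helping to make sense of collective decision-making and collaboration patterns.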

AI Agents as "Games Masters"? 🎮🔥

Summary: Two Minute Papers looks at AI agents in the role of games master, and at their potential to guide and adjust gameplay in game design.

Claude Opus 4.8: Lying Machine No More?

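Summary: Two Minute Papers asks whether Claude Opus 4.8 has solved the problem of AI models deceiving users, and discusses its improvements in reliability and truthfulness.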

[AINews] not much happened today

a quiet day of RSI.

Summary: A relatively quiet day in AI, with attention mainly on RSI (recursive self-improvement).

How to Stop Shipping Low-Quality RL Environments (with Examples)

Your broken harness is actively making the model worse. Here's what I keep seeing after years of eyeballing trajectories, and what you need to fix.

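Summary: The article argues that low-quality reinforcement learning environments actively hurt model performance. Drawing on years of inspecting trajectories, the author gives fixes and examples and stresses how much environment quality matters.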

The Meta hack shows there’s more to AI security than Mythos

On June 5, 404 Media reported that attackers had been using Meta’s AI customer support agent to steal Instagram accounts. Their approach was simple: They asked the agent to link the accounts to email addresses that they controlled, and the agent complied. One attacker broke into the dormant Obama Wh

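Summary: 404 Media reported that attackers used Meta's AI customer support agent to steal Instagram accounts. They only had to ask the agent to link an account to an email address they controlled, and the agent complied, exposing a gap in AI agent security.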
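
The failure described above, an agent performing a sensitive account change simply because it was asked, can be guarded against in several ways. The sketch below shows one: sensitive actions pass only if ownership was verified out of band. `Request`, `authorize`, and the action names are hypothetical and do not reflect Meta's actual system, whose internals are not described in the report.

```python
from dataclasses import dataclass

# Hypothetical set of actions that can change who controls an account.
SENSITIVE_ACTIONS = frozenset({"link_email", "reset_password"})


@dataclass(frozen=True)
class Request:
    action: str
    account: str
    # Set by an out-of-band ownership check, never derived from the chat text.
    owner_verified: bool = False


def authorize(req: Request) -> bool:
    """Allow sensitive actions only when ownership has been verified."""
    if req.action in SENSITIVE_ACTIONS:
        return req.owner_verified
    return True


attacker = Request("link_email", "some_account")
owner = Request("link_email", "some_account", owner_verified=True)
faq = Request("answer_faq", "some_account")
results = [authorize(attacker), authorize(owner), authorize(faq)]
print(results)
```

The point of the design is that the check lives in ordinary code outside the model, so no phrasing of the request can talk the agent past it.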

not much happened today

**Anthropic's Mythos/Opus cycle** sparked mixed reactions with praise for **Claude Mythos**'s one-shot workflows and concerns over **Opus 4.8** benchmark regressions. **Opus 4.7** showed strong chemistry task performance, "making Claude a chemist." **Sakana AI** launched an **RSI Lab** focusing on r

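Summary: Anthropic's Mythos/Opus cycle drew mixed reactions: praise for Claude Mythos's one-shot workflows and concern over benchmark regressions in Opus 4.8. Opus 4.7 showed strong performance on chemistry tasks. Sakana AI launched an RSI Lab.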

The Claude Cowork product guide

Summary: Anthropic has published a product guide for Claude Cowork, covering the collaboration tool's features and how to use it.

Jun 5, 2026 · Science · Making Claude a chemist

Summary: Anthropic's research team describes its work on giving Claude chemistry capabilities so that it can handle chemistry tasks.

Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs

We talk with the VendingBench authors on evaling Claudes from Haiku to Mythos, and how they build leading, and lasting, frontier evals from scratch.

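Summary: Lukas Petersson and Axel Backlund of Andon Labs discuss VendingBench, a benchmark they have used to evaluate Claude models from Haiku to Mythos, and share how they build frontier evals from scratch.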

How Endava is redesigning software delivery around AI agents

Learn how Endava is using AI agents, ChatGPT Enterprise, and Codex to accelerate software delivery, automate workflows, and build an AI-native culture across the enterprise.

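Summary: Endava is using AI agents, ChatGPT Enterprise, and Codex to accelerate software delivery, automate workflows, and build an AI-native culture across the enterprise.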

How courts are coping with a flood of AI-generated lawsuits

Most days in her chambers, Judge Maritza Braswell, a federal magistrate judge in Colorado, sifts through stacks of documents written by people without a lawyer. Many of them can’t afford to hire a lawyer, and others have cases too weak or too small to interest one. She reads each one carefully, mind

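Summary: US courts are facing a flood of AI-generated filings. Federal magistrate judge Maritza Braswell says many documents come from people without a lawyer who wrote them with AI, which adds to the review burden.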

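The person who reported the GPT channel this morning is in tears :rofl:

The person who reported the team channel this morning is now in tears and frantically following up on the thread: https://community.openai.com/t/a-bug-about-a-team-master-number-allowing-unlimited-pulling-of-people-for-reselling/1382824 43 posts - 37 participants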

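GPT models cost 0 quota per call, for now

GPT-series models are 0 yuan per call for a limited time. I don't know how long this will last, so let's see. For running code, use the codex group, which has higher concurrency. 20 posts - 20 participants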

『君の公益』 joins Crazy Saturday

Today's check-in: $520. Today's rate multiplier: 0.1. QQ Mail registration is open for a limited time! $2M GPT sk-E4qf1OtlHvAk4TWIN9UoEc5kUU7x9Z4mEnLaUW7ZnuZx633m https://muyuan.do 253 posts - 244 participants

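Today's OpenAI is invincible

In the evening Altman wakes up, looks at the backend data, ponders for a moment, and announces: we have captured 99% of the global market! 15 posts - 12 participants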

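[CHY Public-Benefit Site] Rise again, genius programmer!

This post uses the community's public-benefit promotion and meets the promotion requirements. I declare and comply with the following community requirements: My project is free to use, with no paid component (disguised fees, sponsorship): Yes. My post carries the 公益推广 (public-benefit promotion) tag: Yes. My project is a personal project, unrelated to any company or commercial entity: Yes. My project does not funnel users to QQ, TG, or other groups: Yes. My project does not funnel users to websites that are not needed for operation: Yes. My project does not promote for others or use AFF links: Yes. My project has no associated commercial project: Yes. My site has a login and is integrated with LINUX DO Connect: Yes. Screenshots of the AI-generated or AI-polished parts of the project introduction in my post have been posted: Yes. I pledge that the above choices are permanently valid and accept supervision by the community and fellow members: Yes. The project introduction follows; AI-generated and AI-polished content has been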

GPT, $1M

sk-4Lfnwge3HLngPwWXMfwp0whVddTakI09IvNjC7B8NaAeAAeS https://ai.centos.hk The contest is over~ 156 posts - 141 participants

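What do you all do on L站 every day?

It's Saturday and I came in to work overtime for a release, but there isn't much to do. By 10 a.m. the work was basically done, and now I have to wait until 4 p.m. to leave; afterwards I can take a day off in lieu. One of the leads said he would treat everyone who came in for the release to dinner, so we will probably wait until 5 and head out together; the place is 4 or 5 km from the office. With nothing to do I'm bored and have been scrolling L站, but nothing catches my interest. What do you all do on L站 every day? Is anyone else working overtime today? Come and chat; treat the comments as a story thread. 34 posts - 33 participants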

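newapi email endpoint authentication vulnerability

This endpoint sends email without any authentication: ****.com/api/verification?email=123@gmail.com&turnstile= I don't know whether it has been fixed; I forgot to post about it earlier. 22 posts - 19 participants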

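Domestic compute milestone: a thousand Ascend 910C cards complete full-parameter post-training of DeepSeek's 1.6-trillion-parameter model

A joint team made up of groups from 深圳河套学院, Harbin Institute of Technology (Shenzhen), the Shenzhen Research Institute of Big Data, and Huawei, working with the 深智城 AI compute platform, announced that it has completed full-parameter post-training of the 1.6-trillion-parameter model DeepSeek-V4-Pro on a domestic AI compute platform. This is the first time a third-party institution anywhere in the world has completed full-parameter post-training of a 1.6-trillion-parameter model on a domestic compute platform. Compared with pre-training from scratch, post-training (mainly supervised fine-tuning, SFT, and reinforcement learning, RL) focuses on aligning the model with high-quality instructions and human preferences, teaching it to follow instructions and carry out specific tasks. However, for a 1.6-trillion-parameter MoE model, full-parameter post-training still places on the underlying hardware's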

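OpenAI gave me a month of PRO X20 as compensation after a mistaken ban. Did everyone else get it?

I'm sure everyone saw yesterday that a lot of accounts were banned. I had bought PRO via the Philippines (菲律宾长连接), and figured I might as well send an appeal written entirely in Chinese. To my surprise they replied: this morning I got an email saying they would compensate me with one month of PRO. Here it is for everyone to see. What can I say? A blessing in disguise. The annoying part is that I bought another PROX20 yesterday, which is a bit of a waste. Still, I won't need to renew next month, so I'm pretty happy; the genius programmer is back online. The quota just hasn't been reset. 19 posts - 12 participants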

Ask HN: Why is the HN crowd so anti-AI?

Genuine question. Over the past six months, there hasn’t been a single day where I’ve checked the HN Best RSS feed without seeing a post about how AI “writes bad code,” “introduces bugs,” “creates technical debt,” or something along those lines. I’ll probably make a lot of enemies by saying this, but

The Quiet Numbers Station: Decoding Nineteen Years of GPS Cryptography

https://lsc-pagepro.mydigitalpublication.com/publication/?i=... PDF: https://cdn.coverstand.com/61061/865273/2c88ea662e2b57478723... (article is on page 62) Related: https://www.404media.co/the-u-s-military-quietly-turned-gps-...

Ask HN: What was your "oh shit" moment with GenAI?

Most of us were amused when DALL-E and its peers went mainstream, and we were quick to point out the obvious flaws. Then ChatGPT hit the scene and again, many of us dismissed it as a parlor trick that would never amount to much. Using LLMs for coding initially was only a small step up from basic code