GitHub Trending
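Chopratejas/headroom compresses tool outputs, logs, files, and more before they reach the LLM, cutting token usage by 60-95% without degrading answer quality. It ships as a library, a proxy, and an MCP server.
Why it's recommended: it directly lowers LLM costs, is open source and usable right away, and is highly practical.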
Hacker News
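The NSA is using Anthropic's Mythos model for cyberattack operations, sparking heated debate over the militarization of AI.
Why it's recommended: it concerns a government agency's use of a frontier AI model and has major implications for industry security governance.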
Smol AI News
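Microsoft released MAI-Thinking-1 (35B MoE, 256K context), which scores 97% on AIME 2025, surpassing Sonnet 4.6. It also launched other MAI-series models and the Surface RTX Spark Dev Box.
Why it's recommended: a major Microsoft model release with solid benchmark numbers; its impact on the ecosystem is worth watching.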
Anthropic Engineering
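Anthropic engineers describe in detail how they built containment for claude.ai, Claude Code, and Cowork to address the potential risks that come with growing agent capabilities.
Why it's recommended: an in-depth account of engineering practice with direct lessons for AI safety and agent development.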
Hugging Face Blog
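NVIDIA released Nemotron 3.5 Content Safety, which supports customizable multimodal safety filtering for enterprise AI deployments worldwide.
Why it's recommended: it addresses a clear enterprise AI safety need, the model is open and customizable, and it is useful for operations teams.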
GitHub Trending
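NousResearch open-sourced Hermes Agent, billed as "the agent that grows with you," with an emphasis on continual learning and adaptability.
Why it's recommended: an open-source agent framework, well suited to developers exploring dynamic learning mechanisms.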
TLDR AI
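DeepSeek is raising funds; Meta has delayed a model release; Google launched the open Gemma 4 12B model.
Why it's recommended: quick industry news that shows where capital and competition are heading; good for a fast skim.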
V2EX
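Cloudflare has completed its acquisition of VoidZero. Details have yet to be disclosed, and the community is debating the strategic intent behind the deal.
Why it's recommended: it shows a cloud infrastructure vendor consolidating emerging technology; the impact on future products is worth watching.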
Anthropic Research
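Anthropic published an economic research report on the use of coding agents in social science research.
Why it's recommended: it opens up applications of AI agents in non-technical fields and offers ideas for interdisciplinary researchers.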
MIT Tech Review AI
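U.S. federal judges must handle a large volume of AI-generated legal filings every day from people without lawyers, leaving the court system facing new pressures and compliance problems.
Why it's recommended: it reveals the real impact of AI misuse on the judicial system and offers a societal perspective.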
Python · ★ 12,312 · 🍴 804 · 📈 3,139 stars today
Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
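Summary: Headroom compresses tool outputs, logs, files, or RAG chunks before they are sent to the LLM, reducing token count by 60-95% while leaving answers unchanged. It offers three integration options (a library, a proxy, and an MCP server) and suits use cases aimed at lowering API costs and improving inference efficiency.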
Python · ★ 180,880 · 🍴 31,020 · 📈 1,951 stars today
The agent that grows with you
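Summary: Hermes Agent is an AI agent framework that grows with its user over time, focusing on personalization and adaptivity. It suits personal assistants or experimental applications that need long-term companionship, learn user preferences, and keep evolving the interaction experience.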
JavaScript · ★ 207,137 · 🍴 31,800 · 📈 1,736 stars today
The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.
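Summary: ECC is a performance optimization system for AI coding agents (such as Claude Code and Cursor). It provides a development framework built on skills, instincts, memory, security, and research-first development, helping agents carry out complex tasks more efficiently and reliably.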
Python · ★ 79,811 · 🍴 10,597 · 📈 105 stars today
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
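Summary: PaddleOCR is a lightweight yet powerful OCR toolkit that supports 100+ languages and turns any PDF or image document into structured data that AI can process directly. It suits document digitization, invoice recognition, LLM preprocessing, and similar tasks.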
Python · ★ 108,531 · 🍴 9,593 · 📈 311 stars today
💫 Toolkit to help you get started with Spec-Driven Development
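Summary: Spec Kit is an official GitHub toolkit that helps teams get started quickly with Spec-Driven Development workflows. It suits API design and collaboration scenarios where contracts are defined first and interfaces are implemented afterward.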
Jupyter Notebook · ★ 8,955 · 🍴 578 · 📈 244 stars today
NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.
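Summary: NVIDIA Cosmos is an open world-model platform with pretrained models, datasets, and tools for building Physical AI such as robots, autonomous vehicles, and smart infrastructure. Developers can use it to train and deploy embodied AI.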
TypeScript · ★ 24,922 · 🍴 2,907 · 📈 482 stars today
An Open Source implementation of Notebook LM with more flexibility and features
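Summary: Open Notebook is an open-source alternative to Notebook LM with more flexible document note-taking and Q&A. It imports many kinds of documents and generates notes, summaries, and conversational retrieval, making it suitable for research, study, and knowledge management.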
Python · ★ 9,543 · 🍴 1,144 · 📈 583 stars today
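Talk to any LLM with hands-free voice interaction, voice interruption, and Live2D talking face running locally across platforms
Summary: Open-LLM-VTuber is a cross-platform desktop application for hands-free voice conversations with any LLM, with voice interruption and Live2D facial animation. It suits livestreaming, virtual streamers, desktop companionship, and similar uses.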
★ 349,674 · 🍴 83,224 · 📈 740 stars today
A complete computer science study plan to become a software engineer.
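Summary: A complete computer science study plan that helps people starting from scratch or without a CS background prepare systematically for software engineer interviews. It covers core topics such as algorithms, data structures, and system design, with recommended resources.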
Java · ★ 8,940 · 🍴 1,209 · 📈 107 stars today
Multi-platform SDK for integrating GitHub Copilot Agent into apps and services
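Summary: GitHub Copilot SDK is a multi-platform development kit that helps developers integrate GitHub Copilot Agent (an AI coding assistant) into their own apps or services. It suits IDE plugins, internal tools, and automation workflows.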
Go · ★ 35,654 · 🍴 431 · 📈 255 stars today
Find vulnerabilities, misconfigurations, secrets, SBOM in containers, Kubernetes, code repositories, clouds and more
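Summary: Trivy is an open-source security scanner that finds vulnerabilities, misconfigurations, leaked secrets, and SBOMs in container images, Kubernetes, code repositories, cloud environments, and more. It suits security and compliance pipelines in DevOps and cloud-native environments.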
C# · ★ 1,302 · 🍴 166 · 📈 358 stars today
Windows companion suite for OpenClaw - System Tray app, Shared library, Node, and PowerToys Command Palette extension
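Summary: OpenClaw Windows Node is the Windows companion suite for OpenClaw, consisting of a system tray app, a shared library, Node support, and a PowerToys Command Palette extension. It mainly improves OpenClaw's integration and ease of use on Windows.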
TypeScript · ★ 5,268 · 🍴 637 · 📈 308 stars today
A modern platform for visual, flexible, and extensible graph-based investigations. For cybersecurity analysts and investigators.
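Summary: Flowsint is a modern visual investigation platform for cybersecurity analysts and investigators that supports flexible, extensible graph-based analysis. It suits threat hunting, incident response, and mining complex relationships.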
Python · ★ 27,528 · 🍴 2,342 · 📈 173 stars today
AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary
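Summary: An AI agent skill that automatically researches the past 30 days of discussion about a given topic on Reddit, X, YouTube, Hacker News, Polymarket, and other platforms, then produces a grounded summary. It suits sentiment monitoring, trend analysis, and quick research.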
👍 1
LLMs can appear cautious in risk decision-making tasks, yet cautious-looking outputs do not necessarily indicate alignment with human decision-making mechanisms. We investigate this distinction using the St. Petersburg game as a controlled testbed, a classical paradox in which the expected payoff is
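Summary: Uses the St. Petersburg game as a controlled experiment to test whether the cautious-looking outputs of large language models in risk decision-making actually align with human decision-making mechanisms, and finds that apparent caution does not imply mechanistic alignment.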
👍 10
Feed-forward 3D Gaussian Splatting methods reconstruct a scene from posed or pose-free images in a single forward pass, yet current approaches predict one Gaussian per input pixel, tying the representation budget to camera resolution rather than scene complexity. A flat wall and a richly textured ob
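Summary: Proposes ZipSplat, a feed-forward 3D Gaussian Splatting method that avoids the cost of predicting one Gaussian per pixel, so the representation budget scales with scene complexity rather than camera resolution.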
👍 3
Deontic reasoning is the task of answering questions by applying explicit rules and policies to case-specific facts, for example computing tax liability under a statute or determining the outcome of an immigration appeal. A key technical challenge for LLM-based deontic reasoning is that the relevant
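Summary: The DAR framework studies LLM-based deontic reasoning, that is, answering legal or policy questions by applying explicit rules to facts, such as computing tax liability or deciding an immigration appeal, and analyzes the technical challenges involved.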
👍 3
Large language models (LLMs) are increasingly proposed as clinical agents, yet static, single-turn benchmarks cannot capture how a model dynamically delivers care across an encounter: gathering information, planning treatment, and adapting longitudinal management across successive patient states. Me
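Summary: Evaluates large language models on dynamic clinical decision-making using standardized patient cases that cover information gathering, treatment planning, and longitudinal management, and argues that static single-turn benchmarks cannot reflect real clinical ability.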
👍 34
Rubric-based reinforcement learning (RL) uses an LLM-as-a-Judge (LaaJ) to score model outputs according to rubrics as rewards. However, policy models may exploit latent biases in the judge, leading to reward hacking and ineffective or unsafe training outcomes. In real-world rubric-based RL, such hac
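Summary: Studies reward hacking in rubric-based reinforcement learning, where the policy model exploits latent biases in the LLM judge and training becomes ineffective or unsafe, and analyzes ways to detect it.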
👍 3
Training Data Attribution (TDA) seeks to trace a model's predictions back to its training data. The gold standard for TDA relies on causal interventions, observing how a model changes when data is added or removed, but repeated retraining is computationally challenging for Large Language Models (LLM
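Summary: Proposes STRIDE, which performs training data attribution through sparse recovery over subset perturbations, tracing the causal link between model predictions and training data without repeatedly retraining large models.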
👍 9
Scientific and engineering progress is fundamentally a long-horizon iterative process: proposing changes, running experiments, measuring outcomes, and continuously refining artifacts. Yet existing benchmarks for frontier models primarily evaluate either single-turn responses or short-horizon agent t
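Summary: The AutoLab benchmark evaluates frontier models on long-horizon autonomous research and engineering tasks, including proposing changes, running experiments, and refining iteratively, going beyond single-turn Q&A.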
👍 24
As multi-modal models advance towards long-form video understanding, memory emerges as a critical capability. Despite substantial efforts in developing video datasets and benchmarks, existing works primarily focus on perception and reasoning, without systematically evaluating memory: what models ret
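Summary: Proposes M^3Eval, which uses cognitively inspired video tasks to evaluate the memory of multimodal models, filling the gap left by existing video benchmarks that do not assess memory.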
👍 11
Lane-level maps are critical infrastructure for autonomous driving and lane-level navigation, yet constructing and maintaining standardized lane networks for hundreds of cities remains highly labor-intensive. Recent end-to-end vectorized mapping methods can predict lane geometry and topology directl
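Summary: MapAgent is an industrial-grade agent framework for city-scale lane-level map generation that tackles the high manual cost existing end-to-end methods incur when scaled across many cities.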
👍 1
Autoregressive mesh generation has gained attention by tokenizing meshes into sequences and training models in a language-modeling fashion. However, existing approaches suffer from two fundamental limitations: (i) low tokenization efficiency, which yields long token sequences and prevents scaling to
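Summary: MeshWeaver performs autoregressive mesh generation through sparse-voxel-guided surface weaving, addressing the low tokenization efficiency and overly long sequences of existing methods.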
👍 10
High-quality pretraining data is a central ingredient in modern language models, but German-language resources remain far less developed than their English counterparts: they are often smaller, less carefully curated, weakly documented, and rarely validated through controlled training experiments. W
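Summary: KletterMix aims to improve German pretraining data, noting that German resources lag far behind English in scale, curation, and documentation, and validates its data through controlled training experiments.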
👍 24
Large Reasoning Models (LRMs) have achieved remarkable progress thanks to Reinforcement Learning with Verifiable Rewards (RLVR) on Chain-of-Thoughts (CoTs). However, since long CoTs naturally contain trial and error and mainstream RLVR approaches choose outcome-correct CoT trajectories for memoriza
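Summary: ThoughtFold folds reasoning chains through introspective preference learning, helping large reasoning models make better use of trial-and-error information in long chains of thought and addressing the problem of correct answers reached through inefficient reasoning.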
👍 8
Existing benchmarks for MLLM-generated web artifacts assess interaction through local evidence and miss the requirement-induced states and transitions that determine whether a page works. We introduce WebRISE, which compiles task requirements into Interaction Contract Graphs (ICGs) of observable sta
👍 1
Large language models improve final-answer accuracy through extended chain-of-thought reasoning, but often spend tokens inefficiently and offer little inference-time control. Existing efficient reasoning methods control thinking length by shortening, early-stopping, or compressing traces, leaving ho
👍 1
Reinforcement learning (RL) has become a dominant post-training paradigm, enabling large language models (LLMs) to learn from rewards. We observe that societal regulations are structurally similar to reward functions. They define measurable outcomes, thresholds, and exceptions, while often leaving i
👍 0
LLM-agent budget overruns are a documented production failure class: a single retry loop can spend thousands of dollars before an operator notices, and the in-process integrity properties that would prevent it (no aliasing, no double-spend, no use-after-delegation of a cost-bearing value) are enforc
👍 1
Graph Language Models (GLMs) have become a promising direction for adapting Large Language Models (LLMs) to graph learning tasks. By transforming graph topology and node information into graph tokens, GLMs allow LLMs to jointly process structured graph inputs and textual instructions. Yet, it remain
👍 2
Large language model (LLM) agents are evolving from request-response assistants into long-running software actors: they maintain state across model calls, fork subtasks, wait for external events, request human authority, generate tools, and perform side effects that must be resumed and audited. This
👍 0
Low-Rank Adaptation (LoRA) successfully enables personalization in text-to-image generation by adapting pre-trained diffusion models to specific visual concepts and styles. However, extending such models to multi-concept customization remains challenging. Naively combining multiple LoRA weights or t
👍 28
Multimodal agents in robotics, AR, and autonomous driving must reason about places and layouts from continuous egocentric streams, often using evidence outside the current view. Existing benchmarks either evaluate offline over full videos or target events rather than spatial structure. We introduce
👍 6
Structured financial audit verification is difficult for language-model agents because correctness depends on structured evidence rather than text alone. A model must link reported facts to taxonomy concepts, traverse calculation or dimensional relations, and recompute expected values before applyin
👍 3
How can a population of agents self-orchestrate and self-adapt into stronger collective intelligence without centralized control? Inspired by Friedrich Hayek's economic theory of decentralized coordination in markets, we study this question through an agent economy in which agents compete via auctio
👍 11
On-Policy distillation (OPD) in large language models is shifting from full-trace KL supervision toward more selective training paradigms. Recent OPD methods increasingly focus on selecting which trajectories to learn from, which tokens are most informative, and which supervision signals are most re
👍 1
AI glasses present a compelling platform for AI agents to serve as personalized memory assistants. To be genuinely useful, such systems must move beyond short-term video comprehension and address memory gaps that humans experience for practical, personal, or social purposes over longitudinal egocent
👍 1
Transfer learning aims to facilitate the learning of a target domain by transferring knowledge from a source domain. The source domain typically contains semantically meaningful samples (*e.g.*, images) to facilitate effective knowledge transfer. However, a recent study observes that the noise domai
👍 0
Unified multimodal models (UMMs) have emerged as a promising paradigm for general-purpose multimodal intelligence. As they are deployed in real-world applications, effectively updating internal knowledge becomes critical. While knowledge editing has matured for text-only models, it remains unclear w
👍 2
While current multimodal models are proficient at open-ended visual editing, executing precise single-answer edits remains an important obstacle. To probe this challenge, we introduce PaintBench, a dynamically scalable benchmark targeting 20 fundamental precise visual editing operations across four
👍 2
Humans can effortlessly perceive spatial layouts, form cognitive representations, reason about spatial relations, and translate such reasoning into actions in everyday 3D environments. Although recent vision-language models (VLMs) have shown promising performance on observation-conditioned spatial p
👍 4
Weight-space model merging is usually formulated as an algebraic operation on checkpoints, yet at LLM scale the limiting resource is often the set of expert weights that must be read. We introduce MergePipe, a budget-aware execution layer that casts LLM merging as an expert access-set problem: given
👍 19
LLM agents are rapidly evolving from coding assistants into autonomous software engineering systems. However, existing evaluation methodologies remain largely centered on static, isolated, and short-horizon benchmarks that fail to capture the dynamic complexity of real-world production workflows. As
@elpresidank · 116 followers · 2.9M views · 543 likes · 35 reposts
Most AI agent memory is built on embeddings. And there's now a proof that this entire class of system is going to forget what you stored in it — and confidently make up things you never stored at all.
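Summary: Presents a proof that embedding-based agent memory forgets structurally: it not only loses what was stored but also confidently fabricates information that was never stored. Explains from a topological perspective why this memory fails.
@1salman · 363 followers · 2.0M views · 682 likes · 45 reposts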
Everyone keeps asking whether AI favors specialists or generalists. I think that is the wrong question. AI does not pick a side. It changes the tradeoff. The old world forced a choice. You could go
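Summary: Argues that AI favors neither generalists nor specialists but changes the tradeoff between them. The old world forced a choice; AI removes that constraint. Explores a new model of capabilities for the AI era.
@zodchiii · 20.0K followers · 743.3K views · 509 likes · 55 reposts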
Four AI agents can ship a feature while you sleep. Most people never wire them up. They fire a reviewer here, a test generator there, by hand, one at a time, each forgetting what the last one did.
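Summary: Shares the exact setup for a team of four AI agents that ships features automatically while you sleep. Most people still trigger agents by hand, one at a time, with each losing the others' context.
@eng_khairallah1 · 61.9K followers · 693.5K views · 511 likes · 71 reposts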
Obsidian has 2,700+ community plugins. Over 100 of them are AI-related. Save this :) And the CEO of Obsidian personally published official Claude Skills for the platform - 12,900+ GitHub stars in
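Summary: Lists 30 lesser-known Obsidian workflows, plugins, and configurations, noting that over 100 of its community plugins are AI-related. Also mentions that Obsidian's CEO officially published Claude Skills for the platform, which have drawn 12,900+ GitHub stars.
@0xCodez · 3.3K followers · 637.2K views · 510 likes · 59 reposts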
Most Claude Code users still write their workflows by hand. They chain prompts, copy outputs, paste them into the next prompt, fix what went wrong, repeat. 9 out of 10 builders haven’t tried Dynamic
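Summary: Shows how to master dynamic workflows in Claude Code through 6 patterns and 14 hands-on steps. Notes that 9 in 10 users still chain prompts by hand and have never tried dynamic workflows.
@prukalpa · 23.1K followers · 583.2K views · 506 likes · 80 reposts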
A field guide to what it is, what it is not, and where it fits in your AI architecture. I have had some version of the same conversation with a CIO almost every day this year. Their team has read
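Summary: A practical guide to the "context layer" in enterprise AI architecture, explaining what it is, what it is not, and where it fits. Drawn from near-daily conversations with CIOs.
@polydao · 18.1K followers · 559.5K views · 505 likes · 55 reposts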
Most people are still using Claude like a smarter chatbot That is not the game anymore You’re competing against people who treat Claude like an operating system > While you’re typing one-off
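Summary: Combs through the entire Claude Skills ecosystem and curates the skills genuinely worth using, with full GitHub links. Criticizes the fact that most people still use Claude as a smarter chatbot.
@theonejvo · 22.1K followers · 504.3K views · 861 likes · 1 repost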
Over the past year, @pewdiepie, has been turning into one of the most visible champions of private, self-hosted computing, and it has been a genuine pleasure to watch. What began in late 2025 as an
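Summary: A hands-on demonstration of attacking PewDiePie's AI agent system through a malicious Cocomelon website, followed by help hardening its defenses. Shows how fragile private, self-hosted computing can be and how to protect it.
@Saboo_Shubham_ · 116.2K followers · 263.3K views · 517 likes · 74 reposts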
The frontend used to be a fixed thing. Designers drew it. Engineers built it. Users got what shipped. That's over. The interfaces shipping in 2026 are drawn partly by the agent itself, in real time,
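Summary: Argues that generative UI is replacing fixed frontends: the interfaces of 2026 will be drawn by agents in real time, ending the era in which designers drew, engineers built, and users passively received what shipped.
@monokern · 1.2K followers · 263.1K views · 505 likes · 72 reposts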
Most people treat research as a manual task. You open 10 tabs. You watch videos. You read articles. You take notes somewhere. An hour later you have a pile of information you're not sure what to do
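Summary: Builds a research workflow that combines Claude Code, NotebookLM, and Obsidian: instead of opening 10 tabs by hand, every use makes the system smarter, with information gathered and organized automatically.
@garrytan · 853.3K followers · 180.6K views · 503 likes · 43 reposts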
In January I got back into coding and I built Garry's List. Over five hundred thousand lines of Rails and the tests to police it. I was proud of it. I shouldn't have been. The thing worth being proud
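Summary: Reflects on writing 500,000 lines of Rails code, arguing that a large volume of handwritten code is not the thing worth being proud of, and critiques building "Foxconn factory"-style mass-production pipelines for agents.
@base · 1.3M followers · 97.3K views · 519 likes · 74 reposts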
TL;DR: Agents are becoming the internet’s newest paying customers, and the economy serving them is moving fast. On Base, agents already use wallets and stablecoins to pay for inference, live search,
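Summary: Agents are becoming the internet's newest paying customers: on Base, agents already use wallets and stablecoins to pay for inference, live search, and more. The agent economy is already underway.
@dair_ai · 124.6K followers · 84.0K views · 504 likes · 83 reposts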
1. SkillOpt Microsoft Research treats a compact natural-language skill document as the trainable state of a frozen agent, then learns that document through rollouts, reflection, and bounded edits
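Summary: This week's top AI papers. Highlights Microsoft Research's SkillOpt, which treats a natural-language skill document as the trainable state of a frozen agent and learns it through rollouts, reflection, and bounded edits.
@nicbstme · 23.7K followers · 84.0K views · 530 likes · 35 reposts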
My agent manages my emails, SMS, Whatsapp, Telegram and pretty much everything to automate my personal life. People keep asking me how I use agents in real life. I mean the actual boring things that
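Summary: Shares a personal life-automation agent stack that manages email, SMS, WhatsApp, Telegram, and more, focusing on real everyday automation rather than flashy demos.
@mem0ai · 17.6K followers · 82.8K views · 520 likes · 60 reposts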
Agent harnesses are where AI software actually runs. Cursor, Devin, Claude Code, Codex: these environments handle context, orchestrate tools, coordinate agents, and increasingly, manage memory. The
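Summary: Analyzes the state of memory management in mainstream agent harnesses (Cursor, Devin, Claude Code, Codex). These environments handle context, orchestrate tools, and coordinate agents, and memory is becoming increasingly central to them.
@ParadisLabs · 48.9K followers · 82.0K views · 501 likes · 60 reposts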
AI's next frontier will be Robotics and Humanoids. The past decade has seen rapid AI adoption in the structured digital world. Those LLM breakthroughs now enable more general-purpose learning and more
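Summary: Robotics is AI's next frontier. Over the past decade AI spread rapidly through the structured digital world, and LLM breakthroughs now enable more general-purpose learning and more autonomous capabilities in the physical world.
@trq212 · 263.1K followers · 75.7K views · 542 likes · 36 reposts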
Last week, we released dynamic workflows in Claude Code. Claude can now write its own harness on the fly, custom-built for the task at hand. While the default Claude Code harness is built for coding,
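Summary: Announces dynamic workflows in Claude Code: Claude can now write its own custom harness on the fly instead of being limited to the default coding harness, so every task gets a harness built for it.
@drfeifei · 738.0K followers · 72.2K views · 699 likes · 144 reposts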
“The world is everything that is the case.” — Ludwig Wittgenstein, Tractatus Logico-Philosophicus, 1921 The world is not made of words. In an earlier essay, we argued that spatial intelligence is AI’s
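Summary: Proposes a functional taxonomy of world models. Quoting Wittgenstein, it argues that the world is not made of words and that spatial intelligence is AI's next major direction.
@EXM7777 · 115.1K followers · 70.1K views · 520 likes · 47 reposts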
There's a reason some people seem to be constantly shipping the best software, writing incredible content, or generating insane images... They adopted the eval loop, while you... You've tried better
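Summary: Argues that the key to eliminating low-quality AI output is adopting an eval loop rather than just writing better prompts, and shares how to achieve continuous iterative improvement with the Hermes framework.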
@servasyy_ai · 33.0K followers · 267.9K views · 7d impressions 267.9K
Master 97% of Codex's features in 30 minutes (complete tutorial)
@yanhua1010 · 32.0K followers · 88.4K views · 7d impressions 88.4K
Probably the closest thing to a correct agent memory solution so far
@drfeifei · 738.0K followers · 72.2K views · 7d impressions 72.2K
A Functional Taxonomy of World Models
@Saboo_Shubham_ · 116.2K followers · 263.3K views · 7d impressions 263.3K
Generative UI Is the New Frontend
@sydneyrunkle · 7.5K followers · 69.5K views · 7d impressions 69.5K
How to Build a Custom Agent Harness
@delba_oliveira · 74.0K followers · 37.4K views · 7d impressions 37.4K
Feedback loops: Help Claude Code complete ambitious tasks with less babysitting
@ericzakariasson · 67.9K followers · 37.3K views · 7d impressions 37.3K
Don't let your agent guess, give it runtime context
@0xCodez · 3.3K followers · 637.2K views · 7d impressions 637.2K
How to master Dynamic Workflows in Claude Code: 6 patterns and 14 steps Anthropic engineers actually
@0xEcho99 · 4.4K followers · 29.2K views · 7d impressions 29.2K
With these Skills, Claude Code can crawl pretty much the entire web
@IBuzovskyi · 1.2K followers · 50.1K views · 7d impressions 50.1K
10 HERMES AGENT HACKS THAT TURNED MY CHAT AGENT INTO A 24/7 SYSTEM
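@xiaohu · 108.2K followers · 101.8K views · 7d impressions 101.8K
Codex ships a major update: no longer just for coding, it bundles 62 apps and 110 automation skills aimed at white-collar office work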
@trq212 · 263.1K followers · 75.7K views · 7d impressions 75.7K
A harness for every task: dynamic workflows in Claude Code
@mvanhorn · 27.6K followers · 54.5K views · 7d impressions 54.5K
Every Agentic Engineering Hack I Know (June 2026)
@mem0ai · 17.6K followers · 82.8K views · 7d impressions 82.8K
State of Memory in Agent Harness
@subahwadhwani · 5.2K followers · 355.9K views · 7d impressions 705.1K
X Just Got Its TikTok Moment. It's Called Commentary.
@elpresidank · 116 followers · 2.9M views · 7d impressions 2.9M
Context as Topology: Why Your Agent's Memory Forgets, and How Structure Escapes It
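Summary: ChatGPT and Codex are merging, a change that could fundamentally reshape programming and the way people interact with AI.
Summary: Anton Osika of Lovable discusses his approach to problem solving and shares how AI tools can make programming more efficient.
Summary: Uses Claude to visualize a team's thinking and improve collaboration.
Summary: Max Junestrand of Legora discusses problem solving and shares how AI is applied in project management.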
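Summary: Introduces a new tool called the AI "co-scientist," which is changing the way scientific research is done.
Summary: Has Claude Opus 4.8 stopped lying? An evaluation of its honesty and reliability.
Summary: Could AlphaFold win a second Nobel Prize? A look at its scientific contributions.
Summary: Jeff Dean discusses what a million-fold increase in AI compute would mean and looks ahead to the future of AI.
Summary: Einstein versus Feynman: with AI applied to physics research, which of them would come out ahead?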
We talk with the VendingBench authors on evaling Claudes from Haiku to Mythos, and how they build leading, and lasting, frontier evals from scratch.
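Summary: Latent Space talks with Lukas Petersson and Axel Backlund of Andon Labs about VendingBench, which evaluates Claude models from Haiku to Mythos, and about how they approach building frontier evals.
Summary: NVIDIA released Nemotron 3.5 Content Safety, a model designed to give enterprises worldwide customizable multimodal content safety across a range of safety scenarios.
Summary: Version 2.0 of the EVA-Bench dataset has been released. It covers 3 domains with 121 tools and 213 scenarios and is used to evaluate how well AI agents use tools.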
Learn how Endava is using AI agents, ChatGPT Enterprise, and Codex to accelerate software delivery, automate workflows, and build an AI-native culture across the enterprise.
Summary: Endava uses AI agents, ChatGPT Enterprise, and Codex to accelerate software delivery, automate workflows, and build an AI-native culture across the enterprise.
Most days in her chambers, Judge Maritza Braswell, a federal magistrate judge in Colorado, sifts through stacks of documents written by people without a lawyer. Many of them can’t afford to hire a lawyer, and others have cases too weak or too small to interest one. She reads each one carefully, mind
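Summary: U.S. courts face a flood of AI-generated legal filings. Maritza Braswell, a federal magistrate judge in Colorado, notes that many litigants cannot afford a lawyer, which adds to the courts' review burden.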
ChatGPT introduces a new memory system to better remember preferences, keeping context fresh and relevant across conversations.
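Summary: OpenAI introduced a new memory system for ChatGPT, called Dreaming, that remembers user preferences more effectively and keeps conversation context fresh and relevant.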
a quiet day.
Summary: Latest AI news: Reve 2 and Ideogram 4 launched with stronger layout control for image generation; otherwise the day was quiet.
An action plan for AI-powered biological resilience
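Summary: OpenAI published an action plan describing how AI can strengthen biodefense and build resilience against biological threats.
Summary: Hugging Face designed an agent-optimized CLI that simplifies interaction with the Hub and makes agents more efficient at using it.
Summary: Today's AI briefs: DeepSeek is raising funds; Meta has delayed a model release; Google launched the Gemma 4 12B model.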
Verified Generation and Compounding Intelligence
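Summary: Carina Hong of Axiom Math discusses verified generation and compounding intelligence, with the goal of extending AI from informal reasoning to verifiable reasoning.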
The legendary Microsoft CEO makes his first Latent Space appearance!
Summary: Microsoft CEO Satya Nadella is interviewed by Latent Space at Microsoft Build, his first appearance on the show.
GPT-Rosalind advances life sciences research with enhanced biological reasoning, medicinal chemistry expertise, genomics analysis, and experimental workflow capabilities.
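Summary: OpenAI added capabilities to GPT-Rosalind that strengthen its biological reasoning, medicinal chemistry, genomics analysis, and experimental workflows for the life sciences.
Summary: Explores applications of Direct Preference Optimization (DPO) beyond chatbots and its potential in other AI settings.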
See how Wasmer used Codex with GPT-5.5 to build a Node.js runtime for the edge, accelerating development 10x to 20x and shipping in weeks instead of months.
Summary: Wasmer used Codex powered by GPT-5.5 to build a Node.js runtime for the edge, speeding up development 10-20x and cutting delivery time from months to weeks.
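26 replies · Programmer node
6 replies · Programmer node
15 replies · Programmer node
9 replies · Programmer node
5 replies · Programmer node
7 replies · Programmer node
11 replies · Apple node
10 replies · Apple node
26 replies · Apple node
17 replies · Apple node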
Should be fine now. muyuan.do: Don't waste your time here; go do what you should be doing and love the people you should love. 119 posts · 118 participants
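Hi, fellow members of L站. I've been working in AI comic drama for three months now, almost entirely on overseas projects. This month I just finished an overseas project produced jointly by Yuewen's overseas arm and Douyin, which is now in review, so I'd like to share what I've learned about making live-action content for overseas markets. Controlling character positions and blocking. Emotion references. Tool usage. References for framing and camera shots. Finally, when you work on live-action pieces, I suggest walking through the visuals while reading the prompt so you have a detailed picture in your head and can tell whether the shot matches what you intended. Editing and music also really matter; if you can, look up Suno AI music tutorials on Douyin. I've been working on an ad project with it over the past few days. Feel free to ask me questions, but I usually don't log into L站 during working hours; I generally reply after I get home after 9 p.m. And if you're a complete beginner looking to get into this field now, my advice is to first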
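The l0veyou free community site has launched a digital-human model that generates 10 seconds of video per run with very realistic results; soon I'll extend it to 20-second videos, which will be even better. Select audio here (uploaded audio must be longer than two seconds, otherwise video generation fails), click Upload Model, then click the plus sign in the lower-right corner to upload a reference image, and fill in a prompt such as: "She is speaking." Also note that AI image generation is temporarily unavailable. I spent all day trying to fix it without success, and it probably won't be back until tomorrow at the earliest. I don't know if I'll get any sleep tonight, because tomorrow I'm also launching new models: Gemini 3.1 Pro (0.1 LDC per message) and 3D model generation (the generated 3D models come with parts that can be separated directly instead of being one solid piece, and the results are very good). 14 posts · 9 participants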
Downtime extended until 10 p.m. tonight. 159 posts · 152 participants
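At 8:30 this morning Altman reset all accounts. I'm currently managing 9 Pro 20x accounts, which makes this a good opportunity to estimate GPT subscription quotas.
Usage data as of the screenshot:
GPT5.5 · input 62,178,157 · output 1,759,623 · cache read 245,985,792 · total 309,923,572 tokens
GPT5.4 · input 284,049,752 · output 12,110,928 · cache read 1,555,673,600 · total 1,851,834,280 tokens
Average cache hit rate: 83.2%
Cost data as of the screenshot: GPT5.4: $1,313
Combined weekly-limit usage across the 9 accounts as of the screenshot: 7% + 9% + 8% + 8% + 8% + 7% + 11% + 10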
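Reposted from Xinhua. Reporters learned from the Ministry of Education that 12.9 million people registered for this year's national college entrance exam (gaokao). On university admissions publicity, the Ministry stressed that false advertising and improper promises are strictly prohibited, as is hyping "top gaokao scorers," "high-scoring candidates," or "admission rates" in any form. It is pushing for university admission letters to return to "a single sheet of paper" and firmly correcting unhealthy trends such as lavish admission letters and freshman gift boxes. Source: Xinhua WeChat account, compiled from the People's Daily app (reporter: Wu Dan). Does anyone know what this is supposed to mean? 86 posts · 38 participants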
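Thinking about taking a break. 16 posts · 15 participants
I'm absolutely fuming right now. 44 posts · 42 participants
I've thought it over and I'll have to buy a new server. This is completely insane. 137 posts · 128 participants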
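A June 2026 report from Ramp, a U.S. corporate spend management platform, shows Chinese AI company DeepSeek at the top of its list of popular software. Although U.S. officials had previously been highly wary of Chinese large models, real commercial transaction data reveal the opposite. Ramp analyzed credit card spending records from more than 50,000 businesses on its platform and found that many U.S. companies are not deploying the open-source models locally but are paying directly for DeepSeek's official hosted API service. This means large amounts of U.S. corporate data are being sent to and stored on servers located in China. This actual flow of money contrasts sharply with the wariness in the U.S. when DeepSeek R1 was first released more than a year ago. At the time, out of concern over data leaks and security, large U.S. companies and government agencies widely restricted the use of Chinese models. However,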
To my knowledge, this is the first formally verified implementation of an intersection algorithm for polygons. The experience of working with AI agents on this project changed a lot with recent model releases, as I describe in the readme. Opus 4.8 is able to provide algorithm implementation with form
43 points · 15 comments
Built a browser-based FFmpeg editor that runs entirely client-side via WebAssembly. Your files never leave your device -- all processing happens in a Web Worker. Works offline as an installable PWA after first load.
54 points · 26 comments
138 points · 49 comments
34 points · 13 comments
As a high school student, I’m trying to figure out what major I’m interested in. About half a year ago, I thought EECS was a great major for some STEM students like me, because I see many of the world's most influential entrepreneurs, such as Elon Musk and Jensen Huang, have built companies aro
155 points · 141 comments
282 points · 262 comments
1 points · 0 comments
214 points · 275 comments
209 points · 141 comments
101 points · 10 comments
27 points · 17 comments
530 points · 238 comments
It was discussed a year ago. https://news.ycombinator.com/item?id=44235467
We launched Infracost on HN five years ago (https://news.ycombinator.com/item?id=26064588) where our CLI generated cost estimates for infra-as-code, e.g. "this Terraform PR adds $400/mo". The idea was to shift cloud costs (FinOps) left, so engineers get visibility of co
446 points · 168 comments
168 points · 64 comments
You can get a 2h free trial by solving a proof-of-work captcha when topping up your account for the first time. If you'd like to learn more, an independent interview was posted a couple of weeks ago [1], and the FAQ [2] has a lot of information as well. For the source code sharing, we've tal
712 points · 675 comments
959 points · 377 comments
17 points · 2 comments
25 points · 3 comments
198 points · 52 comments
67 points · 25 comments
51 points · 20 comments
65 points · 5 comments
45 points · 4 comments
47 points · 10 comments