Archives
30 posts
2026-04
- Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing Apr 15, 2026
- Scheduler Overlap: CPU-GPU Scheduling-Level Overlap Apr 14, 2026 12776 words 32 min read
- AI Agent & Harness Design Patterns Apr 10, 2026 15908 words 40 min read
- SpecExit: Accelerating Large Reasoning Model via Speculative Exit Apr 07, 2026 11458 words 29 min read
- Reinforcement Learning for Chain of Thought Compression with One-Domain-to-All Generalization Apr 07, 2026 11414 words 29 min read
- LIFT: A Novel Framework for Enhancing Long-Context Understanding of LLMs via Long Input Fine-Tuning Apr 03, 2026 11547 words 29 min read
- The Paradigm Shift in Thinking Brought by OpenClaw Apr 03, 2026 816 words 2 min read
2026-03
- TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate Mar 30, 2026 12603 words 32 min read
- SpargeAttention: Accurate, Training-Free Sparse Attention Accelerating Inference for Any Model Mar 24, 2026 11341 words 28 min read
- L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models Mar 19, 2026 9955 words 25 min read
- KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction Mar 19, 2026 7087 words 18 min read
- AI-Researcher: Autonomous Scientific Innovation Mar 16, 2026 19057 words 48 min read
- Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding Mar 13, 2026 21711 words 54 min read
- Where Matters More Than What: A DapQ Paper Walkthrough Mar 12, 2026 12834 words 32 min read
- Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters Mar 12, 2026 12875 words 32 min read
- DefensiveKV: Taming the Fragility of KV Cache Eviction in LLM Inference Mar 12, 2026 11105 words 28 min read
- Query-Aware Sparsity for Efficient Long-Context LLM Inference Mar 11, 2026 10371 words 26 min read
- ReAct: Synergizing Reasoning and Acting in Language Models Mar 11, 2026 6078 words 15 min read
- FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling Mar 10, 2026 18275 words 46 min read
- XAttention: Block Sparse Attention with Antidiagonal Scoring Mar 09, 2026 11712 words 29 min read
- FlashPrefill: Instantaneous Pattern Discovery and Thresholding for Ultra-Fast Long-Context Prefilling Mar 09, 2026 12742 words 32 min read
- DualSpec: Accelerating Deep Research Agents via Dual-Process Action Speculation Mar 09, 2026 13512 words 34 min read
- ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning Mar 09, 2026 10258 words 26 min read
- Efficient Agent Training for Computer Use Mar 09, 2026 10348 words 26 min read
- Reasoning Models Generate Societies of Thought Mar 07, 2026 9299 words 23 min read
- GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning Mar 06, 2026 11867 words 30 min read
- Stop Wasting Your Tokens: Efficient Runtime Multi-Agent Systems Mar 06, 2026 7032 words 18 min read
- RAPID: Retrieval-Augmented Speculative Decoding for Long-Context Inference Mar 05, 2026 8948 words 22 min read
- Paper Report: Gated Attention for Large Language Models Mar 04, 2026 5783 words 14 min read
- RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding Mar 04, 2026 10058 words 25 min read