Hacker News
SWE-AGI: benchmarking spec-driven software construction (arxiv.org)
1 point by mustaphah 4 days ago | past | 1 comment
Authenticated Workflows: A Systems Approach to Deterministic Agentic Controls (arxiv.org)
3 points by mrajagopalan 4 days ago | past | 1 comment
Formalization and Inevitability of the Pareto Principle (arxiv.org)
3 points by bikenaga 4 days ago | past | 1 comment
RL on GPT-5 to write better kernels (arxiv.org)
4 points by atallahw 4 days ago | past | 1 comment
Quantum observers can communicate across multiverse branches (arxiv.org)
2 points by lisper 4 days ago | past | discuss
Pushing Tensor Accelerators Beyond MatMul in a User-Schedulable Language (arxiv.org)
2 points by matt_d 4 days ago | past | discuss
HySparse: A Hybrid Sparse Attention Architecture (arxiv.org)
5 points by readitalready 4 days ago | past | discuss
Biases in the Blind Spot: Detecting What LLMs Fail to Mention (arxiv.org)
1 point by jari_mustonen 4 days ago | past | discuss
Evaluation of RAG Architectures for Policy Document Question Answering (arxiv.org)
1 point by PaulHoule 4 days ago | past | discuss
SoftMatcha 2: A Fast and Soft Pattern Matcher for Trillion-Scale Corpora (arxiv.org)
3 points by salkahfi 4 days ago | past | discuss
Opus: Towards Efficient and Principled Data Selection in LLM Pre-Training (arxiv.org)
2 points by onurkanbkrc 4 days ago | past | discuss
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters (arxiv.org)
1 point by onurkanbkrc 4 days ago | past | 1 comment
Faster and Cheaper Computations with Randomized Numerical Linear Algebra (arxiv.org)
2 points by PaulHoule 4 days ago | past | discuss
NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models (arxiv.org)
13 points by chrsw 4 days ago | past | discuss
Accelerating Scientific Research with Gemini: Case Studies and Common Techniques (arxiv.org)
1 point by danielmorozoff 4 days ago | past | discuss
Grok4 sabotages shutdown 97% of the time, even if instructed not in system prompt (arxiv.org)
8 points by agenticagent 4 days ago | past | 4 comments
Biases in the Blind Spot: Detecting What LLMs Fail to Mention (arxiv.org)
4 points by typeofhuman 5 days ago | past | 1 comment
ARM MTE Performance in Practice (Extended Version) (arxiv.org)
2 points by PaulHoule 5 days ago | past | discuss
Attention Sinks and Compression Valleys in LLMs (arxiv.org)
1 point by alexkranias 5 days ago | past | discuss
Data-driven modelling of autonomous and forced dynamical systems (arxiv.org)
1 point by mnky9800n 5 days ago | past | discuss
Misconduct in Post-Selections and Deep Learning (2024) (arxiv.org)
3 points by bjourne 5 days ago | past | discuss
The Hot Mess of AI: How Does Misalignment Scale with Model Intelligence (arxiv.org)
1 point by schmuhblaster 5 days ago | past | discuss
GRP-Obliteration: Unaligning LLMs with a Single Unlabeled Prompt [pdf] (arxiv.org)
2 points by janandonly 5 days ago | past | discuss
Harmless reward hacks generalize to shutdown evasion and dictatorship in GPT-4.1 (arxiv.org)
1 point by toliveistobuild 5 days ago | past | 1 comment
FullStack-Agent: Enhancing Agentic Full-Stack Web Coding (arxiv.org)
2 points by simonpure 6 days ago | past | discuss
Lightweight Memory Construction with Dynamic Evolution for LLM Agents (arxiv.org)
2 points by PaulHoule 6 days ago | past | discuss
Moltbook: Fast Response or Silence? (arxiv.org)
1 point by EagleEdge 6 days ago | past | discuss
Randomness in Agentic Evals (arxiv.org)
3 points by andre15silva 6 days ago | past | discuss
Large Language Model Reasoning Failures (arxiv.org)
1 point by mpweiher 6 days ago | past | discuss
We Should Separate Memorization from Copyright (arxiv.org)
1 point by 50kIters 6 days ago | past | discuss
