Hacker News
A Survey of In-Context Reinforcement Learning (arxiv.org)
2 points by handfuloflight 58 minutes ago | past | discuss
Soft Contamination Means Benchmarks Test Shallow Generalization (arxiv.org)
1 point by cjbarber 2 hours ago | past | 1 comment
Study: Self-generated Agent Skills are useless (arxiv.org)
210 points by mustaphah 3 hours ago | past | 101 comments
Virtual Width Networks (VWN) (arxiv.org)
5 points by tesserato 4 hours ago | past | 1 comment
CodeLogician: Neuro-symbolic reasoning for precise software analysis (arxiv.org)
1 point by NTCTech 7 hours ago | past | 1 comment
Intelligent AI Delegation (2026) (arxiv.org)
1 point by Nydhal 7 hours ago | past | discuss
Delegated Agent Authorization Constrained to Semantic Task-to-Scope Matching (arxiv.org)
1 point by mooreds 8 hours ago | past | discuss
Evaluating AGENTS.md: are they helpful for coding agents? (arxiv.org)
25 points by mustaphah 12 hours ago | past | 5 comments
Multi-Agent Teams Hold Experts Back (arxiv.org)
1 point by fauigerzigerk 1 day ago | past | discuss
Large Language Model Reasoning Failures (arxiv.org)
1 point by kawera 1 day ago | past | discuss
Towards Autonomous Mathematics Research (arxiv.org)
104 points by gmays 1 day ago | past | 55 comments
Retrieval-Aware Distillation for Transformer-SSM Hybrids (arxiv.org)
2 points by readitalready 1 day ago | past | discuss
Biases in the Blind Spot: Detecting What LLMs Fail to Mention (arxiv.org)
2 points by mpweiher 2 days ago | past | discuss
A Framework for Time-Updating Probabilistic Forecasts (arxiv.org)
6 points by Luc 2 days ago | past | discuss
Towards Autonomous Mathematics Research (Google DeepMind) (arxiv.org)
1 point by u1hcw9nx 2 days ago | past | discuss
Remote Labor Index: Measuring AI Automation of Remote Work (arxiv.org)
2 points by Leynos 3 days ago | past | discuss
Generalized on-policy distillation with reward extrapolation (arxiv.org)
3 points by fzliu 3 days ago | past | discuss
OpenAI model proposes and proves Physics result (arxiv.org)
1 point by KothuRoti 3 days ago | past | discuss
An API for Biological Neural Networks (arxiv.org)
1 point by bwjx 3 days ago | past | discuss
Adversarial Patch: images that make classifiers ignore other items in a scene (arxiv.org)
1 point by felineflock 3 days ago | past | discuss
Maximum Agreement Linear Predictor (MALP) (arxiv.org)
1 point by tesserato 3 days ago | past | 1 comment
Standardized and In-Depth Benchmarking of Post-Moore Dataflow AI Accelerators (arxiv.org)
1 point by PaulHoule 3 days ago | past | discuss
Fine-Tuning GPT-5 for GPU Kernel Generation (arxiv.org)
4 points by matt_d 3 days ago | past | discuss
SWE-ContextBench: context learning benchmark in coding (arxiv.org)
1 point by mustaphah 3 days ago | past | discuss
LLMs exceed physicians on complex text-based differential diagnosis (arxiv.org)
3 points by rippeltippel 3 days ago | past | 2 comments
Horus: A Protocol For Trustless Verification Under Uncertainty (arxiv.org)
1 point by optimalsolver 3 days ago | past | discuss
Learning to Reason in 13 Parameters (arxiv.org)
2 points by stared 3 days ago | past | discuss
LLM Reasoning Failures (arxiv.org)
1 point by gradus_ad 3 days ago | past | discuss
Defining causal mechanism in dual process theory and 2 types of feedback control (arxiv.org)
1 point by s6i 3 days ago | past | discuss
Routing LLM queries using internal success predictions (70% cost reduction) (arxiv.org)
1 point by stansApprentice 4 days ago | past | 2 comments
