AI World's 10 Notable Research Papers

Academic Productivity | Can GenAI Improve Academic Performance? Evidence from the Social and Behavioral Sciences
GenAI adoption increases research productivity significantly, with early-career researchers and non-English speakers benefiting most.
Political Bias | Defining and Evaluating Political Bias in LLMs
Less than 0.01% of ChatGPT responses show political bias; GPT-5 models reduced bias by 30% versus GPT-4o.
Security | Poisoning Attacks Require Near-Constant Samples Regardless of Model Size
Just 250 malicious documents were enough to backdoor LLMs across the sizes tested (600M to 13B parameters), challenging the assumption that larger models are harder to poison.
Consumer Research | LLMs Reproduce Human Purchase Intent via Semantic Similarity
LLMs achieve 90% of human test-retest reliability while maintaining realistic response distributions for consumer research.
Agent Architecture | AgentFlow: In-the-Flow Optimization for Planning and Tool Use
A 7B-parameter AgentFlow model surpasses GPT-4o, with reported gains of 14.9% on search, 14.0% on agentic, and 14.5% on math tasks.
Single-Cell AI | Scaling Large Language Models for Next-Generation Single-Cell Analysis
Researchers trained large language models on over one billion tokens of transcriptomic data using the Cell2Sentence framework.
Economic Policy | Preparing for AI's Economic Impact: Exploring Policy Responses
Anthropic groups potential policy responses to AI's economic disruption into three tiers, tied to the pace and intensity of change.
Compute Scaling | The Art of Scaling Reinforcement Learning Compute for LLMs
The first extensive investigation into RL compute scaling, spanning more than 400,000 GPU-hours of experiments to understand how algorithmic choices affect scaling.
Model Monitoring | MALT: A Manually-Reviewed Dataset for Detecting Reward Hacking and Sandbagging in LLMs
Simple monitoring systems can identify reward hacking and sandbagging with 80-90% accuracy at a 5% false positive rate.
Agent Evaluation | Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation
HAL orchestrates large-scale parallel evaluations across 21,000+ agent rollouts, revealing unexpected behaviors in AI systems.