NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents
Overview
Large language models are increasingly employed for scientific law discovery, yet existing benchmarks fail to capture the dynamic, interactive nature of genuine research in complex physical systems across domains. The authors introduce Newto...
paperium.hashnode.dev (2 min read)