frontiermath news - Search News

New secret math benchmark stumps AI models and PhDs alike

FrontierMath's performance results, revealed in a preprint research paper, paint a stark picture of current AI model ...

1d on MSN

A new math benchmark just dropped and leading AI models can solve 'less than 2%' of its problems... oh dear

Sometimes I forget there's a whole other world out there where AI models aren't just used for basic tasks such as simple ...

AI’s math problem: FrontierMath benchmark shows how far technology still has to go

FrontierMath, a new benchmark from Epoch AI, challenges advanced AI systems with complex math problems, revealing how far AI still has to go before achieving true human-level reasoning.

2d on MSN

Testing AI systems on hard math problems shows they still perform very poorly

A team of AI researchers and mathematicians affiliated with several institutions in the U.S. and the U.K. has developed a ...

Epoch AI Launches FrontierMath AI Benchmark to Test Capabilities of AI Models

Epoch AI highlighted that to measure AI's aptitude, benchmarks should be created on creative problem-solving where the AI has ...

Sam Altman claims AGI is coming in 2025 and machines will be able to 'think like humans' when it happens

AGI is a form of AI that is as capable as, if not more capable than, all humans across almost all areas of intelligence. It has been the ‘holy grail’ for every major AI lab, and many predicted it ...

OpenAI, Microsoft, Meta Advance New AI Tests As Transparency Concerns Grow

Tech giants struggle to evaluate AI progress and advancements, raising concerns about transparency and standardized ...

Digital Information World 43m

Study: Major Companies Lag in Reporting Phishing Scams Using Their Brand Names

According to new research by Drexel University and Arizona State University presented at the International Symposium on ...

AI groups rush to redesign model testing and create new benchmarks

Companies conduct “evaluations” of AI models by teams of staff and outside researchers. These are standardised tests, known as benchmarks, that assess models’ abilities and the performance of ...

Analytics India Magazine 3d

OpenAI is So Doomed if Inference Time Scaling for o1 Fails

OpenAI’s progress from GPT-4 to Orion has slowed, The Information reported recently. According to the report, although OpenAI ...

The New York Times 19h

New York Times - Top Stories

Trump Taps Matt Gaetz for Attorney General, a Provocative Move. President-elect Trump plans to nominate the Florida congressman, among a flurry of personnel announcements as Republicans neared ...

The New York Times 1d

World News

The Biden administration gave Israel 30 days to increase the flow of aid, warning that aid shipments into Gaza in September had reached an alarmingly low level. By Liam Stack and Aaron Boxerman ...
