OpenAI has announced a new benchmark that pits its GPT-5 model, which powers ChatGPT, directly against human professionals on a variety of real-world tasks.
The new benchmark, called GDPval, measures how close ChatGPT comes to matching or outperforming human experts at “economically valuable, real-world tasks”. That means moving beyond academic tests and coding competitions toward work carried out in the real world, such as nursing, financial management, engineering or journalism.
This is all part of OpenAI’s push toward artificial general intelligence (AGI), and the company notes that its GPT-5 model and Anthropic’s Claude Opus 4.1 “are already approaching the quality of work produced by industry experts.”
A graph showing how various AI models compare against human experts in each industry. (Image credit: OpenAI)
In a blog post introducing the benchmark, OpenAI wrote: “Unlike traditional benchmarks, GDPval tasks are not simple text prompts.
“They come with reference files and context, and the expected deliverables span documents, slides, diagrams, spreadsheets, and multimedia. This realism makes GDPval a more realistic test of how models might support professionals.”
“The GDPval full set includes 1,320 specialized tasks (220 in the gold open-sourced set), each meticulously crafted and vetted by experienced professionals with over 14 years of experience on average from these fields. Every task is based on real work products, such as a legal brief, an engineering blueprint, a customer support conversation, or a nursing care plan.”
What jobs is OpenAI testing ChatGPT against?
The tasks cover 44 occupations across nine industries. Here’s the full list:
Real estate, rental and leasing: Concierges; Property, real estate, and community association managers; Real estate sales agents; Real estate brokers; Counter and rental clerks
Government: Recreation workers; Compliance officers; First-line supervisors of police and detectives; Administrative services managers; Child, family, and school social workers
Manufacturing: Mechanical engineers; Industrial engineers; Buyers and purchasing agents; Shipping, receiving, and inventory clerks; First-line supervisors of production and operating workers
Professional, scientific, and technical services: Software developers; Lawyers; Accountants and auditors; Computer and information systems managers; Project management specialists
Health and social care: Registered nurses; Nurse practitioners; Medical and health services managers; First-line supervisors of office and administrative support workers; Medical secretaries and administrative assistants
Finance and insurance: Customer service representatives; Financial and investment analysts; Financial managers; Personal financial advisors; Securities, commodities and financial services sales agents
Retail: Pharmacists; First-line supervisors of retail sales workers; General and operations managers; Private detectives and investigators
Wholesale trade: Sales managers; Order clerks; First-line supervisors of non-retail sales workers; Sales representatives, wholesale and manufacturing, except technical and scientific products; Sales representatives, wholesale and manufacturing, technical and scientific products
Media: Audio and video technicians; Producers and directors; News analysts, reporters, and journalists; Film and video editors; Editors

So, will AI take my job?
It’s the $64,000 question, and the answer is probably yes, or at least AI will take over some part of your job. Still, OpenAI itself notes that GDPval is an “early step that doesn’t reflect the full nuance of many economic tasks.”
Additionally, while the test “spans 44 occupations and hundreds of knowledge work tasks, it is limited to one-shot evaluations, so it doesn’t capture cases where a model would need to build context or improve through multiple drafts.”
There’s still a long way to go, and a recent study claimed that ChatGPT still routinely gets things wrong. But OpenAI is pushing hard toward AGI and says future versions of GDPval will extend to more interactive workflows and context-rich tasks to “better reflect the complexity of real-world knowledge work”.
That AI will reshape the working landscape is pretty much a foregone conclusion at this point. But how it’s integrated into society is still very much in human hands, from business leaders to customers. It’s equally certain that there will always be work for humans to do, but that work is likely to look very different in the decades to come.
Follow Tom’s Guide on Google News and add us as a preferred source to get our up-to-date news, analysis, and reviews in your feeds. Make sure to click the Follow button!