Grant Stenger is the CEO of Kinetic. Richard Dewey is the managing partner of Revenant Ventures and CEO of Allometry Labs.
Almost exactly a year ago, DeepSeek’s R1 model was released to great fanfare around the world. Acknowledged, but somewhat overlooked, was the fact that the open-weight LLM didn’t come from an American AI lab or internet giant.
DeepSeek is owned and funded by High-Flyer, a Hangzhou-based quantitative hedge fund that reportedly built a roughly $14bn portfolio using AI-driven trading models before deciding, in 2023, to pivot its research towards “pursuing AGI”.
Why is a Chinese quant shop behind one of the world’s strongest open-weight LLMs? It turns out that modern quantitative investing and frontier AI labs are converging on the same institutional machine: large-scale learning systems attached to balance sheets.
One optimises trading revenues, while the other optimises ad or subscription revenues. Under the hood, they share a pipeline — data, model, constraints, execution, feedback — and increasingly a talent pool, a hardware stack, and strict IP norms.
This is not to imply AI labs are becoming hedge funds; the narrower claim is that the machine — data, models, constraints, deployment, feedback — is converging. Here’s a rundown of how, where and why quants and AI labs increasingly mirror each other, and what it might mean.
The quant shop as an AI lab — and vice versa
Walk into a modern systematic shop in New York, London or Shanghai and you don’t see much of the old Wall Street. No more coloured jackets, no shouting. It’s all hackathon shirts and mechanical keyboard clacking now. In fact, if you sub out the quarter zips and Matt Levine for NeurIPS swag and Dwarkesh it looks almost identical to an AI lab.
If you’re a quant, here’s what the pipeline looks like in concrete terms: your data is tick-by-tick prices, order-book updates, trades, corporate events, macro releases, alt-data. Your model is an alpha forecast — a conditional distribution of future returns given the current state. Your constraints are portfolio construction, risk limits, leverage caps, liquidity rules, mandate restrictions, and informal “don’t be the next Archegos” norms. Your execution is algorithms deciding what orders to send where, in an adversarial environment full of other algos. Your feedback is fills, realised PnL, risk reports, investor flows, drawdowns.
If you’re an AI lab: your data is scraped web text, code, books, proprietary documents, chat logs, user-generated content. Your model is the base LLM — parameters that predict the next token given the last N tokens. Your constraints are RLHF, safety layers, context-window limits, cost budgets, UX choices. Your execution is the tokens streamed into a chat window, an API client, or a chain of tools that eventually hits your CRM and bank account. Your feedback is what users do: ratings, completion rates, retention, churn, revenue, red-team reports.
In both cases the core technical job is actually identical. You approximate a latent conditional distribution and act on it under constraints. And in both cases you only find out if you’re any good out-of-sample — in backtests and live PnL for quants; on held-out benchmarks and real users for LLMs.
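To make the parallel concrete, here is a deliberately toy sketch of that shared loop in Python. Every function in it is a hypothetical placeholder; nothing here resembles a real desk’s or lab’s code. It only shows the shape: predict, constrain, act, record feedback.

```python
# A deliberately toy sketch of the shared pipeline: data -> model -> constraints
# -> execution -> feedback. All names below are hypothetical placeholders.
from dataclasses import dataclass, field

@dataclass
class Feedback:
    history: list = field(default_factory=list)   # realised PnL, or ratings/retention

def forecast(state):
    # Quant: conditional distribution of next-period returns given market state.
    # AI lab: distribution over the next token given the context.
    return {"mean": 0.0, "uncertainty": 1.0}       # placeholder prediction

def apply_constraints(prediction, limits):
    # Quant: risk limits, leverage caps, mandate rules.
    # AI lab: safety filters, cost budgets, context limits.
    raw_size = prediction["mean"] / max(prediction["uncertainty"], 1e-9)
    return max(-limits["max_position"], min(limits["max_position"], raw_size))

def execute(action):
    # Quant: send orders into an adversarial market.
    # AI lab: stream tokens to a user or an API client.
    return {"outcome": 0.0 * action}               # stub for the world's response

def run_loop(states, limits):
    fb = Feedback()
    for state in states:
        action = apply_constraints(forecast(state), limits)
        fb.history.append(execute(action)["outcome"])
    return fb

print(run_loop(states=range(3), limits={"max_position": 1.0}).history)
```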
Seen through that lens, DeepSeek stops being a curiosity. High-Flyer already knew how to build big GPU clusters and keep them busy; how to clean messy, adversarial data; how to run tight feedback loops between models and the real world. Pointing that stack at next-token prediction instead of next-price prediction is an incremental step, not a reinvention.
High-Flyer’s first deep-learning trades reportedly went live in 2016; by 2017, most of its trading was AI-driven. Building a general-purpose LLM is, from that perspective, just shipping a different product off the same line.
The data ouroboros
The early adopters have a subtle but crucial advantage when it comes to data: their models were trained on something closer to ground truth — and that ground truth is getting harder to find.
Early quants in the late 1980s and 1990s trained models on markets dominated by humans trading against humans. The price series reflected durable patterns in human behaviour, not a zoo of interacting algorithms. Veterans of the first quant wave will tell you it felt like you were learning something about how real humans trade, not how automated trading systems behave.
Early LLM builders had a similar gift. The pre-2018 web was messy, but most of it was written by humans. Mining it gave you language, facts, code, and culture that came from noisy reality, not from earlier generations of models.
Fast-forward to today, and markets are saturated with algorithmic systems reacting to each other. Some modern signals are just feedback loops between models. Meanwhile, the web is filling up with AI-generated slop: SEO spam, synthetic news, boilerplate product descriptions. Train on the raw fire hose and you risk doing the machine-learning equivalent of photocopying a photocopy.
The structural problem is identical. The obvious, easily scraped data is increasingly contaminated by one’s own actions and by competitors’ models. The marginal value lies in careful curation and in interaction data only you can see.
Not all data is created equal. Quants care about periods where the signal-to-noise ratio spikes — after the open or during crises. AI labs increasingly focus on identifying and reusing high-quality tokens and on more advanced filtering techniques. DeepSeek’s reported edge lives exactly here: distilling a much larger model by carefully selecting a relatively small pool of question-and-answer pairs, and recovering a surprising amount of performance with far less training.
This has sparked a race for not only more data, but higher-quality data. Scale AI orchestrates mass data labelling for model training by paying humans to provide feedback and labels. Reddit is licensing its fire hose of posts and comments, and some observers credit xAI’s Grok with quickly catching up, in part thanks to access to data from X (formerly Twitter). In finance, the same logic shows up as a willingness to pay up for long, well-curated histories and datasets that no one else has.
Purely observational price data isn’t enough. When a firm actually trades, it learns how the market responds to its impact — where liquidity is real, where it’s fake, when other players back away. That feedback is exclusive, action-conditioned data. Midjourney learning from which generated images users pick is the same story in pixels. It’s data that only exists because your system is in the loop — and it becomes more valuable as the broader environment becomes more model-polluted.
Even the data-labelling layer has quant fingerprints. Scale AI’s founder, Alexandr Wang, previously worked at Hudson River Trading, and Surge AI’s founder, Edwin Chen, previously did algorithmic trading at Clarium Capital, Peter Thiel’s hedge fund. The lineage actually makes sense. If you come from a world where edge is downstream of data quality and feedback loops, you eventually get tempted to build the factory that produces the training signal in the first place.
[Chart: Authors’ estimates of AI intensity]
The bitter lesson arrives on the trading floor
For most of its history, quantitative finance has been reasonably sceptical of big neural networks, and for good reason.
Markets are noisy; the signal is scarce; regimes shift without warning; latency budgets in HFT are tight; interpretability matters. The default stack at most shops was therefore linear models, carefully controlled trees, and a lot of handcrafted features built on domain-specific intuition.
That scepticism isn’t wrong, exactly. It’s just increasingly confined to a shrinking corner of the business. In ultra-fast market-making, you genuinely don’t have the latency budget for a transformer, and plenty of PnL still comes from relatively simple models. Many attempts to spray deep learning at markets have died quietly in backtest hell.
But the old equilibrium is fading. The best-known “AI-first” shops — G-Research, Quadrature, XTX, and HRT — now pitch themselves as machine learning research houses that happen to plug into markets.
In practice that means ML people in hoodies staring at order-book tensors and alt-data embeddings, not old-school econometricians coaxing another t-stat out of book-to-price. HRT’s returns have begun attracting attention, while the whisper numbers of Quadrature, G-Research, and other firms are eye-watering. The old guard has been moving more towards an AI-based approach for years, but the success of insurgents has accelerated that process.
In the US, several large systematic managers and prop trading firms have been quietly rolling out convolutional networks and transformers over limit-order-book data and alternative-data streams. The task is similar to a language model: take in a long token sequence of orders, trades, and cancels, compress it into some representation, and forecast the next few steps.
People who move between “classical” and “neural” teams tend to tell the same story. The nets mostly rediscovered the signals that the old linear crowd already had — order-book imbalance, flow patterns, obvious cross-sectional relationships — but with more ability to pick up rare interaction effects, less manual feature engineering, and more headaches about compute, deployment and interpretability.
But on longer horizons — minutes to days to months — the microsecond-latency objection fades. If you can afford a few milliseconds or more, it gets harder to argue that a handcrafted linear factor model will systematically beat a well-trained deep learning model fed the same data. The bitter lesson applies: general methods that scale with data and compute eventually beat clever, domain-specific hacks.
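For illustration only, here is a minimal sketch of the kind of sequence model described above: order-book events treated like tokens, fed through a small transformer encoder, with a head that forecasts a short-horizon return. The event vocabulary, dimensions and forecasting head are invented, not any firm’s architecture.

```python
import torch
import torch.nn as nn

class OrderBookTransformer(nn.Module):
    """Toy sketch: treat order-book events (adds, trades, cancels) like tokens
    and forecast a short-horizon return. Sizes and vocabulary are made up."""
    def __init__(self, n_event_types=64, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(n_event_types, d_model)        # event-type "tokens"
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 1)                         # next-step return forecast

    def forward(self, event_ids):
        # event_ids: (batch, sequence_length) integer-coded book events
        h = self.encoder(self.embed(event_ids))
        return self.head(h[:, -1, :])                             # forecast from the last state

# Usage: a batch of 8 sequences, each 512 events long (random placeholder data).
model = OrderBookTransformer()
events = torch.randint(0, 64, (8, 512))
print(model(events).shape)   # torch.Size([8, 1])
```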
[Chart: Authors’ estimates for the usual trade duration]
Constraints and optimisation
After the data comes the raw predictor. For quants, this is a supervised model that spits out a forecast of next-period returns, volatility or flows. For AI labs, it’s a base LLM trained on next-token prediction.
The third step is the constraints layer, where a disproportionate amount of practical value sits. A raw alpha forecast isn’t a portfolio. A raw LLM isn’t a product. The art is in translating noisy predictions into actions under various constraints.
In finance these are things like portfolio construction, risk limits, capital and liquidity constraints, regulatory rules, mandate restrictions and “no-go” lists. Entire careers and textbooks exist to remind people that a beautifully predictive factor can be turned into a terrible portfolio with the wrong constraints.
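As a minimal sketch of that translation step, assuming a toy five-asset book, invented alpha and risk numbers, and cvxpy as one possible optimisation tool:

```python
import cvxpy as cp
import numpy as np

# Toy inputs: an alpha forecast and a diagonal risk model for five assets (made up).
alpha = np.array([0.02, -0.01, 0.015, 0.0, -0.005])        # expected returns
sigma = np.diag([0.04, 0.03, 0.05, 0.02, 0.03])            # toy covariance
risk_aversion = 5.0

w = cp.Variable(5)                                          # portfolio weights
objective = cp.Maximize(alpha @ w - risk_aversion * cp.quad_form(w, sigma))
constraints = [
    cp.sum(w) == 0,          # dollar-neutral mandate
    cp.norm(w, 1) <= 2.0,    # gross leverage cap
    cp.abs(w) <= 0.5,        # single-name position limit
]
cp.Problem(objective, constraints).solve()
print(np.round(w.value, 3))  # the forecast only becomes a portfolio here
```

Tighten the leverage cap or the position limits and the same alpha produces a very different, sometimes much worse, portfolio, which is the whole point of the constraints layer.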
In AI the constraints are typically imposed by reinforcement learning from human feedback, safety filters, content rules, context-window and cost budgets, product-specific fine-tunes, evaluation dashboards and red-team reports.
A risk committee deciding how much tail risk to tolerate looks a lot like a safety board deciding how much jailbreak risk is acceptable. Both sit between modellers and the outside world. Both can veto clever ideas that blow up the balance sheet or the brand. When this goes wrong, the result is an Amaranth or an Archegos in finance — or an ill-fated model launch like Google’s Gemini rollout in 2024, which sparked widespread backlash.
Execution: microseconds versus tokens per second
Execution is where the physics shows up. In low-latency trading you live in microseconds. You co-locate servers next to exchange engines, shave nanoseconds off network hops, and count cache misses. The question isn’t “can we fit a more expressive model?” but “how much thinking can we afford in single-digit microseconds?”
In those regimes, classical methods survive because compute is extremely constrained, interpretability matters when billions of dollars are on the line, and robustness beats another third decimal place of predictive accuracy.
AI labs live with different but related constraints. They think in tokens per second rather than microseconds, but they still juggle latency, quality, and cost. Run a giant model for every request and your users get bored while your cloud bill explodes. Run only tiny models and your outputs are mediocre.
For AI labs, the biggest models are trained offline and only occasionally called on for hard queries. Smaller, cheaper models handle most chat and API traffic, with smart routing deciding when to escalate. For quants, heavy models run offline for feature learning, scenario analysis, and risk aggregation; lightweight models and simple formulas live in the trading loops, often on field-programmable gate arrays or custom silicon next to the matching engine.
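A hedged sketch of what that routing decision can look like; the difficulty heuristic, tier names, costs and latencies below are all invented for illustration:

```python
# Toy model router: cheap model by default, escalate hard queries to the big one.
from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    cost_per_1k_tokens: float   # dollars, made-up numbers
    latency_ms: float           # typical time-to-first-token, made-up

SMALL = ModelTier("small-chat", 0.0002, 150)
LARGE = ModelTier("frontier", 0.01, 900)

def estimated_difficulty(prompt: str) -> float:
    # Stand-in heuristic: long prompts, code and maths smell "hard".
    score = min(len(prompt) / 2000, 1.0)
    if any(tok in prompt for tok in ("```", "prove", "derive", "refactor")):
        score += 0.5
    return min(score, 1.0)

def route(prompt: str, latency_budget_ms: float) -> ModelTier:
    hard = estimated_difficulty(prompt) > 0.6
    if hard and LARGE.latency_ms <= latency_budget_ms:
        return LARGE                  # worth the cost and the wait
    return SMALL                      # most traffic stays on the cheap path

print(route("What's the capital of France?", 500).name)           # small-chat
print(route("Derive the law of large numbers. " * 50, 2000).name)  # frontier
```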
The old race was shortening fibre routes between Chicago and New York. The new race is how much “intelligence” you can cram under a fixed latency and power envelope. Whether you’re trying to hit a price before someone else or render an answer before the user taps away, it’s the same problem.
A shared stack
Under the hood, the similarity runs deeper than “everyone uses ML now.” Both sides are settling on a similar three-layer stack. Here’s what it looks like.
The bottom: Big models that learn representations. For quants, these are deep networks and sequence models — sometimes transformers — trained on gigantic histories of order books and alt-data to learn good features. For AI labs, it means frontier-scale LLMs and multimodal models trained on the internet plus proprietary corpora.
The middle: Smaller models that actually make most decisions. Quants distil heavy models into lightweight predictors, or even simple formulas, that operate on learned features and fit inside tight latency and power budgets. Meanwhile, AI labs distil frontier models into smaller variants and specialised fine-tunes that power most user-facing traffic.
On top: Something like reinforcement learning or online learning. In markets, this means adjusting aggressiveness, routing, inventory, and risk in response to live feedback. In AI products, it’s RLHF (again), bandit algorithms, and A/B tests that adjust policies based on user engagement, safety incidents, and revenue.
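As one toy illustration of that top layer, here is a minimal epsilon-greedy bandit that shifts traffic between two hypothetical model variants according to whatever reward signal you feed it (thumbs-up rate, conversion, PnL); the variants and reward probabilities are invented.

```python
import random

class EpsilonGreedyRouter:
    """Toy top-layer policy: shift traffic towards whichever variant earns
    better feedback. Variants and rewards here are purely illustrative."""
    def __init__(self, variants, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {v: 0 for v in variants}
        self.value = {v: 0.0 for v in variants}    # running mean reward per variant

    def choose(self):
        if random.random() < self.epsilon:          # explore occasionally
            return random.choice(list(self.counts))
        return max(self.value, key=self.value.get)  # otherwise exploit the leader

    def update(self, variant, reward):
        self.counts[variant] += 1
        n = self.counts[variant]
        self.value[variant] += (reward - self.value[variant]) / n

router = EpsilonGreedyRouter(["model-a", "model-b"])
for _ in range(1_000):
    v = router.choose()
    # Pretend model-b gets thumbs-up 55% of the time, model-a 45% (made up).
    reward = 1.0 if random.random() < (0.55 if v == "model-b" else 0.45) else 0.0
    router.update(v, reward)
print(router.value)   # traffic should tilt towards model-b over time
```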
This stack resolves the tension between the bitter-lesson crowd and the linear-model crowd: general methods that scale win at the representation layer, while simple, interpretable models keep making the final, latency-bound decisions.
Other fields have ended up in the same place. Medical imaging still has radiologists. Weather models still have physicists. Finance is likely to land there too. So, probably, will AI infrastructure itself: big general models underneath, and more boring linear algebra and business logic on top.
One layer down from all this and we find that both tribes increasingly live on the same hardware stack: Nvidia accelerators, high-bandwidth memory, fast interconnects, and annoyed engineers spending an unreasonable amount of time debugging CUDA. In 2023 the bottleneck was GPUs. By 2025 it looks more like electricity.
In the old world, the gatekeepers were exchanges and prime brokers. In the new one, the gatekeepers look more like utilities, data centre landlords, and whoever can reliably deliver megawatts at tolerable prices.
The binding constraint is no longer capital alone, but deliverable power: interconnection queues, capacity charges, and the ability to build on time.
Dark arts, non-competes, and the end of ‘Open AI’
Zoom out from racks and routers to the information surface and another mirror appears.
Publicly, everyone reads the same material. Quants have Grinold & Kahn and Gappy Paleologo’s books. AI labs have “Attention Is All You Need,” semi-redacted GPT-4 reports and the usual NeurIPS/ICLR rumour circuit.
But in both worlds the real edge lives in things that never hit arXiv: which sliver of data you actually trust; the odd combinations of regularisation, learning rates, and curriculum schedules that converge; the deployment hacks that make the system stable at scale.
Quant finance has always been explicit about this. The famous unwritten rule at Renaissance Technologies was that they should never publish anything even vaguely useful — including failed experiments. When a couple of RenTech researchers left for Millennium in 2003, a New York court enforced their non-competes in nuclear fashion. New York law, plus the pod-shop ecosystem, treats models and data pipelines as core trade secrets.
AI started from the opposite end, with California academia and open-source culture. For years, publishing your best ideas was how you hired people. That changed fast once the models got expensive.
GPT-4’s technical report declined to specify model size, training data, or compute. Analyses of foundation-model “transparency” routinely give the frontier labs low marks on disclosure. Commercial deals now include sweeping IP language about training data, model outputs, and even fine-tuned artefacts. “Open research” has become more empty marketing than a true operating principle.
The unfortunate reality is that these dark arts are critical. And while the number of people in each field is in the thousands, the researchers who can truly advance a system probably number in the high double digits or low hundreds.
As a result, AI labs are importing quant-style opacity faster than quants are importing AI-style openness. Jim Simons famously pulled early RenTech staff from Berkeley back to Setauket because leaving key talent in California was “bad for business.” Today’s AI founders, watching rivals hire half their top researchers, may sympathise with Simons more than they’d like.
For an individual deciding between an AI job and a quant job, the trade-off increasingly isn’t what you do. It’s where your secrets are protected and how long you’re locked up.
Will AI eat quants before quants eat AI?
Put this all together and a quant fund and an AI lab in 2025 look less like different industries and more like variants of the same business: own unique data, train big models, wrap them in constraints, deploy under harsh physical limits, harvest the feedback, and try not to talk publicly about it.
The talent flow already tells the story. HRT’s longtime HR chief Julia Villagra went to OpenAI (before leaving last summer). Perplexity co-founder Johnny Ho spent five years at Tower Research, while DeepMind interpretability researcher Callum McDougall interned at Jane Street and IMC. OpenAI research head Mark Chen and high-profile researcher Noam Brown also spent time in quant research roles. HRT’s AI head Iain Dunning was previously a senior researcher at Google’s DeepMind. And as we mentioned, Scale AI’s Wang — now chief AI officer at Meta — was at HRT before.
The ties are multi-faceted. Jane Street alumnus Sam Bankman-Fried provided some of the initial funding to Anthropic; more recently, Jane Street has been involved in the AI giant’s subsequent funding rounds. DRW founder Don Wilson, a pioneer of quantitative options trading, is behind Compute Exchange, an effort to trade GPUs on an exchange. Others want to financialise not only GPUs but also the contracts that underlie data centre usage.
X is rife with rumours and speculation that gesture towards this crossover, with whispers that some big AI labs have been dabbling with trading. Earlier this month Elon Musk even subtweeted AI performance on a stock trading competition.
Some of the sociological crossover is already under way. The MLOps engineer at a frontier lab who spends evenings tweeting about scaled-up agents is suddenly doing a roadshow down Park Avenue, pitching KKR-style firms on “automating your analyst class.” Meanwhile his quant friend from math camp is spending more time than he’d like on calls with Redmond about early-access Copilot features.
This is shaping up to be the year when many of these developments come to a head. The AI labs will probably bloody each other in the fight for attention and revenue. This will come into sharp focus for the independent labs that are reliant on private market funding and partnerships.
On the quant side there is an open question as to how traditional quant shops will respond to the strange and widely talked-about drawdowns that left some firms nursing losses or underperformance last year. It’s also likely that there will be more clarity on how the storied quant firms will respond to the newer AI-driven shops.
Finance has the world’s harshest scoreboard. P&L is a powerful evaluation — it tells you, with brutal clarity, whether your model is any good. Benchmarks can be gamed. Revenue can be subsidised. Drawdowns cannot. The institutions will keep converging, because the constraints are converging: power you can secure, data you can defend, governance you can enforce. Different objectives. Same machine.
With this lens DeepSeek reads less like a quirky headline and more like a prototype: a balance sheet that turned compute into decisions, and then found multiple places to deploy that capability.