Marvin Labs
The Proprietary Research Myth: Where the Real Buyside AI Moat Lives

By Alex Hoffmann, Co-Founder and CEO (10 min read)

"We have proprietary research that no one else has access to."
(A good number of CEOs, VPs, and PMs over the last 10 years)

It's a comforting line. It tells incumbents that the 20 years of research notes, deal memos, and PM commentary sitting on their shared drive is a category-defining advantage no upstart can replicate. The AI version adds a fresh wrapper: that proprietary corpus is the training fuel that separates firms that "own" their AI from firms that "rent" it.

The wrapper is new. The claim is not. Variations of it have circulated in financial services for more than a decade, and none has held up.

The proprietary research corpus most firms sit on is worth less than they think, gets less defensible every year, and points away from where the actual buyside AI advantage lives.

A Moat Nobody Has Managed to Use

If proprietary research history were genuinely a moat, we would see it in the data. We don't.

The firms claiming "20 years of research notes is our advantage" are the same firms that lose entire investment frameworks the moment a senior PM leaves. They re-research the same names every cycle. Their analysts can't find prior work on a ticker without asking three colleagues. Whatever value sits in the historical corpus, it has not translated into measurable edge: not in lower analyst ramp times, not in better idea retention across PM turnover, not in returns.

For most firms, that research history has been theoretically valuable and operationally inert for a long time. The goldmine has been there for decades. So have the pickaxes. The gold has not come out.

The usual excuse is that the data was trapped in PDFs and email. That was a real problem in 2015. It is not one in 2026. Search and retrieval over unstructured internal content is now a solved category that costs about $50 per seat. Glean, plain vector search, and a dozen RAG-shaped tools have made finding the data straightforward. If a firm has not extracted useful structure from its own corpus by now, the constraint was never tooling.
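To make "plain vector search" concrete, here is a minimal sketch using only the Python standard library. It swaps in bag-of-words cosine similarity for the learned embeddings a production tool would use, and the note IDs and note text are invented for illustration. The point is only that the retrieval step itself is small.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text):
    """Represent a document as a sparse term-count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, notes, top_k=2):
    """Return the IDs of the top_k notes most similar to the query."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(body)), note_id) for note_id, body in notes.items()]
    # Drop notes with no overlap, then rank by score (ties broken by ID).
    scored = [(s, n) for s, n in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [note_id for _, note_id in scored[:top_k]]


# Hypothetical internal notes, invented for this example.
NOTES = {
    "2016-retail-memo": "Margin pressure at the retailer came from freight costs and markdowns.",
    "2019-software-primer": "Net revenue retention is the metric that matters for this software name.",
    "2021-pm-commentary": "We exited the retailer position after freight costs kept rising.",
}

results = search("retailer freight costs", NOTES)
print(results)
```

Finding the two retailer notes is the easy part. Deciding whether either of them is still worth reading is the part no retrieval layer does for you.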

The Real Question Is Value, Not Access

Once you can search the data, you have to ask what it's actually worth. Proprietary research output is a mix:

  • Ephemeral takes that aged out the day after publication.
  • Restated public information with light commentary on top.
  • Decision rationale that was context-dependent and often wrong with hindsight.
  • A small fraction of genuinely durable thinking about how to evaluate names in a sector.

Only the last category matters. The first three do not. They sit mixed together in the corpus, and the cost of separating signal from noise is high enough that most firms have never tried.

Even when you do separate it, what you get is a useful retrieval and grounding corpus. That is not a moat. An upstart with no proprietary data can ingest filings, earnings transcripts, sell-side research, and expert call libraries, then build a comparable pattern-matching layer on public sources. The proprietary premium is real, but it is a 10-20% lift over a well-built generic stack. A meaningful edge, not a different category of advantage.

Stop the Hype

Hype: "Our 20 years of proprietary research notes are an AI moat that upstarts can't replicate."

Reality: Most of that corpus is ephemeral takes, restated public information, and context-dependent rationale that didn't generalize. Firms have had this data for two decades and haven't turned it into edge. The constraint was never access. The value of the data over public sources is much smaller than incumbents want to believe.

Where the Real Moat Lives

The sharper version of the buyside AI argument has nothing to do with data. It's about process.

What separates a good fundamental shop from a mediocre one has always been the same set of things: a defined research process, a clear decision-making framework, quality standards for what counts as conviction, and discipline about how positions get sized and exited. None of this lives in the data. It lives in how analysts work.

The firms that benefit most from AI agents will be the ones that can encode this process into how their agents behave: an agent that mirrors how a senior analyst structures a primer, how a PM stress-tests a thesis, how the firm thinks about base rates in a sector. That is real edge, and it is hard to replicate precisely because the underlying process took years of practice to build.
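One way to picture "encoding process into agent behavior" is a sketch like the one below. It assumes the firm's process can be written down as ordered steps plus a conviction bar, and that the agent accepts plain-text instructions. The class names, the firm name, and the example steps are all hypothetical, and this is not any particular vendor's configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessStep:
    """One step in the research process, phrased as an instruction."""
    name: str
    instruction: str


@dataclass
class ResearchProcess:
    """A firm's way of working, captured as data rather than as a memo."""
    firm: str
    steps: list[ProcessStep] = field(default_factory=list)
    conviction_bar: list[str] = field(default_factory=list)

    def to_agent_instructions(self) -> str:
        """Render the process as plain-text instructions for an agent."""
        lines = [f"You follow the {self.firm} research process."]
        lines.append("Work through these steps in order:")
        for i, step in enumerate(self.steps, start=1):
            lines.append(f"{i}. {step.name}: {step.instruction}")
        lines.append("Do not state a conclusion unless all of these hold:")
        for item in self.conviction_bar:
            lines.append(f"- {item}")
        return "\n".join(lines)


# Hypothetical process content, invented for this example.
process = ResearchProcess(
    firm="Example Capital",
    steps=[
        ProcessStep("Primer", "Summarize the business model from the latest 10-K."),
        ProcessStep("Stress test", "Challenge the margin assumption with sector base rates."),
    ],
    conviction_bar=[
        "The thesis names what would prove it wrong.",
        "Management commentary is checked against reported numbers.",
    ],
)

instructions = process.to_agent_instructions()
print(instructions)
```

The code is trivial on purpose. What goes into `steps` and `conviction_bar` is the part that takes years of practice to get right, and that content is where the advantage sits.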

This is exactly where the broader enterprise evidence lands. MIT's GenAI Divide study found that 95% of corporate generative-AI pilots produced no measurable P&L impact, and that the binding constraint was not model quality or budget but enterprise integration: the tools that won were the ones that adapted to how people actually work, not the ones pointed at the biggest pile of internal data.

This kind of advantage has three useful properties:

  1. It's a present-tense asset. You don't need 20 years of accumulated notes. You need a coherent way of working now.
  2. It compounds with new tooling. Every model upgrade lifts firms with good process more than firms with messy process.
  3. It's hard to copy. Process maturity is an organizational capability. It can't be acquired or licensed the way a dataset can.

The hierarchy of buyside AI advantages, ranked by durability:

Rank | Source of advantage | Durability
1 | Process and decision frameworks encoded into agent behavior | Durable
2 | Workflow integration that fits how analysts actually work | Medium
3 | Proprietary data corpus for grounding | Modest, commoditizing
4 | Generic AI tooling | Table stakes by 2027

The "AI ownership" argument collapses all four rows into one and calls the whole thing a moat. That conflation is the trick. The real moat is only the first row.

Process Cannot Be Built Top-Down

This connects directly to our earlier piece on bottom-up AI adoption. If process is the moat, the question becomes how a firm encodes its process into agent behavior. And that's where the centralized rollout approach falls apart.

Process does not live in a 40-page document written by the Head of Research. It lives in how individual analysts actually work: how they read a 10-K on a name they've never covered, how they stress-test a margin assumption, how they decide when management commentary is worth taking at face value. None of this can be extracted in a planning meeting. It surfaces only when analysts use AI agents on their actual coverage and find out what holds up.

The firms that will encode process advantage into agent behavior are the ones that:

  1. Give analysts budget and time to experiment with agent configurations on their own names.
  2. Apply minimal vendor vetting. A lightweight security check, nothing more. Heavy procurement processes kill the iteration loop that surfaces process knowledge in the first place.
  3. Run a forum for sharing what works so effective patterns spread across desks and become firm-level practice.

That's the bottom-up framework: distributed experimentation, light governance, structured sharing. Firm-level process emerges from analyst-level iteration, not from a steering committee writing it down in advance.

The centralized "let's train an enterprise agent on our proprietary corpus" approach gets this backwards. It tries to capture process by pointing AI at the historical artifacts of process, hoping the underlying patterns can be reverse-engineered out of decade-old research notes. That's a far harder problem than letting analysts encode current process directly into how they use agents day to day.

What This Means in Practice

For a Head of Research, CIO, or Head of AI thinking about how to position a firm for the next two years, the implication is straightforward.

Stop optimizing for the wrong moat. The proprietary research argument is comforting because it makes the firm sound defensible. It isn't actionable. Even if it were correct, the path to extracting value from a historical corpus is years long and the payoff is modest.

Optimize for process velocity instead. The firms that pull ahead will be the ones whose analysts iterate fastest on agent configurations, decision templates, and research workflows. That requires budget, light vetting, and shared learning, not a centralized rollout plan. Most of the gains available in 2026 are workflow-level, and they accumulate through dozens of analyst-level experiments rather than one big project. McKinsey's State of AI work puts numbers on this: of all the changes tied to gen-AI value, fundamental workflow redesign correlates most strongly with bottom-line impact, yet only about a fifth of firms have actually done it. The advantage is sitting in the redesign almost nobody bothers with, not in the data everybody already has.

Treat vendor selection as a tooling decision, not a moat decision. The infrastructure layer for capturing and structuring proprietary context is becoming a commodity faster than the proprietary context itself can become an advantage. Buy the parts that are commoditizing. Put internal effort into the parts that aren't: how analysts work, what counts as quality, how the firm makes decisions.

Recognize that talent flows toward velocity. Top analysts and PMs want to work at firms that ship. They will not wait out an 18-month enterprise AI rollout while peers iterate through three generations of agent workflows. The talent argument is real. It just cuts in favor of bottom-up cultures, not top-down ones.

Why the Old Argument Keeps Coming Back

Every few years, financial services produces a new version of the "we have proprietary data" claim. In the early 2010s it was big data. In the late 2010s it was machine learning. By the early 2020s it was alternative data. Each time, the pitch was that incumbents would own the future because they held something proprietary that upstarts couldn't replicate. And each time the supposed edge diffused: by 2019 Coalition Greenwich found roughly half of investment managers already using alternative data and another quarter about to start, with the market consolidating around the same handful of large vendors everyone could buy from.

Each time, the firms that built sustained edge were the ones that invested in engineering culture, execution velocity, and disciplined process. The firms that talked about their data moats and didn't change how they worked got passed.

The "AI ownership" argument is the same trade in new packaging: a comfort blanket for firms that would rather not change how they operate. The winners over the next two to three years won't be the ones with the longest list of historical research notes. They'll be the ones whose analysts are encoding real process into agent behavior right now, through bottom-up experimentation and shared learning.

That's the work. The data was never the bottleneck.

by Alex Hoffmann

Alex is the co-founder and CEO of Marvin Labs. Prior to that, he spent five years in credit structuring and investments at Credit Suisse. He also spent six years as co-founder and CTO at TNX Logistics, which exited via a trade sale. In addition, Alex spent three years in special-situation investments at SIG-i Capital.
