Beyond Prompt Engineering: The Case for Structured AI in Equity Research

James Yerkess, Senior Strategic Advisor

Over the past year, prompt engineering has been treated as a skill in its own right. Entire workflows have been built around crafting the "right" question for a general-purpose AI model and refining it until the output looks plausible.

In equity research, this approach feels familiar. Analysts are resourceful. Given a new tool, they will find a way to make it useful. But usefulness is not the same as reliability, and in research, the gap matters.

This article builds on our earlier discussion of why prompt engineering fails for investment analysis and explores why prompting alone is not a research process. That distinction becomes increasingly important during periods of market stress or heavy reporting cycles.

Prompting Shifts Risk Onto the Analyst

At first glance, prompting feels empowering. Analysts can tailor questions to their sector, their coverage universe, and their own mental models. In practice, it quietly transfers methodological risk from the system to the individual.

Two analysts running the same earnings call through a model with slightly different wording can receive materially different emphasis. One output highlights revenue growth. Another foregrounds margin pressure. Neither is technically wrong, but the two are not reliably comparable.

That variability is manageable for exploratory work. It becomes problematic when outputs are used in team settings, shared internally, or incorporated into client-facing materials. The burden of reconciliation falls back on the analyst, eroding the time savings that AI was meant to provide.

During earnings season, this compounds quickly. When coverage spans 15 to 20 companies in a single week, tools that obscure the line between summary and insight amplify noise rather than reduce it.

Why Research Depends on Repeatability

Equity research is cumulative. Analysts track companies across quarters and years, compare guidance changes over time, and build conviction through patterns rather than one-off observations.

Prompt-driven workflows break this continuity. Each interaction starts from scratch. Structure must be re-imposed every time, either through longer prompts or manual correction. Over time, this creates inconsistency not only across users but across reporting periods.

A research process that cannot reliably reproduce its own outputs under similar conditions is fragile by design. In a global coverage context, where regulatory regimes and data formats vary widely across jurisdictions, that fragility compounds.

Stop the Hype

Hype: "With the right prompts, any analyst can turn ChatGPT into a research tool."

Reality: Prompt variability means two analysts covering the same company can get materially different outputs. That is not a research process. It is ad hoc extraction that depends on individual technique rather than a repeatable framework.

Structure Is Not a Constraint

There is a tendency to view structure as limiting, as something that constrains creativity or judgment. In practice, structure is what allows judgment to be exercised meaningfully.

When financial metrics, guidance language, and management commentary are consistently identified and organized, analysts can focus on interpretation rather than extraction. The creative work happens where it should: in assessing implications, not in reconstructing basic facts.

Specialist systems embed this structure upfront. They are designed around how research is actually conducted, not around conversational flexibility.
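
To make that concrete, here is a minimal sketch of what "structure upfront" might look like: a fixed schema the system always fills, rather than free-form prose. The field names and shapes are illustrative assumptions, not Marvin Labs' actual data model.

```python
# A sketch of "structure upfront": every call is extracted into the same
# fixed shape, so outputs are comparable across analysts and quarters by
# construction. Field names are illustrative assumptions only.
from dataclasses import dataclass, field

@dataclass
class EarningsCallRecord:
    company: str
    period: str                                    # e.g. "Q2 2025"
    reported_metrics: dict[str, float] = field(default_factory=dict)
    guidance_statements: list[str] = field(default_factory=list)
    management_commentary: list[str] = field(default_factory=list)

record = EarningsCallRecord(
    company="ACME Corp",
    period="Q2 2025",
    reported_metrics={"revenue_usd_m": 512.0, "gross_margin_pct": 41.3},
    guidance_statements=["FY25 revenue guided to $2.1-2.2bn"],
    management_commentary=["Input-cost pressure on margins expected to ease"],
)
print(record)
```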

The Difference Between Summaries and Insights

A generic AI summary can be fluent and comprehensive without being particularly useful. It may restate what was said without clarifying what changed, what matters, or how it compares to prior periods.

Research value comes from differentiation. What moved. What did not. And why. Without embedded context, prompting struggles to make these distinctions consistently.

This is not a failure of intelligence, but of design intent.

Tools like Guidance Tracking, which monitors whether management guidance is met, missed, or exceeded, offer a new lens for assessing management discipline: one of the most under-analyzed levers in long-term stock performance.
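
As a rough illustration of the underlying idea (not Marvin Labs' implementation), guidance tracking amounts to comparing each reported figure against what was guided and classifying the result; the tolerance and data shapes below are assumptions for the sketch.

```python
# A sketch of the guidance-tracking idea: classify each reported metric
# against prior guidance. Tolerance and data shapes are illustrative.
from dataclasses import dataclass

@dataclass
class GuidanceCheck:
    metric: str
    guided: float   # e.g. midpoint of the guided range
    actual: float

def classify(check: GuidanceCheck, tolerance: float = 0.01) -> str:
    """Label a result as met, missed, or exceeded relative to guidance."""
    ratio = check.actual / check.guided
    if ratio > 1 + tolerance:
        return "exceeded"
    if ratio < 1 - tolerance:
        return "missed"
    return "met"

# Tracked across quarters, these labels become a record of management
# discipline rather than a one-off observation.
history = [
    GuidanceCheck("Q1 revenue ($m)", guided=500.0, actual=512.0),
    GuidanceCheck("Q2 revenue ($m)", guided=530.0, actual=521.0),
]
for check in history:
    print(check.metric, "->", classify(check))
```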

Quick Start

Test This on Your Next Earnings Call

  1. Run the same earnings transcript through a generic AI tool twice, using slightly different prompts
  2. Compare the outputs side by side. Note differences in emphasis, structure, and what gets highlighted
  3. Then run the same transcript through a purpose-built research platform and compare the consistency of output

The gap between the two approaches becomes clear within a single use.
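
If you want to script the first two steps, here is a minimal sketch. It assumes the official OpenAI Python client with an OPENAI_API_KEY in the environment; the model name, prompts, and file path are illustrative, and any general-purpose chat API would work the same way.

```python
# Steps 1 and 2, scripted: run the same transcript through a general-purpose
# model twice with slightly different prompts, then diff the outputs.
# Assumes the official OpenAI Python client (openai>=1.0) and an
# OPENAI_API_KEY in the environment; model and prompts are illustrative.
import difflib

from openai import OpenAI

client = OpenAI()
transcript = open("earnings_call_transcript.txt").read()

# Two prompts an analyst might reasonably write for the same task.
prompts = [
    "Summarize the key takeaways from this earnings call.",
    "What are the most important points in this earnings call transcript?",
]

outputs = []
for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[
            {"role": "system", "content": "You are an equity research assistant."},
            {"role": "user", "content": f"{prompt}\n\n{transcript}"},
        ],
    )
    outputs.append(response.choices[0].message.content)

# Side-by-side comparison: a unified diff surfaces differences in emphasis,
# structure, and what gets highlighted.
for line in difflib.unified_diff(
    outputs[0].splitlines(),
    outputs[1].splitlines(),
    fromfile="prompt_1",
    tofile="prompt_2",
    lineterm="",
):
    print(line)
```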

Bringing It All Together

Prompting is a tool, not a methodology. In investment research, where consistency, auditability, and comparability are essential, relying on prompting alone introduces more risk than it removes.

AI works best when it reinforces research discipline rather than replacing it. The more structured the foundation, the more useful the output becomes.

by James Yerkess

James is a Senior Strategic Advisor to Marvin Labs. He spent 10 years at HSBC, most recently as Global Head of Transaction Banking & FX, and served as an executive member responsible for the launch of two UK neobanks.

Get Started

Experience professional-grade AI for equity research, validate insights for yourself, and see how it fits into your workflow.