Beyond Sectors: How AI Changes Company Classification

Amazon is classified as Consumer Discretionary in GICS. That single label covers a company that runs the world's largest cloud infrastructure business, a $50B+ advertising platform, a global logistics network, a grocery chain, a streaming service, and a consumer marketplace. I have a hard time seeing what useful information that label conveys to anyone.

Every analyst covering Amazon already knows this. They mentally override the classification and build their own comp tables around a more honest picture of what the business actually does. The classification system exists because we needed one decades ago, and the one we got was built around what humans could manage: in its current form, 11 sectors, 25 industry groups, 74 industries, and 163 sub-industries.

That constraint is gone. The question has shifted from how to fix GICS to whether we need sector buckets at all.

We simplified because we had to

The Global Industry Classification Standard launched in 1999, a joint product of MSCI and S&P. It was a reasonable answer to a real problem: institutional investors needed a shared language for grouping companies, benchmarking performance, and constructing portfolios.

Factor models took a similar approach to explaining returns. Eugene Fama and Kenneth French showed that a small number of factors (market, size, value, and later profitability and investment) explained a large share of cross-sectional return variation. Five factors. Thousands of companies. The parsimony was the point. More factors meant more noise, more overfitting risk, and more parameters than the data could support.
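For reference, the five-factor model (Fama and French, 2015) is usually written as a time-series regression of each asset's excess return on five factor returns: the market excess return, size (SMB, small minus big), value (HML, high minus low book-to-market), profitability (RMW, robust minus weak), and investment (CMA, conservative minus aggressive). This is the standard textbook statement of the model, included only to show the compression concretely.

```latex
R_{it} - R_{Ft} = a_i + b_i\,(R_{Mt} - R_{Ft}) + s_i\,\mathit{SMB}_t + h_i\,\mathit{HML}_t
                + r_i\,\mathit{RMW}_t + c_i\,\mathit{CMA}_t + e_{it}
```

Whatever a company actually does, the model describes it with five loadings ($b_i$, $s_i$, $h_i$, $r_i$, $c_i$) plus an intercept.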

Both systems rest on the same logic: reduce companies to a manageable set of buckets because treating each one as unique was computationally and cognitively impractical.

Companies never actually fit these buckets well. Microsoft is in Information Technology alongside both Accenture and Western Digital. Alphabet runs the world's largest advertising business but sits in Communication Services with AT&T. These groupings generate real costs for anyone using them for peer analysis, relative valuation, or idea generation.

Where the alpha leaks out

When you assign a company to the wrong peer group, the downstream effects are not academic.

Comp tables built on sector membership pull in companies that are not real comparables and exclude companies that are. An analyst comparing Amazon to Walmart captures the retail overlap but misses the cloud infrastructure business that drives most of the value. Comparing Amazon to Microsoft captures the cloud overlap but misses the retail and advertising businesses. No single comp set works, because Amazon is not one business.

Underneath the comp table is the deeper problem: there is no proper model for how two companies actually compare. The sector label asserts that Amazon and Home Depot belong together and stops there. It says nothing about which parts of the businesses overlap, by how much, or where they diverge. Without that, relative valuation rests on a similarity the data never established, and every downstream use inherits the error.

And sector screens constrain idea generation in ways that are easy to miss. A fund screening for "Technology" companies with accelerating revenue growth will never surface a healthcare company whose growth is being driven by the same AI infrastructure dynamics. The screen works. The boundaries don't.

Stop the Hype

Hype: "Sectors are broken and need to be replaced immediately."

Reality: GICS and similar systems still serve a purpose for broad market structure, index construction, and regulatory reporting. The issue is more specific: using them as a primary input for bottom-up equity analysis (peer selection, relative valuation, screening) introduces systematic error. Analysts already compensate for this manually. The interesting question is whether AI can do that compensation better and at scale.

Model the company, do not bucket it

The fix is not a better set of buckets. It is to drop the bucket and model each company directly, by the relationships that actually describe it. Amazon supplies some companies, competes with others in cloud and with a different set in retail, and partners with firms it competes against elsewhere. A single label erases all of that. A graph keeps it.

Each relationship in that graph is typed (competitor, supplier, customer, partner), weighted by how much it matters, and scoped to a segment, so competition in cloud is not confused with competition in hardware. That is structure a sector code was never built to carry.
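A minimal sketch of what one edge could look like, assuming a Python dataclass schema. `Relationship`, `RelType`, `edges_for`, the [0, 1] weight scale, and every weight below are hypothetical and purely illustrative. The point is the segment field: a query for Amazon's cloud competitors returns a different set than a query for its retail competitors, which a single sector code cannot express.

```python
from dataclasses import dataclass
from enum import Enum


class RelType(Enum):
    """The four relationship types named in the text."""
    COMPETITOR = "competitor"
    SUPPLIER = "supplier"
    CUSTOMER = "customer"
    PARTNER = "partner"


@dataclass(frozen=True)
class Relationship:
    """One typed, weighted, segment-scoped edge (hypothetical schema)."""
    source: str        # company the edge is recorded for
    target: str        # counterparty
    rel_type: RelType  # role of the counterparty relative to the source
    segment: str       # business segment the relationship applies to
    weight: float      # importance in [0, 1]; placeholder, not an estimate

    def __post_init__(self) -> None:
        # Reject weights outside the assumed [0, 1] scale.
        if not 0.0 <= self.weight <= 1.0:
            raise ValueError(f"weight must be in [0, 1], got {self.weight}")


def edges_for(edges, company, rel_type=None, segment=None):
    """Edges recorded for `company`, optionally filtered by type and segment."""
    return [
        e for e in edges
        if e.source == company
        and (rel_type is None or e.rel_type is rel_type)
        and (segment is None or e.segment == segment)
    ]


# Illustrative edges: the competitor pairs are well known, the weights are made up.
EDGES = [
    Relationship("Amazon", "Microsoft", RelType.COMPETITOR, "cloud", 0.9),
    Relationship("Amazon", "Alphabet", RelType.COMPETITOR, "cloud", 0.7),
    Relationship("Amazon", "Walmart", RelType.COMPETITOR, "retail", 0.8),
]

if __name__ == "__main__":
    cloud = [e.target for e in edges_for(EDGES, "Amazon", RelType.COMPETITOR, "cloud")]
    retail = [e.target for e in edges_for(EDGES, "Amazon", RelType.COMPETITOR, "retail")]
    print(cloud)   # ['Microsoft', 'Alphabet']
    print(retail)  # ['Walmart']
```

Freezing the dataclass keeps edges hashable and immutable, so the same record can sit in sets or be shared across queries without being changed underneath them.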

Representative relationship graph for the top 300 names in the MSCI World. Each dot is a company, colored by the sector it is assigned to. Each line is a mapped competitor, supplier, customer, or partner relationship.

The raw material was always there: filings, press releases, earnings calls, industry reports, and news coverage. Companies name their competitors and partners, executives discuss supply chain dependencies, and analysts ask about competitive dynamics on every call. What changed is that a model can now read all of it, across thousands of companies, and keep the map current as the relationships shift.

This does not replace the analyst's judgment on the names they know cold. Whoever covers Amazon can already name its real competitors and where they overlap. The graph earns its keep at the fringes of coverage. It surfaces relationships you would not have thought to look for (a supplier two steps down the chain, a competitor in an adjacent segment), and it shortens the ramp on a new name by laying out who a company competes with, supplies, and depends on before you have read a single filing.

It also changes how ideas start. Instead of screening within a sector, you begin from a company you understand and follow its actual connections outward: the firms that share Amazon's supplier base, or the ones competing with Microsoft in AI infrastructure that no screen would ever return under "Technology." That traversal was not possible at scale before.
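The outward traversal can be sketched as a breadth-first search with a hop limit and a filter on relationship type. The graph below is a hypothetical adjacency list: apart from Amazon, every company name is a placeholder, and every edge is invented for illustration. Following only supplier and customer edges for two hops returns the firms that buy from the same supplier as Amazon, including a placeholder healthcare name that a technology screen would not return.

```python
from collections import deque

# Hypothetical graph: company -> list of (counterparty, role of counterparty).
# Edges are placeholders, not real relationships.
GRAPH = {
    "Amazon": [("ChipSupplierA", "supplier"), ("CloudRivalB", "competitor")],
    "ChipSupplierA": [
        ("Amazon", "customer"),
        ("CloudRivalB", "customer"),
        ("RetailerC", "customer"),
        ("HealthcareD", "customer"),
    ],
    "CloudRivalB": [("Amazon", "competitor"), ("ChipSupplierA", "supplier")],
    "RetailerC": [("ChipSupplierA", "supplier")],
    "HealthcareD": [("ChipSupplierA", "supplier")],
}


def traverse(graph, start, max_depth, allowed_roles=None):
    """Breadth-first walk outward from `start`.

    Returns {company: hop_count} for every company reachable within
    `max_depth` hops, following only edges whose role is in `allowed_roles`
    (all roles if None). `start` itself is excluded from the result.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if depth == max_depth:
            continue  # do not expand past the hop limit
        for neighbor, role in graph.get(node, []):
            if allowed_roles is not None and role not in allowed_roles:
                continue
            if neighbor not in seen:
                seen[neighbor] = depth + 1
                queue.append(neighbor)
    del seen[start]
    return seen


if __name__ == "__main__":
    # Firms two hops away through a shared supplier.
    print(traverse(GRAPH, "Amazon", 2, {"supplier", "customer"}))
```

The role filter is what keeps the walk on topic: a competitor-only walk and a supply-chain walk from the same starting company return different neighborhoods.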

The same shift is showing up on the quant side. The five-factor world was built for limited compute, and recent work from Kelly et al. (2024) in The Journal of Finance argues that return-prediction models have been underfitting, using specifications too simple to capture what the data already holds. The constraint was never the inputs. It was the size of the model allowed to read them. That is the factor-model version of the same idea: stop compressing companies down to a handful of dimensions when you no longer have to.

You can already see where this lands in practice. The sector and industry layer is one of the weakest parts of traditional data products, and a handful of newer vendors are rebuilding it from the bottom up, modeling companies by their revenue-line exposure across activities rather than slotting them into one box. The method is the one described here: read the primary documents, extract what a company actually does, and represent it as something richer than a label.
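The revenue-exposure representation also answers the comp-table question of which parts of two businesses overlap and by how much. A minimal sketch, assuming each company is reduced to a revenue mix across activities and that overlap is measured as the sum of per-activity minimum shares (one simple choice among several; cosine similarity over the same vectors would also work). The function names, activity labels, and revenue figures are hypothetical.

```python
def normalize(mix):
    """Scale an {activity: revenue} mapping so the shares sum to 1."""
    total = sum(mix.values())
    if total <= 0:
        raise ValueError("revenue mix needs a positive total")
    return {activity: rev / total for activity, rev in mix.items()}


def exposure_overlap(mix_a, mix_b):
    """Compare two companies by revenue exposure across activities.

    Returns (overlap, breakdown). `breakdown` maps each shared activity to
    min(share_a, share_b); `overlap` is their sum, from 0 (no activity in
    common) to 1.0 (identical revenue mix).
    """
    a, b = normalize(mix_a), normalize(mix_b)
    breakdown = {}
    for activity in sorted(a.keys() | b.keys()):
        shared = min(a.get(activity, 0.0), b.get(activity, 0.0))
        if shared > 0:
            breakdown[activity] = shared
    return sum(breakdown.values()), breakdown


# Hypothetical revenue mixes in arbitrary units; not real segment data.
COMPANY_A = {"retail": 60, "cloud": 20, "advertising": 10, "logistics": 10}
COMPANY_B = {"retail": 100}                # a pure retailer
COMPANY_C = {"cloud": 50, "software": 50}  # a cloud and software firm

if __name__ == "__main__":
    print(exposure_overlap(COMPANY_A, COMPANY_B))  # (0.6, {'retail': 0.6})
    print(exposure_overlap(COMPANY_A, COMPANY_C))  # (0.2, {'cloud': 0.2})
    print(exposure_overlap(COMPANY_B, COMPANY_C))  # (0, {})
```

Company A shares 60% of its mix with the retailer and 20% with the cloud firm, so neither comp covers it, and the breakdown says which slice each one captures. Weighting by revenue is itself a simplification: weighting by operating profit would change the answer for a company whose smaller revenue lines drive most of its value.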

What it changes underneath

Classification is infrastructure. Comp tables, relative valuation, risk models, and screens all sit on top of it, so an error in the label does not stay put. It propagates into every tool built on the label, and no amount of care further up the stack corrects for a peer group that was wrong to begin with.

None of this is waiting on a breakthrough. The extraction works, and compute is cheap enough to run it across a full coverage universe rather than a handful of names. We have put parts of it into practice in our work on company relationships in AI Chat and cross-company queries.

GICS will not disappear. It still does real work for index construction and reporting. But for bottom-up analysis, the workaround every good analyst already runs in their head, quietly correcting the label name by name, is about to become something you can apply across the entire coverage universe at once. The override stops being a private adjustment and starts being the model.