XTech Equity Flow Ep.2

wyatt8240
Jul 11, 2025
11 min read

Updated: Sep 19, 2025

Welcome to the Deep Dive, the show where we cut through the noise and get straight to the core of what you need to know. Today we're diving into one of the most enduring, and maybe most alluring, financial puzzles out there: predicting stock market movements. For decades, the search for hidden signals, the secret sauce for understanding stock returns, has captivated researchers and investors alike. Oh, absolutely. And that search has led to what economist John Cochrane famously called a veritable zoo of firm characteristics. Since the 70s, hundreds, maybe even thousands, of different factors have been proposed as statistically significant predictors. The real challenge, and the central question for this deep dive, is how we can possibly know which of these actually matter. Which ones give us independent, non-redundant information?

Exactly. So, for you listening, our mission today is to tackle that sprawling zoo head-on. We're going to sift through a mountain of data on these characteristics to pinpoint the truly independent signals for average US monthly stock returns. Think of it as separating the real gold nuggets from an overwhelming pile of fool's gold in the world of financial predictors. This deep dive is rooted in a seminal research paper, one that systematically evaluates a massive number of these characteristics using rigorous methods designed precisely to cut through all that historical confusion.

Okay, so let's unpack this, starting with the zoo problem itself. When we say hundreds of characteristics, why is that such a big deal? Beyond just feeling overwhelming, why the confusion? Well, the sheer volume creates a significant risk of spurious findings: results that look real but aren't. Imagine researchers sifting through vast data sets looking for any possible correlation. When you run enough tests, some patterns are bound to appear statistically significant purely by chance, even when there's no underlying economic relationship.
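A quick simulation shows how this happens. The sketch below is illustrative only. It assumes 94 hypothetical "characteristics" whose monthly hedge returns are pure noise with a true mean of zero, observed over 420 months (35 years × 12). Those counts were chosen to echo the study's dimensions, and the 3% volatility is an arbitrary assumption. On average about 5% of these noise factors still clear the conventional |t| > 1.96 bar.

```python
import numpy as np


def count_false_discoveries(n_factors=94, n_months=420, seed=0):
    """Count how many pure-noise 'characteristics' clear the usual 5% bar.

    Each column is a hypothetical monthly hedge return whose true mean is zero,
    so every 'significant' result here is a false discovery.
    """
    rng = np.random.default_rng(seed)
    returns = rng.normal(loc=0.0, scale=0.03, size=(n_months, n_factors))
    means = returns.mean(axis=0)
    std_errs = returns.std(axis=0, ddof=1) / np.sqrt(n_months)
    t_stats = means / std_errs
    # |t| > 1.96 is the conventional two-sided 5% cutoff.
    return int(np.sum(np.abs(t_stats) > 1.96))


hits = count_false_discoveries()
print(f"{hits} of 94 noise factors look 'significant' at the 5% level")
```

Rerunning with different seeds changes which factors "win", which is exactly why a single significant backtest proves little on its own.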

Right. That's what we call data snooping bias. It's like flipping a coin endlessly and eventually seeing a long streak of heads: it looks meaningful, but it may just be random chance. Data snooping bias sounds like a fundamental challenge in financial research, then. How do you even begin to separate real patterns from statistical noise? What was this paper's novel approach? Their approach was a key departure from prior research. Instead of looking at characteristics one by one, in isolation, they included 94 distinct firm characteristics simultaneously in a single comprehensive model.

All 94. Wow. Yes. That let them see which characteristics truly provided independent information even after everything else was accounted for. It's a much tougher test. That's a massive difference. And you mentioned a new level of methodological rigor, which sounds crucial here. Can you walk us through the key design choices they made to ensure the findings were genuinely reliable? Absolutely. One critical aspect was how they handled micro cap stocks. These are the really small companies, typically those below the 20th percentile of NYSE market capitalization.

Okay. They make up only about 3% of the US stock market's total value, but their extreme volatility and often idiosyncratic behavior can skew results in traditional analyses. They can dominate the findings. So they have an outsized impact. Exactly. To avoid that distortion, the study deliberately focused on non-micro cap stocks, using two specific methods. One was value-weighted least squares, which gives bigger companies more influence in the results, reflecting their actual market weight. Makes sense. The other was ordinary least squares run on all stocks except the micro caps, treating the remaining stocks equally. This dual view helped them understand the broader market, not just that tiny, volatile corner.
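The two weighting schemes can be sketched with NumPy. Everything here is hypothetical: a single synthetic cross-section of 2,000 stocks, one made-up characteristic, and a random NYSE flag. Micro caps are defined as stocks below the 20th percentile of NYSE market caps, following the description above. In this sketch WLS runs on the full sample, since value weighting already shrinks the micro caps' influence. Whether the paper applied its WLS to all stocks is an assumption, not something the transcript states.

```python
import numpy as np


def nyse_micro_cap_mask(market_cap, is_nyse, pct=20.0):
    """True for stocks below the NYSE market-cap breakpoint at `pct` percent."""
    breakpoint = np.percentile(market_cap[is_nyse], pct)
    return market_cap < breakpoint


def ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def wls(X, y, weights):
    """Weighted least squares: OLS after scaling each row by sqrt(weight)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(X1 * w[:, None], y * w, rcond=None)
    return coef


rng = np.random.default_rng(1)
n = 2000
market_cap = rng.lognormal(mean=6.0, sigma=2.0, size=n)   # hypothetical, $ millions
is_nyse = rng.random(n) < 0.4                              # hypothetical listing flag
signal = rng.normal(size=n)                                # one hypothetical characteristic
returns = 0.002 * signal + rng.normal(scale=0.08, size=n)

keep = ~nyse_micro_cap_mask(market_cap, is_nyse)
ols_coef = ols(signal[keep], returns[keep])    # equal weights, micro caps excluded
wls_coef = wls(signal, returns, market_cap)    # value weights, all stocks
print("OLS ex-micro:", ols_coef, "WLS:", wls_coef)
```

Setting every weight to one reduces WLS to OLS, which is a quick sanity check on the implementation.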

Okay, so they're making sure a handful of tiny, highly volatile companies aren't driving the results for the whole market when they shouldn't be. And what about the data snooping bias you mentioned? How did they tackle that? You said they demanded a higher bar for proof. Yes, they explicitly address data snooping using something called dependent false discovery rates, which raise the bar a characteristic must clear to count as significant when many tests are run at once. Okay. So with that level of rigor, careful handling of micro caps plus these statistical adjustments, what was the scope? How much data did they actually look at, and over what period?

They covered a significant 35-year window, from 1980 through 2014, so a robust data set for their conclusions. All right, now that we understand their powerful new lens, let's look through it and see what it revealed. This is where, I gather, years of conventional wisdom get challenged. After applying this rigorous simultaneous testing to those 94 characteristics over 35 years, how many actually passed? How many were reliable, independent predictors of average US monthly returns?

Well, this is what's really fascinating. For the non-micro cap stocks, only 12. Just 12 of those 94 characteristics were reliably independent determinants of average monthly returns over the whole 1980 to 2014 period. Only 12 out of 94. Twelve. Think about that: out of nearly 100 proposed factors, only a dozen stood up to this level of scrutiny. That contrasts sharply with the zoo literature and its hundreds of proposed factors. It suggests far fewer genuinely predictive signals exist than we thought.
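The dependent false discovery rate adjustment mentioned earlier is what decides which characteristics count as "reliably" significant. One standard procedure that controls the false discovery rate under arbitrary dependence between tests is Benjamini-Yekutieli (BY). Whether this is exactly the variant the paper used is an assumption. A minimal sketch of the BY step-up rule follows, and the p-values in the usage line are made up.

```python
import numpy as np


def benjamini_yekutieli(p_values, q=0.05):
    """Boolean array marking hypotheses rejected at false discovery rate q.

    BY step-up rule: sort p-values ascending and find the largest rank k with
    p_(k) <= k * q / (m * c(m)), where c(m) = sum_{i=1..m} 1/i is the penalty
    that keeps the guarantee valid under any dependence between tests.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    c_m = np.sum(1.0 / np.arange(1, m + 1))
    order = np.argsort(p)
    thresholds = np.arange(1, m + 1) * q / (m * c_m)
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest passing rank (0-based)
        rejected[order[: k + 1]] = True    # reject every hypothesis up to that rank
    return rejected


# Hypothetical p-values: a naive 5% cutoff flags four, BY keeps only the first two.
print(benjamini_yekutieli([0.0001, 0.004, 0.02, 0.04, 0.3]))
```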

That's a staggering reduction. It's like going to a library of hundreds of essential investing books and finding that only a dozen offer genuinely new, standalone insights. This study really does seem to separate signal from overwhelming noise. And you mentioned a key nuance: the difference between univariate and multivariate significance. Can you explain why that distinction matters here, and what they found? Yes, this is a profound insight from the paper. Imagine testing individual puzzle pieces one by one to see whether each fits a specific spot. That's roughly what univariate analysis does.

Okay. Now imagine throwing all the pieces onto the table at once and seeing which few still fit perfectly into the complex picture, even when surrounded by hundreds of others. That's the multivariate approach they took. Right, testing them all together. Exactly. And what they found was that even when tested univariately, one by one, only 12 characteristics were significant for non-micro caps. The fact that this number didn't drop further when all 94 were considered simultaneously tells us something important. What's that?

It means a few powerful characteristics aren't simply absorbing the information in lots of others. It's not as if value and momentum explain away 20 other characteristics. Rather, there seem to be intrinsically few characteristics that independently predict average returns in these larger stocks, once you adjust for the micro cap influence and properly address data snooping. So the zoo wasn't just crowded. It was largely filled with false leads or redundant information.
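The univariate versus multivariate distinction can be shown with a toy simulation. In it, one hypothetical characteristic (`proxy`) is just a noisy copy of another (`real`), and only `real` drives returns. Each looks strongly significant on its own, but the copy loses significance once both enter the same regression. This is the "absorption" effect described above. The paper's finding is that absorption was *not* what thinned the zoo, because its count of 12 stayed the same. All data, coefficients, and names here are assumptions for illustration.

```python
import numpy as np


def ols_t_stats(X, y):
    """OLS with an intercept; returns classical t-statistics for the slopes."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    dof = len(y) - X1.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X1.T @ X1)
    return coef[1:] / np.sqrt(np.diag(cov)[1:])


rng = np.random.default_rng(2)
n = 20_000
real = rng.normal(size=n)                    # hypothetical independent characteristic
proxy = real + rng.normal(size=n)            # hypothetical noisy copy of it
returns = 0.1 * real + rng.normal(size=n)    # only `real` drives returns

t_real_alone = ols_t_stats(real, returns)[0]                     # univariate
t_proxy_alone = ols_t_stats(proxy, returns)[0]                   # univariate
t_joint = ols_t_stats(np.column_stack([real, proxy]), returns)  # multivariate
print(f"alone: real {t_real_alone:.1f}, proxy {t_proxy_alone:.1f}; "
      f"together: real {t_joint[0]:.1f}, proxy {t_joint[1]:.1f}")
```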

That's a good way to put it: false leads, or information already captured elsewhere. That's a fundamental rethinking. So what was the nature of the 12 remaining independent predictors? Were they more about a company's financial health and fundamentals, or about its stock's behavior in the market? This is another surprising reveal. On one side, there are fundamental characteristics: insights from financial statements, balance sheets, and earnings reports, such as asset growth and profitability.

Right. And on the other side, market-based characteristics, derived from the stock's actual behavior: price momentum, trading volume, volatility, how it reacts to news. Okay, fundamentals versus market. Exactly. Now, the univariate, one-by-one tests showed 10 fundamental and only two market-based characteristics as significant. So mostly fundamentals at first. At first, yes. But the independent characteristics, the ones that held up when all 94 were tested together, were predominantly market-based. Seven of the 12 were market-based, and only one was truly fundamental.

Only one? Which one was that? It was the number of consecutive quarters with earnings higher than in the same quarter a year earlier, a kind of earnings-consistency measure. Wow, that's quite a shift from what many might expect; fundamentals usually get so much attention. Why does this finding matter for you, the listener, in financial analysis or investing? It matters significantly for anyone doing financial analysis or building models that need to control for the determinants of average returns. If you're trying to isolate the effect of a new variable or your own strategy, you first need to control for what is truly, independently predictive. This study gives you a much clearer, more parsimonious set of characteristics for that purpose, instead of a long list of potentially redundant or even spurious ones from the zoo.

So it helps you focus your analytical horsepower on what actually moves the needle independently. Precisely. Before we move on, let's quickly hammer home that micro cap effect again. You stressed handling them carefully. How much did those tiny stocks really distort things when they were given the same weight as larger stocks? Quite a lot, and it's very clear. When micro caps get the same weight as larger stocks, using ordinary least squares on all stocks, the number of independent characteristics nearly doubles, jumping from 12 to 23.

Wow. From 12 to 23 just by including micro caps equally. Yes. It clearly shows their outsized influence on the anomalies literature, and why careful handling, or exclusion for certain analyses, is so important for accurate conclusions about the broader market. They introduce a significant amount of noise. Okay, so we've established how few independent predictors there are, especially for non-micro caps. But the story doesn't stop there, does it? The researchers uncovered a dramatic twist in the tail: a sudden, unexpected shift in market dynamics around 2003. What did they find?

Yes, this is where the landscape seems to have changed fundamentally. Before 2003, the average monthly raw hedge return from exploiting these characteristics in non-micro cap stocks was about 2.8%. 2.8% a month? That's very significant. Very significant. But after 2003, it plummeted to just 0.1% per month, which is statistically indistinguishable from zero. Zero. So the economic value essentially vanished. Yes. The economic viability of exploiting these signals in larger stocks seems to have disappeared after 2003.

And the number of independent characteristics tells an even starker story about this shift, right? It absolutely does. For non-micro cap stocks, the number of independent characteristics dropped from the 12 we identified before 2003 to just two after 2003. Down to two. Two. For micro caps, it also fell significantly, from 25 to eight, so some predictability remained for them, but much less. This points to a profound shift in overall market dynamics, not just some minor adjustment.
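The "raw hedge return" and its significance test can be sketched as follows. Each month you go long the top decile of stocks ranked by predicted return and short the bottom decile. You then average that monthly spread over a period and test whether the mean differs from zero. The decile breakpoints, the plain t-statistic with no autocorrelation adjustment, and the choice to count 2003 itself as "post" are all assumptions. The demo series is synthetic noise around the 2.8% and 0.1% averages quoted above, not the paper's data.

```python
import numpy as np


def decile_hedge_return(predicted, realized):
    """One month's long-short return: top-decile minus bottom-decile mean realized return."""
    lo, hi = np.percentile(predicted, [10, 90])
    return realized[predicted >= hi].mean() - realized[predicted <= lo].mean()


def mean_and_t(series):
    """Mean of a monthly return series and its simple t-statistic (no autocorrelation correction)."""
    s = np.asarray(series, dtype=float)
    return s.mean(), s.mean() / (s.std(ddof=1) / np.sqrt(s.size))


def pre_post_summary(years, hedge, break_year=2003):
    """Split a monthly hedge-return series at break_year and summarize each side."""
    years = np.asarray(years)
    hedge = np.asarray(hedge, dtype=float)
    return {
        "pre": mean_and_t(hedge[years < break_year]),
        "post": mean_and_t(hedge[years >= break_year]),
    }


# Illustrative data only: 420 months (1980-2014) with a 2.8%/month mean before 2003
# and 0.1%/month after, plus hypothetical noise with 5% monthly volatility.
rng = np.random.default_rng(3)
years = np.repeat(np.arange(1980, 2015), 12)
hedge = np.where(years < 2003, 0.028, 0.001) + rng.normal(scale=0.05, size=years.size)
print(pre_post_summary(years, hedge))
```

In the synthetic series, the pre-2003 t-statistic sits far above 2 while the post-2003 one hovers near zero. This is the pattern the transcript describes, though here it is produced by construction.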

Okay, so if predictability just fell off a cliff, especially for larger stocks, what was going on around that time? The study doesn't pinpoint a single cause, but what big shifts could explain such a sudden, sharp decline? A number of significant changes converged right around the 2002-2003 time frame. On the regulatory side, there was the Sarbanes-Oxley Act, or SOX, which imposed much stricter managerial responsibility for financial controls, with reporting under it beginning in early 2003.

Right, SOX was a big deal. Huge. The SEC also accelerated filing requirements for annual and quarterly reports starting in November 2002, so financial information became more timely and more transparent. Okay, regulatory changes. And then there were technological shifts, which sound like they could have had a massive impact on how trading actually worked. Exactly. Between January and May 2003, the NYSE introduced autoquoting software. Big change. This led to dramatic reductions in trading frictions such as bid-ask spreads and overall trading costs, and, crucially, it coincided with a significant increase in algorithmic trading, especially by long-short quantitative hedge funds.

Ah, the rise of the quants. You got it. The theory here is pretty compelling. Lower costs, faster execution, and better technology made it much cheaper and easier to run high-volume quantitative trading strategies, which would naturally increase arbitrage activity: people exploiting tiny discrepancies much faster. So they trade away the predictability. That's the idea. This activity made the market far more efficient very quickly, eliminating the kind of mispricing that these characteristic-based signals may previously have reflected.

So if you put all those factors together, SOX, faster filings, cheaper and faster trading, the rise of algorithmic trading, it sounds like the market got a massive efficiency upgrade almost overnight around 2003. That's a good way to put it. It looks like a step-function change in market efficiency, particularly for the larger, more liquid stocks that quants could trade easily. And it's important to stress that this sharp shift depended on firm size. That strongly suggests data snooping alone doesn't fully explain the decline in predictability.

Why is that? Well, data snooping certainly contributes to the zoo problem overall by producing spurious patterns. But spurious patterns would be expected to fail across the board, not collapse at one point in time and mostly in the stocks that are cheapest to trade. This sudden, dramatic drop, much more pronounced for larger stocks than for micro caps, points to real-world changes in market structure and efficiency. It wasn't just researchers looking harder at the same old data; the underlying market dynamics actually changed.

Okay, that makes sense. So, wrapping this all together, what does this mean for you, the listener, whether you're an individual investor or a financial analyst navigating today's markets? What are the big takeaways? If we connect this to the bigger picture, these findings should foster a healthy skepticism toward many of the hundreds, maybe thousands, of anomalies reported in prior financial research. They add weight to existing critiques, such as Harvey, Liu, and Zhu's work on data snooping, or McLean and Pontiff's finding that anomalies decay after publication. This study pinpoints a concrete moment when predictability changed fundamentally for structural reasons.

So essentially, the market has become smarter, faster, and much harder to beat with these traditional characteristic-based signals, especially for larger companies. Precisely. For non-micro cap stocks, the new reality looks stark: since 2003, almost no characteristic-based anomalies have shown statistically or economically significant predictive power. The arbitrageurs, powered by technology and better information, got incredibly good at their jobs incredibly fast. For someone who started investing after 2003, their entire experience of the market has been one in which these traditional signals had largely vanished for big stocks. That's a completely different world from the one someone investing in the '80s or '90s would remember.

And this study isn't just a critique, right? It also offers valuable guidance for future financial models and maybe even investment strategies. It absolutely does, both for future research and for practical investment model building. The study strongly suggests a few things:

• First, models should probably weight post-2003 data much more heavily than pre-2003 data, given that profound market shift. The pre-2003 world is simply different.

• Second, it highlights the absolute necessity of conditioning the return-generating process on firm size. You need to acknowledge the clear differences between micro caps and larger, more liquid stocks, and treat them differently.

Right. You can't lump them together. Exactly. And crucially, it recommends using only the independently identified characteristics, those 12, or perhaps even fewer after 2003, for control purposes in your models. Don't clutter your analysis with the broader, often spurious set from the zoo. So, for you listening, this deep dive really helps cut through the clutter. It tells you that the vast majority of the hundreds of factors you might read about, the ones that sounded promising, may no longer have independent predictive power, especially for larger stocks in today's market.
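The first two recommendations can be sketched in a few lines. Monthly cross-sectional slope estimates are averaged with post-2003 months weighted more heavily, and this is done separately for each firm-size group. The transcript gives no specific weighting, so `post_weight=3.0` is a placeholder. The group names and slope values in the usage line are also hypothetical.

```python
import numpy as np


def regime_weighted_mean(years, estimates, break_year=2003, post_weight=3.0):
    """Average monthly estimates, giving months from break_year onward post_weight."""
    years = np.asarray(years)
    estimates = np.asarray(estimates, dtype=float)
    weights = np.where(years >= break_year, post_weight, 1.0)
    return float(np.sum(weights * estimates) / np.sum(weights))


def size_conditioned_premia(years, estimates_by_size, **kwargs):
    """Apply the regime weighting separately to each firm-size group."""
    return {group: regime_weighted_mean(years, est, **kwargs)
            for group, est in estimates_by_size.items()}


# Hypothetical monthly slope estimates for one characteristic, by size group.
years = [2001, 2002, 2003, 2004]
slopes = {
    "micro": [0.004, 0.006, 0.003, 0.001],
    "non_micro": [0.002, 0.002, 0.0, 0.0],
}
print(size_conditioned_premia(years, slopes))
```

Keeping the size groups separate, rather than pooling them, is what lets the micro cap estimates differ from the larger-stock estimates instead of contaminating them.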

Exactly. It helps you focus your attention on the few signals that might still matter, or, perhaps more importantly, understand that market dynamics aren't static. They evolve, sometimes quite rapidly. So, to quickly recap: the vast zoo of stock characteristics, which once seemed to offer almost endless predictive insights, yields very few truly independent predictors when they are rigorously tested together. And even those few have largely disappeared for larger, more liquid stocks since the dramatic market shift around 2003, likely driven by regulatory and technological changes that boosted market efficiency.

It leaves you with a provocative thought to mull over, doesn't it? In an increasingly efficient, increasingly digital market, how much predictability based on past patterns can genuinely remain? And where might the next truly independent signals for stock returns emerge, if they emerge anywhere at all? It makes you consider how our very pursuit of knowledge, our attempts to find patterns, might alter the very landscape we're trying to understand.

Learn how to subscribe to macro data sources and forecasts: https://www.exponential-tech.ai/

Learn more about XTech Products: https://calendly.com/xtech-marketing-leads/brief-introduction-to-exponential?from=slack&month=2025-08
