Every time someone announces that the modern data stack is complete, a new category emerges to prove them wrong. We have been through several cycles of this already. The rise of cloud data warehouses was supposed to consolidate the data infrastructure landscape. Then dbt created an entirely new category in data transformation. Then the proliferation of real-time streaming use cases exposed gaps in the batch-oriented architecture that most of the modern data stack was built around. And now the emergence of AI-native applications is revealing an entirely new set of infrastructure requirements that existing tools are poorly equipped to address.

At Moberg Analytics Ventures, data infrastructure is one of our highest-conviction investment themes. We have been active investors in this space since the close of our $95M seed fund in February 2022, and we continue to find compelling opportunities for well-positioned seed-stage companies in the infrastructure layer beneath AI applications. This essay explains why we remain bullish on this category and where we see the most interesting whitespace for new companies.

Why Infrastructure Remains Investable Despite Market Maturation

A common objection to investing in data infrastructure in 2025 is that the market has consolidated. Snowflake, Databricks, and a handful of other platforms have captured enormous value, and the argument goes that the most important categories have been spoken for. Founders building data infrastructure companies today are, in this view, filling in the margins around a settled architecture.

We disagree with this framing, for several reasons. First, the emergence of AI-native applications has fundamentally changed the requirements placed on data infrastructure in ways that the current generation of tools was not designed to handle. Vector databases, feature stores, model registries, and inference serving infrastructure are not simply add-ons to the existing data stack — they represent a new and distinct architectural layer with requirements that differ materially from analytical data warehousing.

Second, the proliferation of data infrastructure tools has created significant integration and governance complexity that generates its own set of problems. The average enterprise data team is now managing dozens of distinct tools across ingestion, transformation, storage, orchestration, quality, governance, and serving. The coordination overhead of this landscape is itself a significant unsolved problem, and the companies that can meaningfully reduce it will capture substantial value.

Third, regulatory and compliance requirements are creating new infrastructure categories that did not exist five years ago. Data residency requirements, the right to erasure under GDPR and its successors, AI model governance mandates, and the growing scrutiny of algorithmic decision-making in regulated industries are all driving demand for infrastructure capabilities that the current generation of tools handles poorly or not at all.

The Five Infrastructure Gaps We Are Watching

Real-time feature engineering. The existing data stack was designed primarily for batch processing. Feature engineering pipelines that compute model inputs can take hours to run in a batch environment, which is acceptable for many analytical use cases but fundamentally incompatible with AI applications that need to make decisions in milliseconds. The market for real-time feature stores and streaming feature engineering tools remains surprisingly underdeveloped relative to the demand for real-time AI applications. We see this as a significant opportunity for a focused seed-stage company.
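To make the latency gap concrete, here is a minimal, illustrative sketch — not any particular vendor's API, and with hypothetical keys and windows — of the kind of sliding-window feature a streaming feature store maintains so a model can read a fresh value at inference time instead of waiting for a batch job:

```python
from collections import defaultdict, deque

class SlidingWindowFeature:
    """Per-key event count over a sliding time window, served in
    microseconds rather than recomputed hours later in batch."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events: dict[str, deque] = defaultdict(deque)

    def record(self, key: str, timestamp: float) -> None:
        self.events[key].append(timestamp)

    def value(self, key: str, now: float) -> int:
        q = self.events[key]
        # Evict events older than the window before serving the feature.
        while q and now - q[0] > self.window:
            q.popleft()
        return len(q)

# Hypothetical usage: transactions per user over the last 60 seconds.
feature = SlidingWindowFeature(window_seconds=60)
feature.record("user_42", timestamp=0)
feature.record("user_42", timestamp=30)
feature.record("user_42", timestamp=90)
print(feature.value("user_42", now=100))  # the events at t=0 and t=30 have aged out
```

Production systems add persistence, backfill, and exactly-once semantics on top of this core loop, which is where most of the engineering difficulty lives.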

Data quality automation. Data quality has been a recognized problem in enterprise data management for decades, but the solutions have historically been manual and rule-based. As AI applications increasingly depend on high-quality training and inference data, the consequences of data quality failures have become dramatically more severe. We are watching closely for companies that can automate data quality detection, root cause analysis, and remediation in a way that is genuinely scalable — not just anomaly detection dashboards, but closed-loop systems that can identify and fix data quality issues without requiring manual intervention.
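As an illustration of what "closed-loop" means here, the toy sketch below — with hypothetical field names and thresholds — detects a null-rate drift against a baseline and automatically quarantines the offending rows rather than merely raising a dashboard alert:

```python
def null_rate(rows: list[dict], field: str) -> float:
    """Fraction of rows where `field` is missing or null."""
    return sum(1 for r in rows if r.get(field) is None) / len(rows)

def enforce_quality(rows, field, baseline_null_rate, tolerance=0.05):
    """Closed-loop check: if the null rate for `field` drifts past the
    baseline by more than `tolerance`, quarantine the bad rows instead
    of letting them flow downstream. Returns (clean, quarantined)."""
    if null_rate(rows, field) <= baseline_null_rate + tolerance:
        return rows, []  # healthy batch: pass everything through
    clean = [r for r in rows if r.get(field) is not None]
    quarantined = [r for r in rows if r.get(field) is None]
    return clean, quarantined

# Hypothetical batch where two of three rows are missing `email`.
batch = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": None},
]
clean, quarantined = enforce_quality(batch, "email", baseline_null_rate=0.0)
print(len(clean), len(quarantined))  # 1 2
```

The hard part that would make a company valuable is not this check but the surrounding loop: learning baselines automatically, tracing the drift to its root cause, and replaying quarantined data once the producer is fixed.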

AI model governance and lineage. As enterprises deploy more AI models in production, the governance requirements around those models are becoming increasingly complex. Regulators in financial services, healthcare, and insurance are beginning to require that enterprises be able to explain, audit, and reproduce the outputs of AI models used in consequential decisions. The infrastructure tooling to support this — model lineage tracking, feature attribution logging, decision audit trails, and model version management — is immature relative to the regulatory demand. This is an infrastructure category where the pain is real, the buyer is motivated, and the existing solutions are inadequate.
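A minimal sketch of what a decision audit-trail entry might capture — model identity, version, inputs, output, and a tamper-evident content hash — is shown below; the names are illustrative, not drawn from any specific governance product:

```python
import datetime
import hashlib
import json

def audit_record(model_name: str, model_version: str,
                 features: dict, prediction) -> dict:
    """Build one audit record for a model decision: enough context to
    reproduce and explain the output later."""
    payload = {
        "model": model_name,
        "version": model_version,
        "features": features,
        "prediction": prediction,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    # A content hash over the canonical JSON lets an auditor verify
    # the record was not altered after the fact.
    canonical = json.dumps(payload, sort_keys=True).encode()
    payload["record_hash"] = hashlib.sha256(canonical).hexdigest()
    return payload

# Hypothetical usage for a credit decision.
record = audit_record("credit_risk_model", "1.4.0",
                      {"income": 50000, "tenure_years": 3}, 0.82)
```

A real system would also link each record to the training dataset version and feature pipeline that produced the inputs — that lineage chain is what regulators increasingly expect.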

Semantic layer standardization. One of the most persistent pain points in enterprise data management is the proliferation of inconsistent metric definitions across different teams and tools. The same underlying business concept — revenue, customer, active user — is defined differently in different systems, leading to endless debates in executive meetings about which number is correct. The semantic layer category, which attempts to create a single source of truth for business metric definitions, has been addressed by several companies (most prominently by dbt's Semantic Layer and Looker's LookML), but the problem remains largely unsolved across the heterogeneous tool landscapes that enterprise data teams actually run. We think there is room for a well-designed product to emerge as the standard here.

Data contract enforcement. The concept of data contracts — explicit agreements between data producers and consumers about the structure, quality, and semantics of data being shared — has gained significant traction as a way to reduce the integration failures and quality issues that plague enterprise data pipelines. But tooling for automated data contract creation, validation, and enforcement remains nascent. As data mesh architectures become more prevalent in large enterprises, the demand for data contract infrastructure will grow substantially. We are actively looking at companies operating in this space.
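A data contract can be as simple as a machine-readable schema that both producer and consumer validate against. The sketch below — with a hypothetical "orders" contract — shows the enforcement idea: conforming records pass, and violations are reported before they reach a consumer:

```python
# Hypothetical contract for an "orders" stream: field names, types,
# and whether nulls are allowed, agreed between producer and consumer.
ORDERS_CONTRACT = {
    "order_id": {"type": str, "nullable": False},
    "amount_cents": {"type": int, "nullable": False},
    "coupon_code": {"type": str, "nullable": True},
}

def validate(record: dict, contract: dict) -> list[str]:
    """Return a list of contract violations; an empty list means the
    record conforms and may be published."""
    violations = []
    for field, rules in contract.items():
        if field not in record or record[field] is None:
            if not rules["nullable"]:
                violations.append(f"missing required field: {field}")
        elif not isinstance(record[field], rules["type"]):
            violations.append(f"wrong type for {field}")
    # Fields the consumer never agreed to also break the contract.
    for field in record:
        if field not in contract:
            violations.append(f"unexpected field: {field}")
    return violations
```

In practice this check runs in the producer's CI pipeline or at the stream boundary, which turns a silent downstream pipeline failure into a loud, attributable build failure.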

The Build vs. Buy Dynamics in Data Infrastructure

One of the critical questions for any infrastructure startup is whether target enterprises will buy a dedicated solution or build internally. In our experience evaluating data infrastructure companies, the build vs. buy calculus has shifted meaningfully in the direction of buying over the past several years.

The primary reason for this shift is that internal data engineering teams have been stretched thin by the proliferation of AI application requirements. The same engineers who were already managing complex data pipelines, maintaining data quality frameworks, and supporting analytical tools are now being asked to also build AI infrastructure, manage LLM integration, and support real-time serving requirements. The opportunity cost of building undifferentiated infrastructure internally has never been higher.

This shift is creating a favorable environment for seed-stage data infrastructure companies. Enterprise data teams are more willing to evaluate new tools than they were five years ago, and procurement cycles for well-positioned infrastructure products with strong engineering credentials are often shorter than conventional wisdom suggests — particularly when the vendor can demonstrate rapid time-to-value and minimal integration friction.

What We Look for in Data Infrastructure Founders

Data infrastructure is a category where the bar for technical credibility is extremely high. Enterprise data teams will evaluate a new infrastructure product with the skepticism of engineers who have been burned before by tools that were impressive in demos and brittle in production. The founding teams that win in this category are the ones who can demonstrate not just that their product works, but that it works reliably under the edge cases, scale requirements, and operational constraints of real enterprise deployments.

We look for data infrastructure founders who have operated at scale. Ideally, they have personally experienced the pain their product addresses in the context of a large enterprise data organization — as a staff data engineer, a head of data platform, or a technical leader responsible for a production AI system. This experience gives them a product intuition that is genuinely hard to replicate from the outside and that shows up in the quality of their design decisions and the credibility of their customer conversations.

We also look for founders who have a clear thesis about which part of the infrastructure stack they are addressing and why that part is the right place to build. The data infrastructure landscape is large, and there are many places where a talented team could build a useful tool. The founders who can articulate precisely why their chosen problem is the highest-leverage point in the current architecture — and why now is the right time to address it — tend to build more focused, more defensible companies than founders who are filling a perceived gap without a strong theory of the market dynamics that created it.

The Long-Term View on Data Infrastructure Value

We believe that the data infrastructure layer will continue to be one of the most value-generative segments of the enterprise software market for the foreseeable future. Every AI application that gets built creates new demand for the infrastructure that supports it. Every new data source that enterprises integrate adds complexity that creates opportunities for new infrastructure solutions. And the regulatory and governance requirements around data and AI are likely to intensify rather than diminish as AI applications become more consequential.

The companies we are backing in this space today are building the infrastructure that will power the next generation of AI applications. The markets they are addressing are large, the pain points are real, and the technical barriers to entry are high enough to create meaningful defensibility for well-executed companies. We remain active and enthusiastic investors in this category and look forward to continuing to find and support the exceptional founding teams who are building the future of data infrastructure.

If you are building in the data infrastructure space and are looking for a seed-stage investment partner with deep domain expertise, we would love to talk. Reach out to the Moberg Analytics team to start a conversation.

Key Takeaways

  • The modern data stack is not complete — AI-native applications are creating new infrastructure requirements that current tools do not address well.
  • Five high-priority infrastructure gaps: real-time feature engineering, data quality automation, AI model governance, semantic layer standardization, and data contract enforcement.
  • The build vs. buy calculus has shifted toward buying, as engineering teams are stretched thin supporting AI application requirements.
  • Winning data infrastructure founders combine operator-scale experience with a precise, well-reasoned thesis about which infrastructure problem is highest-leverage right now.
  • Regulatory and governance requirements around data and AI will intensify, creating durable demand for new infrastructure categories.

Read about our investment approach on the About page, or explore the data infrastructure companies in our portfolio.