From data governance to data infrastructure

AI governance must shift from policy-based oversight to infrastructure-embedded, continuous systems of control, because only governance designed into data architectures can enable scalable, trustworthy, and adaptive AI.

Sanchez P.

4/17/2026 · 26 min read

Abstract

The accelerating integration of artificial intelligence (AI) into organisational decision-making exposes fundamental limitations in prevailing data governance frameworks. Existing approaches, grounded in notions of data as a stable asset governed through policy, ownership, and retrospective control, are increasingly misaligned with the continuous, distributed, and adaptive nature of AI-driven systems. This paper argues that the challenge is not incremental improvement, but a conceptual mismatch between governance paradigms and the infrastructural logic of contemporary data environments.

Drawing on data governance research, digital infrastructure theory, and emerging work on AI systems, the paper identifies five structural tensions—temporal, structural, epistemic, lifecycle, and control—that systematically undermine the effectiveness of existing governance models. These tensions reveal that governance frameworks premised on stability, boundedness, and retrospective intervention are incompatible with systems characterised by real-time data flows, probabilistic reasoning, and continuous evolution.

In response, the paper advances a reconceptualisation of data governance as an infrastructural capability embedded within data architectures. It defines governance-as-infrastructure as the integration of control, accountability, and quality mechanisms directly into the design and operation of data systems, enabling continuous, distributed, and lifecycle-oriented governance. Building on this foundation, the paper develops a multi-dimensional framework for AI-ready data governance, encompassing temporal, architectural, lifecycle, and epistemic dimensions, and articulates a set of design principles for implementation.

The paper contributes to the literature in three ways. First, it provides a critical re-evaluation of data governance theory, exposing its underlying assumptions and limitations. Second, it introduces governance-as-infrastructure as a novel conceptual lens that bridges data governance, digital infrastructure, and AI systems. Third, it reframes governance as a precondition for scalable and trustworthy AI innovation, rather than a constraint upon it.

Ultimately, the paper argues that the future of AI depends not only on advances in algorithms, but on the development of governance infrastructures capable of supporting continuous, adaptive, and accountable data-driven systems.

1. Introduction

The rapid integration of artificial intelligence (AI) into organisational processes is transforming the nature of decision-making. What was once episodic, human-centred, and deliberative is increasingly continuous, data-driven, and automated. In this shift, data no longer functions merely as an input to decision-making; it becomes the substrate through which organisational action is produced, updated, and scaled in real time.

Despite this transformation, prevailing data governance frameworks remain conceptually anchored in an earlier paradigm. Existing approaches largely conceptualise governance as the exercise of control over stable data assets through formal policies, predefined roles, and retrospective oversight mechanisms. While these models have proven effective in traditional information systems environments, they are increasingly misaligned with the dynamic, adaptive, and continuously evolving nature of AI-driven systems.

This misalignment reflects a deeper theoretical limitation. Contemporary data governance research remains grounded in a static and organisational view of governance, whereas AI systems operate according to an infrastructural and temporal logic characterised by continuous data flows, recursive feedback loops, and ongoing model adaptation (Kitchin, 2014; Sculley et al., 2015). As a result, governance mechanisms based on periodic validation, hierarchical control, and discrete accountability structures are structurally incapable of regulating systems that evolve in real time.

This paper addresses this gap by advancing a central claim: data governance must be reconceptualised as an infrastructural capability rather than an organisational function.

From this perspective, governance is not something applied to data systems from the outside. Instead, it is embedded within the architecture of those systems, operating continuously through the configuration of data pipelines, platforms, and machine learning processes. Governance, therefore, becomes an emergent property of socio-technical infrastructures, rather than a layer of external oversight.

To develop this argument, the paper synthesises three bodies of literature:

  1. Data governance and data quality research

  2. Digital infrastructure and information systems theory

  3. Emerging work on AI and machine learning governance

Through this synthesis, it identifies a set of structural tensions that explain the growing inadequacy of existing governance models. These tensions arise from fundamental mismatches between:

  • static control mechanisms and dynamic data flows

  • fragmented organisational architectures and integrated AI systems

  • deterministic governance rules and probabilistic model behaviour

  • input-focused governance and lifecycle-based system behaviour

  • centralised authority structures and distributed infrastructures

These tensions are not merely operational challenges; they are conceptual contradictions that limit the capacity of organisations to govern AI systems effectively.

The analysis is guided by the following research question:

How must data governance be reconfigured to align with the infrastructural and temporal logic of contemporary AI systems?

In addressing this question, the paper makes three contributions.

First, it offers a critical re-evaluation of data governance theory, exposing the implicit assumptions—stability, boundedness, and retrospective control—that underpin existing frameworks. Second, it introduces the concept of governance-as-infrastructure, extending digital infrastructure theory into the domain of data governance and repositioning governance as a continuously operating, embedded system capability. Third, it develops a conceptual framework for AI-ready data governance, identifying key design principles centred on continuous operation, lifecycle integration, and socio-technical alignment.

Taken together, these contributions argue that the challenge of governing AI is not one of incremental improvement, but of conceptual transformation. The future of trustworthy and scalable AI depends on the extent to which governance can be reimagined as an integral component of the infrastructures that produce and sustain intelligent systems.

2. Theoretical Foundations and Limits of Data Governance

2.1 Data Governance as Control: Conceptual Foundations

The dominant perspective in the information systems literature conceptualises data governance as the exercise of authority and control over data assets (Khatri and Brown, 2010). Within this view, governance is operationalised through the allocation of decision rights, the specification of data ownership, and the implementation of policies designed to ensure data quality, security, and compliance. Governance, in essence, is framed as a problem of who decides and how those decisions are enforced.

Subsequent research has elaborated this perspective by examining governance mechanisms such as stewardship roles, standards, and formalised processes (Otto, 2011; Weber, Otto and Österle, 2009). These contributions reinforce a managerial understanding of governance as an organisational function—one that can be designed, implemented, and monitored through formal structures.

However, this body of work rests on a set of implicit but foundational assumptions that are rarely interrogated.

First, it assumes environmental stability: that data contexts change slowly enough for governance to be codified in relatively fixed policies and procedures. Second, it assumes data boundedness: that data can be clearly delineated, owned, and controlled within organisational boundaries. Third, it assumes retrospective governability: that governance can be effectively exercised after data has been created, through monitoring, auditing, and correction.

These assumptions reflect the conditions of early enterprise systems—centralised, structured, and predictable environments in which governance as control is both feasible and effective. Yet, in contemporary data environments, these assumptions no longer hold. Data is increasingly fluid, distributed, and continuously generated across organisational and technical boundaries. As a result, governance models grounded in stability, boundedness, and retrospective control are not simply strained—they are conceptually misaligned with the systems they seek to regulate.

2.2 Data Quality as a Governance Proxy

Within this control-oriented paradigm, data quality has emerged as a central proxy for governance effectiveness. A substantial body of research has developed multidimensional frameworks for assessing quality, including accuracy, completeness, consistency, and timeliness (Wang and Strong, 1996; Batini and Scannapieco, 2016). High-quality data is implicitly treated as evidence of effective governance.

However, this substitution introduces a critical limitation: data quality is inherently retrospective. Quality is assessed after data has been generated, processed, and often already used in decision-making. Governance, therefore, becomes reactive—triggered by the detection of errors rather than preventing their emergence.

This retrospective orientation is particularly problematic in the context of AI systems. As Sculley et al. (2015) demonstrate, machine learning systems exhibit complex and often opaque data dependencies, where minor inconsistencies can propagate through models and amplify over time. In such environments, errors are not isolated events but systemic phenomena, capable of degrading model performance and distorting downstream decisions.

Consequently, the reliance on data quality as a proxy for governance reflects—and reinforces—the assumption of retrospective control. While data quality remains necessary, it is insufficient as a primary governance mechanism in systems where the cost of delayed intervention compounds as errors propagate downstream.

This limitation points to the need for a shift from measurement and remediation toward prevention and continuous assurance, where governance operates concurrently with data processing rather than after the fact.
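To make this distinction concrete, the following sketch contrasts the two modes. All names here (such as `quality_gate`) are hypothetical and purely illustrative; a retrospective audit measures errors already in the store, while a preventive gate stops invalid records at the point of ingestion.

```python
# Illustrative contrast between retrospective and preventive governance.
# Names are hypothetical; real pipelines would use a data validation
# framework rather than hand-rolled checks.

def is_valid(record):
    """A minimal quality rule: age must be present and plausible."""
    return record.get("age") is not None and 0 <= record["age"] <= 120

def retrospective_audit(store):
    """Measure quality after the fact: errors are already in the store."""
    invalid = [r for r in store if not is_valid(r)]
    return len(invalid) / len(store) if store else 0.0

def quality_gate(record, store, quarantine):
    """Preventive control: invalid records never reach the store."""
    (store if is_valid(record) else quarantine).append(record)

incoming = [{"age": 34}, {"age": -5}, {"age": None}, {"age": 61}]

# Retrospective: load everything, then audit later.
audited_store = list(incoming)
error_rate = retrospective_audit(audited_store)  # errors already usable downstream

# Preventive: gate at ingestion; bad records are quarantined immediately.
store, quarantine = [], []
for record in incoming:
    quality_gate(record, store, quarantine)

print(error_rate)                    # 0.5
print(len(store), len(quarantine))   # 2 2
```

In the preventive configuration, governance and processing are the same act: the store never contains an invalid record, so no remediation cycle is needed.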

2.3 Governance and the Neglect of Infrastructure

A more fundamental limitation of the data governance literature lies in its limited engagement with digital infrastructure theory. Tilson, Lyytinen and Sørensen (2010) argue that contemporary digital systems are best understood as infrastructures: shared, evolving, and layered socio-technical systems that enable a wide range of activities.

Such infrastructures exhibit three defining characteristics:

  • Modularity, enabling recombination and extensibility

  • Distributed control, involving multiple actors and stakeholders

  • Generativity, allowing for unanticipated uses and innovations

Despite these properties, data governance research has remained largely focused on organisational hierarchies and formal control mechanisms. Governance is treated as something that organisations apply to data, rather than something that is embedded within the infrastructures through which data is produced and circulated.

This omission has significant consequences. Governance mechanisms designed for centralised and hierarchical systems are ill-suited to infrastructures that are inherently distributed, dynamic, and evolving. As a result, governance becomes increasingly decoupled from system operation, existing as an external layer that cannot effectively shape or constrain system behaviour.

Hanseth and Lyytinen (2010) further emphasise that infrastructures evolve through path-dependent processes, making them resistant to top-down intervention. This suggests that governance cannot be imposed exogenously, but must instead be designed into the architecture of the system itself.

The neglect of infrastructure, therefore, represents not just a gap in the literature, but a fundamental limitation in how governance is conceptualised.

2.4 The Emergence of AI and the Reconfiguration of Data

The rise of AI intensifies these limitations by fundamentally reconfiguring the relationship between data and decision-making. In traditional systems, data supports decisions; in AI systems, data actively constitutes decision-making processes. Kitchin (2014) characterises this shift as the emergence of data-driven systems, where decisions are generated through continuous analysis of real-time data streams. Similarly, Amershi et al. (2019) emphasise that machine learning systems are not static artefacts, but evolving systems shaped by ongoing data inputs and feedback loops.

This transformation introduces three critical properties. First, temporal compression: decision-making occurs in real time, significantly reducing the window for governance intervention. Second, opacity: complex models limit transparency and challenge traditional forms of accountability (Burrell, 2016). Third, adaptivity: systems evolve over time, rendering static governance frameworks obsolete.

Taken together, these properties directly undermine the assumptions identified earlier. Stability is replaced by continuous change, boundedness by distributed data flows, and retrospective control by the need for real-time intervention. As a result, the emergence of AI does not merely extend existing governance challenges—it renders the prevailing conceptual model of governance inadequate.


3. Structural Tensions in AI-Driven Data Environments

Building on the limitations identified in the preceding section, this paper argues that the inadequacy of existing data governance frameworks can be understood through a set of structural tensions. These tensions are not isolated challenges, but systematic misalignments between the assumptions underpinning traditional governance models—stability, boundedness, and retrospective control—and the operational realities of AI-driven data environments.

Taken together, these tensions constitute a conceptual framework of governance misfit, explaining why incremental adaptations to existing models are insufficient. Rather than representing implementation gaps, they reveal deep incompatibilities between governance paradigms and infrastructural systems.

3.1 Temporal Tension: Periodic Governance vs Continuous Systems

Traditional data governance operates on a periodic temporal logic, structured around audits, reviews, and reporting cycles. These mechanisms assume that data systems evolve slowly enough to allow for intermittent oversight and intervention. In contrast, AI systems operate on a continuous temporal logic, processing streaming data and generating decisions in real time. Model outputs are produced instantaneously, and system behaviour evolves dynamically through feedback loops and ongoing data ingestion.

This creates a fundamental temporal tension:

  • Governance is episodic and lagging

  • Systems are continuous and real-time

As Alles, Kogan and Vasarhelyi (2008) suggest in the context of continuous auditing, traditional oversight mechanisms are increasingly unable to keep pace with system activity. However, such approaches have not been fully extended into data governance. The consequence is a persistent governance lag, in which control is exercised only after decisions have already been made. In high-stakes environments, this lag is not merely inefficient—it is risk-generating, allowing errors and biases to propagate before intervention is possible.

This temporal tension directly reflects the breakdown of retrospective governability, highlighting the need for governance mechanisms that operate synchronously with system execution.

3.2 Structural Tension: Fragmentation vs Integration

Data governance frameworks are typically organised around organisational and system boundaries, reflecting the assumption that data can be segmented, owned, and managed within discrete domains. This results in governance structures that are inherently fragmented, with responsibility distributed across silos.

AI systems, however, depend on integrated data environments. Their performance relies on the aggregation and reconciliation of data from multiple sources, often spanning organisational, technical, and even institutional boundaries. As Dong and Srivastava (2015) demonstrate, inconsistencies in data representation across sources can significantly degrade system performance.

This produces a structural tension:

  • Governance is localised and siloed

  • AI systems require integration and interoperability

The persistence of fragmentation is not merely a technical issue but a governance outcome, reinforced by decentralised ownership and inconsistent standards. As a result, governance structures actively inhibit the integration that AI systems depend upon. This tension extends the breakdown of data boundedness, revealing that governance models premised on clear ownership and control struggle in environments where data is inherently distributed and relational. Resolving this tension requires a shift from domain-level governance to ecosystem-level coordination, where governance operates across, rather than within, system boundaries.
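The reconciliation problem behind this tension can be illustrated with a minimal sketch, assuming two hypothetical source systems that represent the same customer entity differently. Field names and mappings are illustrative only; real programmes would rely on master data management tooling and agreed standards.

```python
# Sketch of cross-source reconciliation into a unified model.
# Source schemas and field names are hypothetical.

def from_crm(row):
    return {"customer_id": row["CustID"], "country": row["Country"].upper()}

def from_billing(row):
    return {"customer_id": row["customer"], "country": row["country_code"].upper()}

crm_rows     = [{"CustID": "c-1", "Country": "de"}]
billing_rows = [{"customer": "c-1", "country_code": "DE"},
                {"customer": "c-2", "country_code": "fr"}]

# Map every source into the shared model, then merge on the common key.
unified = {}
for rec in [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]:
    unified.setdefault(rec["customer_id"], rec)

print(sorted(unified))            # ['c-1', 'c-2']
print(unified["c-1"]["country"])  # 'DE' regardless of source formatting
```

The point is structural: the mapping functions encode a cross-boundary agreement that no single silo's governance can enforce on its own.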

3.3 Epistemic Tension: Deterministic Rules vs Probabilistic Models

Traditional governance frameworks are grounded in a deterministic epistemology, where rules define acceptable states and deviations can be clearly identified and corrected. This approach assumes that system behaviour is predictable, transparent, and rule-governed.

AI systems, by contrast, operate on a probabilistic epistemology. Machine learning models generate outputs based on statistical inference, producing predictions that are inherently uncertain and context-dependent.

This creates an epistemic tension:

  • Governance assumes certainty and rule-based logic

  • AI systems operate under uncertainty and statistical inference

As Burrell (2016) argues, algorithmic opacity arises not only from technical complexity but from the probabilistic nature of models themselves. This undermines governance mechanisms that rely on clear causal explanations and deterministic accountability. The consequence is a loss of epistemic alignment: governance frameworks are unable to meaningfully interpret, evaluate, or constrain system behaviour because they rely on fundamentally incompatible assumptions about how knowledge is produced. This tension amplifies the challenges identified in earlier sections, as uncertainty interacts with temporal compression and system complexity, further limiting the effectiveness of retrospective and rule-based governance approaches.

3.4 Lifecycle Tension: Input Control vs System Evolution

Existing governance frameworks are predominantly focused on data inputs, emphasising validation, cleansing, and standardisation at the point of entry. This reflects an implicit assumption that controlling inputs is sufficient to ensure system integrity.

However, AI systems are lifecycle-driven, transforming data across multiple stages, including training, validation, deployment, and continuous learning. At each stage, data is reshaped, reinterpreted, and fed back into the system, influencing future behaviour.

This creates a lifecycle tension:

  • Governance is input-focused

  • AI systems are evolutionary and recursive

As Rahm and Do (2000) highlight, data cleaning and integration are critical, but they represent only a fraction of the processes that shape data within AI systems. Post-deployment phenomena such as model drift and feedback loops (Gama et al., 2014) further complicate the picture, introducing dynamics that cannot be addressed through input controls alone. The result is a governance blind spot, where significant transformations of data—and their implications for system behaviour—occur outside the scope of governance. This tension reinforces the inadequacy of retrospective and bounded models of governance, demonstrating the need for end-to-end lifecycle visibility and control.
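As a minimal illustration of post-deployment monitoring, the sketch below flags drift when a live feature window departs from its training-time reference distribution. This is a deliberately simple mean-shift heuristic, not a recommended test; production monitors use proper statistics such as Kolmogorov–Smirnov tests or population stability indices.

```python
# Toy drift monitor for a single numeric feature: flag drift when the
# live mean moves more than `threshold` reference standard deviations.
import statistics

def drift_score(reference, live):
    """Distance of the live mean from the reference mean, in reference std devs."""
    mu = statistics.mean(reference)
    sigma = statistics.stdev(reference) or 1.0  # guard against zero variance
    return abs(statistics.mean(live) - mu) / sigma

def drifted(reference, live, threshold=3.0):
    return drift_score(reference, live) > threshold

reference = [10.0, 11.0, 9.5, 10.5, 10.0, 9.0, 11.5]  # training-time values
stable    = [10.2, 9.8, 10.6]                          # similar distribution
shifted   = [25.0, 26.5, 24.0]                         # distribution has moved

print(drifted(reference, stable))    # False: within tolerance
print(drifted(reference, shifted))   # True: triggers review or retraining
```

Such a check only matters because it runs continuously after deployment; an input-focused control would never observe the shift at all.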

3.5 Control Tension: Centralised Authority vs Distributed Systems

Finally, traditional governance models rely on centralised authority structures, where decision rights are clearly assigned and enforced through hierarchical mechanisms. This reflects the assumption that governance can be coordinated from a central point of control. Digital infrastructures, however, are increasingly distributed, involving multiple actors, platforms, and systems that evolve independently yet remain interconnected (Weill and Woerner, 2018).

This creates a control tension:

  • Governance is centralised and hierarchical

  • Systems are distributed and networked

In such environments, centralised governance becomes both ineffective and constraining. It struggles to scale across complex systems and often becomes a bottleneck, slowing down innovation and system responsiveness.

Moreover, distributed systems diffuse responsibility, making it difficult to assign accountability using traditional role-based models. Governance, therefore, becomes fragmented and inconsistent, further exacerbating the structural and lifecycle tensions identified above.

This tension reflects the breakdown of all three core assumptions—stability, boundedness, and retrospective control—culminating in a governance model that cannot effectively operate within distributed infrastructures.

3.6 Synthesis: From Tensions to Conceptual Breakdown

Taken together, these five tensions reveal a fundamental insight: the failure of contemporary data governance is not due to poor implementation, but to a mismatch between governance paradigms and infrastructural realities. Each tension represents a different dimension of this mismatch—temporal, structural, epistemic, lifecycle, and control—but they are deeply interconnected. Temporal lag amplifies lifecycle blind spots; fragmentation undermines integration; probabilistic uncertainty challenges control and accountability.

Collectively, they demonstrate that existing governance frameworks are systematically misaligned with AI-driven systems. This analysis leads directly to the central implication of the paper: data governance cannot be reformed within its current conceptual paradigm—it must be reconceptualised. The following section develops this reconceptualisation by introducing the concept of governance-as-infrastructure, which resolves these tensions by embedding governance mechanisms directly within the architecture and operation of data systems.

4. Toward Governance-as-Infrastructure

The structural tensions identified in the preceding section demonstrate that the limitations of contemporary data governance are not merely operational but conceptual. Existing frameworks are grounded in a paradigm of governance as external control—applied to data systems through policies, roles, and oversight mechanisms. As shown, this paradigm is fundamentally incompatible with the continuous, distributed, and adaptive nature of AI-driven data environments.

This section advances a reconceptualisation: Data governance should be understood as an infrastructural capability embedded within the architecture and operation of data systems. Rather than being imposed externally, governance is enacted internally—through the design, configuration, and interaction of socio-technical components. In this view, governance is not a layer on top of data systems, but a property of them.

4.1 Defining Governance-as-Infrastructure

Governance-as-infrastructure can be defined as: the embedding of governance mechanisms within data architectures such that control, accountability, and quality are continuously enacted through system design and operation.

This definition departs from traditional models in three key ways. First, it repositions governance from an organisational function to a system property. Governance is no longer primarily enacted through roles, committees, or policies, but through the configuration of data pipelines, platforms, and machine learning systems. Second, it shifts governance from episodic intervention to continuous execution. Control mechanisms operate in real time, triggered by data events and system behaviour rather than periodic review cycles. Third, it redefines accountability as systemic rather than role-based. Responsibility is distributed across the architecture, embedded in processes such as data validation, model monitoring, and feedback management. Together, these shifts establish governance as an emergent and operational characteristic of infrastructure, rather than an external constraint.

4.2 Core Properties of Governance-as-Infrastructure

To move beyond abstraction, governance-as-infrastructure can be specified through four core properties. These properties define how governance operates within AI-driven systems and collectively distinguish it from traditional models.

1. Embeddedness

Governance mechanisms are integrated directly into data systems, including pipelines, platforms, and models.

  • Validation rules are executed within data flows

  • Monitoring is built into system processes

  • Controls are enacted automatically during operation

This eliminates the separation between system execution and governance oversight.
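This embeddedness can be sketched in code. In the illustrative example below (names such as `validate_stage` are hypothetical, not a specific framework's API), the validation rule is itself a stage of the pipeline, and audit events are produced as a by-product of normal operation.

```python
# Sketch of embeddedness: validation is a pipeline stage, not a
# separate audit. All names are illustrative.

audit_log = []  # governance events emitted during normal operation

def validate_stage(records, rule, rule_name):
    """Pipeline stage: enforce a rule inline and log every violation."""
    for r in records:
        if rule(r):
            yield r
        else:
            audit_log.append({"rule": rule_name, "record": r})

def enrich_stage(records):
    """Downstream stage: only ever sees records that passed validation."""
    for r in records:
        yield {**r, "amount_cents": round(r["amount"] * 100)}

raw = [{"amount": 12.5}, {"amount": -3.0}, {"amount": 7.25}]

pipeline = enrich_stage(
    validate_stage(raw, lambda r: r["amount"] >= 0, "non_negative_amount")
)
processed = list(pipeline)

print(len(processed))          # 2: the invalid record never reached enrichment
print(audit_log[0]["rule"])    # non_negative_amount
```

Because the control executes inside the data flow, there is no moment at which the system runs ungoverned and no separate audit step to schedule.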

2. Continuity

Governance operates continuously rather than periodically.

  • Controls are triggered by events, not schedules

  • Monitoring occurs in real time

  • Interventions are immediate rather than delayed

This directly resolves the temporal tension identified in Section 3 by aligning governance with the continuous nature of AI systems.

3. Distribution

Governance is enacted across multiple components and actors rather than centralised.

  • Responsibility is shared across systems and teams

  • Control mechanisms are decentralised

  • Governance scales with system complexity

This addresses the control tension, enabling governance to function within distributed infrastructures.

4. Lifecycle Integration

Governance extends across the full lifecycle of data and AI systems.

  • From data ingestion to model deployment and monitoring

  • Including feedback loops and system evolution

  • With traceability across all transformations

This resolves the lifecycle tension, ensuring that governance is not limited to inputs but encompasses ongoing system behaviour.

4.3 Resolving the Structural Tensions

The significance of governance-as-infrastructure lies in its ability to systematically resolve the tensions identified in Section 3.

  • The temporal tension is addressed through continuous, event-driven governance mechanisms that operate in real time.

  • The structural tension is mitigated through integrated architectures that embed governance across data ecosystems rather than within silos.

  • The epistemic tension is addressed by incorporating probabilistic monitoring, performance metrics, and model evaluation directly into system operation.

  • The lifecycle tension is resolved through end-to-end governance spanning data and model lifecycles.

  • The control tension is addressed through distributed governance mechanisms embedded across infrastructures.

Rather than attempting to adapt existing governance models to these challenges, governance-as-infrastructure reconfigures the underlying logic of governance itself, aligning it with the properties of AI-driven systems.

4.4 Theoretical Foundations

This reconceptualisation is grounded in digital infrastructure theory, which emphasises that modern information systems are shared, evolving, and layered socio-technical constructs (Tilson et al., 2010; Hanseth and Lyytinen, 2010). Within such systems, control cannot be effectively imposed from above; it must be designed into the architecture.

At the same time, emerging research in AI governance highlights the need for:

  • continuous monitoring

  • built-in accountability

  • lifecycle management (Amershi et al., 2019)

Governance-as-infrastructure integrates these perspectives by extending infrastructure theory into the domain of governance and aligning it with the operational realities of machine learning systems. In doing so, it bridges a critical gap in the literature, providing a conceptual model that connects data governance, digital infrastructure, and AI system design.

4.5 Implications of the Reconfiguration

Reconceptualising governance as infrastructure has several implications for both theory and practice. First, it redefines the locus of governance. Governance shifts from organisational structures to technical architectures, making system design the primary site of control. Second, it reconfigures the role of time. Governance is no longer retrospective but operates in synchrony with system processes. Third, it transforms accountability. Responsibility is embedded in workflows, code, and system interactions, rather than assigned solely to individuals or roles. Fourth, it blurs the boundary between governance and engineering. Data engineers, platform architects, and machine learning practitioners become central actors in governance, as their design choices directly determine governance outcomes. These implications highlight that governance-as-infrastructure is not simply a technical adjustment, but a socio-technical transformation that reshapes how organisations design, operate, and regulate data systems.

4.6 Transition to AI-Ready Governance

Having established governance-as-infrastructure as a conceptual foundation, the next section develops this perspective into a multi-dimensional framework for AI-ready data governance.

Specifically, it operationalises this model across four dimensions:

  • temporal (continuous governance)

  • architectural (integrated ecosystems)

  • lifecycle (end-to-end control)

  • epistemic (probabilistic governance)

Through this framework, the paper moves from conceptual redefinition to practical design principles, demonstrating how governance-as-infrastructure can be implemented in AI-driven environments.

5. AI-Ready Data Governance: A Conceptual Framework

Building on the reconceptualisation of governance as infrastructure, this section develops a multi-dimensional framework for AI-ready data governance. The framework operationalises governance-as-infrastructure by translating its core properties—embeddedness, continuity, distribution, and lifecycle integration—into four interrelated dimensions of system design.

These dimensions—temporal, architectural, lifecycle, and epistemic—collectively define how governance must be configured to align with the operational logic of AI-driven systems. Rather than representing independent components, they function as an integrated model in which each dimension reinforces the others.

5.1 Temporal Dimension: Continuous and Event-Driven Governance

The temporal dimension operationalises the principle of continuity, replacing periodic governance with continuous, event-driven control mechanisms.

In AI-driven environments, governance must operate synchronously with system processes. This requires architectures in which governance is triggered by data events—such as ingestion, transformation, or model inference—rather than scheduled audits or reviews.

Key design elements include:

  • Streaming data pipelines, enabling real-time data processing

  • Embedded validation rules, executed during data flow

  • Automated anomaly detection, identifying deviations as they occur

  • Real-time monitoring systems, providing continuous visibility

Through these mechanisms, governance becomes co-extensive with system operation, eliminating the lag between action and oversight.

However, continuous governance also introduces new risks, including over-automation and alert fatigue. As such, it requires:

  • calibration mechanisms

  • threshold tuning

  • human-in-the-loop escalation processes

This dimension directly resolves the temporal misalignment identified in Section 3 by ensuring that governance operates at the same speed as AI systems.

5.2 Architectural Dimension: Integrated and Governable Data Ecosystems

The architectural dimension reflects the principle of embeddedness, focusing on how governance is built into the structure of data systems.

AI-ready governance requires a shift from fragmented, siloed architectures to integrated data ecosystems in which governance mechanisms are embedded across systems and domains.

Core components include:

  • Unified data models and schemas, enabling consistency

  • Interoperability standards, supporting cross-system integration

  • Master data management (MDM), ensuring entity consistency

  • Data lineage and traceability systems, providing visibility across transformations

In this configuration, governance is not applied externally but is encoded within the architecture itself. Data structures, interfaces, and pipelines become the primary vehicles through which governance is enacted.
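To make the lineage component concrete, the sketch below records each transformation's inputs so that any derived dataset can be traced back to its raw sources. Dataset and step names are hypothetical; production systems would typically rely on dedicated lineage tooling rather than this minimal structure.

```python
# Minimal sketch of lineage tracking encoded in the pipeline itself:
# every transformation registers its inputs, making provenance a
# queryable property of the architecture. Names are illustrative.
class LineageGraph:
    def __init__(self):
        self.parents = {}  # dataset -> {"inputs": [...], "step": name}

    def record(self, output: str, inputs: list[str], step: str) -> None:
        self.parents[output] = {"inputs": inputs, "step": step}

    def trace(self, dataset: str) -> list[str]:
        """Walk upstream until the raw sources of a dataset are reached."""
        node = self.parents.get(dataset)
        if node is None:
            return [dataset]  # no recorded parent: a raw source
        sources = []
        for parent in node["inputs"]:
            sources.extend(self.trace(parent))
        return sources

g = LineageGraph()
g.record("clean_orders", ["raw_orders"], step="deduplicate")
g.record("features", ["clean_orders", "raw_customers"], step="join")

g.trace("features")  # ['raw_orders', 'raw_customers']
```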

At the same time, integration introduces systemic risk, as failures can propagate across interconnected systems. Effective governance therefore requires balancing:

  • integration (for AI performance)

  • modularity (for system resilience)

This dimension resolves the structural tension between fragmentation and integration by repositioning governance as a property of ecosystem design rather than organisational boundaries.

5.3 Lifecycle Dimension: End-to-End Governance of AI Systems

The lifecycle dimension operationalises lifecycle integration, extending governance across the full trajectory of data and model development. AI systems are not static artefacts; they evolve through iterative cycles of training, deployment, and adaptation. Governance must therefore encompass all stages of this lifecycle.

This includes:

  1. Data collection and ingestion → provenance, consent, and quality controls

  2. Preprocessing and feature engineering → transformation traceability

  3. Model training and validation → bias, representativeness, and robustness

  4. Deployment and inference → performance monitoring and reliability

  5. Post-deployment monitoring → drift detection and feedback management

Each stage introduces distinct governance requirements, but these must be coordinated as part of a unified lifecycle system.

Key mechanisms include:

  • continuous model performance monitoring

  • automated drift detection systems

  • versioning and reproducibility frameworks

  • end-to-end documentation and audit trails

By embedding governance across these stages, organisations eliminate the lifecycle blind spot identified in Section 3 and ensure that system behaviour remains observable and controllable over time.
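Drift detection, one of the mechanisms listed above, can be sketched with the Population Stability Index (PSI), a common heuristic for comparing a live score distribution against the one seen at training time. The 0.2 alert threshold and the data below are illustrative assumptions, not calibrated values.

```python
# Illustrative drift check via the Population Stability Index (PSI).
# A PSI above ~0.2 is often treated as material drift, but thresholds
# are assumptions that must be tuned per system.
import math

def psi(expected, actual, bins=10):
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[-1] = float("inf")  # catch live values above the training range

    def frac(values, i):
        count = sum(1 for v in values if edges[i] <= v < edges[i + 1])
        return max(count / len(values), 1e-6)  # floor avoids log(0)

    return sum((frac(actual, i) - frac(expected, i))
               * math.log(frac(actual, i) / frac(expected, i))
               for i in range(bins))

train = [0.1 * i for i in range(100)]        # scores seen at training time
live  = [0.1 * i + 4.0 for i in range(100)]  # shifted live scores

drifted = psi(train, live) > 0.2  # True: the live distribution has moved
```

In a lifecycle-governed system, a breach of the threshold would trigger the feedback mechanisms above (retraining, review, or rollback) rather than a manual audit.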

5.4 Epistemic Dimension: Governing Probabilistic Systems

The epistemic dimension operationalises the principle of distribution, recognising that governance must function within environments characterised by uncertainty and probabilistic reasoning. Traditional governance assumes deterministic outcomes; AI systems do not. Governance must therefore shift toward managing uncertainty, risk, and statistical performance.

This involves:

  • defining acceptable performance thresholds (e.g., precision, recall, error rates)

  • continuously evaluating model outputs

  • implementing explainability and interpretability mechanisms

  • enabling auditability of model decisions and behaviour

Rather than enforcing fixed rules, governance becomes a process of ongoing evaluation and calibration, adapting to changes in data and system performance.
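The shift from fixed rules to threshold-based evaluation can be sketched as follows. The threshold values and the toy labels are illustrative assumptions; in practice they would be calibrated per use case and risk appetite.

```python
# Sketch of probabilistic governance: model outputs are judged against
# statistical thresholds, and breaches trigger escalation rather than
# a hard pass/fail verdict. Thresholds below are illustrative.
def evaluate(y_true, y_pred, min_precision=0.90, min_recall=0.80):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    breaches = []
    if precision < min_precision:
        breaches.append(f"precision {precision:.2f} < {min_precision}")
    if recall < min_recall:
        breaches.append(f"recall {recall:.2f} < {min_recall}")
    return breaches  # a non-empty list prompts review, not shutdown

breaches = evaluate(y_true=[1, 1, 1, 0, 0], y_pred=[1, 1, 0, 1, 0])
# both precision and recall sit at 0.67, breaching both thresholds
```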

Importantly, epistemic governance is inherently distributed:

  • across models, datasets, and evaluation metrics

  • across technical and organisational actors

  • across different stages of the lifecycle

This dimension resolves the epistemic tension by aligning governance with the probabilistic nature of AI systems, enabling organisations to manage uncertainty rather than attempting to eliminate it.

5.5 Integrated Framework: Governance as a Systemic Capability

While each dimension addresses a specific aspect of governance, their full significance emerges in combination.

  • Continuous governance (temporal) requires integrated architectures (architectural)

  • Lifecycle visibility (lifecycle) depends on embedded data systems (architectural)

  • Probabilistic evaluation (epistemic) relies on continuous monitoring (temporal)

Together, these interdependencies demonstrate that AI-ready governance is not a collection of discrete mechanisms, but a systemic capability emerging from the interaction of multiple infrastructural components.

This reinforces the central argument of the paper: Governance is not an external function applied to AI systems—it is an emergent property of the infrastructures that constitute them.

5.6 Design Principles for AI-Ready Governance

From this framework, a set of overarching design principles can be derived:

  1. Embed governance in system architecture rather than organisational overlays

  2. Operate governance continuously, aligned with real-time data flows

  3. Extend governance across the full lifecycle of data and models

  4. Design for probabilistic accountability, incorporating uncertainty into governance mechanisms

  5. Distribute governance across systems and actors, enabling scalability

These principles provide a foundation for both theoretical development and practical implementation, translating governance-as-infrastructure into actionable system design.

6. Socio-Technical Alignment in AI Governance

The reconceptualization of data governance as infrastructure has implications that extend beyond system design. Because governance is embedded within socio-technical architectures, its transformation necessitates simultaneous changes in organisational structures, regulatory approaches, and ethical frameworks.

This section argues that AI-ready governance is not purely a technical achievement, but a problem of alignment across technical, organisational, and institutional domains. Without such alignment, even well-designed governance infrastructures risk becoming ineffective or misapplied.

6.1 Organisational Transformation: From Roles to Capabilities

Traditional data governance is organised around roles and responsibilities, such as data owners, stewards, and governance committees. These structures assume that governance can be coordinated through hierarchical oversight and clearly defined accountability. However, under governance-as-infrastructure, control is embedded within systems and enacted continuously through technical processes. As a result, governance shifts from a role-based model to a capability-based model, where organisational effectiveness depends on the ability to design, operate, and maintain governance-enabled infrastructures.

This transformation has several implications. First, governance becomes inseparable from engineering practice. Data engineers, platform architects, and machine learning practitioners are no longer peripheral to governance—they are its primary agents, as their design decisions directly determine how governance is enacted in practice.

Second, traditional roles must evolve. Data stewards, for example, must move beyond static data quality management to engage with:

  • real-time data flows

  • model lifecycle monitoring

  • cross-system dependencies

Third, governance requires cross-functional integration. Because governance is distributed across systems, it cannot be confined to a single organisational unit. Instead, it must be coordinated across:

  • engineering teams

  • risk and compliance functions

  • business units

This reflects a broader shift toward platform-based organisational models, where capabilities are embedded across functions rather than centralised.

Without this organisational transformation, governance-as-infrastructure cannot be effectively realised, as the technical embedding of governance must be matched by corresponding human and institutional capabilities.

6.2 Regulatory Transformation: From Audit to Continuous Compliance

Regulatory approaches to data and AI governance have traditionally relied on documentation, auditability, and periodic review. These mechanisms reflect the same assumptions identified earlier: stability, boundedness, and retrospective control. However, AI systems challenge these assumptions by operating in real time, evolving continuously, and producing decisions at scale. As a result, traditional regulatory models are increasingly misaligned with system behaviour. Under governance-as-infrastructure, compliance must be reconceptualised as a continuous and system-integrated process.

This implies several shifts. Firstly, regulation must move toward machine-readable and executable forms, enabling rules to be embedded directly within systems and enforced automatically during operation. Secondly, compliance must become continuous rather than periodic, supported by real-time monitoring and reporting mechanisms that align with the temporal dynamics of AI systems. Thirdly, auditing must evolve into system-level observability, focusing not only on outcomes but on the processes through which those outcomes are generated. This includes:

  • data lineage tracking

  • model behaviour monitoring

  • lifecycle traceability

Bamberger’s (2010) notion of responsive regulation is particularly relevant here, as it emphasises adaptability and feedback. Governance-as-infrastructure operationalises this idea by enabling regulation to be enacted dynamically through system design.
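One way to picture machine-readable regulation is a rule expressed as data rather than as a policy document, so it can be enforced automatically at the point of data use. The rule name, fields, and retention limit below are hypothetical illustrations, not references to any actual regulation.

```python
# Sketch of a machine-readable compliance rule: the constraint is data,
# the enforcement is code, and both run continuously inside the system.
# Rule names, fields, and the 365-day limit are hypothetical.
from datetime import date

RETENTION_RULE = {
    "name": "max-retention-days",
    "applies_to": "personal_data",
    "limit_days": 365,
}

def check_retention(record, rule, today):
    if record.get("category") != rule["applies_to"]:
        return True  # rule does not apply to this record
    age_days = (today - record["collected"]).days
    return age_days <= rule["limit_days"]

ok = check_retention(
    {"category": "personal_data", "collected": date(2023, 1, 1)},
    RETENTION_RULE,
    today=date(2024, 6, 1),
)
# the record is older than the retention limit, so the check fails
```

Because the rule is data, updating the regulation means updating a record, not redeploying the system — the adaptability Bamberger's responsive regulation calls for.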

However, this transformation also raises challenges. Embedding regulation within systems risks:

  • over-formalisation of complex ethical and legal principles

  • reduced flexibility in ambiguous contexts

  • dependence on technical implementation quality

As such, regulatory transformation must balance automation with interpretive oversight, ensuring that continuous compliance does not become rigid or reductionist.

6.3 Ethical Alignment: From Principles to Embedded Practice

Ethical concerns in AI—such as bias, fairness, accountability, and transparency—are widely recognised (Barocas and Selbst, 2016; Mittelstadt et al., 2016). However, these concerns are often addressed through high-level principles or post hoc evaluations, rather than being systematically integrated into system design.

Governance-as-infrastructure fundamentally alters this approach by requiring that ethical considerations be embedded within the operation of data and AI systems.

This entails a shift from:

  • principle-based ethics → infrastructure-embedded ethics

In practice, this means that ethical objectives must be translated into:

  • measurable performance metrics (e.g., fairness constraints)

  • monitoring systems (e.g., bias detection)

  • intervention mechanisms (e.g., model retraining or rollback)

For example, fairness is no longer treated as an abstract goal, but as a continuously evaluated property of system outputs, integrated into model monitoring pipelines.
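As a minimal sketch of fairness as a monitored property, the demographic parity gap — the difference in positive-outcome rates between groups — can be computed continuously over system outputs. The 0.1 tolerance and the toy decisions are illustrative assumptions; real constraints depend on context and law.

```python
# Illustrative fairness metric: demographic parity difference, i.e. the
# gap in positive-decision rates across groups. The tolerance is an
# assumption to be set per context, not a universal standard.
def parity_gap(outcomes):
    """outcomes: list of (group, decision) pairs with decision in {0, 1}."""
    rates = {}
    for group, decision in outcomes:
        total, positives = rates.get(group, (0, 0))
        rates[group] = (total + 1, positives + decision)
    shares = [positives / total for total, positives in rates.values()]
    return max(shares) - min(shares)

decisions = [("a", 1), ("a", 1), ("a", 0), ("b", 1), ("b", 0), ("b", 0)]
gap = parity_gap(decisions)   # group a: 2/3 positive, group b: 1/3
needs_review = gap > 0.1      # breach feeds retraining or rollback
```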

Similarly, transparency becomes a function of system design, supported by:

  • explainability tools

  • traceability mechanisms

  • accessible model documentation

This approach resolves a key limitation of traditional ethical frameworks: their reliance on external evaluation. By embedding ethics within infrastructure, governance ensures that ethical considerations are operationalised, measurable, and enforceable.

However, ethical embedding also introduces new tensions. Not all ethical principles can be easily quantified or codified, and excessive formalisation risks narrowing ethical interpretation. This highlights the need for hybrid governance models, combining technical embedding with human judgement and societal oversight.

6.4 Socio-Technical Integration: Governance as Alignment

Taken together, these transformations demonstrate that governance-as-infrastructure is fundamentally a problem of alignment.

  • Technical systems must support continuous, embedded governance

  • Organisations must develop the capabilities to design and operate such systems

  • Regulatory frameworks must adapt to continuous, system-level compliance

  • Ethical principles must be translated into operational mechanisms

Failure in any one of these domains undermines the effectiveness of governance as a whole. For example:

  • advanced technical systems without organisational capability lead to mismanagement

  • strong regulation without technical embedding leads to symbolic compliance

  • ethical principles without operationalisation lead to limited impact

Thus, AI-ready governance emerges not from isolated interventions, but from the integration of socio-technical elements into a coherent system.

This reinforces the central argument of the paper:

governance is not an external constraint on AI systems, but an emergent property of aligned socio-technical infrastructures.

7. Implications for the Future of AI Innovation

The reconceptualization of data governance as infrastructure has profound implications not only for control and accountability, but for the nature and trajectory of AI innovation itself. Contrary to the prevailing assumption that governance constrains innovation, this paper argues that, in AI-driven environments, innovation is structurally dependent on governance capabilities embedded within data infrastructures.

This section develops this claim by examining how governance-as-infrastructure reshapes the conditions under which AI systems can be developed, scaled, and sustained.

7.1 Governance as a Precondition for Scalable AI Innovation

Traditional perspectives often position governance as an external constraint—something that must be balanced against speed, experimentation, and flexibility. This view may hold in environments where innovation is episodic and only loosely coupled to data dependencies.

However, AI systems fundamentally alter this relationship. Because AI performance depends on continuous data flows, model retraining, and feedback loops, innovation is inseparable from the quality, reliability, and governability of underlying data infrastructures.

In this context, governance-as-infrastructure becomes a precondition for scalability.

Without embedded governance:

  • data pipelines degrade in quality and consistency

  • models accumulate technical and statistical debt (Sculley et al., 2015)

  • feedback loops amplify errors and biases

  • system behaviour becomes unpredictable and difficult to control

As a result, innovation cannot scale beyond isolated use cases. What appears as rapid experimentation often results in fragile systems that fail under production conditions.

By contrast, governance-enabled infrastructures support:

  • continuous validation, ensuring data integrity in real time

  • lifecycle monitoring, maintaining model performance over time

  • traceability, enabling reproducibility and debugging

  • controlled adaptation, allowing systems to evolve without loss of reliability

Thus, governance does not slow innovation—it makes sustained innovation possible.

7.2 From Experimentation to Systemic Innovation

Governance-as-infrastructure also transforms the nature of innovation itself. In traditional settings, innovation is often understood as discrete experimentation—developing and deploying individual models or applications.

In AI-driven environments, innovation becomes systemic and cumulative, emerging from the interaction of data, models, and infrastructure over time.

This shift is enabled by three characteristics of governance-as-infrastructure:

  • Continuity enables ongoing experimentation within production systems, where models can be updated, evaluated, and improved in real time.

  • Lifecycle integration ensures that insights from deployment feed back into training and design, creating iterative learning loops.

  • Embeddedness allows innovation to occur directly within data pipelines and platforms, rather than in isolated development environments.

As a result, innovation is no longer confined to discrete projects. It becomes a continuous process of system evolution, where improvements are incrementally integrated into operational infrastructures.

This redefines innovation from:

  • episodic deployment → continuous adaptation

and from:

  • standalone models → interconnected systems

Governance-as-infrastructure is what stabilises this process, ensuring that continuous change does not lead to instability or loss of control.

7.3 Competitive Differentiation and Infrastructural Advantage

The shift toward governance-enabled infrastructures also has implications for competitive dynamics. In AI-driven markets, competitive advantage is increasingly determined not by access to algorithms, but by the ability to operationalise and sustain AI systems at scale.

Governance-as-infrastructure becomes a key differentiator because it enables:

  • reliable scaling of AI systems across domains

  • faster iteration cycles without loss of control

  • integration of diverse data sources into unified systems

  • consistent performance in dynamic environments

Firms that lack these capabilities face a structural limitation: they can experiment with AI, but cannot industrialise it.

This suggests that competitive advantage shifts from:

  • model performance alone → infrastructural capability, including governance

In this sense, governance is not a compliance function but a strategic asset, shaping the ability of organisations to generate, deploy, and sustain AI-driven value.

7.4 Systemic Risk and the Limits of Infrastructural Innovation

While governance-as-infrastructure enables innovation, it also introduces new forms of systemic risk. As data systems become more integrated, continuous, and automated, failures can propagate across interconnected infrastructures.

These risks arise from the same properties that enable innovation:

  • Integration increases interdependence, allowing local failures to have system-wide effects

  • Continuity reduces opportunities for intervention, amplifying the speed of error propagation

  • Adaptivity introduces unpredictability, as systems evolve in response to changing data

As a result, innovation and risk become co-produced within the same infrastructural systems.

Governance-as-infrastructure must therefore incorporate mechanisms for:

  • resilience, ensuring systems can absorb and recover from disruptions

  • redundancy, reducing dependence on single points of failure

  • fail-safe design, enabling controlled degradation rather than catastrophic failure

This highlights a critical insight: the goal of governance is not to eliminate risk, but to manage and contain it within complex, adaptive systems.

7.5 Innovation as an Emergent Property of Governed Infrastructures

Taken together, these dynamics suggest a fundamental redefinition of AI innovation.

Innovation is no longer primarily a function of:

  • algorithmic breakthroughs

  • isolated experimentation

  • or individual organisational initiatives

Instead, it becomes an emergent property of governed data infrastructures.

  • Continuous governance enables continuous learning

  • Embedded controls enable safe experimentation

  • Lifecycle integration enables cumulative improvement

  • Distributed systems enable scalable deployment

In this view, governance and innovation are not opposing forces, but mutually constitutive elements of the same system.

This reframing challenges a dominant assumption in both practice and research: that governance must be balanced against innovation. Instead, this paper demonstrates that:

in AI-driven environments, innovation is only sustainable when governance is infrastructural.

8. Conclusion

This paper has argued that the challenges confronting data governance in the age of artificial intelligence are not incremental, but fundamentally conceptual. Existing governance frameworks—rooted in assumptions of stability, boundedness, and retrospective control—are structurally misaligned with AI-driven systems characterised by continuous data flows, distributed architectures, probabilistic reasoning, and ongoing adaptation.

Through a synthesis of data governance research, digital infrastructure theory, and AI systems literature, the analysis identified five structural tensions—temporal, structural, epistemic, lifecycle, and control—that explain this misalignment. These tensions demonstrate that the limitations of current governance approaches are not simply matters of implementation, but reflect a deeper incompatibility between governance paradigms and the infrastructural realities of contemporary data environments.

In response, the paper advanced a reconceptualization of data governance as governance-as-infrastructure. This perspective reframes governance as an embedded, continuous, and systemic capability, enacted through the design and operation of data architectures rather than through external oversight mechanisms. By defining governance in terms of embeddedness, continuity, distribution, and lifecycle integration, the paper provided a conceptual foundation aligned with the operational logic of AI systems.

Building on this foundation, the paper developed a multi-dimensional framework for AI-ready data governance and demonstrated how governance can be operationalised through continuous, event-driven mechanisms, integrated data ecosystems, lifecycle-spanning controls, and probabilistic evaluation processes. It further argued that effective governance requires alignment across socio-technical domains, including organisational capabilities, regulatory frameworks, and ethical practices.

A central implication of this analysis is that governance and innovation are not opposing forces. On the contrary, the paper has shown that sustainable AI innovation is structurally dependent on governance capabilities embedded within data infrastructures. Governance-as-infrastructure enables continuous learning, scalable deployment, and controlled adaptation, positioning it as a foundational enabler of trustworthy and resilient AI systems.

At the same time, the embedding of governance within infrastructure introduces new challenges, including increased system complexity, the risk of over-automation, and the emergence of systemic interdependencies. Addressing these challenges requires ongoing attention to resilience, transparency, and human oversight, ensuring that governance infrastructures remain adaptable and accountable.

Future research should extend this work in three directions. First, empirical studies are needed to examine how governance-as-infrastructure is implemented in practice across different organisational contexts. Second, further theoretical development is required to explore the interaction between infrastructural governance and regulatory systems, particularly in relation to continuous compliance and machine-readable regulation. Third, interdisciplinary research is needed to address the ethical implications of embedding governance within technical systems, including questions of power, control, and societal oversight.

In conclusion, the transformation of data governance is not a peripheral concern, but a central condition for the future of AI. As AI systems become increasingly embedded in organisational and societal processes, the ability to design governance as infrastructure will determine not only their effectiveness, but their legitimacy. Data governance, in this context, is no longer a constraint to be managed—it is a foundational capability through which intelligent, adaptive, and trustworthy systems are made possible.

References

Alles, M., Kogan, A. and Vasarhelyi, M. (2008) ‘Putting Continuous Auditing Theory into Practice: Lessons from Two Pilot Implementations’, Journal of Information Systems, 22(2), pp. 195–214.

Amershi, S. et al. (2019) ‘Software engineering for ML’, ICSE.

Bamberger, K.A. (2010) ‘Regulation as delegation’, Duke Law Journal, 59(3).

Barocas, S. and Selbst, A.D. (2016) ‘Big data’s disparate impact’, California Law Review, 104(3).

Batini, C. and Scannapieco, M. (2016) Data and Information Quality.

Böhme, R. and Köpsell, S. (2010) ‘Trained to accept?’, WEIS.

Burrell, J. (2016) ‘How the machine “thinks”’, Big Data & Society.

Dong, X.L. and Srivastava, D. (2015) Big Data Integration.

Doshi-Velez, F. and Kim, B. (2017) ‘Towards a Rigorous Science of Interpretable Machine Learning’, arXiv preprint.

Gama, J. et al. (2014) ‘Concept drift adaptation’, ACM Computing Surveys, 46(4).

Hanseth, O. and Lyytinen, K. (2010) ‘Design Theory for Dynamic Complexity in Information Infrastructures: The Case of Building Internet’, Journal of Information Technology.

Khatri, V. and Brown, C.V. (2010) ‘Designing data governance’, CACM.

Kitchin, R. (2014) The Data Revolution.

Mittelstadt, B.D. et al. (2016) ‘The Ethics of algorithms: Mapping the debate’, Big Data & Society.

Otto, B. (2011) ‘Organizing Data Governance’, Communications of the Association for Information Systems.

Rahm, E. and Do, H.H. (2000) ‘Data cleaning: Problems and Current Approaches’, SIGMOD Record.

Sculley, D. et al. (2015) ‘Hidden technical debt’, NeurIPS.

Tilson, D., Lyytinen, K. and Sørensen, C. (2010) ‘Digital infrastructures’, Information Systems Research.

Varshney, K.R. (2019) ‘Trustworthy Machine Learning and Artificial Intelligence’, XRDS.

Wang, R.Y. and Strong, D.M. (1996) ‘Beyond accuracy’, Journal of Management Information Systems, 12(4), pp. 5–33.

Weber, K., Otto, B. and Österle, H. (2009) ‘One Size Does Not Fit All—A Contingency Approach to Data Governance’, ACM Journal of Data and Information Quality, 1(1).

Weill, P. and Woerner, S.L. (2018) Digital Business Models.