Transparency in Generative AI Systems

A Playbook for Organizational Leaders & Managers


About This Playbook

Transparency is how leaders turn generative AI from a noisy black box into a system people can interpret, use, and govern responsibly.

What is this playbook?

This playbook helps organizations operationalize transparency in generative AI. It provides ten concrete plays that leaders can use to design, govern, and deploy generative AI (genAI) systems in ways that build trust, support accountability, and enable sustainable adoption.

Who is this playbook for?

This playbook is for leaders and teams responsible for deploying generative AI, including executives, product managers, and responsible AI practitioners — whether they build models or adopt them from external providers.

Why use this playbook?

Transparency is foundational to building trust and sustaining adoption. It helps organizations build appropriate user understanding, enable employees to use AI tools responsibly, identify and mitigate risks, and prepare for evolving regulation.

What's Inside

This playbook covers three areas: Part 1 provides background and context about AI transparency including what it is, why it matters, and how needs differ across stakeholders. Part 2 explores transparency in practice, including tools, resources, and the challenges organizations face. Part 3 presents ten actionable plays for business leaders spanning governance, tools and oversight, and deployment.

How and by whom was this playbook developed?

This playbook was developed by the Responsible AI Initiative at the UC Berkeley AI Research Lab with Berkeley Haas. It was authored by Genevieve Smith and Natalia Luka. It benefited from research assistance by Naisha Jain, Oliver DeVito, Urmi Sumant, Dominique Wimmer, and Vinaya Sivakumar. The playbook synthesizes existing literature and builds from research conducted by a broader team including Genevieve Smith, Natalia Luka, Sahiba Chopra, Min Kyung Lee, and Dominique Wimmer. This playbook was made possible with funding from Google. Some research supporting the playbook was also supported by Microsoft. (See Appendix for full acknowledgements.)

This playbook was published in May 2026.

How to use this playbook

🔍

New to AI transparency?

Start from the top to explore what transparency is, why it matters, and the current state of the field.


📋

Explore transparency in practice

See what transparency looks like in practice, the tools and resources available, and the challenges organizations face.


⚡

Ready to act?

Jump straight to the 10 plays — practical steps organized by governance, tools, and deployment.


Executive Summary

GenAI technologies are rapidly being embedded in products, workplaces, and everyday decision-making. As these systems become more powerful and pervasive, and increasingly evolve into autonomous AI agents, the need for transparency has never been greater. Transparency is fundamental to responsible and sustainable adoption: it helps people understand AI systems and their outputs in ways that empower them to meaningfully use, trust, and adopt AI tools.

This playbook is for organizational leaders seeking to embed transparency into the design, deployment and governance of genAI systems, offering practical strategies to operationalize transparency across the AI lifecycle.

In practice, transparency is how organizations turn generative AI from a noisy black box into a system people can interpret, use, and govern responsibly.

Need to Know

Five things every leader should understand

Transparency is defined by usefulness, not disclosure alone. Rather than a single checkbox, it's about meeting different stakeholders where they are so they understand AI systems well enough to rely on them. Transparency is like documentation for a building: occupants need clear signs and exits, operators need blueprints, inspectors need compliance reports. Everyone needs transparency, just not the same kind.

Transparency is a business requirement for trust and sustainable adoption. For organizations, transparency supports user trust, better decision-making, regulatory readiness, and risk management. It enables employees, customers, and external stakeholders to understand system limitations and engage with AI appropriately.

Transparency must show up in the product experience. End users need clear signals, such as chain-of-thought reasoning, confidence signals, and citations, that help them decide whether to trust an output. Like signage in a building, these signals show where AI is at work and where an answer came from.

Transparency must also exist in governance and oversight. For people building, deploying, and governing AI systems, transparency means actively creating and using documentation and evaluations — model and system cards as blueprints, and benchmarks or audits as reported test scores — to manage performance, limitations, and risk at scale.

Both foundation model developers and deployers have important transparency responsibilities. Even when organizational leaders have limited control over upstream model development, they retain significant influence at the system level: how AI interacts with users, what safeguards are implemented, what documentation accompanies deployment, and how the system is evaluated and governed over time.

The Plays

10 actionable plays across three categories

#	Play	Focus
A. Governance & organization: Foundational leadership commitments, strategies, policies, and culture to make transparency an organizational priority
1	Make transparency a leadership priority	Leadership commitment & principles
2	Set a transparency strategy across your organization and AI portfolio	Strategy & portfolio management
3	Establish policies and structures to operationalize transparency	Policies, roles & accountability
4	Build incentives and a culture that rewards transparency	Culture & training
B. Transparency tools & oversight mechanisms: Practical guidance on documentation, data practices, and evaluation processes that make transparency concrete and auditable
5	Document AI at the right level: models for builders, systems for deployers	Documentation & system cards
6	Maintain transparency around AI development, configuration, and governance	Training data & provenance
7	Maintain transparency around user data, privacy, and personal data use	Privacy & user data
8	Benchmark, evaluate and monitor systems – based on your role	Evaluation & monitoring
C. Transparency in deployment & user interactions: Tools and approaches to make transparency visible and meaningful to the users and stakeholders who interact with AI systems directly
9	Design transparency in AI outputs and user interactions	User-facing transparency & UX
10	Provide accountability and recourse mechanisms	Accountability & recourse

Chapter I

Introduction

AI transparency means enabling people to understand AI systems well enough to meaningfully use, trust, and adopt them.

Maya is a product lead at a company that builds AI assistants for enterprise customers across industries, including healthcare. One morning, she receives an urgent message from a healthcare client: the assistant produced a confident but incorrect response to a health-related question. There were no sources, no uncertainty signal, and no explanation of how the answer was generated, raising concerns about potential harm and liability. Maya realizes the issue isn't just accuracy; it's transparency. She can't explain the behavior, assess risk, or reassure the client. The remaining question is simple and critical: Why did this happen, and how do we make sure it doesn't happen again?

This dynamic plays out across industries. When the Apple Card launched in 2019, customers noticed dramatically different credit limits between spouses who shared finances. Without transparency into how this occurred, many concluded the system was discriminatory against women. Regulators investigated, trust eroded, and Apple and Goldman Sachs were left unable to justify the outcomes.^[1]^[2]

AI transparency means enabling people to understand AI systems well enough to meaningfully use, trust, and adopt them. For business leaders, transparency is increasingly essential to responsible deployment: supporting better decision-making, accountability, and stakeholder confidence.

Rather than a single or fixed concept, transparency is better understood as an end goal: ensuring that different people have the information and understanding they need to engage with AI systems in meaningful and responsible ways. Because stakeholders vary widely, their transparency needs also differ. What counts as "transparent" for an engineer looks very different from what is useful for a customer or regulator.

Transparency is a critical AI governance issue and a core principle in responsible AI. In a review of AI ethics frameworks, transparency was the most widely cited principle across guidelines from companies, governments, and civil society – alongside fairness, safety, security, privacy and accountability.^[3] Yet it remains limited in practice, including in generative AI (genAI). The stakes are rising as AI systems move from chat interfaces to agents that take autonomous actions and integrate into organizational workflows.

This playbook focuses on what business leaders and managers can do to bridge the gap between transparency as a principle and its implementation in practice. It's not a technical guide to interpretability research or a debate on open vs. closed models. It's a practical resource with leadership and organizational strategies to implement actionable transparency and build the kind of informed trust that enables responsible adoption.

This is not just an individual business issue; it's a collective opportunity. The AI transparency landscape is still fragmented and rapidly evolving, which means the choices organizations make now will help shape emerging norms and can inform standards within and across industries. The costs of opacity are visible: loss of trust, regulatory scrutiny, and real harm to people. The opportunity is also clear. When AI systems are transparent, trust can grow among users, organizations, and the broader ecosystem. The choices that business leaders make today about how to build, deploy, and govern AI systems will help define what trustworthy AI looks like for their organizations and society more broadly.

Chapter II

Background

What is AI transparency, why does it matter, and what is the business case for getting it right?

A What is AI transparency?

What is AI transparency and why does it matter?

AI transparency refers to the extent to which an AI system – and the decisions or outputs it produces – can be understood by the people who build, deploy, and use it. It involves opening the curtain on how an AI system works in practice: what data it draws from, how it generates results, what limitations it has, the decision-making processes that went into its development, and how it is governed.

For organizations, transparency is not simply a technical concern; it is foundational to responsible adoption, supporting trust, accountability, effective oversight, and informed use.^[4] Without it, stakeholders can't evaluate AI outputs, catch errors, or know when a system is being used appropriately, leaving organizations exposed to risks they can't see.

GenAI raises the stakes

Transparency has long been an important topic in machine learning systems, which learn from massive amounts of data to find patterns and make predictions. Foundation models powering genAI systems continue to scale rapidly, trained on ever-larger datasets and powered by increasing amounts of compute. Yet their behavior remains difficult to fully predict or explain.

Unlike earlier systems that primarily classified or ranked information, genAI systems generate new outputs, including text, code, images, audio and more. They can produce responses that sound confident but are inaccurate ("hallucinations"), and it is often unclear why a particular output was produced or how a system arrived at it. This creates significant challenges for oversight, accountability, and responsible use.

As these systems are increasingly deployed as agents capable of taking autonomous actions within organizational workflows, this opacity becomes even more consequential: errors don't just mislead, they act.

Model vs. system vs. agent (and why the distinction matters for transparency)

In AI, a model is the algorithmic component trained on data to transform inputs into outputs (for example, a large language model).

A system is the full, real-world application built on top of one or more models. It integrates prompts, interfaces, data pipelines, safeguards, and governance processes. The same underlying model can power many different systems: Claude.ai and Claude Code are distinct systems built on the same underlying model, and a company's internal HR chatbot could be yet another system built on it.

An agent is a type of AI system designed to take actions on a user's behalf, such as browsing the web, sending emails, or executing multi-step tasks. Agents increase autonomy and therefore raise the stakes for transparency, oversight, and accountability.

While organizations may have limited visibility into or control over the internals of a foundation model, they have substantial responsibility — and opportunity — at the system level.

Transparency is multi-dimensional

Transparency is not a single feature or checkbox.^[5]^[6] It encompasses several dimensions that work together to support meaningful understanding and oversight.

These include: visibility, explainability, interpretability, openness, and accessibility (see Box 1). Achieving transparency requires addressing these dimensions holistically. Visibility ensures that stakeholders are informed about the presence and role of AI in the first place. Explainability and interpretability focus on how decisions and processes are understood. Openness focuses on disclosing information about the design, development, and operation of AI systems. Accessibility ensures that information is understandable and usable by all relevant stakeholders.

Five dimensions of transparency — holistically connected

Visibility

Whether stakeholders are aware of when and how AI is present and influencing outcomes. Includes disclosure practices (e.g., labeling AI-generated content) to reduce the risk of AI shaping critical outcomes without people's knowledge.

Explainability

Providing clear and comprehensible justifications for an AI system's decisions or actions. Ensures end-users and stakeholders can understand why a particular result was produced.

Interpretability

The extent to which the internal workings of an AI system (e.g., model parameters, relationships between variables) can be understood by humans. Looks inward, often requiring technical expertise.

Openness

Sharing information about the design, development, and operation of AI systems — including data sources, algorithms, decision-making criteria, and ongoing practices like data retention and model updates.

Accessibility

Ensuring that information about an AI system is understandable and usable by all relevant stakeholders, regardless of their technical expertise, resources, or ability.

Relying on any one dimension or mechanism alone, such as a model card or a disclosure label, is insufficient. Importantly, transparency is not only about technical understanding: it also includes how AI systems shape organizational decisions, worker experiences, and broader societal impacts.

Transparency for whom?

Transparency means different things to different stakeholders. What counts as "transparent" depends on who is interacting with the system and what they are doing with it.

User transparency helps end users understand and trust both AI outputs (e.g., through citations and confidence scores) and the broader AI system (e.g., its data use and retention, and any external influences), enabling them to make informed decisions. Meanwhile, developer transparency helps system builders understand, debug, and enhance AI models by providing insights into their internal workings. Organizations adopting third-party models also need this kind of transparency from model providers in order to make informed decisions and support accountability. Regulatory transparency and societal transparency support external accountability, compliance, and public legitimacy.^[7]

In a 2026 survey, the UCB Responsible AI Initiative asked 400 end users, developers, and policymakers globally what transparency means to them, which features matter most, and where the biggest gaps exist. Trusting accuracy and protecting privacy emerged as the top two priorities overall, with 73% of respondents selecting accuracy as their primary transparency need, followed by privacy and data protection at 49%. Notably, these two priorities were consistent across respondents in high-, middle-, and low-income countries.

Across thirteen transparency features ranging from user-facing to governance-related, end users prioritize citations and sources, privacy information, and confidence indicators, reflecting their need to trust that outputs are accurate and that systems are trustworthy. Developers and policymakers place greater weight on training data information, independent audits, and societal impact, reflecting their more governance-oriented roles. Yet across groups, features considered most important are consistently the least available, particularly for end users.

Gap scores reflect the difference between how important respondents rated each transparency feature (0–100) and how available they said it was (0–100). End users consistently experience the largest gaps across both user-facing and governance features.
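To make the gap-score arithmetic concrete, here is a minimal Python sketch. It assumes each feature's gap is its mean importance rating minus its mean availability rating; the survey's exact aggregation method is not described here, so treat this as an illustration. The function names (`gap_score`, `rank_by_gap`) are hypothetical, and the ratings in the example are placeholders, not the survey's data.

```python
from statistics import mean
from typing import Dict, List, Tuple


def gap_score(importance: List[float], availability: List[float]) -> float:
    """Return mean importance minus mean availability (both rated 0-100).

    Assumption: gaps are computed from averaged ratings. A positive gap means
    a feature is rated as more important than it is available.
    """
    for rating in (*importance, *availability):
        if not 0 <= rating <= 100:
            raise ValueError(f"rating outside 0-100: {rating}")
    return mean(importance) - mean(availability)


def rank_by_gap(
    ratings: Dict[str, Tuple[List[float], List[float]]]
) -> List[Tuple[str, float]]:
    """Sort features from largest to smallest gap."""
    gaps = [(name, gap_score(imp, avail)) for name, (imp, avail) in ratings.items()]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


# Hypothetical ratings for illustration only; these are NOT survey results.
example = {
    "citations and sources": ([90, 80, 85], [40, 30, 50]),
    "confidence indicators": ([70, 60, 80], [50, 40, 60]),
}
print(rank_by_gap(example))
```

Ranking by gap, rather than by importance alone, surfaces the features where unmet demand is largest, which is the pattern the survey highlights for end users.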

The implication for business leaders is clear: stakeholders want more transparency than they are currently getting, and their needs differ across groups. Transparency is best understood as an end goal, enabling different groups to engage with AI systems meaningfully and responsibly, rather than a one-size-fits-all definition or technique.

The current state of genAI transparency

Despite widespread agreement that transparency is important, genAI transparency remains limited.

On openness, Stanford University's Foundation Model Transparency Index has scored foundation model developers on 100 transparency indicators since 2023. Its most recent edition, in 2025, found that companies scored worse than the prior year, with an average decrease of 17 points from 2024.^[8] Training data continues to be particularly opaque, with little disclosure on copyright and licensing, and many companies still don't share basic information such as model size and architecture. That said, not all companies declined: Writer and IBM significantly increased their scores.

On interpretability, the core technical approaches that might support transparency, such as mechanistic understanding of large models, remain an active area of research. While novel methods are evolving, including efforts to map internal concept representations and trace computational graphs inside models, reliable methods to explain why large language models produce particular outputs are still lacking.^[9]^[10]

Gaps in explainability and accessibility of information, however, may be the most under-resourced. Model cards are among the most common tools organizations use to demonstrate transparency, yet only 32% of end-user respondents in the UCB Responsible AI Initiative survey reported ever accessing them. This gap is telling: transparency efforts focused on technical documentation may satisfy developer or regulatory needs but aren't reaching or serving end users. Without a foundational understanding of how AI systems work and what their limitations are, users risk overtrusting or overrelying on these systems. This creates real liability and harm risks for individuals and the organizations deploying these systems.

The gap between the importance of transparency in principle and its limited realization in practice – compounded by fragmented and still-emerging standards for visibility and disclosure, documentation, and governance – motivates this playbook.

B The Business Case

What is the business case for genAI transparency?

Transparency in how AI systems are used in business is critical for building trust and loyalty with customers and employees, mitigating legal and financial liabilities, and gaining a competitive advantage through a stronger brand reputation.

The business case for genAI transparency can be broken up into four categories: trust and adoption, competitive differentiation, regulatory compliance, and stakeholder commitment. How these apply can vary depending on whether a company operates in a business-to-consumer (B2C) or business-to-business (B2B) context, and we note key distinctions below.

Trust and adoption. Transparency allows users to understand how AI works, how it uses their data, and the limitations of its capabilities. This builds confidence in these tools and leads to greater satisfaction and user retention. A 2024 study by Zendesk found that 75% of organizations believe a lack of transparency can lead to customer churn.^[11]^[12] In B2C contexts, this is especially pronounced in sensitive domains such as healthcare, finance, and legal services, where users are weighing AI-generated outputs against personal stakes. When customers feel equipped to apply their own judgment to a tool or its outcomes, they gain a greater sense of agency in decision-making, making adoption more likely and deepening long-term engagement. In B2B contexts, trust operates at the organizational level, where procurement teams scrutinize vendors for clear documentation of model behavior, data handling, and known limitations as part of formal risk assessments. Transparency can accelerate sales cycles, ease internal approval processes, and support the downstream rollout of AI tools to a customer's own employees and end users.

Competitive differentiation. Transparency about AI practices allows businesses to differentiate themselves from competitors through a stronger, more trustworthy brand and an enhanced ability to attract and retain top talent. While a 2024 PwC survey found that only 33% of executives say their companies disclose their AI governance framework, 69% of employees and 66% of consumers say such disclosure is important.^[13] This gap represents an opportunity for businesses willing to be more transparent than their peers. In crowded markets, customers increasingly choose providers based on ethical AI practices. For enterprise customers especially, the ability to understand and verify AI practices can be the deciding factor in vendor selection, making transparency a genuine competitive advantage that can translate into market share and revenue growth.

Regulatory compliance. Transparency positions businesses ahead of evolving AI regulations and reduces legal and financial liabilities. Laws like the EU AI Act (see Box 2) now require companies to disclose information about training data, model capabilities, and limitations, while regulations in healthcare, finance, and other sectors demand that AI-driven decisions be explainable. Beyond compliance, transparency makes it possible to explain AI-driven decisions in sensitive contexts like hiring, credit scoring, or medical diagnostics, which is increasingly necessary to defend against discrimination claims or regulatory scrutiny. Transparency practices also support emerging consumer rights frameworks, such as GDPR provisions on automated decision-making and US state-level laws requiring certain AI decisions to be explainable or contestable.

Stakeholder commitment. Investors and shareholders increasingly view transparent AI practices as an indicator of long-term sustainability and responsible governance. As investors and ESG frameworks begin to incorporate AI governance considerations, companies face growing expectations to document their governance practices and demonstrate accountability for AI outcomes. Companies that can clearly articulate their AI governance frameworks and show commitment to addressing issues like bias are better positioned to attract investment and maintain shareholder confidence.^[14] By being upfront about AI capabilities, limitations, and ethical safeguards, companies signal mature leadership and long-term thinking that appeals to stakeholders looking beyond short-term gains.^[15]

As AI becomes more deeply embedded in business operations, transparency is shifting from a nice-to-have ethical commitment to a fundamental business imperative.


Chapter III

Transparency in Practice

What does transparency actually look like across the AI lifecycle — from product experience to governance and oversight?

Transparency in AI can take various forms, but at its core, it's about ensuring that stakeholders have the information they need to engage with AI systems responsibly and effectively. In practice, it includes both what an AI system communicates directly to its users and what information is available about how the system is developed and governed.

Transparency in AI outputs and interaction

From a user's perspective, transparency often shows up through the system experience itself, in features that shape day-to-day trust and usability. These features communicate:

When content is AI-generated
Why the system produced a particular output
How confident or uncertain the system is
What the tool should and should not be used for
What information or sources informed the response
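As an illustration of how these five signals might travel with each output, here is a minimal Python sketch that bundles a response with its transparency metadata and renders it as plain text. The class name `TransparentResponse`, its fields, the 0.0 to 1.0 confidence scale, and the `LOW_CONFIDENCE_THRESHOLD` value are all hypothetical design choices, not a standard format or any vendor's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical threshold below which the display adds an explicit caution.
LOW_CONFIDENCE_THRESHOLD = 0.5


@dataclass
class TransparentResponse:
    """A model output bundled with user-facing transparency signals."""
    text: str
    ai_generated: bool = True              # when content is AI-generated
    rationale: Optional[str] = None        # why the system produced this output
    confidence: Optional[float] = None     # how confident it is (assumed 0.0-1.0 scale)
    intended_use: Optional[str] = None     # what the tool should and should not be used for
    sources: List[str] = field(default_factory=list)  # what informed the response

    def render(self) -> str:
        """Format the response and its signals as plain text for display."""
        lines = []
        if self.ai_generated:
            lines.append("[AI-generated]")
        lines.append(self.text)
        if self.confidence is not None:
            note = f"Confidence: {self.confidence:.0%}"
            if self.confidence < LOW_CONFIDENCE_THRESHOLD:
                note += " (low confidence: verify before relying on this answer)"
            lines.append(note)
        if self.rationale:
            lines.append(f"Why this answer: {self.rationale}")
        # State the absence of sources explicitly rather than staying silent.
        if self.sources:
            lines.append("Sources: " + "; ".join(self.sources))
        else:
            lines.append("Sources: none provided")
        if self.intended_use:
            lines.append(f"Intended use: {self.intended_use}")
        return "\n".join(lines)


resp = TransparentResponse(
    text="Example answer text.",
    confidence=0.35,
    sources=["Hypothetical internal policy document"],
    intended_use="General information only; not professional advice.",
)
print(resp.render())
```

One design choice worth noting: the sketch prints "Sources: none provided" instead of omitting the line. In a case like Maya's, the missing sources were themselves a signal users needed to see.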

Transparency in development and governance

Transparency extends beyond the interface. Many of the most important transparency signals depend on what organizations disclose about the system or tool behind the scenes, including:

How the system was built and what data it relies on
What personal data the system collects, how it is stored or used, and what control users have over it
What limitations and known failure modes exist, and what societal or ethical impacts have been assessed (or not)
Whether the system is regularly evaluated for safety, fairness, and reliability – and results of those evaluations
What accountability mechanisms exist when harms or failures occur
What oversight, auditing, or governance structures are in place, including whether reviews are conducted independently and whether commercial relationships influence system behavior or outputs
How changes, incidents, and known issues are disclosed and addressed over time

Transparency in AI outputs and interactions shapes day-to-day usability, but development and governance transparency matters to all stakeholder groups, from developers and regulators to end users. The UCB Responsible AI Initiative survey found that information on personal data use and retention was one of the biggest transparency gaps for end users, and that knowing whether a system is trustworthy at all was one of their primary transparency needs. That kind of trust cannot be established through individual interactions alone; it requires clear communication about development and governance.

Together, these practices illustrate that transparency is not a single technical feature, but a combination of leadership and product design choices, documentation practices, governance structures, and accountability practices. These considerations become especially consequential for AI agents, where autonomous actions and embedded workflows raise the stakes for every item on this list.

For business leaders, the development and governance components of transparency are the most actionable starting points. Documentation, evaluation, governance processes, and accountability mechanisms determine what information is available in the first place.

With this in mind, we turn to the growing ecosystem of transparency tools and resources to support these efforts.

Chapter IV

Transparency Tools & Resources

A growing ecosystem of documentation tools, benchmarks, and evaluations to support responsible AI deployment.

There are a growing number of tools and resources to help leaders increase transparency, improve oversight, and support more responsible deployment. These tools can serve different purposes: some are designed to help organizations document and communicate how AI systems work, while others are intended to test, measure, and evaluate system behavior in practice.

It is important to note that most organizations adopting genAI are not building foundation models from scratch. Instead, they are integrating commercially available models into products, services, or internal workflows. In these cases, business leaders may have limited ability to influence upstream transparency practices such as training data disclosure or provider-issued model documentation. However, organizations have influence. They can factor transparency into model selection – choosing providers with greater openness about training data, governance, and system behavior – and voice expectations directly to their providers. Customer demand is an important lever for improving upstream transparency practices.

More broadly, organizations can implement meaningful transparency at the system level. Even when an underlying model is relatively opaque, developers and managers retain control over consequential transparency decisions: how AI is introduced to and communicates with users, what documentation accompanies the system, what evaluations and benchmarks are conducted, and whether results are shared.

We organize transparency tools into two broad categories:

A) Documentation tools – which help organizations record, communicate, and govern key information; and
B) Benchmarks and evaluations – which assess system performance, risks, and reliability through structured testing.

A Documentation Tools

Capturing and communicating how AI systems work

Documentation tools support transparency by capturing key information about how an AI system was built, what it's intended to do, and what limitations or risks it may introduce. They are important for oversight and accountability across the AI lifecycle. We organize documentation tools into four levels – dataset, model, system, and practice – based on where the documentation is targeted and who is responsible for creating it.^[16] While dataset and model documentation are primarily the responsibility of model providers, system and practice-level documentation are where deploying organizations have the most direct control and the most immediate opportunity to act.

Dataset-level documentation

Dataset-level tools focus on transparency in the data used to train or fine-tune AI systems. They help stakeholders understand: where data came from; when it was collected or created; what populations or content are represented; and what biases, gaps, or risks may be present.

Typically created by: Dataset creators, data scientists, ML engineers

Examples: Datasheets for datasets, Dataset nutrition labels

Model-level documentation

Model-level tools provide standardized information about the underlying AI model, including: intended use cases; performance characteristics; training data sources and evaluation results; and ethical considerations and known failure modes.

Typically created by: Model developers, ML engineers, foundation model providers

Examples: Model Cards, Nutrition labels for models

System-level documentation

System-level tools describe an AI system in its full real-world context. They typically document: the system's purpose and intended users; how the model (or multiple models and components) is integrated into the broader product or workflow; and known limitations, safety measures, and governance practices. Unlike model-level documentation, system-level documentation captures the full deployment context, including prompts, interfaces, safeguards, and governance processes. Even when the underlying model is opaque, organizations can document how their system is intended to operate, what risks it introduces, and what accountability mechanisms are in place.

Typically created by: Product leaders, system owners, deployment teams, risk and compliance

Examples: System Cards, AI FactSheets (also covers model-level documentation), Reward Reports, Application Cards

Note: In practice, the boundary between model cards and system cards is not always clean. Foundation model providers that also deploy consumer-facing products (e.g., Anthropic with Claude, Meta with Llama) often publish documents that function as both, since they are simultaneously the model developer and the deployer. For most organizations reading this playbook, however, the distinction is clear: they are deployers using a third-party model, and their responsibility is to create system-level documentation for their specific deployment.
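To illustrate what system-level documentation can capture for a deployer using a third-party model, here is a minimal sketch of a system card as a structured record. The `SystemCard` class and its field names are hypothetical, chosen to mirror the elements listed above (purpose, intended users, integrated models, safeguards, limitations, ownership). They are not taken from any published System Card or Application Card template.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class SystemCard:
    """Hypothetical system-level documentation record for one genAI deployment."""
    system_name: str
    purpose: str
    intended_users: list[str]
    underlying_models: list[str]   # third-party or in-house models the system calls
    integration_notes: str         # how models, prompts, and interfaces fit together
    safeguards: list[str]          # e.g., content filters, human approval steps
    known_limitations: list[str]
    owner: str                     # accountable system owner
    escalation_contact: str
    last_reviewed: str             # ISO date of the most recent review

    def missing_fields(self) -> list[str]:
        """Return the names of empty fields so documentation gaps are visible before launch."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        """Serialize the card, e.g., for publishing or storing alongside the system."""
        return json.dumps(asdict(self), indent=2)


# Example: a hypothetical internal deployment with one gap left unfilled.
card = SystemCard(
    system_name="Support reply drafter",
    purpose="Draft replies to customer support tickets for agent review",
    intended_users=["support agents"],
    underlying_models=["third-party hosted LLM (vendor model card linked)"],
    integration_notes="Ticket text plus a fixed system prompt; agent edits before sending",
    safeguards=["agent approval required before any reply is sent"],
    known_limitations=[],
    owner="Support Operations lead",
    escalation_contact="responsible-ai@example.com",
    last_reviewed="2025-01-15",
)
print(card.missing_fields())  # ['known_limitations']
```

Flagging empty fields is one simple way a system owner could treat the card as a launch checklist rather than a one-time write-up.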

Practice-level documentation

Transparency also depends on organizational processes, standards, and governance practices (not only technical artifacts). These practice-level tools may include: disclosure and reporting practices; internal governance frameworks; protocols and standards for responsible development; and mechanisms for accountability and recourse.

Typically created by: Business leaders, legal and compliance teams, policy and governance teams

Examples: content provenance standards (e.g., C2PA), transparency reporting frameworks, watermarking, organizational transparency policies

Disclosure as a cross-cutting practice

An especially important lever here is disclosure: organizational communication that helps stakeholders understand when and how AI is being used, what its limitations are, and what governance processes are in place. Disclosure may include:

Watermarking or AI tags;
Providing citations or sources with outputs;
Releasing system documentation or higher-level system prompts;
Reporting known flaws, incidents, or fixes in model updates; and
Participating in voluntary transparency benchmarks.

Disclosure extends beyond technical performance to organizational practices. As researchers have noted, this includes labor practices such as the recruitment of data annotation workers in the Global South, which remain among the least transparent areas across the industry.^[17]

Examples: International initiatives such as the OECD Hiroshima AI Process Reporting Framework^[18] allow disclosure practices to be compared across organizations. The EU has also recently developed a Code of Practice on marking and labelling of AI-generated content that sets standards for providers and deployers of AI-generated content.^[19]

B Benchmarks & Evaluations

Assessing system performance and reliability

Benchmarks and evaluations are two related but distinct tools for assessing AI system performance. Benchmarks are standardized tests designed to measure capabilities such as accuracy, speed, reasoning ability, and reliability, enabling comparison across models and over time. Evaluations are context-specific assessments of how a system behaves in a particular product, workflow, or user environment. Both matter, but they answer different questions: benchmarks ask "how does this model perform generally?" while evaluations ask "how does this system perform for us?"

Benchmarks are most often designed and run by foundation model developers, researchers, and infrastructure teams. Today there are three broad categories of benchmarks:

General performance benchmarks, which measure technical efficiency (e.g., speed, latency, resource use)
Model capability benchmarks, which test reasoning, instruction-following and task performance (e.g., coding or math)
Standardized test benchmarks, which compare model performance to human performance on exams or expert tasks

A fourth category of benchmarks, smaller and less widely adopted to date, is also emerging: human impact benchmarks, which analyze the impact of models on physical, societal, and psychological dimensions (see for example: MIT's Human AI Impact Benchmark).

While deployers don't usually create benchmarks themselves, they rely on them to compare models when selecting vendors, understand high-level capabilities and limitations, support due diligence and procurement, and demonstrate oversight to regulators, customers, or internal stakeholders.

Benchmarks are useful but incomplete. They can be narrow, easy to game, or unrepresentative of real-world use. For example, they often assume an average English-speaking user or ideal conditions. High benchmark scores do not guarantee safety, fairness, or reliability in practice. Hence, benchmarks are one input among many. Benchmarks also reflect value judgments about what matters. What gets measured, and what is left out, signals which outcomes are prioritized. When benchmarks emphasize aggregate performance or average users while excluding bias, fairness, or differential impacts, high scores raise an important question: for whom, and in what context, is this the "best" performance? For business leaders, this means benchmark scores alone are an insufficient basis for deployment decisions, especially in high-stakes or diverse user contexts.

Evaluations can complement benchmarks by testing performance in context, but can also be limited. They vary widely in rigor and scope, and can reflect the same blind spots as the teams that design them. Together, benchmarks and evaluations are important inputs, but aren't substitutes for ongoing human oversight and governance.
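To make the benchmark/evaluation distinction concrete, the sketch below runs a small context-specific evaluation: a deployer scores its own system on test cases drawn from its own workflow and breaks results down by user group instead of reporting only an aggregate. The `generate` callable is a hypothetical stand-in for a call to the deployed system, the case schema is assumed, and exact-match grading is a simplifying assumption; real evaluations often need rubric-based or human grading.

```python
from collections import defaultdict
from typing import Callable


def evaluate(cases: list[dict], generate: Callable[[str], str]) -> dict[str, float]:
    """Return accuracy overall and per user group.

    Each case is a dict with 'prompt', 'expected', and 'group' keys (a hypothetical
    schema). Grading is exact match after normalizing case and surrounding whitespace.
    """
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for case in cases:
        output = generate(case["prompt"])
        is_correct = output.strip().lower() == case["expected"].strip().lower()
        # Count each case both in the aggregate and in its own group.
        for key in ("overall", case["group"]):
            total[key] += 1
            correct[key] += int(is_correct)
    return {key: correct[key] / total[key] for key in total}


# Hypothetical stand-in for the deployed system: it answers the English prompts
# correctly but fails the Spanish one, a gap an aggregate score can understate.
def fake_generate(prompt: str) -> str:
    return "paris" if "capital of France" in prompt else "unknown"


cases = [
    {"prompt": "What is the capital of France?", "expected": "Paris", "group": "en"},
    {"prompt": "Name the capital of France.", "expected": "Paris", "group": "en"},
    {"prompt": "¿Cuál es la capital de Francia?", "expected": "París", "group": "es"},
]
print(evaluate(cases, fake_generate))
```

Here the overall score is two out of three, while the per-group breakdown shows the Spanish-language group at zero, which is the kind of differential impact the benchmark discussion above warns that aggregate numbers can hide.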

System and agent benchmarks offer one way to make evaluation more concrete by measuring how AI performs on structured, real-world tasks rather than isolated capabilities in lab settings. For example, HumanEval measures code generation accuracy and is widely used to evaluate coding assistants like GitHub Copilot and Amazon CodeWhisperer. OSWorld measures task completion in desktop environments and has been used to evaluate agentic systems like Claude's computer use capability. As AI agents take on more autonomous, multi-step workflows, benchmarks like these are an important transparency tool, though the space is rapidly evolving. While there is some convergence toward certain benchmarks (e.g., HumanEval), the rapid proliferation of benchmarks can make it difficult to draw accurate comparisons across systems and agents, so results should be interpreted carefully.

Chapter V

Challenges to Transparency


Despite the increasing recognition that transparency is important and a proliferation of tools, achieving meaningful transparency in practice remains difficult. Several interconnected challenges stand in the way.

1. The "Black box" problem

AI models, particularly deep learning models including genAI, are inherently complex and operate as "black boxes," making it difficult to understand their decision-making processes.^[20]^[21] This opacity poses significant risks in high-stakes domains like healthcare and finance, where issues of accountability, bias, and trust are paramount.

Explainability tools can offer approximations or high-level insights into why a decision was made. This area of research, often called explainable AI (XAI), provides techniques that clarify AI decisions in ways humans can understand, helping to mitigate the effects of the black box problem.^[22] Researchers have found that highly complex models often provide superior predictive accuracy but are inherently difficult to interpret, creating a tradeoff between performance and interpretability.^[23]^[24]

2. Corporate opacity and balancing tensions

Companies developing foundation models are often not transparent about their models, withholding details about training data, model architectures, and decision-making processes.^[25] Opacity exists at the system level as well: organizations deploying AI may not disclose how they have configured, fine-tuned, or adapted models. At the product level, organizations may omit uncertainty signals or confidence indicators from user-facing products to protect product perception. This layered opacity exacerbates the black box problem, hindering comprehensive evaluation by users, researchers, and regulators.

This opacity stems from multiple sources. Some are legitimate protective concerns, such as guarding against misuse by bad actors, protecting user privacy, or managing liability risks associated with training data. Others are more commercial in nature, such as protecting proprietary model architectures and training methods from competitors, shielding the company from liability for scraping and using copyrighted training data, or protecting product perception.

It's worth noting that some of the recommendations in this playbook are harder to implement because of foundation model opacity. Organizations deploying third-party models may find it difficult to document training data, assess bias, or disclose known limitations when that information is not made available by their provider, underscoring the importance of factoring transparency into vendor selection (discussed in Play 2).

3. Unintended trust consequences and information overload

If explanations are unclear, or if transparency reveals biases and inaccuracies in the AI system, the result can be a loss of trust. Further, high levels of transparency can create a sense of information overload, and the right balance differs across stakeholders.^[26] Striking a balance between explanations that are faithful (i.e., accurate reflections of the inner workings of the model) and explanations that are understandable to non-technical end users is difficult.^[27]

Researchers highlight that effective transparency is about explanations that are both faithful (i.e. accurate about the AI system) and understandable to the intended audience.^[28] This reinforces the point that transparency looks different for different stakeholders.

4. Pace of AI development and model change

AI models and systems are evolving quickly. New models are released frequently, capabilities change, and organizations face constant pressure to adopt the latest tools. This creates a transparency challenge, as documentation, disclosures, and governance processes built for one model or system may quickly become outdated. Meanwhile, the volume of new deployments and their transparency needs outpaces the capacity of the teams responsible for producing and maintaining documentation and disclosures. Smaller organizations or those without responsible AI functions may face particular challenges keeping pace. The result is transparency processes that go stale or become superficial checkbox exercises rather than meaningful accountability mechanisms.

5. Lack of standards and evolving regulation

At a higher level, there is a lack of standardization regarding what transparency should look like in practice. Organizations face a fragmented landscape of documentation frameworks and tools with limited guidance on when each is appropriate, for whom, or what level of detail is sufficient. This fragmentation makes it difficult for stakeholders to interpret transparency efforts in a consistent, comparable, or meaningful way.

Legal and ethical expectations around transparency continue to evolve (see Box 2). Despite increasing regulatory calls for transparency, current legal frameworks often lack specificity about how transparency should be implemented, particularly for organizations deploying systems built on external models. While laws are converging on requirements for documentation, risk assessment, and governance, a clear need for standardized practices and reporting tools remains, especially at the system level.

As a result, regulatory compliance often establishes a minimum floor for transparency. This leaves substantial responsibility, and opportunity, for organizations to go further.

Box 2. Regulation related to AI transparency: The current state & trends

Regulation addresses AI transparency unevenly across the lifecycle. Most enforceable transparency requirements today focus on foundation model developers / general-purpose AI and providers of high-risk systems, emphasizing documentation, risk assessment, and governance. User-facing transparency requirements are more limited and typically focus on notice (e.g., labeling AI-generated content or disclosing when users are interacting with AI) rather than detailed explanations of how systems work.

Still, AI transparency has emerged as a central regulatory theme globally. Lawmakers increasingly require AI systems, especially those posing higher risks, to disclose their training data, logic, and outputs in ways that allow users and regulators to understand and evaluate them. While mechanisms vary, most frameworks converge on documentation, traceability, and labeling.

Where transparency requirements apply today

1 Developer & system-level transparency — strongest, more standardized

These requirements apply primarily to foundation model developers and providers of general-purpose or high-risk AI systems. They emphasize model-level documentation, risk assessment, and governance.

Examples

European Union AI Act (2024): Uses a risk-based framework. High-risk systems must document training data, testing, decision logic, and risk mitigation measures.^[30] The Act also introduces dedicated transparency and governance obligations for general-purpose AI (GPAI) models.^[31] Not yet fully enforceable, phased in through 2027.
California Transparency in Frontier AI Act (2025): Regulates developers of large, high-capacity ("frontier") AI models. Requires companies with revenues over $500 million to publicly release a frontier AI framework outlining their safety, risk, and governance practices, and to report major safety incidents to state authorities.^[32]
California AB 2013: Requires documentation of training data provenance for generative AI systems.^[33]
Singapore Model AI Governance Framework for Generative AI (2024): Takes a voluntary, governance-first approach, calling for standardized disclosure of training data, evaluation results, and mitigation measures, calibrated to the risk level of the model.^[34]

Even where legislation is less prescriptive, regulators are issuing guidance and templates that steer documentation toward more standardized formats, such as the EU AI Act's implementing guidance and GPAI Code of Practice. Academic and industry research is also proposing documentation templates and structured formats (e.g., "AI Cards" and TechOps documentation templates^[35]) to align with regulatory regimes.

A key gap remains: While these requirements strengthen transparency at the model level, they rarely mandate standardized system-level documentation for organizations deploying AI.

2 User-facing transparency — more limited, notice-focused

These focus on informing users that AI is involved, rather than fully explaining how systems work or their limitations.

Examples

EU AI Act (2024): Requires labeling of AI-generated content and disclosure when users are interacting with AI systems.^[36]
China Measures for Labeling of AI-Generated Synthetic Content (2025): Requires clear labeling of AI-generated content.^[37]
California SB 942/AB 853 (California AI Transparency Act, 2025): Requires large GenAI providers (1M+ users) to include latent watermarks in image, video, and audio content, allow users to add "AI generated" labels, and provide free AI detection tools.^[38]
India AI Governance Guidelines (2025): Take a light-touch, innovation-first approach emphasizing transparency reporting, mandatory labeling of AI-generated content, and consent and transparency obligations for AI systems processing personal data. Not yet fully enforceable, phased in by 2027.^[39]
Colorado SB 189 (2026): This bill repeals and replaces Colorado's SB 205. It requires consumer notice when AI is involved in consequential decisions and gives consumers the right to request human review. The earlier requirements in SB 205 included audits, impact assessments, and more extensive public documentation.^[40]

3 Explanations & accountability — strongest in traditional decision-making AI

These requirements call for explanations, mostly for systems used in domains such as credit or employment, and are often grounded in data protection and civil rights law.

Examples

United Kingdom AI Regulation Framework (2024): Uses a principles-based approach that encourages explanations of how AI decisions affect people, supported by transparency requirements within data protection impact assessments.^[41]
OECD AI Principles: Call for transparency and explainability so affected individuals can understand outcomes, shaping regulatory approaches across more than 70 countries.^[42]

For genAI, expectations around individualized explanations are less prescriptive and emerging.

Overall regulatory trend

AI transparency regulation remains fragmented and inconsistently applied across jurisdictions. Where requirements exist, they converge most clearly around documentation and disclosure. Some stronger requirements around auditability exist in certain jurisdictions (e.g., public documentation, risk and impact assessments, incident reporting), but these are limited in scope and reach. User-facing transparency requirements are similarly narrow, typically emphasizing AI disclosure and labeling rather than detailed explanations of how outputs are produced.

The gap between transparency as a stated regulatory priority and its realization in practice motivates the need for coordinated policy action and standards at four levels: (1) baseline disclosure requirements; (2) model and system-level documentation, with particular attention to the gap in system-level standards for deploying organizations; (3) user-facing standards that go beyond labeling to require meaningful explanations and uncertainty communication proportionate to risk; and (4) accessible recourse mechanisms for consequential AI-driven decisions.

A note on terminology: Two sources of ambiguity are worth flagging. First, "model" and "system" are used inconsistently across legal frameworks and can be conflated. This creates accountability challenges and lack of clarity. Second, "transparency" is used to refer to different things across regulatory contexts, sometimes disclosure, sometimes documentation, sometimes explainability. This matters: disclosure is the lowest bar and the most commonly mandated form, but it does not constitute meaningful transparency on its own. A regulatory landscape that requires disclosure may appear to have addressed transparency while leaving documentation, explanation, and recourse largely unaddressed.

Chapter VI

The Plays

10 practical plays to help organizations operationalize transparency in genAI systems.

Make transparency a leadership priority

Who is involved: C-suite leaders, product and engineering leadership, legal and compliance teams, responsible AI leads

Transparency should be an explicit leadership commitment, not an afterthought or a technical detail delegated to teams.^[43] Establishing transparency as a core AI principle signals how the organization expects AI systems to be designed, deployed, and governed — and frames it as a strategic asset rather than a compliance burden. When leaders treat transparency as a differentiator, it shapes product decisions, governance processes, and how teams respond when systems fail or raise concerns.^[44]

Business Benefits

✓ Builds trust and adoption with users, customers, and enterprise clients by creating a reputation for prioritizing transparency
✓ Reduces regulatory risk by setting clear expectations upfront
✓ Strengthens decision-making and differentiation by making AI deployments more accountable and competitive in opaque markets

How

1. Establish transparency as a named core AI principle. Align it with organizational values and make clear it is a leadership commitment.
2. Communicate this commitment visibly. Share your AI principles, including transparency, internally with teams and externally with customers and partners.
3. Use the principle as a guide. Apply it as an explicit input when evaluating product design, vendor selection, governance choices, and incident response, so that it functions not just as a stated value but as an active decision-making lens.

Case Study — Salesforce

Salesforce offers an example of transparency established as a principle and leadership commitment. Developed through a year-long process involving employees across engineering, product, legal, and executive leadership, Salesforce's Trusted AI Principles include transparency as one of five named commitments, defined as ensuring customers understand the "why" behind each AI-driven recommendation or prediction so they can make informed decisions, identify unintended outcomes, and mitigate harm. In practice, this means publishing model cards, providing model explainability with predictions, and enabling customers to maintain control of their data at all times. For AI agents specifically, the principle extends to ensuring users understand when and how AI is involved in interactions, through clear disclosures, customizable prompts, and notifications. Rather than treating transparency as a compliance requirement, Salesforce framed it as a core differentiator shaping product design, governance, and customer relationships, a commitment that continues to be overseen by an Ethical Use Advisory Council.

Tools & Resources

World Economic Forum AI C-Suite Toolkit: Governance guidance for boards and executive teams on integrating responsible AI into corporate strategy and management.
Salesforce Trusted AI Principles: Describes Salesforce leadership's AI commitments and how they guide internal practices, customer interactions, and external relations.

Set a transparency strategy across your organization and AI portfolio

Who is involved: C-suite leaders, product and engineering leadership, legal and compliance teams, responsible AI leads

Transparency requires a deliberate, organization-wide strategy that defines what transparency means, who it is for, and how it will be delivered across the AI lifecycle. A transparency strategy establishes baseline requirements for every AI system: documentation, governance, and accountability. It then tailors transparency to different stakeholders: end users, operators, developers, managers, and regulators all need different kinds of clarity to use, oversee, or evaluate AI systems responsibly.

Business Benefits

✓ Enables trustworthy, scalable adoption by setting clear transparency expectations across AI systems and teams
✓ Reduces operational, legal, and reputational risk by clarifying requirements and accountability upfront
✓ Improves organizational alignment and differentiation by making transparency a strategic advantage

Challenges to be aware of

AI models and systems evolve rapidly, which can make transparency strategies go stale quickly (see challenges section: pace of AI development).

How

1. Inventory your AI portfolio. List the genAI systems you use or are planning (internal tools, pilots, customer-facing products), including owners and primary user groups.
2. Define the baseline transparency bar. Establish minimum requirements that every system must meet (e.g., AI disclosure, data/privacy info, system cards, evaluation plan, and escalation path).
3. Segment stakeholders and their needs. Identify the key audiences for each system (end users, operators, managers, developers, regulators) and what actionable, useful transparency means for each.
4. Tier systems by risk and impact. Classify systems (e.g., low-risk internal vs. customer-facing vs. high-stakes advisory/decision-influencing).
5. Set tier-specific transparency requirements. Specify what gets added as risk increases (e.g., deeper documentation, user-facing explanations/citations/uncertainty signals).
6. Factor transparency and control into future model selection and procurement. As you expand your AI portfolio, assess what transparency providers offer (e.g., model cards, training data disclosure, evaluation results, and governance practices) and weigh this in vendor selection.
- Consider how much control you want over model behavior, adaptation, and data handling: open and open-source models offer greater visibility, customization, and control over where data is stored and processed. For specific, well-defined tasks, smaller or purpose-built open models can match or outperform larger proprietary ones.
- Managed API models can offer strong general performance and convenience but provide less control and involve sending data to third-party servers. The right balance depends on your use case, task complexity, risk tolerance, and data governance requirements.
7. Keep the transparency strategy live. Reassess the strategy regularly and after major incidents, product changes, or regulatory shifts. For organizations adopting AI at high volume, consider automating the tier classification of new systems (step 4) so transparency requirements are applied consistently without manual review of every new tool.
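For organizations considering the automated tier classification mentioned in step 7, a minimal sketch assuming a simple rule-based scheme: each inventoried system (step 1) is described by a few yes/no attributes, mapped to one of the three example tiers from step 4, and the tier determines which artifacts are required on top of the baseline bar (steps 2 and 5). The attribute names, tier rules, and artifact lists are hypothetical placeholders each organization would define for itself; they do not come from NIST, ISO, or any regulation.

```python
from dataclasses import dataclass

# Hypothetical baseline every system must meet (step 2).
BASELINE = ["AI disclosure", "data/privacy info", "system card",
            "evaluation plan", "escalation path"]

# Hypothetical tier-specific additions as risk increases (step 5).
TIER_ADDITIONS = {
    "low-risk internal": [],
    "customer-facing": ["user-facing limitations notice", "AI-generated content labels"],
    "high-stakes": ["user-facing limitations notice", "AI-generated content labels",
                    "citations or uncertainty signals", "deeper documentation review",
                    "human review path"],
}


@dataclass
class InventoryEntry:
    """Hypothetical portfolio inventory record (step 1)."""
    name: str
    owner: str
    customer_facing: bool
    influences_consequential_decisions: bool  # e.g., advisory output in credit or hiring


def classify(entry: InventoryEntry) -> str:
    """Assign a tier with ordered rules: the highest-risk match wins."""
    if entry.influences_consequential_decisions:
        return "high-stakes"
    if entry.customer_facing:
        return "customer-facing"
    return "low-risk internal"


def required_artifacts(entry: InventoryEntry) -> list[str]:
    """Baseline requirements plus whatever the system's tier adds."""
    return BASELINE + TIER_ADDITIONS[classify(entry)]


tool = InventoryEntry("Meeting summarizer", "IT lead",
                      customer_facing=False, influences_consequential_decisions=False)
bot = InventoryEntry("Loan pre-screening assistant", "Lending product lead",
                     customer_facing=True, influences_consequential_decisions=True)
print(classify(tool), "|", classify(bot))
print(required_artifacts(bot))
```

Ordered rules keep the logic auditable: anyone reviewing the strategy can see exactly why a system landed in a tier, and edge cases can still be escalated for manual review.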

Case Study — Microsoft

Microsoft sets baseline transparency expectations through its Responsible AI Standard, which provides actionable requirements applying to all AI systems. It tailors transparency to different stakeholders through Transparency Notes — designed to help customers, developers, operators, and users understand how AI systems work and the deployment choices that shape behavior.^[45] Its Responsible AI Transparency Report describes how transparency is embedded across the AI lifecycle through structured governance and pre-deployment oversight.

Tools & Resources

NIST AI Risk Management Framework: Provides a structured approach to identifying, assessing, and managing AI risks. This is useful for defining baseline transparency requirements and tiering systems by risk level (steps 2 and 4).
ISO/IEC 42001 (AI Management System Standard): An international standard for AI governance that includes transparency and accountability requirements. This is useful for embedding transparency into organizational processes and governance structures (steps 6 and 7).

Establish policies and structures to operationalize transparency

Who is involved: C-suite leaders, product and engineering leadership, legal and compliance teams, responsible AI leads

A transparency strategy defines the plan, but without clear policies, defined roles, and accountability structures, transparency remains aspirational.^[46] This play is about building the organizational infrastructure that makes transparency enforceable and consistent. It covers who owns transparency decisions, what gets disclosed and when, how incidents are handled, and how compliance is monitored and reported.^[47] Without this infrastructure, transparency falls to individual judgment, leading to inconsistent communication, unclear accountability, and decisions made on the fly. Transparency policies and practices can leverage and be embedded in existing frameworks and processes. This can be particularly important for organizations evaluating and implementing AI systems at high volume to maintain consistency.

Business Benefits

✓ Minimizes legal risk by defining reviewable oversight structures
✓ Strengthens user trust through consistent, policy-driven communication of AI capabilities and limitations
✓ Improves operational consistency by embedding accountability in concrete roles and policies

Challenges to be aware of

Standards for disclosure, documentation, and governance are still fragmented and evolving, which can make organizational policy-setting difficult (see challenges section: lack of standards and evolving regulation).

How

1. Define clear ownership and oversight roles. Assign named accountability for disclosure decisions, model risk, and transparency performance across the organization. This may include dedicated roles embedded in business units (such as Responsible AI Champions) who serve as points of contact for transparency practices in their teams. Ensure every AI system has a designated owner responsible for keeping documentation, disclosures, and evaluations current.
2. Establish external disclosure standards. Define what gets communicated to users and under what conditions. Standardize this across products and teams so disclosure is consistent.
3. Implement AI labeling and content disclosure policies. Define when and how AI-generated content is labeled across products and channels, aligned with regulatory requirements and user expectations (see Box 2).
4. Create incident disclosure protocols. Establish clear processes for communicating failures, known issues, and fixes to relevant internal and external stakeholders, including timelines, responsible parties, and escalation pathways.
5. Establish auditable reporting structures. Define how transparency commitments are monitored, documented, and reported over time, both internally to leadership and externally to regulators or the public where required.
6. Embed policies into existing processes. Integrate transparency policy requirements into product launch reviews, vendor selection, procurement, and periodic audits.

Case Study — IBM

IBM built organizational infrastructure for transparency. Recognizing that its principles (including transparency) alone are insufficient, IBM established the Responsible Technology Board. The Board is a multidisciplinary group of leaders from across the company that provides centralized governance, review, and decision-making for ethics policies, practices, and products. The Board is supported by AI Ethics Focal Points (trained representatives embedded in business units who identify concerns, mitigate risks, and escalate issues when necessary) and a grassroots Advocacy Network that promotes ethics principles across teams. This structure illustrates how transparency policies move from strategy to practice with clear roles, pathways, and oversight.

Tools & Resources

Microsoft Responsible AI Impact Assessment Template: A publicly available template for evaluating AI systems against responsible AI principles. It includes dedicated transparency goals covering system intelligibility, stakeholder communication, and disclosure of AI interaction. Useful for organizations building structured disclosure standards and impact assessment processes into their governance infrastructure (steps 1, 2, and 3).

Build incentives and a culture that rewards transparency

Who is involved: C-suite leaders, HR teams, product and engineering leadership, responsible AI leads, and employees across the organization

Transparency policies and strategies only work if the organizational culture supports them. Transparency must be a core commitment embedded in the company environment and structure, not a component added for compliance. This means building a culture where employees feel comfortable flagging issues, where responsible behavior is recognized and rewarded, and where people have the knowledge and skills to act on transparency expectations in their specific roles. Together, these practices lead to more sustainable long-term growth, improved trust, and AI systems that scale responsibly.

Business Benefits

✓ Decreases downstream legal, regulatory, and reputational risk by surfacing issues and limitations early
✓ Increases organizational resilience by ensuring performance incentives align with responsible decision-making
✓ Improves cross-team communication, speed, accountability, and employee empowerment
✓ Attracts and retains talent by signaling a strong commitment to responsible AI

How

1. Embed transparency into performance incentives. Make proactive reporting and clear communication part of performance and promotion evaluations, so that early risk flagging and transparency practices are rewarded.
2. Establish clear reporting pathways. Create formal, normalized systems for reporting transparency issues, such as model risk reviews, red-teaming logs, and anonymous flagging channels.
3. Track, measure, and quantify transparency actions. Define quantifiable KPIs such as team-wide review participation rates, the number of issues resolved before launch, and compliance with minimum documentation requirements.
4. Invest in role-specific training. Ensure that those involved in building, deploying, and overseeing AI systems have the knowledge to fulfill their transparency responsibilities. Product teams should be able to explain system behavior, engineering teams should be able to surface limitations, and legal and policy teams should be able to manage disclosure obligations. Develop role-specific references and materials that can be used in training and day-to-day work.
5. Build ongoing learning. Integrate transparency and AI literacy into onboarding, regular team meetings, and leadership sessions.
6. Promote psychological safety. Train managers to respond constructively to pushback, conduct objective, blameless postmortems, and protect employees who highlight model risks and concerns.
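Teams that want to make step 3 concrete could compute these KPIs from existing review and issue-tracker records. The following is a minimal sketch. The record types and field names (`ReviewRecord`, `IssueRecord`, `attended_review`, `resolved_before_launch`) are hypothetical and do not come from any specific tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewRecord:
    """One team member's participation in a pre-launch transparency review (hypothetical schema)."""
    member: str
    attended_review: bool


@dataclass
class IssueRecord:
    """A transparency issue flagged before launch (hypothetical schema)."""
    issue_id: str
    resolved_before_launch: bool


def participation_rate(reviews: list[ReviewRecord]) -> float:
    """Share of team members who took part in the review, from 0.0 to 1.0."""
    if not reviews:
        return 0.0
    return sum(r.attended_review for r in reviews) / len(reviews)


def pre_launch_resolution_rate(issues: list[IssueRecord]) -> Optional[float]:
    """Share of flagged issues resolved before launch.

    Returns None when nothing was flagged: zero reports is not the same as zero
    problems, and may itself signal under-reporting.
    """
    if not issues:
        return None
    return sum(i.resolved_before_launch for i in issues) / len(issues)


def documentation_complete(present_sections: set[str], required_sections: set[str]) -> bool:
    """True when every section on the organization's minimum documentation list is present."""
    return required_sections <= present_sections
```

For example, if three of four team members attend a review, the participation rate is 0.75. These figures are most useful as incentives when they are tracked release over release rather than measured once.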

Case Study — Google & Telefónica

Google illustrates two complementary aspects of building a transparency culture. Its Site Reliability Engineering (SRE) team incentivizes transparency by standardizing a blameless postmortem culture.^[48] Incidents are treated not as personal failures but as opportunities for learning and growth. By integrating blameless reviews, formal documentation, and positive reinforcement into its operational culture, Google turns failure reporting from a reputational risk into a rewarded action, making transparency a structural incentive. This example comes from a large technology company, but the underlying approach can be adapted by any organization deploying AI systems: separate issues from individuals, reward early disclosure, and normalize constructive review.

On the training side, Google has invested in responsible AI education and training^[49] that reaches employees across diverse teams, technical backgrounds, and learning styles. Since 2019, over 32,000 employees have engaged with its AI training resources, which include interactive puzzles, quizzes, and games. Google has also built targeted programs for specific development and management teams, as well as broader courses such as its AI interpretability and transparency course for developers.

Telefónica embedded responsible AI into employee behavior through training and structured accountability. Since 2018, the company has built a layered approach: a company-wide AI ethics course, self-assessment questionnaires that product managers must complete for every new AI product or service, and a network of Responsible AI Champions embedded in each business unit.

Tools & Resources

Blameless Postmortem Framework (Google): Structured template and review process for conducting blameless postmortem assessments and reinforcing learning.
Responsible Generative AI Module (Microsoft): Introduces the benefits and risks of generative AI solutions through a 50-minute module.
NAVEX: Confidential reporting system that allows employees to flag risks and violations without fear of negative repercussions.
Culture Amp: Employee feedback and performance management tool that can help organizations track perceived psychological safety and integrate transparency into promotion structures.

Document AI systems at the right level: models for builders, systems for deployers

Who is involved: Foundation model developers, product and engineering teams, responsible AI teams, compliance and risk leaders

Transparency requires documentation, but the type and level depend on your role. Foundation model developers should provide model-level documentation, such as model cards, detailing training conditions, intended use, evaluation results, and known limitations.^[50] Most organizations today, however, are not model builders. They are deployers that integrate foundation models into products, workflows, and services. Deployers must create system-level documentation that explains how the model is being used in context: its purpose, safeguards, output monitoring, and deployment risks. This distinction matters because organizations inherit the opacity and risk of the underlying foundation model. Even when model internals remain unclear, deployers remain responsible for documenting how their AI-enabled system behaves in practice.

A well-designed system card should be the central transparency artifact, capturing how the system works, its impacts, risks, and governance. System-level documentation should not be static or purely descriptive, but rather grounded in and reflective of ongoing evaluation, impact assessment, and accountability processes (see Play 8).

Business Benefits

✓ Builds trust and regulatory readiness by providing clear, credible documentation for customers, auditors, and internal stakeholders
✓ Improves accountability and incident response by clarifying intended use, system limits, and escalation paths when failures occur
✓ Reduces risk and enables safer scaling by surfacing limitations early and standardizing practices across deployments

Challenges to be aware of

Foundation model providers do not always disclose the information needed to fully complete system-level documentation, and referencing provider documentation can create a false sense of coverage where gaps remain (see challenges section: corporate opacity).

How

1. Start with the model documentation available. Gather any provider-issued transparency materials: model cards, safety reports, data policies, evaluation summaries, or system descriptions.
2. Conduct foundation model and vendor transparency due diligence. Document what is known and unknown about the foundation model you are adopting, such as:
- Intended and prohibited use cases
- Known failure modes and limitations
- Safety mitigations and monitoring practices
- Biases and performance discrepancies for different groups
- Data governance and privacy policies
- Available benchmarks, audits, or external assessments
3. Create system-level documentation for your deployment. Develop a system card or factsheet that explains:
- The purpose of the AI system
- Who will be affected and served, and in what contexts
- How the model(s) are integrated (prompts, tools, workflows)
- Key risks and limitations, including fairness/bias, safety, security, privacy, and reliability
- Broader societal considerations, such as impacts on labor, organizational change, or environmental footprint where relevant
- What safeguards, monitoring, and human oversight are in place
- What transparency features are provided to users
- What accountability, incident response, and recourse mechanisms exist when harms or failures occur
4. Tailor documentation to stakeholder needs. Ensure documentation is usable across audiences: engineers monitoring and debugging failures, managers overseeing risk, users calibrating trust, and regulators evaluating compliance.
5. Maintain documentation over time. Treat documentation as a living artifact. Update it as models change, systems are fine-tuned, workflows evolve, or new risks emerge.
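Engineering teams could keep a system card as structured data so that gaps (step 3) and stale content (step 5) are easy to detect. The following is a minimal sketch. The field names mirror the checklist in step 3 but are illustrative, not a standard schema. The 90-day review window is an arbitrary placeholder.

```python
from dataclasses import dataclass, field, fields
from datetime import date


@dataclass
class SystemCard:
    """System-level documentation for one deployment (illustrative field names)."""
    purpose: str = ""
    affected_users_and_contexts: str = ""
    model_integration: str = ""            # prompts, tools, workflows
    risks_and_limitations: str = ""
    societal_considerations: str = ""      # an explicit "Not applicable: <reason>" counts as filled
    safeguards_and_oversight: str = ""
    user_transparency_features: str = ""
    accountability_and_recourse: str = ""
    # Step 2: record what is NOT known about the foundation model, rather than leaving it implicit.
    provider_unknowns: list[str] = field(default_factory=list)
    last_reviewed: date = date.min


def missing_sections(card: SystemCard) -> list[str]:
    """Names of text sections that are still empty."""
    missing = []
    for f in fields(card):
        value = getattr(card, f.name)
        if isinstance(value, str) and not value.strip():
            missing.append(f.name)
    return missing


def is_stale(card: SystemCard, today: date, max_age_days: int = 90) -> bool:
    """Flag cards not reviewed within max_age_days (placeholder review cadence)."""
    return (today - card.last_reviewed).days > max_age_days
```

For example, a card that has only a `purpose` and was last reviewed on 15 January 2025 would report seven missing sections and would be flagged as stale on 1 June 2025 (137 days later). The same checks could run whenever the underlying model or workflow changes.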

Case Study — Anthropic & Microsoft

Anthropic's Transparency Hub provides condensed summaries for deployers and detailed reports for technical teams, accommodating stakeholders with varying backgrounds and interests. Anthropic publishes model system cards that document capabilities, safeguards, benchmark assessments, and usage policies. Its developer documentation helps engineering and product teams understand system behavior, assess tradeoffs, and monitor outputs.

Microsoft's Transparency Notes (introduced in Play 2) offer another example: system-level documentation tailored to specific AI products and use cases, designed to help operators and users understand how systems behave and what their limitations are.

Tools & Resources

Model Cards for Model Reporting (Mitchell et al., 2019): The foundational paper introducing model cards as a documentation standard.
Model Cards Toolkit (Google Research): Documentation framework for reporting model capabilities, limitations, intended use cases, and ethical considerations in a consistent format.
AI FactSheets (IBM): A framework for documenting AI models and services throughout the lifecycle, covering intended use, performance, fairness considerations, and governance.
Transparency Hub (Anthropic): A publicly available resource demonstrating how system-level documentation can be structured and tailored to different audiences.^[51]

Maintain transparency around AI development, configuration, and governance

Who is involved: Product teams, data and ML teams, legal and IP teams, compliance leaders, responsible AI practitioners, procurement

Transparency around training data and model inputs helps stakeholders understand how AI systems are created and how they operate. This includes documenting the types of data used to train models, where these datasets are sourced from, and what safeguards exist around copyright, intellectual property, and labor standards.^[52] Documenting data provenance helps users and regulators assess risks related to output quality, bias, and copyright. It also helps organizations make more defensible claims about their AI systems.^[53]

Data fundamentally shapes outputs (whether through initial training, fine-tuning, or other forms of adaptation), so documentation must reflect this. Foundation model developers should document training data in model cards or datasheets. Organizations that fine-tune or adapt models should document those decisions in system cards, explaining what data was used, how it was sourced, and how it affects system behavior (see Play 5).

Transparency extends to the higher-level instructions and principles that shape how models behave, at both the model and system level. At the model level, developers may encode values and behavioral principles directly into training. Anthropic, for example, publicly shares its Constitutional AI approach, which trains models against an explicit, written set of principles. Where such approaches are used, documenting and disclosing them is an important transparency practice. At the system level, deploying organizations shape model behavior through system prompts and behavioral constraints applied at deployment. These decisions should also be documented and, where appropriate, disclosed.

Business Benefits

✓ Builds internal confidence in model behavior by identifying suitability for specific use cases
✓ Improves customer adoption by building trust and clarifying limitations around copyright
✓ Strengthens governance by formalizing ownership of training data and supporting quality assurance
✓ Enables responsible scaling by standardizing documentation across AI systems

Challenges to be aware of

Training data documentation is one of the least transparent areas across the industry, which can limit what deploying organizations can disclose. Fine-tuning practices are often treated as minor technical details rather than transparency obligations, but materially affect system behavior and should be documented accordingly (see challenges section: corporate opacity).

How

1. Document data used to train and adapt models.
- Foundation model developers: document training data in model cards or datasheets, describing data types (e.g., licensed data, publicly available data, synthetic data), sources, and limitations.
- Organizations using third-party or foundation models: document what is known about the underlying model's training data (e.g., referencing provider-issued documentation). When fine-tuning or adapting models, document the additional data used, how it was sourced, and how it affects system behavior (e.g., in system-level documentation, see Play 5).
- Clarify copyright and IP position. Communicate whether copyrighted material is used, under what licenses, and whether user or proprietary data is excluded.
2. Disclose fine-tuning practices and behavioral instructions. If the model has been fine-tuned for specific tasks or domains, document what additional data was used, how it was sourced, and how fine-tuning affects system behavior and limitations. Document any system prompts or behavioral instructions that shape how the model responds. Where your model provider encodes behavioral principles into training (e.g., through Constitutional AI approaches), reference and document these in your system card.
3. Require internal dataset governance reviews. Assess legal compliance, labor standards, and sourcing risks before any dataset is used.
4. Document model and system updates. Establish a documentation process that is triggered whenever models change or new datasets are added.
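The governance review in step 3 can be run as a simple gate on each dataset's provenance record before that dataset is used for training or fine-tuning. The following is a minimal sketch under stated assumptions. The `DatasetRecord` fields and the three blocking rules are hypothetical examples. The legal and labor-standards criteria themselves are set by each organization's review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    """Provenance entry for data used to train or fine-tune a model (hypothetical schema)."""
    name: str
    source: str                      # e.g. "licensed", "public", "synthetic", "user-submitted"
    license: Optional[str]           # None when licensing or copyright status is unknown
    contains_user_data: bool
    user_consent_documented: bool
    governance_review_passed: bool


def approval_blockers(ds: DatasetRecord) -> list[str]:
    """Reasons the dataset should not yet be used; an empty list means no blockers were found."""
    blockers = []
    if not ds.governance_review_passed:
        blockers.append("internal governance review not completed")
    if ds.license is None:
        blockers.append("license or copyright status unknown")
    if ds.contains_user_data and not ds.user_consent_documented:
        blockers.append("user data present without documented consent")
    return blockers
```

The resulting records could also feed the training data section of a system card, so the information disclosed externally matches the information reviewed internally.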

Case Study — IBM Granite Models

IBM achieved the highest score on the Stanford University Foundation Model Transparency Index for its Granite models.^[54] IBM distinguishes itself by publishing detailed model documentation^[55] that outlines training data categories, including licensed and public datasets. The documentation also clearly states that copyrighted content scraped without permission is excluded. This transparency strengthens enterprise adoption by reducing uncertainty around copyright exposure and data misuse.

Tools & Resources

Datasheets for Datasets (Gebru et al., 2021): The foundational paper proposing a standardized documentation framework for datasets. The most widely cited dataset documentation standard and an essential reference for organizations documenting training data.
Data Statements for NLP (Bender & Friedman): A documentation standard that describes dataset demographics, linguistic sources, and annotation context. Particularly useful for language models where data provenance affects fairness and performance.
Dataset Nutrition Labels (MIT Media Lab): A standardized dataset documentation framework covering provenance, variables, ethical risks, and statistical properties. Useful for communicating to non-technical stakeholders.

Maintain transparency around user data, privacy, and personal data use

Who is involved: Product teams, data teams, privacy and legal compliance teams or leaders, responsible AI practitioners

Transparency around data, inputs, and privacy ensures users and stakeholders understand what data is being collected and how it is being used across the lifecycle of collection, processing, storage, and retention. Clear transparency practices help users understand how their data is used and protected, supporting trust and adoption. This can be achieved through product disclosures, consent flows, privacy notices, and in-product explanations. Deployers remain responsible for transparency even if they are using third-party models. They must explain how data flows through their specific product, including both proprietary and third-party pipelines.

Business Benefits

✓ Improves user retention by clearly communicating how data and inputs are handled
✓ Reduces legal and regulatory risk through proactive privacy disclosure
✓ Strengthens governance by clarifying ownership of data flows and privacy controls

Challenges to be aware of

Organizations may face commercial or legal pressures that limit full disclosure of data practices (see challenges section: corporate opacity and balancing tensions).

How

1. Map data flows end-to-end. Document what user inputs are collected, where they go, how long they are retained, and who can access them.
2. Clarify input handling. Specify whether prompts or outputs are logged, reviewed by humans, used for training the model, or shared with third-party vendors. Where user inputs are used to train or adapt models, document this as part of your training data governance practices (see Play 6).
3. Provide clear user disclosures. Cover data practices in privacy notices, onboarding screens, and in-product explanations. Ensure disclosures are consistent with organization-wide disclosure standards (see Play 3).
4. Enable user choice and control. Offer consent mechanisms and, where possible, opt-outs. Define clear retention periods and make sure user data does not stay on servers longer than necessary.
5. Align privacy with product design. Implement data minimization and secure default controls.
6. Disclose third-party data sharing. Clearly state if and how data is provided to third parties (e.g., advertisers, data brokers, law enforcement).
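A data-flow map (step 1) kept as structured records can be checked automatically against retention policy (step 4) and disclosure obligations (steps 2 and 6). The following is a minimal sketch. The `DataFlow` fields, the single organization-wide retention limit, and the rule that third-party sharing and training use must be disclosed are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataFlow:
    """One hop of user data through the product (hypothetical schema)."""
    data: str                  # e.g. "chat prompts"
    destination: str           # internal service, vendor, or model provider
    third_party: bool
    retention_days: int
    used_for_training: bool
    disclosed_to_users: bool


def flag_flows(flows: list[DataFlow], max_retention_days: int) -> list[str]:
    """Return findings to resolve before the privacy notice is published."""
    findings = []
    for flow in flows:
        if flow.retention_days > max_retention_days:
            findings.append(f"{flow.data} -> {flow.destination}: kept {flow.retention_days} days "
                            f"(policy max {max_retention_days})")
        # Sharing with third parties or training on user inputs should appear in user disclosures.
        if (flow.third_party or flow.used_for_training) and not flow.disclosed_to_users:
            findings.append(f"{flow.data} -> {flow.destination}: not disclosed to users")
    return findings
```

For example, an undisclosed hop that sends prompts to an external model provider API, or analytics logs kept for 365 days under a 90-day policy, would each produce one finding.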

Case Study — Apple & DuckDuckGo

Apple's Privacy Nutrition Labels require every app on the App Store (including Apple's) to disclose what data is collected, how it is used, whether it is linked to a user's identity, and whether it is used for tracking. Information is presented in standardized, easy-to-read formats on each app's product page so users have a clear summary of privacy practices before they download. Developers must also disclose data collected by third parties and partners integrated into their apps. Apple's App Tracking Transparency Framework also requires apps to obtain explicit user permission before tracking activity across other apps or websites.

DuckDuckGo builds transparency directly into its business model. Its privacy policy starts with: "We don't track you." The rest of the policy explains what data is briefly seen, how it is used, and why it is deleted. Its AI product, Duck.ai, extends this to AI interactions: model providers are called on users' behalf so personal information is not exposed to them, and DuckDuckGo has agreements in place with all model providers prohibiting them from using prompts and outputs to train their models.

Tools & Resources

Data Protection Impact Assessments (EU General Data Protection Regulation (GDPR)): Required under EU law for high-risk data processing. DPIAs document how personal data flows through systems and identify privacy risks, making them a useful starting framework for any organization mapping data flows (step 1).
Transparency Hub (Berkman Klein Center at Harvard): Gathers tech companies' legal and privacy policies over time, useful for benchmarking current disclosure practices against industry standards (step 3).
Data flow mapping tools: Engineering teams often use internal data flow diagrams or architecture mapping tools to document how inputs travel across services, APIs, and model providers. Examples include the open-source diagramming tool Mermaid, which allows engineers to generate architecture diagrams directly in documentation, as well as enterprise data lineage platforms such as Collibra that map data flows across services and vendors. Useful for step 1.
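Mermaid diagrams are written as plain text, so a data-flow map kept as records can be rendered as a diagram without redrawing it by hand. The following is a minimal sketch. It assumes each hop is recorded as a `(source, data, destination)` tuple. `to_mermaid` is a hypothetical helper, not part of Mermaid. The output uses Mermaid's `flowchart LR` syntax with labeled edges.

```python
def to_mermaid(flows: list[tuple[str, str, str]]) -> str:
    """Render (source, data, destination) hops as a Mermaid left-to-right flowchart."""
    def clean(text: str) -> str:
        # Double quotes and pipes would break Mermaid's node and edge-label syntax.
        return text.replace('"', "'").replace("|", "/")

    ids: dict[str, str] = {}

    def node(name: str) -> str:
        # Declare the node with its label on first use; refer to it by id afterwards.
        if name not in ids:
            ids[name] = f"n{len(ids)}"
            return f'{ids[name]}["{clean(name)}"]'
        return ids[name]

    lines = ["flowchart LR"]
    for source, data, destination in flows:
        lines.append(f"    {node(source)} -->|{clean(data)}| {node(destination)}")
    return "\n".join(lines)


print(to_mermaid([
    ("User", "chat prompts", "App server"),
    ("App server", "prompt text", "Model provider API"),
    ("App server", "conversation logs", "Log store (30 days)"),
]))
```

The printed text can be pasted into any documentation page that renders Mermaid. Because the diagram is regenerated from the same records used for privacy reviews, it is less likely to drift out of date.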

Benchmark, evaluate, and monitor AI systems — based on your role

Who is involved: Foundation model developers, product and engineering teams, responsible AI teams, compliance and risk leaders

System-level transparency and documentation (see Play 5) is only credible when grounded in ongoing evaluation, monitoring, and accountability practices. Benchmarks, audits, red-teaming, and impact assessments provide the operational backbone that helps organizations understand how AI systems behave, where they fail, and what risks they introduce over time.^[56]

The right approach depends on your role: foundation model developers should conduct and publish benchmarks to assess model capabilities, limitations, and safety measures over time; deployers should focus on context-specific evaluations, including testing how AI systems behave in their own products, workflows, and user environments, as well as assessing broader impacts. Evaluation processes should continuously inform system-level documentation, governance processes, and user-facing disclosures.

Business Benefits

✓ Improves trust and oversight by grounding transparency claims in ongoing testing and monitoring
✓ Reduces operational and reputational risk by surfacing failures, biases, or unsafe behaviors before harm occurs
✓ Strengthens accountability over time by ensuring systems remain governable as models, use cases, and environments evolve

Challenges to be aware of

The black box nature of large models means some evaluation findings may be difficult to interpret or explain. Evaluations can also go stale quickly as models and systems evolve (see challenges section: black box nature of AI models; pace of AI development).

How

1. Clarify your evaluation responsibility.
- Foundation model developers: publish benchmarks for foundation models.
- Deployers: evaluate and assess the impact of AI systems built on external models within the appropriate context.
2. If developing foundation models: conduct and publish benchmarks. This includes information on model capability and task performance; reliability and robustness; known limitations, risks, and failure modes; and safety-related behaviors. Be explicit about what benchmarks do and do not measure. Many benchmarks capture impacts across diverse populations poorly and do not report on key aspects of trustworthiness such as bias.
3. If deploying AI systems: conduct evaluations. Test how the system behaves in your real workflows, including stress tests and edge-case scenarios, red-teaming and adversarial testing, domain-specific failure analysis, and performance across diverse user groups and contexts.
4. Incorporate impact assessments where risks are high. For sensitive or high-stakes deployments, pair technical evaluations with structured impact analysis, including potential harms and affected stakeholders, bias and fairness risks, organizational and societal consequences, and mitigation and oversight plans.
5. Monitor systems continuously after deployment. Establish ongoing practices such as incident tracking, performance monitoring, user feedback loops, and regular evaluations.
6. Ensure evaluations feed back into transparency documentation and disclosure.
- For foundation model developers: Integrate benchmarks within model cards (see Play 5).
- For deployers: Integrate evaluations and impact assessments within system cards and other governance processes, like acceptable use or additional safeguards (see Play 5). Ensure that evaluation and impact assessment results also inform user-facing transparency features, like disclosures (see Play 9).
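One part of a deployer's context-specific evaluation (step 3) is comparing performance across user groups. The following is a minimal sketch of that single check. It assumes labeled evaluation results in the form `(user_group, output_was_correct)`, uses plain accuracy as the metric, and treats the allowed gap `max_gap` as a hypothetical threshold. It is a first screen, not a complete fairness audit.

```python
from collections import defaultdict


def accuracy_by_group(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Accuracy per user group from (user_group, output_was_correct) evaluation pairs."""
    totals: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for group, ok in results:
        totals[group] += 1
        correct[group] += int(ok)
    return {group: correct[group] / totals[group] for group in totals}


def disparity_alerts(by_group: dict[str, float], max_gap: float) -> list[str]:
    """Groups whose accuracy trails the best-performing group by more than max_gap."""
    if not by_group:
        return []
    best = max(by_group.values())
    return sorted(group for group, acc in by_group.items() if best - acc > max_gap)
```

For example, if answers are correct 3 of 4 times for English-speaking users but 1 of 2 times for Spanish-speaking users, a `max_gap` of 0.1 flags `"es"`. Rerunning the same check after deployment (step 5) shows whether the gap changes as the model or user base changes.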

Case Study — Meta

Meta actively maintains the security of its open-source AI models through ongoing evaluations. The company conducts human and AI-enabled red-teaming exercises with internal and external experts across disciplines, languages, and geographic concerns. Recurring tests, such as adversarial prompting, help identify vulnerabilities and unexpected use cases and inform changes to benchmarks and fine-tuning datasets. Meta also enhances model safety and performance through reinforcement learning from human feedback. It uses a variety of data processing techniques to ensure high-quality training datasets.^[57]

This example reflects the developer side of the role-based framework above. For deploying organizations, the equivalent practice is context-specific evaluation.

Tools & Resources

MLCommons Benchmarks: Standardized, open-source tests that measure and compare the performance, quality, and safety of AI systems.
Hugging Face Evaluation Hub: Open-source library with standardized tools, metrics, and rankings to evaluate machine learning models and datasets across domains.
MITRE ATLAS: Provides a structured framework of AI attacks and techniques to help teams test, assess, and secure AI systems.

Design transparency in AI outputs and user interactions

Who is involved: Product managers, UX and design teams, engineering teams, responsible AI leads, customer support teams, and end users

User-facing transparency helps people understand and interpret outputs, calibrate trust, and decide when and how to rely on AI systems. This can include AI labels, explanations, confidence ratings, guidance on appropriate use, and citations. User-facing transparency is not one-size-fits-all. Providing too little information can lead to misunderstandings and misplaced trust, while too much or poorly presented information can overwhelm or confuse users.^[58] Research suggests more transparency generally enhances trust in the AI, perceived reliability of its recommendations, confidence in its accuracy, and ease of understanding the output. However, these effects vary by context: higher transparency boosted understanding in finance settings but caused confusion in healthcare settings.^[59]

The appropriate level of user-facing transparency depends on your deployment context and risk tier (see Play 2). At a minimum, every deployment should inform users when they are interacting with AI, what the system is for, and what it should not be used for. In many applications, especially where users may act on outputs or errors could cause harm, explanations, citations and confidence scores should be the norm. For higher-risk deployments – such as those involving health, finance, employment, or other high-stakes decisions – more thorough explanations are critical.

Business Benefits

✓ Builds user trust and adoption by helping people understand and appropriately rely on outputs
✓ Reduces misuse and liability risk by setting clear expectations, limits, and uncertainty in sensitive contexts
✓ Improves product quality and differentiation

Challenges to be aware of

More transparency is not always better, as poorly designed or excessive disclosure can overwhelm or confuse users. There is also a commercial tension in the opposite direction: organizations may omit uncertainty signals to protect product perception. In practice, omitting them is the greater risk (see challenges section: unintended trust consequences; corporate opacity and balancing tensions).

How

1. Establish a baseline of user-facing transparency. Ensure every deployment clearly communicates when AI is being used (e.g., AI disclosure), what the system is for, and what it should not be used for. For image generation systems, disclosure also means AI watermarks. For other content generation systems, it includes AI labeling and watermarking where appropriate (see Box 2 on disclosure standards). The baseline of what gets disclosed should be established in your organization's transparency strategy (see Play 2).
2. Add "receipts," not just answers. Provide supporting context such as citations or links to sources. When implemented thoughtfully, transparency about sources can help users evaluate outputs and reduce information overload.
3. Communicate uncertainty where users may act on outputs, for example through confidence scores or indicators. This is particularly important in high-risk domains or deployments.
4. Design explanations for usability, considering deployment risk level. Build transparency in by design, shaping it around what users actually need to know. This can include simple rationales, or layered explanations for those who want additional detail. In high-stakes contexts (e.g., health, finance, employment), provide thorough explanations tied to confidence scores and clear guidance on limitations. The level of explanation required should reflect the risk tier established in your transparency strategy (see Play 2).^[60]
5. Test transparency features with target users. Use UX research to assess different features and approaches, so that transparency features meet user needs without overwhelming users. Update user-facing features as the system changes or new risks emerge.
6. Consider language, literacy, and cultural context. Ensure explanations, disclosures, and uncertainty indicators are understandable across different languages, literacy levels, and cultural contexts.
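The tiered approach above can be expressed as a small rule: every answer gets the baseline disclosure and any available sources, and more signals are added as the risk tier rises. The following is a minimal sketch under several assumptions. The tier names `"low"`, `"medium"`, and `"high"`, the confidence cut-offs, and the wording are hypothetical. It also assumes the system produces a calibrated 0-1 confidence score, which many generative models do not expose directly.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserFacingAnswer:
    """What the interface shows alongside a generated answer (illustrative structure)."""
    text: str
    ai_disclosure: str = "This response was generated by AI and may contain errors."
    sources: list[str] = field(default_factory=list)
    confidence_label: Optional[str] = None
    limitations_note: Optional[str] = None


def label_for_confidence(score: float) -> str:
    """Map a 0-1 confidence score to plain language; cut-offs are placeholders to test with users."""
    if score >= 0.8:
        return "Higher confidence"
    if score >= 0.5:
        return "Moderate confidence: check key details"
    return "Low confidence: verify before acting"


def present(text: str, score: float, sources: list[str], risk_tier: str) -> UserFacingAnswer:
    """Apply the baseline to every answer, then add signals by (hypothetical) risk tier."""
    # Baseline for all tiers: AI disclosure plus any citations ("receipts") available.
    answer = UserFacingAnswer(text=text, sources=list(sources))
    if risk_tier in ("medium", "high"):
        # Users may act on these outputs, so surface uncertainty.
        answer.confidence_label = label_for_confidence(score)
    if risk_tier == "high":
        # High-stakes contexts also get explicit guidance on limitations.
        answer.limitations_note = (
            "Not a substitute for professional advice on health, finance, or employment decisions."
        )
    return answer
```

Keeping these rules in one place makes them easier to adjust after user testing (step 5) and to translate consistently (step 6).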

Case Study — Adobe

Adobe Research is developing tools to increase transparency and build trust between creators and viewers by helping detect AI-generated content like images and videos.^[61] Its Content Credentials tool allows creators to attach brief metadata to images that explains how they were created, which viewers can access with a simple click. This protects creative ownership and helps establish the authenticity of artists' work. Additionally, Adobe's Content Credential Browser Plug-In lets users view these credentials on websites and social media platforms where metadata isn't embedded directly, extending transparency beyond Adobe's own tools.

Tools & Resources

Algorithmic Transparency Playbook (Technical Guide): Provides an overview of intrinsic, post-hoc, SHAP, counterfactual, and other explanation approaches that can be used for user-facing transparency.

Provide accountability and recourse mechanisms

Who is involved: C-Suite leaders, product and engineering leadership, responsible AI leads, legal and compliance leads, risk and audit teams, trust and safety teams, HR, and communications

Transparency is meaningful only when it is paired with accountability.^[62] Accountability and recourse must be integrated into governance and policies and be easily accessible through user-facing design.^[63] Accountability and recourse mechanisms ensure that users have support while navigating opaque systems: they must be able to question outputs and report harm. Internally, organizations must define escalation paths, response processes, and clear ownership for when such situations arise, so that stakeholders see the value of raising concerns and calling for improvement. This includes structured triage processes and cross-team review. In higher-stakes contexts, it also includes human review and, where needed, reversal of algorithmic outcomes.^[64]

Business Benefits

✓ Reduces legal and regulatory exposure by demonstrating due diligence
✓ Improves product quality through structured feedback and incident tracking, fostering trust and long-term adoption
✓ Reinforces other transparency processes by demonstrating real attention and engagement by organizational leadership

Challenges to be aware of

Commercial and legal pressures can create tension between full accountability and liability exposure. The absence of clear standards for what constitutes adequate recourse makes it difficult to know whether the mechanisms an organization builds are sufficient (see challenges section: corporate opacity; lack of standards and evolving regulation).

How

1. Define clear escalation pathways. Establish documented internal reporting channels. This can include outlining severity tiers, response timelines, and decision-making authority.
2. Create accessible user feedback and reporting channels. Provide in-product reporting tools, structured forms, and contact information explaining how users can challenge outputs.
3. Operationalize incident response. Track all incidents in a centralized system and identify where failures originated. Feed findings back into evaluation practices (see Play 8) and documentation updates (see Plays 6-7). Require reporting of significant incidents to leadership as part of standard protocol.
4. Define remediation pathways. Enable human review, appeals processes, or reversibility of automated outcomes where possible. Human oversight is key, especially in high-risk contexts.
5. Continuously update policies and employee training. As model risks evolve in the rapidly changing AI landscape, update policies and training so that recourse mechanisms remain accessible and usable and significant incidents are consistently reported to leadership.
6. Reward responsible reporting. Embed incentives for internal and external reporting into organizational culture and performance structures (see Play 4), reinforcing accountability as a shared responsibility.
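The centralized incident tracking in steps 1 and 3 can start as a simple triage record with severity tiers, response deadlines, and an escalation rule. The following is an illustrative sketch. The four severity names, the response times, and the rule that HIGH and above go to leadership are assumptions, not recommendations from this playbook.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Placeholder response timelines per severity tier; each organization sets its own (step 1).
RESPONSE_TIME = {
    Severity.LOW: timedelta(days=10),
    Severity.MEDIUM: timedelta(days=3),
    Severity.HIGH: timedelta(hours=24),
    Severity.CRITICAL: timedelta(hours=4),
}


@dataclass
class Incident:
    summary: str
    channel: str            # e.g. "in-product report", "red-team log", "support ticket"
    severity: Severity
    reported_at: datetime

    @property
    def respond_by(self) -> datetime:
        return self.reported_at + RESPONSE_TIME[self.severity]

    @property
    def escalate_to_leadership(self) -> bool:
        # Step 3: significant incidents are reported to leadership as standard protocol.
        return self.severity >= Severity.HIGH


def triage_queue(incidents: list[Incident]) -> list[Incident]:
    """Order open incidents by response deadline so the most urgent are handled first."""
    return sorted(incidents, key=lambda inc: inc.respond_by)
```

For example, a user challenging an automated credit decision through an in-product report at 9:00 on 1 March would be due a response by 9:00 on 2 March under the HIGH tier and would be escalated to leadership. A low-severity typo report would not.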

Case Study — Apple Card (a cautionary tale)

The importance of accountability and recourse is illustrated by what happens when they are absent. The Apple Card controversy shows what can go wrong without the practices in Play 10. After public allegations that women were receiving significantly lower credit limits than men in similar positions, users reported difficulty obtaining a clear rationale.^[65] There was no structured appeals interface within the product experience that allowed users to formally challenge outcomes. Recourse relied on traditional customer service channels, which are not well suited to algorithmic disputes. Following a high-profile complaint on social media,^[66] the New York Department of Financial Services began an investigation.^[67] The absence of clearly communicated escalation pathways and of visible human review and override options contributed to significant damage.

Tools & Resources

AI Incident Reporting Database. The AI Incident Database catalogs publicly known incidents of harm caused by AI systems. It offers both a model for documenting and analyzing AI failures and harms and a resource for raising awareness of the kinds of harm that are possible.
Ombudsperson or independent oversight roles. Some organizations establish internal accountability roles modeled on the corporate ombuds programs promoted by the International Ombuds Association.

Call to Action

Maya, the product lead we met at the start of this playbook, faced the crisis head-on. After the client's AI system gave a confident but wrong answer with no explanation and no recourse, she and her organization took action. They established transparency as a leadership principle, built internal governance structures, set clear disclosure standards, documented system limitations, built user-facing explanations into their products, and created pathways for clients to raise concerns. The result was renewed client trust, greater confidence in their systems, and a stronger foundation for growth.

Meaningful transparency is not a single feature or checkbox. It is a combination of leadership commitment, organizational infrastructure, documentation practices, evaluation processes, and user-facing design. Each layer reinforces the others.

Transparency is not simply a matter of compliance; it is central to trust, and the opportunity extends far beyond any single organization. The choices that business leaders make today will help shape the norms and inform the standards that define trustworthy AI within and across industries. As standards continue to evolve, those who treat transparency as a strategic investment and act now will get ahead and stay ahead.

Generative AI disclaimer: Generative AI tools were used for editing and refining text, as well as to gather ideas for case studies and tools in the plays section. The HTML playbook was built with assistance from Claude Code. All content was reviewed, edited, and approved by the research team.

Acknowledgements

The playbook received invaluable feedback in prototyping sessions with practitioners and leaders including: Anastasia Barnett (Credo.ai), William Bartholomew (Microsoft), Luca Belli (Spring Health), Christina Clark (Adobe), Tina Huang (Microsoft), Julie Jin (Google), Alayna Kennedy (Mastercard), Dan Leininger (Consumer Reports), Niveditha Obla (Microsoft), Alex Vasiloff (Google), and Dominique Wimmer (New York City Office of Technology & Innovation). The playbook was developed with funding support from Google. Some of the academic research that informed the playbook was also supported by Microsoft.

References

[1] BBC. (2019). "Apple's 'sexist' credit card investigated by US regulator." BBC News. https://www.bbc.com/news/business-50365609
[2] New York State Department of Financial Services. (2021, March). Report on Apple Card Investigation. https://www.dfs.ny.gov/system/files/documents/2021/03/rpt_202103_apple_card_investigation.pdf
[3] Jobin, A., Ienca, M., & Vayena, E. (2019). "Artificial Intelligence: the global landscape of ethics guidelines." Nature Machine Intelligence. https://www.nature.com/articles/s42256-019-0088-2
[4] Schmidt, P., Biessmann, F., & Teubner, T. (2020). Transparency and trust in artificial intelligence systems. Amazon Science. https://www.amazon.science/publications/transparency-and-trust-in-artificial-intelligence-systems
[5] Felzmann, H., Fosch-Villaronga, E., Lutz, C., & Tamò-Larrieux, A. (2020). Towards transparency by design for artificial intelligence. Science and Engineering Ethics, 26(6), 3333–3361. https://doi.org/10.1007/s11948-020-00276-4
[6] Felzmann et al., 2020
[7] Weller, A. (2019). Transparency: Motivations and challenges. In W. Samek, G. Montavon, A. Vedaldi, L. K. Hansen, & K.-R. Müller (Eds.), Explainable AI: Interpreting, explaining and visualizing deep learning (pp. 23–40). Springer. https://doi.org/10.1007/978-3-030-28954-6_2
[8] Wan et al. (2025). The 2025 Foundation Model Transparency Index. Stanford University Center for Research on Foundation Models. https://crfm.stanford.edu/fmti/December-2025/paper.pdf
[9] Anthropic. (2024). "Mapping the Mind of a Large Language Model." https://www.anthropic.com/research/mapping-mind-language-model
[10] See for example: Zhao, H., Chen, H., Yang, F., Liu, N., Deng, H., Cai, H., Wang, S., Yin, D., & Du, M. (2024). Explainability for Large Language Models: A Survey. ACM Trans. Intell. Syst. Technol., 15(2). https://doi.org/10.1145/3639372
[11] Marshall, C. (2025, August 7). "What is AI transparency? A comprehensive guide." Zendesk Blog. https://www.zendesk.com/in/blog/ai-transparency/
[12] Park, K., & Yoon, H. Y. (2025). AI algorithm transparency, pipelines for trust not prisms: mitigating general negative attitudes and enhancing trust toward AI. Humanities and Social Sciences Communications, 12(1), 1160. https://doi.org/10.1057/s41599-025-05116-z
[13] PwC. (2024, March 12). "PwC's 2024 Trust Survey: 8 key findings." https://www.pwc.com/us/en/library/trust-in-business-survey.html
[14] Friede, G., Busch, T., & Bassen, A. (2015). ESG and financial performance: aggregated evidence from more than 2000 empirical studies. Journal of Sustainable Finance & Investment, 5(4), 210–233. https://doi.org/10.1080/20430795.2015.1118917
[15] Canadian Marketing Association. (2025, June). The AI transparency advantage: Building trust through disclosure.
[16] Documentation levels were informed by a range of research. See: Haque, M. R., Saxena, D., Weathington, K., Chudzik, J., & Guha, S. (2024). "Are we asking the right questions?: Designing for community stakeholders' interactions with AI in policing." In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (pp. 1–20). Cui, Z., Demirer, M., Jaffe, S., Musolff, L., Peng, S., & Salz, T. (2025, February 10). The Effects of Generative AI on High-Skilled Work: Evidence from Three Field Experiments with Software Developers. SSRN. https://doi.org/10.2139/ssrn.4945566. Corti, L., Oltmans, R., Jung, J., Balayn, A., Wijsenbeek, M., & Yang, J. (2024). "It Is a Moving Process": Understanding the Evolution of Explainability Needs of Clinicians in Pulmonary Medicine. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, Honolulu, HI, USA. https://doi.org/10.1145/3613904.3642551
[17] Kung, F. (2025). Workers Participating in Transparency. Partnership on AI. https://partnershiponai.org/resource/workers-participating-in-transparency/
[18] OECD Hiroshima AI Process Reporting Framework. https://transparency.oecd.ai/reports
[19] European Commission. (2025). Code of Practice on marking and labelling of AI generated content. https://digital-strategy.ec.europa.eu/en/policies/code-practice-ai-generated-content
[20] Felzmann et al., 2020
[21] von Eschenbach, W. (2021). Transparency and the Black Box Problem. Philosophy & Technology, 34: 1607–1622. https://doi.org/10.1007/s13347-021-00477-0
[22] Saeed, W., & Omlin, C. (2023). Explainable AI (XAI): A systematic meta-survey of current challenges and future opportunities. Knowledge-Based Systems, 263, 110273. https://doi.org/10.1016/j.knosys.2023.110273
[23] Saeed & Omlin, 2023
[24] Saeed & Omlin, 2023
[25] Lemley, M. (2024). How Generative AI Turns Copyright Law Upside Down. Science and Technology Law Review, 25(2). https://doi.org/10.52214/stlr.v25i2.12761
[26] Felzmann et al., 2020
[27] Saeed & Omlin, 2023
[28] Weller, 2019
[29] Kiseleva, A., Kotzinos, D., & De Hert, P. (2022). Transparency of AI in healthcare as a multilayered system of accountabilities. AI and Ethics.
[30] European Commission. (2026, January). "AI Act." https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai
[31] EU AI Act. (2025, July 30). "Overview of Guidelines for GPAI Models." https://artificialintelligenceact.eu/gpai-guidelines-overview/
[32] SB 53. (2025). Artificial intelligence models: large developers. 2025–2026 Leg., Reg. Sess. https://legiscan.com/CA/text/SB53/id/3270002
[33] AB 2013. (2024). Artificial intelligence: training data transparency, 2023–2024 Leg., Reg. Sess. https://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=202320240AB2013
[34] Infocomm Media Development Authority. (2024). Model AI Governance Framework for Generative AI. https://aiverifyfoundation.sg/wp-content/uploads/2024/05/Model-AI-Governance-Framework-for-Generative-AI-May-2024.pdf
[35] Lucaj, L., Loosley, A., Jonsson, H., Gasser, U., & van der Smagt, P. (2025). TechOps: Technical documentation templates for the AI Act. arXiv. https://doi.org/10.48550/arXiv.2508.08804
[36] Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act), Article 50: Transparency obligations for providers and deployers of certain AI systems. Official Journal of the European Union. https://artificialintelligenceact.eu/article/50/
[37] China Law Translate. (2025, March 14). Measures for the labeling of AI-generated and synthetic content. https://www.chinalawtranslate.com/en/ai-labeling/
[38] Kye, E., & Stauss, D. (2025, October 28). California AI Transparency Act Amendments Signed Into Law. Privacy + Cyber + AI (Troutman Pepper Locke). https://www.troutmanprivacy.com/2025/10/california-ai-transparency-act-amendments-signed-into-law/
[39] Ministry of Electronics and Information Technology of India. (2025). India AI Governance Guidelines. https://static.pib.gov.in/WriteReadData/specificdocs/documents/2025/nov/doc2025115685601.pdf
[40] SB26-189. Automated Decision-Making Technology. Colorado General Assembly. https://leg.colorado.gov/bills/sb26-189
[41] Department for Science, Innovation and Technology. (2023). A pro-innovation approach to AI regulation (Command Paper 815). UK Government. https://www.gov.uk/government/publications/ai-regulation-a-pro-innovation-approach
[42] Organisation for Economic Co-operation and Development. (2024). AI principles. https://www.oecd.org/en/topics/sub-issues/ai-principles.html
[43] Walmsley, J. (2020). Artificial intelligence and the value of transparency. AI & Society, 35(4), 961–972. https://doi.org/10.1007/s00146-020-01066-z
[44] Park & Yoon, 2025
[45] Microsoft. (2025, November 8). Transparency note for Azure OpenAI. Microsoft Learn. https://learn.microsoft.com/en-us/azure/ai-foundry/responsible-ai/openai/transparency-note
[46] Reuel, A., Connolly, P., Meimandi, K. J., Tewari, S., Wiatrak, J., Venkatesh, D., & Kochenderfer, M. (2025, June). Responsible AI in the global context: Maturity model and survey. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (pp. 2505–2541). https://dl.acm.org/doi/full/10.1145/3715275.3732165
[47] Microsoft. (2022, June). Microsoft Responsible AI Standard, v2: General Requirements. https://cdn-dynmedia-1.microsoft.com/is/content/microsoftcorp/microsoft/final/en-us/microsoft-brand/documents/Microsoft-Responsible-AI-Standard-General-Requirements.pdf
[48] Lunney, J., & Lueder, S. Postmortem culture: Learning from failure. Google SRE. https://sre.google/sre-book/postmortem-culture/
[49] Croak, M., & Gennai, J. (2022, July 6). An update on our work in responsible innovation. Google. https://blog.google/innovation-and-ai/products/an-update-on-our-work-in-responsible-innovation/
[50] Reuel et al., 2025
[51] Anthropic. (2026, February 20). Transparency Hub. https://www.anthropic.com/transparency
[52] Lemley, 2024
[53] In 2025, Disney and Universal sued the AI company Midjourney over the use of copyrighted characters. Espiner, T., & Jamali, L. (2025). "Disney and Universal sue AI firm Midjourney over images." BBC. https://www.bbc.com/news/articles/cg5vjqdm1ypo
[54] Martineau, K. (2026, February 17). IBM Granite has earned a reputation for transparency. IBM Research. https://research.ibm.com/blog/granite-ethical-ai
[55] IBM. Trust and transparency. https://www.ibm.com/policy/trust-transparency
[56] National Institute of Standards and Technology. (2023, January). Artificial Intelligence Risk Management Framework (AI RMF 1.0) (NIST AI 100-1). https://doi.org/10.6028/NIST.AI.100-1
[57] Meta. (2024, July 23). Expanding our open source large language models responsibly. https://ai.meta.com/blog/meta-llama-3-1-ai-responsibility/
[58] Sanneman, L., Tucker, M., & Shah, J. A. (2024). An Information Bottleneck Characterization of the Understanding-Workload Tradeoff in Human-Centered Explainable AI. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (pp. 2175-2198). Association for Computing Machinery, Rio de Janeiro, Brazil.
[59] Sullivan, V., & Weger, K. (2025). Transparency and Explainability in AI-Assisted Decision Making: Effects on Trust, Perceived Reliability, Confidence, and Ease of Understanding. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 69(1).
[60] Felzmann et al., 2020
[61] Adobe Research. (2025). "Restoring Trust and Empowering Artists with Content Credentials." https://research.adobe.com/news/empowering-artists-with-content-credentials/
[62] Nissenbaum, H. (1996). Accountability in a computerized society. Science and Engineering Ethics, 2(1), 25–42.
[63] Raji, I. D., Smart, A., White, R. N., Mitchell, M., Gebru, T., Hutchinson, B., Smith-Loud, J., Theron, D., & Barnes, P. (2020). Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 33-44). Association for Computing Machinery, Barcelona, Spain. https://doi.org/10.1145/3351095.3372873
[64] Papagiannidis, E., Mikalef, P., & Conboy, K. (2025). Responsible artificial intelligence governance: A review and research framework. The Journal of Strategic Information Systems, 34(2), 101885. Elsevier. https://doi.org/10.1016/j.jsis.2024.101885
[65] BBC. (2019, November 11). Apple's "sexist" credit card investigated by US Regulator. https://www.bbc.com/news/business-50365609
[66] Reuters. (2019, November 10). Apple card issuer investigated after claims of sexist credit checks. The Guardian. https://www.theguardian.com/technology/2019/nov/10/apple-card-issuer-investigated-after-claims-of-sexist-credit-checks
[67] New York State Department of Financial Services. (2021, March). Report on Apple Card Investigation. https://www.dfs.ny.gov/system/files/documents/2021/03/rpt_202103_apple_card_investigation.pdf
