Scientific Artifacts
Risk | Risk Definition | Risk Mitigation Management Practices |
---|---|---|
Business Problem Mismatch | A mismatch between the business problem to be addressed and the model design could direct the AI roadmap in the wrong direction, leading to an obsolete system. There are also false beliefs about AI systems that can lead to confusion about output. One such example is the fallacy that a computer's narrow understanding implies broader understanding; this is one of the biggest sources of confusion and exaggerated claims about AI's progress. Risk can arise when the goal as defined by the algorithm is not clearly aligned to the real-world business problem statement. | • Ensure business purpose, governance, and stakeholder engagement are properly identified and aligned • Develop detailed understanding of the underlying business need in the context of the situation at hand • Gather insights from the use case owner as well as from domain experts to determine if application of AI is not only feasible, but also the most effective way to address the problem • Document findings, create a business case, and have it vetted by an oversight committee |
Not Precisely Defined Model Universe | Lack of detailed model universe definition could mislead the analysis from the outset. AI systems rely on training data to learn and become stronger at making predictions aligned with business problems. When the target universe is not represented by the training population, unintended consequences, such as bias or lack of fairness, can arise. | • Precisely define the model universe to ensure the data that will be used to develop the model can be appropriately sourced, aggregated, and organized to include representation of all communities that may be impacted • Verify that all of the members of the target audience are correctly accounted for in model design to avoid benefiting, excluding, or harming certain groups • Define a clear and documented statement of the fairness principle as it relates to the underlying business application and impact on stakeholders • Evaluate fairness of the design by a diverse team of technical and business people that understand the models and their impact and usage; include representation of target audience in the review |
Not Fully Understood Intended Model Use and Impact | Unrecognized intended model use and impact can lead to a disconnect in the informed decision making process as the model result may not relate to the decisions being made. This risk is fundamental, and as machine learning use increases, it is quickly becoming one of the biggest challenges for data-driven organizations, data scientists, and legal personnel around the world. The challenge is related to the basic ability to assert a causal connection between inputs to models and how that input data impacts model output. | • Develop deep understanding of the context of intended model usage, including who will be impacted by the system (e.g. groups with different vulnerabilities) to help prevent potential model outputs that perpetuate unfairness • Develop a plan for continuous assessments of the decisions tied to model outputs and their use across the entire model lifecycle • Consider the impact of the entire system in which the AI model will be integrated, including the intended model use |
Missing Business Drivers | Missing drivers may lead model development towards an inappropriate end. Without proper alignment to established business drivers, the model might not be fit for its intended purpose. | • Identify and align business drivers as the project is being framed at the beginning of the model lifecycle • Standardize rating heuristics to enable evaluation of use cases against considerations relating to business value and implementation effort • Collaborate with business and IT to understand how users’ roles will change as a result of implementing the use case as part of the overall production costs of the use case • Define success for the use case in the form of the “metric to beat” to determine if the level of effort vis-à-vis results is a worthwhile investment • Establish initial data points for a production checklist to understand the costs of taking the model to production and maintaining the model • Establish a feedback loop to the methodology from lessons learned during modeling, business adoption, and experience in use to help keep the assessment in sync with improvements in the maturity of the organization and to better support future use cases |
Misalignment with Business Strategy | Misalignment with business strategy may result in an inappropriate definition of data sources and delivery of incorrect information to the business community, thus impacting transparent reflection of business processes and their outcomes. | • Engage business sponsors in identification of opportunities using AI solutions to improve business performance • Develop detailed understanding of business processes and their outcomes • Maintain open communication among the product manager/project manager and the development team to ensure common understanding of mission statements • Evaluate the alignment to business drivers by selecting strategic measures against business objectives • Make a conscious effort to course correct the model direction towards the appropriate business strategy whenever misalignment is identified |
Missing Business Requirements | Without a holistic and detailed review of the use case and how the model will be operationalized and consumed, critical requirements might be missed, potentially preventing moving beyond a successful POC to pilot or to production. | • Recognize that AI systems are unique in that they may exhibit behavior that is outside of the documented requirements • Develop a standardized list of key risks and considerations across data, technology, process, people, and organizational dimensions to review at the outset of the model development lifecycle • Understand existing capability gaps along the POC, pilot, and operationalization spectrum that need to be addressed before project initiation (e.g. capabilities to augment/acquire, processes to mature) |
Misunderstood Risk Rating Requirements | Unlike traditional statistical models, AI models are based on pattern recognition of similarities, probability theory, and engineering. They are dynamic, non-deterministic, and continuously evolving. If risk rating requirements are mismatched, AI risks inherent in the system will be allowed to remain unmitigated, potentially leading to a risk exposure. | • Adopt a consistent enterprise-wide AI model definition to identify AI risks • Embed the standard within innovation programs, new product/business approval processes, third-party sourcing, information technology (IT) software implementation and updates, and other relevant programs across the organization • Create AI model inventory with model attributes as metadata (e.g. source code, data inputs and labels, features, explainability, retraining frequency) and maintain accuracy and completeness by providing specific criteria based on the attributes • Develop a risk assessment framework to determine for each model the inherent risk based on the particular model's complexity, materiality, and degree of reliance • Adjust the scope and rigor of design review to the risks posed by the model |
Missing Compliance Governance | Missing compliance governance can lead to aspects of the overall project being out of compliance which will undermine the integrity of the entire effort. | • Ensure alignment with enterprise governance, risk management, and control (GRC) • Ensure alignment with model risk management (MRM) • Embed a robust set of compliance checks throughout the entire model lifecycle that are consistently reviewed • Promote awareness of how the AI model works to minimize information asymmetries between the development teams, users, and target audiences |
Missing Ethics Governance | Missing ethics governance can cause issues around misalignment with organizational values, brand identity, or social responsibility. Without proper guidelines in the form of policies surrounding the development of AI, stakeholders in the AI solution cannot expect ethical development and deployment of the AI solution. | • Develop AI Ethics Governance methods in line with the company’s policies and mission, including: definitions of values, ethical principles, and ethics code of conduct, as well as AI stakeholders and AI ethics risk • Establish governance structure over ethical AI - Form AI ethics review board or a similar mechanism to discuss the overall accountability and ethics practices, including potential gray areas - Develop/mature compliance program that addresses ethics, including ethical guidelines for AI product deployment - Develop external guidance or third-party auditing processes to oversee ethical concerns and accountability measures - Conduct AI ethics assessment and identify opportunities for improving organizational alignment with ethical AI - Launch programs to mature the level of organizational AI ethics risk awareness • Set up an AI ethics council to test the algorithm for potential biases, to decide whether it requires modifications prior to deployment, and to create transparency on how the algorithm works. The council should be diverse, independent, knowledgeable, and able to act • Adopt Ethically Aligned AI Design (EAAID) methodology to address ethical issues at the design stage when the solution/technology can still be adjusted |
Missing Reliance Related Governance | Missing trustworthy or responsible AI governance may result in unintentional and significant damage, not only to the mission and brand reputation, but also to employees, citizens (consumers), and society as a whole. | • Develop Trustworthy AI Governance methods in line with enterprise governance, risk management, and control (see Governance, Risk Management, and Control (GRC)) • Establish governance structure over trustworthy AI to align with Executive Order principles (see Trustworthy AI Principles) - Form Trustworthy AI review board or a similar mechanism to discuss the overall accountability and trustworthy AI practices, including potential gray areas - Develop/mature compliance program to address trusted principles - Develop external guidance or third-party auditing processes to oversee trustworthy AI concerns and accountability measures - Launch programs to mature the level of organizational AI trust risk awareness • Conduct trusted AI assessment and identify opportunities for improving alignment with trustworthy AI principles |
Missing Security Requirements | Missing security requirements in model design could open the environment up to adversarial attacks and cause drastic issues within the ecosystem. Adversarial players could attempt to make the AI model behave in a way that differs from the model's intended purpose by corrupting data, algorithms, platforms, and/or the downstream systems acting on the model's output. | • Mature the existing security program by incorporating AI security considerations to: - comprehensively address possible approaches to compromising the system (e.g. taking control of the system itself, feeding the system incorrect data to influence an action or decision that is not in line with the system's correct functioning) - consider the dynamic and integrated risks associated with AI • Ensure model risk management (MRM) policies explicitly reference how information security applies, where appropriate |
Unknown System Limitations and Boundary Conditions | Understanding the limitations of AI is critical to attaining value from AI implementation. AI is a rapidly and continuously evolving set of technologies, and the underlying science behind AI is still in its infancy. For example, solutions to the problem of transfer learning are still being developed, and limitations may reveal themselves as the tools and techniques evolve further. | • Develop processes for testing the AI system's limitations and boundary conditions as the project is being framed • Define procedures for continuously monitoring the limitations and boundary conditions of AI systems throughout the entire model lifecycle • Ensure feedback and socialization amongst stakeholders includes conversation on the AI system's limitations and boundary conditions |
Lack of Diversified Development Team | Incomplete/insufficient representation on the development team, including representation from the user community and from the target audience, could lead to missing valuable inputs and perspectives throughout model development and thus an erosion of the integrity of the AI solution. A typical way in which lack of diversity in the development team may lead to bias is by failing to recognize that a design choice is linked to cultural customs. | • Form multi-disciplinary development teams consisting of data engineers, data scientists, software engineers, business analysts, those responsible for scaling/production, users, and representatives of the target population • Create an environment for close collaboration among the development team to ensure that the right data is being used and that the AI technology is being correctly utilized in alignment with ethics and trustworthy AI principles • Generate and maintain the risk registry based on prior similar projects with similar risks |
Missing Data and Pipeline Verification | Missing data verification and data pipeline replication (along key parameters such as timeframes, universe, exclusions, eligible population, target variables, source validation) may contribute to a lack of standardization of data sourcing and may introduce a range of potential data risks such as data bias and data drift. | • Balance the need for a comprehensive data collection with organizational ability • Understand the provenance of the data to ensure its relevancy and to use the right data at the right time • Understand the origin of the data to have the ability to track it • Understand the quality of the data so that the system does not fail due to data quality issues • Capture a time stamp along with source • Create Data Provenance Document covering: - Data lineage (i.e. metadata that describes pedigree of the data such as data origin, record of components, changes over time, systems and pipelines it has traveled, content deltas) - Impacting factors (i.e. influencers such as the inputs, entities, systems and processes that affect collected data to be able to reproduce the data) • Ensure that the documentation is able to point to the data at any point of the pipeline to ensure integrity • Review data sources that may contradict each other and resolve discrepancies - Address issues around varying degrees of data reliability - Address different levels of size and ingestion rates • Consistently maintain documentation about the usage of data along the model lifecycle. Once it has been “tagged” with the appropriate metadata, all data must be governed from a usage perspective. Usage of this data must be tracked whether it is being queried, used by a model, or being processed to create additional data. Once additional usage patterns are discovered, additional tags must be added to suggest different use cases, or certain types of usage must be restricted. |
Data Vendor Risks | Vendor risks relate to the pedigree of the data, such as a record of components, inputs, systems, and processes that affect collected data and provide historical context. | • Ensure that the vendor meets firm and model risk management (MRM) vendor management standards, including privacy and information security as appropriate • Establish and govern data acquisition service level agreements (SLAs) with external vendors that include provision of data provenance information (see Missing Data Pipeline Verification Risks for details) • Evaluate how the data is aggregated and monitored by data vendors • Understand the vendor's approach to risk management practices to confirm that the vendor tests for bias and fairness • Develop input validation strategies (for raw data and observed outcomes, if available) to detect invalid data • Build in reliability checks to ensure data trustworthiness • Develop response time tracking against SLAs |
Data Set Selection Risks | Not fit-for-purpose selection of data sets or insufficiently large and comprehensive data sets to be used for training create significant data risk. For many business use cases, creating or obtaining such massive data sets can be difficult. One example is the limited clinical-trial data available to predict healthcare treatment outcomes more accurately. In addition, reliance on historical data may limit AI models' ability to handle market downturns or emergencies (e.g. COVID-19). | • Define standard identifiers and establish consistent definitions across different data types (structured and unstructured) and a variety of sources (internal, vendor, internet, etc.) to enable consistency in measurement of data elements throughout the datasets • Assess if the sourced data are fit-for-purpose and applicable to the use case • Verify minimum data quality requirements are met as AI needs better data, not just more data |
Data Sampling Methodology Risks | Not fit-for-purpose sampling methodology (e.g. simple random sampling, stratified sampling, etc.) can lead to bias due to lack of complete and accurate representation of the target population. In addition, bias can appear when the sampled data itself carries prejudice; this may occur when plenty of data is available, but there is prejudice towards one population versus another. | • Evaluate sampling choices or exclusions to ensure the model will not be trained on a population that is too small or unrepresentative • Ensure sampling methodology generates a balanced (representative and sufficiently large) training data set that is unbiased and fair • Assess the impact of data availability, representativeness, missing data, outliers, unbalanced samples, and the choice of imputation methodology on sampling bias. A key method for detecting sampling bias is to perform a deep conceptual review of the data processing (such as exclusions, vintages, sampling processes, and reject inference) and target variable definition • Use a range of available data sets (often alternative and auxiliary) to address population gaps and their impact • Use reject inference, if applicable, to account for the excluded population |
Selection of Data Sources Risks | AI requires large volumes of data which are not always available internally, thus often necessitating the collection of data from a range of primary and secondary sources with different data definitions and collection practices. Lack of metadata match creates a risk of the target population not being represented fully and accurately. | • Comply with legal and firm policy (e.g. authorized sources, PII/sensitive consumption) to avoid operational risks (e.g. legal, social reputation) and justify the cost (i.e. ROI). Refer to the DOE Privacy Order 206.1 to ensure understanding of Personally Identifiable Information (PII). DOE defines PII as any information collected or maintained by the Department about an individual, including but not limited to: education, financial transactions, medical history and criminal or employment history, and information that can be used to distinguish or trace an individual’s identity, such as his/her name, Social Security number, date and place of birth, mother’s maiden name, biometric data, and including any other personal information that is linked or linkable to a specific individual. Additionally, PII can be quite context specific, especially when it involves a combination of elements that might not alone identify a person • Review and assess existing data capability (e.g. storage, transformation, scalability) and governance (e.g. metadata management, lineage) • Test all incoming data from external/non-authorized sources by profiling it as it enters the data ecosystem and before it is used by the models. The relevant metadata should be extracted and compared against a corpus to determine if the data is “recognized.” Structured data can be run through an ML model to spot anomalies and potential defects and to suggest remediation methods before the data is consumed by the model. Once recognized, the data is “tagged” with appropriate metadata and is discoverable |
Training and Testing Data Set Risks | Not fit-for-purpose selection of training and testing data sets can throw off results downstream and negatively impact any decisions being made based on the analysis results. It is possible to introduce bias during the data preparation stage. Its impact on accuracy is easy to measure, but its impact on the model's bias is not easy to measure. | • Define and include in model risk management (MRM) training and testing data set standards and requirements • Verify that the data used for modeling is of the same type, availability, and quality that will be used in production • Create a plan to preprocess the data and create training and testing data sets by assuring that they are clean, free of bias, and in the right format |
Data Quality Risks | Lack of standardized processes for detection and remediation of anomalies, inconsistencies, missing values, and outliers can cause data quality risks. Artificial intelligence systems are only as useful as the data used to train them. Because of conscious and unconscious bias in society, data sets tend to be skewed towards polarized opinions, while the middle ground tends to be muddled. | • Establish a set of rules and standards on data quality, completeness, and timeliness for AI models, with considerations for data privacy, protection, and ownership. The goal of the framework is to identify the risks and controls associated with using data in ways that violate access and usage permissions articulated in policy • Comprehensively assess data quality issues around formatting, lacking metadata, or being "dirty" (i.e. incomplete, incorrect, or inconsistent) (an illustrative data-quality screening sketch follows this table) • Discuss data quality issues as they are discovered and execute remediation steps in alignment with model risk management (MRM) • Consider enhancing existing data remediation processes to address the high volume of structured and unstructured data that AI models typically ingest. The enhancements could entail centralization of data and feature repositories to source, host, manage and govern data across AI models to facilitate standardized remediation techniques (such as missing value fillers) • Apply a range of perspectives to evaluate and question results before moving forward with data processing |
Lack of Data Representativeness | Lack of data representativeness occurs when elements are given lower/higher chances of being selected into the sample (or zero probability of selection), when the sample size is insufficient, when data are missing, or when data are left-censored. When this is not recognized during analysis, positive/negative non-coverage bias may be induced, impacting the quality of the model. | • Test collected data for correct representation of the desired population (an illustrative representativeness check follows this table) • Conduct sensitivity analysis across a range of sampling designs (in the absence of information on the sampling design) • Address selection bias (e.g. use weighted generalized linear mixed models and generalized linear mixed models combining both conjugate and normal random effects) |
Imbalanced Data Set | Imbalanced data refers to a situation with classification problems where the classes are not represented equally. If not enough data is present to represent a population, bias will occur; even when there is enough data overall, one population may be over-represented, which can lead to skewed predictions as well. | • Examine information value of collected data sets and determine their potential impact on the shape of the final model (especially for data close to the decision boundary) • Balance classes in the data (via data preprocessing) before model development (an illustrative resampling sketch follows this table) |
Missing Data Bias Checks | Missing/incomplete data bias checks can lead to an inaccurate prediction based on a data set that does not fully represent the population. Data bias checks are critical for ensuring that the model is not biased in its predictions. | • Define and include in model risk management (MRM) data bias checks, including: sample selection bias, statistical bias, survivorship bias, seasonality bias, and omitted variable bias • Identify and question existing preconceptions in business processes and actively search for how those biases might manifest themselves in data (for example, bring in outside experts to independently challenge past and current practices) • Evaluate sampling bias and fairness across the model’s lifecycle, keeping in mind that it can arise in the design and construction of the model (e.g. the objective function definition and related transformation logic), as well as the input data, feature engineering, and execution actions as defined in MRM (for example, block biases by eliminating problematic data or removing specific components of the input data set) • Enhance governance techniques and tools to help preserve fairness (for example, hold discussions that examine different definitions of fairness, define and execute processes that hold the organization accountable to fairer outcomes) |
Data Labeling Issues | Labeling training data is necessary for supervised learning and often must be done manually. As such, it often reflects perceptions of humans tasked with labeling. | • Assess controls for the labeling process taking into account potential inconsistencies as a result of cultural backgrounds and other factors before using such data for training (see Trustworthy AI Principles and Missing AI Ethics Governance Risks) (For example, consider leveraging psychological frameworks of moral foundation (e.g. harm/care, fairness/reciprocity, loyalty, authority and purity) [Clifford et al., 2015] as bases for developing a generalizable representation of ethical dilemmas for machine learning-based approaches) • Sample test the integrity of training set labels • Consider using emerging new techniques such as reinforcement learning and in-stream supervision, in which data can be labeled in the course of natural usage, for classes with a limited number of training samples |
Data Processing Risks | Data engineering creates the data framework for modeling and includes the integration, aggregation, and transformation of collected data. Feature engineering, a key component of data transformation, involves feature standardization and feature generation, the process of creating attributes to use in a predictive model. This breadth of activities and their complexity opens data engineering to many process-oriented risks. | • Define and include in model risk management (MRM) data processing standards and requirements embedded throughout the entire model lifecycle • Extract features in a comprehensive and meaningful way (recognize that this step is a combination of art and science and requires hypothesizing and testing to extract different signals from the data) • Assess the impact of data availability, representativeness, missing data, outliers, unbalanced samples, and the choice of imputation methodology on feature quality • Justify and document all data processing steps to ensure the process can be replicated in production • Consider data augmentation to gain access to additional features (if needed) • Ensure processed data is fit-for-purpose and aligned with real-world applications • Ensure there is a recovery plan to mitigate issues |
Value Imputation Risks | Data wrangling is a preprocessing step to make data available in the right form for analysis and use in AI systems. Smaller organizations may not be able to scale the data wrangling process, whereas large organizations typically have the resources to collect and process data correctly before use in AI systems. A lack of fit-for-purpose value imputation and missing data normalization are part of this risk and can create issues downstream in model output and decisions based on model output. | • Define and include in model risk management (MRM) value imputation standards and requirements • Ensure you have enough information to use imputed observations • Select method(s) most appropriate for the model based on the principle of replacement (an illustrative imputation sketch follows this table) |
Missing Data Normalization | Missing data normalization can lead to data redundancies and a decrease in data integrity. A drop in data integrity can possibly undermine the quality of the entire model. | • Define and include in model risk management (MRM) data normalization standards and requirements • Select method(s) promoting high normalization with cohesion and loose coupling between classes resulting in similar solutions (e.g. closer conceptually to object-oriented schemas and object-oriented goals) |
Missing Regulatory Compliance Checks | Missing/incomplete regulatory checks will cause large fines and lawsuits to be assessed against organizations. Regulatory compliance implies the need for data protection. The scope of regulatory compliance covers more than just protecting data, and the scope of data protection extends beyond regulatory compliance. | • Define and include in model risk management (MRM) a robust set of regulatory compliance checks embedded throughout the entire model lifecycle • Establish enhanced controls around data access, ownership, collection, storage, transmission, and rights of data subjects to satisfy regulatory requirements (for example, if a piece of data needs to go through due diligence and compliance channels, it must be tagged appropriately as such, and if it is not, its usage should be restricted. This is easier for batch processes where there is opportunity to establish controls. However, for real-time or streaming data, there must be a capability to tag the data as gold/silver/bronze depending on whether it has gone through the necessary controls before being deemed usable by models) • Analyze results and execute actions as defined in MRM • Enhance governance techniques and tools to help preserve regulatory compliance |
Missing Data Ethics Checks | The principles underpinning professional ethics include honesty, integrity, transparency, accountability, confidentiality, objectivity, and acting lawfully. There are many ethical challenges when using data for AI systems. Some of them include the boundary between public and private good, privacy and confidentiality, transparency, equity of access, and informed use of information. | • Define and include in AI Ethics Governance data ethics checks (see Missing AI Ethics Governance Risks) • Establish enhanced controls around data access, ownership, collection, storage, transmission and rights of data subjects to satisfy ethical requirements, especially for areas where there is a need to pay particular attention above and beyond best practices (for example, develop a custom checklist for data ethics which is relevant and robust. It should be designed to provoke conversations around the key issues where the development team members have particular responsibility and perspective. The checklist should help ensure that even with pressing deadlines, discussions happen to avoid any items that strictly fall into the realm of statistical best practices) • Ensure the created data frame is inspected by an AI ethics council to make a conscious and transparent decision regarding the use of the data for the use case at hand • Analyze findings and execute actions to address identified ethical concerns (as applicable) |
Lack of Standardized Data Workflows | Lack of standardized workflows in end-to-end data management can pose risks. The data preparation phase of the model lifecycle is critical to creating an effective model with useful outputs. Without standardized workflows, there is a risk that information can be lost along the way, and the overall quality of the model may suffer. | • Define and include in model risk management (MRM) data processing standards and requirements • Embed supervisory expectations throughout the model lifecycle to better anticipate risks • Ensure standardized handoffs to eliminate potential disconnect between the results, the data engineer, and the data scientist • Maintain a holistic perspective of all the elements of the model lifecycle — from data sourcing and preprocessing, model design and construction, to implementation, performance assessment, and ongoing monitoring — with controls embedded throughout. Design controls to foster integrity, traceability, and reproducibility of results |
Hand-off Asymmetry Risks | Asynchrony and asymmetry around data workflow hand-offs can be caused when some members of the development team have more information than others on the team. This information asymmetry can result in moral hazards and adverse selection. | • Create open communication around capabilities surrounding collecting, processing, and organizing data to mitigate data quality risk exposure • Understand the capabilities that each team member possesses and any possible misalignment of communication in hand-offs through the workflow of the model lifecycle • Ensure hand-offs are as robust and complete as possible, without any missed communications around the data being passed from one group to another to mitigate asymmetry risks and ensure a successful project/product |
Data Privacy Risks | As artificial intelligence evolves, there is an ever-increasing possibility of the analysis of personal information intruding on personal privacy. For example, working with a range of data sources may result in inadvertently using or revealing sensitive information hidden among anonymized data. | • Refer to the DOE Privacy Order 206.1 to ensure understanding of Personally Identifiable Information (PII). DOE defines PII as any information collected or maintained by the Department about an individual, including but not limited to: education, financial transactions, medical history, criminal or employment history, and information that can be used to distinguish or trace an individual’s identity, such as his/her name, Social Security number, date and place of birth, mother’s maiden name, biometric data, and including any other personal information that is linked or linkable to a specific individual. Additionally, PII can be quite context specific, especially when it involves a combination of elements that might not alone identify a person. • Define and include in model risk management (MRM) a robust set of privacy checks that protect privacy interests in the context of AI. Data protection and privacy controls should be linked to enterprise access and authentication platforms to govern access. Data must be separately encrypted if needed, and also anonymized to prevent it from being identified to specific individuals or entities. (Note: this requires maturing the paradigm of privacy regulation. For example, masking PII data before using it in models is a foundational but insufficient practice as AI magnifies the ability to use personal information in ways that can intrude on privacy interests by raising analysis of personal information to new levels of power and speed) • Enhance governance techniques and tools to help preserve the autonomy and rights of individuals to control their personal data (for example, the model can produce data that could be considered private, and this new data must be tagged and assigned data protection and privacy controls) • Consider using synthetic data (which is fully anonymous and thus exempt from General Data Protection Regulation (GDPR), California Consumer Privacy Act (CCPA), and other data protection regulations) (for example, by using generative adversarial networks (GANs)) • Conduct privacy impact assessments and deploy privacy management tools to evaluate and mitigate privacy risks in advance as required by GDPR and CCPA, as applicable for novel technology and high-risk uses of data • Consider Microsoft’s SmartNoise, available in both open source as well as integrated within Azure Machine Learning. SmartNoise is a toolkit jointly developed by Microsoft and Harvard as part of the Open Differential Privacy initiative. With the new release of SmartNoise, differential privacy can be used to protect not only individuals’ data but also the full dataset using the new synthetic data capability |
Model Assumptions and Limitations Risks | Incorrect/incomplete/inconsistent model assumptions and unrecognized limitations pose a risk that the model will not fit the situation at hand and can lead to degradation of quality. Combining several independent models into a nested model may amplify this risk. | • Precisely formulate the assumptions underlying the algorithms used in AI models • Check the consistency of the assumptions across sequentially used algorithms and mitigate any inconsistencies • Test difficult-to-verify or unknown hypotheses required by a specific technique by applying other techniques that operate under different assumptions to ensure comprehensive analysis (for example, some data may lack numerical bounds, or even if such bounds are available, it might be difficult to carry them through the complex model calculations to establish uncertainty bounds on the final result. If the technique can support Monte Carlo analysis, then it can highlight the sensitivity of the model results to changes in any particular input or set of inputs) • Check the underlying assumptions of independent models combined into nested models for consistency (e.g. correlation of outputs) and mitigate conflicts |
Model Selection Risks | Model selection is the process of deciding which techniques are best suited to solve the problem at hand. This process involves experimentation that depends on data. There is a risk that the best model will not be selected as the final model to use due to competing concerns such as complexity, maintainability, and available resources. | • Ensure model selection is informed by the situation at hand and tailored to the specific analytic context, including but not limited to: generation of a strong set of candidate models, consideration of important parameters, and recognition of the limits that available data place on all potential analytical strategies. In cases where a strong set of candidate models cannot be developed, model selection may be less effective and less informative than approaches based on hypothesis testing. Consequently, model selection requires more information about the system of interest, because inferences are contingent on this set of candidate models. The process of developing candidate models is a strength of model selection when done well and a weakness when done poorly. • Review categorization of the model by a diverse team of technical and business resources, as well as representatives of the target audience, that understand the models and their impact and usage • Assess and adjust the approach, as appropriate, to ensure it is fit-for-purpose, explainable, reproducible, and robust • Document the decision and capture supporting evidence, including the techniques used for model selection |
Model Training Algorithm Risks | Not fit-for-purpose model training algorithms can lead to degraded decision making because the model output does not relate to the decision in question. With open-source code and commercial systems using that open-source code, there's the possibility that the code includes vulnerabilities (including malicious code) or vulnerable dependencies. | • Ensure the goal as defined by the algorithm is clearly aligned to the real-world business problem statement • Verify that the business goal accounts for all relevant considerations to avoid unintended consequences such as a lack of fairness • Verify data availability, quality, and representativeness to ensure that the output to the business decision is not overstated. If needed, consider expanding the training data set with more information to counterweight potentially problematic data • Verify that the model's design/construction is not compromised • Ensure compliance with specific requirements for use of public open-source libraries • Consider designing and building a standard development and testing environment that will enhance standardization and techniques for easing model remediation • Note: While the risks of AI models are qualitatively similar to those of traditional models, the reliance on high-dimensional data, dynamic retraining, the opacity of the transformation logic and feature engineering can lead to unexpected results and make risks more difficult to identify and assess • Consider Microsoft’s Fairlearn available in both open source as well as integrated within Azure Machine Learning. Fairlearn is a toolkit for assessing and improving fairness in AI. It empowers data scientists and developers to assess and improve the fairness of their AI systems |
Non-optimal Hyperparameter Specifications | Hyperparameters are values configured by developers to control the learning process and can significantly impact an AI model's performance. Selection of hyperparameters is more art than science and needs to be carefully tailored to the situation at hand. | • Define and include in model risk management (MRM) hyperparameter specifications standards and requirements • Assess hyperparameter calibration and evaluate how different parameter settings impact the model’s results and the computational feasibility in production (an illustrative tuning sketch follows this table) • Document selections made (e.g. accepting specifications of the employed software package, selecting alternative specific values, or using a tuning strategy to choose them appropriately for the specific dataset at hand) • Ensure hyperparameters are continuously assessed as the model lifecycle progresses to ensure they account for the complexity and materiality of the model |
Overfitting and Underfitting | Overfitting or underfitting a model will greatly limit the algorithm's ability to make accurate predictions. Overfitting occurs when a model corresponds too closely to the development data set; in this case, trying to apply the results to data collected in the future would result in problematic or erroneous outcomes. Underfitting happens when a machine learning model is not complex enough to accurately capture relationships between a dataset’s features and a target variable. An underfitted model also results in problematic or erroneous outcomes on new data, or data that it wasn’t trained on, and often performs poorly even on training data. | • Define and include in model risk management (MRM) procedures and standards for preventing overfitting and underfitting (an illustrative learning-curve check follows this table) • Launch programs to mature development teams' awareness of the potential for overfitting and underfitting as well as the downstream impacts (Note: it is necessary to distinguish between data issues that are endemic to all AI, like the incidence of false positives and negatives, and overfitting and underfitting to patterns) • Conduct continuous assessments of overfitting and underfitting as it relates to the outputs and the decisions/revenue tied to the outputs |
Missing Business Inputs | Missing business inputs may impact model interpretability and limit realization of business value from the model. An understanding of what is more valuable for the business: maximizing performance metrics, or understanding input-output relationships, is key to AI success. | • Ensure there is open collaboration across the development team and quality feedback mechanisms in place to collect and appropriately utilize all relevant business inputs |
Encoding Features and Transformation Risks | Feature selection is the process of reducing the number of input variables when developing a predictive model. It is desirable to reduce the number of input variables to reduce the computational cost of modeling and in some cases to improve the performance of the model. Issues around encoding features/variable transformation can lead to difficulty in interpretability, longer than expected training times, and issues related to the curse of dimensionality. | • Select features in a precise and robust way to optimize risk mitigation (see Dimensionality Reduction Risks) • Evaluate fairness of feature selection and transformation logic • Justify and document feature selection to ensure it fits the situation at hand (NOTE: Statistical-based feature selection methods involve evaluating the relationship between each input variable and the target variable using statistics and selecting those input variables that have the strongest relationship with the target variable. These methods can be fast and effective, although the choice of statistical measures depends on the data type of both the input and output variables) • Consider centralization of feature repositories to source, host, manage, and govern data across AI models to facilitate standardized remediation techniques |
Dimensionality Reduction Risks | Issues around dimensionality reduction (feature selection, feature extraction) can lead to difficulty in interpretability, longer than expected training times, and issues related to the curse of dimensionality. | • Select the dimensionality reduction methods that best fit the situation at hand (NOTE: in addition to traditional techniques, consider linear discriminant analysis (LDA), neural autoencoders, and t-distributed stochastic neighbor embedding (t-SNE)) • Reduce the dimensionality of the feature space to an appropriate level by obtaining a set of principal features, which helps prevent overfitting and underfitting (see Overfitting and Underfitting Risks), supports better interpretation, and lowers computational cost through model simplification • Justify and document the selected dimensionality reduction approach (an illustrative PCA sketch follows this table) |
Missing Reasonability and Accuracy Checks | Missing checks for reasonability and accuracy of feature selection can cause issues within an ML model. Without these checks, interpretability may suffer, training times may be longer than needed or expected, the 'curse of dimensionality' may become difficult to mitigate, and/or overfitting may become an issue. | • Define and include in model risk management (MRM) model reasonability and accuracy checks • Embed reasonability and accuracy checks within the model lifecycle • Ensure involvement of business and technical stakeholders in mitigation of discovered issues (see Trustworthy AI Principles) • Enhance governance techniques and tools to help preserve modeling integrity |
Missing Cross-Validation Checks | Missing/incomplete cross-validation checks can lead to testing the model using training data instead of a proper testing data set completely distinct from the data set used in training the model, thus inhibiting the ability to flag problems like overfitting or selection bias and to gain valuable insights into how the model will generalize to an independent data set. | • Define and include in model risk management (MRM) cross-validation checks to mitigate overfitting • Embed cross-validation checks within the model lifecycle to assess the sensitivity to key features, model training error, and generalization errors (e.g. precision and recall rate, AUC) (an illustrative cross-validation sketch follows this table) • Enhance governance techniques and tools to help preserve modeling integrity and a more accurate estimate of out-of-sample accuracy |
Missing Model Replication | Reproducibility refers to the ability of an independent verification team to produce the same results using the same AI method based on the documentation made by the organization. Lack of reproducibility can influence the trustworthiness of the AI product and the organization deploying the AI model. | • Define and include in model risk management (MRM) model replication standards and requirements • Embed model replication within the model lifecycle • Ensure involvement of business and technical stakeholders in mitigation of discovered issues (see Trustworthy AI Principles) • Enhance governance techniques and tools to help preserve modeling integrity |
Missing/Incomplete Issue Resolution | Lack of standardized processes for AI modeling, training, and associated issue resolution will degrade the quality of the model. | • Maintain Issue Log with assigned ownership and responsibility for resolution • Ensure involvement of business and technical stakeholders in mitigation of discovered issues (see Trustworthy AI Principles) • Collect feedback from the issue resolution and capture "maintenance requirements" to help improve the model and the overall model lifecycle over time |
Missing Evaluation Metrics | Not fit-for-purpose model assessments may allow model quality to become degraded because there are no checks in place to ensure quality of output and detect potential issues requiring remediation. | • Define and include in model risk management (MRM) evaluation metrics standards and requirements • Embed model assessment checks within the model lifecycle • Determine if the model will do a good job of predicting the target on new and future data (for example, conduct “stress tests” to examine the model's sensitivity under a variety of cases and evaluate the results of internal reverse-engineering to check the accuracy metric as a proxy for predictive accuracy on future data) |
Missing Algorithm Ethics Checks | Missing/incomplete algorithm ethics checks can degrade the integrity of the model, rendering it unusable for making predictions due to ethics violations. If algorithm ethics checks are not in place, many issues can be introduced into the system. Data is subject to many types of biases. Constraints around data, even if they serve a purpose, can cause decisions to be made based on irrelevant information. Algorithmic principles, such as in neural network computing, can be extremely difficult to resolve due to the fact that they are unlike traditional code. Accidental algorithmic bias can be introduced quite easily. | • Define and include in AI Ethics Governance algorithm ethics checks (see Missing AI Ethics Governance Risks) • Establish enhanced controls around algorithm development to satisfy ethical AI requirements (see Ethically Aligned AI Design (EAAID)) by understanding decisions and trade-offs made by developers and how they affected outcomes (for example, AI developers should utilize algorithms that help detect and mitigate hidden biases within training data or learned from the model regardless of data quality) • Consider combining game theory and machine learning into one framework in which game theoretic analysis of ethics is used as a feature to train machine learning approaches, while machine learning helps game theory identify ethical aspects which are overlooked. This approach incorporates ethics requirements into AI solutions and reconciles ethics requirements with the agents’ endogenous subjective preferences in order to make ethically aligned decisions (for example, in [Loreggia et al., 2018], the authors proposed an approach to leverage the CP-net formalism to represent the exogenous ethics priorities and endogenous subjective preferences. The authors further established a notion of distance between CP-nets so as to enable AI agents to make decisions using their subjective preferences if they are close enough to the ethical principles. This approach helps AI agents balance between fulfilling their preferences and following ethical requirements) • Ensure the developed algorithm is inspected by an AI ethics council to make a conscious and transparent decision regarding the use of the model for the use case at hand (for example, having an ethical data frame is not a sufficient practice, as AI magnifies the ability to use data in ways that can intrude on ethical interests by raising analysis of data to new levels of power and speed) • Execute actions to address identified ethical concerns (as applicable) |
Downstream Impact | There are many challenges to the safety and security of AI systems. For example, it is hard to predict all possible AI system behaviors and downstream effects ahead of time, especially when applied to problems that are difficult for humans to solve. It is also hard to build systems that provide both the necessary restrictions for security as well as the necessary flexibility to generate creative solutions or adapt to unusual inputs. | • Verify that the developed model fully complies with defined requirements and fits the situation at hand (e.g. appropriateness of the objective function, sufficient constraining of the exploration space, the model's training reflects the current real world, mitigation of data risks, etc.) • Adversarially test the model to discover and mitigate potential downstream impact issues • Evaluate consequences of deploying the developed model through the lens of the trustworthy and ethical AI principles (see Trustworthy AI Principles and Ethically Aligned AI Design (EAAID)) by analyzing developers' ethical choices within the technology |
Biased Outcomes | Society is increasingly recognizing the harmful impacts of human biases in AI systems. Bias is inadvertently incorporated into algorithms in several ways. AI systems learn to make decisions based on training data, which can include biased human decisions or reflect historical or social inequities, even when sensitive variables such as gender, race, etc. are removed. | • Define and include in model risk management (MRM) biased outcomes checks • Embed biased outcomes checks within the model lifecycle to ensure fairness of model predictions. Lack of fairness can arise from a range of sources, including: data that reflects institutional or societal bias (such as gender or race), sample selection bias, or how the objective function was defined. In consumer applications, the model result can lead to disparate treatment if there is implicit or explicit reference to group membership as a factor in the model, or disparate impact if the outcome of the model on members of different groups varies. • Ensure involvement of business and technical stakeholders in mitigation of discovered issues (see Trustworthy AI Principles) (Note: As a best practice, keep in mind that if humans are involved in decisions, bias always exists — and the smaller and less diverse the group, the greater the chance that the bias is not overridden by others) • Enhance governance techniques and tools to help preserve modeling integrity (for example, consider using comprehensive technical ways of defining fairness, such as requiring that models have equal predictive value across groups or requiring that models have equal false positive and false negative rates across groups) • Consider Microsoft’s Fairlearn, available in both open source as well as integrated within Azure Machine Learning. Fairlearn is a toolkit for assessing and improving fairness in AI. It empowers data scientists and developers to assess and improve the fairness of their AI systems (an illustrative fairness-metric sketch follows this table) |
Black Box AI | "Black Box AI" refers to machine learning algorithms that are opaque or inaccessible to human understanding about how they’ve arrived at the conclusions they produce. The risk of black box decisions is particularly significant in heavily regulated fields such as the financial and health care industries. A key problem with black box algorithms, with regard to fairness, is the difficulty in understanding whether a decision was justified or should be challenged. The Alan Turing Institute defines black boxes as “any AI system whose inner workings and rationale are opaque or inaccessible to human understanding.” | • Define and include in model risk management (MRM) black box checks to ensure compliance with company policies, industry standards, and government regulations • Embed black box testing within the model lifecycle to (1) explain models to executives and stakeholders and help them understand the model's value and accuracy, and (2) debug the models and make informed decisions about how to improve them • Ensure involvement of business and technical stakeholders in mitigation of discovered issues (see Trustworthy AI Principles) • Enhance governance techniques and tools to make models more interpretable and explainable. For example, the Alan Turing Institute advises that the intelligibility and interpretability of an AI model should be prioritized from the outset and that end-to-end transparency and accountability should be optimized above other parameters. In other words, whenever possible, a company or a public institution that aims to automate decisions that will affect humans should use a model that can be interpreted – not a technical black box • Ensure owners of AI systems that qualify as black boxes provide supplementary explanations to shed light on the logic behind the results and behavior of their systems. Frameworks for generating such descriptions include Model Cards for Model Reporting or Datasheets for Datasets, which are often used by companies for internal purposes • Consider Microsoft’s InterpretML, available in both open source as well as integrated within Azure Machine Learning. InterpretML is a Python toolkit for explaining black-box AI systems and training intelligible models. It offers prediction accuracy, model interpretability, and aims at serving as a unified API. Its GitHub receives active updates • Operationalize tools required by legal auditors (as applicable) to validate models with respect to regulatory compliance and monitor how models' decisions are impacting humans • Consider Microsoft’s Error Analysis, available in both open source as well as integrated within Azure Machine Learning. The Error Analysis toolkit uses machine learning to partition model errors along meaningful dimensions to help better understand the patterns in the errors. It enables users to quickly identify subgroups with higher inaccuracy and visually diagnose the root causes behind these errors |
Lack of Model Performance Assessment | A not fit-for-purpose choice of the metric(s) for model performance assessment could impact the quality of validation. Model validation is the set of processes and activities intended to verify that models are performing as expected, in line with their design objectives and intended business uses. Effective validation helps to ensure that models are sound, to identify potential limitations and assumptions, and to assess their possible impact. | • Define and include in model risk management (MRM) model performance assessment standards and requirements as part of validation • Embed model performance checks within the model lifecycle • Select the right validation approach by striking the right balance between human and automated validation (Note: given the lack of a baseline for comparison, as well as more complex AI model development compared to the traditional modeling process, there is a greater emphasis on out-of-sample performance that can be evaluated using cross-validation in conjunction with ensemble learning techniques and stability metrics; a minimal cross-validated tuning sketch appears after this table) • Use the model risk assessment to determine the degree to which validators can rely on the testing performed by developers • Confirm that developers relied on data that is traceable, reliable, and from approved sources • Confirm that the sourcing and any pre-processing of the data were conducted in accordance with approved information security and privacy policies • Confirm that the data used in training and testing the model is of the same type, availability, and quality that will be used in production • Review feature engineering for issues causing missing observations, artificial overlap between the target variable and features (i.e. leakage), or overfitting or underfitting in calibration • Review feature selection through the lens of the business intuition and statistical analysis employed to reduce dimensionality and to support the selection • Identify features that are redundant or weakly correlated with the target variable by using statistical analysis such as the Kolmogorov–Smirnov statistic, information value and clustering analysis, and dimensionality reduction (e.g. principal components) • Assess hyperparameter calibration and evaluate how different parameter settings impact the model’s results and the computational feasibility in production • Evaluate stress testing, sensitivity of convergence, and performance when changes are made to how hyperparameters are set under different environments • Test the robustness of different AI techniques with respect to missing data, alternative normalization techniques, and anomalous or noisy data • Evaluate stakeholder impact, including bias and fairness, consistent with the use case and depending upon the model’s inherent risk and complexity. Where necessary, coordinate evaluations with the other control functions (such as compliance) • Consider whether a formal nondiscrimination criterion is necessary in the objective function and associated transformation logic. The appropriate criterion to select will depend upon how fairness is interpreted in the context of the business decision • Determine whether the number of metrics and the results support a conclusion of whether the model is appropriate for its intended purpose • Assess the computational feasibility of the model in the production environment • Assess the quality, depth, and breadth of the model documentation. This should include all the information about the model (including model type and architecture, the data used to train the models, the results of “stress tests,” examples of local explanations for chosen inputs whenever possible, etc.) that would allow an independent reviewer to replicate the model without having access to the model code • Ensure involvement of business and technical stakeholders in discovery and mitigation of issues (see Trustworthy AI Principles) |
Model Instability | Model instability (data drift, concept drift, etc.) can cause eventual degradation in model accuracy as time passes. As model accuracy degrades, decisions tied to the outputs could suffer, or there could be loss of revenue tied to the outputs. | • Define and include in model risk management (MRM) model stability check standards and requirements for data drift and concept drift • Embed model stability checks within the model lifecycle (a minimal drift-check sketch appears after this table) • Check for changes in the distribution of the data and in the relationship between input and output variables, and mitigate discovered issues (for example, instability of model parameters caused by multicollinearity) |
Lack of Sensitivity and Scenario Analysis | Missing or inadequate sensitivity/scenario analysis can lead to a lack of model robustness, decreased understanding of the relationships between input and output variables in a system or model, errors in the model, redundancy in parts of the model structure, wasted time calibrating parameters that are not sensitive to the model output, and overlooked connections between observations, model inputs, and predictions or forecasts, all of which lead to the development of sub-optimal models. | • Define and include in model risk management (MRM) model sensitivity and scenario testing standards and requirements to develop a better-working model and improve decision-making tied to the model's outputs • Embed model sensitivity and scenario analysis checks within the model lifecycle • Conduct sensitivity analysis and use the developed understanding of the boundary conditions to increase model robustness, decrease errors in the model, increase understanding of the relationship between model inputs and outputs, decrease redundancy in parts of the model structure, and overall improve the model, as applicable |
Model Transparency Risks | Lack of transparency limits explainability of the model, makes it difficult to verify that the model has been tested and makes business sense, and prevents a complete understanding of why the model made particular decisions. | • Define and include in model risk management (MRM) model transparency standards and requirements • Embed better communication and robust feedback mechanisms within the development team to promote transparency in model development • Document the process and capture supporting evidence, including the techniques used for model validation • Consider Microsoft’s InterpretML, available both in open source and integrated within Azure Machine Learning. InterpretML is a Python toolkit for explaining black-box AI systems and training intelligible models. It offers prediction accuracy and model interpretability, and aims to serve as a unified API. Its GitHub repository is actively updated (see the glass-box model sketch following this table) • Consider Microsoft’s Error Analysis, available both in open source and integrated within Azure Machine Learning. The Error Analysis toolkit uses machine learning to partition model errors along meaningful dimensions to help better understand the patterns in the errors. It enables users to quickly identify subgroups with higher inaccuracy and to visually diagnose the root causes behind these errors |
Lack of Understanding of Feature Importance | Feature importance refers to techniques that assign a score to input features based on how useful they are at predicting outcomes. Feature importance scores play an important role in modeling as they provide insight into the data, insight into the model, and the basis for dimensionality reduction and feature selection that can improve the efficiency and effectiveness of the model. | • Define and include in model risk management (MRM) feature importance standards and requirements to promote model explainability • Embed feature importance checks within the model lifecycle • Establish the dynamic calculation and maintenance of feature importance scores throughout the model development lifecycle (Note: There are many types and sources of feature importance scores; popular examples include statistical correlation scores, coefficients calculated as part of linear models, decision trees, and permutation importance scores; a minimal permutation importance sketch appears after this table) • Identify and mitigate any issues surrounding the understanding of feature importance (Note: These issues cannot be addressed using approaches from traditional modeling; in AI one must iterate with better data, better hyperparameters, better algorithms, and better computational power. Explainability requires (1) identifying algorithmic decisions, (2) deconstructing specific decisions, and (3) establishing a channel by which an individual can seek an explanation) • Consider the framework for assessing a model’s explainability published by the Bank of England in August 2019, built around five key questions: - Which features mattered in individual predictions? - What drives the actual projections more generally? - What are the differences between an ML model and a linear one? - How does the ML model work? - How will the model perform under new states of the world (that aren’t captured in the training data)? • Consider using reverse-engineering algorithms, although this may be difficult or even impossible as the complexity of AI models continues to increase |
Model Fairness Risks | AI model fairness is a critical issue that is both an ethical and a regulatory requirement. Fairness encompasses two distinct notions: disparate treatment and disparate impact. It's not always clear what the absence of bias should look like. | • Define and include in model risk management (MRM) model fairness standards and requirements (see Missing AI Ethics Governance Risks) • Embed fairness checks within the model lifecycle (see the disaggregated-metrics sketch following this table) • Involve business and technical stakeholders in the mitigation of discovered issues • Enhance governance techniques and tools to promote fairness (Note: Fairness is a moral virtue that relates to the justifiability of decisions. Whether or not an algorithmic decision is fair depends on the context of the decision and the ways in which the system is optimized by its developers) • Recognize that fixing discrimination in algorithmic systems is an ongoing process that needs to be continually improved (Note: The latest research points to new ways to adjudicate injustice via narrative thinking, or alternatively, by comparison across cases. One notable controversy in this discussion is whether such systems should satisfy classification parity, which requires parity in accuracy across groups defined by protected attributes, or calibration, which requires similar predictions to have similar meanings across groups defined by protected attributes. Central to this discussion are impossibility results, which show that classification parity and calibration are often incompatible and unsatisfactory measures of fairness, pointing to a gap that cannot currently be closed) |
Model Regulatory Compliance Risks | There is little visibility, especially in complex solutions, into how AI systems come to their conclusions in solving problems or addressing a need, leaving practitioners in a variety of industries flying blind into significant business risks. | • Define and include in model risk management (MRM) a robust set of regulatory compliance checks embedded throughout the entire model lifecycle (see Missing Regulatory Compliance Risks) • Establish enhanced controls around model validation to satisfy regulatory requirements. One key requirement is to have a thorough grasp of how machines make their decisions. That means understanding that legislatures are likely to reject models whose decisions cannot properly be understood even by their designers • Ensure that the model is extensively tested before deployment |
Model Ethical Compliance Risks | While AI on the surface may seem cold, calculating, objective, and unbiased, the truth is that, as with any human-designed mechanism, bias and other ethical implications can be just as much of a risk. | • Define and include in AI Ethics Governance algorithm ethics checks (see Missing AI Ethics Governance Risks) • Establish enhanced controls around algorithm validation to satisfy ethical AI requirements (see Ethically Aligned AI Design (EAAID)) by understanding decisions and trade-offs made by developers and how they affected outcomes (For example, societal biases encoded in data may be compounded by machine learning models in the context of automated recruiting. One novel methodology leverages biases in word embeddings to mitigate this compounding effect without assuming access to protected attributes) • Ensure that the developed model is inspected by an AI ethics council to make a conscious, transparent decision regarding the use of the model • Ensure that models are extensively tested for ethical compliance before deployment |
Model Performance Evaluation Risks | Inadequate performance evaluation and benchmarking analysis of the model can pose a risk. Algorithm evaluation is the process of assessing a property or properties of an algorithm. In some cases, this assessment is relative: several algorithms are evaluated to determine which is best suited to a specific application. | • Define and include in model risk management (MRM) model performance evaluation standards and requirements as part of validation • Embed model performance checks within the model lifecycle • Involve business and technical stakeholders in discovery and mitigation of issues (see Trustworthy AI Principles) • Standardize validation tools and techniques to promote consistent procedures and uniform execution of validation assessment • Define multiple ex ante performance measures that are aligned with the goal embedded in the transformation logic and the business purpose. Multiple measures should be employed to evaluate the accuracy of the model’s representation • Conduct an end-to-end evaluation of the model's performance against agreed-upon standards to ensure the required level of ethics, performance, transparency, etc. of the integrated system in which the model will be embedded (Note: academia, industry, and government already use a number of standards to help measure performance and gauge technological progress) • Monitor metrics for changes in feature importance while retraining (Note: changes per approved retraining approach are not considered model changes) |
Model Use and Impact Risks | AI systems raise questions concerning the criteria used in automated decision-making. As with any emerging technology, it is important to discourage malicious treatment designed to trick software or use it for undesirable ends. | • Define and include in model risk management (MRM) evaluation of model use and impact risks • Verify understanding of the decisions made during the model design/deployment/validation and the implications of the criteria that the algorithm is optimizing for to ensure algorithms embed ethical considerations and value choices into program decisions (see Trustworthy AI Principles) • Conduct an impact assessment to assess the risks involved in its development and use against the pre-set threshold • Confirm modeling outcomes achieve desired level of precision and consistency, and are aligned with trustworthy AI criteria. • Verify the model outcomes are relevant and informative in understanding whether the desired business outcome is achieved |
Model Documentation | Missing or incomplete documentation and follow-up on findings can pose a risk for model performance monitoring and remediation of discovered issues. | • Define and include in model risk management (MRM) model documentation standards and requirements • Ensure end-to-end complete documentation of the model development (model’s objective, selection process, design, initial parameterization, retraining, data sources and testing approach), conceptual soundness (regulatory compliance, explainability of model results, sensitivity analysis, benchmarking, outcomes analysis), operationalization (production code) and planned usage to support successful integration of the AI component into the final engineered system • Ensure documentation addresses the needs of varied stakeholders to build institutionalized memory (For example, targeting technical documentation with description of model parameters to developers and technical resources; providing understanding of how to utilize the outputs from the model to business people to help them unpack the black box nature of AI algorithms) |
Model Standardized Workflows | Lack of standardized workflows in the end-to-end modeling could pose a risk for the model development lifecycle. If a team of data scientists is working on the model, a lack of quality communication in the workflow may degrade the quality of the model over time. | • Define and include in model risk management (MRM) modeling/training/validation standards and requirements • Institute and execute feedback mechanisms with robust communication protocols by the development team • Embed supervisory expectations throughout the model lifecycle to better anticipate risks • Ensure standardized handoffs to eliminate potential disconnect between the results, the data engineer and the data scientist • Maintain a holistic perspective of all the elements of the model lifecycle — from data sourcing and pre-processing, model design and construction to implementation, performance assessment and ongoing monitoring — with controls embedded throughout. Design controls to foster integrity, traceability and reproducibility of results |
Model Hand-off Asymmetry | Asynchrony and asymmetry around workflow hand-offs can cause information loss in the knowledge transfer, which will impede the development of well-constructed algorithms amongst teams of developers. | • Create open communication around capabilities surrounding modeling/training/validation to reduce model quality risk exposure • Understand the capabilities that each team member possesses and any possible misalignment of communication in hand-offs through the workflow of the model lifecycle • Ensure hand-offs are as robust and complete as possible, without any missed communications around the data and models being passed from one group to another, to mitigate asymmetry risks and ensure a successful project/product |
Model Operationalization Risks | Issues can arise around model operationalization when the tools, data, and practices of the data science organization are not the same tools, data, and practices of the operationalization organization. These organizations often work with their own proprietary tools, such as data notebooks and data science-oriented tools on the data science side during the training phase, and runtime environments, big data infrastructure, and IDE-based development ecosystems on the operations side. There’s no easy bridging of these technologies, often causing a struggle to make the transition from the training phase in the laboratory to the inference phase in the real world. | • Create a model management system (MMS) covering model registry, release, activation, servicing, scaling in production, and consumption (a minimal registry sketch appears after this table) • Define and include in model risk management (MRM) model operationalization standards and requirements • Embed model operationalization checks within the model lifecycle • Develop secure and sustainable infrastructure, as well as data and operations processes, to deliver against SLAs. Conduct a simulation for scaling, training, and testing, in addition to upskilling teams to be knowledgeable about using and integrating AI as planned at the outset of the project (For example, address considerations around the need for the model to operate on data sets that may not be as clean as the training sets and to have the computational power necessary to run the model with satisfactory response time) • Stress test the model under different conditions to understand the model’s stability and robustness in production • Ensure model scaling and production are owned and managed by resources responsible for the problem that AI is solving (e.g. line of business, IT operations, etc.). The data science organization responsible for crafting the model needs to hand it off to the organization responsible for the activities surrounding the model's purpose, with the documentation created from the outset of the model lifecycle • Ensure there is a process/tool in place to enable the model owners to monitor, manage, govern, and analyze the results of the model to make sure that it’s meeting their needs |
Solution Environment Risks | Issues and risks can arise regarding software, digital and physical infrastructure such as whether the infrastructure components can support model execution. Some models require massive amounts of computing power that the infrastructure may not be able to handle. | • Define and include in model risk management (MRM) environmental standards and requirements to operate the model and optimize its performance • Develop secure, sustainable infrastructure, data, and operations processes to deliver against SLAs • Conduct robust performance testing to ensure that the infrastructure supporting the AI model lifecycle can handle the intensity of computing power required to operate the model with respect to data capacity, retraining, and calibration • Verify that the optimization algorithm that typically underlies the transformation logic is converging properly and generating sensible results. Confirm that the model performs over a range of “call conditions” • Confirm that minimum standards for deployment have been met • Consider instituting a layered approach to AI model governance to allow modularity, which combines different mechanisms to address the issues and makes it a shared responsibility among all relevant stakeholders |
Integration Risks | There are risks regarding integration with upstream and downstream systems. Some departments use AI components that are tightly embedded within their processes, unbeknownst to risk management organizations. | • Define and include in model risk management (MRM) integration standards and requirements to ensure the end-to-end system's operational sustainability • Ensure that the AI model has interoperability among different platforms, frameworks, and approaches • Conduct a robust integration testing process that meets the agreed upon and documented compliance requirements and specified functional requirements • Verify that the model has been configured and integrated properly into the production environment (errors can arise when firms employ legacy systems, upgrade from one model version to another, or migrate the model from one programming environment into another) |
Scalability Risks | Scaling AI solutions to deal with real data, business users, and customers is fraught with risks and difficulties (including dependence on intensive computer processing and storage requirements, higher data volumes and veracity, increased data security and governance, and the need for change management) that need to be accounted for to ensure model performance. | • Institute a robust performance monitoring process to ensure that the production infrastructure can handle the intensity of computing power needed to run the model • Ensure that the AI system is scalable and deployable with the right technology infrastructure • Automate the model updating and approval process when any of the model’s metrics fall outside pre-set goals and parameters to allow a newly optimized version of the model to quickly return to production for further monitoring and assessment |
Model Performance Controls and Governance Risks | Lack of standardized monitoring processes can lead to adverse consequences and poor decisions if a model has errors in its design or construction, performs poorly, or is used inappropriately. | • Develop and implement a robust and complete governance, risk and control process (see Governance, Risk Management and Control (GRC)) for monitoring ongoing AI behavior that includes the ability to perform corrective interventions if issues (such as unjustified bias) are observed • Create a comprehensive ongoing monitoring plan to confirm that the model is operating as intended over time that includes tracking model performance (e.g. drift), stability, and alignment with business purpose • Rely on the performance indicators and thresholds established in development to determine the degree of performance deterioration that would warrant further review or revalidation. Evaluate performance after the model is retrained to confirm that changes in feature importance remain insignificant • Consider designing a system to “track” the reasoning at a level which would satisfy not only regulators and legal thresholds, but also internal policy requirements. This level of transparency can be accomplished through rules of liability and other approaches • Include, if necessary, a more stringent or more frequent ongoing monitoring of model explainability by testing input data to identify outliers or cases different from the data on which the model was trained. This process could also entail the use of benchmark models to compare outputs and variances against predefined thresholds to trigger further investigation, revalidation or use of alternative models • Monitor changes in input data against the training data to confirm data quality and the statistical consistency of the new data with the training data to validate that the data-generating process is the same. Evaluate the need to make required changes in the production environment, if applicable • Include checks to confirm that the processing power for the model remains adequate so that the model can be available and reliably accommodate potential usage increases |
Model Supervision Risks | Lack of comprehensive and ongoing tracking of model performance around robustness, accuracy, and consistency may result in delayed detection of performance issues and unintended risk exposure. | • Define and include in model risk management (MRM) model supervision standards and requirements to ensure the system operates as expected • Recognize that supervisors will not expect models to be entirely accurate all of the time; rather, they expect the model output to be underpinned by a sufficient degree of statistical confidence • Integrate supervisors into the critical role of resolving uncertainty and disagreement in decisions whose errors are associated with high negative utility values • Provide users with the necessary information to help them understand the sensitivity of model performance to changes in inputs at inception and over time to determine whether performance is consistent with their domain expertise and intuition. The users can also determine whether the model is remaining true to the original business purpose and is achieving the desired business outcome |
Issue Log and Resolution Risks | As new capabilities evolve rapidly, support teams will have to be kept closely in the loop to understand how to resolve issues as they come up. If checks are in place, issues can be flagged automatically. For example, if there is a data breach, data could become corrupted and the system would need to be shut down. | • Define and include in model risk management (MRM) model performance issue tracking and resolution standards and requirements to ensure operational continuity • Maintain an Issue Log with assigned ownership and responsibility for resolution across the three stages of dealing with issues: - Ensure every model has a risk tier associated with it, where 1 represents the least risk and 5 represents the most risk, at which point the system should be shut down entirely (see Not Understood Risk Rating Requirements) - Assess the severity of the risk by evaluating the change between a well functioning system and the identified risk - Identify prescribed action (based on the model's risk tier and the severity of the risk) and accept, reject, transfer, or mitigate the risk, as appropriate • Collect feedback from the issue resolution and capture "maintenance requirements" to help improve the model and the overall model lifecycle over time |
Fallback Procedures Risks | Issues around fallback procedures may arise if robust protocols are not in place to mitigate monitoring continuity risks. | • Define and include in model risk management (MRM) model monitoring standards and requirements to ensure the system operates as expected • Continuously monitor the model's decay over time and track its performance with three types of metrics: statistical, technical, and from the business perspective • Use real-time circuit breakers to set up performance boundaries for the model and establish that the model is performing as intended • Pre-specify benchmark or legacy models and employ them as fallback options when the model's performance boundaries are breached |
Performance Monitoring Reporting Risks | Issues and risks can arise around reporting, such as scope and cadence. If the right metrics are not captured for each model, then an understanding of performance will not be captured and improved upon. For models where concept drift is an issue, this can eventually lead to model failure. | • Define and include in model risk management (MRM) performance monitoring reporting standards and requirements • Define key risk indicators (KRIs)/KPIs to monitor AI drift, bias, and changes in characteristics of the retraining population • Operationalize the right cadence for monitoring reporting as predefined in the ongoing monitoring plan and as linked to the risk assessment, with higher-risk models reviewed more frequently than lower-risk ones • Establish a feedback mechanism and engage users, which can include developers and owners, in evaluating model performance over time • Create checkpoints for user intervention over the model lifecycle and provide users an opportunity to effectively challenge model results (this is especially helpful in detecting model drift over time, which can go undetected otherwise) • Determine the model's fitness for purpose given the availability of new data or potential changes in the business, economic, or regulatory environment • Conduct data analysis by comparing profiling characteristics of the training data set against the current/retraining data to detect potential population shifts across key factors (see the drift-check sketch following this table) • Differentiate between passive and active changes in model performance - Passive changes can be caused by frequent retraining even when it is in accordance with a documented and approved retraining approach. These changes can lead to changes in the feature importance of the model, which could be tantamount to a model change - Active changes can be made to model methodology, input types, use, monitoring approach, and more. These changes require evaluation of the change in input data against the training data to confirm data quality and the consistency of the new data with the training data going forward • Identify and activate remediation actions following prescribed MRM procedures |
Malpractice Monitoring Reporting Risks | The distribution of liability in malpractice related to outputs from an AI system is complicated. For example, if a solution for a hospital radiology department is developed 'in-house,' the hospital will likely be liable. However, if purchased from a vendor, the issue becomes much more complicated. | • Define and include in enterprise governance, risk management, and control process (see Governance, Risk Management and Control (GRC)) procedures for malpractice monitoring reporting • Create a council of development resources to assess the degree, role and ownership of liability, including government mandates, as it relates to AI malpractice • Define human agency and professional accountability in line with model risk management (see Missing AI Ethics Governance Risks) |
Design vs. Usage Alignment Risks | Missing oversight for ensuring that the actions informed by models' outputs are in line with their design and purpose can lead to lost opportunities for improving models over time. If concept drift is an issue with the particular model, eventually the model will degrade and, if revenue is tied to the model's outputs, revenue will be lost. | • Define and include in model risk management (MRM) usage standards and requirements to ensure the model is used in line with the model design • Use agreed upon model performance metrics to capture the true nature of the overall health of the model • Ensure the model's design and purpose are well documented and monitor its continued alignment with business strategy and drivers • Collect continual feedback and assessment of adherence to design and purpose throughout the model lifecycle |
Performance Documentation Risks | Missing or incomplete documentation and follow-up on findings can result in lost opportunities to improve model performance monitoring. If concept drift is an issue with the model, this risk could lead to eventual degradation of the model over time. | • Define and include in model risk management (MRM) model performance documentation standards and requirements • Document model performance addressing the needs of varied stakeholders (For example, targeting technical documentation to developers and technical resources and providing usage and impact reporting to business people) • Confirm that the monitoring plan aligns with the risk assessment and considers model performance (e.g. drift), stability, and alignment with business purpose, as well as that the performance indicators selected for the plan are appropriate, given the intended business purpose • Confirm that the performance indicators are monitored at an appropriate frequency based on how often the model is retrained |
Socialization and Feedback Risks | Missing or incomplete socialization and feedback integration can let unwanted bias and unfairness into the model when necessary feedback is missed. This issue can lead to a need for buy-in and support far downstream in the model lifecycle, which can be difficult or impossible to obtain. The team may not be diverse enough or may not have the opportunity to ask questions and share feedback early on in the lifecycle. | • Define and include in model risk management (MRM) model socialization and feedback standards and requirements • Ensure resources responsible for the problem that AI is solving have open and transparent communication throughout the model lifecycle and mediums for offering continuous feedback • Use feedback to build support and to drive awareness and adoption |
Remediation Process Risks | Missing or incomplete remediation processes can lead to degradation in model solution quality. | • Define and include in model risk management (MRM) model performance issue tracking and resolution standards and requirements to ensure operational continuity • Define and maintain a robust remediation process for risks that are flagged in the AI model lifecycle (see Issue Log and Resolution Risks) (for example, implement agile remediation techniques for governance issues given the detrimental impact of the issues on dynamic AI models) • Consider enhancing existing data remediation processes and associated testing infrastructure for model development and validation to address the high volume of structured and unstructured data that AI/ML models typically ingest |
Standardized Performance Monitoring Workflows | Lack of standardized workflows in end-to-end deployment and monitoring can lead to information loss in monitoring, which could potentially allow unwanted bias and unfairness into the model workflow. This risk can arise if all of the systems have not been identified, if an organized list of people is not captured, and if proper supervision is not in place. | • Define and include in model risk management (MRM) model performance monitoring standards and requirements • Identify and document all systems and create a complete list of all resources involved in the performance monitoring workflow • Institute and execute feedback mechanisms with robust communication protocols by the operations team • Embed a robust supervision process throughout the model lifecycle to maintain the workflow of the monitoring process and better anticipate risks • Ensure standardized handoffs to eliminate the potential for disconnect between all the systems along the production run of the model • Maintain a holistic perspective of all the elements of the model lifecycle—from data sourcing and pre-processing, model design and construction to implementation, performance assessment and ongoing monitoring—with controls embedded throughout. Design controls to foster integrity, traceability, and reproducibility of results |
Performance Monitoring Workflow Asymmetry | There are risks of information loss along the development path if workflow handoffs regarding performance monitoring are not robust and complete. This risk can arise if proper documentation of the process is not present and if the documentation is not appropriately treated as a risk control in the organization. Some organizations may be better equipped than others to handle the kind of performance workflow required to maintain and continuously improve industrial AI models. | • Create open communication around capabilities surrounding model performance monitoring to reduce model quality risk exposure • Understand the capabilities that each team member possesses and any possible misalignment of communication in hand-offs through the workflow of the model lifecycle • Ensure hand-offs are as robust and complete as possible, without any missed communications around the data and models being passed from one group to another, to mitigate asymmetry risks and ensure a successful project/product |
Adversarial Attacks | AI solutions offer new opportunities for adversarial attacks, such as feeding AI systems bad data, leading to incorrect predictions and faulty decisions downstream. Data perturbation by adversaries may not be recognizable by humans, but often can be recognized by machines. | • Define and include in model risk management (MRM) adversarial attack monitoring standards and requirements to prevent fooling the model through malicious input that can cause a malfunction in the model (a minimal gradient-based attack sketch appears after this table) • Enhance governance methodologies and tools (Note: research on adversarial attacks has shown that AI models that reveal more information about the individual data points used to train those models are more robust to data poisoning by adversarial inputs) • Develop effective testing and auditing techniques as well as meaningful certification programs that provide clear guidance to AI developers and operators on addressing AI models' vulnerabilities • Leverage research on adversarial attacks and model data leakage to test AI models for vulnerabilities and assess their overall robustness and resilience to different forms of attacks (For example, 'block switching' designed to provide a never-before-seen defense strategy against adversarial attacks by programming parts of an AI model's layers with randomly assigned run times so that it “fools” the adversary and prevents them from knowing and exploiting model layer weaknesses) • Institute cyber threat hunting to proactively and iteratively search through networks to detect and isolate advanced threats that evade existing security solutions |
Unauthorized Disclosure of PII Information | There is a privacy risk associated with potential de-anonymization of data which can be traceable to an individual. In other words, there is a privacy harm in a system that over-collects information about individuals, even if none of it is exposed to unauthorized viewers. The harm may be of a lesser magnitude, but it is a harm nonetheless. | • Refer to the DOE Privacy Order 206.1 to ensure understanding of Personally Identifiable Information (PII). DOE defines PII as any information collected or maintained by the Department about an individual, including but not limited to: education, financial transactions, medical history, and criminal or employment history, and information that can be used to distinguish or trace an individual’s identity, such as his/her name, Social Security number, date and place of birth, mother’s maiden name, biometric data, and including any other personal information that is linked or linkable to a specific individual. Additionally, PII can be quite context specific, especially when it involves a combination of elements that might not alone identify a person • Ensure adherence to Fair Information Practices (FIPs), which are a set of standards governing the collection and use of personal data and addressing issues of privacy and accuracy • Contact the DOE Chief Privacy Officer with questions around PII • Consider Microsoft’s SmartNoise, available both in open source and integrated within Azure Machine Learning. SmartNoise is a toolkit jointly developed by Microsoft and Harvard as part of the Open Differential Privacy initiative. With the new release of SmartNoise, differential privacy can be used not only to protect an individual’s data but also the full dataset, using the new synthetic data capability (a minimal differential privacy sketch appears after this table) |
Missing Security Requirements (#11 in Model Risks) | Missing security requirements in model design could open the environment up to adversarial attacks and cause drastic issues within the ecosystem. Adversarial players can attempt to make the AI model do something other than what it was designed to do by corrupting data, algorithms, or platforms, as well as the downstream systems acting on the model's output. | • Mature the existing security program by incorporating AI security considerations to: - comprehensively address possible approaches to compromising the system (e.g. taking control of the system itself or feeding the system incorrect data to influence an action or decision that is not in line with the system's correct functioning) - consider the dynamic and integrated risks associated with AI • Ensure model risk management (MRM) policies explicitly reference how information security applies where appropriate |
Data Vendor Risks (#15 in Model Risks) | Vendor risks relate to the pedigree of the data, such as a record of components, inputs, systems, and processes that affect collected data and provide historical context. | • Ensure that the vendor meets firm and model risk management (MRM) vendor management standards, including privacy and information security as appropriate • Establish and govern data acquisition service level agreements (SLAs) with external vendors that include provision of data provenance information (see Missing Data Pipeline Verification Risks for details) • Evaluate how the data is aggregated by data vendors and their ongoing monitoring processes • Understand the vendor's approach to risk management practices to confirm that the vendor tests for bias and fairness • Develop input validation strategies (for raw data and observed outcomes, if available) to detect invalid data • Build in reliability checks to ensure data trustworthiness • Develop response time tracking against SLAs |
Algorithmic Bias Risk | Using algorithms properly requires understanding their strengths and weaknesses to avoid the risk of algorithmic bias. If left unchecked, biased algorithms can lead to decisions which can have a collective, disparate impact on certain groups of people, even without the programmer’s intention to discriminate. For example, random forests are biased in favor of attributes with more levels. Therefore, the variable importance scores from random forests are not reliable for this type of data. Random forests have also been observed to overfit for some data sets with noisy classification/regression tasks. | • Consider a new random forest method based on completely randomized splitting rules with an acceptance–rejection criterion for quality control. The authors have shown how the proposed acceptance–rejection (AR) algorithm can outperform the standard random forest algorithm (RF) and some of its variants, including extremely randomized (ER) trees and smooth sigmoid surrogate (SSS) trees. Twenty datasets were analyzed to compare prediction performance, and a simulated dataset was used to assess variable selection bias. In terms of prediction accuracy for classification problems, the proposed AR algorithm performed the best, with ER being the second best. For regression problems, RF and SSS performed the best, followed by AR, and lastly by ER. However, each algorithm was most accurate for at least one study. The authors investigated scenarios where the AR algorithm can yield better predictive performance. In terms of variable importance, both RF and SSS demonstrated selection bias in favor of variables with many possible splits, while both ER and AR largely removed this bias. https://link.springer.com/article/10.1007/s00180-019-00929-4 |
Usage of AutoML in Generating Features, Tweaking Representations in Model Architecture, and Parameter Tuning | AutoML functionality automatically runs through all the algorithms and their hyperparameters to produce a leaderboard of the best models. By commoditizing machine learning for process improvement, AutoML raises questions about the interplay between data, models, and human experts and how it should look in the context of risk management. If parameters are automatically generated using AutoML input, there is a risk of the model not being properly calibrated, which could degrade its quality and the quality of the decisions it supports. | • Semi-automate the model development process (especially for complex designs) by adapting the augmentation versus automation idea. As complexity increases for tasks such as feature construction, tweaking representations in model architectures, and parameter tuning, automation can add rigor by examining alternatives and combinations in a more comprehensive manner. However, at the current state of AI maturity, automation cannot replace expert knowledge. AugML enriches the AutoML concept by underscoring the importance of experts, context, and complementary data. • Contextualize the machine learning with representation engineering, i.e. intentional mapping of structured and unstructured data into a meaningful custom data architecture. This representational richness comes from incorporating highly contextualized, problem-specific constructs tailored to the problem at hand and can only be achieved by accounting for individualized characteristics that impact the outcome. This is where expert understanding of the problem at hand, as well as intuition or expertise about the data, is required • Balance depth with breadth through data triangulation by integrating complementary data sources. For example, in a call center where NLP may be used to understand customer sentiment, complementary data such as audio recordings, product reviews, and satisfaction surveys can be used to further empower the predictive power of the model. A variety of data matters as well, not just one large source of data. https://hbr.org/2019/10/the-risks-of-automl-and-how-to-avoid-them • Use AutoML just as a piece of the data science process workflow and not as a whole solution or as a substitute for data scientists to minimize avoidable risk https://analyticsfrontiers.uncc.edu/sites/analyticsfrontiers.uncc.edu/files/media/Presentation_%20Cliff%20Weaver%20Pitfalls%20of%20AutoML-AF19.pdf |
Reliance on Theoretical Selection of Hyperparameters | Machine learning algorithms involve a number of hyperparameters that have to be defined before the algorithm can be executed. Resorting to the default values of hyperparameters specified in implementing software packages can impact model performance and stability. | • Select an appropriate hyperparameter configuration for the specific dataset at hand. If a comprehensive database of experiments from previous machine learning problems is not available, configure hyperparameters manually, for example based on recommendations from the literature, experience, or trial and error • Consider using hyperparameter tuning strategies, which are data-dependent, second-level optimization procedures for optimal predictive performance (see the cross-validated tuning sketch following this table). The goal is to minimize the expected generalization error of the inducing algorithm over a hyperparameter search space of considered candidate configurations, usually by evaluating predictions on an independent test set or by running a resampling scheme such as cross-validation https://arxiv.org/pdf/1802.09596.pdf |
Solution Environment Risks (#64 in Model Risks) | There are issues around software, digital and physical infrastructure such as whether those infrastructure components can support the model execution. Some models require massive amounts of computing power that the infrastructure may not be able to handle. | • Define and include in model risk management (MRM) environmental standards and requirements to operate the model and optimize its performance • Develop secure and sustainable infrastructure, data, and operations processes to deliver against SLAs • Conduct robust performance testing to ensure that the infrastructure supporting AI model lifecycle can handle the intensity of computing power required to operate the model with respect to data capacity, retraining, and calibration • Verify that the optimization algorithm that typically underlies the transformation logic is converging properly and generating sensible results and that the model performs over a range of “call conditions” • Confirm that minimum standards for deployment have been met • Consider instituting a layered approach to AI model governance to allow modularity, which combines different mechanisms to address the issues, making it a shared responsibility among all relevant stakeholders |
Integration Risks (#65 in Model Risks) | There are issues around integration with upstream and downstream systems. Some departments are using AI components that are tightly embedded within their processes, unbeknownst to risk management organizations. | • Define and include in model risk management (MRM) integration standards and requirements to ensure the end-to-end system's operational sustainability • Ensure that the AI model has interoperability among different platforms, frameworks, and approaches • Conduct a robust integration testing process that meets the agreed upon and documented compliance requirements and specified functional requirements • Affirm that the model has been configured and integrated properly into the production environment (errors can arise when firms employ legacy systems, upgrade from one model version to another, or migrate the model from one programming environment into another) |
Scalability Risks (#66 in Model Risks) | Scaling AI solutions to deal with real data, business users, and customers is fraught with risks and difficulties (including dependence on intensive computer processing and storage requirements, higher data volumes and veracity, increased data security and governance, and the need for change management) that need to be accounted for to ensure model performance. | • Institute a robust performance monitoring process to ensure that the production infrastructure can handle the intensity of computing power needed to run the model • Ensure that the AI system is scalable and deployable with the right technology infrastructure • Automate the model updating and approval process when any of the model’s metrics fall outside pre-set goals and parameters to allow a newly optimized version of the model to quickly return to production for further monitoring and assessment |
Third Party Relationships and Security Controls Risk | AI systems introduce new kinds of complexity not found in traditional IT systems. They are also likely to rely heavily on third party code or relationships and will need to be integrated with several other new and existing IT components (including robotics), which are also intricately connected. This complexity may make it more difficult to identify and manage some security risks and may increase others, such as the risk of outages. | • Conduct AI asset discovery and scanning exercises to surface buried code and risks (see Outdated/Non-Existent Enterprise Governance, Risk Management and Control (GRM)) • Conduct thorough due diligence on current and prospective vendors utilizing questionnaires or on-site visits if necessary. Evaluate not only a third party's information security, but also their compliance with regulations for privacy, such as GDPR. Consider evaluating the third party's adherence to industry standards such as NIST or ISO • Gather information around the vendor's security posture by asking questions around things such as their penetration testing, remediation schedule timeliness, security incident documentation and remediation, business continuity testing, security controls that exist for users like multifactor authentication, security program maturity, adherence to ISO, SOC 1/SOC 2 and NIST compliance, and any related documentation • Implement various types of controls to mitigate risk (e.g., restrict access to technical integrations of vendor offerings, limit the types of data and amounts of data that can be input securely) • Track the assets to which the vendor has access, develop actions to mitigate risk exposure, disclosure and notification procedures, external communication strategies, and plans to reevaluate the vendor's security and remediation after an incident https://securityboulevard.com/2019/04/third-party-security-risks-to-consider-and-manage/ • Separate the AI development environment from the rest of the IT infrastructure where possible. For example, use ‘virtual machines’ or ‘containers’ - emulations of a computer system that run inside, but are isolated from, the rest of the IT system. These can be pre-configured specifically for machine learning tasks |
Transferability of Development Code to Production System | Deploying code and AI/machine learning models can be difficult due to the mismatch between the development and production environments. This challenge is further complicated by the fact that production environments are often running larger applications (e.g. software as a service, CRM) where the data science-based components are a subset of the overall functionality and user experience. | • Decide at the model design stage if the code will be plugged into the existing framework (e.g. as a service or microservice), or translated into something more compatible with the existing production environment. Specify all requirements (both functional and non-functional) and all tests, where applicable, for the code to successfully meet before being cleared for deployment • Ensure the feasibility of deployment of the written data science code and models developed, tested, and ultimately made ready for consumption in the production environment. In this context, an environment can be thought of as a specific machine (physical or virtual) running a specific operating system and version that is configured in a very specific way, and with a specific, versioned set of software, programming languages (e.g. Python), and packages installed • Consider using a platform/tools to convert models trained using programming languages developed for scientific and machine learning uses (e.g. Python) into another language (e.g. Java) for deployment. For example, H2O.ai creates highly portable programs deployed in Java (POJOs) and model object optimized (MOJOs) for fast and accurate deployment in any environment, including very large models • Define standardized DevOps-like processes to deploy the code into a production environment. |
Adversarial Attacks (#79 in Model Risks) | AI solutions offer new opportunities for adversarial attacks, such as feeding AI systems bad data which can result in incorrect predictions and faulty decisions downstream. Data perturbation by adversaries, while not usually readily recognizable by humans, can generally be recognized by machines. | • Define and include in model risk management (MRM) adversarial attack monitoring standards and requirements to prevent a model malfunction caused by malicious input • Enhance governance methodologies and tools (Note: research on adversarial attacks has shown that AI models that reveal more information about the individual data points used to train those models are more robust to data poisoning by adversarial inputs) • Develop effective testing and auditing techniques, as well as meaningful certification programs, that provide clear guidance to AI developers and operators on addressing the vulnerabilities of AI models • Leverage research on adversarial attacks and model data leakage to test AI models for vulnerabilities and assess their overall robustness and resilience to different forms of attacks (For example, 'block switching' designed to provide a never-before-seen defense strategy against adversarial attacks by programming parts of an AI model's layers with randomly assigned run times so that it “fools” the adversary and prevents them from knowing and exploiting model layer weaknesses) • Institute cyber threat hunting to proactively and iteratively search through networks to detect and isolate advanced threats that evade existing security solutions |
Insufficient Learning Feedback Loops | Models deployed to a production environment have new and unseen data passing through them and are expected to generate high-performing results (generalization). Because data and the underlying information on which the models are based can change due to trends, behaviors, and other factors, an established cadence of learning feedback loops is required to maintain a targeted level of performance. Without establishing these feedback loops, performance degradation or drift is common. | • Define performance thresholds that trigger the feedback loops so that model results do not become stagnant • Design feedback loops into the software to capture feedback effectively • Determine a model-specific retraining cadence and mechanism, driven by model risk ranking (a minimal retrain-and-promote sketch appears after this table) - Offline learning (aka batch learning) is when the model is trained outside of production on an entire data set or on a subset of the data (mini-batch). Part of this process involves software, as well as storage of and access to data from data stores such as an RDBMS, NoSQL, a data warehouse, or Hadoop. The retrained model needs to be validated (e.g. cross-validation) and iteratively optimized (e.g. grid search, hyperparameter optimization) before it can be deployed back to production. This process is repeated as needed to maintain model performance in production. - Online learning is when the model is retrained and its performance is assessed online in a production environment with production data. This process is usually carried out on a recurring interval (cadence) that can last minutes, days, or longer. Online learning is intended to address the maintenance and upkeep of target performance based on changing data, as in the batch learning case, but also to incrementally update and improve deployed deliverables without retraining on an entire dataset. Because of its online nature, this process also requires software, available data storage, and data access, and depends on network communications, latency, and the availability of network resources. • Do not be overly dependent on an algorithm to provide feedback. Consider involving a human at some point in the feedback loop so that the machine and human can work together to rectify any misjudgments on the machine’s part |
Trustworthiness of Third Party Open Source Code Bases | The technical landscape of artificial intelligence systems often requires the involvement of many third party vendors. Entire algorithms can be purchased as packages from vendors. While open source software can help developers and IT operations teams, the various open source components are developed and maintained by a diverse group of independent creators. The landscape of different versions of open source packages, their licenses, their evolving security vulnerabilities, their dependencies, and their updates can introduce risk as complications arise with standardization, security, and permissions. | • Ensure comprehensive visibility into the open source and commercial components and frameworks used in an application or service. In the November 2019 Gartner research paper “Technology Insight for Software Composition Analysis,” analyst Dale Gardner notes that “comprehensive visibility into the open source and commercial components and frameworks used in an application or service must be considered a mandatory requirement.” Gardner goes on to recommend that organizations “continuously build a detailed software bill of materials (BOM) for each application providing full visibility into components.” Gartner, Dale Gardner, Technology Insight for Software Composition Analysis, Nov. 1, 2019. • Ensure that the model management system (MMS) captures model registry, release, activation, servicing, scaling in production, and consumption, if applicable, as well as information related to the model's usage of (high-quality) open source components, including: the component's licenses and their structure (permissive or restrictive, the most commonly used license or a variant), the version of the component (most current or outdated, the most stable or not, the most secure or not), and the component's maintenance (active robust community or not) • Reference Synopsys’ Open Source Security and Risk Analysis (OSSRA) report for an in-depth snapshot of the current state of open source security, compliance, and code quality risk in commercial software to identify potential codebase and open source risks (Note: 2020 OSSRA research indicates 99% of codebases audited in 2019 contained open source components; 75% of audited codebases contained vulnerabilities, with 49% of codebases containing high-risk vulnerabilities; 33% of codebases contained unlicensed software and 67% of codebases had license conflicts; 82% of codebases had components more than four years out of date and 88% of the codebases had components with no development activity in the last two years) https://www.synopsys.com/software-integrity/resources/analyst-reports/2020-open-source-security-risk-analysis/thankyou.html • Create an up-to-date, accurate software inventory - a.k.a. a software BOM - that includes all open source components, the versions in use, and download locations for each project in use or in development. The BOM should also include all dependencies, or the libraries the code invokes, as well as the libraries those dependencies are linked to. Consider using a Software Composition Analysis (SCA) tool to build a baseline understanding (a minimal BOM-and-prioritization sketch appears after this table) • Monitor for changes in external threats and vulnerability disclosures. The National Vulnerability Database (NVD) is a good source of publicly disclosed vulnerabilities in open source software, along with a range of secondary sources • Consider putting in place an automated process that tracks open source components, their licenses, and known security vulnerabilities • Create policies to manage open source activities, educate development teams, and engage with open source communities • Prioritize open source vulnerability mitigation efforts based on CVSS (Common Vulnerability Scoring System) scores and CWE (Common Weakness Enumeration) information, as well as on the availability of exploits, not only on “day zero” of a vulnerability disclosure, but over the lifecycle of the open source component - The Common Vulnerability Scoring System (CVSS) is an industry standard for assessing the severity of a vulnerability. Vulnerabilities in the National Vulnerability Database (NVD) have a base score that aids in calculating severity and can be used as a factor for prioritizing remediation. The CVSS score (v2 and v3) provides an overall base score that takes both exploitability and impact into account - Common Weakness Enumeration (CWE) is a list of software or hardware weaknesses that have security ramifications. A CWE tells developers which weakness leads to the vulnerability in question. This information adds one more piece to assessing the severity of the vulnerability. For example, a development team may prioritize a SQL injection differently than a buffer overflow or denial of service • Perform an open source due diligence audit when involved in M&A transactions where software is a major part of the deal |
Poor Governance Framework for Evaluating and Monitoring Third-Party AI Tools | A weak governance framework for evaluating and monitoring third-party AI tools can result in degraded models and a decrease in the quality of the support they offer. | • Establish a comprehensive inventory of third-party relationships, including outsourcing partners, suppliers of products and services, and important fourth parties (sub-contractors) • Segment third parties based on risk and refresh the segmentation regularly to efficiently allocate resources to the relationships posing the highest risk. This process should tie directly into a tailored approach for ongoing risk monitoring • Onboard and conduct due diligence tests using a comprehensive set of rules, including an assessment of compliance with relevant regulations. Onboarding teams should be put in place at medium-sized to large institutions to identify risks based on materiality criteria • Include in control systems a comprehensive list of risks, the escalation triggers essential for the success of audit routines, and scorecards to monitor risk. Best practice is to have a master register of escalation trigger points and their risk weights in each category relevant to all firms. That register can then be adapted to the particular circumstances of individual suppliers |
Deficient Technology Environment Design | To fully take advantage of the opportunities presented by AI, the technology environment must be strong enough to manage the computational needs of big data. These needs include adequate storage, computing resources (including CPUs and GPUs), appropriate software, networking power, and security. As more processes and decisions become interconnected, there are more potential weak links, more surfaces exposed to attack, and more components that can malfunction. | • Mature infrastructure/platform risk screening, assessment, and control methodologies (e.g. ensure engineering guarantees are in place, set standards that provide an ample margin of safety, conduct adequate stress tests, adopt cloud elastic scaling capability) to accommodate the requirements of sustainable AI • Properly embed AI technologies into larger IT systems and infrastructure using a long-term approach (e.g. large users of Big Data utilize hyperscale computing environments, which are made up of commodity servers with direct-attached storage, run frameworks like Hadoop or Cassandra, and often use PCIe-based flash storage to reduce latency; smaller organizations often utilize object storage or clustered network-attached storage (NAS)) • Ensure robust networking hardware (e.g. many organizations are already operating with networking hardware that facilitates 10-gigabit connections and may have to make only minor modifications, such as the installation of new ports, to accommodate a Big Data initiative), or have a direct relationship with a bandwidth provider • Secure network transports, especially for traffic that crosses network boundaries • Utilize servers with enough processing power to support analytics applications • Consider cloud storage as an option for disaster recovery and backups of on-premises Big Data solutions (Note: while the cloud is also available as a primary source of storage, many organizations, especially large ones, find that the expense of constantly transporting data to the cloud makes this option less cost-effective than on-premises storage) • Consider building out a dynamic data fabric to optimize the use of AI while mitigating increasing data entropy and creating a future-resilient architecture ready to address rapid technology change. A dynamic data fabric is a next-generation, real-time intelligent data layer that can sit between existing applications and AI-enabled applications to allow seamless and real-time data access, integration, and analysis. These applications scale out dynamically to accommodate increases in data volumes and workloads as markets and volatility levels spike in times of crisis. https://www.business-of-data.com/articles/financial-services-data-fabrics |
Missing Lineage for Open Source Software, Content, and Data (See also Model Risks #14 Missing Data and Pipeline Verification; Software Risks #13 Trustworthiness of Third Party Open Source Code Bases) | Open source provides the componentization, modularity, and flexibility that enable developers and operators to achieve greater levels of velocity, productivity, and efficiency. Not managing open source holistically can expose organizations to maintenance, security, and licensing-related risks and lead to constraining developers' use of open source, which may limit productivity and innovation. | • Create an open source strategy to ensure developers and IT operators have access to the open source projects and components they need, while doing so in a sanctioned, secure, and compliant way • Consider a managed open source option to offload the time-consuming and complex task of managing the open source supply chain to a third-party vendor (Note: vendor-managed open source software carries vendor lock-in risk and much higher costs). The benefits include: - Developing with open source components along with the consistency, code quality, governance, security, and support provided by vendors that can consolidate and coordinate enterprise use of open source - Providing enterprise teams with a catalog of vetted, trusted packages that developers can choose from when building applications - Enabling developers and operations teams to focus on new features, products, and innovation, instead of keeping track of software versions, licenses, and dependencies internally https://cdn2.hubspot.net/hubfs/4008838/Managed%20Open%20Source_451%20Group.pdf Note: Most of the AI frameworks, toolkits, and applications available today do not implement security at all, relegating them to disconnected experiments and lab implementations. |
Infrastructure Malfunction | Infrastructure malfunctions can be caused by a variety of circumstances. If the infrastructure malfunctions, models could be degraded or lost, which could lead to a decrease in the quality of the decision-making that the models support. | • Put in place a strong IT support group that understands the nuances of the systems which support AI applications (e.g. the enterprise requires a specialist to identify the roadblocks in the deployment process) • Ensure IT support teams stay current on AI advancements and technologies and continually enhance their technical knowledge around integration, deployment, and applications of AI in the enterprise. Note: A lack of technical know-how is hindering the adoption of AI in most organizations; currently only 6% of enterprises have a smooth ride adopting AI technologies. • Clearly define roles and responsibilities, as all individuals involved directly or indirectly in applied intelligence must bear the burden of any sort of hardware malfunction • Do not abandon the need for human oversight, as right now AI is just a tool that can operate sustainably only by working side-by-side with humans (with autonomous machines operating in enterprises independently of humans still being a hope for the future) |
Cybersecurity Malfunction | The methods underpinning state-of-the-art AI systems are systematically vulnerable to a new type of cybersecurity attack called an “artificial intelligence attack.” Cybersecurity malfunctions open AI systems/models to attacks in which adversaries manipulate these systems/models in order to alter their behavior to serve a malicious end goal. These “AI attacks” are fundamentally different from traditional cyberattacks because they are enabled by inherent limitations in the underlying AI algorithms that currently cannot be fixed. | • Establish a strong cyber support group that understands the types of attacks specific to AI systems. The group needs to recognize that AI cyber risks come in two forms: (1) the infiltration of legitimate AI programs, and (2) the use of purpose-built AI programs that seek to exploit the vulnerabilities in a target organization’s systems. A machine learning algorithm’s logic can be extremely complex and not readily interpretable or transparent, so there is a need to understand the modus operandi of AI systems. Because the AI system is empowered to make decisions and deductions in an automated way with little to no human intervention, any compromise and infiltration can go undetected for a while. Thus, even when administrators detect what seems to be a clear violation, the reason for it may remain opaque for a while, meaning the violation could be dismissed as simply a glitch in the system even when it is the result of an attacker’s active efforts at taking over control of the AI system. http://techgenix.com/ai-cyber-risks/ • Create an AI Security Compliance program to reduce the risk of attacks on AI systems and lower the impact of successful attacks. For example, this can be accomplished by encouraging stakeholders to adopt a set of best practices in securing systems against AI attacks, including considering attack risks and surfaces when designing and deploying AI systems, and creating attack response plans. Consider modeling this program on existing compliance programs in other industries, such as PCI compliance for securing payment transactions. https://www.belfercenter.org/publication/AttackingAI • Protect enterprise data at rest and in motion (inside or outside the traditional boundaries of the enterprise); this protection relies heavily on pattern recognition. It involves combing through the dozens upon dozens of log reports related to authentications and authorizations, data changes, network activity, resource access, malware, critical errors, and more. Characterize what is normal vs. what represents a potential threat in these logs, with heavy emphasis on avoiding false positives and false negatives (an illustrative anomaly-detection sketch appears after this table). https://www.cio.com/article/3295836/when-big-data-and-cybersecurity-collide.html • Put in place automated response mechanisms to respond faster to security incidents and block attacks within seconds of detection (Note: consider using AI to pivot the fight against cyber attacks from reactive to proactive detection. Cutting-edge solutions include the Blue Hexagon threat detection platform) |
Too Many Permissions | Permissions are attack surfaces, and migrating workloads to a public cloud environment opens organizations up to a slate of new, cloud-native attack vectors that did not exist in the world of on-premises data centers. In this new environment, workload security is defined by which users have access to the cloud environment and what permissions they have. As a result, protecting against excessive permissions, and quickly responding when those permissions are abused, becomes the #1 priority for security administrators. | • Mature compliance and governance tools around permissions usage into modern integrated protection that includes software-as-a-service (SaaS) applications, as well as infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) environments • Detect excessive permissions (including, but not limited to, permissions mis-characterized as ‘misconfigurations’ which are actually the result of permission misuse or abuse by people who shouldn’t have them) • Look for anomalous activities. For example, a data breach is not one thing going wrong, but a whole list of things going wrong. Most data breaches follow a typical progression, which can be detected and stopped in time. Monitoring for suspicious activity in the cloud account (for example, anomalous usage of permissions) will help identify malicious activity in time and stop it before data is exposed • Minimize the gap between granted permissions and used permissions to reduce the risk of exploitation by hackers, who take advantage of unnecessary permissions for malicious purposes (a minimal permission-gap sketch appears after this table) • Consider subscribing to a security service for comprehensive protection of hosted workloads (e.g. Radware’s Cloud Workload Protection Service for AWS) https://securityboulevard.com/2019/02/excessive-permissions-are-your-1-cloud-threat/ |
Broken Human-Machine Interface | The human-machine interface (HMI) is the intermediary between a machine and the operating personnel. With the advent of AI, it now goes beyond traditional machines and also relates to computers, digital systems, and devices for the Internet of Things (IoT). If the HMI design does not allow users to understand the behavior of the system, for example by not appearing plausible to them, AI performance may be compromised, as AI is only as good as its interface to humans. | • Mature traditional human-machine interface solutions beyond the stand-alone, isolated terminals that were deployed by an OEM (Original Equipment Manufacturer) as part of a machine • Consider new HMI solutions based on machine learning that are either on-premises or pre-configured to send data to the cloud. A fundamental shift in the business model is being enabled by IoT sensors: a move from products to services. Note: we’re in the early stages of an evolution from highly complex approaches to much more intuitive HMI interfaces • Conduct an audit to assess the current position on the maturity scale, followed by establishing a long-term R&D plan • Create new leadership roles responsible for adopting and leveraging the power of HMIs and identifying new investment opportunities, and then assign teams that will be committed to innovation • Create sufficient workforce training and development programs around the skills the workforce needs (as identified by the audit) https://towardsdatascience.com/ai-human-machine-interface-new-business-models-c0611749c8a5 https://www.itransition.com/blog/human-machine-interfaces |
No Adequate Back Up Systems and Fallback Plans (see Model Risks #14 Not Understood Risk Rating Requirements) | AI systems need to be resilient and secure. They need to be safe, which includes ensuring a fallback plan in case something goes wrong. Without adequate backups and disaster recovery plans in place, the decisions that rely on the AI systems and models can cause unintentional harm. | • Identify the threats associated with AI system/model applications (see AI model inventory in Governance, Risk Management and Control (GRC) and Model Risk Management (MRM) in Workforce Risks) and appropriately mature the existing GRC and MRM • Leverage the system's/model's risk tiering to determine the effort of risk mitigation, including definition of warning signs (risk triggers), contingency plans describing the specific actions that will be taken if an opportunity or a threat occurs, and the fallback plan (addressing the residual risks) to be implemented when the contingency plan fails or is not fully effective • Schedule frequent backups to prevent loss of data, while creating different backup schedules for different data blocks (e.g. back up high-priority data blocks once a day, and lower-priority blocks only once a week) to eliminate unnecessary backups and reduce storage and costs • Define roles and assign responsibilities for each element of the plan. Consider using a Responsibility Assignment Matrix and the Responsible, Accountable, Consulted, and Informed (RACI) model • Protect big data backups with strong data encryption technology to ensure that, in case of unauthorized access to the backup, the data will be unusable. The data should be encrypted while it is at rest inside the storage device and while it is on the move from one location to another • Make sure the contingency/fallback plans address legal obligations and reporting to the relevant authorities, as applicable https://bigdataanalyticsnews.com/protecting-data-tips-big-data-backup/ |
Missing Policies and Procedures | AI policies and procedures are defined to maximize the benefits of AI while minimizing its potential costs and risks. Policies that guide the management of AI system infrastructure must be in place or the models could be met with many challenges that degrade them and the decisions they are designed to make. Beyond ongoing management, key events requiring policies include installation upgrades, change of vendors, and more. | • Develop policies and procedures to guide the build and management of the infrastructure needed to support AI deployment at scale, including the complex capability of leveraging big data for AI (Note: maturing existing policies and procedures built for the traditional infrastructure that does not require big data volumes might be an option) • Ensure comprehensiveness of developed AI policies and procedures by including: - policies/procedures specifically oriented toward governing AI-based technologies and systems (e.g. machine learning, virtual agents, document intelligence, etc.) - policies/procedures that indirectly affect AI-based technology development, but are nominally focused on other emerging technologies or technology in general (e.g. opportunity management, intellectual property management) - policies/procedures in which AI development is neither specifically targeted nor significantly affected, but in which knowledge of plausible AI futures would benefit the broader organization and its functions (e.g. Finance, Technology) |
Cultural Barriers to Adoption | Lack of governance or ineffective decision making can be cultural, arising where staff are reluctant to change and practice risk aversion, particularly when it comes to governing the AI lifecycle. The human element of AI cannot be ignored. | • Implement an operating model for responsible AI adoption • Invest in capabilities that support AI adoption and risk management • Provide training to developers and product/program owners so they understand governance, risk management, and control (GRC), model risk management (MRM), the potential legal and ethical considerations for the development of AI, and their responsibility to safeguard impacted users’ rights, freedoms, and interests • Provide support around change management • Embed supervisory expectations throughout the AI lifecycle to better anticipate risks and reduce harm to customers and other stakeholders • Hold development teams and model owners accountable for deploying models that are conceptually sound, thoroughly tested, well-controlled, and appropriate for their intended use |
Missing Oversight by Executive Leadership | Lack of awareness and understanding by senior management of the effectiveness of governance, risk management and control (GRC) and model risk management (MRM) used in the AI model lifecycle can create unintended risk exposure. | • Create an AI curriculum for executive leadership focused on furthering their understanding of AI risks and the need for governance, risk management and control (GRC) • Educate executive leadership on AI risks and the potential legal and ethical considerations for the development of AI, as well as their responsibility to safeguard impacted users’ rights, freedoms, and interests • Educate executive leadership on core principles of AI innovation in a way that helps build and sustain trust: 1) adoption of a purposeful approach to AI; 2) agile governance to keep pace with the evolution of AI and its expanding capabilities; 3) vigilant supervision, beyond what organizations have typically adopted, as required by the ongoing learning nature of AI |
Missing Oversight from AI Advisory Board | Independent advice and guidance on ethical considerations in AI development provides a second line of defense against ethical problems around AI. | • Establish a multi-disciplinary advisory board drawing advisors from ethics, law, philosophy, technology, privacy, regulations, and science. The advisory board should report to and/or be governed by the Board of Directors • Develop AI ethical design policies and standards for the development of AI, including an AI ethical code of conduct and AI design principles. The AI ethical design standards should define and govern the AI governance and accountability mechanisms to safeguard users, follow social norms, and comply with laws and regulations |
Poorly Defined Roles and Responsibilities | Clearly articulated roles and responsibilities for model developers, users, and validators are required to achieve ownership and accountability for risks. | • Give model developers clarity on all requirements needed to get models approved • Help control functions understand how their responsibilities are allocated |
Outdated/Non-Existent Enterprise Governance, Risk Management and Control (GRC) | Enterprise governance, risk management and control (GRC) needs to fully address AI-specific considerations and risks. | • Enhance existing, or develop new, integrated enterprise governance, risk management and control (GRC) to effectively monitor and control other non-model-related risks arising from AI, such as privacy, information security, and third-party risk. This includes defining processes and supporting a set of policies and control mechanisms to incorporate into the business that encompass three components: - AI Governance – processes used to provide oversight of AI throughout its lifecycle - AI Risk Management – processes used to identify and manage the risks of AI - AI Control – the process of validating compliance of AI assets to stated requirements, both internal and from external stakeholders (e.g. regulations) • Build a sustainable AI risk management program following three steps 1. AI Asset Discovery - Identify assets and libraries across the organization, including hidden and shadow AI - Create a searchable AI asset database - Enable a community of users with access to drive collaboration and adoption of best practices 2. AI Inventory Management - Evaluate the organization’s AI inventory and AI asset optimization processes - Conduct a preliminary risk assessment and triaging of AI assets - Identify opportunities for targeted governance to reduce risk, optimize cost, and create more value 3. AI Governance, Risk Management and Control (GRC) - Baseline the existing AI governance - Develop an AI-specific governance, risk management and control (GRC) strategy - Build an implementation plan and mature the GRC framework to include AI applications - Launch an AI GRC awareness program/training. |
Outdated/Non-Existent Model Risk Management (MRM) | Model risk management (MRM) needs to fully address AI-specific considerations and risks. | • Enhance existing model risk management (MRM) to address the AI-specific risks of AI models and systems with AI components, in alignment with the GRC processes and controls. This includes MRM standards along the model lifecycle: from ideation and design, through development and review, to operationalization and monitoring (Note: the oversight of AI models should be consistent with the processes used for traditional models. Enhancements to policies and procedures should consider the dynamic and integrated risks associated with AI) • Build a sustainable AI model risk management program following five steps 1. Definition, Identification, and Scope - Enhance the existing model identification guidance/aids to include AI techniques - Integrate the enhanced guidance into the existing model identification and attestation process (e.g. innovation programs, new product approvals, third-party sourcing, end-user computing) - Define the boundary for complex AI models that will be subject to enhanced standards 2. Risk Rating - Incorporate AI-specific elements into the existing model risk ranking process across Materiality and Impact (e.g. customer interaction, regulatory compliance) and Complexity/Likelihood components of risk ranking (e.g. data type, vendor/open source, libraries/codes, online retraining) - Conduct model risk tiering of the AI models to identify and prioritize high-risk AI models (an illustrative risk-tiering sketch appears after this table) 3. Model Development and Review Standards - Ensure development and review explicitly cover the following: data risks, algorithmic risks, performance risks, computational feasibility, online training, and vendor risk (for open source algorithms) - Enhance standards for change management, targeted model reviews, and ongoing monitoring - Develop new standards, documentation, and validation templates in consultation with other control functions (e.g. compliance, technology) - Build risk-based application of the framework (high-risk vs. low-risk use cases) 4. Governance - Assess the role of the Model Risk Management Committee (MRMC) for 2nd line oversight - Update senior management on AI/ML model development, review, and use - Assess the extent of coordination required between MRM and other control frameworks (e.g. cyber, data/privacy, vendor, etc.) and functions 5. Capabilities and Training - Update analytical tools for testing and outcomes analysis for AI models - Enhance ongoing monitoring to capture the increased volume and frequency of AI model data updates - Perform an MRM resourcing and skillset gap assessment to effectively cover AI models and update hiring and training plans - Conduct trainings to roll out new/enhanced MRM standards/guidance • Build tighter linkage among the model risk management (MRM) framework, GRC, data governance, and other risk management frameworks such as privacy, information security, and third-party risk management • Embed the principles of trustworthy AI into the ecosystem from the onset of every initiative • Hold development teams and model owners accountable for deploying models that are conceptually sound, thoroughly tested, well-controlled, and appropriate for their intended use |
No Active Involvement of Internal Audit | Engagement from internal audit is critical to challenge the AI program and give assurance that governance, risk management and control (GRC), model risk management (MRM), and related controls are effective for AI models. | • Ensure internal audit has a seat at the table at key AI-related forums • Determine the balance for challenge of AI program vs. governance/controls • Develop and activate internal audit plan • Document audit procedures and their relationships to other policies • Implement efficient, repeatable AI challenge processes • Ensure internal audit aids the organization through appropriate risk and control decisions |
No Independent Audit | Independent third-party audits against existing AI and technology policies and standards, as well as international standards, will enhance users’ and public trust in AI systems. | • Conduct periodic third-party ethical AI and design audits • Include in the audit an evaluation of the sufficiency and effectiveness of the governance model and controls across the AI lifecycle, from problem identification to model training and operation |
Lack of Dedicated AI-focused Risk Officer/ SME | Presence on the development team of a dedicated risk officer/SME who is knowledgeable and focused on ensuring that the unique risks associated with AI are appropriately mitigated along the AI lifecycle is critical to ensure sustainability of the solution. | • Reference the AIRMP to choose the right project management program • Create a standardized checklist specific to your organization to mitigate AI-specific risks • Ensure access to a dedicated risk officer/SME who has the understanding of the unique AI risks and best practices to mitigate them |
Lack of Alignment With Legal on Acquisition and Use of Third Party AI Components | Acquisition of products or services (including, but not limited to, data, models, software, infrastructure, and labor) from third party providers comes with rights that are acquired for a specific type of use. Because copyright law is murky and constantly evolving, it is important to check in with Legal to ensure that the necessary rights are secured for the Government (for example, if National Labs acquire data from external sources to develop a solution, the rights for government use must be secured as well). | • Bring the proposed solution before the EAGB for direction • Validate the solution against the Federal IT Acquisition Reform Act (FITARA) to ensure compliance with strategies for data center inventory and for consolidating and optimizing data centers (including planned cost savings) • Ensure the Legal team is engaged in the design of the solution and its components |
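
The sketches below illustrate, in code, a few of the mitigation practices referenced in the table. Each is a simplified, hypothetical example under stated assumptions, not a prescribed implementation.

Transferability of development code to production (pre-deployment environment check): a minimal sketch, assuming a pinned `requirements.lock` manifest produced in the development environment, of how a deployment gate can verify that the production machine has the exact package versions the model was developed against before the model artifact is activated. The file name and the block-on-any-mismatch policy are illustrative choices.

```python
# deploy_check.py - minimal sketch of a pre-deployment environment check
# (file name and policy are hypothetical; adapt to your own release process)
from importlib import metadata

def load_pinned_requirements(path: str = "requirements.lock") -> dict:
    """Read 'package==version' lines recorded in the development environment."""
    pins = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "==" in line:
                name, version = line.split("==", 1)
                pins[name.lower()] = version
    return pins

def environment_mismatches(pins: dict) -> list:
    """Return mismatches between pinned and installed versions."""
    mismatches = []
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            mismatches.append(f"{name}: pinned {expected}, not installed")
            continue
        if installed != expected:
            mismatches.append(f"{name}: pinned {expected}, installed {installed}")
    return mismatches

if __name__ == "__main__":
    problems = environment_mismatches(load_pinned_requirements())
    if problems:
        raise SystemExit("Deployment blocked:\n" + "\n".join(problems))
    print("Environment matches the pinned development manifest.")
```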
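Adversarial attacks (robustness probing): a toy sketch of fast-gradient-sign perturbation testing against a small NumPy logistic-regression model, comparing accuracy on clean inputs, randomly perturbed inputs, and inputs perturbed along the loss gradient. A real assessment would target the production model using a framework such as PyTorch or TensorFlow and a dedicated robustness library; the data, epsilon, and model here are synthetic placeholders.

```python
# adversarial_probe.py - toy robustness probe using a fast-gradient-sign
# perturbation against a NumPy logistic-regression model (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data
X = rng.normal(size=(500, 20))
true_w = rng.normal(size=20)
y = (X @ true_w + rng.normal(scale=0.5, size=500) > 0).astype(float)

# Fit logistic regression with plain gradient descent
w = np.zeros(20)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.1 * (X.T @ (p - y)) / len(y)

def accuracy(X_eval):
    p = 1.0 / (1.0 + np.exp(-(X_eval @ w)))
    return float(((p > 0.5).astype(float) == y).mean())

# Fast-gradient-sign perturbation: step each input in the direction that
# increases the loss, i.e. the sign of d(loss)/d(x) = (p - y) * w.
eps = 0.25
p = 1.0 / (1.0 + np.exp(-(X @ w)))
grad_x = np.outer(p - y, w)
X_adv = X + eps * np.sign(grad_x)
X_rand = X + eps * np.sign(rng.normal(size=X.shape))  # same-size random noise

print(f"clean accuracy:        {accuracy(X):.3f}")
print(f"random-noise accuracy: {accuracy(X_rand):.3f}")
print(f"adversarial accuracy:  {accuracy(X_adv):.3f}")
```

The gap between the random-noise and adversarial accuracies is one simple indicator of how sensitive the model is to worst-case, rather than average-case, perturbations.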
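Insufficient learning feedback loops (offline retraining with a promotion gate): a minimal sketch of a batch retraining step that cross-validates a candidate model on refreshed data and promotes it only if it beats the currently deployed model's score by a margin. The estimator, metric, margin, and file path are illustrative assumptions; the point is the promotion gate, not the particular model.

```python
# retrain_gate.py - minimal sketch of an offline (batch) retraining feedback loop:
# retrain on refreshed data, cross-validate, and promote only if the candidate
# beats the deployed model by a defined margin. Names and thresholds are
# illustrative, not from any particular platform.
import joblib
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

PROMOTION_MARGIN = 0.01  # candidate must improve mean CV score by at least this

def retrain_and_maybe_promote(X, y, deployed_score: float,
                              model_path: str = "model.joblib") -> bool:
    candidate = GradientBoostingClassifier(random_state=0)
    cv_scores = cross_val_score(candidate, X, y, cv=5, scoring="roc_auc")
    candidate_score = cv_scores.mean()
    print(f"candidate mean AUC: {candidate_score:.4f} "
          f"(deployed: {deployed_score:.4f})")
    if candidate_score >= deployed_score + PROMOTION_MARGIN:
        candidate.fit(X, y)                 # refit on the full refreshed dataset
        joblib.dump(candidate, model_path)  # hand off to the deployment pipeline
        return True
    return False  # keep the deployed model; investigate drift instead
```

In practice a scheduler would call this function on the retraining cadence defined by the model's risk ranking, passing in the deployed model's currently monitored score.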
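Trustworthiness of third party open source code bases (BOM and CVSS-based prioritization): a minimal sketch of holding a software bill of materials alongside vulnerability findings and ordering remediation by exploit availability and CVSS base score. The components, CVE identifiers, licenses, and scores are invented placeholders; a real pipeline would populate them from an SCA tool and the NVD or a similar feed.

```python
# bom_prioritize.py - minimal sketch: a software bill of materials (BOM) plus
# CVSS-based remediation prioritization. All components, CVE IDs, licenses,
# and scores are invented placeholders; real data would come from an SCA tool
# and a vulnerability feed such as the NVD.
from dataclasses import dataclass

# BOM: package -> (pinned version, license)
bom = {
    "examplelib": ("1.4.2", "MIT"),
    "otherlib":   ("0.9.0", "GPL-3.0"),
}

@dataclass
class Finding:
    package: str
    cve_id: str
    cvss_base: float        # CVSS v3 base score, 0.0-10.0
    exploit_available: bool

findings = [
    Finding("examplelib", "CVE-0000-0001", 9.8, False),
    Finding("otherlib", "CVE-0000-0002", 7.5, True),
]

def prioritize(items):
    """Known exploits first, then by CVSS base score (highest first)."""
    return sorted(items, key=lambda f: (f.exploit_available, f.cvss_base),
                  reverse=True)

for f in prioritize(findings):
    version, license_ = bom.get(f.package, ("unknown", "unknown"))
    print(f"{f.package} {version} ({license_}): {f.cve_id} CVSS {f.cvss_base}")
```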
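Cybersecurity malfunction (pattern recognition over logs): a minimal sketch of unsupervised anomaly detection over authentication-log features using an isolation forest, flagging unusual activity for human review while tracking the false-positive rate on normal traffic. The feature names and synthetic data are illustrative; a real detector would be built on the organization's own log schema and tuned against known incidents.

```python
# log_anomaly.py - minimal sketch of unsupervised anomaly detection over
# authentication-log features (feature names and data are illustrative).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)

# Each row: [failed_logins_per_hour, distinct_ips, bytes_transferred_mb, off_hours_flag]
normal = np.column_stack([
    rng.poisson(1, 1000),
    rng.poisson(2, 1000),
    rng.gamma(2.0, 50.0, 1000),
    rng.integers(0, 2, 1000),
])
# A burst of failures from many IPs with a large off-hours transfer
suspicious = np.array([[40, 15, 5000.0, 1]])

detector = IsolationForest(contamination=0.01, random_state=0).fit(normal)

# predict() returns -1 for anomalies and 1 for inliers
print("suspicious event flagged:", detector.predict(suspicious)[0] == -1)
print("false-positive rate on normal traffic:",
      float((detector.predict(normal) == -1).mean()))
```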
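Too many permissions (granted-versus-used gap): a minimal sketch that compares the permissions granted to each principal with the permissions actually exercised in audit logs over a review window, surfacing unused grants as revocation candidates. The principals, permission names, and log format are invented; real input would come from the cloud provider's IAM and audit-log services.

```python
# permission_gap.py - minimal sketch of measuring the gap between granted and
# used permissions. Principals, permission names, and the activity-log format
# are invented; real input would come from IAM and audit-log services.
from collections import defaultdict

granted = {
    "etl-service":  {"s3:GetObject", "s3:PutObject", "s3:DeleteObject", "kms:Decrypt"},
    "report-robot": {"s3:GetObject", "dynamodb:Query", "dynamodb:DeleteTable"},
}

# (principal, permission) pairs observed in audit logs over the review window
activity = [
    ("etl-service", "s3:GetObject"),
    ("etl-service", "s3:PutObject"),
    ("report-robot", "s3:GetObject"),
    ("report-robot", "dynamodb:Query"),
]

used = defaultdict(set)
for principal, permission in activity:
    used[principal].add(permission)

for principal, grants in granted.items():
    unused = grants - used[principal]
    if unused:
        # Candidates for revocation: granted but never exercised in the window
        print(f"{principal}: review/revoke {sorted(unused)}")
```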
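Outdated/non-existent model risk management (risk tiering): a minimal sketch of a risk-tiering heuristic that combines a materiality/impact score with a complexity/likelihood score to choose the depth of validation and monitoring. The 1-5 scales, weights, and cutoffs are invented placeholders; an actual MRM framework would define these through governance.

```python
# risk_tiering.py - minimal sketch of an AI model risk-tiering heuristic.
# The 1-5 scales, weights, and cutoffs are invented placeholders.
def risk_tier(materiality: int, complexity: int) -> str:
    """materiality and complexity are scored 1 (low) to 5 (high)."""
    score = 0.6 * materiality + 0.4 * complexity
    if score >= 4.0:
        return "Tier 1 - enhanced MRM standards, full validation, frequent monitoring"
    if score >= 2.5:
        return "Tier 2 - standard validation and periodic monitoring"
    return "Tier 3 - lightweight review"

# Example: a customer-facing model (high materiality) built on open source
# deep-learning libraries with online retraining (high complexity)
print(risk_tier(materiality=5, complexity=4))
```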