In October 2021, the U.S. Food and Drug Administration (FDA), together with Health Canada and the UK's Medicines and Healthcare products Regulatory Agency (MHRA), released guiding principles on Good Machine Learning Practice for Medical Device Development (GMLP).
The guidance comprises ten guiding principles that together aim to deliver AI products that are fit for purpose, adequately monitored, and subject to strong oversight.
Biologit regularly surveys regulatory guidance to incorporate it into our processes. In this article, we discuss our implementation of these principles in the development and operation of our AI-powered literature screening platform: biologit MLM-AI.
Biologit’s Approach to AI Development
At biologit the AI development process is fully embedded into quality management practices and the ISPE GAMP framework for delivering computerized systems in the life sciences.
To document how we design, build and deliver AI products, we authored the Artificial Intelligence Development Life-Cycle (AIDLC) document covering key stages of model development.
The AIDLC takes into consideration current best practices and existing regulatory guidance (including GMLP). It describes how AI development interacts with the overall product development, the validation approach, and touch points with other core processes such as risk management and CAPA.
The involvement of subject matter experts is made explicit and part of the process, with pharmacovigilance SMEs involved in all stages and with particular responsibility over data quality and validation of results.
To ensure it fully adheres to our quality management process, the AIDLC is a standard operating procedure (SOP) and is treated as a controlled document with the oversight of our quality team.
The above ensures we are meeting industry best practice, as described in a recent article by TransCelerate ("Adequate model documentation is key to ensure confidence and auditability of AI systems").
It is also our goal to produce as much public documentation as possible, derived from our internal processes, which helps with team accountability and mitigates AI risks. In the following sections we'll go into more detail on how we use model fact sheets, articles, and product documentation toward this goal.
GMLP Guiding Principles
In this section, we discuss the ten GMLP guiding principles and our approach to achieving them.
Multi-Disciplinary Expertise Is Leveraged Throughout the Total Product Life Cycle: In-depth understanding of a model’s intended integration into clinical workflow, and the desired benefits and associated patient risks, can help ensure that ML-enabled medical devices are safe and effective and address clinically meaningful needs over the lifecycle of the device.
For AI development, three aligned departments work together: pharmacovigilance, quality, and engineering. Pharmacovigilance experts are involved at all stages, from solution design and data labeling to quality control and validation of models. This enables fast, regular feedback to the engineering team and ensures the delivery of solutions that are fit for purpose. The quality team has oversight of engineering processes and ensures they comply with the company's quality policy.
Good Software Engineering and Security Practices Are Implemented: Model design is implemented with attention to the “fundamentals”: good software engineering practices, data quality assurance, data management, and robust cybersecurity practices. These practices include methodical risk management and design process that can appropriately capture and communicate design, implementation, and risk management decisions and rationale, as well as ensure data authenticity and integrity.
Biologit has been very intentional about its software engineering practices and guiding principles, which are described in our Software Development Life Cycle (SDLC) document. The SDLC was authored in consultation with industry practitioners engaged in building validated software for the life sciences, gathers industry best practices, and complements the AIDLC for building AI-based software products.
Data quality is monitored at all stages of development and after an AI model is released. Some of our operational processes include:
Applying data quality controls to labeled data, along with labeling quality checks by pharmacovigilance SMEs.
Monitoring the production data pipeline for missing, corrupted, or incomplete files, with redundancy designed in for failure scenarios.
Ensuring data quality is sufficient for making model predictions safely, according to the input specification described in the fact sheet.
Selecting training data via pharmacovigilance experts, who consider the quality, provenance, and traceability of each record before it becomes a candidate for the dataset. All data used for training models is version controlled.
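The kind of pre-prediction input check described above can be illustrated with a short sketch. This is not biologit's actual implementation; the field names and the minimum-length threshold are hypothetical, standing in for the input specification published in the model fact sheet.

```python
# Illustrative sketch (hypothetical fields and thresholds): a minimal
# check that a literature record meets an input specification before
# it is passed to the model for prediction.

def validate_record(record: dict) -> list[str]:
    """Return a list of data-quality issues; an empty list means the record is usable."""
    issues = []
    for field in ("title", "abstract"):  # hypothetical required fields
        value = record.get(field)
        if not value or not value.strip():
            issues.append(f"missing or empty field: {field}")
    abstract = record.get("abstract") or ""
    if 0 < len(abstract) < 50:  # hypothetical minimum length for a safe prediction
        issues.append("abstract too short for reliable prediction")
    return issues

record = {"title": "Case report of drug X", "abstract": ""}
print(validate_record(record))  # -> ['missing or empty field: abstract']
```

Records that fail such a check would be flagged for human review rather than scored automatically.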
Security is part of our risk management and continuous improvement processes including regular audits of our cloud infrastructure, review of security policies and data access.
Risk management is an ongoing activity that permeates the above concerns: regular risk reviews for the AI development and platform operation ensure we maintain a feedback loop from SMEs, clients and internal monitors.
Clinical Study Participants and Data Sets Are Representative of the Intended Patient Population: Data collection protocols should ensure that the relevant characteristics of the intended patient population (for example, in terms of age, gender, sex, race, and ethnicity), use, and measurement inputs are sufficiently represented in a sample of adequate size in the clinical study and training and test datasets, so that results can be reasonably generalized to the population of interest. This is important to manage any bias, promote appropriate and generalizable performance across the intended patient population, assess usability, and identify circumstances where the model may underperform.
Making sure our data sets are representative starts with our pharmacovigilance SMEs: their input is used to map model requirements to the best source data to use in training. The decisions are reflected in our data collection and labeling protocols.
The models in use in biologit MLM-AI derive from data collected from a broad range of scientific literature sources and cover diverse safety-event scenarios. This approach mitigates the risk of inherent regional or product/drug-class biases and is periodically reviewed.
Users of biologit MLM-AI can refer to the publicly available model fact sheet to better understand the composition of our training data and its intended uses.
Training Data Sets Are Independent of Test Sets: Training and test datasets are selected and maintained to be appropriately independent of one another. All potential sources of dependence, including patient, data acquisition, and site factors, are considered and addressed to assure independence.
Datasets used in testing are separated before a model is developed or updated; each dataset is isolated from the other and verified for data leakage using content-based duplicate detection.
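Content-based duplicate detection of this kind can be sketched with normalized-text hashing. This is an illustrative example, not biologit's actual leakage check; normalizing case and whitespace before hashing keeps trivial formatting differences from hiding duplicates shared between training and test sets.

```python
import hashlib

def content_key(text: str) -> str:
    """Normalize text (lowercase, collapse whitespace) and hash it,
    so trivial formatting differences do not hide duplicates."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def find_leakage(train_texts, test_texts):
    """Return test records whose content also appears in the training set."""
    train_keys = {content_key(t) for t in train_texts}
    return [t for t in test_texts if content_key(t) in train_keys]

train = ["Adverse event after Drug X", "Routine follow-up study"]
test = ["adverse  event after drug x", "Novel safety signal report"]
print(find_leakage(train, test))  # -> ['adverse  event after drug x']
```

Any record flagged this way would be removed from one of the two sets before model development proceeds.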
All datasets are version controlled using Git with Data Version Control (DVC) enabling data scientists to follow well known development workflows and pull-request type approvals for their machine learning artifacts.
Selected Reference Datasets Are Based Upon Best Available Methods: Accepted, best available methods for developing a reference dataset (that is, a reference standard) ensure that clinically relevant and well characterized data are collected and the limitations of the reference are understood. If available, accepted reference datasets in model development and testing that promote and demonstrate model robustness and generalizability across the intended patient population are used.
As discussed previously, data sets are derived from multiple literature sources and drug classes and combined to provide a more generalized view of the medical literature, always reflecting the model's intended use.
Model Design Is Tailored to the Available Data and Reflects the Intended Use of the Device: Model design is suited to the available data and supports the active mitigation of known risks, like overfitting, performance degradation, and security risks. The clinical benefits and risks related to the product are well understood, used to derive clinically meaningful performance goals for testing, and support that the product can safely and effectively achieve its intended use. Considerations include the impact of both global and local performance and uncertainty/variability in the device inputs, outputs, intended patient populations, and clinical use conditions.
Managing model risks starts at the design stage and spans the entire life cycle. A number of risk controls are in place, from extensive model documentation and full user oversight of model predictions to an inference pipeline designed with SME input to minimize risky predictions.
Specifically, models that detect safety events in pharmacovigilance should be conservative in their predictions and minimize the risk of false negatives (i.e., failing to identify a safety event).
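One common way to make a classifier conservative in this sense is to tune its decision threshold for a recall target rather than for overall accuracy. The sketch below is illustrative only (the data and the 0.95 recall floor are hypothetical, not biologit's published settings): it picks the threshold with the best precision among those that keep recall on safety events at or above the target.

```python
# Illustrative sketch: choose a decision threshold that maximizes precision
# subject to a minimum recall, so false negatives (missed safety events)
# are kept low. Scores, labels, and the recall target are hypothetical.

def choose_threshold(scores, labels, min_recall=0.95):
    """scores: model probabilities; labels: 1 = safety event present.
    Returns (threshold, precision, recall) or None if no threshold qualifies."""
    best = None
    positives = sum(labels)
    for t in sorted(set(scores)):
        predicted = [s >= t for s in scores]
        tp = sum(p and y for p, y in zip(predicted, labels))
        fp = sum(p and not y for p, y in zip(predicted, labels))
        recall = tp / positives
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        if recall >= min_recall and (best is None or precision > best[1]):
            best = (t, precision, recall)
    return best

scores = [0.1, 0.4, 0.35, 0.8, 0.9]
labels = [0, 0, 1, 1, 1]
print(choose_threshold(scores, labels))  # -> (0.35, 0.75, 1.0)
```

Lowering the threshold this way trades extra false positives, which a human reviewer can dismiss, for fewer missed safety events.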
Sample risks and controls for the design of biologit MLM-AI (full table here)
Focus Is Placed on the Performance of the Human-AI Team: Where the model has a “human in the loop,” human factors considerations and the human interpretability of the model outputs are addressed with emphasis on the performance of the Human-AI team, rather than just the performance of the model in isolation.
Full transparency of results was a key design consideration in biologit MLM-AI: all results are available for users to inspect, QC, and make risk-managed decisions on the desired level of screening automation. Extensive user documentation and training is available to optimize the use of the tool for productivity.
📖Learn more: Effective Medical Literature Workflows with AI
The user interface is designed to provide information in a clear, concise and consistent view.
Testing Demonstrates Device Performance During Clinically Relevant Conditions: Statistically sound test plans are developed and executed to generate clinically relevant device performance information independently of the training data set. Considerations include the intended patient population, important subgroups, clinical environment and use by the Human-AI team, measurement inputs, and potential confounding factors.
During development, test sets are used to verify model performance against the desired performance metrics. As previously mentioned, test sets are kept separate from training data.
Model testing is part of our computer systems validation approach and separate operational qualification (OQ) tests are designed with pharmacovigilance SMEs to verify model results and the user interface.
During deployment, new models are first deployed to a test environment for a window of time so that they can be benchmarked against the previous production model.
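A pre-promotion benchmark of this kind can be sketched as a simple gate. This is an illustrative example, not biologit's actual promotion rule; the recall metric, the tolerance, and the toy data are all hypothetical, but they capture the idea of comparing a candidate against the production model on the same labeled benchmark before release.

```python
# Illustrative sketch (hypothetical metric and promotion rule): gate a
# candidate model on not regressing recall versus the production model.

def recall(predictions, labels):
    """Fraction of true safety events (label 1) that were predicted."""
    hits = [p for p, y in zip(predictions, labels) if y == 1]
    return sum(hits) / labels.count(1)

def should_promote(candidate_preds, production_preds, labels, tolerance=0.01):
    """Promote only if candidate recall does not fall more than
    `tolerance` below the production model's recall on the same data."""
    return recall(candidate_preds, labels) >= recall(production_preds, labels) - tolerance

labels = [1, 1, 0, 1, 0]
production = [1, 1, 0, 0, 0]   # recall 2/3
candidate = [1, 1, 0, 1, 1]    # recall 3/3, at the cost of one false positive
print(should_promote(candidate, production, labels))  # -> True
```

In practice such a gate would cover several metrics and subgroups, with the results archived alongside the model release.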
Users Are Provided Clear, Essential Information: Users are provided ready access to clear, contextually relevant information that is appropriate for the intended audience (such as health care providers or patients) including: the product’s intended use and indications for use, performance of the model for appropriate subgroups, characteristics of the data used to train and test the model, acceptable inputs, known limitations, user interface interpretation, and clinical workflow integration of the model. Users are also made aware of device modifications and updates from real-world performance monitoring, the basis for decision-making when available, and a means to communicate product concerns to the developer.
There is extensive public documentation describing biologit MLM-AI aimed at different target audiences:
A model fact sheet describes the model's intended use, target domains, level of supervision options, expected inputs and outputs and goes on to explain how the platform handles articles that are identified as outside the model's operating window. Fact sheets can be used by technical assessors of the platform and auditors.
Deployments of model updates include release notes covering new features, with links to additional documentation where needed. Model releases are also accompanied by an up-to-date model fact sheet containing technical details on the underlying algorithm, datasets, and experimental results.
Biologit maintains a helpdesk to answer queries concerning clinical workflows, UI interpretation, or AI models; these queries are also used as input to routinely improve our documentation.
Deployed Models Are Monitored for Performance and Re-training Risks Are Managed: Deployed models have the capability to be monitored in “real world” use with a focus on maintained or improved safety and performance. Additionally, when models are periodically or continually trained after deployment, there are appropriate controls in place to manage risks of overfitting, unintended bias, or degradation of the model (for example, dataset drift) that may impact the safety and performance of the model as it is used by the Human-AI team.
A model monitoring framework using live data is in place and serves as input to biologit's AI risk management process. A periodic review of AI risks and controls drives improvements to model testing & documentation. It also ensures the team stays current with emerging regulatory guidance.
To help manage model re-training risks, model performance statistics and test results are version controlled and can be referred back to help benchmark the release of new models. Model performance is publicly available as part of the model fact sheet.
For more about our work in AI regulatory guidance and technical disclosures of biologit MLM-AI technology, please visit:
About biologit MLM-AI
Biologit MLM-AI is a complete literature screening platform built for pharmacovigilance teams. Its flexible workflow, unified scientific database, and unique AI productivity features deliver fast, inexpensive, and fully traceable results for any screening needs.