Feature Spotlight: Duplicate Detection in biologit MLM-AI Platform

Updated: Apr 11

The reliable detection of duplicates in literature monitoring and other pharmacovigilance activities is widely recognised as a major challenge. Unmanaged duplicates not only result in re-work but also contribute to inconsistent recording of safety events.

In this article, we examine the impact of duplicate results from literature searches and introduce automatic duplicate detection in the Biologit MLM-AI Platform. We also show how teams engaged in medical literature monitoring of adverse events can improve the quality of results without sacrificing productivity.

Why do duplicates matter in medical literature monitoring?

When performing medical literature monitoring, duplicates can arise due to:

  • Authors publishing the same article in different venues (pre-prints, conference proceedings, journals)

  • Publishers distributing articles to multiple databases at different times

Duplicate articles lead to extra screening and QC effort and create more opportunities for inconsistent screening. For example, duplicates may cause Individual Case Safety Reports (ICSRs), safety signals or safety information for aggregate reports to be reported multiple times and can confuse regulatory clock and submission times, which may lead to non-compliance.

Duplicates also hinder the ability of safety teams to perform broad, comprehensive searches in the literature and find relevant events faster. Expanding the search to more databases means even more duplicates to check manually.

To illustrate the issue, the table below summarises search results for the month of May 2023 for a sample of three products (Methotrexate, Paclitaxel and Telmisartan) using the Biologit MLM-AI Platform. The search was performed on these scientific databases: PubMed, Crossref and DOAJ.

The impact of duplicate detection in medical literature monitoring for pharmacovigilance

The results contained between 20% and 29% duplicates.

Duplicates increase as more sources are added. Note, however, that each source contributes valuable unique articles that would have been missed had the additional databases not been included.
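The two observations above (an overall duplicate rate, plus articles that only one source returns) can be computed from per-source result sets. The sketch below is illustrative only: the function name `duplicate_summary` and the idea of representing each article by a single matching key are assumptions, and it does not reproduce the figures in the table.

```python
def duplicate_summary(results_by_source):
    """Summarize duplicates across sources.

    `results_by_source` maps a source name to the set of article keys
    (for example DOIs) that the source returned for a search.
    """
    total = sum(len(keys) for keys in results_by_source.values())
    all_keys = set().union(*results_by_source.values())

    # Articles that only one source returned (missed if that source is dropped).
    exclusive = {}
    for source, keys in results_by_source.items():
        others = set().union(
            *(k for s, k in results_by_source.items() if s != source)
        )
        exclusive[source] = len(keys - others)

    # Share of retrieved records that repeat an article already counted.
    rate = (total - len(all_keys)) / total if total else 0.0
    return {
        "total": total,
        "unique": len(all_keys),
        "duplicate_rate": rate,
        "exclusive": exclusive,
    }
```

Here the duplicate rate is the share of retrieved records that are repeats, and `exclusive` counts the articles that would be lost if a given source were dropped.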

Ideally, duplicates would not be an issue and we could run searches across multiple databases, improving the quality of results. In the next section we look at approaches that meet this need.

Duplicate handling in Biologit MLM-AI Platform

Biologit MLM-AI Platform employs a number of approaches to ensure high-accuracy detection of duplicates:

Digital Object Identifier (DOI)

Most scientific articles are issued a unique identifier by the publisher: the digital object identifier (DOI), which can be used to find a citation’s authoritative source irrespective of where the article is retrieved from.

When using the MLM-AI database, searches can take advantage of the DOI collected from source metadata and use it for duplicate detection.
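As a rough illustration of DOI-based matching, the sketch below normalizes DOIs (they are case-insensitive and often arrive with a `doi:` or `https://doi.org/` prefix) and groups records that share one. It is not the platform's implementation; the record layout (dicts with a `doi` field) and the function names are assumptions made for the example.

```python
import re

# Prefixes commonly seen in front of a DOI in citation metadata.
_DOI_PREFIX = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(raw):
    """Return a lowercase bare DOI such as '10.1000/xyz', or None if absent."""
    if not raw:
        return None
    doi = _DOI_PREFIX.sub("", raw.strip()).lower()
    return doi if doi.startswith("10.") else None


def split_by_doi(records):
    """Split records into (unique, duplicates) using the normalized DOI as key.

    Records without a usable DOI are always kept as unique here; they need a
    different check, such as content similarity.
    """
    seen = {}
    unique, duplicates = [], []
    for record in records:
        doi = normalize_doi(record.get("doi"))
        if doi is not None and doi in seen:
            # Pair the duplicate with the first record seen for that DOI.
            duplicates.append((record, seen[doi]))
        else:
            if doi is not None:
                seen[doi] = record
            unique.append(record)
    return unique, duplicates
```

Keeping each duplicate paired with the record it matched preserves the link needed to audit the decision later.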

Content Similarity

What if DOIs are not available in the citation details? In this case a more sophisticated, complementary approach is needed. Because MLM-AI Platform collects and normalizes data from multiple scientific literature databases, it can perform content similarity checks out of the box.

Checking whether two abstracts are identical is not sufficient. To see why, consider the duplicate articles discovered by MLM-AI below. The content is “nearly” identical. Can you spot the differences?

Duplicate detection with biologit MLM-AI
Similar but different: small changes in the same article can affect duplicate detection

By using machine learning methods to evaluate content similarity, MLM-AI can detect duplicated content even in noisy abstracts.
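To make the idea of "nearly identical" concrete, here is a minimal sketch of one simple similarity measure: Jaccard similarity over word shingles (overlapping word n-grams). This is a stand-in chosen for illustration, not the machine learning method used by MLM-AI; the function names, the shingle size and the 0.8 threshold are all assumptions.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams in a normalized text."""
    # Lowercase and keep only alphanumeric words, so that case, punctuation
    # and whitespace differences do not affect the comparison.
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Jaccard similarity of the two texts' shingle sets, in the range 0..1."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a, text_b, threshold=0.8):
    """Treat two abstracts as duplicates when their similarity is high enough.

    The threshold is an assumed value for illustration only.
    """
    return jaccard_similarity(text_a, text_b) >= threshold
```

Two abstracts that differ only in capitalization, punctuation or a short added label fail an exact string comparison but still score close to 1.0 here, while unrelated abstracts score near 0.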

False Positive Protection

Finally, to ensure the relevance of results, additional checks are performed: the content of the abstract must be valid, and the “signature” of the article title must also match.
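The article does not define what makes an abstract "valid" or how a title "signature" is built, so the sketch below fills both in with plainly hypothetical rules: an abstract is valid if it is not a placeholder and has a minimum length, and a signature is the title reduced to lowercase alphanumeric words. It only illustrates how such guards can sit on top of a content match.

```python
import re

MIN_ABSTRACT_WORDS = 30  # assumed cutoff, not a documented platform value
PLACEHOLDERS = {"no abstract available", "abstract not available"}  # assumed


def abstract_is_valid(abstract):
    """Hypothetical validity check: non-placeholder text of reasonable length."""
    text = (abstract or "").strip()
    if text.lower().rstrip(".") in PLACEHOLDERS:
        return False
    return len(text.split()) >= MIN_ABSTRACT_WORDS


def title_signature(title):
    """Hypothetical signature: lowercase alphanumeric words joined by spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", (title or "").lower()))


def confirm_duplicate(candidate, original, content_match):
    """Accept a content match only if both abstracts are valid and titles agree."""
    if not content_match:
        return False
    if not (abstract_is_valid(candidate["abstract"])
            and abstract_is_valid(original["abstract"])):
        return False
    signature = title_signature(candidate["title"])
    return bool(signature) and signature == title_signature(original["title"])
```

The point of the extra checks is that a high similarity score between two empty or placeholder abstracts should never be enough to mark an article as a duplicate.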

Sampling and evaluating results of the duplicate detection engine is also a routine risk control employed by our engineering team to maintain high quality results.

Using duplicate detection for faster medical literature screening

Biologit MLM-AI Platform was designed for the productivity and traceability needs of safety surveillance teams. It offers:

Automatically detected duplicate articles in all results

When automated duplicate detection is enabled, all duplicates appear in a separate tab. They have been pre-screened by MLM-AI and are automatically tagged with the “duplicate” exclusion.

All duplicate articles are visible and auditable

To ensure full traceability of automated decisions, users can inspect the duplicate decisions by clicking on any article to learn more about the corresponding duplicate article.

Duplicate articles in periodic searches

Periodic searches are a very common requirement for pharmacovigilance and other safety surveillance tasks. Biologit MLM-AI Platform also detects duplicates of results from previous runs of the same search, even when the duplicates are published in different venues on different dates.
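One way to picture duplicate detection across periodic runs is an index of previously seen articles that is carried from one run to the next. The sketch below is a simplified assumption of how that could look, not a description of the platform: the record fields (`id`, `doi`, `title`, `run_date`) and the matching key are hypothetical, and a real system would also need a content similarity check for cases where the DOI or title differs between venues.

```python
def article_key(article):
    """Hypothetical matching key: the DOI when present, else the normalized title."""
    doi = (article.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    return "title:" + " ".join((article.get("title") or "").lower().split())


def screen_run(articles, seen):
    """Split one run's results into new articles and duplicates of earlier ones.

    `seen` maps a key to the (run_date, article_id) where it first appeared and
    is updated in place so it can be carried over to the next run.
    """
    new, duplicates = [], []
    for article in articles:
        key = article_key(article)
        if key in seen:
            # Record which earlier result this article duplicates.
            duplicates.append((article["id"], seen[key]))
        else:
            seen[key] = (article["run_date"], article["id"])
            new.append(article["id"])
    return new, duplicates
```

Because each duplicate is stored together with the run date and identifier of the earlier result, a reviewer can trace why an article was excluded in a later run.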


When it comes to detecting safety events and risks, maximising the use of available scientific literature is crucial. However, dealing with duplicate articles can often burden safety surveillance teams unnecessarily.

That's where Biologit MLM-AI Platform comes in. By employing advanced automatic duplicate detection techniques, Biologit MLM-AI Platform eliminates the need for manual handling of duplicates, delivering remarkable productivity and quality benefits transparently and with high accuracy.

Learn More

About biologit MLM-AI

biologit MLM-AI is a complete literature screening platform built for pharmacovigilance, medical device and veterinary vigilance teams. Its flexible workflow, unified global scientific database (which includes local journals) and unique AI productivity features deliver fast, inexpensive, and fully traceable results for any screening needs. Teams can save up to 70% of their time and work collaboratively across departments.
