The power of open access adverse events searches - a case study with Crossref and DOAJ

Bruno Ohana
Apr 5, 2022
Updated: Feb 21, 2024

Choosing scientific literature databases for safety monitoring

An efficient search strategy for medical literature monitoring of adverse events should find as many relevant events as possible while keeping screening effort within reasonable levels.

Which databases should be searched? Past studies indicate that results improve when searching multiple global literature databases. Indeed, regulatory guidelines from the EMA (GVP Module IV, Appendix 2) state that "It is best practice to have selected one or more databases appropriate to a specific product".

In this article we investigate databases that favor open models of scholarly publication, which are gaining traction in the academic world.

The Biologit Database is built into the MLM-AI platform and offers direct access to various scientific repositories of global reach. With that in mind, we investigate:

Can open access be a cost-effective way to find more adverse events results from the literature?

We considered two well-regarded open access sources to complement a mainstream repository (PubMed) for the medical literature monitoring of adverse events:

The Directory of Open Access Journals (DOAJ) indexes open access literature from publishers worldwide. It currently hosts over 5 million records.

Crossref is a community organization dedicated to supporting scholarly communication. The Crossref metadata spans over 120 million records, with a growing proportion being published as open abstracts.

The above open access repositories and PubMed are available from the biologit MLM-AI database; searches are run seamlessly across all sources and de-duplicated with no additional configuration needed.

Can Crossref and DOAJ help us find more adverse events?

We performed a simple evaluation comparing search results from DOAJ and Crossref against PubMed as the benchmark index. We used the functionality available in biologit MLM-AI to extract and de-duplicate articles, and used our AI models to perform the initial screening of abstracts, flagging articles with suspected adverse events.

Methodology:

Searches were produced for a sample of two reference medications, Etanercept and Clopidogrel. To ensure broad results, our search strategy only included the product synonyms.
Results were produced for the period of 2 November to 21 November 2020 (simulating three consecutive weeks of literature screening).
Retrieved abstracts were screened for adverse events: the MLM-AI model was used to filter suspected adverse events. Articles containing suspected adverse events then underwent further review by a drug safety expert, who verified whether each article described a valid adverse event.
For articles describing valid adverse events, a final quality check and a manual check against PubMed were done to ensure these were indeed unique hits.

The screening process with MLM-AI used in this study.

Screening for adverse events in biologit MLM-AI

Our findings

Articles were automatically de-duplicated considering PubMed as the prime source (i.e., if an article was found in PubMed, it was ignored in other sources). The chart below summarizes total articles by source for the two products:

57 unique articles were retrieved from DOAJ and 32 from Crossref (84 in total) for this period, in addition to the 58 articles found in PubMed. Together, Crossref + DOAJ comprised 60% of unique results.

Finding unique events from the medical literature with the biologit database
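The de-duplication rule described above, with PubMed as the prime source, can be illustrated with a minimal Python sketch. It is not the platform's implementation: the record fields (`source`, `doi`, `title`), the matching key (DOI, falling back to a normalized title) and the relative order of DOAJ and Crossref are all assumptions made for illustration, and the sample records are hypothetical.

```python
from typing import Iterable

# Lower rank wins. PubMed is the prime source, as in the study;
# the relative order of the other two sources is an assumption.
SOURCE_RANK = {"pubmed": 0, "doaj": 1, "crossref": 2}


def dedup_key(record: dict) -> str:
    """Build a matching key: DOI when present, else a normalized title.

    Matching on DOI/title is an assumption of this sketch.
    """
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    title = record.get("title") or ""
    return "title:" + "".join(ch for ch in title.lower() if ch.isalnum())


def deduplicate(records: Iterable[dict]) -> list[dict]:
    """Keep one record per key, preferring the highest-priority source."""
    best: dict[str, dict] = {}
    for rec in records:
        key = dedup_key(rec)
        kept = best.get(key)
        if kept is None or SOURCE_RANK[rec["source"]] < SOURCE_RANK[kept["source"]]:
            best[key] = rec
    return list(best.values())


# Hypothetical records, for illustration only.
sample = [
    {"source": "crossref", "doi": "10.1000/ABC", "title": "Case report A"},
    {"source": "pubmed", "doi": "10.1000/abc", "title": "Case report A"},
    {"source": "doaj", "doi": "", "title": "Case Report B"},
    {"source": "crossref", "doi": None, "title": "case report b"},
]
unique = deduplicate(sample)
by_source = sorted(rec["source"] for rec in unique)
```

With these sample records, the article present in both PubMed and Crossref is counted once under PubMed, so only records that are genuinely new reach the open access counts.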

Suspected and valid adverse events

Out of the unique articles retrieved from Crossref + DOAJ, 41 were flagged as articles containing suspected adverse events by biologit MLM-AI, and out of those, 20 articles contained valid adverse events for the product of interest, as determined by a drug safety specialist.

In total, articles containing valid adverse events found only in open access databases (Crossref + DOAJ) corresponded to 77% of all articles with valid adverse events, with the remaining 23% found only in PubMed.

Articles containing adverse events from non-PubMed sources - what do they look like?

Journal status in PubMed and PMC

Using the journal ISSN, it is possible to look up the journal status in PubMed/PMC here. This can help determine whether the journal is, or ever was, known to PubMed.

Out of the 21 articles marked with valid adverse events, 11 (52%) came from journals whose ISSN is not known to PubMed. For the remaining articles, ISSNs were known to PubMed, with varying indexing status.

One reason indexed journals may not have their content visible is selective publication into PubMed. In the example of this article, the journal appears to follow selective (NIH portfolio) publication presently, as indicated here.
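The ISSN-based check described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list of ISSNs known to the index is taken to be available locally (how it is obtained is not covered here), the helper names `normalize_issn` and `split_by_known_issn` are hypothetical, and all ISSNs and articles shown are made up.

```python
import re


def normalize_issn(raw: str) -> str:
    """Return an ISSN as 'NNNN-NNNC' (uppercase), or raise ValueError."""
    compact = re.sub(r"[\s-]", "", raw).upper()
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        raise ValueError(f"not an ISSN: {raw!r}")
    return compact[:4] + "-" + compact[4:]


def split_by_known_issn(articles, known_issns):
    """Partition articles by whether their journal ISSN is in known_issns."""
    known = {normalize_issn(i) for i in known_issns}
    found, missing = [], []
    for art in articles:
        target = found if normalize_issn(art["issn"]) in known else missing
        target.append(art)
    return found, missing


# Hypothetical ISSNs and articles, for illustration only.
known_to_index = ["1234-5678", "2345678X"]
articles = [
    {"id": "a1", "issn": "12345678"},
    {"id": "a2", "issn": "2345-678x"},
    {"id": "a3", "issn": "9999-0000"},
]
found, missing = split_by_known_issn(articles, known_to_index)
share_missing = len(missing) / len(articles)
```

Normalizing before comparing matters because sources may format the same ISSN differently (with or without the hyphen, or with a lowercase check character).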

Articles with no abstracts

Another interesting observation is the existence of some articles with no abstracts. In this example, the article is a poster presentation where only the full text is present. While we have not investigated the root cause, this appears to prevent the article from being indexed. In any case, the full text of the article was present in Crossref, and hence we were able to retrieve it.

Article recency

Because our search followed the date the article appears in the index (not strictly the publication date), we found some articles from past years. This could have happened, for example, if an article was re-published, or if it has only recently been added to the index.

Overall, 57% (12) of articles containing a valid adverse event were from 2020, with the remainder having publication dates between 2015 and 2019. It may still be useful to investigate these articles if they are only now being made visible to the index.
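To make the distinction between index date and publication date concrete, here is a small Python sketch. The field names `indexed_on` and `published_year` are assumptions and the records are hypothetical; only the screening window (2 to 21 November 2020) comes from the study.

```python
from collections import Counter
from datetime import date

# Screening window used in the study.
WINDOW_START = date(2020, 11, 2)
WINDOW_END = date(2020, 11, 21)


def in_window(record: dict) -> bool:
    """Select on the date the record entered the index, not on publication."""
    return WINDOW_START <= record["indexed_on"] <= WINDOW_END


# Hypothetical records, for illustration only.
records = [
    {"id": "r1", "indexed_on": date(2020, 11, 3), "published_year": 2020},
    {"id": "r2", "indexed_on": date(2020, 11, 10), "published_year": 2016},
    {"id": "r3", "indexed_on": date(2020, 10, 30), "published_year": 2020},
    {"id": "r4", "indexed_on": date(2020, 11, 21), "published_year": 2019},
]

selected = [r for r in records if in_window(r)]
by_year = Counter(r["published_year"] for r in selected)
```

In this sketch, records published in 2016 and 2019 are selected because they entered the index during the window, while a 2020 record indexed a few days earlier is not.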

Country of origin

The chart below outlines articles containing adverse events by publication country, according to publisher ISSN. The journals from the UK and the US are also indexed by PubMed/PMC, but the respective articles could not be found in PubMed’s main search engine, as discussed previously.

Conclusions - Finding More with Open Access Repositories

This evaluation compared medical literature monitoring for adverse events using three different data sources and two distinct products. After de-duplicating and screening by PV specialists, we found valuable articles in Crossref and DOAJ that would not have been found otherwise by searching only PubMed as the primary reference index.

Searching open access indexes taps into the growing trend in open academic publications and the potential to reach a wider number of publishers. This is encouraging, but at the same time searching a growing number of complementary sources is challenging: there is a large overlap of results that requires de-duplication, query strategies need to be translated and maintained in different search engines, and there will invariably be more articles to screen.

This is where biologit MLM-AI can help: integrating sources into a single database facilitates de-duplication and consistent searching. The increase in volumes accrued by searching more sources is offset by efficiencies in AI screening and the integrated workflow, translating to higher quality and a more cost-effective process.

About biologit MLM-AI

biologit MLM-AI is a complete literature screening platform built for pharmacovigilance teams. Its flexible workflow, unified scientific database, and unique AI productivity features deliver fast, inexpensive, and fully traceable results for any screening needs.