Broader reach in searching for adverse events articles - a case study with DOAJ and Crossref

Updated: May 24

By Naveen Basar and Bruno Ohana.

An efficient strategy for searching for adverse events in scientific literature should find as many relevant events as possible and maintain screening effort within reasonable levels.

Naturally, the next question is where to search. Past studies indicate that results improve when searching multiple established proprietary global literature databases. We decided to investigate databases that favor open models of scholarly publication, which are now gaining traction in the academic world. Can open access be a cost-effective way to find more adverse event results in the literature?

In this post, we investigate open access scientific literature sources to complement a mainstream index (PubMed) on medical literature monitoring of adverse events:

  • The Directory of Open Access Journals (DOAJ): an index of open access literature from publishers worldwide. It currently hosts over 5 million records.

  • Crossref: a community organization dedicated to supporting scholarly communication. The Crossref metadata spans over 120 million records, with a growing proportion being published as open abstracts.

Can Crossref and DOAJ help us find more adverse events?

We performed a simple evaluation comparing search results from DOAJ and Crossref against PubMed as the benchmark index. We used the functionality available in biologit MLM-AI to extract and de-duplicate articles, and its AI models to perform the initial screening of abstracts, flagging articles with suspected adverse events.


The evaluation followed these steps:

  • We ran searches for a sample of two reference medications, Etanercept and Clopidogrel. To ensure broad results, our search strategy included only the product synonyms.

  • Results were collected for the period from 2 November to 21 November 2020 (simulating three consecutive weeks of literature screening).

  • Retrieved abstracts were screened for adverse events: the MLM-AI model filtered articles with suspected adverse events, and a drug safety expert then reviewed each of those articles to verify whether it described a valid adverse event.

  • Articles describing valid adverse events went through a final quality check and a manual check against PubMed to ensure they were indeed unique hits.

The screening process with MLM-AI used in this study.

Screening for adverse events in biologit MLM-AI

Our findings

We de-duplicated results treating PubMed as the prime source (i.e., an article found in PubMed was ignored in the other sources). The chart below summarizes total articles found by source for the two products in our evaluation.

57 unique articles were retrieved from DOAJ and 32 from Crossref (84 in total) for this period, in addition to the 58 articles found in PubMed. Together, Crossref + DOAJ comprised about 60% of unique results.
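The PubMed-first de-duplication described above can be sketched as follows. This is a minimal illustration, not the MLM-AI implementation: the record layout (dictionaries with "doi" and "title" keys), the matching rule (normalized DOI, with normalized title as a fallback), and the order in which DOAJ and Crossref are processed after PubMed are all assumptions made for the example.

```python
import re

def normalize_doi(doi):
    """Lower-case a DOI and strip common URL or 'doi:' prefixes."""
    if not doi:
        return None
    doi = doi.strip().lower()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)
    return doi or None

def normalize_title(title):
    """Reduce a title to lower-case alphanumeric words."""
    return re.sub(r"[^a-z0-9]+", " ", (title or "").lower()).strip()

def record_keys(record):
    """Keys under which a record can match an earlier one."""
    keys = set()
    doi = normalize_doi(record.get("doi"))
    if doi:
        keys.add(("doi", doi))
    title = normalize_title(record.get("title"))
    if title:
        keys.add(("title", title))
    return keys

def deduplicate(records_by_source, priority=("pubmed", "doaj", "crossref")):
    """Keep each article only in the first source (by priority) where it appears."""
    seen = set()
    unique = {source: [] for source in priority}
    for source in priority:
        for record in records_by_source.get(source, []):
            keys = record_keys(record)
            if keys & seen:
                continue  # already found in a higher-priority source
            seen |= keys
            unique[source].append(record)
    return unique

# Hypothetical records, for illustration only
records = {
    "pubmed": [{"doi": "10.1000/ABC", "title": "Etanercept case report"}],
    "doaj": [
        {"doi": "https://doi.org/10.1000/abc", "title": "Etanercept case report"},
        {"doi": "10.1000/xyz", "title": "Clopidogrel rash"},
    ],
    "crossref": [{"doi": None, "title": "Clopidogrel: Rash"}],
}
unique = deduplicate(records)
print({source: len(items) for source, items in unique.items()})
```

Matching on title as well as DOI matters here because some records arrive without a DOI; a stricter or fuzzier rule may be more appropriate depending on metadata quality.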

Suspected and valid adverse events

Out of the unique articles retrieved from Crossref + DOAJ, 41 were flagged as articles containing suspected adverse events by biologit MLM-AI, and out of those, 20 articles contained valid adverse events for the product of interest, as determined by a drug safety specialist.
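As a quick check on these figures, the screening funnel can be recomputed from the counts reported in this post. The short sketch below uses only those counts; the variable names are ours and purely illustrative.

```python
# Counts as reported in this post
pubmed_articles = 58      # articles found in PubMed
non_pubmed_unique = 84    # unique articles from Crossref + DOAJ
flagged_by_ai = 41        # suspected adverse events flagged by MLM-AI
confirmed_valid = 20      # confirmed by a drug safety specialist

def pct(part, whole):
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)

share_non_pubmed = pct(non_pubmed_unique, non_pubmed_unique + pubmed_articles)
flag_rate = pct(flagged_by_ai, non_pubmed_unique)
confirmation_rate = pct(confirmed_valid, flagged_by_ai)

print(share_non_pubmed)    # 59.2
print(flag_rate)           # 48.8
print(confirmation_rate)   # 48.8
```

In other words, roughly half of the unique non-PubMed articles were flagged for review, and roughly half of the flagged articles were confirmed as valid.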

In total, articles containing valid adverse events found only in Crossref + DOAJ accounted for 77% of all articles with valid adverse events, with the remaining 23% found only in PubMed.

Articles containing adverse events from non-PubMed sources - what do they look like?

Journal status in PubMed and PMC

Using the journal ISSN, it is possible to look up the journal status in PubMed/PMC here. This can help determine whether the journal is or was ever known to PubMed.
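Before looking up a journal by ISSN, it can help to normalize the value and verify its check digit, since ISSNs in article metadata are often formatted inconsistently. The sketch below applies the standard ISSN check-digit rule (weights 8 down to 2 over the first seven digits, modulo 11); the function name is our own, and this step is an assumption about a sensible workflow rather than part of the study.

```python
import re

def normalize_issn(raw):
    """Return the ISSN as 'NNNN-NNNC', or None if malformed or the check digit fails."""
    compact = re.sub(r"[\s-]", "", raw or "").upper()
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        return None
    # Weighted sum of the first seven digits, weights 8, 7, ..., 2
    total = sum(int(digit) * weight for digit, weight in zip(compact[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    if compact[7] != expected:
        return None
    return f"{compact[:4]}-{compact[4:]}"

print(normalize_issn("03785955"))   # 0378-5955
print(normalize_issn("0378-5954"))  # None (wrong check digit)
```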

Out of the 21 articles marked with valid adverse events, 11 (52%) came from journals whose ISSN is not known to PubMed. For the remaining articles, ISSNs were known to PubMed, with varying indexing status.

One reason indexed journals may not have their content visible is selective publication into PubMed. In the example of this article, the journal appears to follow selective (NIH portfolio) publication presently, as indicated here.

Articles with no abstracts

Another interesting observation is the existence of some articles with no abstracts. In this example, the article is a poster presentation where only the full text is present. While we have not investigated the root cause, this appears to prevent the article from being indexed. In any case, the full text of the article was present in Crossref, and hence we were able to retrieve it.

Article recency

Because our search followed the date the article appeared in the index (not strictly the publication date), we found some articles from past years. This could have happened, for example, if an article was re-published, or if it was only recently added to the index.
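The distinction between index date and publication date can be made concrete with the public Crossref REST API, which offers separate filters for each. The sketch below only builds the request URL and does not call the API. It is an illustration under our own assumptions (helper name, single free-text search term, page size) and not necessarily the query used in this study.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"

def build_crossref_url(term, start, end, date_field="index", rows=100):
    """Build a Crossref /works URL filtered by index date or publication date.

    date_field="index" uses from-index-date/until-index-date;
    date_field="pub" uses from-pub-date/until-pub-date.
    """
    if date_field not in ("index", "pub"):
        raise ValueError("date_field must be 'index' or 'pub'")
    date_filter = f"from-{date_field}-date:{start},until-{date_field}-date:{end}"
    params = {"query": term, "filter": date_filter, "rows": rows}
    return f"{CROSSREF_WORKS}?{urlencode(params)}"

# Same three-week window, filtered two different ways
by_index_date = build_crossref_url("etanercept", "2020-11-02", "2020-11-21")
by_pub_date = build_crossref_url("etanercept", "2020-11-02", "2020-11-21", date_field="pub")
print(by_index_date)
print(by_pub_date)
```

Filtering by index date returns everything that became visible in the window, including older publications; filtering by publication date would miss those.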

Overall, 57% (12) of articles containing a valid adverse event were from 2020, with the remainder having publication dates between 2015 and 2019. It may still be useful to investigate these older articles if they have only now become visible in the index.

Country of origin

The chart below outlines articles containing adverse events by publication country, according to the publisher ISSN. The journals from the UK and US are also indexed by PubMed/PMC, but the respective articles could not be found in PubMed’s main search engine, as discussed previously.

Conclusions - Finding More with Open Access Repositories

This evaluation compared medical literature monitoring for adverse events using three different data sources and two distinct products. After de-duplicating and screening by PV specialists, we found valuable articles in Crossref and DOAJ that would not have been found otherwise by searching only PubMed as the primary reference index.

Searching indexes such as Crossref and DOAJ taps into the growing trend in open academic publications and offers the potential to reach a wider range of publishers. This is encouraging, but at the same time searching a growing number of complementary sources is challenging: results overlap heavily and require de-duplication, query strategies need to be translated and maintained across different search engines, and there will invariably be more articles to screen.

This is where biologit MLM-AI can help: integrating sources into a single database facilitates de-duplication and consistent searching. The increase in volume that comes from searching more sources can be offset by efficiencies in AI screening, translating to a higher-quality and more cost-effective process.

Learn more

To learn more about biologit MLM-AI and how it can help your medical literature search needs: