Broader reach in searching for adverse events articles - a case study with DOAJ and Crossref

Updated: Mar 23

By Naveen Basar and Bruno Ohana.

An efficient strategy for searching for adverse events in scientific literature should find as many relevant events as possible and maintain screening effort within reasonable levels.

Naturally, finding more adverse events is directly related to the question of where to search. Past studies suggest results do improve when searching multiple established proprietary global literature databases. We decided to investigate databases that favor open models of scholarly publication, which are now gaining traction in the academic world. Can they be a cost-effective way to find more adverse events in the literature?

In this post, we investigate the use of alternative scientific literature sources to complement searching for adverse events on a mainstream index (PubMed). In particular we explored:

  • The Directory of Open Access Journals (DOAJ) indexes academic literature with an open access license from publishers worldwide. It currently hosts over 5 million records.

  • Crossref: a community organization dedicated to supporting scholarly communication by generating metadata and providing services for content discoverability. The Crossref metadata spans over 120 million records, with a growing proportion being published as open abstracts.

Can these databases help us find more adverse events?

We performed a simple evaluation comparing search results from DOAJ and Crossref against PubMed as the benchmark index. To help in this effort, we used biologit MLM-AI’s infrastructure to extract and de-duplicate articles from these three sources, and to perform the initial screening of abstracts for adverse events.


  • Produce searches for a sample of two reference medications - Etanercept and Clopidogrel. To ensure broad results, our search strategy only included the product synonyms.

  • Results were produced for the periods of 2 November to 21 November, 2020 (simulating three consecutive weeks of literature screening).

  • Retrieved abstracts were screened for adverse events: the MLM-AI model was used to filter for suspected adverse events. Articles containing suspected adverse events then underwent further review by a drug safety expert, who verified whether each article described a valid adverse event.

  • For articles describing valid adverse events, a final quality check and manual check against PubMed is done to ensure these were indeed unique hits.
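As a rough illustration of the first two steps, the sketch below builds a query against the public Crossref REST API for one product over the study window. This is not a description of how MLM-AI retrieves articles, which this post does not cover. The helper name `crossref_query_url`, the choice of the free-text `query` parameter, and the synonym list passed in are all assumptions made for the example.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_query_url(synonyms, start, end, rows=100):
    """Build a Crossref /works URL for one product and one index-date window.

    Hypothetical helper: `start` and `end` are ISO dates (YYYY-MM-DD).
    """
    params = {
        # Free-text query over the product synonyms only, to keep results broad.
        "query": " ".join(synonyms),
        # Filter on the date the record was indexed, not the publication date.
        "filter": f"from-index-date:{start},until-index-date:{end}",
        "rows": rows,
    }
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


# Illustrative synonym list (generic name plus one brand name) and the
# study window mentioned above.
url = crossref_query_url(["etanercept", "Enbrel"], "2020-11-02", "2020-11-21")
print(url)
```

Filtering on the index date rather than the publication date mirrors a weekly monitoring routine, where the question is what became visible since the last search.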

The screening process with MLM-AI used in this study.

Screening for adverse events in biologit MLM-AI

Our findings

We de-duplicated results considering PubMed as the prime source (i.e., if an article was found in PubMed, it was ignored in the other sources). The chart below summarizes total articles found by source for the two products in our evaluation.

57 unique articles were retrieved from DOAJ and 32 from Crossref (84 in total) for this period, in addition to the 58 articles found in PubMed. Together, Crossref + DOAJ comprised 60% of unique results.
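A PubMed-first de-duplication of this kind can be sketched as follows. This is a minimal illustration, not the MLM-AI implementation: the record fields (`doi`, `title`), the matching rule (normalized DOI, falling back to a normalized title), and the ordering of DOAJ before Crossref are assumptions made for the example.

```python
import re


def normalize_doi(doi):
    """Lower-case a DOI and strip common URL or 'doi:' prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def dedupe_key(record):
    """Prefer the DOI as the identity of an article; fall back to the title."""
    doi = record.get("doi")
    if doi:
        return ("doi", normalize_doi(doi))
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return ("title", title)


def deduplicate(records_by_source, priority=("pubmed", "doaj", "crossref")):
    """Keep each article only under the first source (in priority order) that has it."""
    seen = set()
    unique = {source: [] for source in priority}
    for source in priority:
        for record in records_by_source.get(source, []):
            key = dedupe_key(record)
            if key in seen:
                continue  # already counted under a higher-priority source
            seen.add(key)
            unique[source].append(record)
    return unique


# Made-up records, purely to show the behaviour.
sample = {
    "pubmed": [{"doi": "10.1000/a", "title": "Article A"}],
    "doaj": [
        {"doi": "https://doi.org/10.1000/A", "title": "Article A"},
        {"doi": "10.1000/b", "title": "Article B"},
    ],
    "crossref": [
        {"doi": "10.1000/b", "title": "Article B"},
        {"title": "Poster: Article C"},
    ],
}
unique = deduplicate(sample)
print({source: len(items) for source, items in unique.items()})
```

One limitation of this simple key: a record that carries a DOI in one source but not in another will not be matched, which is one reason a manual check against PubMed remains useful.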

Suspected and valid adverse events

Out of the unique articles retrieved from Crossref + DOAJ, 41 were flagged as articles containing suspected adverse events by biologit MLM-AI, and out of those, 20 articles contained valid adverse events for the product of interest, as determined by a drug safety specialist.

In total, articles found only in Crossref + DOAJ accounted for 77% of all articles containing valid adverse events, with the remaining 23% found in PubMed.

Articles containing adverse events from non-PubMed sources - what do they look like?

Journal status in PubMed and PMC

Using the journal ISSN, it is possible to look up the journal status in PubMed/PMC here. This can help determine whether the journal is, or ever was, known to PubMed.
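ISSNs taken from publisher metadata are not always cleanly formatted, so it can help to normalize and validate them before any lookup. The sketch below checks the standard ISSN check digit (weights 8 down to 2 over the first seven digits, modulo 11, with "X" standing for 10). The helper name `normalize_issn` is hypothetical, and the function does not perform the PubMed/PMC lookup itself.

```python
import re


def normalize_issn(raw):
    """Return the ISSN as 'NNNN-NNNC', or None if it is malformed."""
    compact = re.sub(r"[\s-]", "", raw).upper()
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        return None
    # Weighted sum of the first seven digits, weights 8, 7, ..., 2.
    total = sum(int(d) * w for d, w in zip(compact[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    if compact[7] != expected:
        return None
    return f"{compact[:4]}-{compact[4:]}"


print(normalize_issn("03785955"))
print(normalize_issn("0378-5954"))
```

Rejecting malformed values up front avoids mistaking a typo in the metadata for a journal that is unknown to PubMed.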

Out of the 21 articles marked with valid adverse events, 11 (52%) came from journals whose ISSN is not known to PubMed. For the remaining articles, ISSNs were known to PubMed, with varying indexing status.

One reason indexed journals may not have their content visible is selective publication into PubMed. In the example of this article, the journal appears to follow selective (NIH portfolio) publication presently, as indicated here.

Articles with no abstracts

Another interesting observation is the existence of some articles with no abstracts. In this example, the article is a poster presentation where only the full text is present. While we have not investigated the root cause, this appears to prevent the article from being indexed. In any case, the full text of the article was present in Crossref, and hence we were able to retrieve it.

Article recency

Because our search followed the date the article appears in the index (not strictly the publication date), we found some articles from past years. This could happen, for example, if an article was re-published or was only recently added to the index.

Overall, 57% (12) of the articles containing a valid adverse event were from 2020, with the remainder published between 2015 and 2019. These older articles may still be worth investigating if they have only now become visible in the index.

Country of origin

The chart below outlines articles containing adverse events by publication country, according to publisher ISSN. The journals from UK and US are also indexed by PubMed/PMC, but the respective articles could not be found in PubMed’s main search engine, as discussed previously.


This evaluation compared medical literature monitoring for adverse events using three different data sources and two distinct products. After de-duplicating and screening by PV specialists, we found valuable articles in Crossref and DOAJ that would not have been found otherwise by searching only in a primary reference index: PubMed.

Searching indexes such as Crossref and DOAJ taps into the growing trend in open academic publications and the potential to reach a wider number of publishers. This is encouraging, but at the same time searching a growing number of complementary sources is challenging: there is a large overlap of results that requires de-duplication, query strategies need to be translated and maintained in different search engines, and there will invariably be more articles to screen.

We believe these problems can be addressed with the right technology: integrating sources into a single database facilitates de-duplication and consistent searching. The increase in volume from searching more sources can be offset by efficiencies in AI screening, translating into a higher-quality and more cost-effective process. These principles are behind the design of our solution, biologit MLM-AI, and were applied in practice as part of this study.

Learn more

To learn more about biologit MLM-AI and how it can help your medical literature search needs, get in touch with us and request a demo.