Analysis shows strengths and potential of open bibliometric data compared to Scopus and Web of Science

Author: Jack Culbert
Affiliation: GESIS - Leibniz Institute for the Social Sciences
Published: June 10, 2025

Members of the KB recently published a paper in Scientometrics (Culbert et al. 2025) presenting a large-scale comparison of three of the bibliometric databases provided in the KB: the Web of Science (WoS), Scopus and OpenAlex.

In this paper, we matched records classified as articles and published between 2015 and 2022 across the three databases by DOI, excluding from each database any article whose DOI was shared with another record in that database. This gave us a “shared” corpus of 16,788,282 records (out of roughly 71M in WoS, 65M in Scopus and 243M in OpenAlex).

[Figure: Venn diagram of the deduplicated match between WoS, Scopus and OpenAlex]
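The matching step described above can be sketched roughly as follows. This is a minimal illustration and not the paper's actual pipeline: the record layout (dictionaries with a `doi` key), the helper names `unique_dois` and `shared_corpus`, and the lowercase normalisation of DOIs are all assumptions made for the example, and the filtering by document type and publication year is omitted.

```python
from collections import Counter

def unique_dois(records):
    """Return the DOIs that occur exactly once in one database.

    DOIs are lowercased before counting (an assumption of this sketch;
    DOIs are case-insensitive). Records without a DOI are skipped.
    """
    counts = Counter(r["doi"].strip().lower() for r in records if r.get("doi"))
    return {doi for doi, n in counts.items() if n == 1}

def shared_corpus(*databases):
    """Intersect the unambiguous DOIs of every database."""
    return set.intersection(*(unique_dois(db) for db in databases))

# Toy records (hypothetical, not taken from the paper).
wos = [{"doi": "10.1/a"}, {"doi": "10.1/b"}, {"doi": "10.1/c"}]
scopus = [{"doi": "10.1/A"}, {"doi": "10.1/b"}, {"doi": "10.1/b"}]  # 10.1/b is duplicated
openalex = [{"doi": "10.1/a"}, {"doi": "10.1/b"}, {"doi": "10.1/d"}, {"doi": None}]

shared = shared_corpus(wos, scopus, openalex)
print(sorted(shared))  # ['10.1/a']
```

In this toy example `10.1/b` is dropped because Scopus holds two records with that DOI, so only `10.1/a` survives in all three databases.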

We then focused on calculating the reference coverage in each database, that is, the proportion of the works referenced by an article that are themselves available in that database. Our results are as follows:

| | WoS | Scopus | OpenAlex |
|---|---|---|---|
| Whole Corpus | | | |
| Reported Average Reference Count | 24.765 | 31.254 | n/a |
| Pre-calculated Average Source Reference Count | 16.867 | 18.692 | 7.572 |
| Internal Coverage | 68.1% | 59.8% | n/a |
| Shared Corpus (2015–2022), All References | | | |
| Reported Average Reference Count | 43.185 | 43.320 | n/a |
| Pre-calculated Average Source Reference Count | 33.416 | 33.363 | 34.863 |
| Internal Coverage | 77.4% | 77.0% | n/a |
| Shared Corpus (2015–2022), References 1996–2022 | | | |
| Calculated Average Reference Count | 38.226 | 38.062 | n/a |
| Calculated Average Source Reference Count | 31.207 | 33.359 | 31.823 |
| Internal Coverage | 81.6% | 87.6% | n/a |
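The internal coverage percentages in the table above are consistent with dividing the average source reference count by the average reference count. The short check below reproduces them from the tabulated averages. It is a sketch under that assumption (the paper may compute coverage differently, for example per article), and the function name `internal_coverage` is ours.

```python
def internal_coverage(source_refs, total_refs):
    """Share of an average reference list that is indexed in the same database."""
    return source_refs / total_refs

# (average source reference count, average reference count), copied from the table.
table = {
    ("Whole corpus", "WoS"): (16.867, 24.765),
    ("Whole corpus", "Scopus"): (18.692, 31.254),
    ("Shared, all references", "WoS"): (33.416, 43.185),
    ("Shared, all references", "Scopus"): (33.363, 43.320),
    ("Shared, references 1996-2022", "WoS"): (31.207, 38.226),
    ("Shared, references 1996-2022", "Scopus"): (33.359, 38.062),
}

coverage = {key: internal_coverage(src, tot) for key, (src, tot) in table.items()}
for (corpus, db), value in coverage.items():
    print(f"{corpus:<30}{db:<8}{value:.1%}")
```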

Firstly, OpenAlex does not hold a total reference count in its database, as it only reports works inside OpenAlex as references; we have termed this a source reference count. This also prevents us from calculating its internal coverage directly. However, if we assume that the fairly similar reference counts in WoS and Scopus are accurate, we can divide the OpenAlex source reference count for the Shared Corpus with references between 1996 and 2022 (31.823) by the corresponding WoS and Scopus reference counts (38.226 and 38.062 respectively).

We can therefore infer from our assumption that the reference coverage of OpenAlex is somewhere between 83.2% and 83.6% for articles within the Shared Corpus and with references between 1996 and 2022.
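The inferred range can be reproduced from the table values for the Shared Corpus with references between 1996 and 2022. The sketch below uses the WoS and Scopus reference counts as stand-ins for the unknown OpenAlex total, which is exactly the assumption stated above; the variable names are illustrative.

```python
# Average source reference count in OpenAlex (Shared Corpus, references 1996-2022).
openalex_source_refs = 31.823

# Average reference counts from the other two databases, used as proxy denominators.
proxy_reference_counts = {"WoS": 38.226, "Scopus": 38.062}

estimates = {db: openalex_source_refs / n for db, n in proxy_reference_counts.items()}
low, high = min(estimates.values()), max(estimates.values())

print(f"Estimated OpenAlex reference coverage: {low:.1%} to {high:.1%}")
# Estimated OpenAlex reference coverage: 83.2% to 83.6%
```

The lower bound comes from the WoS denominator and the upper bound from the slightly smaller Scopus denominator.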

We also wished to check that the figures reported by the providers are accurate, so we calculated the ratio of references per record across the whole of each corpus, with the following results:

| Whole Corpus | WoS | Scopus | OpenAlex |
|---|---|---|---|
| Ratio of References per Record | 24.765 | 30.979 | 7.592 |
| Reported Average Total Reference Count | 24.765 | 31.254 | n/a |
| Reported Average Source Reference Count | 16.867 | 18.692 | 7.572 |

This indicates that caution may be required when utilising Scopus’ reported reference counts. The discrepancy was lower, however, for a similar calculation on articles only.
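The size of the discrepancy can be read off the whole-corpus figures directly. The sketch below compares each computed ratio of references per record with the average the provider reports. For OpenAlex the comparison is made against the source reference count, on the assumption that the two are comparable because OpenAlex records only in-database references. The helper names and the toy corpus are hypothetical.

```python
def references_per_record(total_references, total_records):
    """Corpus-wide ratio: all reference links divided by all records."""
    return total_references / total_records

def discrepancy(computed, reported):
    """Return the (absolute, relative) gap between a reported and a computed average."""
    gap = reported - computed
    return gap, gap / reported

# Hypothetical toy corpus, only to show the ratio itself: 300 references over 12 records.
toy_ratio = references_per_record(300, 12)

# Whole-corpus figures from the table: (computed ratio, provider-reported average).
figures = {
    "WoS": (24.765, 24.765),
    "Scopus": (30.979, 31.254),
    "OpenAlex": (7.592, 7.572),  # compared with the source reference count (assumption)
}

gaps = {db: discrepancy(computed, reported) for db, (computed, reported) in figures.items()}
for db, (absolute, relative) in gaps.items():
    print(f"{db:<9}{absolute:+.3f} ({relative:+.2%})")
```

On these figures the WoS values agree exactly, while the reported Scopus average exceeds the computed ratio by 0.275 references per record, a little under 1%.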

We also performed additional analysis on the corpus, examining other metadata aggregated by journal, which showed that:

  1. The per-journal distribution of reference coverage in OpenAlex compares similarly against both WoS and Scopus, implying that the reason for the differing reference coverage is independent of the database. The distributions of reference coverage in WoS and Scopus are also similar to each other.
  2. Article funding information is better captured in both WoS and Scopus than in OpenAlex.
  3. Open Access information is similarly captured in all databases.
  4. ORCID identifiers are much better captured in OpenAlex.
  5. Abstracts are better covered in WoS and Scopus than in OpenAlex.

In summary, we believe we have been able to identify some interesting differences and similarities between OpenAlex, WoS and Scopus, demonstrating that while OpenAlex captures a larger remit of records, it still performs comparably on reference coverage on a modern and “curated” dataset similar to WoS and Scopus. Furthermore, we have shown other aspects of the record metadata to be equivalent.

References

Culbert, J., A. Hobert, N. Jahn, N. Haupka, M. Schmidt, P. Donner, and P. Mayr. 2025. “Reference Coverage Analysis of OpenAlex Compared to Web of Science and Scopus.” Scientometrics. https://doi.org/10.1007/s11192-025-05293-3.

Citation

BibTeX citation:
@online{culbert2025,
  author = {Culbert, Jack},
  title = {Analysis Shows Strengths and Potential of Open Bibliometric
    Data Compared to {Scopus} and {Web} of {Science}},
  date = {2025-06-10},
  url = {http://www.open-bibliometrics.de/posts/20250610-OpenAlexCoveragePaper/},
  langid = {en}
}
For attribution, please cite this work as:
Culbert, Jack. 2025. “Analysis Shows Strengths and Potential of Open Bibliometric Data Compared to Scopus and Web of Science.” June 10, 2025. http://www.open-bibliometrics.de/posts/20250610-OpenAlexCoveragePaper/.