Address Information in OpenAlex, Web of Science, and Scopus: First Insights
Introduction
Bibliometric analysis often focuses on the organizational level, with applications in quantitative science studies, research evaluation, and monitoring activities. To identify an organization's publications accurately, it is essential to link the address information found in bibliometric databases to real organizations. Past experience has shown that the data quality of the two major proprietary databases, Web of Science (WoS) and Scopus, is lacking in several respects. Against this background, the Bibliometrics Working Group at Bielefeld University has carried out address disambiguation, assigning German address records from WoS and Scopus to actual research institutions (Winterhager, Schwechheimer, and Rimmert (2014); Rimmert, Schwechheimer, and Winterhager (2017)). The procedure relies on regular expression patterns and continuously accounts for structural changes in the institutional landscape over time, enabling the unambiguous assignment of address information. All components necessary to perform address disambiguation for German research institutions have recently been published (Lenke and Taubert (2025)).
With the emergence of OpenAlex (Priem, Piwowar, and Orr (2022)), a promising open bibliometric database has entered the field. The OpenBib project, conducted by partners of the German Kompetenznetzwerk Bibliometrie (Schmidt et al. (2025)), aims to explore its potential, develop curation procedures, and publish OpenAlex snapshots with curated data on publications involving German research institutions (Haupka et al. (2025)). For the curation of address information, the address disambiguation methods proven in proprietary bibliometric databases have been adapted and optimized. Before the disambiguation procedure was adapted, the characteristics of address information in OpenAlex were explored and compared in two steps. First, the structure of this information was analysed and compared with the two established bibliometric databases, WoS and Scopus, based on a relatively small sample. Second, for more recent publications, address information was explored and compared based on all publications with a DOI covered by the three databases. This blog post gives an overview of the main results.
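To make the idea of pattern-based, time-aware assignment more concrete, the following is a minimal sketch and not the published procedure. The institutions, patterns, and validity years are hypothetical and chosen only to illustrate how a structural change (here, an invented merger in 2010) could be reflected in the rules.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    pattern: re.Pattern   # regular expression applied to the raw address string
    institution: str      # institution the address is assigned to
    valid_from: int       # first publication year the rule applies to
    valid_to: int         # last publication year the rule applies to


# Hypothetical rules: "Nordstadt University" is assumed to have been merged
# into "Nordstadt Institute of Technology" in 2010. Neither institution nor
# pattern is taken from the published pattern set.
RULES = [
    Rule(re.compile(r"\bnordstadt\s+univ", re.I),
         "Nordstadt University", 0, 2009),
    Rule(re.compile(r"\bnordstadt\s+(?:univ|inst\w*\s+(?:of\s+)?technol)", re.I),
         "Nordstadt Institute of Technology", 2010, 9999),
]


def assign(address: str, year: int) -> Optional[str]:
    """Return the institution whose rule matches the address in the given year."""
    for rule in RULES:
        if rule.valid_from <= year <= rule.valid_to and rule.pattern.search(address):
            return rule.institution
    return None  # address stays unassigned
```

In this sketch the same address string, for example `"Nordstadt Univ, Dept Phys, Germany"`, is assigned to different institutions depending on the publication year, which is the kind of behaviour a time-aware rule set needs.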
Results from the small sample exploration
To examine the structure and characteristics of OpenAlex address data, and to draw lessons for adapting the address disambiguation procedure, a small random sample was constructed. It consists of 5,000 Digital Object Identifiers (DOIs) that are covered by WoS, Scopus, and OpenAlex. For each of the three databases, the address–document combinations related to these DOIs were retrieved. The comparison of the three subsets reveals the following noteworthy aspects:
- Normalization of address–document combinations: The number of distinct address–document combinations is considerably smaller in OpenAlex than in the other two databases. For the sample, WoS provides 10,926 distinct address–document combinations, Scopus 11,808, and OpenAlex only 8,511. The substantially smaller number of distinct address–document combinations in OpenAlex results from the normalization efforts undertaken by the OpenAlex team.
- Length of the address string: There are considerable differences in the length of address strings across the databases. The average length within the sample is 64 characters for WoS, 94 for Scopus, and 104 for OpenAlex. Similar differences were observed for the maximum string length: the longest string has 175 characters in the WoS sample, 333 in the Scopus sample, and 623 in the OpenAlex sample.
- Average number of address–document combinations: The average number of address–document combinations also differs across the databases. In the 5,000 DOI sample, the average number of addresses per document is 2.19 for WoS and 2.36 for Scopus, but considerably lower for OpenAlex, with an average of 1.70 addresses per DOI.
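The indicators in the list above can be computed along the following lines. This is a sketch under assumptions: the records are hypothetical toy data, and it is assumed that string lengths are measured over distinct address–document combinations, which the text does not specify. The last lines only check that the reported averages follow from the reported counts and the sample size of 5,000 DOIs.

```python
from statistics import mean

# Hypothetical toy records: (doi, address_string)
toy_records = [
    ("10.1000/a", "Univ A, Dept Phys, City A, Germany"),
    ("10.1000/a", "Univ A, Dept Phys, City A, Germany"),  # exact duplicate
    ("10.1000/a", "Inst B, City B, Germany"),
    ("10.1000/b", "Univ A, Dept Phys, City A, Germany"),
]


def address_stats(records, n_dois):
    """Summarize distinct address-document combinations for one database."""
    distinct = set(records)  # drop exact duplicates
    lengths = [len(address) for _, address in distinct]
    return {
        "distinct_combinations": len(distinct),
        "mean_length": mean(lengths),
        "max_length": max(lengths),
        "addresses_per_doi": len(distinct) / n_dois,
    }


toy_stats = address_stats(toy_records, n_dois=2)

# Consistency check on the figures reported for the 5,000 DOI sample.
reported_counts = {"WoS": 10_926, "Scopus": 11_808, "OpenAlex": 8_511}
averages = {db: round(n / 5_000, 2) for db, n in reported_counts.items()}
```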
Full database comparison (2019-2024)
For publications with a DOI and address information, the publication years 2019 to 2024 were analysed. To prepare the analysis, the address disambiguation procedure was applied to the address strings of all three databases. Selected performance indicators of the procedure are reported in Table 1.
Based on the disambiguated address data, in which addresses are assigned to institutions, an identifier was created for every distinct institution–DOI combination. This identifier made it possible to determine which institution–DOI combinations are covered by each database. For the comparison of the three databases, the following indicators were calculated: the number of DOIs and the number of institution–document combinations. The results are reported in Table 2.
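One way to build such an identifier and derive the two indicators is sketched below. The identifier format (institution ID and lower-cased DOI joined by `|`) and the institution IDs are assumptions made for illustration; the post does not describe the actual format. Lower-casing is used because DOIs are case-insensitive.

```python
def combo_id(institution_id: str, doi: str) -> str:
    """Build an identifier for one institution-DOI combination (assumed format)."""
    return f"{institution_id}|{doi.strip().lower()}"


def indicators(assignments):
    """Count distinct DOIs and distinct institution-DOI combinations.

    `assignments` is an iterable of (institution_id, doi) pairs, i.e. the
    output of the disambiguation step for one database.
    """
    combos = {combo_id(inst, doi) for inst, doi in assignments}
    dois = {cid.split("|", 1)[1] for cid in combos}
    return {"dois": len(dois), "combinations": len(combos)}


# Hypothetical toy output of the disambiguation step for one database.
toy_assignments = [
    ("inst_1", "10.1000/A"),
    ("inst_1", "10.1000/a"),  # same combination, different DOI casing
    ("inst_2", "10.1000/a"),
    ("inst_1", "10.1000/b"),
]
toy_indicators = indicators(toy_assignments)
```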
Moreover, the overlap between the three databases was investigated, both for the DOIs covered and for the related institution–document combinations.
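An overlap analysis of this kind amounts to counting how many items fall into each region of a three-set Venn diagram. The following sketch shows one possible way to do this with plain Python sets; the identifiers are hypothetical toy data, and the same function works for DOIs or for institution–DOI identifiers.

```python
from collections import Counter


def venn_regions(sets: dict[str, set[str]]) -> Counter:
    """Count items per region, keyed by the databases that contain them."""
    names = sorted(sets)
    counts = Counter()
    for item in set().union(*sets.values()):
        # The region is the tuple of all databases covering this item.
        region = tuple(name for name in names if item in sets[name])
        counts[region] += 1
    return counts


# Hypothetical toy coverage; the letters stand in for DOIs or
# institution-DOI identifiers.
toy_coverage = {
    "wos": {"a", "b"},
    "scopus": {"b", "c"},
    "openalex": {"a", "b", "c", "d"},
}
toy_regions = venn_regions(toy_coverage)
```

Keying the counts by the tuple of covering databases keeps all seven possible regions in a single structure, so shares such as "covered by all three" or "covered by OpenAlex only" can be read off directly.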
Conclusion
Within the OpenBib project, several lessons were learned about address information in OpenAlex and the application of established curation procedures:
First, the structure of OpenAlex’s address information supports data curation through the address disambiguation procedure. However, as the average length of the address strings indicates, this procedure takes longer to run on OpenAlex data than on proprietary databases. At the time of the analysis, runtimes for proprietary databases already ranged between 5 and 12 days, making further performance optimization necessary. In response, our partners at FIZ Karlsruhe re-implemented the address disambiguation procedure, which drastically reduced the runtime. The new implementation is now publicly available (Lenke and Taubert (2025)).
Second, the results of the address disambiguation procedure show that the proportion of assigned addresses in OpenAlex is lower than in proprietary databases, though still very promising. This ratio is expected to improve once the disambiguation procedure is optimized for OpenAlex.
Third, the full database comparison shows that OpenAlex is larger than the two proprietary databases and also includes most of their DOIs and institution–document combinations. Compared with the small random sample, the average number of institution–document combinations suggests that coverage of address information is more complete in recent publication years than in earlier ones.
To sum up, address information in OpenAlex appears to be evolving. As this process continues, we may witness the emergence of an open alternative source for bibliometric data, after six decades of dominance by proprietary players.
References
Citation
@online{lenke2025,
author = {Lenke, Christopher and Taubert, Niels},
title = {Address {Information} in {OpenAlex,} {Web} of {Science,} and
{Scopus:} {First} {Insights}},
date = {2025-09-23},
url = {http://www.open-bibliometrics.de/posts/20250923-AddressInsights/},
langid = {en}
}