
Data Processing Information

The aim of Nature Navigator is to provide detailed analysis and insights on individual research topics in order to aid your strategic and tactical decision making. Below is an explanation of how Springer Nature processes the data from different raw data sources. It also explains how to contact us if you find a mistake in our data or would like to request the rectification or removal of information on this website that conflicts with your legitimate personal interests.

I. Data Sources

How does Nature Navigator source data?

Nature Navigator does not actively source data. We process different raw data sources in order to provide the information you can find on topic pages. Which data source is used for a specific topic can be subject to individual licence terms. The available data sources include, but are not limited to, the following:

CrossRef
Dimensions from Digital Science
OpenAlex
Lens.org
Internal and proprietary customer data

The raw data on which Nature Navigator insights are based is a derivative of standard bibliographic metadata of scientific content. In order to provide meaningful and accurate summaries of current research landscapes, we evaluate and process metadata from the above sources. This metadata contains information about individuals, e.g. authors of scientific content and their affiliations. You might therefore appear in topic overviews if you have authored one or more content pieces.

What data attributes is Nature Navigator using?

In order to summarise and analyse content for a specific research landscape, we process several different metadata attributes of scientific content, including but not limited to the following:

Metadata around the publication of content

DOI
ISSN
ISBN
Journal name
Book series
Publisher name
Publication dates
Preprint platform

Metadata around the authorship of content

Author names (this can be you)
Author identifiers, like ORCID
Contribution status, like being the corresponding author
Author affiliations

Metadata describing the content

Concepts
Domain classifications like research field

Metadata around the reception of content by the community

Citation links to other content
Reference links to other content
Local and global impact metrics of individual content, such as online mentions

Why are the above attributes important and relevant?

Scientific fields are shaped and influenced by research and, more importantly, by the researchers conducting it. In order to summarise how a certain research field or topic is evolving, and to be able to predict how it might evolve in future, it is therefore key to look at who is contributing what to the research landscape. Our aim is to provide our users with an objective understanding of the state of certain fields.

II. Data Process

There are two main steps in creating topic overviews and summaries for Nature Navigator:

Identification and selection of relevant content
Aggregation and summarisation of relevant content

How relevant scientific content is selected

The selection of relevant content is based on selection criteria specified by the creator of a topic. The creator can be an editor, data specialist or scientist at Springer Nature, or an individual end user of the platform. There are two basic modes of content selection.

Direct content selection

Direct content selection refers to defining a set of metadata-related criteria that are used to filter all available content. These criteria are directly related to the attributes we process, which are described in "What data attributes is Nature Navigator using?". The following are a few examples of possible direct selection criteria, written in non-technical terms:

Available scientific publications from the last 5 years that mention at least one of the following concepts: wind energy, solar energy, hydropower
A predefined list of publication identifiers
Content authored by researchers affiliated with institutions in a certain country. A selection like this can be useful if the aim is to analyse the research landscape of a certain country or research organisation.

In general, we avoid selection bias based on reception or impact metrics, i.e. there is no up-front exclusion of content that is published in journals with lower impact.
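
The direct selection criteria listed above can be sketched as a simple metadata filter. This is a minimal illustration only, not Nature Navigator's implementation: the Publication record, its field names and the select_direct function are hypothetical, and the DOIs are placeholders. The filter deliberately has no reception or impact criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    """Hypothetical record; field names are illustrative only."""
    doi: str
    year: int
    concepts: frozenset
    countries: frozenset  # countries of the author affiliations


def select_direct(publications, *, since_year=None, any_concept=None,
                  doi_list=None, country=None):
    """Keep publications that satisfy every criterion that was supplied."""
    selected = []
    for pub in publications:
        if since_year is not None and pub.year < since_year:
            continue  # too old
        if any_concept is not None and not (pub.concepts & set(any_concept)):
            continue  # mentions none of the requested concepts
        if doi_list is not None and pub.doi not in doi_list:
            continue  # not on the predefined identifier list
        if country is not None and country not in pub.countries:
            continue  # no author affiliated with the requested country
        selected.append(pub)
    return selected


# Placeholder data for illustration.
corpus = [
    Publication("10.0000/a", 2023, frozenset({"wind energy"}), frozenset({"DE"})),
    Publication("10.0000/b", 2015, frozenset({"solar energy"}), frozenset({"US"})),
    Publication("10.0000/c", 2022, frozenset({"genomics"}), frozenset({"DE"})),
]

renewables = select_direct(
    corpus,
    since_year=2020,
    any_concept={"wind energy", "solar energy", "hydropower"},
)
print([pub.doi for pub in renewables])  # ['10.0000/a']
```

Passing only country="DE" would instead return the first and third records, which corresponds to the country-level analysis mentioned above.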

Indirect content selection

Another, more powerful way for our creators to select content is indirect selection. In this case, example content from the desired field of interest is used to select similar content using state-of-the-art classification techniques. Creators provide sample content as well as the type of similarity they would like the machine to use. Indirect selection via similarity is in principle more susceptible to biases arising from the choices the creator made while seeding the process. Please see "Improper selection criteria" for more information.
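
To make the idea of similarity-based selection concrete, the sketch below selects candidates whose concept sets overlap with seed content. The classification techniques actually used are not described here, so this example substitutes plain Jaccard similarity as a stand-in; the function names, the threshold and the sample data are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two concept sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_similar(candidates, seeds, threshold=0.5):
    """Return ids of candidates that resemble at least one seed.

    `candidates` and `seeds` map an identifier to a set of concepts.
    """
    selected = []
    for ident, concepts in candidates.items():
        best = max((jaccard(concepts, s) for s in seeds.values()), default=0.0)
        if best >= threshold:
            selected.append(ident)
    return selected


# Placeholder data for illustration.
seeds = {"seed-1": {"wind energy", "turbine", "offshore"}}
candidates = {
    "cand-1": {"wind energy", "turbine", "grid"},  # 2 shared of 4 distinct = 0.5
    "cand-2": {"protein folding", "genomics"},     # nothing shared = 0.0
}
print(select_similar(candidates, seeds))  # ['cand-1']
```

The example also shows why seeding matters: with a single narrow seed, only content close to that seed is selected, which is the bias risk noted above.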

How relevant scientific content is analysed

Different levels of analysis are performed on the content selected above.

Statistical analysis

We offer standard ways of aggregating the sum of all content into meaningful charts for analysis. The simplest example of a statistical analysis could be an aggregation of all relevant content based on one or more metadata criteria, e.g. the publication output in the topic per year.
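
As a minimal sketch of such an aggregation, the publication output per year can be counted as follows. The function name and the sample years are illustrative assumptions.

```python
from collections import Counter


def output_per_year(publication_years):
    """Count publications per year and return the counts in chronological order."""
    counts = Counter(publication_years)
    return dict(sorted(counts.items()))


# Publication years of the selected content (placeholder data).
years = [2021, 2022, 2022, 2023, 2021, 2022]
print(output_per_year(years))  # {2021: 2, 2022: 3, 2023: 1}
```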

Relationship and network analysis

Certain metadata enable the creation of relationships and connections between individual content pieces, such as co-authorship (two authors or affiliations contributing to the same content piece), co-citation (two content pieces cited by the same third content piece) or usage of a similar subset of concepts. We use common network analysis techniques to identify relationships, and the strength of those relationships, within the content selection. The result of such an analysis can be, for example, a network of authors that surfaces and visualises:

Active authors in the field
The amount of content authors contribute to a field
Collaboration between authors
Clustering the topic of “renewable energy” into “wind energy”, “solar energy” and “hydropower” by analysing the overlapping concepts
Clustering researchers who collaborate on a certain sub-topic
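
A co-authorship network of the kind described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the analysis pipeline itself: the function names and author names are hypothetical, and edge weight is simply the number of shared content pieces.

```python
from collections import Counter
from itertools import combinations


def coauthorship_edges(author_lists):
    """Weight each author pair by the number of content pieces they share."""
    edges = Counter()
    for authors in author_lists:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


def contribution_counts(author_lists):
    """Number of content pieces each author contributed to."""
    counts = Counter()
    for authors in author_lists:
        counts.update(set(authors))
    return counts


# One author list per content piece (placeholder data).
papers = [["Ada", "Ben"], ["Ada", "Ben", "Cleo"], ["Cleo"]]

edges = coauthorship_edges(papers)
print(edges[("Ada", "Ben")])        # 2
print(contribution_counts(papers))  # each author contributed to 2 pieces
```

Higher edge weights indicate stronger collaboration, and the contribution counts correspond to the amount of content each author adds to the field.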

Is your analysis biased, and how do you avoid biases?

In principle, there are several potential sources of bias in the content processing described above. We are continuously working on minimising the impact of biases where we are in direct control.

Improper selection criteria

Any analysis is influenced directly by the raw data that is fed into it. Hence, the selection criteria are a common source of bias, because they can ignore or suppress a relevant amount of content. Creators therefore have to be mindful of the selection criteria they use. For topics created by Springer Nature, we try to use only filter criteria that are directly related to the question, e.g. applying a country filter only makes sense when the research question is about the research of a given country or region. When working with sample content for indirect selection, we aim to use large input lists with diverse content, e.g. content from many publishers.

Data completeness

Another source of bias is incomplete data. We try to address this by constantly improving and enriching our raw data sources, and by avoiding analysis of attributes that are known to be incomplete. Where the current data landscape does not allow for more complete data and we believe this can affect the analysis, we aim to indicate this to users.

III. Rectification and removal of data

If you see a mistake in our content, or if you believe that it infringes your own interests, please do not hesitate to contact us. We look forward to hearing from you and to finding a suitable way forward. Please send an email to navigator@nature.com. It will help us tremendously if you include a link to, and a screenshot of, the detail you would like us to look at.