sections/2_academic_impact/citation_impact.qmd
## Metrics
Citations are affected by two major factors that we expect to be irrelevant for considerations of impact: the field of research and the year of publication[^pub-year]. That is, some fields, such as Cell Biology, are much more citation intensive than other fields, such as Mathematics. Additionally, publications that were published in 2010 have had more time to accumulate citations than publications published in 2020. Controlling for these factors[^normalisation-factors] results in what are often called “normalised” citation indicators [@waltman2019]. Although such normalised citation indicators are more comparable across time and field, they are also sometimes more opaque. For that reason, we explain both normalised metrics and “raw”, non-normalised, citation metrics.
In addition, we can distinguish between two approaches to calculating metrics based on citations. We can count citations in some way and provide a metric based on those citation counts. Alternatively, we can use citations to identify which publications are highly cited [@waltman2013]. The reason for the latter approach is that citation counts are typically very skewed [@radicchi2008], so statistics based on them may be less robust; the average, for example, can be dominated by a single very highly cited publication.
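
To make the second approach concrete, the following minimal sketch flags publications whose citation counts fall within the top 10% of their field and publication year. The records and the 10% threshold are illustrative assumptions; real implementations work with far larger strata and need a careful treatment of publications tied at the threshold.

```python
# A minimal sketch of the "highly cited" approach: flag publications in the
# top 10% of citation counts within their own (field, year) stratum.
# All records and the 10% threshold are illustrative assumptions.
from collections import defaultdict

publications = [
    {"id": "p1", "field": "Cell Biology", "year": 2015, "citations": 120},
    {"id": "p2", "field": "Cell Biology", "year": 2015, "citations": 40},
    {"id": "p3", "field": "Cell Biology", "year": 2015, "citations": 4},
    {"id": "p4", "field": "Mathematics", "year": 2015, "citations": 9},
    {"id": "p5", "field": "Mathematics", "year": 2015, "citations": 1},
]

# Collect citation counts per (field, year) stratum.
strata = defaultdict(list)
for pub in publications:
    strata[(pub["field"], pub["year"])].append(pub["citations"])

def is_highly_cited(pub, top_share=0.10):
    """True if the publication is among the `top_share` most cited
    publications of its own (field, year) stratum."""
    counts = sorted(strata[(pub["field"], pub["year"])], reverse=True)
    # Number of publications that fit in the top share (at least one).
    n_top = max(1, int(len(counts) * top_share))
    return pub["citations"] >= counts[n_top - 1]

for pub in publications:
    print(pub["id"], is_highly_cited(pub))
```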
The basis of this metric is counting citations. Counting citations implies that the references of publications need to be linked to the publications whose citations we want to count. There are some challenges and limitations involved in counting citations, described in more detail in the data source section. Here, we assume that we have somehow obtained citation counts for all papers of interest.
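
As a minimal illustration of the counting step itself, the sketch below tallies citation counts from a list of reference links that have already been matched to publication records. The links are illustrative assumptions, and the matching problems discussed in the data source section are deliberately left out.

```python
# A minimal sketch: derive citation counts from matched reference links.
# Each (citing, cited) pair is one reference that was successfully linked
# to a publication record; the pairs here are illustrative assumptions.
from collections import Counter

reference_links = [
    ("p7", "p1"), ("p8", "p1"), ("p9", "p1"),  # p1 is cited three times
    ("p9", "p4"),                              # p4 is cited once
]

citation_counts = Counter(cited for _citing, cited in reference_links)
print(citation_counts["p1"])  # 3
print(citation_counts["p2"])  # 0 (never cited; Counter defaults to 0)
```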
In addition, this metric requires calculating expected citation counts. Calculating expected citation counts can be challenging and requires access to many other publications, typically all publications in a database. For this reason, calculating normalised indicators can present considerable challenges. We explain here the calculation of the expected citation counts for publications in the same field and in the same year, where we assume publications are assigned to exactly one field[^field-overlap]. Note that the normalisation hence depends on the field classification used. See our indicator of fields for more details.
In more formal notation, let $c_i$ be the citation score of a paper $i$, let $f_i$ be the field of paper $i$ and let $y_i$ be the year of publication of paper $i$. The expected number of citations $e_{fy}$ for a publication in field $f$ and year $y$ is then calculated as the average number of citations for publications in the same field and the same year. Let $S_{fy} = \{i \mid f_i = f \text{ and } y_i = y\}$ be the set of publications in the same field and year, and let $n_{fy} = |S_{fy}|$ be the number of such publications. Then, the expected number of citations in the same field and year can be defined as

$$e_{fy} = \frac{1}{n_{fy}} \sum_{i \in S_{fy}} c_i.$$
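
The calculation of $e_{fy}$, and of a per-publication normalised score taken here as the ratio $c_i / e_{fy}$, translates directly into code. The following is a minimal sketch under the assumptions stated above (one field per publication); the publication records are illustrative.

```python
# A minimal sketch of field- and year-normalised citation scores: compute
# e_fy as the mean citation count of the stratum S_fy, then divide each
# paper's citation count c_i by the mean of its own stratum.
# The publication records are illustrative assumptions.
from collections import defaultdict

publications = [
    {"id": "p1", "field": "Cell Biology", "year": 2015, "citations": 120},
    {"id": "p2", "field": "Cell Biology", "year": 2015, "citations": 40},
    {"id": "p3", "field": "Mathematics", "year": 2015, "citations": 9},
    {"id": "p4", "field": "Mathematics", "year": 2015, "citations": 3},
]

# Accumulate the citation sum and the size n_fy of each (field, year) stratum.
totals = defaultdict(lambda: [0, 0])  # (field, year) -> [sum of c_i, n_fy]
for pub in publications:
    key = (pub["field"], pub["year"])
    totals[key][0] += pub["citations"]
    totals[key][1] += 1

# e_fy: the average number of citations in the stratum.
expected = {key: c_sum / n for key, (c_sum, n) in totals.items()}

# Normalised score c_i / e_fy: a score of 1 means "cited as expected".
for pub in publications:
    e_fy = expected[(pub["field"], pub["year"])]
    print(pub["id"], round(pub["citations"] / e_fy, 2))
```

With these illustrative numbers, p1 and p3 both obtain a normalised score of 1.5 even though their raw citation counts differ by an order of magnitude, which is precisely the comparability across fields that normalisation is meant to provide.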
As already clarified, citations are affected in general by field and publication year, and these are quite clearly causal effects. There are many other factors that correlate with citations [@onodera2015], though for most of them it is unclear whether the effect is causal. One factor that is consistently associated with more citations is collaboration [@larivière2015], which is potentially driven by network effects [@schulz]. In addition, there is evidence for a clear causal effect of the journal in which a publication appears on its citations [@traag2021].
[^pub-year]: The publication year can still be relevant even when we use similar so-called citation windows, for example counting citations only for the 10 years after the year of publication. Even then, publications published in 1990 (counting citations until 2000) will show a different average number of citations from publications published in 2000 (counting citations until 2010). This is because the number of publications has grown each year, and the number of references per publication has also increased.
[^normalisation-factors]: Sometimes, normalisation also considers the “document type” of publications, differentiating for example between editorial letters, research articles and reviews. This would be reasonable if we expect the document type to be unrelated to impact, as we expect for field of research and year of publication. Whether this is the case can be debated.
[^field-overlap]: That is, we assume that the fields in the classification used do not overlap. Some field classifications do overlap, in which case the normalisation becomes more complicated. One approach is to fractionalise publications per field, perform the normalisation within each field separately, and then average across fields [@waltman2011].
| Version | Date       | Description              | Author(s)                          |
|---------|------------|--------------------------|------------------------------------|
| 1.5     | 2024-04-26 | Third draft              | Delugas E., Catalano G.            |
| 1.4     | 2024-04-17 | Peer review              | V.A. Traag                         |
| 1.3     | 2024-04-04 | Second draft             | Delugas E., Catalano G.            |
| 1.2     | 2023-09-11 | Peer review              | V.A. Traag                         |
| 1.1     | 2023-07-04 | Draft indicator template | Caputo A., Delugas E., Vignetti S. |
## Description
### Labour cost savings given the availability of OS resources
The labour cost saving metric aims to capture the net effect of the availability of OS on working hours, expressed as the personnel cost equivalent of the time saved[^market-wages]. For example, for a workday saved by a single researcher, the labour cost saving would mirror the daily salary. The savings may arise because OS resources reduce the duplication of research outputs (e.g., code, papers, data) and improve professionals’ productivity by speeding up their work and allowing for task automation. For example, the availability of open data avoids collecting the same data more than once, and open code saves time by reducing the need to write code from scratch. Similarly, data mining techniques automate information collection that would otherwise require manual effort. Working time savings also occur due to a potential decrease in transaction costs, as closed environments require more time to obtain information or involve more complex procedures. Labour cost saving is a helpful metric to gauge the production efficiency gains facilitated by OS resources, as it assesses the variation of one of the two components of the standard productivity indicator.
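
As a minimal worked example of this metric, the sketch below converts an assumed number of working hours saved into its personnel cost equivalent; both input figures are illustrative placeholders, not values prescribed here.

```python
# A minimal sketch: labour cost savings as the personnel cost equivalent
# of working time saved thanks to OS resources. All figures are
# illustrative assumptions.
def labour_cost_saving(hours_saved: float, hourly_labour_cost: float) -> float:
    """Personnel cost equivalent of the working time saved."""
    return hours_saved * hourly_labour_cost

# Example: a researcher saves one 8-hour workday because an open dataset
# removes the need to collect the same data again, at an assumed labour
# cost of 45 EUR per hour.
print(labour_cost_saving(hours_saved=8, hourly_labour_cost=45.0))  # 360.0 EUR
```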
#### Measurement
Cost savings are directly linked to the metrics of “Innovation output” and “Industry adoption of research findings”. The rationale is that the savings accrued in both time and finances could potentially be redirected towards R&D investments. Over time, this reallocation could result in a significant increase in R&D productivity and innovation. Furthermore, the cost savings over time might trigger “Economic growth of companies” in terms of variations in productivity and assets. For instance, the money saved from lower access and storage costs can be reinvested in the company. This reinvestment could go towards R&D – which also relates to the innovation output indicator – and expanding operational capacity, which can drive revenue growth and increase the company’s assets and innovation capability, both key ingredients of company growth. Similarly, by saving time, companies can achieve more with the same or fewer resources: businesses can offer more products or services, or improve the quality of their offerings, without a corresponding increase in costs. Over time, this contributes to economic growth by enhancing the company’s competitive edge and market share. The contribution of efficiency improvements to company growth is also evident in asset optimisation: by maximising the utility of existing assets (mostly intangibles), companies can achieve higher returns on investment (ROI) over time.
[^market-wages]: Market wages are typically biased due to several factors related to labour and product markets. However, when considering researchers’ wages, it is often argued that they are a good approximation of the social cost of labour [@guideto2015].