---
author:
  - name: P. Stavropoulos
    orcid: 0000-0003-1664-6554
    affiliations:
      - ref: arc

affiliations:
  - id: arc
    name: Athena Research Center
    city: Athens
    country: Greece

title: Reproducibility Composite Confidence Index (RCCI)
---


::: {.callout-note collapse="true"}


# History

| Version | Revision date | Revision | Author |
|---------|---------------|-------------|---------------------|
| 1.0 | 2023-08-25 | First draft | Petros Stavropoulos |

:::

# Description

The **Reproducibility Composite Confidence Index (RCCI)** is a comprehensive indicator that assesses the quality, reusability, and trustworthiness of **research artefacts** (datasets, data collections, code, or software).

A high RCCI score indicates that an artefact is:

- **Highly cited** in its field (scholarly impact).
- **Frequently reused** by others in the scientific community.
- **Accepted and trusted** by peers, as reflected in citation sentiment.
- **Well-documented and FAIR-aligned**, with metadata enabling discoverability and reuse.

This indicator was first introduced and tested in the **TIER2 project** ([tier2-project.eu](https://tier2-project.eu/)), where it was implemented in a pilot **Reproducibility Dashboard** for funders and research-performing organisations (RPOs). The RCCI was presented and reviewed in **two stakeholder webinars** and in discussions with funders and RPOs, where feedback confirmed its value for monitoring research reproducibility.

# Metrics

## RCCI

The RCCI integrates four dimensions into a **single score**:

1. **Normalized Citation Impact (NCI)** → measures academic impact (see [Citation Impact](../2_academic_impact/citation_impact.qmd)).
2. **Field-Weighted Reusability Index (FWRI)** → measures how often artefacts are reused relative to others in the same field (based on [Reuse of Code in Research](../5_reproducibility/reuse_of_code_in_research.qmd) and [Reuse of Data in Research](../5_reproducibility/reuse_of_data_in_research.qmd)).
3. **FAIR Index (FI)** → measures metadata completeness and alignment with [FAIR data practices](../1_open_science/prevalence_open_fair_data_practices.qmd).
4. **Reproducibility Confidence Index (RCI)** → measures community sentiment using polarity of publications (see [Polarity of Publications](../5_reproducibility/polarity_of_publications.qmd)).

The RCCI is calculated as:

$$
RCCI = NCI \times FWRI \times FI \times RCI
$$

The four components are on different scales: NCI and FWRI are field-normalised ratios, so a value of 1 corresponds to the world average for the artefact's field and year; FI ranges from 0 to 1 (1 = fully FAIR metadata); and RCI ranges from -1 to 1 (positive values indicate favourable citation sentiment). FI and RCI therefore act as penalty factors on the two normalised ratios, and an RCCI greater than 1 is not an average baseline but a practical threshold: it suggests that an artefact is impactful, widely reused, FAIR-compliant, and positively regarded in the scientific community.
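
As a concrete sketch, the composite can be computed by multiplying the four component scores. The function and sample values below are hypothetical; the components themselves are defined under Measurement.

```python
def rcci(nci: float, fwri: float, fi: float, rci: float) -> float:
    """Combine the four component scores into the composite index."""
    # nci, fwri: field-normalised ratios (1.0 = world average).
    # fi: FAIR Index in [0, 1]; rci: Repro Confidence Index in [-1, 1],
    # so both act as penalty factors on the two normalised ratios.
    return nci * fwri * fi * rci

# Hypothetical artefact: cited and reused above the field average,
# fully FAIR metadata, mostly positive citation sentiment.
print(rcci(nci=1.4, fwri=1.2, fi=1.0, rci=0.8))  # 1.344
```
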
---

### Measurement

#### 1. Normalized Citation Impact (NCI)

**Definition:**
The Normalized Citation Impact (NCI) measures how often a publication or research artefact (dataset, code, software) is cited compared to the average citation rate of publications in the **same Field of Science** and **same publication year**. By controlling for disciplinary citation intensity and publication age, NCI allows comparisons of citation performance across different fields and timeframes.

**Formula:**
$$
NCI = \frac{Citations_{i}}{\overline{Citations}_{f,y}}
$$

Where:

- $Citations_{i}$ = the number of citations received by publication or artefact $i$.
- $\overline{Citations}_{f,y}$ = the mean number of citations for all publications in the same field $f$ and year $y$.

**Interpretation:**

- $NCI = 1$ → the publication/artefact is cited at the world average for its field and year.
- $NCI > 1$ → cited more frequently than the average in its field.
- $NCI < 1$ → cited less frequently than the average in its field.
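
A minimal sketch of the calculation, assuming the artefact's citation count and the citation counts of its field-and-year cohort have already been retrieved from one of the datasources listed below; all values are illustrative.

```python
from statistics import mean

def nci(artefact_citations: int, field_year_citations: list[int]) -> float:
    """Citations of the artefact divided by the mean citation count of
    all publications in the same field of science and publication year."""
    return artefact_citations / mean(field_year_citations)

# Hypothetical cohort with a mean of 10 citations: NCI = 15 / 10 = 1.5,
# i.e. the artefact is cited 50% above its field average.
print(nci(15, [5, 10, 15]))  # 1.5
```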

**Connections to other indicators in the Handbook:**

- Discussed extensively in [Citation Impact](../2_academic_impact/citation_impact.qmd), where normalised citation indicators are introduced and their methodological challenges explained.
- Used in the [Impact of Open Code in Research](../5_reproducibility/impact_of_open_code_in_research.qmd) and [Impact of Open Data in Research](../5_reproducibility/impact_of_open_data_in_research.qmd) indicators to assess the citation performance of publications that make research outputs openly available.

---

#### 2. Field-Weighted Reusability Index (FWRI)

**Definition:**
The Field-Weighted Reusability Index (FWRI) measures how often a research artefact (dataset, code, software) is **reused** compared to the average reuse rate of artefacts in the **same Field of Science** and within a **comparable publication window (e.g. 3 years after release)**.

Reuse is operationalised through **citation statements (citances)** in publications that have been validated to explicitly indicate that the artefact was reused (e.g. “we used dataset X” or “software Y was applied in our analysis”). This ensures that FWRI captures **practical adoption** rather than generic mentions.

**Formula:**
$$
FWRI = \frac{Reuse_{i}}{\overline{Reuse}_{f,y}}
$$

Where:

- $Reuse_{i}$ = the number of validated reuse citances to artefact *i*.
- $\overline{Reuse}_{f,y}$ = the mean number of validated reuse citances for artefacts in the same field $f$ and publication year $y$.

**Interpretation:**

- $FWRI = 1$ → the artefact is reused at the world average for its field and year.
- $FWRI > 1$ → the artefact is reused more frequently than similar artefacts.
- $FWRI < 1$ → the artefact is reused less frequently than similar artefacts.
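
To make the operationalisation concrete, the sketch below first keeps only citances labelled as reuse and then applies the same field normalisation as NCI. The citance records and the `intent` label are hypothetical stand-ins for the output of a citance classifier such as SciNoBo's.

```python
from statistics import mean

def fwri(citances: list[dict], field_year_reuse_counts: list[int]) -> float:
    """Validated reuse citances of the artefact divided by the mean number
    of reuse citances for artefacts in the same field and year."""
    # Keep only citances classified as actual reuse, ignoring
    # comparisons and generic mentions.
    reuse_count = sum(1 for c in citances if c["intent"] == "reuse")
    return reuse_count / mean(field_year_reuse_counts)

# Hypothetical classifier output for one artefact's incoming citances.
citances = [{"intent": "reuse"}, {"intent": "reuse"}, {"intent": "generic"}]
print(fwri(citances, field_year_reuse_counts=[1, 2, 3]))  # 2 / 2 = 1.0
```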

**Connections to other indicators in the Handbook:**

- Builds upon [Reuse of Code in Research](../5_reproducibility/reuse_of_code_in_research.qmd) and [Reuse of Data in Research](../5_reproducibility/reuse_of_data_in_research.qmd), which measure the raw adoption of code and data in subsequent studies.
- Extends these indicators by adding **field-normalisation**, analogous to the way NCI normalises citation impact across fields.
- Complements [Impact of Open Data in Research](../5_reproducibility/impact_of_open_data_in_research.qmd), which uses Normalised Citation Impact (NCI) to evaluate the influence of Open Data publications.

**Relation to methodologies and tools:**

- Reuse detection requires analysing **citation statements** with Natural Language Processing and machine learning, as implemented in platforms such as the [SciNoBo Toolkit](https://scinobo.ilsp.gr/toolkit).
- The SciNoBo toolkit in particular can identify and classify citances by intent (reuse, comparison, generic), polarity (supporting, refuting, neutral), and semantics (claim, method, results, artefact/output), making it possible to operationalise FWRI.

---

#### 3. FAIR Index (FI)

**Definition:**
The FAIR Index (FI) measures the extent to which a research artefact (dataset, code, software) complies with the **FAIR principles**: *Findable, Accessible, Interoperable, and Reusable*.
The indicator provides a simple, computational way of assessing FAIRness by checking for the presence and completeness of key metadata elements that are essential for discovery, access, licensing, and reuse.

**Formula:**
$$
FI = \frac{\text{Number of valid metadata elements}}{4}
$$

Metadata elements:

1. **Name** — a clear and unique name for the artefact.
2. **Version** — a version number or persistent identifier that distinguishes releases.
3. **License** — explicit usage rights (e.g., open license, restricted license).
4. **URL** — a persistent and resolvable web link providing access to the artefact.

Each element is scored as present/valid (1) or missing/invalid (0).

- $FI = 1.0$ → all four metadata elements are valid, indicating full FAIR compliance.
- $FI = 0.5$ → two elements are valid, indicating partial FAIRness.
- $FI = 0$ → no FAIR metadata elements available.
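
A minimal sketch of the scoring, assuming the metadata has already been extracted; validity is simplified here to non-emptiness, whereas a fuller implementation would also verify, for example, that the URL resolves.

```python
REQUIRED_ELEMENTS = ("name", "version", "license", "url")

def fair_index(metadata: dict) -> float:
    """Fraction of the four required metadata elements that are present
    and non-empty (validity simplified to non-emptiness)."""
    valid = sum(1 for key in REQUIRED_ELEMENTS if metadata.get(key))
    return valid / len(REQUIRED_ELEMENTS)

# Hypothetical artefact metadata with no version information: FI = 3/4.
print(fair_index({
    "name": "ExampleDataset",
    "license": "CC-BY-4.0",
    "url": "https://example.org/dataset",
}))  # 0.75
```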

**Interpretation:**

- A high FI indicates that an artefact is **well-documented and accessible**, increasing its chances of being reused reliably by others.
- A low FI signals **poor metadata practices**, limiting discoverability and trust in the artefact.

**Connections to other indicators in the Handbook:**

- Directly linked to [Prevalence of Open/FAIR Data Practices](../1_open_science/prevalence_open_fair_data_practices.qmd), which measures the general status of FAIR adoption across publications and datasets.
- Complements the **Reuse of Data in Research** and **Reuse of Code in Research** indicators, since proper FAIR metadata often enables practical reuse.

**Relation to methodologies and tools:**

- The **SciNoBo toolkit** can extract and validate metadata from publications and associated artefacts, supporting automated FI scoring at scale.

---

#### 4. Reproducibility Confidence Index (RCI)

**Definition:**
The Reproducibility Confidence Index (RCI) measures how the scientific community perceives the **reliability and reproducibility** of a research artefact (dataset, code, software) based on the polarity of its citations.
It incorporates **supporting, neutral, and refuting citances** to determine whether the artefact is generally validated, questioned, or disputed in follow-up research.

RCI therefore reflects not only the *quantity* of citations, but their *quality* in terms of endorsement or criticism.

**Formula:**
$$
RCI = \frac{(1 \times \text{Positive Citations}) + (0.5 \times \text{Neutral Citations}) - (1 \times \text{Negative Citations})}{\text{Total Citations}}
$$

**Interpretation:**

- $RCI = 1$ → all citations are positive, indicating strong reproducibility confidence.
- $RCI \approx 0$ → balanced or neutral sentiment, with no clear consensus on reproducibility.
- $RCI < 0$ → predominantly negative citations, indicating low reproducibility confidence.
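
A minimal sketch of the weighting, assuming polarity counts have already been obtained from a citance classifier; the counts used here are illustrative.

```python
def rci(positive: int, neutral: int, negative: int) -> float:
    """Weighted citation polarity: supporting citances count 1,
    neutral 0.5 and refuting -1, divided by the total."""
    total = positive + neutral + negative
    return (positive + 0.5 * neutral - negative) / total

# Hypothetical artefact with 6 supporting, 2 neutral and 2 refuting
# citances: (6 + 1 - 2) / 10 = 0.5.
print(rci(positive=6, neutral=2, negative=2))  # 0.5
```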

**Connections to other indicators in the Handbook:**

- Directly based on [Polarity of Publications](../5_reproducibility/polarity_of_publications.qmd), which provides the methodological basis for classifying citances.
- Complements **NCI** and **FWRI** by adding a qualitative perception dimension to quantitative measures of citation and reuse.

**Relation to methodologies and tools:**

- **OpenAIRE Research Graph** supports linkage of citations, which can be enriched with polarity classification.
- The [SciNoBo Toolkit](https://scinobo.ilsp.gr/toolkit) includes functionality for automated citance classification by intent (reuse, comparison, generic), polarity (supporting, refuting, neutral), and semantics (claim, method, results, artefact/output).

---

# Datasources

To calculate the RCCI, different types of metadata are required — including citation counts, reuse information, citation polarity, and FAIR metadata.
The following datasources provide alternative ways to obtain this information. Not all of them are strictly required for every calculation, but together they offer complementary coverage for retrieving the inputs needed for RCCI and its component indicators.

- **OpenAIRE Research Graph**
[OpenAIRE](https://graph.openaire.eu/) aggregates metadata on publications, datasets, and software. It supports linking artefacts to publications and can be used to identify reuse cases and citances that indicate how artefacts are cited, which is essential for FWRI and RCI.

- **OpenAlex**
[OpenAlex](https://openalex.org/) is an openly accessible bibliometric database that provides citation counts, references, and links to associated datasets and software. It can be used to calculate citation-based metrics such as NCI and to identify citation links needed for FWRI and RCI.

- **Dimensions**
[Dimensions](https://app.dimensions.ai/) offers citation data and normalised indicators such as the Field Citation Ratio (FCR). It provides expected citation baselines by field and year, which are useful for calculating NCI.

- **Scopus**
[Scopus](https://www.scopus.com/) is a large citation database that provides the Field-Weighted Citation Impact (FWCI), a normalised indicator analogous to NCI. It can serve as a source for citation data and normalised impact values used in RCCI.

- **Web of Science / InCites**
[Web of Science](https://webofscience.com/) provides citation data and normalised citation metrics through InCites, where the Category Normalised Citation Impact (CNCI) is implemented. This can be used as an alternative to Scopus FWCI or Dimensions FCR.

- **DataCite**
[DataCite](https://datacite.org/) is a registry that provides persistent identifiers (DOIs) and metadata for research datasets and software. It is especially useful for retrieving metadata elements (Name, Version, License, URL) needed for calculating the FAIR Index.

- **Crossref**
[Crossref](https://www.crossref.org/) maintains extensive metadata for scholarly publications and related outputs, including references and links to datasets and software. It is valuable both for reuse tracking (FWRI) and FAIR metadata extraction (FI).

- **Zenodo / Figshare / Institutional Repositories**
These repositories host datasets, software, and other artefacts. They expose metadata via APIs, which can be used to evaluate FAIRness and retrieve usage information for reuse analysis.

- **scite.ai**
[scite.ai](https://scite.ai/) provides classification of citation statements into supporting, refuting, or mentioning. It can be used to measure polarity of publications and calculate the RCI.

---

# Existing Methodologies

## SciNoBo Toolkit

The [SciNoBo Toolkit](https://scinobo.ilsp.gr/toolkit) implements and operationalises the RCCI and its component indicators in a **working monitoring dashboard**.

- In the **[TIER2 project](https://tier2-project.eu/)**, SciNoBo was used to extract artefacts from project deliverables and publications, link them to citation and reuse data, and compute NCI, FWRI, FI, RCI, and RCCI.
- The RCCI results were presented in **pilot dashboards** for funders and RPOs.
- The approach was validated and refined through **stakeholder feedback** in webinars and presentations.

This makes RCCI not only a conceptual indicator, but also one that has been **implemented and tested in practice**.

---

## Other methodologies

While SciNoBo currently offers the most complete implementation, other methodologies and tools can be used to compute individual RCCI components:

- **Citation normalisation**
NCI can be derived using normalisation approaches described in the [Citation Impact](../2_academic_impact/citation_impact.qmd) indicator, based on expected citation counts per field and year. This methodology is implemented in bibliometric databases such as OpenAlex (FWCI), Web of Science/InCites (CNCI), Scopus (FWCI), and Dimensions (FCR).

- **Reuse detection**
FWRI requires identifying reuse through validated citances. Platforms such as **scite.ai** classify citations as supporting, refuting, or mentioning, while **OpenAIRE Research Graph** can link publications to datasets and software.
These can be complemented with **DataCite** and **Crossref** metadata, which record relationships between publications and artefacts.

- **FAIR assessment**
The **F-UJI tool** provides an automated method for evaluating dataset FAIRness by checking metadata completeness and quality. This can be used to score the four FAIR Index elements (Name, Version, License, URL).

- **Polarity classification**
RCI can be measured using **scite.ai**, which classifies citations into supporting, refuting, or mentioning.