We are committed to creating a platform that facilitates the sharing of resources and knowledge for anyone interested in cognitive metascience and related fields. Our mission is to advance the study of cognitive metascience and promote open access to valuable resources. Whether you are a researcher, student, or enthusiast, we invite you to explore our curated collection and contribute your own insights, tools, or findings. This platform is open to everyone, and we believe that collective knowledge-sharing can guide us forward in our pursuit of understanding the complexities of this scientific field. Feel free to browse, engage, and join us in the exploration and discovery of resources that can make our scientific work more efficient and impactful.
In the following sections, you will find a list of resources for Natural Language Processing (NLP) and citation analysis, as well as a literature overview regarding scientific theories and peer review processes.
Focus on BERT and LLMs for Digital Humanities.
From robust language models like SciBERT to annotation platforms like Doccano, these resources offer a diverse array of tools designed for text analysis and understanding. Whether you're exploring sentiment analysis, entity recognition, or document summarization, these tools provide the necessary infrastructure for work in the field of natural language processing, making them useful for anyone interested in advancing their understanding of linguistics, machine learning, or cognitive science.
Tools
By using Lingo4G you can get an overview of thousands of documents within seconds and instantly drill down to documents of interest. You can build custom text mining pipelines ranging from simple search to 2D mapping, time-series analysis, and duplicate detection, and combine topics, clusters, and 2D document maps into powerful visualizations. Closed source; our lab has a research licence.
Doccano is an open-source text annotation tool for humans. It provides annotation features for text classification, sequence labeling, and sequence to sequence tasks. You can create labeled data for sentiment analysis, named entity recognition, text summarization, and so on. Just create a project, upload data, and start annotating. You can build a dataset in hours.
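Beyond the web interface, annotation projects can also be managed programmatically. The sketch below is a rough illustration that assumes the doccano-client Python package and a locally running server; the URL, credentials, and listed fields are placeholders, not something prescribed by Doccano itself.

```python
# Minimal sketch: connect to a local Doccano instance and list projects.
# Assumes the doccano-client package (pip install doccano-client) and a server
# already started locally; URL and credentials below are placeholders.
from doccano_client import DoccanoClient

client = DoccanoClient("http://localhost:8000")
client.login(username="admin", password="password")

for project in client.list_projects():
    print(project.id, project.name, project.project_type)
```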
SciBERT is a BERT model trained on scientific text. SciBERT is trained on papers from the corpus of semanticscholar.org; the corpus comprises 1.14M papers and 3.1B tokens, and the full text of the papers is used in training, not just the abstracts. SciBERT has its own vocabulary (scivocab) built to best match the training corpus, with cased and uncased versions, plus models trained on the original BERT vocabulary (basevocab) for comparison. It achieves state-of-the-art performance on a wide range of scientific-domain NLP tasks.
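For instance, SciBERT can be loaded through the Hugging Face transformers library; the minimal sketch below embeds a sentence with the uncased scivocab checkpoint. The mean-pooling step is just one common choice for getting a sentence vector, not something prescribed by the SciBERT authors.

```python
# Minimal sketch: embed a sentence with SciBERT via Hugging Face transformers.
# pip install transformers torch
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

text = "Citation counts are a noisy proxy for scientific impact."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token embeddings into a single sentence vector.
sentence_embedding = outputs.last_hidden_state.mean(dim=1).squeeze(0)
print(sentence_embedding.shape)  # torch.Size([768])
```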
Weaviate is an open source vector database that stores both objects and vectors, allowing for combining vector search with structured filtering with the fault-tolerance and scalability of a cloud-native database, all accessible through GraphQL, REST, and various language clients.
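As a small illustration of the object-plus-vector model, the sketch below uses the v3 Python client against a local instance; the "Paper" class, its properties, and the local URL are assumptions for the example, and the current v4 client exposes a different API.

```python
# Minimal sketch: store an object and run a vector search with the Weaviate
# v3 Python client (pip install "weaviate-client<4"). The "Paper" class and
# local URL are assumptions; a vectorizer module must be enabled server-side
# for near_text queries to work.
import weaviate

client = weaviate.Client("http://localhost:8080")

# Store an object (its vector comes from the configured vectorizer module).
client.data_object.create(
    {"title": "Estimating the reproducibility of psychological science"},
    class_name="Paper",
)

# Combine vector search with a result limit; structured filters could be added
# with .with_where(...).
result = (
    client.query.get("Paper", ["title"])
    .with_near_text({"concepts": ["replication crisis"]})
    .with_limit(5)
    .do()
)
print(result)
```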
Multiple LLMs under a single interface.
Web search and multiple LLMs (academic subscriptions available).
Run (quantized) LLMs locally.
Ask any question to two anonymous models (e.g., ChatGPT, Claude, Llama) and vote for the better one! You can continue chatting until you identify a winner. Your vote won't be counted if a model's identity is revealed during the conversation.
OP Vault uses the OP Stack (OpenAI + Pinecone Vector Database) to enable users to upload their own custom knowledgebase files and ask questions about their contents. The primary focus is on human-readable content like books, letters, and other documents, making it a practical and valuable tool for knowledge extraction and question-answering. You can upload an entire library's worth of books and documents and receive pointed answers along with the name of the file and specific section within the file that the answer is based on!
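The underlying OP Stack pattern (embed with OpenAI, store and query with Pinecone) can be sketched roughly as below. This is a generic illustration of the retrieval step rather than OP Vault's actual code, written against the legacy openai (<1.0) and pinecone-client (v2) interfaces; the keys, index name, and file metadata are placeholders, and the index is assumed to already exist.

```python
# Rough sketch of the OP Stack pattern (OpenAI embeddings + Pinecone), not
# OP Vault's actual implementation. Uses the legacy openai<1.0 and
# pinecone-client v2 interfaces; keys, index name, and metadata are
# placeholders, and the index is assumed to exist already.
import openai
import pinecone

openai.api_key = "OPENAI_API_KEY"
pinecone.init(api_key="PINECONE_API_KEY", environment="us-east-1-aws")
index = pinecone.Index("my-knowledgebase")

def embed(text: str) -> list[float]:
    response = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return response["data"][0]["embedding"]

# Index a document chunk together with its source metadata...
index.upsert([("doc-1", embed("Chapter 1 of some book..."), {"file": "book.txt"})])

# ...then answer a question by retrieving the closest chunks and their sources.
matches = index.query(vector=embed("What is chapter 1 about?"), top_k=3,
                      include_metadata=True)
for match in matches["matches"]:
    print(match["id"], match["metadata"]["file"], match["score"])
```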
Interact, discover insights, and build with unstructured text, image, and audio data. Uncover data insights from your text and images right from your web browser. Make sense of your data with AI-computed topics, data labels, groupings, and embeddings. Share text, image, and embedding datasets with your team or customers. Scales from 100 to 100 million unstructured datapoints.
paperai is a semantic search and workflow application for medical/scientific papers. Applications range from semantic search indexes that find matches for medical/scientific queries to full-fledged reporting applications powered by machine learning.
Scientific Document Processing Toolkit, from PDF processing to citation intent.
A comprehensive corpus management and search engine for lexicographers and digital humanities. Many academic institutions have a license.
Twarc is a command line tool and Python library for collecting and archiving Twitter JSON data via the Twitter API. It has separate commands (twarc and twarc2) for working with the older v1.1 API and the newer v2 API and Academic Access (respectively).
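For example, with the v2 API the Python interface can page through search results and archive the raw tweet JSON; the sketch below assumes a valid bearer token, and the query and output file are only illustrative.

```python
# Minimal sketch: collect tweets via the v2 API with twarc2 (pip install twarc).
# Assumes a bearer token with appropriate access; query and filename are
# illustrative placeholders.
import json
from twarc import Twarc2

client = Twarc2(bearer_token="YOUR_BEARER_TOKEN")

with open("tweets.jsonl", "w") as outfile:
    # search_recent pages through results; each page is a raw API response dict.
    for page in client.search_recent("replication crisis lang:en"):
        for tweet in page.get("data", []):
            outfile.write(json.dumps(tweet) + "\n")
```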
Search engine for Twitter.
CADE can and has been used for several different tasks: from general temporal vector space alignment [1] and a more general comparison of language variation [2], to tasks like semantic change detection in diachronic contexts [3,6] and narrative understanding [5].
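As a quick sketch of the typical CADE workflow: train a shared "compass" on the concatenation of all time slices, then train aligned slice embeddings whose vectors can be compared directly across periods. The call pattern follows the cade package's documented usage, but the file names, vocabulary, and the final comparison are placeholders for illustration.

```python
# Minimal sketch of the CADE workflow (pip install cade): train a compass on
# the concatenation of all time slices, then train aligned embeddings for
# each slice. File names and the probed word are placeholders.
import numpy as np
from cade.cade import CADE

aligner = CADE(size=300)

# The compass is trained on all slices concatenated into one file.
aligner.train_compass("corpus_all_years.txt", overwrite=False)

# Slice embeddings share the compass space, so vectors are directly comparable.
model_1990s = aligner.train_slice("corpus_1990s.txt", save=True)
model_2010s = aligner.train_slice("corpus_2010s.txt", save=True)

# e.g., cosine similarity of the same word across periods as a rough
# semantic-change signal (the word must occur in both slices).
v_old = model_1990s.wv["replication"]
v_new = model_2010s.wv["replication"]
print(np.dot(v_old, v_new) / (np.linalg.norm(v_old) * np.linalg.norm(v_new)))
```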
Overview of many useful AI tools.
Articles
Additional resources
The Citation Analysis tools featured here are essential for researchers seeking to navigate the scholarly landscape efficiently. From databases like Microsoft Academic Graph to visualization tools like CiteSpace, these resources empower users to uncover trends and gain insights into academic discourse through bibliometric analysis.
Tools
An aggregator for diverse open knowledge sets, including scholarly works and patents. Offers discovery and analytics tools; ‘The Lens MetaRecord’ integrates multiple identifiers and sources to provide comprehensive, normalized metadata while maintaining provenance.
Free Windows-based program, designed for altmetrics, citation analysis, social web analysis, and webometrics, including link analysis. Data is downloaded through APIs or directly, and various text and citation processing options are provided. Altmetrics and citation analysis cover data sources like Mendeley, Altmetric.com, Google Books, and WorldCat. Social web analysis includes platforms such as YouTube, Twitter, Goodreads, and Flickr.
Deprecated project: a knowledge graph of scientific publications, citation relationships, and authors, supporting various applications.
Database that integrates data and analytical tools in a single platform with over 106 million publications linked to grants, patents, clinical trials, datasets, policy papers, citations, and article metrics. Extremely expensive; acquiring free access is near-impossible.
Visual analytic tool designed for studying scholarly literature trends. The workflow involves analytic tasks and a variety of configurations, emphasizing the importance of understanding underlying concepts and principles. The tool is unique for linking publications with grants, patents, clinical trials, datasets, and policy papers, offering a holistic research landscape.
OpenAIRE provides a large database of research data stored in a graph format that represents relationships between research outputs, citations, funding, organizations, and collaborations. Used for research evaluation as a replacement for proprietary databases such as Web of Science or Scopus.
Analyses the textual context of citations, distinguishing between supporting, mentioning, and contrasting citations. Processes full-text articles to create ‘Smart Citations’, which contain information about citation relationships, contextual details, and classification types. Also offers Custom Dashboards, a Zotero Plugin, and a Browser Extension. Sources papers from publishers, preprint servers, and other reputable sources, accessing full-text PDFs and XMLs for analysis.
The CRExplorer uses data from Web of Science (Clarivate Analytics) or Scopus (Elsevier) as input. Publication sets have to be downloaded including the cited references. The program focuses on the analysis of the cited references, in particular on the referenced publication years. Over time, "citation classics" of a field become more pronounced. When the aggregated citations are plotted along the time axis, one obtains a "spectrogram" with distinct peaks. CRExplorer visualizes this spectrogram, cleans the cited references (disambiguation), and uses a smoothing algorithm to suppress the noise.
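The idea behind the spectrogram (referenced publication year spectroscopy) can be illustrated with a toy computation: count cited references per referenced publication year and look at deviations from a local median, so that "citation classic" years stand out as peaks. This is a simplified sketch of the idea, not CRExplorer's implementation, and the year data below is made up.

```python
# Toy illustration of a cited-reference "spectrogram": count cited references
# per referenced publication year and subtract a 5-year rolling median so that
# peak years (citation classics) stand out. Not CRExplorer's implementation;
# the years below are invented for the example.
import pandas as pd

# Hypothetical referenced publication years extracted from a downloaded
# Web of Science / Scopus export.
cited_years = pd.Series([1959, 1960, 1960, 1961, 1962, 1962, 1962, 1962,
                         1963, 1964, 1964, 1965, 1966, 1966, 1967])

counts = cited_years.value_counts().sort_index()
baseline = counts.rolling(window=5, center=True, min_periods=1).median()
deviation = counts - baseline

print(pd.DataFrame({"count": counts, "deviation": deviation}))
```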
The basic workflow of CorText Manager is as follows: (1) upload raw files from various scientific bibliography databases (ISI Thomson Web of Science, PubMed, etc.) and simple CSV files, (2) transform the text files into a standard corpus database file, (3) perform a series of graphical analyses to produce descriptive statistical analyses, social graphs of entities, and timeline-based phylogenetic reconstructions, (4) download outputs in formats compatible with third-party software.
Papers and background
The paper proposes a novel method called TextMatch for evaluating the significance of scientific papers, addressing the limitations of traditional citation counts. TextMatch utilizes high-dimensional text embeddings generated by large language models (LLMs) to encode papers, extracts similar samples using cosine similarity, and synthesizes a counterfactual sample based on the weighted average of similar papers. The resulting metric, called CausalCite, offers a causal formulation of paper citations. The effectiveness of CausalCite is demonstrated through its high correlation with paper impact, as assessed by scientific experts, and its stability across various sub-fields of AI. The study provides insights for future researchers to leverage this metric for a more comprehensive understanding of a paper's quality. Code and data are available for further exploration.
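The core matching step can be sketched generically: embed papers as text vectors, find a target paper's nearest neighbours by cosine similarity, and form a similarity-weighted counterfactual from those neighbours' outcomes (e.g., citation counts). The code below is a schematic of that idea, not the authors' implementation; the embeddings, weighting scheme, and toy data are illustrative assumptions.

```python
# Schematic sketch of the TextMatch idea: form a counterfactual for one paper
# from its most textually similar peers, weighted by cosine similarity.
# Not the authors' implementation; embeddings, weighting, and data are toy.
import numpy as np

def counterfactual_citations(target_vec, candidate_vecs, candidate_citations, k=5):
    """Estimate a counterfactual citation count for the target paper as the
    similarity-weighted average over its k most similar candidate papers."""
    # Cosine similarity between the target embedding and each candidate.
    sims = candidate_vecs @ target_vec / (
        np.linalg.norm(candidate_vecs, axis=1) * np.linalg.norm(target_vec)
    )
    top = np.argsort(sims)[-k:]
    weights = sims[top] / sims[top].sum()
    return float(np.dot(weights, candidate_citations[top]))

# Toy data: 100 candidate papers with 768-dim text embeddings.
rng = np.random.default_rng(0)
candidates = rng.normal(size=(100, 768))
citations = rng.poisson(lam=20, size=100)
target = rng.normal(size=768)

print(counterfactual_citations(target, candidates, citations))
```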
- Replication Crisis in Psychology
- Stepping in the Same River Twice: Replication in Biological Research
- A discipline-wide investigation of the replicability of Psychology papers over the past two decades
- No Evidence for a Replicability Crisis in Psychological Science
- The problem with science: the reproducibility crisis and what to do about it
- Rethinking Reproducibility as a Criterion for Research Quality
- Reproducibility failures are essential to scientific inquiry
- The logical structure of experiments lays the foundation for a theory of reproducibility