ethanxia4
diff --git a/.github/workflows/build-docs.yml b/.github/workflows/build-docs.yml
Lines changed: 3 additions & 0 deletions
diff --git a/.github/workflows/continuous-integration.yml b/.github/workflows/continuous-integration.yml
Lines changed: 1 addition & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 21 additions & 3 deletions
diff --git a/docs/source/conf.py b/docs/source/conf.py
Lines changed: 7 additions & 3 deletions
diff --git a/docs/source/datasets.rst b/docs/source/datasets.rst
Lines changed: 4 additions & 0 deletions
diff --git a/docs/source/deli.rst b/docs/source/deli.rst
Lines changed: 90 additions & 0 deletions
diff --git a/docs/source/fomc.rst b/docs/source/fomc.rst
Lines changed: 67 additions & 0 deletions
@@ -10,6 +10,9 @@ jobs:
     runs-on: ubuntu-latest
     steps:
     - uses: actions/checkout@v3
+    - name: Install Dependencies
+      run: |
+        pip install sphinx sphinx-rtd-theme m2r2
     - name: Sphinx Build
       uses: ammaraskar/sphinx-action@master
       with:
 
@@ -7,7 +7,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: [3.7, 3.8, 3.9, '3.10']
+        python-version: [3.9, '3.10', '3.11', '3.12']
         mongodb-version: [5.0.2]
 
     steps:
 
@@ -4,13 +4,13 @@
 <!-- ALL-CONTRIBUTORS-BADGE:END -->
 
 [![pypi](https://img.shields.io/pypi/v/convokit.svg)](https://pypi.org/pypi/convokit/)
-[![py\_versions](https://img.shields.io/badge/python-3.8%2B-blue)](https://pypi.org/pypi/convokit/)
+[![py\_versions](https://img.shields.io/badge/python-3.9%2B-blue)](https://pypi.org/pypi/convokit/)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 [![license](https://img.shields.io/badge/license-MIT-green)](https://github.com/CornellNLP/ConvoKit/blob/master/LICENSE.md)
 [![Discord Community](https://img.shields.io/static/v1?logo=discord&style=flat&color=red&label=discord&message=community)](https://discord.gg/WMFqMWgz6P)
 
 
-This toolkit contains tools to extract conversational features and analyze social phenomena in conversations, using a [single unified interface](https://convokit.cornell.edu/documentation/architecture.html) inspired by (and compatible with) scikit-learn.  Several large [conversational datasets](https://github.com/CornellNLP/ConvoKit#datasets) are included together with scripts exemplifying the use of the toolkit on these datasets. The latest version is [3.0.0](https://github.com/CornellNLP/ConvoKit/releases/tag/v3.0.0) (released July 17, 2023); follow the [project on GitHub](https://github.com/CornellNLP/ConvoKit) to keep track of updates.
+This toolkit contains tools to extract conversational features and analyze social phenomena in conversations, using a [single unified interface](https://convokit.cornell.edu/documentation/architecture.html) inspired by (and compatible with) scikit-learn.  Several large [conversational datasets](https://github.com/CornellNLP/ConvoKit#datasets) are included together with scripts exemplifying the use of the toolkit on these datasets. The latest version is [3.0.1](https://github.com/CornellNLP/ConvoKit/releases/tag/v3.0.1) (released November 13, 2024); follow the [project on GitHub](https://github.com/CornellNLP/ConvoKit) to keep track of updates.
 
 Read our [documentation](https://convokit.cornell.edu/documentation) or try ConvoKit in our [interactive tutorial](https://colab.research.google.com/github/CornellNLP/ConvoKit/blob/master/examples/Introduction_to_ConvoKit.ipynb).
 
@@ -137,6 +137,24 @@ A collection of all the conversations that occurred over 10 seasons of Friends,
 
 Name for download: `friends-corpus`
 
+### [Federal Open Market Committee (FOMC) Corpus](https://convokit.cornell.edu/documentation/fomc.html)
+
+Transcripts of recurring meetings of the Federal Reserve’s Open Market Committee (FOMC), where important aspects of U.S. monetary policy are decided, covering the period 1977-2008.
+
+Name for download: `fomc-corpus`
+
+### [NPR Interview 2P Dataset Corpus](https://convokit.cornell.edu/documentation/npr-2p.html)
+
+This corpus contains conversations between NPR show hosts and their guests.
+
+Name for download: `npr-2p-corpus`
+
+### [DeliData Dataset Corpus](https://convokit.cornell.edu/documentation/deli.html)
+
+This corpus contains multi-party conversations from problem-solving contexts, along with information about group discussions and team performance.
+
+Name for download: `deli-corpus`
+
 ### [Switchboard Dialog Act Corpus](https://convokit.cornell.edu/documentation/switchboard.html)
 
 A collection of 1,155 five-minute telephone conversations between two participants, annotated with speech act tags.
@@ -180,7 +198,7 @@ Name for download: `spolin-corpus`
 In addition to the provided datasets, you may also use ConvoKit with your own custom datasets by loading them into a `convokit.Corpus` object. [This example script](https://github.com/CornellNLP/ConvoKit/blob/master/examples/converting_movie_corpus.ipynb) shows how to construct a Corpus from custom data.
 
 ## Installation
-This toolkit requires Python >= 3.8.
+This toolkit requires Python >= 3.9.
 
 1. Download the toolkit: `pip3 install convokit`
 2. Download Spacy's English model: `python3 -m spacy download en`
 
@@ -19,6 +19,7 @@
 #
 import os
 import sys
+import sphinx_rtd_theme
 
 _HERE = os.path.dirname(__file__)
 _DOCS_DIR = os.path.abspath(os.path.join(_HERE, ".."))
@@ -55,7 +56,7 @@
 
 # General information about the project.
 project = "convokit"
-copyright = "2017-2023 The ConvoKit Developers"
+copyright = "2017-2024 The ConvoKit Developers"
 author = "The ConvoKit Developers"
 
 # The version info for the project you're documenting, acts as replacement for
@@ -65,7 +66,7 @@
 # The short X.Y version.
 version = "3.0"
 # The full version, including alpha/beta/rc tags.
-release = "3.0.0"
+release = "3.0.1"
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
@@ -126,6 +127,9 @@
 # a list of builtin themes.
 #
 html_theme = "sphinx_rtd_theme"
+
+# Add theme path explicitly
+html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
 # html_context = {"css_files": ["_static/overrides.css"]}
 # Theme options are theme-specific and customize the look and feel of a theme
 # further.  For a list of options available for each theme, see the
@@ -159,7 +163,7 @@
 # Add any paths that contain custom static files (such as style sheets) here,
 # relative to this directory. They are copied after the builtin static files,
 # so a file named "default.css" will overwrite the builtin "default.css".
-html_static_path = ["static"]
+html_static_path = ["_static"]
 
 # Add any extra paths that contain custom files (such as robots.txt or
 # .htaccess) here, relative to this directory. These files are copied
 
@@ -27,3 +27,7 @@ Datasets
    Supreme Court Oral Arguments Dataset <supreme.rst>
    Wikipedia Articles for Deletion Dataset <wiki-articles-for-deletion-corpus.rst>
    CaSiNo Corpus <casino-corpus.rst>
+   NPR Interviews 2P Corpus <npr-2p.rst>
+   Federal Open Market Committee Corpus <fomc.rst>
+   FORA Corpus <fora.rst>
+   DeliData Corpus <deli.rst>
@@ -0,0 +1,90 @@
+DeliData Corpus
+===============
+
+DeliData is a dataset designed for analyzing deliberation in multi-party problem-solving contexts. It contains information about group discussions, capturing various aspects of participant interactions, message annotations, and team performance.
+
+The corpus is available upon request from the authors, and a ConvoKit-compatible version can be derived using ConvoKit’s conversion tools. ConvoKit also hosts the ConvoKit-format DeliData corpus, which can be downloaded directly by following the instructions in the Usage section.
+
+For a full description of the dataset collection and potential applications, please refer to the original publication: `Karadzhov, G., Stafford, T., & Vlachos, A. (2023). DeliData: A dataset for deliberation in multi-party problem solving. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW2), 1-25.`
+
+Dataset details
+---------------
+
+All ConvoKit metadata attributes retain the original names used in the dataset.
+
+Speaker-level information
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Metadata for each speaker includes the following fields:
+
+* speaker: Identifier or pseudonym of the speaker.
+
+Utterance-level information
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Each utterance includes:
+
+* id: Unique identifier for an utterance.
+* conversation_id: Identifier for the conversation that the utterance belongs to.
+* reply_to: Identifier for the previous utterance in the conversation, if any (null if not a reply).
+* speaker: Name or pseudonym of the utterance speaker.
+* text: Normalized textual content of the utterance with applied tokenization and masked special tokens.
+* timestamp: Null for the entirety of this corpus.
+
+Metadata for each utterance includes:
+
+* annotation_type: Type of deliberation expressed in the utterance, if annotated (e.g., "Probing" or "Non-probing deliberation"). May be null if not annotated.
+* annotation_target: Target annotation, indicating the intended focus of the message, such as "Moderation" or "Solution". May be null if not annotated.
+* annotation_additional: Any additional annotations indicating specific deliberative actions (e.g., "complete_solution"). May be null if not annotated.
+* message_type: Type of message, categorized as INITIAL, SUBMIT, or MESSAGE, indicating its function in the dialogue.
+* original_text: Original (unnormalized) text as it appeared in the collected conversation. For INITIAL messages, contains the list of participants and cards presented; for SUBMIT messages, contains the submitted cards.
+
+Conversation-level information
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For each conversation we provide:
+
+* id: Identifier of the conversation.
+
+Metadata for each conversation includes:
+
+* team_performance: Approximate performance of the team based on user submissions and solution mentions, ranging from 0 to 1, where 1 indicates all participants selected the correct solution.
+* sol_tracker_message: Extracted solution from the current message content.
+* sol_tracker_all: Up-to-date "state of mind" of each participant, i.e., an approximation of what each participant thinks the correct solution is at a given timestep, based on initial solutions, submitted solutions, and solution mentions. The team_performance value is calculated from this field.
+* performance_change: Change in team performance relative to the previous utterance.
+
+Usage
+-----
+
+Convert the DeliData Corpus into ConvoKit format using the following notebook: `Converting DeliData to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/DELI/ConvoKit_DeliData_Conversion.ipynb>`_
+
+To download directly with ConvoKit:
+
+>>> from convokit import Corpus, download
+>>> corpus = Corpus(filename=download("deli-corpus"))
+
+
+For some quick stats:
+
+>>> corpus.print_summary_stats()
+Number of Speakers: 30
+Number of Utterances: 17111
+Number of Conversations: 500
+
+
+Additional note
+---------------
+Data License
+^^^^^^^^^^^^
+
+ConvoKit does not distribute the corpus separately, so no additional data license applies; the license of the original distribution applies.
+
+Contact
+^^^^^^^
+
+Questions regarding the DeliData corpus should be directed to Georgi Karadzhov ([email protected]).
+
+Files
+^^^^^^^
+
+Request the officially released DeliData corpus, without ConvoKit formatting, at: https://delibot.xyz/delidata
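The conversation-level fields described in `deli.rst` tie `team_performance` to the per-participant `sol_tracker_all` snapshots, and `performance_change` to the previous utterance. A minimal Python sketch of that relationship follows. It assumes, hypothetically, that each participant's current solution is a set of card labels and that a participant counts toward performance only when their set matches the correct answer exactly; the released corpus may score solutions differently (for instance, with partial credit). The helpers `team_performance` and `performance_trajectory` are illustrative names, not ConvoKit or DeliData APIs, and the data is made up.

```python
from typing import Dict, FrozenSet, List, Optional

# HYPOTHETICAL representation: a participant's current answer is a set of card labels.
Solution = FrozenSet[str]


def team_performance(state_of_mind: Dict[str, Solution], correct: Solution) -> float:
    """Fraction of participants whose current solution matches the correct one.

    ASSUMPTION: all-or-nothing credit per participant; the corpus may differ.
    """
    if not state_of_mind:
        return 0.0
    hits = sum(1 for solution in state_of_mind.values() if solution == correct)
    return hits / len(state_of_mind)


def performance_trajectory(
    snapshots: List[Dict[str, Solution]], correct: Solution
) -> List[Dict[str, Optional[float]]]:
    """Score each sol_tracker_all-style snapshot and its change from the previous one."""
    rows: List[Dict[str, Optional[float]]] = []
    previous: Optional[float] = None
    for snapshot in snapshots:
        score = team_performance(snapshot, correct)
        # The first snapshot has no previous utterance, so its change is undefined.
        change = None if previous is None else score - previous
        rows.append({"team_performance": score, "performance_change": change})
        previous = score
    return rows


# Illustrative (made-up) data: four participants, two consecutive timesteps.
correct_cards = frozenset({"A", "7"})
snapshots = [
    {"p1": frozenset({"A"}), "p2": frozenset({"A", "7"}),
     "p3": frozenset({"A", "4"}), "p4": frozenset({"K"})},
    {"p1": frozenset({"A", "7"}), "p2": frozenset({"A", "7"}),
     "p3": frozenset({"A", "7"}), "p4": frozenset({"K"})},
]

for row in performance_trajectory(snapshots, correct_cards):
    print(row)
```

In this toy example one of four participants matches at the first timestep (0.25); after two more converge the score is 0.75, and `performance_change` records the 0.5 gain.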
@@ -0,0 +1,67 @@
+Federal Open Market Committee (FOMC) Corpus
+===========================================
+
+Transcripts of recurring meetings of the Federal Reserve’s Open Market Committee (FOMC), where important aspects of U.S. monetary policy are decided, covering the period 1977-2008. The corpus contains 108,504 utterances by 364 speakers across 268 meetings.
+
+Distributed together with:
+`Talk it up or play it down? (Un)expected correlations between (de-)emphasis and recurrence of discussion points in consequential U.S. economic policy meetings <https://chenhaot.com/papers/de-emphasis-fomc.html>`_. Chenhao Tan and Lillian Lee. Presented in Text As Data 2016.
+
+Dataset details
+---------------
+
+Speaker-level information
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Speakers in this dataset are FOMC members, indexed by their name as recorded in the transcripts.
+    * id: name of the speaker
+    * chair: (boolean) is speaker FOMC Chair
+    * vice_chair: (boolean) is speaker FOMC Vice-Chair
+
+Utterance-level information
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For each utterance, we provide:
+    * id: index of the utterance (concatenating the meeting date with the utterance’s sequence position)
+    * speaker: the speaker who authored the utterance
+    * conversation_id: ID of meeting
+    * reply_to: id of the sequentially prior utterance (None for the first utterance of a meeting)
+    * text: textual content of the utterance
+    * timestamp: calculated value based on the date of the meeting and the speech index
+
+Metadata for utterances includes:
+    * speech_index: index of the utterance within the conversation
+    * parsed: parsed version of the utterance text, represented as a spaCy Doc
+
+Conversation-level information
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Conversations are indexed by a string representing the meeting date.
+
+Usage
+-----------
+
+Convert the FOMC Corpus into ConvoKit format using the following notebook: `Converting FOMC Corpus to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/FOMC/fomc_to_convokit.ipynb>`_
+
+To download directly with ConvoKit:
+
+>>> from convokit import Corpus, download
+>>> corpus = Corpus(filename=download("fomc-corpus"))
+
+
+For some quick stats:
+
+>>> corpus.print_summary_stats()
+Number of Speakers: 364
+Number of Utterances: 108504
+Number of Conversations: 268
+
+
+Additional note
+---------------
+
+The original dataset can be downloaded `here <https://chenhaot.com/pages/de-emphasis-fomc.html>`_. Refer to the original README for more details on dataset construction.
+
+Contact
+^^^^^^^
+
+Please email any questions to: [email protected] (Cristian Danescu-Niculescu-Mizil).
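The FOMC utterance fields described in `fomc.rst` (`id`, `conversation_id`, `reply_to`, `timestamp`, and the `speech_index` metadata) can all be derived from the ordered list of turns in one meeting. The Python sketch below shows one way to do so. The `YYYYMMDD` date format, the `_` separator in utterance IDs, and the rule of adding one second per speech index to the meeting date's midnight (UTC) are assumptions for illustration; the conversion notebook may use different conventions. `build_meeting_utterances` is a hypothetical helper, not part of ConvoKit, and the speaker names are placeholders.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

DATE_FORMAT = "%Y%m%d"  # ASSUMPTION: format of the meeting-date string


def build_meeting_utterances(
    meeting_date: str, turns: List[Tuple[str, str]]
) -> List[Dict[str, object]]:
    """Turn an ordered list of (speaker, text) pairs into utterance records."""
    # Midnight UTC of the meeting date, in whole seconds since the epoch.
    base = int(
        datetime.strptime(meeting_date, DATE_FORMAT)
        .replace(tzinfo=timezone.utc)
        .timestamp()
    )
    records: List[Dict[str, object]] = []
    previous_id: Optional[str] = None
    for speech_index, (speaker, text) in enumerate(turns):
        # ASSUMPTION: ID = meeting date + "_" + position in the meeting.
        utt_id = f"{meeting_date}_{speech_index}"
        records.append(
            {
                "id": utt_id,
                "speaker": speaker,
                "conversation_id": meeting_date,  # meetings are keyed by date
                "reply_to": previous_id,  # None for the first utterance
                "text": text,
                # ASSUMPTION: one second per speech index keeps utterances ordered.
                "timestamp": base + speech_index,
                "meta": {"speech_index": speech_index},
            }
        )
        previous_id = utt_id
    return records


records = build_meeting_utterances(
    "19770118", [("SPEAKER_A", "Good morning."), ("SPEAKER_B", "Thank you.")]
)
for record in records:
    print(record["id"], record["reply_to"], record["timestamp"])
```

Because each `reply_to` points at the immediately preceding utterance, every meeting forms a single linear reply chain, consistent with the `reply_to` description above.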