Merge pull request #4 from BeastByteAI/ner-docs

OKUA1 · web-flow · commit ffc385bafa5e · 2024-07-06T23:45:38.000+02:00
chain of thought docs
diff --git a/src/app/docs/chain-of-thought-text-classification/page.md b/src/app/docs/chain-of-thought-text-classification/page.md
@@ -0,0 +1,46 @@
+---
+title: Chain-of-thought text classification
+nextjs:
+  metadata:
+    title: Chain-of-thought text classification
+    description: Learn about chain-of-thought text classification.
+---
+
+## Overview
+
+Chain-of-thought text classification is similar to zero-shot classification in that it does not require any labeled data beforehand. The difference is that, in addition to the label itself, the model generates the reasoning behind its choice. In some cases this approach can lead to much better performance, at the cost of higher token consumption.
+
+Example using GPT-4o:
+
+```python
+from skllm.models.gpt.classification.zero_shot import CoTGPTClassifier
+from skllm.datasets import get_classification_dataset
+
+# demo sentiment analysis dataset
+# labels: positive, negative, neutral
+X, y = get_classification_dataset()
+
+clf = CoTGPTClassifier(model="gpt-4o")
+clf.fit(X, y)
+predictions = clf.predict(X)  # one row per text: column 0 = label, column 1 = reasoning
+labels, reasoning = predictions[:, 0], predictions[:, 1]
+```
+
+---
+
+## API Reference
+
+The following API reference only lists the parameters needed for the initialization of the estimator. The remaining methods follow the syntax of a scikit-learn classifier.
+
+### CoTGPTClassifier
+```python
+from skllm.models.gpt.classification.zero_shot import CoTGPTClassifier
+```
+
+| **Parameter** | **Type** | **Description**          |
+| ------------- | -------- | ------------------------ |
+| `model`      | `str`  | Model to use, by default "gpt-3.5-turbo". |
+| `default_label`      | `str`  | Default label for failed predictions; if "Random", a label is selected randomly based on class frequencies, by default "Random". |
+| `prompt_template`      | `Optional[str]`  | Custom prompt template to use, by default None. |
+| `key`      | `Optional[str]`  | Estimator-specific API key; if None, retrieved from the global config, by default None. |
+| `org`      | `Optional[str]`  | Estimator-specific ORG key; if None, retrieved from the global config, by default None. |
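
Note on the example in this page: `predict` is indexed as a two-column array (`predictions[:, 0]` for labels, `predictions[:, 1]` for reasoning). The following is a minimal sketch of handling that output, assuming exactly that shape. The `split_predictions` helper and the sample rows are hypothetical and stand in for a real `clf.predict(X)` call so the snippet runs without an API key.

```python
import numpy as np

# Stand-in for `clf.predict(X)`: one row per input text, with the label in
# column 0 and the reasoning in column 1. The values are made up.
texts = ["Great film, I loved it.", "The plot made no sense."]
predictions = np.array(
    [
        ["positive", "The reviewer says they loved the film."],
        ["negative", "The reviewer complains about the plot."],
    ],
    dtype=object,
)


def split_predictions(predictions):
    """Return (labels, reasoning) from an (n_samples, 2) prediction array."""
    predictions = np.asarray(predictions, dtype=object)
    if predictions.ndim != 2 or predictions.shape[1] != 2:
        raise ValueError("expected an array of shape (n_samples, 2)")
    return predictions[:, 0], predictions[:, 1]


labels, reasoning = split_predictions(predictions)

# Pair every input text with its label and the reasoning behind it.
for text, label, why in zip(texts, labels, reasoning):
    print(f"{label}: {text} ({why})")
```

Checking the shape up front makes a malformed result fail loudly instead of silently producing misaligned labels.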
diff --git a/src/app/docs/few-shot-text-classification/page.md b/src/app/docs/few-shot-text-classification/page.md
@@ -10,7 +10,7 @@ nextjs:
 
 Few-shot text classification is a task of classifying a text into one of the pre-defined classes based on a few examples of each class. For example, given a few examples of the class _positive_, _negative_, and _neutral_, the model should be able to classify a new text into one of these classes.
 
-The estimators provided by Scikit-LLM do not automatically select the subset of the training data, and instead use the entire training set to construct the examples. Therefore, if your training set is large, you might want to consider splitting it into training and validation sets, while keeping the training set small (we recommend not to exceed 10 examples per class).
+The estimators provided by Scikit-LLM do not automatically select the subset of the training data, and instead use the entire training set to construct the examples. Therefore, if your training set is large, you might want to consider splitting it into training and validation sets, while keeping the training set small (we recommend not to exceed 10 examples per class). Additionally, it is advisable to shuffle the samples to [avoid recency bias](https://github.com/iryna-kondr/scikit-llm/issues/104).
 
 Example using GPT-4:
 
@@ -26,13 +26,13 @@ from skllm.datasets import (
 
 # single label
 X, y = get_classification_dataset()
-clf = FewShotGPTClassifier(model="gpt-4")
+clf = FewShotGPTClassifier(model="gpt-4o")
 clf.fit(X,y)
 labels = clf.predict(X)
 
 # multi-label
 X, y = get_multilabel_classification_dataset()
-clf = MultiLabelFewShotGPTClassifier(max_labels=2, model="gpt-4")
+clf = MultiLabelFewShotGPTClassifier(max_labels=2, model="gpt-4o")
 clf.fit(X,y)
 labels = clf.predict(X)
 ```
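
Note on the few-shot guidance in this page (at most 10 examples per class, samples in permuted order): one possible way to prepare such a training set for the single-label case is sketched below. `sample_few_shot` is a hypothetical helper, not part of Scikit-LLM, and it assumes hashable labels, so it does not cover the multi-label dataset.

```python
import random
from collections import defaultdict


def sample_few_shot(X, y, per_class=10, seed=0):
    """Pick up to `per_class` examples per label, then shuffle their order."""
    rng = random.Random(seed)

    # Group the texts by label.
    by_label = defaultdict(list)
    for text, label in zip(X, y):
        by_label[label].append(text)

    # Draw a small random subset from every class.
    pairs = []
    for label, texts in by_label.items():
        k = min(per_class, len(texts))
        pairs.extend((text, label) for text in rng.sample(texts, k))

    # Shuffle across classes so no label is clustered at the end of the prompt.
    rng.shuffle(pairs)
    X_small = [text for text, _ in pairs]
    y_small = [label for _, label in pairs]
    return X_small, y_small


# Toy data standing in for a larger labeled set.
X = [f"text {i}" for i in range(30)]
y = ["positive", "negative", "neutral"] * 10

X_small, y_small = sample_few_shot(X, y, per_class=3, seed=42)
```

The resulting `X_small` and `y_small` could then be passed to `clf.fit` in place of the full dataset, with the remaining samples kept for validation.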
diff --git a/src/lib/navigation.js b/src/lib/navigation.js
@@ -13,6 +13,7 @@ export const navigation = [
       { title: 'Zero-shot text classification', href: '/docs/zero-shot-text-classification' },
       { title: 'Few-shot text classification', href: '/docs/few-shot-text-classification' },
       { title: 'Dynamic few-shot text classification', href: '/docs/dynamic-few-shot-text-classification' },
+      { title: 'Chain-of-thought text classification', href: '/docs/chain-of-thought-text-classification' },
       { title: 'Tunable text classification', href: '/docs/tunable-text-classification' },
     ],
   },