Commit

Deploying to gh-pages from @ b4f2861 🚀
guillaumekln committed Aug 31, 2023
1 parent 2328530 commit 39b8b17
Showing 99 changed files with 484 additions and 222 deletions.
2 changes: 1 addition & 1 deletion .buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 07bea52d53b305a69f833d4857a6839a
config: 62bfe7efc6f2f06f839453ad2130ad76
tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file modified .doctrees/environment.pickle
Binary file not shown.
Binary file modified .doctrees/generation.doctree
Binary file not shown.
Binary file modified .doctrees/guides/transformers.doctree
Binary file not shown.
Binary file modified .doctrees/parallel.doctree
Binary file not shown.
Binary file modified .doctrees/python/ctranslate2.Encoder.doctree
Binary file not shown.
Binary file modified .doctrees/python/ctranslate2.GenerationStepResult.doctree
Binary file not shown.
Binary file modified .doctrees/python/ctranslate2.Generator.doctree
Binary file not shown.
Binary file modified .doctrees/python/ctranslate2.Translator.doctree
Binary file not shown.
Binary file modified .doctrees/python/ctranslate2.models.Whisper.doctree
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file modified .doctrees/python/ctranslate2.specs.doctree
Binary file not shown.
4 changes: 0 additions & 4 deletions _sources/generation.md.txt
@@ -99,10 +99,6 @@ step_results = generator.generate_tokens(
At this time the cache size is unlimited and the cache is only cleared when the model is unloaded. Also if the model is loaded on multiple GPUs, each model replica manages its own cache to avoid copying the state between devices.
```

```{seealso}
The example [Chat with Llama 2](https://github.com/OpenNMT/CTranslate2/tree/master/examples/llama2) which caches the system prompt in an interactive chat session.
```

## Special tokens

Special tokens such as the decoder start token `<s>` should be explicitly included in the input if required by the model. No special tokens are added by the generator methods.
15 changes: 11 additions & 4 deletions _sources/guides/transformers.md.txt
@@ -6,6 +6,7 @@ CTranslate2 supports selected models from Hugging Face's [Transformers](https://
* BERT
* BLOOM
* CodeGen
* DistilBERT
* Falcon
* Llama
* M2M100
@@ -119,10 +120,6 @@ predicted_class_ids = logits.argmax(1)
print(predicted_class_ids)
```

```{warning}
The [token type input](https://huggingface.co/docs/transformers/glossary#token-type-ids) is currently not implemented. All tokens are assumed to come from the same sentence.
```

## BLOOM

[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom) is a collection of multilingual language models trained by the [BigScience workshop](https://bigscience.huggingface.co/).
@@ -146,6 +143,16 @@ results = generator.generate_batch([start_tokens], max_length=30, sampling_topk=
print(tokenizer.decode(results[0].sequences_ids[0]))
```

## DistilBERT

[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert) is a small, fast, cheap and light Transformer Encoder model trained by distilling BERT base.

CTranslate2 only implements the `DistilBertModel` class from Transformers which includes the Transformer encoder. Task-specific layers should be run with PyTorch, similar to the example for {ref}`guides/transformers:bert`.

```bash
ct2-transformers-converter --model distilbert-base-uncased --output_dir distilbert-base-uncased
```

## Falcon

[Falcon](https://huggingface.co/tiiuae/falcon-7b) is a collection of generative language models trained by [TII](https://www.tii.ae/). The example below uses "Falcon-7B-Instruct" which is based on "Falcon-7B" and finetuned on a mixture of chat/instruct datasets.
26 changes: 16 additions & 10 deletions _sources/parallel.md.txt
@@ -1,33 +1,35 @@
# Multithreading and parallelism

CTranslate2 has 2 levels of parallelization:
## Intra-op multithreading on CPU

* `inter_threads` which is the maximum number of batches executed in parallel.<br/>**=> Increase this value to increase the throughput.**
* `intra_threads` which is the number of computation threads that is used per batch.<br/>**=> Increase this value to decrease the latency on CPU.**
Most model operations (matmul, softmax, etc.) are using multiple threads on CPU. The number of threads can be configured with the parameter `intra_threads` (the default value is 4):

The total number of computing threads launched by the process is `inter_threads * intra_threads`.

```{note}
Even though the model data are shared between parallel replicas, increasing `inter_threads` will still increase the memory usage as some internal buffers are duplicated for thread safety.
```python
translator = ctranslate2.Translator(model_path, device="cpu", intra_threads=8)
```

On GPU, batches processed in parallel are using separate CUDA streams. Depending on the workload and GPU specifications this may or may not improve the global throughput. For better parallelism on GPU, consider using multiple GPUs as described below.
This multithreading is generally implemented with [OpenMP](https://www.openmp.org/) so the threads behavior can also be customized with the different `OMP_*` environment variables.

When OpenMP is disabled (which is the case for example in the Python ARM64 wheels for macOS), the multithreading is implemented with [`BS::thread_pool`](https://github.com/bshoshany/thread-pool).

## Parallel execution
## Data parallelism

Objects running models such as the [`Translator`](python/ctranslate2.Translator.rst) and [`Generator`](python/ctranslate2.Generator.rst) can be configured to process multiple batches in parallel, including on multiple GPUs:

```python
# Create a CPU translator with 4 workers each using 1 thread:
# Create a CPU translator with 4 workers each using 1 intra-op thread:
translator = ctranslate2.Translator(model_path, device="cpu", inter_threads=4, intra_threads=1)

# Create a GPU translator with 4 workers each running on a separate GPU:
translator = ctranslate2.Translator(model_path, device="cuda", device_index=[0, 1, 2, 3])

# Create a GPU translator with 4 workers each using a different CUDA stream:
# (Note: depending on the workload and GPU specifications this may not improve the global throughput.)
translator = ctranslate2.Translator(model_path, device="cuda", inter_threads=4)
```

When the workers are running on the same device, the model weights are shared to save on memory.

Multiple batches should be submitted concurrently to enable this parallelization. Parallel executions are enabled in the following cases:

* When calling methods from multiple Python threads.
@@ -39,6 +41,10 @@ Multiple batches should be submitted concurrently to enable this parallelization
Parallelization with multiple Python threads is possible because all computation methods release the [Python GIL](https://wiki.python.org/moin/GlobalInterpreterLock).
```

## Model and tensor parallelism

These types of parallelism are not yet implemented in CTranslate2.

## Asynchronous execution

Some methods can run asynchronously with `asynchronous=True`. In this mode, the method returns immediately and the result can be retrieved later:
1 change: 1 addition & 0 deletions _sources/python/ctranslate2.GenerationStepResult.rst.txt
@@ -11,6 +11,7 @@ GenerationStepResult
**Attributes:**

- :obj:`~ctranslate2.GenerationStepResult.batch_id`
- :obj:`~ctranslate2.GenerationStepResult.hypothesis_id`
- :obj:`~ctranslate2.GenerationStepResult.is_last`
- :obj:`~ctranslate2.GenerationStepResult.log_prob`
- :obj:`~ctranslate2.GenerationStepResult.step`
9 changes: 9 additions & 0 deletions _sources/python/ctranslate2.specs.RotaryScalingType.rst.txt
@@ -0,0 +1,9 @@
RotaryScalingType
=================

.. autoclass:: ctranslate2.specs.RotaryScalingType
:members:
:undoc-members:
:inherited-members:

**Inherits from:** :class:`enum.IntEnum`
1 change: 1 addition & 0 deletions _sources/python/ctranslate2.specs.rst.txt
@@ -10,6 +10,7 @@ ctranslate2.specs
ctranslate2.specs.LanguageModelSpec
ctranslate2.specs.LayerSpec
ctranslate2.specs.ModelSpec
ctranslate2.specs.RotaryScalingType
ctranslate2.specs.SequenceToSequenceModelSpec
ctranslate2.specs.TransformerDecoderModelSpec
ctranslate2.specs.TransformerDecoderSpec
2 changes: 1 addition & 1 deletion _static/documentation_options.js
@@ -1,6 +1,6 @@
var DOCUMENTATION_OPTIONS = {
URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'),
VERSION: '3.18.0',
VERSION: '3.19.0',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
1 change: 1 addition & 0 deletions _static/pygments.css
@@ -17,6 +17,7 @@ span.linenos.special { color: #000000; background-color: #ffffc0; padding-left:
.highlight .cs { color: #3D7B7B; font-style: italic } /* Comment.Special */
.highlight .gd { color: #A00000 } /* Generic.Deleted */
.highlight .ge { font-style: italic } /* Generic.Emph */
.highlight .ges { font-weight: bold; font-style: italic } /* Generic.EmphStrong */
.highlight .gr { color: #E40000 } /* Generic.Error */
.highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */
.highlight .gi { color: #008400 } /* Generic.Inserted */
4 changes: 2 additions & 2 deletions conversion.html
@@ -4,7 +4,7 @@
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />

<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Model conversion &mdash; CTranslate2 3.18.0 documentation</title>
<title>Model conversion &mdash; CTranslate2 3.19.0 documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="_static/custom.css" type="text/css" />
@@ -32,7 +32,7 @@
<a href="index.html" class="icon icon-home"> CTranslate2
</a>
<div class="version">
3.18
3.19
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
4 changes: 2 additions & 2 deletions decoding.html
@@ -4,7 +4,7 @@
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />

<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Decoding features &mdash; CTranslate2 3.18.0 documentation</title>
<title>Decoding features &mdash; CTranslate2 3.19.0 documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="_static/custom.css" type="text/css" />
@@ -32,7 +32,7 @@
<a href="index.html" class="icon icon-home"> CTranslate2
</a>
<div class="version">
3.18
3.19
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
4 changes: 2 additions & 2 deletions encoding.html
@@ -4,7 +4,7 @@
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />

<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Text encoding &mdash; CTranslate2 3.18.0 documentation</title>
<title>Text encoding &mdash; CTranslate2 3.19.0 documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="_static/custom.css" type="text/css" />
@@ -32,7 +32,7 @@
<a href="index.html" class="icon icon-home"> CTranslate2
</a>
<div class="version">
3.18
3.19
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
4 changes: 2 additions & 2 deletions environment_variables.html
@@ -4,7 +4,7 @@
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />

<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Environment variables &mdash; CTranslate2 3.18.0 documentation</title>
<title>Environment variables &mdash; CTranslate2 3.19.0 documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="_static/custom.css" type="text/css" />
@@ -32,7 +32,7 @@
<a href="index.html" class="icon icon-home"> CTranslate2
</a>
<div class="version">
3.18
3.19
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
4 changes: 2 additions & 2 deletions faq.html
@@ -4,7 +4,7 @@
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />

<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Frequently asked questions &mdash; CTranslate2 3.18.0 documentation</title>
<title>Frequently asked questions &mdash; CTranslate2 3.19.0 documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="_static/custom.css" type="text/css" />
@@ -32,7 +32,7 @@
<a href="index.html" class="icon icon-home"> CTranslate2
</a>
<div class="version">
3.18
3.19
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
8 changes: 2 additions & 6 deletions generation.html
@@ -4,7 +4,7 @@
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />

<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Text generation &mdash; CTranslate2 3.18.0 documentation</title>
<title>Text generation &mdash; CTranslate2 3.19.0 documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="_static/custom.css" type="text/css" />
@@ -32,7 +32,7 @@
<a href="index.html" class="icon icon-home"> CTranslate2
</a>
<div class="version">
3.18
3.19
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
@@ -226,10 +226,6 @@ <h2>Prompt caching<a class="headerlink" href="#prompt-caching" title="Permalink
<p class="admonition-title">Note</p>
<p>At this time the cache size is unlimited and the cache is only cleared when the model is unloaded. Also if the model is loaded on multiple GPUs, each model replica manages its own cache to avoid copying the state between devices.</p>
</div>
<div class="admonition seealso">
<p class="admonition-title">See also</p>
<p>The example <a class="reference external" href="https://github.com/OpenNMT/CTranslate2/tree/master/examples/llama2">Chat with Llama 2</a> which caches the system prompt in an interactive chat session.</p>
</div>
</section>
<section id="special-tokens">
<h2>Special tokens<a class="headerlink" href="#special-tokens" title="Permalink to this headline"></a></h2>
16 changes: 12 additions & 4 deletions genindex.html
@@ -3,7 +3,7 @@
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Index &mdash; CTranslate2 3.18.0 documentation</title>
<title>Index &mdash; CTranslate2 3.19.0 documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="_static/custom.css" type="text/css" />
@@ -29,7 +29,7 @@
<a href="index.html" class="icon icon-home"> CTranslate2
</a>
<div class="version">
3.18
3.19
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
@@ -515,6 +515,10 @@ <h2 id="H">H</h2>
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="python/ctranslate2.TranslationResult.html#ctranslate2.TranslationResult.hypotheses">hypotheses (ctranslate2.TranslationResult property)</a>
</li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="python/ctranslate2.GenerationStepResult.html#ctranslate2.GenerationStepResult.hypothesis_id">hypothesis_id (ctranslate2.GenerationStepResult property)</a>
</li>
</ul></td>
</tr></table>
@@ -546,14 +550,16 @@ <h2 id="L">L</h2>
</li>
<li><a href="python/ctranslate2.specs.LayerSpec.html#ctranslate2.specs.LayerSpec">LayerSpec (class in ctranslate2.specs)</a>
</li>
<li><a href="python/ctranslate2.specs.RotaryScalingType.html#ctranslate2.specs.RotaryScalingType.Linear">Linear (ctranslate2.specs.RotaryScalingType attribute)</a>
</li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="python/ctranslate2.converters.TransformersConverter.html#ctranslate2.converters.TransformersConverter.load_model">load_model() (ctranslate2.converters.TransformersConverter method)</a>

<ul>
<li><a href="python/ctranslate2.Translator.html#ctranslate2.Translator.load_model">(ctranslate2.Translator method)</a>
</li>
</ul></li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="python/ctranslate2.converters.TransformersConverter.html#ctranslate2.converters.TransformersConverter.load_tokenizer">load_tokenizer() (ctranslate2.converters.TransformersConverter method)</a>
</li>
<li><a href="python/ctranslate2.GenerationStepResult.html#ctranslate2.GenerationStepResult.log_prob">log_prob (ctranslate2.GenerationStepResult property)</a>
@@ -771,6 +777,8 @@ <h2 id="R">R</h2>
<li><a href="python/ctranslate2.specs.WhisperSpec.html#ctranslate2.specs.WhisperSpec.revision">(ctranslate2.specs.WhisperSpec property)</a>
</li>
</ul></li>
<li><a href="python/ctranslate2.specs.RotaryScalingType.html#ctranslate2.specs.RotaryScalingType">RotaryScalingType (class in ctranslate2.specs)</a>
</li>
</ul></td>
</tr></table>

4 changes: 2 additions & 2 deletions guides/fairseq.html
@@ -4,7 +4,7 @@
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />

<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Fairseq &mdash; CTranslate2 3.18.0 documentation</title>
<title>Fairseq &mdash; CTranslate2 3.19.0 documentation</title>
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="../_static/custom.css" type="text/css" />
@@ -32,7 +32,7 @@
<a href="../index.html" class="icon icon-home"> CTranslate2
</a>
<div class="version">
3.18
3.19
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
4 changes: 2 additions & 2 deletions guides/marian.html
@@ -4,7 +4,7 @@
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />

<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Marian &mdash; CTranslate2 3.18.0 documentation</title>
<title>Marian &mdash; CTranslate2 3.19.0 documentation</title>
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="../_static/custom.css" type="text/css" />
@@ -32,7 +32,7 @@
<a href="../index.html" class="icon icon-home"> CTranslate2
</a>
<div class="version">
3.18
3.19
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
4 changes: 2 additions & 2 deletions guides/opennmt_py.html
@@ -4,7 +4,7 @@
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />

<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>OpenNMT-py &mdash; CTranslate2 3.18.0 documentation</title>
<title>OpenNMT-py &mdash; CTranslate2 3.19.0 documentation</title>
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="../_static/custom.css" type="text/css" />
@@ -32,7 +32,7 @@
<a href="../index.html" class="icon icon-home"> CTranslate2
</a>
<div class="version">
3.18
3.19
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">