Add backward compatibility with v0 #1518

RobinPicard · 2025-03-28T21:53:08Z

The objective of this PR is to make the v1 of Outlines backward compatible with the current v0.

We want users to still be able to run their code as long as they were using the regular high-level API (some objects have been deleted and are not supported anymore). For the general approach, I tried to do the following:

modify the v1 code as little as possible (notable exception is the OpenAI model)
keep the legacy code as separate as possible from the v1 code (put in a dedicated v0_legacy directory)
start using the v1 code as soon as possible in the legacy objects

Things this PR does:

Restore the model loading functions from v0 (they return a regular v1 model instance)
Update the OpenAI model to support the __init__ signature of both the v0 and v1 versions. In case of a v0 initialization, the methods of the instance call a legacy_instance attribute that implements the legacy interface
Restore the generate functions (text, regex, json...): they now return a GeneratorV0Adapter that stores a v1 generator while providing the expected interface of v0
Add warnings for everything deprecated
Add tests for all restored objects
Fix little issues in v1 code that were encountered along the way
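
To illustrate the adapter idea described in the list above, here is a minimal, self-contained sketch. It is not the code from this PR: `GeneratorV1`, `legacy_generate_text`, `echo_model` and the `max_tokens`/`stop_at`/`seed` arguments are hypothetical stand-ins, and only the name `GeneratorV0Adapter` is taken from the description. It only shows the general shape: a legacy entry point emits a deprecation warning, then returns an adapter that stores a v1-style generator and exposes a v0-style call.

```python
import warnings


class GeneratorV1:
    """Stand-in for a v1 generator (hypothetical, not the Outlines class)."""

    def __init__(self, model, output_type):
        self.model = model
        self.output_type = output_type

    def __call__(self, prompt, **inference_kwargs):
        # A real generator would run constrained generation here.
        return self.model(prompt, **inference_kwargs)


class GeneratorV0Adapter:
    """Stores a v1-style generator and exposes a v0-style call signature."""

    def __init__(self, v1_generator):
        self.v1_generator = v1_generator

    def __call__(self, prompt, max_tokens=None, stop_at=None, seed=None):
        # Forward the v0-style arguments as keyword arguments,
        # dropping the ones the caller did not set.
        kwargs = {"max_tokens": max_tokens, "stop_at": stop_at, "seed": seed}
        kwargs = {k: v for k, v in kwargs.items() if v is not None}
        return self.v1_generator(prompt, **kwargs)


def legacy_generate_text(model):
    """v0-style entry point (hypothetical): warn, then delegate to v1."""
    warnings.warn(
        "legacy_generate_text is deprecated; build a GeneratorV1 directly.",
        DeprecationWarning,
        stacklevel=2,
    )
    return GeneratorV0Adapter(GeneratorV1(model, output_type=None))


def echo_model(prompt, **kwargs):
    """Toy model that echoes its inputs so the call path is visible."""
    return f"{prompt}|{sorted(kwargs.items())}"
```

Calling `legacy_generate_text(echo_model)("hi", max_tokens=5)` goes through the adapter into the v1-style object, so the legacy layer stays thin and the v1 code path is used as early as possible.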

I have not tested the exllamav2 model yet as it requires a GPU. If someone could try running the tests for it, that would be nice.

outlines/v0_legacy/generate/cfg.py

outlines/v0_legacy/generate/choice.py

outlines/v0_legacy/generate/format.py

outlines/v0_legacy/generate/fsm.py

outlines/v0_legacy/generate/regex.py

outlines/v0_legacy/generate/text.py

rlouf · 2025-04-02T12:16:58Z

I like the thorough explanation in the warning messages. I'm not sure what is going on with the coverage, is everything tested? Also, before merging we should do a smoke test to make sure the deeplearning.ai notebooks can run with this branch.

cpfiffer · 2025-04-03T19:05:36Z

First -- I have no additional comments. Remi mentioned all the ones I would have. I tried a few examples and the deprecation warnings are comprehensive and very clear about what should be changed. Extremely good work here!

I'm also running smoke tests on the DLAI notebook. The warnings are actually fine because all the notebooks disable warning printouts, so everyone should be able to run things as normal if they copy the code over to a v1 outlines install. Otherwise, the course pins outlines==0.2.1.
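
As a rough illustration of why disabled warnings hide the deprecation messages, the sketch below uses only the standard `warnings` module. `legacy_call` and `run_quietly` are hypothetical names, not part of Outlines or the notebooks, and the notebooks may silence warnings in a different way.

```python
import warnings


def legacy_call():
    """Hypothetical stand-in for a deprecated v0 function."""
    warnings.warn("legacy_call is deprecated", DeprecationWarning, stacklevel=2)
    return "ok"


def run_quietly(func):
    """Call func with all warnings ignored and count how many surfaced."""
    with warnings.catch_warnings(record=True) as caught:
        # "ignore" drops every warning before it reaches the recorder.
        warnings.simplefilter("ignore")
        result = func()
    return result, len(caught)
```

With the filter set to "ignore", the deprecated function still returns its normal result and no warning is recorded or printed.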

The code

generator = outlines.generate.json(
    model, 
    Person,
    sampler = outlines.samplers.greedy()
)

yields the error

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[4], line 1
----> 1 from utils import track_logits
      3 generator = outlines.generate.json(
      4     model, 
      5     Person,
      6     sampler = outlines.samplers.greedy()
      7 )
      9 # Add tools to track token probabilities as they are generated

File [~/dottxt/outlines/SC-DotTxt-Outlines-C1/L4/utils.py:16](http://server.languid.ai:8080/lab/tree/SC-DotTxt-Outlines-C1/L4/SC-DotTxt-Outlines-C1/L4/utils.py#line=15)
     13 from numpy.typing import NDArray
     14 import matplotlib.pyplot as plt
---> 16 from outlines.processors.base_logits_processor import OutlinesLogitsProcessor, Array
     18 if TYPE_CHECKING:
     19     from outlines.generate import Generator

ImportError: cannot import name 'Array' from 'outlines.processors.base_logits_processor' ([/home/cameron/dottxt/outlines/outlines/processors/base_logits_processor.py](http://server.languid.ai:8080/lab/tree/SC-DotTxt-Outlines-C1/L4/outlines/processors/base_logits_processor.py))

RobinPicard · 2025-04-08T15:18:17Z

I've added a lot of tests to increase our coverage

cpfiffer · 2025-04-08T18:26:30Z

The exllamav2 tests require nvcc>11, which is a pain to upgrade. I may have to defer it to another time.

cpfiffer · 2025-04-08T18:40:55Z

Update on the ImportError reported above (cannot import name 'Array' from 'outlines.processors.base_logits_processor'):

This is due to some custom code I have in the DeepLearning.ai notebooks that does a bunch of weird stuff to the logit processor. This code is very much a bandaid and wasn't really intended to be production code.

That code was sort of formalized here, but it needs more work on the interface and shifting it to use the new v1 interface.

We might be able to just let this one go, especially because it's more of a perk feature than anything else. I flagged it as experimental in the videos so I'm less worried.

cpfiffer · 2025-04-08T18:42:00Z

The next error I found is this one here, due to changes in the regex DSL:

from outlines.types import sentence, digit
from outlines.types.dsl import to_regex

# Write between 1-3 Sentences
reasoning = "Reasoning: " + sentence.repeat(1,2)
# Answer in 1-4 digits
answer = "So the answer is: " + digit.repeat(1,4)

to_regex(reasoning + "\n" + answer)

Error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[18], line 5
      2 from outlines.types.dsl import to_regex
      4 # Write between 1-3 Sentences
----> 5 reasoning = "Reasoning: " + sentence.repeat(1,2)
      6 # Answer in 1-4 digits
      7 answer = "So the answer is: " + digit.repeat(1,4)

AttributeError: 'Regex' object has no attribute 'repeat'

RobinPicard · 2025-04-08T19:41:27Z

This is a change that's already in the main branch. I'll open a different PR to address it. Is this code using the repeat method also from the DeepLearning.ai notebooks?

cpfiffer · 2025-04-08T20:13:10Z

Yeah, that stuff is used in DLAI.

RobinPicard requested review from jeffreyenos, willkurt and cpfiffer March 28, 2025 21:56

RobinPicard force-pushed the add_backward_compatibility branch 2 times, most recently from c318b9b to a96ab23 Compare March 30, 2025 12:23

RobinPicard added 10 commits March 31, 2025 14:21

Restore the load_lora method to LlamaCpp with a warning

7abe302

Fix error in the generate_stream method of MLXLM

cb155a9

Restore the load_lora method to VLLM with a warning

4a6f897

Small fixes in the Transformers model

f42abdb

Add a whitespace_pattern to JsonSchema

f2f9c5b

Restore the base, function and samplers files in v0_legacy

3bada54

Restore the v0 models' loading functions and update the OpenAI model

cf240a3

Restore the generate functions

52b7493

Include legacy code in the root level __init__

1368fc1

Add tests for v0 legacy code

7c8bbd9

RobinPicard force-pushed the add_backward_compatibility branch from a96ab23 to 7c8bbd9 Compare March 31, 2025 13:49

RobinPicard marked this pull request as ready for review March 31, 2025 14:03

RobinPicard requested a review from rlouf March 31, 2025 14:03