Pydantic

What's coming in V2

&

A review of building python extensions with Rust

Please ask questions as I go along ... I've written this in a hurry so if you're confused, you're probably not alone

Some Background

  • Pydantic does data validation using type hints
  • It was first released in 2017 as a tiny experiment
  • The library has since seen massive growth ... particularly since Sebastián used pydantic in FastAPI. Thanks Sebastián! 🙏
  • Pydantic hasn't been significantly rewritten since v0.0.1
  • The internals are creaking
  • V2 is an opportunity to fix some of the footguns, but also to rewrite the internals

The good bits

Everything! (and hopefully nothing for you):

  • all validation is offloaded to another library - pydantic-core which I'm building now

  • Ugly hangovers from mistakes I made while building v0.0.1 can be killed
  • We can use a little of the speed premium provided by pydantic-core to do some stuff more correctly - e.g. smart unions
  • I've given up on the "validation" vs. "parsing" vs. "coercion" debate, I'm just using "validate" - if you don't like it 🖕 (or just use strict=True)

What's changing in V2

"You can't make a Tomlette without breaking a few Gregs!" (Succession S2E9) - best TV pun ever?

I'm going to have to upset some people to get V2 out:

  • a bunch of PRs will have to be closed - users can come back and restart them once the meat of the V2 changes are in main
  • Some functionality will change, I think for the better, but some people will disagree
  • Some hacks and workarounds will no longer be possible as the logic is in rust where you can't mess with it
  • no more subclasses of basic types (str, bytes) - except with wrap decorators
  • (initially) no pure python implementation of pydantic-core

The bad bits

What's changing in V2

Wrap Validators

from datetime import datetime
from pydantic import BaseModel, ValidationError, validator

class MyModel(BaseModel):
    appointment_time: datetime | None
    
    @validator('appointment_time', mode='wrap')
    def validate_appointment_time(cls, v, handler):
        if v == 'now':
            return datetime.now()
        
        try:
            return handler(v)
        except ValidationError:
            # we don't want to fail, so just use None
            return None

New Features

(Implemented, but not with this nice syntactic sugar)

AKA "The onion" - like middleware
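
The onion idea can be sketched in pure Python without pydantic. This is only an illustration under stated assumptions: `ValidationFailed`, `inner_validate_datetime` and `wrap` are hypothetical names, and the real inner handler would live in pydantic-core rather than in Python.

```python
from datetime import datetime
from typing import Any, Callable, Optional


class ValidationFailed(ValueError):
    """Hypothetical stand-in for pydantic's ValidationError."""


def inner_validate_datetime(value: Any) -> datetime:
    """The inner layer: accept datetimes and ISO 8601 strings only."""
    if isinstance(value, datetime):
        return value
    if isinstance(value, str):
        try:
            return datetime.fromisoformat(value)
        except ValueError as exc:
            raise ValidationFailed(str(exc)) from exc
    raise ValidationFailed(f'cannot parse {value!r} as a datetime')


def wrap(outer: Callable[[Any, Callable[[Any], Any]], Any],
         handler: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Return a validator in which `outer` runs around `handler`, like middleware."""
    def validate(value: Any) -> Any:
        return outer(value, handler)
    return validate


def appointment_time(value: Any, handler: Callable[[Any], Any]) -> Optional[datetime]:
    if value == 'now':          # handled before the inner validator runs
        return datetime.now()
    try:
        return handler(value)   # delegate to the inner validator
    except ValidationFailed:
        return None             # swallow the error, as on the slide


validate_appointment = wrap(appointment_time, inner_validate_datetime)
```

Because the outer function decides when (and whether) to call `handler`, it can act before validation, after it, or instead of it.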

Strict Mode

from typing import Set
from pydantic import BaseModel, Field, StrictStr

class StrictModel(BaseModel, strict=True):
    a_string: str
    an_int: int
    set_of_ints: Set[int]


class LaxModel(BaseModel):
    lax_string: str
    strict_string1: str = Field(..., strict=True)
    strict_string2: StrictStr

New Features

(Implemented, but not with this nice syntactic sugar)

(Does not apply when "downcasting" from JSON)
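
A rough pure-Python sketch of how a model-level and a field-level strict setting could combine. The function names and the lax rule shown (decoding bytes to str) are assumptions for illustration, not pydantic's actual conversion table.

```python
from typing import Any, Optional


def validate_str(value: Any, strict: bool) -> str:
    """Validate `value` as a str in strict or lax mode."""
    if isinstance(value, str):
        return value
    if not strict and isinstance(value, bytes):
        # lax mode: accept UTF-8 bytes (an assumed rule for this sketch)
        return value.decode()
    raise TypeError(f'expected str, got {type(value).__name__}')


def resolve_strict(model_strict: bool, field_strict: Optional[bool]) -> bool:
    """A field-level setting, when given, overrides the model-level one."""
    return model_strict if field_strict is None else field_strict
```

In this sketch a field on a lax model can opt in to strictness on its own, which is the behaviour the `LaxModel` example describes.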

Smart Union

class MyModel(BaseModel):
    int_or_bool: int | bool
    bool_or_int: bool | int


print(MyModel(int_or_bool=1))  #> 1, 1
print(MyModel(int_or_bool=True))  #> True, True
print(MyModel(int_or_bool='1'))  #> 1, True :-( ?

New Features

(Implemented, but not with this nice syntactic sugar)

(Using strict mode)

(Also works with models and model instances)
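
One way to read "(Using strict mode)" is a two-pass union: try every member strictly first, then fall back to lax validation in declaration order. The sketch below makes that assumption; the validators and their lax rules are hypothetical and far simpler than the real ones.

```python
from typing import Any, Callable, Sequence


def validate_int(value: Any, strict: bool) -> int:
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if not strict and isinstance(value, (str, bool)):
        return int(value)
    raise ValueError(f'not an int: {value!r}')


def validate_bool(value: Any, strict: bool) -> bool:
    if isinstance(value, bool):
        return value
    if not strict and value in (0, 1, '0', '1'):
        return value in (1, '1')
    raise ValueError(f'not a bool: {value!r}')


Validator = Callable[[Any, bool], Any]


def smart_union(value: Any, choices: Sequence[Validator]) -> Any:
    # Pass 1: an exact (strict) match wins regardless of member order
    for choice in choices:
        try:
            return choice(value, True)
        except ValueError:
            pass
    # Pass 2: fall back to lax validation, in declaration order
    for choice in choices:
        try:
            return choice(value, False)
        except ValueError:
            pass
    raise ValueError(f'{value!r} matches no member of the union')
```

With this approach `1` and `True` keep their types whichever way round the union is written; only inputs with no exact match, such as `'1'`, still depend on member order.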

Optional vs. Nullable

class MyModel1(BaseModel):
    none_allowed_required: str | None
    none_allowed_not_required: str | None = None

      
from typing_extensions import Required, NotRequired


class MyModel2(BaseModel):  # ... maybe, do you want this?
    none_allowed_required: Required[str]
    none_allowed_not_required: NotRequired[str]

New Features

(Implemented, but not with this nice syntactic sugar)

(I'm no longer scared of the word "optional")
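
The two ideas are independent: "required" is about whether the key must be present, "nullable" is about whether None is an accepted value. A minimal sketch, with `check_field` and `MISSING` as hypothetical names:

```python
from typing import Any, Dict

MISSING = object()  # sentinel: distinguishes "key absent" from "value is None"


def check_field(data: Dict[str, Any], name: str, *, required: bool,
                nullable: bool, default: Any = None) -> Any:
    """Apply the required and nullable rules to a single field."""
    value = data.get(name, MISSING)
    if value is MISSING:
        if required:
            raise KeyError(f'{name}: field required')
        return default
    if value is None and not nullable:
        raise TypeError(f'{name}: None is not allowed')
    return value
```

Here `str | None` with no default corresponds to `required=True, nullable=True`, and `str | None = None` to `required=False, nullable=True`.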

Validation without a model

from typing import List
from pydantic import validate

validate(List[int], [1, 2, '3'])  #> [1, 2, 3]

validate(List[int], [1, 2, 3], strict=True)  #> [1, 2, 3]

validate(List[int], [1, 2, '3'], strict=True)
#> raises ValidationError

New Features

(Implemented, but not with this nice syntactic sugar)

No intermediate model required (unlike parse_obj in V1)
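
To show what validating against a bare type hint involves, here is a toy `validate` that supports only `int` and a parameterised `List[...]`. It is a hypothetical stand-in written for this example and says nothing about how pydantic implements the function.

```python
from typing import Any, List, get_args, get_origin


def validate(tp: Any, value: Any, strict: bool = False) -> Any:
    """Toy validator: walks the type hint and validates `value` against it."""
    if tp is int:
        if isinstance(value, int) and not isinstance(value, bool):
            return value
        if not strict and isinstance(value, str):
            try:
                return int(value)
            except ValueError:
                pass
        raise ValueError(f'expected int, got {value!r}')
    if get_origin(tp) is list:
        if not isinstance(value, list):
            raise ValueError(f'expected list, got {value!r}')
        (item_type,) = get_args(tp)  # List[int] -> int
        return [validate(item_type, item, strict) for item in value]
    raise NotImplementedError(f'unsupported type: {tp!r}')
```

The type hint itself drives the recursion, so no wrapper model class is needed.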

Parsing JSON directly

from typing import Dict, List

class MyModel(BaseModel):
    name: str
    age: int
    friends: List[int]
    settings: Dict[str, float]

MyModel.validate_json('{...}')

New Features

(Implemented, but not with this nice syntactic sugar)

No json.loads - just rust JSON parsing straight into validation

  • We could add support for other formats (e.g. yaml, toml); the only side effect would be bigger binaries
  • Not yet possible to get the line number :-(

Speed

New Features

Benchmark                                             Speed up
Simple model (str, int, List[int], Dict[str, float])  15.97x
Simple model - JSON                                   11.56x
A bool (single value)                                 3.46x
Recursive model, 50 deep                              3.99x
list of typed dicts, length 100                       12.14x
list of ints, length 1000                             25.49x

And more...

New Features

  • hopefully 🤞 less "performance guilt"™
  • context kwarg to validator functions
  • input value is available in errors
  • cleaning up the namespace - so you can use fields like "json", "dict" and "fields" and more importantly, so we can add more methods in future, either:
    • all methods will have a prefix, e.g. ".model_dict()", ".model_json()", ".model_schema()"
    • or, a namespace object: ".m.dict()", ".m.json()"

pydantic-core internals

from pydantic_core import SchemaValidator
schema_validator = SchemaValidator({'type': 'bool'})
print(repr(schema_validator))

Python extensions in Rust

(This code actually runs now!)

Let's start simple

SchemaValidator(name="bool", validator=BoolValidator)
print(schema_validator.validate_python(True))  #> True
print(schema_validator.validate_python(1))     #> True
print(schema_validator.validate_json('true'))  #> True

pydantic-core internals

from pydantic_core import SchemaValidator
# Equivalent to: Dict[str, Optional[int]]
schema_validator = SchemaValidator({
    'type': 'dict',
    'keys': {'type': 'str'},
    'values': {'type': 'optional', 'schema': {'type': 'int'}}
})

Python extensions in Rust

Let's get a bit more complicated

SchemaValidator(name="dict", validator=DictValidator {
    strict: false,
    key_validator: Some(StrValidator),
    value_validator: Some(
        OptionalValidator {validator: IntValidator},
    ),
    min_items: None,
    max_items: None,
    try_instance_as_dict: false,
})
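
The schema above is turned into a tree of validators once, and that tree is then reused for every input. A pure-Python sketch of the same idea using closures; the function names are hypothetical and only the `int`, `str`, `optional` and `dict` types are covered.

```python
from typing import Any, Callable, Dict

Validator = Callable[[Any], Any]


def validate_int(value: Any) -> int:
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if isinstance(value, str):
        return int(value)  # lax: raises ValueError for non-numeric strings
    raise ValueError(f'expected int, got {value!r}')


def validate_str(value: Any) -> str:
    if isinstance(value, str):
        return value
    raise ValueError(f'expected str, got {value!r}')


def build_validator(schema: Dict[str, Any]) -> Validator:
    """Recursively turn a schema dict into a validation function."""
    type_ = schema['type']
    if type_ == 'int':
        return validate_int
    if type_ == 'str':
        return validate_str
    if type_ == 'optional':
        inner = build_validator(schema['schema'])
        return lambda value: None if value is None else inner(value)
    if type_ == 'dict':
        key_validator = build_validator(schema['keys'])
        value_validator = build_validator(schema['values'])

        def validate_dict(value: Any) -> Dict[Any, Any]:
            if not isinstance(value, dict):
                raise ValueError(f'expected dict, got {value!r}')
            return {key_validator(k): value_validator(v) for k, v in value.items()}

        return validate_dict
    raise ValueError(f'unknown schema type: {type_!r}')


# Same shape as the schema on the slide: Dict[str, Optional[int]]
validator = build_validator({
    'type': 'dict',
    'keys': {'type': 'str'},
    'values': {'type': 'optional', 'schema': {'type': 'int'}},
})
```

All schema inspection happens inside `build_validator`; calling `validator(...)` afterwards only runs the prebuilt closures.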

pydantic-core internals

class MyModel(BaseModel):
    name: str
    age: int | None = 42
    settings: dict[str, float]
    friends: list[int | str]

Python extensions in Rust

And finally...

(you don't need to read all this)

SchemaValidator(name="MyCoreModel", 
  validator=ModelClassValidator {
    strict: false,
    class: Py(0x12fe7e7c0), (MyCoreModel)
    new_method: Py(0x00101054130), (MyCoreModel.__new__)
    validator: ModelValidator {
        name: "Model",
        fields: [
            ModelField {
                name: "name",
                default: None,
                validator: StrValidator,
            },
            ModelField {
                name: "age",
                default: 42,
                validator: OptionalValidator { 
                  validator: IntValidator 
                },
            },
            ModelField {
                name: "settings",
                default: None,
                validator: DictValidator { ... },
            },
            ModelField {
                name: "friends",
                default: None,
                validator: ListValidator {
                    strict: false,
                    item_validator: Some(
                        UnionValidator {
                            choices: [
                                IntValidator,
                                StrValidator,
                            ],
                        },
                    ),
                    min_items: None,
                    max_items: None,
                },
            },
        ],
        extra_behavior: Ignore,
        extra_validator: None,
    },
})

Many other people have said all this, and there are many (much better) talks about it.

But for completeness, the good:

  • Speed is the biggest win
  • Provides a way to hook into existing great libraries written in rust - rtoml and watchfiles - when is someone going to do this for ASGI?
  • Rust's error handling makes it easy to catch and deal with errors
  • Rust provides excellent primitives for threading
  • pyo3 is amazing, getting started is very easy

The bad:

  • Writing rust will always be slower than writing python
  • there's a big learning curve

The ugly:

  • Fighting the borrow checker is boring
  • There will be boilerplate, macros to avoid boilerplate can be even worse...

 

The obvious things

Writing Python extensions in Rust

  • "Performance guilt":
    • With rust there's no penalty for recursion
    • and no penalty for small functions
    • so ... you can build more modular code without paying a performance penalty
    • The theory is that someone can come along in 5 years time and add another type to pydantic-core, and:
      • The type checker and linter will stop them doing dumb things
      • There will be zero runtime penalty if you don't use it
      • Their change can be small since there's no performance penalty for calling out to existing code
  • Contributions ... ?
    • Will there be fewer? Will they be "better", or worse?

The less obvious things

Writing Python extensions in Rust

#[derive(Debug, Clone)]
pub struct OptionalValidator {
    validator: Box<dyn Validator>,
}

impl OptionalValidator {
    pub const EXPECTED_TYPE: &'static str = "optional";
}

impl Validator for OptionalValidator {
    fn build(schema: &PyDict, config: Option<&PyDict>) -> PyResult<Box<dyn Validator>> {
        let schema: &PyAny = schema.get_as_req("schema")?;
        Ok(Box::new(Self {
            validator: build_validator(schema, config)?.0,
        }))
    }

    fn validate<'s, 'data>(
        &'s self,
        py: Python<'data>,
        input: &'data dyn Input,
        extra: &Extra,
    ) -> ValResult<'data, PyObject> {
        match input.is_none() {
            true => Ok(py.None()),
            false => self.validator.validate(py, input, extra),
        }
    }

    fn validate_strict<'s, 'data>(
        &'s self,
        py: Python<'data>,
        input: &'data dyn Input,
        extra: &Extra,
    ) -> ValResult<'data, PyObject> {
        match input.is_none() {
            true => Ok(py.None()),
            false => self.validator.validate_strict(py, input, extra),
        }
    }

    fn set_ref(&mut self, name: &str, validator_arc: &ValidatorArc) -> PyResult<()> {
        self.validator.set_ref(name, validator_arc)
    }

    validator_boilerplate!(Self::EXPECTED_TYPE);
}

Writing Python extensions in Rust, example... (please don't cry)

mod optional;

...
lots of code...
...

    validator_match!(
        type_,
        dict,
        config,
        ... all the other validators
        // unions
        self::union::UnionValidator,
        self::optional::OptionalValidator,
        ...
    )
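
For readers who do not write Rust, the shape of the two snippets above can be approximated in Python: each validator class declares the schema type it handles, and a lookup table plays the role of `validator_match!`. This is a loose analogy with hypothetical names, not a translation of the real pydantic-core code.

```python
from typing import Any, Dict, Type


class Validator:
    EXPECTED_TYPE = ''

    @classmethod
    def build(cls, schema: Dict[str, Any]) -> 'Validator':
        return cls()

    def validate(self, value: Any) -> Any:
        raise NotImplementedError


class IntValidator(Validator):
    EXPECTED_TYPE = 'int'

    def validate(self, value: Any) -> int:
        if isinstance(value, int) and not isinstance(value, bool):
            return value
        raise ValueError(f'expected int, got {value!r}')


class OptionalValidator(Validator):
    EXPECTED_TYPE = 'optional'

    def __init__(self, validator: Validator) -> None:
        self.validator = validator

    @classmethod
    def build(cls, schema: Dict[str, Any]) -> 'OptionalValidator':
        # build the inner validator from the nested "schema" key
        return cls(build_validator(schema['schema']))

    def validate(self, value: Any) -> Any:
        return None if value is None else self.validator.validate(value)


# Stands in for validator_match!: one entry per validator class
REGISTRY: Dict[str, Type[Validator]] = {
    cls.EXPECTED_TYPE: cls for cls in (IntValidator, OptionalValidator)
}


def build_validator(schema: Dict[str, Any]) -> Validator:
    type_ = schema['type']
    validator_class = REGISTRY.get(type_)
    if validator_class is None:
        raise ValueError(f'unknown schema type: {type_!r}')
    return validator_class.build(schema)
```

Adding a new type in this sketch means writing one class and adding it to the tuple that feeds `REGISTRY`, which mirrors the two steps shown in Rust: implement the validator, then list it in the match.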

Writing Python extensions in Rust, example...

using my validator

Questions?