Garbage in →

Pydantic →

you're golden!

Samuel Colvin

PyCon LT 18 May 2023

https://pretalx.com/pycon-lt-2023/talk/GMKHFE/

Today

What is Pydantic, why do people seem to like it?
Trouble in paradise
Rust to the rescue - Good, Bad, Ugly
Examples of how Rust helps Pydantic V2 solve your problems
Live demo!

Pydantic

from datetime import datetime
from pydantic import BaseModel


class Talk(BaseModel):
    title: str
    when: datetime | None = None
    mistakes: list[str]

Just type hints get you:

Validation
Coercion/transformation
Serialization
JSON Schema
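
A minimal sketch of those four features, assuming Pydantic V2 is installed. The sample data is made up, and `Optional[datetime]` stands in for `datetime | None` so the snippet also runs on Python 3.9:

```python
import json
from datetime import datetime
from typing import Optional

from pydantic import BaseModel, ValidationError


class Talk(BaseModel):
    title: str
    when: Optional[datetime] = None
    mistakes: list[str]


# Validation + coercion: the ISO string becomes a datetime object.
talk = Talk.model_validate(
    {'title': 'Garbage in', 'when': '2023-05-18T10:00:00', 'mistakes': ['too fast']}
)

# Serialization: back out to a JSON string.
as_json = talk.model_dump_json()

# JSON Schema: generated from the same type hints.
schema = Talk.model_json_schema()

# Invalid input raises ValidationError instead of passing through silently.
try:
    Talk(title='oops', mistakes='not a list')
    rejected = False
except ValidationError:
    rejected = True
```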

You people seemed to like it:

58m downloads/mo
used by all FAANG companies
12% of pro web developers

30s - understand

3m - useful

300hr - usable

Empathy for the developers using our library

But there's a problem...

Pydantic V2

Priorities for V2:

Performance - it was good, but could be better - think of the penguins!
Strict Mode - live up to the name
Composability - you don't always want a model
Maintainability - I maintain Pydantic so I want maintaining Pydantic to be fun

Sad penguin, no snow

What would it look like if we started from scratch?

What about Rust?

The obvious advantages...

Performance
Multithreading - no GIL
Reusing high quality Rust libraries
More explicit error handling

(maybe) Less obvious advantages:

Virtually zero cost customisation, even in hot code
Arguably easier to maintain - the compiler picks up more mistakes

Rust - the good

But perhaps most pertinent to Pydantic...

from pydantic import BaseModel

class Qualification(BaseModel):
    name: str
    description: str
    required: bool
    value: int


class Student(BaseModel):
    id: int
    name: str
    qualifications: list[Qualification]
    friends: list[int]

[
    ...,
    ...,
    ...,
    ...,
    ...,
    ...,
    ...,
    ...,
    ...,
    ...,
    ...,
    ...,
]

Rust loves this

Deeply recursive code - no stack frames
Small modular components

How Rust?

What does that tree look like?

class Talk(BaseModel):
    title: Annotated[
        str,
        MaxLen(100)
    ]
    attendance: PosInt
    when: datetime | None = None
    mistakes: list[
        tuple[timedelta, str]
    ]

ModelValidator {
  cls: Talk,
  validator: TypedDictValidator [
    Field {
      key: "title",
      validator: StrValidator { max_len: 100 },
    },
    Field {
      key: "attendance",
      validator: IntValidator { min: 0 },
    },
    Field {
      key: "when",
      validator: UnionValidator [
        DateTimeValidator {},
        NoneValidator {},
      ],
      default: None,
    },
    Field {
      key: "mistakes",
      validator: ListValidator {
        item_validator: TupleValidator [
          TimedeltaValidator {},
          StrValidator {},
        ],
      },
    },
  ],
}
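
To make the tree idea concrete, here is a pure-Python sketch of small validators composed into a tree. It is only an illustration of the structure, not pydantic-core's implementation: the class names mirror the tree above, but the `validate` methods, the coercion rules and the simplified `mistakes` field (a plain list of strings) are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StrValidator:
    max_len: Optional[int] = None

    def validate(self, value: Any) -> str:
        if not isinstance(value, str):
            raise ValueError('expected a string')
        if self.max_len is not None and len(value) > self.max_len:
            raise ValueError(f'string longer than {self.max_len}')
        return value


@dataclass
class IntValidator:
    min: Optional[int] = None

    def validate(self, value: Any) -> int:
        result = int(value)  # lax: numeric strings such as '100' are coerced
        if self.min is not None and result < self.min:
            raise ValueError(f'expected a value >= {self.min}')
        return result


@dataclass
class ListValidator:
    item_validator: Any

    def validate(self, value: Any) -> list:
        # Each node only knows about its direct child.
        return [self.item_validator.validate(item) for item in value]


@dataclass
class TypedDictValidator:
    fields: dict  # field name -> validator

    def validate(self, value: Any) -> dict:
        return {key: v.validate(value[key]) for key, v in self.fields.items()}


tree = TypedDictValidator({
    'title': StrValidator(max_len=100),
    'attendance': IntValidator(min=0),
    'mistakes': ListValidator(StrValidator()),
})
result = tree.validate({'title': 'Hello', 'attendance': '100', 'mistakes': ['Too short']})

try:
    tree.validate({'title': 'Hello', 'attendance': -1, 'mistakes': []})
    negative_rejected = False
except ValueError:
    negative_rejected = True
```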

Python Interface to Rust

from pydantic_core import SchemaValidator


class Talk:
    ...

talk_validator = SchemaValidator({
    'type': 'model',
    'cls': Talk,
    'schema': {
        'type': 'typed-dict',
        'fields': {
            'title': {'schema': {'type': 'str', 'max_length': 100}},
            'attendance': {'schema': {'type': 'int', 'ge': 0}},
            'when': {
                'schema': {
                    'type': 'default',
                    'schema': {'type': 'nullable', 'schema': {'type': 'datetime'}},
                    'default': None,
                }
            },
            'mistakes': {
                'schema': {
                    'type': 'list',
                    'items_schema': {
                        'type': 'tuple',
                        'mode': 'positional',
                        'items_schema': [{'type': 'timedelta'}, {'type': 'str'}]
                    }
                }
            },
        },
    }
})

some_data = {
    'title': "How Pydantic V2 leverages Rust's Superpowers",
    'attendance': '100',
    'when': '2023-04-22T12:15:00',
    'mistakes': [
        ('00:00:00', 'Screen mirroring confusion'),
        ('00:00:30', 'Forgot to turn on the mic'),
        ('00:25:00', 'Too short'),
        ('00:40:00', 'Too long!'),
    ],
}
talk = talk_validator.validate_python(some_data)
print(talk.mistakes)
"""
[
    (datetime.timedelta(0), 'Screen mirroring confusion'), 
    (datetime.timedelta(seconds=30), 'Forgot to turn on the mic'), 
    (datetime.timedelta(seconds=1500), 'Too short'), 
    (datetime.timedelta(seconds=2400), 'Too long!')
]
"""

class Talk(BaseModel):
    title: Annotated[
        str,
        MaxLen(100)
    ]
    attendance: PosInt
    when: datetime | None = None
    mistakes: list[
        tuple[timedelta, str]
    ]

Pydantic V2 Architecture

pydantic (pure python):

Read type hints
construct a "core schema"

pydantic-core (binary + stubs + core-schema):

process core schema
return SchemaValidator

pydantic (pure python):

Receive data
call schema_validator(data)

pydantic-core (binary + stubs + core-schema):

run validator
return the result of validation
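
The two phases (build a validator once from the core schema, then run it for every input) can be sketched in plain Python. This is a toy under stated assumptions: `build_validator` here is a hypothetical stand-in that handles only four schema types and uses `int` and `str` as naive coercing validators.

```python
from typing import Any, Callable

Validator = Callable[[Any], Any]


def build_validator(schema: dict) -> Validator:
    """Build step: runs once per schema, dispatching on the 'type' key."""
    kind = schema['type']
    if kind == 'int':
        return int
    if kind == 'str':
        return str
    if kind == 'list':
        item = build_validator(schema['items_schema'])
        return lambda value: [item(v) for v in value]
    if kind == 'typed-dict':
        fields = {k: build_validator(f['schema']) for k, f in schema['fields'].items()}
        return lambda value: {k: v(value[k]) for k, v in fields.items()}
    raise ValueError(f'unknown schema type: {kind!r}')


# Phase 1: schema -> validator (done once, e.g. when the class is defined).
validate = build_validator({
    'type': 'typed-dict',
    'fields': {
        'attendance': {'schema': {'type': 'int'}},
        'mistakes': {'schema': {'type': 'list', 'items_schema': {'type': 'str'}}},
    },
})

# Phase 2: data -> validated result (done for every input).
result = validate({'attendance': '100', 'mistakes': ['Too short', 'Too long!']})
```

All schema inspection happens in phase 1, so the per-input path is just a chain of already-built callables.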

Rust - the bad

from __future__ import annotations
from pydantic import BaseModel


class Foo(BaseModel):
    a: int
    f: list[Foo]


f = {'a': 1, 'f': []}
f['f'].append(f)
Foo(**f)

fn main() {
    main();
}

RecursionError is bad, but no RecursionError is worse!

Also no multiple ownership.
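
One way to turn the infinite recursion into a clean error is to track which input objects are currently being validated. The sketch below shows that guard in plain Python for a self-referencing dict like the one above. The function name and error message are made up, and this is not necessarily how pydantic-core implements it.

```python
from typing import Optional


def validate_foo(data: dict, _active: Optional[set] = None) -> dict:
    """Validate {'a': int, 'f': [<same shape>, ...]} and reject cyclic input."""
    active = set() if _active is None else _active
    if id(data) in active:
        # This exact object is already being validated further up the stack.
        raise ValueError('cyclic reference detected')
    active.add(id(data))
    try:
        return {
            'a': int(data['a']),
            'f': [validate_foo(child, active) for child in data.get('f', [])],
        }
    finally:
        # Done with this object; it may legitimately appear again elsewhere.
        active.discard(id(data))


ok = validate_foo({'a': '1', 'f': [{'a': 2}]})

f = {'a': 1, 'f': []}
f['f'].append(f)
try:
    validate_foo(f)
    error = None
except ValueError as exc:
    error = str(exc)
```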

Rust - the ugly

class Box:
    def __init__(self, width):
        self.width = width

    def area(self):
        return self.width ** 2

    def __str__(self):
        return f'Box: {self.width}'

box = Box(42)
print(f'{box}, area {box.area()}')

use std::fmt;

struct Box {
    width: i64,
}

impl fmt::Display for Box {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Box: {}", self.width)
    }
}

impl Box {
    fn new(width: i64) -> Self {
        Self { width }
    }

    fn area(&self) -> i64 {
        self.width * self.width
    }
}

fn main() {
    let b = Box::new(42);
    println!("{b}, area {}", b.area());
}

Rust is significantly more verbose.

Pydantic V2

Examples

Performance

import timeit
from pydantic import BaseModel, __version__

class Model(BaseModel):
    name: str
    age: int
    friends: list[int]
    settings: dict[str, float]

data = {
    'name': 'John',
    'age': 42,
    'friends': list(range(200)),
    'settings': {f'v_{i}': i / 2.0 for i in range(50)}
}
t = timeit.timeit(
    'Model(**data)',
    globals={'data': data, 'Model': Model},
    number=10_000,
)
print(f'version={__version__} time taken {t * 100:.2f}us')

version=1.10.4 time taken 179.81us
version=2.0a3  time taken   7.99us

22.5x speedup
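
The arithmetic behind those numbers, as a small check. `timeit` returns the total seconds for all 10,000 calls, so multiplying by 100 gives microseconds per call. The helper name is illustrative only.

```python
number = 10_000                 # calls made by timeit in the benchmark above
v1_us, v2_us = 179.81, 7.99     # per-call times reported above


def per_call_us(total_seconds: float, calls: int) -> float:
    """Convert a timeit total into microseconds per call."""
    return total_seconds / calls * 1_000_000


# For 10_000 calls: total / 10_000 * 1_000_000 == total * 100
example_total = 1.7981
example_us = per_call_us(example_total, number)

speedup = v1_us / v2_us
print(f'{example_us:.2f}us per call, {speedup:.1f}x speedup')
```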

Strict Mode

from pydantic import BaseModel, ValidationError

class Model(BaseModel):
    model_config = dict(strict=True)
    
    age: int
    friends: tuple[int, int]

try:
    Model(age='42', friends=[1, 2])
except ValidationError as e:
    print(e)
    """
    2 validation errors for Model
    age
      Input should be a valid integer [type=int_type, 
        input_value='42', input_type=str]
    friends
      Input should be a valid tuple [type=tuple_type, 
        input_value=[1, 2], input_type=list]
    """

print(Model(age=42, friends=(1, 2)))
#> age=42 friends=(1, 2)

AKA Pedant mode.
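
The difference between lax and strict validation can be sketched for a single `int` field. This is a simplified illustration with made-up rules and messages, not Pydantic's actual coercion table:

```python
from typing import Any


def validate_int(value: Any, strict: bool = False) -> int:
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if strict:
        # Strict: the input must already be an int, no coercion.
        raise TypeError(f'Input should be a valid integer, got {type(value).__name__}')
    # Lax: try to coerce, e.g. '42' -> 42 (raises ValueError for 'abc').
    return int(value)


lax_result = validate_int('42')
strict_result = validate_int(42, strict=True)

try:
    validate_int('42', strict=True)
    strict_rejected = False
except TypeError:
    strict_rejected = True
```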

Builtin JSON parsing

from pydantic import BaseModel

class Model(BaseModel):
    model_config = dict(strict=True)
    age: int
    friends: tuple[int, int]

print(Model.model_validate_json('{"age": 1, "friends": [1, 2]}'))
#> age=1 friends=(1, 2)

If you're going to be a pedant, you better be right.

Also gives us:

Big performance improvement without 3rd party parsing library
Custom Errors (WIP)
Line numbers in errors (in future)
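
Why strict mode needs to know the input came from JSON: JSON has no tuple type, so a strict `tuple[int, int]` validator has to accept an array there while still rejecting a Python list. A minimal sketch of that rule follows; the function and its `from_json` flag are hypothetical.

```python
import json
from typing import Any


def validate_int_pair(value: Any, *, from_json: bool) -> tuple:
    """Strict validation of tuple[int, int], aware of where the input came from."""
    # A JSON array is the only way to express a tuple in JSON.
    allowed = list if from_json else tuple
    if not isinstance(value, allowed):
        raise TypeError('Input should be a valid tuple')
    if len(value) != 2 or not all(type(v) is int for v in value):
        raise TypeError('Input should be two integers')
    return tuple(value)


pair_from_json = validate_int_pair(json.loads('[1, 2]'), from_json=True)

try:
    validate_int_pair([1, 2], from_json=False)
    python_list_rejected = False
except TypeError:
    python_list_rejected = True
```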

Wrap Validators

from pydantic import BaseModel, field_validator

class Model(BaseModel):
    x: int

    @field_validator('x', mode='wrap')
    def validate_x(cls, v, handler):
        if v == 'one':
            return 1

        try:
            return handler(v)
        except ValueError:
            return -999

print(Model(x='one'))
#> x=1
print(Model(x=2))
#> x=2
print(Model(x='three'))
#> x=-999

Logic before
Logic after
Catch errors - new error, or default

AKA "The Onion"
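
The onion can be sketched without Pydantic as plain function wrapping: each layer receives the value plus a handler for the layer inside it. The `wrap` helper and the `strip_whitespace` layer are illustrative assumptions, not Pydantic API.

```python
from typing import Any, Callable

Handler = Callable[[Any], Any]


def wrap(inner: Handler, outer: Callable[[Any, Handler], Any]) -> Handler:
    """Return a validator in which `outer` runs around `inner`."""
    return lambda value: outer(value, inner)


def one_or_default(value: Any, handler: Handler) -> Any:
    if value == 'one':          # logic before the inner validator
        return 1
    try:
        return handler(value)   # call the inner validator
    except ValueError:          # catch errors and fall back to a default
        return -999


def strip_whitespace(value: Any, handler: Handler) -> Any:
    if isinstance(value, str):
        value = value.strip()
    return handler(value)


# Layers from the inside out: int -> one_or_default -> strip_whitespace
validate_x = wrap(wrap(int, one_or_default), strip_whitespace)

results = [validate_x(' one '), validate_x('2'), validate_x('three')]
```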

Recursive Models

from __future__ import annotations
from pydantic import BaseModel, Field, ValidationError

class Branch(BaseModel):
    length: float
    branches: list[Branch] = Field(default_factory=list)

print(Branch(length=1, branches=[{'length': 2}]))
#> length=1.0 branches=[Branch(length=2.0, branches=[])]

b = {'length': 1, 'branches': []}
b['branches'].append(b)

try:
    Branch.model_validate(b)
except ValidationError as e:
    print(e)
    """
    1 validation error for Branch
    branches.0
      Recursion error - cyclic reference detected 
        [type=recursion_loop, 
         input_value={'length': 1, 'branches': [{...}]}, 
         input_type=dict]
    """

Alias Paths

from pydantic import BaseModel, Field, AliasPath, AliasChoices


class MyModel(BaseModel):
    a: int = Field(validation_alias=AliasPath('foo', 1, 'bar'))
    b: str = Field(validation_alias=AliasChoices('x', 'y'))


m = MyModel.model_validate(
    {
        'foo': [{'bar': 0}, {'bar': 1}],
        'y': 'Y',
    }
)
print(m)
#> a=1 b='Y'

Generics

from typing import Generic, TypeVar

from pydantic import BaseModel

DataT = TypeVar('DataT')

class Response(BaseModel, Generic[DataT]):
    error: int | None = None
    data: DataT | None = None

class Profile(BaseModel):
    name: str
    email: str

def my_profile_view(id: int) -> Response[Profile]:
    if id == 42:
        return Response[Profile](data={'name': 'John', 'email': 'john@example.com'})
    else:
        return Response[Profile](error=404)

print(my_profile_view(42))
#> error=None data=Profile(name='John', email='john@example.com')
Favorite = tuple[int, str]

def my_favorites_view() -> Response[list[Favorite]]:
    return Response[list[Favorite]](data=[(1, 'a'), (2, 'b')])

Serialisation

from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

class Profile(BaseModel):
    account_id: int
    user: User

user = User(name='Alice', age=1)
print(Profile(account_id=1, user=user).model_dump())
#> {'account_id': 1, 'user': {'name': 'Alice', 'age': 1}}

class AuthUser(User):
    password: str

auth_user = AuthUser(name='Bob', age=2, password='very secret')
print(Profile(account_id=2, user=auth_user).model_dump())
#> {'account_id': 2, 'user': {'name': 'Bob', 'age': 2}}

Solving the "don't ask the type" problem.
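
The idea is to serialize according to the declared field type rather than the runtime type of the value. A sketch using standard-library dataclasses follows; the `dump` helper is an assumption for illustration, not Pydantic's serializer.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any


@dataclass
class User:
    name: str
    age: int


@dataclass
class AuthUser(User):
    password: str


@dataclass
class Profile:
    account_id: int
    user: User


def dump(value: Any, declared_type: type) -> Any:
    """Serialize using the fields of the declared type, not of type(value)."""
    if is_dataclass(declared_type):
        return {
            f.name: dump(getattr(value, f.name), f.type)
            for f in fields(declared_type)
        }
    return value


auth_user = AuthUser(name='Bob', age=2, password='very secret')
dumped = dump(Profile(account_id=2, user=auth_user), Profile)
```

Because `Profile.user` is declared as `User`, only `name` and `age` are emitted, and the subclass's `password` never leaks.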

Without BaseModel

from dataclasses import dataclass
from pydantic import TypeAdapter

@dataclass
class Foo:
    a: int
    b: int

@dataclass
class Bar:
    c: int
    d: int

x = TypeAdapter(Foo | Bar)
d = x.validate_json('{"a": 1, "b": 2}')

print(d)
#> Foo(a=1, b=2)

print(x.dump_json(d))
#> b'{"a":1,"b":2}'

BaseModel is still here and widely used, but no longer essential.

Enter TypeAdapter.
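
TypeAdapter also works on plain type hints with no class at all. A short sketch, assuming Pydantic V2 is installed (the sample data is made up):

```python
from pydantic import TypeAdapter

# Validate and serialize a bare type hint, no model or dataclass involved.
adapter = TypeAdapter(list[tuple[int, str]])

# Lax mode: '1' is coerced to 1 and the inner list becomes a tuple.
favorites = adapter.validate_python([['1', 'a'], (2, 'b')])

as_json = adapter.dump_json(favorites)
```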

Demo

Needed to move off Google Analytics
Record page views without a cookie
Store in MongoDB
End up with a big JSON file to analyse
Want to see which pages are viewed most
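
The live demo itself is not reproduced here. As a rough sketch of the last step, counting views per page from a JSON export might look like this. The `path` field name and the sample records are hypothetical.

```python
import json
from collections import Counter

# Hypothetical export: a JSON array with one object per page view.
raw = '[{"path": "/"}, {"path": "/docs"}, {"path": "/"}]'

views = json.loads(raw)
counts = Counter(view['path'] for view in views)

# Most viewed pages first.
top = counts.most_common(2)
```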

Thank you

Twitter: @pydantic & @samuel_colvin

GitHub: /pydantic & /samuelcolvin

Docs: docs.pydantic.dev

We need your help:

Try pydantic V2 alpha before we release V2!
Applications using Pydantic (without FastAPI)
Are you using Pydantic to process lots of data - if so we'd love to chat to you about the commercial platform we're building

Not Rust vs. Python

But rather: Python as the user* interface for Rust.

(* by user, I mean "application developer")

I'd love to see a generation of libraries for Python (and other high level languages) built in Rust.

HTTPS request lifecycle:

Rust, Rust/C: TLS, HTTP parsing, Routing, Validation, DB query, Serializing
Python: Application Logic (100% of Developer time, 1% of CPU cycles)

...

Ok, some actual Rust...

Pydantic V2

#[enum_dispatch(CombinedValidator)]
trait Validator {
    const EXPECTED_TYPE: &'static str;

    fn build(schema: &PyDict, config: Option<&PyDict>) -> PyResult<CombinedValidator>;

    fn validate(&self, input: &impl Input, extra: &Extra) -> ValResult<PyObject>;
}

#[enum_dispatch]
enum CombinedValidator {
    Int(IntValidator),
    Str(StrValidator),
    TypedDict(TypedDictValidator),
    Union(UnionValidator),
    TaggedUnion(TaggedUnionValidator),
    Nullable(NullableValidator),
    // ... and 43 more
}

fn build_validator(schema: &PyDict, config: Option<&PyDict>) -> PyResult<CombinedValidator> {
    let schema_type: &str = schema.get_as_req("type")?;
    // really this is a clever macro to avoid the duplication
    match schema_type {
        IntValidator::EXPECTED_TYPE => IntValidator::build(schema, config),
        StrValidator::EXPECTED_TYPE => StrValidator::build(schema, config),
        TypedDictValidator::EXPECTED_TYPE => TypedDictValidator::build(schema, config),
        UnionValidator::EXPECTED_TYPE => UnionValidator::build(schema, config),
        TaggedUnionValidator::EXPECTED_TYPE => TaggedUnionValidator::build(schema, config),
        NullableValidator::EXPECTED_TYPE => NullableValidator::build(schema, config),
        // ... and 43 more
    }
}

trait Input<'a> {
    fn is_none(&self) -> bool;

    fn strict_str(&'a self) -> ValResult<&'a str>;

    fn lax_str(&'a self) -> ValResult<&'a str>;

    fn validate_date(&self, strict: bool) -> ValResult<PyDatetime>;

    fn strict_date(&self) -> ValResult<PyDatetime>;

    // ... and 53 more
}

impl<'a> Input<'a> for PyAny {
    // ...
}

impl<'a> Input<'a> for JsonInput {
    // ...
}

#[pyclass]
struct SchemaValidator {
    validator: CombinedValidator,
}

#[pymethods]
impl SchemaValidator {
    #[new]
    fn py_new(schema: &PyDict, config: Option<&PyDict>) -> PyResult<Self> {
        // We also do magic/evil schema validation using pydantic-core itself
        let validator = build_validator(schema, config)?;
        Ok(SchemaValidator { validator })
    }

    fn validate_python(&self, input: &PyAny, strict: Option<bool>) -> PyResult<PyObject> {
        self.validator.validate(input, &Extra::new(strict))
    }

    fn validate_json(
        &self,
        input_string: &PyString,
        strict: Option<bool>,
    ) -> PyResult<PyObject> {
        let input = parse_string(input_string)?;
        self.validator.validate(&input, &Extra::new(strict))
    }
}
