Garbage in → Pydantic → you're golden!
by Samuel Colvin
PyCon LT 18 May 2023
https://pretalx.com/pycon-lt-2023/talk/GMKHFE/
Today
- What is Pydantic, why do people seem to like it?
- Trouble in paradise
- Rust to the rescue - Good, Bad, Ugly
- Examples of how Rust helps Pydantic V2 solve your problems
- Live demo!
Pydantic
from datetime import datetime
from pydantic import BaseModel
class Talk(BaseModel):
    title: str
    when: datetime | None = None
    mistakes: list[str]
Just type hints get you:
- Validation
- Coercion/transformation
- Serialization
- JSON Schema
You people seemed to like it:
- 58m downloads/mo
- used by all FAANG companies
- 12% of pro web developers
30s - understand
3m - useful
300hr - usable
Empathy for the developers using our library
But there's a problem...


Pydantic V2
Priorities for V2:
- Performance - it was good, but could be better - think of the penguins!
- Strict Mode - live up to the name
- Composability - you don't always want a model
- Maintainability - I maintain Pydantic so I want maintaining Pydantic to be fun

Sad penguin, no snow
What would it look like if we started from scratch?

What about Rust?
The obvious advantages...
- Performance
- Multithreading - no GIL
- Reusing high-quality Rust libraries
- More explicit error handling
(Maybe) less obvious advantages:
- Virtually zero cost customisation, even in hot code
- Arguably easier to maintain - the compiler picks up more mistakes
Rust - the good
But perhaps most pertinent to Pydantic...
from pydantic import BaseModel
class Qualification(BaseModel):
    name: str
    description: str
    required: bool
    value: int
class Student(BaseModel):
    id: int
    name: str
    qualifications: list[Qualification]
    friends: list[int]

[
    ...,
    ...,
    ...,
    ...,
    ...,
    ...,
    ...,
    ...,
    ...,
    ...,
    ...,
    ...,
]
Rust loves this
- Deeply recursive code - no stack frames
- Small modular components
How Rust?
What does that tree look like?
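from datetime import datetime, timedelta
from typing import Annotated
from annotated_types import Ge, MaxLen
from pydantic import BaseModel
# Assumed definition of the slide's shorthand, chosen to match `min: 0` in the tree
PosInt = Annotated[int, Ge(0)]
class Talk(BaseModel):
    title: Annotated[
        str,
        MaxLen(100)
    ]
    attendance: PosInt
    when: datetime | None = None
    mistakes: list[
        tuple[timedelta, str]
    ]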
ModelValidator {
  cls: Talk,
  validator: TypedDictValidator [
    Field {
      key: "title",
      validator: StrValidator { max_len: 100 },
    },
    Field {
      key: "attendance",
      validator: IntValidator { min: 0 },
    },
    Field {
      key: "when",
      validator: UnionValidator [
        DateTimeValidator {},
        NoneValidator {},
      ],
      default: None,
    },
    Field {
      key: "mistakes",
      validator: ListValidator {
        item_validator: TupleValidator [
          TimedeltaValidator {},
          StrValidator {},
        ],
      },
    },
  ],
}
Python Interface to Rust
from pydantic_core import SchemaValidator
class Talk:
    ...
talk_validator = SchemaValidator({
    'type': 'model',
    'cls': Talk,
    'schema': {
        'type': 'typed-dict',
        'fields': {
            'title': {'schema': {'type': 'str', 'max_length': 100}},
            'attendance': {'schema': {'type': 'int', 'ge': 0}},
            'when': {
                'schema': {
                    'type': 'default',
                    'schema': {'type': 'nullable', 'schema': {'type': 'datetime'}},
                    'default': None,
                }
            },
            'mistakes': {
                'schema': {
                    'type': 'list',
                    'items_schema': {
                        'type': 'tuple',
                        'mode': 'positional',
                        'items_schema': [{'type': 'timedelta'}, {'type': 'str'}]
                    }
                }
            },
        },
    }
})
some_data = {
    'title': "How Pydantic V2 leverages Rust's Superpowers",
    'attendance': '100',
    'when': '2023-04-22T12:15:00',
    'mistakes': [
        ('00:00:00', 'Screen mirroring confusion'),
        ('00:00:30', 'Forgot to turn on the mic'),
        ('00:25:00', 'Too short'),
        ('00:40:00', 'Too long!'),
    ],
}
talk = talk_validator.validate_python(some_data)
print(talk.mistakes)
"""
[
    (datetime.timedelta(0), 'Screen mirroring confusion'), 
    (datetime.timedelta(seconds=30), 'Forgot to turn on the mic'), 
    (datetime.timedelta(seconds=1500), 'Too short'), 
    (datetime.timedelta(seconds=2400), 'Too long!')
]
"""
class Talk(BaseModel):
    title: Annotated[
        str,
        MaxLen(100)
    ]
    attendance: PosInt
    when: datetime | None = None
    mistakes: list[
        tuple[timedelta, str]
    ]
Pydantic V2 Architecture
pydantic (pure Python):
- Read type hints
- Construct a "core schema"
pydantic-core (binary + stubs + core-schema):
- Process core schema
- Return SchemaValidator
pydantic (pure Python):
- Receive data
- Call schema_validator(data)
pydantic-core:
- Run validator
- Return the result of validation
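The build-once, validate-many flow can be sketched in pure Python. This is only an illustration, not pydantic-core's implementation: `build_validator` here is a hypothetical Python function (it borrows the name from the Rust code later in the deck), and the schema dict is a simplified format that borrows the slide's key names ('type', 'schema', 'items_schema', 'fields').

```python
from datetime import datetime
from typing import Any, Callable

Validator = Callable[[Any], Any]


def build_validator(schema: dict) -> Validator:
    """Recursively turn a schema dict into a tree of small validator functions."""
    kind = schema['type']
    if kind == 'int':
        return int  # lax conversion, e.g. '100' -> 100
    if kind == 'str':
        def validate_str(value: Any) -> str:
            if not isinstance(value, str):
                raise ValueError(f'expected str, got {type(value).__name__}')
            return value
        return validate_str
    if kind == 'datetime':
        def validate_datetime(value: Any) -> datetime:
            if isinstance(value, datetime):
                return value
            return datetime.fromisoformat(value)
        return validate_datetime
    if kind == 'nullable':
        inner = build_validator(schema['schema'])
        return lambda value: None if value is None else inner(value)
    if kind == 'list':
        item = build_validator(schema['items_schema'])
        return lambda value: [item(v) for v in value]
    if kind == 'typed-dict':
        fields = {name: build_validator(sub) for name, sub in schema['fields'].items()}
        return lambda value: {name: check(value[name]) for name, check in fields.items()}
    raise ValueError(f'unknown schema type: {kind!r}')


# Build once (the "process core schema" step), then reuse for every input.
validate_talk = build_validator({
    'type': 'typed-dict',
    'fields': {
        'title': {'type': 'str'},
        'attendance': {'type': 'int'},
        'when': {'type': 'nullable', 'schema': {'type': 'datetime'}},
        'mistakes': {'type': 'list', 'items_schema': {'type': 'str'}},
    },
})
talk = validate_talk({
    'title': 'Talk',
    'attendance': '100',
    'when': '2023-05-18T12:15:00',
    'mistakes': ['Too long!'],
})
print(talk)
```

All schema inspection happens in `build_validator`, so the per-input work is just a walk down a tree of closures.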
Rust - the bad
from __future__ import annotations
from pydantic import BaseModel
class Foo(BaseModel):
    a: int
    f: list[Foo]
f = {'a': 1, 'f': []}
f['f'].append(f)
Foo(**f)
fn main() {
    main();
}
RecursionError is bad, but no RecursionError is worse!
Also no multiple ownership.
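One common way to turn unbounded recursion on cyclic input into a clean error is to track the identity of every container currently being validated. The sketch below shows that idea in pure Python for the `{'a': ..., 'f': [...]}` shape above. `validate_tree` and `CyclicReferenceError` are hypothetical names, and this is not a claim about how pydantic-core implements its guard.

```python
from __future__ import annotations

from typing import Any


class CyclicReferenceError(ValueError):
    """Raised when the input contains itself, directly or indirectly."""


def validate_tree(node: Any, _active: set[int] | None = None) -> dict:
    """Validate a {'a': int, 'f': [node, ...]} structure, rejecting cycles."""
    active = set() if _active is None else _active
    node_id = id(node)
    if node_id in active:
        # We are already inside this exact object further up the call stack.
        raise CyclicReferenceError('cyclic reference detected')
    active.add(node_id)
    try:
        return {
            'a': int(node['a']),
            'f': [validate_tree(child, active) for child in node['f']],
        }
    finally:
        # Remove on the way out so a shared, non-cyclic node can appear twice.
        active.discard(node_id)


f = {'a': 1, 'f': []}
f['f'].append(f)
try:
    validate_tree(f)
except CyclicReferenceError as exc:
    message = str(exc)
    print(message)
    #> cyclic reference detected
```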
Rust - the ugly
class Box:
    def __init__(self, width):
        self.width = width
    def area(self):
        return self.width ** 2
    def __str__(self):
        return f'Box: {self.width}'
box = Box(42)
print(f'{box}, area {box.area()}')
use std::fmt;
struct Box {
    width: i64,
}
impl fmt::Display for Box {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Box: {}", self.width)
    }
}
impl Box {
    fn new(width: i64) -> Self {
        Self { width }
    }
    fn area(&self) -> i64 {
        self.width * self.width
    }
}
fn main() {
    let b = Box::new(42);
    println!("{b}, area {}", b.area());
}
Rust is significantly more verbose.
Pydantic V2
Examples
Performance
import timeit
from pydantic import BaseModel, __version__
class Model(BaseModel):
    name: str
    age: int
    friends: list[int]
    settings: dict[str, float]
data = {
    'name': 'John',
    'age': 42,
    'friends': list(range(200)),
    'settings': {f'v_{i}': i / 2.0 for i in range(50)}
}
t = timeit.timeit(
    'Model(**data)',
    globals={'data': data, 'Model': Model},
    number=10_000,
)
print(f'version={__version__} time taken {t * 100:.2f}us')
version=1.10.4 time taken 179.81us
version=2.0a3  time taken   7.99us
22.5x speedup
Strict Mode
from pydantic import BaseModel, ValidationError
class Model(BaseModel):
    model_config = dict(strict=True)
    
    age: int
    friends: tuple[int, int]
try:
    Model(age='42', friends=[1, 2])
except ValidationError as e:
    print(e)
    """
    2 validation errors for Model
    age
      Input should be a valid integer [type=int_type, 
        input_value='42', input_type=str]
    friends
      Input should be a valid tuple [type=tuple_type, 
        input_value=[1, 2], input_type=list]
    """
print(Model(age=42, friends=(1, 2)))
#> age=42 friends=(1, 2)
AKA Pedant mode.
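The difference between lax and strict validation can be shown with a single hand-written integer check. This is a minimal pure-Python sketch: `validate_int` is a hypothetical helper, and its conversion rules are illustrative and much narrower than Pydantic's actual ones.

```python
from typing import Any


def validate_int(value: Any, *, strict: bool = False) -> int:
    """Return an int, converting from str only when strict is False."""
    # bool is a subclass of int, so check the exact type instead of isinstance.
    if type(value) is int:
        return value
    if strict:
        raise ValueError(f'Input should be a valid integer, got {type(value).__name__}')
    if isinstance(value, str):
        return int(value)  # raises ValueError for non-numeric strings
    raise ValueError(f'Cannot convert {type(value).__name__} to int')


print(validate_int('42'))
#> 42
try:
    validate_int('42', strict=True)
except ValueError as exc:
    print(exc)
    #> Input should be a valid integer, got str
```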
Builtin JSON parsing
from pydantic import BaseModel
class Model(BaseModel):
    model_config = dict(strict=True)
    age: int
    friends: tuple[int, int]
print(Model.model_validate_json('{"age": 1, "friends": [1, 2]}'))
#> age=1 friends=(1, 2)
If you're going to be a pedant, you better be right.
Also gives us:
- Big performance improvement without 3rd party parsing library
- Custom Errors (WIP)
- Line numbers in errors (in future)
Wrap Validators
from pydantic import BaseModel, field_validator
class Model(BaseModel):
    x: int
    @field_validator('x', mode='wrap')
    def validate_x(cls, v, handler):
        if v == 'one':
            return 1
        try:
            return handler(v)
        except ValueError:
            return -999
print(Model(x='one'))
#> x=1
print(Model(x=2))
#> x=2
print(Model(x='three'))
#> x=-999
- Logic before
- Logic after
- Catch errors - new error, or default
AKA "The Onion"
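The onion idea itself needs nothing from Pydantic: each layer receives the value plus a handler for the layer beneath it. Here is a small sketch with plain functions, where `wrap`, `inner_int`, `words_then_default` and `non_negative` are all hypothetical names made up for this illustration.

```python
from typing import Any, Callable

Handler = Callable[[Any], Any]
Wrapper = Callable[[Any, Handler], Any]


def wrap(inner: Handler, wrapper: Wrapper) -> Handler:
    """Return a new handler that runs `wrapper` around `inner`."""
    def wrapped(value: Any) -> Any:
        return wrapper(value, inner)
    return wrapped


def inner_int(value: Any) -> int:
    return int(value)


def words_then_default(value: Any, handler: Handler) -> int:
    if value == 'one':  # logic before the inner validator
        return 1
    try:
        return handler(value)
    except ValueError:  # catch the inner error and substitute a default
        return -999


def non_negative(value: Any, handler: Handler) -> int:
    result = handler(value)
    return max(result, 0)  # logic after the inner validator


validate_x = wrap(inner_int, words_then_default)
layered = wrap(validate_x, non_negative)  # a second layer of the onion

print(validate_x('one'), validate_x(2), validate_x('three'))
#> 1 2 -999
print(layered('three'))
#> 0
```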
Recursive Models
from __future__ import annotations
from pydantic import BaseModel, Field, ValidationError
class Branch(BaseModel):
    length: float
    branches: list[Branch] = Field(default_factory=list)
print(Branch(length=1, branches=[{'length': 2}]))
#> length=1.0 branches=[Branch(length=2.0, branches=[])]
b = {'length': 1, 'branches': []}
b['branches'].append(b)
try:
    Branch.model_validate(b)
except ValidationError as e:
    print(e)
    """
    1 validation error for Branch
    branches.0
      Recursion error - cyclic reference detected 
        [type=recursion_loop, 
         input_value={'length': 1, 'branches': [{...}]}, 
         input_type=dict]
    """
Alias Paths
from pydantic import BaseModel, Field, AliasPath, AliasChoices
class MyModel(BaseModel):
    a: int = Field(validation_alias=AliasPath('foo', 1, 'bar'))
    b: str = Field(validation_alias=AliasChoices('x', 'y'))
m = MyModel.model_validate(
    {
        'foo': [{'bar': 0}, {'bar': 1}],
        'y': 'Y',
    }
)
print(m)
#> a=1 b='Y'
Generics
from typing import Generic, TypeVar
from pydantic import BaseModel
DataT = TypeVar('DataT')
class Response(BaseModel, Generic[DataT]):
    error: int | None = None
    data: DataT | None = None
class Profile(BaseModel):
    name: str
    email: str
def my_profile_view(id: int) -> Response[Profile]:
    if id == 42:
        return Response[Profile](data={'name': 'John', 'email': 'john@example.com'})
    else:
        return Response[Profile](error=404)
print(my_profile_view(42))
#> error=None data=Profile(name='John', email='john@example.com')
Favorite = tuple[int, str]
def my_favorites_view() -> Response[list[Favorite]]:
    return Response[list[Favorite]](data=[(1, 'a'), (2, 'b')])
Serialisation
from pydantic import BaseModel
class User(BaseModel):
    name: str
    age: int
class Profile(BaseModel):
    account_id: int
    user: User
user = User(name='Alice', age=1)
print(Profile(account_id=1, user=user).model_dump())
#> {'account_id': 1, 'user': {'name': 'Alice', 'age': 1}}
class AuthUser(User):
    password: str
auth_user = AuthUser(name='Bob', age=2, password='very secret')
print(Profile(account_id=2, user=auth_user).model_dump())
#> {'account_id': 2, 'user': {'name': 'Bob', 'age': 2}}
Solving the "don't ask the type" problem.
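The behaviour above amounts to serializing each field by its declared type rather than the runtime type of the value. Below is a rough stdlib sketch of that idea using dataclasses. `dump_as` is a hypothetical helper, not part of Pydantic, and it only handles nested dataclasses and plain values.

```python
from dataclasses import asdict, dataclass, fields, is_dataclass
from typing import Any, get_type_hints


@dataclass
class User:
    name: str
    age: int


@dataclass
class AuthUser(User):
    password: str


@dataclass
class Profile:
    account_id: int
    user: User


def dump_as(value: Any, declared_type: type) -> Any:
    """Serialize `value` using the fields of `declared_type`, not of type(value)."""
    if is_dataclass(declared_type):
        hints = get_type_hints(declared_type)
        return {
            f.name: dump_as(getattr(value, f.name), hints[f.name])
            for f in fields(declared_type)
        }
    return value


profile = Profile(account_id=2, user=AuthUser(name='Bob', age=2, password='very secret'))
print(dump_as(profile, Profile))
#> {'account_id': 2, 'user': {'name': 'Bob', 'age': 2}}
print('password' in asdict(profile)['user'])  # asdict follows the runtime type
#> True
```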
Without BaseModel
from dataclasses import dataclass
from pydantic import TypeAdapter
@dataclass
class Foo:
    a: int
    b: int
@dataclass
class Bar:
    c: int
    d: int
x = TypeAdapter(Foo | Bar)
d = x.validate_json('{"a": 1, "b": 2}')
print(d)
#> Foo(a=1, b=2)
print(x.dump_json(d))
#> b'{"a":1,"b":2}'
BaseModel is still here and widely used, but no longer essential.
Enter TypeAdapter.
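To see what validating against a bare union involves, here is a deliberately naive stdlib sketch: parse the JSON, then return the first dataclass whose field names match. `validate_json_union` is a hypothetical helper. It does not check value types, and it says nothing about the strategy Pydantic actually uses to choose a union member.

```python
import json
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Foo:
    a: int
    b: int


@dataclass
class Bar:
    c: int
    d: int


def validate_json_union(raw: str, *candidates: type) -> Any:
    """Parse JSON and return an instance of the first candidate whose fields match."""
    data = json.loads(raw)
    for cls in candidates:
        names = {f.name for f in fields(cls)}
        # Only the key set is compared; the values themselves are not validated.
        if isinstance(data, dict) and set(data) == names:
            return cls(**data)
    raise ValueError(f'input matches none of: {[c.__name__ for c in candidates]}')


print(validate_json_union('{"c": 3, "d": 4}', Foo, Bar))
#> Bar(c=3, d=4)
```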
Demo
- Needed to move off Google Analytics
- Record page views without a cookie
- Store in MongoDB
- End up with a big JSON file to analyse
- Want to see which pages are viewed most
Thank you
Twitter: @pydantic & @samuel_colvin
GitHub: /pydantic & /samuelcolvin
Docs: docs.pydantic.dev
We need your help:
- Try pydantic V2 alpha before we release V2!
- Applications using Pydantic (without FastAPI)
- Are you using Pydantic to process lots of data - if so we'd love to chat to you about the commercial platform we're building
Not Rust vs. Python
But rather: Python as the user* interface for Rust.
(* by user, I mean "application developer")
I'd love to see a generation of libraries for Python (and other high level languages) built in Rust.
HTTPS request lifecycle (diagram legend: Rust, Rust/C, Python):
- TLS
- Routing
- HTTP parsing
- Validation
- DB query
- Serializing
- Application Logic (Python)
100% of developer time = 1% of CPU cycles
...
Ok, some actual Rust...
Pydantic V2
#[enum_dispatch(CombinedValidator)]
trait Validator {
    const EXPECTED_TYPE: &'static str;
    fn build(schema: &PyDict, config: Option<&PyDict>) -> PyResult<CombinedValidator>;
    fn validate(&self, input: &impl Input, extra: &Extra) -> ValResult<PyObject>;
}
#[enum_dispatch]
enum CombinedValidator {
    Int(IntValidator),
    Str(StrValidator),
    TypedDict(TypedDictValidator),
    Union(UnionValidator),
    TaggedUnion(TaggedUnionValidator),
    Nullable(NullableValidator),
    // ... and 43 more
}
fn build_validator(schema: &PyDict, config: Option<&PyDict>) -> PyResult<CombinedValidator> {
    let schema_type: &str = schema.get_as_req("type")?;
    // really this is a clever macro to avoid the duplication
    match schema_type {
        IntValidator::EXPECTED_TYPE => IntValidator::build(schema, config),
        StrValidator::EXPECTED_TYPE => StrValidator::build(schema, config),
        TypedDictValidator::EXPECTED_TYPE => TypedDictValidator::build(schema, config),
        UnionValidator::EXPECTED_TYPE => UnionValidator::build(schema, config),
        TaggedUnionValidator::EXPECTED_TYPE => TaggedUnionValidator::build(schema, config),
        NullableValidator::EXPECTED_TYPE => NullableValidator::build(schema, config),
        // ... and 43 more
    }
}
trait Input<'a> {
    fn is_none(&self) -> bool;
    fn strict_str(&'a self) -> ValResult<&'a str>;
    fn lax_str(&'a self) -> ValResult<&'a str>;
    fn validate_date(&self, strict: bool) -> ValResult<PyDatetime>;
    fn strict_date(&self) -> ValResult<PyDatetime>;
    // ... and 53 more
}
impl<'a> Input<'a> for PyAny {
    // ...
}
impl<'a> Input<'a> for JsonInput {
    // ...
}
#[pyclass]
struct SchemaValidator {
    validator: CombinedValidator,
}
#[pymethods]
impl SchemaValidator {
    #[new]
    fn py_new(schema: &PyDict, config: Option<&PyDict>) -> PyResult<Self> {
        // We also do magic/evil schema validation using pydantic-core itself
        let validator = build_validator(schema, config)?;
        Ok(SchemaValidator { validator })
    }
    fn validate_python(&self, input: &PyAny, strict: Option<bool>) -> PyResult<PyObject> {
        self.validator.validate(input, &Extra::new(strict))
    }
    fn validate_json(
        &self,
        input_string: &PyString,
        strict: Option<bool>,
    ) -> PyResult<PyObject> {
        let input = parse_string(input_string)?;
        self.validator.validate(&input, &Extra::new(strict))
    }
}