Code Generation
This library's models are produced by code generation, currently as Pydantic models, with plans to move to Patito soon.
It works by taking the OpenAPI JSON schemas (a.k.a. "Swagger" schemas) and reading their "components", which correspond to entities such as "StopPoint" (something like a bus stop or train platform).
These are 'DTOs', or Data Transfer Objects, and correspond pretty neatly to data models, for which Pydantic provides its model class (BaseModel) with validation.
I rolled my own schema handling for this; it's pretty straightforward. The one awkward part was getting informative names for the DTO models, which required cross-referencing against the old "Unified API" (marked as deprecated, i.e. 'do not use', in the official TfL API docs!), in a process I call "reference chasing".
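As a rough illustration of the idea (not this library's actual generator), the sketch below renders one schema component as the source text of a Pydantic model using only the standard library. The schema fragment, the `PRIMITIVES` mapping, and the helper names `annotation_for` and `generate_model_source` are all hypothetical, and the uniform `= None` default is a simplification.

```python
# Hypothetical, heavily simplified OpenAPI "components" fragment.
# The real TfL schemas are larger and use opaque names such as "Tfl-19".
schema = {
    "components": {
        "schemas": {
            "StopPoint": {
                "type": "object",
                "properties": {
                    "id": {"type": "string"},
                    "lat": {"type": "number"},
                    "modes": {"type": "array", "items": {"type": "string"}},
                },
            }
        }
    }
}

# Assumed mapping from JSON schema primitive types to Python annotations.
PRIMITIVES = {"string": "str", "number": "float", "integer": "int", "boolean": "bool"}


def annotation_for(prop: dict) -> str:
    """Return a Python type annotation (as source text) for one property."""
    if prop.get("type") == "array":
        return f"list[{annotation_for(prop['items'])}]"
    return PRIMITIVES.get(prop.get("type"), "Any")


def generate_model_source(name: str, component: dict) -> str:
    """Render one schema component as the source code of a Pydantic model."""
    lines = [f"class {name}(BaseModel):"]
    for key, prop in component.get("properties", {}).items():
        field = key[:1].upper() + key[1:]  # e.g. "id" -> "Id"
        # Simplification: every field gets a None default.
        lines.append(f"    {field}: {annotation_for(prop)} = None")
    return "\n".join(lines)


source = generate_model_source("StopPoint", schema["components"]["schemas"]["StopPoint"])
print(source)
```

The generated text would then be written to a module file. Resolving opaque component names to readable class names is a separate step that this sketch does not attempt.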
Each API is self-contained, meaning that one API may redeclare the same entities as another, and thus the generated code will contain the same DTO Pydantic models more than once. For instance, there's a model class called Mode in 3 different APIs (Line, Journey, and StopPoint), each representing the mode of transport.
These nest inside each other: Pydantic models that contain other Pydantic models deserialise JSON into nested models, so it's an ideal choice here.
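A minimal sketch of that nesting behaviour, assuming Pydantic v2. The two classes and the JSON payload are trimmed-down, hypothetical stand-ins rather than the generated DTOs.

```python
from pydantic import BaseModel


class Mode(BaseModel):
    # Hypothetical stand-in for a generated DTO.
    name: str


class StopPoint(BaseModel):
    # Hypothetical stand-in: a DTO that contains other DTOs.
    id: str
    modes: list[Mode]


# Made-up payload, shaped like a nested JSON response.
payload = '{"id": "stop-1", "modes": [{"name": "bus"}, {"name": "tube"}]}'
stop = StopPoint.model_validate_json(payload)

# Each inner JSON object has been validated into a Mode instance.
print(type(stop.modes[0]).__name__)
print(stop.modes[1].name)
```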
For example, here's the Line class in the Line API. The class name is the name of the DTO, i.e. of the schema component/entity, and the API it's found in is stored in the _source_schema_name private attribute, which is not used for any functionality, only to keep track of where these objects are coming from. We can also see from _component_schema_name that its original name was "Tfl-19", which isn't very informative (the name 'Line', which ended up as the class name, was found by cross-referencing against the DTOs in the Unified API).
class Line(BaseModel):
    """
    Autogenerated from Line::Tfl.Api.Presentation.Entities.Line
    """

    model_config = ConfigDict(
        alias_generator=AliasGenerator(validation_alias=to_camel_case),
    )
    Id: str = None
    Name: str = None
    ModeName: str = None
    Disruptions: list["DisruptionModel"]
    Created: datetime = None
    Modified: datetime = None
    LineStatuses: list["LineStatusModel"]
    RouteSections: list["MatchedRouteModel"]
    ServiceTypes: list["LineServiceTypeInfoModel"]
    Crowding: CrowdingModel = None
    _source_schema_name: str = PrivateAttr(default='Line')
    _component_schema_name: str = PrivateAttr(default='Tfl-19')


LineModel = Line
Notice that this Pydantic model has an alias_generator set in its model_config before the fields are declared. This means that, for instance, a key modeName in the API response will populate the field ModeName. This is necessary because some of the keys in the TfL APIs are reserved words (in fact, words that are part of the Python language itself), which would therefore produce unparseable code if used as class attribute names (the field names in Pydantic model classes).
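The sketch below illustrates both halves of that argument, assuming Pydantic v2.5 or later (where AliasGenerator is available). The body of `to_camel_case` is an assumption about what the library's helper does (lower-casing the first letter), and `Leg` with its `from` key is a hypothetical example rather than one of the generated DTOs.

```python
import ast
from typing import Optional

from pydantic import AliasGenerator, BaseModel, ConfigDict


def to_camel_case(field_name: str) -> str:
    """Assumed behaviour: lower-case the first letter ("ModeName" -> "modeName")."""
    return field_name[:1].lower() + field_name[1:]


# A reserved word cannot be used as an attribute name in a class body ...
try:
    ast.parse("class Bad:\n    from: str = None\n")
    keyword_parses = True
except SyntaxError:
    keyword_parses = False


# ... but its capitalised form can, and the alias maps it back to the JSON key.
class Leg(BaseModel):  # hypothetical model for illustration only
    model_config = ConfigDict(
        alias_generator=AliasGenerator(validation_alias=to_camel_case),
    )
    From: Optional[str] = None
    ModeName: Optional[str] = None


leg = Leg.model_validate({"from": "Bank", "modeName": "tube"})
print(keyword_parses, leg.From, leg.ModeName)
```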
Also note: due to a current 'quirk' of Pydantic, all of the DTOs have a 'nickname', which is the class name with 'Model' on the end, and this is used in the nested definitions. So list["LineStatusModel"] in the LineStatuses field definition above would resolve to a list of the LineStatus class, and the Line class itself is given the nickname 'LineModel'.
There were other workarounds but this was the simplest!
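To show the name lookup involved, here is a small standard-library sketch. It uses plain classes and typing.get_type_hints as a stand-in for Pydantic's own annotation resolution, so it demonstrates only that a string annotation naming the nickname resolves to the original class. The empty class bodies are placeholders, and typing.List is used instead of list[...] purely for portability across Python versions.

```python
from typing import List, get_args, get_type_hints


class LineStatus:
    """Placeholder for a generated DTO class."""


# The 'nickname': a second module-level name bound to the same class.
LineStatusModel = LineStatus


class Line:
    """Placeholder for the Line DTO, annotated via the nickname."""

    LineStatuses: List["LineStatusModel"]


LineModel = Line

# Resolve the string annotation against the module-level names.
hints = get_type_hints(Line, globalns=globals())
(item_type,) = get_args(hints["LineStatuses"])

print(item_type is LineStatus)
print(LineModel is Line)
```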