Population#
The Population class is a lightweight container for the three core synthetic
population DataFrames that POLARIS uses to drive demand simulation:
- households: one row per household (householdid, location, income, etc.)
- persons: one row per person (personid, householdid, age, gender, etc.)
- vehicles: one row per vehicle (vehicle_id, hhold, type, etc.)
Population is a NamedTuple, so the three DataFrames are also accessible by index
or via iterable unpacking:
pop = Population.from_dir(my_run_dir)
hh, persons, vehicles = pop
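To illustrate why attribute access, indexing, and unpacking all work, here is a minimal sketch with a stand-in NamedTuple. `DemoPopulation` and its list-valued fields are hypothetical; the real Population holds pandas DataFrames in the same field order.

```python
from typing import NamedTuple


class DemoPopulation(NamedTuple):
    """Hypothetical stand-in for Population; the real fields are DataFrames."""

    households: list
    persons: list
    vehicles: list


demo = DemoPopulation(households=["hh1"], persons=["p1", "p2"], vehicles=["v1"])

# Attribute access, positional indexing, and unpacking all reach the same objects.
by_name = demo.households
by_index = demo[0]
hh, persons, vehicles = demo
```

Because a NamedTuple is immutable, rebinding a field means building a new tuple, which fits the non-mutating style of the transformations described later.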
Loading a population#
The most common entry point is from_dir(),
which loads from a model run directory:
from polaris.analyze.population import Population
pop = Population.from_dir(my_run_dir)
print(pop)
# Population(Households: 5641, Persons: 5926, Vehicles: 140)
Fast HDF5 cache#
The first call to from_dir reads from the *Demand.sqlite file and then writes
a population-cache.hdf5 file alongside it. Subsequent calls compare the
modification times (mtimes) of the SQLite file and the HDF5 cache and load from
whichever is newer. For a typical demand database with hundreds of thousands of
households, this turns a multi-second SQL read into a sub-second pandas read.
The cache is invalidated automatically: if you re-run the model (or otherwise
modify the demand SQLite), the next from_dir call will fall back to the SQLite
file and refresh the cache.
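The freshness check described above can be sketched with the standard library alone. This is an illustration of the idea, not the POLARIS implementation: `pick_source` is a hypothetical helper, and falling back to SQLite when the mtimes are equal is an assumption.

```python
import os
import tempfile
from pathlib import Path


def pick_source(sqlite_path: Path, cache_path: Path) -> Path:
    """Return the cache only if it exists and is strictly newer than the SQLite file."""
    if cache_path.exists() and cache_path.stat().st_mtime > sqlite_path.stat().st_mtime:
        return cache_path
    return sqlite_path


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "demo-Demand.sqlite"
    cache = Path(tmp) / "population-cache.hdf5"

    db.touch()
    first = pick_source(db, cache)  # no cache yet, so SQLite is used

    cache.touch()
    os.utime(db, (1_000, 1_000))
    os.utime(cache, (2_000, 2_000))
    second = pick_source(db, cache)  # cache is newer than the database

    os.utime(db, (3_000, 3_000))
    third = pick_source(db, cache)  # database changed after the cache was written

    results = (first.name, second.name, third.name)
```

Setting the mtimes explicitly with `os.utime` keeps the demonstration deterministic regardless of filesystem timestamp resolution.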
If you want explicit control over which source is used, you can call the lower-level loaders directly:
pop = Population.from_sqlite(demand_db_path)
pop = Population.from_hdf5(hdf5_path)
Modifying a population#
Population provides a small set of pure (non-mutating) transformations that
return a new Population:
exclude_hh_ids#
Drop a set of households (and their persons and vehicles) — useful for cleaning or for building train/test splits:
bad_hh_ids = {123, 456, 789}
filtered = pop.exclude_hh_ids(bad_hh_ids, reason="missing home location")
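Conceptually, dropping households means filtering all three tables on the household key and returning new frames. The sketch below shows that idea in plain pandas; `drop_households` is a hypothetical helper, and the column names (`household` in households and persons, `hhold` in vehicles) are assumptions made for the demo data.

```python
import pandas as pd


def drop_households(households, persons, vehicles, hh_ids):
    """Return new frames without the given households; the inputs are left untouched."""
    hh_ids = set(hh_ids)
    keep_hh = households[~households["household"].isin(hh_ids)]
    keep_persons = persons[~persons["household"].isin(hh_ids)]
    keep_vehicles = vehicles[~vehicles["hhold"].isin(hh_ids)]
    return keep_hh, keep_persons, keep_vehicles


households = pd.DataFrame({"household": [1, 2, 3]})
persons = pd.DataFrame({"person": [10, 11, 12], "household": [1, 2, 2]})
vehicles = pd.DataFrame({"vehicle_id": [100, 101], "hhold": [2, 3]})

# Household 2 is removed together with its two persons and one vehicle.
hh2, persons2, vehicles2 = drop_households(households, persons, vehicles, {2})
```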
combine_with#
Append another Population to this one, offsetting the incoming household and
person IDs so they don’t collide. The vehicle hhold column is also offset to
stay consistent:
combined = pop_a.combine_with(pop_b, hh_id_offset=pop_a.households.household.max() + 1)
This is the building block for scenarios where you want to merge multiple sub-populations (e.g., adding a synthetic subset of long-distance travellers).
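The ID offsetting can be pictured as shifting the key columns of the incoming tables before concatenating them. The following is a rough sketch under assumed column names, not the library's code; `append_with_offset` is a hypothetical helper, and person ID offsetting is left out for brevity.

```python
import pandas as pd


def append_with_offset(base, incoming, id_columns, offset):
    """Shift the listed ID columns of `incoming` by `offset`, then append it to `base`."""
    shifted = incoming.copy()
    shifted[id_columns] = shifted[id_columns] + offset
    return pd.concat([base, shifted], ignore_index=True)


hh_a = pd.DataFrame({"household": [1, 2]})
hh_b = pd.DataFrame({"household": [1, 2]})
veh_a = pd.DataFrame({"vehicle_id": [100], "hhold": [2]})
veh_b = pd.DataFrame({"vehicle_id": [200], "hhold": [1]})

# Start the incoming IDs just past the largest existing household ID.
offset = hh_a["household"].max() + 1

hh_all = append_with_offset(hh_a, hh_b, ["household"], offset)
# The vehicle-to-household link gets the same offset so it still points at its owner.
veh_all = append_with_offset(veh_a, veh_b, ["hhold"], offset)
```

Applying one offset to both the household table and every column that references it is what keeps the foreign keys consistent after the merge.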
Saving a population#
Back to SQLite#
to_sqlite writes households and persons back into a demand database. It accepts
three modes for handling existing data:
| Mode | Behaviour |
|---|---|
| | Delete all existing |
| | Replace only the rows whose household IDs overlap |
| | Raise |
pop.to_sqlite(demand_db_path, mode="clean")
If path is a directory rather than a file, to_sqlite will look up
*Demand.sqlite inside it.
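A directory-or-file lookup of this kind might look like the sketch below. `resolve_demand_db` is a hypothetical helper, and raising when zero or several files match is an assumption; the actual behaviour of to_sqlite in those cases is not documented here.

```python
import tempfile
from pathlib import Path


def resolve_demand_db(path: Path) -> Path:
    """Return `path` if it is not a directory; otherwise the single *Demand.sqlite inside it."""
    if not path.is_dir():
        return path
    matches = sorted(path.glob("*Demand.sqlite"))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one *Demand.sqlite in {path}, found {len(matches)}"
        )
    return matches[0]


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "demo-Demand.sqlite"
    db.touch()

    from_dir_name = resolve_demand_db(Path(tmp)).name  # directory: glob for the database
    from_file_name = resolve_demand_db(db).name  # file: returned unchanged
```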
To HDF5#
pop.to_hdf5(my_dir / "population-snapshot.hdf5")
The HDF5 writer transparently coerces object/string columns whose values are all
numeric back to numeric dtypes — PyTables can’t otherwise serialize numeric
values stored as object.
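The coercion step can be approximated in plain pandas as shown below. This is a sketch of the general technique, not the writer's actual code; `coerce_numeric_objects` is a hypothetical helper, and the demo columns are made up.

```python
import pandas as pd


def coerce_numeric_objects(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which object columns holding only numeric values become numeric."""
    out = df.copy()
    for col in out.select_dtypes(include=["object"]).columns:
        converted = pd.to_numeric(out[col], errors="coerce")
        # Convert only if no non-null value was lost, i.e. every value parsed as a number.
        if converted.notna().equals(out[col].notna()):
            out[col] = converted
    return out


df = pd.DataFrame(
    {
        "income": pd.Series([50000, 72000], dtype=object),  # numeric values, object dtype
        "name": pd.Series(["a", "b"], dtype=object),  # genuinely non-numeric
    }
)

fixed = coerce_numeric_objects(df)
```

Columns with any non-numeric value are left as they are, so text data is never silently turned into missing values.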