Population#

The Population class is a lightweight container for the three core synthetic population DataFrames that POLARIS uses to drive demand simulation:

  • households — one row per household (household id, location, income, etc.)

  • persons — one row per person (person id, household id, age, gender, etc.)

  • vehicles — one row per vehicle (vehicle_id, hhold, type, etc.)

Population is a NamedTuple, so the three DataFrames are also accessible by index or via iterable unpacking:

from polaris.analyze.population import Population

pop = Population.from_dir(my_run_dir)
hh, persons, vehicles = pop  # households, persons, vehicles, in that order
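To make the NamedTuple behaviour concrete, here is a minimal stand-in rather than the POLARIS class itself. The name PopulationSketch and the plain lists used as field values are hypothetical; in the real class the three fields hold pandas DataFrames.

```python
from typing import Any, NamedTuple


class PopulationSketch(NamedTuple):
    """Hypothetical stand-in for Population; the real fields are DataFrames."""

    households: Any
    persons: Any
    vehicles: Any


# Plain lists stand in for the three DataFrames.
sketch = PopulationSketch(
    households=["hh rows"], persons=["person rows"], vehicles=["vehicle rows"]
)

# Attribute access, positional access and unpacking all reach the same objects.
hh, persons, vehicles = sketch
same_by_index = sketch[0] is sketch.households
same_by_unpack = hh is sketch.households
```

Because the field order is fixed, unpacking always yields households first, then persons, then vehicles.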

Loading a population#

The most common entry point is from_dir(), which loads from a model run directory:

from polaris.analyze.population import Population

pop = Population.from_dir(my_run_dir)
print(pop)
# Population(Households: 5641, Persons: 5926, Vehicles: 140)

Fast HDF5 cache#

The first call to from_dir reads from the *Demand.sqlite file and then writes a population-cache.hdf5 file alongside it. Subsequent calls compare the mtimes of the SQLite file and the HDF5 cache and load from whichever is newer. For a typical demand database with hundreds of thousands of households, this turns a multi-second SQL read into a sub-second pandas read.

The cache is invalidated automatically: if you re-run the model (or otherwise modify the demand SQLite), the next from_dir call will fall back to the SQLite file and refresh the cache.
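The mtime comparison described above can be sketched with the standard library alone. This is an illustration of the idea, not the POLARIS implementation: pick_source is a hypothetical helper, the database file name is made up, and treating equal timestamps as a valid cache is an assumption.

```python
import os
import tempfile
from pathlib import Path


def pick_source(sqlite_path: Path, cache_path: Path) -> Path:
    """Return the cache only if it exists and is not older than the database."""
    if cache_path.exists() and cache_path.stat().st_mtime >= sqlite_path.stat().st_mtime:
        return cache_path
    return sqlite_path


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "Example-Demand.sqlite"   # hypothetical file name
    cache = Path(tmp) / "population-cache.hdf5"

    db.touch()
    no_cache_choice = pick_source(db, cache)   # no cache yet, so read SQLite

    cache.touch()
    os.utime(db, (1_000, 1_000))               # pretend the database is old
    os.utime(cache, (2_000, 2_000))            # and the cache was written later
    fresh_cache_choice = pick_source(db, cache)

    os.utime(db, (3_000, 3_000))               # database modified after caching
    stale_cache_choice = pick_source(db, cache)

    results = (no_cache_choice.name, fresh_cache_choice.name, stale_cache_choice.name)
```

In the last case a real loader would also rewrite the cache after reading from SQLite, so that the following call is fast again.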

If you want explicit control over which source is used, you can call the lower-level loaders directly:

pop = Population.from_sqlite(demand_db_path)
pop = Population.from_hdf5(hdf5_path)

Modifying a population#

Population provides a small set of pure (non-mutating) transformations that return a new Population:

exclude_hh_ids#

Drop a set of households (and their persons and vehicles) — useful for cleaning or for building train/test splits:

bad_hh_ids = {123, 456, 789}
filtered = pop.exclude_hh_ids(bad_hh_ids, reason="missing home location")
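As a rough picture of what a non-mutating exclusion involves, the sketch below filters three toy DataFrames with plain pandas. It is not the library's code: the column names household, person and hhold are assumptions chosen to match the examples on this page, and the data is invented.

```python
import pandas as pd

# Toy frames; column names are assumed for illustration.
households = pd.DataFrame({"household": [1, 2, 3], "income": [50, 60, 70]})
persons = pd.DataFrame({"person": [10, 11, 12, 13], "household": [1, 1, 2, 3]})
vehicles = pd.DataFrame({"vehicle_id": [100, 101], "hhold": [2, 3]})

bad_hh_ids = {2}

# Boolean indexing builds new frames, so the inputs stay untouched.
kept_households = households[~households["household"].isin(bad_hh_ids)]
kept_persons = persons[~persons["household"].isin(bad_hh_ids)]
kept_vehicles = vehicles[~vehicles["hhold"].isin(bad_hh_ids)]
```

Filtering all three frames on the same ID set is what keeps persons and vehicles from pointing at households that no longer exist.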

combine_with#

Append another Population to this one, offsetting the incoming household and person IDs so they don’t collide. The vehicle hhold column is also offset to stay consistent:

combined = pop_a.combine_with(pop_b, hh_id_offset=pop_a.households.household.max() + 1)

This is the building block for scenarios where you want to merge multiple sub-populations (e.g., adding a synthetic subset of long-distance travellers).
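The effect of the offset can be seen on toy data. The following is a sketch in plain pandas under assumed column names (household, vehicle_id, hhold), not the combine_with implementation; person IDs would be shifted in the same way.

```python
import pandas as pd

# Toy frames; column names are assumed for illustration.
hh_a = pd.DataFrame({"household": [1, 2]})
hh_b = pd.DataFrame({"household": [1, 2]})             # IDs collide with hh_a
veh_b = pd.DataFrame({"vehicle_id": [7], "hhold": [2]})

# Start the incoming IDs just past the largest existing one.
hh_id_offset = int(hh_a["household"].max()) + 1

# assign() returns shifted copies, leaving hh_b and veh_b unchanged.
hh_b_shifted = hh_b.assign(household=hh_b["household"] + hh_id_offset)
veh_b_shifted = veh_b.assign(hhold=veh_b["hhold"] + hh_id_offset)

combined_hh = pd.concat([hh_a, hh_b_shifted], ignore_index=True)
```

Applying the same offset to the vehicle hhold column is what keeps each incoming vehicle attached to its shifted household.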

Saving a population#

Back to SQLite#

to_sqlite writes households and persons back into a demand database. It accepts three modes for handling existing data:

Mode         Behaviour
"clean"      Delete all existing Household and Person rows first
"overwrite"  Replace only the rows whose household IDs overlap
"strict"     Raise ValueError if any household IDs already exist

pop.to_sqlite(demand_db_path, mode="clean")

If path is a directory rather than a file, to_sqlite will look up *Demand.sqlite inside it.
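The difference between the three modes is easiest to see on a toy table. The sketch below uses the standard sqlite3 module with a hypothetical two-column Household table; write_households and the schema are assumptions for illustration and do not reflect the actual demand database layout or the to_sqlite internals.

```python
import sqlite3


def write_households(conn, rows, mode):
    """Insert (household, income) rows under one of the three modes."""
    ids = [hh_id for hh_id, _ in rows]
    placeholders = ",".join("?" * len(ids))
    if mode == "clean":
        # Wipe the table before inserting.
        conn.execute("DELETE FROM Household")
    elif mode == "overwrite":
        # Remove only the rows that are about to be replaced.
        conn.execute(f"DELETE FROM Household WHERE household IN ({placeholders})", ids)
    elif mode == "strict":
        # Refuse to write if any incoming ID is already present.
        clash = conn.execute(
            f"SELECT COUNT(*) FROM Household WHERE household IN ({placeholders})", ids
        ).fetchone()[0]
        if clash:
            raise ValueError(f"{clash} household IDs already exist")
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    conn.executemany("INSERT INTO Household VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Household (household INTEGER PRIMARY KEY, income REAL)")

write_households(conn, [(1, 10.0), (2, 20.0)], mode="clean")
write_households(conn, [(2, 99.0), (3, 30.0)], mode="overwrite")
rows_after_overwrite = conn.execute(
    "SELECT household, income FROM Household ORDER BY household"
).fetchall()

try:
    write_households(conn, [(3, 0.0)], mode="strict")
    strict_raised = False
except ValueError:
    strict_raised = True
```

Here "overwrite" keeps household 1, replaces household 2 and adds household 3, while the final "strict" call fails because household 3 is already present.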

To HDF5#

pop.to_hdf5(my_dir / "population-snapshot.hdf5")

The HDF5 writer transparently coerces object/string columns whose values are all numeric back to numeric dtypes, because PyTables can’t otherwise serialize numeric values stored as object.
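One way such a coercion could work is shown below with pandas.to_numeric. This is a sketch of the general technique on invented data, not the writer's actual code; in particular, it assumes that a column is converted only when every value parses as a number, so an object column with missing values would be left alone.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "income": pd.Series(["50000", "62000.5"], dtype=object),  # numbers stored as strings
        "zone": pd.Series([3, 7], dtype=object),                  # numbers stored as object
        "name": pd.Series(["a", "b"], dtype=object),              # genuinely non-numeric
    }
)

coerced = df.copy()
for col in coerced.columns[coerced.dtypes == object]:
    # Unparseable values become NaN instead of raising.
    converted = pd.to_numeric(coerced[col], errors="coerce")
    # Convert only if every value parsed; otherwise keep the column as it was.
    if converted.notna().all():
        coerced[col] = converted
```

After the loop, income and zone carry numeric dtypes while name stays an object column.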

API Reference#