Model Outputs#

MOE Results as HDF5 Outputs#

Polaris uses HDF5 to store some of its largest outputs. HDF5 files (sometimes called just H5) store multidimensional arrays/matrices, which can be organized into folders (groups) and annotated with attributes (strings, numbers, arrays) that describe them.

Currently we store Link and Turn MOE results in the “{Model}-Result.h5” file, organized into “LinkMOE” and “TurnMOE” folders respectively. Each folder contains multiple tables (one per attribute), each organized as a matrix of dimensionality Timesteps x Records. As it runs, POLARIS appends a row of link/turn records for each timestep.

Link MOE Structure

For example, with a 60-second timestep and 1500 links, every link table (except the ID and length tables, which are one-dimensional arrays) would have 1500 columns and 1440 rows (the number of minutes in a one-day simulation).
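The arithmetic behind those dimensions is straightforward; this sketch assumes a 24-hour simulation horizon:

```python
# Shape of each LinkMOE table: (num_timesteps, num_records).
# A 24-hour simulation horizon is assumed for this example.
timestep_seconds = 60
num_links = 1500
simulated_seconds = 24 * 60 * 60

num_timesteps = simulated_seconds // timestep_seconds
print((num_timesteps, num_links))  # (1440, 1500)
```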

| LinkMOE Attribute | Description | Units |
|---|---|---|
| link_travel_time | Average travel time experienced by travelers to traverse the link in the time slice | seconds |
| link_travel_time_standard_deviation | deprecated | |
| link_queue_length | deprecated | |
| link_travel_delay | Average delay experienced by travelers on the link in the time slice | seconds |
| link_travel_delay_standard_deviation | deprecated | |
| link_speed | Average speed of vehicles traversing the link in the time slice | meters/second |
| link_density | The number of vehicles per mile per lane | veh/mi/lane |
| link_in_flow_rate | The number of vehicles per interval entering the link | veh/hr/lane |
| link_out_flow_rate | The number of vehicles per interval exiting the link | veh/hr/lane |
| link_in_volume | The number of vehicles entering the link in the time slice | # of vehicles |
| link_out_volume | The number of vehicles exiting the link in the time slice | # of vehicles |
| link_speed_ratio | Ratio of link_speed to the free-flow speed of the link | no unit |
| link_in_flow_ratio | Ratio of current in-flow to in-flow capacity | no unit |
| link_out_flow_ratio | Ratio of current out-flow to out-flow capacity | no unit |
| link_density_ratio | Ratio of link_density to jam density | no unit |
| link_travel_time_ratio | Ratio of average travel time to free-flow travel time | no unit |
| num_vehicles_in_link | The number of vehicles traversing the link in the time slice | # of vehicles |
| volume_cum_BPLATE | Cumulative volume of BPlate vehicles on the link in the time slice | # of vehicles |
| volume_cum_LDT | Cumulative volume of light-duty vehicles on the link in the time slice | # of vehicles |
| volume_cum_MDT | Cumulative volume of medium-duty vehicles on the link in the time slice | # of vehicles |
| volume_cum_HDT | Cumulative volume of heavy-duty vehicles on the link in the time slice | # of vehicles |
| entry_queue_length | The number of vehicles waiting to enter the link from outside the network in the time slice | # of vehicles |
| link_length | Length of the link | meters |
| link_uids | Link unique ID (derived from database ID and direction) | integer |

| TurnMOE Attribute | Description | Units |
|---|---|---|
| turn_penalty | Time delay in executing a turn | seconds |
| turn_penalty_sd | deprecated | |
| inbound_turn_travel_time | Travel time to reach the turn from the inbound link | seconds |
| outbound_turn_travel_time | deprecated | |
| turn_flow_rate | Vehicle count in the time slice that left the turn (a volume, not a flow rate) | # of vehicles |
| turn_flow_rate_cv | Connected-vehicle (CV) count in the time slice that left the turn (a volume, not a flow rate) | veh/hr/turn |
| turn_penalty_cv | Time delay in executing a turn for a connected vehicle | seconds |
| total_delay_interval | Total turn delay experienced by vehicles in the time slice | seconds |
| total_delay_interval_cv | Total turn delay experienced by connected vehicles in the time slice | seconds |
| turn_penalty_by_entry | Similar to turn_penalty, but keyed on when the vehicle entered the turn rather than when it left (used for DTA convergence) | seconds |
| turn_uids | Turn unique identifiers | integer |

Additional metadata such as the start time and timestep length are also stored as attributes of each group (“link_moe” and “turn_moe”):

  • num_records

  • num_timesteps

  • start_time

  • timestep
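With these attributes, the row (time slice) for a given simulation time can be located by simple arithmetic. A sketch, assuming start_time and timestep are both stored in seconds (verify this against your model's H5 attributes):

```python
def timestep_index(sim_time_s: int, start_time_s: int, timestep_s: int) -> int:
    """Row index into a LinkMOE/TurnMOE table for a given simulation time.

    Assumes start_time and timestep are in seconds (an assumption; check
    the group attributes of your own result file).
    """
    if sim_time_s < start_time_s:
        raise ValueError("time precedes the first recorded timestep")
    return (sim_time_s - start_time_s) // timestep_s

# With a 60 s timestep starting at midnight, 07:30:00 falls in row 450
print(timestep_index(7 * 3600 + 30 * 60, 0, 60))  # 450
```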

UID Conversion#

Each record in the H5 file is uniquely identified by its UID (link_uid and turn_uid respectively). These UIDs are related back to corresponding unique IDs in the Link and Connection tables in the Supply database by the following identities:

\[T_{id} = T_{uid}\]

\[L_{id} = \lfloor L_{uid} / 2 \rfloor\]

\[L_{dir} = L_{uid} \bmod 2\]

\[L_{uid} = 2 L_{id} + L_{dir}\]
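These identities translate directly into code; a minimal Python sketch:

```python
def link_uid(link_id: int, direction: int) -> int:
    """Compose a link UID from the Supply database link id and direction (0 or 1)."""
    return 2 * link_id + direction

def link_id_dir(uid: int) -> tuple:
    """Recover (link_id, direction) from a link UID."""
    return uid // 2, uid % 2

# Turn UIDs need no conversion: T_id == T_uid.
print(link_uid(1234, 1))   # 2469
print(link_id_dir(2469))   # (1234, 1)
```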

Working with H5 in C++#

Reading from and writing to HDF5 files is done via the HighFive API and the H5IO class in MasterType.

// Open a file - there are multiple enum options available for how to open the file
// The file is closed automatically when the object goes out of scope, so a local variable suffices
HighFive::File file(filename, HighFive::File::OpenOrCreate);

// Separate functions to create a new matrix/table and to append rows to it
MT::H5IO::template Create_Matrix<float>(file, "group_name", "table_name", data_vector, rows, cols);
MT::H5IO::template Append_Matrix<float>(file, "group_name", "table_name", data_vector, row_to_write);

// Write_Matrix automatically creates a new matrix if it does not exist, otherwise it writes to the specified row
MT::H5IO::template Write_Matrix<float>(file, "group_name", "table_name", data_vector, row_to_write, rows, cols);

// POLARIS reads every matrix into a 1-dimensional vector rather than an actual matrix
// The element type must be specified - be very careful not to mix up 32- and 64-bit types!
std::vector<float> data_array = MT::H5IO::template Read_Matrix_Into_Array<float>(file, "group_name", "table_name");

Working with H5 in Python#

HDF5 is supported in Python via the h5py library. For most of our uses, after opening a file, its groups and tables can be treated as nested dictionaries, with an additional “attrs” member containing a dictionary of attributes.

import h5py

# Open in "a" (read/write, create if missing) mode; "r" would not allow create_dataset below
with h5py.File(self.result_hdf5, "a") as h5_result:
    # Get attributes
    timesteps = h5_result["link_moe"].attrs["num_timesteps"]
    records = h5_result["link_moe"].attrs["num_records"]

    # Get data
    lengths = h5_result["link_moe"]["link_lengths"]          # This is an array
    travel_times = h5_result["link_moe"]["link_travel_time"] # This is a table

    # Write a new table
    h5_result.create_dataset("new_table",
        data=some_pandas_dataframe.to_numpy(),
        compression="gzip",            # gzip/deflate compression is our standard
        compression_opts=comp_level,   # generally level 3 or 4 is appropriate
        scaleoffset=2,
        dtype="f4")                    # POLARIS expects 4-byte floating point (i.e. float)
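Once a table is loaded, a single link's time series is just a column slice. A self-contained sketch with synthetic data (group and table names follow the layout described above; the values are made up):

```python
# Build a tiny synthetic file mirroring the LinkMOE layout, then slice out
# one link's travel-time series.
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "demo_result.h5")

with h5py.File(path, "w") as f:
    grp = f.create_group("link_moe")
    grp.attrs["num_timesteps"] = 4
    grp.attrs["num_records"] = 3
    grp.create_dataset("link_uids", data=np.array([10, 11, 12]))
    grp.create_dataset("link_travel_time",
                       data=np.arange(12, dtype="f4").reshape(4, 3))

with h5py.File(path, "r") as f:
    uids = f["link_moe"]["link_uids"][:]
    tt = f["link_moe"]["link_travel_time"]
    col = int(np.where(uids == 11)[0][0])  # column index for link_uid 11
    series = tt[:, col]                    # that link's travel time per timestep
    print(series)  # [ 1.  4.  7. 10.]
```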

Skims as OMX Outputs#

Highway and transit skim files have been moved from our internal binary format to the industry open standard OMX (OpenMatrix File Format). This was done to introduce compression of the data and to better use open source formats. The introduction of compression reduced the file size of our transit skims (which are quite sparse) by a factor of approximately 20.

OMX is a standardized implementation of HDF5 that stores square tables (i.e. matrices). It inherently knows about the zoning system, and every table stored in an OMX file is guaranteed to have the same dimensions (num_zones x num_zones).

Each table is named and given attributes representing the time (as an integer with no decimal values) and the metric (i.e. time, distance, cost, etc) being represented in the matrix. The Transit file tables are also labeled with the transit mode.

OMX files provide a lookup facility that maps our human-friendly zone identifiers to matrix indices. We use it to store the TAZ mapping for each zone in our model (e.g. column 0 represents zone 1).

Working With OMX in C++#

We use a modified version of the omx-cpp API in POLARIS. The OMX license requires that modifications be documented, so be sure to note any changes you make.

To use OMX, we need to declare all tables ahead of time:

omxHandler->createFile(number_of_tables, rows, cols, table_name_vector, filename, lookup_table_name);
omxHandler->writeMapping(lookup_table_name, id_container_vector);
omxHandler->closeFile();

Then we can add rows as we go:

OMXMatrix omxHandler;
omxHandler.openFile(filename);
omxHandler.writeRow(table_name, row_number, pointer_to_data);

Working With OMX in Python#

PolarisLib uses the pip-installable OMX library “openmatrix” in its skim library (polarislib.skims.highway and polarislib.skims.transit). This library works with both OMX and the deprecated binary format (V3 only), opening files and making the data accessible to Python. Some example code can be found in the notebooks/skim_manipulations.ipynb notebook.

Warning

Package Name There is also an omx pip package which deals with XML parsing. Make sure to use openmatrix.