Profiling#
Polaris has a built-in profiler which can be called inside functions and logs its output on a successful run. This is a simple profiler with minimal impact on performance, intended to log the time taken on single functions.
To initialize the profiler, we need a few things:
Set the C++ Macro “PROFILER” equal to 1 (or any non-zero value). Without this, the code will be inactive and not compiled (which also means performance will not be impacted when it is disabled). This should be done at the very top of your app.
Initialize the profiler by calling “begin_profile()” near the beginning of your app.
Call “end_and_generate_profile([LOGGING_STREAM])” at the end of your app to see the results of your profiling, replacing “LOGGING_STREAM” with a logger that accepts the “<<” operator (ex. cout, fout, Logger::noticeStream(), etc.)
Adding Timing Points#
After this, when you know the function or block of code you want to profile, you only need to add “SCOPED_TIMER” at the top of the function or block of code.
void Print(float f)
{
SCOPED_TIMER;
cout << f << endl;
}
void ExampleFunction()
{
SCOPED_TIMER;
vector<float> example(100000);
for (auto itr = example.begin(); itr != example.end(); ++itr)
{
SCOPED_TIMER;
*itr = sqrt(rand());
}
for (auto f : example)
{
SCOPED_TIMER;
f *= example.size();
Print(f);
}
}
When the app closes successfully, you will see the results printed:
=======================
POLARIS Profiler Report
=======================
Total time: 413.9275ms
hello_world\hello_world.cpp:52 ExampleFunction(x1): 8327784 cycles, 2.385ms (0.58%, 100.00% w/children)
hello_world\hello_world.cpp:56 ExampleFunction(x1000): 69536 cycles, 0.020ms (0.00%)
hello_world\hello_world.cpp:61 ExampleFunction(x1000): 3945034 cycles, 1.130ms (0.27%, 99.42% w/children)
hello_world\hello_world.cpp:46 Print(x1000): 1433014406 cycles, 410.381ms (99.14%)
First you can see the total runtime of the program, followed by results from each timer that was placed. Breaking these down a bit more:
ExampleFunction hello_world.cpp:52(x1)
The first segment identifies the function the timer was placed in as well as the precise file and line. The number inside the parentheses is the number of times this timer was hit during runtime.
258235643 cycles, 73.953ms
Milliseconds will be more useful for most cases when profiling POLARIS, but (estimated) CPU cycles are also presented for highly precise tests.
(9.93%, 100.00% w/children)
These percentages are the percentage of total runtime. By “children,” we mean timers that a below the scope of this timer - in this case, the two for loops have their own timers, and the separate function Print() also has a timer. So we can see that the initialization of the vector took 9.93% of runtime, while the rest of ExampleFunction() took 100% of runtime.