How to take traces in FireDucks

FireDucks has a trace function that records how long each operation, such as read_csv, groupby, or sort, takes. This article explains how to use it.

How to output and display trace files

You do not need to modify your program to use the trace function. Set the FIREDUCKS_FLAGS environment variable as shown below and run the program.

$ FIREDUCKS_FLAGS="--trace=3" python -mfireducks.pandas your_program.py

When you run the program this way, a file named trace.json is created in the directory where the program was executed. This file is the trace file.

To view a trace file, use a web browser with a built-in trace viewer, such as Microsoft Edge or Google Chrome. Open the trace viewer by entering edge://tracing (Microsoft Edge) or chrome://tracing (Google Chrome) in the address bar.

The following image shows the Trace Viewer running in Microsoft Edge.

Edge TraceViewer

Click the Load button to open the trace file. The execution trace of the program is displayed graphically. The following image shows the execution trace of one query of the polars-tpch benchmark introduced in this [article](https://fireducks-dev.github.io/posts/20241206_update_polars-tpch/).

TPCH Q01 Trace

The top row shows the time for the whole program (more precisely, the time from import fireducks.pandas to the end of the program). Below that, fireducks.core.evaluate appears as two major blocks, because the polars-tpch benchmark explicitly separates reading the parquet file from executing the query.

In the first evaluation block, the parquet reading process fireducks.read_parquet_with_metadata alone accounts for the execution time. You can zoom in with the mouse to get a more detailed breakdown of the second block, which is the query itself, as shown below.

TPCH Q01 Trace Query Detail
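The per-process breakdown visible in the trace viewer can also be approximated offline. The following is a minimal sketch, not part of FireDucks. Because the file loads in chrome://tracing, it assumes trace.json follows the Chrome Trace Event Format, either as a bare list of events or as an object with a "traceEvents" list, with complete events marked "ph": "X" and "dur" given in microseconds. The function names and the sample events are hypothetical. Nested events are each counted in full, so a parent's total includes the time of its children.

```python
import json
from collections import defaultdict


def summarize_trace(trace):
    """Sum durations per event name for complete ("X") events.

    Assumption: Chrome Trace Event Format, with "dur" in microseconds.
    """
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(lambda: [0.0, 0])  # name -> [seconds, count]
    for ev in events:
        if ev.get("ph") != "X" or "dur" not in ev:
            continue  # skip metadata and begin/end style events
        entry = totals[ev.get("name", "?")]
        entry[0] += ev["dur"] / 1e6
        entry[1] += 1
    # Longest total duration first
    return sorted(
        ((name, sec, cnt) for name, (sec, cnt) in totals.items()),
        key=lambda row: row[1],
        reverse=True,
    )


def summarize_trace_file(path="trace.json"):
    """Load a trace file from disk and summarize it."""
    with open(path) as f:
        return summarize_trace(json.load(f))


if __name__ == "__main__":
    # Synthetic events for illustration only, not real FireDucks output
    sample = [
        {"name": "fireducks.filter", "ph": "X", "ts": 0, "dur": 300000},
        {"name": "fireducks.project", "ph": "X", "ts": 300000, "dur": 1000},
        {"name": "fireducks.project", "ph": "X", "ts": 301000, "dur": 1000},
    ]
    for name, seconds, count in summarize_trace(sample):
        print(f"{name:40s} {seconds:8.3f} {count:6d}")
```

If the assumptions hold for your file, calling summarize_trace_file("trace.json") gives the same kind of table for a real run.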

How to change the trace file name

The default trace file name is trace.json. To use a different name, add the --trace-file option, for example --trace-file=foo.json.

$ FIREDUCKS_FLAGS="--trace=3 --trace-file=foo.json" python -mfireducks.pandas your_program.py
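If you launch traced runs from a script rather than from the shell, the same environment variable can be set programmatically. The following is a rough sketch using only the Python standard library. The helper names fireducks_trace_env and fireducks_command are hypothetical, and only the flags shown in this article (--trace=3 and --trace-file) are used. Trace levels other than 3 are not covered here. Note that the helper overwrites any existing FIREDUCKS_FLAGS value.

```python
import os
import sys


def fireducks_trace_env(trace_file=None, level=3, base_env=None):
    """Return a copy of the environment with FIREDUCKS_FLAGS set for tracing."""
    env = dict(os.environ if base_env is None else base_env)
    flags = [f"--trace={level}"]
    if trace_file is not None:
        flags.append(f"--trace-file={trace_file}")
    # Overwrites any FIREDUCKS_FLAGS already present in the environment
    env["FIREDUCKS_FLAGS"] = " ".join(flags)
    return env


def fireducks_command(script):
    """Build the argv equivalent to `python -mfireducks.pandas script`."""
    return [sys.executable, "-m", "fireducks.pandas", script]


if __name__ == "__main__":
    env = fireducks_trace_env("foo.json")
    print(env["FIREDUCKS_FLAGS"])
    print(fireducks_command("your_program.py"))
    # To actually run the program (requires FireDucks to be installed):
    # import subprocess
    # subprocess.run(fireducks_command("your_program.py"), env=env, check=True)
```

Writing each run to its own trace file name this way makes it easier to compare traces from different runs side by side.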

How to output trace summary to standard error

If you only need a breakdown of the time spent in each process, you can print summary information to standard error instead.

Use the same options as above, but specify --trace-file=- in place of a file name.

$ FIREDUCKS_FLAGS="--trace=3 --trace-file=-" python -mfireducks.pandas your_program.py

The following is the summary for the polars-tpch benchmark query used in the previous example. Details such as the order in which each process ran are not shown, but you can see a summary of the execution times.

elapsed            6.071 sec
kernels            5.963 sec  98.22%      101
fallbacks          0.000 sec   0.00%        0
                                        duration sec ratio     count
== kernel ==
fireducks.read_parquet_with_metadata       5.453   89.83%          1
fireducks.filter                           0.293    4.83%          1
fireducks.groupby_agg                      0.089    1.46%          1
fireducks.le.vector.scalar                 0.051    0.83%          1
fireducks.mul.vector.vector                0.042    0.69%          2
fireducks.rsub.vector.scalar               0.023    0.38%          1
fireducks.radd.vector.scalar               0.009    0.15%          1
fireducks.sort_values                      0.002    0.03%          1
fireducks.read_parquet_metadata            0.001    0.02%          1
fireducks.project                          0.000    0.00%          8
== fallback ==
== other ==
top                                        6.071  100.00%          1
create_mlir_func                           0.001    0.02%          3
import pandas                              0.000    0.00%          2
fire.get_string                            0.000    0.00%         22
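To post-process a summary like the one above, for example to flag processes that exceed a time budget, you could capture standard error and parse the rows. The following is a minimal sketch that assumes the layout shown in this article: section markers such as "== kernel ==" followed by rows of name, duration in seconds, ratio with a percent sign, and count. The function name parse_summary is hypothetical, and the layout may differ between FireDucks versions.

```python
import re

# A row is: name, duration in seconds, ratio with "%", count
ROW = re.compile(r"^(\S.*?)\s+(\d+\.\d+)\s+(\d+\.\d+)%\s+(\d+)\s*$")
SECTION = re.compile(r"^==\s*(\w+)\s*==$")


def parse_summary(text):
    """Parse the per-process rows of a trace summary into dictionaries."""
    rows = []
    section = None
    for line in text.splitlines():
        line = line.strip()
        m = SECTION.match(line)
        if m:
            section = m.group(1)  # "kernel", "fallback", or "other"
            continue
        m = ROW.match(line)
        if m and section is not None:
            name, seconds, ratio, count = m.groups()
            rows.append({
                "section": section,
                "name": name,
                "seconds": float(seconds),
                "ratio": float(ratio),
                "count": int(count),
            })
    return rows


if __name__ == "__main__":
    # A few lines copied from the summary shown in this article
    sample = """\
== kernel ==
fireducks.read_parquet_with_metadata       5.453   89.83%          1
fireducks.filter                           0.293    4.83%          1
== fallback ==
== other ==
import pandas                              0.000    0.00%          2
"""
    for row in parse_summary(sample):
        if row["section"] == "kernel" and row["seconds"] >= 0.1:
            print(row["name"], row["seconds"])
```

The header lines before the first section marker (elapsed, kernels, fallbacks) are skipped on purpose, since they use a different layout.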

Conclusion

This article has explained how to use the trace function in FireDucks.

When using FireDucks, there may be times when you notice a slowdown. In that case, tracing as described in this article may help you find the process responsible.

We hope that you will make full use of FireDucks by using the trace function.