Benchmarks

FireDucks benchmarks

This section shows the results of FireDucks performance benchmarks.

Server Specs

  • CPU: Intel(R) Xeon(R) Gold 5317 CPU @ 3.00GHz x 2sockets (48HW threads total)
  • Main memory: 256GB

Comparison of DataFrame libraries using TPC-H

Source code of the benchmark

The following graph compares four DataFrame libraries (pandas, Modin, Polars, and FireDucks) on the 22 queries included in the benchmark. The vertical axis shows the speedup over pandas on a logarithmic scale; a value greater than 1 means the library is faster than pandas. The Scale Factor, which represents the data size, is 10 (a dataset of about 10 GB), and the measured time excludes file IO.

The average speedup over pandas across the 22 queries was 1.3x for Modin, 13x for Polars, and 18x for FireDucks.
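As a rough illustration of how such speedup figures can be derived, the sketch below divides the pandas time by the other library's time for each query and then averages the ratios. The timings are made-up placeholders, not measured results, and the helper name `speedups` is hypothetical. The text does not say which kind of mean was used, so both the arithmetic and the geometric mean are shown.

```python
from statistics import fmean, geometric_mean


def speedups(baseline, candidate):
    """Per-query speedup of candidate over baseline (baseline time / candidate time)."""
    return {q: baseline[q] / candidate[q] for q in baseline if q in candidate}


# Placeholder timings in seconds (illustrative only, NOT benchmark results).
pandas_times = {"q01": 8.0, "q02": 2.0, "q03": 4.0}
other_times = {"q01": 2.0, "q02": 4.0, "q03": 1.0}

ratios = speedups(pandas_times, other_times)

# A ratio above 1 means the candidate is faster than pandas on that query.
arithmetic = fmean(ratios.values())
geometric = geometric_mean(ratios.values())

for query, ratio in sorted(ratios.items()):
    print(f"{query}: {ratio:.2f}x")
print(f"arithmetic mean: {arithmetic:.2f}x, geometric mean: {geometric:.2f}x")
```

The two means can differ noticeably when a few queries have very large ratios, so it helps to state which one is being reported when comparing numbers across benchmarks.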

polars-tpch-sf10

The versions of the libraries used were as follows (the latest versions at the time of the measurements).

  • pandas: 2.2.0
  • Modin: 0.26.1
  • Polars: 0.20.7
  • FireDucks: 0.9.8

The following chart compares Polars and FireDucks on larger datasets, at Scale Factors 10, 20, and 50. The vertical axis shows how many times faster FireDucks is than Polars. On average, FireDucks is 1.3 times (sf=10), 1.3 times (sf=20), and 1.7 times (sf=50) faster than Polars.

polars-tpch

About the benchmark code

This benchmark is originally from polars/tpch. Because that repository includes all 22 queries for Polars but not for pandas, we implemented all 22 queries using pandas and ran them with FireDucks through its import hook. The same pandas implementations were also used to measure pandas and Modin on the queries that polars/tpch does not provide. All code for the queries is available at fireducks-dev/polars-tpch.

Performance evaluation using TPCx-BB

This section compares pandas and FireDucks using TPCx-BB. TPCx-BB includes queries for data analysis with machine learning and the associated preprocessing. In this evaluation, we measured pandas and FireDucks using a pandas implementation of TPCx-BB written by the FireDucks development team. File IO is included in the measured time.
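The TPC-H measurement above excludes file IO, while this one includes it. The difference comes down to where the timer starts, which the minimal sketch below shows. It is not the benchmark's actual harness: the helper `timed` and the stand-in `load` and `query` functions are hypothetical, and only the Python standard library is used.

```python
import time


def timed(load, query, include_io):
    """Run load() then query(data); return (result, elapsed seconds).

    With include_io=True the timer also covers load(); otherwise it
    starts only after the data is already in memory.
    """
    if include_io:
        start = time.perf_counter()
        data = load()
    else:
        data = load()
        start = time.perf_counter()
    result = query(data)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-ins for reading a dataset and running one query (illustrative only).
def load():
    return list(range(1000))


def query(data):
    return sum(data)


result_no_io, t_no_io = timed(load, query, include_io=False)
result_io, t_io = timed(load, query, include_io=True)
print(f"query only: {t_no_io:.6f}s, load + query: {t_io:.6f}s")
```

If a library defers execution, the `query` function should force the result to be fully computed before it returns, otherwise the timed region may not cover the real work.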

With TPCx-BB, FireDucks is up to 17 times faster than pandas and 6.7 times faster on average.

TPCx-BB

The versions used in the measurements were as follows.

  • pandas: 2.1.4
  • FireDucks: 0.9.3

Benchmark Archive

Older benchmarks