Posts in 2024
  • How to take traces in FireDucks

    Friday, December 20, 2024 in Posts

    FireDucks has a trace function that records how long each process such as read_csv, groupby, sort, etc. takes. This article introduces how to use the trace function. How to output and display trace files: To use the trace function, you do not need to …

  • Ensuring compatibility with pandas in the GPU version of FireDucks

    Thursday, December 19, 2024 in Posts

    We are currently developing a GPU version of FireDucks. FireDucks is built with an architecture that translates programs into an intermediate representation at runtime, optimizes them in this intermediate representation, and then compiles and …

  • Exploring performance benefits of FireDucks over cuDF

    Wednesday, December 18, 2024 in Posts

    Research says that data scientists spend about 45% of their time on data preparation tasks, including loading (19%) and cleaning (26%) the data. Pandas is one of the most popular Python libraries for tabular data processing because of its diverse …

  • Cache or Eliminate? How FireDucks increase opportunity of optimization

    Tuesday, December 17, 2024 in Posts

    As described here, FireDucks uses a lazy execution model with define-by-run IR generation. Since FireDucks uses the MLIR compiler framework to optimize and execute IR, the first step of the execution is creating an MLIR function which holds operations to be …

  • How to run polars-tpch benchmark with FireDucks

    Friday, December 06, 2024 in Posts

    We recently updated the results of the polars-tpch benchmark on a 4th-generation Xeon processor. The latest results can be found here, and also below in this article, which explains how to reproduce them. For reproducibility, we have used AWS EC2 for …

  • Unveiling the Optimization Benefit of FireDucks Lazy Execution: Part #2

    Thursday, December 05, 2024 in Posts

    In the previous article, we talked about how FireDucks can take care of pushdown-projection-related optimization for read_parquet(), read_csv(), etc. In today’s article, we will focus on the efficient caching mechanism of its JIT compiler. …

  • Unveiling the Optimization Benefit of FireDucks Lazy Execution: Part #1

    Thursday, December 05, 2024 in Posts

    The availability of runtime memory is often a challenge when processing larger-than-memory datasets with pandas. To solve the problem, one can either shift to a system with larger memory capacity or consider switching to alternative …

  • What to do when FireDucks is slow

    Monday, November 11, 2024 in Posts

    Thank you for your interest in FireDucks. This article describes possible causes and remedies for slow programs using FireDucks. When a pandas program with FireDucks applied is slow, the reason may be one of the following. Using ‘apply’ or …

  • Workshop at Bangalore, India

    Thursday, September 19, 2024 in Posts

    We held a workshop on FireDucks with faculty members from universities around Bangalore. Thank you for joining and for the discussion.

  • Have you ever thought of speeding up your data analysis in pandas with a compiler?

    Monday, July 01, 2024 in Posts

    In general, a data scientist spends significant effort transforming raw data into a more digestible format before training an AI model or creating visualizations. Traditional tools such as pandas have long been the linchpin in this process, …