FireDucks Own API

FireDucks provides some APIs of its own that pandas does not have. Here are some of them.

pandas conversion

FireDucks DataFrame/Series has a to_pandas method that allows conversion to pandas data. This is useful, for example, when using an external library that accepts pandas data.

Also, fireducks.pandas.from_pandas can be used to convert pandas DataFrames/Series to FireDucks.
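The two conversions can be wrapped in small helpers, shown here as a minimal sketch. The helper names to_plain_pandas and to_fireducks are hypothetical and not part of FireDucks; only to_pandas and fireducks.pandas.from_pandas come from the text above. The helpers fall back to returning their input unchanged, so the sketch also runs where FireDucks is not installed.

```python
def to_plain_pandas(obj):
    """Return a pandas object, converting from FireDucks when possible.

    Hypothetical helper: relies only on the to_pandas method described
    above and leaves objects without that method untouched.
    """
    to_pandas = getattr(obj, "to_pandas", None)
    return to_pandas() if callable(to_pandas) else obj


def to_fireducks(pandas_obj):
    """Convert a pandas DataFrame/Series to FireDucks when it is installed.

    Hypothetical helper: if FireDucks cannot be imported, the input is
    returned unchanged.
    """
    try:
        import fireducks.pandas as fpd
    except ImportError:
        return pandas_obj
    return fpd.from_pandas(pandas_obj)
```

A typical use would be to call to_plain_pandas right before handing a frame to an external library that expects pandas data.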

Explicit intermediate language execution

FireDucks uses lazy execution. Instead of running each API call immediately, it records calls in an intermediate language and executes several of them together. This is an important feature for speeding up processing, because it allows optimization on the intermediate language.

On the other hand, measuring the execution time of an individual API, for example when verifying the operation of FireDucks, takes some care. Most APIs in FireDucks only generate intermediate language and return in a very short time, so timestamps taken before and after an API call do not capture the actual data frame processing time.

In such cases, the DataFrame._evaluate method can be used to force execution explicitly. When _evaluate is called, it executes the intermediate language that has been generated for the DataFrame up to that point. Therefore, by calling _evaluate on the input before the timer starts and on the result before the timer stops, you can measure the actual processing time.

An example of groupby time measurement is shown below.

import time
import fireducks.pandas as pd

df = pd.read_csv(...)._evaluate()      # finish read_csv so that it is outside the measured range
t0 = time.time()                       # start measuring time
g = df.groupby(...).sum()._evaluate()  # generate the groupby intermediate language and execute it immediately
t1 = time.time()                       # end of time measurement
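The same pattern can be packaged as a reusable helper. The sketch below is an illustration rather than part of FireDucks: the name timed is hypothetical, and it assumes, as in the example above, that _evaluate returns the evaluated object. Results without an _evaluate method (for example plain pandas objects) are timed as they are.

```python
import time


def timed(build):
    """Call build(), force evaluation of its result, and time both steps.

    Hypothetical helper. `build` is a zero-argument callable that returns
    the result to measure. Returns (result, elapsed_seconds).
    """
    t0 = time.perf_counter()
    result = build()
    # Lazy results only hold intermediate language at this point;
    # _evaluate makes the actual processing happen inside the timed range.
    evaluate = getattr(result, "_evaluate", None)
    if callable(evaluate):
        result = evaluate()
    elapsed = time.perf_counter() - t0
    return result, elapsed
```

The input should already have been evaluated before timed is called, so that earlier pending work such as read_csv is not counted in the measurement.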

API for feature generation

One of the applications of data frames is feature generation, which is a pre-processing step in machine learning. In feature generation, data frames are processed in various ways to generate features for training in order to create better models, which can be very time-consuming.

FireDucks also provides APIs for fast feature generation. Currently, the following two typical feature generation methods are available. Both can be implemented by combining pandas APIs, but FireDucks offers them as dedicated APIs that are optimized in advance, much like the optimizations the FireDucks compiler performs. See the API Doc for details on each API.

  • Aggregate features: fireducks.pandas.aggregate
  • Multi-target encoding: fireducks.pandas.multi_target_encoding
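
For orientation, the sketch below shows what such features might look like when built from ordinary pandas APIs. It does not call the FireDucks APIs listed above, whose signatures are described in the API Doc. The column names and data are made up for illustration, and the target encoding shown is a plain single-target version without out-of-fold handling.

```python
import pandas as pd

# Made-up example data: one categorical key, one numeric column, one target.
df = pd.DataFrame({
    "shop": ["a", "a", "b", "b", "b"],
    "price": [10.0, 20.0, 5.0, 7.0, 9.0],
    "bought": [1, 0, 1, 1, 0],
})

# Aggregate feature: a per-group statistic broadcast back to every row.
df["shop_price_mean"] = df.groupby("shop")["price"].transform("mean")

# Target encoding: replace a category with the mean of the target
# observed for that category.
df["shop_te"] = df.groupby("shop")["bought"].transform("mean")
```

Repeating these steps over many key and statistic combinations is where the cost of feature generation tends to add up.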