FireDucks Release Note

0.11.2 (May 16, 2024)

  • Bug Fixes:
    • Fix dependency on pyarrow.
    • Fix dtype of index when merge result is empty.
  • Removing Fallback:
    • Supported aggregate on timedelta columns.
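
As an illustration of the kind of aggregation this entry refers to, here is a minimal sketch. It uses plain pandas so that it runs anywhere; with FireDucks installed, the import would be fireducks.pandas instead. The column names and values are made up for the example.

```python
import pandas as pd  # with FireDucks: import fireducks.pandas as pd

# Hypothetical data: a grouping key and a timedelta column.
df = pd.DataFrame({
    "k": ["a", "a", "b"],
    "d": pd.to_timedelta([1, 3, 5], unit="h"),
})

# Aggregate over the whole timedelta column.
overall_mean = df["d"].mean()

# Aggregate the timedelta column per group.
per_key_max = df.groupby("k")["d"].max()

print(overall_mean)
print(per_key_max)
```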

0.11.1 (May 13, 2024)

  • Performance Improvement:
    • Add new IR pattern rewrite optimization pass.
    • DataFrame.merge/join with date32/64 payload column.
  • Bug Fixes:
    • Fixed bug in iloc-getter when there are duplicates in column names
  • Removing Fallback:
    • Supported aggregate methods (max, min, mean etc.) to be performed on timestamp columns.
    • Supported iloc-getter with integer or list-likes column indicator: e.g., df.iloc[:, 0], df.iloc[:, [2,4]] etc.
    • Supported take() with slice object as input.
    • Supported squeeze() for DataFrame and Series.
    • Supported dictionary or casting-methods to be mapped on a Series.
  • New pandas incompatibility:
    • The observed parameter of groupby is always treated as True for better performance.
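
To make the groupby incompatibility concrete, the sketch below shows what the observed parameter changes in plain pandas. Per the note above, FireDucks would behave like the observed=True case regardless of the value passed. The data is hypothetical.

```python
import pandas as pd

# A categorical key with one category ("c") that never occurs in the data.
df = pd.DataFrame({
    "key": pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"]),
    "val": [1, 2, 3],
})

# observed=True keeps only the categories that actually appear.
observed_only = df.groupby("key", observed=True)["val"].sum()

# observed=False also emits a row for the unused category "c".
all_categories = df.groupby("key", observed=False)["val"].sum()

print(list(observed_only.index))
print(list(all_categories.index))
```

Code that relies on rows for unused categories therefore needs to add them back explicitly, for example with reindex.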

0.11.0 (May 07, 2024)

  • Performance Improvement:
    • groupby.median() and median() now return the exact (non-approximate) median.
  • Removing Fallback:
    • read_parquet with columns parameter.
    • DataFrame.rename with columns parameter.
  • Others:
    • Upgrade dependent pyarrow to 16.0.0.
    • The importhook feature can now be activated by fireducks.pandas.

0.10.9 (Apr 23, 2024)

  • Performance Improvement:
    • groupby.std()
  • Removing Fallback:
    • Supported astype("datetime64")
    • Supported DataFrame.dropna(axis=1)
  • Bug Fixes:
    • Fix df.merge returning incorrect result when how is left and key has nulls.
    • Fix an error when "head", "tail" or "shift" is used in groupby.agg. When one of these is given as a single aggregator, e.g., df.groupby(...).agg("head"), FireDucks can speed it up; when it is combined with another aggregator, e.g., df.groupby(...).agg(["head", "mean"]), the operation is executed through the fallback.
    • Fix issues in accessing methods from pd.api.types module.
  • Others:
    • Remove version from dependency on numpy.
    • Add experimental profiler for Jupyter/IPython. Use %load_ext fireducks.ipyext and the %%fireducks.profile cell magic.
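
The left-merge fix above concerns keys that contain nulls. As a rough sketch of that scenario, the following uses plain pandas (swap the import for fireducks.pandas to try it with FireDucks); the frames and column names are invented for illustration.

```python
import numpy as np
import pandas as pd  # with FireDucks: import fireducks.pandas as pd

# Hypothetical frames whose key column "k" contains a null on both sides.
left = pd.DataFrame({"k": [1.0, np.nan, 3.0], "l": ["x", "y", "z"]})
right = pd.DataFrame({"k": [1.0, np.nan], "r": [10, 20]})

# A left join keeps every row of `left`, in its original order.
# pandas documents that null keys on both sides are matched with each other.
merged = left.merge(right, on="k", how="left")

print(merged)
```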

0.10.8 (Apr 16, 2024)

  • Performance Improvement:
    • groupby two keys with nulls
    • left join with single key
    • left and inner join with single key of category type
  • Removing Fallback:
    • groupby.corrwith among two columns

0.10.7 (Apr 10, 2024)

  • Performance Improvement:
    • Printing dataframe with large dictionary.
  • Removing Fallback:
    • DataFrame/Series astype() with dtype="category"
  • Bug Fixes:
    • Fixed join issue with dictionary-typed key columns.
    • Fixed filter issue of a table having multiple index columns with duplicate values
  • Others:
    • Upgrade dependent pyarrow to 15.0.2.

0.10.6 (Apr 02, 2024)

  • Performance Improvement:
    • Series.unique()
    • DataFrame/Series nunique()
    • read_csv with category type
  • Removing Fallback:
    • DataFrame/Series astype() with bool, uint8 etc.
    • supported following parameters for pd.get_dummies(): columns, prefix, prefix_sep, dtype
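
For reference, a small sketch of a pd.get_dummies() call that uses all four parameters listed above. It is written against plain pandas; the frame and the chosen prefix are assumptions made for the example.

```python
import pandas as pd  # with FireDucks: import fireducks.pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red"], "n": [1, 2, 3]})

# Encode only the "color" column, name the new columns "c=<value>",
# and store the indicators as uint8.
dummies = pd.get_dummies(
    df,
    columns=["color"],
    prefix="c",
    prefix_sep="=",
    dtype="uint8",
)

print(dummies)
```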

0.10.5 (Mar 26, 2024)

  • Performance Improvement:
    • groupby.head/tail
    • groupby.size
  • Removing Fallback:
    • dropna=False with groupby
    • groupby.first
    • DataFrame.value_counts
    • supported "normalize" parameter of Series.value_counts
  • Bug Fixes:
    • Fix incorrect fallback of Series.apply
    • Fix str.split issue when expand parameter is specified
    • Fix null assignment issue, e.g., df.mask[cond, "a"] = np.nan
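
Two of the calls listed under Removing Fallback for this release, sketched with plain pandas on made-up data (with FireDucks, the import would be fireducks.pandas):

```python
import numpy as np
import pandas as pd  # with FireDucks: import fireducks.pandas as pd

df = pd.DataFrame({"team": ["a", "b", np.nan, "a"], "score": [1, 2, 3, 4]})

# dropna=False keeps the null key as its own group.
sizes = df.groupby("team", dropna=False).size()

# normalize=True returns relative frequencies instead of raw counts
# (nulls are still excluded by default).
shares = df["team"].value_counts(normalize=True)

print(sizes)
print(shares)
```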

0.10.4 (Mar 13, 2024)

  • Performance Improvement:
    • groupby, merge/join, sort_values with string key
  • Bug Fixes:
    • Fixed fallback issue with loc/iloc setitem

0.10.3 (Mar 06, 2024)

  • Performance Improvement:
    • Optimized construction of a Series from another Series.
  • Removing Fallback:
    • Supported replace with regex=True
    • Supported loc-assignment for non-numeric index, e.g., df.loc[["a", "c", "d"], "col1"] = 5
  • Bug Fixes:
    • Fixed bug when loc assignment is performed with non-series data (like list etc.) and target frame does not have default index.
    • Fixed NotImplementedError cases related to datetime-string comparison.
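
The two newly supported operations in this release can be sketched as follows. Plain pandas is used so the snippet runs without FireDucks; the frame contents are hypothetical.

```python
import pandas as pd  # with FireDucks: import fireducks.pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3, 4], "txt": ["a1", "b2", "c3", "d4"]},
    index=["a", "b", "c", "d"],
)

# loc-assignment on a non-numeric index.
df.loc[["a", "c", "d"], "col1"] = 5

# replace with regex=True: strip every digit from the strings.
cleaned = df["txt"].replace(r"\d", "", regex=True)

print(df)
print(cleaned)
```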

0.10.2 (Feb 26, 2024)

  • Performance Improvement:
    • improved index-getter (df.index) by avoiding fallback of data columns
    • sort with uint32/uint64 key
  • Removing Fallback:
    • Supported groupby.shift() for DataFrame and Series
    • Supported take() for DataFrame and Series
    • Supported sample() for DataFrame and Series
    • Supported loc-assignment with positions (e.g., df.loc[[5,2,4], "a"] = 100) for DataFrame and Series
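
A short sketch of three of the calls above on a frame with the default index, using plain pandas and invented data (replace the import with fireducks.pandas to run it under FireDucks):

```python
import pandas as pd  # with FireDucks: import fireducks.pandas as pd

df = pd.DataFrame({"a": list(range(6)), "b": list("uvwxyz")})

# loc-assignment with a list of row labels; with the default index
# these labels coincide with row positions.
df.loc[[5, 2, 4], "a"] = 100

# take() selects rows by position.
taken = df.take([0, 2])

# sample() draws random rows; random_state makes the draw repeatable.
sampled = df.sample(n=3, random_state=0)

print(df)
print(taken)
print(sampled)
```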

0.10.1 (Feb 19, 2024)

  • Performance improvement:
    • DataFrame.merge
    • DataFrame/Series.sort_values when including null
  • Bug Fix:
    • fixed DataFrame/Series.sort_values with string key and ascending=False

0.10.0 (Feb 13, 2024)

  • Performance improvement:
    • DataFrame/Series.drop_duplicates
    • DataFrame/Series.dropna
  • Removing Fallback:
    • supported astype with numpy types (np.int32, np.int64, np.float32, np.float64)
    • supported conditional loc setter for DataFrame and Series: e.g., df.loc[cond, "a"] = 2; s.loc[cond] = 2
  • Bug Fixes:
    • fixed int-float binop division issue
    • fixed calling issue of StringMethods on LARGE_STRING typed columns
  • Others:
    • update to arrow15
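
The conditional loc setter and astype with a numpy type, as named in this release, might look like the sketch below. It is written against plain pandas with hypothetical data; with FireDucks the import would be fireducks.pandas.

```python
import numpy as np
import pandas as pd  # with FireDucks: import fireducks.pandas as pd

df = pd.DataFrame({"a": [1, 5, 10], "b": [0.5, 1.5, 2.5]})

# Conditional loc setter on a DataFrame column.
cond = df["a"] > 4
df.loc[cond, "a"] = 2

# Conditional loc setter on a Series.
s = df["b"].copy()
s.loc[s > 1.0] = 2

# astype with a numpy type.
a32 = df["a"].astype(np.int32)

print(df)
print(s)
print(a32.dtype)
```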

0.9.8 (Feb 5, 2024)

  • Performance improvement:
    • DataFrame.groupby
  • Removing fallback:
    • DataFrame/Series.reset_index with allow_duplicates

0.9.7 (Jan 29, 2024)

  • Removing fallback:
    • Setting index of DataFrame/Series like df.index = ...
    • Index.set_names

0.9.6 (Jan 22, 2024)

  • Performance improvement:
    • move projection optimization: support copy and drop_duplicates.
  • Removing fallback:
    • DataFrame/Series._repr_html_ to drastically improve speed for displaying on Jupyter notebook.
    • DataFrame/Series.set_axis
    • DataFrame/Series.__setitem__ with array-like
    • DataFrame/Series.set_index with ndarray, drop=True, append=True and verify_integrity=True
    • DataFrame/Series.sort_values with ignore_index=True
  • Bug fix:
    • read_csv with fsspec parameter such as "s3://"

0.9.5 (Jan 15, 2024)

  • Removing fallback:
    • DataFrame/Series.shift
    • DataFrame/Series.pipe

0.9.4 (Dec 28, 2023)

  • Performance improvement
    • DataFrame.copy
  • Removing fallback:
    • DataFrame/Series.iloc setter
    • DataFrame.__array__

0.9.3 (Dec 25, 2023)

  • Performance improvement
    • DataFrame.merge
    • Binary operations
  • Bug fix:
    • Series.__repr__

0.9.2 (Dec 18, 2023)

  • Performance improvement
    • DataFrame.groupby, DataFrame.where
    • IR building
  • Removing fallback:
    • DataFrame.iloc, DataFrame.__repr__
  • Bug fix:
    • read_csv with URL

0.9.1 (Dec 11, 2023)

0.9.0 (Dec 4, 2023)

  • Update to arrow-14.0.1

0.8.8 (Nov 27, 2023)

  • Bug Fix
    • remove unexpected print in read_csv

0.8.7 (Nov 27, 2023)

  • Performance improvement
    • DataFrame.corr
    • DataFrame.dropna
  • Removing fallback:
    • read_csv with default arguments
    • DataFrame.to_csv with encoding=utf8
    • DataFrame.groupby with dropna=True

0.8.6 (Nov 20, 2023)

  • Performance Improvement
    • DataFrame.groupby using cardinality estimation.
    • DataFrame.corr for DataFrames with few rows.
  • Removing fallback
    • DataFrame/Series.mask
    • DataFrame/Series.where
  • Bug Fix:
    • concat for corner cases

0.8.5 (Nov 9, 2023)

  • Improve performance of DataFrame.corr
  • Remove fallback of get_dummies for simple cases

0.8.4 (Nov 9, 2023)

  • Performance improvement
    • DataFrame.corr
  • Performance improvement by removing fallback (depending on parameters)
    • Series.rolling
    • DataFrame.drop
    • DataFrame/Series.describe
    • DataFrame/Series.skew
    • DataFrame/Series.kurt
    • DataFrame/Series.values
  • Bug Fix
    • Series.__float__/__int__
    • fallback reason of to_csv

0.8.3 (Oct 26, 2023)

  • Add wheel package for python3.11 (tested with python-3.11.4 on ubuntu23.04).
  • Improve performance of merge/join when both frames have default index.
  • Improve pandas compatibility of methods which return a scalar value like Series.aggregate.
  • Remove fallback: DataFrame.columns, DataFrame.pop, fireducks.pandas.join
  • Add kernel tracing (enabled by FIREDUCKS_FLAGS=--trace=3)
  • Add reason to fallback log (enabled by FIREDUCKS_FLAGS=-Wfallback).
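
The two flags above are passed through the FIREDUCKS_FLAGS environment variable. One way to set it from inside a Python script is sketched below. Setting the variable before the first import of fireducks.pandas is an assumption here, made on the premise that the flags are read at import time; exporting the variable in the shell before starting Python avoids that question.

```python
import os

# Assumption: FireDucks reads FIREDUCKS_FLAGS when it is first imported,
# so the variable is set before any fireducks import.
os.environ["FIREDUCKS_FLAGS"] = "-Wfallback"

# The import is left commented out so the sketch runs without FireDucks.
# import fireducks.pandas as pd

print(os.environ["FIREDUCKS_FLAGS"])
```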

0.8.2 (Oct 19, 2023)

  • First public beta release