FireDucks Release Note
0.11.2 (May 16, 2024)
- Bug Fixes:
- Fix dependency on pyarrow.
- Fix dtype of index when merge result is empty.
- Removing Fallback:
- Supported aggregate on timedelta columns.
0.11.1 (May 13, 2024)
- Performance Improvement:
- Add new IR pattern rewrite optimization pass.
- DataFrame.merge/join with date32/64 payload column.
- Bug Fixes:
- Fixed bug in iloc-getter when there are duplicates in column names
- Removing Fallback:
- Supported aggregate methods (max, min, mean etc.) to be performed on timestamp columns.
- Supported iloc-getter with integer or list-likes column indicator: e.g., df.iloc[:, 0], df.iloc[:, [2,4]] etc.
- Supported take() with slice object as input.
- Supported squeeze() for DataFrame and Series.
- Supported dictionary or casting-methods to be mapped on a Series.
- New pandas incompatibility:
- The observed parameter of groupby is always treated as True for better performance.
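The groupby incompatibility noted for 0.11.1 matters only when a grouping key is categorical and has unused categories. The sketch below uses stock pandas to show the two behaviours side by side; the column names and data are made up for illustration, and per the note above FireDucks would be expected to give the observed=True result regardless of the argument.

```python
import pandas as pd

# "L" is a declared category that never occurs in the data.
df = pd.DataFrame({
    "size": pd.Categorical(["S", "M", "S"], categories=["S", "M", "L"]),
    "qty": [1, 2, 3],
})

# observed=False keeps the unused category "L" in the result.
all_categories = df.groupby("size", observed=False)["qty"].sum()

# observed=True reports only the categories that actually occur.
observed_only = df.groupby("size", observed=True)["qty"].sum()

print(all_categories)
print(observed_only)
```

Code that relies on unused categories appearing in the output may need to reindex the result explicitly.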
0.11.0 (May 07, 2024)
- Performance Improvement:
- groupby.median() and median() now return the exact (non-approximate) median.
- Removing Fallback:
- read_parquet with columns parameter.
- DataFrame.rename with columns parameter.
- Others:
- Upgrade dependent pyarrow to 16.0.0.
- The importhook feature can now be activated by fireducks.pandas.
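As a quick illustration of the 0.11.0 median change, an exact grouped median can be cross-checked against Python's statistics module. This is a minimal sketch written against stock pandas with invented data; swapping the import for fireducks.pandas is assumed, not verified here, to give the same values.

```python
import statistics

import pandas as pd

df = pd.DataFrame({
    "key": ["a", "a", "a", "b", "b"],
    "val": [1, 5, 2, 10, 20],
})

# Exact median per group.
medians = df.groupby("key")["val"].median()

# Reference values computed independently of pandas.
expected_a = statistics.median([1, 5, 2])
expected_b = statistics.median([10, 20])

print(medians)
```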
0.10.9 (Apr 23, 2024)
- Performance Improvement:
- groupby.std()
- Removing Fallback:
- Supported astype(“datetime64”)
- Supported DataFrame.dropna(axis=1)
- Bug Fixes:
- Fix df.merge returning incorrect result when how is left and key has nulls.
- Fix an error when “head”, “tail” or “shift” is used in groupby.agg. If any of these is provided as a single aggregator [e.g., df.groupby(...).agg("head")], you can experience a speed-up from FireDucks, but when they are combined with another aggregator [e.g., df.groupby(...).agg(["head", "mean"])], the call is executed through the fallback.
- Fix issues in accessing methods from pd.api.types module.
- Others:
- Remove version from dependency on numpy.
- Add experimental profiler for jupyter/ipython. Use %load_ext fireducks.ipyext and the %%fireducks.profile cell magic. See here for details.
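For the left-merge fix in 0.10.9, the reference behaviour is that of pandas, where null keys on both sides are matched against each other. A minimal sketch with invented data, run against stock pandas:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"key": [1.0, np.nan, 3.0], "lval": ["a", "b", "c"]})
right = pd.DataFrame({"key": [1.0, np.nan], "rval": ["x", "y"]})

# how="left" keeps every left row in its original order.
# pandas matches the null key on the left with the null key on the right.
merged = left.merge(right, on="key", how="left")

print(merged)
```

The row with key 3.0 has no partner on the right, so its rval is null.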
0.10.8 (Apr 16, 2024)
- Performance Improvement:
- groupby two keys with nulls
- left join with single key
- left and inner join with single key of category type
- Removing Fallback:
- groupby.corrwith among two columns
0.10.7 (Apr 10, 2024)
- Performance Improvement:
- Printing dataframe with large dictionary.
- Removing Fallback:
- DataFrame/Series astype() with dtype=“category”
- Bug Fixes:
- Fixed Join issue with dictionary-typed key columns.
- Fixed filter issue of a table having multiple index columns with duplicate values
- Others:
- Upgrade dependent pyarrow to 15.0.2.
0.10.6 (Apr 02, 2024)
- Performance Improvement:
- Series.unique()
- DataFrame/Series nunique()
- read_csv with category type
- Removing Fallback:
- DataFrame/Series astype() with bool, uint8 etc.
- supported following parameters for pd.get_dummies(): columns, prefix, prefix_sep, dtype
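The get_dummies parameters listed for 0.10.6 can be exercised together as follows. This is an illustrative sketch using stock pandas; the frame and the chosen prefix are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red"], "n": [1, 2, 3]})

# Encode only the "color" column; new columns are named prefix + prefix_sep + value.
encoded = pd.get_dummies(
    df,
    columns=["color"],
    prefix="c",
    prefix_sep="-",
    dtype="uint8",
)

print(encoded)
```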
0.10.5 (Mar 26, 2024)
- Performance Improvement:
- groupby.head/tail
- groupby.size
- Removing Fallback:
- dropna=False with groupby
- groupby.first
- DataFrame.value_counts
- supported “normalize” parameter of Series.value_counts
- Bug Fixes:
- Fix incorrect fallback of Series.apply
- Fix str.split issue when expand parameter is specified
- Fix null assignment issue, e.g., df.loc[cond, "a"] = np.nan
0.10.4 (Mar 13, 2024)
- Performance Improvement:
- Groupby, merge/join, sort_values with string key
- Bug Fixes:
- Fixed fallback issue with loc/iloc setitem
0.10.3 (Mar 06, 2024)
- Performance Improvement:
- Optimized construction of a Series from another Series.
- Removing Fallback:
- Supported replace with regex=True
- Supported loc-assignment for non-numeric index, e.g., df.loc[["a", "c", "d"], "col1"] = 5
- Bug Fixes:
- Fixed bug when loc assignment is performed with non-series data (like list etc.) and target frame does not have default index.
- Fixed NotImplementedError cases related to datetime-string comparison.
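The two operations listed under Removing Fallback for 0.10.3 look like this in practice. The sketch uses stock pandas and made-up data; with FireDucks the same calls would be written against fireducks.pandas.

```python
import pandas as pd

# loc-assignment on a string (non-numeric) index.
df = pd.DataFrame({"col1": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])
df.loc[["a", "c", "d"], "col1"] = 5

# replace with regex=True: strip every run of digits.
s = pd.Series(["foo1", "bar22"])
cleaned = s.replace(r"\d+", "", regex=True)

print(df)
print(cleaned)
```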
0.10.2 (Feb 26, 2024)
- Performance Improvement:
- improved index-getter (df.index) by avoiding fallback of data columns
- sort with uint32/uint64 key
- Removing Fallback:
- Supported groupby.shift() for DataFrame and Series
- Supported take() for DataFrame and Series
- Supported sample() for DataFrame and Series
- Supported loc-assignment with positions (e.g., df.loc[[5,2,4], “a”] = 100) for DataFrame and Series
0.10.1 (Feb 19, 2024)
- Performance improvement:
- DataFrame.merge
- DataFrame/Series.sort_values when including null
- Bug Fix:
- fixed DataFrame/Series.sort_values with string key and ascending=False
0.10.0 (Feb 13, 2024)
- Performance improvement:
- DataFrame/Series.drop_duplicates
- DataFrame/Series.dropna
- Removing Fallback:
- supported astype with numpy types (np.int32, np.int64, np.float32, np.float64)
- supported conditional loc setter for DataFrame and Series: e.g., df.loc[cond, "a"] = 2; s.loc[cond] = 2
- Bug Fixes:
- fixed int-float binop division issue
- fixed calling issue of StringMethods on LARGE_STRING typed columns
- Others:
- update to arrow15
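A short sketch of the 0.10.0 items, astype with a numpy type and the conditional loc setter, using stock pandas and invented data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# astype with a numpy type object rather than a string name.
b32 = df["b"].astype(np.int32)

# Conditional loc setter on a DataFrame: only rows where cond is True change.
cond = df["b"] > 15
df.loc[cond, "a"] = 2

# Conditional loc setter on a Series.
s = pd.Series([1.0, 2.0, 3.0])
s.loc[s > 1.5] = 2

print(df)
print(s)
```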
0.9.8 (Feb 5, 2024)
- Performance improvement:
- DataFrame.groupby
- Removing fallback:
- DataFrame/Series.reset_index with allow_duplicates
0.9.7 (Jan 29, 2024)
- Removing fallback:
- Setting index of DataFrame/Series like df.index = ...
- Index.set_names
0.9.6 (Jan 22, 2024)
- Performance improvement:
- move projection optimization: support copy and drop_duplicates.
- Removing fallback:
- DataFrame/Series._repr_html_, which drastically improves display speed in Jupyter notebooks.
- DataFrame/Series.set_axis
- DataFrame/Series.__setitem__ with array-like
- DataFrame/Series.set_index with ndarray, drop=True, append=True and verify_integrity=True
- DataFrame/Series.sort_values with ignore_index=True
- Bug fix:
- read_csv with fsspec parameter such as “s3://”
0.9.5 (Jan 15, 2024)
- Removing fallback:
- DataFrame/Series.shift
- DataFrame/Series.pipe
0.9.4 (Dec 28, 2023)
- Performance improvement
- DataFrame.copy
- Removing fallback:
- DataFrame/Series.iloc setter
- DataFrame.__array__
0.9.3 (Dec 25, 2023)
- Performance improvement
- DataFrame.merge
- Binary operations
- Bug fix:
- Series.__repr__
0.9.2 (Dec 18, 2023)
- Performance improvement
- DataFrame.groupby, DataFrame.where
- IR building
- Removing fallback:
- DataFrame.iloc, DataFrame.__repr__
- Bug fix:
- read_csv with URL
0.9.1 (Dec 11, 2023)
- Performance improvement
- DataFrame.corr
- Others
0.9.0 (Dec 4, 2023)
- Update to arrow-14.0.1
0.8.8 (Nov 27, 2023)
- Bug Fix
- remove unexpected print in read_csv
0.8.7 (Nov 27, 2023)
- Performance improvement
- DataFrame.corr
- DataFrame.dropna
- Removing fallback:
- read_csv with default arguments
- DataFrame.to_csv with encoding=utf8
- DataFrame.groupby with dropna=True
0.8.6 (Nov 20, 2023)
- Performance Improvement
- DataFrame.groupby using cardinality estimation.
- DataFrame.corr for DataFrames with few rows.
- Removing fallback
- DataFrame/Series.mask
- DataFrame/Series.where
- Bug Fix:
- concat for corner cases
0.8.5 (Nov 9, 2023)
- Improve performance of DataFrame.corr
- Remove fallback of DataFrame.get_dummies for simple case
0.8.4 (Nov 9, 2023)
- Performance improvement
- DataFrame.corr
- Performance improvement by removing fallback (depending on parameters)
- Series.rolling
- DataFrame.drop
- DataFrame/Series.describe
- DataFrame/Series.skew
- DataFrame/Series.kurt
- DataFrame/Series.values
- Bug Fix
- Series.__float__/__int__
- fallback reason of to_csv
0.8.3 (Oct 26, 2023)
- Add wheel package for python3.11 (tested with python-3.11.4 on ubuntu23.04).
- Improve performance of merge/join when both frames have default index.
- Improve pandas compatibility of methods which return a scalar value like Series.aggregate.
- Remove fallback: DataFrame.columns, DataFrame.pop, fireducks.pandas.join
- Add kernel tracing (enabled by FIREDUCKS_FLAGS=--trace=3)
- Add reason to fallback log (enabled by FIREDUCKS_FLAGS=-Wfallback).
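The FIREDUCKS_FLAGS environment variable mentioned for 0.8.3 can also be set from inside a script. The sketch below assumes, without confirmation from these notes, that FireDucks reads the variable when it is first imported, so the variable is set before the import; it falls back to plain pandas when FireDucks is not installed.

```python
import os

# Assumption: the flags are read at import time, so set them first.
os.environ["FIREDUCKS_FLAGS"] = "-Wfallback"

try:
    import fireducks.pandas as pd
except ImportError:
    # Without FireDucks there is no fallback log; the code still runs.
    import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
total = int(df["a"].sum())
print(total)
```

Setting the variable in the shell before launching Python is the form shown in the note itself and avoids any dependence on import order.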
0.8.2 (Oct 19, 2023)
- First public beta release