FireDucks Release Note

1.3.3 (Jul 29, 2025)

Removing Fallbacks:
- Supported DataFrame/Series loc-setitem with Series data.
- Supported renaming for/with non-string column names.
- Supported reduction, binary operations with Scalar of type uint64.
- Supported Series.isin() with ndarray of elements of type datetime.date, Timedelta.
- Supported reduction with Scalar of type date32[day].
Improving pandas compatibility:
- In raising AssertionError for DataFrame.drop() when level is specified for a table without MultiIndex.
- In FloordivUnsafe to handle issues with large integers.
- In DataFrame groupby.size() when input table has MultiIndex.
- In read_csv() when dtype is specified and input file has invalid rows (e.g., number of fields in each line differ).
Others:
- Optimized reindexing for DataFrame.setitem when target table has RangeIndex.
- Optimized performance of DataFrame/Series loc-setitem with python array-likes (list, ndarray etc.).
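
Example for 1.3.3 (a minimal sketch with invented data; it uses plain pandas, and swapping the import for fireducks.pandas is assumed to behave the same way): loc-setitem with Series data and Series.isin() with an ndarray of datetime.date elements.

```python
import datetime

import numpy as np
import pandas as pd  # assumption: `import fireducks.pandas as pd` is a drop-in swap

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# loc-setitem with Series data: values are aligned on the index labels.
replacement = pd.Series([200, 300], index=[1, 2])
df.loc[df["a"] > 1, "b"] = replacement

# Series.isin() with an ndarray whose elements are datetime.date objects.
dates = pd.Series([datetime.date(2025, 7, 29), datetime.date(2025, 7, 11)])
lookup = np.array([datetime.date(2025, 7, 29)], dtype=object)
found = dates.isin(lookup)
```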

1.3.2 (Jul 11, 2025)

Removing Fallbacks:
- Supported Series.astype() when target type is object.
- Supported column names of bytes type for read_parquet().
- Supported DataFrame MultiIndex column dropping with specifying level=0.
- Supported DataFrame column dropping for non-string column name(s).
- Supported clip(), abs() to be called on DataFrame/Series instance using numpy module: e.g., np.abs(s), np.clip(s, 0, 1)
Improving pandas compatibility:
- In DataFrame.merge() with key of type ndarray.
Others:
- Upgraded compatibility with pandas 2.3
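
Example for 1.3.2 (a sketch with invented data, written against plain pandas; the fireducks.pandas import is assumed to be interchangeable): calling abs and clip through the numpy module, and casting to object.

```python
import numpy as np
import pandas as pd  # assumption: `import fireducks.pandas as pd` is a drop-in swap

s = pd.Series([-1.5, 0.25, 2.0])

abs_s = np.abs(s)            # dispatches to the Series and returns a Series
clipped = np.clip(s, 0, 1)   # same idea as s.clip(0, 1)
as_object = s.astype(object)
```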

1.3.1 (Jun 27, 2025)

Removing Fallbacks:
- Supported DataFrame/Series aggregate methods when calling from numpy module: e.g., np.mean(df)
- Supported ddof parameter for DataFrame/Series aggregate methods: std(), var()
- Supported Series binary operation with ndarray
- Supported Series binary operation using Scalar of type: Timedelta
- Supported DataFrame binary operation using Scalar of type: Timedelta, Timestamp, np.datetime64, datetime.time
- Supported groupby when key is a list of Series and/or existing column names
- Supported DataFrame/Series loc conditional setitem with given values of type: list, tuple, ndarray
- Supported column names of type python bytes
- Supported row-wise aggregation (axis=1) when input of DataFrame.agg() is a list of target supported methods: sum, mean, count, all, any
- Supported Series.isin() with ndarray, list, Series having values of different types: numeric, string, timedelta, timestamp, datetime.time etc.
Improving pandas compatibility:
- In Series logical operations with pd.NaT
- In Series reverse logical operations (rand, ror, rxor)
- In DataFrame all/any with axis=1, when input has missing values
- In processing groupby-rank when the number of keys is more than one
- In error handling when given selector is not found for groupby-size
- In reduction of Series of type: Timedelta when unit != 'ns'
- In reduction of empty Series of type: Uint64, datetime.time
Others:
- Removing fallback to arrow for RightOuterJoin operation
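
Example for 1.3.1 (a sketch with invented data in plain pandas; the column names and the is_large key are illustrative only): the ddof parameter of var(), and groupby with a key list that mixes a Series and an existing column name.

```python
import pandas as pd  # assumption: `import fireducks.pandas as pd` is a drop-in swap

df = pd.DataFrame({"k": ["a", "a", "b"], "x": [2.0, 4.0, 6.0]})

sample_var = df["x"].var()            # default ddof=1
population_var = df["x"].var(ddof=0)  # divide by N instead of N - 1

# Group by a derived boolean Series together with the existing column "k".
is_large = (df["x"] > 3).rename("is_large")
grouped = df.groupby([is_large, "k"])["x"].sum()
```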

1.3.0 (Jun 13, 2025)

Removing Fallbacks:
- Avoided fallbacks for DataFrame/Series aggregate methods like sum, max, min when parameters have default values
- Supported axis=1 (row-wise reduction) for DataFrame: count, mean
- Supported axis=None (frame reduction to a scalar) for DataFrame: all, any, max, min, mean
- Supported Series comparison with scalar of types: np.datetime64, pd.Timestamp
- Supported inplace and callable condition for DataFrame.where
- Supported empty property for DataFrame/Series.
- Supported clip with scalar-typed lower/upper bound for DataFrame/Series.
Improving pandas compatibility:
- In extracting time field using DatetimeExtractor (s.dt.time)
- In reduction on a datetime.time typed column.
- In horizontal sum (axis=1) when input frame has missing values.
- In GroupBy with non-existing selector
- With float16 typed column for DataFrame.to_numpy
Others:
- Upgraded to pyarrow-20.0.0
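
Example for 1.3.0 (a sketch with invented data in plain pandas; the fireducks.pandas import is assumed to be interchangeable): row-wise count and mean, DataFrame.where with a callable condition, and clip with scalar bounds.

```python
import pandas as pd  # assumption: `import fireducks.pandas as pd` is a drop-in swap

df = pd.DataFrame({"a": [1.0, None], "b": [3.0, 4.0]})

row_counts = df.count(axis=1)  # non-null values per row
row_means = df.mean(axis=1)    # missing values are skipped by default

# Callable condition: cells where the condition is False (or NaN) become 0.
masked = df.where(lambda d: d > 1, 0)

clipped = df["b"].clip(lower=3.5, upper=10)
```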

1.2.8 (May 13, 2025)

Removing Fallbacks:
- avoided fallback on dtypes when categorical columns are present
- DataFrame __getattr__ raises AttributeError for __array_function__ without fallback
Bug Fixes:
- fix chained import such as import pandas.io.formats.style
- fix issues in GroupbyTransform with non-default index
- corrected type validator for datetimelike operations
- fixed issue in iloc-processing with lambda method; added tests
- fixed issues in loc-processing with lambda method, timedelta and string comparison. GH-70
- Fix Agg/GroupbyAgg bug when func given to relabel is a list
- fix __git_version__ issue in GH-68
- fix the exception type thrown by Series.cat.categories to be AttributeError instead of RuntimeError
Others:
- cached DataFrame/Series values for multiple calls on the same object
- cached shape/index information to be reused in multiple calls on a same DataFrame/Series instance

1.2.7 (Apr 22, 2025)

Removing Fallbacks:
- supported Series CategoricalAccessor categories
- supported StringMethods match
- supported errors=‘coerce’ for to_datetime()
Bug Fixes:
- Fixed groupby.transform with multiindex columns
- Removed unnecessary dependency on IPython from import hook
- Fixed rolling bug when window is non-numeric and numeric min_periods is given by user GH-66
Others:
- modified not to evaluate metadata for multiple calls of dtypes, columns etc. on same data
- Experimental MacOS support.
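
Example for 1.2.7 (a sketch with invented strings in plain pandas; the fireducks.pandas import is assumed to be interchangeable): to_datetime() with errors="coerce" and StringMethods match.

```python
import pandas as pd  # assumption: `import fireducks.pandas as pd` is a drop-in swap

raw = pd.Series(["2025-04-22", "not a date"])
parsed = pd.to_datetime(raw, errors="coerce")  # unparseable entries become NaT

codes = pd.Series(["ab1", "xb2"])
matches = codes.str.match(r"a.\d")  # regex is anchored at the start of each string
```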

1.2.6 (Apr 08, 2025)

Removing Fallbacks:
- supported DataFrameGroupBy.rank() with methods = {“average”, “max”, “min”}
- supported DataFrame/Series info(), memory_usage() GH-65
- supported Series.nbytes
Bug Fixes:
- Fixed bug in filtering DataFrame/Series with a list of boolean values
- Fixed DataFrameGroupBy.rank() when nulls are present in key and/or non-key columns
Optimization:
- Improved performance of GroupByTransform()
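
Example for 1.2.6 (a sketch with invented data in plain pandas): the three rank methods named above differ only in how ties within a group are numbered.

```python
import pandas as pd  # assumption: `import fireducks.pandas as pd` is a drop-in swap

df = pd.DataFrame({"k": ["a", "a", "a", "b"], "v": [10, 10, 20, 5]})

# Rank "v" within each group of "k" using each tie-breaking method.
ranks = {
    method: df.groupby("k")["v"].rank(method=method)
    for method in ("average", "min", "max")
}
```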

1.2.5 (Mar 13, 2025)

Performance Improvement:
- Groupby.rank with a group key of small cardinality

1.2.4 (Mar 07, 2025)

Removing Fallbacks:
- supported DataFrameGroupBy.rank() with supported methods as {“first”, “dense”}
- supported Series.aggregate with dictionary
Optimization:
- pushdown optimization supports groupby with selector such as groupby(key)[columns]

1.2.3 (Feb 27, 2025)

Performance Improvement:
- df.shape and df.to_numpy by avoiding evaluation when a cache is available.
Bug Fixes:
- fixed suffix after merge when key and non-key have the same name
- fixed drop_duplicates(keep=False) in case key is all-null
Removing Fallbacks:
- supported groupby.transform
Others:
- fix: change level of fallback log to DEBUG. GH#54

1.2.2 (Feb 12, 2025)

Bug Fixes:
- fixed issue in drop_duplicates/duplicated with presence of nulls in keys
- fixed issue with type of indices array in get_dummies
- fixed get_dummies issue when target column is category type [GH-47]
- fixed issue in duplicated when null is present in input keys [GH-50]
- fixed bugs in rolling-getitem, rolling-aggregate with dictionary
- fixed issue of merge when left_on and right_on are indexes with different name

1.2.1 (Feb 04, 2025)

Bug Fixes:
- fixed issue in casting dictionary of string typed keys to boolean
- fixed issue in casting a list-type column to boolean column
- [pushdown] fix projection inserted when input table has duplicated columns
- fix multiindex join resulting in empty frame in pandas1
Performance Improvement:
- modified astype implementation to switch to column-parallelization when ncols >= nthreads*8 for better performance

1.2.0 (Jan 29, 2025)

Bug Fixes:
- Fix concat, join and merge among DataFrames with different categories
- Support join/merge with DataFrame with multiindex columns
Performance Improvement:
- Improve construction of DataFrame/Series from long list
Others:
- Upgrade to pyarrow-19.0.0

1.1.8 (Jan 22, 2025)

Bug Fixes:
- Fixed bug in left join with mask
Removing Fallbacks:
- Remove redundant fallbacks to check non-existing attributes for array protocol
Others:
- Support python3.13

1.1.7 (Jan 15, 2025)

Optimization:
- optimize read_parquet
- optimize sort_values and groupby

1.1.6 (Jan 07, 2025)

Bug Fixes:
- fix getitem from multiindex dataframe. GH#32
- fixed metadata reading issue with read_parquet
Removing Fallbacks:
- supported read_feather

1.1.5 (Dec 25, 2024)

Optimization:
- optimized a pattern df.where(cond).groupby(key, dropna=True).agg(...) as df[cond].groupby(key, dropna=True).agg(...)
- pushdown optimization supports moving projection over concat.

1.1.4 (Dec 17, 2024)

Bug Fixes:
- fixed issue in groupby with multi-keys
- fixed some issues in modifying callable-method to method-name when falling back to pandas
- fixed issue with return type for to_csv() with filename
- fixed inplace update issue with series delitem
Removing Fallbacks:
- supported fallback on DataFrame/Series column-aggregate, groupby-aggregate, when input is a callable method from numpy or Series modules, like np.sum, pd.Series.sum etc.

1.1.3 (Dec 10, 2024)

Bug Fixes:
- Fix read_csv when csv file includes newlines in values.
- Fix the issue in LeftHashJoin
- Fix astype from float to timestamp
Removing Fallbacks:
- supported a few fallback cases for DataFrame/Series loc getter
Optimization:
- Improve project pushdown for read_parquet beyond join/merge.
Others:
- Upgrade dependent pyarrow to 18.1.0.

1.1.2 (Dec 03, 2024)

Bug Fixes:
- fixed issue in median calculation. GH#31
- fixed issue in empty or null Series aggregation
- fixed groupby projection when it includes reordering and duplication.

1.1.1 (Nov 26, 2024)

Performance Improvement:
- Improve performance of left join, for example 1.6x for tpch Q13.
- Improve join/merge with timestamp keys.
Removing Fallbacks:
- Supported DataFrame.fillna() with dictionary-like input
- Supported DataFrame.round() with dictionary, any integer-like
- Supported series.values for columns of string or temporal types
- Supported astype with numpy datetime64 type
Bug Fixes:
- Fix sort order for category data.
- Fixed issue in Series.where with named Series. GH#29
- Fixed issue in DataFrame/Series.take with list of booleans
Optimization:
- Support project pushdown for read_csv/parquet.

1.1.0 (Nov 19, 2024)

Removing Fallbacks:
- supported “expand” parameter in str.split() method.
Bug Fixes:
- fixed issue in dtypes for DataFrame with multi-level columns
Optimization:
- Improve performance of join/merge. About 1.5x at max in our experiments.
Others:
- Upgrade dependent pyarrow to 18.0.0. As of pyarrow 18, python3.8 is no longer supported.

1.0.11 (Nov 12, 2024)

Removing Fallbacks:
- supported fallback on DataFrame.dtypes in presence of column of types list, date32, large_string
- supported DataFrame.loc with columns slicing e.g., df.loc[:, "A":"C"]
Bug Fixes:
- fixed issue in setting index to an empty DataFrame/Series
Optimization:
- improved sort_values() with key of temporal types
- supported projection-pushdown when projection target is empty e.g., df.sort_values("C")[[]]
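
Example for 1.0.11 (a sketch with invented data in plain pandas): label-based column slicing with loc, which includes both end points, and a projection onto an empty column list.

```python
import pandas as pd  # assumption: `import fireducks.pandas as pd` is a drop-in swap

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [6, 5], "D": [7, 8]})

sliced = df.loc[:, "A":"C"]         # columns A, B and C
empty = df.sort_values("C")[[]]     # keeps the rows, selects no columns
```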

1.0.10 (Nov 05, 2024)

Bug Fixes:
- fixed a conditional bug with a negative index as input of DataFrame/Series take()
- fixed issue in sampling empty DataFrame/Series
- fixed a bug in calculation of length of a column of type list containing Nulls.
Removing Fallback:
- supported DataFrame.groupby() with input key of Series type.

1.0.9 (Oct 28, 2024)

Bug Fixes:
- update Join: support list-type payload (GH#20)
- fix: isin() to support CategoricalDtype
- fix: supported to_csv on DataFrame/Series having list or struct-like columns.
Removing Fallback:
- supported fallback on getitem with numeric index for StringMethods (string or list-like columns): e.g., s.str[2]
Performance Improvement:
- optimized calculation (~3x) of length for list-like columns

1.0.8 (Oct 22, 2024)

Bug Fixes:
- Some unsupported rolling functions are implemented.
- fixed issue in slicing a list-like column
- fixed RuntimeError on DataFrame.astype(category).head()
Removing Fallbacks:
- dtype_backend=“pyarrow” parameter of read_csv
- column parameter of to_csv

1.0.7 (Oct 16, 2024)

Removing Fallback:
- remove fallback: read_csv with encoding=utf8
- supported binop comparison with ‘date’ instance as for scalar value

1.0.6 (Oct 07, 2024)

Bug Fixes:
- fix in operator with Series (GH#26).
- fix issue where index setter, df.index = ..., does not work with fallback.

1.0.5 (Sep 20, 2024)

Bug Fixes:
- fixed dump to and read from pickle
- fixed groupby ith selector for key like df.groupby("a")["a"]
Removing Fallback:
- supported ignore_index parameter for drop_duplicates
Performance Improvement:
- added IR optimization df.drop_duplicates(...).reset_index(drop=True) -> df.drop_duplicates(..., ignore_index=True)
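
Example for 1.0.5 (a sketch with invented data in plain pandas; it shows only that the two spellings give the same frame, which is what makes the rewrite safe):

```python
import pandas as pd  # assumption: `import fireducks.pandas as pd` is a drop-in swap

df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 3, 4]})

chained = df.drop_duplicates().reset_index(drop=True)
direct = df.drop_duplicates(ignore_index=True)

same = chained.equals(direct)
```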

1.0.4 (Sep 10, 2024)

Bug Fixes:
- fixed issue on groupby-select-aggregate with kwargs e.g., df.groupby("a")["b"].agg(Sum="sum")
- fixed melt() issue with non-string “value_vars”.
- fixed issue with groupby(…).size() on empty data.
Removing Fallback:
- supported dtype=“string” as for input of astype(), read_csv() etc.
- supported iloc by row-index e.g., df.iloc[0]
- supported “ignore_index” parameter for Series/DataFrame dropna().
Performance Improvement:
- improved overhead of computation on index column, when reset_index(drop=True) is performed followed by dropna, concat, melt, explode.
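
Example for 1.0.4 (a sketch with invented data in plain pandas): groupby-select-aggregate with a keyword argument, where the keyword becomes the output column name.

```python
import pandas as pd  # assumption: `import fireducks.pandas as pd` is a drop-in swap

df = pd.DataFrame({"a": ["x", "x", "y"], "b": [1, 2, 3]})

# Named aggregation on a selected column: result has one column called "Sum".
out = df.groupby("a")["b"].agg(Sum="sum")
```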

1.0.3 (Sep 02, 2024)

Bug Fixes:
- fixed join with categorical columns.

1.0.2 (Aug 30, 2024)

Bug Fixes:
- fixed a bug in reading the parquet file when index columns are stored at the beginning.
Removing Fallback:
- supported datetime properties dt.date, dt.time
- supported aggregate method “last” for GroupBy.

1.0.1 (Aug 28, 2024)

Bug Fixes:
- benchmark-mode with inplace method
Performance Improvement:
- Groupby.nunique with numeric column
- DataFrame.merge for some cases
- added optimization pattern sort_values(...).reset_index(drop=True) -> sort_values(..., ignore_index=True)
Others:
- print optimized IR when FIRE_LOG_LEVEL=3

1.0.0 (Aug 23, 2024)

Bug Fixes:
- fixed issue with dictionary sort
- fixed issue in filling null with null
Removing Fallback:
- supported sort_index on DataFrame and Series
Performance Improvement:
- add JoinWithMaskPat optimization
- add predicate pushdown optimization
Others:
- add test on rockylinux9.2 with python3.11
- fireducks.pandas.__version__ returns version of pandas. Use fireducks.__version__ when version of fireducks is required.

0.13.1 (Aug 14, 2024)

Bug Fixes:
- fixed issue in casting a datetime column from one unit to another (e.g., datetime64[ns] -> datetime64[ms])
- fixed issue in handling range index with step != 1
Removing Fallback:
- supported pd.read_json() with lines=True case
- supported Series.reset_index() with name parameter
- supported DataFrame setitem/getitem with numpy array of dimension Nx1
- supported DataFrame.setitem with non-string key. e.g., df[1] = ... (key is integer)
Performance Improvement:
- improved dropna(axis=0) for input without any nulls

0.13.0 (Jul 30, 2024)

Bug Fixes:
- Fixed filter bug when the input mask has a different alignment than the input table.
- Fixed a bug related to an importhook under a FireDucks profiler.
- Fixed merge with on=key for different key types.
- Fixed merge with left_index, right_index for different key types.
- Fixed issue in unit handling for TimeDelta columns.
Others:
- Upgrade dependent pyarrow to 17.0.0.

0.12.6 (Jul 23, 2024)

Removing Fallback:
- supported getter and setter on Series.name
- supported loc-assignment, scalar-assignment related cases with pd.NaT, e.g, df["c"] = pd.NaT
- supported setitem on DataFrame with numeric arrays having None, e.g, df["c"] = [1, None, 3]
Bug Fixes:
- Fixed issue in putting null using np.nan on non-numeric columns (string, timedelta etc.)
- Fixed get_dummies() issue with default dtype for pandas 2x
- Fixed strftime issue with format having “%%S” like escape

0.12.5 (Jul 12, 2024)

Performance Improvement:
- optimized days_in_month
- optimized implementation of microsecond (> 2x)
- improved performance of sample, by avoiding unnecessary checks for negative index
Bug Fixes:
- groupby with timestamp and timedelta column.
- fixed issue with is_leap_year

0.12.4 (Jul 09, 2024)

Performance Improvement:
- improved performance of take(axis=0) when input frame has default range index.
- improved performance of sum, mean, count etc. for boolean column.
Removing Fallback:
- supported Datetime Accessor methods: is_leap_year, days_in_month, microsecond
- supported Series.between.
- supported DataFrame filter with numpy-array as mask vector.
- supported the following iloc-getter cases:
  - iloc with arraylike of integers: e.g., df.iloc[[0,2,4]]
  - iloc with range or slice objects: e.g., df.iloc[:3]
  - iloc for projection-filter: df.iloc[:2, :3], df.iloc[[0,3,5], [0,1]] etc.
Bug Fixes:
- fixed issue in groupby-aggregator with duration column as key/non-key.
- fixed issue in boolean casting for column of types: timestamp, timedelta.
- fixed type issue in count() result for column of type: timedelta.
- fixed iloc bug when input frame has duplicate columns.
- fixed issue with strftime("%S") when non-fractional second part is to be formatted.
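
Example for 0.12.4 (a sketch with invented data in plain pandas): the iloc-getter forms listed above, plus Series.between.

```python
import pandas as pd  # assumption: `import fireducks.pandas as pd` is a drop-in swap

df = pd.DataFrame({"a": range(6), "b": range(10, 16), "c": range(20, 26)})

by_positions = df.iloc[[0, 2, 4]]        # array-like of integer positions
by_slice = df.iloc[:3]                   # slice object
block = df.iloc[:2, :2]                  # row slice plus column slice
picked = df.iloc[[0, 3, 5], [0, 1]]      # row list plus column list

in_range = df["a"].between(1, 3)         # inclusive on both ends by default
```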

0.12.3 (Jul 02, 2024)

Performance Improvement:
- read_csv with many columns
- merge with many columns
Removing Fallback:
- supported header parameter for read_csv
- supported list-of-integers to be specified as index_col in read_csv()
Bug Fixes:
- fixed issue with aggregation on unsigned numeric columns by supporting unsigned scalars in FireDucks
- fix: reindexing column order after performing arithmetic operation
- fixed read_csv() bug when ‘index_col’ is of boolean-type or contains negative integers
Others:
- remove: dependency on numpy<2.0

0.12.2 (Jun 24, 2024)

Removing Fallback:
- Supported to_datetime() with given format (fixed fallback issue at backend).
- Supported astype() with input as Series: e.g., s.astype(s2.dtype)
- Supported DateTime accessor method total_seconds() on TimeDelta columns.
- Supported Datetime accessor method strftime() on DateTime columns. Much faster than the pandas implementation of strftime.
Bug Fixes:
- Fixed read_csv() issue when the length of “names” parameter is different than number of fields in the input file.
- Fixed issue in concatting String with Category, String with LargeString columns.
- Fixed issue in to_csv() when input data has multi-level columns and “header” parameter is not True.
- Fixed issue in isin() operation on string column with non-string lookup targets.
Others:
- Optimized the “strftime(format) + astype(numeric)” pattern when format can be treated as a numeric datetime field extractor.
- Loading the fireducks.ipyext module is no longer required when the fireducks.pandas module is already loaded.
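
Example for 0.12.2 (a sketch with invented dates in plain pandas; treating "%Y" as a format that maps to a single numeric field is an assumption about which formats qualify): formatting and casting yields the same numbers as the direct field accessor.

```python
import pandas as pd  # assumption: `import fireducks.pandas as pd` is a drop-in swap

s = pd.Series(pd.to_datetime(["2024-06-24", "2023-01-05"]))

# The pattern named above: format to string, then cast to a numeric type.
via_strftime = s.dt.strftime("%Y").astype(int)

# The equivalent numeric field extraction.
via_field = s.dt.year
```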

0.12.1 (Jun 17, 2024)

Removing Fallback:
- supported sep, na_rep, quoting_style, header etc. parameters for DataFrame/Series to_csv()
Bug Fixes:
- fixed issue in to_csv when columns are of multi-level and header=False; when columns names are single-level non-strings, saved as strings (unlike pandas)
- fixed groupby.shift ignores dropna parameter.
Others:
- add dependency on numpy<2.0.
- support python 3.12
- support older glibc with python 3.9-3.12

0.12.0 (Jun 10, 2024)

Removing Fallback:
- supported min_periods parameter in rolling()
- supported dictionary of parquet files to be loaded using read_parquet()
- supported Datetime Accessor methods day_name(), month_name()
- supported pd.to_datetime() for Series input
Bug Fixes:
- fixed merge bug for multi-index as key
- fixed Series.map(pd.Timestamp.timestamp)
- fixed string to datetime conversion when input timestamp contains microseconds, nanoseconds parts.
- fixed fillna() on string columns with numeric scalar, e.g., df.fillna(0).
- fixed concat to support mixed of single-level and multi-level column names.
- fixed lazy-execution issue in setting Series attributes
Others:
- removed the dependency on pandas 1.5.3. FireDucks is now compatible with both pandas 1.5 and 2.2.

0.11.5 (Jun 04, 2024)

Bug Fixes:
- Fixed bug of column dtypes on reading an empty CSV file.
- Fixed issue on calling StringMethods (s.str.upper etc.) on a “category” column with key as string.
- Fixed issue on calling where/mask on empty DataFrame.
- Fixed to return pd.NaT instead of np.nan on calling aggregate methods on empty Series of timedelta, timestamp types.
- Fixed issue on calling any(), all() on a String column.
- Fixed issue on comparing string and datetime columns, string and numeric columns.
- Fixed read_parquet() to support non-string column name
- Fixed issue on calling where/mask on column of type float16
Removing Fallback:
- supported DataFrame.contains
- supported “min_periods” parameter for DataFrame/Series rolling()
Performance Improvement:
- improved groupby when key is of “category” type (up to 2 times).
- improved displaying a DataFrame instance on jupyter-like notebook platforms.
Others:
- importhook now supports -m option to run a library module. e.g., python -m fireducks.imhook -m <other_python_module> …

0.11.4 (May 27, 2024)

Bug Fixes:
- fixed result type of groupby-sum on boolean column from uint64 -> int64 according to pandas
- fixed None check in DataFrame.rename
- fixed read_csv issue with parsing bad-csv files by falling back to pandas
- fixed: DataFrame.drop() issue when string-value is specified as index to be dropped from a datetime column
- fixed: DataFrame.drop() when target column to be dropped is specified as scalar with axis=1 [e.g., df.drop(“c”, axis=1)]
- fixed: logical func of DataFrame/Series unexpected kwargs
- fixed: unnecessary upcast of floating dtypes in to_numpy on a column with nulls.
- fixed bug when groupby results in data columns being empty.
- fixed dictionary mapping on Series, when type of input Series and the type of dictionary-keys do not match
- fixed errors when unsupported aggregate methods (e.g., corr, describe etc.) are provided to groupby-agg
- update: DataFrame/Series.where handles other=nan as null.
Removing Fallback:
- fixed fallback when head/tail/shift etc. is provided as single-value-list to groupby-aggregate [e.g., agg([“head”])]
- supported inplace parameter for Series/DataFrame drop_duplicates
- supported Series.drop
Performance Improvement:
- Improved groupby-aggregate for sum, mean, median, stddev on boolean column
- Improved dictionary mapping on Series

0.11.3 (May 20, 2024)

Performance Improvement:
- Add new optimization to remove unnecessary sort in groupby.
Bug Fixes:
- fix: Series.dtype where Series.name is non-0 integer
Removing Fallback:
- fixed fallback of sort_values with kind=None
- DataFrame/Series.diff support more integer-like periods

0.11.2 (May 16, 2024)

Bug Fixes:
- Fix dependency on pyarrow.
- Fix dtype of index when merge result is empty.
Removing Fallback:
- Supported aggregate on timedelta columns.

0.11.1 (May 13, 2024)

Performance Improvement:
- Add new IR pattern rewrite optimization pass.
- DataFrame.merge/join with date32/64 payload column.
Bug Fixes:
- Fixed bug in iloc-getter when there are duplicates in column names
Removing Fallback:
- Supported aggregate methods (max, min, mean etc.) to be performed on timestamp columns.
- Supported iloc-getter with integer or list-likes column indicator: e.g., df.iloc[:, 0], df.iloc[:, [2,4]] etc.
- Supported take() with slice object as input.
- Supported squeeze() for DataFrame and Series.
- Supported dictionary or casting-methods to be mapped on a Series.
New pandas incompatibility:
- observed parameter of groupby is always true for better performance.
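
Example for 0.11.1 (a sketch with invented data in plain pandas; it shows what the observed parameter changes in pandas itself, and per the note above FireDucks is expected to match the observed=True result regardless of the argument):

```python
import pandas as pd  # plain pandas, to show both behaviors side by side

key = pd.Categorical(["a", "a"], categories=["a", "b"])  # "b" never occurs
df = pd.DataFrame({"k": key, "v": [1, 2]})

only_observed = df.groupby("k", observed=True)["v"].sum()
all_categories = df.groupby("k", observed=False)["v"].sum()
```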

0.11.0 (May 07, 2024)

Performance Improvement:
- groupby.median() and median now return the exact (non-approximate) median.
Removing Fallback:
- read_parquet with columns parameter.
- DataFrame.rename with columns parameter.
Others:
- Upgrade dependent pyarrow to 16.0.0.
- the importhook feature now can be activated by fireducks.pandas

0.10.9 (Apr 23, 2024)

Performance Improvement:
- groupby.std()
Removing Fallback:
- Supported astype(“datetime64”)
- Supported DataFrame.dropna(axis=1)
Bug Fixes:
- Fix df.merge returning incorrect result when how is left and key has nulls.
- Fix an error when “head”, “tail” or “shift” is used in groupby.agg. If any of these is provided as a single aggregator [e.g., df.groupby(...).agg("head")], you can experience speed-up from FireDucks, but when these are provided in combination with another aggregator [e.g., df.groupby(...).agg(["head", "mean"])], the same will be executed by fallbacker.
- Fix issues in accessing methods from pd.api.types module.
Others:
- Remove version from dependency on numpy.
- Add experimental profiler for jupyter/ipython. Use %load_ext fireducks.ipyext and %%fireducks.profile cell magic.

0.10.8 (Apr 16, 2024)

Performance Improvement:
- groupby two keys with nulls
- left join with single key
- left and inner join with single key of category type
Removing Fallback:
- groupby.corrwith among two columns

0.10.7 (Apr 10, 2024)

Performance Improvement:
- Printing dataframe with large dictionary.
Removing Fallback:
- DataFrame/Series astype() with dtype=“category”
Bug Fixes:
- Fixed Join issue with dictionary-typed key columns.
- Fixed filter issue of a table having multiple index columns with duplicate values
Others:
- Upgrade dependent pyarrow to 15.0.2.

0.10.6 (Apr 02, 2024)

Performance Improvement:
- Series.unique()
- DataFrame/Series nunique()
- read_csv with category type
Removing Fallback:
- DataFrame/Series astype() with bool, uint8 etc.
- supported following parameters for pd.get_dummies(): columns, prefix, prefix_sep, dtype

0.10.5 (Mar 26, 2024)

Performance Improvement:
- groupby.head/tail
- groupby.size
Removing Fallback:
- dropna=False with groupby
- groupby.first
- DataFrame.value_counts
- supported “normalize” parameter of Series.value_counts
Bug Fixes:
- Fix incorrect fallback of Series.apply
- Fix str.split issue when expand parameter is specified
- Fix null assignment issue, e.g., df.mask[cond, “a”] = np.nan

0.10.4 (Mar 13, 2024)

Performance Improvement:
- Groupby, merge/join, sort_value with string key
Bug Fixes:
- Fixed fallback issue with loc/iloc setitem

0.10.3 (Mar 06, 2024)

Performance Improvement:
- Optimized construction of a Series from another Series.
Removing Fallback:
- Supported replace with regex=True
- Supported loc-assignment for non-numeric index, e.g., df.loc[["a", "c", "d"], "col1"] = 5
Bug Fixes:
- Fixed bug when loc assignment is performed with non-series data (like list etc.) and target frame does not have default index.
- Fixed NotImplementedError cases related to datetime-string comparison.

0.10.2 (Feb 26, 2024)

Performance Improvement:
- improved index-getter (df.index) by avoiding fallback of data columns
- sort with uint32/uint64 key
Removing Fallback:
- Supported groupby.shift() for DataFrame and Series
- Supported take() for DataFrame and Series
- Supported sample() for DataFrame and Series
- Supported loc-assignment with positions (e.g., df.loc[[5,2,4], “a”] = 100) for DataFrame and Series

0.10.1 (Feb 19, 2024)

Performance improvement:
- DataFrame.merge
- DataFrame/Series.sort_values when including null
Bug Fix:
- fixed DataFrame/Series.sort_values with string key and ascending=False

0.10.0 (Feb 13, 2024)

Performance improvement:
- DataFrame/Series.drop_duplicates
- DataFrame/Series.dropna
Removing Fallback:
- supported astype with numpy types (np.int32, np.int64, np.float32, np.float64)
- supported conditional loc setter for DataFrame and Series: e.g., df.loc[cond, "a"] = 2; s.loc[cond] = 2
Bug Fixes:
- fixed int-float binop division issue
- fixed calling issue of StringMethods on LARGE_STRING typed columns
Others:
- update to arrow15
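
Example for 0.10.0 (a sketch with invented data in plain pandas; the fireducks.pandas import is assumed to be interchangeable): the conditional loc setter on a DataFrame and a Series, and astype with a numpy type.

```python
import numpy as np
import pandas as pd  # assumption: `import fireducks.pandas as pd` is a drop-in swap

df = pd.DataFrame({"a": [1, 5, 9]})
cond = df["a"] > 4
df.loc[cond, "a"] = 2        # only rows where cond is True are overwritten

s = pd.Series([1.0, 5.0])
s.loc[s > 4] = 2

as_int32 = df["a"].astype(np.int32)
```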

0.9.8 (Feb 5, 2024)

Performance improvement:
- DataFrame.groupby
Removing fallback:
- DataFrame/Series.reset_index with allow_duplicates

0.9.7 (Jan 29, 2024)

Removing fallback:
- Setting index of DataFrame/Series like df.index = ...
- Index.set_names

0.9.6 (Jan 22, 2024)

Performance improvement:
- move projection optimization: support copy and drop_duplicates.
Removing fallback:
- DataFrame/Series._repr_html_ to drastically improve speed for displaying on Jupyter notebook.
- DataFrame/Series.set_axis
- DataFrame/Series.__setitem__ with array-like
- DataFrame/Series.set_index with ndarray, drop=True, append=True and verify_integrity=True
- DataFrame/Series.sort_values with ignore_index=True
Bug fix:
- read_csv with fsspec parameter such as “s3://”

0.9.5 (Jan 15, 2024)

Removing fallback:
- DataFrame/Series.shift
- DataFrame/Series.pipe

0.9.4 (Dec 28, 2023)

Performance improvement
- DataFrame.copy
Removing fallback:
- DataFrame/Series.iloc setter
- DataFrame.__array__

0.9.3 (Dec 25, 2023)

Performance improvement
- DataFrame.merge
- Binary operations
Bug fix:
- Series.__repr__

0.9.2 (Dec 18, 2023)

Performance improvement
- DataFrame.groupby, DataFrame.where
- IR building
Removing fallback:
- DataFrame.iloc, DataFrame.__repr__
Bug fix:
- read_csv with URL

0.9.1 (Dec 11, 2023)

Performance improvement
- DataFrame.corr
Others
- Add benchmark mode

0.9.0 (Dec 4, 2023)

Update to arrow-14.0.1

0.8.8 (Nov 27, 2023)

Bug Fix
- remove unexpected print in read_csv

0.8.7 (Nov 27, 2023)

Performance improvement
- DataFrame.corr
- DataFrame.dropna
Removing fallback:
- read_csv with default arguments
- DataFrame.to_csv with encoding=utf8
- DataFrame.groupby with dropna=True

0.8.6 (Nov 20, 2023)

Performance Improvement
- DataFrame.groupby using cardinality estimation.
- DataFrame.corr for DataFrames with fewer rows.
Removing fallback
- DataFrame/Series.mask
- DataFrame/Series.where
Bug Fix:
- concat for corner cases

0.8.5 (Nov 9, 2023)

Improve performance of DataFrame.corr
Remove fallback of DataFrame.get_dummies for simple case

0.8.4 (Nov 9, 2023)

Performance improvement
- DataFrame.corr
Performance improvement by removing fallback (depending on parameters)
- Series.rolling
- DataFrame.drop
- DataFrame/Series.describe
- DataFrame/Series.skew
- DataFrame/Series.kurt
- DataFrame/Series.values
Bug Fix
- Series.__float__/__int__
- fallback reason of to_csv

0.8.3 (Oct 26, 2023)

Add wheel package for python3.11 (tested with python-3.11.4 on ubuntu23.04).
Improve performance of merge/join when both frames have default index.
Improve pandas compatibility of methods which return a scalar value like Series.aggregate.
Remove fallback: DataFrame.columns, DataFrame.pop, fireducks.pandas.join
Add kernel tracing (enabled by FIREDUCKS_FLAGS=--trace=3)
Add reason to fallback log (enabled by FIREDUCKS_FLAGS=-Wfallback).

0.8.2 (Oct 19, 2023)

First public beta release