Release Note

FireDucks Release Note

1.1.0 (Nov 19, 2024)

  • Removing Fallbacks:
    • supported “expand” parameter in str.split() method.
  • Bug Fixes:
    • fixed issue in dtypes for DataFrame with multi-level columns
  • Optimization:
    • Improved performance of join/merge: up to about 1.5x in our experiments.
  • Others:
    • Upgraded the pyarrow dependency to 18.0.0. As of pyarrow 18, Python 3.8 is no longer supported.
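
As an illustration of the "expand" parameter mentioned above, here is a minimal sketch written against the plain pandas API. The sample data is invented for the example, and the only FireDucks-specific assumption is that it is used as a drop-in replacement by importing fireducks.pandas instead of pandas.

```python
import pandas as pd  # assumption: with FireDucks installed, use "import fireducks.pandas as pd"

# Hypothetical sample data, for illustration only.
s = pd.Series(["a,b,c", "d,e", "f"])

# expand=False (the default): each element becomes a list of tokens.
as_lists = s.str.split(",")

# expand=True: tokens are spread across columns; shorter rows are padded
# with missing values.
as_frame = s.str.split(",", expand=True)

print(as_lists)
print(as_frame)
```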

1.0.11 (Nov 12, 2024)

  • Removing Fallbacks:
    • removed fallback on DataFrame.dtypes in the presence of columns of type list, date32, or large_string
    • supported DataFrame.loc with column slicing, e.g., df.loc[:, "A":"C"]
  • Bug Fixes:
    • fixed issue in setting index to an empty DataFrame/Series
  • Optimization:
    • improved sort_values() with key of temporal types
    • supported projection-pushdown when projection target is empty e.g., df.sort_values("C")[[]]
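
The empty-projection case above may look unusual, so here is a small sketch of what the expression evaluates to in plain pandas. It only shows the observable result (no columns, sorted index); how FireDucks pushes the projection down internally is not shown, and the sample frame is invented for the example.

```python
import pandas as pd  # assumption: with FireDucks installed, use "import fireducks.pandas as pd"

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"A": [1, 2, 3], "C": [30, 10, 20]})

# Selecting an empty list of columns after sorting keeps only the row index,
# reordered by the sort key. No data column is needed in the result, which is
# what makes this a candidate for projection pushdown.
empty_projection = df.sort_values("C")[[]]

print(empty_projection.shape)
print(list(empty_projection.index))
```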

1.0.10 (Nov 05, 2024)

  • Bug Fixes:
    • fixed a condition bug when a negative index is given as input to DataFrame/Series take()
    • fixed issue in sampling empty DataFrame/Series
    • fixed a bug in calculation of length of a column of type list containing Nulls.
  • Removing Fallback:
    • supported DataFrame.groupby() with input key of Series type.

1.0.9 (Oct 28, 2024)

  • Bug Fixes:
    • update Join: support list-type payload (GH#20)
    • fix: isin() to support CategoricalDtype
    • fix: supported to_csv on DataFrame/Series having list or struct-like columns.
  • Removing Fallback:
    • removed fallback on getitem with a numeric index for StringMethods (string or list-like columns), e.g., s.str[2]
  • Performance Improvement:
    • optimized calculation (~3x) of length for list-like columns

1.0.8 (Oct 22, 2024)

  • Bug Fixes:
    • Some unsupported rolling functions are implemented.
    • fixed issue in slicing a list-like column
    • fixed RuntimeError on DataFrame.astype(category).head()
  • Removing Fallbacks:
    • dtype_backend="pyarrow" parameter of read_csv
    • column parameter of to_csv

1.0.7 (Oct 16, 2024)

  • Removing Fallback:
    • remove fallback: read_csv with encoding=utf8
    • supported binary comparison operations with a 'date' instance as the scalar value

1.0.6 (Oct 07, 2024)

  • Bug Fixes:
    • fixed the "in" operator with Series (GT#26).
    • fix issue where index setter, df.index = ..., does not work with fallback.

1.0.5 (Sep 20, 2024)

  • Bug Fixes:
    • fixed dump to and read from pickle
    • fixed groupby ith selector for key like df.groupby("a")["a"]
  • Removing Fallback:
    • supported ignore_index parameter for drop_duplicates
  • Performance Improvement:
    • added IR optimization df.drop_duplicates(...).reset_index(drop=True) -> df.drop_duplicates(..., ignore_index=True)
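
The rewrite rule above relies on the two forms producing the same result. A minimal sketch in plain pandas, with invented sample data, shows the equivalence; the IR rewrite itself happens inside FireDucks and is not reproduced here.

```python
import pandas as pd  # assumption: with FireDucks installed, use "import fireducks.pandas as pd"

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"a": [1, 1, 2, 2, 3], "b": ["x", "x", "y", "z", "w"]})

# Two-step form: drop duplicates, then discard the old index.
chained = df.drop_duplicates(subset="a").reset_index(drop=True)

# One-step form that the optimization rewrites the chain into.
direct = df.drop_duplicates(subset="a", ignore_index=True)

print(chained.equals(direct))
```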

1.0.4 (Sep 10, 2024)

  • Bug Fixes:
    • fixed issue on groupby-select-aggregate with kwargs e.g., df.groupby("a")["b"].agg(Sum="sum")
    • fixed melt() issue with non-string “value_vars”.
    • fixed issue with groupby(…).size() on empty data.
  • Removing Fallback:
    • supported dtype="string" as input to astype(), read_csv() etc.
    • supported iloc by row-index e.g., df.iloc[0]
    • supported “ignore_index” parameter for Series/DataFrame dropna().
  • Performance Improvement:
    • improved overhead of computation on index column, when reset_index(drop=True) is performed followed by dropna, concat, melt, explode.

1.0.3 (Sep 02, 2024)

  • Bug Fixes:
    • fixed join with categorical columns.

1.0.2 (Aug 30, 2024)

  • Bug Fixes:
    • fixed a bug in reading the parquet file when index columns are stored at the beginning.
  • Removing Fallback:
    • supported datetime properties dt.date, dt.time
    • supported aggregate method “last” for GroupBy.

1.0.1 (Aug 28, 2024)

  • Bug Fixes:
    • benchmark-mode with inplace method
  • Performance Improvement:
    • Groupby.nunique with numeric column
    • DataFrame.merge for some cases
    • added optimization pattern sort_values(...).reset_index(drop=True) -> sort_values(..., ignore_index=True)
  • Others:
    • print optimized IR when FIRE_LOG_LEVEL=3

1.0.0 (Aug 23, 2024)

  • Bug Fixes:
    • fixed issue with dictionary sort
    • fixed issue in filling null with null
  • Removing Fallback:
    • supported sort_index on DataFrame and Series
  • Performance Improvement:
    • add JoinWithMaskPat optimization
    • add predicate pushdown optimization
  • Others:
    • add test on rockylinux9.2 with python3.11
    • fireducks.pandas.__version__ returns version of pandas. Use fireducks.__version__ when version of fireducks is required.

0.13.1 (Aug 14, 2024)

  • Bug Fixes:
    • fixed issue in casting a datetime column from one unit to another (e.g., datetime64[ns] -> datetime64[ms])
    • fixed issue in handling range index with step != 1
  • Removing Fallback:
    • supported pd.read_json() with lines=True case
    • supported Series.reset_index() with name parameter
    • supported DataFrame setitem/getitem with numpy array of dimension Nx1
    • supported DataFrame.setitem with non-string key. e.g., df[1] = ... (key is integer)
  • Performance Improvement:
    • improved dropna(axis=0) for input without any nulls

0.13.0 (Jul 30, 2024)

  • Bug Fixes:
    • Fixed a filter bug when the input mask has a different alignment than the input table.
    • Fixed a bug related to an importhook under a FireDucks profiler.
    • Fixed merge with on=key for different key types.
    • Fixed merge with left_index, right_index for different key types.
    • Fixed issue in unit handling for TimeDelta columns.
  • Others:
    • Upgrade dependent pyarrow to 17.0.0.

0.12.6 (Jul 23, 2024)

  • Removing Fallback:
    • supported getter and setter on Series.name
    • supported loc-assignment, scalar-assignment related cases with pd.NaT, e.g, df["c"] = pd.NaT
    • supported setitem on DataFrame with numeric arrays having None, e.g, df["c"] = [1, None, 3]
  • Bug Fixes:
    • Fixed issue in putting null using np.nan on non-numeric columns (string, timedelta etc.)
    • Fixed get_dummies() issue with default dtype for pandas 2.x
    • Fixed strftime issue with format having “%%S” like escape

0.12.5 (Jul 12, 2024)

  • Performance Improvement:
    • optimized days_in_month
    • optimized implementation of microsecond (> 2x)
    • improved performance of sample, by avoiding unnecessary checks for negative index
  • Bug Fixes:
    • groupby with timestamp and timedelta column.
    • fixed issue with is_leap_year

0.12.4 (Jul 09, 2024)

  • Performance Improvement:
    • improved performance of take(axis=0) when input frame has default range index.
    • improved performance of sum, mean, count etc. for boolean column.
  • Removing Fallback:
    • supported Datetime Accessor methods: is_leap_year, days_in_month, microsecond
    • supported Series.between.
    • supported DataFrame filter with numpy-array as mask vector.
    • supported the following iloc-getter cases:
      • iloc with arraylike of integers: e.g., df.iloc[[0,2,4]]
      • iloc with range or slice objects: e.g., df.iloc[:3]
      • iloc for projection-filter: df.iloc[:2, :3], df.iloc[[0,3,5], [0,1]] etc.
  • Bug Fixes:
    • fixed issue in groupby-aggregator with duration column as key/non-key.
    • fixed issue in boolean casting for column of types: timestamp, timedelta.
    • fixed type issue in count() result for column of type: timedelta.
    • fixed iloc bug when input frame has duplicate columns.
    • fixed issue with strftime("%S") when non-fractional second part is to be formatted.
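
The iloc-getter forms listed in this release can be sketched as follows, using plain pandas and an invented sample frame. The variable names are hypothetical and chosen only for the example.

```python
import pandas as pd  # assumption: with FireDucks installed, use "import fireducks.pandas as pd"

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "a": range(6),
    "b": range(10, 16),
    "c": range(20, 26),
    "d": range(30, 36),
})

rows_by_list = df.iloc[[0, 2, 4]]            # array-like of integer positions
rows_by_slice = df.iloc[:3]                  # slice object over rows
block_by_slices = df.iloc[:2, :3]            # row slice plus column slice
block_by_lists = df.iloc[[0, 3, 5], [0, 1]]  # row positions plus column positions

print(block_by_lists)
```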

0.12.3 (Jul 02, 2024)

  • Performance Improvement:
    • read_csv with many columns
    • merge with many columns
  • Removing Fallback:
    • supported header parameter for read_csv
    • supported list-of-integers to be specified as index_col in read_csv()
  • Bug Fixes:
    • fixed issue with aggregation on unsigned numeric columns by supporting unsigned scalars in FireDucks
    • fix: reindexing column order after performing arithmetic operation
    • fixed read_csv() bug when ‘index_col’ is of boolean-type or contains negative integers
  • Others:
    • remove: dependency on numpy<2.0

0.12.2 (Jun 24, 2024)

  • Removing Fallback:
    • Supported to_datetime() with given format (fixed fallback issue at backend).
    • Supported astype() with input as Series: e.g., s.astype(s2.dtype)
    • Supported DateTime accessor method total_seconds() on TimeDelta columns.
    • Supported Datetime accessor method strftime() on DateTime columns. A large improvement over the pandas implementation of strftime.
  • Bug Fixes:
    • Fixed read_csv() issue when the length of “names” parameter is different than number of fields in the input file.
    • Fixed issue in concatenating String with Category, String with LargeString columns.
    • Fixed issue in to_csv() when input data has multi-level columns and “header” parameter is not True.
    • Fixed issue in isin() operation on string column with non-string lookup targets.
  • Others:
    • Optimized the “strftime(format) + astype(numeric)” pattern when the format can be treated as a numeric datetime field extractor.
    • Loading the fireducks.ipyext module is no longer required when the fireducks.pandas module is already loaded.
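
To make the “strftime(format) + astype(numeric)” pattern concrete, here is a minimal sketch in plain pandas. The release note does not say which formats qualify as a numeric field extractor, so the choice of "%Y" is an assumption made for illustration, as is the sample data.

```python
import pandas as pd  # assumption: with FireDucks installed, use "import fireducks.pandas as pd"

# Hypothetical sample data, for illustration only.
s = pd.Series(pd.to_datetime(["2024-01-15", "2023-11-30", "2022-06-01"]))

# The pattern: format to a string, then cast the string to a number.
year_via_format = s.dt.strftime("%Y").astype(int)

# The same values read directly as a datetime field, with no string round trip.
year_via_field = s.dt.year

print(list(year_via_format))
print(list(year_via_field))
```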

0.12.1 (Jun 17, 2024)

  • Removing Fallback:
    • supported sep, na_rep, quoting_style, header etc. parameters for DataFrame/Series to_csv()
  • Bug Fixes:
    • fixed issue in to_csv when columns are of multi-level and header=False; when columns names are single-level non-strings, saved as strings (unlike pandas)
    • fixed groupby.shift ignoring the dropna parameter.
  • Others:
    • add dependency on numpy<2.0.
    • support python 3.12
    • support older glibc with python 3.9-3.12

0.12.0 (Jun 10, 2024)

  • Removing Fallback:
    • supported min_periods parameter in rolling()
    • supported dictionary of parquet files to be loaded using read_parquet()
    • supported Datetime Accessor methods day_name(), month_name()
    • supported pd.to_datetime() for Series input
  • Bug Fixes:
    • fixed merge bug for multi-index as key
    • fixed Series.map(pd.Timestamp.timestamp)
    • fixed string to datetime conversion when input timestamp contains microseconds, nanoseconds parts.
    • fixed fillna() on string columns with numeric scalar, e.g., df.fillna(0).
    • fixed concat to support a mix of single-level and multi-level column names.
    • fixed lazy-execution issue in setting Series attributes
  • Others:
    • removed the dependency pin on pandas 1.5.3. FireDucks is now compatible with both pandas 1.5 and 2.2.

0.11.5 (Jun 04, 2024)

  • Bug Fixes:
    • Fixed bug of column dtypes on reading an empty CSV file.
    • Fixed issue on calling StringMethods (s.str.upper etc.) on a “category” column with key as string.
    • Fixed issue on calling where/mask on empty DataFrame.
    • Fixed to return pd.NaT instead of np.nan on calling aggregate methods on empty Series of timedelta, timestamp types.
    • Fixed issue on calling any(), all() on a String column.
    • Fixed issue on comparing string and datetime columns, string and numeric columns.
    • Fixed read_parquet() to support non-string column name
    • Fixed issue on calling where/mask on column of type float16
  • Removing Fallback:
    • supported DataFrame.contains
    • supported “min_periods” parameter for DataFrame/Series rolling()
  • Performance Improvement:
    • improved groupby when key is of “category” type (up to 2x).
    • improved displaying a DataFrame instance on jupyter-like notebook platforms.
  • Others:
    • importhook now supports -m option to run a library module. e.g., python -m fireducks.imhook -m <other_python_module> …

0.11.4 (May 27, 2024)

  • Bug Fixes:
    • fixed result type of groupby-sum on boolean column from uint64 -> int64 according to pandas
    • fixed None check in DataFrame.rename
    • fixed read_csv issue with parsing bad-csv files by falling back to pandas
    • fixed: DataFrame.drop() issue when string-value is specified as index to be dropped from a datetime column
    • fixed: DataFrame.drop() when target column to be dropped is specified as scalar with axis=1 [e.g., df.drop("c", axis=1)]
    • fixed: logical func of DataFrame/Series unexpected kwargs
    • fixed: unnecessary upcast of floating dtypes in to_numpy on a column with nulls.
    • fixed bug when groupby results in data columns being empty.
    • fixed dictionary mapping on Series, when type of input Series and the type of dictionary-keys do not match
    • fixed errors when unsupported aggregate methods (e.g., corr, describe etc.) are provided to groupby-agg
    • update: DataFrame/Series.where handles other=nan as null.
  • Removing Fallback:
    • fixed fallback when head/tail/shift etc. is provided as single-value-list to groupby-aggregate [e.g., agg(["head"])]
    • supported inplace parameter for Series/DataFrame drop_duplicates
    • supported Series.drop
  • Performance Improvement:
    • Improved groupby-aggregate for sum, mean, median, stddev on boolean column
    • Improved dictionary mapping on Series

0.11.3 (May 20, 2024)

  • Performance Improvement:
    • Add new optimization to remove unnecessary sort in groupby.
  • Bug Fixes:
    • fix: Series.dtype where Series.name is non-0 integer
  • Removing Fallback:
    • fixed fallback of sort_values with kind=None
    • DataFrame/Series.diff support more integer-like periods

0.11.2 (May 16, 2024)

  • Bug Fixes:
    • Fix dependency on pyarrow.
    • Fix dtype of index when merge result is empty.
  • Removing Fallback:
    • Supported aggregate on timedelta columns.

0.11.1 (May 13, 2024)

  • Performance Improvement:
    • Add new IR pattern rewrite optimization pass.
    • DataFrame.merge/join with date32/64 payload column.
  • Bug Fixes:
    • Fixed bug in iloc-getter when there are duplicates in column names
  • Removing Fallback:
    • Supported aggregate methods (max, min, mean etc.) to be performed on timestamp columns.
    • Supported iloc-getter with integer or list-likes column indicator: e.g., df.iloc[:, 0], df.iloc[:, [2,4]] etc.
    • Supported take() with slice object as input.
    • Supported squeeze() for DataFrame and Series.
    • Supported dictionary or casting-methods to be mapped on a Series.
  • New pandas incompatibility:
    • observed parameter of groupby is always true for better performance.
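
The incompatibility above only matters for categorical group keys with unused categories. A minimal sketch in plain pandas shows what differs; the sample data is invented, and observed is passed explicitly in both calls. The pandas versions named elsewhere in these notes (1.5 and 2.2) default to observed=False, whereas FireDucks, per the note, always behaves like observed=True.

```python
import pandas as pd

# Hypothetical sample data: category "z" is declared but never occurs.
df = pd.DataFrame({
    "key": pd.Categorical(["x", "y", "x"], categories=["x", "y", "z"]),
    "val": [1, 2, 3],
})

# observed=True: only categories that actually appear form groups.
observed_only = df.groupby("key", observed=True)["val"].sum()

# observed=False: every declared category forms a group, including the
# empty "z" group.
all_categories = df.groupby("key", observed=False)["val"].sum()

print(observed_only)
print(all_categories)
```

Code that depends on empty categories appearing in the result would need to reindex the output explicitly when run under FireDucks.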

0.11.0 (May 07, 2024)

  • Performance Improvement:
    • groupby.median() and median now return the exact (non-approximate) median.
  • Removing Fallback:
    • read_parquet with columns parameter.
    • DataFrame.rename with columns parameter.
  • Others:
    • Upgrade dependent pyarrow to 16.0.0.
    • the importhook feature now can be activated by fireducks.pandas

0.10.9 (Apr 23, 2024)

  • Performance Improvement:
    • groupby.std()
  • Removing Fallback:
    • Supported astype("datetime64")
    • Supported DataFrame.dropna(axis=1)
  • Bug Fixes:
    • Fix df.merge returning incorrect result when how is left and key has nulls.
    • Fix an error when “head”, “tail” or “shift” is used in groupby.agg. If any of these is provided as a single aggregator [e.g., df.groupby(...).agg("head")], you can experience speed-up from FireDucks, but when these are provided in combination with another aggregator [e.g., df.groupby(...).agg(["head", "mean"])], the same will be executed by fallbacker.
    • Fix issues in accessing methods from pd.api.types module.
  • Others:
    • Remove version from dependency on numpy.
    • Add experimental profiler for jupyter/ipython. Use %load_ext fireducks.ipyext and the %%fireducks.profile cell magic.

0.10.8 (Apr 16, 2024)

  • Performance Improvement:
    • groupby two keys with nulls
    • left join with single key
    • left and inner join with single key of category type
  • Removing Fallback:
    • groupby.corrwith among two columns

0.10.7 (Apr 10, 2024)

  • Performance Improvement:
    • Printing dataframe with large dictionary.
  • Removing Fallback:
    • DataFrame/Series astype() with dtype="category"
  • Bug Fixes:
    • Fixed Join issue with dictionary-typed key columns.
    • Fixed filter issue of a table having multiple index columns with duplicate values
  • Others:
    • Upgrade dependent pyarrow to 15.0.2.

0.10.6 (Apr 02, 2024)

  • Performance Improvement:
    • Series.unique()
    • DataFrame/Series nunique()
    • read_csv with category type
  • Removing Fallback:
    • DataFrame/Series astype() with bool, uint8 etc.
    • supported following parameters for pd.get_dummies(): columns, prefix, prefix_sep, dtype

0.10.5 (Mar 26, 2024)

  • Performance Improvement:
    • groupby.head/tail
    • groupby.size
  • Removing Fallback:
    • dropna=False with groupby
    • groupby.first
    • DataFrame.value_counts
    • supported “normalize” parameter of Series.value_counts
  • Bug Fixes:
    • Fix incorrect fallback of Series.apply
    • Fix str.split issue when expand parameter is specified
    • Fix null assignment issue, e.g., df.mask[cond, "a"] = np.nan

0.10.4 (Mar 13, 2024)

  • Performance Improvement:
    • Groupby, merge/join, sort_values with string key
  • Bug Fixes:
    • Fixed fallback issue with loc/iloc setitem

0.10.3 (Mar 06, 2024)

  • Performance Improvement:
    • Optimized construction of a Series from another Series.
  • Removing Fallback:
    • Supported replace with regex=True
    • Supported loc-assignment for non-numeric index, e.g., df.loc[["a", "c", "d"], "col1"] = 5
  • Bug Fixes:
    • Fixed bug when loc assignment is performed with non-series data (like list etc.) and target frame does not have default index.
    • Fixed NotImplementedError cases related to datetime-string comparison.

0.10.2 (Feb 26, 2024)

  • Performance Improvement:
    • improved index-getter (df.index) by avoiding fallback of data columns
    • sort with uint32/uint64 key
  • Removing Fallback:
    • Supported groupby.shift() for DataFrame and Series
    • Supported take() for DataFrame and Series
    • Supported sample() for DataFrame and Series
    • Supported loc-assignment with positions (e.g., df.loc[[5,2,4], "a"] = 100) for DataFrame and Series

0.10.1 (Feb 19, 2024)

  • Performance improvement:
    • DataFrame.merge
    • DataFrame/Series.sort_values when including null
  • Bug Fix:
    • fixed DataFrame/Series.sort_values with string key and ascending=False

0.10.0 (Feb 13, 2024)

  • Performance improvement:
    • DataFrame/Series.drop_duplicates
    • DataFrame/Series.dropna
  • Removing Fallback:
    • supported astype with numpy types (np.int32, np.int64, np.float32, np.float64)
    • supported conditional loc setter for DataFrame and Series: e.g., df.loc[cond, "a"] = 2; s.loc[cond] = 2
  • Bug Fixes:
    • fixed int-float binop division issue
    • fixed calling issue of StringMethods on LARGE_STRING typed columns
  • Others:
    • update to arrow15
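
The conditional loc setter named in this release can be sketched as follows in plain pandas, with invented sample data. The boolean mask selects the rows to overwrite; other rows and columns are left unchanged.

```python
import pandas as pd  # assumption: with FireDucks installed, use "import fireducks.pandas as pd"

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"a": [1, 5, 10], "b": [0, 0, 0]})
cond = df["a"] > 3

# DataFrame form: overwrite column "a" only where the mask is True.
df.loc[cond, "a"] = 2

# Series form: the same idea with a one-dimensional mask.
s = pd.Series([1, 5, 10])
s.loc[s > 3] = 2

print(df)
print(s)
```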

0.9.8 (Feb 5, 2024)

  • Performance improvement:
    • DataFrame.groupby
  • Removing fallback:
    • DataFrame/Series.reset_index with allow_duplicates

0.9.7 (Jan 29, 2024)

  • Removing fallback:
    • Setting index of DataFrame/Series like df.index = ...
    • Index.set_names

0.9.6 (Jan 22, 2024)

  • Performance improvement:
    • move projection optimization: support copy and drop_duplicates.
  • Removing fallback:
    • DataFrame/Series._repr_html_ to drastically improve speed for displaying on Jupyter notebook.
    • DataFrame/Series.set_axis
    • DataFrame/Series.__setitem__ with array-like
    • DataFrame/Series.set_index with ndarray, drop=True, append=True and verify_integrity=True
    • DataFrame/Series.sort_values with ignore_index=True
  • Bug fix:
    • read_csv with fsspec parameter such as “s3://”

0.9.5 (Jan 15, 2024)

  • Removing fallback:
    • DataFrame/Series.shift
    • DataFrame/Series.pipe

0.9.4 (Dec 28, 2023)

  • Performance improvement
    • DataFrame.copy
  • Removing fallback:
    • DataFrame/Series.iloc setter
    • DataFrame.__array__

0.9.3 (Dec 25, 2023)

  • Performance improvement
    • DataFrame.merge
    • Binary operations
  • Bug fix:
    • Series.__repr__

0.9.2 (Dec 18, 2023)

  • Performance improvement
    • DataFrame.groupby, DataFrame.where
    • IR building
  • Removing fallback:
    • DataFrame.iloc, DataFrame.__repr__
  • Bug fix:
    • read_csv with URL

0.9.1 (Dec 11, 2023)

0.9.0 (Dec 4, 2023)

  • Update to arrow-14.0.1

0.8.8 (Nov 27, 2023)

  • Bug Fix
    • remove unexpected print in read_csv

0.8.7 (Nov 27, 2023)

  • Performance improvement
    • DataFrame.corr
    • DataFrame.dropna
  • Removing fallback:
    • read_csv with default arguments
    • DataFrame.to_csv with encoding=utf8
    • DataFrame.groupby with dropna=True

0.8.6 (Nov 20, 2023)

  • Performance Improvement
    • DataFrame.groupby using cardinality estimation.
    • DataFrame.corr for DataFrames with few rows.
  • Removing fallback
    • DataFrame/Series.mask
    • DataFrame/Series.where
  • Bug Fix:
    • concat for corner cases

0.8.5 (Nov 9, 2023)

  • Improve performance of DataFrame.corr
  • Remove fallback of DataFrame.get_dummies for simple case

0.8.4 (Nov 9, 2023)

  • Performance improvement
    • DataFrame.corr
  • Performance improvement by removing fallback (depending on parameters)
    • Series.rolling
    • DataFrame.drop
    • DataFrame/Series.describe
    • DataFrame/Series.skew
    • DataFrame/Series.kurt
    • DataFrame/Series.values
  • Bug Fix
    • Series.__float__/__int__
    • fallback reason of to_csv

0.8.3 (Oct 26, 2023)

  • Add wheel package for python3.11 (tested with python-3.11.4 on ubuntu23.04).
  • Improve performance of merge/join when both frames have default index.
  • Improve pandas compatibility of methods which return a scalar value like Series.aggregate.
  • Remove fallback: DataFrame.columns, DataFrame.pop, fireducks.pandas.join
  • Add kernel tracing (enabled by FIREDUCKS_FLAGS=--trace=3)
  • Add reason to fallback log (enabled by FIREDUCKS_FLAGS=-Wfallback).

0.8.2 (Oct 19, 2023)

  • First public beta release