Import hooks: how to use FireDucks without modifying your programs
This is Osamu Daido from the FireDucks development team. In today’s developers’ blog, I would like to introduce the import hook feature of FireDucks. This feature enables you to use FireDucks without modifying your existing programs at all.
I’ll explain how to use hooks when running Python files on the command line and how to enable hooks in IPython or Jupyter Notebook.
What is an import hook?
FireDucks behaves in the same way as the original pandas, so it’s easy to get started by simply modifying an import statement as follows:
# import pandas as pd
import fireducks.pandas as pd
However, even though it is only a single line, finding and replacing every pandas import statement in your programs can be tedious. Moreover, if you want to use FireDucks with a third-party library that relies on pandas, modifying every import statement in that library is not practical.
As mentioned in Get Started, FireDucks has a utility called an import hook.
To use it, pass the following option to the Python interpreter when you run your_script.py on the command line.
python3 -m fireducks.imhook your_script.py
With this feature, fireducks.pandas is imported instead of pandas when the Python interpreter attempts to import pandas.
Keep in mind that this does not edit the source code of your_script.py. Instead, it redirects the import dynamically while the program is running.
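To make the idea of redirecting an import concrete, here is a minimal sketch of a generic import hook built on Python's standard importlib machinery. It is not FireDucks' actual implementation: the class RedirectFinder and the module name slow_json are hypothetical, and the standard-library json module stands in for the replacement so that the example runs without pandas or FireDucks installed. It also ignores submodules, which a real hook would have to handle.

```python
import importlib
import importlib.abc
import importlib.util
import sys


class RedirectFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical hook: importing `target` yields `replacement` instead."""

    def __init__(self, target, replacement):
        self.target = target
        self.replacement = replacement

    def find_spec(self, fullname, path=None, target=None):
        # Claim only the one name we want to redirect; defer everything else.
        if fullname == self.target:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        # Returning None asks the import system for a default empty module.
        return None

    def exec_module(self, module):
        # Swap the entry in sys.modules; the import system returns whatever
        # is registered under this name once exec_module finishes.
        sys.modules[module.__name__] = importlib.import_module(self.replacement)


# Put the hook first so it is consulted before the regular finders.
sys.meta_path.insert(0, RedirectFinder("slow_json", "json"))

import json
import slow_json  # no file named slow_json exists; the hook serves json

print(slow_json is json)  # prints True
```

The script that writes `import slow_json` is never edited; only the lookup performed at import time changes, which is the same principle the FireDucks hook relies on.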
Example of an import hook
Let’s see it in action with a simple Python script, print_classname.py, as shown below.
This script outputs the repr string of the DataFrame class.
import pandas as pd
print(pd.DataFrame)
If you run it normally, the output is as follows:
$ python3 print_classname.py
<class 'pandas.core.frame.DataFrame'>
With the import hook, the output changes as follows! 🥳
$ python3 -m fireducks.imhook print_classname.py
<class 'fireducks.pandas.frame.DataFrame'>
So, yes, you can use FireDucks dataframes even though you haven’t edited the source code.
Limitations
No shebang support
Currently, execution by shebang (#!...) is not supported.
#!/usr/bin/python3
import pandas as pd
print(pd.DataFrame)
You cannot enable an import hook in this case, because you cannot specify the -m option for the Python interpreter when the script is launched this way (which is not surprising, of course).
$ chmod +x print_classname_shebang.py
$ ./print_classname_shebang.py
<class 'pandas.core.frame.DataFrame'>
No combination with other executable modules
The import hook feature cannot be used concurrently with other tools invoked by the -m option, as only one -m option can be passed to the Python interpreter.
No subprocess support
If you start a new Python process using the subprocess module, the import hook settings are not inherited by that subprocess.
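One possible way around this, if you control how the child process is launched, is to pass the hook option to the child interpreter explicitly. The sketch below only builds the same -m fireducks.imhook command line shown earlier. The helper name imhook_command and the script name child_script.py are hypothetical.

```python
import sys


def imhook_command(script, *args):
    """Build a command that runs `script` under the FireDucks import hook.

    Hypothetical helper: it mirrors the invocation
    `python3 -m fireducks.imhook your_script.py` and reuses the current
    interpreter so the child sees the same installed packages.
    """
    return [sys.executable, "-m", "fireducks.imhook", script, *args]


# child_script.py is a placeholder for your own pandas-based script.
command = imhook_command("child_script.py", "--verbose")
print(command[1:])
```

Passing the resulting list to subprocess.run(command, check=True) starts the child with the hook enabled, instead of relying on inheritance from the parent process.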
How to use import hooks in Jupyter Notebook
The import hook feature is also available in Jupyter Notebook. Currently, however, you cannot specify an option when starting Jupyter, so you must activate a hook explicitly in the first cell of your notebook.
import fireducks.importhook
fireducks.importhook.activate_hook("fireducks.pandas", "pandas")
There may not be much benefit to using import hooks if you’re just using pandas in your own notebook. On the other hand, import hooks also work with third-party libraries that use pandas, so they are useful if you want to use such libraries in your notebook.
If you want to disable a hook, please call the following function.
fireducks.importhook.deactivate_hook()
However, if you mix dataframes from the original pandas with ones from FireDucks, you will likely encounter errors (probably with complicated and mysterious error messages). As a rule, we recommend keeping a hook enabled once you have enabled it.
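When debugging such a mix-up, it can help to check which library an object actually comes from. The sketch below reads the module name of an object's class, following the class names printed earlier (pandas.core.frame versus fireducks.pandas.frame). The helper origin_package is hypothetical, and the stand-in class exists only so that the example runs without pandas or FireDucks installed. Whether every FireDucks object reports a fireducks module is an assumption.

```python
def origin_package(obj):
    """Return the top-level package that defines the class of `obj`."""
    return type(obj).__module__.split(".")[0]


# Stand-in class that mimics the module path shown in the import hook output.
# In a notebook you would pass a real dataframe instead.
FakeFrame = type("DataFrame", (), {"__module__": "fireducks.pandas.frame"})

print(origin_package(FakeFrame()))  # fireducks
print(origin_package(42))           # builtins
```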
How to use import hooks with IPython CLI
With IPython, you can enable a hook manually in the same way as described above for Jupyter Notebook. Another option is to start the IPython CLI as follows (this is an example with bash):
python3 -m fireducks.imhook "$(which ipython)"
Well, it’s a slightly unusual approach, but it works!
Wrap-up
FireDucks is still under research and development, so you may face errors and problems if you switch from using pandas to FireDucks. We have been working on improving features of FireDucks every day since the release of the beta version. Your feedback, bug reports, and feature requests are welcome! Please see our contact information for further details.
To sum up, I’ve shown you how to use FireDucks without modifying your existing programs at all. If you want to try FireDucks, please refer to Get Started and User Guide documents. For information on how much faster FireDucks is compared to pandas, please check out our Benchmarks.
May the Acceleration be with you, FireDucks Development Team