
scrubwren stands for Specialized Collection of Reticulate Utilities: Better WRappers for a smoother ExperieNce. Its logo features the White-browed Scrubwren, a common bird in eastern and southeastern Australia.

This package provides a set of R wrappers and helper functions for reticulate, designed to reduce the friction users often encounter when using Python deep learning libraries from R. While reticulate is powerful, some aspects can feel unintuitive or cumbersome, particularly for first-time users. At times, it even seems easier to write a Python function and run it via system() than to wrestle with reticulate, especially when you find yourself typing far more in R than you would in Python to achieve the same result.

If you have an idea for a Python feature that could be wrapped more cleanly for use in R, feel free to open a GitHub issue. Suggestions with a potential implementation approach are welcome, but even ideas without a concrete proposal are appreciated.

Installation

You can install the development version of scrubwren from GitHub with:

# install.packages("remotes")
remotes::install_github("TengMCing/scrubwren")

1. Explicitly initialize the Python session

In reticulate, only one Python session can be bound to the current R session, and this binding happens silently the first time you call most reticulate functions. py_init() makes this step explicit. The interpreter path can be supplied through the python_path argument; otherwise, py_init() discovers a Python installation using reticulate::py_discover_config().

py_init()
#> ℹ Initialized Python 3.11 from '/Users/patrickli/.virtualenvs/tf/bin/python'.
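For comparison, here is a minimal plain-reticulate sketch of the same explicit start-up. It assumes reticulate is installed and can find at least one Python interpreter; the commented-out use_python() path is a placeholder, not a real default. py_available(initialize = TRUE) is used here as one way to force initialization and is not necessarily what py_init() does internally.

```r
library(reticulate)

# Optional: pin a specific interpreter before anything starts Python.
# "/path/to/python" is a placeholder; substitute your own interpreter.
# use_python("/path/to/python", required = TRUE)

# Inspect which interpreter reticulate would choose, without starting it.
candidate <- py_discover_config()
print(candidate$python)

# Start Python now, rather than on the first call that happens to need it.
py_available(initialize = TRUE)

# Confirm which interpreter is now bound to this R session.
print(py_config()$python)
```

Once this has run, the binding is fixed for the rest of the R session, which is the behavior py_init() surfaces with its message.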

2. Accessible Python built-in functions

In reticulate, to call Python built-in functions such as type() directly, you first need to import them into an R variable with reticulate::import_builtins() and then call them through that variable. According to the reticulate documentation, this is primarily because the set of built-in functions differs between Python 2 and 3. In scrubwren, the built-in functions are loaded automatically into py_builtins and are ready to use. This happens when you load the package, so there is no name-conflict problem.

When the built-in functions are imported, a message tells you where they were imported from. However, if you set reticulate to use a different Python interpreter, you need to re-import the built-in functions.

names(py_builtins)[1:5]
#> [1] "abs"   "aiter" "all"   "anext" "any"
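A short sketch of the plain-reticulate workflow described above, assuming reticulate is installed with a working Python 3. The closing comment shows the scrubwren form for comparison.

```r
library(reticulate)

# Plain reticulate: bind the builtins module to an R variable first.
builtins <- import_builtins()

# Built-ins are then reached through that variable.
n <- builtins$len(list("a", "b", "c"))  # R list becomes a Python list
a <- builtins$abs(-5L)

# With scrubwren the same calls would be py_builtins$len(...) and
# py_builtins$abs(...), with no import step.
```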

1. Define a Python class with py_class()

In reticulate, Python class definitions must be provided as a list via the defs argument in PyClass. With py_class(), you can instead supply them as regular function arguments. You can still specify classname as a character string and inherit as a list of Python objects, as usual.
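For comparison, here is a plain-reticulate version of the Employee class used in this section's py_class() example, with every method packed into the defs list. It is a sketch assuming a working reticulate setup. Note that the method result comes back as an R character vector, because PyClass always converts.

```r
library(reticulate)

# All members go into a single list passed to `defs`.
Employee <- PyClass("Employee", defs = list(
  `__init__` = function(self, name, id) {
    self$name <- name
    self$id <- id
    NULL  # __init__ must return None; NULL is converted to None
  },
  get_email = function(self) {
    paste0(self$name, "_", self$id, "@company.com")
  }
))

Mike <- Employee("Mike", "1234")
email <- Mike$get_email()
print(class(email))  # an R class, since PyClass converts results
```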

However, PyClass does not allow you to disable automatic conversion of Python objects to R, which means that any method defined in the class returns an R object whenever possible. This can be frustrating for data analysis tasks that rely on object indexing, since R indexing starts at 1 while Python indexing starts at 0. The py_class() function lets you control this behavior through the convert argument.
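To illustrate the indexing mismatch, here is a small sketch with plain reticulate, assuming a working Python 3: the same first element is reached with key 0 on the Python side and with [[1]] once the list has been converted to R. py_get_item() is reticulate's helper for Python-style item access.

```r
library(reticulate)

# Import the builtins without conversion so results stay Python objects.
bt <- import_builtins(convert = FALSE)

# A Python list that remains a Python object on the R side.
xs <- bt$list(list("a", "b", "c"))

# Python-side indexing is zero-based: key 0L is the first element.
first_py <- py_to_r(py_get_item(xs, 0L))

# After conversion to R, indexing is one-based: [[1]] is the first element.
first_r <- py_to_r(xs)[[1]]
```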

Employee <- py_class("Employee", convert = FALSE,
                     `__init__` = function(self, name, id) {
                       self$name <- name
                       self$id <- id
                       return(py_builtins$None)
                     },
                     get_email = function(self) {
                       paste0(self$name, "_", self$id, "@company.com")
                     })
Mike <- Employee("Mike", "1234")
Mike$get_email()
#> 'Mike_1234@company.com'
Mike$get_email() |> class()
#> [1] "python.builtin.str"    "python.builtin.object"

2. Turn automatic conversion on or off with py_convert_on() and py_convert_off()

py_convert_on() enables automatic conversion of Python objects to R, while py_convert_off() disables it. Note that for this to work, the Python object must be represented in R as an environment; this typically includes Python modules and object instances, but not Python classes.

Mike$get_email() |> class()
#> [1] "python.builtin.str"    "python.builtin.object"

py_convert_on(Mike)
Mike$get_email() |> class()
#> [1] "character"

py_convert_off(Mike)
Mike$get_email() |> class()
#> [1] "python.builtin.str"    "python.builtin.object"
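In plain reticulate, the closest tool is py_to_r(), which converts a single value rather than switching conversion on for an object. A minimal sketch, assuming a working Python 3; the is.environment() check shows that a module is the kind of object the paragraph above says these toggles apply to.

```r
library(reticulate)

# An unconverted handle on the builtins, so results stay Python objects.
bt <- import_builtins(convert = FALSE)
print(is.environment(bt))  # modules are represented as R environments

s <- bt$str(1234L)  # a Python str, not an R character vector
print(class(s))

# py_to_r() converts only this value; later calls through `bt` still
# return Python objects, unlike toggling conversion on `bt` itself.
r_s <- py_to_r(s)
print(class(r_s))
```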

3. Call the superclass initializer with py_super()$`__init__`() or py_super_init()

In reticulate, there is no formal documentation on how to call a superclass initializer when defining __init__ via PyClass. Inspecting its source reveals that PyClass injects a super() function into the environment of each class method. This allows you to call the superclass initializer with super()$`__init__`(). The scrubwren package makes this explicit by re-exporting super() as py_super() and providing a convenient wrapper, py_super_init(), for py_super()$`__init__`().

Salary <- py_class("Salary", inherit = Employee, convert = FALSE,
                   `__init__` = function(self, name, id, salary) {
                     py_super_init(name, id)
                     self$salary <- salary
                     return(py_builtins$None)
                   },
                   get_salary_summary = function(self) {
                     list(ID = self$id,
                          Name = self$name,
                          Email = self$get_email(),
                          Salary = self$salary)
                   })

Mike_salary <- Salary("Mike", "1234", 1000)
Mike_salary$get_salary_summary()
#> {'ID': '1234', 'Name': 'Mike', 'Email': 'Mike_1234@company.com', 'Salary': 1000.0}
Mike_salary$get_email()
#> 'Mike_1234@company.com'
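For comparison, here is a plain-reticulate sketch of the same inheritance pattern. It relies on the super() helper that, as described above, PyClass places in each method's environment; because this comes from reticulate's internals rather than its documented API, it may change between versions. It assumes a working Python 3.

```r
library(reticulate)

# Parent class: stores name and id on the instance.
Employee <- PyClass("Employee", defs = list(
  `__init__` = function(self, name, id) {
    self$name <- name
    self$id <- id
    NULL  # __init__ must return None; NULL is converted to None
  }
))

# Child class: delegates name/id setup to the parent initializer.
Salary <- PyClass("Salary", inherit = Employee, defs = list(
  `__init__` = function(self, name, id, salary) {
    # super() is the helper PyClass injects into each method's environment.
    super()$`__init__`(name, id)
    self$salary <- salary
    NULL
  }
))

m <- Salary("Mike", "1234", 1000)
```

Here py_super_init(name, id) in scrubwren plays the role of the super()$`__init__`(name, id) line.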