scrubwren
scrubwren stands for Specialized Collection of Reticulate Utilities: Better WRappers for a smoother ExperieNce. Its logo features the White-browed Scrubwren, a common bird in eastern and southeastern Australia.
This package provides a set of R wrappers and helper functions for reticulate, designed to reduce the friction users often encounter when using Python deep learning libraries from R. While reticulate is powerful, some aspects can feel unintuitive or cumbersome, particularly for first-time users. At times, it even seems easier to write a Python function and run it via system() than to wrestle with reticulate, especially when you find yourself typing far more in R than you would in Python to achieve the same result.
If you have an idea for a Python feature that could be wrapped more cleanly for use in R, feel free to open a GitHub issue. Suggestions with a potential implementation approach are welcome, but even ideas without a concrete proposal are appreciated.
Installation
You can install the development version of scrubwren from GitHub with:
# install.packages("remotes")
remotes::install_github("TengMCing/scrubwren")
1. Explicitly initialized Python session
In reticulate, only one Python session can be bound to the current R session, and the binding happens silently when you call most reticulate functions. py_init() makes this process explicit. The interpreter path can be supplied via the python_path argument; otherwise, the interpreter is discovered using reticulate::py_discover_config().
py_init()
#> ℹ Initialized Python 3.11 from '/Users/patrickli/.virtualenvs/tf/bin/python'.
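As a minimal sketch of the explicit form, the interpreter that reticulate would pick can be inspected first and then passed to py_init() through python_path. This assumes scrubwren and reticulate are installed and that at least one Python installation is discoverable; the variable name config is only illustrative.

```r
library(scrubwren)

# Ask reticulate which interpreter it would choose, without binding it yet.
config <- reticulate::py_discover_config()
config$python

# Bind that interpreter explicitly via the python_path argument.
py_init(python_path = config$python)
```

Any other interpreter path, for example one inside a virtual environment, could be passed in the same way.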
2. Accessible Python built-in functions
In reticulate, if you would like to directly use Python built-in functions like type(), you first need to import them with reticulate::import_builtins() into an R variable, and then call them from there. According to the reticulate documentation, this is primarily because of the difference in the set of built-in functions between Python 2 and 3. In scrubwren, the built-in functions are loaded automatically into py_builtins and are ready to use. This happens when you load the package, so we don’t have the name conflicts problem.
When the built-in functions are imported, a message is issued telling you where they were imported from. However, if you set reticulate to use a different Python interpreter, you need to re-import the built-ins.
names(py_builtins)[1:5]
#> [1] "abs" "aiter" "all" "anext" "any"
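For comparison, here is a small sketch of the two routes to a built-in such as len(). The first half uses only reticulate; the second half assumes py_builtins has been populated as described above. The variable name builtins is illustrative.

```r
library(reticulate)

# Plain reticulate: import the built-ins module into an R variable first.
builtins <- import_builtins()
builtins$len(list(1, 2, 3))

# scrubwren: the built-ins are already available under py_builtins.
library(scrubwren)
py_builtins$len(list(1, 2, 3))
```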
1. Define a Python class with py_class()
In reticulate, Python class definitions must be provided as a list via the defs argument in PyClass. With py_class(), you can instead supply them as regular function arguments. You can still specify classname as a character string and inherit as a list of Python objects, as usual.
However, PyClass does not allow you to disable automatic conversion of Python objects to R, which means that any method defined in the class will return an R object whenever possible. This can be frustrating for data analysis tasks that rely on object indexing, since R starts at 1 while Python starts at 0. The py_class() function lets you control this behavior through the convert argument.
Employee <- py_class("Employee", convert = FALSE,
  `__init__` = function(self, name, id) {
    self$name <- name
    self$id <- id
    return(py_builtins$None)
  },
  get_email = function(self) {
    paste0(self$name, "_", self$id, "@company.com")
  })
Mike <- Employee("Mike", "1234")
Mike$get_email()
#> 'Mike_1234@company.com'
Mike$get_email() |> class()
#> [1] "python.builtin.str" "python.builtin.object"
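For contrast, the same class written with plain reticulate might look like the sketch below, where the methods are collected in a list passed to defs. The names Employee_rt and mike_rt are illustrative and are not part of either package.

```r
library(reticulate)

# Plain reticulate: methods go in a list passed to `defs`.
Employee_rt <- PyClass("Employee", defs = list(
  `__init__` = function(self, name, id) {
    self$name <- name
    self$id <- id
    NULL
  },
  get_email = function(self) {
    paste0(self$name, "_", self$id, "@company.com")
  }
))

mike_rt <- Employee_rt("Mike", "1234")

# Conversion cannot be switched off here, so the result is an R character vector.
class(mike_rt$get_email())
```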
2. Turning automatic conversion on/off with py_convert_on() / py_convert_off()
py_convert_on() enables automatic conversion of Python objects to R, while py_convert_off() disables it. Note that the Python object must be represented as an R environment for this to work. This typically includes Python modules and object instances, but not Python classes.
Mike$get_email() |> class()
#> [1] "python.builtin.str" "python.builtin.object"
py_convert_on(Mike)
Mike$get_email() |> class()
#> [1] "character"
py_convert_off(Mike)
Mike$get_email() |> class()
#> [1] "python.builtin.str" "python.builtin.object"
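To illustrate what "conversion" means at the reticulate level, the sketch below imports the Python built-ins with conversion disabled and then converts a result by hand with py_to_r(). It uses only reticulate functions; the variable names are illustrative.

```r
library(reticulate)

# With convert = FALSE, results stay as Python object references.
builtins_raw <- import_builtins(convert = FALSE)
n <- builtins_raw$len(list(1, 2))
class(n)

# Convert manually when an R value is needed.
py_to_r(n)
```

In plain reticulate this choice is made when a module is imported, whereas py_convert_on() and py_convert_off() toggle it on an existing object.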
3. Call the superclass initializer with py_super()$`__init__`() or py_super_init()
In reticulate, there is no formal documentation on how to call a superclass initializer when defining __init__ via PyClass. Inspecting its source reveals that PyClass injects a super() function into the environment of each class method. This allows you to call the superclass initializer with super()$`__init__`(). The scrubwren package makes this explicit by re-exporting super() as py_super() and providing a convenient wrapper, py_super_init(), for py_super()$`__init__`().
Salary <- py_class("Salary", inherit = Employee, convert = FALSE,
  `__init__` = function(self, name, id, salary) {
    py_super_init(name, id)
    self$salary <- salary
    return(py_builtins$None)
  },
  get_salary_summary = function(self) {
    list(ID = self$id,
         Name = self$name,
         Email = self$get_email(),
         Salary = self$salary)
  })
Mike_salary <- Salary("Mike", "1234", 1000)
Mike_salary$get_salary_summary()
#> {'ID': '1234', 'Name': 'Mike', 'Email': 'Mike_1234@company.com', 'Salary': 1000.0}
Mike_salary$get_email()
#> 'Mike_1234@company.com'
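For reference, the injected super() described above can be used directly in plain reticulate, as in this minimal sketch. The classes Base and Derived are hypothetical stand-ins and are unrelated to the Employee example.

```r
library(reticulate)

Base <- PyClass("Base", defs = list(
  `__init__` = function(self, name) {
    self$name <- name
    NULL
  }
))

Derived <- PyClass("Derived", inherit = Base, defs = list(
  `__init__` = function(self, name, salary) {
    # super() is made available inside class methods by PyClass.
    super()$`__init__`(name)
    self$salary <- salary
    NULL
  }
))

d <- Derived("Mike", 1000)
d$name
d$salary
```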