+ - 0:00:00
Notes for current slide
Notes for next slide

Making your own OOP system in R

Patrick Li

Data Science Research Software Study Group (a.k.a. Brown Bag Session)

2022-08-09

1 / 24

Abstraction

Inspired by the CppCon 2019 talk - Chandler Carruth “There Are No Zero-cost Abstractions”

In R, we use abstraction to describe data and data analysis procedure

2 / 24

Abstraction

Inspired by the CppCon 2019 talk - Chandler Carruth “There Are No Zero-cost Abstractions”

In R, we use abstraction to describe data and data analysis procedure

  • memory to represent complex data structures

For example, this is how R internal accesses integers from an integer vector

This C macro is adopted from: r-source/src/include/Defn.h
Treat every 4 bytes of a memory block as a signed integer
↓~~~~~↓
#define INTEGER(x) (int *) (((SEXPREC_ALIGN *) x) + 1)
↑~~~~~~~~~~~~~~~~~~~~~~~~~↑
Use pointer arithmetic to skip the header, e.g. vector length
x INTEGER(x)[0] INTEGER(x)[1]
↓ ↓ ↓
*--------*---------------*---------------*
Memomry Model: | Header | 4 bytes (int) | 4 bytes (int) | ...
*--------*---------------*---------------*
2 / 24

Abstraction

Inspired by the CppCon 2019 talk - Chandler Carruth “There Are No Zero-cost Abstractions”

In R, we use abstraction to describe data and data analysis procedure

  • memory to represent complex data structures
  • functions to group code that performs a specific task together

A untested and inefficient Fibonacci number generator

fib <- function(k) ifelse(k > 2, fib(k - 1) + fib(k - 2), k - 1)
2 / 24

Abstraction

Inspired by the CppCon 2019 talk - Chandler Carruth “There Are No Zero-cost Abstractions”

In R, we use abstraction to describe data and data analysis procedure

  • memory to represent complex data structures
  • functions to group code that performs a specific task together
  • Loops to perform operations repetitively over a container

Bubble sort

x <- rnorm(100)
for (i in 1:(length(x) - 1))
for (j in 1:(length(x) - i))
if (x[j] > x[j + 1])
{
tmp_x <- x[j]
x[j] <- x[j + 1]
x[j + 1] <- tmp_x
}
2 / 24

Abstraction

Inspired by the CppCon 2019 talk - Chandler Carruth “There Are No Zero-cost Abstractions”

There are no zero-cost abstraction. Abstractions have run time, build time, and human costs.

  • Every R expression needs to be parsed, compiled and interpreted by the R Runtime.

Every function call has a run time cost

x <- y <- 2
foo <- function(x, y) x * y
microbenchmark::microbenchmark(x * y, foo(x, y), times = 10000)
## Unit: nanoseconds
## expr min lq mean median uq max neval cld
## x * y 0 42 53.809 42 43 9834 10000 a
## foo(x, y) 167 250 1809.937 251 292 15172375 10000 a
3 / 24

Abstraction

Inspired by the CppCon 2019 talk - Chandler Carruth “There Are No Zero-cost Abstractions”

There are no zero-cost abstraction. Abstractions have run time, build time, and human costs.

  • Every R expression needs to be parsed, compiled and interpreted by the R Runtime.
  • Zero run time and build time costs doesn't necessary mean zero human costs.
3 / 24

Abstraction

Inspired by the CppCon 2019 talk - Chandler Carruth “There Are No Zero-cost Abstractions”

There are no zero-cost abstraction. Abstractions have run time, build time, and human costs.

  • Every R expression needs to be parsed, compiled and interpreted by the R Runtime.
  • Zero run time and build time costs doesn't necessary mean zero human costs.

Each abstraction must provide more benefit than cost.

3 / 24

Abstraction

Inspired by the CppCon 2019 talk - Chandler Carruth “There Are No Zero-cost Abstractions”

In R, we use abstraction to describe data and data analysis procedure

  • memory to represent complex data structures
  • functions to group code that performs a specific task together
  • Loops to perform operations repetitively over a container

OOP to group instructions with the state they operate on?

4 / 24

Abstraction

Inspired by the CppCon 2019 talk - Chandler Carruth “There Are No Zero-cost Abstractions”

In R, we use abstraction to describe data and data analysis procedure

  • memory to represent complex data structures
  • functions to group code that performs a specific task together
  • Loops to perform operations repetitively over a container

OOP to group instructions with the state they operate on?

Sure, as long as it produces more human-readable code with acceptable overhead.

4 / 24

OOP

OOP stands for object-oriented programming, which is a programming paradigm based on objects. Usually, object is defined as a special type of data structure that can hold both attributes (data) and methods (associated behaviours).

R has built-in OOP systems

  • S3
  • S4

There is another popular OOP system R6 developed by Winston Chang.

More details can be found in Advanced R.

We will not focus on comparing them in this talk.

5 / 24

bandicoot

Some programming languages are fundamentally built upon OOP such as Python.

6 / 24

bandicoot

Some programming languages are fundamentally built upon OOP such as Python.

bandicoot tries to provide a set of tools to build Python-like OOP system in R.

devtools::install_github("TengMCing/bandicoot")
library(bandicoot)
6 / 24

bandicoot

Some programming languages are fundamentally built upon OOP such as Python.

bandicoot tries to provide a set of tools to build Python-like OOP system in R.

devtools::install_github("TengMCing/bandicoot")
library(bandicoot)
  • started in last year's Christmas holiday
  • originally built for defining hierarchical structures of models used in visual inference experiments
6 / 24

bandicoot

Some programming languages are fundamentally built upon OOP such as Python.

bandicoot tries to provide a set of tools to build Python-like OOP system in R.

devtools::install_github("TengMCing/bandicoot")
library(bandicoot)
  • started in last year's Christmas holiday
  • originally built for defining hierarchical structures of models used in visual inference experiments

bandicoot used environment to emulate the OOP system

which is similar to R6, but built differerntly

6 / 24

environment

Every R function associates with an environment.

environment(function(){})
## <environment: R_GlobalEnv>
7 / 24

environment

Every R function associates with an environment.

environment(function(){})
## <environment: R_GlobalEnv>

An environment can reference itself with proper setup.

env <- new.env()
env$self <- env
identical(env, env$self)
## [1] TRUE
7 / 24

environment

Every R function associates with an environment.

environment(function(){})
## <environment: R_GlobalEnv>

An environment can reference itself with proper setup.

env <- new.env()
env$self <- env
identical(env, env$self)
## [1] TRUE

A function can access its enclosing environment in the function body.

env$foo <- function() self
environment(env$foo) <- env
identical(env, env$foo())
## [1] TRUE
7 / 24

environment

To prevent function from directly accessing attributes other than self, an additional environment is needed with the parent environment to be the same.

env <- new.env()
env$x <- 1
env$..method_env.. <- new.env(parent = parent.env(env))
env$..method_env..$self <- env
lobstr::tree(env)
## <environment: 0x7f98d6735250>
## ├─x: 1
## └─..method_env..: <environment: 0x7f98d67d6fe8>
## └─self: <environment: 0x7f98d6735250> (Already seen)
8 / 24

environment

To prevent function from directly accessing attributes other than self, an additional environment is needed with the parent environment to be the same.

env <- new.env()
env$x <- 1
env$..method_env.. <- new.env(parent = parent.env(env))
env$..method_env..$self <- env
lobstr::tree(env)
## <environment: 0x7f98d6735250>
## ├─x: 1
## └─..method_env..: <environment: 0x7f98d67d6fe8>
## └─self: <environment: 0x7f98d6735250> (Already seen)
env$foo <- function() x
environment(env$foo) <- env$..method_env..
try(env$foo())
## Error in env$foo() : object 'x' not found
8 / 24

environment

To prevent function from directly accessing attributes other than self, an additional environment is needed with the parent environment to be the same.

env <- new.env()
env$x <- 1
env$..method_env.. <- new.env(parent = parent.env(env))
env$..method_env..$self <- env
lobstr::tree(env)
## <environment: 0x7f98d6735250>
## ├─x: 1
## └─..method_env..: <environment: 0x7f98d67d6fe8>
## └─self: <environment: 0x7f98d6735250> (Already seen)
env$foo2 <- function() self$x
environment(env$foo2) <- env$..method_env..
try(env$foo2())
## [1] 1
8 / 24

register_method()

All this can be done by register_method()

env <- new.env()
env$x <- 1
register_method(env, foo = function() self$x)
lobstr::tree(env)
## <environment: 0x7f98e554f848>
## ├─x: 1
## ├─foo: function()
## └─..method_env..: <environment: 0x7f98f37b4878>
## └─self: <environment: 0x7f98e554f848> (Already seen)
9 / 24

register_method()

The container and the self pointer name can be customized

env <- new.env()
foo2 <- function() this$x + 1
register_method(env, foo_two = foo2, foo_three = foo2,
container_name = "container", self_name = "this")
lobstr::tree(env)
## <environment: 0x7f98d4fc11e8>
## ├─container: <environment: 0x7f98d535ded0>
## │ └─this: <environment: 0x7f98d4fc11e8> (Already seen)
## ├─foo_two: function()
## └─foo_three: function()
10 / 24

BASE class

In Python, there is an object class, which provides essential attributes and methods for the OOP system (check data model for more details).

In bandicoot, BASE class is the default object class, but you can write your own if you want advanced features.

names(BASE)
## [1] "..mro.." "..str.." "..len.." "..class.."
## [5] "..new.." "..repr.." "del_attr" "has_attr"
## [9] "set_attr" "get_attr" "..type.." "..dir.."
## [13] "..methods.." "..method_env.." "..init.." "..instantiated.."
## [17] "..class_tree.." "instantiate"
11 / 24

new_class()

new_class() is used for defining a new class, including new object class.

new_class(BASE, class_name = "CAT")
##
## ── <CAT class>
  • bandicoot: static method dispatch. Method that needs to be called is decided at "build" time.
  • Python: dynamic method dispatch. Method that needs to be called is decided at run time.

Primary concern is the overhead of dynamic method lookup and the difficulty of managing relationships between saved environments.

12 / 24

Features of BASE

String representation of the object

CAT <- new_class(BASE, class_name = "CAT")
CAT$..str..()
## [1] "<CAT class>"
13 / 24

Features of BASE

Parent class and the object class name

CAT$..bases..
## [1] "BASE"
CAT$..type..
## [1] "CAT"
13 / 24

Features of BASE

Class constructor and initializer (a virtual function)

CAT$..new..()
##
## ── <CAT object>
CAT$..init..
## function(...) return(invisible(self))
## <environment: 0x7f98e03970e8>
13 / 24

Features of BASE

Class instantiation

little_cat <- CAT$instantiate()
little_cat$..str..()
## [1] "<CAT object>"

"Official" String representation of the object

little_cat$..repr..()
## [1] "CAT$instantiate()"
13 / 24

Workflow

This is a simple STAFF class defined in Python

class STAFF(object):
def __init__(self, name, age):
self.name = name
self.age = age
def get_email(self):
return f"ʕ•́ᴥ•̀ʔっ♡{self.name}@company.com"
14 / 24

Workflow

Define a class with class description (a class factory)

class_STAFF <- function(env = new.env(parent = parent.frame())) {
new_class(BASE, env = env, class_name = "STAFF")
init_ <- function(name, age) {
self$name <- name
self$age <- age
}
get_email_ <- function() {
glue::glue("ʕ•́ᴥ•̀ʔっ♡{self$name}@company.com")
}
register_method(env, ..init.. = init_, get_email = get_email_)
return(env)
}
STAFF <- class_STAFF()
15 / 24

Workflow

1. New an environment env if it is not provided

class_STAFF <- function(env = new.env(parent = parent.frame())) {
new_class(BASE, env = env, class_name = "STAFF")
init_ <- function(name, age) {
self$name <- name
self$age <- age
}
get_email_ <- function() {
glue::glue("ʕ•́ᴥ•̀ʔっ♡{self$name}@company.com")
}
register_method(env, ..init.. = init_, get_email = get_email_)
return(env)
}
STAFF <- class_STAFF()
16 / 24

Workflow

2. New a class STAFF in the environment env

class_STAFF <- function(env = new.env(parent = parent.frame())) {
new_class(BASE, env = env, class_name = "STAFF")
init_ <- function(name, age) {
self$name <- name
self$age <- age
}
get_email_ <- function() {
glue::glue("ʕ•́ᴥ•̀ʔっ♡{self$name}@company.com")
}
register_method(env, ..init.. = init_, get_email = get_email_)
return(env)
}
STAFF <- class_STAFF()
17 / 24

Workflow

3. Define the initializer ..init.. to capture name and age

class_STAFF <- function(env = new.env(parent = parent.frame())) {
new_class(BASE, env = env, class_name = "STAFF")
init_ <- function(name, age) {
self$name <- name
self$age <- age
}
get_email_ <- function() {
glue::glue("ʕ•́ᴥ•̀ʔっ♡{self$name}@company.com")
}
register_method(env, ..init.. = init_, get_email = get_email_)
return(env)
}
STAFF <- class_STAFF()
18 / 24

Workflow

4. Define a method get_email to get the email address

class_STAFF <- function(env = new.env(parent = parent.frame())) {
new_class(BASE, env = env, class_name = "STAFF")
init_ <- function(name, age) {
self$name <- name
self$age <- age
}
get_email_ <- function() {
glue::glue("ʕ•́ᴥ•̀ʔっ♡{self$name}@company.com")
}
register_method(env, ..init.. = init_, get_email = get_email_)
return(env)
}
STAFF <- class_STAFF()
19 / 24

Workflow

5. Register methods

class_STAFF <- function(env = new.env(parent = parent.frame())) {
new_class(BASE, env = env, class_name = "STAFF")
init_ <- function(name, age) {
self$name <- name
self$age <- age
}
get_email_ <- function() {
glue::glue("ʕ•́ᴥ•̀ʔっ♡{self$name}@company.com")
}
register_method(env, ..init.. = init_, get_email = get_email_)
return(env)
}
STAFF <- class_STAFF()
20 / 24

Workflow

6. Return the class/environment

class_STAFF <- function(env = new.env(parent = parent.frame())) {
new_class(BASE, env = env, class_name = "STAFF")
init_ <- function(name, age) {
self$name <- name
self$age <- age
}
get_email_ <- function() {
glue::glue("ʕ•́ᴥ•̀ʔっ♡{self$name}@company.com")
}
register_method(env, ..init.. = init_, get_email = get_email_)
return(env)
}
STAFF <- class_STAFF()
21 / 24

Workflow

7. Build an instance of class STAFF

Patrick <- STAFF$instantiate(name = "Patrick", age = 18)

8. Get name, age, and email of the object

Patrick$name
## [1] "Patrick"
Patrick$age
## [1] 18
Patrick$get_email()
## ʕ•́ᴥ•̀ʔっ♡Patrick@company.com
22 / 24

More realistic examples

yi=xi+ei,eiN(0,1+xi2),i=1,...,n.

library(visage)
z <- rand_normal(mu = 0, sigma = 1)
x <- rand_uniform(-1, 1)
e <- closed_form(~sqrt(1 + x^2) * z)
y <- closed_form(~x + e)
y
##
## ── <CLOSED_FORM object>
## EXPR = x + e
## - x: <RAND_UNIFORM object>
## [a: -1, b: 1]
## - e: <CLOSED_FORM object>
## EXPR = sqrt(1 + x^2) * z
## - x: <RAND_UNIFORM object>
## [a: -1, b: 1]
## - z: <RAND_NORMAL object>
## [mu: 0, sigma: 1]
23 / 24

More realistic examples

y$gen(5, rhs_val = TRUE) |> y$as_dataframe()
## .lhs x z e
## 1 0.4491225 -0.4944114 0.8458046 0.9435338
## 2 0.8321905 -0.7577553 1.2672238 1.5899458
## 3 -2.4221872 -0.8848750 -1.1512932 -1.5373122
## 4 0.3525353 0.1907469 0.1589231 0.1617884
## 5 -1.2473103 -0.1662772 -1.0663918 -1.0810331
24 / 24

Abstraction

Inspired by the CppCon 2019 talk - Chandler Carruth “There Are No Zero-cost Abstractions”

In R, we use abstraction to describe data and data analysis procedure

2 / 24
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow