class: center, middle, inverse, title-slide #
Making your own OOP system in
R
### Patrick Li ### Data Science Research Software Study Group (a.k.a. Brown Bag Session) ### 2022-08-09 --- # Abstraction Inspired by the CppCon 2019 talk - Chandler Carruth “There Are No Zero-cost Abstractions” In `R`, we use **abstraction** to describe data and data analysis procedure --- count: false # Abstraction Inspired by the CppCon 2019 talk - Chandler Carruth “There Are No Zero-cost Abstractions” In `R`, we use **abstraction** to describe data and data analysis procedure - `memory` to represent complex data structures For example, this is how `R internal` accesses **integers** from an **integer vector** <!-- This C macro is adopted from: r-source/src/include/Defn.h --> ```markdown This C macro is adopted from: r-source/src/include/Defn.h Treat every 4 bytes of a memory block as a signed integer ↓~~~~~↓ #define INTEGER(x) (int *) (((SEXPREC_ALIGN *) x) + 1) ↑~~~~~~~~~~~~~~~~~~~~~~~~~↑ Use pointer arithmetic to skip the header, e.g. vector length x INTEGER(x)[0] INTEGER(x)[1] ↓ ↓ ↓ *--------*---------------*---------------* Memomry Model: | Header | 4 bytes (int) | 4 bytes (int) | ... *--------*---------------*---------------* ``` --- count: false # Abstraction Inspired by the CppCon 2019 talk - Chandler Carruth “There Are No Zero-cost Abstractions” In `R`, we use **abstraction** to describe data and data analysis procedure - `memory` to represent complex data structures - `functions` to group code that performs a specific task together A untested and inefficient Fibonacci number generator ```r fib <- function(k) ifelse(k > 2, fib(k - 1) + fib(k - 2), k - 1) ``` --- count: false # Abstraction Inspired by the CppCon 2019 talk - Chandler Carruth “There Are No Zero-cost Abstractions” In `R`, we use **abstraction** to describe data and data analysis procedure - `memory` to represent complex data structures - `functions` to group code that performs a specific task together - `Loops` to perform operations repetitively over a container Bubble sort ```r x <- rnorm(100) for (i in 1:(length(x) - 1)) for (j in 1:(length(x) - i)) if (x[j] > x[j + 1]) { tmp_x <- x[j] x[j] <- x[j + 1] x[j + 1] <- tmp_x } ``` --- # Abstraction Inspired by the CppCon 2019 talk - Chandler Carruth “There Are No Zero-cost Abstractions” > There are no zero-cost abstraction. Abstractions have **run time**, **build time**, and **human costs**. - Every `R` expression needs to be **parsed**, **compiled** and **interpreted** by the `R` Runtime. Every function call has a run time cost ```r x <- y <- 2 foo <- function(x, y) x * y microbenchmark::microbenchmark(x * y, foo(x, y), times = 10000) ``` ``` ## Unit: nanoseconds ## expr min lq mean median uq max neval cld ## x * y 0 42 53.809 42 43 9834 10000 a ## foo(x, y) 167 250 1809.937 251 292 15172375 10000 a ``` --- count: false # Abstraction Inspired by the CppCon 2019 talk - Chandler Carruth “There Are No Zero-cost Abstractions” > There are no zero-cost abstraction. Abstractions have **run time**, **build time**, and **human costs**. - Every `R` expression needs to be **parsed**, **compiled** and **interpreted** by the `R` Runtime. - Zero run time and build time costs doesn't necessary mean zero human costs. --- count: false # Abstraction Inspired by the CppCon 2019 talk - Chandler Carruth “There Are No Zero-cost Abstractions” > There are no zero-cost abstraction. Abstractions have **run time**, **build time**, and **human costs**. - Every `R` expression needs to be **parsed**, **compiled** and **interpreted** by the `R` Runtime. - Zero run time and build time costs doesn't necessary mean zero human costs. > Each abstraction must provide more benefit than cost. --- # Abstraction Inspired by the CppCon 2019 talk - Chandler Carruth “There Are No Zero-cost Abstractions” In `R`, we use **abstraction** to describe data and data analysis procedure - `memory` to represent complex data structures - `functions` to group code that performs a specific task together - `Loops` to perform operations repetitively over a container **OOP** to group instructions with the state they operate on? --- count: false # Abstraction Inspired by the CppCon 2019 talk - Chandler Carruth “There Are No Zero-cost Abstractions” In `R`, we use **abstraction** to describe data and data analysis procedure - `memory` to represent complex data structures - `functions` to group code that performs a specific task together - `Loops` to perform operations repetitively over a container **OOP** to group instructions with the state they operate on? Sure, as long as it produces more **human-readable code** with **acceptable overhead**. --- # OOP **OOP** stands for **object-oriented programming**, which is a programming paradigm based on **objects**. Usually, object is defined as a special type of data structure that can hold both **attributes** (data) and **methods** (associated behaviours). `R` has built-in **OOP** systems - `S3` - `S4` There is another popular **OOP** system `R6` developed by Winston Chang. More details can be found in [Advanced R](https://adv-r.hadley.nz/oo.html). We will not focus on comparing them in this talk. --- # `bandicoot` Some programming languages are fundamentally built upon **OOP** such as `Python`. -- `bandicoot` tries to provide a set of tools to build `Python-like` **OOP** system in `R`. ```r devtools::install_github("TengMCing/bandicoot") library(bandicoot) ``` -- - started in last year's Christmas holiday - originally built for defining hierarchical structures of models used in visual inference experiments -- ### `bandicoot` used `environment` to emulate the **OOP** system #### which is similar to `R6`, but built differerntly --- # `environment` Every `R` `function` associates with an `environment`. ```r environment(function(){}) ``` ``` ## <environment: R_GlobalEnv> ``` -- An `environment` can reference itself with proper setup. ```r env <- new.env() *env$self <- env identical(env, env$self) ``` ``` ## [1] TRUE ``` -- A `function` can access its `enclosing environment` in the `function body`. ```r env$foo <- function() self *environment(env$foo) <- env identical(env, env$foo()) ``` ``` ## [1] TRUE ``` --- # `environment` To prevent function from directly accessing attributes other than `self`, an additional `environment` is needed with the `parent environment` to be the same. ```r env <- new.env() env$x <- 1 env$..method_env.. <- new.env(parent = parent.env(env)) env$..method_env..$self <- env lobstr::tree(env) ``` ``` ## <environment: 0x7f98d6735250> ## ├─x: 1 ## └─..method_env..: <environment: 0x7f98d67d6fe8> ## └─self: <environment: 0x7f98d6735250> (Already seen) ``` -- ```r env$foo <- function() x environment(env$foo) <- env$..method_env.. try(env$foo()) ``` ``` ## Error in env$foo() : object 'x' not found ``` --- count: false # `environment` To prevent function from directly accessing attributes other than `self`, an additional `environment` is needed with the `parent environment` to be the same. ```r env <- new.env() env$x <- 1 env$..method_env.. <- new.env(parent = parent.env(env)) env$..method_env..$self <- env lobstr::tree(env) ``` ``` ## <environment: 0x7f98d6735250> ## ├─x: 1 ## └─..method_env..: <environment: 0x7f98d67d6fe8> ## └─self: <environment: 0x7f98d6735250> (Already seen) ``` ```r env$foo2 <- function() self$x environment(env$foo2) <- env$..method_env.. try(env$foo2()) ``` ``` ## [1] 1 ``` --- # `register_method()` All this can be done by `register_method()` ```r env <- new.env() env$x <- 1 register_method(env, foo = function() self$x) lobstr::tree(env) ``` ``` ## <environment: 0x7f98e554f848> ## ├─x: 1 ## ├─foo: function() ## └─..method_env..: <environment: 0x7f98f37b4878> ## └─self: <environment: 0x7f98e554f848> (Already seen) ``` --- # `register_method()` The container and the self pointer name can be customized ```r env <- new.env() foo2 <- function() this$x + 1 register_method(env, foo_two = foo2, foo_three = foo2, container_name = "container", self_name = "this") lobstr::tree(env) ``` ``` ## <environment: 0x7f98d4fc11e8> ## ├─container: <environment: 0x7f98d535ded0> ## │ └─this: <environment: 0x7f98d4fc11e8> (Already seen) ## ├─foo_two: function() ## └─foo_three: function() ``` --- # `BASE` class In `Python`, there is an `object` class, which provides essential attributes and methods for the OOP system (check [data model](https://docs.python.org/3/reference/datamodel.html) for more details). In `bandicoot`, `BASE` class is the default `object` class, but you can write your own if you want advanced features. ```r names(BASE) ``` ``` ## [1] "..mro.." "..str.." "..len.." "..class.." ## [5] "..new.." "..repr.." "del_attr" "has_attr" ## [9] "set_attr" "get_attr" "..type.." "..dir.." ## [13] "..methods.." "..method_env.." "..init.." "..instantiated.." ## [17] "..class_tree.." "instantiate" ``` --- # `new_class()` `new_class()` is used for defining a new class, including new `object` class. ```r new_class(BASE, class_name = "CAT") ``` ``` ## ``` ``` ## ── <CAT class> ``` - `bandicoot`: **static method dispatch**. Method that needs to be called is decided at "build" time. - `Python`: **dynamic method dispatch**. Method that needs to be called is decided at run time. Primary concern is the overhead of **dynamic method lookup** and the difficulty of managing relationships between saved environments. --- # Features of `BASE` String representation of the object ```r CAT <- new_class(BASE, class_name = "CAT") CAT$..str..() ``` ``` ## [1] "<CAT class>" ``` --- count: false # Features of `BASE` Parent class and the object class name ```r CAT$..bases.. ``` ``` ## [1] "BASE" ``` ```r CAT$..type.. ``` ``` ## [1] "CAT" ``` --- count: false # Features of `BASE` Class **constructor** and **initializer** (a **virtual function**) ```r CAT$..new..() ``` ``` ## ``` ``` ## ── <CAT object> ``` ```r CAT$..init.. ``` ``` ## function(...) return(invisible(self)) ## <environment: 0x7f98e03970e8> ``` --- count: false # Features of `BASE` Class instantiation ```r little_cat <- CAT$instantiate() little_cat$..str..() ``` ``` ## [1] "<CAT object>" ``` "Official" String representation of the object ```r little_cat$..repr..() ``` ``` ## [1] "CAT$instantiate()" ``` --- # Workflow This is a simple `STAFF` class defined in `Python` ```python class STAFF(object): def __init__(self, name, age): self.name = name self.age = age def get_email(self): return f"ʕ•́ᴥ•̀ʔっ♡{self.name}@company.com" ``` --- # Workflow Define a class with class description (a class factory) ```r class_STAFF <- function(env = new.env(parent = parent.frame())) { new_class(BASE, env = env, class_name = "STAFF") init_ <- function(name, age) { self$name <- name self$age <- age } get_email_ <- function() { glue::glue("ʕ•́ᴥ•̀ʔっ♡{self$name}@company.com") } register_method(env, ..init.. = init_, get_email = get_email_) return(env) } STAFF <- class_STAFF() ``` --- # Workflow #### 1. New an environment `env` if it is not provided ```r *class_STAFF <- function(env = new.env(parent = parent.frame())) { new_class(BASE, env = env, class_name = "STAFF") init_ <- function(name, age) { self$name <- name self$age <- age } get_email_ <- function() { glue::glue("ʕ•́ᴥ•̀ʔっ♡{self$name}@company.com") } register_method(env, ..init.. = init_, get_email = get_email_) return(env) } STAFF <- class_STAFF() ``` --- # Workflow #### 2. New a class `STAFF` in the environment `env` ```r class_STAFF <- function(env = new.env(parent = parent.frame())) { * new_class(BASE, env = env, class_name = "STAFF") init_ <- function(name, age) { self$name <- name self$age <- age } get_email_ <- function() { glue::glue("ʕ•́ᴥ•̀ʔっ♡{self$name}@company.com") } register_method(env, ..init.. = init_, get_email = get_email_) return(env) } STAFF <- class_STAFF() ``` --- # Workflow #### 3. Define the initializer `..init..` to capture `name` and `age` ```r class_STAFF <- function(env = new.env(parent = parent.frame())) { new_class(BASE, env = env, class_name = "STAFF") * init_ <- function(name, age) { * self$name <- name * self$age <- age * } get_email_ <- function() { glue::glue("ʕ•́ᴥ•̀ʔっ♡{self$name}@company.com") } register_method(env, ..init.. = init_, get_email = get_email_) return(env) } STAFF <- class_STAFF() ``` --- # Workflow #### 4. Define a method `get_email` to get the email address ```r class_STAFF <- function(env = new.env(parent = parent.frame())) { new_class(BASE, env = env, class_name = "STAFF") init_ <- function(name, age) { self$name <- name self$age <- age } * get_email_ <- function() { * glue::glue("ʕ•́ᴥ•̀ʔっ♡{self$name}@company.com") * } register_method(env, ..init.. = init_, get_email = get_email_) return(env) } STAFF <- class_STAFF() ``` --- # Workflow #### 5. Register methods ```r class_STAFF <- function(env = new.env(parent = parent.frame())) { new_class(BASE, env = env, class_name = "STAFF") init_ <- function(name, age) { self$name <- name self$age <- age } get_email_ <- function() { glue::glue("ʕ•́ᴥ•̀ʔっ♡{self$name}@company.com") } * register_method(env, ..init.. = init_, get_email = get_email_) return(env) } STAFF <- class_STAFF() ``` --- # Workflow #### 6. Return the class/environment ```r class_STAFF <- function(env = new.env(parent = parent.frame())) { new_class(BASE, env = env, class_name = "STAFF") init_ <- function(name, age) { self$name <- name self$age <- age } get_email_ <- function() { glue::glue("ʕ•́ᴥ•̀ʔっ♡{self$name}@company.com") } register_method(env, ..init.. = init_, get_email = get_email_) * return(env) } STAFF <- class_STAFF() ``` --- # Workflow #### 7. Build an instance of class `STAFF` ```r Patrick <- STAFF$instantiate(name = "Patrick", age = 18) ``` #### 8. Get name, age, and email of the object ```r Patrick$name ``` ``` ## [1] "Patrick" ``` ```r Patrick$age ``` ``` ## [1] 18 ``` ```r Patrick$get_email() ``` ``` ## ʕ•́ᴥ•̀ʔっ♡Patrick@company.com ``` --- # More realistic examples `$$y_i = x_i + e_i, \quad e_i \sim N(0, 1+x_i^2), \quad i=1,...,n.$$` ```r library(visage) z <- rand_normal(mu = 0, sigma = 1) x <- rand_uniform(-1, 1) e <- closed_form(~sqrt(1 + x^2) * z) y <- closed_form(~x + e) y ``` ``` ## ``` ``` ## ── <CLOSED_FORM object> ## EXPR = x + e ## - x: <RAND_UNIFORM object> ## [a: -1, b: 1] ## - e: <CLOSED_FORM object> ## EXPR = sqrt(1 + x^2) * z ## - x: <RAND_UNIFORM object> ## [a: -1, b: 1] ## - z: <RAND_NORMAL object> ## [mu: 0, sigma: 1] ``` --- # More realistic examples ```r y$gen(5, rhs_val = TRUE) |> y$as_dataframe() ``` ``` ## .lhs x z e ## 1 0.4491225 -0.4944114 0.8458046 0.9435338 ## 2 0.8321905 -0.7577553 1.2672238 1.5899458 ## 3 -2.4221872 -0.8848750 -1.1512932 -1.5373122 ## 4 0.3525353 0.1907469 0.1589231 0.1617884 ## 5 -1.2473103 -0.1662772 -1.0663918 -1.0810331 ``` <!-- --- --> <!-- #### Please check [some of the classes I wrote](https://github.com/TengMCing/visage/tree/master/R) and the [reference page](https://tengmcing.github.io/visage/reference/index.html). --> <!-- #### So far I have written 10 classes for my visual inference research using this system. It works well for me and it is very easy to debug since the design is simple. -->