Inspired by the CppCon 2019 talk by Chandler Carruth, “There Are No Zero-cost Abstractions”
In R, we use abstractions to describe data and data analysis procedures:
- memory to represent complex data structures

For example, this is how R internally accesses integers in an integer vector:
```c
/* This C macro is adapted from r-source/src/include/Defn.h */

/* (int *): treat every 4 bytes of the memory block as a signed integer */
/* + 1:     use pointer arithmetic to skip the header, e.g. vector length */
#define INTEGER(x) ((int *) (((SEXPREC_ALIGN *) (x)) + 1))
```

```
                x        INTEGER(x)[0]   INTEGER(x)[1]
                ↓        ↓               ↓
                *--------*---------------*---------------*
Memory model:   | Header | 4 bytes (int) | 4 bytes (int) | ...
                *--------*---------------*---------------*
```
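The "4 bytes per signed integer" layout can be observed from R itself. The sketch below uses base R's writeBin() to serialise two integers into a raw vector and readBin() to read them back. Note that this shows the byte layout of the integer data only; it does not expose the SEXP header that INTEGER(x) skips.

```r
# Serialise two R integers to raw bytes (little-endian) to see that each
# integer occupies 4 bytes, stored in two's complement.
bytes <- writeBin(c(1L, -2L), raw(), size = 4L, endian = "little")
bytes
# [1] 01 00 00 00 fe ff ff ff

# Reinterpret every 4 bytes as a signed integer, much as INTEGER(x) does
readBin(bytes, what = "integer", n = 2L, size = 4L, endian = "little")
```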
- functions to group code that performs a specific task together

An untested and inefficient Fibonacci number generator:
```r
fib <- function(k) ifelse(k > 2, fib(k - 1) + fib(k - 2), k - 1)
```
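For contrast, here is a tested, iterative version. fib_iter() is a hypothetical name used only for illustration; it follows the same convention as fib() above (fib(1) = 0, fib(2) = 1) but avoids the exponential number of recursive calls.

```r
# Recursive version from the slide: the number of calls grows exponentially in k
fib <- function(k) ifelse(k > 2, fib(k - 1) + fib(k - 2), k - 1)

# Iterative version: k - 2 additions, no recursion
fib_iter <- function(k) {
  a <- 0  # fib(1)
  b <- 1  # fib(2)
  if (k == 1) return(a)
  for (i in seq_len(k - 2)) {
    tmp <- a + b
    a <- b
    b <- tmp
  }
  b
}

sapply(1:10, fib_iter)
# [1]  0  1  1  2  3  5  8 13 21 34
```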
- loops to perform operations repeatedly over a container

Bubble sort:
```r
x <- rnorm(100)
for (i in 1:(length(x) - 1))
  for (j in 1:(length(x) - i))
    if (x[j] > x[j + 1]) {
      # Swap adjacent elements that are out of order
      tmp_x <- x[j]
      x[j] <- x[j + 1]
      x[j + 1] <- tmp_x
    }
```
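Wrapping the loops in a function stacks a second abstraction on top of the first: it adds a call, but makes the sort reusable and easy to check against base R's sort(). bubble_sort() is a hypothetical name; the guard for vectors shorter than 2 is added because 1:(length(x) - 1) would otherwise count downwards.

```r
bubble_sort <- function(x) {
  n <- length(x)
  if (n < 2) return(x)  # nothing to sort; also avoids 1:0
  for (i in 1:(n - 1)) {
    for (j in 1:(n - i)) {
      if (x[j] > x[j + 1]) {
        # Swap adjacent elements that are out of order
        tmp_x <- x[j]
        x[j] <- x[j + 1]
        x[j + 1] <- tmp_x
      }
    }
  }
  x
}

set.seed(1)
x <- rnorm(100)
identical(bubble_sort(x), sort(x))
```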
There are no zero-cost abstractions. Abstractions have run-time, build-time, and human costs.
R expressions need to be parsed, compiled, and interpreted by the R runtime. Every function call has a run-time cost:
```r
x <- y <- 2
foo <- function(x, y) x * y

# Compare a bare multiplication with the same multiplication behind a function call
microbenchmark::microbenchmark(x * y, foo(x, y), times = 10000)
```
```
## Unit: nanoseconds
##       expr min  lq     mean median  uq      max neval cld
##      x * y   0  42   53.809     42  43     9834 10000   a
##  foo(x, y) 167 250 1809.937    251 292 15172375 10000   a
```
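A rough, dependency-free version of the same comparison can be written with base R's system.time(). This is a sketch only: it times whole loops (so loop overhead is included in both numbers), uses an arbitrary n = 1e5, and its results vary by machine, so no particular timings are claimed here.

```r
x <- y <- 2
foo <- function(x, y) x * y
n <- 1e5  # arbitrary number of repetitions

# Elapsed wall-clock seconds for n bare multiplications vs n function calls
t_inline <- system.time(for (i in seq_len(n)) x * y)[["elapsed"]]
t_call   <- system.time(for (i in seq_len(n)) foo(x, y))[["elapsed"]]

c(inline = t_inline, call = t_call)
```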
Each abstraction must provide more benefit than cost.
What about OOP, to group instructions with the state they operate on?
Sure, as long as it produces more human-readable code with acceptable overhead.
OOP stands for object-oriented programming, a programming paradigm based on objects. An object is usually defined as a special type of data structure that holds both attributes (data) and methods (associated behaviours).
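As a minimal sketch of that definition in plain R, an environment can hold both an attribute and a method that updates it. The names account, balance and deposit are hypothetical. The method here refers to the object through its global name account, which is fragile; bandicoot's self mechanism, discussed later, addresses exactly this.

```r
# An object: one environment holding data (balance) and behaviour (deposit)
account <- new.env()
account$balance <- 100

account$deposit <- function(amount) {
  # Modifies the object's own state; environments have reference semantics
  account$balance <- account$balance + amount
  invisible(account)
}

account$deposit(50)
account$balance
# [1] 150
```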
R has built-in OOP systems:

- S3
- S4

There is another popular OOP system, R6, developed by Winston Chang.
More details can be found in Advanced R.
We will not focus on comparing them in this talk.
bandicoot

Some programming languages, such as Python, are fundamentally built upon OOP.
bandicoot tries to provide a set of tools to build a Python-like OOP system in R.
```r
devtools::install_github("TengMCing/bandicoot")
library(bandicoot)
```
bandicoot uses environments to emulate the OOP system, like R6, but built differently.

environment

Every R function is associated with an environment:
```r
environment(function(){})
```
```
## <environment: R_GlobalEnv>
```
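A function created inside another function is associated with the execution environment of its creator rather than the global environment. The sketch below (make_greeter and hello are hypothetical names) shows that the enclosing environment, and the variables in it, travel with the function.

```r
make_greeter <- function(greeting) {
  force(greeting)  # evaluate the argument now, not lazily later
  function(name) paste(greeting, name)
}

hello <- make_greeter("Hello")

hello("R")
# [1] "Hello R"

# The enclosing environment of hello() still holds greeting
environment(hello)$greeting
# [1] "Hello"
```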
An environment can reference itself with the proper setup:
```r
env <- new.env()
env$self <- env
identical(env, env$self)
```
```
## [1] TRUE
```

A function can access its enclosing environment in the function body:
```r
env$foo <- function() self
environment(env$foo) <- env
identical(env, env$foo())
```
```
## [1] TRUE
```

environment

To prevent methods from directly accessing attributes other than self, an additional environment is needed, with the same parent environment as the object.
```r
env <- new.env()
env$x <- 1
# The method container shares the object's parent, so it cannot see env$x directly
env$..method_env.. <- new.env(parent = parent.env(env))
env$..method_env..$self <- env
lobstr::tree(env)
```
```
## <environment: 0x7f98d6735250>
## ├─x: 1
## └─..method_env..: <environment: 0x7f98d67d6fe8>
##   └─self: <environment: 0x7f98d6735250> (Already seen)
```
```r
env$foo <- function() x
environment(env$foo) <- env$..method_env..
# x is neither in ..method_env.. nor in its parent, so the lookup fails
# (provided no variable x is visible from the parent, e.g. the global environment)
try(env$foo())
```
```
## Error in env$foo() : object 'x' not found
```
```r
env$foo2 <- function() self$x
environment(env$foo2) <- env$..method_env..
try(env$foo2())
```
```
## [1] 1
```

register_method()

All of this can be done with register_method().
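The manual steps above can be condensed into a small helper. The sketch below is a hypothetical base-R register_method_sketch(), written only to illustrate the idea; it is not bandicoot's source code, and its container_name and self_name arguments simply mirror the options used with bandicoot's register_method().

```r
# Hypothetical helper (NOT bandicoot's implementation)
register_method_sketch <- function(env, ...,
                                   container_name = "..method_env..",
                                   self_name = "self") {
  methods <- list(...)  # named functions to attach to env

  # Create the method container once; it shares the object's parent,
  # so methods can reach attributes only through the self pointer
  if (!exists(container_name, envir = env, inherits = FALSE)) {
    container <- new.env(parent = parent.env(env))
    assign(self_name, env, envir = container)
    assign(container_name, container, envir = env)
  }
  container <- get(container_name, envir = env, inherits = FALSE)

  for (name in names(methods)) {
    fn <- methods[[name]]
    environment(fn) <- container  # method body now resolves `self`
    assign(name, fn, envir = env)
  }
  invisible(env)
}

obj <- new.env()
obj$x <- 1
register_method_sketch(obj, get_x = function() self$x)
obj$get_x()
# [1] 1
```

Because the container's parent is the object's parent, a method can still see variables visible from there (such as global ones); the sketch only hides the object's own attributes.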
```r
env <- new.env()
env$x <- 1
register_method(env, foo = function() self$x)
lobstr::tree(env)
```
```
## <environment: 0x7f98e554f848>
## ├─x: 1
## ├─foo: function()
## └─..method_env..: <environment: 0x7f98f37b4878>
##   └─self: <environment: 0x7f98e554f848> (Already seen)
```

register_method()

The container name and the self-pointer name can be customized:
```r
env <- new.env()
foo2 <- function() this$x + 1
register_method(env,
                foo_two = foo2,
                foo_three = foo2,
                container_name = "container",
                self_name = "this")
lobstr::tree(env)
```
```
## <environment: 0x7f98d4fc11e8>
## ├─container: <environment: 0x7f98d535ded0>
## │ └─this: <environment: 0x7f98d4fc11e8> (Already seen)
## ├─foo_two: function()
## └─foo_three: function()
```

BASE class

In Python, there is an object class, which provides essential attributes and methods for the OOP system (see Python's data model for more details).
In bandicoot, the BASE class is the default object class, but you can write your own if you need more advanced features.
```r
names(BASE)
```
```
##  [1] "..mro.."          "..str.."          "..len.."          "..class.."
##  [5] "..new.."          "..repr.."         "del_attr"         "has_attr"
##  [9] "set_attr"         "get_attr"         "..type.."         "..dir.."
## [13] "..methods.."      "..method_env.."   "..init.."         "..instantiated.."
## [17] "..class_tree.."   "instantiate"
```

new_class()

new_class() is used to define a new class, including a new object class:
```r
new_class(BASE, class_name = "CAT")
```
```
##
## ── <CAT class>
```

- bandicoot: static method dispatch. The method to be called is decided at "build" time.
- Python: dynamic method dispatch. The method to be called is decided at run time.

The primary concerns are the overhead of dynamic method lookup and the difficulty of managing relationships between saved environments.
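A toy contrast between the two dispatch styles, using plain environments. This is an illustration under simplifying assumptions, not how bandicoot or Python actually resolve methods: the "static" instance copies the method when it is built, while the "dynamic" instance keeps a link to its class and looks the method up on every call. cls, obj_static, obj_dynamic and call_method are hypothetical names.

```r
cls <- new.env()
cls$greet <- function() "v1"

# Static: copy the method into the instance at build time
obj_static <- new.env()
obj_static$greet <- cls$greet

# Dynamic: the instance's parent is the class; look the method up per call
obj_dynamic <- new.env(parent = cls)
call_method <- function(obj, name) get(name, envir = obj, mode = "function")()

# Redefine the method on the class after both instances exist
cls$greet <- function() "v2"

obj_static$greet()                 # the copy taken at build time
# [1] "v1"
call_method(obj_dynamic, "greet")  # found on the class at call time
# [1] "v2"
```

The dynamic version picks up later changes to the class at the price of a lookup through the environment chain on every call; the static copy skips that lookup but has to be rebuilt to see changes.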
BASE

String representation of the object:
```r
CAT <- new_class(BASE, class_name = "CAT")
CAT$..str..()
```
```
## [1] "<CAT class>"
```

BASE

Parent class and the object class name:
```r
CAT$..bases..
```
```
## [1] "BASE"
```

```r
CAT$..type..
```
```
## [1] "CAT"
```

BASE

Class constructor and initializer (a virtual function):
```r
CAT$..new..()
```
```
##
## ── <CAT object>
```

```r
CAT$..init..
```
```
## function(...) return(invisible(self))
## <environment: 0x7f98e03970e8>
```

BASE

Class instantiation:
```r
little_cat <- CAT$instantiate()
little_cat$..str..()
```
```
## [1] "<CAT object>"
```

"Official" string representation of the object:
```r
little_cat$..repr..()
```
```
## [1] "CAT$instantiate()"
```

This is a simple STAFF class defined in Python:
```python
class STAFF(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def get_email(self):
        return f"ʕ•́ᴥ•̀ʔっ♡{self.name}@company.com"
```

Define the same class with a class description (a class factory):
```r
# A new environment env is created if it is not provided
class_STAFF <- function(env = new.env(parent = parent.frame())) {

  # Define the class STAFF in the environment env
  new_class(BASE, env = env, class_name = "STAFF")

  # ..init.. captures name and age
  init_ <- function(name, age) {
    self$name <- name
    self$age <- age
  }

  # get_email gets the email address
  get_email_ <- function() {
    glue::glue("ʕ•́ᴥ•̀ʔっ♡{self$name}@company.com")
  }

  register_method(env, ..init.. = init_, get_email = get_email_)

  return(env)
}

STAFF <- class_STAFF()
```

STAFF

```r
Patrick <- STAFF$instantiate(name = "Patrick", age = 18)
```
```r
Patrick$name
```
```
## [1] "Patrick"
```

```r
Patrick$age
```
```
## [1] 18
```

```r
Patrick$get_email()
```
```
## ʕ•́ᴥ•̀ʔっ♡Patrick@company.com
```

$$y_i = x_i + e_i, \quad e_i \sim N(0, 1 + x_i^2), \quad i = 1, \dots, n.$$
```r
library(visage)
z <- rand_normal(mu = 0, sigma = 1)
x <- rand_uniform(-1, 1)
e <- closed_form(~sqrt(1 + x^2) * z)
y <- closed_form(~x + e)
y
```
```
##
## ── <CLOSED_FORM object>
##  EXPR = x + e
##   - x: <RAND_UNIFORM object>
##    [a: -1, b: 1]
##   - e: <CLOSED_FORM object>
##    EXPR = sqrt(1 + x^2) * z
##     - x: <RAND_UNIFORM object>
##      [a: -1, b: 1]
##     - z: <RAND_NORMAL object>
##      [mu: 0, sigma: 1]
```

```r
y$gen(5, rhs_val = TRUE) |> y$as_dataframe()
```
```
##         .lhs          x          z          e
## 1  0.4491225 -0.4944114  0.8458046  0.9435338
## 2  0.8321905 -0.7577553  1.2672238  1.5899458
## 3 -2.4221872 -0.8848750 -1.1512932 -1.5373122
## 4  0.3525353  0.1907469  0.1589231  0.1617884
## 5 -1.2473103 -0.1662772 -1.0663918 -1.0810331
```
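For comparison, the same data-generating process can be written directly in base R, without the closed_form abstraction. This sketch uses runif() and rnorm() with an arbitrary seed, so its numbers differ from the table above; the column names simply mirror visage's output. What the abstraction adds is the reusable, printable description of the model; what it costs is the machinery behind it.

```r
set.seed(10)  # arbitrary seed, for reproducibility only
n <- 5

x <- runif(n, min = -1, max = 1)  # x ~ U(-1, 1)
z <- rnorm(n, mean = 0, sd = 1)   # z ~ N(0, 1)
e <- sqrt(1 + x^2) * z            # e ~ N(0, 1 + x^2)
y <- x + e

sim <- data.frame(.lhs = y, x = x, z = z, e = e)
sim
```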