Some large datasets are pushing memory and some functions I’m writing to the limit. I wanted to ask some questions about subsetting, of matrices and arrays in particular:

  1. Does defining a variable as a subset of another lead to a copy? For instance:
x <- matrix(rnorm(20*30), nrow=20, ncol=30)
y <- x[, 1:10]

Some exploration with object_size from pryr seems to indicate that a copy is made when y is created, but I’d like to be sure.
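
A minimal base-R sketch of one way to check this, without pryr. The names `alias`, `size_x`, `size_y` and `x_before` are made up for illustration, and the `tracemem()` part only runs if R was built with memory profiling, which is an assumption about the installation. It contrasts plain assignment, which only binds a second name to the same object, with `[` subsetting, which allocates a new matrix:

```r
x <- matrix(rnorm(20 * 30), nrow = 20, ncol = 30)

# Plain assignment binds a second name to the same object (no copy yet).
alias <- x
# Subsetting with `[` allocates a new matrix and fills it with the selected values.
y <- x[, 1:10]

# object.size() measures each object on its own: y carries its own 20 * 10 doubles.
size_x <- as.numeric(object.size(x))
size_y <- as.numeric(object.size(y))

# Changing y afterwards leaves x untouched, as expected for an independent object.
x_before <- x[1, 1]
y[1, 1] <- x_before + 1
stopifnot(x[1, 1] == x_before)

# tracemem() returns the address of the object it marks, so it can compare identities.
# It is only available when R was compiled with memory profiling.
if (capabilities("profmem")) {
  print(identical(tracemem(x), tracemem(alias)))  # same object under two names
  print(identical(tracemem(x), tracemem(y)))      # the subset is a different object
  untracemem(x)
  untracemem(y)
}
```

If the two `tracemem()` comparisons disagree in this way, that matches what object_size suggested: the subset is a separate allocation rather than a view into `x`.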

  2. If I pass a subset of a matrix/array as an argument to a function, does it get copied before the function starts? For instance, in
x <- matrix(rnorm(20*30), nrow=20, ncol=30)
y <- dnorm(0, mean=x[,1:10], sd=1)

I wonder if the data in x[,1:10] are copied and then given as input to dnorm.
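
A small sketch that may help probe this, assuming base R only. R passes arguments as lazily evaluated promises, so an expression like `x[, 1:10]` is evaluated when the function first uses that argument, and the result is a temporary matrix. The helpers `subset_cols`, `ignores_arg` and `uses_arg` and the counter `n_subsets` are hypothetical names introduced here to make that visible:

```r
x <- matrix(rnorm(20 * 30), nrow = 20, ncol = 30)

# Hypothetical helper: behaves like m[, cols] but counts how often it runs.
n_subsets <- 0
subset_cols <- function(m, cols) {
  n_subsets <<- n_subsets + 1
  m[, cols]
}

ignores_arg <- function(m) 0    # never touches m, so the subset is never built
uses_arg <- function(m) sum(m)  # touching m forces the subset to be built

r1 <- ignores_arg(subset_cols(x, 1:10))
count_after_ignored <- n_subsets   # the subsetting has not run

r2 <- uses_arg(subset_cols(x, 1:10))
count_after_used <- n_subsets      # the subsetting has run once
```

Since dnorm hands its `mean` argument on to compiled code, it would presumably force the argument in the same way as `uses_arg`, so the temporary subset exists for the duration of the call.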

I’ve heard that data.table allows one to work with subsets without copies being made (unless necessary), but it seems that one is constrained to two dimensions only – no arrays – that way.

Cheers!

  • stravanasu (OP), 1 year ago

    Thank you for the suggestion! Worth looking at parquet and arrow indeed.