The select_neighbours() function selects a subset of rows from a data set: the rows closest to a given observation. This is useful when the data is large and only a sample is needed to calculate profiles.

select_neighbours(
  observation,
  data,
  variables = NULL,
  distance = gower::gower_dist,
  n = 20,
  frac = NULL
)

Arguments

observation

single observation

data

set of observations

variables

names of the variables used to calculate the distance. By default these are all variables present in both data and observation.

distance

the distance function; by default gower::gower_dist().

n

number of neighbours to select

frac

if n is not specified (NULL), the number of neighbours is calculated as frac * number of rows in data. Either n or frac must be specified.
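
The interplay between n and frac can be sketched in base R. This is a minimal illustration, not the package's implementation: resolve_n() is a hypothetical helper, and rounding the product up with ceiling() is an assumption, since the documentation does not state how a fractional count is handled.

```r
# Hypothetical helper (not part of the package): work out how many
# neighbours to select from `n` and `frac`.
resolve_n <- function(data, n = 20, frac = NULL) {
  if (is.null(n)) {
    if (is.null(frac)) {
      stop("Either n or frac must be specified.")
    }
    # Assumption: round up, so a small frac still yields at least one row.
    n <- ceiling(frac * nrow(data))
  }
  n
}

resolve_n(mtcars)                        # n takes precedence when it is set
resolve_n(mtcars, n = NULL, frac = 0.1)  # 32 rows * 0.1 = 3.2, rounded up
```

Note that n defaults to 20, so frac only takes effect when n = NULL is passed explicitly.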

Value

a data frame with selected rows

Details

Note that the select_neighbours() function is an S3 generic. To work with non-standard data sources (such as H2O ddf or external databases), provide your own method for it.
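
The general pattern for adding a method can be sketched with a stand-in generic. Everything below is hypothetical: nearest_rows() is not the package's function, the row_source class is invented for illustration, and dispatching on data rather than on observation is an assumption.

```r
# Stand-in generic for illustration only; dispatching on `data` is an assumption.
nearest_rows <- function(observation, data, n = 20, ...) {
  UseMethod("nearest_rows", data)
}

# Method for ordinary data frames with numeric columns:
# rank rows by the sum of absolute differences from the observation.
nearest_rows.data.frame <- function(observation, data, n = 20, ...) {
  d <- rowSums(abs(sweep(as.matrix(data), 2, unlist(observation))))
  data[head(order(d), n), , drop = FALSE]
}

# Method for a hypothetical wrapper class around a non-standard source:
# fetch the rows into memory, then reuse the data frame method.
nearest_rows.row_source <- function(observation, data, n = 20, ...) {
  local_copy <- data$fetch()
  nearest_rows(observation, local_copy, n = n, ...)
}

# A toy source that simply returns a built-in data set.
src <- structure(list(fetch = function() mtcars), class = "row_source")
res <- nearest_rows(mtcars[1, ], src, n = 3)
res
```

A real method for a database-backed source would more likely compute the distances on the remote side and fetch only the selected rows.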

Examples

library("ingredients")

new_apartment <- DALEX::apartments[1, ]
small_apartments <- select_neighbours(new_apartment, DALEX::apartments_test, n = 10)

new_apartment
#>   m2.price construction.year surface floor no.rooms    district
#> 1     5897              1953      25     3        1 Srodmiescie
small_apartments
#>      m2.price construction.year surface floor no.rooms    district
#> 2285     5875              1970      27     3        1 Srodmiescie
#> 1073     5886              1960      36     2        1 Srodmiescie
#> 3261     5859              1945      39     2        1 Srodmiescie
#> 6647     5952              1938      30     2        1 Srodmiescie
#> 1198     5821              1947      43     2        1 Srodmiescie
#> 4309     5794              1947      31     3        2 Srodmiescie
#> 9527     6080              1947      27     1        1 Srodmiescie
#> 8110     5614              1957      44     4        1 Srodmiescie
#> 9510     5860              1937      39     2        1 Srodmiescie
#> 2408     5912              1989      24     3        1 Srodmiescie
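
A custom function can be supplied in place of the default distance. The sketch below defines euclid_dist(), a hypothetical Euclidean distance on numeric columns. It assumes the distance function is called the way gower::gower_dist(x, y) is: the observation comes first, the data second, and the result holds one distance per row of the data.

```r
# Hypothetical distance function: Euclidean distance between a one-row
# observation `x` and every row of `y`, using only the numeric columns of `y`.
euclid_dist <- function(x, y) {
  num <- names(y)[vapply(y, is.numeric, logical(1))]
  y_mat <- as.matrix(y[, num, drop = FALSE])
  x_vec <- unlist(x[1, num, drop = FALSE])
  # Subtract the observation from each row, then take the row-wise norm.
  sqrt(rowSums(sweep(y_mat, 2, x_vec)^2))
}

# Check on a built-in data set: one distance per row, zero for the row itself.
d <- euclid_dist(mtcars[1, ], mtcars)
head(sort(d), 3)
```

Under the stated assumption, such a function could be passed as select_neighbours(new_apartment, DALEX::apartments_test, distance = euclid_dist, n = 10). Unlike the Gower distance, it ignores non-numeric variables such as district and does not rescale columns.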