R/select_neighbours.R
select_neighbours.Rd
Function select_neighbours() selects a subset of rows from a data set.
This is useful when the data set is large and only a sample is needed to calculate profiles.
select_neighbours(
  observation,
  data,
  variables = NULL,
  distance = gower::gower_dist,
  n = 20,
  frac = NULL
)
observation: a single observation.
data: a set of observations.
variables: names of the variables used to calculate the distance. By default these are all variables present in both data and observation.
distance: the distance function, by default gower_dist().
n: the number of neighbours to select.
frac: if n is not specified (NULL), the number of neighbours is calculated as frac * number of rows in data. Either n or frac needs to be specified.
Returns a data frame with the selected rows.
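How n, frac, variables and distance interact can be illustrated with a minimal base-R sketch. It is not the package's implementation: sketch_select_neighbours() and euclid() are hypothetical names, rounding frac * nrow(data) up with ceiling() is an assumption, and the toy Euclidean distance only stands in for gower::gower_dist.

```r
# Hypothetical helper, NOT part of the ingredients package.
sketch_select_neighbours <- function(observation, data, variables = NULL,
                                     distance, n = 20, frac = NULL) {
  # Default: variables present in both observation and data.
  if (is.null(variables)) {
    variables <- intersect(colnames(observation), colnames(data))
  }
  # If n is NULL, derive it from frac (rounding up is an assumption).
  if (is.null(n)) {
    stopifnot(!is.null(frac))
    n <- ceiling(frac * nrow(data))
  }
  # One distance per row of data, measured against the single observation.
  d <- distance(observation[, variables, drop = FALSE],
                data[, variables, drop = FALSE])
  # Keep the n rows with the smallest distances.
  data[head(order(d), n), , drop = FALSE]
}

# Toy distance for numeric columns only (stand-in for gower::gower_dist).
euclid <- function(x, y) {
  centre <- unlist(x[1, , drop = FALSE], use.names = FALSE)
  sqrt(rowSums(sweep(as.matrix(y), 2, centre)^2))
}

toy <- data.frame(a = c(1, 5, 2, 9, 3, 7), b = c(1, 5, 2, 9, 3, 7))
obs <- data.frame(a = 1, b = 1)

# n = NULL, so the neighbourhood size comes from frac: ceiling(0.5 * 6) = 3.
nearest <- sketch_select_neighbours(obs, toy, distance = euclid,
                                    n = NULL, frac = 0.5)
nearest
```

In this sketch the distance function receives the observation and the data restricted to the chosen variables and returns one number per row of data; any custom function passed as distance would need to follow whatever contract the package actually expects.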
Note that select_neighbours() is an S3 generic.
To work with non-standard data sources (such as H2O ddf or external databases),
you should provide your own method for it.
library("ingredients")
new_apartment <- DALEX::apartments[1,]
small_apartments <- select_neighbours(new_apartment, DALEX::apartments_test, n = 10)
new_apartment
#>   m2.price construction.year surface floor no.rooms    district
#> 1     5897              1953      25     3        1 Srodmiescie
small_apartments
#>      m2.price construction.year surface floor no.rooms    district
#> 2285     5875              1970      27     3        1 Srodmiescie
#> 1073     5886              1960      36     2        1 Srodmiescie
#> 3261     5859              1945      39     2        1 Srodmiescie
#> 6647     5952              1938      30     2        1 Srodmiescie
#> 1198     5821              1947      43     2        1 Srodmiescie
#> 4309     5794              1947      31     3        2 Srodmiescie
#> 9527     6080              1947      27     1        1 Srodmiescie
#> 8110     5614              1957      44     4        1 Srodmiescie
#> 9510     5860              1937      39     2        1 Srodmiescie
#> 2408     5912              1989      24     3        1 Srodmiescie
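The same call can be driven by frac instead of n. The sketch below assumes the ingredients, DALEX and gower packages are installed and sets n = NULL explicitly, since frac is only used when n is NULL. The value 0.01 is arbitrary, and the exact number of returned rows depends on how the package rounds frac * number of rows in data, so no output is shown.

```r
library("ingredients")

new_apartment <- DALEX::apartments[1, ]

# n = NULL switches the function to the frac-based neighbourhood size.
one_percent <- select_neighbours(new_apartment, DALEX::apartments_test,
                                 n = NULL, frac = 0.01)

# Number of selected rows (roughly 1% of apartments_test).
nrow(one_percent)
```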