The removal process starts by extracting the pairs of correlated columns. The pairs are then sorted in decreasing order of correlation value, and the number of pairs in which each column occurs is counted so that the column names can be ranked by that count. Finally, columns are removed iteratively, starting with the most frequently occurring one, with the aim of removing as few columns as possible.
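The steps above can be illustrated with a minimal sketch. It is not the package's implementation: the helper name `sketch_delete_correlated`, the `threshold` argument and its value of 0.7, the use of absolute Pearson correlation on numeric columns only, the tie-breaking rule, and the names of the returned list items are all assumptions made for illustration.

```r
# Hypothetical helper that mimics the described procedure (not the real function).
# Assumptions: absolute Pearson correlation, numeric columns only, the target
# column y is never removed, and `threshold` is an illustrative cut-off.
sketch_delete_correlated <- function(data, y, threshold = 0.7) {
  data     <- as.data.frame(data)
  features <- setdiff(names(data), y)
  num_cols <- features[vapply(data[features], is.numeric, logical(1))]
  removed  <- character(0)
  if (length(num_cols) < 2) return(list(data = data, removed = removed))

  # 1. Extract correlated pairs (upper triangle only, so each pair appears once).
  cors <- abs(stats::cor(data[num_cols], use = "pairwise.complete.obs"))
  idx  <- which(upper.tri(cors) & !is.na(cors) & cors > threshold, arr.ind = TRUE)
  pairs <- data.frame(
    col1 = num_cols[idx[, "row"]],
    col2 = num_cols[idx[, "col"]],
    cor  = cors[idx],
    stringsAsFactors = FALSE
  )

  # 2. Sort the pairs by decreasing correlation.
  pairs <- pairs[order(-pairs$cor), , drop = FALSE]

  # 3. Repeatedly drop the column that occurs in the most remaining pairs.
  #    Ties go to the column seen first in the sorted pairs (an assumption).
  while (nrow(pairs) > 0) {
    cols    <- c(rbind(pairs$col1, pairs$col2))
    counts  <- table(factor(cols, levels = unique(cols)))
    worst   <- names(counts)[which.max(counts)]
    removed <- c(removed, worst)
    pairs   <- pairs[pairs$col1 != worst & pairs$col2 != worst, , drop = FALSE]
  }

  list(data = data[setdiff(names(data), removed)], removed = removed)
}

# Toy data: a, b and c are nearly identical, d is uncorrelated with them.
a   <- 1:10
toy <- data.frame(
  a      = a,
  b      = a + rep(c(0.1, -0.1), 5),
  c      = a + rep(c(-0.2, 0.2), 5),
  d      = c(1, -1, -1, 1, 1, -1, -1, 1, 0, 0),
  target = rep(c(0, 1), 5)
)
result <- sketch_delete_correlated(toy, y = "target")
```

In this toy data set the columns a, b and c form three mutually correlated pairs, so the sketch removes two of them and keeps the third together with d and target. Removing the most frequently occurring column first resolves several pairs at once, which is what keeps the number of removed columns small.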

delete_correlated_values(data, y, verbose = TRUE)

Arguments

data

A data source before preprocessing, in one of the major R formats: data.table, data.frame, matrix, and so on.

y

A string giving the name of the target column.

verbose

A logical value. If TRUE, all information about the process is provided; if FALSE, none is.

Value

A list with two items: the data set with the correlated columns removed, and the names of the removed columns.