Function returns weights for model training. The purpose of this weights is to mitigate bias in statistical parity. In fact this could potentially worsen the overall performance in other fairness metrics. This affects also model's performance metrics (accuracy).
reweight(protected, y)
protected | factor, protected variables with subgroups as levels (sensitive attributes) |
---|---|
y | numeric, vector with classes 0 and 1, where 1 means favorable class. |
numeric, vector of weights
Method produces weights for each subgroup for each class. Firstly assumes that protected variable and class are independent and calculates expected probability of this certain event (that subgroup == a and class = c). Than it calculates the actual probability of this event based on empirical data. Finally the weight is quotient of those probabilities
This method was implemented based on Kamiran, Calders 2011 https://link.springer.com/content/pdf/10.1007/s10115-011-0463-8.pdf
data("german")
data <- german
data$Age <- as.factor(ifelse(data$Age <= 25, "young", "old"))
data$Risk <- as.numeric(data$Risk) - 1
# training 2 models
weights <- reweight(protected = data$Age, y = data$Risk)
gbm_model <- gbm::gbm(Risk ~ ., data = data)
#> Distribution not specified, assuming bernoulli ...
gbm_model_weighted <- gbm::gbm(Risk ~ ., data = data, weights = weights)
#> Distribution not specified, assuming bernoulli ...
gbm_explainer <- DALEX::explain(gbm_model, data = data[, -1], y = data$Risk)
#> Preparation of a new explainer is initiated
#> -> model label : gbm ( default )
#> -> data : 1000 rows 9 cols
#> -> target variable : 1000 values
#> -> predict function : yhat.gbm will be used ( default )
#> -> predicted values : No value for predict function target column. ( default )
#> -> model_info : package gbm , ver. 2.1.8 , task classification ( default )
#> -> predicted values : numerical, min = 0.1249317 , mean = 0.6971694 , max = 0.97531
#> -> residual function : difference between y and yhat ( default )
#> -> residuals : numerical, min = -0.9486537 , mean = 0.002830555 , max = 0.7990927
#> A new explainer has been created!
gbm_weighted_explainer <- DALEX::explain(gbm_model_weighted, data = data[, -1], y = data$Risk)
#> Preparation of a new explainer is initiated
#> -> model label : gbm ( default )
#> -> data : 1000 rows 9 cols
#> -> target variable : 1000 values
#> -> predict function : yhat.gbm will be used ( default )
#> -> predicted values : No value for predict function target column. ( default )
#> -> model_info : package gbm , ver. 2.1.8 , task classification ( default )
#> -> predicted values : numerical, min = 0.13108 , mean = 0.6993996 , max = 0.9760045
#> -> residual function : difference between y and yhat ( default )
#> -> residuals : numerical, min = -0.9436872 , mean = 0.0006003612 , max = 0.8184989
#> A new explainer has been created!
fobject <- fairness_check(gbm_explainer, gbm_weighted_explainer,
protected = data$Age,
privileged = "old",
label = c("original", "weighted")
)
#> Creating fairness classification object
#> -> Privileged subgroup : character ( Ok )
#> -> Protected variable : factor ( Ok )
#> -> Cutoff values for explainers : 0.5 ( for all subgroups )
#> -> Fairness objects : 0 objects
#> -> Checking explainers : 2 in total ( compatible )
#> -> Metric calculation : 13/13 metrics calculated for all models
#> Fairness object created succesfully
# fairness check
fobject
#>
#> Fairness check for models: original, weighted
#>
#> original passes 3/5 metrics
#> Total loss : 1.004565
#>
#> weighted passes 5/5 metrics
#> Total loss : 0.519821
#>
plot(fobject)
# radar
plot(fairness_radar(fobject))