The fifa dataset is a preprocessed version of the players_20.csv file, which is part of the "FIFA 20 complete player dataset" on Kaggle.

data(fifa)

Format

a data frame with 5000 rows, 42 columns and row names

Source

The players_20.csv dataset was downloaded from the Kaggle site and went through a few transformations. The complete dataset was obtained from https://www.kaggle.com/stefanoleone992/fifa-20-complete-player-dataset#players_20.csv on January 1, 2020.

Details

It contains the 5000 best players by 'overall' rating and 43 variables:

  • short_name (rownames)

  • nationality of the player (not used in modeling)

  • overall, potential, value_eur, wage_eur (4 potential target variables)

  • age, height, weight, attacking skills, defending skills, goalkeeping skills (37 variables)

It is advised to keep only one target variable for modeling.
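To follow this advice, the other three targets (and nationality, which is not used in modeling) can be dropped before fitting a model. The sketch below is a minimal illustration, assuming column names as listed above; keep_one_target() is a hypothetical helper, not a function of the package, and the toy data frame only mimics a few fifa columns with made-up values.

```r
# Hypothetical helper (not part of the package): keep one target variable
# and drop the other three targets plus 'nationality' (not used in modeling).
keep_one_target <- function(df, target = "overall") {
  targets <- c("overall", "potential", "value_eur", "wage_eur")
  stopifnot(target %in% targets)
  to_drop <- c(setdiff(targets, target), "nationality")
  df[, !(colnames(df) %in% to_drop), drop = FALSE]
}

# Toy data frame with a few fifa-like columns (made-up values)
toy <- data.frame(
  nationality = factor(c("Spain", "France")),
  overall     = c(85, 80),
  potential   = c(88, 84),
  value_eur   = c(3e7, 2e7),
  wage_eur    = c(1e5, 8e4),
  age         = c(27, 24),
  row.names   = c("Player A", "Player B")
)

model_df <- keep_one_target(toy, target = "value_eur")
colnames(model_df)  # "value_eur" "age"

# With the real data (assuming the package providing fifa is attached):
# keep_one_target(fifa, "value_eur")
```

Row names (the player short names) are preserved, so predictions can still be matched back to individual players.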

Source: https://www.kaggle.com/stefanoleone992/fifa-20-complete-player-dataset

All transformations:

  1. take 43 columns: [3, 5, 7:9, 11:14, 45:78] (R indexing)

  2. take rows with value_eur > 0

  3. convert short_name to ASCII

  4. remove rows with duplicated short_name (keep first)

  5. sort rows by overall in decreasing order and take the top 5000

  6. set short_name column as rownames

  7. transform nationality to factor

  8. reorder columns
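The transformation steps above can be sketched in base R as follows. This is a minimal sketch, not the script actually used to build fifa: preprocess_players() is a hypothetical function, step 3 uses iconv() with ASCII//TRANSLIT because the original conversion method is not stated, and the column order in step 8 (nationality, then the four targets, then the remaining variables) is an assumption. The toy raw data frame only mimics the column layout of players_20.csv; its skill column names and all values are made up.

```r
# Hypothetical sketch of transformations 1-8 (not the original script).
# `players` is assumed to be players_20.csv read into a data frame.
preprocess_players <- function(players, n_top = 5000) {
  # 1. keep the 43 selected columns (R indexing)
  df <- players[, c(3, 5, 7:9, 11:14, 45:78)]
  # 2. keep rows with a positive market value
  df <- df[df$value_eur > 0, ]
  # 3. convert names to ASCII (iconv transliteration is an assumption)
  df$short_name <- iconv(df$short_name, from = "UTF-8", to = "ASCII//TRANSLIT")
  # 4. drop rows with a duplicated short_name, keeping the first one
  df <- df[!duplicated(df$short_name), ]
  # 5. sort by overall (best first) and keep the top n_top rows
  df <- head(df[order(df$overall, decreasing = TRUE), ], n_top)
  # 6. move short_name into the row names
  rownames(df) <- df$short_name
  df$short_name <- NULL
  # 7. nationality as a factor
  df$nationality <- factor(df$nationality)
  # 8. reorder columns (assumed order: nationality, targets, the rest)
  first <- c("nationality", "overall", "potential", "value_eur", "wage_eur")
  df[, c(first, setdiff(colnames(df), first))]
}

# Toy stand-in for the raw CSV: 78 columns, only the used ones named.
# Skill column names and all values are made up for illustration.
raw <- as.data.frame(matrix(0, nrow = 4, ncol = 78))
names(raw)[c(3, 5, 7:9, 11:14)] <- c("short_name", "age", "height_cm",
  "weight_kg", "nationality", "overall", "potential", "value_eur", "wage_eur")
names(raw)[45:78] <- paste0("skill_", 1:34)
raw$short_name  <- c("A. Player", "B. Player", "A. Player", "C. Player")
raw$nationality <- c("Spain", "France", "Spain", "Italy")
raw$overall     <- c(80, 90, 70, 85)
raw$value_eur   <- c(1e6, 2e6, 5e5, 0)

out <- preprocess_players(raw, n_top = 2)
dim(out)        # 2 42
rownames(out)   # "B. Player" "A. Player"
```

In the toy run, "C. Player" is removed by step 2 (zero value) and the second "A. Player" by step 4, which is why only two rows remain. With the real file, a call such as `preprocess_players(read.csv("players_20.csv", encoding = "UTF-8"))` should give a 5000 × 42 data frame, assuming the CSV layout matches the indices in step 1.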