The fifa dataset is a preprocessed version of the players_20.csv file, which is part of the "FIFA 20 complete player dataset" on Kaggle.



a data frame with 5000 rows, 42 columns and rownames


The players_20.csv file was downloaded from the Kaggle site on January 1, 2020, and went through a few transformations.


It contains the 5000 best players by 'overall' rating and 43 variables. These are:

  • short_name (rownames)

  • nationality of the player (not used in modeling)

  • overall, potential, value_eur, wage_eur (4 potential target variables)

  • age, height, weight, attacking skills, defending skills, goalkeeping skills (37 variables)

It is advised to keep only one target variable for modeling.
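As an illustration of that advice, here is a minimal Python sketch, assuming each player is a dict keyed by the column names listed above. The helper name `split_for_modeling` is hypothetical; it drops the other three targets and `nationality`, which the list above marks as not used in modeling.

```python
# The four candidate targets named in the variable list above.
TARGETS = ("overall", "potential", "value_eur", "wage_eur")


def split_for_modeling(record, target):
    """Hypothetical helper: return (features, y) for one player record.

    Keeps a single target and removes the other three targets plus
    'nationality', which is not used in modeling.
    """
    if target not in TARGETS:
        raise ValueError(f"target must be one of {TARGETS}, got {target!r}")
    features = {
        k: v for k, v in record.items() if k not in TARGETS and k != "nationality"
    }
    return features, record[target]
```

For example, `split_for_modeling(player, "value_eur")` yields the 37 skill and physical variables as features and the market value as the target.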


All transformations:

  1. take 43 columns: [3, 5, 7:9, 11:14, 45:78] (R indexing)

  2. take rows with value_eur > 0

  3. convert short_name to ASCII

  4. remove rows with duplicated short_name (keep first)

  5. sort rows by overall in descending order and take the top 5000

  6. set short_name column as rownames

  7. transform nationality to factor

  8. reorder columns
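Steps 1 to 6 can be sketched in Python as below. This is an illustration under stated assumptions, not the code that produced the dataset: rows are lists of strings as returned by `csv.reader`, the header contains columns named `short_name`, `value_eur` and `overall` among the selected ones, and the ASCII conversion is assumed to be Unicode NFKD decomposition followed by dropping non-ASCII characters (the documentation does not say which method was used). All names (`R_COLUMN_SPEC`, `r_spec_to_zero_based`, `to_ascii`, `preprocess`) are hypothetical. Step 7 (an R factor) and step 8 (the target column order is not given) are omitted.

```python
import unicodedata

# Step 1 column spec from the documentation (R, 1-based):
# [3, 5, 7:9, 11:14, 45:78]; tuples stand for inclusive R ranges a:b.
R_COLUMN_SPEC = [3, 5, (7, 9), (11, 14), (45, 78)]


def r_spec_to_zero_based(spec):
    """Expand a 1-based R-style spec into a list of 0-based Python indices."""
    indices = []
    for item in spec:
        if isinstance(item, tuple):
            start, end = item
            indices.extend(range(start - 1, end))  # R's a:b includes b
        else:
            indices.append(item - 1)
    return indices


def to_ascii(name):
    """Assumed ASCII conversion: strip accents via NFKD, drop other non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def preprocess(header, rows, top_n=5000):
    """Apply steps 1-6 to raw CSV rows (lists of strings).

    Returns (columns, table): table maps short_name to a dict of the
    remaining columns, mimicking an R data frame with rownames.
    """
    # Step 1: keep only the 43 selected columns.
    keep = r_spec_to_zero_based(R_COLUMN_SPEC)
    columns = [header[i] for i in keep]
    records = [{col: row[i] for col, i in zip(columns, keep)} for row in rows]

    # Step 2: keep rows with a positive value_eur.
    records = [r for r in records if float(r["value_eur"]) > 0]

    # Step 3: convert short_name to ASCII.
    for r in records:
        r["short_name"] = to_ascii(r["short_name"])

    # Step 4: drop rows whose short_name was already seen (keep first).
    seen = set()
    unique = []
    for r in records:
        if r["short_name"] not in seen:
            seen.add(r["short_name"])
            unique.append(r)

    # Step 5: sort by overall, best first (stable sort), keep the top rows.
    unique.sort(key=lambda r: float(r["overall"]), reverse=True)
    top = unique[:top_n]

    # Step 6: move short_name out of the columns and use it as the row key.
    table = {}
    for r in top:
        name = r.pop("short_name")
        table[name] = r
    columns = [c for c in columns if c != "short_name"]
    return columns, table
```

To run it on the real file, one could read players_20.csv with `csv.reader`, pass the first row as `header` and the remaining rows as `rows`. Deduplication runs before sorting, as in the step order above, so a duplicated name keeps its first occurrence in the file rather than its highest-rated one.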