The titanic data is a complete list of passengers and crew members on the RMS Titanic. It includes a variable indicating whether a person survived the sinking of the RMS Titanic on April 15, 1912.

data(titanic)
data(titanic_imputed)

Format

a data frame with 2207 rows and 9 columns

Source

This dataset was copied from the stablelearner package and went through a few variable transformations. The complete list of persons on the RMS Titanic was downloaded from https://www.encyclopedia-titanica.org on April 5, 2016. The information given in sibsp and parch was adopted from a data set obtained from https://biostat.app.vumc.org/wiki/Main/DataSets.

Details

This dataset was copied from the stablelearner package and went through a few variable transformations. Levels in embarked were replaced with full names; sibsp, parch and fare were converted to numeric variables; and values for crew were replaced with 0. If you use this dataset, please cite the original package.
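The transformations described above can be sketched roughly as follows. This is only an illustration on an invented toy data frame, not the code used to build the dataset: the single-letter port codes, the helper to_number, and the choice to zero out sibsp, parch and fare for every non-passenger row are all assumptions made for the example.

```r
# Toy stand-in for the upstream data; all values are invented.
raw <- data.frame(
  class    = factor(c("1st", "3rd", "engineering crew", "2nd")),
  embarked = factor(c("S", "C", "B", "Q")),   # assumed single-letter codes
  sibsp    = factor(c("0", "1", NA, "2"), levels = c("0", "1", "2"), ordered = TRUE),
  parch    = factor(c("0", "2", NA, "1"), levels = c("0", "1", "2"), ordered = TRUE),
  fare     = c(71.28, 7.25, NA, 13.00)
)

clean <- raw

# Replace the port codes with full names by relabelling the factor levels.
port_names <- c(B = "Belfast", C = "Cherbourg", Q = "Queenstown", S = "Southampton")
levels(clean$embarked) <- unname(port_names[levels(clean$embarked)])

# Convert factors to numbers via their labels. as.numeric() applied directly
# to a factor would return the internal level codes instead of the counts.
to_number <- function(x) as.numeric(as.character(x))  # hypothetical helper
clean$sibsp <- to_number(clean$sibsp)
clean$parch <- to_number(clean$parch)
clean$fare  <- as.numeric(clean$fare)

# Assumption: every row that is not a 1st/2nd/3rd class passenger is crew,
# and sibsp, parch and fare are the values set to 0 for crew.
is_crew <- !(clean$class %in% c("1st", "2nd", "3rd"))
clean[is_crew, c("sibsp", "parch", "fare")] <- 0

str(clean)
```

Going through as.character() first is the important detail: it keeps a label such as "2" as the number 2 rather than as its position among the factor levels.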

From stablelearner: The website https://www.encyclopedia-titanica.org offers detailed information about passengers and crew members on the RMS Titanic. According to the website, 1317 passengers and 890 crew members were aboard. Eight musicians and nine employees of the shipyard company are listed as passengers but travelled with a free ticket, which is why they have NA values in fare. In addition, fare is truly missing for a few regular passengers.

  • gender a factor with levels male and female.

  • age a numeric value with the person's age on the day of the sinking.

  • class a factor specifying the class for passengers or the type of service aboard for crew members.

  • embarked a factor with the person's place of embarkation (Belfast/Cherbourg/Queenstown/Southampton).

  • country a factor with the person's home country.

  • fare a numeric value with the ticket price (0 for crew members, musicians and employees of the shipyard company).

  • sibsp a numeric value specifying the number of siblings/spouses aboard (converted from the ordered factor used in stablelearner); adopted from the Vanderbilt data set (see Source).

  • parch a numeric value specifying the number of parents/children aboard (converted from the ordered factor used in stablelearner); adopted from the Vanderbilt data set (see Source).

  • survived a factor with two levels (no and yes) specifying whether the person survived the sinking.

NOTE: The titanic_imputed dataset uses the following imputation rules.

  • For age, missing values are replaced with the mean of the observed values, i.e., 30.

  • For sibsp and parch, missing values are replaced by the most frequently observed value, i.e., 0.

  • For fare, missing values are replaced with the mean fare for the given class, i.e., 0 pounds for crew, 89 pounds for the 1st, 22 pounds for the 2nd, and 13 pounds for the 3rd class.
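As a rough illustration of the three rules above, the following sketch applies them to a small invented data frame. It is not the code used to create titanic_imputed: the toy values, the helper most_frequent, and the rounding of the means to whole numbers are assumptions made for the example.

```r
# Toy data with missing values; all numbers are invented.
toy <- data.frame(
  class = c("1st", "1st", "2nd", "3rd", "3rd", "deck crew"),
  age   = c(40, NA, 30, 20, NA, 30),
  sibsp = c(1, 0, NA, 0, 0, 0),
  parch = c(0, 0, 0, NA, 2, 0),
  fare  = c(80, NA, 20, 10, NA, 0)
)

# Hypothetical helper: most frequently observed value of a numeric vector.
most_frequent <- function(x) {
  counts <- table(x)  # table() ignores NA by default
  as.numeric(names(counts)[which.max(counts)])
}

imputed <- toy

# Rule 1: missing age gets the mean of the observed ages (rounding is assumed).
imputed$age[is.na(imputed$age)] <- round(mean(toy$age, na.rm = TRUE))

# Rule 2: missing sibsp and parch get the most frequent observed value.
for (col in c("sibsp", "parch")) {
  missing <- is.na(imputed[[col]])
  imputed[[col]][missing] <- most_frequent(toy[[col]])
}

# Rule 3: missing fare gets the mean observed fare within the same class.
class_mean <- ave(toy$fare, toy$class, FUN = function(x) mean(x, na.rm = TRUE))
missing_fare <- is.na(imputed$fare)
imputed$fare[missing_fare] <- round(class_mean[missing_fare])

print(imputed)
```

Computing every replacement value from the original toy columns, rather than from the partly filled imputed copy, keeps each rule independent of the order in which the rules are applied.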