The titanic data is a complete list of passengers and crew members on the RMS Titanic.
It includes a variable indicating whether a person survived the sinking of the RMS Titanic on April 15, 1912.
A data frame with 2207 rows and 9 columns.
This dataset was copied from the stablelearner package and went through a few variable
transformations. The complete list of persons on the RMS Titanic was downloaded from
https://www.encyclopedia-titanica.org on April 5, 2016. The information given
in sibsp and parch was adopted from a data set obtained from https://biostat.app.vumc.org/wiki/Main/DataSets.
Compared with the stablelearner original, the following variable transformations were applied:
levels in embarked were replaced with full names; sibsp, parch and fare were converted to
numeric variables; and values for crew were replaced with 0.
If you use this dataset please cite the original package.
From stablelearner: The website https://www.encyclopedia-titanica.org offers detailed information about passengers and crew
members on the RMS Titanic. According to the website, 1317 passengers and 890 crew members were aboard.
8 musicians and 9 employees of the shipyard company are listed as passengers, but travelled with a
free ticket, which is why they have NA values in fare. In addition to that, fare is truly missing for a few regular passengers.
gender a factor with levels male and female.
age a numeric value with the person's age on the day of the sinking.
class a factor specifying the class for passengers or the type of service aboard for crew members.
embarked a factor with the person's place of embarkation (Belfast/Cherbourg/Queenstown/Southampton).
country a factor with the person's home country.
fare a numeric value with the ticket price (0 for crew members, musicians and employees of the shipyard company).
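sibsp a numeric value (an ordered factor in the stablelearner original) specifying the number of siblings/spouses aboard; adopted from the Vanderbilt data set (see the source information).
parch a numeric value (an ordered factor in the stablelearner original) specifying the number of parents/children aboard; adopted from the Vanderbilt data set (see the source information).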
survived a factor with two levels (no and yes) specifying whether the person survived the sinking.
NOTE: The titanic_imputed dataset uses the following imputation rules.
Missing age is replaced with the mean of the observed values, i.e., 30.
For sibsp and parch, missing values are replaced by the most frequently observed value, i.e., 0.
For fare, the mean fare for the given class is used, i.e., 0 pounds for crew, 89 pounds for the 1st, 22 pounds for the 2nd, and 13 pounds for the 3rd class.
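The three imputation rules can be illustrated with a minimal sketch. This is not the code used to build titanic_imputed. It is a plain Python illustration that assumes each person is a dictionary with missing values stored as None. The function name impute and the toy records are hypothetical and are not taken from the real data.

```python
from collections import Counter, defaultdict
from statistics import mean


def impute(records):
    """Return copies of the records with None values filled in."""
    # Rule 1: mean of the observed ages.
    mean_age = mean(r["age"] for r in records if r["age"] is not None)

    # Rule 2: most frequently observed value for sibsp and parch.
    modes = {}
    for col in ("sibsp", "parch"):
        observed = [r[col] for r in records if r[col] is not None]
        modes[col] = Counter(observed).most_common(1)[0][0]

    # Rule 3: mean observed fare within each class.
    # Assumes every class has at least one observed fare.
    fares = defaultdict(list)
    for r in records:
        if r["fare"] is not None:
            fares[r["class"]].append(r["fare"])
    mean_fare = {cls: mean(vals) for cls, vals in fares.items()}

    filled = []
    for r in records:
        row = dict(r)  # copy, so the input records stay untouched
        if row["age"] is None:
            row["age"] = mean_age
        for col in ("sibsp", "parch"):
            if row[col] is None:
                row[col] = modes[col]
        if row["fare"] is None:
            row["fare"] = mean_fare[row["class"]]
        filled.append(row)
    return filled


# Made-up toy records (hypothetical values, not real passengers).
toy = [
    {"age": 20, "class": "1st", "fare": 80.0, "sibsp": 0, "parch": 0},
    {"age": 40, "class": "1st", "fare": 100.0, "sibsp": 1, "parch": 0},
    {"age": None, "class": "1st", "fare": None, "sibsp": None, "parch": None},
    {"age": 30, "class": "3rd", "fare": 10.0, "sibsp": 0, "parch": 2},
]
result = impute(toy)
```

In this toy example the third record receives the mean observed age, the most frequent sibsp and parch values, and the mean 1st class fare. The other records are returned unchanged.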