Kaggle Competitions (Lecture 1-1, Getting Started): Introduction to Titanic

Titanic: Machine Learning from Disaster

Start here! Predict survival on the Titanic and get familiar with ML basics


1 Introduction to Kaggle

1.1 Rules

End Date: 1/7/2020 12:00 AM UTC

This is a fun competition aimed at helping you get started with machine learning. While the Titanic dataset is publicly available on the internet, looking up the answers defeats the entire purpose. So seriously, don’t do that.

1.2 Leaderboard

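Between 2012-10-01 and 2017-02-01, more than 20,000 people submitted answers.
At the time of writing, several hundred people still submit every day.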

1.3 Discussion

1.4 Kernels

2 Overview

2.1 Description

Start here if…

You’re new to data science and machine learning, or looking for a simple introduction to Kaggle prediction competitions.

Competition Description

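The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 of the 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.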

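One reason the shipwreck caused such loss of life was that there were not enough lifeboats for the passengers and crew. Although luck played some part in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper class.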

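In this challenge, we ask you to complete the analysis of what sorts of people were likely to survive. In particular, we ask you to apply the tools of machine learning to predict which passengers survived the tragedy.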

Practice Skills

Binary classification
Python and R basics

2.2 Evaluation

Goal

It is your job to predict if a passenger survived the sinking of the Titanic or not.
For each PassengerId in the test set, you must predict a 0 or 1 value for the Survived variable.
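Before uploading, it can help to check that requirement locally. The following is a minimal sketch, assuming the submission is a CSV whose header is exactly `PassengerId,Survived` (the layout of gender_submission.csv); `check_submission` is a hypothetical helper of our own, not part of any Kaggle tool, and the strict header check is an assumption.

```python
import csv
import io


def check_submission(submission_text, test_ids):
    """Hypothetical helper: raise ValueError unless every PassengerId in
    test_ids has exactly one Survived prediction equal to 0 or 1."""
    reader = csv.DictReader(io.StringIO(submission_text))
    # Assumed layout: header row "PassengerId,Survived", in that order.
    if reader.fieldnames != ["PassengerId", "Survived"]:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    seen = set()
    for row in reader:
        pid = row["PassengerId"]
        if pid in seen:
            raise ValueError(f"duplicate PassengerId {pid}")
        if row["Survived"] not in ("0", "1"):
            raise ValueError(f"Survived must be 0 or 1, got {row['Survived']!r}")
        seen.add(pid)
    expected = set(test_ids)
    if seen != expected:
        raise ValueError(
            f"missing={sorted(expected - seen)} extra={sorted(seen - expected)}"
        )


# Made-up IDs for illustration only; a valid file passes silently.
check_submission("PassengerId,Survived\n1,0\n2,1\n", ["1", "2"])
```

IDs are compared as strings because the `csv` module returns every field as text.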

Metric

Your score is the percentage of passengers you correctly predict. This is known simply as “accuracy”.
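The metric is simple enough to compute yourself, for example on labelled training rows you held back. A minimal sketch, assuming the true labels and the predictions are equal-length sequences of 0/1 values in the same passenger order (the function name `accuracy` is our own):

```python
def accuracy(y_true, y_pred):
    """Fraction of passengers whose 0/1 Survived label was predicted correctly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one prediction")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

Kaggle reports the same quantity as a fraction; multiply by 100 for a percentage.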

2.3 Frequently Asked Questions


2.4 Tutorials


3 Data

Overview

The data has been split into two groups:

training set (train.csv)
test set (test.csv)
The training set should be used to build your machine learning models. For the training set, we provide the outcome (also known as the “ground truth”) for each passenger. Your model will be based on “features” like passengers’ gender and class. You can also use feature engineering to create new features.

The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger. It is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.
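To make the train-then-predict workflow concrete, here is a deliberately tiny "model" in plain Python: it learns the survival rate of each (Sex, Pclass) group from the labelled training rows and predicts 1 for test passengers whose group survived more than half the time. This is an illustrative baseline chosen for this sketch, not a method from the competition page. Rows are assumed to be dicts as produced by `csv.DictReader`, with the CSV column names `Sex`, `Pclass` and `Survived` (assumed); groups never seen in training fall back to a default of 0.

```python
from collections import defaultdict

KEYS = ("Sex", "Pclass")


def fit_group_rates(train_rows, keys=KEYS):
    """Return {group: survival rate} learned from labelled training rows."""
    counts = defaultdict(lambda: [0, 0])  # group -> [survivors, passengers]
    for row in train_rows:
        group = tuple(row[k] for k in keys)
        counts[group][0] += int(row["Survived"])
        counts[group][1] += 1
    return {group: s / n for group, (s, n) in counts.items()}


def predict(rates, rows, keys=KEYS, default=0):
    """Predict 1 for rows whose group's training survival rate exceeds 0.5."""
    preds = []
    for row in rows:
        rate = rates.get(tuple(row[k] for k in keys))
        preds.append(default if rate is None else int(rate > 0.5))
    return preds


# Toy rows, made up for illustration (not real passengers).
train = [
    {"Sex": "female", "Pclass": "1", "Survived": "1"},
    {"Sex": "female", "Pclass": "1", "Survived": "1"},
    {"Sex": "male", "Pclass": "3", "Survived": "0"},
    {"Sex": "male", "Pclass": "3", "Survived": "1"},
]
test = [
    {"Sex": "female", "Pclass": "1"},
    {"Sex": "male", "Pclass": "3"},
    {"Sex": "female", "Pclass": "2"},  # group never seen in training
]
rates = fit_group_rates(train)
print(predict(rates, test))  # [1, 0, 0]
```

The ground truth is used only inside `fit_group_rates`; `predict` never reads a `Survived` column, which mirrors the fact that the test set has none.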

We also include gender_submission.csv, a set of predictions that assume all and only female passengers survive, as an example of what a submission file should look like.
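The rule behind gender_submission.csv can be reproduced in a few lines. A minimal sketch, assuming test.csv has `PassengerId` and `Sex` columns with the values `female` and `male`; `gender_baseline` is a hypothetical name, and it works on CSV text so it can be tried without the real file.

```python
import csv
import io


def gender_baseline(test_csv_text):
    """Hypothetical helper: predict Survived=1 for female passengers and 0
    for everyone else, returning submission CSV text."""
    reader = csv.DictReader(io.StringIO(test_csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    for row in reader:
        writer.writerow([row["PassengerId"], 1 if row["Sex"] == "female" else 0])
    return out.getvalue()


# Made-up rows in the assumed test.csv layout.
print(gender_baseline("PassengerId,Pclass,Sex\n1,3,male\n2,1,female\n"))
```

With the real data, you would read test.csv into a string (for example with `open("test.csv", newline="")`) and write the returned text to your submission file.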

Data Dictionary
Variable Notes

pclass: A proxy for socio-economic status (SES)
1st = Upper
2nd = Middle
3rd = Lower
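Because pclass is only a proxy for socio-economic status, you can keep it as an ordinal number for a model or map it to readable labels for analysis. A small sketch of the mapping listed above; `PCLASS_TO_SES` and `ses_label` are our own names, and `int()` is applied because values read with the `csv` module arrive as strings.

```python
PCLASS_TO_SES = {1: "Upper", 2: "Middle", 3: "Lower"}


def ses_label(pclass):
    """Map a ticket class (1, 2, 3, or their string forms) to its SES label."""
    return PCLASS_TO_SES[int(pclass)]


print(ses_label("3"))  # Lower
```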

age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5
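Read literally, this note means a .5 suffix marks an estimate only for ages of 1 and above, because ages under 1 are genuine fractions of a year. A minimal sketch that parses a raw age field and flags estimates under that reading; `parse_age` is a hypothetical helper, and treating a blank field as missing is an assumption about how unknown ages appear in the CSV.

```python
def parse_age(raw):
    """Return (age, is_estimated) from a raw Age field; (None, False) if blank."""
    raw = raw.strip()
    if raw == "":
        return None, False  # assumed: unknown ages are left blank
    age = float(raw)
    # Under 1: fractional ages are real fractions of a year, not estimates.
    # 1 and over: an xx.5 value marks an estimated age.
    is_estimated = age >= 1 and age % 1 == 0.5
    return age, is_estimated


print(parse_age("28.5"))  # (28.5, True)
```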

sibsp: The dataset defines family relations in this way…
Sibling = brother, sister, stepbrother, stepsister
Spouse = husband, wife (mistresses and fiancés were ignored)

parch: The dataset defines family relations in this way…
Parent = mother, father
Child = daughter, son, stepdaughter, stepson
Some children travelled only with a nanny, therefore parch=0 for them.
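A common piece of feature engineering (our choice for this sketch, not something the data dictionary prescribes) combines sibsp and parch into a family size. `family_features`, `FamilySize` and `IsAlone` are hypothetical names. Keep the nanny caveat above in mind: a child travelling only with a nanny and no siblings gets `IsAlone = 1` even though they were accompanied.

```python
def family_features(sibsp, parch):
    """Derive family-size features from the sibsp and parch counts."""
    sibsp, parch = int(sibsp), int(parch)  # CSV values arrive as strings
    family_size = sibsp + parch + 1  # +1 counts the passenger themselves
    return {"FamilySize": family_size, "IsAlone": int(family_size == 1)}


print(family_features("1", "0"))  # {'FamilySize': 2, 'IsAlone': 0}
```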
