
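Fortunately, cross-validation is a simple way to avoid overfitting. To cross-validate, you split your data into some number of parts (or "folds"). We'll use 3 as an example. You then do the following:

Combine the first and second parts, train a model, and make predictions on the third part.

Combine the first and third parts, train a model, and make predictions on the second part.

Combine the second and third parts, train a model, and make predictions on the first part.

This way, we generate predictions for the entire dataset, and the accuracy of each prediction is never evaluated on data that the model was trained on.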
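The three steps above can be sketched in plain Python. This is only an illustration of how the row indices are divided, not the library implementation; the function name `kfold_splits` is hypothetical, and the folds are assumed to be contiguous blocks of rows with no shuffling.

```python
def kfold_splits(n_rows, n_folds=3):
    """Yield (train_indices, test_indices) pairs for contiguous folds."""
    indices = list(range(n_rows))
    # Fold sizes differ by at most one when n_rows is not divisible by n_folds.
    base, extra = divmod(n_rows, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        # The current fold is held out for predictions...
        test = indices[start:start + size]
        # ...and every other row is used for training.
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


splits = list(kfold_splits(9))
for train, test in splits:
    print("train:", train, "test:", test)
```

Each row index appears in exactly one test fold, so every row receives a prediction from a model that never saw it during training.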

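9: Making predictions

We can use the excellent scikit-learn library to make predictions. We'll use a helper from sklearn to split the data into cross-validation folds, then train the algorithm once per fold (on the other folds) and make predictions on the held-out fold. At the end, we'll have a list of predictions, where each list item contains the predictions for the corresponding fold.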

# Import the linear regression class
from sklearn.linear_model import LinearRegression
# Sklearn also has a helper that makes it easy to do cross validation.
# KFold lives in sklearn.model_selection; the older sklearn.cross_validation module has been removed.
from sklearn.model_selection import KFold

# The columns we'll use to predict the target
predictors = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]

# Initialize our algorithm class
alg = LinearRegression()

# Generate cross validation folds for the titanic dataset. split() returns the row indices corresponding to train and test.
# Without shuffling, the folds are contiguous blocks of rows, so we get the same splits every time we run this.
kf = KFold(n_splits=3)

predictions = []
for train, test in kf.split(titanic):
    # The predictors we're using to train the algorithm. Note how we only take the rows in the train folds.
    train_predictors = titanic[predictors].iloc[train, :]
    # The target we're using to train the algorithm.
    train_target = titanic["Survived"].iloc[train]
    # Train the algorithm using the predictors and target.
    alg.fit(train_predictors, train_target)
    # We can now make predictions on the test fold.
    test_predictions = alg.predict(titanic[predictors].iloc[test, :])
    predictions.append(test_predictions)

10: Evaluating error

Now that we have predictions, we can evaluate our error.

First we need to define an error metric so that we can measure our model's accuracy. According to the Kaggle competition description, the metric is the percentage of correct predictions. We'll use the same metric to test our model's performance locally.

The metric is essentially the number of values in predictions that exactly match their counterparts in titanic['Survived'], divided by the total number of passengers.

Before we can do this, we need to combine the three sets of predictions into one column. Since each set of predictions is a numpy array (numpy is a Python scientific computing library [Translator's note: the scientific computing library proper is scipy, while numpy mainly handles array and matrix operations]), we can use a numpy function to concatenate them into one column.

Compute the fraction of values in predictions that exactly match the corresponding values in titanic["Survived"]. The result should be a floating point number (a decimal); assign it to the variable accuracy.

import numpy as np

# The predictions are in three separate numpy arrays. Concatenate them into one.
# We concatenate them on axis 0, as they only have one axis.
predictions = np.concatenate(predictions, axis=0)

# Map predictions to outcomes (only possible outcomes are 1 and 0)
predictions[predictions > .5] = 1
predictions[predictions <= .5] = 0

# Fraction of predictions that match the actual outcome.
# (Using the comparison as a mask on predictions and summing would add up only the matching 1s.)
accuracy = (predictions == titanic["Survived"]).mean()
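As a quick sanity check of the metric, here is a toy example with six made-up passengers. The arrays `raw` and `actual` are hypothetical values chosen for illustration and do not come from the Titanic data.

```python
import numpy as np

# Hypothetical raw regression outputs for six passengers (made-up values).
raw = np.array([0.9, 0.2, 0.55, 0.4, 0.7, 0.1])
# Hypothetical true outcomes for the same six passengers.
actual = np.array([1, 0, 0, 0, 1, 1])

# Threshold at 0.5 to turn the raw outputs into hard 0/1 predictions.
predicted = (raw > 0.5).astype(int)

# The elementwise comparison gives booleans; their mean is the fraction correct.
toy_accuracy = (predicted == actual).mean()
print(predicted, toy_accuracy)
```

Four of the six thresholded predictions match the true outcomes, so the toy accuracy is 4/6.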

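11: Logistic regression

We have our first predictions! The result isn't very good, though: only 78.3% accuracy. In the video, we mentioned a way to make the output of linear regression fall between 0 and 1. This method is called logistic regression.

A good way to think of logistic regression is that it takes the output of a linear regression and maps it to a value between 0 and 1. The mapping is done with the logistic function (the inverse of the logit function). Passing any value through the logistic function maps it to a value between 0 and 1 by "squeezing" the extreme values. This is perfect for us, because we only care about two outcomes.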
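The "squeezing" behaviour can be seen in a minimal sketch of the logistic function, 1 / (1 + e^(-z)). The function name `logistic` and the sample inputs are chosen here for illustration only.

```python
import math


def logistic(z):
    """Map any real number to a value between 0 and 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


for z in (-10, -1, 0, 1, 10):
    print(z, round(logistic(z), 4))
```

Large negative inputs land close to 0, large positive inputs land close to 1, and an input of 0 maps to exactly 0.5, which is why 0.5 is a natural threshold for deciding between the two outcomes.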

Sklearn has a logistic regression class that we can use. A sklearn helper function also makes all of our cross-validation and evaluation much simpler.
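A minimal sketch of what this might look like, assuming the class is `LogisticRegression` and the helper is `cross_val_score` from `sklearn.model_selection`. To keep the example self-contained it uses synthetic stand-in data rather than the Titanic dataset; with the real data, `X` would be `titanic[predictors]` and `y` would be `titanic["Survived"]`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption, not the Titanic dataset):
# two numeric features and a 0/1 label that depends on their sum.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

alg = LogisticRegression()

# cross_val_score fits the model once per fold and returns one score per fold;
# for a classifier the default score is accuracy.
scores = cross_val_score(alg, X, y, cv=3)
print(scores.mean())
```

Taking the mean of the per-fold scores gives a single accuracy figure, which replaces the manual loop, concatenation, and thresholding steps used for linear regression.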

12: Processing the test set

Our accuracy is decent, but not great. We can still try a few things to improve it, which will be discussed in the next task.


Kaggle Beginner Tutorial