Then, back in the Python environment, we reload the kNN.py module with
reload(kNN)
and call
kNN.datingClassTest()
which produces:
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 1
...
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 3, the real answer is: 1
the total error rate is: 0.050000
So the error rate on this dataset is 5%. There will be some variation from run to run, because the randomly selected data may differ each time.
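The test routine itself is not shown in this excerpt, but the holdout evaluation it performs can be sketched as follows. This is a minimal, self-contained illustration: classify0 is re-implemented here in plain Python rather than taken from kNN.py, and the helper holdout_error_rate simply holds out the first fraction of the data as a test set, the way datingClassTest holds out the first 10%.

```python
from collections import Counter
import math

def classify0(in_x, data, labels, k):
    # Brute-force kNN: Euclidean distance to every stored sample,
    # then a majority vote among the k nearest neighbors.
    dists = sorted((math.dist(in_x, x), lab) for x, lab in zip(data, labels))
    top_k = [lab for _, lab in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]

def holdout_error_rate(data, labels, ratio=0.1, k=3):
    # Hold out the first `ratio` fraction as the test set and
    # classify each held-out sample against the remaining samples.
    n_test = int(len(data) * ratio)
    errors = 0
    for i in range(n_test):
        pred = classify0(data[i], data[n_test:], labels[n_test:], k)
        if pred != labels[i]:
            errors += 1
    return errors / n_test
```

With two well-separated clusters this reports an error rate of 0.0; on real data like the dating set, the misclassified holdout samples are what drive the rate up to values like the 5% above.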
- Using the algorithm
Now we use the classifier built above to construct a usable system: given the feature values as input, it predicts for her how much she will like the person. Let's write the code:
def classifyPerson():
    resultList = ['not', 'small doses', 'large doses']
    percentTats = float(raw_input("percent of time spent>"))
    miles = float(raw_input("flier miles per year?"))
    ice = float(raw_input("liters of ice-cream?"))
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    inArr = array([miles, percentTats, ice])
    classifierResult = classify0((inArr - minVals) / ranges, normMat, datingLabels, 3)
    print "you will like this person: ", resultList[classifierResult - 1]
Most of this code should be familiar by now; the only addition is raw_input for reading user input. Let's look at the result:
>>> reload(kNN)
<module 'kNN' from 'kNN.py'>
>>> kNN.classifyPerson()
percent of time spent>10
flier miles per year?10000
liters of ice-cream?0.5
you will like this person:  small doses
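A detail worth dwelling on is the expression (inArr - minVals) / ranges: the new input must be scaled with the training set's minimum values and ranges before it is passed to classify0, or the large "flier miles" feature would dominate the distance. A small self-contained illustration, where auto_norm is a minimal stand-in for the chapter's autoNorm and the numbers are made up:

```python
import numpy as np

def auto_norm(data):
    # Column-wise min-max normalization; also returns the min values
    # and ranges needed to scale future inputs identically.
    min_vals = data.min(axis=0)
    ranges = data.max(axis=0) - min_vals
    return (data - min_vals) / ranges, ranges, min_vals

# Toy training matrix: miles, percent of time, liters of ice-cream.
train = np.array([[0.0,     0.0,  0.0],
                  [20000.0, 10.0, 1.67],
                  [10000.0, 5.0,  0.5]])
norm_train, ranges, min_vals = auto_norm(train)

# A new person is scaled with the *training* min/ranges, exactly as
# (inArr - minVals) / ranges does inside classifyPerson.
new_person = np.array([10000.0, 10.0, 0.5])
scaled = (new_person - min_vals) / ranges
```

After scaling, every feature lies on a comparable [0, 1] scale, so no single feature swamps the Euclidean distance.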
While building this algorithm we also noticed that there is no separate "training the algorithm" step: kNN needs no training, it simply computes distances directly against the stored data.
At the same time we ran into a problem: the k-nearest-neighbors algorithm is quite slow, because it must compute the distance to every single stored vector. How can this be optimized?
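The usual answer is a space-partitioning index such as a k-d tree, which prunes most of the distance computations; short of that, a first step is to make sure the per-sample loop runs inside NumPy rather than in Python. A minimal sketch along those lines (the function and toy data here are illustrative, not from the chapter's kNN.py):

```python
import numpy as np

def classify_vectorized(in_x, data, labels, k):
    # One broadcasted subtraction computes the distance to all m
    # stored vectors at once, instead of looping over them in Python.
    dists = np.sqrt(((data - in_x) ** 2).sum(axis=1))  # shape (m,)
    nearest = np.argsort(dists)[:k]                    # k closest indices
    votes = {}
    for i in nearest:                                  # majority vote
        votes[labels[i]] = votes.get(labels[i], 0) + 1
    return max(votes, key=votes.get)

# Toy data: two well-separated clusters.
demo_data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                      [9.0, 9.0], [9.0, 8.0], [8.0, 9.0]])
demo_labels = [1, 1, 1, 2, 2, 2]
```

Vectorizing keeps the cost at O(m) distance computations per query but makes the constant factor much smaller; a k-d tree (or similar index) is what actually reduces the number of comparisons.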