python - scikit - Weird ValueError in KNeighborsClassifier score
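You are trying to fit a 3-nearest-neighbors classifier with only one data point. That doesn't work: on the first pass of your loop i is 1, so X_train[:1] holds a single sample, and the classifier cannot find 3 neighbors among one training point. By the way, scikit-learn has functions for learning curves and validation curves.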
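As a rough illustration of the built-in helper mentioned above, here is a minimal sketch using `learning_curve`. It assumes a recent scikit-learn where the function lives in `sklearn.model_selection` (in the 0.16 release discussed in the question it was in `sklearn.learning_curve`), and it uses synthetic data from `make_classification` as a stand-in, since the question's data is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; replace with your own X and y).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

n_neighbors = 3

# train_sizes are fractions of the training part of each CV split.
# The smallest one must still contain at least n_neighbors samples.
train_sizes, train_scores, cv_scores = learning_curve(
    KNeighborsClassifier(n_neighbors=n_neighbors),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
)

# Average over the CV folds to get one point per training-set size.
mean_train = train_scores.mean(axis=1)
mean_cv = cv_scores.mean(axis=1)

for size, tr, cv in zip(train_sizes, mean_train, mean_cv):
    print(size, round(tr, 3), round(cv, 3))
```

The returned `train_sizes` are absolute sample counts, so they can be used directly as the x-axis when plotting the two curves.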
I want to plot the learning curves of a K Nearest Neighbors classifier. I have the following code:
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

X_train = ...  # training data
Y_train = ...  # target variables
best_neighbors = 3  # number of neighbors which gave the highest score

idx = len(X_train) // 5000
scores = pd.DataFrame(np.zeros((idx + 1, 2)),
                      index=np.arange(1, len(X_train), 5000),
                      columns=['Train Score', 'CV Score'])

for i in range(1, len(X_train), 5000):
    # Fit on the first i samples (i is 1 on the first pass)
    X_train_set = X_train[:i]
    Y_train_set = Y_train[:i]
    neigh = KNeighborsClassifier(n_neighbors=best_neighbors)
    neigh.fit(X_train_set, Y_train_set)

    train_score = neigh.score(X_train, Y_train)
    cv_score = neigh.score(X_test, Y_test)
    scores['Train Score'][i] = train_score
    scores['CV Score'][i] = cv_score
This code worked perfectly before, for example with a Decision Tree or a Random Forest, but here I get the following strange error:
ValueError Traceback (most recent call last)
<ipython-input-6-95e645e75971> in <module>()
10 neigh.fit(X_train_set, Y_train_set)
11
---> 12 train_score = neigh.score(X_train, Y_train)
13 cv_score = neigh.score(X_test, Y_test)
14
//anaconda/lib/python2.7/site-packages/sklearn/base.pyc in score(self, X, y, sample_weight)
289 """
290 from .metrics import accuracy_score
--> 291 return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
292
293
//anaconda/lib/python2.7/site-packages/sklearn/neighbors/classification.pyc in predict(self, X)
145 X = atleast2d_or_csr(X)
146
--> 147 neigh_dist, neigh_ind = self.kneighbors(X)
148
149 classes_ = self.classes_
//anaconda/lib/python2.7/site-packages/sklearn/neighbors/base.pyc in kneighbors(self, X, n_neighbors, return_distance)
316 **self.effective_metric_params_)
317
--> 318 neigh_ind = argpartition(dist, n_neighbors - 1, axis=1)
319 neigh_ind = neigh_ind[:, :n_neighbors]
320 # argpartition doesn't guarantee sorted order, so we sort again
//anaconda/lib/python2.7/site-packages/numpy/core/fromnumeric.pyc in argpartition(a, kth, axis, kind, order)
689 except AttributeError:
690 return _wrapit(a, 'argpartition',kth, axis, kind, order)
--> 691 return argpartition(kth, axis, kind=kind, order=order)
692
693
ValueError: kth(=2) out of bounds (1)
Any idea what this means and how I can fix it?
EDIT: After updating scikit-learn to version 0.16, I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-66-21f434a289fc> in <module>()
10 neigh.fit(X_train_set, Y_train_set)
11
---> 12 train_score = neigh.score(X_train, Y_train)
13 cv_score = neigh.score(X_test, Y_test)
14
//anaconda/lib/python2.7/site-packages/sklearn/base.pyc in score(self, X, y, sample_weight)
293 """
294 from .metrics import accuracy_score
--> 295 return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
296
297
//anaconda/lib/python2.7/site-packages/sklearn/neighbors/classification.pyc in predict(self, X)
136 X = check_array(X, accept_sparse='csr')
137
--> 138 neigh_dist, neigh_ind = self.kneighbors(X)
139
140 classes_ = self.classes_
//anaconda/lib/python2.7/site-packages/sklearn/neighbors/base.pyc in kneighbors(self, X, n_neighbors, return_distance)
337 raise ValueError(
338 "Expected n_neighbors <= %d. Got %d" %
--> 339 (train_size, n_neighbors)
340 )
341 n_samples, _ = X.shape
ValueError: Expected n_neighbors <= 1. Got 3
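The message says the fitted training set holds 1 sample while 3 neighbors were requested. One possible way to keep the manual loop is to start the subset sizes at `best_neighbors` instead of 1. The sketch below is only an illustration: the random data, the 200/50 split, and the `step` of 50 (standing in for the 5000 in the question) are all assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; the question's data is not shown).
rng = np.random.RandomState(0)
X = rng.rand(250, 4)
y = (X[:, 0] > 0.5).astype(int)
X_train, Y_train = X[:200], y[:200]
X_test, Y_test = X[200:], y[200:]

best_neighbors = 3
step = 50  # plays the role of the 5000 in the question

# Reproduce the failure: one training sample cannot supply 3 neighbors.
failed = False
try:
    neigh = KNeighborsClassifier(n_neighbors=best_neighbors)
    neigh.fit(X_train[:1], Y_train[:1])
    neigh.score(X_train, Y_train)
except ValueError:
    failed = True

# Start the subset sizes at best_neighbors instead of 1, so every
# fitted model has at least as many samples as requested neighbors.
scores = {}
for i in range(best_neighbors, len(X_train) + 1, step):
    neigh = KNeighborsClassifier(n_neighbors=best_neighbors)
    neigh.fit(X_train[:i], Y_train[:i])
    scores[i] = (neigh.score(X_train, Y_train), neigh.score(X_test, Y_test))
```

Here `scores` maps each training-subset size to a (train score, CV score) pair, scored on the same sets as in the question's loop.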