
python - scikit - Weird ValueError in KNeighborsClassifier score




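You are trying to fit a 3-nearest-neighbors classifier with only one data point. That cannot work: the loop starts at i = 1, so the first iteration trains on X_train[:1], a single sample, and asking for 3 neighbors then fails. By the way, scikit-learn has functions for learning curves and validation curves (learning_curve and validation_curve).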
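As a rough sketch of the built-in route, `learning_curve` can compute train and cross-validation scores over increasing training-set sizes. The synthetic data from `make_classification`, the chosen sizes, and `cv=5` are assumptions for illustration, not the asker's setup. The import path is the `sklearn.model_selection` one used by recent releases; the 0.16 release mentioned in the question had this function under `sklearn.learning_curve`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the asker's data (assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

n_neighbors = 3  # the value reported as best in the question

# train_sizes are fractions of each training fold (400 samples with cv=5),
# so the smallest subset (40 samples) is well above n_neighbors.
sizes, train_scores, cv_scores = learning_curve(
    KNeighborsClassifier(n_neighbors=n_neighbors),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
)

# Average over the 5 folds to get one curve point per training size.
mean_train = train_scores.mean(axis=1)
mean_cv = cv_scores.mean(axis=1)
for n, tr, cv in zip(sizes, mean_train, mean_cv):
    print(f"{int(n):4d} samples  train={tr:.3f}  cv={cv:.3f}")
```

Note that `learning_curve` scores each model on the subset it was trained on, whereas the loop in the question scores on the full training set, so the training curves are not directly comparable.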

I want to plot the learning curves of a K Nearest Neighbors classifier. I have the following code:

    X_train = #training data
    Y_train = #target variables
    best_neighbors = #number of neighbors which gave highest score (3)
    idx = len(X_train)/5000
    scores = pd.DataFrame(np.zeros((idx+1, 2)), index=np.arange(1, len(X_train), 5000), columns=['Train Score', 'CV Score'])
    for i in range(1, len(X_train), 5000):
        X_train_set = X_train[:i]
        Y_train_set = Y_train[:i]
        neigh = KNeighborsClassifier(n_neighbors = best_neighbors)
        neigh.fit(X_train_set, Y_train_set)

        train_score = neigh.score(X_train, Y_train)
        cv_score = neigh.score(X_test, Y_test)

        scores['Train Score'][i] = train_score
        scores['CV Score'][i] = cv_score

This code worked perfectly before, for example with a Decision Tree or a Random Forest, but here it raises the following strange error:

    ValueError                                Traceback (most recent call last)
    <ipython-input-6-95e645e75971> in <module>()
         10     neigh.fit(X_train_set, Y_train_set)
         11
    ---> 12     train_score = neigh.score(X_train, Y_train)
         13     cv_score = neigh.score(X_test, Y_test)
         14

    //anaconda/lib/python2.7/site-packages/sklearn/base.pyc in score(self, X, y, sample_weight)
        289         """
        290         from .metrics import accuracy_score
    --> 291         return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
        292
        293

    //anaconda/lib/python2.7/site-packages/sklearn/neighbors/classification.pyc in predict(self, X)
        145         X = atleast2d_or_csr(X)
        146
    --> 147         neigh_dist, neigh_ind = self.kneighbors(X)
        148
        149         classes_ = self.classes_

    //anaconda/lib/python2.7/site-packages/sklearn/neighbors/base.pyc in kneighbors(self, X, n_neighbors, return_distance)
        316                 **self.effective_metric_params_)
        317
    --> 318             neigh_ind = argpartition(dist, n_neighbors - 1, axis=1)
        319             neigh_ind = neigh_ind[:, :n_neighbors]
        320             # argpartition doesn't guarantee sorted order, so we sort again

    //anaconda/lib/python2.7/site-packages/numpy/core/fromnumeric.pyc in argpartition(a, kth, axis, kind, order)
        689     except AttributeError:
        690         return _wrapit(a, 'argpartition',kth, axis, kind, order)
    --> 691     return argpartition(kth, axis, kind=kind, order=order)
        692
        693

    ValueError: kth(=2) out of bounds (1)

Any idea what this means and how I can fix it?

EDIT: After upgrading scikit-learn to version 0.16, I got the following error:

    ValueError                                Traceback (most recent call last)
    <ipython-input-66-21f434a289fc> in <module>()
         10     neigh.fit(X_train_set, Y_train_set)
         11
    ---> 12     train_score = neigh.score(X_train, Y_train)
         13     cv_score = neigh.score(X_test, Y_test)
         14

    //anaconda/lib/python2.7/site-packages/sklearn/base.pyc in score(self, X, y, sample_weight)
        293         """
        294         from .metrics import accuracy_score
    --> 295         return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
        296
        297

    //anaconda/lib/python2.7/site-packages/sklearn/neighbors/classification.pyc in predict(self, X)
        136         X = check_array(X, accept_sparse='csr')
        137
    --> 138         neigh_dist, neigh_ind = self.kneighbors(X)
        139
        140         classes_ = self.classes_

    //anaconda/lib/python2.7/site-packages/sklearn/neighbors/base.pyc in kneighbors(self, X, n_neighbors, return_distance)
        337             raise ValueError(
        338                 "Expected n_neighbors <= %d. Got %d" %
    --> 339                 (train_size, n_neighbors)
        340             )
        341         n_samples, _ = X.shape

    ValueError: Expected n_neighbors <= 1. Got 3
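The message "Expected n_neighbors <= 1. Got 3" can be traced to the loop bounds: `range(1, len(X_train), 5000)` starts at 1, so the first model is fitted on a single sample. The pure-Python sketch below reproduces the subset sizes; `n_samples = 20000` is a hypothetical value chosen for illustration, and starting the range at `step` is only one possible adjustment.

```python
n_samples = 20000   # hypothetical training-set size (assumption)
step = 5000
n_neighbors = 3

# Subset sizes produced by the loop in the question: the first one is 1.
original_sizes = list(range(1, n_samples, step))
print(original_sizes)   # [1, 5001, 10001, 15001]

# Sizes that are too small to supply n_neighbors neighbors.
too_small = [s for s in original_sizes if s < n_neighbors]
print(too_small)        # [1]

# One possible adjustment: start at `step`, so every subset holds at
# least n_neighbors samples and the full training set is included.
adjusted_sizes = list(range(step, n_samples + 1, step))
print(adjusted_sizes)   # [5000, 10000, 15000, 20000]
```

With sizes like `adjusted_sizes`, each call to `fit` sees at least `n_neighbors` samples, which is the condition the 0.16 error message checks.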