
Machine Learning Data Prediction (KNN Model)

Manly 2022. 6. 23. 11:19

Predicting Boston house prices

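# NOTE: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this code needs scikit-learn < 1.2
from sklearn.datasets import load_boston
import pandas as pd
import matplotlib.pyplot as plt
# House price is a continuous target, so the regressor is needed (not KNeighborsClassifier)
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split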

 

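1. Problem definition

  • Use the Boston data to predict house prices (regression)

2. Data collection

  • Use the Boston data loaded with load_boston

3. Data preprocessing

  • Build X_train, X_test, y_train, y_test

4. EDA (skipped)

5. Modeling

  • KNN regression model (experiment freely to find a good K)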
from sklearn.neighbors import KNeighborsRegressor
data = load_boston()
data

data.keys()
dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])

df = pd.DataFrame(data.data,columns=data.feature_names)
df

CRIM	ZN	INDUS	CHAS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT
0	0.00632	18.0	2.31	0.0	0.538	6.575	65.2	4.0900	1.0	296.0	15.3	396.90	4.98
1	0.02731	0.0	7.07	0.0	0.469	6.421	78.9	4.9671	2.0	242.0	17.8	396.90	9.14
2	0.02729	0.0	7.07	0.0	0.469	7.185	61.1	4.9671	2.0	242.0	17.8	392.83	4.03
3	0.03237	0.0	2.18	0.0	0.458	6.998	45.8	6.0622	3.0	222.0	18.7	394.63	2.94
4	0.06905	0.0	2.18	0.0	0.458	7.147	54.2	6.0622	3.0	222.0	18.7	396.90	5.33
...	...	...	...	...	...	...	...	...	...	...	...	...	...
501	0.06263	0.0	11.93	0.0	0.573	6.593	69.1	2.4786	1.0	273.0	21.0	391.99	9.67
502	0.04527	0.0	11.93	0.0	0.573	6.120	76.7	2.2875	1.0	273.0	21.0	396.90	9.08
503	0.06076	0.0	11.93	0.0	0.573	6.976	91.0	2.1675	1.0	273.0	21.0	396.90	5.64
504	0.10959	0.0	11.93	0.0	0.573	6.794	89.3	2.3889	1.0	273.0	21.0	393.45	6.48
505	0.04741	0.0	11.93	0.0	0.573	6.030	80.8	2.5050	1.0	273.0	21.0	396.90	7.88
506 rows × 13 columns

X = df
y = data.target

 

X_train,X_test, y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=1)

 

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(354, 13)
(152, 13)
(354,)
(152,)
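The 354/152 split comes from test_size=0.3: scikit-learn rounds the test share up, ceil(0.3 × 506) = 152, and gives the remaining 354 rows to training. A minimal check, using placeholder zero arrays shaped like the Boston data (X_demo and y_demo are illustrative names, not part of the original post):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as the Boston data (506 rows, 13 features);
# only the row count matters for the split sizes.
X_demo = np.zeros((506, 13))
y_demo = np.zeros(506)

# With a float test_size, the test set gets ceil(0.3 * 506) = 152 rows
# and the training set gets the remaining 354 rows.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=1)
print(X_tr.shape, X_te.shape, y_tr.shape, y_te.shape)
# (354, 13) (152, 13) (354,) (152,)
```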

k_model = KNeighborsRegressor()
k_model.fit(X_train,y_train)

KNeighborsRegressor()
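KNeighborsRegressor() is used with its defaults here: n_neighbors=5, uniform weights and Euclidean distance. The prediction for a new row is simply the average target of its 5 nearest training rows. The sketch below reimplements that idea in NumPy (knn_regress is a hypothetical helper written for illustration, and the data is random rather than Boston) and compares it with scikit-learn, assuming no distance ties, which holds for continuous random features.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def knn_regress(X_train, y_train, X_query, k=5):
    """Hypothetical helper: predict each query row as the plain mean of the
    targets of its k nearest training rows (Euclidean distance, uniform weights)."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    dists = np.sqrt(((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2))
    # Indices of the k smallest distances for each query row
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Average the neighbours' target values -> shape (n_query,)
    return y_train[nearest].mean(axis=1)

# Small random data set standing in for the Boston features
rng = np.random.default_rng(0)
X_tr = rng.normal(size=(100, 3))
y_tr = X_tr @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
X_q = rng.normal(size=(10, 3))

manual = knn_regress(X_tr, y_tr, X_q, k=5)
sk = KNeighborsRegressor(n_neighbors=5).fit(X_tr, y_tr).predict(X_q)
print(np.allclose(manual, sk))  # True
```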

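k_model.score(X_test,y_test)   # R² (coefficient of determination); the best possible score is 1

# The model explains roughly 50% of the variance in the test targets
# A rough rule of thumb: engineering -> R² above 0.7 is meaningful, social sciences -> above 0.3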

0.5099572909288324
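For a regressor, score() returns R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of y from its mean. So 1 is a perfect fit, 0 is no better than always predicting the mean, and a poor model can even go negative. A small sketch (r_squared is an illustrative helper and the data is synthetic) that checks this formula against model.score and sklearn.metrics.r2_score:

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.neighbors import KNeighborsRegressor

def r_squared(y_true, y_pred):
    """Illustrative helper: R² = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return 1 - ss_res / ss_tot

# Synthetic data, split 140/60 without shuffling
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X[:, 0] * 3 + rng.normal(size=200)
X_tr, X_te, y_tr, y_te = X[:140], X[140:], y[:140], y[140:]

model = KNeighborsRegressor().fit(X_tr, y_tr)
pred = model.predict(X_te)
print(np.isclose(r_squared(y_te, pred), model.score(X_te, y_te)))  # True
print(np.isclose(r_squared(y_te, pred), r2_score(y_te, pred)))     # True

# Always predicting the mean of y_te gives exactly R² = 0
print(r_squared(y_te, np.full_like(y_te, y_te.mean())))  # 0.0
```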
 
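# Compare train/test R² for odd k values from 1 to 49
test_l = []
train_l = []
k_values = range(1, 50, 2)
for k in k_values:
    k_model = KNeighborsRegressor(n_neighbors=k)
    k_model.fit(X_train, y_train)

    s1 = k_model.score(X_train, y_train)  # R² on the training data
    s2 = k_model.score(X_test, y_test)    # R² on the test data

    train_l.append(s1)
    test_l.append(s2)

# Small k overfits (train R² near 1), large k underfits (both curves drop)
plt.figure(figsize=(10, 10))
plt.plot(k_values, train_l, label='train')
plt.plot(k_values, test_l, label='test')
plt.xlabel('n_neighbors (k)')
plt.ylabel('R² score')
plt.legend()
plt.show()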
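Picking k by looking at the test curve uses the test set for model selection, which makes the final test score optimistic. A common alternative, sketched below, is to choose k by 5-fold cross-validation on the training set with GridSearchCV and touch the test set only once at the end. Two assumptions to flag: since load_boston is gone in recent scikit-learn, make_regression data (506 rows, 13 features) stands in for the Boston set, and a StandardScaler is added in front of the model because KNN distances are dominated by wide-range columns (in the table above TAX is in the hundreds while NOX is below 1). This is one reasonable setup, not the method used in the post.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Boston data: 506 rows, 13 features
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Standardize features first so no single column dominates the distance
pipe = make_pipeline(StandardScaler(), KNeighborsRegressor())

# Same odd k values as the loop above, scored by cross-validated R² on training data only
param_grid = {"kneighborsregressor__n_neighbors": list(range(1, 50, 2))}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

print("best k:", search.best_params_["kneighborsregressor__n_neighbors"])
print("mean CV R²:", search.best_score_)
# The test set is used once, after k has been chosen (refit on all training data)
print("test R²:", search.score(X_test, y_test))
```

Because the scaler sits inside the pipeline, it is refit on each training fold, so the validation folds never leak into the scaling statistics.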

 

 
