Classification with the MNIST Handwritten Digit Data

블루베리브라우니 2021. 7. 30. 18:40

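This is the handwritten digit classification problem that everyone tries at least once when first learning classification.

Quick training walkthroughs are easy to find all over the place,

so here I'll flesh things out a bit and go through it slowly, checking each step.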

Version used

tensorflow = 2.2.0

import tensorflow as tf
from tensorflow.keras.datasets.mnist import load_data
from tensorflow.keras.models import Sequential
from tensorflow.keras import models
from tensorflow.keras.layers import Dense, Input, Flatten
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.utils import plot_model


from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

# Load MNIST
(x_train_full, y_train_full), (x_test, y_test) = load_data(path='mnist.npz')

# Split into training data (70%) and validation data (30%)
x_train, x_val, y_train, y_val = train_test_split(x_train_full, y_train_full, test_size=0.3)

# Check the number of samples
print('Total data: {}\tLabels: {}'.format(x_train_full.shape, y_train_full.shape))
print('Training data: {}\tLabels: {}'.format(x_train.shape, y_train.shape))
print('Validation data: {}\tLabels: {}'.format(x_val.shape, y_val.shape))
print('Test data: {}\tLabels: {}'.format(x_test.shape, y_test.shape))

Total data: (60000, 28, 28) Labels: (60000,)

Training data: (42000, 28, 28) Labels: (42000,)

Validation data: (18000, 28, 28) Labels: (18000,)

Test data: (10000, 28, 28) Labels: (10000,)

We can confirm that there are 60,000 images of 28*28 pixels in total.
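To see roughly what a 70/30 split does, here is a minimal NumPy sketch. It only illustrates a shuffled split and is not how train_test_split is implemented; the helper name shuffle_split, the seed, and the dummy arrays are assumptions made for this example.

```python
import numpy as np

def shuffle_split(x, y, val_ratio=0.3, seed=0):
    """Hypothetical helper: shuffle the indices once, then cut them into two parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_val = int(round(len(x) * val_ratio))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return x[train_idx], x[val_idx], y[train_idx], y[val_idx]

# Dummy arrays with the same shapes as the MNIST training set
x_dummy = np.zeros((60000, 28, 28), dtype=np.uint8)
y_dummy = np.arange(60000) % 10

x_tr, x_va, y_tr, y_va = shuffle_split(x_dummy, y_dummy)
print(x_tr.shape, y_tr.shape)  # (42000, 28, 28) (42000,)
print(x_va.shape, y_va.shape)  # (18000, 28, 28) (18000,)
```

Fixing the seed makes the split repeatable; with train_test_split the random_state argument plays the same role.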

# Look at 5 random samples out of the 60,000

plt.style.use('seaborn-white')  # on matplotlib >= 3.6 this style is named 'seaborn-v0_8-white'
num_sample = 5

random_idxs = np.random.randint(60000, size=num_sample)

plt.figure(figsize=(num_sample*3, num_sample*2))
for i, idx in enumerate(random_idxs):
    img = x_train_full[idx, :]
    label = y_train_full[idx]
    
    plt.subplot(1, len(random_idxs), i+1)
    plt.imshow(img)
    plt.title("Index: {}, Label: {}".format(idx, label))

This displays 5 random samples out of the 60,000; rerunning the cell shows a different set of 5.

To change how many samples are shown, adjust num_sample.

# Data preprocessing
x_train = x_train / 255.
x_val = x_val / 255.
x_test = x_test / 255.

y_train = to_categorical(y_train)
y_val = to_categorical(y_val)
y_test = to_categorical(y_test)

Preprocessing x rescales every pixel of each 28*28 image from the 0~255 range to the 0~1 range.

(For readability, the values were truncated to one decimal place.)
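As a small check of the rescaling step, the sketch below divides a few made-up uint8 pixel values by 255. The pixel values are invented for illustration and are not taken from an actual MNIST image.

```python
import numpy as np

# A tiny stand-in for a few pixels of an MNIST image (uint8, values 0-255)
pixels = np.array([0, 51, 128, 255], dtype=np.uint8)

# True division by a float promotes the uint8 values to floating point
scaled = pixels / 255.

print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
print(np.round(scaled, 1))         # rounded to one decimal place for display
```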

The to_categorical() function used to preprocess y

creates an array of zeros and puts a 1 only at the position of the label (one-hot encoding).
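The same idea can be written in a few lines of NumPy. This is only a sketch of one-hot encoding; the helper name one_hot is hypothetical and it is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Hypothetical helper: one row of zeros per label, with a 1 at the label's index."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

y = np.array([5, 0, 4])
y_onehot = one_hot(y)
print(y_onehot.shape)               # (3, 10)
print(y_onehot[0])                  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
print(np.argmax(y_onehot, axis=1))  # [5 0 4]
```

np.argmax undoes the encoding, which is why the later cells call np.argmax on y_test to get the digit back.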

# Build the model
model = Sequential([Input(shape=(28, 28), name='input'),
                  Flatten(name='flatten'),  # input shape is already given by Input
                  Dense(100, activation='relu', name='dense1'),
                  Dense(64, activation='relu', name='dense2'),
                  Dense(32, activation='relu', name='dense3'),
                  Dense(10, activation='softmax', name='output')])

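After the input layer, Flatten converts each 28*28 image into a 1-D array of 784 values,

and Dense layers are used to build the fully connected layers.

All hidden layers use relu as the activation function,

and since this is multi-class classification, the output layer uses softmax.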
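For intuition, here is a rough NumPy sketch of what flattening and softmax do to the data. The score values are made up, and the softmax helper is written for this example rather than taken from Keras.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

# Flatten: a batch of 2 images of 28x28 becomes 2 vectors of 784 values
batch = np.zeros((2, 28, 28))
flat = batch.reshape(len(batch), -1)
print(flat.shape)  # (2, 784)

# Softmax: arbitrary scores for 10 classes become probabilities that sum to 1
scores = np.array([[2.0, 1.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
probs = softmax(scores)
print(np.argmax(probs, axis=-1))  # [0]
print(np.isclose(probs.sum(), 1.0))  # True
```

The class with the largest score keeps the largest probability, so taking argmax of the softmax output gives the predicted digit.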

model.summary()

The summary() function shows the number of parameters in each layer.
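The parameter counts can also be worked out by hand: each Dense layer has (inputs * units) weights plus one bias per unit. A short sketch, assuming the layer widths of the model above:

```python
# Layer widths of the model above: 784 flattened inputs, then 100, 64, 32 and 10 units
layer_sizes = [28 * 28, 100, 64, 32, 10]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    params = n_in * n_out + n_out  # weights + biases
    total += params
    print(f"{n_in:>4} -> {n_out:>3}: {params:>6} parameters")

print("total:", total)  # total: 87374
```

Flatten itself has no trainable parameters, so it does not add to the total.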

plot_model(model)

The plot_model() function draws a diagram of the model (it needs pydot and Graphviz to be installed).

model.compile(loss='categorical_crossentropy',
             optimizer='sgd',
             metrics=['accuracy'])

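This configures how the model is trained.

The loss (loss function) is categorical_crossentropy,

the optimizer is sgd,

and the metric (evaluation measure) is accuracy.

For binary classification such as dog vs. cat, binary_crossentropy is often used as the loss instead,

and depending on the model, other optimizers such as adam are also common.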
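To get a feel for what categorical cross-entropy measures, here is a minimal NumPy sketch. The probabilities are invented, and the eps clipping is an assumption added to avoid log(0); it is not meant to reproduce the exact Keras computation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred)); eps avoids log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

# Two samples, three classes, one-hot labels
y_true = np.array([[0., 0., 1.], [1., 0., 0.]])
confident = np.array([[0.05, 0.05, 0.90], [0.80, 0.10, 0.10]])
unsure = np.array([[0.30, 0.30, 0.40], [0.40, 0.30, 0.30]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident)  # small: high probability on the correct class
print(loss_unsure)     # larger: probability spread across classes
```

Both sets of predictions pick the right class, but the loss is lower when more probability is placed on it, which is why loss can keep improving even when accuracy barely moves.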

# Train the model
history = model.fit(x_train, y_train,
                    epochs=30,
                    batch_size=128,
                    validation_data=(x_val, y_val))

...

The model was trained for 30 epochs,

with batch_size set to 128.
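With these settings the number of weight updates follows from simple arithmetic. A quick sketch, assuming the 42,000 training samples from the split above:

```python
import math

n_train = 42000   # training samples after the 70/30 split
batch_size = 128
epochs = 30

steps_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)           # 329
print(last_batch)                # 16 (the final batch of each epoch is smaller)
print(steps_per_epoch * epochs)  # 9870 updates in total
```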

Training in progress...

Zone out for about 30 seconds and it's done.

history.history.keys()

dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])

history_dict = history.history

loss = history_dict['loss']
val_loss = history_dict['val_loss']

epochs = range(1, len(loss)+1)
fig = plt.figure(figsize=(12, 6))

ax1 = fig.add_subplot(1, 2, 1)
ax1.plot(epochs, loss, color='blue', label='train_loss')
ax1.plot(epochs, val_loss, color='red', label='val_loss')
ax1.set_title('Train and Validation Loss')
ax1.set_xlabel('Epochs')
ax1.set_ylabel('Loss')
ax1.grid()
ax1.legend()

accuracy = history_dict['accuracy']
val_accuracy = history_dict['val_accuracy']

ax2 = fig.add_subplot(1, 2, 2)
ax2.plot(epochs, accuracy, color='blue', label='train_accuracy')
ax2.plot(epochs, val_accuracy, color='red', label='val_accuracy')
ax2.set_title('Train and Validation Accuracy')
ax2.set_xlabel('Epochs')
ax2.set_ylabel('Accuracy')
ax2.grid()
ax2.legend()

plt.show()

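Pay attention to the red line, which is the validation curve.

Its loss keeps decreasing and its accuracy keeps increasing, so the model has probably not finished learning yet (it is still underfitting).

The values are already fairly high, though, so let's just move on.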
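The same reading of the curve can be done in code. The sketch below uses two hypothetical helpers, best_epoch and still_improving, and made-up loss values (not the numbers from this run) to check whether the lowest validation loss falls in the last few epochs.

```python
def best_epoch(val_loss):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(len(val_loss)), key=val_loss.__getitem__) + 1

def still_improving(val_loss, window=3):
    """True if the best validation loss falls within the last `window` epochs."""
    return best_epoch(val_loss) > len(val_loss) - window

# Made-up curves for illustration only
decreasing = [0.90, 0.60, 0.45, 0.38, 0.33, 0.30]
overfitting = [0.90, 0.50, 0.34, 0.35, 0.40, 0.47]

print(best_epoch(decreasing), still_improving(decreasing))    # 6 True
print(best_epoch(overfitting), still_improving(overfitting))  # 3 False
```

On the real run, history.history['val_loss'] could be passed in; a True result would suggest that training for more epochs may still help.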

# Evaluate the model
model.evaluate(x_test, y_test)
print(x_test.shape, y_test.shape)

313/313 [==============================] - 0s 1ms/step - loss: 0.1464 - accuracy: 0.9565
(10000, 28, 28) (10000, 10)

Evaluating the model on the test data gives

loss: 0.1464 and accuracy: 0.9565.
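Two numbers in that output can be reproduced by hand. The sketch below shows where the 313 steps come from (evaluate() uses a batch size of 32 by default) and how accuracy follows from comparing argmax values; the small arrays are invented for illustration.

```python
import math
import numpy as np

# 10,000 test samples in batches of 32 (the default batch size of evaluate())
steps = math.ceil(10000 / 32)
print(steps)  # 313

# Accuracy: fraction of samples whose predicted class matches the true class
y_true = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_prob = np.array([[0.1, 0.2, 0.7],
                   [0.6, 0.3, 0.1],
                   [0.5, 0.4, 0.1],   # wrong: true class is 1, predicted 0
                   [0.2, 0.2, 0.6]])

acc = np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1))
print(acc)  # 0.75
```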

num_sample = 4

random_idxs = np.random.randint(10000, size=num_sample)


# Predict once for all sampled images instead of once per loop iteration
pred_ysi = model.predict(x_test[random_idxs])
arg_pred_yi = np.argmax(pred_ysi, axis=1)

plt.figure(figsize=(num_sample*3, num_sample*2))
for i, idx in enumerate(random_idxs):
    img = x_test[idx, :]
    label = y_test[idx]  # one-hot vector, so argmax recovers the digit

    plt.subplot(1, len(random_idxs), i+1)
    plt.imshow(img)
    plt.title("Label: {}, predicted label: {}".format(np.argmax(label), arg_pred_yi[i]))

You can run this cell repeatedly to look at other random samples.

Of course, some predictions may turn out wrong.

# Confusion Matrix
from sklearn.metrics import confusion_matrix
import seaborn as sns
sns.set(style='white')

plt.figure(figsize=(9, 9))
pred_ys = model.predict(x_test)
arg_pred_y = np.argmax(pred_ys, axis=1)
cm = confusion_matrix(np.argmax(y_test, axis=-1), arg_pred_y)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

You can also import scikit-learn (sklearn) and check the confusion matrix.
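To show what the matrix contains, here is a small NumPy sketch that counts (true, predicted) pairs by hand. The helper name confusion and the toy labels are assumptions for this example; for real use, sklearn's confusion_matrix as in the cell above is the simpler choice.

```python
import numpy as np

def confusion(y_true, y_pred, num_classes):
    """Hypothetical helper: rows are true labels, columns are predicted labels."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # add 1 for every (true, predicted) pair
    return cm

# Toy labels for 3 classes
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2])

cm = confusion(y_true, y_pred, num_classes=3)
print(cm)

# Per-class recall: correct predictions on the diagonal divided by each row total
recall = cm.diagonal() / cm.sum(axis=1)
print(recall)
```

Off-diagonal cells show which digits get mistaken for which, which is the main thing to look for in the heatmap.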

model.save('mnist_model.h5')

This saves the trained model to an HDF5 file (.h5).
