KY Portfolio

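In Colab, I needed to tidy up the steps for taking a dataset that arrives as a zip file and sorting its image files into the right folders. I don't understand several of the modules involved well enough yet, so writing it down like this seemed like the only way I'd be able to reuse it T_T

Later on, I plan to write up notes on os, shutil, Path, yaml, glob, random, and json.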

Mounting Google Drive

from google.colab import drive
drive.mount('/content/drive')

Using the split-folders module to create train, valid, and test folders, each split into per-class subfolders

import os
import zipfile

drive_path = '/content/drive/MyDrive/Datasets/'

zipfile_name = 'zip file name'  # placeholder: the zip file's name without the .zip extension
dataset_path = f'/content/Datasets/{zipfile_name}'

# drive_path already ends with a slash, so no extra separator is needed
extract_path = f'{drive_path}{zipfile_name}'
zipfile_path = f'{drive_path}{zipfile_name}.zip'

# split-folders names its output folders train, val, and test
train_path = f'{dataset_path}/train/'
valid_path = f'{dataset_path}/val/'
test_path = f'{dataset_path}/test/'

# makedirs also creates missing parent folders such as /content/Datasets
for path in (dataset_path, extract_path, train_path, valid_path, test_path):
    os.makedirs(path, exist_ok=True)

! pip install split-folders

# Extract the zip file
with zipfile.ZipFile(zipfile_path) as data:
    data.extractall(path=extract_path)

# Split into folders (80% train, 10% val, 10% test)
import splitfolders
splitfolders.ratio(input=extract_path, output=dataset_path, seed=2023, ratio=(.8, .1, .1))
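After the split it can be reassuring to check how many images ended up in each folder. The following is only a minimal sketch: `count_images` is a hypothetical helper name, and it assumes the layout `<output>/<split>/<class>/<image>` with PNG files, as in the rest of this post.

```python
from collections import Counter
from pathlib import Path


def count_images(dataset_dir, pattern="*.png"):
    """Count files matching `pattern` per (split, class) under dataset_dir.

    Assumes the layout dataset_dir/<split>/<class>/<image>.
    """
    counts = Counter()
    for path in Path(dataset_dir).glob(f"*/*/{pattern}"):
        # parts[-3] is the split folder, parts[-2] is the class folder
        split, label = path.parts[-3], path.parts[-2]
        counts[(split, label)] += 1
    return counts
```

Calling `count_images(dataset_path)` and printing the sorted items would also show which split folder names were actually created on disk.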

Using a DataFrame

import pandas as pd
from pathlib import Path

def make_path_label_df(filepaths):
    # The class label is the name of the folder that directly contains each image
    labels = [Path(filepath).parent.name for filepath in filepaths]

    filepaths = pd.Series(filepaths, name='Filepath').astype(str)
    labels = pd.Series(labels, name='Label')

    df = pd.concat([filepaths, labels], axis=1)
    # Shuffle the rows with a fixed seed and rebuild the index
    df = df.sample(frac=1, random_state=0).reset_index(drop=True)
    return df

dir_ = Path(train_path)
filepaths = list(dir_.glob(r'**/*.png'))
train_df = make_path_label_df(filepaths)

dir_ = Path(test_path)
filepaths = list(dir_.glob(r'**/*.png'))
test_df = make_path_label_df(filepaths)

# valid_path is the name defined earlier (val_path was never defined)
dir_ = Path(valid_path)
filepaths = list(dir_.glob(r'**/*.png'))
val_df = make_path_label_df(filepaths)
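The `**/*.png` pattern above only picks up lowercase `.png` files, and `glob` does not promise any particular order, so the fixed `random_state` shuffle is only repeatable if the input list is ordered the same way each time. Below is a rough sketch of one way to handle both; `list_images` and `IMAGE_EXTENSIONS` are hypothetical names, and the set of extensions is an assumption to adjust for the actual dataset.

```python
from pathlib import Path

# Assumed set of extensions; adjust to whatever the dataset actually contains.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def list_images(root, extensions=IMAGE_EXTENSIONS):
    """Return image paths under root in sorted order.

    Suffixes are compared case-insensitively, so .PNG and .png both match.
    """
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in extensions
    )
```

The result could be passed to `make_path_label_df` in place of `list(dir_.glob(r'**/*.png'))`.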

Using arrays

import numpy as np
import pandas as pd
from pathlib import Path
from tensorflow import keras

def split_xy(filepaths):
    # The class label is the name of the folder that directly contains each image
    labels = [Path(filepath).parent.name for filepath in filepaths]

    # Load each image at 280x280 (target_size is (height, width)) and collect them in one array
    imgs = np.array([keras.utils.img_to_array(keras.utils.load_img(str(filepath), target_size=(280, 280)))
                     for filepath in filepaths])

    # Encode the labels as integers: normal -> 0, abnormal -> 1
    labels = pd.Series(labels, name='Label').map({'normal': 0, 'abnormal': 1}).to_numpy()
    return imgs, labels

# dataset_path already contains zipfile_name; split-folders names the validation folder val
train_path = f'{dataset_path}/train/'
valid_path = f'{dataset_path}/val/'
test_path = f'{dataset_path}/test/'

dir_ = Path(train_path)
filepaths = list(dir_.glob(r'**/*.png'))
x_train, y_train = split_xy(filepaths)

dir_ = Path(valid_path)
filepaths = list(dir_.glob(r'**/*.png'))
x_valid, y_valid = split_xy(filepaths)

dir_ = Path(test_path)
filepaths = list(dir_.glob(r'**/*.png'))
x_test, y_test = split_xy(filepaths)
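The label map in `split_xy` is hard-coded to the two classes `normal` and `abnormal`. For a dataset with different class folders, one option is to derive the mapping from the folder names. This is only a sketch under that assumption, and `build_class_index` is a hypothetical helper name.

```python
from pathlib import Path


def build_class_index(split_dir):
    """Map each class folder name under split_dir to an integer, in sorted order."""
    class_names = sorted(p.name for p in Path(split_dir).iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(class_names)}
```

Note that alphabetical order would assign `abnormal` to 0 and `normal` to 1, the reverse of the hard-coded map, so the explicit dictionary is the safer choice whenever a specific encoding matters.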

Attribution, NonCommercial, NoDerivatives

KT AIVLE School Week 7 Summary - Organizing Image Files in Colab
