KY Portfolio

Missing value analysis

import matplotlib.pyplot as plt
import missingno as msno

ax = msno.matrix(df)  # one row per record; white gaps mark missing values
plt.show()

# Save to file
# from time import time, localtime
# today = localtime(time())
# ax.get_figure().savefig(f'images/mlpr_{today.tm_mon}{today.tm_mday}.png')

fig, ax = plt.subplots(figsize=(16, 6))
(1 - df.isna().mean()).plot.bar(ax=ax)  # fraction of non-missing values per column

# Save to file
# from time import time, localtime
# today = localtime(time())
# fig.savefig(f'images/mlpr_{today.tm_mon}{today.tm_mday}.png', dpi=300)
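
The plots above show where values are missing; a numeric summary is often handy alongside them. A minimal sketch, assuming a toy DataFrame (the column names and values here are hypothetical, not from the original data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing entries
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, np.nan],
    "fare": [7.25, 71.28, np.nan, 8.05],
    "name": ["A", "B", "C", "D"],
})

# Count and percentage of missing values per column, worst column first
summary = pd.DataFrame({
    "n_missing": df.isna().sum(),
    "pct_missing": df.isna().mean() * 100,
}).sort_values("pct_missing", ascending=False)

print(summary)
```

Sorting by the missing percentage makes it easier to decide which columns to impute and which to drop.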

Missing value imputation

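import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must run before IterativeImputer is used)
from sklearn import impute

target = ['column_name']  # list of numeric columns; a list keeps the selection 2-D, which scikit-learn requires
imputer = impute.IterativeImputer(
    missing_values=np.nan,     # value to treat as missing
    initial_strategy='mean',   # initial fill strategy; 'most_frequent' is another option
    verbose=0,
)
imputed = imputer.fit_transform(train_x[target])  # fit on the training data only
train_x.loc[:, target] = imputed
imputed = imputer.transform(test_x[target])       # reuse the fitted imputer; do not refit on the test set
test_x.loc[:, target] = imputed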
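
IterativeImputer estimates each column from the others, so it only does more than the initial fill when several columns are passed together. A rough sketch with synthetic data (the columns `a` and `b` and the random data are assumptions for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Synthetic data: b is roughly 2 * a, with some values of b removed
rng = np.random.default_rng(0)
x = rng.normal(size=100)
train_x = pd.DataFrame({"a": x, "b": 2 * x + rng.normal(scale=0.1, size=100)})
train_x.loc[::10, "b"] = np.nan
test_x = pd.DataFrame({"a": [0.5, -1.0], "b": [np.nan, -2.0]})

cols = ["a", "b"]
imputer = IterativeImputer(random_state=0)
train_x[cols] = imputer.fit_transform(train_x[cols])  # learn the a-b relationship on train
test_x[cols] = imputer.transform(test_x[cols])        # apply it to test without refitting

print(test_x)
```

The missing `b` in the first test row should land near 1.0, since the imputer learned from the training set that `b` tracks `2 * a`. Observed values are left unchanged.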

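drop_cols = ['column_name1', 'column_name2']
df = df.drop(columns=drop_cols)  # no inplace=True here: with it, drop() returns None
df = df.dropna()                 # drop rows that contain any missing value
df = df.dropna(axis=1)           # drop columns that contain any missing value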
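
The two `dropna` calls behave quite differently, so it can help to compare them on a small frame before applying them to real data. A minimal sketch with hypothetical columns `a`, `b`, `c`; the `thresh` variant is an extra option not used in the notes above:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: a has 1 NaN, b has none, c has 2 NaN
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, 6.0],
    "c": [np.nan, np.nan, 9.0],
})

rows_dropped = df.dropna()                 # keep only rows with no missing value
cols_dropped = df.dropna(axis=1)           # keep only columns with no missing value
mostly_full = df.dropna(axis=1, thresh=2)  # keep columns with at least 2 non-missing values

print(rows_dropped.shape, list(cols_dropped.columns), list(mostly_full.columns))
```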

Creating a categorical column that flags missing values

def add_missing_indicator(col):
    def wrapper(df):
        return df[col].isna().astype(int)
    return wrapper

df = df.assign(column_name_missing=add_missing_indicator('column_name'))
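
The indicator has to be created before imputing, otherwise the information about which rows were missing is lost. A small sketch, assuming a hypothetical `age` column and a median fill:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"age": [22.0, np.nan, 35.0]})

# 1) Record which rows were missing, 2) then fill the gaps
df = df.assign(age_missing=lambda d: d["age"].isna().astype(int))
df["age"] = df["age"].fillna(df["age"].median())

print(df)
```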

Renaming columns

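import janitor as jn
df = jn.clean_names(df)  # cleans column names to snake case (but does not handle leading/trailing whitespace)

def clean_col(name):
    return name.strip().lower().replace(' ', '_')  # strip surrounding whitespace, lowercase, snake case

df = df.rename(columns=clean_col)  # rename() returns a new frame, so assign the result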
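
A quick check of `clean_col` on a few messy names, as a sketch (the column names are made up for illustration):

```python
import pandas as pd

def clean_col(name):
    # Strip surrounding whitespace, lowercase, replace inner spaces with underscores
    return name.strip().lower().replace(" ", "_")

# Hypothetical messy column names
df = pd.DataFrame(columns=[" Passenger Id ", "Home Dest", "fare"])
df = df.rename(columns=clean_col)

print(list(df.columns))  # ['passenger_id', 'home_dest', 'fare']
```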

Extracting with regular expressions

# Extract the letters immediately before '.' in the name column
df['column_name'].str.extract(
    r"([A-Za-z]+)\.", expand=False
).head()

"""
0	Miss
1	Mr
...
"""
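
The same pattern can be tried on a standalone Series. A minimal sketch, assuming name strings in a "Surname, Title. Given names" layout (the sample names are hypothetical):

```python
import pandas as pd

# Hypothetical names; the last one has no title to match
names = pd.Series([
    "Allen, Miss. Elisabeth Walton",
    "Allison, Mr. Hudson Joshua",
    "no title here",
])

# Capture the run of letters immediately before the first '.'
titles = names.str.extract(r"([A-Za-z]+)\.", expand=False)

print(titles)
```

With `expand=False` and a single capture group the result is a Series, and rows without a match come back as NaN.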

License: Attribution, NonCommercial, NoDerivatives

KT AIVLE School Week 6 Summary - Advanced Preprocessing