
[CatBoost] A Closer Look

by Yoojacha 2023. 3. 9.

Here are my notes on CatBoost, whose remarkable classification performance thoroughly humbled me during the second mini-project at KT AIVLE School.


Using Pool

from catboost import CatBoostClassifier as CBC, Pool

# Wrap each split in CatBoost's Pool container; the test pool carries no labels.
train_pool = Pool(x_train, y_train)
eval_pool = Pool(x_val, y_val)
test_pool = Pool(x_test)

model = CBC(iterations=100,
            # depth=2, 
            # learning_rate=1, 
            loss_function='Logloss',
            random_seed=1,
            task_type="GPU",
            verbose=True)

model.fit(
    train_pool,
    # cat_features=cat_features,
    eval_set=eval_pool,
    plot=True
)

y_pred = model.predict(test_pool, prediction_type='Class')
y_pred_proba = model.predict(test_pool, prediction_type='Probability')
y_preds_raw_vals = model.predict(test_pool, prediction_type='RawFormulaVal')
print("Class", y_pred)
print("Proba", y_pred_proba)
print("Raw", y_preds_raw_vals)

Using GridSearchCV

import numpy as np
from catboost import CatBoostClassifier as CBC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import log_loss, make_scorer

model = CBC(task_type='GPU', border_count=None, random_seed=0)

params = {
          'iterations': [600, 700, 800, 900],
          'depth': [3, 4, 5, 6],
          'loss_function': ['MultiClass'],
          'l2_leaf_reg': np.logspace(-20, -19, 3),
          'leaf_estimation_iterations': [10],
          'eval_metric': ['Accuracy'],
          'logging_level':['Silent'],
         }

LogLoss = make_scorer(log_loss, greater_is_better=False, needs_proba=True)

grid_search = GridSearchCV(model,
                           param_grid=params,
                           scoring=LogLoss,
                           # note: learning_rate belongs on the estimator or in
                           # the param grid, not on GridSearchCV itself
                           cv=5,
                           verbose=True)

history = grid_search.fit(x_train, y_train)

A proper write-up on Pool and CV is coming later!

Reference: Quick start — catboost.ai
"Use one of the following examples after installing the Python package to get started: CatBoostClassifier, CatBoostRegressor, CatBoost."
