XGBoost Optimization with Optuna, and Feature Importance
This article is still a work in progress, but I am posting it provisionally.
Incidentally, xgboost appears to handle NaN missing values natively, so they can be left in the data as-is.
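As a minimal sketch of this behavior (the toy arrays here are my own, not from the original post), an XGBRegressor can be fit on data containing np.nan without any imputation:

import numpy as np
import xgboost as xgb

# Toy data with missing values; xgboost learns a default branch
# direction for NaNs at each split
X_demo = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0], [6.0, np.nan]])
y_demo = np.array([1.0, 2.0, 3.0, 4.0])

model = xgb.XGBRegressor(objective='reg:squarederror')
model.fit(X_demo, y_demo)        # no imputation step needed
print(model.predict(X_demo))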
Optimization
I searched for settings that actually run by trial and error.
import numpy as np
import xgboost as xgb

# kf (a cross-validation splitter), scaler (a fitted scaler), and the
# arrays X, y are assumed to be defined beforehand; the rmspe scorer is
# sketched just below this block.
def objective_xgb(trial):
    if trial.number == 0:
        # Pin every parameter to a single (near-default) value so that
        # trial 0 records a baseline configuration
        learning_rate = trial.suggest_loguniform('learning_rate', 0.3, 0.3)
        gamma = trial.suggest_loguniform('gamma', 1e-8, 1e-8)
        max_depth = trial.suggest_int('max_depth', 6, 6)
        min_child_weight = trial.suggest_loguniform('min_child_weight', 1.0, 1.0)
        #max_delta_step = trial.suggest_uniform('max_delta_step', 1e-10, 1e-10)
        subsample = trial.suggest_uniform('subsample', 1.0, 1.0)
        reg_lambda = trial.suggest_uniform('reg_lambda', 1.0, 1.0)
        reg_alpha = trial.suggest_uniform('reg_alpha', 0.0, 0.0)
    else:
        learning_rate = trial.suggest_loguniform('learning_rate', 1e-8, 1.0)
        gamma = trial.suggest_loguniform('gamma', 1e-15, 1e-5)
        max_depth = trial.suggest_int('max_depth', 1, 20)
        min_child_weight = trial.suggest_loguniform('min_child_weight', 1e-8, 1e3)
        #max_delta_step = trial.suggest_uniform('max_delta_step', 0, 1.0)
        subsample = trial.suggest_uniform('subsample', 0.0, 1.0)
        reg_lambda = trial.suggest_uniform('reg_lambda', 0.0, 1000.0)
        reg_alpha = trial.suggest_uniform('reg_alpha', 0.0, 1000.0)
        #reg_alpha = trial.suggest_loguniform('reg_alpha', 1e-15, 1e4)

    clf = xgb.XGBRegressor(
        learning_rate=learning_rate,
        subsample=subsample,
        max_depth=max_depth,
        min_child_weight=min_child_weight,
        max_delta_step=0,  # fixed at 0 because 1e-10 caused divergence
        reg_lambda=reg_lambda,
        gamma=gamma,
        reg_alpha=reg_alpha,
        #objective='reg:squarederror'
    )

    # Cross-validated evaluation: mean RMSPE over the folds
    scores = []
    for train_index, test_index in kf.split(X, y):
        X_train = scaler.transform(X[train_index])
        y_train = y[train_index]
        X_test = scaler.transform(X[test_index])
        y_test = y[test_index]
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        scores.append(rmspe(y_test, y_pred))
    return np.mean(np.array(scores))
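Pinning each range to a single value in trial 0 makes the very first trial evaluate a baseline configuration; Optuna's define-by-run API allows the suggested ranges to differ from trial to trial, which is what makes this trick work. The rmspe scorer used above is not defined in this draft; a minimal sketch of the usual definition (my own, assuming y_true contains no zeros):

import numpy as np

def rmspe(y_true, y_pred):
    # Root Mean Squared Percentage Error
    return np.sqrt(np.mean(((y_true - y_pred) / y_true) ** 2))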
import optuna

optuna.logging.disable_default_handler()  # suppress Optuna's log output
#optuna.logging.enable_default_handler()  # re-enable Optuna's log output

n_trials = 5

# Run the study; create_study() minimizes by default, which matches
# returning the mean RMSPE from the objective
study = optuna.create_study()
study.optimize(objective_xgb, n_trials=n_trials)
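Once the study finishes, the best trial can be inspected and reused. A sketch, assuming the X, y, and scaler objects from above (best_model is a name introduced here, not part of the original post):

print(study.best_value)   # best mean RMSPE across the folds
print(study.best_params)  # hyperparameters of the best trial

# Refit on all of the data with the winning hyperparameters
best_model = xgb.XGBRegressor(**study.best_params, max_delta_step=0)
best_model.fit(scaler.transform(X), y)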
Feature importance evaluation
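This section is still being written. As a minimal sketch of one way to evaluate feature importances, xgboost ships a plotting helper, and the fitted model exposes an sklearn-style attribute (best_model is the refit model from the sketch above, an assumption of mine):

import matplotlib.pyplot as plt
import xgboost as xgb

# Gain-based importance is often more informative than the default
# split-count ('weight') importance
xgb.plot_importance(best_model, importance_type='gain')
plt.show()

# Normalized importances, one value per input feature
print(best_model.feature_importances_)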
Other notes
A deprecation warning like the following may appear:

WARNING: src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.

It concerns the loss-objective setting, and explicitly specifying objective='reg:squarederror' can resolve it:

regressor = XGBRegressor(tree_method='gpu_hist', random_state=0, objective='reg:squarederror')

Note that tree_method='gpu_hist' additionally requires a GPU-enabled build of xgboost.