
Taking aim at LightGBM and XGBoost! Stanford publishes the NGBoost algorithm

The Stanford ML Group recently published a new algorithm in their paper, with an implementation called NGBoost. The algorithm uses natural gradients to bring uncertainty estimation into gradient boosting. This article tries to understand the new algorithm and compares it with two other popular boosting algorithms, LightGBM and XGBoost, to see how it works in practice.


1. What is natural gradient boosting?

Base learners
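The article does not reproduce NGBoost's internals, but its design is modular: any scikit-learn-style regressor can serve as the base learner, and the default is a shallow decision tree fitted at each boosting stage. A minimal sketch of that default base learner (the depth-3 setting and the toy data are assumptions for illustration):

```python
# NGBoost plugs standard scikit-learn regressors in as base learners.
# Its default is a shallow decision tree (depth assumed to be 3 here);
# each boosting stage fits one such tree to the natural gradient.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

base_learner = DecisionTreeRegressor(max_depth=3)
base_learner.fit(X, y)

# A depth-3 tree partitions the inputs into at most 2**3 = 8 leaves,
# so its predictions take at most 8 distinct values.
preds = base_learner.predict(X)
print(len(np.unique(preds)))
```

Because each stage only fits such a weak learner, the ensemble improves gradually, which is the same weak-learner principle LightGBM and XGBoost rely on.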

2. Empirical validation: comparison with LightGBM and XGBoost


# import packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from ngboost.ngboost import NGBoost
from ngboost.learners import default_tree_learner
from ngboost.distns import Normal
from ngboost.scores import MLE
import lightgbm as lgb
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from math import sqrt

# feature engineering
tr, te = Nanashi_solution(df)

# NGBoost
ngb = NGBoost(Base=default_tree_learner, Dist=Normal, Score=MLE())
ngboost = ngb.fit(np.asarray(tr.drop(['SalePrice'], 1)),
                  np.asarray(tr.SalePrice))
y_pred_ngb = pd.DataFrame(ngb.predict(te.drop(['SalePrice'], 1)))

# LightGBM
ltr = lgb.Dataset(tr.drop(['SalePrice'], 1), label=tr['SalePrice'])
param = {
    'bagging_freq': 5,
    'bagging_fraction': 0.6,
    'bagging_seed': 123,
    'boost_from_average': 'false',
    'boost': 'gbdt',
    'feature_fraction': 0.3,
    'learning_rate': .01,
    'max_depth': 3,
    'metric': 'rmse',
    'min_data_in_leaf': 128,
    'min_sum_hessian_in_leaf': 8,
    'tree_learner': 'serial',
    'objective': 'regression',
    'verbosity': -1,
    'random_state': 123,
    'max_bin': 8,
    'early_stopping_round': 100
}
lgbm = lgb.train(param, ltr, num_boost_round=10000,
                 valid_sets=[ltr], verbose_eval=1000)
y_pred_lgb = lgbm.predict(te.drop(['SalePrice'], 1))

# XGBoost
params = {
    'max_depth': 4,
    'eta': 0.01,
    'objective': 'reg:squarederror',
    'eval_metric': ['rmse'],
    'booster': 'gbtree',
    'verbosity': 0,
    'sample_type': 'weighted',
    'max_delta_step': 4,
    'subsample': .5,
    'min_child_weight': 100,
    'early_stopping_round': 50
}
dtr = xgb.DMatrix(tr.drop(['SalePrice'], 1), label=tr.SalePrice)
dte = xgb.DMatrix(te.drop(['SalePrice'], 1), label=te.SalePrice)
num_round = 5000
xgbst = xgb.train(params, dtr, num_round, verbose_eval=500)
y_pred_xgb = xgbst.predict(dte)

# Check the results
print('RMSE: NGBoost',
      round(sqrt(mean_squared_error(te.SalePrice, y_pred_ngb)), 4))
print('RMSE: LGBM',
      round(sqrt(mean_squared_error(te.SalePrice, y_pred_lgb)), 4))
print('RMSE: XGBoost',
      round(sqrt(mean_squared_error(te.SalePrice, y_pred_xgb)), 4))

One of the biggest differences between NGBoost and other boosting algorithms is that it returns a probability distribution for each prediction. These distributions can be inspected with the pred_dist method, which exposes the probabilistic prediction so it can be visualised.

# see the probability distributions by visualising
Y_dists = ngb.pred_dist(te.drop(['SalePrice'], 1))
y_range = np.linspace(min(te.SalePrice), max(te.SalePrice), 200)
dist_values = Y_dists.pdf(y_range).transpose()

# plot the predicted distribution for a single observation, e.g. index 114
idx = 114
plt.plot(y_range, dist_values[idx])
plt.title(f"idx: {idx}")
plt.tight_layout()
plt.show()

NGBoost is a boosting algorithm that returns a probability distribution for each prediction.

NGBoost's predictive accuracy is highly competitive with other popular boosting algorithms.
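Those predicted distributions are what set NGBoost apart in practice: a Normal prediction carries a mean (loc) and standard deviation (scale), which can be turned into a prediction interval. A hypothetical sketch using scipy rather than the ngboost API, with made-up parameter values (in practice they would come from the fitted pred_dist output):

```python
# Turn a predicted Normal(loc, scale) into a 95% prediction interval.
# The loc/scale values below are invented for illustration only.
from scipy.stats import norm

loc, scale = 180000.0, 15000.0  # hypothetical predicted sale price and spread
lo, hi = norm.interval(0.95, loc=loc, scale=scale)

print(f"95% interval: [{lo:.0f}, {hi:.0f}]")
```

Point-prediction algorithms like LightGBM and XGBoost give only loc; the interval is what the extra distributional output buys you.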

References:

[1] T. Duan, et al., NGBoost: Natural Gradient Boosting for Probabilistic Prediction (2019), ArXiv 1910.03225

via: https://towardsdatascience.com/ngboost-explained-comparison-to-lightgbm-and-xgboost-fda510903e53@Peter_Dong