Gradient Boosting Revisited: Comparative Analysis of Selected Advances on Real-World Tabular Data

Moses Apambila Agebure; Japheth Kodua Wiredu; Stephen Akobre

doi:10.11648/j.mlr.20261101.14

Research Article | Peer-Reviewed

Published in Machine Learning Research (Volume 11, Issue 1)

Received: 26 April 2026 Accepted: 9 May 2026 Published: 12 June 2026

Abstract

Gradient Boosting has become one of the approaches designed to improve general predictive performance and to overcome specific learning challenges. Although the technique is mature, new adaptive variants are still being developed to enhance flexibility, efficiency, and overall predictive power. However, few benchmarking studies have sought to establish the generalization abilities of these techniques, especially the newer variants, under varying conditions. This study therefore conducts a systematic analysis of seven Gradient Boosting models (XGBoost, LightGBM, CatBoost, HistGradientBoosting, GradientBoosting, AdaBoost, and the adaptive MorphBoost) on ten benchmark datasets that pose different challenges. All models were trained using a fixed 80:20 train–test split, with 3-fold cross-validation performed solely on the training portion to estimate stability. Performance was measured using accuracy, F1-score, and ROC-AUC to support fair and reproducible comparison. The findings indicate that CatBoost produced the highest mean accuracy (0.9400) and a near-perfect ROC-AUC (0.9915), which suggests that it generalizes effectively across diverse data types. HistGradientBoosting is identified as the most stable model across datasets, combining good performance with computational efficiency, followed by LightGBM and XGBoost. MorphBoost shows promise on binary and high-dimensional datasets where its implementation is fully supported, though its current lack of native multiclass handling limits its general applicability. Overall, the research confirms that no single model fits all circumstances; rather, dataset characteristics directly influence model performance. These results offer practical guidance on the choice of boosting models and point to areas where future research, particularly on adaptive and hybrid boosting techniques, can further enhance performance and generalization.

DOI	10.11648/j.mlr.20261101.14
Page(s)	37-52
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2026. Published by Science Publishing Group

Keywords

Gradient Boosting, XGBoost, LightGBM, CatBoost, MorphBoost, NGBoost, Ensemble Learning, Stacking Ensemble, Tabular Data, Explainable AI

Cite This Article

APA Style

Agebure, M. A., Wiredu, J. K., & Akobre, S. (2026). Gradient Boosting Revisited: Comparative Analysis of Selected Advances on Real-World Tabular Data. Machine Learning Research, 11(1), 37–52. https://doi.org/10.11648/j.mlr.20261101.14

ACS Style

Agebure, M. A.; Wiredu, J. K.; Akobre, S. Gradient Boosting Revisited: Comparative Analysis of Selected Advances on Real-World Tabular Data. Mach. Learn. Res. 2026, 11(1), 37-52. doi: 10.11648/j.mlr.20261101.14

AMA Style

Agebure MA, Wiredu JK, Akobre S. Gradient Boosting Revisited: Comparative Analysis of Selected Advances on Real-World Tabular Data. Mach Learn Res. 2026;11(1):37-52. doi: 10.11648/j.mlr.20261101.14

@article{10.11648/j.mlr.20261101.14,
  author = {Moses Apambila Agebure and Japheth Kodua Wiredu and Stephen Akobre},
  title = {Gradient Boosting Revisited: Comparative Analysis of Selected Advances on Real-World Tabular Data},
  journal = {Machine Learning Research},
  volume = {11},
  number = {1},
  pages = {37-52},
  doi = {10.11648/j.mlr.20261101.14},
  url = {https://doi.org/10.11648/j.mlr.20261101.14},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.mlr.20261101.14},
 year = {2026}
}

TY  - JOUR
T1  - Gradient Boosting Revisited: Comparative Analysis of Selected Advances on Real-World Tabular Data

AU  - Moses Apambila Agebure
AU  - Japheth Kodua Wiredu
AU  - Stephen Akobre
Y1  - 2026/06/12
PY  - 2026
N1  - https://doi.org/10.11648/j.mlr.20261101.14
DO  - 10.11648/j.mlr.20261101.14
T2  - Machine Learning Research
JF  - Machine Learning Research
JO  - Machine Learning Research
SP  - 37
EP  - 52
PB  - Science Publishing Group
SN  - 2637-5680
UR  - https://doi.org/10.11648/j.mlr.20261101.14
VL  - 11
IS  - 1
ER  -

Author Information

Moses Apambila Agebure

Department of Computer Science, University of Technology and Applied Sciences, Navrongo, Ghana

http://orcid.org/0000-0003-3555-8349

Japheth Kodua Wiredu

Department of Computer Science, Regentropfen University College, Bolgatanga, Ghana

http://orcid.org/0009-0008-0313-5011

Stephen Akobre

Department of Cyber Security and Computer Engineering Technology, University of Technology and Applied Sciences, Navrongo, Ghana

http://orcid.org/0000-0003-3320-212X
