Abstract: Tabular data is the most prevalent form of structured data, necessitating robust models for classification and regression tasks. Traditional models like eXtreme Gradient Boosting (XGBoost) ...
TabPFGen is a Python library for generating high-quality synthetic tabular data using energy-based modeling and stochastic gradient Langevin dynamics (SGLD). It supports both classification and ...
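The snippet names energy-based modeling and SGLD as the mechanism behind TabPFGen. The sketch below shows only the generic Langevin sampling idea under stated assumptions. It is not TabPFGen's API or its actual energy function. It assumes a hypothetical toy quadratic energy E(x) = ||x - mu||^2 / (2 sigma^2), whose Boltzmann distribution exp(-E(x)) is a Gaussian N(mu, sigma^2 I). Each step moves samples down the energy gradient and adds Gaussian noise: x <- x - (step_size / 2) * grad E(x) + sqrt(step_size) * noise. Here the gradient is exact, so strictly this is unadjusted Langevin dynamics. SGLD proper substitutes a noisy or minibatch gradient estimate in the same update.

```python
import numpy as np


def energy_grad(x, mu, sigma):
    """Gradient of the hypothetical toy energy E(x) = ||x - mu||^2 / (2 sigma^2)."""
    return (x - mu) / sigma**2


def langevin_sample(n_samples, mu, sigma, step_size=0.01, n_steps=2000, seed=0):
    """Draw approximate samples from p(x) ∝ exp(-E(x)) with Langevin updates.

    Assumption: toy energy and hyperparameters chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    d = mu.shape[0]
    # Initialise every chain from standard normal noise.
    x = rng.standard_normal((n_samples, d))
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Langevin step: half-step down the energy gradient plus scaled Gaussian noise.
        x = x - 0.5 * step_size * energy_grad(x, mu, sigma) + np.sqrt(step_size) * noise
    return x


mu = np.array([2.0, -1.0])
samples = langevin_sample(5000, mu, sigma=0.5)
# The sample mean should approach mu, and the per-dimension std should approach sigma (about 0.5).
print(samples.mean(axis=0), samples.std(axis=0))
```

With a small step size, the chains settle near the target distribution. The discretisation leaves a slight bias: for these settings the stationary std is about 0.50 rather than exactly 0.5. A learned energy over tabular features would replace `energy_grad` in practice.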
Deep learning-based approaches to Tabular Data Learning (TDL), namely classification and regression, have shown performance competitive with their conventional counterparts. However, the latent ...
Abstract: Evaluating synthetic data requires careful consideration of both utility and privacy. This study analyzes 12 synthesizers across 17 tabular health datasets, providing large-scale, comparable ...