Abstract: Q-learning and double Q-learning are well-known sample-based, off-policy reinforcement learning algorithms. However, Q-learning suffers from overestimation bias, while double Q-learning ...
Abstract: Debt collection is utilized for risk control after credit card delinquency. The existing rule-based method tends to be myopic and non-adaptive due to the delayed feedback. Reinforcement ...