These vision-based systems cannot track controllers that move out of the camera’s field of view (out-of-FOV). To overcome this limitation, we combine sensor fusion with a learning-based model. Specifically, we place ultrasound sensors on the HMD and the controllers to obtain ranging information, and we fuse these range measurements with predictions from an auto-regressive forecasting model built on a recurrent neural network.
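
To make the idea of combining ranging information with model predictions concrete, the following is a minimal sketch and not the method used in this work. It rests on several assumptions: the recurrent forecaster is replaced by a constant-velocity extrapolation (`forecast_next`), the fusion rule is a simple least-squares refinement (`fuse`), and the sensor positions, `prior_weight`, `steps`, and `lr` are hypothetical values chosen for illustration only.

```python
import math


def forecast_next(history):
    """Stand-in for the learned forecaster (assumption, not the RNN itself):
    extrapolate the next position from the last two positions."""
    (x0, y0, z0), (x1, y1, z1) = history[-2], history[-1]
    return (2 * x1 - x0, 2 * y1 - y0, 2 * z1 - z0)


def range_residual(position, anchors, ranges):
    """Sum of squared differences between the measured ranges and the
    distances from `position` to each sensor location in `anchors`."""
    return sum((math.dist(position, a) - r) ** 2 for a, r in zip(anchors, ranges))


def fuse(predicted, anchors, ranges, prior_weight=0.1, steps=200, lr=0.1):
    """Refine a predicted position so that it agrees better with the ranges.

    Runs gradient descent on
        0.5 * prior_weight * ||p - predicted||^2 + 0.5 * sum_i (d_i(p) - r_i)^2,
    where d_i(p) is the distance from p to the i-th sensor location.
    """
    p = list(predicted)
    for _ in range(steps):
        # Prior term: pulls the estimate back toward the forecast.
        grad = [prior_weight * (p[i] - predicted[i]) for i in range(3)]
        # Range terms: push the estimate toward the measured distances.
        for a, r in zip(anchors, ranges):
            d = math.dist(p, a)
            if d < 1e-9:
                continue  # direction is undefined at the sensor itself
            err = d - r
            for i in range(3):
                grad[i] += err * (p[i] - a[i]) / d
        p = [p[i] - lr * grad[i] for i in range(3)]
    return tuple(p)


if __name__ == "__main__":
    # Hypothetical sensor locations on the HMD, in metres (HMD frame).
    anchors = [(0.08, 0.0, 0.0), (-0.08, 0.0, 0.0), (0.0, 0.05, 0.0), (0.0, -0.05, 0.05)]
    # Last two tracked controller positions before the controller left the FOV.
    history = [(0.30, -0.30, 0.20), (0.32, -0.34, 0.20)]
    # Simulated noise-free ranges to an assumed true position.
    true_pos = (0.30, -0.45, 0.25)
    ranges = [math.dist(true_pos, a) for a in anchors]

    predicted = forecast_next(history)
    fused = fuse(predicted, anchors, ranges)
    print("range residual before:", range_residual(predicted, anchors, ranges))
    print("range residual after: ", range_residual(fused, anchors, ranges))
```

Because the hypothetical sensors sit close together on the HMD, the ranges mainly constrain the distance of the controller from the headset, while the prior term keeps the estimate near the forecast in the directions that the ranges leave weakly determined. This is one reason why a forecasting model is a natural complement to ranging.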