...To reduce this learning time, the team of scientists sought to combine these two steps into one, using what they called a “forward gradient.” The idea is to pass the data only once through the neural network and calculate a direct approximation of the gradient from that path. Inevitably, since the entire network is not tracked, this reduces computation time.
The first calculations made in this way seem rather encouraging to them: “ From the point of view of automatic differentiation applied to machine learning, the ‘holy grail’ is whether the practical benefit of gradient descent can be achieved using only the forward gradient, thus eliminating the need for backpropagation. This could change the computational complexity of typical machine learning training pipelines, reduce training time and energy costs, influence the design of devices for machine learning, and even have implications with respect to the biological plausibility of backpropagation. “. ...