How does the backpropagation algorithm work in neural networks? Can you give an intuitive explanation of it? What limitations does it have compared with other optimization techniques?
Bonus: Can you derive the backpropagation algorithm formally and demonstrate its effectiveness?