Okay ... the answer really is "much better than other RNNs of any type". And tra...

MontyCarloHall · on May 23, 2023

RNNs have a theoretically infinite context size, but in practice it’s limited by vanishing gradients due to too much recursion. That’s why the recursive units are usually LSTMs or GRUs that have explicit functionality to cut off the recursion, using a learnable threshold that’s a function of both the current and recursive inputs.

In your example, x[n-1] = f(x[n-2], x[n-1]). So really, your expression should be

RNN: x[n] = f( … f( f( f( f(k0, x[0]), x[1]), x[2]), x[3]), …, x[n-1])

where k0 is a learnable parameter.