Context is, indeed, everything. But humans don't "do it" at all the way Transformers do. In my personal opinion, big contexts are a big problem for AI, kind of the way that "burning stuff"—though super effective for generating power—has become a big problem for humanity. It's true that human brains have access to large contexts, but those contexts aren't "on tap" at all times the way they are in (most) ML models. Rather, they're "accessed" (in quotes because it's not a great word for it) through associative processes, across multiple modalities, and then funnelled through a rather small "on tap" working memory.
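A toy way to see the contrast: a Transformer attends over its whole window, whereas the picture above is cue-driven retrieval of a few matches into a small buffer. This sketch is purely illustrative (the names and the crude word-overlap score are my own inventions, not anyone's real architecture):

```python
# Toy contrast: instead of keeping every past item "on tap" (as full
# attention does), associatively retrieve a few matches into a small
# working buffer. All names here are illustrative assumptions.

def overlap(cue, item):
    """Crude associative score: count of words shared by cue and item."""
    return len(set(cue.split()) & set(item.split()))

def recall(cue, long_term_memory, buffer_size=3):
    """Funnel the best associative matches through a small buffer."""
    ranked = sorted(long_term_memory, key=lambda m: overlap(cue, m), reverse=True)
    return ranked[:buffer_size]

long_term_memory = [
    "the cat sat on the mat",
    "power plants burn coal for energy",
    "transformers attend over the whole context",
    "associative recall is cue driven",
    "the mat was red",
]

# Only the retrieved buffer is "on tap"; the rest stays dormant.
working_memory = recall("cat on a mat", long_term_memory)
print(working_memory[0])  # → "the cat sat on the mat"
```

The point of the sketch is only the shape of the pipeline: a cheap associative lookup gates what a small, fixed-size memory ever sees, rather than every token competing in one giant attention window.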
Of course, in much the same way that "burning stuff" got too far ahead in the race to generate power, big-bigger-biggest has taken an early and significant lead as the go-to technique for generating good sequences in ML. It's a bit of a shame... I keep poking around for people researching more cognitively grounded approaches, but they're rare.