I will give a different answer: systems that do online learning are certainly not deterministic in the common sense of the word, as their internal state changes based on non-deterministic behaviour.
Systems that rely on noise generation via non-deterministic processes are also non-deterministic.
This non-determinism is rooted in changes to parts of the state or the input; for identical state and inputs, the systems are deterministic as long as no bit flips or quantum effects occur in the silicon.
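To make that last point concrete, here is a minimal sketch of a toy online learner. Everything in it (`run_online_learner`, the learning rate, the data) is hypothetical and only for illustration, and the seeded pseudo-random generator is a stand-in for the noise source; a true hardware entropy source could not be replayed this way. With the same initial state (including the seed) and the same inputs, the outputs match exactly; changing the seed changes them.

```python
import random


def run_online_learner(inputs, seed, lr=0.1):
    """Toy 1-D online learner with noisy predictions (illustrative only)."""
    rng = random.Random(seed)  # the noise source is part of the system state
    w = 0.0                    # internal weight, updated after every sample
    outputs = []
    for x, target in inputs:
        pred = w * x + rng.gauss(0.0, 0.01)  # prediction plus injected noise
        outputs.append(pred)
        w += lr * (target - pred) * x        # online update changes the state
    return outputs


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

same_a = run_online_learner(data, seed=42)
same_b = run_online_learner(data, seed=42)  # identical state and inputs
other = run_online_learner(data, seed=7)    # different noise state

print(same_a == same_b)  # identical runs agree
print(same_a == other)   # runs with different noise state differ
```

So the apparent non-determinism comes entirely from what you count as "state": once the seed and the accumulated weight are included, the mapping from (state, input) to output is a plain function.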
Can you show ordering equivariance of the single matrix with the two matrices?
This form of attention must be equivariant with respect to token order, e.g.
I am using rot here for token rotation.
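A quick numerical check of this, as a sketch under my own assumptions: single-head self-attention with no positional encoding, no causal mask, and no 1/sqrt(d) scaling (the scaling would not affect the result). The "two matrices" form uses separate `w_q` and `w_k`; the "single matrix" form uses their product `w_qk = w_q @ w_k^T`. All names here are illustrative, not taken from any library. The property being tested is attention(rot(X)) = rot(attention(X)).

```python
import math
import random


def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    out = []
    for row in a:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def attention_two(x, w_q, w_k, w_v):
    """softmax((X W_q)(X W_k)^T) (X W_v), with separate query/key matrices."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    return matmul(softmax_rows(matmul(q, transpose(k))), v)


def attention_single(x, w_qk, w_v):
    """softmax(X W_qk X^T) (X W_v), with one combined query-key matrix."""
    scores = matmul(matmul(x, w_qk), transpose(x))
    return matmul(softmax_rows(scores), matmul(x, w_v))


def rot(x, shift=1):
    """Cyclically rotate the token (row) order."""
    shift %= len(x)
    return x[-shift:] + x[:-shift]


def close(a, b, tol=1e-9):
    return all(abs(p - q) <= tol for ra, rb in zip(a, b) for p, q in zip(ra, rb))


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


n_tokens, d = 5, 4
x = rand_matrix(n_tokens, d)
w_q, w_k, w_v = rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d)
w_qk = matmul(w_q, transpose(w_k))  # combined matrix W_q W_k^T

out_two = attention_two(x, w_q, w_k, w_v)
out_single = attention_single(x, w_qk, w_v)

forms_agree = close(out_two, out_single)
two_equivariant = close(attention_two(rot(x), w_q, w_k, w_v), rot(out_two))
single_equivariant = close(attention_single(rot(x), w_qk, w_v), rot(out_single))

print(forms_agree, two_equivariant, single_equivariant)
```

The same argument works for any permutation P of the rows, not just rot: the scores become P S P^T, the row-wise softmax commutes with that, and P^T P = I cancels against the permuted values, leaving P times the original output. Adding positional encodings or a causal mask breaks this.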