Thanks - that’s where I had started leaning, but I wanted to be sure. Just to confirm: I’d effectively need to feed the data through the transformer in tokenized form, since the shape of the input vector is variable? So I’d split the layer’s input vector into chunks, with their indices as the positions in the attention layer, and in the forward pass just work through the chunks until I’m done. u/Green_ninjas, u/pm_me_your_pay_slips
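For what it’s worth, a minimal sketch of that idea, assuming PyTorch (the chunk size, model width, and layer counts here are arbitrary placeholders, not anything from the thread):

```python
import torch
import torch.nn as nn

# Assumed sizes for illustration only.
CHUNK = 16        # elements of the input vector per "token"
D_MODEL = 64      # transformer embedding width
MAX_TOKENS = 128  # upper bound on number of chunks

proj = nn.Linear(CHUNK, D_MODEL)         # project each chunk to a token embedding
pos = nn.Embedding(MAX_TOKENS, D_MODEL)  # chunk index -> positional embedding
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True),
    num_layers=2,
)

def forward(x: torch.Tensor) -> torch.Tensor:
    """x: a 1-D input vector of variable length."""
    # Pad so the length divides evenly into chunks.
    pad = (-x.numel()) % CHUNK
    x = torch.cat([x, x.new_zeros(pad)])
    tokens = x.view(1, -1, CHUNK)        # (batch=1, n_tokens, CHUNK)
    n = tokens.size(1)
    # Add the chunk-index positional embedding to each token.
    h = proj(tokens) + pos(torch.arange(n)).unsqueeze(0)
    return encoder(h)                    # (1, n_tokens, D_MODEL)

out = forward(torch.randn(100))          # 100 elements -> 7 chunks of 16 (padded)
print(out.shape)
```

One caveat: with an encoder like this there’s no explicit loop over chunks in the forward pass; self-attention processes all tokens in parallel. A per-token loop only shows up if you decode autoregressively.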