Thanks - that’s where I had started leaning, but I wanted to be sure. Just to confirm: I’d effectively need to feed the data through the transformer in tokenized form, since the shape of the input vector is variable? So I’d split the layer’s input vector into chunks, with their indices as the positions in the attention layer, and in the forward pass just work through the chunks until I’m done. u/Green_ninjas, u/pm_me_your_pay_slips
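For what it’s worth, a minimal sketch of that idea, assuming PyTorch (the chunk size, model width, and layer counts here are arbitrary placeholders, not anything from the thread):

```python
import torch
import torch.nn as nn

# Assumed sizes for illustration only.
CHUNK = 16        # elements of the input vector per "token"
D_MODEL = 64      # transformer embedding width
MAX_TOKENS = 128  # upper bound on number of chunks

proj = nn.Linear(CHUNK, D_MODEL)         # project each chunk to a token embedding
pos = nn.Embedding(MAX_TOKENS, D_MODEL)  # chunk index -> positional embedding
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True),
    num_layers=2,
)

def forward(x: torch.Tensor) -> torch.Tensor:
    """x: a 1-D input vector of variable length."""
    # Pad so the length divides evenly into chunks.
    pad = (-x.numel()) % CHUNK
    x = torch.cat([x, x.new_zeros(pad)])
    tokens = x.view(1, -1, CHUNK)        # (batch=1, n_tokens, CHUNK)
    n = tokens.size(1)
    # Add the chunk-index positional embedding to each token.
    h = proj(tokens) + pos(torch.arange(n)).unsqueeze(0)
    return encoder(h)                    # (1, n_tokens, D_MODEL)

out = forward(torch.randn(100))          # 100 elements -> 7 chunks of 16 (padded)
print(out.shape)
```

One caveat: with an encoder like this there’s no explicit loop over chunks in the forward pass; self-attention processes all tokens in parallel. A per-token loop only shows up if you decode autoregressively.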