What are your thoughts on the DALL-E 3 “paper”, which doesn’t cover technical or architectural details? The only useful takeaways seem to be “higher quality data is better” and “image captioning models that provide a great amount of detail can create good datasets.”
You can check these two additional resources:
- The AMA in the OpenAI Discord
- This interview with the first author: https://www.youtube.com/watch?app=desktop&v=pgaTOX-RUQ4