Same architecture, nicer presentation

2024 February 16

Today I learned that the Fuyu architecture and DTrOCR, the best architecture for OCR according to paperswithcode, are the same.

The Fuyu architecture.

The DTrOCR architecture.

Both of them consist of the same steps: chop the image into patches, then pass the patches to a transformer decoder.

In the case of DTrOCR, the decoder was GPT-2; in the case of Fuyu, it was something in-house.
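
To make the "chop, then decode" idea concrete, here is a rough sketch of the first step in plain Python. It is not code from either model: the function names `patchify` and `project`, the toy 4x4 image, and the projection weights are all made up for illustration, and I am assuming each flattened patch gets a simple linear projection before it reaches the decoder. The decoder itself (GPT-2 or otherwise) is left out.

```python
def patchify(image, patch):
    """Split an H x W image (a list of rows) into flattened patch x patch tiles, in raster order."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one tile row by row into a single vector.
            tile = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            patches.append(tile)
    return patches


def project(patches, weights):
    """Linearly map each flattened patch to an embedding, giving one 'token' per patch."""
    return [[sum(p * w for p, w in zip(patch, row)) for row in weights]
            for patch in patches]


# Toy 4x4 grayscale image with pixel values 0..15, chopped into 2x2 patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = patchify(image, 2)

# Hypothetical projection: 3 output dimensions, each a row of 4 weights.
weights = [[1, 0, 0, 0], [0, 1, 0, 0], [0.25, 0.25, 0.25, 0.25]]
tokens = project(patches, weights)

# `tokens` is the sequence a transformer decoder would consume, one entry per patch.
print(len(tokens), "patch tokens of size", len(tokens[0]))
```

The point of the sketch is that nothing image-specific sits between the pixels and the decoder: the patches become a sequence, and from there on it is the usual next-token setup.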

