A general-purpose multimodal MoE model beat every dedicated vision model on my father's handwriting
I assumed a specialized vision model would win. I was wrong. A head-to-head on a hard handwriting corpus ended with the general-purpose MoE on top.