A general-purpose MoE multimodal model beat every dedicated vision model on my father's handwriting
I assumed a specialized vision model would win. I was wrong. A head-to-head on a hard handwriting corpus ended with the general-purpose MoE on top.
How I run a multimodal LLM on four-year-old hardware to read a family archive, without sending anything to the cloud.
I have three collections of family letters spanning a century. Until recently, reading them properly would have taken years. Now it takes an afternoon.