Post
31
Reinforcement learning can sometimes lead to emergent behavior through much simpler training setups compared to large scale pre-training.
I explored this idea by running a small GRPO experiment on Qwen3.5 4B, and the results were pretty exciting.
Hypothesis: improving visual mathematical reasoning may also improve the model’s ability to transcribe LaTeX from images.
I wrote a short breakdown of the experiment here:
https://hanzlajavaid.github.io/blog/grpo-experiment-exploring-emergent-properties/
I explored this idea by running a small GRPO experiment on Qwen3.5 4B, and the results were pretty exciting.
Hypothesis: improving visual mathematical reasoning may also improve the model’s ability to transcribe LaTeX from images.
I wrote a short breakdown of the experiment here:
https://hanzlajavaid.github.io/blog/grpo-experiment-exploring-emergent-properties/