Is this MSA-PT or MSA-CPT?

#20
by sokann - opened

According to https://github.com/ggml-org/llama.cpp/pull/24908#issuecomment-4820585273, the model seems to perform noticeably better with MSA than with dense attention. It would be good to confirm that the model was indeed pretrained with MSA. Thanks!
