MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model
Submitted by Bai LiChen · catnip