Music-Driven Enhanced Dance Performance Generation by Integrating Seq2Seq and Human Pose Recognition

Keywords: Music-Driven Dance Generation; Seq2Seq Architecture; Human Pose Recognition; Cross-Modal Alignment; Exercise Enhancement; Style Transfer

Vol. 7 No. 2 (2026): June
Research Articles

To address limitations in the naturalness and rhythmic synchronization of music-driven dance generation, an enhanced dance generation model integrating sequence-to-sequence (Seq2Seq) modeling and human pose recognition was developed to improve the synchronization, naturalness, and structural consistency of generated movements. The model takes multi-scale music features as input, extracts temporal music semantics with a bidirectional long short-term memory (BiLSTM) network and an attention mechanism, and refines motion structure using feedback from skeleton keypoints, thereby jointly modeling music semantics and human motion. Experiments on the AIST++ and DanceTrack datasets show that the proposed model achieves a beat alignment error as low as 0.12 s, a joint point error of 11.2 px, and a motion smoothness score of 2.41. When generating a 90-second dance sequence, the beat error is more than 32% lower than that of mainstream models, and the model scores 0.97 in the symmetry evaluation of complex dance movements such as “arm-lifting rotation.” These results indicate that jointly modeling music semantics and skeletal structure improves movement coordination and rhythm matching in dance generation, producing natural, coordinated dance movements that adapt to different dance styles.
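
As an illustration of the beat alignment error reported above, the following minimal sketch computes one plausible form of the metric: the mean absolute time offset, in seconds, between each music beat and the nearest motion beat. The abstract does not give the exact definition used in the paper, so this formulation, the function name `beat_alignment_error`, and the sample beat times are assumptions for illustration only.

```python
from bisect import bisect_left


def beat_alignment_error(music_beats, motion_beats):
    """Mean absolute offset (seconds) from each music beat to the nearest motion beat.

    Assumed definition for illustration; the paper's exact metric may differ.
    """
    if not music_beats or not motion_beats:
        raise ValueError("both beat lists must be non-empty")
    motion = sorted(motion_beats)
    total = 0.0
    for t in music_beats:
        # Locate the insertion point, then compare only the neighbors on each side.
        i = bisect_left(motion, t)
        candidates = motion[max(i - 1, 0):i + 1]
        total += min(abs(t - m) for m in candidates)
    return total / len(music_beats)


# Hypothetical beat times in seconds (not taken from the paper).
music = [0.5, 1.0, 1.5, 2.0]
motion = [0.6, 1.1, 1.4, 2.2]
print(round(beat_alignment_error(music, motion), 3))  # prints 0.125
```

Under this definition, a lower value means the generated motion beats fall closer to the music beats. In practice, motion beats would first have to be detected from the generated keypoint sequence, a step this sketch omits.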