Music-Driven Enhanced Dance Performance Generation by Integrating Seq2Seq and Human Pose Recognition
To address limited naturalness and rhythm synchronization in music-driven dance generation, this study develops an enhanced dance generation model that integrates sequence-to-sequence (Seq2Seq) modeling with human pose recognition, with the aim of improving the synchronization, naturalness, and structural consistency of the generated movements. The model takes multi-scale music features as input, extracts temporal music semantics with a bidirectional long short-term memory (BiLSTM) network and an attention mechanism, and refines motion structure using skeleton keypoint feedback, thereby jointly modeling music semantics and human motion. Experiments on the AIST++ and DanceTrack datasets show that the proposed model achieves a beat alignment error as low as 0.12 s, a joint point error of 11.2 px, and a motion smoothness score of 2.41. When generating a 90-second dance sequence, it reduces the beat error by more than 32% relative to mainstream models, and it scores 0.97 in the evaluation of complex dance symmetries such as “arm-lifting rotation.” These results indicate that jointly modeling music semantics and skeletal structure effectively improves movement coordination and rhythm matching, producing natural, coordinated dance movements that adapt to different dance styles.
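The abstract reports a beat alignment error in seconds but does not state how it is computed. As a rough illustration only, the sketch below assumes one plausible definition: the mean absolute time gap between each music beat and its nearest motion beat. The function name `beat_alignment_error` and the sample beat times are hypothetical and are not taken from the paper.

```python
from bisect import bisect_left


def beat_alignment_error(music_beats, motion_beats):
    """Mean absolute gap (seconds) from each music beat to the nearest motion beat.

    Assumed definition for illustration; the paper may use a different one.
    """
    if not music_beats or not motion_beats:
        raise ValueError("both beat lists must be non-empty")
    motion = sorted(motion_beats)
    total = 0.0
    for t in music_beats:
        i = bisect_left(motion, t)
        # Only the motion beats just before and just after t can be nearest.
        candidates = motion[max(i - 1, 0): i + 1]
        total += min(abs(t - m) for m in candidates)
    return total / len(music_beats)


# Hypothetical beat times in seconds, made up for this example.
music = [0.5, 1.0, 1.5, 2.0]
motion = [0.6, 1.1, 1.4, 2.2]
error = beat_alignment_error(music, motion)
print(f"beat alignment error: {error:.3f} s")
```

With these made-up inputs the per-beat gaps are 0.1, 0.1, 0.1, and 0.2 s, so the mean is about 0.125 s. Under this assumed definition, a lower value means the generated motion beats land closer to the music beats.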
This work (including HTML and PDF files) is licensed under a Creative Commons Attribution 4.0 International License.