Design of a Picture-seeing and Talking System Based on Attention Mechanism

Kexin Ye; Lei Zhang; Tianran Li

American Journal of Electrical and Electronic Engineering. 2022, 10(1), 6-23
DOI: 10.12691/AJEEE-10-1-2

Original Research


Kexin Ye¹, Lei Zhang¹ and Tianran Li¹

¹College of Electrical and Automation Engineering, Nanjing Normal University, Nanjing, China

Pub. Date: July 27, 2022


Cite this paper

Kexin Ye, Lei Zhang and Tianran Li. Design of a Picture-seeing and Talking System Based on Attention Mechanism. American Journal of Electrical and Electronic Engineering. 2022; 10(1):6-23. doi: 10.12691/AJEEE-10-1-2

Abstract

This paper proposes an image caption generation model based on a deep recurrent architecture. The model combines recent advances in computer vision and machine translation and can generate natural sentences that accurately describe uncomplicated images of physical objects. It is trained to maximize the accuracy of the target description sentence for a given training image. Training relied mainly on the MSCOCO dataset; in the later tuning stage, additional characteristic images collected from the Internet were included to improve the model. Experiments on several datasets verify that the model can produce basically accurate image descriptions. The model is generally more accurate on uncomplicated images of physical objects, which was verified both qualitatively and quantitatively. The final system takes a qualifying image as input and outputs a natural language sentence describing the main content of the image.
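The abstract does not say which form of attention the system uses, so the following is only a minimal sketch of one common variant, dot-product soft attention, and not the authors' implementation. It assumes the image has been encoded as a list of region feature vectors and that the decoder state has the same dimension. The names `regions`, `state`, and `dot_product_attention` are hypothetical and chosen for illustration.

```python
import math

def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_product_attention(regions, state):
    """Weight image regions by their similarity to the decoder state.

    regions: list of region feature vectors (assumed to come from an image encoder)
    state:   current decoder state, same length as each region vector (assumption)
    """
    # One relevance score per region: dot product with the decoder state.
    scores = [sum(r * s for r, s in zip(region, state)) for region in regions]
    # Normalize the scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Context vector: attention-weighted average of the region features.
    dim = len(regions[0])
    context = [sum(w * region[d] for w, region in zip(weights, regions))
               for d in range(dim)]
    return context, weights

# Toy example: three 2-dimensional region features and one decoder state.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
context, weights = dot_product_attention(regions, state)
```

In a captioning decoder of this kind, the context vector would typically be recomputed at every step and fed to the recurrent unit together with the previous word, so that each generated word can focus on different image regions.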

Keywords

artificial intelligence, deep recurrent architecture, computer vision and machine translation, MSCOCO data

Copyright

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
