A system that outputs funny captions, built on image captioning from the field of computer vision

It is no exaggeration to say that laughter is a special high-level function that only humans have. So, what causes people to laugh? Recently, scientists at Tokyo Denki University and Japan's National Institute of Advanced Industrial Science and Technology (AIST) proposed a new method that generates funny image captions.

What kind of expression effectively makes people laugh? To examine this question from an academic perspective, we use a computer to generate image captions that make people laugh. We built a system, based on image captioning as proposed in the field of computer vision, that outputs funny captions. We also propose the "Funny Score", which flexibly assigns weights based on the evaluations in a database, so that the model is optimized more effectively toward "laughter". In addition, we built the self-collected BoketeDB, which contains themes (images) and funny captions (text) posted on "Bokete", an Image Ogiri website. In the experiment, we verify the effectiveness of the proposed method by comparing its results with those of an MS COCO pre-trained CNN + LSTM, which serves as the baseline, and with captions created by humans. We call the proposed method the Neural Joking Machine (NJM); it is combined with the BoketeDB pre-trained model.

Figure 1: Samples of funny captions generated by NJM from image input

It is no exaggeration to say that laughter is a special high-level function that only humans have. In the analysis of laughter, as Wikipedia says, "laughter is considered to be a change in composition (mode)", and laughter often occurs when the recipient's composition changes. However, what a person finds funny depends to a large extent on that person's standpoint, so it is very difficult to measure laughter quantitatively. Recently, Image Ogiri web services such as "Bokete" have appeared, in which users post funny captions on themed pictures and the captions are evaluated in an SNS-like environment. Users compete to get the most "stars". Although quantifying laughter is considered a very difficult task, the correspondence between Bokete evaluations and images allows us to treat laughter quantitatively. Image captioning is an active topic in computer vision, and we believe that humorous image captioning can be achieved. The main contributions of this article are as follows:

Based on recent research on image captioning in the field of computer vision, we propose a framework for a funny caption generator.

We define the Funny Score, a weighting scheme based on the evaluations of existing funny captions in the database. The Funny Score is used in the loss function.

We collected data from the web service Bokete to create BoketeDB. The database contains 999,571 image and caption pairs.

BoketeDB

In the experimental part, we compare the proposed method, based on the Funny Score and BoketeDB pre-trained parameters, with the baseline provided by an MS COCO pre-trained CNN + LSTM. We also compare the results of NJM with funny captions provided by humans. In the human evaluation, the captions generated by the proposed method ranked lower than those provided by humans (22.59% vs 67.99%) but higher than the baseline (9.41%). Finally, we show the funny captions generated for several images.

Figure 2: The proposed CNN + LSTM architecture for funny caption generation

Related research

With the significant research progress made in deep neural networks (DNNs), the combination of convolutional neural networks and recurrent neural networks (CNN + RNN) has proven to be a successful model for feature extraction and sequence processing. Although there is no clear division, CNNs are usually used for image processing and RNNs for text processing, and the two fields are increasingly integrated. One successful application is image captioning with CNN + LSTM (CNN + Long Short-Term Memory), which automatically generates text from image input. However, we believe that image captioning also requires human intuition and emotion. In this article, we guide image captioning toward funny expression. Next, we introduce related research on the generation of humorous image captions.

Wang et al. proposed an automatic "meme" generation technique. A meme is a funny image, usually containing humorous text. Wang et al. statistically analyzed the correlation between memes and their comments, thereby modeling probabilistic dependencies (such as those between images and text) and automatically generating memes.

Chandrasekaran et al. constructed an analyzer that quantifies the "visual humor" in an input image, with the aim of enhancing image humor. They also constructed a dataset of human-labeled funny (3,200) and unfunny (3,200) images to evaluate visual humor. The "funniness" of an image can be trained by defining five stages.

Figure 3: Comparison of output results. The "Human" row shows the captions provided by human users that rank highest on the Bokete website. The "NJM" row shows the results generated by the proposed model based on the Funny Score and BoketeDB. The "STAIR caption" row shows the Japanese translation result of MS COCO.

Proposed method

We use the proposed Funny Score as a weighting on the evaluation to train the funny caption generator effectively. We use CNN + LSTM as the baseline, while exploring effective scoring functions and database construction. We call the proposed method the Neural Joking Machine (NJM), which is combined with the BoketeDB pre-trained model.

CNN + LSTM

The flow of the proposed method is shown in Figure 2. Basically, we use the CNN + LSTM model from Show and Tell, but the CNN is replaced by ResNet-152 as the image feature extractor. Next, we describe in detail how the Funny Score is used to calculate the loss function. This function evaluates the "funniness" of a caption according to its number of stars.

Funny Score

The Bokete Ogiri website uses the number of stars to assess how funny a caption is. Users evaluate the "funniness" of posted captions and assign one to three stars to each, so funnier captions tend to accumulate more stars. We therefore focus on the number of stars to propose an effective training method, in which the Funny Score lets us evaluate the funniness of a caption. Based on the results of our earlier experiments, the threshold of the Funny Score is set at 100 stars. In other words, when the number of stars is less than 100, the Funny Score outputs the loss value L; in contrast, when the number of stars exceeds 100, the Funny Score returns L - 1.0. The loss value L is calculated with the LSTM as the average over each mini-batch.
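The thresholding rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `funny_score` and `batch_funny_loss` are hypothetical, the text does not say what happens at exactly 100 stars (here that case is treated the same as "less than 100"), and applying the rule per caption before averaging over the mini-batch is one possible reading of the description.

```python
from typing import Sequence

STAR_THRESHOLD = 100  # threshold stated in the text
BONUS = 1.0           # amount subtracted from the loss for highly starred captions


def funny_score(loss: float, stars: int,
                threshold: int = STAR_THRESHOLD, bonus: float = BONUS) -> float:
    """Return the weighted loss for a single caption.

    Assumption: a caption with exactly `threshold` stars is treated like one
    below the threshold, since the text only covers "less than" and "exceeds".
    """
    if stars > threshold:
        return loss - bonus
    return loss


def batch_funny_loss(losses: Sequence[float], stars: Sequence[int]) -> float:
    """Average the weighted losses over one mini-batch.

    Assumption: the rule is applied per caption and then averaged; the text
    only states that L is averaged over each mini-batch.
    """
    if len(losses) != len(stars):
        raise ValueError("losses and stars must have the same length")
    if not losses:
        raise ValueError("mini-batch must not be empty")
    weighted = [funny_score(l, s) for l, s in zip(losses, stars)]
    return sum(weighted) / len(weighted)


if __name__ == "__main__":
    print(funny_score(2.5, 30))                      # below threshold: loss unchanged
    print(funny_score(2.5, 250))                     # above threshold: loss reduced by 1.0
    print(batch_funny_loss([2.0, 3.0], [10, 500]))   # mean of the weighted losses
```

In this reading, captions with many stars contribute a smaller loss, so the optimizer is pulled more strongly toward reproducing them than toward reproducing captions with few stars.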

Figure 4. Visualization results obtained using the proposed NJM

In summary, this article proposes a method for generating funny captions. We built BoketeDB, which contains themes (images) and the corresponding funny captions posted on the Bokete Ogiri website. By weighting the evaluation with the Funny Score, we effectively train a funny caption generator. Although we use CNN + LSTM as the baseline, we continue to explore effective scoring functions and database construction. The experiments in this research show that NJM is much funnier than the baseline STAIR captions.
