Training Watson to see what’s on your plate

Today, we’re introducing our latest AI research in the form of a new beta feature: the IBM Watson Visual Recognition food model. This feature provides a built-in capability for recognizing 2,000+ different foods within images, offering greater specificity and accuracy in this content domain than Visual Recognition’s general tagging feature. Using the food model, restaurant diners can easily compare their meals to ones from previous visits, while restaurants can better understand how often their food is shared across social media. The food model is the first of many pre-built custom models that will accelerate time-to-value for developers creating custom solutions for different domains with Watson Visual Recognition. Like free-refill French fries, we think the possibilities are bottomless!


Photo of a platter of oysters with results of the returned tags provided by the Watson Visual Recognition food model

Our efforts stemmed from the observation that users of food- and nutrition-logging apps get frustrated by the manual process of tracking their meals.

What if we could train a system to automatically identify the foods at popular restaurant chains and simplify food logging? With frequent lunch-time trips to restaurants near the lab, we took photos of known foods and trained a first version of the food recognition model. This use case was an example of “food in context” – where the system recognized foods from known menus. We could always refer back to the menu if we, or the system, were unsure of what it was seeing. We were never hungry, and often the results of our daily training experiments ended up as leftovers for dinner! But, like a good plate of brownies, we found food visual recognition to be addictive!

The larger challenge we took on was what we called “food in the wild,” where the system doesn’t know the restaurant menu or a user’s food history. We started by searching for images of many different foods online, which produced an initial noisy data set with weakly labeled images. We did a lot of work to match the correct foods to the correct labels to clean up the data set, and today we have the largest known collection of more than 1.5 million labeled food images corresponding to 2,000+ different foods. We further developed a taxonomy around the foods that allowed us to classify foods hierarchically. To improve the system’s accuracy, we came up with a novel idea to exploit this food hierarchy in combination with deep learning methods for fine-grained recognition. This model forms the basis of the Visual Recognition food model.

Using the food model in the Visual Recognition API, Watson focuses specifically on the food shown in the photo. This differs from general visual tagging, which identifies everything in a picture of food: the plate, knife, blanket, strawberry, table, and any people.
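As a rough sketch of what working with the food model’s output can look like, the snippet below parses a hypothetical classify response modeled on the v3 API’s documented shape. The labels and scores are illustrative, not real model output, and the `food_labels` helper is our own, not part of any SDK:

```python
import json

# Illustrative response modeled on the v3 classify output; the scores and
# labels are made up, and field names should be verified against the docs.
SAMPLE_RESPONSE = """{
  "images": [{
    "classifiers": [{
      "classifier_id": "food",
      "classes": [
        {"class": "chocolate dipped strawberries", "score": 0.92},
        {"class": "fruit dish", "score": 0.95}
      ]
    }]
  }]
}"""

def food_labels(response, threshold=0.5):
    """Flatten the per-image classes into (label, score) pairs above threshold."""
    out = []
    for image in response["images"]:
        for clf in image["classifiers"]:
            for c in clf["classes"]:
                if c["score"] >= threshold:
                    out.append((c["class"], c["score"]))
    return sorted(out, key=lambda pair: -pair[1])

labels = food_labels(json.loads(SAMPLE_RESPONSE))
```

A threshold like this lets an application surface only confident food tags while ignoring low-scoring guesses.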


Photo of chocolate-covered strawberries and the returned tags provided by the Watson Visual Recognition food model

With the food model, the system homes in only on the food in the photo; in the example here, the strawberries. The accuracy of food identification is only one piece of our model. The system’s recognition goes deeper by performing fine-grained recognition of the foods. In the case of the strawberry dish, it might also tag the photo as “strawberry dipped in chocolate” when that label applies. Using the hierarchy, the service might also label the photo as a “fruit dish,” a higher-level category for the food. Traditionally, deep learning gives you a flat list of classification scores, but by exploiting the hierarchy and fine-grained classification, we trained the deep learning model to make better mistakes even when a food cannot be identified accurately [i].
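The idea of exploiting the hierarchy can be illustrated with a toy taxonomy. The parent links below are hypothetical, and `rolled_up_scores` is our own sketch rather than the production algorithm: each fine-grained score is propagated to its ancestors, so a confident “chocolate dipped strawberries” prediction also supports the higher-level “fruit dish” label:

```python
# A toy slice of a food taxonomy (hypothetical labels and links) mapping
# each fine-grained label to its parent category.
PARENT = {
    "chocolate dipped strawberries": "fruit dish",
    "fruit salad": "fruit dish",
    "fruit dish": "food",
}

def rolled_up_scores(leaf_scores):
    """Propagate each fine-grained score to all of its ancestors, keeping the max."""
    scores = dict(leaf_scores)
    for label, score in leaf_scores.items():
        node = label
        while node in PARENT:
            node = PARENT[node]
            scores[node] = max(scores.get(node, 0.0), score)
    return scores

result = rolled_up_scores({"chocolate dipped strawberries": 0.92,
                           "fruit salad": 0.10})
```

Even when the fine-grained labels are uncertain, the rolled-up ancestor score can still be confident, which is one way a classifier can “make better mistakes.”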

As important as it is to teach the system what a plate of strawberries is, we had to teach it what is food and what is not. To make the service as efficient as possible, the food/non-food classifier and the fine-grained food recognition classifier share most of the deep learning network, branching only at the top. To make a prediction on a test image, the system needs just a single, very fast forward pass through the food model to detect and categorize the foods.
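A minimal sketch of that shared-backbone design, with toy dimensions and random weights standing in for a trained deep network: the input is transformed into shared features once, and both branches read from those same features, so one forward pass yields both the food/non-food decision and the fine-grained scores.

```python
import random

random.seed(0)

IN_DIM, FEAT_DIM, N_FOODS = 4, 8, 5  # toy sizes; the real model is a deep CNN

def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

W_shared = rand_matrix(FEAT_DIM, IN_DIM)   # stand-in for the shared backbone
W_binary = rand_matrix(2, FEAT_DIM)        # food vs. non-food branch
W_fine   = rand_matrix(N_FOODS, FEAT_DIM)  # fine-grained food branch

def forward(x):
    # Single pass: shared features are computed once, then both heads reuse them.
    h = [max(0.0, v) for v in matvec(W_shared, x)]  # ReLU features
    return matvec(W_binary, h), matvec(W_fine, h)

binary_logits, fine_logits = forward([0.1, 0.2, 0.3, 0.4])
```

Sharing the backbone means the cost of the second task is only one extra small branch, rather than a second full network.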

Now that Watson has become an expert in recognizing what you’re eating, we’re excited to see the applications and interpretations developers and data scientists will build on our technology!

[i] Hui Wu, Michele Merler, Rosario Uceda-Sosa and John Smith. “Learning to make better mistakes: semantics-aware visual food recognition”. ACM Multimedia Conference, 2016.

The post Training Watson to see what’s on your plate appeared first on IBM Blog Research.




May 18, 2017 at 04:03AM
