opensource.google.com

Menu

Semantic Reactor: A tool for experimenting with NLU models

Friday, March 27, 2020

Companies are using natural language understanding (NLU) to create digital personal assistants, customer service bots, and semantic search engines for reviews, forums and the news.

However, the perception that using NLU and machine learning is costly and time consuming prevents a lot of potential users from exploring its benefits.

To dispel some of the intimidation of using NLU, and to demonstrate how it can be easily used with pre-trained, generic models, we have released a tool, the Semantic Reactor, and open-sourced example code, The Mystery of the Three Bots.

The Semantic Reactor

The Semantic Reactor is a Google Sheets Add-On that allows the user to sort lines of text in a sheet using a variety of machine-learning models. It is released as a whitelisted experiment, so if you would like to check it out, fill out this application at the Google Cloud AI Workshop. Once approved, you’ll be emailed instructions on how to install it.

The tool offers ranking methods that determine how the list will be sorted. With the semantic similarity method, the lines more similar in meaning to the input will be ranked higher.



With the input-response method, the lines that are the most appropriate conversational responses are ranked higher.

Why use the Semantic Reactor?

There are a lot of interesting things you can do with the Semantic Reactor, but let’s look at the following two:
  • Writing dialogue for a bot that exists within a well-defined environment and has a clear purpose (like a customer service bot) using semantic similarity.
  • Searching within large collections of text, like from a message board. For that, we will use input-response.

Writing Dialogue for a Bot Using Semantic Similarity

For the sake of an example, let’s say you are writing dialogue for a bot that answers questions about a product, in this case, cookies.

If you’ve been running a cookie hotline for a while, you probably can list the most common cookie questions. With that data, you can create your cookie bot. Start by opening a Google Sheet and writing the common questions and answers (questions in the A column, answers in the B).

Here is the start of what that Sheet might look like. Make a copy of the Sheet, which will allow you to use the Semantic Reactor Add-on. Use the tool to experiment with new QA pairs and how each model reacts to them.

Here are a few queries to try, using the semantic similarity rank method:

Query: What are cookie ingredients?
Returns: What are cookies made of?

Query: Are cookies biscuits?
Returns: Are cookies also called biscuits?

Query: What should I serve with cookies?
Returns: What drinks go well with cookies?



Of course, that small list of responses won’t cover many of the questions people will ask your cookie bot. What the Reactor allows you to do is quickly add new QA pairs as you learn about what your users want to ask.

For example, maybe people are asking a lot about cookie calories.

You’d write the new question in column A, and the new answer in column B, and then test a few different phrasings with the Reactor. You might need to tweak the target response a few times to make sure it matches a wide variety of phrasings. You should also experiment with the three different models to see which one performs the best.

For instance, let’s say the new target question you want the model to match to is: “How many calories does a typical cookie have?”

That question might be phrased by users as:
  • Are cookies caloric?
  • A lot of calories in a cookie?
  • Will cookies wreck my diet?
  • Are cookies fattening?


The more you test with live users, the more you’ll find that they phrase their questions in ways you don’t expect. As with all things based on machine learning, constantly refreshing data, testing and improvement is all part of the process.

Searching Through Text Using Input-Response

Sometimes you can’t anticipate what users are going to ask, and sometimes you might be dealing with a lot of potential responses, maybe thousands. In cases like that, you should use the input-response ranking method. That means the model will examine the list of potential responses and then rank each one according to what it thinks is the most likely response.

Here is a Sheet containing a list of simple conversational responses. Using the input-response ranking method, try a few generic conversational openers like “Hello” or “How’s it going?”

Note that in input-response mode, the model is predicting the most likely conversational response to an input and not the most semantically similar response.

Note that “Hello,” in input-response mode, returns “Nice to meet you.” In semantic similarity mode, “Hello” returns what the model thinks is semantically closest to “Hello,” which is “What’s up?”

Now try your own! Add potential responses. Switch between the models and ranking methods to see how it changes the results (be sure to hit the “reload” button every time you add new responses).

Example Code

One of the models available on TensorFlow Hub is the Universal Sentence Encoder Lite. It’s only 1.6MB and is suitable for use within websites and on-device applications.

An open sourced sample game that uses the USE Lite is Mystery of the Three Bots on Github. It’s a simple demonstration that shows how you can use a small semantic ML model to drive conversations with game characters. The corpora the game uses were created and tested using the Semantic Reactor.

You can play a running version of the game here. You can experiment with the corpora of two of the characters, the Maid and the Butler, contained within this Sheet. Be sure to make a copy of the Sheet so you can edit and add new QA pairs.

Where To Get The Models Used Within The Semantic Reactor

All of the models used in the Semantic Reactor are published and available online.
  • Local – Minified TensorFlow.js version of the Universal Sentence Encoder.
  • Basic Online – Basic version of the Universal Sentence Encoder.
  • Multilingual Online – Universal Sentence Encoder trained on question/ answer pairs in 16 languages.

Final Thoughts

These language models are far from perfect. They use their training to give a best estimate on what to return based on the list of responses you gave it. Machine learning is about calculation, prediction, and training. Models can be improved over time with more data and tuning, and in turn, be made more accurate.

Also, because conversational models are trained on dialogue between people, and because people are biased, the models will display biases that exist in the data that they were trained on, sometimes in ways you can’t predict. For more on model bias, and more detail about how these models were trained, see the Semantic Experiences for Developers page.

By Ben Pietrzak, Steve Pucci, Aaron Cohen — Google AI  

A Season of Docs story

Tuesday, March 24, 2020

Lack of clear and reliable documentation is one of the main shortcomings of many open source projects. Last year, Google set out to help change that by announcing the first ever Season of Docs.

Season of Docs is an initiative that brings together technical writers and open source projects to collaborate for a few months, benefitting both the communities and writers.

This is the story of Audrey Tavares, one of the writers who signed up for Season of Docs.

Turning incipient curiosity into an opportunity

In 2019, Audrey was completing the Technical and Professional Communication program at Glendon College, exploring technical writing out of curiosity. One of Google’s technical writers, Nicola Yap, completed the same program and visited Audrey’s class in March to talk about her career. It was an enlightening experience, showing technical writing as an attractive alternative with plenty of opportunities, and introducing Audrey to Season of Docs.

For Audrey, this experience meant stepping into unknown territory—she knew nothing about open source software. Naturally, the first step was to familiarize herself with the communities and understand the software development paradigm. After spending time learning she submitted her Technical Writer application—which was accepted—and was assigned to Oppia, an online educational platform.

Main challenges

Audrey had two mentors to help her on her journey: one in India and the other in the United States. As you can imagine, this revealed the first challenge—time zones. While the first few days were stressful, as navigating schedules across time zones was a daunting task,with a little work, they soon came up with an arrangement that worked for everyone.

The second challenge was learning the tools. For most of us, writing a document involves opening a word processor and typing some text, however, Audrey was about to find out, things are a bit more intricate when it comes to documenting code.

When presented with the choice of a documentation tool set, Audrey decided on Read the Docs. It seemed like a very popular tool among open source communities. How hard can it be to use, right? Well, it’s not so much about how difficult it is, but how different it is for someone unfamiliar with a common software development workflow since it entails learning a few things:
Audrey was not dismayed. She pushed forward and gradually learned these new tools. Both mentors were always available, willing to help, and answered all of her questions. Their mentorship was key to her success.

Every end is a new beginning

After Season of Docs was over, Audrey decided to remain part of the Oppia community to actively contribute to make the platform even better.

The experience allowed Audrey to walk away from Season of Docs with a new set of technical skills, communication skills with software engineers, an extended professional network, and a new item in her résumé. She now works as a technical writer for a software company in Toronto.

Applications for Season of Docs 2020 start on April 13 for open source organizations and on May 11 for technical writers. Check the official announcement to learn how to participate.

By Geri Ochoa, Google Cloud

Announcing Season of Docs 2020

Monday, March 23, 2020

Google Open Source is delighted to announce Season of Docs 2020!

Season of Docs brings technical writers and open source projects together for a few months to work on open source documentation. 2019 was the first year of Season of Docs, bringing together open source organizations and technical writers to create 44 successful documentation projects!

Docs are key to open source success

Survey after survey show the importance of good documentation in how developers choose and use open source:
  • 72% of surveyed developers say “Established policies and documentation” is a key decision factor when choosing open source
  • 93% of surveyed developers say “Incomplete or outdated documentation is a pervasive problem” in open source
  • “Lack of documentation” was the top reason developers gave for deciding against using an open source project
Open source communities know this, and still struggle to produce good documentation. Why? Because creating documentation is hard. But...

There are people who know how to do docs well. Technical writers know how to structure a documentation site so that people can find and understand the content. They know how to write docs that fit the needs of their audience. Technical writers can also help optimize a community’s processes for open source contribution and onboarding new contributors.

Season of Docs brings open source projects and technical writers together with the shared goal of creating great documentation. The writers bring their expertise to the projects, and the project mentors help the technical writers learn more about open source and new technologies. Communities gain new docs contributors and technical writers gain valuable open source skills.

Together the technical writers and mentors build a new doc set, improve the structure of the existing docs, develop a much-needed tutorial, or improve contribution processes and guides. See more ideas for technical writing projects.

By working together in Season of Docs we raise awareness of open source, docs, and technical writing.

How does it work?

April 13 – May 4Open source organizations apply to take part in Season of Docs
May 11Google publishes the list of accepted mentoring organizations, along with their ideas for documentation projects
May 11 – July 9Technical writers choose the project they’d like to work on and submit their proposals to Season of Docs
August 10Google announces the accepted technical writer projects
August 11 – September 11Community bonding: Technical writers get to know mentors and the open source community, and refine their projects in collaboration with their mentors
September 11 – December 6Technical writers work with open source mentors on the accepted projects, and submit their work at the end of the period
January 7, 2021Google publishes the list of successfully-completed projects.
See the timeline for details, including the provision for projects that run longer than three months.

Join us

Explore the Season of Docs website at g.co/seasonofdocs to learn more about participating in the program. Use our logo and other promotional resources to spread the word. Check out the timeline and FAQ, and get ready to apply!

By Erin McKean, Google Open Source
.