Translate your Twitter feed

According to a Common Sense Advisory study, 75 percent of people surveyed prefer to buy products in their native language. For those with limited English, this preference increases to 80 percent or more. The data collected also reveals that more than half of consumers buy only at websites where information is presented in their native language.

Social media holds much insight into customers, and most of that data is generated by non-English speakers. Enterprises that can harness non-English data for sentiment analysis and trend identification gain a distinct competitive advantage.

Twitter has become one of the most important resources for identifying the trends, wants, and needs of consumers. Currently, Twitter has about a billion registered users. The Twitter community is highly diverse, with 77 percent of Twitter’s most active users located outside the United States. This diversity presents a unique challenge for enterprises looking to expand their market share outside their home region, because they must be able to identify and understand market trends in multiple languages.

According to the RightScale 2014 State of the Cloud Report, geographic reach is one of the top five benefits that enterprises observe with cloud investments.

This tutorial shows you how to build an app on IBM® Bluemix™ that translates the data feed from Twitter and normalizes the input into English. The app uses the Watson Machine Translation and Watson Language Identification services from the Liberty for Java™ runtime. The app’s users can then perform sentiment analysis and textual analytics to better understand trends from a global perspective.

READ: Getting started with the Watson Machine Translation service

READ: Getting started with the Watson Language Identification service

READ: Creating apps with Liberty for Java

READ: Twitter4J code examples

What you’ll need for your application

  • A Bluemix account and a DevOps Services account, both linked to your IBM ID
  • Familiarity with Java, Twitter, Twitter4J, Bootstrap, and jQuery
  • Experience using Server-Sent Events

Step 1. Create a Bluemix app, fork the source code, and bind services

  1. Log in to Bluemix. In the dashboard, click the CREATE AN APP button. Find and click Liberty for Java from the available runtimes in the catalog. Enter a name and host of your liking and click CREATE.
  2. Locate the Language Identification and Machine Translation services in the catalog and bind them to your application.
  3. Scroll up and click this tutorial’s Get the code button.
  4. In the steveatkin | TranslateTweets project’s overview page, click the FORK PROJECT button (enter your DevOps Services credentials if you’re not already logged in). Enter a name for your new project, select the Deploy to Bluemix check box, and choose an organization and space for billing purposes.

Step 2. Load the services credentials

As with other Bluemix services and add-ons, you must extract the credentials and service URLs for the Watson Machine Translation and Watson Language Identification services from the VCAP_SERVICES runtime environment variable.

The machine_translation section of the VCAP_SERVICES environment variable includes a subsection labeled credentials. The credentials subsection contains the URL, username, and password of the RESTful service that you’ll use to access the Watson Machine Translation service:

 "machine_translation": [
      {
         "name": "mt-watson",
         "label": "machine_translation",
         "plan": "machine_translation_free_plan",
         "credentials": {
            "url": "https://gateway.watsonplatform.net/",
            "sids": [
               {
                  "sid": "mt-ptbr-enus",
                  "description": "translation from Portuguese to English"
               },
               {
                  "sid": "mt-enus-ptbr",
                  "description": "translation from English to Portuguese"
               },
               {
                  "sid": "mt-enus-eses",
                  "description": "translation from English to Spanish"
               },
               {
                  "sid": "mt-eses-enus",
                  "description": "translation from Spanish to English"
               },
               {
                  "sid": "mt-frfr-enus",
                  "description": "translation from French to English"
               },
               {
                  "sid": "mt-enus-frfr",
                  "description": "translation from English to French"
               },
               {
                  "sid": "mt-arar-enus",
                  "description": "translation from Arabic to English"
               }
            ],
            "username": "xxxxxxxxxxx",
            "password": "xxxxxxxxxxx"
         }
      }
   ]

Similar to the machine_translation section, the language_identification section contains the URL, username, and password for accessing the Watson Language Identification service:

"language_identification": [
      {
         "name": "lang-watson",
         "label": "language_identification",
         "plan": "language_identification_free_plan",
         "credentials": {
            "url": "https://gateway.watsonplatform.net/",
            "sids": [
               {
                  "sid": "lid-generic",
                  "description": "language identification of any text"
               }
            ],
            "username": "xxxxxxxxxxxx",
            "password": "xxxxxxxxxxxx"
         }
      }
   ]

This sample code fragment illustrates how you can parse the Watson Machine Translation credentials from the VCAP_SERVICES environment variable. You can also use this same code fragment to parse the credentials for the Watson Language Identification service by simply replacing machine_translation with language_identification:

// Read the parsed VCAP_SERVICES document; stop if it is unavailable
JSONObject sysEnv = getVcapServices();
if (sysEnv == null) {
   return;
}

if (sysEnv.containsKey("machine_translation")) {
   // Use the first bound instance of the service
   JSONArray services = (JSONArray)sysEnv.get("machine_translation");
   JSONObject service = (JSONObject)services.get(0);
   JSONObject credentials = (JSONObject)service.get("credentials");
   String baseURLTranslation = (String)credentials.get("url");
   String usernameTranslation = (String)credentials.get("username");
   String passwordTranslation = (String)credentials.get("password");
}
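
The same lookup can be generalized to any service label. The following is a minimal sketch rather than the tutorial's actual code: it assumes that VCAP_SERVICES has already been parsed into nested java.util maps and lists by a JSON library of your choice, and the VcapCredentials class and its find method are hypothetical names introduced for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical holder for the url, username, and password of one bound service. */
final class VcapCredentials {
    final String url;
    final String username;
    final String password;

    private VcapCredentials(String url, String username, String password) {
        this.url = url;
        this.username = username;
        this.password = password;
    }

    /**
     * Looks up the first instance of the given service label (for example
     * "machine_translation") in an already-parsed VCAP_SERVICES structure.
     * Returns an empty Optional when the service is not bound or is malformed.
     */
    static Optional<VcapCredentials> find(Map<String, ?> vcapServices, String label) {
        if (vcapServices == null) {
            return Optional.empty();
        }
        Object instances = vcapServices.get(label);
        if (!(instances instanceof List) || ((List<?>) instances).isEmpty()) {
            return Optional.empty();
        }
        Object first = ((List<?>) instances).get(0);
        if (!(first instanceof Map)) {
            return Optional.empty();
        }
        Object creds = ((Map<?, ?>) first).get("credentials");
        if (!(creds instanceof Map)) {
            return Optional.empty();
        }
        Map<?, ?> c = (Map<?, ?>) creds;
        Object url = c.get("url");
        Object user = c.get("username");
        Object pass = c.get("password");
        if (!(url instanceof String) || !(user instanceof String) || !(pass instanceof String)) {
            return Optional.empty();
        }
        return Optional.of(new VcapCredentials((String) url, (String) user, (String) pass));
    }
}
```

Calling VcapCredentials.find(parsed, "language_identification") would then return the Language Identification credentials in the same way, and an empty result signals that the service is not bound to the app.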

Step 3. Configure Twitter4J

  1. Log in to Twitter Application Management and create a new application.
  2. In your DevOps Services project, open the src > twitter4j.properties file and add the Twitter application credentials that you just generated:
    debug=true
    oauth.consumerKey=xxxxxxxxxxxxxxxxx
    oauth.consumerSecret=xxxxxxxxxxxxxx
    oauth.accessToken=xxxxxxxxxxxxxxxxx
    oauth.accessTokenSecret=xxxxxxxxxxx
    

When using the Twitter APIs, it’s important to avoid exceeding Twitter’s API rate limits. The app’s TwitterAsyncService class searches for only the most popular Tweets for a subject and limits the results to the first page of Tweets to remain within Twitter’s API rate limits:

Query query = new Query(searchTerm);
query.setResultType(Query.POPULAR);

Twitter twitter = TwitterFactory.getSingleton();
// Just get the first page of results to avoid exceeding the Twitter rate limit
QueryResult result = twitter.search(query);

List<Status> tweets = result.getTweets();

Step 4. Configure servlet for Server-Sent Events

You’ll use Server-Sent Events to stream your translated Tweets to the client application running in the browser. With Server-Sent Events, a server can push updates to a client whenever it wants, without the client having to make repeated requests for data. Server-Sent Events use the traditional HTTP protocol and are supported by almost all major browser vendors (with the notable exception of Internet Explorer).

To enable Server-Sent Events in Java servlets, all you need to do is specify in the @WebServlet annotation that the servlet will be asynchronous:

@WebServlet(urlPatterns = {"/Tweet"}, asyncSupported = true)

To correctly handle the content generated by the servlet in the client, you must set the response content type to text/event-stream and the character encoding to be in UTF-8:

response.setContentType("text/event-stream");
response.setCharacterEncoding("UTF-8");

In the client application, all that you need to do to handle Server-Sent Events is to register an event handler for the onmessage event from the JavaScript EventSource object. You also need to add an event listener to close the stream from the EventSource when the server indicates that it is finished sending content to your client application:

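source.onmessage = function(event) {
   // Each message carries one translated Tweet as JSON
   var tweet = JSON.parse(event.data);
   // Assumes the Bootstrap table element has the id "tweets"
   $('#tweets').bootstrapTable('append', [tweet]);
};

// Close the stream when the server sends its "finished" event
source.addEventListener('finished', function(event) {
   source.close();
}, false);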
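
On the server side, the servlet has to write each event in the text/event-stream wire format for these handlers to fire. The following is a minimal sketch, not the app's actual servlet code: SseWriter and its send method are hypothetical names, and the sketch assumes that the payload uses plain \n line breaks. An unnamed event is delivered to onmessage, and a named event such as finished is delivered to the matching addEventListener handler.

```java
import java.io.PrintWriter;

/** Hypothetical helper that formats Server-Sent Events on a servlet response writer. */
final class SseWriter {
    private SseWriter() {}

    /**
     * Writes one event. A null eventName produces an unnamed event, which the
     * browser delivers to onmessage; a named event goes to the listener
     * registered for that name.
     */
    static void send(PrintWriter out, String eventName, String data) {
        if (eventName != null) {
            out.print("event: " + eventName + "\n");
        }
        // Each line of the payload needs its own "data:" field.
        for (String line : data.split("\n", -1)) {
            out.print("data: " + line + "\n");
        }
        // A blank line terminates the event.
        out.print("\n");
        // Flush so the event reaches the browser immediately.
        out.flush();
    }
}
```

Inside the servlet, you might call SseWriter.send(response.getWriter(), null, tweetJson) for each translated Tweet and SseWriter.send(response.getWriter(), "finished", "done") after the last one, where tweetJson stands for whatever JSON string your app produces.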

Step 5. Call the Machine Translation and Language Identification services

To identify the language and translate your Tweets, you’ll call the Language Identification and Machine Translation services by using the URLs, usernames, and passwords that you obtained in Step 2 when parsing the VCAP_SERVICES environment variable.

The Language Identification service takes your Tweet as input and returns to you the language your Tweet is in, using the standard IETF BCP-47 language tags:

public String identify(String text) {
   String language = "";
   List<NameValuePair> qparams = new ArrayList<NameValuePair>();
   qparams.add(new BasicNameValuePair("txt",text));
   qparams.add(new BasicNameValuePair("sid","lid-generic"));
   qparams.add(new BasicNameValuePair("rt","text"));

   try {
      Executor executor = Executor.newInstance();
      URI serviceURI = new URI(baseURLLanguage).normalize();
      String auth = usernameLanguage + ":" + passwordLanguage;
      byte[] response = executor.execute(Request.Post(serviceURI)
      .addHeader("Authorization", "Basic "+ Base64.encodeBase64String(auth.getBytes()))
      .bodyString(URLEncodedUtils.format(qparams, "utf-8"),
         ContentType.APPLICATION_FORM_URLENCODED)
      ).returnContent().asBytes();

      language = new String(response, "UTF-8");
   }
   catch(Exception e) {
      logger.log(Level.SEVERE, "Watson error: "+e.getMessage(), e);
   }

   return language;
}
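
The Authorization header in this method is ordinary HTTP Basic authentication: the string username:password, Base64-encoded and prefixed with "Basic ". As a minimal sketch, assuming Java 8 or later, the same header can be built with only JDK classes. The BasicAuth class is a hypothetical name, and the explicit UTF-8 charset is a choice made here so that the result does not depend on the platform default encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper that builds an HTTP Basic Authorization header value. */
final class BasicAuth {
    private BasicAuth() {}

    static String header(String username, String password) {
        // Join the credentials with a colon, as Basic authentication requires.
        String pair = username + ":" + password;
        // Encode with an explicit charset instead of the platform default.
        byte[] bytes = pair.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(bytes);
    }
}
```

For example, BasicAuth.header("user", "pass") returns "Basic dXNlcjpwYXNz".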

To convert the IETF BCP-47 language tag to the display language name, all you need to do is use the Java Locale.forLanguageTag and getDisplayLanguage methods:

Locale tweetLocale = Locale.forLanguageTag(wt.identify(tweet.getText()));
String displayLanguage = tweetLocale.getDisplayLanguage(requestLocale);
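
As a self-contained illustration of those two calls, here is a minimal sketch. The LanguageNames class is a hypothetical name, and the trim call is an assumption that the service's plain-text response may carry surrounding whitespace.

```java
import java.util.Locale;

/** Hypothetical helper that turns a BCP-47 tag into a readable language name. */
final class LanguageNames {
    private LanguageNames() {}

    static String displayName(String languageTag, Locale requestLocale) {
        // Strip whitespace that a plain-text response might include (assumption).
        Locale tweetLocale = Locale.forLanguageTag(languageTag.trim());
        // Name the Tweet's language in the language of the requesting user.
        return tweetLocale.getDisplayLanguage(requestLocale);
    }
}
```

For example, LanguageNames.displayName("pt-BR", Locale.ENGLISH) returns "Portuguese".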

The process to translate a Tweet from one language into another language is nearly the same as identifying the language of a Tweet, except now you must provide one additional parameter that indicates the source and target languages for your Tweet. These are all of the supported machine translations that you can use in your application:

Source and target translation value   Translation action
mt-arar-enus                          Arabic to US English
mt-ptbr-enus                          Brazilian Portuguese to US English
mt-enus-ptbr                          US English to Brazilian Portuguese
mt-enus-frfr                          US English to French in France
mt-enus-eses                          US English to Spanish in Spain
mt-frfr-enus                          French in France to US English
mt-eses-enus                          Spanish in Spain to US English
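
Because this app normalizes everything into English, it needs to choose one of the values above from the language tag that the Language Identification service returns. The following is a minimal sketch of one way to do that, not the app's actual logic: the TranslationSids class is a hypothetical name, and matching on the primary language subtag only (so that any Portuguese tag selects mt-ptbr-enus) is a simplifying assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup from a detected language to a to-English translation value. */
final class TranslationSids {
    private static final Map<String, String> TO_ENGLISH;
    static {
        Map<String, String> m = new HashMap<>();
        m.put("ar", "mt-arar-enus");
        m.put("pt", "mt-ptbr-enus");
        m.put("fr", "mt-frfr-enus");
        m.put("es", "mt-eses-enus");
        TO_ENGLISH = Collections.unmodifiableMap(m);
    }

    private TranslationSids() {}

    /**
     * Returns the sid for translating the given BCP-47 tag into English, or an
     * empty Optional when no translation applies (English or unsupported input).
     */
    static Optional<String> toEnglish(String languageTag) {
        // Keep only the primary language subtag, for example "pt" from "pt-BR".
        String language = Locale.forLanguageTag(languageTag.trim()).getLanguage();
        return Optional.ofNullable(TO_ENGLISH.get(language));
    }
}
```

An empty result is the signal to leave the Tweet untranslated, either because it is already in English or because no matching translation value exists.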

When you call the Watson Machine Translation service URL, you must specify one of the translation values in the sid parameter:

public String translate(String text, String sid) {
   String tweetTranslation = "";
   List<NameValuePair> qparams = new ArrayList<NameValuePair>();
   qparams.add(new BasicNameValuePair("txt", text));
   qparams.add(new BasicNameValuePair("sid", sid));
   qparams.add(new BasicNameValuePair("rt", "text"));

   try {
      Executor executor = Executor.newInstance();
      URI serviceURI = new URI(baseURLTranslation).normalize();
      String auth = usernameTranslation + ":" + passwordTranslation;
      byte[] response = executor.execute(Request.Post(serviceURI)
         .addHeader("Authorization", "Basic " + Base64.encodeBase64String(auth.getBytes()))
         .bodyString(URLEncodedUtils.format(qparams, "utf-8"),
            ContentType.APPLICATION_FORM_URLENCODED)
      ).returnContent().asBytes();

      tweetTranslation = new String(response, "UTF-8");
   }
   catch(Exception e) {
      logger.log(Level.SEVERE, "Watson error: " + e.getMessage(), e);
   }

   return tweetTranslation;
}
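
The request body that this method posts is a standard URL-encoded form with the txt, sid, and rt parameters. To show what such a body looks like, here is a minimal sketch that uses only JDK classes. The FormBody class is a hypothetical name, and its output is meant to be comparable to, not necessarily byte-for-byte identical with, what the HTTP library in the listing above produces.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds an application/x-www-form-urlencoded body. */
final class FormBody {
    private FormBody() {}

    /** Encodes the parameters in iteration order as name=value pairs joined by "&". */
    static String encode(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            body.add(enc(p.getKey()) + "=" + enc(p.getValue()));
        }
        return body.toString();
    }

    /** Builds the body for a translation request with the three parameters used above. */
    static String translationBody(String text, String sid) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("txt", text);
        params.put("sid", sid);
        params.put("rt", "text");
        return encode(params);
    }

    private static String enc(String s) {
        try {
            // Percent-encode as UTF-8 so non-ASCII Tweet text survives the request.
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For example, FormBody.translationBody("Ol\u00e1 mundo", "mt-ptbr-enus") returns "txt=Ol%C3%A1+mundo&sid=mt-ptbr-enus&rt=text".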

Step 6. Deploy your app

Consult the DevOps Services Build and Deploy reference for instructions on deploying your app to Bluemix either manually or automatically. If you opted to maintain the code locally, you can use the command-line utilities from Cloud Foundry to push the application directly to Bluemix. See the instructions.md file in my DevOps Services project for the details of that procedure.

Tapping into social media insights from multiple languages is no longer an impossible task. The Watson Machine Translation service on Bluemix greatly simplifies the task of normalizing content from multiple languages so that you can perform detailed sentiment analysis from a global perspective.

To get started using the sample code, be sure to include the Twitter4J and Apache HttpComponents libraries in your application. The Bluemix Java Web Starter boilerplate includes code that shows how to parse the VCAP_SERVICES environment variable, and it includes libraries that help with converting Java objects to and from JSON format.

BLUEMIX SERVICES USED IN THIS TUTORIAL:

  • Machine Translation converts text input in one language into a destination language for the end user.
  • Language Identification detects the language in which text is written, helping to inform next steps such as translation, voice to text, or direct analysis.
  • Java Web Starter boilerplate helps you quickly get started with Java and the IBM Data Cache service.

RELATED TOPICS: Bootstrap, Liberty, jQuery

Source: Translate your Twitter feed

Post author

Dustin Gurley is a Designer, Developer, Artist, Instructor, Critical Theorist and Systems Engineer. He has an extensive background working professionally with 2D/2.5D/3D Motion Graphics, Compositing, Film, Video, Photography and client-side performance techniques as they pertain to web development. Dustin recently completed work on his Master of Fine Art degree in Motion Media Design (Motion Graphics) from the Savannah College of Art and Design. Prior to beginning his graduate work, Dustin obtained a Bachelor of Art degree in Communication Studies with a concentration in Broadcast and Emerging Media from the University of North Carolina at Wilmington. In addition to design and modeling, Dustin enjoys toying with his view camera, working with scratch film, authoring media related material and contributing to various industry conferences. When not in front of a computer, Dustin can be found with his wife, Regina Everett Gurley. The couple enjoys dividing their time between their home just outside of Raleigh, North Carolina and the beautiful North Carolina coast. Currently, Dustin serves as the Lead Instructor of Internet Technologies for Wake Technical Community College in Raleigh, North Carolina.