Communicate with anyone on the planet, with no linguistic divide. Sounds like something out of a cheesy sci-fi flick, doesn’t it? Until recently, it might have, but with Skype’s latest offering, the Skype Translator, the world may have just become a smaller place.

Over a decade of research and development has allowed Microsoft to achieve what a number of Silicon Valley icons—not to mention the U.S. Department of Defense—have not yet been able to. To do so, Microsoft Research (MSR) had to solve some major machine learning problems while pushing technologies like deep neural networks into new territory.

Translation, though, has never been the hardest part of the equation. Effective text translators have been around for a while. Translating spoken language—and especially doing so in real time—requires a whole different set of tools. Spoken words aren’t just a different medium of linguistic communication; we compose our words differently in speech and in text. Then there’s inflection, tone, body language, slang, idiom, mispronunciation, regional dialect and colloquialism. Text offers data; speech, with all its nuances, offers nothing but problems.

To translate an English phrase like “the straw that broke the camel’s back” into, say, German, the system looks for probabilistic matches, selecting the best solution from a number of candidate phrases based on what it thinks is most likely to be correct. Over time the system builds confidence in certain results, reducing errors. With enough use, it figures out that an equivalent phrase, “the drop that tipped the bucket,” will likely sound more familiar to a German speaker.
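To make that concrete, here is a minimal sketch of candidate selection, assuming a hand-written phrase table with invented probabilities; a real engine learns these scores from enormous parallel corpora rather than from a hard-coded dictionary.

```python
# Illustrative only: the candidate phrases and their probabilities are invented.
phrase_table = {
    "the straw that broke the camel's back": [
        # (candidate German phrase, estimated probability of being the right choice)
        ("der Tropfen, der das Fass zum Überlaufen brachte", 0.72),  # idiomatic equivalent
        ("der Strohhalm, der dem Kamel den Rücken brach", 0.21),     # word-for-word rendering
        ("das letzte Wort", 0.07),
    ],
}

def translate_phrase(phrase):
    """Pick the candidate the system currently believes is most likely correct."""
    candidates = phrase_table.get(phrase, [])
    if not candidates:
        return phrase  # no match: pass the phrase through unchanged
    best_candidate, _score = max(candidates, key=lambda pair: pair[1])
    return best_candidate

print(translate_phrase("the straw that broke the camel's back"))
```

As users confirm or correct its output, the engine can raise or lower these scores, which is how it “builds confidence” in certain results over time.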

This kind of probabilistic, statistical matching allows the system to get smarter over time, but it doesn’t really represent a breakthrough in machine learning or translation (though MSR researchers would point out that they’ve built some pretty sophisticated and unique syntax-parsing algorithms into their engine). In any case, translation is no longer the hardest part of the equation. The real breakthrough for real-time speech-to-speech translation came in 2009, when a group at MSR decided to return to deep neural network research in an effort to enhance speech recognition and synthesis—the turning of spoken words into text and vice versa.

Designed more like the human brain than a classical computer, Deep Neural Networks (DNNs) are biologically inspired computing architectures that enable computers to learn observationally through a powerful process known as deep learning. New DNN-based models that learn as they go proved capable of building larger and more complex bodies of knowledge about the data sets they were trained on—including things like language. Speech recognition accuracy rates shot up by 25 percent. Moreover, DNNs are fast enough to make real-time translation a reality, as 50,000 people found out this week.
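The “deep” simply refers to stacking many layers of simple units. The sketch below only illustrates that layered structure, assuming made-up layer sizes, random weights and a pretend vector of acoustic features; MSR’s production models are vastly larger and trained on enormous amounts of speech.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, n_out):
    """One fully connected layer with a ReLU non-linearity (random weights, for illustration)."""
    weights = rng.normal(scale=0.1, size=(inputs.shape[-1], n_out))
    biases = np.zeros(n_out)
    return np.maximum(0.0, inputs @ weights + biases)

# Pretend these 40 numbers are acoustic features for one short slice of audio.
acoustic_features = rng.normal(size=(1, 40))

hidden = acoustic_features
for width in (256, 256, 256):   # several hidden layers are what makes the network "deep"
    hidden = layer(hidden, width)

# A final layer scores a made-up inventory of 50 phoneme classes.
scores = hidden @ rng.normal(scale=0.1, size=(256, 50))
probabilities = np.exp(scores) / np.exp(scores).sum()
print("most likely phoneme index:", int(probabilities.argmax()))
```

In a trained network the weights are learned from data rather than drawn at random; that learning step is what pushed recognition accuracy up so sharply.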

So how do all these magical elements come together?

When one party on a Skype Translator call speaks, his or her words touch all of those pieces, traveling first to the cloud, then in series through a speech recognition system, a program that cleans up unnecessary “ums” and “ahs” and the like, a translation engine, and a speech synthesizer that turns that translation back into audible speech. Half a beat after that person stops speaking, an audio translation is already playing while a text transcript of the translation displays within the Skype app.
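In code, that pipeline is just a chain of stages. The sketch below is purely illustrative: every function is a placeholder standing in for a cloud service, and the names are assumptions of mine, not Skype’s.

```python
def recognize_speech(audio):
    """Speech recognition: audio in, raw transcript out (placeholder)."""
    return "so um I think we should ah meet tomorrow"

def remove_disfluencies(text):
    """Strip the 'ums' and 'ahs' before translation (placeholder)."""
    fillers = {"um", "ah", "uh", "er"}
    return " ".join(word for word in text.split() if word not in fillers)

def translate(text, target_language):
    """Translation engine (placeholder)."""
    return f"[{target_language} translation of: {text}]"

def synthesize_speech(text):
    """Text-to-speech: return audio for the translated text (placeholder)."""
    return b"\x00\x01"  # pretend this is audio data

def skype_translator_turn(audio, target_language="de"):
    transcript = recognize_speech(audio)
    cleaned = remove_disfluencies(transcript)
    translated_text = translate(cleaned, target_language)
    translated_audio = synthesize_speech(translated_text)
    return translated_audio, translated_text  # played aloud + shown in the app

audio_out, caption = skype_translator_turn(audio=b"...")
print(caption)
```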

Skype Translator still isn’t perfect, though: it fumbles on uncommon idioms and phrases, and how the system will evolve as it tries to keep up with tens of thousands of users testing its capabilities remains to be seen. What is certain is that, through Skype, Microsoft has ushered in an age of digital communication without borders.

 

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. Simply put, Machine Learning is the field of study that gives computers the ability to learn, without being explicitly programmed. Essentially, it is a method of teaching computers to make and improve predictions or behaviours based on some data. What is this “data”? Well, that depends entirely on the problem. It could be readings from a robot’s sensors as it learns to walk, or the correct output of a program for certain input. Machine Learning converts data sets into pieces of software, known as “models,” that can represent the data set and generalize to make predictions on new data.
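The “data set in, model out” idea fits in a few lines. In the sketch below the numbers are invented: a straight line is fitted to a handful of (input, correct output) pairs and then used to predict for an input the model was never shown.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # training inputs
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # correct outputs (roughly y = 2x)

# "Training": choose the slope and intercept that best explain the data.
slope, intercept = np.polyfit(x, y, deg=1)

# "Prediction": apply the learned model to data it has never seen.
new_input = 7.0
print(f"predicted output for {new_input}: {slope * new_input + intercept:.2f}")
```

Here the “model” is nothing more than two learned numbers, but the same pattern scales up to the enormous models behind speech recognition and translation.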

Broadly, Machine Learning can be used in three different ways:

  1. Data Mining: ML can be used by people to gain insights from large databases.
  2. Statistical Engineering: ML can be used to convert data into software that makes decisions about uncertain data.
  3. Artificial Intelligence: ML can be used to emulate the human mind, to create computers that can see, hear, and understand.

The question arises: when a machine ‘learns’, what does it modify? Its own code, or the data that encodes its experience for a given set of inputs?

Well, it depends.

One example of code actually being modified is Genetic Programming, where you essentially evolve a program to complete a task (of course, the evolved program doesn’t modify itself – instead, a separate program modifies the candidate programs).
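Genetic programming proper evolves whole program trees, which takes more code than fits here; the sketch below runs the same evolutionary loop of selection, crossover and mutation on a much simpler representation, a string evolving toward a target, purely to illustrate the idea.

```python
import random

random.seed(42)
TARGET = "machine learning"
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def fitness(candidate):
    """Count how many characters already match the target."""
    return sum(a == b for a, b in zip(candidate, TARGET))

def crossover(parent_a, parent_b):
    """Splice two parents together at a random cut point."""
    cut = random.randrange(len(TARGET))
    return parent_a[:cut] + parent_b[cut:]

def mutate(candidate, rate=0.05):
    """Randomly change a few characters."""
    return "".join(random.choice(ALPHABET) if random.random() < rate else ch
                   for ch in candidate)

# Start with a population of random strings and evolve it toward the target.
population = ["".join(random.choice(ALPHABET) for _ in TARGET) for _ in range(200)]

for generation in range(1000):
    population.sort(key=fitness, reverse=True)
    if population[0] == TARGET:
        break
    parents = population[:50]  # keep the fittest candidates as parents
    population = [mutate(crossover(random.choice(parents), random.choice(parents)))
                  for _ in range(200)]

print(generation, max(population, key=fitness))
```

Swap the strings for syntax trees of a small programming language, with crossover and mutation operators to match, and you have the basic shape of genetic programming.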

Neural networks, on the other hand, modify their parameters automatically in response to prepared stimuli and expected responses. This allows them to produce many behaviours (theoretically, they can produce any behaviour, because they can approximate any function to arbitrary precision, given enough time).
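A single artificial neuron already shows the mechanism. In the sketch below (a toy example, not any particular library’s API) the stimuli are the inputs of the logical AND function and the expected responses are its outputs; whenever the neuron’s output disagrees with the target, its weights and bias are nudged toward it.

```python
import numpy as np

rng = np.random.default_rng(1)
weights = rng.normal(size=2)
bias = 0.0
learning_rate = 0.5

# Stimuli and expected responses: the logical AND function.
stimuli = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
expected = np.array([0, 0, 0, 1], dtype=float)

for _ in range(2000):
    for x, target in zip(stimuli, expected):
        output = 1.0 / (1.0 + np.exp(-(x @ weights + bias)))  # sigmoid activation
        error = target - output
        weights += learning_rate * error * x   # adjust the parameters...
        bias += learning_rate * error          # ...toward the expected response

# Expected to print approximately [0. 0. 0. 1.]
print(np.round(1.0 / (1.0 + np.exp(-(stimuli @ weights + bias)))))
```

After training, all that remains is the handful of learned numbers, not the examples themselves.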

This may lead you to believe that machine learning algorithms work by “remembering” information, events, or experiences. This is not necessarily (or even often) the case.

Neural networks only keep the current “state” of the approximation, which is updated as learning occurs. Rather than remembering what happened and how to react to it, neural networks build a sort of “model” of their “world.” The model tells them how to react to certain inputs, even if those inputs are something they have never seen before.

This last ability – the ability to react to inputs that have never been seen before – is one of the core tenets of many machine learning algorithms. Imagine trying to teach a computer driver to navigate highways in traffic. An effective machine learning algorithm would (hopefully!) be able to learn similarities between different states and react to them similarly.

The similarities between states can be anything – even things we might think of as mundane can stump a computer! For example, let’s say that the computer driver learned that when a car in front of it slowed down, it had to slow down too. For a human, replacing the car with a motorcycle doesn’t change anything – we recognize that the motorcycle is also a vehicle. For a machine learning algorithm, this can actually be surprisingly difficult! A database would have to store information about the car-in-front case and the motorcycle-in-front case separately. A machine learning algorithm, on the other hand, would “learn” from the car example and be able to generalize to the motorcycle example automatically.
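Here is one hedged way to picture that generalization. The feature vectors below (length in metres, number of wheels, typical speed) are invented; a simple nearest-neighbour rule trained only on cars and pedestrians still treats an unseen motorcycle like a vehicle, because its features sit closest to a car’s.

```python
# Toy examples: (length in metres, number of wheels, typical speed in km/h) -> action
training_examples = [
    ((4.5, 4, 60.0), "slow down"),   # car ahead
    ((4.2, 4, 55.0), "slow down"),   # another car
    ((0.5, 0, 5.0), "stop"),         # pedestrian ahead
]

def decide(features):
    """Copy the decision of the most similar training example."""
    def distance(example):
        known_features, _action = example
        return sum((a - b) ** 2 for a, b in zip(features, known_features))
    _nearest_features, action = min(training_examples, key=distance)
    return action

motorcycle = (2.1, 2, 50.0)          # never seen during training
print(decide(motorcycle))            # -> "slow down"
```

Real driving systems use far richer features and models, and would normalise the feature scales, but the principle is the same: decisions come from learned structure, not from a lookup of previously seen cases.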

Machine learning is a huge field, with hundreds of different algorithms for solving a myriad of problems across a plethora of fields, ranging from robotics to stock forecasting. Think of the humble search engine. Behind it is a very complex system that interprets your query, scours the web, and returns information you will find useful. Because these engines handle such a high volume of traffic, Machine Learning is used, in the form of automated decision-making, to handle the uncertainty and ambiguity of natural language.

As Rick Rashid, founder of Microsoft Research, put it, “This topic of machine learning has become incredibly exciting over the last 10 years. The pace of change has been really dramatic.” With recent leaps like IBM Cognitive Computers’ Skin Cancer Detection System and Skype’s real-time speech-to-speech translator, Machine Learning truly is the way forward.

 

The term net neutrality means ‘equality in internet traffic’. The debate around it burst into the spotlight when American network provider Comcast was accused of selectively slowing down uploads and downloads to certain internet services. The concept has since become the topic of intense debate in corporate and tech circles, even featuring in President Obama’s campaign speeches.

Net neutrality supporters believe that internet providers should not have control over the internet applications and services used by their subscribers. Proponents argue that internet providers could use control over subscriber content to create a ‘false market’ for services that would otherwise have been bundled in. Consider an Internet Service Provider (ISP) that offers plan A, with a monthly subscription fee of Rs. 1000 for unlimited internet but with services like Skype and FaceTime excluded. The same provider also offers plan B, which costs Rs. 1500 and additionally provides those services, thus creating a ‘false market’ for them, because in a neutral system all of these services would come bundled in. There have also been reports of ISPs blocking certain third-party applications to eliminate competition for services they themselves provide.

Another way providers could exploit their control over the so-called internet ‘pipeline’ is by striking deals with entertainment companies and other internet application makers so that their content reaches consumers faster. Consider two online gaming providers A and B, and an internet provider C. Company C agrees to a contract with online gaming service A under which users of C’s internet service receive faster speeds when gaming on service A, thereby intentionally pushing service B out of the competition. Such abuse of power by internet providers could endanger the unrestricted internet landscape, and is a major cause for concern around the globe.

Let us now take a look at the other side of the argument. Some people believe that we are not yet ready for regulation in favor of net neutrality, arguing that the grey areas in the exact definition of net neutrality must first be ironed out, since net neutrality could be taken to mean ‘equal speeds to all subscribers at the same prices’, which would violate free-market policy. Many critics also believe that an effective regulator for internet services would be needed to handle the ambiguity arising out of any future legal framework for net neutrality.

The debate has raged on for a few years now, and technology gurus are looking for a middle route, a proverbial ‘third way’, that would solve the problem with a milder approach, one that requires less of a structural overhaul and would, hopefully, bury this debate permanently.