If science-fiction movies have taught us anything, it’s that the future is a bleak and terrifying dystopia ruled by murderous sentient robots.
Fortunately, only one of these things is true — but that could soon change, as the doomsayers are so fond of telling us.
Artificial intelligence and machine learning are among the most significant technological developments in recent history. Few fields promise to “disrupt” (to borrow a favored term) life as we know it quite like machine learning, but many of the applications of machine learning technology go unseen.
Want to see some real examples of machine learning in action? Here are 10 companies that are using the power of machine learning in new and exciting ways (plus a glimpse into the future of machine learning).
1. Yelp — Image Curation at Scale
Few things compare to trying out a new restaurant then going online to complain about it afterwards. This is among the many reasons why Yelp is so popular (and useful).
While Yelp might not seem to be a tech company at first glance, Yelp is leveraging machine learning to improve users’ experience.
Classifying images into simple exterior/interior categories is easy for humans, but surprisingly difficult for computers.
Since images are almost as vital to Yelp as user reviews themselves, it should come as little surprise that Yelp is always trying to improve how it handles image processing.
This is why Yelp turned to machine learning a couple of years ago when it first implemented its picture classification technology. Yelp’s machine learning algorithms help the company’s human staff to compile, categorize, and label images more efficiently — no small feat when you’re dealing with tens of millions of photos.
2. Pinterest — Improved Content Discovery
Whether you’re a hardcore pinner or have never used the site before, Pinterest occupies a curious place in the social media ecosystem. Since Pinterest’s primary function is to curate existing content, it makes sense that investing in technologies that can make this process more effective would be a priority — and that’s definitely the case at Pinterest.
In 2015, Pinterest acquired Kosei, a machine learning company that specialized in the commercial applications of machine learning tech (specifically, content discovery and recommendation algorithms).
Today, machine learning touches virtually every aspect of Pinterest’s business operations, from spam moderation and content discovery to advertising monetization and reducing churn of email newsletter subscribers. Pretty cool.
3. Facebook — Chatbot Army
Although Facebook’s Messenger service is still a little…contentious (people have very strong feelings about messaging apps, it seems), it’s one of the most exciting aspects of the world’s largest social media platform. That’s because Messenger has become something of an experimental testing laboratory for chatbots.
Some chatbots are virtually indistinguishable from humans when conversing via text.
Any developer can create and submit a chatbot for inclusion in Facebook Messenger. This means that companies with a strong emphasis on customer service and retention can leverage chatbots, even if they’re a tiny startup with limited engineering resources.
Of course, that’s not the only application of machine learning that Facebook is interested in. AI applications are being used at Facebook to filter out spam and poor-quality content, and the company is also researching computer vision algorithms that can “read” images to visually impaired people.
4. Twitter — Curated Timelines
Twitter has been at the center of numerous controversies of late (not least of which were the much-derided decisions to round out everyone’s avatars and to change the way people are tagged in @ replies), but one of the more contentious changes we’ve seen on Twitter was the move toward an algorithmic feed.
Rob Lowe was particularly upset by the introduction of algorithmically curated Twitter timelines.
Whether you prefer to have Twitter show you “the best tweets first” (whatever that means) or a reasonably chronological timeline, these changes are being driven by Twitter’s machine learning technology. Twitter’s AI evaluates each tweet in real time and “scores” it according to various metrics.
Ultimately, Twitter’s algorithms then display tweets that are likely to drive the most engagement. This is determined on an individual basis; Twitter’s machine learning tech makes those decisions based on your individual preferences, resulting in the algorithmically curated feeds, which kinda suck if we’re being completely honest. (Does anybody actually prefer the algorithmic feed? Tell me why in the comments, you lovely weirdos.)
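As a rough illustration of the scoring-and-ranking idea described above, the sketch below gives each tweet a per-user engagement score and sorts the timeline by it. The feature names, the weights, and the `score_tweet` function are all hypothetical, chosen only for illustration; the signals and models Twitter actually uses are not described here.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    author: str
    age_minutes: float
    likes: int


def score_tweet(tweet, followed_closely):
    """Return a hypothetical engagement score for one user; higher ranks first."""
    # Assumed signal: the user interacts a lot with this author.
    affinity = 2.0 if tweet.author in followed_closely else 0.0
    # Assumed signal: capped popularity so viral tweets do not dominate.
    popularity = min(tweet.likes, 1000) / 1000.0
    # Assumed signal: recency that decays as the tweet ages.
    recency = 1.0 / (1.0 + tweet.age_minutes / 60.0)
    return affinity + popularity + recency


def rank_timeline(tweets, followed_closely):
    """Sort tweets from highest to lowest score for this particular user."""
    return sorted(tweets, key=lambda t: score_tweet(t, followed_closely), reverse=True)


timeline = [
    Tweet("Breaking news", "newsdesk", age_minutes=5, likes=900),
    Tweet("Lunch photo", "close_friend", age_minutes=120, likes=3),
    Tweet("Old meme", "stranger", age_minutes=600, likes=50),
]
ranked = rank_timeline(timeline, followed_closely={"close_friend"})
print([t.author for t in ranked])
```

Because the score depends on who is looking, the same three tweets would come back in a different order for a user with a different `followed_closely` set, which is the "individual basis" part of the description.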
5. Google — Neural Networks and ‘Machines That Dream’
These days, it’s probably easier to list areas of scientific R&D that Google — or, rather, parent company Alphabet — isn’t working on, rather than trying to summarize Google’s technological ambition.
Needless to say, Google has been very busy in recent years, having diversified into such fields as anti-aging technology, medical devices, and — perhaps most exciting for tech nerds — neural networks.
A selection of images created by Google’s neural network
The most visible development in Google’s neural network research has been DeepDream, the “machine that dreams.” It’s the same network that produced those psychedelic images everybody was talking about a while back.
According to Google, the company is researching “virtually all aspects of machine learning,” which will lead to exciting developments in what Google calls “classical algorithms” as well as other applications including natural language processing, speech translation, and search ranking and prediction systems.
6. Edgecase — Improving Ecommerce Conversion Rates
For years, retailers have struggled to overcome the mighty disconnect between shopping in stores and shopping online. For all the talk of how online retail will be the death knell of traditional shopping, many ecommerce sites still suck.
Edgecase, formerly known as Compare Metrics, hopes to change that.
Edgecase hopes its machine learning technology can help ecommerce retailers improve the experience for users. In addition to streamlining the ecommerce experience in order to improve conversion rates, Edgecase plans to leverage its tech to provide a better experience for shoppers who may only have a vague idea of what they’re looking for, by analyzing certain behaviors and actions that signify commercial intent — an attempt to make casual browsing online more rewarding and closer to the traditional retail experience.
7. Baidu — The Future of Voice Search
Google isn’t the only search giant that’s branching out into machine learning. Chinese search engine Baidu is also investing heavily in the applications of AI.
A simplified five-step diagram illustrating the key stages of a natural language processing system.
One of the most interesting (and disconcerting) developments at Baidu’s R&D lab is what the company calls Deep Voice, a deep neural network that can generate entirely synthetic human voices that are very difficult to distinguish from genuine human speech. The network can “learn” the unique subtleties in the cadence, accent, pronunciation and pitch to create eerily accurate recreations of speakers’ voices.
Far from an idle experiment, Deep Voice 2 — the latest iteration of the Deep Voice technology — promises to have a lasting impact on natural language processing, the underlying technology behind voice search and voice pattern recognition systems. This could have major implications for voice search applications, as well as dozens of other potential uses, such as real-time translation and biometric security.
8. HubSpot — Smarter Sales
Anyone who is familiar with HubSpot probably already knows that the company has long been an early adopter of emerging technologies, and the company proved this again earlier this month when it announced the acquisition of machine learning firm Kemvi.
Predictive lead scoring is just one of the many potential applications of AI and machine learning.
HubSpot plans to use Kemvi’s technology in a range of applications — most notably, integrating Kemvi’s DeepGraph machine learning and natural language processing tech in its internal content management system.
This, according to HubSpot’s Chief Strategy Officer Bradford Coffey, will allow HubSpot to better identify “trigger events” (changes to a company’s structure, management, or anything else that affects day-to-day operations) so that it can more effectively pitch prospective clients and serve existing customers.
9. IBM — Better Healthcare
The inclusion of IBM might seem a little strange, given that IBM is one of the largest and oldest of the legacy technology companies, but IBM has managed to transition from older business models to newer revenue streams remarkably well. None of IBM’s products demonstrate this better than its renowned AI, Watson.
An example of how IBM’s Watson can be used to test and validate self-learning behavioral models.
Watson may be a Jeopardy! champion, but it boasts a considerably more impressive track record than besting human contestants in televised game shows. Watson has been deployed in several hospitals and medical centers in recent years, where it demonstrated its aptitude for making highly accurate recommendations in the treatment of certain types of cancers.
Watson also shows significant potential in the retail sector, where it could be used as an assistant to help shoppers, as well as the hospitality industry. As such, IBM is now offering its Watson machine learning technology on a license basis — one of the first examples of an AI application being packaged in such a manner.
10. Salesforce — Intelligent CRMs
Salesforce is a titan of the tech world, with strong market share in the customer relationship management (CRM) space and the resources to match. Lead prediction and scoring are among the greatest challenges for even the savviest digital marketer, which is why Salesforce is betting big on its proprietary Einstein machine learning technology.
Salesforce Einstein allows businesses that use Salesforce’s CRM software to analyze every aspect of a customer’s relationship — from initial contact to ongoing engagement touch points — to build much more detailed profiles of customers and identify crucial moments in the sales process. This means much more comprehensive lead scoring, more effective customer service (and happier customers), and more opportunities.
The Future of Machine Learning
One of the main problems with rapid technological advancement is that, for whatever reason, we end up taking these leaps for granted. Some of the applications of machine learning listed above would have been almost unthinkable as recently as a decade ago, and yet the pace at which scientists and researchers are advancing is nothing short of amazing.
So, what’s next in machine learning trends?
Machines That Learn More Effectively
Before long, we’ll see artificial intelligences that can learn much more effectively. This will lead to developments in how algorithms are treated, such as AI deployments that can recognize, alter, and improve upon their own internal architecture with minimal human supervision.
Automation of Cyberattack Countermeasures
The rise of cybercrime and ransomware has forced companies of all sizes to reevaluate how they respond to systemic online attacks. We’ll soon see AI take a much greater role in monitoring, preventing, and responding to cyberattacks like database breaches, DDoS attacks, and other threats.
Convincing Generative Models
Generative models, such as the ones used by Baidu in our example above, are already incredibly convincing. Soon, we won’t be able to tell the difference at all. Improvements to generative modeling will result in increasingly sophisticated images, voices, and even entire identities generated entirely by algorithms.
Better Machine Learning Training
Even the most sophisticated AI can only learn as effectively as the training it receives; oftentimes, machine learning systems require enormous volumes of data to be trained. In the future, machine learning systems will require less and less data to “learn,” resulting in systems that can learn much faster with significantly smaller data sets.
Machine learning is everywhere. It’s being used to optimize complex chips, balance power and performance inside of data centers, program robots, and keep expensive electronics updated and operating. What’s less obvious, though, is that there are no commercially available tools to validate, verify and debug these systems once machines evolve beyond the final specification.
The expectation is that devices will continue to work as designed, like a cell phone or a computer that has been updated with over-the-air software patches. But machine learning is different. It involves changing the interaction between the hardware and software and, in some cases, the physical world. In effect, it modifies the rules for how a device operates based upon previous interactions, as well as software updates, setting the stage for much wider and potentially unexpected deviations from that specification.
In most instances, these deviations will go unnoticed. In others, such as safety-critical systems, changing how systems perform can have far-reaching consequences. But tools have not been developed yet that reach beyond the algorithms used for teaching machines how to behave. When it comes to understanding machine learning’s impact on a system over time, this is a brave new world.
“The specification may capture requirements of the infrastructure for machine learning, as well as some hidden layers and the training data set, but it cannot predict what will happen in the future,” said Achim Nohl, technical marketing manager for high-performance ASIC prototyping systems at Synopsys. “That’s all heuristics. It cannot be proven wrong or right. It involves supervised versus unsupervised learning, and nobody has answers to signing off on this system. This is all about good enough. But what is good enough?”
Most companies that employ machine learning point to the ability to update and debug software as their safety net. But drill down further into system behavior and modifications and that safety net vanishes. There are no clear answers about how machines will function once they evolve or are modified by other machines.
“You’re stressing things that were unforeseen, which is the whole purpose of machine learning,” said Bill Neifert, director of models technology at ARM. “If you could see all of the eventualities, you wouldn’t need machine learning. But validation could be a problem because you may end up down a path where adaptive learning changes the system.”
Normally this is where the tech industry looks for tools to help automate solutions and anticipate problems. With machine learning those tools don’t exist yet.
“We definitely need to go way beyond where we are today,” said Harry Foster, chief verification scientist at Mentor Graphics. “Today, you have finite state machines and methods that are fixed. Here, we are dealing with systems that are dynamic. Everything needs to be extended or rethought. There are no commercial solutions in this space.”
Foster said some pioneering work on validating systems that are constantly being updated is being done by England’s University of Bristol. “With machine learning, you’re creating a predictive model and you want to make sure it stays within legal bounds. That’s fundamental. But if you have a car and it’s communicating with other cars, you need to make sure you’re not doing something harmful. That involves two machine learnings. How do you test one system against the other system?”
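One way to picture the idea of keeping a predictive model "within legal bounds" is a runtime guard that checks every output against fixed limits before it reaches the rest of the system. This is only a sketch: the `BoundedModel` class, the steering-angle limits, and the stand-in model are hypothetical, and a guard like this does not replace the verification tools that are described as missing.

```python
class BoundedModel:
    """Wrap a learned model so its outputs never leave a fixed legal range."""

    def __init__(self, model, low, high):
        if low > high:
            raise ValueError("low must not exceed high")
        self.model = model
        self.low = low
        self.high = high
        self.violations = 0  # how often the raw model stepped out of bounds

    def predict(self, x):
        raw = self.model(x)
        if raw < self.low or raw > self.high:
            self.violations += 1  # record the event for later debugging
        # Clamp the raw output into the legal range.
        return max(self.low, min(self.high, raw))


def learned_steering(curvature):
    """Stand-in for a model whose behavior may drift as it keeps learning."""
    return 40.0 * curvature


guarded = BoundedModel(learned_steering, low=-30.0, high=30.0)
print(guarded.predict(0.5))   # within bounds, passed through unchanged
print(guarded.predict(1.2))   # out of bounds, clamped to the upper limit
print(guarded.violations)
```

The violation counter matters as much as the clamp: it gives engineers a signal that the learned behavior has moved away from what the specification anticipated.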
Today, understanding of these systems is relegated to a single point in time, based upon the final system specification and whatever updates have been added via over-the-air software. But machine learning uses an evolutionary teaching approach. With cars, it can depend upon how many miles a vehicle has been driven, where it was driven, by whom, and how it was driven. With a robot, it may depend upon what that robot encounters on a daily basis, whether that includes flat terrain, steps, extreme temperatures or weather. And while some of that will be shared among other devices via the cloud, the basic concept is that the machine itself adapts and learns. So rather than programming a device with software, it is programmed to learn on its own.
Predicting how even one system will behave using this model, coupled with periodic updates, yields a mathematical distribution of possible outcomes rather than a single answer. Predicting how thousands of these systems will change, particularly if they interact with each other or with other devices, involves a series of probabilities that are in constant flux over time.
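To see why the outcome is a spread of values rather than one number, consider a toy simulation in which identical devices each adapt a single parameter to their own noisy experience. Everything here is an assumption made for illustration: the `gain` parameter, the update rule, and the noise levels are hypothetical and not drawn from any real system.

```python
import random
import statistics


def simulate_device(seed, steps=200, learning_rate=0.05):
    """Adapt one parameter toward whatever this device happens to observe."""
    rng = random.Random(seed)
    environment = rng.uniform(0.8, 1.2)   # each device lives in a slightly different world
    gain = 1.0                            # value fixed by the original specification
    for _ in range(steps):
        observation = environment + rng.gauss(0.0, 0.1)
        gain += learning_rate * (observation - gain)  # drift toward local experience
    return gain


# A fleet of devices that all shipped with the same specification.
fleet = [simulate_device(seed) for seed in range(1000)]
print("spec value: 1.0")
print(f"fleet mean: {statistics.mean(fleet):.3f}")
print(f"fleet spread (stdev): {statistics.stdev(fleet):.3f}")
print(f"min / max: {min(fleet):.3f} / {max(fleet):.3f}")
```

Every device starts at the specified value, yet after learning the fleet is spread across a range. Describing that range, and how it moves over time, is the prediction problem.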
What is machine learning?
The idea that machines can be taught dates back almost two decades before the introduction of Moore’s Law. Work in this area began in the late 1940s, based on early computer work in identifying patterns in data and then making predictions from that data.
Machine learning applies to a wide spectrum of applications. At the lowest level are mundane tasks such as spam filtering. But machine learning also includes more complex programming of known use cases in a variety of industrial applications, as well as highly sophisticated image recognition systems that can distinguish between one object and another.
Arthur Samuel, one of the pioneers in machine learning, began experimenting with the possibility of making machines learn from experience back in the late 1940s—creating devices that can do things beyond what they were explicitly programmed to do. His best-known work was a checkers game program, which he developed while working at IBM. It is widely credited as the first implementation of machine learning.
Machine learning has advanced significantly since then. Checkers has been supplanted by more difficult games such as chess, Jeopardy!, and Go.
In a presentation at the Hot Chips 2016 conference in Cupertino last month, Google engineer Daniel Rosenband cited four parameters for autonomous vehicles—knowing where a car is, understanding what’s going on around it, identifying the objects around a car, and determining the best options for how a car should proceed through all of that to its destination.
This requires more than driving across a simple grid or pattern recognition. It involves some complex reasoning about what a confusing sign means, how to read a traffic light if it is obscured by an object such as a red balloon, and what to do if sensors are blinded by the sun’s glare. It also includes an understanding of the effects of temperature, shock and vibration on sensors and other electronics.
Google uses a combination of sensors, radar and lidar to pull together a cohesive picture, which requires a massive amount of processing in a very short time frame. “We want to jam as much compute as possible into a car,” Rosenband said. “The primary objective is maximum performance, and that requires innovation in how to architect everything to get more performance than you could from general-purpose processing.”
Programming all of this by hand into every new car is unrealistic. Database management is difficult enough with a small data set. Adding in all of the data necessary to keep an autonomous vehicle on the road, and fully updated with new information about potentially dangerous behavior, is impossible without machine learning.
“We’re seeing two applications in this space,” said Charlie Janac, chairman and CEO of Arteris. “The first is in the data center, which is a machine-learning application. The second is ADAS, where you decide on what the image is. This gets into the world of convolutional neural networking algorithms, and a really good implementation of this would include tightly coupled hardware and software. These are mission-critical systems, and they need to continually update software over the air with a capability to visualize what’s in the hardware.”
How it’s being used
Machine learning comes in many flavors, and often means different things to different people. In general, the idea is that algorithms can be used to change the functionality of a system to improve performance, lower power, or simply update it with new use cases. That learning can be applied to software, firmware, an IP block, a full SoC, or an integrated device with multiple SoCs.
Microsoft is using machine learning for its “mixed reality” HoloLens device, according to Nick Baker, distinguished engineer in the company’s Technology and Silicon Group. “We run changes to the algorithm and get feedback as quickly as possible, which allows us to scale as quickly as possible from as many test cases as possible,” he said.
The HoloLens is still just a prototype, but like the Google self-driving car it is processing so much information so fast and reacting so quickly to the external world that there is no way to program this device without machine learning. “The goal is to scale as quickly as possible from as many test cases as possible,” Baker said.
Machine learning can be used to optimize hardware and software in everything from IP to complex systems, based upon a knowledge base of what works best for which conditions.
“We use machine learning to improve our internal algorithms,” said Anush Mohandass, vice president of marketing at NetSpeed Systems. “Without machine learning, if you don’t have an intelligent human to set it up, you get garbage back. You may start off and experiment with 15 things on the ‘x’ axis and 1,000 things on the ‘y’ axis, and set up an algorithm based on that. But there is a potential for infinite data.”
Machine learning assures a certain level of results, no matter how many possibilities are involved. That approach also can help if there are abnormalities that do not fit into a pattern because machine learning systems can ignore those aberrations. “This way you also can debug what you care about,” Mohandass said. “The classic case is a car on auto pilot that crashes because a chip did not recognize a full spectrum of things. At some point we will need to understand every data point and why something behaves the way it does. This isn’t the 80/20 rule anymore. It’s probably closer to 99.9% and 0.1%, so the distribution becomes thinner and taller.”
eSilicon uses a version of machine learning in its online quoting tools, as well. “We have an IP marketplace where we can compile memories, try them for free, and use them until you put them into production,” said Jack Harding, eSilicon’s president and CEO. “We have a test chip capability for free, fully integrated and perfectly functional. We have a GDSII capability. We have WIP (work-in-process) tracking, manufacturing online order entry system—all fully integrated. If I can get strangers on the other side of the world to send me purchase orders after eight lines of chat and build sets of chips successfully, there is no doubt in my mind that the bottoms-up Internet of Everything crowd will be interested.”
Where it fits
In the general scheme of things, machine learning is what makes artificial intelligence possible. There is ongoing debate about which is a superset of the other, but suffice it to say that an artificially intelligent machine must utilize machine-learning algorithms to make choices based upon previous experience and data. The terms are often confusing, in part because they are blanket terms that cover a lot of ground, and in part because the terminology is evolving with technology. But no matter how those arguments progress, machine learning is critical to AI and its more recent offshoot, deep learning.
“Deep learning, as a subset of machine learning, is the most potent disruptive force we have seen because it has the ability to change what the hardware looks like,” said Chris Rowen, Cadence fellow and CTO of the company’s IP Group. “In mission-critical situations, it can have a profound effect on the hardware. Deep learning is all about making better guesses, but the nature of correctness is difficult to define. There is no way you get that right 100% of the time.”
But it is possible, at least in theory, to push closer to 100% correctness over time as more data is included in machine-learning algorithms.
“The more data you have, the better off you are,” said Microsoft’s Baker. “If you look at test images, the more tests you can provide the better.”
There is plenty of agreement on that, particularly among companies developing complex SoCs, which have quickly spiraled beyond the capabilities of engineering teams.
“I’ve never seen this fast an innovation of algorithms that are really effective at solving problems,” said Mark Papermaster, CTO of Advanced Micro Devices. “One of the things about these algorithms that is particularly exciting to us is that a lot of it is based around the pioneering work in AI, leveraging what is called a gradient-descent analysis. This algorithm is very parallel in nature, and you can take advantage of the parallelism. We’ve been doing this and opening up our GPUs, our discrete graphics, to be tremendous engines to accelerate the machine learning. But different than our competitors, we are doing it in an open source environment, looking at all the common APIs and software requirements to accelerate machine learning on our CPUs and GPUs and putting all that enablement out there in an open source world.”
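The remark about parallelism can be made concrete with a small gradient-descent sketch. In the example below, which fits a line to a handful of made-up points, the gradient is a sum of per-sample terms that do not depend on each other, which is why this kind of work spreads well across many GPU cores. The data, learning rate, and function names are illustrative assumptions, and the loop itself runs serially in plain Python.

```python
def per_sample_gradient(w, b, x, y):
    """Gradient of the squared error (w*x + b - y)**2 for one sample."""
    error = w * x + b - y
    return 2.0 * error * x, 2.0 * error


def gradient_descent(xs, ys, learning_rate=0.01, steps=5000):
    """Fit y = w*x + b by repeatedly stepping against the mean gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Each per-sample term is independent of the others, so this list
        # could be computed in parallel; only the final sums need combining.
        grads = [per_sample_gradient(w, b, x, y) for x, y in zip(xs, ys)]
        grad_w = sum(g[0] for g in grads) / n
        grad_b = sum(g[1] for g in grads) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

With five samples the parallelism is invisible, but the structure is the same when the list holds millions of images: many independent gradient computations followed by one reduction.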
Sizing up the problems
Still, algorithms are only part of the machine-learning picture. A system that can optimize hardware as well as software over time is, by definition, evolving from the original system spec. How that affects reliability is unknown, because at this point there is no way to simulate or test that.
“If you implement deep learning, you’ve got a lot of similar elements,” said Raik Brinkmann, president and CEO of OneSpin Solutions. “But the complete function of the system is unknown. So if you’re looking at machine learning error rates and conversion rates, there is no way to make sure you’ve got them right. The systems learn from experience, but it depends on what you give them. And it’s a tough problem to generalize how they’re going to work based on the data.”
Brinkmann said there are a number of approaches in EDA today that may apply, particularly with big data analytics. “That’s an additional skill set—how to deal with big data questions. It’s more computerized and IT-like. But parallelization and cloud computing will be needed in the future. A single computer is not enough. You need something to manage and break down the data.”
Brinkmann noted that North Carolina State University and the Georgia Institute of Technology will begin working on these problems this fall. “But the bigger question is, ‘Once you have that data, what do you do with it?’ It’s a system without testbenches, where you have to generalize behavior and verify it. But the way chips are built is changing because of machine learning.”
ARM’s Neifert considers this a general-purpose compute problem. “You could make the argument in first-generation designs that different hardware isn’t necessary. But as we’ve seen with the evolution of any technology, you start with a general-purpose version and then demand customized hardware. With something like advanced driver assistance systems (ADAS), you can envision a step where a computer is defining the next-generation implementation because it requires higher-level functionality.”
That quickly turns troubleshooting into an unbounded problem, however. “Debug is a whole different world,” said Jim McGregor, principal analyst at Tirias Research. “Now you need a feedback loop. If you think about medical imaging, 10 years ago 5% of the medical records were digitized. Now, 95% of the records are digitized. So you combine scans with diagnoses and information about whether it’s correct or not, and then you have feedback points. With machine learning, you can design feedback loops to modify those algorithms, but it’s so complex that no human can possibly debug that code. And that code develops over time. If you’re doing medical research about a major outbreak, humans can only run so many algorithms. So how do you debug it if it’s not correct? We’re starting to see new processes for deep learning modules that are different than in the past.”
Processor makers, tools vendors, and packaging houses are racing to position themselves for a role in machine learning, despite the fact that no one is quite sure which architecture is best for this technology or what ultimately will be successful.
Rather than dampen investments, the uncertainty is fueling a frenzy. Money is pouring in from all sides. According to a new Moor Insights report, as of February 2017 there were more than 1,700 machine learning startups and 2,300 investors. The focus ranges from relatively simple dynamic network optimization to military drones using real-time information to avoid fire and adjust their targets.
While the general concepts involved in machine learning, namely doing things that a device was not explicitly programmed to do, date back to the late 1940s, machine learning has progressed in fits and starts since then. Stymied initially by crude software (1950s through 1970s), then by insufficient processing power, memory and bandwidth (1980s through 1990s), and finally by deep market downturns in electronics (2001 and 2008), it has taken nearly 70 years for machine learning to advance to the point where it is commercially useful.
Several things have changed since then:
- The technology performance hurdles of the 1980s and 1990s are now gone. There is almost unlimited processing power, with more on the way using new chip architectures, as well as packaging approaches such as 2.5D and fan-out wafer-level packaging. Very fast memory is already available, with more types on the way, and advances in silicon photonics can speed up storage and retrieval of large blocks of data as needed.
- There are ready markets for machine learning in the data center and in the autonomous vehicle market, where the central logic of these devices will need regular updates to improve safety and reliability. Companies involved in these markets have deep pockets or strong backing, and they are investing heavily in machine learning.
- The pendulum is swinging back to hardware, or at the very least, a combination of hardware and software, because it’s faster, uses less power, and is more secure than putting everything into software. That bodes well for machine learning because of the enormous processing requirements, and it changes the economics for semiconductor investments.
Nevertheless, this is a technology approach riddled with uncertainty about what works best and why.
“If there was a winner, we would have seen that already,” said Randy Allen, director of advanced research at Mentor Graphics. “A lot of companies are using GPUs, because they’re easier to program. But with GPUs the big problem is determinism. If you send a signal to an FPGA, you get a response in a given amount of time. With a GPU, that’s not certain. A custom ASIC is even better if you know exactly what you’re going to do, but there is no slam-dunk algorithm that everyone is going to use.”
ASICs are the fastest, cheapest and lowest power solution for crunching numbers. But they also are the most expensive to develop, and they are unforgiving if changes are required. Changes are almost guaranteed with machine learning because the field is still evolving, so relying on ASICs, or at least relying only on ASICs, is a gamble.
This is one of the reasons that GPUs have emerged as the primary option, at least in the short term. They are inexpensive, highly parallel, and there are enough programming tools available to test and optimize these systems. The downside is they are less power efficient than a mix of processors, which can include CPUs, GPUs, DSPs and FPGAs.
FPGAs add future-proofing and lower power, and they can be used to accelerate other operations. But in highly parallel architectures, they also are more expensive, which has renewed attention on embedded FPGAs.
“This is going to take 5 to 10 years to settle out,” said Robert Blake, president and CEO of Achronix. “Right now there is no agreed upon math for machine learning. This will be the Wild West for the next decade. Before you can get a better Siri or Alexa interface, you need algorithms that are optimized to do this. Workloads are very diverse and changing rapidly.”
Massive parallelism is a requirement. There is also some need for floating point calculations. But beyond that, it could be 1-bit or 8-bit math.
“A lot of this is pattern matching of text-based strings,” said Blake. “You don’t need floating point for that. You can implement the logic in an FPGA to make the comparisons.”
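Blake's point about comparisons that need no floating point can be illustrated with a short sketch. This is not Achronix's method or any FPGA toolchain's API. It is a minimal Python example, with hypothetical function names, that scores how closely two byte strings match using only XOR and bit counting, the kind of integer logic that maps naturally onto programmable hardware.

```python
from typing import Sequence


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length byte strings.

    Only integer XOR and bit counting are used; no floating point is involved.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def best_match(query: bytes, candidates: Sequence[bytes]) -> bytes:
    """Return the candidate with the fewest differing bits from the query."""
    return min(candidates, key=lambda c: hamming_distance(query, c))


# Hypothetical usage: find the stored pattern closest to an incoming string.
closest = best_match(b"cat", [b"dog", b"car", b"cot"])
```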
Learning vs. interpretation
One of the reasons why this becomes so complex is there are two main components to machine learning. One is the “learning” phase, which is a set of correlations or pattern matches. In machine vision, for example, it allows a device to determine whether an image is a dog or a person. That started out as 2D comparisons, but databases have grown in complexity. They now include everything from emotions to movement. They can discern different breeds of dogs, and whether a person is crawling or walking.
The harder mathematics problem is the interpretation phase. That can involve inferencing—drawing conclusions based on a set of data and then extrapolating from those conclusions to develop presumptions. It also can include estimation, which is how economics utilizes machine learning.
At this point, much of the inferencing is being done in the cloud because of the massive amount of compute power required. But at least some of that will be required to be on-board in autonomous vehicles. For one thing, it’s faster to do at least some of that locally. For another, connectivity isn’t always consistent, and in some locations it might not be available at all.
“You need real-time cores working in lock step with other cores, and you could have three or four levels of redundancy,” said Steve Glaser, senior vice president of corporate strategy and marketing at Xilinx. “You want an immediate response. You want it to be deterministic. And you want it to be flexible, which means that to create an optimized data flow you need software plus hardware plus I/O programmability for different layers of a neural network. This is any-to-any connectivity.”
How to best achieve that, however, isn’t entirely clear. The result is a scramble for market position unlike anything that has been seen in the chip industry since the introduction of the personal computer. Chipmakers are building solutions that include everything from development software to libraries and frameworks, with built-in flexibility to guard against sudden obsolescence because the market is still in flux.
What makes this so compelling for chipmakers is that the machine learning opportunity is unfolding at a time when the market for smart phone chips is flattening. But unlike phones or PCs, machine learning cuts across multiple market segments, each with the potential for significant growth (see fig. 2 below).
All of this needs to be put in context of two important architectural changes that are beginning to unfold in machine learning. The first is a shift away from trying to do everything in software to doing much more in hardware. Software is easier to program, but it’s far less efficient from a power/performance standpoint and much more vulnerable from a security standpoint. The solution here, according to Xilinx’s Glaser, is leveraging the best of both worlds by using software-defined programming. “We’re showing 6X better efficiency in images per second per watt,” he said.
A second change is the emphasis on more processors—and more types of processors—rather than fewer, highly integrated custom processors. This reverses a trend that has been underway since the start of the PC era, namely that putting everything on a single die improves performance per watt and reduces the overall bill of materials cost.
“There is much more interest in larger numbers of small processors than big ones,” said Bill Neifert, director of models technology at ARM. “We’re seeing that in the number of small processors being modeled. We’re also seeing more FPGAs and ASICs being modeled than in the past.”
Because a large portion of the growth in machine learning is tied to safety-critical systems in autonomous vehicles, better modeling and better verification of systems are required.
“One of the benefits of creating a model as early as possible is that you can inject faults for all possible safety requirements, so that when something fails—which it will—it can fail gracefully,” said Neifert. “And if you change your architecture, you want to be able to route data differently so there are no bottlenecks. This is why we’re also seeing so much concurrency in high-performance computing.”
Measuring performance and cost with machine learning isn’t a simple formula, though. Performance can be achieved in a variety of ways, such as better throughput to memory, faster and more narrowly written algorithms for specific jobs, and highly parallel computing with acceleration. Likewise, cost can be measured in multiple ways, such as total system cost, power consumption, and sometimes the impact of slow results, such as a piece of military equipment in an autonomous vehicle not making decisions quickly enough.
Beyond that, there are challenges involving the programming environment, which is part algorithmic and part intuition. “What you’re doing is trying to figure out how humans think without language,” said Mentor’s Allen. “Machine learning is that to the nth degree. It’s how humans recognize patterns, and for that you need the right development environment. Sooner or later we will find the right level of abstraction for this. The first languages are interpreters. If you look at most languages today, they’re largely library calls. Ultimately we may need language to tie this together, either pipelining or overlapping computations. That will have a lot better chance of success than high-level functionality without a way of combining the results.”
Kurt Shuler, vice president of marketing at Arteris, agrees. He said the majority of systems developed so far are being used to jump-start research and algorithm development. The next phase will focus on more heterogeneous computing, which creates a challenge for cache coherency.
“There is a balance between computational efficiency and programming efficiency,” Shuler said. “You can make it simpler for the programmer. An early option has been to use an ‘open’ machine learning system that consists of a mix of ARM clusters and some dedicated AI processing elements like SIMD engines or DSPs. There’s a software library, which people can license. The chip company owns the software algorithms, and you can buy the chips and board and get this up and running early. You can do this with Intel Xeon chips too, and build in your or another company’s IP using FPGAs. But these initial approaches do not slice the problem finely enough, so basically you’re working with a generic platform, and that’s not the most efficient. To increase machine learning efficiency, the industry is moving toward using multiple types of heterogeneous processing elements in these SoCs.”
In effect, this is a series of multiply and accumulate steps that need to be parsed at the beginning of an operation and recombined at the end. That has long been one of the biggest hurdles in parallel operations. The new wrinkle is that there is more data to process, and movement across skinny wires that are subject to RC delay can affect both performance and power.
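The split-and-recombine pattern described above can be sketched in a few lines. This is an illustrative Python example only, with hypothetical function names: a long multiply-accumulate is cut into chunks, each chunk produces a partial sum that could in principle run on a separate processing element, and the partial sums are added back together at the end. Integer inputs are used so the chunked result matches the sequential one exactly.

```python
def mac(values, weights):
    """Sequential multiply-accumulate: sum of value * weight pairs."""
    acc = 0
    for v, w in zip(values, weights):
        acc += v * w
    return acc


def chunked_mac(values, weights, n_chunks):
    """Split the work into chunks, compute partial sums, then recombine."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    if not values:
        return 0
    size = -(-len(values) // n_chunks)  # ceiling division
    # Each partial sum is independent, so each could go to a different core.
    partials = [
        mac(values[i:i + size], weights[i:i + size])
        for i in range(0, len(values), size)
    ]
    # Recombination step: the only point where the chunks must communicate.
    return sum(partials)


inputs = [1, 2, 3, 4, 5]
coefficients = [10, 20, 30, 40, 50]
total = chunked_mac(inputs, coefficients, n_chunks=2)
```

The sketch leaves out the cost the article goes on to describe: moving each chunk of data to the place where it is processed.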
“There is a multidimensional constraint to moving data,” said Raik Brinkmann, CEO of OneSpin Solutions. “In addition, power is dominated by data movement. So you need to localize processing, which is why there are DSP blocks in FPGAs today.”
This gets even more complicated with deep neural networks (DNNs) because there are multiple layers of networks, Brinkmann said.
And that creates other issues. “Uncertainty in verification becomes a huge issue,” said Achim Nohl, technical marketing manager for high-performance ASIC prototyping systems at Synopsys. “Nobody has an answer to signing off on these systems. It’s all about good enough, but what is good enough? So it becomes more and more of a requirement to do real-world testing where hardware and software is used. You have to expand from design verification to system validation in the real world.”
Not all machine learning is about autonomous vehicles or cloud-based artificial intelligence. Wherever there is too much complexity combined with too many choices, machine learning can play a role. There are numerous cases where this is already happening.
NetSpeed Systems, for example, is using machine learning to develop network-on-chip topologies for customers. eSilicon is using it to choose the best IP for specific parameters involving power, performance and cost. And ASML is using it to optimize computational lithography, basically filling in the dots on a distribution model to provide a more accurate picture than a higher level of abstraction can intrinsically provide.
“There is a lot of variety in terms of routing,” said Sailesh Kumar, CTO at NetSpeed Systems. “There are different channel sizes, different flows, and how that gets integrated has an impact on quality of service. Decisions in each of those areas lead to different NoC designs. So from an architectural perspective, you need to decide on one topology, which could be a mesh, ring or tree. The simpler the architecture, the fewer potential deadlocks. But if you do all of this manually, it’s difficult to come up with multiple design possibilities. If you automate it, you can use formal techniques and data analysis to connect all of the pieces.”
The machine-learning component in this case is a combination of training data and deductions based upon that data.
“The real driver here is fewer design rules,” Kumar said. “Generally you will hard-code the logic in software to make decisions. As you scale, you have more design rules, which makes updating the design rules an intractable problem. You have hundreds of design rules just for the architecture. What you really need to do is extract the features so you can capture every detail for the user.”
NetSpeed has been leveraging commercially available tools for machine learning. eSilicon, in contrast, built its own custom platform based upon its experience with both internally developed and commercial third-party IP.
“The fundamental interaction between supplier and customer is changing,” said Mike Gianfagna, eSilicon’s vice president of marketing. “It’s not working anymore because it’s too complex. There needs to be more collaboration between the system vendor, the IP supplier, the end user and the ASIC supplier. There are multiple dimensions to every architecture and physical design.”
ASML, meanwhile, is working with Cadence and Lam Research to more accurately model optical proximity correction and to minimize edge placement error. Utilizing a machine learning model allowed ASML to improve the accuracy of mask, optics, resist and etch models to less than 2nm, said Henk Niesing, director of applications product management at ASML. “We’ve been able to improve patterning through collaboration on design and patterning equipment.”
Machine learning is gaining ground as the best way of dealing with rising complexity, but ironically there is no clear approach to the best architectures, languages or methodologies for developing these machine learning systems. There are success stories in limited applications of this technology, but looked at as a whole, the problems that need to be solved are daunting.
“If you look at embedded vision, that is inherently so noisy and ambiguous that it needs help,” said Cadence Fellow Chris Rowen. “And it’s not just vision. Audio and natural languages have problems, too. But 99% of captured raw data is pixels, and most pixels will not be seen or interpreted by humans. The real value is when you don’t have humans involved, but that requires the development of human cognition technology.”
And how to best achieve that is still a work in progress—a huge project with lots of progress, and still a very long way to go. But as investment continues to pour into this field, both from startups and collaboration among large companies across a wide spectrum of industries, that progress is beginning to accelerate.
Over the past few years, the term “deep learning” has firmly worked its way into business language when the conversation is about Artificial Intelligence (AI), Big Data and analytics. And with good reason – it is an approach to AI which is showing great promise when it comes to developing the autonomous, self-teaching systems which are revolutionizing many industries.
Deep Learning is used by Google in its voice and image recognition algorithms, by Netflix and Amazon to decide what you want to watch or buy next, and by researchers at MIT to predict the future. The ever-growing industry which has established itself to sell these tools is always keen to talk about how revolutionary this all is. But what exactly is it? And is it just another fad being used to push “old fashioned” AI on us, under a sexy new label?
In my last article I wrote about the difference between AI and Machine Learning (ML). While ML is often described as a sub-discipline of AI, it’s better to think of it as the current state-of-the-art – it’s the field of AI which today is showing the most promise at providing tools that industry and society can use to drive change.
In turn, it’s probably most helpful to think of Deep Learning as the cutting-edge of the cutting-edge. ML takes some of the core ideas of AI and focuses them on solving real-world problems with neural networks designed to mimic our own decision-making. Deep Learning focuses even more narrowly on a subset of ML tools and techniques, and applies them to solving just about any problem which requires “thought” – human or artificial.
How does it work?
Essentially Deep Learning involves feeding a computer system a lot of data, which it can use to make decisions about other data. This data is fed through neural networks, as is the case in machine learning. These networks are logical constructions which ask a series of binary true/false questions of, or extract a numerical value from, every bit of data which passes through them, and classify it according to the answers received.
Because Deep Learning work is focused on developing these networks, they become what are known as Deep Neural Networks – logic networks of the complexity needed to deal with classifying datasets as large as, say, Google’s image library, or Twitter’s firehose of tweets.
With datasets as comprehensive as these, and logical networks sophisticated enough to handle their classification, it becomes trivial for a computer to take an image and state with a high probability of accuracy what it represents to humans.
Pictures present a great example of how this works, because they contain a lot of different elements and it isn’t easy for us to grasp how a computer, with its one-track, calculation-focused mind, can learn to interpret them in the same way as us. But Deep Learning can be applied to any form of data – machine signals, audio, video, speech, written words – to produce conclusions that seem as if they have been arrived at by humans – very, very fast ones. Let’s look at a practical example.
Take a system designed to automatically record and report how many vehicles of a particular make and model passed along a public road. First, it would be given access to a huge database of car types, including their shape, size and even engine sound. This could be manually compiled or, in more advanced use cases, automatically gathered by the system if it is programmed to search the internet, and ingest the data it finds there.
Next it would take the data that needs to be processed – real-world data which contains the insights, in this case captured by roadside cameras and microphones. By comparing the data from its sensors with the data it has “learned”, it can classify, with a certain probability of accuracy, passing vehicles by their make and model.
So far this is all relatively straightforward. Where the “deep” part comes in is that the system, as time goes on and it gains more experience, can increase its probability of a correct classification by “training” itself on the new data it receives. In other words, it can learn from its mistakes, just like us. For example, it may incorrectly decide that a particular vehicle was a certain make and model, based on their similar size and engine noise, overlooking another differentiator which it determined had a low probability of being important to the decision. By learning that this differentiator is, in fact, vital to understanding the difference between two vehicles, it improves the probability of a correct outcome next time.
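The idea of learning from mistakes can be made concrete with a toy example. The sketch below is a single-layer perceptron, far simpler than a deep neural network, and the vehicle features and labels are invented purely for illustration. It shows the core mechanism: the weight given to each feature is adjusted only when the system makes a wrong call.

```python
def predict(weights, bias, features):
    """Return 1 if the weighted score is positive, otherwise 0."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else 0


def train(samples, epochs=20, learning_rate=1):
    """Perceptron training: weights change only after a misclassification."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        mistakes = 0
        for features, label in samples:
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                # Shift each weight toward the answer that would have been right.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # a full pass with no errors: stop early
            break
    return weights, bias


# Hypothetical data: (size score, engine-noise score) -> 0 = model A, 1 = model B
vehicles = [([2, 1], 0), ([3, 1], 0), ([5, 4], 1), ([6, 5], 1)]
weights, bias = train(vehicles)
```

After training on this small, cleanly separable data set, the model classifies all four example vehicles correctly. Real systems work with far more features and far messier data.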
So what can Deep Learning do?
Probably the best way to finish this article and give some insight into why this is all so ground breaking is to give some more examples of how Deep Learning is being used today. Some impressive applications which are either deployed or being worked on right now include:
Navigation of self-driving cars – Using sensors and onboard analytics, cars are learning to recognize obstacles and react to them appropriately using Deep Learning.
Recoloring black and white images – by teaching computers to recognize objects and learn what they should look like to humans, color can be returned to black and white pictures and video.
Predicting the outcome of legal proceedings – A system developed by a team of British and American researchers was recently shown to be able to correctly predict a court’s decision, when fed the basic facts of the case.
Precision medicine – Deep Learning techniques are being used to develop medicines genetically tailored to an individual’s genome.
Automated analysis and reporting – Systems can analyze data and report insights from it in natural sounding, human language, accompanied with infographics which we can easily digest.
It is somewhat easy to get carried away with the hype and hyperbole which is often used when these cutting edge technologies are discussed (and particularly, sold). But in truth, it’s often deserved. It isn’t uncommon to hear data scientists say they have tools and technology available to them which they did not expect to see this soon – and much of it is thanks to the advances that Machine Learning and Deep Learning have made possible.
Machine learning is the process of building analytical models to automatically discover previously unknown patterns from data that indicate associations, sequences, anomalies (outliers), classifications, and clusters and segments. These patterns reveal hidden rules as to why an event happened—for example, rules that predict likely customer churn. Businesses can take advantage of several kinds of uses for machine learning:
- Segmentation, or grouping sets of customers who have similar buying patterns for targeted marketing
- Classification based on a set of attributes to make a prediction—for example propensity to buy, customers with insurance policies likely to lapse and equipment failure that triggers preventive maintenance
- Forecasts—for example, sales projections based on time series
- Pattern discovery that associates one product with another to reveal cross-sell opportunities and sequences—for example, products that sell together over time
- Anomaly detection—for example, detecting fraud
Predictive analytics model methodology
The widely used Cross Industry Standard Process for Data Mining (CRISP-DM) methodology is used to develop predictive analytical models. CRISP-DM includes six phases: business understanding, data understanding, data preparation, model development using supervised and unsupervised learning, model evaluation and model deployment.
The business understanding phase involves defining the business problem or use case, the business objectives and the business questions that need to be answered. It also involves defining success criteria. Then the standard project-related tasks need to be performed. These tasks include defining resource requirements such as people and money, technology requirements, creating a project plan, defining any constraints, assessing risks and creating a contingency plan.
The data understanding phase involves data requirements such as internal and external data sources and data characteristics including data volumes, variety, velocity, formats and so on, as well as whether the data is in flat files, a relational database, a Hadoop Distributed File System (HDFS) or if it is live, streaming data.
This phase also includes data exploration using statistical analysis to look at data—for example, basic statistics about each data column and any information about whether data is skewed in any way. Visualizations such as histograms and scatterplots help with drilling down on outliers and errors. In addition, a data quality assessment involves understanding the degree to which data is missing, has errors, is inconsistent and is duplicated.
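As a rough illustration of the basic statistics and data quality checks described above, the following sketch profiles a single data column using Python's standard `statistics` module. The function name and the sample readings are hypothetical. A mean that sits well above the median is one simple hint that a column is skewed or contains an outlier.

```python
import statistics


def profile_column(values):
    """Summarize one column: missing count plus basic descriptive statistics."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }


# Hypothetical sensor readings with two missing values and one outlier.
readings = [10, None, 12, 11, 100, None]
summary = profile_column(readings)
```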
The objective of the data preparation phase is to produce a set of data that can be fed into machine-learning algorithms. This process requires a number of tasks including data enrichment, filtering and cleaning; data conversion; data transformation; and variable identification, which is also known as feature selection or dimensionality reduction. Variable identification’s objective is to create a data set of the most highly relevant variables to be used as model input to get optimal results. The intention is also to remove variables from a data set that are not useful as model input without compromising the model’s accuracy—for example, the accuracy of the predictions it makes.
The model development phase is about the development of a machine-learning model. Models can be built to predict, forecast or analyze data to find patterns such as associations and groups. Two types of machine learning can be used in model development: supervised learning and unsupervised learning.
Typically, predictive models are built using supervised learning. For example, if we want to develop a model that predicts equipment failure, we can use data that describes equipment that has actually failed. We can use that data to train the model to recognize the profile of a piece of equipment that is likely to fail. To accomplish this profile recognition, we split the data set containing failed equipment data records into a training data set and a test data set. Then we train the model by feeding the training data set into an algorithm, several of which can be used for prediction. Then we test the model using the test data set.
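The split, train and test sequence can be sketched as follows. Everything here is an assumption made for illustration: the records are invented (a single vibration reading per machine, labeled 1 for failed and 0 for healthy), and the "model" is just a threshold placed midway between the average readings of the two groups, standing in for whichever prediction algorithm is actually chosen.

```python
import random


def split_records(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and split it into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(training_set):
    """Train: place a cut-off midway between the two class averages."""
    failed = [reading for reading, label in training_set if label == 1]
    healthy = [reading for reading, label in training_set if label == 0]
    return (sum(failed) / len(failed) + sum(healthy) / len(healthy)) / 2


def accuracy(threshold, rows):
    """Test: fraction of rows where the threshold predicts the right label."""
    correct = sum((reading > threshold) == (label == 1) for reading, label in rows)
    return correct / len(rows)


# Hypothetical equipment records: (vibration reading, 1 = failed / 0 = healthy)
records = [(1.0, 0), (1.2, 0), (1.5, 0), (2.0, 0),
           (7.0, 1), (7.5, 1), (8.0, 1), (9.0, 1)]
training_set, test_set = split_records(records)
threshold = fit_threshold(training_set)
test_accuracy = accuracy(threshold, test_set)
```

Holding back the test set matters because it contains records the model never saw during training, so the accuracy measured on it is a fairer estimate of how the model will behave on new equipment.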
Unsupervised learning is a process of analyzing data to try and find hidden patterns in the data that indicate product association and groupings—for example, customer segmentation. Grouping is based on maximizing or minimizing similarity. The K-means clustering algorithm is a widely used algorithm for this approach. Predictive and descriptive analytical models can be built using advanced analytics or data mining tools, data science interactive workbooks with procedural or declarative programming languages, analytics clouds and automated model development tools.
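A bare-bones version of K-means shows how grouping by similarity works in practice. The sketch below is a simplified teaching version, not a production implementation: it uses the first k points as starting centroids (a naive choice made here so the result is repeatable), and the customer data is invented for illustration.

```python
def nearest(point, centroids):
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, k, iterations=10):
    """Alternate between assigning points to centroids and moving centroids."""
    centroids = [list(p) for p in points[:k]]  # naive start: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical customers: (orders per month, average basket size)
customers = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, segments = kmeans(customers, k=2)
```

No labels are supplied at any point. The two segments emerge purely from how close the customers are to one another, which is what distinguishes this from the supervised approach.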
After a model is developed, the next phase is to evaluate the accuracy of predictions or groupings. For predictions, this evaluation means understanding how many predictions were correct and how many were incorrect. Various methods can accomplish this evaluation. Key measures in model evaluation are the number of true positives, false positives, true negatives and false negatives. The bottom line is that we need to make sure that the model is accurate; otherwise, it could generate lots of false positives that may result in incorrect decisions and actions.
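The four key measures can be counted directly by comparing predictions with actual outcomes. The sketch below uses invented labels (1 = event occurred, 0 = it did not) and hypothetical function names. It also derives three commonly used summary figures from the counts: accuracy, precision and recall.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(a == 1 and p == 1 for a, p in pairs),
        "fp": sum(a == 0 and p == 1 for a, p in pairs),
        "tn": sum(a == 0 and p == 0 for a, p in pairs),
        "fn": sum(a == 1 and p == 0 for a, p in pairs),
    }


def summary_metrics(counts):
    """Derive accuracy, precision and recall from the four counts."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision: of everything flagged positive, how much was right?
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: of all real positives, how many did the model catch?
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical outcomes versus model predictions.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = summary_metrics(confusion_counts(actual, predicted))
```

A model can score well on accuracy while still producing many false positives, so precision is the figure to watch when wrong alerts lead to costly actions.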
Once we are happy with the model we’ve developed, the final phase involves deploying models to run in many different environments. These environments include spreadsheets, analytics servers, applications, database management systems (DBMSs), analytical relational database management systems (RDBMSs), Apache Hadoop, Apache Spark and streaming analytics platforms: