Engaging machine learning in enterprises

Start to Scale ML

IAN Editorial Team
11/09/2022

Introduction

For a better understanding of how to apply machine learning to business, let’s speak briefly about the term itself. Until recently, computers could be used for solving business problems only if explicit rules were written for them. Developing complex “if-else” instructions took time and significant effort. This was a critical weakness in rapidly changing fields, where the rules became outdated before the computer system was ready to use.

Today, machine learning avoids explicit programming and allows the machine to learn rules from large datasets.

The heart of a machine learning system is the algorithm. Some algorithms are quite specific, while others, like decision trees and neural networks, are very general. Today’s gargantuan amounts of data, along with modern compute power, allow for next-generation effectiveness of algorithms and brand-new areas of application for machine learning.

94 percent of AIIA respondents are engaged with ML integration partners (AIIA Survey, November 2019).

ML Challenges

1. Not having the data

Enterprise big data isn’t big data at all. The vast oceans of unstructured data outside of the stand-alone enterprise offer the opportunity for unlocking true insight. Structuring the most valuable in-house data while finding the most valuable external data is expensive and time-consuming.

2. Not having a standard

The combination of ML technology and current compute is in its infancy. Integration partners are offering solutions that are unproven at scale. Plug & Play has become Plug, Play & Learn.

3. Not having the talent

Even when you do find an expensive data scientist, recruiting the best talent is a challenge when your competition is the advanced tech industry.

4. Not showing your work

Advanced machine learning confounds data scientists, mostly because the path to the output cannot be shown. This is a problem for any enterprise and an acute issue for highly regulated industries.

ML Talent

Knowing the number of AI engineers to employ is also an enormous challenge, as talent availability is a significant constraint across the globe. O’Reilly’s Data Science Salary Survey found that the average base salary of a global data scientist was $90,000. And those are front line engineers. Per the New York Times (Metz 4/19/19), that salary quickly rises to $500K - $1M once you add some actual work and strategy experience to the person in question.

Figure: Salary median and IQR (US dollars)

ML Limitations

Machine learning is not a magical solution that applies to every single use case. Too often, companies embark on an AI journey without a clear understanding of the value it should bring to their business. As a result, many data science and machine learning projects don’t have clear KPIs and simply drain R&D budgets.

Machine learning has certain limitations, and it currently doesn’t fit into every business case of every domain.

An enterprise executive interested in enhancing existing business workflows through machine learning needs to thoroughly understand the actual capabilities of this technology. Executives hoping to narrow the technological gap must be able to address artificial intelligence in an informed way. In other words, they need to understand not just where AI can boost innovation, insight, and decision making, drive revenue growth, and capture efficiencies, but also where AI can’t yet provide value.

ML Categories

Classification models are used to break down large datasets into meaningful subsets. The most straightforward examples are image recognition and natural language processing.
Regression models identify trends to make predictions. An example is a sales forecast that takes into account thousands of factors, from macroeconomic indicators to weather forecasts to political threats.
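
To make the difference between the two categories concrete, here is a minimal sketch in plain Python. It is only an illustration: the animal measurements, the sales figures, and the function names are invented for this example, and real projects would normally use an established ML library and far more data. The point is simply that a classification model returns a label, while a regression model returns a number.

```python
# Toy illustration: classification returns a label, regression returns a number.
# All data below is made up for the example.

def nearest_neighbor_label(point, labeled_points):
    """Classify `point` with the label of its closest training example."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    closest = min(labeled_points, key=lambda item: squared_distance(point, item[0]))
    return closest[1]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Classification: (weight in kg, height in cm) -> "cat" or "dog"
animals = [((4.0, 25.0), "cat"), ((5.0, 23.0), "cat"),
           ((20.0, 55.0), "dog"), ((30.0, 60.0), "dog")]
predicted_label = nearest_neighbor_label((6.0, 27.0), animals)

# Regression: month number -> units sold, then extrapolate to month 6
months = [1, 2, 3, 4, 5]
units_sold = [100, 110, 120, 130, 140]
slope, intercept = fit_line(months, units_sold)
forecast_month_6 = slope * 6 + intercept

print(predicted_label, forecast_month_6)
```

A real sales forecast would use many input factors rather than the month number alone, but the shape of the problem is the same.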

ML Use cases (Part I)

For illustrative purposes, it will be helpful to list a number of well-established business use cases for machine learning so that you (the reader) can come up with your own application ideas:

Face detection

It’s incredibly difficult to write a set of “rules” to allow machines to detect faces (consider all the different skin colors, angles of view, hair / facial hair, etc.), but an algorithm can be trained to detect faces, like those used at Facebook. Many tools for facial detection and recognition are open source.

Email spam filters

Some spam filtering can be done by rules (e.g., by overtly blocking IP addresses known explicitly for spam), but much of the filtering is contextual, based on the inbox content relevant for each specific user. Lots of email volume and lots of users marking “spam” (labelling the data) make for a good supervised learning problem.
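
As a rough sketch of how user-labelled mail can train a filter, the example below implements a small naive Bayes text classifier in plain Python. Naive Bayes is one common technique for this kind of problem and is chosen here purely for illustration; the article does not say which algorithm any particular mail provider uses. The four training messages and the function names are invented.

```python
import math
from collections import Counter

def train(messages):
    """messages: list of (text, label) pairs, where label is "spam" or "ham"."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocabulary

def classify(text, model):
    """Return the label with the higher naive Bayes log-probability."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Start from the log prior: how common this label is overall.
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace (add-one) smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical messages that users have already marked.
labelled_inbox = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the quarterly report", "ham"),
]
model = train(labelled_inbox)
verdict_1 = classify("claim your free prize", model)
verdict_2 = classify("agenda for the quarterly meeting", model)
print(verdict_1, verdict_2)
```

With only four messages this is a toy, but every additional message a user marks becomes one more labelled training example.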

Product / music / movie recommendation

Each person’s preferences are different, and preferences change over time. Companies like Amazon, Netflix and Spotify use ratings and engagement from a huge volume of items (products, songs, etc.) to predict what any given user might want to buy, watch, or listen to next.
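
One simple way to turn ratings into suggestions is user-based collaborative filtering: find users with similar tastes and score the items they liked that the target user has not seen. The sketch below is an assumption-laden toy, not a description of how Amazon, Netflix, or Spotify actually work; the users, films, and ratings are made up.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "ana":   {"film_a": 5, "film_b": 4, "film_c": 1},
    "ben":   {"film_a": 4, "film_b": 5, "film_d": 5},
    "chris": {"film_c": 5, "film_e": 4},
}

def cosine_similarity(u, v):
    """Similarity of two users' rating dictionaries (0.0 if nothing is shared)."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)

def recommend(user, ratings):
    """Rank unseen items by the similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)

suggestions = recommend("ana", ratings)
print(suggestions)
```

Here "ana" shares high ratings with "ben", so the film only "ben" has rated is ranked ahead of the one only "chris" has rated.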

Speech recognition

There is no single combination of sounds to specifically signal human speech, and individual pronunciations differ widely – machine learning can identify patterns of speech and help to convert speech to text. Nuance Communications (maker of Dragon Dictation) is among the better-known speech recognition companies today.

Chatbots

Derived from “chat robot”, “chatbots” allow for highly engaging, conversational experiences, through voice and text, that can be customized and used on mobile devices, web browsers, and popular chat platforms such as Facebook Messenger or Slack. Chatbots can be built to respond to either voice or text in the language native to the user. You can embed customized chatbots in everyday workflows to engage with your employees or your customers.

Real-time bidding (online advertising)

Facebook and Google could never write specific “rules” to determine which ads a given type of user is most likely to click on. Machine learning helps to identify patterns in user behaviour and determine which individual advertisements are most likely to be relevant to which individual user.

Credit card purchase fraud detection

Like email spam filters, only a small portion of fraud detection can be done using concrete rules. New fraud methods are constantly being used, and systems must adapt to detect these patterns in real time, coaxing out the common signals associated with fraud.

ML Questions (Part I)

What type of machine learning do I need?

There are three major types of machine learning: supervised, unsupervised, and reinforcement learning. Let’s check out the use cases for each one of them.

Supervised:

A big portion of current machine learning development projects deal with supervised learning. You have input data X and a target variable Y that you want to predict. For instance, X could be parameters that describe a person, like gender, age and personal preferences. Looking at this input data, you want to predict Y: how likely the person is to click your marketing ad on Facebook. This technique is valid when you’ve got some big datasets of customer information and historical records that reveal who clicked your ads in the past. A supervised machine learning model analyzes that input data to find patterns and predict what demographic groups are most likely to click your ad. Other use cases for supervised learning would be credit scoring, underwriting, equipment diagnostics and more.
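
The X-to-Y idea can be sketched with a small logistic regression trained by gradient descent. Everything specific here is an assumption made for illustration: the two features (age in decades and a made-up "interested in sports" flag), the six historical records, and the learning rate and epoch count. A production model would use an ML library, many more features, and a held-out test set.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, learning_rate=0.2, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(X, y):
            prediction = sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)
            error = prediction - label
            for j, f in enumerate(features):
                grad_w[j] += error * f
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

def click_probability(features, weights, bias):
    """Predicted probability Y that a person described by X clicks the ad."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)

# X: [age in decades, 1 if interested in sports else 0]; Y: 1 = clicked the ad.
X = [[2.0, 1], [2.5, 1], [3.0, 1], [4.5, 0], [5.0, 0], [6.0, 0]]
Y = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic_regression(X, Y)

p_young_fan = click_probability([2.2, 1], weights, bias)
p_older_non_fan = click_probability([5.5, 0], weights, bias)
print(round(p_young_fan, 3), round(p_older_non_fan, 3))
```

The historical records play the role of the labelled data: the model never sees a rule such as "younger sports fans click more", it infers that tendency from who clicked in the past.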

Unsupervised:

With unsupervised learning, there’s just input data X and no target variables. The machine learning model then groups the input data according to patterns it finds on its own. AI algorithms work through huge datasets and often find patterns and dependencies that humans cannot identify. The characteristics of unsupervised learning make it helpful for any individual researcher or research company that needs to sift through large amounts of data in order to find a pattern or unusual points.
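
Grouping without a target variable can be illustrated with k-means clustering, one widely used unsupervised technique (picked here as an example; the article does not name a specific algorithm). The customer figures are invented, and the starting centroids are simply the first k points so that the run is repeatable, which is a simplification of how k-means is usually initialised.

```python
def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, clusters)."""
    # Deterministic start for the sketch: the first k points are the initial centroids.
    centroids = [list(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(dim) / len(cluster) for dim in zip(*cluster)]
    return centroids, clusters

# Hypothetical customers: (store visits per month, average basket in dollars).
customers = [(1, 10), (2, 12), (1.5, 11), (10, 80), (11, 85), (12, 90)]
centroids, clusters = k_means(customers, k=2)
print(centroids)
print(clusters)
```

No one told the algorithm that there are "occasional small-basket" and "frequent large-basket" customers; the two groups emerge from the data alone.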

Reinforcement learning:

With reinforcement learning, data scientists specify the rules of the “game”, the environment where the “game” takes place, and the final reward (in a chess analogy, that would be victory). As machine learning algorithms start “playing the game”, they try different strategies and learn from their previous experience to maximize the final reward. One of the most famous examples of reinforcement learning is Google’s AlphaGo.
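
The three ingredients named above (rules, environment, reward) can be shown with tabular Q-learning on a toy "game": a five-cell corridor where the only reward is reaching the last cell. This is a deliberately tiny sketch under invented assumptions and bears no resemblance to the scale or methods of AlphaGo; the corridor, the hyperparameters, and the purely random exploration policy are all choices made for illustration.

```python
import random

N_STATES = 5            # corridor cells 0..4; the reward sits in cell 4
GOAL = N_STATES - 1
ACTIONS = (-1, +1)      # step left, step right

def step(state, action):
    """Rules of the game: move within the corridor, reward 1.0 on reaching the goal."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values by trial and error while exploring at random."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action_index = rng.randrange(2)      # try left or right at random
            next_state, reward = step(state, ACTIONS[action_index])
            best_next = max(q[next_state])
            # Q-learning update: nudge the estimate toward reward + discounted future value.
            q[state][action_index] += alpha * (
                reward + gamma * best_next - q[state][action_index]
            )
            state = next_state
    return q

q_table = q_learning()
# Greedy policy read off the learned values for every non-goal cell.
policy = ["left" if left > right else "right" for left, right in q_table[:GOAL]]
print(policy)
```

Nobody programs "always go right"; the agent discovers that strategy because it is the one that maximizes the final reward.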

Deep learning:

Deep learning is a technique that utilizes artificial neural networks. It is applicable to all three machine learning types but is most often used in supervised learning. Deep learning is excellent at classifying objects based on their features. For instance, it can be used to categorize pictures of cats and dogs with high precision. Deep learning is behind Facebook’s Face Recognition technology, which is 97 percent accurate per Facebook. The same technology powers advanced natural language processing (NLP), image and speech recognition software, which can be used in document processing (e.g., legal documents), sentiment analysis and word-processing software.

ML Questions (Part II)

Regardless of the type of ML you employ, consider your data:

Do I have any data? Do I have enough data?

Yes, this sounds like an obvious question, but machine learning works best when you have significant amounts of data and that data keeps accumulating with no sign of slowing down. Deploying ML against a relatively modest set of data is almost like trying to start a charcoal fire with a blowtorch: Yes, you can do it, but why? There may be more cost-effective ways of analysing your data. (Defining a “relatively modest set” is a matter of perspective: Several years ago, “relatively modest” meant several hundred gigabytes. Today, it may mean dozens of terabytes or even several petabytes.)

What shape is my data in?

If your data is coming from a variety of sources, or if it hasn’t been cleaned and standardized into a consistent set, you won’t get value from an analytical exercise.

Have I thought about the total costs involved?

Lots of data means lots of storage. While the per-gigabyte cost of storage is sharply lower than it used to be, you can still run up a sizable tab. That’s before you’ve even considered computational costs: How much data do you analyze? How often? Who does it? In many cases, a data scientist may be needed to coordinate efforts. A cost analysis of the options is required to help make an informed decision.

“Clean data is better than big data” - a common refrain among data scientists.

If you have reams of business data from years ago, it may have no relevance today, particularly in fields where the basic business processes change drastically year-over-year (such as mobile eCommerce). If you have reams of unstructured and disjointed data, you may have too much “cleaning” to do before you can ever get around to learning from the information collected.

Uber’s former Head of Machine Learning Danny Lange once recommended that companies just starting out in machine learning should begin by applying supervised machine learning to historical data. Find data that’s already clean and relatively recent and use labelled training data to start finding insights. Note that in a rapidly changing field, newer data is positively required. For example, if you run a door delivery service for pet supplies, and your app, prices, product offerings, and service areas have changed significantly over the last six months, you will need much more recent data to learn from than, say, a company selling homeowners’ insurance in Montana. If data is not related to the relevant trends and nuances of your current business, you are unlikely to glean predictive value from it.

ML Questions (Part III)

Can your solution to this problem allow for some error?

ML might be thought of as a kind of “skill”, in the same sense that one might apply the word to human beings. A skill that’s alive, adapting, growing and informed by experience. For this reason, an ML solution will often be incorrect a certain percentage of the time, especially when it’s informed by new or varied stimuli. If your task absolutely cannot allow for any error, ML is likely to be the wrong tool for the job.

An example of an application that cannot allow for error might be an application that aims to read the amount of an invoice or bill and then pay that invoice or bill. One letter difference or one number difference could mean overpaying your bill by 10x the original amount (if the decimal was interpreted to be in the wrong place) or sending money to the wrong company (if an invoicing company name isn’t registered exactly).

In a case like the one above, some degree of ML might help with “bucketing” different types of bills or invoices, but the final decision to enter the payment amount and send a payment would likely require an accountable human.
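
This division of labour can be sketched as a routing rule: the model is allowed to sort invoices into buckets when it is confident, but it is never allowed to release a payment. The class, field names, category labels, and the 0.9 threshold below are all hypothetical and stand in for whatever a real accounts-payable system would use.

```python
from dataclasses import dataclass

@dataclass
class InvoicePrediction:
    invoice_id: str
    category: str       # bucket suggested by a (hypothetical) ML model
    confidence: float   # the model's confidence in that bucket, 0.0 to 1.0

def route(prediction, auto_bucket_threshold=0.9):
    """Let ML file invoices into buckets, but never let it approve a payment."""
    if prediction.confidence >= auto_bucket_threshold:
        bucketing = f"auto-filed under '{prediction.category}'"
    else:
        bucketing = "sent to manual triage"
    # Payment approval always goes to an accountable human, whatever the confidence.
    return {
        "invoice_id": prediction.invoice_id,
        "bucketing": bucketing,
        "payment": "awaiting human approval",
    }

confident = route(InvoicePrediction("INV-001", "utilities", 0.97))
uncertain = route(InvoicePrediction("INV-002", "consulting", 0.55))
print(confident)
print(uncertain)
```

An occasional bucketing mistake costs a few minutes of re-filing, whereas a payment mistake costs money, which is why only the first task is handed to the model.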

No man is defeated without until he has first been defeated within, Eleanor Roosevelt

Mistakes are the portals of discovery, James Joyce

A person who never made a mistake never tried anything new, Albert Einstein

I didn’t fail the test, I just found 100 ways to do it wrong, Benjamin Franklin

Failure is the condiment that gives success its flavor, Truman Capote

Never confuse a single defeat with a final defeat, F. Scott Fitzgerald

Failures are finger posts on the road to achievement. One fails forward toward success, C.S. Lewis

To err is human, but it feels divine, Mae West.

ML Questions (Part IV)

Diving in on your talent choices:

What are some of the options available to hire the right talent?

Blaze your own path

There are a few options available when it comes to hiring the right talent and like any other recruiting episode, it boils down to fulfilling the requirements and budget constraints of the company. If the budget permits, then the company can choose to hire Data Scientists and Machine Learning Engineers and set up its own AI department. But as mentioned before, it’s important to keep in mind that ML engineers on average require double the salary of regular software engineers.

Partner with professionals

The next option is to find an integration partner. Exploring machine learning with cloud engines from Google, Amazon and the like is perhaps the easiest way to gain access to machine learning technology. The biggest challenge of outsourcing machine learning tasks is reconciling corporate limits on data sharing with the need for external expert assistance. Depending on the type of data you have, you may need to anonymize it so that it doesn’t reveal sensitive details, like customer contacts, their location, etc. You should also keep in mind that an anonymized data set doesn’t allow an analyst to enrich it by using external sources or by applying their own understanding of a problem to build a more efficient model.
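
As one possible first step before sharing data with a partner, the sketch below drops direct identifiers and replaces the customer ID with a keyed hash using Python’s standard hmac and hashlib modules. Strictly speaking this is pseudonymization rather than full anonymization: remaining fields can still allow re-identification, so it does not replace a proper privacy or legal review. The field names, the sample record, and the placeholder key are hypothetical.

```python
import hashlib
import hmac

# Hypothetical placeholder: a real key would be generated and kept inside the company.
SECRET_KEY = b"replace-with-a-key-kept-inside-the-company"
DIRECT_IDENTIFIERS = {"name", "email", "phone", "address"}

def pseudonymize(record, key=SECRET_KEY):
    """Drop direct identifiers and replace the customer ID with a keyed hash."""
    token = hmac.new(key, record["customer_id"].encode("utf-8"), hashlib.sha256).hexdigest()
    shared = {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS and field != "customer_id"
    }
    shared["customer_token"] = token
    return shared

record = {
    "customer_id": "C-1001",
    "name": "Jane Example",
    "email": "jane@example.com",
    "address": "1 Example Street",
    "monthly_spend": 240.0,
    "segment": "pet supplies",
}
safe_record = pseudonymize(record)
print(safe_record)
```

Because the same customer ID always maps to the same token, the partner can still link a customer’s records together, but cannot contact the customer or join the data with outside sources, which is exactly the enrichment trade-off described above.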

Partner with the next generation

One other option that has the potential to save a lot of money is to partner with universities. In the US, there are about a dozen Ph.D. data science programs available at universities and nearly the same number of computer science programs that are actively emphasizing data science. Another popular way to fill the skills gap is boot camps, where attendees take courses of 12 months or so. This option seems very promising for companies that aren’t ready to invest in hiring experienced experts, though you should always consider additional internal training to accumulate essential domain expertise.

Conclusion

Research institutions and tech companies have made massive progress in certain areas of machine learning, including computer vision, speech recognition, and natural language processing. Still, this technology is not a silver bullet. For now, most of the news is coming from the suppliers of ML technologies, and many new uses are only in the experimental phase. Few products are on the market or are likely to arrive there soon to drive immediate and widespread adoption. As a result, analysts remain divided as to the potential of ML: some have formed a rosy consensus about ML’s potential while others remain cautious about its true economic benefit. This lack of agreement is visible in the large variance of current market forecasts, which range from hundreds of millions to a couple of billion. Given the size of investment being poured into ML, acting quickly is important but preparing to act is paramount.
