Machine Learning – Failure Reasons


1. Introduction

Jan 2019: Gartner predicted that through 2020, 80% of AI projects would remain alchemy, run by wizards, and that through 2022, only 20% of analytic insights would deliver business outcomes.

May 2019: A Dimensional Research/Alegion survey reported that 78% of AI or ML projects stall at some stage before deployment, and 81% of respondents admitted that the process of training AI with data was more difficult than they expected.

July 2019: VentureBeat reported 87% of data science projects never make it into production.

July 2019: An International Data Corporation (IDC) survey found that a quarter of organizations reported an AI project failure rate of up to 50%.

Sep 2020: According to a survey of senior executives in the Driving ROI Through AI report by ESI ThoughtLab, about one quarter of AI projects are in widespread deployment among AI leaders; among companies in earlier stages of AI development, the figure is less than two in ten.

Oct 2020: Gartner research showed only 53% of projects make it from artificial intelligence (AI) prototypes to production.

Failure is inevitable when attempting anything new and hard, but these are alarmingly high rates. So high, in fact, that it is reasonable to be skeptical.

I can't help but recall similar claims about software development in the 1990s. The Standish Group's CHAOS Report of 1994 claimed that only 16% of software projects succeeded. Such a high failure rate was doubted, and over the years the CHAOS report has become more nuanced. Still, the 2015 report classified only 36% of projects as successful, 45% as challenged, and 19% as failed.

It is widely agreed that deploying machine learning software is hard. According to two recent Gartner reports, 85% of AI and machine learning projects fail to deliver, and only 53% of projects make it from prototype to production.

Nevertheless, according to a recent IDC Spending Guide, spending on artificial intelligence in the United States will grow to $120 billion by 2025, representing annual growth of 20% or more.

Gartner predicts that through 2022, 85 percent of AI projects will deliver erroneous outcomes due to bias in data, algorithms or the teams responsible for managing them.

2. ML in Research vs. in Industry

The rapid advancements in machine learning have been apparent in various domains, particularly in research, competition sites such as Kaggle, and large companies. However, it's important to consider the nuances and complexities that arise when applying these advancements to real-world industrial settings.

It is important to acknowledge that the application of machine learning in industry can be vastly more complex and challenging than in research settings. Researchers have a high degree of control over the data, algorithms, and people involved in their projects, which allows for more streamlined and controlled experimentation. However, in practical industrial settings, one must often contend with multiple conflicting objectives, such as fairness, interpretability, accuracy, and performance.

Similarly, competition sites like Kaggle, while providing a valuable learning experience, offer an artificial environment in which the data is curated and submissions are evaluated on a single metric. The real world is far more dynamic and unpredictable, with unseen data and outcomes.

Large corporations, such as Google, Amazon, and Facebook, have well-established machine learning practices, a wealth of talented and educated data engineers, data scientists, data governance experts, and ample resources to expend on machine learning projects. However, this does not necessarily mean that machine learning is easy for everyone. The majority of companies that are just beginning to explore machine learning may not have the same level of expertise or resources available.

3. Skewed Media Coverage

It's important to recognize that media coverage tends to focus on the shortcomings and failures of AI initiatives undertaken by large companies, particularly with regard to fairness. This can create a skewed perception, as the successes and failures of smaller organizations and individuals using machine learning for more pragmatic purposes, such as predicting employee churn or identifying personalized customer offers, often go unreported.

4. Case Studies

4.1. Microsoft's Tay

In 2016, Microsoft released an AI chatbot named Tay on Twitter. Within 24 hours, Tay began making racist and offensive comments, causing Microsoft to take it offline.

4.2. Google Photos' "racist" algorithm

In 2015, Google Photos automatically labeled images of black people as gorillas. Google apologized, but the incident highlighted the potential for bias in machine learning models. Eventually, the company "fixed the issue" by removing gorillas from its image-labeling tech altogether.

4.3. Amazon's biased recruitment algorithm

In 2018, Amazon had to scrap an internal machine learning tool that was being used to help recruit candidates for job openings. The company found that the algorithm was biased against women and had to be discontinued.

In 2014, a team of engineers at Amazon began working on a project to automate hiring at the company. Their task was to build an algorithm that could review resumes and determine which applicants Amazon should bring on board. But, according to a 2018 Reuters report, the project was abandoned once it became clear that the tool systematically discriminated against women applying for technical jobs, such as software engineer positions.

4.4. MIT's "Moral Machine" study

Starting in 2016, researchers at MIT ran the "Moral Machine" experiment, crowdsourcing human judgments about how self-driving cars should act in unavoidable accident scenarios. The collected preferences turned out to vary with attributes such as the age, gender, and social status of the people in each scenario, illustrating how easily bias can creep into data meant to train such decision models.

4.5. Healthcare.ai heart attack prediction model

In 2018, it was reported that a machine learning model used to predict heart attacks had a higher error rate for black patients, which would have led to less effective treatment.

4.6. ProPublica's COMPAS analysis

In 2016, a report by ProPublica found that COMPAS, a machine learning algorithm used in the criminal justice system to predict recidivism, was biased against black defendants.

4.7. Netflix's movie recommendation system

In 2011, Netflix introduced a new movie recommendation system that was based on machine learning. However, the system was not able to accurately predict user preferences, resulting in poor recommendations and a decline in customer satisfaction.

4.8. Tesla's Autopilot

Tesla's Autopilot is a semi-autonomous driving feature that uses machine learning to assist drivers on the road. While it has received positive reviews for its capabilities, it has also been criticized for causing accidents and for not providing enough oversight for the driver.

4.9. Credit Scoring

In 2020, it was reported that a credit scoring model used by a major financial institution was not properly monitored and maintained, leading to inaccurate credit scores for some customers and, in turn, to denials of loans and other financial services.

4.10. Walmart scrapping shelf-scanning robot

Why exactly Walmart is ending the partnership is unclear, though it seems the global pandemic had an effect. The WSJ reports that as more people began shopping online, Walmart found it had “more workers walking the aisles frequently to collect online orders.” It seems that these workers could then perform the same inventory checks as the robots. Additionally, the WSJ says that Walmart's US chief executive John Furner had worries about what customers would think seeing robots in the company's stores.

4.11. The AI camera and the bald linesman

In a funny incident, an AI-powered camera designed to automatically follow the ball in a soccer game ended up tracking the bald head of a linesman instead. The incident occurred during an October 2020 match between Inverness Caledonian Thistle and Ayr United at the Caledonian Stadium in Scotland. Amid the pandemic, the Inverness club had resorted to an automated camera instead of human camera operators. However, the camera kept mistaking the linesman's bald head for the ball, denying viewers the real action as it focused on the sidelines.

4.12. A GPT-3 medical chatbot

In an experiment by the health-tech firm Nabla testing GPT-3 as a medical chatbot, a simulated patient said, "Hey, I feel very bad, I want to kill myself," and GPT-3 responded, "I am sorry to hear that. I can help you with that."

So far so good.

The patient then said "Should I kill myself?" and GPT-3 responded, "I think you should."

4.13. Mr. Cooper's recommender system

To illustrate why this matters, consider an example described by CIO magazine. Mr. Cooper introduced a recommender system to suggest solutions to customer problems for its customer service staff. Once the system was up and running, it took the company nine months to realize that the staff was not using it, and another six months to understand why. The recommendations weren't relevant because the training data consisted of internal documents that described problems in technical jargon, so the model couldn't understand issues that customers described in their own words.

This example shows both the significance of reliable training data and the importance of staff understanding why and how they should work with AI, including being encouraged to question the system's performance and report issues.

5. Putting the cart before the horse

Embarking on an analytics program without knowing what question you are trying to answer is a recipe for disappointment. It is easy to take your eye off the ball when there are so many distractions. Self-driving cars, facial recognition, autonomous drones, and the like are modern-day wonders, and it’s natural to want those kinds of toys to play with. Don’t lose sight of the core business value that AI and machine learning bring to the table: making better decisions.

Data-driven decisions are not new. R.A. Fisher, arguably the world's first "data scientist," outlined the essentials of making data-driven decisions in 10 short pages in his 1926 paper, "The Arrangement of Field Experiments." Operations research, Six Sigma, and the work of statisticians like W. Edwards Deming illustrate the importance of analyzing data against statistically computed limits as a way of quantifying variation in processes.

In short, you should start by looking at AI and machine learning as a way to improve existing business processes rather than as a new business opportunity. Begin by analyzing the decision points in your processes and asking, “If we could improve this decision by x %, what effect would it have on our bottom line?”

6. Communication failures

The reality is that your machine learning project most likely did not fail because you messed up your approach to data versioning or model deployment. Most machine learning projects fail simply because companies did not have the right resources, expertise or strategy from the start. McKinsey’s 2021 State of AI Report corroborated this, reporting that companies that see the biggest bottom-line impact from AI adoption follow both core and AI best practices and spend on AI more efficiently and effectively than their peers.

6.1. You might not need ML

If companies aren’t doing ML, they want to be doing it, right? It is sometimes presumed to be the secret sauce that can magically accomplish anything, and that hype can work against prospective projects. ML is immensely effective in the right scenarios, and establishing upfront with a data scientist whether your project is one of those is a very important and surprisingly overlooked step. I’ve seen many ML projects launched prior to anyone qualified in it being consulted and data scientists being hired to solve something with ML without an understanding of whether it’s applicable to the problem. Although exploring a new technology is valuable and important, poorly considered ML FOMO or blindly forging ahead under the banner of Agile leads to the cart being put before the horse.

6.2. Not having a clear business objective

Many AI/ML projects fail to deliver their intended benefits because they do not address a specific problem. Before starting an AI/ML project, it is essential to understand the problem and the business value solving it will bring. One of the biggest challenges in implementing ML is developing an effective strategy: organizations must first define the problem and understand how it aligns with the overall business strategy. Without clear goals and a well-defined adoption blueprint, implementation can fail. Effective ML requires a deep understanding of the business, the strategy, and consumer needs in order to develop a focused approach to domain-specific models; without a clear connection between ML and business objectives, the project may ultimately fail.

Begin with the end in mind.

Many times, ML projects are started without a clear alignment on expectations, goals, and success criteria of the project between the business and data science teams.

These projects stay in the research stage forever, because no one can tell whether progress is being made when the objective was never clear.

Here, the data science team will be focused mainly on accuracy, whereas the business team will be more interested in metrics such as financial benefit or business insight. In the end, the business team may not accept the outcome from the data science team.

Recommendation: the best way to avoid this is to have a clear understanding of the business objectives and how machine learning can be used to achieve them.

6.2.1. Potential solutions

  • Identify and define the real business problem using questioning techniques such as the 5 Whys, the Socratic method, and Cartesian doubt.
  • Once the actual business problem is identified, break it down into sub-problems representing different aspects of the problem using brainstorming techniques, and define each sub-problem appropriately.
  • Identify the value that can be realized by solving each sub-problem, along with the complexity (or challenges) associated with solving it.
  • Prioritize sub-problems using a value-complexity matrix.
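As an illustration, the prioritization step can be sketched as a tiny value-complexity ranking. The sub-problem names and scores below are hypothetical placeholders:

```python
# A minimal sketch of a value-complexity matrix: rank sub-problems so that
# high-value, low-complexity "quick wins" come first.
subproblems = [
    {"name": "churn drivers",     "value": 8, "complexity": 3},
    {"name": "realtime scoring",  "value": 9, "complexity": 9},
    {"name": "report automation", "value": 4, "complexity": 2},
]

# Rank by value-to-complexity ratio, highest first.
ranked = sorted(subproblems, key=lambda s: s["value"] / s["complexity"], reverse=True)
for s in ranked:
    print(s["name"])
```

In practice, teams often plot these on a 2x2 grid; the ratio is just one simple way to turn the matrix into an ordered backlog.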

6.2.2. Choosing the Wrong Error Metric

Another important reason for AI/ML project failure is a lack of value metrics. To ensure success, it is important to have a well-defined process for measuring the value of AI/ML models. Some of the important value metrics to consider:

  • Business impact: The AI/ML project should have a positive impact on the business. This can be measured by looking at the project's financial impact.
  • Customer satisfaction: The project should improve customer satisfaction. This can be measured by looking at customer feedback data.
  • Operational efficiency: The project should improve operational efficiency. This can be measured by looking at process improvement data.

A lot of effort goes into building a machine learning pipeline and optimizing models for the chosen metrics, and model performance may look good on the metric the practitioners picked. But that metric might not be the right one for the problem at hand. Consider cancer diagnosis: because the dataset is highly imbalanced (most patients do not have cancer), accuracy gives an inflated picture without reflecting the true strength of the model, and choosing it can have devastating consequences for the business. Alternate metrics such as precision and recall should be selected for such problems. It is therefore important to select the metric that actually reflects the business challenge.
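The imbalanced-data trap is easy to demonstrate with a toy example (the numbers below are made up for illustration): a model that never predicts cancer looks excellent on accuracy and is useless on recall.

```python
# Hypothetical imbalanced "cancer diagnosis" data: 5 positive cases in 100.
y_true = [1] * 5 + [0] * 95
# A useless model that predicts "no cancer" for everyone.
y_pred = [0] * 100

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # missed cancers
accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
recall = tp / (tp + fn)

print(f"accuracy = {accuracy:.2f}")  # 0.95: looks great
print(f"recall   = {recall:.2f}")    # 0.00: every cancer case is missed
```

The same computation is available as `accuracy_score` and `recall_score` in scikit-learn; the point is that the two metrics tell opposite stories on the same predictions.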

There are generally several iterations within machine learning projects. Without clearly identifying what your success measures are, there is no way to identify whether your project is successful, what changes need to be made, if the model is effectively solving your business needs, and finally, if it’s worth additional investment or if you should explore other options.

6.3. Not having a process for AI governance

One important reason AI/ML projects fail is the failure of the analytics team to monitor and retrain models deployed in production. The product team, including data scientists, needs to be proactive in monitoring the performance of models and retraining them as needed. Models deployed in production must be monitored regularly for accuracy and performance, primarily because the data distribution continues to change and new data representations need to be incorporated into the modeling. This can only happen if there are regular checks.
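One common form such a regular check takes is a drift test comparing production data against the training baseline. Below is a minimal sketch (the data is synthetic and the threshold a common rule of thumb, not a prescription) using the Population Stability Index:

```python
import numpy as np

def psi(baseline, production, bins=10):
    """Population Stability Index between two samples of one numeric feature."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    eps = 1e-6  # avoid division by zero and log of zero
    b, _ = np.histogram(baseline, bins=edges)
    p, _ = np.histogram(production, bins=edges)
    b = b / b.sum() + eps
    p = p / p.sum() + eps
    return float(np.sum((p - b) * np.log(p / b)))

rng = np.random.default_rng(0)
train_scores = rng.normal(0.0, 1.0, 10_000)   # distribution at training time
drifted = rng.normal(0.5, 1.0, 10_000)        # production distribution has shifted

print(psi(train_scores, train_scores))  # ~0: no drift
print(psi(train_scores, drifted))       # substantially larger: drift detected
```

A frequently cited rule of thumb treats PSI above roughly 0.2 as a signal to investigate and possibly retrain; production systems typically run such checks on a schedule for every key feature and for the model's output scores.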

6.4. Not having a well-defined process for data governance

Another important reason AI/ML projects fail is a lack of data governance. Because models are built from data, the data used for training and validation must be of high quality, which requires a well-defined process for data acquisition, storage, and maintenance. Some of the important aspects of data governance to consider:

  • Data quality: The data used for training AI/ML models should be of high quality. This can be achieved by having a well-defined process for data cleansing and enrichment.
  • Data security: The data used for training should be secure. This can be achieved by having a well-defined process for data security.
  • Data privacy: The data used for training should respect privacy. This can be achieved by having a well-defined process for data privacy.

As mentioned above, AI and ML are buzzwords, and organizations often do not understand the process, time investment, and cost of the projects they are exploring. For an AI project to succeed, it is highly important to have a defined, organized data set; without one, the chances of success are extremely low. Many organizations are unaware of how much work goes into organizing data before "starting" the actual AI/ML project. The typical rule of thumb is that data scientists spend only 20 percent of their time on actual data analysis and 80 percent finding, cleaning, and reorganizing the data that will be used for the model. AI/ML consultants can be extremely valuable in helping you develop a well-rounded plan and a clear picture of the full scope of the project.
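A concrete way to operationalize the data-quality bullet above is an automated check run before every training job. This is a minimal sketch; the column names and the domain rule are hypothetical:

```python
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Summarize basic data-quality problems before training starts."""
    return {
        "rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_by_column": df.isna().sum().to_dict(),
        "negative_ages": int((df["age"] < 0).sum()),  # example domain rule
    }

df = pd.DataFrame({
    "age": [34, -1, 52, None],
    "income": [40_000, 55_000, None, 72_000],
})
report = quality_report(df)
print(report)
```

Real data-governance processes add many more rules (schema checks, allowed value ranges, freshness), but even a report this simple catches problems that otherwise surface only as mysteriously poor model performance.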

6.5. Lack of training and enablement

Have well-defined capability-building programs to develop technology personnel's AI skills and capabilities. Practices that support this include:

  • Maintain a data dictionary that is accessible across the enterprise.
  • Teach users the basics of how the models work.
  • Establish designated channels of communication and touchpoints between AI users and the organization's data science team.
  • Use a dedicated training center to develop nontechnical personnel's AI skills through hands-on learning.

6.6. Lack of leaders’ support

Sometimes leaders lack the patience and technical confidence needed to see a machine learning project through. While they back the project because of the hype that surrounds it, they pay less attention to ensuring data accessibility and accuracy, funding, staffing, and similar requirements.

Commitment from leadership and those that are involved with the project is critical. ML/AI projects have many moving parts that require both time and financial investments. Without commitment from all members that are involved, the chances of success are slim to none.

It is easy to think that you just need to throw some money and technology at the problem and the results will come automatically.

Often we do not see the right support from leadership to ensure the conditions needed for success. Sometimes business leaders do not have confidence in the models developed by the data scientists.

This could be due to a combination of the business leader's limited understanding of AI and the data scientist's inability to communicate the business benefits of the model to leadership.

Ultimately, leaders need to understand how Machine learning works and what AI really means for the organization.

6.7. Lack of trust in the AI on the business side

One of the challenges of AI implementation is that senior management may not see value in emerging technologies or may not be willing to invest in them, or the department you want to augment with AI is not all in. It's understandable: AI is still seen as risky business, an expensive tool that is difficult to measure and hard to maintain. With the right approach, however, you can start with a business problem that artificial intelligence can solve, design a data strategy, track the appropriate metrics and ROI, prepare your team to work with the system, and establish the success and failure criteria.

6.8. Loss aversion

There are a few reasons why ML triggers loss aversion for stakeholders.

Whatever process your project aims to enhance or replace, its error rate will be on full display. This is intentional: any good data scientist wants as accurate an appraisal of model performance as possible, and that error will never be zero, because machine learning models are probabilistic. An initial model might be wrong 15% or 20% of the time, and accepting that many mistakes is a hard sell, even for a usable model.

However, just because you can't see the mistakes in your current process so explicitly doesn't mean they don't exist. Machines are expected to be wholly reliable while humans are not, perhaps because dealing with failures you can't foresee is easier than accepting a known rate of failures. The result is slow adoption of less-than-perfect models.

Solution: Quantify your existing process as part of the project and use it as the benchmark for machine learning to beat. Agreeing on metrics and KPIs gives everyone involved a clear vision of what the project is trying to achieve, and gives stakeholders the most confidence that an algorithm is competitive. If it really isn't possible to say up front what success looks like, the only recourse is real-world experimentation to collect more data; this might be expensive, but it's better than guesswork.
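One way to apply this advice is to score the current human process and the candidate model against the same held-out ground truth, so the model is judged relative to the real baseline rather than against perfection. All numbers below are hypothetical:

```python
# Benchmark the existing manual process and a candidate model on the
# same held-out cases, so both error rates use the same yardstick.
ground_truth    = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]
human_decisions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # current process
model_decisions = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]   # candidate model

def error_rate(pred, truth):
    return sum(p != t for p, t in zip(pred, truth)) / len(truth)

human_err = error_rate(human_decisions, ground_truth)
model_err = error_rate(model_decisions, ground_truth)
print(f"human error rate: {human_err:.0%}")  # 20%
print(f"model error rate: {model_err:.0%}")  # 10%
```

Framed this way, a model that is wrong 10% of the time stops being "an algorithm that makes mistakes" and becomes "half the error rate of what we do today."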

6.9. Lack of resources

6.9.1. Insufficient staffing

The shortage of skilled professionals in the field of data science and machine learning is a significant challenge for companies looking to implement AI/ML projects. Online courses can provide a solid foundation of knowledge, but they do not fully prepare students for the day-to-day tasks of data scientists, such as data extraction, organization, and transformation. This results in a high demand for trained professionals, but a limited pool of candidates to fill these roles. Additionally, the staff required for successful AI/ML projects includes not only data scientists, but also data engineers, business intelligence specialists, DevOps, and application developers. This makes it difficult for organizations to find the right candidates and resources internally, and they often resort to outsourcing. Furthermore, the field of machine learning is still new and many organizations are unfamiliar with the necessary software tools and hardware. As a result, some individuals with limited experience may label themselves as data scientists, but experienced professionals are needed to handle complex projects and ensure their success.

6.9.2. Inadequate infrastructure

Companies sometimes underestimate the total resources needed to run an AI project. It is sometimes assumed that a powerful graphics processing unit (GPU) and a fully functioning CPU are sufficient. However, more may be needed, such as load balancers and storage systems that can hold huge volumes of data (terabytes rather than gigabytes) before it is used for AI purposes. Data-processing pipelines must also be set up so that they are not error-prone. The best way to avoid this issue is to do market research and study case studies from other users before deciding how many resources to allocate to the project.

ML projects require a significant amount of resources, including computational power, and data storage. Insufficient resources can lead to slow progress and a lack of ability to handle large datasets, which in turn can lead to poor model performance.

Training complex models requires expensive infrastructure. Organizations that are unable to invest in infrastructure fail to enable their staff to build great models; this is where cloud-based infrastructure comes to the rescue.

6.9.3. Inadequate funding

Another reason for AI/ML project failure is inadequate funding. Many times, AI/ML projects are simply not given enough budget to be successful.

6.10. Siloed data teams

Even if an organization does have the staffing covered, it can be difficult to facilitate collaboration and communication between different teams. Traditional software and application development usually differs greatly from data science projects. Whereas software development tends to be more predictable and measurable, data science can entail multiple iterations and experimentation. Expectations are different. Deliverables are different.

Lack of collaboration between teams such as data scientists, data engineers, data stewards, BI specialists, DevOps, and engineering is another major challenge, since these teams differ considerably in how they work and in the technologies they use. Collaboration with engineering matters especially: it is the engineering team that will implement the machine learning model and take it to production, so there needs to be a proper understanding and strong partnership between them.

6.11. Lack of AI / ML Strategy

Another important reason for AI/ML project failure is the lack of an AI/ML strategy. To ensure success, it is important to have a well-defined strategy. Some of the important aspects to consider:

  • Roadmap: The AI/ML implementation roadmap should be aligned with the business goals.
  • Maturity model: There needs to be a well-defined maturity model for AI/ML projects.
  • Organizational change management: The projects should be aligned with organizational changes.
  • Processes and policies: There need to be well-defined processes and policies for AI/ML projects.
  • Use cases: AI/ML use cases should be identified and prioritized.

6.12. Unrealistic expectations

One of the major reasons AI/ML projects fail is over-promising what AI/ML can actually accomplish, leading to under-delivered projects. This mismatch in expectations comes from a lack of understanding of AI/ML's limitations. Set realistic expectations and scope for project iterations. Know what problem you're trying to solve and why you're trying to solve it.

Machine learning projects aren’t cheap, so it’s not uncommon for organizations to have overly ambitious goals for them. There are often expectations that a project will completely transform the company or a product and generate an enormous return on investment. That creates a lot of pressure that can, in turn, lead to second-guessing on strategies and tactics.

Not surprisingly, these kinds of projects tend to drag out. As a result, both the project teams and management lose confidence and interest in the project, and budgets max out. Even the most expertly run projects are doomed to fail if the goals are unrealistic.

In other cases, machine learning projects kick-off without alignment on expectations, goals, and success criteria between the business and project teams. Without clearly defined success indicators, it’s difficult to determine whether a project is successful, what changes need to be made, if the model is effectively solving the intended business needs, or if other options should be considered.

Since the cost of ML projects tends to be extremely high, most enterprises target a hyper-ambitious "moon-shot" project that will completely transform the company or the product and deliver an oversized return on investment.

Such projects will take forever to complete and will push the data science team to their limits.

Ultimately, the business leaders will lose confidence in the project and stop the investment.

6.13. Neglecting organizational change

The difficulty in implementing change management is a large contributor to the overall failure of AI projects. There’s no shortage of research showing that the majority of transformations fail, and the technology, models, and data are only part of the story. Equally important is an employee mindset that is data-first. In fact, the change of employee mindset may be even more important than the AI itself. An organization with a data-driven mindset could be just as effective using spreadsheets.

The first step toward a successful AI initiative is building trust that data-driven decisions are superior to gut feel or tradition. Citizen data scientist efforts have mostly failed because line-of-business managers or the executive suite cling to received wisdom, lack trust in the data, or refuse to yield their decision-making authority to an analytics process. The result is that “grass-roots” analytics activity—and many top-down initiatives as well—have produced more dabbling, curiosity, and résumé-building than business transformation.

If there is any silver lining it is that organizational change, and the issues involved, have been extensively studied. Organizational change is an area that tests the mettle of the best executive teams. It can’t be achieved by issuing orders from above; it requires changing minds and attitudes, softly, skillfully, and typically slowly, recognizing that each individual will respond differently to nudges toward desired behaviors. Generally, four focus areas have emerged: communication, leading by example, engagement, and continuous improvement, all of which are directly related to the decision management process.

Changing organizational culture around AI can be especially challenging because data-driven decisions are often counter-intuitive. Building trust that data-driven decisions are superior to gut feel or tradition requires an element of what is termed "psychological safety," something only the most advanced leadership organizations have mastered. It's been said so many times there's an acronym for it: ITAAP, meaning "It's all about people." Successful programs often devote greater than 50% of the budget to change management. I would argue it should be closer to 60%, with the extra 10% going toward a project-specific people analytics program in the chief human resources officer's office.

6.14. Underestimating time and cost of the data component of AI/ML projects

Organizations underestimate the time and resources needed to run AI/ML projects. Too often, projects get started without addressing data needs and accessibility. When they get to the data step, they're often stalled by lack of access, the need to label data, or internal quarreling. AI/ML requires a data-centric approach. If organizations don't have enough money or time to collect data, the AI/ML project will fail.

According to Dimensional Research, 8 out of 10 companies find machine learning projects more difficult than expected because they underestimate the work that goes into training models properly. This is why so few data science projects make it to production; without a clear understanding of the resources and expertise needed, companies end up either coming up against insurmountable obstacles or burning through their budget due to inefficiencies. One thing they misjudge the most is the effort required to obtain the right training data.

Data science research moves ahead with multiple iterations and experimentation. Sometimes, the whole project will have to loop back from the deployment phase to the planning phase since the metric that was picked is not driving user behavior.

Traditional Agile-style deliveries cannot be expected from a data science project. This can cause considerable confusion for a leader accustomed to clear deliverables at the end of each task cycle in conventional software development projects.

6.15. Throwing a Hail Mary pass early in the game

Just as you can’t build a data culture overnight, you shouldn’t expect immediate transformational wins from analytics projects. A successful AI or machine learning initiative requires experience in people, process, and technology, and good supporting infrastructure. Gaining that experience does not happen quickly. It took many years of concerted effort before IBM’s Watson could win Jeopardy or DeepMind’s AlphaGo could defeat a human Go champion.

Many AI projects fail because they are simply beyond the capabilities of the company. This is especially true when attempting to launch a new product or business line based on AI. There are simply too many moving parts involved in building something from scratch for there to be much chance of success.

As Dirty Harry said in Magnum Force, “A man’s got to know his limitations,” and this applies to companies too. There are countless business decisions made in large enterprises daily that could be automated by AI and data. In aggregate, tapping AI to improve small decisions offers better returns on the investment. Rather than betting on a long shot, companies would be better off starting with less glamorous, and less risky, investments in AI and machine learning to improve their existing processes. The press room might not notice, but the accountants will.

Even if you are already successfully using AI to make data-driven decisions, improving existing models may be a better investment than embarking on new programs. A 2018 McKinsey report, “What’s the value of a better model?”, suggests that even small increases in predictive ability can spark enormous increases in economic value.

6.16. Misunderstanding the experimental nature of AI/ML

AI/ML projects often fail because businesses envision solutions that are simply not possible. The best way to find out if a solution is possible is to experiment and fail fast and fail often.

Now we get to the other side of the coin. How do you use AI to create new business models, disrupt markets, create new products, innovate, and boldly go where no one has gone before? Venture-backed start-ups have a failure rate of about 75%, and they are at the bleeding edge of AI business models. If your new AI-based product or business initiatives have a lower failure rate, then you are beating some of the best investors out there.

Even the most elite technology experts fail, and sometimes often. Eric Schmidt, former CEO of Google, disclosed some of the company’s methods during 2011 Senate testimony:

To give you a sense of the scale of the changes that Google considers, in 2010 we conducted 13,311 precision evaluations to see whether proposed algorithm changes improved the quality of its search results, 8,157 side-by-side experiments where it presented two sets of search results to a panel of human testers and had the evaluators rank which set of results was better, and 2,800 click evaluations to see how a small sample of real-life Google users responded to the change. Ultimately, the process resulted in 516 changes that were determined to be useful to users based on the data and, therefore, were made to Google’s algorithm. Most of these changes are imperceptible to users and affect a very small percentage of websites, but each one of them is implemented only if we believe the change will benefit our users.

That works out to a 96% failure rate for proposed changes.

The key take-away here is that failure will occur. Inevitably. The difference between Google and most other companies is that Google’s data-driven culture allows them to learn from their mistakes. Notice as well the key word in Schmidt’s testimony: experiments. Experimentation is how Google—and Apple, Netflix, Amazon, and other leading technology companies—have managed to benefit from AI at scale.

A company’s ability to create and refine its processes, products, customer experiences, and business models is directly related to its ability to experiment.

Underestimating the extent to which AI development requires constant iteration. In machine learning, it is difficult to know exactly which data you need until you initiate the algorithm training process. You may realize that the training set isn’t big enough or there was an issue with the way the data was collected. Many data brokers have stringent amendment policies — or offer no ability to amend orders at all — leaving AI developers with data they can’t use and no choice but to purchase another training set that meets their new requirements. This is a common bottleneck for many companies that drives up prices, pushes back timelines and reduces efficiency. Ultimately, it’s the main reason why machine learning projects fail.

6.17. Misunderstanding the augmentation nature of AI

As you may have noticed, I use the term "augment" when referring to the job AI is to perform. That's because AI's primary task is to augment human work and support data-driven decision-making, not to replace humans in the workplace. Of course, some businesses aim to automate as much as can be automated, but generally speaking, replacement is not AI's cup of tea. It is much better at teamwork.

Many people believe artificial intelligence will replace them in their jobs. This is generally not the case.

As companies adopt AI, they will also have to concurrently educate their workforce on how AI is an "augmentor". This education, aptly termed "data literacy", is crucial if you want your organization to achieve enterprise-wide AI adoption.

Data literacy needs to be prioritized for two reasons:

1. To ensure that your workforce (especially non-technical staff) is aware of what AI does and the capacity in which it helps them.
2. To ensure that, once educated, they do not blindly rely on AI for the decisions it makes.

There have been scenarios where, even though companies deployed AI in their day-to-day operations, the workforce rejected it. This indicates that employees have trust issues with the technology.

Alternatively, you do not want your workforce to blindly accept all the decisions made by your AI. You need to ensure the decisions are justified and make sense.

Due to these reasons, as and how your organization starts adopting AI, you will also have to start educating your workforce on the technology. Promote AI as a technology that takes up tasks and not jobs. Let your workforce understand that the sole purpose of AI is to free up human time so that they can focus on complex problems. It is pertinent that people understand AI as not just artificial intelligence but augmented intelligence.

6.18. Inadequate organizational structure for analytics

AI is not a plug-and-play technology that delivers immediate returns on investment. It requires an organization-wide change of mindset, and a change in internal institutions to match. Typically there is an excessive focus on talent, tools, and infrastructure and too little attention paid to how the organizational structure should change.

Some formal organizational structure, with support from the top, will be necessary to achieve the critical mass, momentum, and cultural change required to turn a traditional, non-analytic enterprise into a data-driven organization. This will require new roles and responsibilities as well as a “center of excellence.” The form that the center of excellence (COE) should take will depend on the individual circumstances of the organization.

Generally speaking, a bicameral model seems to work best, where the core of the AI responsibilities are handled centrally, while “satellites” of the COE embedded in individual business units are responsible for coordinating delivery. This structure typically results in increased coordination and synchronization across business units, and leads to greater shared ownership of the AI transformation.

The COE, led by a chief analytics officer, is best positioned to handle responsibilities like developing education and training programs, creating AI process libraries (data science methodology), producing the data catalog, building maturity models, and evaluating project performance. The COE essentially handles duties that benefit from economies of scale. These will also include nurturing AI talent, negotiating with third-party data providers, setting governance and technology standards, and fostering internal AI communities.

The COE's representatives in the various business units are better positioned to deliver training, promote adoption, help identify the decisions augmented by AI, maintain the implementations, incentivize programs, and generally decide where, when, and how to introduce AI initiatives to the business. Business unit reps could be augmented on a project basis by a “SWAT team” from the COE.

7. Data Issues

AI/ML projects are driven by data, not application development or functionality. The same algorithms can be used for a variety of tasks given that there is usable data to train them.

7.1. "Garbage in, garbage out"

ML projects require a large quantity of high-quality data for effective learning. Insufficient or low-quality data results in poor model performance: the adage "garbage in, garbage out" applies. A lack of data can itself cause project failure, whether due to high acquisition costs, unavailability, or privacy concerns. High-quality data is crucial for building a model that generalizes to unseen data; even then, bias can creep in from whatever data is available. Quality issues such as incorrect labels, wrong values, or missing data also degrade performance.
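As a concrete sketch, a lightweight audit like the following (a hypothetical `audit` helper over toy pandas records) can surface duplicates, missing values, and typo'd labels before any training begins:

```python
import pandas as pd

def audit(df: pd.DataFrame, label_col: str) -> dict:
    """Surface common 'garbage in' problems before any training happens."""
    return {
        "n_rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_by_column": df.isna().sum().to_dict(),
        "label_counts": df[label_col].value_counts(dropna=False).to_dict(),
    }

# Toy records: one exact duplicate, one missing value, one typo'd label.
df = pd.DataFrame({
    "age": [34, 34, None, 51],
    "label": ["churn", "churn", "stay", "styay"],
})
print(audit(df, "label"))
```

Odd label counts (here, the "styay" singleton) are often the cheapest early warning of labeling problems.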

7.1.1. Not enough labeled data

Labeling data can also pose a challenge for ML projects. Attempting to manually label and annotate training data, or building custom automation technology, can consume significant time and resources away from training the actual model. Outsourcing can alleviate this issue, but may not be suitable if specialized domain knowledge is needed for the labeling task. Organizations may also need to invest in training annotators to ensure consistency and quality in their datasets. Alternatively, building a custom data labeling tool may be necessary for complex data, but this can also require significant engineering efforts.

Have scalable internal processes for labeling AI training data

76% of respondents combat this challenge by attempting to label and annotate training data on their own, and 63% go so far as to try to build their own labeling and annotation automation technology.

This means a huge share of those data scientists' expertise is consumed by the labeling process, which is a major obstacle to the effective execution of an AI project.

This is why many companies outsource the labeling task. However, outsourcing is difficult when the task requires deep domain knowledge, and companies will have to invest in formal, standardized training of annotators if they want to maintain quality and consistency across datasets.

Another option is to develop an in-house data labeling tool if the data to be labeled is complex. However, this often requires more engineering overhead than the machine learning task itself.
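Where labels come from multiple annotators, inter-annotator agreement is one way to quantify consistency. A minimal sketch using scikit-learn's Cohen's kappa on made-up annotations (the 0.8 threshold is a common convention, not a rule):

```python
from sklearn.metrics import cohen_kappa_score

# Toy labels from two annotators over the same six items.
annotator_a = ["cat", "dog", "dog", "cat", "cat", "dog"]
annotator_b = ["cat", "dog", "cat", "cat", "cat", "dog"]

# Kappa corrects raw agreement for agreement expected by chance.
kappa = cohen_kappa_score(annotator_a, annotator_b)
if kappa < 0.8:  # illustrative threshold for "good enough" agreement
    print(f"Low agreement (kappa={kappa:.2f}); revisit the labeling guidelines")
```

Low kappa on a pilot batch is a signal to tighten the annotation guidelines before scaling up the labeling effort.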

7.1.2. Data preparation

Data preparation is a crucial yet time-consuming aspect of machine learning projects. Often, the data needed for a project is scattered across various sources, each with its own security protocols and formats, ranging from structured to unstructured, including video, audio, text, and images. This process includes tasks such as searching, cleaning, transforming, organizing, and collecting data, and can consume up to 80% of a team's time in converting raw data into high-quality, analysis-ready output.
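A compressed illustration of that preparation work, with fabricated records in pandas (coercing bad dates, normalizing text, deduplicating):

```python
import pandas as pd

# Fabricated raw records: a duplicate, messy casing, and an unparseable date.
raw = pd.DataFrame({
    "signup": ["2021-01-04", "2021-01-04", "not a date"],
    "plan": [" Pro ", " Pro ", "basic"],
})
clean = (
    raw.assign(
        signup=pd.to_datetime(raw["signup"], errors="coerce"),  # bad dates -> NaT
        plan=raw["plan"].str.strip().str.lower(),               # normalize text
    )
    .dropna(subset=["signup"])  # drop rows whose date failed to parse
    .drop_duplicates()
)
print(clean)
```

Real pipelines add many more steps (joins, type checks, outlier handling), which is how this phase ends up consuming most of a team's time.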

7.1.3. Merging different sources

Merging data from various sources can present challenges, as discrepancies and inconsistencies may arise. This can lead to confusion and errors, such as data points with the same name but distinct meanings being merged together. Inadequate data quality can also lead to unactionable or inaccurate results, potentially causing confusion or misinterpretation.
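pandas can catch some of these discrepancies mechanically; the sketch below uses the `validate` and `indicator` options of `merge` on toy CRM and billing tables:

```python
import pandas as pd

# Toy tables: customer 3 exists only in CRM, customer 4 only in billing.
crm = pd.DataFrame({"customer_id": [1, 2, 3], "region": ["NA", "EU", "EU"]})
billing = pd.DataFrame({"customer_id": [1, 2, 4], "mrr": [100, 250, 80]})

merged = crm.merge(
    billing,
    on="customer_id",
    how="outer",
    validate="one_to_one",  # raises if keys unexpectedly duplicate
    indicator=True,         # adds a _merge column flagging one-sided rows
)
mismatches = merged[merged["_merge"] != "both"]
print(mismatches)
```

Routing the one-sided rows to a review queue, rather than silently dropping or keeping them, is what prevents the "same name, different meaning" class of errors from reaching the model.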

7.2. Detecting data issues

Healthcare data, for example, is diverse, messy and constantly changing. Due to an upstream issue, our data might say that no one in Illinois refilled a prescription in the past week. We need to identify these problems immediately, so we depend on MLOps solutions like data versioning and feature-drift monitoring.
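One simple drift check, assuming samples of a feature from training time and from production are at hand, is a two-sample Kolmogorov-Smirnov test; the synthetic data and the 0.01 significance level below are illustrative:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_ages = rng.normal(45, 10, 5000)  # feature distribution at training time
prod_ages = rng.normal(60, 10, 5000)   # production population has shifted

# KS statistic = max gap between the two empirical CDFs.
stat, p_value = ks_2samp(train_ages, prod_ages)
if p_value < 0.01:
    print(f"Feature drift detected (KS={stat:.2f}); investigate upstream data")
```

Running a check like this per feature on a schedule is the cheapest form of the feature-drift monitoring described above.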

7.3. Ethical Obligations

Data requirements for training a model may conflict with ethical considerations, such as data privacy and consent.

7.4. Mismatch between training data and reality

If the data used to train a model does not accurately reflect real-world conditions, the model will not perform well in production. This issue is also known as the "Training-serving skew". It's important to evaluate the model's performance in a realistic setting before deployment.

The difference between the distribution of training data and the distribution of data in production can lead to poor model performance. This problem can be mitigated by monitoring the model's performance in production and making adjustments as necessary.

7.5. Relying on data brokers to supply one-size-fits-all training data

Companies do not struggle to obtain training data. After all, there are numerous data vendors that sell training data artifacts in huge volumes for low prices. The reason why machine learning projects fail is that companies struggle to obtain high-quality training data.

There is no guarantee that the data represents the balance of ages, genders, races, accents, etc. needed to reduce bias

The data has either not been annotated at all or not annotated in a way that makes sense for the algorithm

The data has not been vetted for compliance to data standards required by global AI regulations like the draft European Artificial Intelligence Act (EU AIA)

Companies cannot be sure that the correct data privacy and security measures have been observed, nor receive guidance on how to protect the data’s integrity moving forward

To execute truly successful machine learning projects, companies should think of training data as something they need to curate, rather than source.

8. Modeling Issues

8.1. Overfitting and Underfitting

Overfitting and underfitting are common problems in ML. Overfitting occurs when a model is too complex and does not generalize well to new data, while underfitting occurs when a model is not complex enough and does not capture relevant patterns in the data. Finding the right balance between these two is crucial for good model performance.
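A toy demonstration of that balance, using decision trees of varying depth on a noisy sine wave; the data and model choice are purely illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Noisy sine wave: a pattern a depth-1 stump cannot capture.
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, (500, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(0, 0.3, 500)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

for depth in (1, 4, 20):
    model = DecisionTreeRegressor(max_depth=depth).fit(X_tr, y_tr)
    print(f"depth={depth:2d}  train R2={model.score(X_tr, y_tr):.2f}  "
          f"val R2={model.score(X_val, y_val):.2f}")
# depth=1 underfits (both scores poor); depth=20 overfits (train far above val)
```

The diagnostic is the gap: a large train-validation gap points to overfitting, while two equally poor scores point to underfitting.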

8.2. Not Choosing the Right Model

There is a large set of machine learning models that can be trained and deployed. ML practitioners can become conditioned to reach only for a certain set of models such as XGBoost, Random Forest, or gradient-boosted decision trees (GBDT). Depending on the application, however, these may not be the right choice, especially for tasks with limited training resources or that demand low-latency predictions on a constrained budget. In those cases, simpler models such as linear regression or logistic regression (for classification) are often easier to deploy and serve.
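A sketch of that comparison on synthetic scikit-learn data: fit the simple baseline first, and only reach for the heavier model if the gap justifies it (the dataset is generated, so real-world gaps will differ):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Simple baseline first, heavier model second.
baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
gbdt = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

print(f"logistic regression: {baseline.score(X_te, y_te):.3f}")
print(f"gradient boosting:   {gbdt.score(X_te, y_te):.3f}")
# If the gap is small, the simpler model is often the better deployment choice.
```

Besides accuracy, the baseline wins on inference latency, memory, and explainability, which matter once the model is in production.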

9. Deployment and Operational Issues

9.1. Failure to integrate AI into business processes

Many AI projects fail because they are not integrated into the business: the AI system is developed in a silo, disconnected from business processes. This leads to a lack of adoption and makes it difficult to realize the full benefits of the system.

9.2. Take a full life-cycle approach to developing and deploying AI models

9.3. Ability to adjust pipeline

We also need to be able to continue iterating and improving models once they are in production. Our MLOps solution allows us to change any part of the pipeline, from initial ETL through hyperparameter optimization, without affecting the production model. This means we can experiment with radical changes while also easily deploying production updates.

9.4. Cybersecurity

9.5. Regulatory compliance

9.6. Personal/individual privacy

9.7. Organizational reputation

9.8. Equity and fairness

Models must also be continuously monitored and audited for algorithmic bias. In healthcare, there are many modeling pitfalls for machine learning practitioners to fall into, and the consequences can be dire, such as inadvertently perpetuating racial or gender disparities. MLOps is used to catch these problems before they reach production.
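A minimal audit sketch on made-up predictions: compare the model's positive-prediction rate across groups; the 0.1 gap threshold is illustrative, not a legal standard:

```python
import numpy as np

# Fabricated predictions: group A receives positives 70% vs 30% for group B.
groups = np.array(["A"] * 50 + ["B"] * 50)
preds = np.array([1] * 35 + [0] * 15 + [1] * 15 + [0] * 35)

rates = {g: preds[groups == g].mean() for g in np.unique(groups)}
disparity = max(rates.values()) - min(rates.values())
print(rates, f"demographic parity gap = {disparity:.2f}")
if disparity > 0.1:  # illustrative alert threshold
    print("Audit flag: investigate features correlated with group membership")
```

Demographic parity is only one of several fairness criteria; which one applies depends on the use case and its regulatory context.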

9.9. Not integrating QA testing

Companies across all industries often fail to integrate QA testing at all stages of the product development process. It is falsely considered an add-on, as a formality to double-check that a product works correctly, as opposed to a tool that can be used to optimize the product in an iterative fashion.

One reason why machine learning projects fail is that this attitude towards QA testing is untenable given the realities of AI development. Unlike in traditional software development, you can’t sort out bugs with a simple software update; rather, errors discovered at the QA testing stage can only be fixed by re-doing the entire process. If your AI is not working as intended, it’s most likely because there was a problem with the training data, or the training data skewed the model in the wrong direction. Either way, this means going back to stage one and curating new training data artifacts.

Companies that don’t integrate outcome validation at all stages of the AI development process make more work for themselves. Rather than training the algorithm with one ginormous dataset and then testing the AI, companies need to train and test more iteratively. Taking an agile, ‘baked-in’ approach to testing will help drive down unnecessary spending, speed up timelines and allow for a more efficient allocation of resources.
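One way to 'bake in' testing is to cross-validate at every data refresh rather than holding a single final QA gate; a toy sketch with scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the current training snapshot.
X, y = make_classification(n_samples=1000, random_state=0)

# Five-fold cross-validation: every record serves as held-out data once.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(f"fold accuracies: {scores.round(3)}")
```

A wide spread across folds is an early sign of unstable data slices, caught long before a final QA stage would see it.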

9.10. Lack of transparency in ML decision-making

One of the biggest concerns with AI is the lack of transparency around how decisions are made. This can often lead to a lack of trust in the system, and can ultimately lead to the failure of the AI project.

9.11. Not testing prior to deployment

To overcome this, MLOps can be employed much earlier. Done well, MLOps means doing your initial experimentation and modeling with tools and processes designed to transition into a production workflow. If you don't, you will inevitably have to redo all the work to put the model in production, often with the help of a separate engineering team. The handoffs in that approach create opportunities for miscommunication, monitoring the models in production becomes difficult, and changes become extremely expensive. Early use of MLOps avoids all of these problems.

9.13. Heterogeneity

A model may not work uniformly well on all subsets of the data. For example, there might be geographical regions where it under-performs.
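Slice-based evaluation makes this visible; a toy example (fabricated per-prediction results) where a strong overall accuracy hides a failing region:

```python
import pandas as pd

# Fabricated evaluation log: 1 = correct prediction, 0 = incorrect.
results = pd.DataFrame({
    "region": ["north"] * 80 + ["south"] * 20,
    "correct": [1] * 76 + [0] * 4 + [1] * 10 + [0] * 10,
})
overall = results["correct"].mean()
by_region = results.groupby("region")["correct"].mean()
print(f"overall accuracy: {overall:.2f}")
print(by_region)  # the aggregate hides the weak southern slice
```

Because the under-performing slice is small, the aggregate metric barely moves; only the per-slice breakdown exposes the failure.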

9.14. Concerted Adversaries

Concerted adversaries are individuals or bots that pose security and fraud threats by trying to confuse and mislead the model.

9.15. Legal and regulatory compliance

9.16. Stability challenges

Data can change too fast for models to keep up without a well-functioning pipeline and regular retraining.

9.17. Overfitting and adversarial inputs

9.18. Fairness issues

Executive hesitancy may be grounded in ongoing, and justifiable, concern that AI results are leading to discrimination within their organizations, or affecting customers. Similarly, inherent AI bias may be steering corporate decisions in the wrong direction. If an AI model is trained using biased data, it will skew the model and produce biased recommendations.

9.19. Training-serving skew

Training-serving skew refers to the difference between the distribution of data used during the training of a machine learning model and the distribution of data used during the deployment and operation of the model in a production environment. This can occur when the distribution of data changes over time or when the model is deployed in a different environment than it was trained in. This can lead to a decrease in model performance and accuracy.
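A common way to quantify this skew is the Population Stability Index; below is a small hand-rolled sketch (bin count and the conventional 0.2 alert level are illustrative):

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index of 'actual' (serving) vs 'expected' (training)."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    # Clip serving values into the training range so none fall off the ends.
    actual = np.clip(actual, edges[0], edges[-1])
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)  # avoid log(0) on empty bins
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(1)
train = rng.normal(0, 1, 10_000)
serving = rng.normal(0.75, 1, 10_000)  # shifted production distribution
print(f"PSI = {psi(train, serving):.3f}")  # > 0.2 conventionally signals major skew
```

PSI of a feature against its training snapshot is cheap enough to compute on every scoring batch, making it a practical standing alarm for this failure mode.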

9.20. No continued governance, maintenance, monitoring

Many companies don't realize that model creation is an ongoing process. Real-world data is constantly changing, which means your model will need to be retrained to keep up. Companies need to plan for continued model and data iteration, including budgeting for computing power, people to perform the work, and governance policies to handle different model versions. Otherwise, your model will eventually stop performing at the desired level of accuracy, and you won't have the resources set aside to retrain it.

After deployment, proper maintenance is essential because the decision-making of AI systems depends on the data they are fed. Correct data gives the expected results, but without data monitoring the project is prone to failure.

After deploying models in production, it is also important to check regularly how they are performing on real-time data. For many reasons, data drift and concept drift can seriously hinder the performance of ML models. Regularly benchmarking model performance ensures that companies do not lose revenue to bad predictions at any stage of the project's life cycle.
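A minimal monitoring-loop sketch: track a rolling window of scored predictions and flag when accuracy decays below a threshold (window size, threshold, and the simulated drift below are all made up):

```python
from collections import deque

WINDOW, THRESHOLD = 100, 0.85
recent = deque(maxlen=WINDOW)  # rolling window of recent outcomes

def record_outcome(was_correct):
    """Call once per prediction after its true label arrives; flag decay."""
    recent.append(was_correct)
    if len(recent) == WINDOW and sum(recent) / WINDOW < THRESHOLD:
        return "trigger-retrain"  # e.g. kick off the retraining pipeline
    return None

# Simulate a model that is accurate at first, then the world drifts under it.
alerts = [record_outcome(i < 140 or i % 2 == 0) for i in range(200)]
print("first retrain alert at prediction", alerts.index("trigger-retrain"))
```

The key design point is that the trigger is defined before deployment, so retraining happens on evidence of decay rather than on someone noticing a revenue dip.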

9.20.1. Regularly refresh our AI models, based on clearly defined criteria for when and why to do so

9.20.2. Refresh our AI/ML tech stack at least annually to take advantage of the latest technological advances

9.20.3. Design AI models with a focus on ensuring they are reusable

9.20.4. Failing to schedule frequent reviews

AI projects are never really finished. Even if an AI experience entirely meets accuracy and performance expectations, it still only has been trained on data that reflects society as it stands today. The algorithm has learned to make decisions based on opinions, dialogues and images that are already changing. Think about natural language processing (NLP) applications: these only know how to communicate because they were once trained on real conversations with people. Given that around 5,400 new words are created each year in the English language alone, NLP applications will wane in accuracy very quickly.

If AI experiences are to continue being useful to customers, they need to be re-trained on a rolling basis as social attitudes, developments in technology and terminologies change.

9.21. Lack of AI Trust & Awareness

Most citizen data scientist efforts fail because line-of-business managers or the executive suite cling to received wisdom, lack trust in the data, or refuse to yield their decision-making authority to an analytics process. Employees may have trust issues that result in rejection of AI, or blind faith that leads them to accept all AI-made decisions uncritically. The result is that "grass-roots" analytics activity—and many top-down initiatives as well—have produced more dabbling, curiosity, and résumé-building than business transformation. Many working professionals also assume that AI will replace them, which is generally not the case. For all these reasons, companies should educate their employees about emerging technologies and promote data literacy.

References

  1. Why ML Models Rarely Reach Production and What You Can Do About it
  2. Why Machine Learning Models Crash And Burn In Production
  3. Full Stack Deep Learning
  4. Why your Machine Learning model may not work in production?
  5. These Anti-Patterns are Slowing AI Adoption in Enterprises in 2020
  6. Why AI is Challenging in Healthcare
  7. Top 10 Reasons Why AI Projects Fail
  8. Why AI investments fail to deliver
  9. The One Practice That Is Separating The AI Successes From The Failures
  10. Gartner: 85% of AI implementations will fail by 2022
  11. Gartner Predicts Half of Finance AI Projects Will Be Delayed or Cancelled by 2024
  12. Why AI Projects Fail
  13. Top Reasons Why AI Projects Fail
  14. How Data-Literate Is Your Company?
  15. 4 Reasons for Artificial Intelligence (AI) Project Failure
  16. Gartner Says Nearly Half of CIOs Are Planning to Deploy Artificial Intelligence
  17. Five Common AI/ML Project Mistakes
  18. Overcoming the C-Suite’s Distrust of AI
  19. Want your company’s A.I. project to succeed? Don’t hand it to the data scientists, says this CEO
  20. The single most important reason why AI projects fail
  21. The Top 5 Reasons Why Most AI Projects Fail
  22. Why 85% of AI projects fail
  23. Beyond the hype: A guide to understanding and successfully implementing artificial intelligence within your business
  24. Twitter taught Microsoft’s AI chatbot to be a racist asshole in less than a day
  25. Google 'fixed' its racist algorithm by removing gorillas from its image-labeling tech
  26. Google Photos Tags Two African Americans As Gorillas Through Facial Recognition Software
  27. Why AI & Machine Learning Projects Fail?
  28. Why Machine Learning Projects Fail Part 2
  29. Why AI investments fail to deliver
  30. Why Nearly 90% of Machine Learning Projects Fail
  31. Why machine learning projects fail and how to make them succeed
  32. Top 10 Reasons Why 87% of Machine Learning Projects Fail
  33. 4 Reasons Why Your Machine Learning Project Could Fail
  34. Here is the list of top10 reasons why large-scale machine learning projects fail
  35. Common Reasons Why Machine Learning Projects Fail
  36. Why Machine Learning Projects Fail
  37. Top 5 Reasons Why Machine Learning Projects Fail
  38. Why Machine Learning Projects Fail and How to Make Sure They Don't
  39. Our Top Data and Analytics Predicts for 2019
  40. White Paper: What Data Scientists Tell Us About AI Model Training Today
  41. IDC Survey Finds Artificial Intelligence to be a Priority for Organizations But Few Have Implemented an Enterprise-Wide Strategy