Framing the Right Problem: An Introduction to Machine Learning Problem Framing

admin
18 July, 2023
Est. Reading: 7 minutes

Problem framing involves breaking down a big problem into smaller parts, making it easier to solve. In the context of machine learning (ML), defining the problem is crucial for effective algorithm design, data collection, model selection, and deployment. This blog explores the concept of ML problem framing, its principles, and strategies.

1. Understanding Machine Learning Problem Framing

  • Clearly state the goal or desired outcome.
  • Determine if ML is the best approach to solve the problem.
  • Assess the availability and suitability of data for training an ML model.

Machine learning problem framing is the process of converting a high-level business problem into a specific ML task. This step is essential as the effectiveness of ML techniques relies heavily on how the problem is presented.

To frame an ML problem, transform a broad question such as “How can we increase website user engagement?” into a specific, measurable task such as “Predict the likelihood of a user returning to the website within the next seven days based on their browsing history and behavior patterns.”
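
As a minimal sketch of what this framed task implies for the training label, the snippet below derives a binary “returned within seven days” label from a list of visit dates. The helper name `returned_within`, the `cutoff` date, and the sample dates are illustrative assumptions, not part of any specific product.

```python
from datetime import date, timedelta


def returned_within(visits, cutoff, window_days=7):
    """Return True if any visit falls in (cutoff, cutoff + window_days].

    Hypothetical helper: `visits` is a list of dates on which the user
    visited the site, and `cutoff` is the date the prediction is made.
    """
    window_end = cutoff + timedelta(days=window_days)
    return any(cutoff < visit <= window_end for visit in visits)


# Illustrative data, not real user records.
cutoff = date(2023, 7, 1)
returning_user = [date(2023, 6, 28), date(2023, 7, 4)]
lapsed_user = [date(2023, 6, 20), date(2023, 7, 15)]

label_returning = returned_within(returning_user, cutoff)  # 4 July is inside the window
label_lapsed = returned_within(lapsed_user, cutoff)  # 15 July is outside the window
```

Writing the label down this precisely forces decisions the broad question hides, such as when the window starts and whether a visit on the cutoff day itself counts.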

a. State the Goal

Clearly articulate the desired outcome before considering ML. For example:

  • Weather app: Predict the amount of rain every six hours for a specific area.
  • Video app: Provide personalized video recommendations.
  • Mail app: Identify and filter out junk emails.
  • Map app: Estimate travel duration for a given route.
  • Banking app: Detect fraudulent transactions.
  • Dining app: Determine the type of food served by a restaurant based on its menu.

b. Assess ML Suitability

ML is not the best solution for every problem. Consider the following:

  • Evaluate if an existing non-ML solution can adequately address the problem.
  • Compare the potential benefits of an ML solution against the cost, upkeep, and resources required.
  • Determine if the available data is sufficient for training an ML model.

c. Data Importance

Data is crucial for ML. Ensure the data is:

  • Sufficient: Having a large and diverse dataset improves model performance.
  • Consistent and reliable: Data collected consistently and reliably over time is preferable.
  • Trusted: Understand the sources and reliability of the data.
  • Accessible: Ensure data availability and compatibility for model training.
  • Correct: Validate the data quality and correctness of labels.
  • Reflective: Data should mirror real-world events or phenomena for accurate predictions.

d. Predictive Power

Features in the data should have predictive power, that is, a strong relationship with the desired outcome. Assess the predictive power of features using techniques such as Pearson correlation, adjusted mutual information (AMI), and Shapley values.
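
As a rough illustration of the first of these techniques, the sketch below computes the Pearson correlation between one feature and the outcome in plain Python. The feature (`open_tickets`), the outcome (`hours_to_resolve`), and all values are made up for the example. Pearson correlation only captures linear relationships, so a low value does not by itself prove a feature is useless.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only.
open_tickets = [5, 10, 15, 20, 25]
hours_to_resolve = [2.0, 3.5, 5.5, 7.0, 9.0]

r = pearson(open_tickets, hours_to_resolve)  # close to 1: strong positive linear relationship
```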

e. Predictions and Actions

ML predictions should be actionable. Consider how the predictions can be used to improve user experience or guide decision-making.

Consider an example scenario: an IT team wants to tell students how long their issues are likely to take to resolve, based on the current volume of helpdesk tickets. Using ML seems like a suitable approach here. The team has a clear goal, and while the current method provides only an approximate estimate, ML can offer more precise predictions by considering factors such as the number of IT staff, issue type, time to resolve, ticket submission time, and time waiting for resolution. This would enable the team to provide more accurate time estimates for different types of tickets.

2. Setting Up an ML Problem

Once you’ve made sure machine learning is the best way to solve your problem and that you can get the data you need, you can set up your problem in machine learning terms. Here’s how you do it:

  • Decide on the best result and what the model should aim for.
  • Figure out what the model’s output will be.
  • Determine how you’ll measure success.

a. Decide on the Best Result and the Model’s Goal

Think about what the best outcome would be, regardless of the ML model. In other words, what exactly do you want your product or feature to do? This is the same as the goal you stated earlier.

Next, link the model’s goal with the best outcome by clearly saying what you want the model to do. Here are some examples for hypothetical apps:

  • Weather app: Best outcome – Work out how much rain will fall every six hours for a certain area. Model’s goal – Predict how much rain will fall in six-hour increments for specific areas.
  • Video app: Best outcome – Suggest useful videos. Model’s goal – Predict if a user will click on a video.
  • Mail app: Best outcome – Find junk emails. Model’s goal – Predict if an email is junk or not.
  • Map app: Best outcome – Work out how long a journey will take. Model’s goal – Predict how long it will take to travel from one point to another.
  • Banking app: Best outcome – Spot fake transactions. Model’s goal – Predict if a transaction was made by the cardholder.
  • Dining app: Best outcome – Figure out what kind of food a restaurant serves by looking at its menu. Model’s goal – Predict the type of food a restaurant serves.

Choosing the correct model type

The type of model you use depends on the specifics of your problem.

A classification model decides which group your data belongs to, for example, group A, B, or C. 

Figure 1. A classification model making predictions.

The app then does something based on this decision. For example, if it’s A, it does X; if it’s B, it does Y; if it’s C, it does Z.

Figure 2. A classification model’s output being used in the product code to make a decision.

A regression model determines where your data fits on a number scale.

Figure 3. A regression model making a numeric prediction.

Depending on this prediction, your app makes a decision. For instance, if the prediction is in range A, do X; if it’s in range B, do Y; if it’s in C, do Z.

Figure 4. A regression model’s output being used in the product code to make a decision.
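
The two patterns described above can be sketched in a few lines of product code. The function names and the numeric boundaries `low` and `high` are placeholders chosen for this sketch; the labels A, B, C and the actions X, Y, Z come from the text.

```python
def act_on_class(label):
    """Map a predicted class to an app action (A -> X, B -> Y, C -> Z)."""
    actions = {"A": "X", "B": "Y", "C": "Z"}
    return actions[label]


def act_on_number(prediction, low=10.0, high=20.0):
    """Map a numeric prediction to an app action using range boundaries.

    The boundaries `low` and `high` are placeholders for this sketch.
    """
    if prediction < low:
        return "X"  # range A
    if prediction < high:
        return "Y"  # range B
    return "Z"  # range C


class_action = act_on_class("B")
number_action = act_on_number(14.2)
```

In the classification case the model’s output is the decision itself. In the regression case the decision lives in the app code, which is why the boundaries can be changed without retraining.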

Consider this example:

You want to store videos based on their expected popularity. If your model thinks a video will be popular, you want to give it to users quickly. For this, you’ll use the more efficient but costlier storage. For less popular videos, you’ll use cheaper storage. Your criteria for this are:

  • If a video is expected to get 50 or more views, use the expensive storage.
  • If a video is expected to get between 30 and 49 views, use the cheap storage.
  • If a video is expected to get fewer than 30 views, don’t store the video.

You choose a regression model because you’re predicting a number: the view count. But when training the model, you find that it treats a prediction of 28 views and a prediction of 32 views as equally good for a video that actually gets 30, even though they lead to different actions in the app. This is because a regression model’s training objective knows nothing about the thresholds in your product.

Figure 5. Training a regression model.

If small differences in the model’s predictions greatly impact your app’s actions, use a classification model. It considers distinctions like 28 and 32 as significant.
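
A small sketch makes the 28-versus-32 point concrete. It assumes squared error as the regression loss (a common choice, not one the example specifies) and a video whose true view count is 30; the storage thresholds are the ones from the criteria above.

```python
def squared_error(predicted, actual):
    """Per-example squared error, assumed here as the regression loss."""
    return (predicted - actual) ** 2


def storage_action(views):
    """Apply the storage thresholds from the video example."""
    if views >= 50:
        return "expensive storage"
    if views >= 30:
        return "cheap storage"
    return "do not store"


actual_views = 30  # assumed true view count for this illustration
results = {
    predicted: (squared_error(predicted, actual_views), storage_action(predicted))
    for predicted in (28, 32)
}
# Both predictions have the same loss (4), but only 32 leads to the same
# action as the true value of 30 ("cheap storage").
```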

Remember these lessons:

  1. Predict your app’s decisions. When possible, have the model predict the decision your app will take. If the model doesn’t account for your app’s actions, its predictions can lead the app to do the wrong thing.
  2. Understand problem limitations. If app actions depend on fixed limits, use a classification model with labeled data. If limits can change, use a regression model with adjustable limits in the app’s code.

In most cases, video storage limits are flexible and change over time, favoring a regression model. But for problems with fixed limits, a classification model is best.
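
One way to keep the limits adjustable, sketched under the assumption that the regression model returns a predicted view count, is to hold the thresholds in a small configuration object in the app’s code. The names `StorageThresholds` and `choose_storage` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageThresholds:
    """Hypothetical app-side configuration; defaults follow the example."""
    expensive_min: int = 50
    cheap_min: int = 30


def choose_storage(predicted_views, thresholds=StorageThresholds()):
    """Turn a regression prediction into a storage decision."""
    if predicted_views >= thresholds.expensive_min:
        return "expensive"
    if predicted_views >= thresholds.cheap_min:
        return "cheap"
    return "none"


default_choice = choose_storage(35)

# Changing the limits is a configuration change; the model is not retrained.
stricter = StorageThresholds(expensive_min=100, cheap_min=40)
updated_choice = choose_storage(35, stricter)
```

With a classification model, the same change would mean relabeling the training data and retraining, because the limits are baked into the labels.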

b. Deciding on the model’s output

The model’s output should help achieve your desired goal. If you’re using a regression model, the numerical prediction should be useful for this goal. If you’re using a classification model, the predicted category should be useful for it.

  • Classification flowchart

Figure 6. Diagram of a classification flowchart.

  • Regression flowchart

Figure 7. Diagram of a regression flowchart.

In some cases, there’s no direct link between your goal and a label in the data. For instance, a video app may lack a ‘useful_to_user’ label.

To address this, you can use a ‘proxy label’ as a substitute. Proxy labels represent the desired label indirectly. However, they have limitations:

  • Predicting user likes: Many users don’t use this feature, limiting its effectiveness.
  • Predicting video popularity: Popular videos may not align with individual preferences.
  • Predicting video shares: Some users don’t share videos, and others share videos they don’t personally like.
  • Predicting video plays: This may promote clickbait content.
  • Predicting watch duration: Longer videos may be favored, regardless of usefulness.
  • Predicting video rewatches: Certain video types may receive disproportionate attention.

Remember, proxy labels aren’t perfect substitutes. Choose the option with the fewest issues for your specific situation.
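
To see how proxy labels can disagree, the sketch below derives three candidate labels from the same interaction record. The `Interaction` fields, the 50% watch-fraction cutoff, and both sample records are assumptions made for illustration; a real video app would log different signals.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One user-video interaction (all fields are hypothetical)."""
    clicked: bool
    liked: bool
    watch_seconds: float
    video_seconds: float


def proxy_played(interaction):
    """Proxy label: the user played the video."""
    return interaction.clicked


def proxy_liked(interaction):
    """Proxy label: the user liked the video."""
    return interaction.liked


def proxy_watched_fraction(interaction, min_fraction=0.5):
    """Proxy label: the user watched at least `min_fraction` of the video."""
    if interaction.video_seconds <= 0:
        return False
    return interaction.watch_seconds / interaction.video_seconds >= min_fraction


# Made-up records: a clickbait video abandoned after 5 seconds, and a video
# watched almost to the end by a user who never uses the like button.
clickbait = Interaction(clicked=True, liked=False, watch_seconds=5, video_seconds=300)
quiet_favorite = Interaction(clicked=True, liked=False, watch_seconds=280, video_seconds=300)

labels = {
    name: (proxy_played(i), proxy_liked(i), proxy_watched_fraction(i))
    for name, i in {"clickbait": clickbait, "quiet_favorite": quiet_favorite}.items()
}
```

Here the play proxy cannot tell the two videos apart, and the like proxy marks both as not useful, which mirrors the limitations listed above.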

c. Set your success metrics

Choose metrics to evaluate your machine learning system. These metrics should focus on important factors like user engagement or desired user actions, distinct from model evaluation metrics like accuracy or recall.

Examples:

  • Weather app: Success if 50% more users check the “Will it rain?” feature than before. Failure if there is no change.
  • Video app: Success if average time spent on the site increases by 20%. Failure if there is no change.

Set ambitious goals, but acknowledge a gray area between success and failure. For instance, a 10% increase in site time is inconclusive.
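
The success, failure, and gray-area outcomes can be written down as a simple rule before launch. The sketch below is one possible encoding: the function name `judge_launch` is hypothetical, and treating a decrease as a failure is an assumption, since the examples only define failure as “no change.”

```python
def judge_launch(baseline, observed, target_lift):
    """Classify a success metric as success, failure, or inconclusive.

    `target_lift` is the relative increase that counts as success,
    e.g. 0.20 for "20% more average time spent on the site".
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    lift = (observed - baseline) / baseline
    if lift >= target_lift:
        return "success"
    if lift <= 0:  # no change, or (by assumption) a decrease
        return "failure"
    return "inconclusive"


# Made-up numbers: average minutes on site before and after the change.
clear_win = judge_launch(baseline=40.0, observed=50.0, target_lift=0.20)
gray_area = judge_launch(baseline=40.0, observed=44.0, target_lift=0.20)
no_change = judge_launch(baseline=40.0, observed=40.0, target_lift=0.20)
```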

What matters is whether the model brings you closer to success. Good evaluation metrics alone aren’t enough if the goals aren’t achieved. Conversely, a model with poor evaluation metrics that still moves you toward success may be worth improving.

Categories for improving the model:

  1. Not good enough, but continue with improvements.
  2. Good enough, and continue with potential further enhancements.
  3. Good enough, can’t be improved significantly.
  4. Not good enough and won’t improve sufficiently.

Weigh the resources needed against potential benefits when deciding to improve the model.

Choose the frequency for checking success metrics: days, weeks, or months.

If the system fails, investigate why. For example, the video model may be recommending clickbait titles, or the weather model’s rain predictions may be too broad to be useful.

Conclusion

Machine learning projects require correctly framed problems to unlock their full potential. Problem framing sets the stage for the entire machine learning pipeline, including data collection, preprocessing, model training, evaluation, and deployment. Investing time and effort in problem framing can make a significant difference in the effectiveness of the machine learning solution.

The process of problem framing involves two steps:

  1. Assessing the suitability of machine learning:
     • Understand the problem at hand.
     • Identify a clear use case for machine learning.
     • Gain a comprehensive understanding of the available data.
  2. Defining the problem for machine learning:
     • Determine the desired outcome and the goal the model should aim to achieve.
     • Define the model’s output and what it should predict.
     • Establish appropriate metrics for measuring the success of the model.

By following these steps, time and resources can be saved by setting clear goals and providing a common plan for working with machine learning.

Furthermore, ML Studio offers practical tools for problem framing, leveraging its wealth of data-driven insights beyond traditional brainstorming methods. ML Studio democratizes AI by making it accessible to organizations, allowing them to tackle real-world challenges, define AI solutions, and deploy comprehensive, budget-friendly AI platforms. This seamless transition from problem framing to AI model deployment facilitates informed decision-making within organizations.

For a personal demo and to explore how ML Studio can contribute to the success of your business, feel free to get in touch with us.
