Problem framing involves breaking down a big problem into smaller parts, making it easier to solve. In the context of machine learning (ML), defining the problem is crucial for effective algorithm design, data collection, model selection, and deployment. This post explores the concept of ML problem framing, its principles, and strategies.
Machine learning problem framing is the process of converting a high-level business problem into a specific ML task. This step is essential because the effectiveness of ML techniques depends heavily on how the problem is framed.
To frame an ML problem, a broad question like “How can we increase website user engagement?” must be transformed into a specific, measurable task such as “Predict the likelihood of a user returning to the website within the next seven days based on their browsing history and behavior patterns.”
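A measurable task needs a label that can actually be computed from logged data. Below is a minimal sketch, assuming a hypothetical visit log of (user ID, timestamp) pairs. It labels each user’s first visit with 1 if the user came back within seven days, and 0 otherwise. The log format and function name are illustrative, not part of any particular system.

```python
from datetime import datetime, timedelta

# Hypothetical visit log: (user_id, visit timestamp). Field layout is illustrative.
visits = [
    ("u1", datetime(2024, 1, 1, 9, 0)),
    ("u1", datetime(2024, 1, 5, 18, 30)),  # returns after about 4 days
    ("u2", datetime(2024, 1, 2, 12, 0)),
    ("u2", datetime(2024, 1, 15, 8, 0)),   # returns after about 13 days
    ("u3", datetime(2024, 1, 3, 7, 45)),   # never returns
]

def returned_within(visits, user_id, reference_time, window=timedelta(days=7)):
    """Return 1 if the user has a visit in (reference_time, reference_time + window], else 0."""
    for uid, ts in visits:
        if uid == user_id and reference_time < ts <= reference_time + window:
            return 1
    return 0

# Find each user's first visit, then label it.
first_visit = {}
for uid, ts in visits:
    if uid not in first_visit or ts < first_visit[uid]:
        first_visit[uid] = ts

labels = {uid: returned_within(visits, uid, ts) for uid, ts in first_visit.items()}
print(labels)  # {'u1': 1, 'u2': 0, 'u3': 0}
```

The seven-day window is a parameter, so changing the task definition does not require touching the labeling logic.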
Clearly articulate the desired outcome before considering ML. For example:
ML is not the best solution for every problem. Consider the following:
Data is crucial for ML. Ensure the data is:
Features in the data should be strongly related to the desired outcome. Assess the predictive power of features using techniques such as Pearson correlation, adjusted mutual information, and Shapley values.
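Here is a minimal sketch of two of these checks in plain Python, on made-up data: Pearson correlation for a numeric feature, and mutual information for a categorical one. Note that this computes plain (unadjusted) mutual information. Adjusted mutual information additionally corrects for the agreement expected by chance; scikit-learn provides it as `sklearn.metrics.adjusted_mutual_info_score`. Shapley values are not shown.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mutual_information(xs, ys):
    """Plain (unadjusted) mutual information, in nats, between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Hypothetical data: pages viewed in a session vs. whether the user returned (1) or not (0).
pages_viewed = [1, 2, 2, 3, 5, 6, 7, 8]
returned = [0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical categorical feature: device type.
device = ["mobile", "desktop", "mobile", "desktop", "mobile", "desktop", "mobile", "desktop"]

print(round(pearson(pages_viewed, returned), 3))
print(round(mutual_information(device, returned), 3))  # 0.0: device carries no signal here
```

On this toy data, pages viewed correlates strongly with returning, while device type is independent of it. Real feature screening would use far more data.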
ML predictions should be actionable. Consider how the predictions can be used to improve user experience or guide decision-making.
Based on the given scenario, ML seems like a suitable approach. The IT team has a clear goal: telling students how long their issues are expected to take to resolve, given the current volume of helpdesk tickets. The current method provides only a rough estimate. An ML model could make more precise predictions by considering factors such as the number of IT staff, the issue type, historical resolution times, when the ticket was submitted, and how long it has been waiting. This would let the team give more accurate time estimates for different types of tickets.
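As a rough illustration of the regression framing for this scenario, here is a minimal sketch. It fits a straight line from one feature, the number of tickets already in the queue, to resolution time in hours, using ordinary least squares. The data is made up. A real model would also use the other factors listed above, such as issue type and staff on duty, and a proper ML library.

```python
# Hypothetical history: (tickets already in the queue at submission, hours until resolved).
history = [(2, 3.0), (4, 5.0), (6, 7.5), (8, 9.0), (10, 11.5)]

def fit_line(points):
    """Ordinary least squares fit of y = a + b * x for a single feature."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    b = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x, _ in points)
    a = my - b * mx
    return a, b

a, b = fit_line(history)

def estimate_hours(queue_length):
    """Predicted resolution time for a new ticket, given the current queue length."""
    return a + b * queue_length

print(f"Estimated resolution time with 7 tickets ahead: {estimate_hours(7):.2f} hours")
```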
Once you’ve made sure machine learning is the best way to solve your problem and that you can get the data you need, you can set up your problem in machine learning terms. Here’s how you do it:
Think about what the best outcome would be, independent of the ML model. In other words, what exactly do you want your product or feature to do? This is the same as the goal you stated earlier.
Next, connect the model’s goal to the best outcome by stating clearly what you want the model to do. Here are some examples from hypothetical apps:
The type of model you use depends on the specifics of your problem.
A classification model predicts which category (class) your data belongs to, for example, group A, B, or C.
Figure 1. A classification model making predictions.
The app then does something based on this decision. For example, if it’s A, it does X; if it’s B, it does Y; if it’s C, it does Z.
Figure 2. A classification model’s output being used in the product code to make a decision.
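In product code, the classifier’s label usually just selects a branch. Below is a minimal sketch, assuming the model has already returned one of the labels "A", "B", or "C". The action functions are hypothetical placeholders for X, Y, and Z.

```python
from typing import Callable, Dict

# Hypothetical product actions; in a real app these would do real work.
def do_x() -> str:
    return "X"

def do_y() -> str:
    return "Y"

def do_z() -> str:
    return "Z"

# Map each class the classifier can output to the action the product takes.
CLASS_ACTIONS: Dict[str, Callable[[], str]] = {"A": do_x, "B": do_y, "C": do_z}

def handle_class_prediction(predicted_class: str) -> str:
    """Run the product action for a classifier's predicted label."""
    if predicted_class not in CLASS_ACTIONS:
        raise ValueError(f"unexpected class label: {predicted_class!r}")
    return CLASS_ACTIONS[predicted_class]()

print(handle_class_prediction("B"))  # Y
```

Rejecting unknown labels explicitly makes a mismatch between the model’s classes and the product code fail loudly, rather than silently doing nothing.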
A regression model determines where your data fits on a number scale.
Figure 3. A regression model making a numeric prediction.
Depending on this prediction, your app makes a decision. For instance, if the prediction is in range A, do X; if it’s in range B, do Y; if it’s in range C, do Z.
Figure 4. A regression model’s output being used in the product code to make a decision.
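With a regression model, the product code has to bucket the number itself. Below is a minimal sketch using Python’s standard `bisect` module. It assumes hypothetical boundaries of 10 and 100 between ranges A, B, and C.

```python
import bisect

# Hypothetical range boundaries: below 10 is range A, 10 up to (but not
# including) 100 is range B, and 100 or more is range C.
BOUNDARIES = [10.0, 100.0]
RANGE_ACTIONS = ["X", "Y", "Z"]  # actions for ranges A, B, C

def handle_numeric_prediction(prediction: float) -> str:
    """Return the action for the range that a regression prediction falls in."""
    return RANGE_ACTIONS[bisect.bisect_right(BOUNDARIES, prediction)]

for value in (3.2, 42.0, 250.0):
    print(value, handle_numeric_prediction(value))
```

Because the boundaries live in product code rather than in the model, they can be adjusted without retraining.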
Consider this example:
You want to store videos based on their expected popularity. If your model predicts that a video will be popular, you want to serve it to users quickly, so you use more efficient but costlier storage. For less popular videos, you use cheaper storage. Your criteria for this are:
You choose a regression model because you’re predicting a number: the view count. But during training, the model’s loss treats predictions of 28 and 32 views as equally good when the true count is 30, since both miss by 2. This holds even though the two predictions may lead to different actions in your app. The reason is that a regression loss measures only numeric distance and knows nothing about the thresholds your product uses.
Figure 5. Training a regression model.
If small differences in the model’s predictions cause very different actions in your app, use a classification model. Because it is trained on the categories themselves, it treats 28 and 32 as different outcomes when they fall on opposite sides of a threshold.
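A little arithmetic makes the mismatch visible. Below is a minimal sketch, assuming a hypothetical threshold of 30 views that separates fast storage from cheap storage; your real criteria are whatever your product defines. Both predictions have the same squared error, but only one lands in the right tier. The 0-1 loss here stands in for a classification loss; real classifiers are usually trained with a differentiable loss such as cross-entropy.

```python
# Hypothetical numbers: the true view count is 30, and the product sends videos
# with 30 or more predicted views to fast storage (the threshold is an assumption).
TRUE_VIEWS = 30
THRESHOLD = 30

def squared_error(prediction: float, target: float) -> float:
    """Regression loss: depends only on the numeric distance from the target."""
    return (prediction - target) ** 2

def storage_tier(views: float) -> str:
    """Product rule that turns a view count into an action."""
    return "fast" if views >= THRESHOLD else "cheap"

def zero_one_loss(predicted_tier: str, true_tier: str) -> int:
    """Classification-style loss: 1 for the wrong tier, 0 for the right one."""
    return int(predicted_tier != true_tier)

true_tier = storage_tier(TRUE_VIEWS)
for pred in (28, 32):
    print(pred, squared_error(pred, TRUE_VIEWS), storage_tier(pred),
          zero_one_loss(storage_tier(pred), true_tier))
# 28 4 cheap 1
# 32 4 fast 0
```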
Remember these lessons:
In most cases, video storage thresholds are flexible and change over time. This favors a regression model, because you can move the thresholds in product code without retraining. But for problems with fixed thresholds, a classification model is the better choice.
The model’s output should help achieve your desired goal. If you’re using a regression model, the numerical prediction should be useful for this goal. If you’re using a classification model, the predicted category should be useful.
Figure 6. Diagram of a classification flowchart.
Figure 7. Diagram of a regression flowchart.
In some cases, there’s no direct link between your goal and a label in the data. For instance, a video app may lack a ‘useful_to_user’ label.
To address this, you can use a ‘proxy label’ as a substitute. Proxy labels represent the desired label indirectly. However, they have limitations:
Remember, proxy labels aren’t perfect substitutes. Choose the option with the fewest issues for your specific situation.
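Below is a minimal sketch of how a proxy label might be derived, assuming hypothetical watch events. The sketch treats a video as “useful” if the user watched at least 80% of it or shared it. Both the signals and the 80% cutoff are assumptions chosen for illustration, and they show the limitation directly. A user may watch a whole video and still find it unhelpful.

```python
from dataclasses import dataclass

@dataclass
class WatchEvent:
    """Hypothetical logged interaction with a video."""
    video_id: str
    seconds_watched: float
    video_length: float
    shared: bool

def proxy_useful(event: WatchEvent, min_fraction: float = 0.8) -> int:
    """Hypothetical proxy for the missing 'useful_to_user' label:
    1 if the user watched at least min_fraction of the video or shared it."""
    fraction = event.seconds_watched / event.video_length
    return int(fraction >= min_fraction or event.shared)

events = [
    WatchEvent("v1", 90.0, 100.0, False),
    WatchEvent("v2", 10.0, 100.0, False),
    WatchEvent("v3", 10.0, 100.0, True),
]
print([proxy_useful(e) for e in events])  # [1, 0, 1]
```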
Choose success metrics to evaluate your machine learning system. These metrics should track what matters to the product, such as user engagement or desired user actions. They are distinct from model evaluation metrics like accuracy or recall.
Set ambitious goals, but acknowledge that there is a gray area between success and failure. For instance, if your target is a large increase in time spent on the site, a 10% increase may be inconclusive rather than a clear success or failure.
What matters is whether the model brings you closer to success. Good evaluation metrics alone aren’t enough if the model doesn’t move your success metrics. Conversely, a model with poor evaluation metrics that still moves you toward success may be worth improving.
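Deciding success and failure thresholds up front makes the gray area explicit. Below is a minimal sketch, with hypothetical thresholds. A gain of 20% or more in time on site counts as success, and no gain or a loss counts as failure. Anything in between is inconclusive.

```python
def judge(observed_lift: float, success_at: float, failure_at: float) -> str:
    """Classify an observed change in a success metric, leaving a gray zone in between."""
    if observed_lift >= success_at:
        return "success"
    if observed_lift <= failure_at:
        return "failure"
    return "inconclusive"

# Hypothetical thresholds: +20% time on site is success; 0% or worse is failure.
print(judge(0.10, success_at=0.20, failure_at=0.0))  # inconclusive
```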
Categories for improving the model:
Weigh the resources needed against potential benefits when deciding to improve the model.
Choose the frequency for checking success metrics: days, weeks, or months.
If the system fails, investigate why. For example, a video recommender may be surfacing clickbait titles, or a weather model may score well on rain accuracy only because its predictions are too broad to be useful.
Machine learning projects require correctly framed problems to unlock their full potential. Problem framing sets the stage for the entire machine learning pipeline, including data collection, preprocessing, model training, evaluation, and deployment. Investing time and effort in problem framing can make a significant difference in the effectiveness of the machine learning solution.
The process of problem framing involves two steps:
Following these steps saves time and resources by setting clear goals and giving everyone a common plan for working with machine learning.
Furthermore, ML Studio offers practical tools for problem framing, leveraging its wealth of data-driven insights beyond traditional brainstorming methods. ML Studio democratizes AI by making it accessible to organizations, allowing them to tackle real-world challenges, define AI solutions, and deploy comprehensive, budget-friendly AI platforms. This seamless transition from problem framing to AI model deployment facilitates informed decision-making within organizations.
For a personal demo and to explore how ML Studio can contribute to the success of your business, feel free to get in touch with us.