Hawke Eye #3: The Machine Learning Workflow
Machine learning is suited to provide objective solutions to subjective problems.
Artificial intelligence is everywhere. The navigation directions in Google Maps, for both routing and spoken directions, are generated on the fly using sophisticated artificial intelligence developed in-house.
Problem Framing
A lot of time is spent on fine-tuning the models and data collection, but if we don't frame the problem correctly, all that effort is in vain. We are optimizing for nothing.
Before we go all-in on our machine learning journey, we must first answer the following questions.
Do we even need machine learning? Is the problem subjective or objective?
Who are the stakeholders?
What kind of data is available?
What metric is useful for the current business situation?
1. Do we need Machine Learning?
If a problem is objective in nature, do not use machine learning. It may do more harm than good.
For example, if bending a sheet of steel takes 1 ton of force, you need a press capable of delivering at least 1 ton of force. “Guesstimating” the required force does not work in this case (FYI, you usually need a press about 10% larger).
If your problem depends on a number of variables, for example estimating customer churn rate and building a strategy against it, you will get a ton of help from implementing AI algorithms.
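Because the press problem is objective, a plain rule is enough and nothing has to be learned from data. Here is a minimal sketch in Python, assuming the roughly 10% margin mentioned above. The function name and the margin parameter are made up for illustration, and the margin is a rule of thumb, not an engineering standard.

```python
def required_press_capacity(bending_force_tons: float, margin: float = 0.10) -> float:
    """Return the minimum press capacity (in tons) for a given bending force.

    The 10% default margin is the rule of thumb from the text, treated here
    as an assumption rather than an engineering standard.
    """
    if bending_force_tons <= 0:
        raise ValueError("bending force must be positive")
    if margin < 0:
        raise ValueError("margin cannot be negative")
    return bending_force_tons * (1.0 + margin)


# A sheet that needs 1 ton of force calls for a press rated at about 1.1 tons.
capacity = required_press_capacity(1.0)
```

The same input always gives the same answer, which is exactly why a model that "estimates" the force would add risk without adding value.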
I will have a detailed tutorial on these problems very soon.
2. Who are the Stakeholders?
When you are building a machine learning model, you are not just an engineer. You are the project manager. The success of this model rests on your shoulders. You may not realize it in the beginning, but with so many moving parts, you are also the “product manager”, as ML algorithms often become products in their own right.
3. What kind of data is available?
Most “testing” models use the same or similar data. The size of the dataset may be different but the core is still the same.
The performance of all of these models will tend to converge and soon these models will become a commodity.
In order to differentiate yourself from the crowd and maintain your edge, you need to use your proprietary data.
4. The Best (ONE) Metric to Use
Taming troves of data is a mammoth challenge. You cannot optimize for everything in one go. You need a priority list of the metrics that you want to optimize for.
And no, optimizing for profit doesn’t work in one go. It takes several downstream optimizations to work effectively.
This list can only be created by someone who has a deep understanding of the business lifecycle and the system in general.
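To make the idea of a priority list concrete, here is a small sketch in Python. It assumes you have already measured a few metrics for each candidate model, and it picks the winner on the first metric in the list, using later metrics only to break ties. The model names, metric names, and numbers are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, List


def pick_model(candidates: Dict[str, Dict[str, float]], priority: List[str]) -> str:
    """Return the name of the candidate that wins on the highest-priority metric.

    Later metrics in `priority` are only consulted to break ties.
    Higher values are assumed to be better for every metric.
    """
    return max(
        candidates,
        key=lambda name: tuple(candidates[name][metric] for metric in priority),
    )


# Hypothetical scores for two candidate models (illustrative values only).
candidates = {
    "model_a": {"recall": 0.90, "precision": 0.60},
    "model_b": {"recall": 0.75, "precision": 0.85},
}

# If the business cares most about recall, model_a wins.
recall_first = pick_model(candidates, ["recall", "precision"])

# If the business cares most about precision, model_b wins.
precision_first = pick_model(candidates, ["precision", "recall"])
```

The code is trivial on purpose. The hard part is the order of the list, and that comes from understanding the business, not from the data.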
Conclusion
In conclusion, to remain relevant in business in the future, you must:
Have a deep understanding of your business.
Use proprietary data.
Be able to identify the core business metric you want to optimize for.
Remember: Confusion is the enemy.