Foreword
If I had to pick a single source of frustration in today's Machine Learning world, I would without a doubt point to the breakneck speed at which everything in this domain evolves: algorithms, infrastructure, frameworks, best practices. By the time you experiment with a technique, a new method smashes the state of the art. By the time you refactor that computer vision pipeline, your library of choice gets a major upgrade. Not to mention all the headaches that come with shipping actual models to actual customers in production. Add debugging, scaling and monitoring to the list.
The way to cope with the ML madness can be summarized in two points: being up-to-date and having the right set of tools. This book covers both.
Once again, it is a matter of handling speed efficiently. You want to be on top of the latest advances in the field, so that you can jump quickly from one technique to another. To do that effectively, though, you need to be using the right set of tools. Streamline your experimentation pipelines, shorten the time from development to production, and scale projects while removing all the hassle of infrastructure maintenance, setup and updates. If you think this is what any Machine-Learning-oriented organization deserves, then AWS and SageMaker are what you are looking for, and this is the book you need to get up to speed.
Having worked with SageMaker myself, I can offer my own testimony. The completeness of the product is truly impressive: the team has your back at every step of the ML pipeline. From data ingestion through seamless integration with all the AWS storage solutions, to rapid prototyping within familiar Jupyter notebooks. From automated data labelling to automated model debugging. From hyperparameter tuning and experiment handling to the breadth of services supporting the deployment stage: off-the-shelf Docker images, A/B testing, canary deployment capabilities, tracking of feature distribution shifts, just to name a few. SageMaker is a fully fledged environment that lets practitioners hit the ground running.
You might still wonder why you need Machine Learning at all. You have your rule-based systems in place. You know them inside out, and they are driving that nice uplift your company is currently enjoying. This is true. The thing is, it will not last forever.
Rules are good starting points: they are simple to debug and provide tangible benefits almost immediately. They are not easy to adapt to a rapidly evolving market, though. Eventually, the initial uplift will start shrinking and the competitive advantage will fade. That is when you realize you need to play smarter. The patterns a human spots in the data, the same ones that drove a rule-based approach in the first place, win in the short term. However, in the long run, you must increase the level of abstraction and try to remove the human touch from the workflow as much as possible. Welcome Machine Learning to the stage. The scale, efficiency, and financial gains reached by employing modern statistical learning strategies are almost limitless. The "almost" part of the story is generally driven by the technical debt slowing down Data Science projects. This, once again, is why you absolutely need the right tools, SageMaker being one of them.
I still hear people pointing out that they are neither engineers nor scientists. They are analysts, or more or less technical product managers. Business people, in short. Do these individuals need to hear about all the ML fuss? Yes, they do. Statistical learning is not only about building Netflix's recommendation engine from the ground up, or shipping Tesla's autonomous driving system. Those are impressive but still niche examples of the power of these techniques. As far as I am concerned, being able to build a model is a lot more impactful in a business context. First, because of the much higher number of professionals potentially involved. Second, because you do not necessarily build a model to blindly predict A given B. You might want to train an algorithm to model the interaction between A and B, their interdependencies, the process bridging them. In a nutshell, you train an algorithm to gain insights. This is what data-driven decision-making is all about, and what every moderately technical business person should treat as a must. For this to happen, we need to democratize access to Machine Learning, of course. Demystify the black box. Online learning resources are now widely available to cover the scientific part. What about the infrastructure side of the story? This is where AWS and SageMaker come to the rescue, bridging the gap between the product manager analyzing quarterly financial results and the research scientist shipping self-flying drones. A single environment to win them all.
If I have managed to pique your curiosity, go ahead. Turn the page and let Julien Simon guide you through the wonderful world of Machine Learning with Amazon SageMaker.
Francesco Pochetti,