Yes, I couldn’t find a worthy one to be the 10th XD.
Once I had a rough understanding of what ML is, I realized that all those little projects with small datasets are cute and useful, and the topic seems easy, but there was one “but”. What works perfectly for local development is not what Google, Facebook, or Amazon do: they don’t use a 987 KB CSV file to give you a recommendation (or read your mind). They use tons of data from different sources, with different structures and different levels of cleansing required.
But let’s forget about the huge tech companies for a moment. Any self-respecting company most likely has some kind of cloud solution for its operations, and probably a few TBs of data lying there untouched. Doing machine learning and data science on that is quite a different experience.
So here I have collected 9 practical projects on AWS. A few remarks before we start:
- All the projects were developed by the AWS team and published on their resources. You can always google for more.
- All the projects require an AWS account. If you don’t have one, each tutorial has a button leading to a page that explains the registration process.
- Some projects may cost money! The cost should not exceed $10 if you follow all the clean-up instructions and stop every service created and used during the project.
- Add 50-70% to every estimated duration (a lot of the services may be new to you, so getting used to them takes time). A 10-minute tutorial, for example, will more likely take 15-17 minutes.
- Read all the instructions carefully and try to understand them (don’t repeat my mistake of thinking you’re the smartest person in the room and can skip that “dumb” step).
So let’s get started!
1. Create a machine learning model automatically with Amazon SageMaker Autopilot
In this tutorial you will use a public dataset to create a training experiment, explore the different stages of this experiment, identify and deploy the best-performing model, and then make some predictions with it.
SageMaker Autopilot automatically inspects raw data, applies feature processors, picks the best set of algorithms, trains and tunes multiple models, tracks their performance, and then ranks the models based on performance, all with just a few clicks. Here you get the chance to play with it yourself!
Estimated time of completion: 10-15 minutes.
2. Build, train, deploy and monitor a machine learning model with Amazon SageMaker Studio
In this tutorial you will download a public dataset, track and manage training and processing jobs, create a processing job to generate features, and then train, test, and deploy a model. Afterwards you will visualize the results and monitor the deployed model for any differences from training.
SageMaker Studio is a web-based, integrated development environment (IDE) for machine learning that lets you build, train, debug, deploy, and monitor your machine learning models. Studio provides all the tools you need to take your models from experimentation to production while boosting your productivity.
Estimated time of completion: 1-1.5 hours.
3. Train a deep learning model with AWS Deep Learning Containers on Amazon EC2
In this tutorial you will train an MNIST CNN model in less than 15 minutes, with minimal set-up and tuning.
AWS Deep Learning Containers are Docker images pre-installed with deep learning frameworks to make it easy to deploy custom machine learning environments quickly by letting you skip the complicated process of building and optimizing your environments from scratch.
Estimated time of completion: 10-15 minutes.
4. Add voice to your WordPress Site
This is a pretty cool one. If you have a WP blog or are thinking about creating one, this tutorial will bring you a lot of value. Having all your articles in audio format sounds awesome, doesn’t it? And if your blog is not hosted on AWS, there is a separate section for that case too.
Amazon Polly is a service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice, enabling you to create applications that talk, and build entirely new categories of speech-enabled products.
Estimated time of completion: 15-20 minutes.
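If you are curious what talking to Polly looks like outside of WordPress, here is a minimal Python sketch. It assumes the boto3 SDK is installed and AWS credentials are configured. The helper names, the "Joanna" voice and the 3000-character chunk size are my own illustrative assumptions (check the Polly documentation for the current per-request limit); none of this comes from the tutorial itself.

```python
def split_text(text, max_chars=3000):
    """Split text into chunks of at most max_chars, breaking on whitespace.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def synthesize_article(text, voice_id="Joanna", max_chars=3000):
    """Return MP3 bytes for the whole article (needs AWS credentials)."""
    import boto3  # imported lazily so split_text can be tried without the SDK

    polly = boto3.client("polly")
    audio = b""
    for chunk in split_text(text, max_chars):
        response = polly.synthesize_speech(
            Text=chunk, OutputFormat="mp3", VoiceId=voice_id
        )
        # Naive concatenation of the MP3 segments returned for each chunk.
        audio += response["AudioStream"].read()
    return audio
```

Calling `synthesize_article(post_text)` and writing the result to a `.mp3` file would give you one audio file per post, and each call is billed by Polly.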
5. Extract text and structured data with Amazon Textract
This one is not about building ML models but about using them. You will extract data from some scanned documents and see how easy it is. I still remember how, during my university years, we would look for some info in PDFs and couldn’t just copy-paste it. This service would have been so handy back then XD.
Amazon Textract is a fully managed machine learning service that automatically extracts text and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables.
Estimated time of completion: 10-15 minutes.
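The tutorial works through the console, but the same thing can be done from code. Below is a small Python sketch, assuming boto3 and configured AWS credentials. The function names, the bucket/key arguments and the sample response are hypothetical; the sample is hand-written in the shape of a Textract response, not real output.

```python
def extract_lines(response, min_confidence=0.0):
    """Return the text of LINE blocks from a DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
        and block.get("Confidence", 0.0) >= min_confidence
    ]


def detect_text_in_s3(bucket, key):
    """Run Textract on a document stored in S3 (needs AWS credentials)."""
    import boto3  # imported lazily so extract_lines works without the SDK

    textract = boto3.client("textract")
    return textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )


# Hand-written sample in the shape of a Textract response (not real output).
sample = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Invoice 2021", "Confidence": 99.2},
        {"BlockType": "WORD", "Text": "Invoice", "Confidence": 99.5},
        {"BlockType": "LINE", "Text": "Total: 42", "Confidence": 61.0},
    ]
}
print(extract_lines(sample, min_confidence=90))
```

Filtering by confidence is a design choice of this sketch: low-confidence lines are usually the ones worth checking by hand.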
6. How to detect, analyze, and compare faces with Amazon Rekognition
Amazon Rekognition is a deep learning-based image and video analysis service. Detecting text is fun, but when it comes to images and faces things get a little more complicated. In the modern world, where face recognition is used more and more for verification and authentication, knowing how to detect and analyze faces in near real time comes in handy.
Estimated time of completion: 15-20 minutes.
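To give a feeling for the "compare faces" part in code, here is a minimal Python sketch, assuming boto3 and configured AWS credentials. The helper names, the S3 arguments and the 80% threshold are illustrative assumptions of mine, and the sample dictionary is hand-written in the shape of a CompareFaces response.

```python
def best_match(response):
    """Return the highest similarity among FaceMatches, or None if no match."""
    matches = response.get("FaceMatches", [])
    if not matches:
        return None
    return max(match["Similarity"] for match in matches)


def compare_faces_in_s3(bucket, source_key, target_key, threshold=80.0):
    """Compare the face in one S3 image with faces in another one."""
    import boto3  # imported lazily so best_match works without the SDK

    rekognition = boto3.client("rekognition")
    return rekognition.compare_faces(
        SourceImage={"S3Object": {"Bucket": bucket, "Name": source_key}},
        TargetImage={"S3Object": {"Bucket": bucket, "Name": target_key}},
        SimilarityThreshold=threshold,
    )


# Hand-written sample in the shape of a CompareFaces response (not real output).
sample = {
    "FaceMatches": [{"Similarity": 97.5}, {"Similarity": 88.0}],
    "UnmatchedFaces": [],
}
print(best_match(sample))
```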
7. How to analyze insights in text with Amazon Comprehend
Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. In this tutorial you will get a taste of it by running the built-in analysis on customer reviews: exploring insights, performing sentiment analysis, finding key phrases, and analyzing language and syntax. Later on, you can use these insights and the results of the analysis to improve decision-making.
Estimated time of completion: 10-15 minutes.
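The sentiment and key-phrase steps can also be run from code. Here is a rough Python sketch, assuming boto3 and configured AWS credentials. The function names and the sample response are hypothetical, and the scores in the sample are made up for illustration.

```python
def summarize_sentiment(response):
    """Return (label, score of that label) from a DetectSentiment response."""
    label = response["Sentiment"]  # e.g. "POSITIVE"
    scores = response["SentimentScore"]  # keys look like "Positive"
    return label, scores[label.capitalize()]


def analyze_review(text, language_code="en"):
    """Return sentiment summary and key phrases for one review."""
    import boto3  # imported lazily so summarize_sentiment works without the SDK

    comprehend = boto3.client("comprehend")
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode=language_code)
    phrases = comprehend.detect_key_phrases(Text=text, LanguageCode=language_code)
    key_phrases = [phrase["Text"] for phrase in phrases["KeyPhrases"]]
    return summarize_sentiment(sentiment), key_phrases


# Hand-written sample in the shape of a DetectSentiment response (made-up scores).
sample = {
    "Sentiment": "POSITIVE",
    "SentimentScore": {
        "Positive": 0.95,
        "Negative": 0.01,
        "Neutral": 0.03,
        "Mixed": 0.01,
    },
}
print(summarize_sentiment(sample))
```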
8. How to create an audio transcript with Amazon Transcribe
Add subtitles to a video? Easy. Just upload the audio to S3 and use Amazon Transcribe. I know, everything is voice now and the majority prefers listening to reading, but there are still a lot of use cases for text too. For example, if we have a lot of audio data, we can transform it into text and then run an NLP model on that text to better understand it. In this tutorial you will learn how to extract data from audio.
Estimated time of completion: 10-15 minutes.
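The "audio to text, then NLP" idea can be sketched in Python like this, assuming boto3 and configured AWS credentials. The function names, the default format and language, and the sample JSON are my own illustrative assumptions; the sample only mimics the structure of a Transcribe result file.

```python
import json


def transcript_text(transcript_json):
    """Extract the plain transcript from a Transcribe result JSON string."""
    data = json.loads(transcript_json)
    return " ".join(t["transcript"] for t in data["results"]["transcripts"])


def start_job(job_name, media_uri, media_format="mp3", language_code="en-US"):
    """Start a transcription job for an audio file in S3."""
    import boto3  # imported lazily so transcript_text works without the SDK

    transcribe = boto3.client("transcribe")
    return transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": media_uri},
        MediaFormat=media_format,
        LanguageCode=language_code,
    )


# Hand-written sample in the shape of a Transcribe result file (not real output).
sample_json = json.dumps(
    {
        "jobName": "demo",
        "results": {"transcripts": [{"transcript": "Hello world."}], "items": []},
    }
)
print(transcript_text(sample_json))
```

The resulting string is what you would then feed into an NLP service or model.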
9. Build a semantic content recommendation system with Amazon SageMaker
This is the big one. In this tutorial you will build an entire recommendation system for information retrieval using Amazon SageMaker and its built-in Neural Topic Model (NTM) and K-Nearest Neighbors (k-NN) algorithms. The project combines these two algorithms to develop a better solution. Please read the requirements and the recommended background carefully, so you don’t start kicking the keyboard when something doesn’t work and you have no clue why.
Estimated time of completion: 2-3 hours.
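To make the idea behind the combination a bit more concrete: a topic model turns every document into a vector of topic weights, and a nearest-neighbor search then finds documents with similar vectors. The toy Python sketch below shows only that second step with plain Euclidean distance. It is not SageMaker’s implementation; the document ids and topic weights are made up for illustration.

```python
import math


def nearest_documents(query_vector, topic_vectors, k=2):
    """Return the ids of the k documents closest to query_vector."""
    ranked = sorted(
        topic_vectors,
        key=lambda doc_id: math.dist(query_vector, topic_vectors[doc_id]),
    )
    return ranked[:k]


# Hypothetical topic mixtures (say: sports, politics, tech) for four documents.
topic_vectors = {
    "doc_a": [0.9, 0.05, 0.05],
    "doc_b": [0.8, 0.1, 0.1],
    "doc_c": [0.1, 0.8, 0.1],
    "doc_d": [0.05, 0.05, 0.9],
}

# A reader interested mostly in the first topic gets doc_a and doc_b.
print(nearest_documents([0.9, 0.1, 0.0], topic_vectors, k=2))
```

In the real project the topic vectors come from the trained NTM model and the search is done by the k-NN algorithm on SageMaker, so this sketch only shows the shape of the idea.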
I hope this list of tutorials helps you understand these new technologies and lets you see with your own eyes how the tech magic is done 🙂