As a starting point for a Machine Learning Engineer, we recommend Linux skills to at least LPIC Level 1 (the first level of the Linux Professional Institute Certification).
This will ensure you don’t waste time trying to figure out what you should be typing on the command line to manipulate your environment.
Engineers who are already experienced with the Linux or macOS command line may not need this.
You should have a solid grounding in Python Programming.
This will allow you to be more productive and to concentrate on the machine learning task rather than on troubleshooting syntax errors.
For a good comprehensive overview of the Python Language we recommend this Python 3 book.
You will likely be looking at Python Code every day so having a good understanding of it from the beginning is going to make you more effective and efficient.
Think of Python as your best friend – don’t you want to know them better?
One of the most commonly used environments for machine learning in Python is Jupyter Notebook together with Anaconda.
The links below explain how to set these up:
Jupyter Notebook
Anaconda
Jupyter & Anaconda setup for Machine Learning in Python (Avinton Academy)
Python has a rich ecosystem of machine learning libraries. Building on your Python programming knowledge, we recommend this Python Deep Learning book for a gentle transition from programming to machine learning.
Here are some introductory tutorials about Machine Learning Basics:
ChainerRL (Avinton Academy)
Random Forest (Avinton Academy)
Naive Bayes (Avinton Academy)
Classification vs Regression (Avinton Academy)
SVM (Avinton Academy)
Depending on the application and the environment, a machine learning project may use whichever environments, libraries and frameworks are best suited to the problem being solved.
The most common ones today are:
You should try to familiarise yourself with these.
Depending on the type of problem being solved, you may be required to do some data processing and data analysis.
In many large organisations, data is stored in large relational databases. We use SQL to query and manipulate the data in the database. SQL is a very powerful language in which fairly complex data analysis can be performed with only a few lines of simple code. You are encouraged to familiarise yourself with SQL using PostgreSQL, which describes itself as the world’s most advanced open source relational database.
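As a rough illustration of how a few lines of SQL can summarise a dataset, the sketch below groups and totals some made-up orders. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a database server. The table name, column names and figures are hypothetical, and the same GROUP BY query should also work on PostgreSQL.

```python
import sqlite3

# Hypothetical order records: (region, amount). Illustrative data only.
rows = [
    ("north", 120.0),
    ("north", 80.0),
    ("south", 200.0),
    ("south", 50.0),
    ("east", 75.0),
]


def revenue_by_region(records):
    """Return (region, order_count, total_revenue) tuples, highest revenue first."""
    # An in-memory SQLite database stands in for a real PostgreSQL server.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", records)
        query = """
            SELECT region, COUNT(*) AS order_count, SUM(amount) AS total_revenue
            FROM orders
            GROUP BY region
            ORDER BY total_revenue DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


summary = revenue_by_region(rows)
for region, order_count, total_revenue in summary:
    print(region, order_count, total_revenue)
```

The aggregation itself is the four-line SELECT statement; everything else is setup needed to make the example self-contained.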
During data pre-processing for machine learning we often need to extract and manipulate data from the original dataset. NumPy is a Python package that provides extensive and advanced mathematical functionality that can be used for all kinds of data modelling and analysis.
You are encouraged to familiarise yourself with NumPy’s capabilities by browsing through the NumPy API reference.
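To give a feel for this kind of pre-processing, here is a minimal sketch that selects rows with a boolean mask and standardises each feature column to zero mean and unit variance. The array values and the function name `standardise` are made up for illustration, and the sketch assumes no column is constant (which would cause a division by zero).

```python
import numpy as np

# Hypothetical dataset: 4 samples (rows) with 2 features (columns).
data = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
])


def standardise(x):
    """Scale each column to zero mean and unit standard deviation."""
    mean = x.mean(axis=0)  # per-column mean
    std = x.std(axis=0)    # per-column standard deviation (assumed non-zero)
    return (x - mean) / std  # broadcasting applies the column statistics to every row


# Boolean mask: keep only the rows whose second feature exceeds 15.
large = data[data[:, 1] > 15.0]

scaled = standardise(data)
print(large.shape)
print(scaled.mean(axis=0), scaled.std(axis=0))
```

Operating on whole arrays like this, rather than looping over rows in Python, is the usual way to work with NumPy.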
Pandas is a library for working with labelled, tabular data structures in Python. It builds on NumPy and extends its capabilities.
Pandas API Reference
Have a look at the documentation and see which functions may be useful for your data manipulation requirements.
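As one example of a typical data manipulation task, the sketch below fills missing values with a per-group mean and then summarises each group. The column names, the readings and the function name `fill_and_summarise` are hypothetical; filling with the group mean is just one of several reasonable strategies.

```python
import pandas as pd

# Hypothetical sensor readings with some missing values (None becomes NaN).
df = pd.DataFrame({
    "sensor": ["a", "a", "b", "b", "b"],
    "reading": [1.0, None, 3.0, 5.0, None],
})


def fill_and_summarise(frame):
    """Fill missing readings with the mean of the same sensor, then average per sensor."""
    filled = frame.copy()  # leave the caller's DataFrame untouched
    filled["reading"] = filled.groupby("sensor")["reading"].transform(
        lambda s: s.fillna(s.mean())
    )
    return filled.groupby("sensor")["reading"].mean()


means = fill_and_summarise(df)
print(means)
```

The functions used here (`groupby`, `transform`, `fillna`, `mean`) are all described in the Pandas API reference.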
Many Machine Learning Engineers and industry projects today use AI for object detection and related image analysis.
For such projects we usually use OpenCV together with a machine learning library to label the images and to do any pre-processing that is required.
You can find some OpenCV tutorials below:
Python OpenCV setup (Avinton Academy)
OpenCV simple Exercise (Avinton Academy)
OpenCV Advanced Exercise (Avinton Academy)
Stanford Machine Learning – Coursera – Andrew Ng
Lecture Series is available on YouTube
FastAI offers a good set of free courses, in particular their Computational Linear Algebra course.
Deep RL Bootcamp
Andrej Karpathy’s CNN course at Stanford (CS231n: Convolutional Neural Networks for Visual Recognition)
Sergey Levine’s Deep Reinforcement Learning Course – UC Berkeley (CS 294: Deep Reinforcement Learning)
Learning Machines 101 by Richard Golden
Machine learning tasks are typically computationally intensive. A Machine Learning Engineer therefore needs to understand some infrastructure basics in order to take full advantage of the underlying server hardware and keep model training efficient.
Avinton Academy instructor-led training covers the basics of this area. Please attend the next available session when you have time.
In many cases we use a GPU to train the machine learning model more efficiently. It is worth familiarising yourself with how a GPU works and why it is better than a CPU at certain tasks, such as the highly parallel arithmetic used in model training.
This is covered in the Avinton Academy Infrastructure Workshop, day 2.
In most cases your development environment is likely to be on a virtual machine, so it is good to familiarise yourself with virtualisation concepts. The best approach would be to set up VMware ESXi on a server from scratch and create virtual machines on it using the hypervisor’s web GUI.
This is covered in Avinton Academy Infrastructure Workshop.
Many AI projects are deployed in the cloud. AWS (Amazon Web Services) is one of the leading cloud service providers, so we recommend you familiarise yourself with their EC2 (Elastic Compute Cloud) platform.
You can refer to AWS’s official documentation. Avinton Academy also has a dedicated section that guides you through working with AWS.
We often use Docker containers to replicate our environment easily from one system to another. We therefore recommend you familiarise yourself with Docker containers and try to Dockerise your environment so that you can share it with other team members.