Core Skills of a Machine Learning Engineer

Linux

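As a starting point for a Machine Learning Engineer, we recommend Linux skills at the level of at least LPIC-1, the first level of the Linux Professional Institute certification programme.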

This will ensure you don’t waste time trying to figure out what you should be typing on the command line to manipulate your environment.

Engineers who are already experienced with the Linux or macOS command line may not need this.

Python Programming

You should have a solid grounding in Python Programming.

This will allow you to be more productive and to concentrate on the machine learning task rather than on troubleshooting syntax errors.

For a good comprehensive overview of the Python Language we recommend this Python 3 book.

You will likely be looking at Python code every day, so a good understanding of it from the beginning will make you more effective and efficient.

Think of Python as your best friend – don’t you want to know them better?

Machine Learning Development Environment

One of the most commonly used environments for machine learning in Python is Jupyter Notebook together with Anaconda.

The links below explain how to set these up:

Jupyter Notebook
Anaconda
Jupyter & Anaconda setup for Machine Learning in Python (Avinton Academy)

Machine Learning Basics

Python has a rich ecosystem of machine learning libraries. Building on your Python programming knowledge, we recommend this Python Deep Learning book for a gentle transition from programming to machine learning.
Here are some introductory tutorials on machine learning basics:

ChainerRL (Avinton Academy)
Random Forest (Avinton Academy)
Naive Bayes (Avinton Academy)
Classification vs Regression (Avinton Academy)
SVM (Avinton Academy)

Machine Learning Libraries and Frameworks

Depending on the application and the environment, a machine learning project may use whichever libraries and frameworks are best suited to the problem being solved.

The most common ones today are:

TensorFlow
Keras
scikit-learn

You should try to familiarise yourself with these.

Avinton Machine Learning Libraries and Frameworks

Data Analysis

Depending on the type of problem being solved, you may be required to do some data processing and data analysis.

SQL

In many large organisations data is stored in large relational databases, and SQL is used to query and manipulate that data. SQL is a very powerful language: rather complex data analysis can often be expressed in only a few lines of simple code. You are encouraged to familiarise yourself with SQL using PostgreSQL, which describes itself as the world's most advanced open source relational database.
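As a minimal sketch of how much analysis a few lines of SQL can express, the example below counts, sums and averages orders per region in a single query. It uses Python's built-in sqlite3 module with an in-memory database purely so that it runs without a database server; the table name sales, its columns and the sample rows are all hypothetical. The query uses only standard GROUP BY and aggregate functions, so it should carry over to PostgreSQL.

```python
import sqlite3

# Hypothetical sample data: (region, amount) pairs for individual orders.
ROWS = [
    ("east", 120.0),
    ("east", 80.0),
    ("west", 200.0),
    ("west", 50.0),
    ("west", 50.0),
    ("north", 75.0),
]

# One query produces the order count, total and average for every region.
QUERY = """
    SELECT region,
           COUNT(*)    AS orders,
           SUM(amount) AS total,
           AVG(amount) AS average
    FROM sales
    GROUP BY region
    ORDER BY total DESC, region
"""


def summarise_sales(rows):
    """Load rows into an in-memory table and return per-region statistics."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for region, orders, total, average in summarise_sales(ROWS):
        print(f"{region:<6} orders={orders} total={total:.2f} average={average:.2f}")
```

Keeping the aggregation inside the query means the database does the heavy lifting and only the small summary is returned to Python.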

Python – NumPy

During data pre-processing for machine learning we often need to extract and manipulate data from the original dataset. NumPy is a Python package that provides fast N-dimensional arrays and an extensive set of mathematical functions that can be used for all kinds of data modelling and analysis.

You are encouraged to familiarise yourself with NumPy's capabilities by browsing through the NumPy API reference.
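To illustrate the kind of extraction and manipulation meant here, the following sketch filters out invalid rows with a boolean mask and then standardises each column. The dataset, the convention that -1 marks a missing value, and the helper names drop_invalid and standardise are assumptions made up for this example, not part of any particular project.

```python
import numpy as np

# Hypothetical dataset: each row is a sample, columns are (height_cm, weight_kg).
data = np.array([
    [170.0, 65.0],
    [180.0, 80.0],
    [160.0, 55.0],
    [175.0, -1.0],  # -1 marks a missing weight in this made-up example
])


def drop_invalid(samples):
    """Keep only rows in which every value is non-negative (boolean masking)."""
    mask = (samples >= 0).all(axis=1)
    return samples[mask]


def standardise(samples):
    """Scale each column to zero mean and unit standard deviation."""
    mean = samples.mean(axis=0)
    std = samples.std(axis=0)
    return (samples - mean) / std


clean = drop_invalid(data)
scaled = standardise(clean)
print(clean.shape)
print(scaled.mean(axis=0), scaled.std(axis=0))
```

Both helpers operate on whole arrays at once, which is the usual NumPy style and avoids explicit Python loops over rows.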

Pandas

Pandas is a library for working with labelled, tabular data structures in Python. It builds on and extends the capabilities of NumPy.
Pandas API Reference

Have a look at the documentation and see which functions may be useful for your data manipulation requirements.
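As a small, hedged example of typical data manipulation with Pandas, the sketch below drops rows with missing values and then averages the remaining readings per group. The column names sensor and reading, the sample values and the helper mean_reading_per_sensor are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical measurements; None marks a missing reading.
df = pd.DataFrame({
    "sensor": ["a", "a", "b", "b", "b"],
    "reading": [1.0, 3.0, None, 4.0, 6.0],
})


def mean_reading_per_sensor(frame):
    """Drop rows with a missing reading, then average readings per sensor."""
    cleaned = frame.dropna(subset=["reading"])
    return cleaned.groupby("sensor")["reading"].mean()


summary = mean_reading_per_sensor(df)
print(summary)
```

The result is a Series indexed by sensor name, so individual values can be looked up with, for example, summary["a"].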

OpenCV – Image Processing

Many machine learning projects in industry today use AI for object detection and related image analysis.

For such projects we usually use OpenCV alongside a machine learning library to label the images and do any pre-processing that is required.

You can find some OpenCV tutorials below:

Python OpenCV setup (Avinton Academy)
OpenCV simple Exercise (Avinton Academy)
OpenCV Advanced Exercise (Avinton Academy)

Online Courses

Stanford Machine Learning – Coursera – Andrew Ng
The lecture series is available on YouTube

Coursera Deep Learning

fast.ai has a good set of free courses; their Computational Linear Algebra course is especially worth a look.

Lecture Series

Deep RL Bootcamp
Andrej Karpathy’s CNN course at Stanford (CS231n: Convolutional Neural Networks for Visual Recognition)
Sergey Levine’s Deep Reinforcement Learning Course – UC Berkeley (CS 294: Deep Reinforcement Learning)
Learning Machines 101 by Richard Golden

Avinton Machine Learning - Study Resources

Infrastructure Basics

Machine learning tasks are typically computationally intensive, so a Machine Learning Engineer should understand some infrastructure basics in order to take full advantage of the underlying server hardware and make model training efficient.

Server Resources: CPU / Disk / RAM

Avinton Academy instructor-led training will give you the basics in this area. Please attend the next available session when you have time.

CPU vs GPU

In many cases we use a GPU to train the machine learning model more efficiently. It is worth familiarising yourself with how a GPU works and why it is better than a CPU at certain tasks, such as the highly parallel matrix arithmetic used in model training.

This is covered on day 2 of the Avinton Academy Infrastructure Workshop.

Virtualisation Concepts

In most cases your development environment is likely to be on a virtual machine, so it is good to familiarise yourself with virtualisation concepts. The best approach is to set up VMware ESXi on a server from scratch and create the guest virtual machines using the hypervisor's web GUI.

This is covered in the Avinton Academy Infrastructure Workshop.

AWS EC2

Many AI projects are deployed in the cloud. AWS (Amazon Web Services) is a leading provider of cloud services, so we recommend you familiarise yourself with their EC2 (Elastic Compute Cloud) platform.

You can refer to the official AWS documentation, and Avinton Academy has a dedicated section that guides you through working with AWS.

Docker Containers

We often use Docker containers to replicate our environment easily from one system to another. We therefore recommend you familiarise yourself with Docker containers and try to Dockerise your environment so that you can share it with other team members.