Data Scientist

Start Your Engineering Training Journey With Avinton Academy

What does a data scientist do?

A data scientist works with machine learning techniques like natural language processing and neural networks to build models that power AI-based applications.

What are career opportunities for data scientists?


Some examples of AI-based applications developed at Avinton

Environmental Conservation in the Philippines using an AI Camera

Avinton, in collaboration with Ericsson, deployed Artificial Intelligence (AI)-enabled CCTV cameras to identify migratory bird species and enable better preservation of endangered wildlife in Sasmuan, Pampanga.

Hand detection with the Avinton Edge AI Camera

Securing Worker Safety At Manufacturing Plants in Real-Time

Optical devices powered by pre-trained AI models have been deployed throughout a factory floor. These devices detect body parts of workers entering areas of risk – for example, hands getting too close to open rollers, heavy machinery, or robots. The benefits of Avinton’s solution include the real-time monitoring of worker safety in cases where milliseconds matter.

See more use cases

Start your journey to becoming a Data Scientist

Linux

As a starting point for a Machine Learning Engineer, we recommend Linux knowledge of at least LPIC level 1.

This will ensure you don’t waste time trying to figure out what you should be typing on the command line to manipulate your environment.

For engineers already experienced with the Linux or macOS command line, this may not be required.

Machine Learning Development Environment

The most commonly used environment for machine learning in Python is Jupyter Notebook with Anaconda.

The links below explain how to set these up:

Jupyter Notebook
Anaconda
Jupyter & Anaconda setup for Machine Learning in Python (Avinton Academy)

Machine Learning Basics

Python has a rich ecosystem of machine learning libraries. Building on your Python programming knowledge, we recommend this Python Deep Learning book for a gentle transition from programming to machine learning.
Here are some introductory tutorials about Machine Learning Basics:

ChainerRL (Avinton Academy)
Random Forest (Avinton Academy)
Naive Bayes (Avinton Academy)
Classification vs Regression (Avinton Academy)
SVM (Avinton Academy)

Machine Learning Libraries and Frameworks

Depending on the application and the environment, a machine learning project may use different libraries and frameworks, chosen to suit the problem being solved.

The most common ones today are:

TensorFlow
Keras
scikit-learn

You should try to familiarise yourself with these.
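
As a first taste of one of these libraries, the following is a minimal sketch of training and evaluating a classifier with scikit-learn. The choice of the bundled Iris dataset, the random forest model, and the split parameters are assumptions made for illustration only; they are not taken from the Avinton tutorials.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a small dataset that ships with scikit-learn (no download needed).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples to measure how well the model generalises.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a random forest; random_state makes the run repeatable.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Accuracy is the fraction of held-out samples classified correctly.
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

The same fit / predict / score pattern applies to most scikit-learn estimators, so swapping in a different model usually only changes the line that constructs it.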

Avinton Machine Learning Libraries and Frameworks

Data Analysis

Depending on the type of problem being solved, you may be required to do some data processing and data analysis.

SQL

In many large organisations, data is stored in large relational databases. We use SQL to query and manipulate the data in the database. SQL is a very powerful language: rather complex data analysis can often be expressed in only a few lines of simple code. You are encouraged to familiarise yourself with SQL using PostgreSQL, which is widely regarded as one of the most advanced open source relational databases.
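
To illustrate how a few lines of SQL can summarise a dataset, here is a small sketch. It assumes an in-memory SQLite database from Python's standard library rather than PostgreSQL, purely so that it runs without a database server; the `sales` table and its columns are hypothetical. The query itself uses standard SQL that should also work on PostgreSQL.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [
        ("north", 120.0),
        ("north", 80.0),
        ("south", 200.0),
        ("south", 50.0),
        ("south", 50.0),
    ],
)

# One query gives the order count, total and average per region.
query = """
    SELECT region,
           COUNT(*)    AS n_orders,
           SUM(amount) AS total,
           AVG(amount) AS average
    FROM sales
    GROUP BY region
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for region, n_orders, total, average in rows:
    print(region, n_orders, total, average)

conn.close()
```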

Python – NumPy

During data pre-processing for Machine Learning, we often need to extract and manipulate data from the original dataset. NumPy is a Python package that provides extensive mathematical functionality, built around fast multi-dimensional arrays, that can be used for all kinds of data modelling and analysis.

You are encouraged to familiarise yourself with NumPy's capabilities by browsing through the NumPy API reference.
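
As an example of the kind of pre-processing mentioned above, the sketch below fills a missing value with its column mean and then standardises each feature to zero mean and unit variance. The array values are made up for illustration, and this is only one of several common ways to prepare data.

```python
import numpy as np

# Three samples, two features; one reading is missing (NaN).
raw = np.array([
    [1.0, 200.0],
    [2.0, np.nan],
    [3.0, 600.0],
])

# Column means that ignore NaN, then substitute them for the missing entries.
col_means = np.nanmean(raw, axis=0)
filled = np.where(np.isnan(raw), col_means, raw)

# Standardise each column: subtract its mean and divide by its standard deviation.
scaled = (filled - filled.mean(axis=0)) / filled.std(axis=0)

print(filled)
print(scaled)
```

All of these operations work on whole arrays at once, which is both shorter and considerably faster than looping over rows in plain Python.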

Pandas

Pandas is a library for working with labelled, tabular data structures in Python. It builds on NumPy and adds tools for loading, cleaning, and aggregating data.
Pandas API Reference

Have a look at the documentation and see which functions may be useful for your data manipulation requirements.
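
The following sketch shows a typical data manipulation task with Pandas: filling a missing value with its group mean and then summarising per group. The `sensor` and `reading` columns are hypothetical names chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "sensor": ["a", "a", "b", "b", "b"],
    "reading": [1.0, 3.0, 10.0, None, 14.0],
})

# Replace each missing reading with the mean of the same sensor's readings.
group_mean = df.groupby("sensor")["reading"].transform("mean")
df["reading"] = df["reading"].fillna(group_mean)

# Per-sensor summary statistics.
summary = df.groupby("sensor")["reading"].agg(["mean", "max"])
print(summary)
```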

OpenCV – Image Processing

Many Machine Learning projects in the industry today use AI for object detection and other related image analysis.

For such projects, we usually use OpenCV together with a machine learning library to label the images and do any pre-processing that is required.

You can find some OpenCV tutorials below:

Python OpenCV setup (Avinton Academy)
OpenCV simple Exercise (Avinton Academy)
OpenCV Advanced Exercise (Avinton Academy)

Online Courses

Stanford Machine Learning – Coursera – Andrew Ng
Lecture Series is available on YouTube

Coursera Deep Learning

fast.ai offers a good set of free courses, in particular their Computational Linear Algebra course.

Lecture Series

Deep RL Bootcamp
Andrej Karpathy’s CNN course at Stanford (CS231n: Convolutional Neural Networks for Visual Recognition)
Sergey Levine’s Deep Reinforcement Learning Course – UC Berkeley (CS 294: Deep Reinforcement Learning)
Learning Machines 101 by Richard Golden

Avinton Machine Learning - Study Resources

Infrastructure Basics

Machine Learning tasks are typically computationally intensive. It is therefore important that a Machine Learning Engineer understands some infrastructure basics, in order to take full advantage of the underlying server hardware and keep the model training process efficient.

Server Resources: CPU / Disk / RAM

Avinton Academy instructor-led training will give you the basics in this area. Please attend the next available session when you have time.

CPU vs GPU

In many cases we use a GPU to train the machine learning model more efficiently. It is worth familiarising yourself with how a GPU works and why it is better than a CPU at certain tasks, such as the highly parallel matrix operations used in neural network training.

This is covered on day 2 of the Avinton Academy Infrastructure Workshop.

Virtualization Concepts

In most cases your development environment is likely to be on a virtual machine. As such, it is good to familiarise yourself with virtualization concepts. The best approach would be to set up VMware ESXi on a server from scratch and create the guest virtual machines using the hypervisor's web GUI.

This is covered in Avinton Academy Infrastructure Workshop.

AWS EC2

Many AI projects are deployed in the cloud. AWS (Amazon Web Services) is a leading provider of cloud services, and as such we recommend you familiarise yourself with their EC2 (Elastic Compute Cloud) platform.

You can refer to AWS's official documentation, and Avinton Academy has a dedicated section guiding you through how to work with AWS.

Docker Containers

We often use Docker containers to easily replicate our environment from one system to another. As such, we recommend you familiarise yourself with Docker containers and try to Dockerise your environment for sharing with other team members.

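Machine Learning / AI

Skills Required for a Machine Learning Engineer

Image Classification Using Machine Learning

For Machine Learning Beginners: Learning Breakout with ChainerRL

For Machine Learning Beginners: Predicting Titanic Survivors with Random Forest

For Machine Learning Beginners: Getting Hands-On with the Naive Bayes Algorithm

For Machine Learning Beginners: Learn the Difference Between Classification and Regression by Writing a Program

For Machine Learning Beginners: Getting Hands-On with Support Vector Machine (SVM)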