This task is designed to demonstrate how we can use programming to solve a real-world problem.
We will extract images from PDF files and classify them into three categories:
Some sample PDF files can be downloaded here.
Examples of the image types are:
For the Image class, we can further split each image into its individual photos, e.g.:
This is an individual task, but we will collaborate on it during the session.
You can ask questions at any time via email or Google Hangouts, and also during the training.
The expected output is as follows:
- One directory per PDF file
- Inside each directory, the images extracted from that PDF file
- Each directory should also contain a text file listing every image name and its image type
- Photos extracted from images of the Image type should be prefixed with "extracted", e.g. extracted_###.jpg
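The layout described in the list above could be set up along the following lines. This is only a minimal sketch using the Python standard library, not a required implementation. The helper names (output_dir_for, extracted_name, write_listing), the listing file name images.txt, the tab-separated line format, and the reading of ### as a three-digit zero-padded counter are all assumptions made for illustration. The image type labels are passed in by the caller because the sketch does not assume what the categories are called.

```python
from pathlib import Path


def output_dir_for(pdf_path: Path, root: Path) -> Path:
    """Create and return the output directory for one PDF, named after its stem."""
    out = root / pdf_path.stem
    out.mkdir(parents=True, exist_ok=True)
    return out


def extracted_name(index: int, ext: str = "jpg") -> str:
    """Build a name such as extracted_001.jpg (assumes ### means three digits)."""
    return f"extracted_{index:03d}.{ext}"


def write_listing(out_dir: Path, entries: list[tuple[str, str]],
                  listing_name: str = "images.txt") -> Path:
    """Write one 'image name<TAB>image type' line per entry (assumed format)."""
    listing = out_dir / listing_name
    lines = [f"{name}\t{kind}" for name, kind in entries]
    listing.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return listing
```

For a file named sample.pdf, output_dir_for would create a directory called sample under the chosen root, and write_listing would place the listing file next to the saved images. The actual image extraction and classification steps are left out of this sketch on purpose.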
The text file should look like this: