Machine Learning Mastery with Python


I think Python is an amazing platform for machine learning. There are so many algorithms and so much power ready to use. I am often asked the question: How do you use Python for machine learning? This book is my definitive answer to that question. It contains my very best knowledge and ideas on how to work through predictive modeling machine learning projects using the Python ecosystem. It is the book that I am also going to use as a refresher at the start of a new project. I’m really proud of this book and I hope that you find it a useful companion on your machine learning journey with Python.


Charles Petzold's "Code" is recommended as background reading.


See Machine Learning Mastery with Python 2, Machine Learning Mastery with Python 3.

Introduction

Most of the code was tested with Python 3.8.2 on Ubuntu 20.04.

Welcome

Welcome to Machine Learning Mastery With Python. This book is your guide to applied machine learning with Python. You will discover the step-by-step process that you can use to get started and become good at machine learning for predictive modeling with the Python ecosystem.

Learn Python Machine Learning The Wrong Way

Here is what you should NOT do when you start studying machine learning in Python.

1. Get really good at Python programming and Python syntax.

2. Deeply study the underlying theory and parameters for machine learning algorithms in scikit-learn.

3. Avoid or lightly touch on all of the other tasks needed to complete a real project.


I think that this approach can work for some people, but it is a really slow and roundabout way of getting to your goal. It teaches you that you need to spend all your time learning how to use individual machine learning algorithms. It also does not teach you the process of building predictive machine learning models in Python that you can actually use to make predictions. Sadly, this is the approach used to teach machine learning that I see in almost all books and online courses on the topic.


If you pursue a master's or Ph.D. (Doctor of Philosophy) degree in computer science, computer engineering, or statistics to become an expert in the machine learning field, you should study the theory deeply. The purpose of this book is to teach ordinary people whose major is not computer engineering or something similar.

Machine Learning in Python

This book focuses on a specific sub-field of machine learning called predictive modeling. This is the field of machine learning that is the most useful in industry and the type of machine learning that the scikit-learn library in Python excels at facilitating. Unlike statistics, where models are used to understand data, predictive modeling is laser focused on developing models that make the most accurate predictions at the expense of explaining why predictions are made. Unlike the broader field of machine learning that could feasibly be used with data in any format, predictive modeling is primarily focused on tabular data (e.g. tables of numbers like in a spreadsheet).


This book was written around three themes designed to get you started and using Python for applied machine learning effectively and quickly. These three parts are as follows:


Lessons: Learn how the sub-tasks of a machine learning project map onto Python and the best practice way of working through each task.


Projects: Tie together all of the knowledge from the lessons by working through case study predictive modeling problems.


Recipes: Apply machine learning with a catalog of standalone recipes in Python that you can copy-and-paste as a starting point for new projects.

Lessons

You need to know how to complete the specific subtasks of a machine learning project using the Python ecosystem. Once you know how to complete a discrete task using the platform and get a result reliably, you can do it again and again on project after project. Let’s start with an overview of the common tasks in a machine learning project. A predictive modeling machine learning project can be broken down into 6 top-level tasks:


1. Define Problem: Investigate and characterize the problem in order to better understand the goals of the project.


2. Analyze Data: Use descriptive statistics and visualization to better understand the data you have available.


3. Prepare Data: Use data transforms in order to better expose the structure of the prediction problem to modeling algorithms.


4. Evaluate Algorithms: Design a test harness to evaluate a number of standard algorithms on the data and select the top few to investigate further.


5. Improve Results: Use algorithm tuning and ensemble methods to get the most out of well-performing algorithms on your data.


6. Present Results: Finalize the model, make predictions and present results.
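The six tasks above can be sketched end-to-end in a few lines. This is a minimal illustration rather than the book's project template: it assumes scikit-learn is installed and uses the library's built-in copy of the iris dataset, and the choice of algorithms, the parameter grid and the random seed are assumptions made only for this example.

```python
# Hypothetical walk-through of the six tasks on the built-in iris dataset.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 1. Define Problem: predict the iris species from four flower measurements.
X, y = load_iris(return_X_y=True)

# 2. Analyze Data: a couple of descriptive statistics.
print("shape:", X.shape)
print("column means:", X.mean(axis=0).round(2))

# Hold back a validation set that is only used in the final step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7, stratify=y
)

# 3. Prepare Data and 4. Evaluate Algorithms: standardize the inputs inside
# a pipeline and compare two standard algorithms with 5-fold cross-validation.
candidates = {
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
}
scores = {}
for name, model in candidates.items():
    pipeline = Pipeline([("scale", StandardScaler()), ("model", model)])
    scores[name] = cross_val_score(pipeline, X_train, y_train, cv=5).mean()
    print(name, "mean accuracy:", round(scores[name], 3))

# 5. Improve Results: tune one parameter of one algorithm with a grid search.
grid = GridSearchCV(
    Pipeline([("scale", StandardScaler()), ("model", KNeighborsClassifier())]),
    param_grid={"model__n_neighbors": [1, 3, 5, 7, 9]},
    cv=5,
)
grid.fit(X_train, y_train)
print("best parameters:", grid.best_params_)

# 6. Present Results: score the tuned model on the held-back validation set.
final_accuracy = grid.score(X_test, y_test)
print("validation accuracy:", round(final_accuracy, 3))
```

Each numbered comment corresponds to one task in the list; the lessons in Part II expand every step into its own chapter.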


A blessing and a curse with Python is that there are so many techniques and so many ways to do the same thing with the platform. In part II of this book you will discover one easy or best practice way to complete each subtask of a general machine learning project. Below is a summary of the Lessons from Part II and the sub-tasks that you will learn about.


Lesson 1: Python Ecosystem for Machine Learning.

Lesson 2: Python and SciPy Crash Course.

Lesson 3: Load Datasets from CSV.

Lesson 4: Understand Data With Descriptive Statistics. (Analyze Data)

Lesson 5: Understand Data With Visualization. (Analyze Data)

Lesson 6: Pre-Process Data. (Prepare Data)

Lesson 7: Feature Selection. (Prepare Data)

Lesson 8: Resampling Methods. (Evaluate Algorithms)

Lesson 9: Algorithm Evaluation Metrics. (Evaluate Algorithms)

Lesson 10: Spot-Check Classification Algorithms. (Evaluate Algorithms)

Lesson 11: Spot-Check Regression Algorithms. (Evaluate Algorithms)

Lesson 12: Model Selection. (Evaluate Algorithms)

Lesson 13: Pipelines. (Evaluate Algorithms)

Lesson 14: Ensemble Methods. (Improve Results)

Lesson 15: Algorithm Parameter Tuning. (Improve Results)

Lesson 16: Model Finalization. (Present Results)


These lessons are intended to be read from beginning to end in order, showing you exactly how to complete each task in a predictive modeling machine learning project. Of course, you can dip into specific lessons again later to refresh yourself. Lessons are structured to demonstrate key API classes and functions, showing you how to use specific techniques for a common machine learning task. Each lesson was designed to be completed in under 30 minutes (depending on your level of skill and enthusiasm). It is possible to work through the entire book in one weekend. It also works if you want to dip into specific sections and use the book as a reference.

Projects

Recipes for common predictive modeling tasks are critically important, but they are also just the starting point. This is where most books and courses stop.


You need to piece the recipes together into end-to-end projects. This will show you how to actually deliver a model or make predictions on new data using Python. This book uses small, well-understood machine learning datasets from the UCI Machine Learning Repository ( https://archive.ics.uci.edu/ml/index.php ) in both the lessons and the example projects. These datasets are available for free as CSV downloads. They are excellent for practicing applied machine learning because:

  • They are small, meaning they fit into memory and algorithms can model them in reasonable time.
  • They are well behaved, meaning you often don’t need to do a lot of feature engineering to get a good result.
  • They are benchmarks, meaning that many people have used them before and you can get ideas of good algorithms to try and accuracy levels you should expect.


In Part III you will work through three projects:

Hello World Project (Iris flowers dataset): This is a quick pass through the project steps without much tuning or optimizing on a dataset that is widely used as the hello world of machine learning.

Regression (Boston House Price dataset): Work through each step of the project process with a regression problem.

Binary Classification (Sonar dataset): Work through each step of the project process using all of the methods on a binary classification problem.


These projects unify all of the lessons from Part II. They also give you insight into the process for working through predictive modeling machine learning problems, which is invaluable when you are trying to get a feeling for how to do this in practice. Also included in this section is a template for working through predictive modeling machine learning problems, which you can use as a starting point for current and future projects. I find this useful myself to set the direction and set up important tasks (which are easy to forget) on new projects.

Recipes

Recipes are small standalone examples in Python that show you how to do one specific thing and get a result. For example, you could have a recipe that demonstrates how to use the Random Forest algorithm for classification. You could have another for normalizing the attributes of a dataset.
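As an illustration of what such a recipe might look like, here is a minimal sketch of the Random Forest example mentioned above. It assumes scikit-learn is installed and uses its built-in iris dataset; the number of trees, the number of folds and the random seed are arbitrary choices for this sketch, not values taken from the book's recipe catalog.

```python
# Recipe sketch: Random Forest classification evaluated with 10-fold
# cross-validation (hypothetical settings, built-in dataset).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Load a dataset that ships with the library, so the recipe runs as-is.
X, y = load_iris(return_X_y=True)

# Configure the algorithm and the evaluation procedure.
model = RandomForestClassifier(n_estimators=100, random_state=7)
kfold = KFold(n_splits=10, shuffle=True, random_state=7)

# Evaluate and report one summary number.
results = cross_val_score(model, X, y, cv=kfold)
print("Mean accuracy: %.3f" % results.mean())
```

The recipe does one thing, runs without modification and prints a result, which matches the definition of a recipe given in this section.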


Recipes make the difference between a beginner who is having trouble and a fast learner capable of making accurate predictions quickly on any new project. A catalog of recipes provides a repertoire of skills that you can draw from when starting a new project. More formally, recipes are defined as follows:

  • Recipes are code snippets not tutorials.
  • Recipes provide just enough code to work.
  • Recipes are demonstrative not exhaustive.
  • Recipes run as-is and produce a result.
  • Recipes assume that required libraries are installed.
  • Recipes use built-in datasets or datasets provided in specific libraries.


You are starting your journey into machine learning with Python with a catalog of machine learning recipes used throughout this book. All of the code from the lessons in Part II and the projects in Part III is available in your Python recipe catalog. Recipes are organized by chapter so that you can quickly locate a specific example used in the book. This is a valuable resource that you can use to jump-start your current and future machine learning projects. You can also build upon this recipe catalog as you discover new techniques.

Your Outcomes From Reading This Book

This book will lead you from being a developer who is interested in machine learning with Python to a developer who has the resources and capability to work through a new dataset end-to-end using Python and develop accurate predictive models. Specifically, you will know:

  • How to work through a small to medium sized dataset end-to-end.
  • How to deliver a model that can make accurate predictions on new unseen data.
  • How to complete all subtasks of a predictive modeling problem with Python.
  • How to learn new and different techniques in Python and SciPy.
  • How to get help with Python machine learning.


From here you can start to dive into the specifics of the functions, techniques and algorithms used with the goal of learning how to use them better in order to deliver more accurate predictive models, more reliably in less time.

What This Book is Not

This book was written for professional developers who want to know how to build reliable and accurate machine learning models in Python.

  • This is not a machine learning textbook. We will not be getting into the basic theory of machine learning (e.g. induction, bias-variance trade-off, etc.). You are expected to have some familiarity with machine learning basics, or be able to pick them up yourself.
  • This is not an algorithm book. We will not be working through the details of how specific machine learning algorithms work (e.g. Random Forests). You are expected to have some basic knowledge of machine learning algorithms or how to pick up this knowledge yourself.
  • This is not a Python programming book. We will not be spending a lot of time on Python syntax and programming (e.g. basic programming tasks in Python). You are expected to be a developer who can pick up a new C-like language relatively quickly.


You can still get a lot out of this book if you are weak in one or two of these areas, but you may struggle picking up the language or require some more explanation of the techniques. If this is the case, see the Getting More Help chapter at the end of the book and seek out a good companion reference text.

Summary

I hope you are as excited as I am to get started. In this introduction chapter you learned that this book is unconventional. Unlike other books and courses that focus heavily on machine learning algorithms in Python and on little else, this book will walk you through each step of a predictive modeling machine learning project.

  • Part II of this book provides standalone lessons including a mixture of recipes and tutorials to build up your basic working skills and confidence in Python.
  • Part III of this book will introduce a machine learning project template that you can use as a starting point on your own projects and walks you through three end-to-end projects.
  • The recipes companion to this book provides a catalog of machine learning code in Python. You can browse this invaluable resource, find useful recipes and copy-and-paste them into your current and future machine learning projects.
  • Part IV will finish out the book. It will look back at how far you have come in developing your new found skills in applied machine learning with Python. You will also discover resources that you can use to get help if and when you have any questions about Python or the ecosystem.


Next Step

Next you will start Part II and your first lesson. You will take a closer look at the Python ecosystem for machine learning. You will discover what Python and SciPy are, why they are so powerful as a platform for machine learning and the different ways you should and should not use the platform.

Lessons

Python Ecosystem for Machine Learning

The Python ecosystem is growing and may become the dominant platform for machine learning. The primary rationale for adopting Python for machine learning is that it is a general purpose programming language that you can use both for R&D and in production. In this chapter you will discover the Python ecosystem for machine learning. After completing this lesson you will know:

1. Python and its rising use for machine learning.

2. SciPy and the functionality it provides with NumPy, Matplotlib and Pandas.

3. scikit-learn that provides all of the machine learning algorithms.

4. How to set up your Python ecosystem for machine learning and what versions to use.

Let’s get started.

Python

Python is a general purpose interpreted programming language. It is easy to learn and use primarily because the language focuses on readability. The philosophy of Python is captured in the Zen of Python which includes phrases like:

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.

Listing 2.1: Sample of the Zen of Python.


It is a popular language in general, consistently appearing in the top 10 programming languages in surveys on Stack Overflow ( https://insights.stackoverflow.com/survey/2019 ). It’s a dynamic language that is well suited to interactive development and quick prototyping, with the power to support the development of large applications. It is also widely used for machine learning and data science because of the excellent library support and because it is a general purpose programming language (unlike R or MATLAB). For example, see the 2011 Kaggle platform survey results ( http://blog.kaggle.com/2011/11/27/kagglers-favorite-tools/ ) and the KDnuggets (Knowledge Discovery Nuggets) 2015 tool survey results ( https://www.kdnuggets.com/polls/2015/analytics-data-mining-data-science-software-used.html ).


This is a simple and very important consideration. It means that you can perform your research and development (figuring out what models to use) in the same programming language that you use for your production systems, which greatly simplifies the transition from development to production.

SciPy

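SciPy is an ecosystem of Python libraries for mathematics, science and engineering. It is an add-on to Python that you will need for machine learning. The SciPy ecosystem comprises the following core modules relevant to machine learning:

  • NumPy: arrays for working with data efficiently.
  • Matplotlib: plots and charts of your data.
  • Pandas: tools to load, explore and better understand your data.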

To be effective at machine learning in Python you must install and become familiar with SciPy. Specifically:

  • You will prepare your data as NumPy arrays for modeling in machine learning algorithms.
  • You will use Matplotlib (and wrappers of Matplotlib in other frameworks) to create plots and charts of your data.
  • You will use Pandas to load, explore and better understand your data.
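The division of labor between Pandas and NumPy described above can be sketched as follows. This is a small illustration that assumes both libraries are installed; the column names and values are made up for the example and do not come from any dataset used in the book.

```python
import numpy as np
import pandas as pd

# Made-up illustrative rows; in practice the data would be loaded from a file.
df = pd.DataFrame(
    {
        "height": [1.2, 3.4, 5.6, 7.8],
        "width": [0.5, 0.7, 0.9, 1.1],
        "label": [0, 1, 0, 1],
    }
)

# Pandas: explore and summarize the data.
print(df.describe())

# NumPy: hand plain arrays to a modeling algorithm.
X = df[["height", "width"]].to_numpy()  # input attributes, shape (4, 2)
y = df["label"].to_numpy()              # output attribute, shape (4,)
print(X.shape, y.shape)
```

Keeping the data in a DataFrame while exploring it and converting to arrays only at the modeling step is one common arrangement, not the only one.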

scikit-learn

The scikit-learn library is how you can develop and practice machine learning in Python. It is built upon and requires the SciPy ecosystem. The name scikit suggests that it is a SciPy plug-in or toolkit. The focus of the library is machine learning algorithms for classification, regression, clustering and more. It also provides tools for related tasks such as evaluating models, tuning parameters and pre-processing data.


Like Python and SciPy, scikit-learn is open source and is usable commercially under the BSD license. This means that you can learn about machine learning, develop models and put them into operation all with the same ecosystem and code, which is a powerful reason to use scikit-learn.

Python Ecosystem Installation

There are multiple ways to install the Python ecosystem for machine learning. This section covers installing Python, SciPy and scikit-learn one at a time, followed by an easier all-in-one option.

How To Install Python

The first step is to install Python. I recommend Python 3, which is the version the examples in this book were tested with. The instructions for installing Python will be specific to your platform. For instructions see Downloading Python ( https://wiki.python.org/moin/BeginnersGuide/Download ) in the Python Beginners Guide. Once installed you can confirm the installation was successful.


Open a command line and type:

python --version
python3 --version

Listing 2.2: Print the version of Python installed.


You should see a response like the following:

Python 2.7.17
Python 3.7.5

Listing 2.3: Example Python version.


The examples in this book assume that you are using Python 3 and have been tested with it.

How To Install SciPy

There are many ways to install SciPy. For example, two popular ways are to use package management on your platform (e.g. yum on CentOS or MacPorts on macOS) or to use a Python package management tool like pip. The SciPy documentation is excellent and covers how-to instructions for many different platforms on the page Installing the SciPy Stack ( https://scipy.org/install.html ). When installing SciPy, ensure that you install the following packages as a minimum:

  • SciPy
  • NumPy
  • Matplotlib
  • Pandas

Once installed, you can confirm that the installation was successful. Open the Python interactive environment by typing "python3" at the command line, then type in and run the following Python code to print the versions of the installed libraries. Lines that start with # are comments, so you don't have to type them.


# scipy
import scipy
print('scipy: {}'.format(scipy.__version__))
# numpy
import numpy
print('numpy: {}'.format(numpy.__version__))
# matplotlib
import matplotlib
print('matplotlib: {}'.format(matplotlib.__version__))
# pandas
import pandas
print('pandas: {}'.format(pandas.__version__))

Listing 2.4: Print the versions of the SciPy stack.


On my workstation at the time of writing I see the following output.

scipy: 1.4.1
numpy: 1.18.1
matplotlib: 3.2.0
pandas: 1.0.1

Listing 2.5: Example versions of the SciPy stack.


The examples in this book assume you have these versions of the SciPy libraries or newer. If you have an error, you may need to consult the documentation for your platform.


If an import fails with an error message, you need to install SciPy. First, exit the Python interactive environment with exit() or Ctrl-D, then type the command below at the command line.

pip3 install scipy

This installs SciPy, and NumPy is installed automatically along with it as a dependency.


If you also need to install Matplotlib and Pandas, use the commands below.

pip3 install matplotlib
pip3 install pandas

How To Install scikit-learn

I would suggest that you use the same method to install scikit-learn as you used to install SciPy. There are instructions for installing scikit-learn ( https://scikit-learn.org/stable/install.html ), but they are limited to using the Python pip and conda package managers.


You can install scikit-learn by typing the below command at the command line.

pip3 install scikit-learn


Like SciPy, you can confirm that scikit-learn was installed successfully. Start your Python interactive environment and type and run the following code.

# scikit-learn
import sklearn
print('sklearn: {}'.format(sklearn.__version__))

Listing 2.6: Print the version of scikit-learn.


It will print the version of the scikit-learn library installed. On my workstation at the time of writing I see the following output:

sklearn: 0.22.2.post1

Listing 2.7: Example versions of scikit-learn.


The examples in this book assume you have this version of scikit-learn or newer.
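The separate version checks above can also be combined into one script that reports missing packages instead of stopping at the first failed import. The following is a sketch that uses only the Python standard library and assumes Python 3.8 or newer; the helper name check_packages and the PACKAGES mapping are made up for this example.

```python
import importlib.util
from importlib import metadata

# Import name -> distribution name used by pip (they differ for scikit-learn).
PACKAGES = {
    "scipy": "scipy",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "sklearn": "scikit-learn",
}


def check_packages(packages):
    """Return {import_name: version string, or None if it is not installed}."""
    report = {}
    for module_name, dist_name in packages.items():
        if importlib.util.find_spec(module_name) is None:
            report[module_name] = None  # module cannot be imported
            continue
        try:
            report[module_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            report[module_name] = "unknown"  # importable, but no metadata found
    return report


report = check_packages(PACKAGES)
for name, version in report.items():
    print("{}: {}".format(name, version if version is not None else "NOT INSTALLED"))
```

Any package reported as NOT INSTALLED can then be installed with pip3 as described above.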

How To Install The Ecosystem: An Easier Way

If you are not confident at installing software on your machine, there is an easier option for you. There is a distribution called Anaconda that you can download and install for free ( https://www.anaconda.com/distribution/ ). It supports the three main platforms of Microsoft Windows, macOS and Linux. It includes Python, SciPy and scikit-learn: everything you need to learn, practice and use machine learning with the Python environment.

Summary

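In this chapter you discovered the Python ecosystem for machine learning. You learned about:

  • Python and its rising use for machine learning.
  • SciPy and the functionality it provides with NumPy, Matplotlib and Pandas.
  • scikit-learn, which provides the machine learning algorithms.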

You also learned how to install the Python ecosystem for machine learning on your workstation.

Next

In the next lesson you will get a crash course in the Python and SciPy ecosystem, designed specifically to get a developer like you up to speed with the ecosystem very fast.

Crash Course in Python and SciPy

You do not need to be a Python developer to get started using the Python ecosystem for machine learning. As a developer who already knows how to program in one or more programming languages, you are able to pick up a new language like Python very quickly. You just need to know a few properties of the language to transfer what you already know to the new language.


After completing this lesson you will know:

1. How to navigate Python language syntax.

2. Enough NumPy, Matplotlib and Pandas to read and write machine learning Python scripts.

3. A foundation from which to build a deeper understanding of machine learning tasks in Python.


If you already know a little Python, this chapter will be a friendly reminder for you. Let’s get started.

Python Crash Course

When getting started in Python you need to know a few key details about the language syntax to be able to read and understand Python code. This includes:

  • Assignment.
  • Flow Control.
  • Data Structures.
  • Functions.

We will cover each of these topics in turn with small standalone examples that you can type and run. Remember, whitespace has meaning in Python.
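To illustrate the remark about whitespace: Python uses indentation, rather than braces or keywords, to mark where a block starts and ends. The function name and data in this small sketch are made up for illustration.

```python
def sum_even(values):
    total = 0
    for value in values:
        # Indented one level: this line belongs to the for loop.
        if value % 2 == 0:
            # Indented two levels: this line runs only when the test passes.
            total += value
    # Dedented back to function level: runs once, after the loop ends.
    return total


print(sum_even([1, 2, 3, 4]))  # prints 6
```

Moving the return statement one level to the right would place it inside the loop and change the result, which is why consistent indentation matters.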

Assignment

As a programmer, assignment and types should not be surprising to you.


Strings

# Strings
data = 'hello world'
print(data[0])
print(len(data))
print(data)

Listing 3.1: Example of working with strings.


Notice how you can access characters in the string using array syntax. Running the example prints:

h
11
hello world

Listing 3.2: Output of example working with strings.
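Beyond single-character access, the same bracket syntax supports negative indexes and slices. The string in this short sketch is made up for the example.

```python
text = 'machine learning'
print(text[0])    # first character: m
print(text[-1])   # last character: g
print(text[:7])   # slice of the first seven characters: machine
print(len(text))  # number of characters: 16
```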


Numbers

# Numbers
value = 6.0
print(value)
value = 9
print(value)

Listing 3.3: Example of working with numbers.


Running the example prints:

6.0
9

Listing 3.4: Output of example working with numbers.
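The example above rebinds the same name first to a floating-point number and then to an integer. A short sketch of how to inspect the type at each step, together with the two division operators (variable names are made up for illustration):

```python
number = 6.0
print(type(number).__name__)  # float
number = 9
print(type(number).__name__)  # int

# True division always returns a float; floor division of ints returns an int.
print(7 / 2)   # 3.5
print(7 // 2)  # 3
```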


Boolean

# Boolean
a = True
b = False
print(a, b)

Listing 3.5: Example of working with booleans.


Running the example prints:

True False

Listing 3.6: Output of example working with booleans.
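Boolean values usually come from comparisons and are combined with the keywords and, or and not. A brief sketch, with variable names invented for the example:

```python
is_trained = True
has_errors = False

print(is_trained and not has_errors)  # True
print(is_trained or has_errors)       # True
print(5 > 3)                          # True

# Other values can be tested for truth: zero and empty containers are False.
print(bool(0), bool([]), bool('text'))  # False False True
```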


Multiple Assignment

# Multiple Assignment
a, b, c = 1, 2, 1
print(a, b, c)

Listing 3.7: Example of working with multiple assignment.


This can also be very handy for unpacking data in simple data structures. Running the example prints:

1 2 1

Listing 3.8: Output of example working with multiple assignment.
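As a sketch of the unpacking use mentioned above, the same syntax splits tuples and lists into named parts. All names and values here are made up for illustration.

```python
# Unpack a tuple into two names.
point = (3, 4)
x, y = point
print(x, y)  # 3 4

# Swap two values without a temporary variable.
x, y = y, x
print(x, y)  # 4 3

# A starred name collects the remaining items into a list.
first, *rest = [10, 20, 30]
print(first, rest)  # 10 [20, 30]

# Unpack each pair while looping over a list of tuples.
scores = {}
for name, score in [('train', 0.9), ('test', 0.8)]:
    scores[name] = score
print(scores)
```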


No Value

# No value
a = None
print(a)

Listing 3.9: Example of working with no value.


Running the example prints:

None

Listing 3.10: Output of example working with no value.
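None is commonly used to signal that there is no result. The idiomatic test is the is operator rather than ==. The helper below is hypothetical and exists only to illustrate the pattern.

```python
def first_negative(values):
    """Return the first negative number in values, or None if there is none."""
    for value in values:
        if value < 0:
            return value
    return None


result = first_negative([3, 1, 2])
if result is None:
    print('no negative value found')
else:
    print('found', result)
```

Running the sketch prints "no negative value found", because the list contains no negative numbers.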

Flow Control

How To Load Machine Learning Data

Understand Your Data With Descriptive Statistics

Understand Your Data With Visualization

Prepare Your Data For Machine Learning

Feature Selection For Machine Learning

Evaluate the Performance of Machine Learning Algorithms with Resampling
