Luis Pedro Coelho

Building Machine Learning Systems with Python – Second Edition

Zaur Huseynov has quoted last year
But before you go there, you will have to define what you actually mean by "better". SciKit has a complete package dedicated only to this definition. The package is called sklearn.metrics and also contains a full range of different metrics to measure clustering quality. Maybe that should be the first place to go now. Right into the sources of the metrics package.
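As a rough illustration of what such a definition of "better" can look like, the sketch below scores several clusterings of the same data with one metric from sklearn.metrics, the silhouette coefficient. The synthetic two-blob data, the helper name score_for_k, and the choice of KMeans and silhouette_score are assumptions made for this example and are not taken from the book.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Assumed toy data: two well-separated 2-D blobs of 50 points each.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])


def score_for_k(X, k):
    """Cluster X into k groups and return the silhouette coefficient.

    The coefficient lies in [-1, 1]; higher values mean tighter,
    better-separated clusters under this particular metric.
    """
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    return silhouette_score(X, labels)


# Compare a few candidate cluster counts under the same metric.
scores = {k: score_for_k(X, k) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)

for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("best k under this metric:", best_k)
```

A different metric may rank the same clusterings differently, which is why the choice of metric has to come first.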
Zaur Huseynov has quoted last year
SciKit provides a wide range of clustering approaches in the sklearn.cluster package. You can get a quick overview of advantages and drawbacks of each of them at
Zaur Huseynov has quoted last year
UCI Machine Learning Dataset Repository

The University of California at Irvine (UCI) maintains an online repository of machine learning datasets (at the time of writing, they list 233 datasets). Both the Iris and the Seeds dataset used in this chapter were taken from there.

The repository is available online at
Zaur Huseynov has quoted last year
Let's compare the runtime behavior of NumPy compared with normal Python lists. In the following code, we will calculate the sum of all squared numbers from 1 to 1000 and see how much time it will take. We perform it 10,000 times and report the total time so that our measurement is accurate enough.
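The quoted passage does not include the code itself, so here is a minimal sketch of how such a comparison could be set up with the standard timeit module. The function names, the use of np.dot for the NumPy variant, and the way the array is prepared once up front are assumptions for this example, not the book's listing.

```python
import timeit

import numpy as np

N_RUNS = 10_000  # number of repetitions, as stated in the text

# Prepare the NumPy array once so that only the computation is timed.
NUMBERS = np.arange(1, 1001)


def sum_squares_list():
    """Sum of squares of 1..1000 using plain Python iteration."""
    return sum(x * x for x in range(1, 1001))


def sum_squares_numpy():
    """Sum of squares of 1..1000 as a dot product of the array with itself."""
    return int(np.dot(NUMBERS, NUMBERS))


def compare(n_runs=N_RUNS):
    """Return total seconds spent on n_runs calls of each variant."""
    t_list = timeit.timeit(sum_squares_list, number=n_runs)
    t_numpy = timeit.timeit(sum_squares_numpy, number=n_runs)
    return t_list, t_numpy


if __name__ == "__main__":
    t_list, t_numpy = compare()
    print(f"Pure Python: {t_list:.3f} s for {N_RUNS} runs")
    print(f"NumPy dot:   {t_numpy:.3f} s for {N_RUNS} runs")
```

Absolute timings depend on the machine and library versions, so only the relative difference between the two totals is meaningful.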
Zaur Huseynov has quoted last year
What to do when you are stuck
Zaur Huseynov has quoted last year
Downloading the example code
You can download the example code files from your account at for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit and register to have the files e-mailed directly to you.

The code for this book is also available on GitHub at This