Breast Cancer Tumor Machine Learning Prediction Using Scikit Learn

Predicting whether a breast tumor is malignant or benign with machine learning. Scikit-learn is used for training, evaluation, and prediction; Seaborn and Matplotlib are used for visualization.

Run this in a Jupyter Notebook.

# LIBRARY IMPORTS

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# DATASET IMPORT

from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

# VIEW DATA

cancer

cancer.keys()

print(cancer['DESCR'])

print(cancer['target_names'])

print(cancer['target'])

print(cancer['feature_names'])

print(cancer['data'])

cancer['data'].shape

df_cancer = pd.DataFrame(np.c_[cancer['data'], cancer['target']], columns = np.append(cancer['feature_names'], ['target']))

df_cancer.head()

df_cancer.tail()

# VISUALIZE DATA

# SEABORN PAIRPLOT

sns.pairplot(df_cancer, hue = 'target', vars = ['mean radius', 'mean texture', 'mean area', 'mean perimeter', 'mean smoothness'] )

# SEABORN COUNTPLOT

sns.countplot(x = 'target', data = df_cancer)

# SEABORN SCATTERPLOT

sns.scatterplot(x = 'mean area', y = 'mean smoothness', hue = 'target', data = df_cancer)

# SEABORN LMPLOT

sns.lmplot(x = 'mean area', y = 'mean smoothness', hue = 'target', data = df_cancer, fit_reg = False)

# SEABORN HEATMAP

plt.figure(figsize=(20,10)) 
sns.heatmap(df_cancer.corr(), annot=True) 

# MODEL TRAINING

# DEFINING X and y

X = df_cancer.drop(['target'],axis=1)

X.head()

y = df_cancer['target']

y.head()

# TRAIN TEST SPLIT (80% TRAIN, 20% TEST)

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state=5)

X_train.shape

X_test.shape

y_train.shape

y_test.shape

# IMPORT MODELS

from sklearn.svm import SVC 
from sklearn.metrics import classification_report, confusion_matrix

# TRAIN ON SVC MODEL

svc_model = SVC()
svc_model.fit(X_train, y_train)

# EVALUATING

# PREDICT

y_predict = svc_model.predict(X_test)

cm = confusion_matrix(y_test, y_predict)

sns.heatmap(cm, annot=True)
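
To read the numbers behind the heatmap, the 2x2 confusion matrix can be unpacked into its four cells. The sketch below uses small made-up label lists (not the notebook's data, and the `_demo` names are only for illustration). In scikit-learn's layout, rows are true labels and columns are predicted labels.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only (not the breast cancer data).
y_true_demo = [0, 0, 1, 1, 1]
y_pred_demo = [0, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes.
cm_demo = confusion_matrix(y_true_demo, y_pred_demo)

# For a binary problem, ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = cm_demo.ravel()

precision_1 = tp / (tp + fp)  # of the samples predicted as 1, the share that are truly 1
recall_1 = tp / (tp + fn)     # of the samples that are truly 1, the share that were found

print(cm_demo)
print(precision_1, recall_1)
```

The same unpacking applies to `cm` in the notebook, where the classification report summarizes these ratios per class.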

print(classification_report(y_test, y_predict))
             precision    recall  f1-score   support

        0.0       0.00      0.00      0.00        44
        1.0       0.61      1.00      0.76        70

avg / total       0.38      0.61      0.47       114
# If precision turns out to be very poor, as it does here (a weighted average of only 0.38, with every test sample predicted as class 1.0), the data needs to be normalized.
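
One likely reason the unscaled model struggles is that the features live on very different scales, and an RBF-kernel SVC works with distances between samples, so large-valued features dominate. The sketch below is a rough check of that assumption; it reloads the dataset so it runs on its own, and the `_demo` and `_range` names are only for illustration. Note also that recent scikit-learn releases default to `gamma='scale'` instead of `gamma='auto'`, so a fresh run of the unscaled model may score noticeably better than the report above.

```python
from sklearn.datasets import load_breast_cancer

cancer_demo = load_breast_cancer()
names = list(cancer_demo['feature_names'])
data = cancer_demo['data']

# Per-feature range (max - min) over the whole dataset.
feature_ranges = data.max(axis=0) - data.min(axis=0)

area_range = feature_ranges[names.index('mean area')]
smoothness_range = feature_ranges[names.index('mean smoothness')]

# 'mean area' spans thousands of units, 'mean smoothness' only a fraction of one.
print(area_range, smoothness_range, area_range / smoothness_range)
```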

# IMPROVING MODEL

X_train.head()

# NORMALIZATION

# TRAIN DATA

min_train = X_train.min()
print(min_train)

max_train = X_train.max()
print(max_train)

range_train = max_train - min_train
print(range_train)

X_train_scaled = (X_train - min_train)/range_train
print(X_train_scaled)

# COMPARING EARLIER TRAIN DATA AND NORMALIZED TRAIN DATA

sns.scatterplot(x = X_train['mean area'], y = X_train['mean smoothness'], hue = y_train)

sns.scatterplot(x = X_train_scaled['mean area'], y = X_train_scaled['mean smoothness'], hue = y_train)

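# TEST DATA
# Note: the test set is scaled here with its own min and range. The more common practice is to reuse min_train and range_train, so that no test-set statistics enter the preprocessing.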

min_test = X_test.min()

max_test = X_test.max()

range_test = max_test - min_test

X_test_scaled = (X_test - min_test)/range_test
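
As an alternative to the manual min-max arithmetic, scikit-learn's `MinMaxScaler` can learn the minimum and range from the training split and apply the same transformation to the test split. This is a minimal sketch, not the code that produced the results in this post; it reloads the data so it runs on its own, and the `_alt` variable names are only for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X_alt, y_alt = load_breast_cancer(return_X_y=True)
X_train_alt, X_test_alt, y_train_alt, y_test_alt = train_test_split(
    X_alt, y_alt, test_size=0.20, random_state=5
)

scaler = MinMaxScaler()
# Learn per-feature min and range from the training split only.
X_train_scaled_alt = scaler.fit_transform(X_train_alt)
# Reuse the training statistics on the test split.
X_test_scaled_alt = scaler.transform(X_test_alt)

print(X_train_scaled_alt.min(), X_train_scaled_alt.max())
print(X_test_scaled_alt.shape)
```

With this approach the scaled test values can fall slightly outside the 0 to 1 interval, which is expected because the test split is measured against the training range.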

# TRAIN AND PREDICT

from sklearn.svm import SVC 
from sklearn.metrics import classification_report, confusion_matrix

svc_model = SVC()

svc_model.fit(X_train_scaled, y_train)

y_predict = svc_model.predict(X_test_scaled)

cm = confusion_matrix(y_test, y_predict)

sns.heatmap(cm,annot=True,fmt="d")

print(classification_report(y_test,y_predict))
             precision    recall  f1-score   support

        0.0       0.76      0.97      0.85        39
        1.0       0.98      0.84      0.91        75

avg / total       0.91      0.89      0.89       114
# If the results improve, normalization has helped. In this case the weighted-average precision rises from 0.38 to 0.91 after normalization.
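
The scaling and training steps can also be bundled into a single estimator, so the scaler is always fitted on whatever data the model is trained on. The sketch below assumes scikit-learn's `make_pipeline` helper with default `SVC` settings; it is one possible arrangement rather than the method used above, and its score will depend on the installed scikit-learn version.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.20, random_state=5
)

# The pipeline fits the scaler on the training data, then trains the SVC on the scaled values.
pipe = make_pipeline(MinMaxScaler(), SVC())
pipe.fit(X_tr, y_tr)

# score() scales X_te with the training statistics and returns the accuracy.
accuracy = pipe.score(X_te, y_te)
print(accuracy)
```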
