Breast Cancer Tumor Machine Learning Prediction Using Scikit Learn

Breast cancer tumor prediction with machine learning. Scikit-learn is used for training, evaluation, and prediction; Seaborn and Matplotlib are used for visualization.

Run this in a Jupyter Notebook.

# LIBRARY IMPORTS

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# DATASET IMPORT

from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

#VIEW DATA

cancer

cancer.keys()

print(cancer['DESCR'])

print(cancer['target_names'])

print(cancer['target'])

print(cancer['feature_names'])

print(cancer['data'])

cancer['data'].shape

df_cancer = pd.DataFrame(np.c_[cancer['data'], cancer['target']], columns = np.append(cancer['feature_names'], ['target']))

df_cancer.head()

df_cancer.tail()

# VISUALIZE DATA

# SEABORN PAIRPLOT

sns.pairplot(df_cancer, hue = 'target', vars = ['mean radius', 'mean texture', 'mean area', 'mean perimeter', 'mean smoothness'] )

# SEABORN COUNTPLOT

sns.countplot(x = 'target', data = df_cancer)

# SEABORN SCATTERPLOT

sns.scatterplot(x = 'mean area', y = 'mean smoothness', hue = 'target', data = df_cancer)

# SEABORN LMPLOT

sns.lmplot(x = 'mean area', y = 'mean smoothness', hue = 'target', data = df_cancer, fit_reg = False)

# SEABORN HEATMAP

plt.figure(figsize=(20,10)) 
sns.heatmap(df_cancer.corr(), annot=True) 

# MODEL TRAINING

# DEFINING X and y

X = df_cancer.drop(['target'],axis=1)

X.head()

y = df_cancer['target']

y.head()

# TRAIN TEST SPLIT (80% TRAIN, 20% TEST)

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state=5)

X_train.shape

X_test.shape

y_train.shape

y_test.shape

# IMPORT MODELS

from sklearn.svm import SVC 
from sklearn.metrics import classification_report, confusion_matrix

# TRAIN ON SVC MODEL

svc_model = SVC()
svc_model.fit(X_train, y_train)

# EVALUATING

# PREDICT

y_predict = svc_model.predict(X_test)

cm = confusion_matrix(y_test, y_predict)

sns.heatmap(cm, annot=True)

print(classification_report(y_test, y_predict))
             precision    recall  f1-score   support

        0.0       0.00      0.00      0.00        44
        1.0       0.61      1.00      0.76        70

avg / total       0.38      0.61      0.47       114
# If the result is far off, the data needs to be normalized. In this case the weighted-average precision is only 0.38, and the model predicts class 1 for every sample (recall is 0.00 for class 0 and 1.00 for class 1).
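
The numbers in the report above can be reproduced by hand. The sketch below assumes the confusion matrix implied by the report (recall 0.00 for class 0 and 1.00 for class 1 means all 114 test samples were predicted as class 1) and recomputes precision, recall, and the weighted average from it. The variable names are illustrative only.

```python
import numpy as np

# Confusion matrix implied by the report (rows = true class, columns = predicted class):
# all 44 class-0 samples and all 70 class-1 samples were predicted as class 1.
cm = np.array([[0, 44],
               [0, 70]])

support = cm.sum(axis=1)      # true samples per class: [44, 70]
predicted = cm.sum(axis=0)    # predictions per class: [0, 114]
correct = np.diag(cm)         # correctly classified per class: [0, 70]

# Precision is undefined for a class that is never predicted; it is set to 0 here,
# which matches the 0.00 shown in the report.
precision = np.divide(correct, predicted, out=np.zeros(2), where=predicted > 0)
recall = correct / support
weighted_precision = np.average(precision, weights=support)

print(precision.round(2), recall.round(2), round(weighted_precision, 2))
```

The weighted-average precision comes out at about 0.38, matching the `avg / total` row.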

# IMPROVING MODEL

X_train.head()

# NORMALIZATION

# TRAIN DATA

min_train = X_train.min()
print(min_train)

max_train = X_train.max()
print(max_train)

range_train = max_train - min_train
print(range_train)

X_train_scaled = (X_train - min_train)/range_train
print(X_train_scaled)
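
To see what the min-max formula does, here is a small sketch on a hypothetical two-column table (the `toy` DataFrame and its values are made up for illustration, not taken from the dataset). Each column is shifted by its minimum and divided by its range, so every column ends up between 0 and 1 regardless of its original scale.

```python
import pandas as pd

# Hypothetical toy data: two columns on very different scales
toy = pd.DataFrame({
    "area": [200.0, 600.0, 1000.0],
    "smoothness": [0.05, 0.10, 0.15],
})

toy_min = toy.min()                 # per-column minimum
toy_range = toy.max() - toy_min     # per-column range
toy_scaled = (toy - toy_min) / toy_range

print(toy_scaled)
```

After scaling, the smallest value in each column maps to 0, the largest to 1, and the middle row lands at roughly 0.5 in both columns.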

# COMPARING EARLIER TRAIN DATA AND NORMALIZED TRAIN DATA

sns.scatterplot(x = X_train['mean area'], y = X_train['mean smoothness'], hue = y_train)

sns.scatterplot(x = X_train_scaled['mean area'], y = X_train_scaled['mean smoothness'], hue = y_train)

# TEST DATA

min_test = X_test.min()

max_test = X_test.max()

range_test = max_test - min_test

X_test_scaled = (X_test - min_test)/range_test
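
Note that the test set above is scaled with its own minimum and range. A common alternative, shown here as a sketch rather than as the method used in this post, is to learn the minimum and range from the training data only and reuse them for the test data, so both sets go through exactly the same transformation. Scikit-learn's `MinMaxScaler` does this. The results reported below were not produced with this variant.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Same dataset and split parameters as in this post
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=5)

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns min and range from the training rows only
X_test_scaled = scaler.transform(X_test)        # reuses the training min and range

print(X_train_scaled.min(), X_train_scaled.max())
print(X_test_scaled.min(), X_test_scaled.max())
```

With this approach the scaled test values can fall slightly outside the 0 to 1 interval, which is expected: the test rows are measured against the training range.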

# TRAIN AND PREDICT

from sklearn.svm import SVC 
from sklearn.metrics import classification_report, confusion_matrix

svc_model = SVC()

svc_model.fit(X_train_scaled, y_train)

y_predict = svc_model.predict(X_test_scaled)

cm = confusion_matrix(y_test, y_predict)

sns.heatmap(cm,annot=True,fmt="d")

print(classification_report(y_test,y_predict))
             precision    recall  f1-score   support

        0.0       0.76      0.97      0.85        39
        1.0       0.98      0.84      0.91        75

avg / total       0.91      0.89      0.89       114
# If the result has improved, normalization helped. In this case the weighted-average precision rose from 0.38 to 0.91 after normalization.
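
As a possible follow-up, scaling and the classifier can be combined into a single scikit-learn pipeline and checked with cross-validation instead of one train/test split. This is a minimal sketch under assumptions not made in the post: 5-fold cross-validation, default `SVC` settings, and accuracy as the score. The exact numbers depend on the installed scikit-learn version.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# The scaler is refit on the training part of every fold, so no test rows leak into scaling
model = make_pipeline(MinMaxScaler(), SVC())
scores = cross_val_score(model, X, y, cv=5)  # accuracy per fold

print(scores)
print(scores.mean())
```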
