One of the advanced algorithms in the field of computer science is the Genetic Algorithm, inspired by the natural process of passing genes from one generation to the next. It is a heuristic technique generally used for optimization and can be applied in many areas, e.g. solving NP-hard problems, game theory, and code-breaking.

Another trending and useful modern-day technology is Machine Learning, which is having a large impact on mankind. It involves learning and finding patterns in large amounts of data for classification and regression.

But can we somehow involve a genetic algorithm in machine learning? How will it affect the results? Let’s find out.


Here are the quick steps for how the genetic algorithm works (a toy sketch of this loop follows the list):

  1. Initial Population– Initialize the population randomly based on the data.
  2. Fitness function– Compute the fitness value of each chromosome (a chromosome is a set of parameters that defines a proposed solution to the problem the genetic algorithm is trying to solve).
  3. Selection– Select the best-fitted chromosomes as parents to pass their genes on to the next generation and create a new population.
  4. Crossover– Create new chromosomes by combining pairs of parents and add them to the new population set.
  5. Mutation– Perform mutation, which alters one or more gene values in chromosomes of the newly generated population. Mutation helps maintain diversity in the population. The resulting population is used in the next generation.
  6. Repeat steps 2-5 for each generation.
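
To make these steps concrete, here is a minimal, self-contained toy sketch of the loop. It maximizes the number of ones in a random bit string (the classic "OneMax" exercise) and is only an illustration, separate from the feature-selection code later in this article:

import random

POP_SIZE, N_GENES, N_GEN, MUT_RATE = 20, 10, 15, 0.05

def fitness(chrom):
    # fitness = number of ones in the bit string (toy "OneMax" problem)
    return sum(chrom)

# 1. initial population: random bit strings
population = [[random.randint(0, 1) for _ in range(N_GENES)] for _ in range(POP_SIZE)]

for gen in range(N_GEN):
    # 2. compute fitness and 3. select the better half as parents
    population.sort(key=fitness, reverse=True)
    parents = population[:POP_SIZE // 2]

    # 4. crossover: combine two random parents at a random cut point
    children = []
    while len(children) < POP_SIZE - len(parents):
        p1, p2 = random.sample(parents, 2)
        cut = random.randint(1, N_GENES - 1)
        children.append(p1[:cut] + p2[cut:])

    # 5. mutation: flip each gene with a small probability
    for child in children:
        for j in range(N_GENES):
            if random.random() < MUT_RATE:
                child[j] = 1 - child[j]

    # 6. the new population is used in the next generation
    population = parents + children

print("Best fitness:", fitness(max(population, key=fitness)))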

Now, let’s get our hands on the code:

Initially, we will run the logistic regression algorithm on the breast cancer data.

Import libraries

We will import the important Python libraries required for this algorithm.

import numpy as np
import pandas as pd
import random
import matplotlib.pyplot as plt
%matplotlib inline 

Import some other important libraries for the implementation of the machine learning algorithm.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

Data

Import the dataset from the Python library scikit-learn.

#import the breast cancer dataset 
from sklearn.datasets import load_breast_cancer
cancer=load_breast_cancer()
df = pd.DataFrame(cancer['data'],columns=cancer['feature_names'])
label=cancer["target"]

Splitting the dataset into training and testing sets.

#splitting the model into training and testing set
X_train, X_test, y_train, y_test = train_test_split(df, 
                                                    label, test_size=0.30, 
                                                    random_state=101)

Training using the Logistic Regression technique:

#training a logistic regression model
logmodel = LogisticRegression()
logmodel.fit(X_train,y_train)
predictions = logmodel.predict(X_test)
print("Accuracy = "+ str(accuracy_score(y_test,predictions)))
Accuracy = 0.935672514619883

Now, let’s include a genetic algorithm in the process:
Defining all the steps required during the genetic algorithm.

#defining various steps required for the genetic algorithm
def initilization_of_population(size,n_feat):
    # each chromosome is a boolean mask over the n_feat features;
    # roughly 30% of the genes are switched off at random
    population = []
    for i in range(size):
        chromosome = np.ones(n_feat,dtype=bool)   # np.bool is deprecated, plain bool works
        chromosome[:int(0.3*n_feat)]=False
        np.random.shuffle(chromosome)
        population.append(chromosome)
    return population

def fitness_score(population):
    # fitness of a chromosome = test accuracy of the model trained
    # on the feature subset selected by that boolean mask
    scores = []
    for chromosome in population:
        logmodel.fit(X_train.iloc[:,chromosome],y_train)
        predictions = logmodel.predict(X_test.iloc[:,chromosome])
        scores.append(accuracy_score(y_test,predictions))
    scores, population = np.array(scores), np.array(population)
    inds = np.argsort(scores)
    # return scores and chromosomes sorted from best to worst
    return list(scores[inds][::-1]), list(population[inds,:][::-1])

def selection(pop_after_fit,n_parents):
    # keep the n_parents best chromosomes as parents
    population_nextgen = []
    for i in range(n_parents):
        population_nextgen.append(pop_after_fit[i])
    return population_nextgen

def crossover(pop_after_sel):
    # keep the parents and append one child per parent; copies are taken so
    # that modifying a child does not also modify its parent in place
    population_nextgen = list(pop_after_sel)
    for i in range(len(pop_after_sel)):
        child = pop_after_sel[i].copy()
        child[3:7] = pop_after_sel[(i+1)%len(pop_after_sel)][3:7]
        population_nextgen.append(child)
    return population_nextgen

def mutation(pop_after_cross,mutation_rate):
    # flip each gene with probability mutation_rate; work on a copy so the
    # chromosomes from the previous step are left untouched
    population_nextgen = []
    for i in range(0,len(pop_after_cross)):
        chromosome = pop_after_cross[i].copy()
        for j in range(len(chromosome)):
            if random.random() < mutation_rate:
                chromosome[j] = not chromosome[j]
        population_nextgen.append(chromosome)
    return population_nextgen

def generations(size,n_feat,n_parents,mutation_rate,n_gen,X_train,
                                   X_test, y_train, y_test):
    # main genetic-algorithm loop; fitness_score uses the globally
    # defined logmodel and train/test split
    best_chromo = []
    best_score = []
    population_nextgen = initilization_of_population(size,n_feat)
    for i in range(n_gen):
        scores, pop_after_fit = fitness_score(population_nextgen)
        print(scores[:2])
        pop_after_sel = selection(pop_after_fit,n_parents)
        pop_after_cross = crossover(pop_after_sel)
        population_nextgen = mutation(pop_after_cross,mutation_rate)
        # keep the best chromosome and score of this generation
        best_chromo.append(pop_after_fit[0])
        best_score.append(scores[0])
    return best_chromo,best_score

Training the model and predicting the accuracy using the genetic algorithm with the logistic regression technique.

chromo,score=generations(size=200,n_feat=30,n_parents=100,mutation_rate=0.10,
                     n_gen=38,X_train=X_train,X_test=X_test,y_train=y_train,y_test=y_test)
logmodel.fit(X_train.iloc[:,chromo[-1]],y_train)
predictions = logmodel.predict(X_test.iloc[:,chromo[-1]])
print("Accuracy score after genetic algorithm is= "+str(accuracy_score(y_test,predictions)))

Accuracy score after genetic algorithm is= 0.9532163742690059

Here, in the code above, we saw how the accuracy improved after applying the genetic algorithm with logistic regression for better feature selection.
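
If you also want to see which features the genetic algorithm actually kept, the boolean chromosome can be used to index the column names. Here is a small illustrative snippet, assuming the df, chromo and score variables from the run above are still in scope:

#inspecting the feature subset chosen by the best chromosome of the last generation
best_mask = chromo[-1]
print("Number of selected features: " + str(int(np.sum(best_mask))))
print("Selected features: " + str(list(df.columns[best_mask])))
print("Best accuracy per generation: " + str(score))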