Description of Problem¶
Suppose you have a simple dataset with 10 features, where each feature corresponds to one of 10 sensors. Each sensor produces a floating-point signal value between 0 and 1, and each sample is labelled either -1 or 1. The sensors produce differently distributed signal values, and the dataset contains 400 samples (rows).
You can download the dataset at the link below
The goal of this problem is to rank the sensors by their predictive power and to plot the ranked sensors in descending order. There are many ways to score feature importance; here we use information gain (IG) to score the importance of each sensor and rank them.
Load data¶
First load the dataset
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import scipy.stats as st
from math import log
from collections import OrderedDict
from collections import Counter
from IPython.display import Image
# set tick label size for all figures in this notebook
label_size = 13
mpl.rcParams['xtick.labelsize'] = label_size
mpl.rcParams['ytick.labelsize'] = label_size
Start by creating a SensorReadings class, whose __init__ method loads the task data and stores all sensor readings in a dataframe.
class SensorReadings:
    def __init__(self, filename):
        # read task data
        self.df = pd.read_csv(filename)
        # sensor name columns (skip 'sample index' and 'class_label')
        self.sensor_index = self.df.columns[2:]
filename = os.path.join(os.getcwd(),'sensors_dataset.csv')
sensorData = SensorReadings(filename=filename)
display(sensorData.df)
| | sample index | class_label | sensor0 | sensor1 | sensor2 | sensor3 | sensor4 | sensor5 | sensor6 | sensor7 | sensor8 | sensor9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | sample0 | 1.0 | 0.834251 | 0.726081 | 0.535904 | 0.214896 | 0.873788 | 0.767605 | 0.111308 | 0.557526 | 0.599650 | 0.665569 |
1 | sample1 | 1.0 | 0.804059 | 0.253135 | 0.869867 | 0.334285 | 0.604075 | 0.494045 | 0.833575 | 0.194190 | 0.014966 | 0.802918 |
2 | sample2 | 1.0 | 0.694404 | 0.595777 | 0.581294 | 0.799003 | 0.762857 | 0.651393 | 0.075905 | 0.007186 | 0.659633 | 0.831009 |
3 | sample3 | 1.0 | 0.783690 | 0.038780 | 0.285043 | 0.627305 | 0.800620 | 0.486340 | 0.827723 | 0.339807 | 0.731343 | 0.892359 |
4 | sample4 | 1.0 | 0.788835 | 0.174433 | 0.348770 | 0.938244 | 0.692065 | 0.377620 | 0.183760 | 0.616805 | 0.492899 | 0.930969 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
395 | sample395 | -1.0 | 0.433150 | 0.816109 | 0.452945 | 0.065469 | 0.237093 | 0.719321 | 0.577969 | 0.085598 | 0.357115 | 0.070060 |
396 | sample396 | -1.0 | 0.339346 | 0.914610 | 0.097827 | 0.077522 | 0.484140 | 0.690568 | 0.420054 | 0.482845 | 0.395148 | 0.438641 |
397 | sample397 | -1.0 | 0.320118 | 0.444951 | 0.401896 | 0.970993 | 0.960264 | 0.138345 | 0.354927 | 0.230749 | 0.204612 | 0.558889 |
398 | sample398 | -1.0 | 0.059132 | 0.337426 | 0.772847 | 0.099038 | 0.966042 | 0.975086 | 0.532891 | 0.035839 | 0.258723 | 0.709958 |
399 | sample399 | -1.0 | 0.379778 | 0.460256 | 0.229257 | 0.768975 | 0.321882 | 0.118572 | 0.448964 | 0.546324 | 0.363127 | 0.176632 |
400 rows × 12 columns
The list of sensors is shown below:
sensorData.sensor_index
Index(['sensor0', 'sensor1', 'sensor2', 'sensor3', 'sensor4', 'sensor5', 'sensor6', 'sensor7', 'sensor8', 'sensor9'], dtype='object')
Visualizing sensor signals¶
To visualize the individual sensor signals, a helper function is written. This function is bound to the SensorReadings class.
def plot_sensor_readings(self):
    fig = plt.figure(figsize=(10, 9), constrained_layout=False)
    n_row, n_col = len(self.sensor_index)//2, 2
    axs = fig.subplots(n_row, n_col)
    plt.suptitle("individual sensors readings", fontsize=20)
    color = 'tab:blue'
    color_dualAxs = 'tab:red'
    idx_pair = 0
    for r in range(n_row):
        for c in range(n_col):
            axs[r, c].set_title(self.sensor_index[idx_pair],
                                y=1.3, pad=-14, c='k', fontsize=15)
            axs[r, c].plot(self.df[self.sensor_index[idx_pair]],
                           'o', markersize=4, color=color)
            axs[r, c].set_ylabel('sensor readings', color=color, fontsize=10)
            axs[r, c].tick_params(axis='y', labelcolor=color)
            axs[r, c].set_yticks([0, 0.5, 1])
            axs_dual = axs[r, c].twinx()
            axs_dual.set_ylabel('class label', color=color_dualAxs, fontsize=10)
            axs_dual.plot(self.df.class_label, color=color_dualAxs, linewidth=3)
            axs_dual.tick_params(axis='y', labelcolor=color_dualAxs)
            axs_dual.set_yticks([-1, 1])
            if r < n_row-1:
                axs[r, c].set_xticklabels([])
            idx_pair += 1
    axs[n_row-1, 0].set_xlabel('sample index', fontsize=12)
    axs[n_row-1, 1].set_xlabel('sample index', fontsize=12)
    plt.tight_layout()
Bind this helper function to the SensorReadings class:
SensorReadings.plot_sensor_readings = plot_sensor_readings
sensorData.plot_sensor_readings()
As seen from all of the sensor signals,
- the first 200 samples belong to class label 1
- the second 200 samples belong to class label -1
To be a good classifier, a sensor should generate reading values whose distributions are reasonably well separated between the two classes.
- Signals from sensors 0, 2, 4, 6, and 8 appear to separate the classes fairly well
- Other sensors, such as 1, 7, and 9, do not appear to separate them well
Split dataset into features and target¶
Separate class labels from sensor readings:
- y for target labels
- X for input features
y = sensorData.df['class_label'].values
X = sensorData.df[sensorData.sensor_index].values
print(f'Shape of sensor readings {X.shape}')
print(f'Shape of class labels {y.shape}')
Shape of sensor readings (400, 10)
Shape of class labels (400,)
To understand the given data further, we can apply principal component analysis (PCA).
PCA is basically a dimensionality-reduction method that projects the data onto the plane of the two major axes that best describe its distribution.
We can then check how well the projected data is grouped into the two classes (1 and -1).
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(X)
df_PC = pd.DataFrame(data=principalComponents, columns = ['PC1', 'PC2'])
df_PC['true_label'] = sensorData.df['class_label']
display(df_PC)
| | PC1 | PC2 | true_label |
|---|---|---|---|
0 | -0.307421 | -0.210411 | 1.0 |
1 | 0.148512 | 0.210618 | 1.0 |
2 | -0.438685 | -0.444488 | 1.0 |
3 | -0.492798 | 0.568961 | 1.0 |
4 | -0.473759 | 0.061023 | 1.0 |
... | ... | ... | ... |
395 | 0.482446 | -0.194064 | -1.0 |
396 | 0.357519 | 0.018492 | -1.0 |
397 | -0.100119 | -0.143667 | -1.0 |
398 | 0.139063 | -0.068952 | -1.0 |
399 | 0.185756 | -0.013335 | -1.0 |
400 rows × 3 columns
The df_PC dataframe contains the two principal components along with the corresponding true labels for the 400 samples.
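To see how much of the total variance the two principal components actually capture, a quick optional check of the explained variance ratio can be added (the exact numbers are not shown in the original output):
# fraction of the total variance captured by PC1 and PC2
print(pca.explained_variance_ratio_)
print("total explained variance:", pca.explained_variance_ratio_.sum())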
k-means clustering¶
We can apply the k-means clustering method to group the projected data into two classes.
from sklearn.cluster import KMeans
# n_clusters is 2: 1 and -1
kmeans = KMeans(n_clusters=2, random_state=0).fit(df_PC.iloc[:,:2])
# replace all label 0s to -1
kmeans.labels_[kmeans.labels_==0] = -1
# add kmeans-labels to df_PC
df_PC['labels_by_kmeans'] = kmeans.labels_
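One caveat: k-means assigns arbitrary cluster ids (0 and 1), so mapping cluster 0 to label -1 matches the true labels here only because of the fixed random_state. A more robust sketch (an optional addition, not part of the original code) flips the mapping whenever the agreement with the true labels falls below 50%:
# flip the cluster-to-class mapping if it disagrees with the true labels more often than not
agreement = np.mean(df_PC['labels_by_kmeans'] == df_PC['true_label'])
if agreement < 0.5:
    df_PC['labels_by_kmeans'] = -df_PC['labels_by_kmeans']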
Visualizing clustering results¶
fig = plt.figure(figsize=(12, 5), constrained_layout=False)
axs = fig.subplots(1,2)
class1_true = df_PC[df_PC['true_label'] == 1]
class2_true = df_PC[df_PC['true_label'] == -1]
marker_size = 12
axs[0].scatter(class1_true['PC1'], class1_true['PC2'], c='tab:red', s=marker_size, label='label: 1')
axs[0].scatter(class2_true['PC1'], class2_true['PC2'], c='tab:blue', s=marker_size, label='label: -1')
axs[0].set_title("true labels", fontsize=15)
_= axs[0].set_xlabel('PC1', fontsize=13)
_= axs[0].set_ylabel('PC2', fontsize=13)
_= axs[0].legend(fontsize=14)
axs[1].set_title("labels predicted by k-means algorithm", fontsize=15)
class1_kmeans = df_PC[df_PC['labels_by_kmeans'] == 1]
class2_kmeans = df_PC[df_PC['labels_by_kmeans'] == -1]
marker_size = 12
axs[1].scatter(class1_kmeans['PC1'], class1_kmeans['PC2'], c='tab:red', s=marker_size, label='label: 1')
axs[1].scatter(class2_kmeans['PC1'], class2_kmeans['PC2'], c='tab:blue', s=marker_size, label='label: -1')
axs[1].scatter(kmeans.cluster_centers_[0][0],kmeans.cluster_centers_[0][1], c='k',
s=100, marker='^', label='centroids')
axs[1].scatter(kmeans.cluster_centers_[1][0],kmeans.cluster_centers_[1][1], c='k',
marker='^',s=100)
_= axs[1].set_xlabel('PC1', fontsize=13)
_= axs[1].set_ylabel('PC2', fontsize=13)
_= axs[1].legend(fontsize=14,bbox_to_anchor=(1., 1), loc='upper left',)
The accuracy of the label prediction by the k-means method is fairly high:
acc = np.count_nonzero(df_PC['true_label'] == df_PC['labels_by_kmeans'])/len(sensorData.df)
print("accuracy is {}".format(acc))
accuracy is 0.9125
As the PCA and k-means clustering show, the sensor readings can be grouped into two classes quite successfully by projecting them onto the two major axes.
Now we can move on to ranking the sensors.
Ranking sensors¶
Helper functions¶
First prepare helper functions.
# This function plots obtained ranking scores in descending order.
def plotFeatureScores(scoring_method, y_label):
    scores = rank[scoring_method]
    keys, values = scores.keys(), scores.values()
    plt.figure(figsize=(13, 4))
    plt.title(scoring_method, y=0.9, pad=-14, fontsize=15)
    plt.bar(keys, values, color='green')
    plt.ylabel(y_label, fontsize=15)
    plt.xticks(rotation=30)
    plt.grid(color='grey', linestyle='-', linewidth=0.5, axis='y')
    plt.show()

# sort sensors by importance score in descending order
def sortScores(scores):
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
Create a dictionary variable to store ranking scores from all considered methods in this task.
rank = {}
Store the sensor names in a feature_names list:
feature_names = sensorData.df.columns[2:]
for f in feature_names:
    print(f)
sensor0 sensor1 sensor2 sensor3 sensor4 sensor5 sensor6 sensor7 sensor8 sensor9
Information Gain¶
We calculate information gain (IG) to score the importance of each sensor. IG evaluates how much information is obtained when the class labels are classified by an individual feature rather than by all features. IG is therefore a function of the feature, and the most important feature has the highest IG score.
IG is defined as the difference in Shannon entropy before and after splitting by a feature:
$$ IG(\mathrm{feature}) = H_{\mathrm{before}} - H_{\mathrm{split \, by \, feature}} $$
where $H$ is the Shannon entropy given by
$$ H = -\sum_{i}^{C} p_{i}\log_{\mathrm{base}} p_{i} $$
where $C$ denotes the number of classes and $p_{i}$ is the relative frequency of class $i$. The base of the logarithm is set to the number of classes; for a binary classification problem such as this one, the base is 2.
A Warm-up Example
Before getting into the sensor-ranking problem, and for a better understanding of IG scoring, let's consider the very simple binary classification problem in the figure below.
figure_path = os.path.join(os.getcwd(),'figures')
display(Image(filename=figure_path+'/binaryClassification.png', width=600))
Suppose we have data consisting of five o's and five x's. Their distribution is fixed, and Figure A shows the data before classification.
- case 0: before classification (Figure A)
Now we consider three classification cases and calculate the IG for each:
- case 1: after classified by $x_1$ (Figure B)
- case 2: after classified by $x_2$ (Figure C)
- case 3: after classified by $x_3$ (Figure D)
To obtain the IG for each classification, $H_{before}$ (for case 0) and $H_{after}$ (for cases 1, 2, and 3) need to be calculated separately.
Figure A: entropy before classification: $H_{before}$
$p_{\mathrm{o}}$ = $p_{\mathrm{x}}$ = 1/2
Thus, $$ \begin{align} H_{before} &= -p(\mathrm{o})\log_2(p(\mathrm{o})) -p(\mathrm{x})\log_2(p(\mathrm{x})) \nonumber\\ &= -1/2\log_2(1/2) -1/2\log_2(1/2) \nonumber\\ &= 1\nonumber \end{align}$$
This is the maximum entropy because the impurity is greatest.
Figure B: When classified by $x_1$
In the left partition (four o's, no x's),
$H_{left}=0$
In the right partition (one o, five x's),
$p(\mathrm{x}) = 5/6, \; p(\mathrm{o}) = 1/6$
$$ H_{right} = -\frac{5}{6}\log_2(5/6) - \frac{1}{6}\log_2(1/6) = 0.65 $$
$H_{after}$ is the weighted sum over the two partitions:
$$ \begin{align} H_{after} &= (4/10)H_{left} + (6/10)H_{right} \nonumber\\ &= (4/10)\times 0 + (6/10)\times 0.65 \nonumber\\ &= 0.39 \nonumber \end{align}$$
Thus,
$$IG(x_1) = H_{before} - H_{after} = 0.61$$
Figure C: When classified by $x_2$
Both partitions are pure, so $H_{left}=H_{right}=0$,
thus $H_{after}=0$ and
$$IG(x_2) = 1,$$
meaning that the information gain is maximized by $x_2$.
Figure D: When classified by $x_3$
$H_{left}=1$ and $H_{right}=0$,
thus $H_{after}=1$ and
$$IG(x_3) = 0,$$
meaning that there is no information gain by $x_3$.
The results are
$$ \begin{align} IG(x_1) &= 0.61 \nonumber\\ IG(x_2) &= 1 \nonumber\\ IG(x_3) &= 0 \nonumber\\ \end{align} $$
As expected, $IG(x_2)$ is the highest because $x_2$ separates the o's and x's perfectly. So $x_2$ is the most important feature.
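As a quick sanity check, the warm-up numbers can be reproduced with scipy.stats.entropy (already imported as st), using the label counts read off the figures:
# H_before for five o's and five x's
H_before = st.entropy([5, 5], base=2)                                    # 1.0
# x1: left partition (4 o, 0 x), right partition (1 o, 5 x)
H_after_x1 = (4/10)*st.entropy([4, 0], base=2) + (6/10)*st.entropy([1, 5], base=2)
# x2: both partitions are pure, so H_after is zero
H_after_x2 = 0.0
# x3: H_after equals 1, as derived above
H_after_x3 = 1.0
print(round(H_before - H_after_x1, 2))   # 0.61
print(round(H_before - H_after_x2, 2))   # 1.0
print(round(H_before - H_after_x3, 2))   # 0.0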
Ranking Sensors Problem¶
Now let's solve the sensor-ranking problem. As in the example above, the plan is to calculate the IG for each sensor. This evaluates the relative importance of the individual sensors, by which the sensors are then ranked in terms of predictive power.
One thing to note is that the sensor signals are continuous values, so they need to be discretized. This is done by binning the values and assigning bin labels. The target labels, on the other hand, are already categorical and do not need to be discretized.
A minimal sketch of the binning step is shown next, and the detailed implementation follows after that.
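As a minimal illustration of the binning step (using pandas.cut for brevity; the InformationGain class below implements its own equivalent loop), the readings of one sensor could be discretized like this:
# discretize sensor0 into 5 equal-width bins between 0 and 1
# (note: pd.cut uses right-closed intervals, while the class below uses left-closed ones,
#  so counts at bin edges can differ slightly)
edges = np.linspace(0, 1, 6)
binned = pd.cut(sensorData.df['sensor0'], bins=edges, labels=[f'bin{i}' for i in range(5)])
print(binned.value_counts().sort_index())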
Library for Shannon Entropy
To calculate the Shannon entropy, scipy.stats.entropy (imported above as st) is used.
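Note that scipy.stats.entropy accepts raw counts and normalizes them to probabilities internally, so label counts can be passed directly:
# counts [200, 200] are normalized to probabilities [0.5, 0.5] internally
print(st.entropy([200, 200], base=2))   # 1.0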
Coding
An InformationGain class is written as a derived class of SensorReadings. This InformationGain class contains the entire IG calculation.
The class takes two inputs:
- the name of the data file (passed on to SensorReadings)
- the number of bins for discretizing the sensor-reading values
Let's go through the following code.
class InformationGain(SensorReadings):
    def __init__(self, _n_bins, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # total number of samples (400 for the given data)
        self.n_sample = len(self.df)
        # member list containing the sensor names sensor0 to sensor9
        self.sensor_list = self.df.columns[2:]
        # number of bin edges (n_bins edges define n_bins-1 bins)
        self.n_bins = _n_bins
        # evenly spaced bin edges between 0 and 1
        self.bins = np.linspace(0, 1, self.n_bins)
        # dictionary to store the IG scores of all sensors
        self.IG = {}

    def binned_label_collector(self, sensor_name):
        # collect the class labels of the readings that fall into each bin
        self.binCollector = {'bin'+str(i): [] for i in range(self.n_bins-1)}
        for val, label in zip(self.df[sensor_name], self.df['class_label']):
            for i in range(self.n_bins-1):
                if val >= self.bins[i] and val < self.bins[i+1]:
                    self.binCollector['bin'+str(i)].append(label)

    def count_labels(self, d_bin):
        # count the occurrences of each label within one bin
        self.count_label = {'1': 0, '-1': 0}
        for l in d_bin:
            if l == 1:
                self.count_label['1'] += 1
            else:
                self.count_label['-1'] += 1

    def cal_H_before(self):
        # get the number of samples in each class
        labelCount = list(Counter(self.df['class_label']).values())
        # this should be [200, 200]
        # set the base of the logarithm to the number of classes
        # (-1 and 1 for this task, so base=2)
        self.base = len(labelCount)
        # calculate H_before
        self.H_before = st.entropy(labelCount, base=self.base)

    def cal_IG(self):
        # obtain H_before first
        self.cal_H_before()
        # obtain H_after and IG by running over all sensors
        for sensor_name in self.sensor_list:
            self.binned_label_collector(sensor_name)
            # for the current sensor, H_after is obtained by running over all bins
            H_after = 0
            for key, values in self.binCollector.items():
                self.count_labels(values)
                weighted_samples = (self.count_label['1'] +
                                    self.count_label['-1'])/self.n_sample
                H_binned = st.entropy([self.count_label['1'], self.count_label['-1']], base=self.base)
                # weighted sum of each bin's entropy gives H_after
                H_after += H_binned*weighted_samples
            # store the IG score of the current sensor
            self.IG[sensor_name] = self.H_before - H_after
        # print the IG scores of all sensors
        for sensor_name, IG_score in self.IG.items():
            print(sensor_name+": ", IG_score)
# create instances with different numbers of bins: 6 edges give 5 bins, 11 edges give 10 bins
IGForSensors = {}
IGForSensors['bin5'] = InformationGain(_n_bins=6, filename=filename)
IGForSensors['bin10'] = InformationGain(_n_bins=11, filename=filename)
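Note that _n_bins counts bin edges, so passing 6 and 11 gives 5 and 10 bins respectively (hence the dictionary keys 'bin5' and 'bin10'); a quick check of the edges:
# 6 evenly spaced edges between 0 and 1 define 5 bins
print(IGForSensors['bin5'].bins)   # [0.  0.2 0.4 0.6 0.8 1. ]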
Check how it works (for sensor0)¶
Let's take a closer look at how this method works by tracing the IG calculation for one sensor. The following code is written just to illustrate the procedure.
# for sensor0, for example
# use the instance with 5 bins ('bin5')
IGForSensors['bin5'].binned_label_collector('sensor0')
Go through all bins, check the collected populations of labels 1 and -1, and calculate $H_{bin}$ for each bin; $H_{after}$ is then the weighted sum of $H_{bin}$ over all bins.
data = []
total_sample = 0
H_after = 0
for key, ls in IGForSensors['bin5'].binCollector.items():
    counts = list(Counter(ls).values())
    weight = sum(counts)/IGForSensors['bin5'].n_sample
    # entropy on each bin
    H_bin = st.entropy(counts, base=2)
    print(key, dict(Counter(ls)))
    total_sample += sum(counts)
    data.append([key, dict(Counter(ls)), H_bin, weight])
    H_after += H_bin*weight
print("")
# check the total number of samples
if total_sample == IGForSensors['bin5'].n_sample:
    print("All samples {} are collected!".format(total_sample))
bin0 {1.0: 9, -1.0: 44}
bin1 {1.0: 17, -1.0: 69}
bin2 {1.0: 38, -1.0: 54}
bin3 {1.0: 76, -1.0: 18}
bin4 {1.0: 60, -1.0: 15}

All samples 400 are collected!
df_sample = pd.DataFrame(data=data, columns=['bin_index', 'label_counts', 'H_bin', 'weight'])
df_sample = df_sample.set_index('bin_index')
display(df_sample)
print("H_after = {}".format(H_after))
print("thus, IG({}) = {}".format('sensor0',1- H_after))
| bin_index | label_counts | H_bin | weight |
|---|---|---|---|
bin0 | {1.0: 9, -1.0: 44} | 0.657273 | 0.1325 |
bin1 | {1.0: 17, -1.0: 69} | 0.717252 | 0.2150 |
bin2 | {1.0: 38, -1.0: 54} | 0.978071 | 0.2300 |
bin3 | {1.0: 76, -1.0: 18} | 0.704577 | 0.2350 |
bin4 | {1.0: 60, -1.0: 15} | 0.721928 | 0.1875 |
H_after = 0.7671913210009906
thus, IG(sensor0) = 0.23280867899900937
The IG score for sensor0 has been obtained.
IG scores for all sensors¶
Now we can get the IG scores for all sensors simply by calling the cal_IG() method. The results are sorted in descending order and then plotted.
IGForSensors['bin5'].cal_IG()
# store the results to rank after sorting scores in descending order
rank['InformationGain_bin5'] = sortScores(IGForSensors['bin5'].IG)
# plot the results
plotFeatureScores('InformationGain_bin5', 'IG score')
sensor0: 0.23280867899900937
sensor1: 0.2698171579584686
sensor2: 0.3241527933534172
sensor3: 0.149810629422372
sensor4: 0.2998660346863
sensor5: 0.14143738505578096
sensor6: 0.7101095005699908
sensor7: 0.055102413923765026
sensor8: 0.38730781000504244
sensor9: 0.10794277483157944
The results show that the IG scores are highest for sensor6 and sensor8, while sensor7 and sensor9 have the lowest scores. We can cross-check this against the individual sensor-reading plots: indeed, the readings of sensor6 and sensor8 are reasonably well separated between labels 1 and -1. This shows that information gain can be a good metric for ranking sensors.
For 10 bins
We can also rank the sensors with a different number of bins and check how the results change.
IGForSensors['bin10'].cal_IG()
# store the results to rank after sorting scores in descending order
rank['InformationGain_bin10'] = sortScores(IGForSensors['bin10'].IG)
# plot the results
plotFeatureScores('InformationGain_bin10', 'IG score')
sensor0: 0.29337656712255455
sensor1: 0.3942302164526811
sensor2: 0.379610106830096
sensor3: 0.1821675723350179
sensor4: 0.37082221149085526
sensor5: 0.21009798987609918
sensor6: 0.8330243789701728
sensor7: 0.07956458908518493
sensor8: 0.49326446490948794
sensor9: 0.12487441896598928
Alternative Methods¶
Next, let's try ranking the sensors with other methods. Here we consider methods for which dedicated libraries are available, so that their results can be compared with the IG ranking obtained above.
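As a side note (not added to the rank table), scikit-learn's mutual_info_classif estimates the mutual information between each feature and the target, which is conceptually the same quantity as IG; a minimal cross-check could look like this:
from sklearn.feature_selection import mutual_info_classif

# nearest-neighbour based mutual-information estimate per sensor
# (the scale differs from the binned IG scores above, but the ordering should be comparable)
mi = mutual_info_classif(X, y, random_state=0)
for f, s in sorted(zip(feature_names, mi), key=lambda kv: kv[1], reverse=True):
    print(f, round(s, 4))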
Decision Tree¶
First, a decision tree is considered; its built-in feature importances directly give a ranking.
from sklearn.tree import DecisionTreeRegressor

# fit a single decision tree and use its impurity-based feature importances
tree = DecisionTreeRegressor(random_state=0)
tree.fit(X, y)
importances = tree.feature_importances_
temp = {f: s for f, s in zip(feature_names, importances)}
# store the result in rank
rank['DecisionTree'] = sortScores(temp)
for sensor, val in temp.items():
    print(sensor, round(val, 4))
plotFeatureScores('DecisionTree', 'importance score')
sensor0 0.0563
sensor1 0.0097
sensor2 0.0
sensor3 0.0
sensor4 0.0031
sensor5 0.0
sensor6 0.3432
sensor7 0.0
sensor8 0.5877
sensor9 0.0
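Random Forest¶
Similarly, a random forest regressor is fitted, and its feature importances are stored as another ranking.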
from sklearn.ensemble import RandomForestRegressor

RFR = RandomForestRegressor()
RFR.fit(X, y)
temp = {f: s for f, s in zip(feature_names, RFR.feature_importances_)}
rank['RandomForestRegressor'] = sortScores(temp)
for sensor, val in temp.items():
    print(sensor, round(val, 4))
plotFeatureScores('RandomForestRegressor', 'importance score')
sensor0 0.0354
sensor1 0.017
sensor2 0.0221
sensor3 0.0043
sensor4 0.0629
sensor5 0.0041
sensor6 0.2811
sensor7 0.0059
sensor8 0.5625
sensor9 0.0047
Collect all results¶
Now we can collect and compare the four sensor-ranking results. All ranking results are stored in the dataframe df_rank.
res = []
for method in rank.keys():
    sensor_number = ['sensor '+s[-1] for s in rank[method].keys()]
    res.append(sensor_number)
df_rank = pd.DataFrame(data=np.array(res).T, columns=rank.keys())
display(df_rank)
| | InformationGain_bin5 | InformationGain_bin10 | DecisionTree | RandomForestRegressor |
|---|---|---|---|---|
0 | sensor 6 | sensor 6 | sensor 8 | sensor 8 |
1 | sensor 8 | sensor 8 | sensor 6 | sensor 6 |
2 | sensor 2 | sensor 1 | sensor 0 | sensor 4 |
3 | sensor 4 | sensor 2 | sensor 1 | sensor 0 |
4 | sensor 1 | sensor 4 | sensor 4 | sensor 2 |
5 | sensor 0 | sensor 0 | sensor 2 | sensor 1 |
6 | sensor 3 | sensor 5 | sensor 3 | sensor 7 |
7 | sensor 5 | sensor 3 | sensor 5 | sensor 9 |
8 | sensor 9 | sensor 9 | sensor 7 | sensor 3 |
9 | sensor 7 | sensor 7 | sensor 9 | sensor 5 |
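To quantify how much the four rankings agree with each other, one could compute a rank correlation such as Kendall's tau between the 5-bin IG ranking and each of the other methods (a rough optional sketch using scipy.stats.kendalltau, assuming df_rank as built above):
from scipy.stats import kendalltau

# position of each sensor in each method's ranking (0 = most important)
positions = {m: {s: i for i, s in enumerate(df_rank[m])} for m in df_rank.columns}
sensors = list(df_rank['InformationGain_bin5'])
for method in df_rank.columns[1:]:
    tau, _ = kendalltau([positions['InformationGain_bin5'][s] for s in sensors],
                        [positions[method][s] for s in sensors])
    print(method, round(tau, 3))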
Export the result to a csv file:
df_rank.to_csv('sensor_rank_result.csv', index=True)