Data Analytics with AI

What is Data Analytics with AI?

Let’s say you have some energy-load data, and I ask you to use it to forecast tomorrow’s energy load. What would you do? You would examine the data, do some calculations on it, and then tell me your forecast for tomorrow. This is called human learning.

Human Learning

Now imagine that I give the same data and tools to a machine and let the machine do this work for us. That is called data analytics with AI (Artificial Intelligence).

Data Analytics with AI

Getting from data to decisions involves several stages of data analysis. These stages are presented in the figure below.

Data Analysis (Data to Decisions)
  • Descriptive – Analysis of data to find out what happened in the past.
  • Diagnostic – If something happened in the past, why did it happen?
  • Predictive – Once we know the reason, we predict what will happen in the future.
  • Prescriptive – Based on the prediction, we decide what action to take.
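To make the four stages concrete, the sketch below runs each one on a tiny energy-load dataset. Everything in it is hypothetical: the temperature and load values are made up (chosen so that load = 2*temperature + 30 exactly), and the 26 °C forecast temperature and the 80 kW capacity are assumptions. A straight-line polyfit model stands in for a real forecasting model; it only illustrates the stages and is not the method used later in this article.

```matlab
% Hypothetical data, made up for illustration only:
% temperature (deg C) and energy load (kW) over eight hours.
temperature = [10 12 15 18 21 24 22 19];
energyLoad  = [50 54 60 66 72 78 74 68];

% Descriptive: what happened?
avgLoad  = mean(energyLoad);
peakLoad = max(energyLoad);

% Diagnostic: why did it happen? Check how load moves with temperature.
R = corrcoef(temperature, energyLoad);
r = R(1,2);

% Predictive: what will happen? Fit a straight line, forecast at 26 deg C.
p = polyfit(temperature, energyLoad, 1);
forecast = polyval(p, 26);

% Prescriptive: what should we do? Compare with an assumed capacity.
capacity = 80;   % hypothetical capacity limit in kW
if forecast > capacity
    decision = 'Schedule extra generation';
else
    decision = 'No action needed';
end

fprintf('Average %.2f kW, peak %d kW, r = %.2f, forecast %.1f kW: %s\n', ...
    avgLoad, peakLoad, r, forecast, decision);
```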

MATLAB Code for Data Analytics

The code below creates a predictive model of intensity versus field. Before creating the model, the data needs some pre-processing: identifying missing data, detecting and filling outliers, and smoothing.
The data I used is solar data consisting of intensity and field values.

My data is in an Excel file, as shown below.

Two columns of Intensity and Field

Read the table and plot intensity vs field

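% Read the Excel file into a table (columns: Field, Intensity)
solarData=readtable('solar_heat.xlsx');
field=solarData.Field;
intensity=solarData.Intensity;
% Plot intensity against field
plot(field,intensity)
xlabel('Field'), ylabel('Intensity')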
As the figure above shows, the data is not smooth and it contains outliers.
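readtable turns the header row of the file into the table's variable names, which is why the columns can be accessed as solarData.Field and solarData.Intensity. The sketch below shows this with a small, hypothetical table written to a temporary CSV file (standing in for solar_heat.xlsx so the example runs on its own); readtable reads the header row of an Excel sheet into variable names in the same way.

```matlab
% Build a small table with the same two column names as the article's file.
% The values are hypothetical.
T = table([0.5; 10.2; 20.7], [71.3; 66.8; 58.1], ...
    'VariableNames', {'Field', 'Intensity'});

% Write it to a temporary CSV file and read it back with readtable.
fname = [tempname '.csv'];
writetable(T, fname);
solarData = readtable(fname);
delete(fname);   % clean up the temporary file

% The header row became the variable names, so columns are accessed by name.
field = solarData.Field;
intensity = solarData.Intensity;
disp(solarData.Properties.VariableNames)
```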

Find missing data

The ismissing function returns a logical index that is 1 (true) where a value is missing and 0 elsewhere. Counting the 1s, with either sum or nnz, gives the total number of missing values. In this data, there are no missing values.

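% Logical index: true (1) where a value is missing
idx=ismissing(field);
sum(idx)      % number of missing field values
idx1=ismissing(intensity);
nnz(idx1)     % number of missing intensity values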
Both counts are 0, confirming that no values are missing.
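This data has no gaps, but your own data might. The sketch below uses a hypothetical vector with two NaN entries to show what ismissing returns, and then one possible way to handle gaps: fillmissing with linear interpolation between the neighboring values. Filling missing values is an addition here; the article's data did not need it.

```matlab
% Hypothetical vector with two missing (NaN) values
x = [1.2 NaN 3.1 4.0 NaN 6.2];

idx = ismissing(x);          % logical index: true where x is missing
numMissing = nnz(idx);       % same count as sum(idx)

% One option for real gaps: fill them by linear interpolation
% between the neighboring non-missing values.
xFilled = fillmissing(x, 'linear');
disp(xFilled)
```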

Find outliers

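To get a logical index of the outliers, we use the isoutlier function in MATLAB. By default, it flags values that are more than three scaled median absolute deviations (MAD) away from the median.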

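% Logical index: true (1) where a value is an outlier
idx=isoutlier(field);
nnz(idx)      % number of outliers in field
idx1=isoutlier(intensity);
nnz(idx1)     % number of outliers in intensity
There are 340 outliers in intensity.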
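The default isoutlier rule can be reproduced by hand, which helps show what "outlier" means here. The sketch below, on a hypothetical vector, computes the scaled MAD with the scale factor given in the isoutlier documentation, flags points more than three scaled MADs from the median, and checks that the result matches isoutlier.

```matlab
% Hypothetical data with one obvious outlier
x = [1 2 3 4 5 100];

% Scaled MAD, using the scale factor from the isoutlier documentation
c = -1/(sqrt(2)*erfcinv(3/2));          % approximately 1.4826
scaledMAD = c*median(abs(x - median(x)));

% Flag points more than three scaled MADs away from the median
manualIdx = abs(x - median(x)) > 3*scaledMAD;

builtinIdx = isoutlier(x);
disp([manualIdx; builtinIdx])            % both rows: 0 0 0 0 0 1
```

Because the median and MAD are barely affected by the single large value, the threshold stays tight and only the value 100 is flagged.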

Fill outliers

Here we fill the outliers using linear interpolation, as shown below.

% Fill outliers
[cleanedIntensity,outlierIndices,thresholdLow,thresholdHigh] = ...
    filloutliers(intensity,'linear');

% Visualize results
clf
plot(intensity,'Color',[109 185 226]/255,'DisplayName','Input data')
hold on
plot(cleanedIntensity,'Color',[0 114 189]/255,'LineWidth',1.5,...
    'DisplayName','Cleaned data')

% Plot outliers
plot(find(outlierIndices),intensity(outlierIndices),'x','Color',[64 64 64]/255,...
    'DisplayName','Outliers')
title(['Number of outliers: ' num2str(nnz(outlierIndices))])

% Plot filled outliers
plot(find(outlierIndices),cleanedIntensity(outlierIndices),'.','MarkerSize',12,...
    'Color',[217 83 25]/255,'DisplayName','Filled outliers')

% Plot outlier thresholds
plot([xlim missing xlim],[thresholdLow*[1 1] NaN thresholdHigh*[1 1]],...
    'Color',[145 145 145]/255,'DisplayName','Outlier thresholds')

hold off
legend
clear outlierIndices thresholdLow thresholdHigh
Black crosses mark the outliers, red dots mark the filled value at each outlier, and the cleaned data is saved in cleanedIntensity.
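With the 'linear' method, filloutliers replaces each flagged value by linear interpolation between the neighboring values that are not outliers. The sketch below shows this on a hypothetical vector and repeats the same step by hand with interp1, so the two results can be compared.

```matlab
% Hypothetical data with one outlier at position 4
x = [1 2 3 100 5 6];

% Built-in: detect outliers (default median/MAD rule) and fill them linearly
[filled, tf] = filloutliers(x, 'linear');

% The same idea by hand: interpolate over the points that are not outliers
good = ~tf;
manual = x;
manual(tf) = interp1(find(good), x(good), find(tf), 'linear');

disp(filled)    % 1 2 3 4 5 6
```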

Instead of writing code, you can use the live script capability for data pre-processing. To learn how to use live scripts for data pre-processing, watch my tutorial at the end of this article.

Some tasks you can do automatically using a live script

Smooth Data

To smooth the data, I used the moving mean method ('movmean') with a smoothing factor of 0.25, using field as the sample points.

% Smooth input data
SmoothedIntensity = smoothdata(cleanedIntensity,'movmean','SmoothingFactor',0.25,...
    'SamplePoints',field);

% Visualize results
clf
plot(field,cleanedIntensity,'Color',[109 185 226]/255,...
    'DisplayName','Input data')
hold on
plot(field,SmoothedIntensity,'Color',[0 114 189]/255,'LineWidth',1.5,...
    'DisplayName','Smoothed data')
hold off
legend
The dark blue curve shows the smoothed intensity, and the light blue curve shows the data before smoothing.
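The choice between a moving mean and a moving median matters when the data has spikes. The sketch below applies both with a fixed three-point window to a hypothetical vector containing one spike. smoothdata's 'movmean' and 'movmedian' methods apply the same kind of sliding-window filter; with 'SmoothingFactor', MATLAB chooses the window size for you instead of a fixed 3.

```matlab
% Hypothetical data with a single spike at position 3
x = [1 2 30 4 5];

% Three-point moving windows (the windows shrink at the ends)
meanSmoothed   = movmean(x, 3);     % 1.5 11 12 13 4.5
medianSmoothed = movmedian(x, 3);   % 1.5 2 4 5 4.5
```

With the moving mean, the spike leaks into its neighbors; the moving median largely ignores it. That is one reason to fill outliers before smoothing with a moving mean, as done above.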

To smooth data using the live script capability, watch my tutorial at the end of this article.

Create the Predictive Model

As the graph above shows, the data looks sinusoidal, so we fit it with a sum of three sine terms (the sin3 model) using the non-linear least squares method. The predictive model is saved as fitresult. The confidence bounds are at 95%, and the accuracy of the model can be checked through the goodness of fit, in particular the R-square value. R-square is at most 1 and usually lies between 0 and 1; the closer it is to 1, the better the model fits, and a value of exactly 1 means a perfect fit.
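R-square compares how much variation is left in the residuals with how much variation the data has around its mean: R-square = 1 - SSE/SST. The sketch below writes this as an anonymous function (the name rsquare is hypothetical) and checks it on two made-up prediction vectors: one that reproduces the data exactly and one that always predicts the average.

```matlab
% R-square = 1 - (residual sum of squares) / (total sum of squares)
rsquare = @(y, yHat) 1 - sum((y - yHat).^2) / sum((y - mean(y)).^2);

% Hypothetical measured values and two hypothetical sets of predictions
y        = [2 4 6 8];
perfect  = [2 4 6 8];            % model reproduces every point
meanOnly = mean(y)*ones(1,4);    % model that always predicts the average

r2Perfect  = rsquare(y, perfect);    % 1
r2MeanOnly = rsquare(y, meanOnly);   % 0
```

With the article's variables, rsquare(yData, fitresult(xData)) should agree with gof.rsquare up to rounding, since for an unweighted fit both are 1 - SSE/SST.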

%CREATEFIT(FIELD,SMOOTHEDINTENSITY)
%  Create a fit.
%
%  Data for 'untitled fit 1' fit:
%      X Input : field
%      Y Output: SmoothedIntensity
%  Output:
%      fitresult : a fit object representing the fit.
%      gof : structure with goodness-of fit info.
%
%  See also FIT, CFIT, SFIT.

%  Auto-generated by MATLAB on 14-Dec-2019 14:18:17


%% Fit: 'untitled fit 1'.
[xData, yData] = prepareCurveData( field, SmoothedIntensity );

% Set up fittype and options.
ft = fittype( 'sin3' );
opts = fitoptions( 'Method', 'NonlinearLeastSquares' );
opts.Display = 'Off';
opts.Lower = [-Inf 0 -Inf -Inf 0 -Inf -Inf 0 -Inf];
opts.StartPoint = [70.1799020954987 0.0376991118430775 3.14039065337383 56.6156219595147 0.0251327412287183 1.36238564426427 56.222865322196 0.0125663706143592 -1.22580091288665];

% Fit model to data.
[fitresult, gof] = fit( xData, yData, ft, opts );

% Plot fit with data.
figure( 'Name', 'untitled fit 1' );
h = plot( fitresult, xData, yData );
legend( h, 'SmoothedIntensity vs. field', 'untitled fit 1', 'Location', 'NorthEast', 'Interpreter', 'none' );
% Label axes
xlabel( 'field', 'Interpreter', 'none' );
ylabel( 'SmoothedIntensity', 'Interpreter', 'none' );
grid on
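Goodness of Fit
The fit output lists the estimated regression coefficients (a1, b1, c1, a2, b2, c2, a3, b3, c3) together with their 95% confidence bounds; narrow bounds indicate well-determined coefficients. This is a non-linear regression because the coefficients b and c appear inside the sine functions, so the model is not linear in its coefficients.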
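The sin3 fit type is a sum of three sine terms, y = a1*sin(b1*x + c1) + a2*sin(b2*x + c2) + a3*sin(b3*x + c3). The sketch below writes this model out as an anonymous function (a hypothetical stand-in, not the toolbox's own implementation), with the coefficients in the same order as opts.StartPoint, and shows why the regression is non-linear: doubling an amplitude a doubles the output, but doubling a frequency b does not.

```matlab
% sin3 model: y = a1*sin(b1*x + c1) + a2*sin(b2*x + c2) + a3*sin(b3*x + c3)
% Coefficients are ordered [a1 b1 c1 a2 b2 c2 a3 b3 c3].
sin3 = @(p, x) p(1)*sin(p(2)*x + p(3)) + ...
               p(4)*sin(p(5)*x + p(6)) + ...
               p(7)*sin(p(8)*x + p(9));

x = pi/2;
p = [1 1 0  0 1 0  0 1 0];   % only the first term is active: sin(x)
y1 = sin3(p, x);              % sin(pi/2) = 1

pA = p; pA(1) = 2;            % doubling a1 doubles the output
yA = sin3(pA, x);             % 2

pB = p; pB(2) = 2;            % doubling b1 does not double it
yB = sin3(pB, x);             % sin(pi), i.e. 0 up to rounding
```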

The red curve is the fitted model, and the blue points are the measured data after pre-processing.

If you don’t want to write code to fit the data, you can use cftool, which opens the curve fitting app in the Curve Fitting Toolbox. Watch my tutorial on using cftool at the end of the article.

cftool

Predict on New Data

Use the predictive model to predict the intensity at new field values.

Here we predict 6 intensity values from 6 new field values.

% Evaluate the fitted model at six new field values
output=fitresult([0.0305; 27.7727; 32.4116; 83.6233; 267.0146; 414.2099])
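The fitresult(...) call works because a cfit object returned by fit can be evaluated like a function on new inputs. The self-contained sketch below shows this on made-up data that lie exactly on y = 3x + 2, with a poly1 fit standing in for the article's sin3 model; it also shows that feval gives the same result. It assumes the Curve Fitting Toolbox is installed.

```matlab
% Hypothetical training data that lie exactly on y = 3x + 2
x = (1:5)';
y = 3*x + 2;

% Fit a straight line (requires Curve Fitting Toolbox)
f = fit(x, y, 'poly1');

% A cfit object is evaluated like a function on new inputs
xNew  = [6; 7];
yNew  = f(xNew);          % approximately [20; 23]
yNew2 = feval(f, xNew);   % equivalent call
```

If you also want uncertainty on the predictions, the toolbox's predint function returns prediction bounds for a cfit object, for example predint(fitresult, xNew).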

You can use this code for your own data if you have one independent and one dependent variable. If you have two independent variables and one dependent variable, you can fit a surface instead, also with the Curve Fitting Toolbox. For the next part, click on Using Machine Learning.

A Recorded Webinar on Data Analytics with AI

Click here to watch the recorded Webinar
