What is Data Analytics with AI?
Suppose you have historical energy load data and I ask you to forecast tomorrow's energy load from it. What will you do? You will inspect the data, run some calculations on it, and then give me a forecast for tomorrow. This is called human learning.
Now imagine giving the same data and tools to a machine and letting it do the work for us. That is data analytics with AI (Artificial Intelligence).
Several steps lie between data and decisions: we analyze the data in stages to arrive at a decision. These stages are presented in the figure below.
- Descriptive – Analyze the data to find out what happened in the past.
- Diagnostic – Given what happened, determine why it happened.
- Predictive – Once the reason is known, predict what will happen in the future.
- Prescriptive – Based on the prediction, decide what to do.
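The four stages can be sketched on a toy version of the energy load example. All numbers here are made up for illustration, and the temperature variable, the straight-line model, and the 450 MW capacity limit are assumptions that do not come from the data used later in this article.

```matlab
% Hypothetical data: 14 days of mean temperature (degC) and daily peak load (MW).
temperature = [30 32 35 33 31 29 28 34 36 37 35 33 30 29]';
loadMW      = [410 428 455 440 420 402 395 447 466 474 452 438 412 401]';

% Descriptive: what happened in the past?
avgLoad  = mean(loadMW);
peakLoad = max(loadMW);

% Diagnostic: why did it happen? Check how strongly load follows temperature.
R = corrcoef(temperature, loadMW);
r = R(1,2);                          % correlation coefficient

% Predictive: fit a straight line and forecast for an assumed temperature.
p = polyfit(temperature, loadMW, 1); % p(1) = slope, p(2) = intercept
tomorrowTemp = 34;                   % assumed weather forecast
forecastLoad = polyval(p, tomorrowTemp);

% Prescriptive: turn the forecast into a decision (hypothetical capacity limit).
capacityMW = 450;
if forecastLoad > capacityMW
    decision = 'schedule reserve generation';
else
    decision = 'no action needed';
end
fprintf('Average %.1f MW, peak %d MW, r = %.3f\n', avgLoad, peakLoad, r);
fprintf('Forecast %.1f MW -> %s\n', forecastLoad, decision);
```

A straight line is used only to keep the sketch short; the rest of the article fits a sinusoidal model instead.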
MATLAB Code for Data Analytics
The code below builds a predictive model of intensity versus field. Before building the model, the data needs some pre-processing: identifying missing values, identifying and filling outliers, and smoothing.
The data is solar data with two variables, field and intensity.
The data is stored in an Excel file, as shown below.
Read the table and plot intensity vs field
solarData=readtable('solar_heat.xlsx')
field=solarData.Field;
intensity=solarData.Intensity;
plot(field,intensity)
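If you want to try the workflow without the original spreadsheet, the sketch below writes a small stand-in file and reads it back, checking that the expected columns exist before using them. The values are invented, and a CSV file is used here only as an assumption to keep the sketch independent of spreadsheet support; readtable is called the same way for an .xlsx file.

```matlab
% Hypothetical stand-in for solar_heat.xlsx with the same two column names.
T = table((0:5:25)', [12 30 55 61 40 18]', ...
    'VariableNames', {'Field','Intensity'});
fileName = [tempname '.csv'];        % temporary file, removed at the end
writetable(T, fileName);

% Read the file back and confirm the columns we rely on are present.
solarData = readtable(fileName);
required  = {'Field','Intensity'};
hasAll    = all(ismember(required, solarData.Properties.VariableNames));
assert(hasAll, 'Expected columns Field and Intensity.');

% Extract the two variables as numeric column vectors.
field     = solarData.Field;
intensity = solarData.Intensity;
assert(isnumeric(field) && isnumeric(intensity), 'Columns must be numeric.');

summary(solarData)                   % quick overview of each column
delete(fileName);
```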
Find missing data
The ismissing function returns a logical index with 1 at each missing entry and 0 elsewhere. Counting the ones, with either sum or nnz, gives the total number of missing values. This data has no missing values.
idx=ismissing(field);
sum(idx)
idx1=ismissing(intensity);
nnz(idx1)
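Since the solar data has no gaps, here is a minimal sketch on a made-up vector that shows what the check reports when values are missing, along with two common ways to handle them. The vector x is hypothetical.

```matlab
% Made-up column vector with two missing entries.
x = [1 NaN 3 4 NaN 6]';

idx        = ismissing(x);           % logical index, 1 where a value is missing
numMissing = nnz(idx);               % number of missing values: 2

% Option 1: fill the gaps by linear interpolation between neighbours.
xFilled = fillmissing(x, 'linear');  % gives [1 2 3 4 5 6]'

% Option 2: drop the missing entries altogether.
xDropped = rmmissing(x);             % gives [1 3 4 6]'
```

Filling keeps the vector the same length, which matters when field and intensity must stay aligned row by row.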
Find outliers and fill outliers
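To get the logical index of the outliers, we use the isoutlier function in MATLAB. By default it flags values that are more than three scaled median absolute deviations (MAD) from the median.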
idx=isoutlier(field);
nnz(idx)
idx1=isoutlier(intensity);
nnz(idx1)
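To make the default rule of isoutlier concrete, the sketch below reproduces it by hand on made-up numbers and compares the result with the built-in function. The vector x is hypothetical.

```matlab
% Made-up readings with one obvious spike at position 6.
x = [10 11 9 10 12 100 11 10]';

% Default rule: more than three scaled MAD away from the median.
c         = -1/(sqrt(2)*erfcinv(3/2));   % scaling constant, about 1.4826
med       = median(x);
scaledMAD = c*median(abs(x - med));
manualIdx = abs(x - med) > 3*scaledMAD;

% Built-in detection with default settings.
builtinIdx = isoutlier(x);

sameResult  = isequal(manualIdx, builtinIdx);
outlierPos  = find(builtinIdx);          % position of the flagged value
numOutliers = nnz(builtinIdx);
```

Because the rule is built on the median rather than the mean, a single large spike does not inflate the threshold that is used to detect it.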
Here we fill the outliers using linear interpolation, as shown below.
% Fill outliers
[cleanedIntensity,outlierIndices,thresholdLow,thresholdHigh] = ...
    filloutliers(intensity,'linear');

% Visualize results
clf
plot(intensity,'Color',[109 185 226]/255,'DisplayName','Input data')
hold on
plot(cleanedIntensity,'Color',[0 114 189]/255,'LineWidth',1.5,...
    'DisplayName','Cleaned data')

% Plot outliers
plot(find(outlierIndices),intensity(outlierIndices),'x','Color',[64 64 64]/255,...
    'DisplayName','Outliers')
title(['Number of outliers: ' num2str(nnz(outlierIndices))])

% Plot filled outliers
plot(find(outlierIndices),cleanedIntensity(outlierIndices),'.','MarkerSize',12,...
    'Color',[217 83 25]/255,'DisplayName','Filled outliers')

% Plot outlier thresholds
plot([xlim missing xlim],[thresholdLow*[1 1] NaN thresholdHigh*[1 1]],...
    'Color',[145 145 145]/255,'DisplayName','Outlier thresholds')

hold off
legend
clear outlierIndices thresholdLow thresholdHigh
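The effect of the fill method is easier to see on a tiny example. The sketch below applies filloutliers to made-up numbers with three of its fill methods; the vector x is hypothetical, and 'clip' and 'previous' are shown only for comparison with the 'linear' method used above.

```matlab
% Made-up readings with one spike at position 6.
x = [10 11 9 10 12 100 11 10]';

% Linear interpolation between the neighbouring non-outlier values (12 and 11).
[yLinear, tf, lo, hi] = filloutliers(x, 'linear');

% Alternative fill methods for comparison.
yClip = filloutliers(x, 'clip');         % replace with the nearest threshold
yPrev = filloutliers(x, 'previous');     % replace with the previous good value

fprintf('thresholds: [%.2f, %.2f]\n', lo, hi);
fprintf('linear: %.2f, clip: %.2f, previous: %.2f\n', ...
    yLinear(6), yClip(6), yPrev(6));
```

Linear interpolation suits data that varies smoothly from sample to sample, which is a reasonable assumption for a curve like intensity versus field.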
Instead of writing code, you can use the live script capability for data pre-processing. To learn how to use live scripts for data pre-processing, watch my tutorial at the end of this article.
To smooth the data I used the moving mean method ('movmean') with a smoothing factor of 0.25.
% Smooth input data
SmoothedIntensity = smoothdata(cleanedIntensity,'movmean','SmoothingFactor',0.25,...
    'SamplePoints',field);

% Visualize results
clf
plot(field,cleanedIntensity,'Color',[109 185 226]/255,...
    'DisplayName','Input data')
hold on
plot(field,SmoothedIntensity,'Color',[0 114 189]/255,'LineWidth',1.5,...
    'DisplayName','Smoothed data')
hold off
legend
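As a rough sketch of how the smoothing method affects the result, the code below smooths a synthetic noisy sine wave with both the 'movmean' and 'movmedian' methods of smoothdata and compares each result with the known clean signal. The signal, the noise level, and the helper rmsErr are assumptions made for illustration.

```matlab
rng(0);                                   % reproducible noise
t     = linspace(0, 4*pi, 200)';          % sample points
clean = sin(t);                           % known underlying signal
noisy = clean + 0.2*randn(size(t));       % signal plus synthetic noise

% Same smoothing factor as above; the second output is the window that was chosen.
[yMean, winMean] = smoothdata(noisy, 'movmean', ...
    'SmoothingFactor', 0.25, 'SamplePoints', t);
yMedian = smoothdata(noisy, 'movmedian', ...
    'SmoothingFactor', 0.25, 'SamplePoints', t);

% Root-mean-square error against the clean signal (hypothetical helper).
rmsErr = @(a, b) sqrt(mean((a - b).^2));
errNoisy  = rmsErr(noisy,   clean);
errMean   = rmsErr(yMean,   clean);
errMedian = rmsErr(yMedian, clean);
fprintf('RMS error: raw %.3f, movmean %.3f, movmedian %.3f\n', ...
    errNoisy, errMean, errMedian);
```

A moving median is usually the more robust choice when spikes remain in the data, while a moving mean tends to give a smoother curve once outliers have already been filled.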
To smooth data using the live script capability, watch my tutorial at the end of this article.
Create the Predictive Model
As the graph above shows, the data looks sinusoidal, so we fit it with a sum of three sine terms (the 'sin3' fit type) using the non-linear least squares method. The predictive model is saved as fitresult. Confidence bounds are at 95%, and the quality of the model can be checked through the goodness of fit, that is, the R-square value, which typically lies between 0 and 1. The closer it is to 1, the better the model fits the data.
%CREATEFIT(FIELD,SMOOTHEDINTENSITY)
%  Create a fit.
%
%  Data for 'untitled fit 1' fit:
%      X Input : field
%      Y Output: SmoothedIntensity
%  Output:
%      fitresult : a fit object representing the fit.
%      gof : structure with goodness-of fit info.
%
%  See also FIT, CFIT, SFIT.

%  Auto-generated by MATLAB on 14-Dec-2019 14:18:17

%% Fit: 'untitled fit 1'.
[xData, yData] = prepareCurveData( field, SmoothedIntensity );

% Set up fittype and options.
ft = fittype( 'sin3' );
opts = fitoptions( 'Method', 'NonlinearLeastSquares' );
opts.Display = 'Off';
opts.Lower = [-Inf 0 -Inf -Inf 0 -Inf -Inf 0 -Inf];
opts.StartPoint = [70.1799020954987 0.0376991118430775 3.14039065337383 56.6156219595147 0.0251327412287183 1.36238564426427 56.222865322196 0.0125663706143592 -1.22580091288665];

% Fit model to data.
[fitresult, gof] = fit( xData, yData, ft, opts );

% Plot fit with data.
figure( 'Name', 'untitled fit 1' );
h = plot( fitresult, xData, yData );
legend( h, 'SmoothedIntensity vs. field', 'untitled fit 1', 'Location', 'NorthEast', 'Interpreter', 'none' );
% Label axes
xlabel( 'field', 'Interpreter', 'none' );
ylabel( 'SmoothedIntensity', 'Interpreter', 'none' );
grid on
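To see what the R-square value measures, the sketch below computes it by hand as 1 - SSE/SST for two candidate predictions on synthetic data: one that follows the underlying sine curve and one that is just the mean of the data. The data and the helper rsq are assumptions for illustration, not part of the fitted model above.

```matlab
rng(1);                                   % reproducible noise
x = linspace(0, 20, 100)';
y = 3*sin(0.5*x) + 0.3*randn(size(x));    % synthetic noisy observations

% R-square = 1 - (sum of squared errors) / (total sum of squares).
rsq = @(y, yhat) 1 - sum((y - yhat).^2) / sum((y - mean(y)).^2);

yGood = 3*sin(0.5*x);                     % prediction that follows the true curve
yFlat = mean(y)*ones(size(y));            % prediction that ignores x entirely

r2Good = rsq(y, yGood);                   % close to 1
r2Flat = rsq(y, yFlat);                   % 0 up to rounding error
fprintf('R-square: good model %.3f, flat model %.3f\n', r2Good, r2Flat);
```

A value near 1 means the model explains most of the variation in the data; a value near 0 means it does no better than predicting the mean every time.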
If you don't want to write code to fit the data, you can use cftool in MATLAB, which is part of the Curve Fitting Toolbox. Watch my tutorial on using cftool at the end of the article.
Predict on New Data
Use the predictive model to predict intensity for new field values.
Here we predict 6 intensity values from 6 field inputs.
output=fitresult([0.0305 27.7727 32.4116 83.6233 267.0146 414.2099])
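A point prediction is more useful with an uncertainty band. The sketch below fits a single sine term to synthetic data with the Curve Fitting Toolbox, evaluates the fit at new inputs, and uses predint to get 95% prediction intervals. The data, the 'sin1' model, and the new input values are assumptions chosen to keep the example self-contained.

```matlab
rng(2);                                   % reproducible noise
x = (0:0.5:50)';
y = 5*sin(0.3*x + 1) + 0.2*randn(size(x)); % synthetic noisy sine

% Fit one sine term (requires the Curve Fitting Toolbox).
[f, gof] = fit(x, y, 'sin1');

% Predict at new inputs and compute 95% prediction intervals.
xNew = [5; 12.5; 40];
yNew = f(xNew);                           % point predictions (column vector)
ci   = predint(f, xNew, 0.95);            % one [lower upper] row per input

% Flag inputs outside the range the model was fitted on.
inRange = xNew >= min(x) & xNew <= max(x);

disp(table(xNew, yNew, ci(:,1), ci(:,2), inRange, ...
    'VariableNames', {'x','prediction','lower','upper','insideFitRange'}));
```

Predictions for inputs outside the fitted range are extrapolations and should be treated with extra caution, whatever the R-square value of the fit.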
You can use this code for your own data if you have one independent and one dependent variable. With two independent variables and one dependent variable, you can fit a surface with the Curve Fitting Toolbox instead. For the next part, click on Using Machine Learning.
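For the case of two independent variables mentioned above, a minimal sketch of a surface fit with the Curve Fitting Toolbox might look like this. The data is synthetic and the planar 'poly11' model is an assumption; a different surface model may suit real data better.

```matlab
% Synthetic data on a 5-by-5 grid: z depends linearly on x and y.
[xg, yg] = meshgrid(0:4, 0:4);
x = xg(:);
y = yg(:);
z = 1 + 2*x + 3*y;

% Fit a plane z = p00 + p10*x + p01*y (requires the Curve Fitting Toolbox).
sf = fit([x, y], z, 'poly11');

% Predict z for a new pair of inputs.
zNew = sf(2.5, 1.5);
fprintf('p00 = %.3f, p10 = %.3f, p01 = %.3f, zNew = %.3f\n', ...
    sf.p00, sf.p10, sf.p01, zNew);
```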
A Recorded Webinar on Data Analytics with AI
Click here to watch the recorded Webinar