Time Series Forecasting of COVID-19 Using Machine Learning

This example uses a data set of COVID-19 patients in India. It trains a machine learning model to forecast the number of COVID-19 cases from the number of cases on previous days.

About the Data

The data set contains a single time series, with time steps corresponding to days and values corresponding to the number of cases. The data are stored as a column array, where each element is a single time step. The forecasting model requires only the daily confirmed-case counts, but for the analysis you need COVID-19 data for every state on a daily basis.

The data I have look like the figure below. I collected data from 30th January 2020 to 5th May 2020 for all the affected states in India. You can also download data for every country from GitHub.

Load Sequence Data

Read the table, extract the data, and plot it:

aa=readtable('covid19Patients.xlsx');
time=aa.Date;
confirmed=aa.Confirmed;
Active_cases=aa.ActiveCases;
figure,
plot(time,confirmed,'LineWidth',2)
hold on
plot(time,Active_cases,'LineWidth',2)
plot(time,aa.Death,'LineWidth',2)
hold off
legend('Confirmed Cases','Active Cases','Death Cases','location','northwest')
grid on
Confirmed Cases, Active Cases and Death Cases

Daily Registered Cases Statewise

The figure below shows a surface plot of total daily cases vs. days and states of India. To plot this graph, you can use the built-in MATLAB function surf.

data=aa{1:end,6:end};
date=aa.Date;
states=categorical(aa.Properties.VariableNames(6:end))';
surf(states,date,data)
axis tight
yticks([aa.Date(1) aa.Date(10:10:(end-10))' aa.Date(end)]) % setting yticks manually
xlabel('States')
ylabel('Date')
zlabel('Daily cases')
As the graph above shows, the variation is almost flat until mid-March, after which the case counts change drastically.

Partition Active_cases into training and test sets. Train on the first 90% of the sequence and test on the last 10%.

To forecast the values of future time steps of a sequence, specify the responses to be the training sequences with values shifted by one time step. That is, at each time step of the input sequence, the machine learning model learns to predict the value of the next time step.

numTimeStepsTrain = floor(0.9*numel(Active_cases));
XTrain = Active_cases(1:numTimeStepsTrain);
YTrain = Active_cases(2:numTimeStepsTrain+1);
XTest = Active_cases(numTimeStepsTrain+1:end-1);
YTest = Active_cases(numTimeStepsTrain+2:end);
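As a quick sanity check, the one-step shift above can be illustrated on a short toy sequence (the values below are made up purely for illustration):

```matlab
% Toy sequence to illustrate the one-step shift used above
seq = [5; 8; 13; 21; 34];
X = seq(1:end-1);   % inputs: values on days 1..4
Y = seq(2:end);     % targets: values on days 2..5 (next-day values)
disp([X Y])         % each row pairs a day's value with the next day's value
```

Each input value is paired with the value one time step ahead, which is exactly how XTrain/YTrain and XTest/YTest are built from Active_cases.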

Prepare Data

Concatenate XTrain and YTrain into a single matrix:

dataa=[XTrain YTrain];

Call Local Function

The machine learning algorithm is given at the end of the article as a local function; a linear regression model (fitlm) is used to fit the data.

To get more information about this machine learning model, you can visit here.

You may use the Regression Learner app in MATLAB to identify the best model for your data. To learn how to use the Regression Learner app, watch my tutorial here.

[trainedModel,validationRMSE] = trainRegressionModel(dataa);
YPred = trainedModel.predictFcn(XTest);

Calculate the RMSE from the predictions.

rmse = sqrt(mean((YPred-YTest).^2))

Plot the training time series with the forecasted values

numTimeStepsTest = numel(XTest);
figure
plot(Active_cases(1:numTimeStepsTrain))
hold on
idx = numTimeStepsTrain:(numTimeStepsTrain+numTimeStepsTest);
plot(idx,[Active_cases(numTimeStepsTrain); YPred],'.-')
hold off
xlabel("Day")
ylabel("Cases")
title("Forecast")
legend(["Observed" "Forecast"])
y_labels = get(gca, 'YTick'); % show full tick values instead of exponent notation
set(gca, 'YTickLabel', y_labels);
Forecasted Values on Test Data

Compare the Forecasted Values with the Test Data

figure
plot(YTest)
hold on
plot(YPred,'.-')
hold off
legend(["Observed" "Forecast"])
ylabel("Cases")
title("Forecast")
y_labels = get(gca, 'YTick'); % show full tick values instead of exponent notation
set(gca, 'YTickLabel', y_labels);
True Vs Predicted

Error Plot


figure
stem(YPred - YTest)
xlabel("Day")
ylabel("Error")
title("RMSE = " + rmse)
Error Plot at Every Point

Local Function for Machine Learning Model

function [trainedModel, validationRMSE] = trainRegressionModel(trainingData)
% [trainedModel, validationRMSE] = trainRegressionModel(trainingData)
% Returns a trained regression model and its RMSE. This code recreates the
% model trained in Regression Learner app. Use the generated code to
% automate training the same model with new data, or to learn how to
% programmatically train models.
%
%  Input:
%      trainingData: A matrix with the same number of columns and data type
%       as the matrix imported into the app.
%
%  Output:
%      trainedModel: A struct containing the trained regression model. The
%       struct contains various fields with information about the trained
%       model.
%
%      trainedModel.predictFcn: A function to make predictions on new data.
%
%      validationRMSE: A double containing the RMSE. In the app, the
%       History list displays the RMSE for each model.
%
% Use the code to train the model with new data. To retrain your model,
% call the function from the command line with your original data or new
% data as the input argument trainingData.
%
% For example, to retrain a regression model trained with the original data
% set T, enter:
%   [trainedModel, validationRMSE] = trainRegressionModel(T)
%
% To make predictions with the returned 'trainedModel' on new data T2, use
%   yfit = trainedModel.predictFcn(T2)
%
% T2 must be a matrix containing only the predictor columns used for
% training. For details, enter:
%   trainedModel.HowToPredict

% Auto-generated by MATLAB on 05-May-2020 11:05:23


% Extract predictors and response
% This code processes the data into the right shape for training the
% model.
% Convert input to table
inputTable = array2table(trainingData, 'VariableNames', {'column_1', 'column_2'});

predictorNames = {'column_1'};
predictors = inputTable(:, predictorNames);
response = inputTable.column_2;
isCategoricalPredictor = [false];

% Train a regression model
% This code specifies all the model options and trains the model.
concatenatedPredictorsAndResponse = predictors;
concatenatedPredictorsAndResponse.column_2 = response;
linearModel = fitlm(...
    concatenatedPredictorsAndResponse, ...
    'linear', ...
    'RobustOpts', 'off');

% Create the result struct with predict function
predictorExtractionFcn = @(x) array2table(x, 'VariableNames', predictorNames);
linearModelPredictFcn = @(x) predict(linearModel, x);
trainedModel.predictFcn = @(x) linearModelPredictFcn(predictorExtractionFcn(x));

% Add additional fields to the result struct
trainedModel.LinearModel = linearModel;
trainedModel.About = 'This struct is a trained model exported from Regression Learner R2020a.';
trainedModel.HowToPredict = sprintf('To make predictions on a new predictor column matrix, X, use: \n  yfit = c.predictFcn(X) \nreplacing ''c'' with the name of the variable that is this struct, e.g. ''trainedModel''. \n \nX must contain exactly 1 columns because this model was trained using 1 predictors. \nX must contain only predictor columns in exactly the same order and format as your training \ndata. Do not include the response column or any columns you did not import into the app. \n \nFor more information, see <a href="matlab:helpview(fullfile(docroot, ''stats'', ''stats.map''), ''appregression_exportmodeltoworkspace'')">How to predict using an exported model</a>.');

% Extract predictors and response
% This code processes the data into the right shape for training the
% model.
% Convert input to table
inputTable = array2table(trainingData, 'VariableNames', {'column_1', 'column_2'});

predictorNames = {'column_1'};
predictors = inputTable(:, predictorNames);
response = inputTable.column_2;
isCategoricalPredictor = [false];

% Perform cross-validation
KFolds = 5;
cvp = cvpartition(size(response, 1), 'KFold', KFolds);
% Initialize the predictions to the proper sizes
validationPredictions = response;
for fold = 1:KFolds
    trainingPredictors = predictors(cvp.training(fold), :);
    trainingResponse = response(cvp.training(fold), :);
    foldIsCategoricalPredictor = isCategoricalPredictor;
    
    % Train a regression model
    % This code specifies all the model options and trains the model.
    concatenatedPredictorsAndResponse = trainingPredictors;
    concatenatedPredictorsAndResponse.column_2 = trainingResponse;
    linearModel = fitlm(...
        concatenatedPredictorsAndResponse, ...
        'linear', ...
        'RobustOpts', 'off');
    
    % Create the result struct with predict function
    linearModelPredictFcn = @(x) predict(linearModel, x);
    validationPredictFcn = @(x) linearModelPredictFcn(x);
    
    % Add additional fields to the result struct
    
    % Compute validation predictions
    validationPredictors = predictors(cvp.test(fold), :);
    foldPredictions = validationPredictFcn(validationPredictors);
    
    % Store predictions in the original order
    validationPredictions(cvp.test(fold), :) = foldPredictions;
end

% Compute validation RMSE
isNotMissing = ~isnan(validationPredictions) & ~isnan(response);
validationRMSE = sqrt(nansum(( validationPredictions - response ).^2) / numel(response(isNotMissing) ));

end

For the next part, which uses deep learning, click here.

Note: Keep in mind that the COVID-19 analysis in this article is for educational purposes only, not for publication. The goal is to show faculty, researchers, and students how deep learning and machine learning can make a big impact in this kind of analysis.
