Forecastive Model for Covid-19 Patients on Confirmed Cases and Death Rate Analysis

Here are analysis on COVID-19 patients, the analysis are divided in two parts for the forecastive model as given below.

  • Confirmed Cases;
  • Death Rate.

The data which I am using is collected from 25th March 2020 to 26th May 2020. I used Indian COVID-19 patients data. My aim is to show how such kind of analysis can be applied to your country as well. Also it can be useful if you are writing a paper on COVID-19 analysis.

The data which I have looks like as given below in the excel file.

Excel data

You can prepare your data in an excel file as given above or you can use other file formats as well. We are going to create two types of model curve fitting (statistical analysis) and machine learning for both the cases.

Load sequence data

Corona contains a single time series, with time steps corresponding to days, and values corresponding to the number of cases. The data is cumulative and it is collected everyday. We are going to plot all cases in a single figure.

clear,clc  % clear variables and command window
aa=readtable('Covid19Patients.xlsx'); % read excel data as a table
time=aa.Date; %extract date
confirmed=aa.Confirmed; % extract confirmed cases
ActiveCases=aa.ActiveCases; % extract active cases
RecoveredCases=aa.Recovered; % extract recovered cases
figure, % new figure to plot
plot(time,confirmed,'LineWidth',2) % plot confirmed cases with line width 2
hold on  % hold figure for next plot on same axes
plot(time,RecoveredCases) % plot recovered cases
plot(time,ActiveCases,'LineWidth',2) % plot active cases
plot(time,aa.Death,'LineWidth',2) % plot death cases
hold off % hold off after plotting all data in a same figure
legend('Confirmed Cases','Active Cases','Death Cases','Recovered Cases','location','northwest')
grid on % show grid in figure
y_labels = get(gca, 'YTick'); % get y data
set(gca, 'YTickLabel', y_labels); % set y axis
axis tight
All Cases

Now let’s start from the confirmed cases.

Confirmed cases

Statistical model (curve fitting) for confirmed cases

Prepare data.

t=1:length(confirmed);

[xData, yData] = prepareCurveData( t, confirmed );

Set up fittype and options.

ft = fittype( 'poly4' ); % polynomial of 4th degree

Fit model to the data.

[fitresult, gof] = fit( xData, yData, ft, 'Normalize', 'on' ) % use fit command
Fit result and goodness of fit with 95 percent confidence bound

Plot fit with data.

figure( 'Name', 'Confirmed Cases Fit' );
h = plot( fitresult, xData, yData );
legend( h, 'confirmed vs. days', 'fitted model', 'Location', 'NorthEast', 'Interpreter', 'none' );

Label axes

xlabel( 't', 'Interpreter', 'none' );
ylabel( 'confirmed', 'Interpreter', 'none' );
grid on
y_labels = get(gca, 'YTick'); % get y data
set(gca, 'YTickLabel', y_labels); % set y axis
axis tight

Forecast for 30 days.
We will use fitted model fitresult for forecasting. We’ve already created this model above.

tfuture=length(confirmed):(length(confirmed)+30); % next 30 days array
Ypredd=fitresult(tfuture); % forecast
timee=time(end):days(1):time(end)+days(30); % time array
figure,
plot(time,confirmed,'LineWidth',2) % plot measured data
hold on
plot(timee,Ypredd,"LineWidth",2) % plot forecasted data
% label axis below
xlabel('Days')
ylabel('Confirmed cases')
title('Predection for 30 days')
dayss=[time' timee];
    xticks((dayss(1:6:end)))
hold off
axis tight
y_labels = get(gca, 'YTick'); % get y data
set(gca, 'YTickLabel', y_labels); % set y axis
axis tight

Machine learning model for confirmed cases

This analysis shows how to forecast time series data using Machine Learning.
To forecast the values of the future time steps of a sequence, you can train a regression Machine Learning, where the responses are the training sequences with values shifted by one time step. That is, at each time step of the input sequence, the Machine Learning learns to predict the value of the next time step.

This example uses the data set COVID-19 patients of India. The example trains an Machine Learning Model to forecast the number of COVID-19 cases given the number of cases in previous days.

Partition the training and test ActiveCases. Train on the first 90% of the sequence and test on the last 10%.

To forecast the values of the future time steps of the sequence, specify the responses to be the training sequences with values shifted by one time step. That is, at each time step of the input sequence, the Machine Learning learns to predict the value of the next time step.

numTimeStepsTrain = floor(0.9*numel(ActiveCases)); % 90 % for training
XTrain = confirmed(1:numTimeStepsTrain); % extract training input
YTrain = confirmed(2:numTimeStepsTrain+1); % extract training output
XTest = confirmed(numTimeStepsTrain+1:end-1); % extract 10 percent testing input
YTest = confirmed(numTimeStepsTrain+2:end); % extract 10 percent testing output


Input:
dataa: A matrix with the same number of columns and data type as the matrix imported into the app.

Output:
trainedModel: A struct containing the trained regression model. The struct contains various fields with information about the trained model.

trainedModel.predictFcn: A function to make predictions on new data.

validationRMSE: A double containing the RMSE.

To make predictions with the returned ‘trainedModel’ on new data T2, use trainedModel.predictFcn(T2)
T2 must be a matrix containing only the predictor columns used for training.

Prepare data.

dataa=[XTrain  YTrain];

Convert input to table.

inputTable = array2table(dataa, 'VariableNames', {'column_1', 'column_2'});

predictorNames = {'column_1'};
predictors = inputTable(:, predictorNames);
response = inputTable.column_2;
isCategoricalPredictor = [false];

Train a regression model

This code specifies all the model options and trains the model.

concatenatedPredictorsAndResponse = predictors;
concatenatedPredictorsAndResponse.column_2 = response;
linearModel = fitlm(...
    concatenatedPredictorsAndResponse, ...
    'linear', ...
    'RobustOpts', 'off');

Create the result struct with predict function.

predictorExtractionFcn = @(x) array2table(x, 'VariableNames', predictorNames);
linearModelPredictFcn = @(x) predict(linearModel, x);
trainedModel.predictFcn = @(x) linearModelPredictFcn(predictorExtractionFcn(x));

Add additional fields to the result struct (optional).

trainedModel.LinearModel = linearModel;
trainedModel.About = 'This struct is a trained model exported from Regression Learner R2020a.';
trainedModel.HowToPredict = sprintf('To make predictions on a new predictor column matrix, X, use: \n  yfit = c.predictFcn(X) \nreplacing ''c'' with the name of the variable that is this struct, e.g. ''trainedModel''. \n \nX must contain exactly 1 columns because this model was trained using 1 predictors. \nX must contain only predictor columns in exactly the same order and format as your training \ndata. Do not include the response column or any columns you did not import into the app. \n \nFor more information, see <a href="matlab:helpview(fullfile(docroot, ''stats'', ''stats.map''), ''appregression_exportmodeltoworkspace'')">How to predict using an exported model</a>.');

K fold validation

We will divide our data in 5 folds, it will train 5 times and we will get 5 rmse values. The final rmse value will be the average of all 5.

Convert input to table.

inputTable = array2table(dataa, 'VariableNames', {'column_1', 'column_2'});
predictorNames = {'column_1'};
predictors = inputTable(:, predictorNames);
response = inputTable.column_2;
isCategoricalPredictor = [false];

Perform cross-validation.

% Perform cross-validation
KFolds = 5;
cvp = cvpartition(size(response, 1), 'KFold', KFolds);
% Initialize the predictions to the proper sizes
validationPredictions = response;
for fold = 1:KFolds
    trainingPredictors = predictors(cvp.training(fold), :);
    trainingResponse = response(cvp.training(fold), :);
    foldIsCategoricalPredictor = isCategoricalPredictor;
    
    % Train a regression model
    % This code specifies all the model options and trains the model.
    concatenatedPredictorsAndResponse = trainingPredictors;
    concatenatedPredictorsAndResponse.column_2 = trainingResponse;
    linearModel = fitlm(...
        concatenatedPredictorsAndResponse, ...
        'linear', ...
        'RobustOpts', 'off');
    
    % Create the result struct with predict function
    linearModelPredictFcn = @(x) predict(linearModel, x);
    validationPredictFcn = @(x) linearModelPredictFcn(x);
    
    % Add additional fields to the result struct
    
    % Compute validation predictions
    validationPredictors = predictors(cvp.test(fold), :);
    foldPredictions = validationPredictFcn(validationPredictors);
    
    % Store predictions in the original order
    validationPredictions(cvp.test(fold), :) = foldPredictions;
end

% Compute validation RMSE
isNotMissing = ~isnan(validationPredictions) & ~isnan(response);
validationRMSE = sqrt(nansum(( validationPredictions - response ).^2) / numel(response(isNotMissing) ))

Test the model

Predict on testing data.

YPred=trainedModel.predictFcn(XTest);

Calculate the RMSE.

rmse = sqrt(mean((YPred-YTest).^2))

Plot the training time series with the forecasted values.

numTimeStepsTest = numel(XTest);
figure
plot(confirmed(1:numTimeStepsTrain))
hold on
idx = numTimeStepsTrain:(numTimeStepsTrain+numTimeStepsTest);
plot(idx,[confirmed(numTimeStepsTrain); YPred],'.-')
hold off
xlabel("Daily")
ylabel("Cases")
title("Forecast")
legend(["Observed" "Forecast"])
y_labels = get(gca, 'YTick'); % get ydata
set(gca, 'YTickLabel', y_labels);
axis tight

Compare the forecasted values with the test data.

figure
subplot(2,1,1)
plot(YTest)
hold on
plot(YPred,'.-')
hold off
legend(["Observed" "Forecast"])
ylabel("Cases")
title("Forecast")

subplot(2,1,2)
stem(YPred - YTest)
xlabel("Day")
ylabel("Error")
title("RMSE = " + rmse)

30 days forecast

clear forecasted30
predval=XTest(end);
for i=1:30
    predval=trainedModel.predictFcn(predval);
forecasted30(i)=predval;
end
dayForecast30=time(end)+days(1):days(1):(time(end)+days(30));
figure
plot(dayForecast30,forecasted30)
xlabel('days')
ylabel('Confirmed Cases')
title('Forecasted Confirmed Cases')
axis tight

Death rate forecastive model

Same process we will follow as given above with some changes. Let’s start with a statistical model.

Recovered COVID-19 patients in percentage

% here aa is a table data
ConfitmedAfterdeath=aa.Confirmed-aa.Death; % confirmed cases after death
Recovered=aa.Recovered; % recovered cases 

percentageRecovered=Recovered./ConfitmedAfterdeath*100; % cumulative percentage of recovered patients everyday  
figure
plot(time,percentageRecovered,'LineWidth',2) % plot 
xlabel('Current days') % set x label
ylabel('Recovery Rate') % set y label 
y_labels = get(gca, 'YTick'); % get ydata
set(gca, 'YTickLabel', y_labels);
axis tight

Unrecovered COVID-19 patients in percentage

figure,
RemUnRecPat=100-percentageRecovered; % Unrecovered COVID-19 patients in percentage 
plot(time,RemUnRecPat,'LineWidth',2)
days=[1:length(time)]';
axis tight
xlabel('Current days')
ylabel('Unrecovery Rate')
y_labels = get(gca, 'YTick'); % get ydata
set(gca, 'YTickLabel', y_labels);
axis tight

Statistical model (curve fitting)

We are going to fit on the unrecovered COVID-19 patients in percentage.

Inputs:
X Input : days
Y Output: RemUnRecPat

Output:
fitresult : a fit object representing the fit.
gof : structure with goodness-of fit info.

Prepare data.

[xData, yData] = prepareCurveData( days, RemUnRecPat );

Set up fittype and options.

ft = fittype( 'poly2' ); % polynomial of degree 2

Fit the model to the data.

[fitresult, gof] = fit( xData, yData, ft )
Fitted model with rmse value 1.597

Plot fit with the data.

h = plot( fitresult, xData, yData );
legend( h, 'RemUnRecPat vs. days', 'Fitted model', 'Location', 'NorthEast', 'Interpreter', 'none' );
% Label axes
xlabel( 'days', 'Interpreter', 'none' );
ylabel( 'RemUnRecPat', 'Interpreter', 'none' );
grid on

Forecast for 37 days

FutureData=fitresult([(length(days)+1):(length(days)+37)]'); %next 37 days unrecovered COVID-19 patients in %percentage

Plot next 37 days of unrecovered COVID-19 patients in percentage.

FutureDate=[datetime((time(end)+days(1)):days(1):(time(end)+37))]';
figure,
plot(time,RemUnRecPat)
hold on
plot([time(end);FutureDate],[RemUnRecPat(end);FutureData],'LineWidth',2)
legend('Current unrecovered patients in %','Forecasted unrecovered patients in % ')
title('Unrecovered patients at the end of June in %')
axis tight
hold off

Plot next 37 days of recovered COVID-19 patients in percentage.

figure,
plot(time,100-RemUnRecPat)
hold on
plot([time(end);FutureDate],[100-RemUnRecPat(end);100-FutureData],'LineWidth',2)
xtickss=[time;[time(end);FutureDate]];
    xticks([xtickss(1:6:(end-4));xtickss(end)])
legend('Current recovered patients in %','Forecasted recovered patients in % ','Location','northwest')
title('Recovered patients at the end of June in %')
axis tight

Machine learning model

We will follow the same procedures as we’ve used in the confirmed cases. For explanations you can check the confirmed cases analysis.

Partition the training and test ActiveCases. Train on the first 90% of the sequence and test on the last 10%.

numTimeStepsTrain = floor(0.9*numel(ActiveCases));
XTrain = RemUnRecPat(1:numTimeStepsTrain);
YTrain = RemUnRecPat(2:numTimeStepsTrain+1);
XTest = RemUnRecPat(numTimeStepsTrain+1:end-1);
YTest = RemUnRecPat(numTimeStepsTrain+2:end);

Create a machine learning regression model for unrecovered COVID-19 patients in percentage.

Prepare the data set.

trainingData=[XTrain  YTrain];

inputTable = array2table(trainingData, 'VariableNames', {'column_1', 'column_2'});

predictorNames = {'column_1'};
predictors = inputTable(:, predictorNames);
response = inputTable.column_2;
isCategoricalPredictor = [false];

Train a regression model

% Train a regression model
% This code specifies all the model options and trains the model.
concatenatedPredictorsAndResponse = predictors;
concatenatedPredictorsAndResponse.column_2 = response;
linearModel = fitlm(...
    concatenatedPredictorsAndResponse, ...
    'linear', ...
    'RobustOpts', 'off');

Create the result struct with predict function.

predictorExtractionFcn = @(x) array2table(x, 'VariableNames', predictorNames);
linearModelPredictFcn = @(x) predict(linearModel, x);
trainedModel.predictFcn = @(x) linearModelPredictFcn(predictorExtractionFcn(x));

Add additional fields to the result struct (optional).

trainedModel.LinearModel = linearModel;
trainedModel.About = 'This struct is a trained model exported from Regression Learner R2020a.';
trainedModel.HowToPredict = sprintf('To make predictions on a new predictor column matrix, X, use: \n  yfit = c.predictFcn(X) \nreplacing ''c'' with the name of the variable that is this struct, e.g. ''trainedModel''. \n \nX must contain exactly 1 columns because this model was trained using 1 predictors. \nX must contain only predictor columns in exactly the same order and format as your training \ndata. Do not include the response column or any columns you did not import into the app. \n \nFor more information, see <a href="matlab:helpview(fullfile(docroot, ''stats'', ''stats.map''), ''appregression_exportmodeltoworkspace'')">How to predict using an exported model</a>.');

K fold validation.

% Convert input to table
inputTable = array2table(trainingData, 'VariableNames', {'column_1', 'column_2'});

predictorNames = {'column_1'};
predictors = inputTable(:, predictorNames);
response = inputTable.column_2;
isCategoricalPredictor = [false];

% Perform cross-validation
KFolds = 5;
cvp = cvpartition(size(response, 1), 'KFold', KFolds);
% Initialize the predictions to the proper sizes
validationPredictions = response;
for fold = 1:KFolds
    trainingPredictors = predictors(cvp.training(fold), :);
    trainingResponse = response(cvp.training(fold), :);
    foldIsCategoricalPredictor = isCategoricalPredictor;
    
    % Train a regression model
    % This code specifies all the model options and trains the model.
    concatenatedPredictorsAndResponse = trainingPredictors;
    concatenatedPredictorsAndResponse.column_2 = trainingResponse;
    linearModel = fitlm(...
        concatenatedPredictorsAndResponse, ...
        'linear', ...
        'RobustOpts', 'off');
    
    % Create the result struct with predict function
    linearModelPredictFcn = @(x) predict(linearModel, x);
    validationPredictFcn = @(x) linearModelPredictFcn(x);
    
    % Add additional fields to the result struct
    
    % Compute validation predictions
    validationPredictors = predictors(cvp.test(fold), :);
    foldPredictions = validationPredictFcn(validationPredictors);
    
    % Store predictions in the original order
    validationPredictions(cvp.test(fold), :) = foldPredictions;
end

% Compute validation RMSE
isNotMissing = ~isnan(validationPredictions) & ~isnan(response);
validationRMSE = sqrt(nansum(( validationPredictions - response ).^2) / numel(response(isNotMissing) ))

Forecast on the testing data.

YPred=trainedModel.predictFcn(XTest);

Calculate the RMSE.

rmse = sqrt(mean((YPred-YTest).^2))

Plot the training time series with the forecasted values.

numTimeStepsTest = numel(XTest);
figure
plot(RemUnRecPat(1:numTimeStepsTrain))
hold on
idx = numTimeStepsTrain:(numTimeStepsTrain+numTimeStepsTest);
plot(idx,[RemUnRecPat(numTimeStepsTrain); YPred],'.-')
hold off
xlabel("Daily")
ylabel("Cases")
title("Forecast")
legend(["Observed" "Forecast"],'Location','northwest')

30 days forecast

clear forecasted30
predval=XTest(end);
for i=1:30
    predval=trainedModel.predictFcn(predval);
forecasted30(i)=predval;
end
dayForecast30=time(end)+days(1):days(1):(time(end)+days(30));
figure
plot(dayForecast30,forecasted30)
xlabel('days')
ylabel('Unrecovered Cases %')
title('Forecasted Unrecovered Cases %')

As you compare a statistical model with a machine learning model, you will figure-out that the performance of the machine learning model is better than that of the statistical model.

If you want to watch a recorded webinar on data analytics with machine learning you can click here.

My previous article on Deep Learning Model for Detecting COVID-19 on Chest X-ray using MATLAB

Note:- Keep in mind that the COVID-19 analysis in this article is for educational purposes only, NOT for publications. The goal is to spread awareness among faculties, researchers and students, how machine learning can make a big impact for such kind of analysis.

Leave a Reply

Your email address will not be published.