District Heating Systems (DHS) are a vital part of urban infrastructure, providing heat to homes and businesses. One of the key challenges in managing these systems is accurately predicting hourly heat demand. This type of forecast helps operators fine-tune system performance, cut costs, reduce environmental impact, and keep customers comfortable.
In our research, we explored how cutting-edge Deep Learning (DL) methods can be used to make these predictions more precise. This level of detail helps operators make informed decisions in daily operations. For example, they can adjust production levels based on anticipated demand, manage secondary flow at substations, and schedule maintenance during low-demand periods, such as overnight. Specifically, we focused on forecasting heat demand over the next 24 hours by using the Transformer architecture, well known for its use in the most powerful Large Language Models today.
We started with the hypothesis that modern DL models, especially those designed for sequence forecasting and combined with custom feature engineering (including weather forecast data), can deliver highly accurate results. Our goal was to achieve accuracy comparable to single-step forecasting while maintaining stable performance across the entire forecast sequence.
Follow XAI4HEAT on LinkedIn for more stories like this.
To test this, we used data from a local DHS substation, which included 38,710 time points over five heating seasons from 2019 to 2024. We employed custom feature engineering techniques and a state-of-the-art Transformer algorithm tailored for time series forecasting.
Our results were promising. Without incorporating weather data, our model achieved a Mean Absolute Error (MAE) of 28.15 kWh, which is better than the benchmark single-step model using a stacked Long Short-Term Memory (LSTM) network (MAE of 28.73 kWh). When we included weather forecast features, the model's performance improved significantly, with an MAE of 21.09 kWh. Moreover, the model maintained consistent performance across the forecasted hours, with a standard deviation of the MAE of just 0.23 kWh. We used a vanilla Transformer model adapted for time series problems, trained over five epochs.
In summary, our study shows that modern DL models, when combined with thoughtful feature engineering, can significantly improve the accuracy of heat demand forecasts in DHS. This advancement not only supports better operational decisions but also contributes to more efficient and sustainable urban heating solutions.
A District Heating System is a centralized method of providing heat to multiple buildings through a network of insulated pipes that deliver hot water or steam from a central source. This approach is used to efficiently heat residential, commercial, and industrial buildings within a specific district or area.
The components of the DHS are the DHS plant, or central heat source, and the distribution network of insulated pipes transporting the hot water or steam from the central source to the buildings. These pipes form two closed loops, namely the primary flow (heating fluid supply and return from the central plant to the substations) and the secondary flow (heating fluid supply and return from the substations to the consumers). Heat is transferred from the primary to the secondary flow by heat exchangers located at the substations.
Find below an illustration of the small DHS we are working with, comprising five substations (L4, L8, L12, L17 and L22). The figure also shows where the most important data in our datasets is measured.
Let's start by importing the required libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
import seaborn as sns
import time
from tqdm import tqdm
All data from the five substations of the local DHS, together with the meteorological data, is stored in a GitHub folder.
github_folder = 'https://github.com/xai4heat/xai4heat/raw/main/datasets/'
files = ['xai4heat_scada_L4.csv',
         'xai4heat_scada_L8.csv',
         'xai4heat_scada_L12.csv',
         'xai4heat_scada_L17.csv',
         'xai4heat_scada_L22.csv']
weather_file = 'weather/ni_20_rs.csv'
The weather dataset is opened and some clearly irrelevant data is removed.
dfw = pd.read_csv(github_folder+weather_file)
dfw['datetime'] = pd.to_datetime(dfw['datetime'])
dfw.set_index('datetime',inplace=True)
dfw = dfw.drop(['name',
                'precipprob',
                'preciptype',
                'icon',
                'stations'], axis=1)
All substation data is opened, some preliminary pre-processing is done, the data is merged with the weather data, and the resulting datasets are stored in a list for later use. Preliminary pre-processing includes stripping the data down to an hourly time series and removing off-season data. November 1st and April 1st are adopted as the start and end dates of a heating season. Typically the season starts earlier and ends later, but those marginal periods are characterized by irregularities, such as system testing and very unbalanced consumption, so they are discarded.
Also, the number of time points with zero transmitted energy is displayed below. Such situations correspond to data transmission failures and need appropriate treatment.
def strip_out_season_data(df):
    date_range_season1 = (df.index >= pd.to_datetime('2018-11-01 06:00:00')) & (df.index < pd.to_datetime('2019-04-01 23:00:00'))
    date_range_season2 = (df.index >= pd.to_datetime('2019-11-01 06:00:00')) & (df.index < pd.to_datetime('2020-04-01 23:00:00'))
    date_range_season3 = (df.index >= pd.to_datetime('2020-11-01 06:00:00')) & (df.index < pd.to_datetime('2021-04-01 23:00:00'))
    date_range_season4 = (df.index >= pd.to_datetime('2021-11-01 06:00:00')) & (df.index < pd.to_datetime('2022-04-01 23:00:00'))
    date_range_season5 = (df.index >= pd.to_datetime('2022-11-01 06:00:00')) & (df.index < pd.to_datetime('2023-04-01 23:00:00'))
    date_range_season6 = (df.index >= pd.to_datetime('2023-11-01 06:00:00')) & (df.index < pd.to_datetime('2024-04-01 23:00:00'))
    df = df[date_range_season1 | date_range_season2 | date_range_season3 | date_range_season4 | date_range_season5 | date_range_season6]
    return df

all_data = []
for i in files:
    df = pd.read_csv(github_folder + i)
    df['datetime'] = pd.to_datetime(df['datetime'])
    df.set_index('datetime', inplace=True)
    # For each substation, show the data acquisition period
    print(i)
    print('Timeline (from/to): ', df.index.min(), df.index.max())
    # Remove data outside of the heating season
    df = strip_out_season_data(df)
    # Strip all data except data acquired at the full hour
    df = df[df.index.minute == 0]
    # Insert missing timepoints, populate with NaNs
    complete_time_index = pd.date_range(start=df.index.min(), end=df.index.max(), freq='H')
    df = df.reindex(complete_time_index)
    # Show the number of zero energy readings - inaccurate readings at the calorimeter
    zero_count = (df['e'] == 0).sum()
    print('Data transmission failures: ', str(zero_count) + '/' + str(len(df)))
    # Merge with weather data
    df = pd.merge(df, dfw, left_index=True, right_index=True, how='inner')
    all_data.append(df)
xai4heat_scada_L4.csv
Timeline (from/to):  2019-08-05 13:00:00 2024-04-04 11:52:00
Data transmission failures:  2200/38712
xai4heat_scada_L8.csv
Timeline (from/to):  2021-05-25 13:00:00 2024-04-04 11:52:00
Data transmission failures:  13/21168
xai4heat_scada_L12.csv
Timeline (from/to):  2021-05-25 13:00:00 2024-04-04 11:52:00
Data transmission failures:  16/21168
xai4heat_scada_L17.csv
Timeline (from/to):  2019-08-05 13:00:00 2024-04-04 11:52:00
Data transmission failures:  89/38712
xai4heat_scada_L22.csv
Timeline (from/to):  2021-05-25 13:00:00 2024-04-04 11:52:00
Data transmission failures:  64/21168
Let's deal with the missing transmitted energy data. First, zero values in the e column are replaced with NaNs, as a calorimeter reading cannot be zero. Then, for all other columns, only the data corresponding to the positions of zero values in the e column is replaced with NaNs. Finally, all NaNs are imputed using simple linear interpolation, which is considered a sufficiently good approximation for hourly time series data.
This approach follows the assumption that a zero calorimeter reading corresponds to a transmission issue, which also affects the readings of the fluid and ambient temperature sensors. A zero reading from a temperature sensor alone is not a strong indication of a transmission failure, especially for the ambient temperature sensor, which obviously may read a zero temperature.
columns_to_update = ['t_amb', 't_ref', 't_sup_prim', 't_ret_prim', 't_sup_sec', 't_ret_sec']
for i, dfa in enumerate(all_data):
    dfa['e'] = dfa['e'].replace(0, np.nan)
    for column in columns_to_update:
        dfa.loc[dfa['e'].isna(), column] = np.nan
    dfa.interpolate(method='linear', inplace=True)
    all_data[i] = dfa
For visual inspection, the signals corresponding to the transmitted energy and the temperatures of the supply and return fluids in the secondary line are plotted. Also, a simple z-score method is used to identify outliers in the fluid temperatures in the secondary flow. All data with a z-score > 5 is considered an outlier and irregular data, to be treated appropriately.
num_plots = len(all_data)
num_rows = math.ceil(num_plots / 2)

def find_outliers(series):
    """Calculate z-scores and identify outliers in a series."""
    mean = series.mean()
    std = series.std()
    z_scores = (series - mean) / std
    return series[np.abs(z_scores) > 5]

# Create subplots
fig, axs = plt.subplots(num_rows, 2, figsize=(25, 6*num_rows))
axs = axs.flatten()

# Loop through each DataFrame in the list
for i, df in enumerate(all_data):
    axs[i].plot(df['t_sup_sec'], label='t_sup_sec', linewidth=1, color='blue')
    axs[i].plot(df['t_ret_sec'], label='t_ret_sec', linewidth=1, color='green')
    axs[i].plot(df['t_amb'], label='t_amb', linewidth=1, color='orange')
    # Calculate and plot outliers for t_sup_sec
    outliers_sup = find_outliers(df['t_sup_sec'])
    axs[i].scatter(outliers_sup.index, outliers_sup, color='blue', s=40, label='Outliers t_sup_sec', edgecolors='k')
    # Calculate and plot outliers for t_ret_sec
    outliers_ret = find_outliers(df['t_ret_sec'])
    axs[i].scatter(outliers_ret.index, outliers_ret, color='green', s=40, label='Outliers t_ret_sec', edgecolors='k')
    # Calculate and plot outliers for t_amb
    outliers_amb = find_outliers(df['t_amb'])
    axs[i].scatter(outliers_amb.index, outliers_amb, color='orange', s=40, label='Outliers t_amb', edgecolors='k')
    axs_e = axs[i].twinx()
    axs_e.plot(df.index, df['e'], color='red', label='e', linewidth=2)
    axs_e.set_ylabel('e')
    axs[i].set_title(f'{files[i]} \nFluid temp at secondary flow')
    lines, labels = axs[i].get_legend_handles_labels()
    lines_e, labels_e = axs_e.get_legend_handles_labels()
    axs[i].legend(lines + lines_e, labels + labels_e)
    axs[i].tick_params(axis='x', rotation=90)
    axs[i].grid(True)

if num_plots % 2 != 0:
    fig.delaxes(axs[num_plots])

plt.tight_layout()
plt.show()
All identified outliers are replaced with linearly interpolated data.
def replace_outliers(series):
    """Identify outliers using z-scores and replace them with NaN."""
    mean = series.mean()
    std = series.std()
    z_scores = (series - mean) / std
    # Replace with NaN where the condition is met
    series[np.abs(z_scores) > 5] = np.nan
    return series

for i, df in enumerate(all_data):
    # Replace outliers with NaNs for each relevant column
    df['t_sup_sec'] = replace_outliers(df['t_sup_sec'].copy())
    df['t_ret_sec'] = replace_outliers(df['t_ret_sec'].copy())
    df['t_amb'] = replace_outliers(df['t_amb'].copy())
    # Interpolate to fill NaNs
    df['t_sup_sec'].interpolate(inplace=True)
    df['t_ret_sec'].interpolate(inplace=True)
    df['t_amb'].interpolate(inplace=True)
    all_data[i] = df
Basic date features and the transmitted energy in the previous hour are introduced. Also, the dimensionality of the meteorological data is reduced.
dropcolumns = ['solarenergy',
               'uvindex',
               'severerisk',
               'visibility',
               'cloudcover',
               'snow',
               'dew',
               'conditions',
               'e',
               'pe']

for i, df in enumerate(all_data):
    df['hour_of_day'] = df.index.hour
    df['month'] = df.index.month
    df['day_of_week'] = df.index.dayofweek
    df['is_working_day'] = df['day_of_week'].apply(lambda x: 1 if x < 5 else 0)
    deltae = (df['e'] - df['e'].shift(1)) * 1000
    df['heating_on'] = deltae.apply(lambda x: 1 if x != 0 else 0)
    df['deltae'] = deltae
    df = df.drop(columns=dropcolumns, axis=1)
    df = df.dropna()
    all_data[i] = df
Based on correlation analysis, some features are removed. Some are dropped because of a very low correlation with the transmitted energy (day_of_week, is_working_day, precip, windgust, windspeed, winddir). The temperature measurement signal at the substation level is replaced with the ambient temperature signal from the official meteorological station, due to its better precision and resilience. Other features are removed because of a very high correlation with deltae, namely t_sup_prim, t_ret_prim, t_sup_sec and t_ret_sec, which introduces the risk of multicollinearity (destabilized model coefficients that become sensitive to small changes in the data, etc.) and overfitting.
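The correlation analysis itself is not shown in the listings above. A minimal sketch of how it could be reproduced with the already imported pandas and seaborn (the figure settings here are our own choice, not the original ones) is:

# Illustrative only: correlation matrix of the engineered features for one substation
corr = all_data[0].corr(numeric_only=True)   # numeric_only avoids issues with any remaining text columns
plt.figure(figsize=(12, 10))
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Feature correlation matrix')
plt.show()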
rmv= ['day_of_week', 'is_working_day', 'precip', 'windgust', 'windspeed', 'winddir', 't_amb', 't_sup_prim', 't_ret_prim', 't_sup_sec', 't_ret_sec', 't_ref', 'month']
for i, df in enumerate(all_data):
    dfx = df.drop(rmv, axis=1)
    all_data[i] = dfx
Recently, Attention and Transformer models have started to gain popularity as methods for time series forecasting. Transformers were originally developed for natural language processing, but they have been adapted to time series forecasting problems. They handle long-range dependencies well, making them suitable for complex forecasting tasks.
The Transformer architecture was originally proposed in [Vaswani et al, 2017]. The architecture consists of a number of stacked encoders, followed by the same number of stacked decoder units. Each of the encoders and decoders is identical in structure. The main components of the encoder are the self-attention and Feed-Forward Neural Network (FFNN) layers, while the decoder comprises self-attention, encoder-decoder attention and FFNN layers. See below an illustration of the Transformer model architecture.
In the inference step, the trained Transformer model essentially "translates" the input sequence into the output sequence, namely a forecast. In a time series problem, the input sequence is X = {x1, x2, ..., xn}, where each xi is a vector containing all input feature data at data point i.
All input vectors enter the self-attention layer at once. Self-attention allows the model to focus on different parts of the input sequence when encoding or decoding information, capturing dependencies and relationships between them. It is a key component of transformer architectures and has been shown to perform well in various natural language processing tasks.
The self-attention process produces attention scores which quantify these dependencies. To calculate the attention scores, query, key and value matrices are computed from the input sequence:

Q = X·WQ,  K = X·WK,  V = X·WV

where WQ, WK and WV are learnable weight matrices. The attention scores are then computed as follows:

scores = softmax(Q·K^T / sqrt(dk))

where dk is the dimension of the key vectors. Finally, the output sequence is calculated as the weighted sum of the value vectors V, where the weights are the attention scores:

Z = softmax(Q·K^T / sqrt(dk))·V
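For illustration only, the computation above can be reproduced in a few lines of NumPy (single attention head, toy dimensions; the array names are ours and not part of the model used later):

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

n, d = 24, 8                        # sequence length and feature dimension
X = np.random.randn(n, d)           # input sequence, one feature vector per time step
W_q, W_k, W_v = (np.random.randn(d, d) for _ in range(3))

Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = softmax(Q @ K.T / np.sqrt(d))   # attention scores, shape (n, n)
Z = scores @ V                           # output sequence, shape (n, d)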
Multi-head attention is the concept that lets the model focus on different parts of the input sequence vectors by looking (vertically) at groups of features. This approach is known for improved representation learning (better capturing of diverse patterns and dependencies), better interpretability (the attention weights computed by each head give insight into which groups of features matter for the forecast) and other benefits.
In this approach, attention scores and the attention output are calculated for each head separately, where the number of features in the input sequence must be divisible by the number of heads. The overall attention is computed as the concatenation of all attention heads (where m is the number of heads), multiplied by a weight matrix:

MultiHead(Q, K, V) = Concat(head1, ..., headm)·WO
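A minimal sketch of the same idea using PyTorch's built-in multi-head attention module (toy sizes; this module is not called directly in the model below, which relies on nn.Transformer instead):

import torch
import torch.nn as nn

# embed_dim (the number of input features) must be divisible by num_heads
mha = nn.MultiheadAttention(embed_dim=8, num_heads=4, batch_first=True)
X = torch.randn(1, 24, 8)             # (batch, sequence length, features)
out, attn_weights = mha(X, X, X)      # self-attention: query = key = value = X
print(out.shape, attn_weights.shape)  # torch.Size([1, 24, 8]) torch.Size([1, 24, 24])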
Positional encoding is a technique used in transformer models to provide positional information to the input embeddings, allowing the model to understand the order of elements in a sequence. Positional encoding is based on the sine and cosine functions. Given an input sequence of length n and an input feature dimension d, the positional encoding PE(i, j) for the j-th feature at position i in the input sequence is computed as follows:

PE(i, 2k) = sin(i / 10000^(2k/d)),  PE(i, 2k+1) = cos(i / 10000^(2k/d))

The positional encoding vector for each position i is then obtained by concatenating the values of PE(i, j) across all dimensions j of the input vector.
This positional encoding vector is added element-wise to the corresponding input feature vector to obtain the final input representation:

x'i = xi + PE(i)
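A short NumPy sketch of the sinusoidal encoding described above (illustrative only; the function name and toy sizes are ours):

def sinusoidal_positional_encoding(n, d):
    """Return an (n, d) matrix of sinusoidal positional encodings."""
    positions = np.arange(n)[:, None]               # i = 0 .. n-1
    div = 10000 ** (2 * (np.arange(d) // 2) / d)    # 10000^(2k/d) per dimension
    pe = np.zeros((n, d))
    pe[:, 0::2] = np.sin(positions / div[0::2])     # even dimensions use sine
    pe[:, 1::2] = np.cos(positions / div[1::2])     # odd dimensions use cosine
    return pe

# Example: encodings for a 24-step sequence with 8 features,
# to be added element-wise to the input feature vectors
pe = sinusoidal_positional_encoding(24, 8)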
The method described above is also referred to as absolute positional encoding. In subsequent research, relative positional encoding was proposed [Shaw et al, 2018], on the premise that the pairwise positional relation between two input feature vectors is more informative than their absolute positions.
Before the data is fed to the feed-forward network, the input sequence is added back to the multi-head attention output (a residual connection) and the result is normalized, as follows:

z = LayerNorm(x + MultiHead(x))
The decoder components follow a similar logic; however, some specific parts differ.
First, while the encoded representation of the input sequence is fed to the decoder part of the architecture, the encoder also provides the key and value matrices to the encoder-decoder attention components, enabling the decoder to focus on relevant parts of the input sequence while producing the output. This layer works just like multi-head self-attention, except that it creates its Q matrix from the layer below it and takes the K and V matrices from the output of the encoder stack.
Second, the decoder operates in an autoregressive manner to generate the output sequence step by step. Here, the self-attention layer is only allowed to attend to earlier positions in the output sequence. This is done by masking future positions before the softmax step in the self-attention calculation.
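A minimal sketch of such a causal mask in PyTorch (the simplified model used below does not pass a mask explicitly; this only illustrates the mechanism):

import torch

seq_len = 24
# Position t may attend only to positions <= t; masked entries become -inf,
# so their weights vanish after the softmax. This matches what
# nn.Transformer.generate_square_subsequent_mask() produces.
mask = torch.triu(torch.full((seq_len, seq_len), float('-inf')), diagonal=1)
print(mask[:3, :3])
# tensor([[0., -inf, -inf],
#         [0., 0., -inf],
#         [0., 0., 0.]])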
The output of the decoder is a sequence of hidden states, where each hidden state represents the model's prediction for a particular time step in the forecast horizon. In the original formulation for language tasks, these hidden states are passed through a linear layer followed by a softmax, producing a probability distribution over the possible outputs at each step; the most likely value is then selected by taking the argmax at each time step. In the regression setting used here, the softmax step is not needed: the final linear layer maps each hidden state directly to the forecast value for the corresponding future time step.
Implementation
A simple Transformer architecture tailored for time series forecasting was used in this case, implemented with the PyTorch library.
!pip install pytorch-lightning
import torch
import torch.nn as nn
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset, random_split
from sklearn.preprocessing import StandardScaler
from torchmetrics import MeanAbsoluteError, MeanSquaredError
The model is trained using data from the L12 substation.
data = all_data[3]
We add the previous 24 values of the transmitted energy as features. Since heat demand exhibits very strong daily seasonality and similar daily consumption patterns, we want to emphasize this to the model by introducing the previous 24 hours of transmitted energy into the input data vector. Given that the input sequence length is 24, this practically means that we are using data from the past 48 hours to forecast the energy demand in the next 24 hours.
num_lags = 24
# Create lagged features for deltae
for lag in range(1, num_lags + 1):
    data[f'deltae_lag_{lag}'] = data['deltae'].shift(lag)

# Remove rows with NaN values that result from shifting
data = data.dropna()
We add the weather forecast, namely the temperature forecast for the next 24 hours. As we are using actual meteorological station readings, the next 24 hours of readings are introduced into the input data vector. Obviously, this is not a true forecast; under the assumption of accurate forecasts, however, it is a good substitute.
# Create future temperature features for the next 24 hours
for lag in range(1, 25):
    data[f'temp_next_{lag}'] = data['temp'].shift(-lag)

# Remove rows with NaN values that result from shifting
data = data.dropna()
The dataset is preprocessed, normalized, and converted into PyTorch tensors. PyTorch tensors are the fundamental data structures used in PyTorch, a popular open-source machine learning library. They store multi-dimensional data and support different data types. PyTorch includes a wide range of operations for tensor manipulation, including mathematical operations, slicing, reshaping, and more. These operations are optimized for performance and are integral to building efficient machine learning models. PyTorch tensors are interoperable with other data structures, such as NumPy arrays, and can be moved between devices, such as CPUs and GPUs, to accelerate computations.
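For illustration only, a few of these basic tensor operations (not part of the forecasting pipeline):

arr = np.array([[1.0, 2.0], [3.0, 4.0]])
t = torch.tensor(arr, dtype=torch.float32)         # tensor created from a NumPy array
t_reshaped = t.reshape(1, 4)                       # reshaping
t_product = t @ t.T                                # matrix multiplication
device = 'cuda' if torch.cuda.is_available() else 'cpu'
t = t.to(device)                                   # tensors can be moved between CPU and GPU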
Data loaders are also created. In PyTorch, data loaders are an essential component for efficiently handling and iterating over datasets. The primary purpose of a data loader is to load data in batches, which is crucial for training models on large datasets that do not fit into memory all at once.
train_size = int(0.8 * len(data))
train = data[:train_size]
test = data[train_size:]

# Selecting features and target
features = train.drop(columns=['deltae'])
target = train['deltae']

# Normalize features
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)

# Convert to PyTorch tensors
features_tensor = torch.tensor(features_scaled, dtype=torch.float32)
target_tensor = torch.tensor(target.values, dtype=torch.float32).view(-1, 1)

# Input sequence length and future prediction length
N = 24  # Number of time steps in each input sequence
M = 24  # Number of time steps to predict

# Prepare sequences
sequences = [features_tensor[i:i+N] for i in range(len(features_tensor) - N - M + 1)]
targets = [target_tensor[i+N:i+N+M] for i in range(len(target_tensor) - N - M + 1)]

# Convert to tensors and create datasets
sequences_tensor = torch.stack(sequences)
targets_tensor = torch.stack(targets).view(len(targets), M)  # Ensure targets are properly shaped

# DataLoader setup
dataset = TensorDataset(sequences_tensor, targets_tensor)
train_size = int(0.8 * len(dataset))
train_dataset, val_dataset = random_split(dataset, [train_size, len(dataset) - train_size])
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=False)
val_loader = DataLoader(val_dataset, batch_size=32)

# Test set handling (similar approach)
test_features = test.drop(columns=['deltae'])
test_target = test['deltae']
test_features_scaled = scaler.transform(test_features)  # Use the same scaler as for the training data
test_features_tensor = torch.tensor(test_features_scaled, dtype=torch.float32)
test_target_tensor = torch.tensor(test_target.values, dtype=torch.float32).view(-1, 1)

# Prepare sequences for testing
test_sequences = [test_features_tensor[i:i+N] for i in range(len(test_features_tensor) - N - M + 1)]
test_targets = [test_target_tensor[i+N:i+N+M] for i in range(len(test_target_tensor) - N - M + 1)]

# Convert to tensors and create the test dataset
test_sequences_tensor = torch.stack(test_sequences)
test_targets_tensor = torch.stack(test_targets).view(len(test_targets), M)  # Ensure targets are properly shaped
test_dataset = TensorDataset(test_sequences_tensor, test_targets_tensor)
test_loader = DataLoader(test_dataset, batch_size=32)
The TimeSeriesTransformer class is a custom neural network model built with PyTorch Lightning and designed for time series forecasting tasks. It uses the Transformer architecture, which is known for handling sequences and capturing dependencies across time.
Parameters:
- input_dim: The number of features in the input data. In the context of Transformers, this also corresponds to d_model, the size of the expected input embeddings.
- num_outputs: The size of the output layer, which determines how many future values the model predicts.
- dim_feedforward: The dimensionality of the feed-forward network inside the transformer layers.
- nhead: The number of heads in the multi-head attention modules.
- num_layers: The number of sub-encoder and sub-decoder layers in the transformer.
- dropout: The dropout rate, a regularization parameter that helps prevent overfitting by randomly setting a fraction of the input units to 0 at each update during training.
Components:
- self.transformer: The Transformer model from PyTorch's neural network library, configured with encoder and decoder stacks that have the same number of layers, heads, and so on.
- self.linear_out: A linear layer that maps the Transformer's output to the desired output dimension. It acts as the final prediction layer.
- self.val_mae and self.val_rmse: Metrics from torchmetrics that compute the mean absolute error and the root mean squared error, respectively. These are used to evaluate the model during validation.
class PlotLossCallback(pl.Callback):
    def __init__(self):
        super().__init__()
        self.losses = []

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        # Append the loss from the current batch to the list of losses
        self.losses.append(outputs['loss'].item())

    def on_train_end(self, trainer, pl_module):
        # Plot the training loss curve
        plt.plot(np.arange(len(self.losses)), self.losses, label='Training Loss')
        plt.xlabel('Batch')
        plt.ylabel('Loss')
        plt.title('Training Loss Curve')
        plt.legend()
        plt.show()

class TimeSeriesTransformer(pl.LightningModule):
    def __init__(self, input_dim, num_outputs=24, dim_feedforward=512, nhead=4, num_layers=3, dropout=0.2):
        super().__init__()
        self.transformer = nn.Transformer(
            d_model=input_dim,
            nhead=nhead,
            num_encoder_layers=num_layers,
            num_decoder_layers=num_layers,
            dim_feedforward=dim_feedforward,
            dropout=dropout
        )
        # Assuming each time step outputs one feature
        self.linear_out = nn.Linear(input_dim, 1)
        self.val_mae = MeanAbsoluteError()
        self.val_rmse = MeanSquaredError(squared=False)

    def forward(self, src):
        src = src.permute(1, 0, 2)  # [sequence_length, batch_size, features]
        output = self.transformer(src, src)
        output = self.linear_out(output)
        output = output.permute(1, 0, 2)  # [batch_size, sequence_length, num_outputs]
        return output

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        # Ensure y is [batch_size, sequence_length, num_outputs]
        y = y.view(y_hat.shape)  # Reshape y to match y_hat
        loss = nn.MSELoss()(y_hat, y)
        self.log('train_loss', loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        y = y.view(y_hat.shape)
        loss = nn.MSELoss()(y_hat, y)
        self.log('val_loss', loss, on_epoch=True)
        self.val_mae(y_hat.flatten(), y.flatten())
        self.val_rmse(y_hat.flatten(), y.flatten())
        return {"val_loss": loss}

    def on_validation_epoch_end(self):
        self.log('val_mae', self.val_mae.compute(), prog_bar=True)
        self.log('val_rmse', self.val_rmse.compute(), prog_bar=True)
        self.val_mae.reset()
        self.val_rmse.reset()

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.001)
The number of heads (nhead) must be a divisor of the number of input features. The model is initialized with the parameters below; their selection was made after a number of experiments.
# Initialize the model
model = TimeSeriesTransformer(input_dim=features_tensor.shape[1],
                              num_outputs=24,
                              nhead=7,               # with weather forecast
                              #nhead=4,              # without weather forecast
                              dim_feedforward=512,   # Size of the feedforward network
                              num_layers=3,          # Number of layers in the encoder and decoder
                              dropout=0.1)           # Dropout rate

# Create a PyTorch Lightning trainer and fit the model
trainer = pl.Trainer(max_epochs=5, accelerator='gpu', callbacks=[PlotLossCallback()], devices=1 if torch.cuda.is_available() else None)
#trainer = pl.Trainer(max_epochs=10, accelerator='cpu', callbacks=[PlotLossCallback()])
trainer.fit(model, train_loader, val_loader)
  | Name        | Type              | Params
---------------------------------------------
0 | transformer | Transformer       | 464 K
1 | linear_out  | Linear            | 57
2 | val_mae     | MeanAbsoluteError | 0
3 | val_rmse    | MeanSquaredError  | 0
---------------------------------------------
464 K     Trainable params
0         Non-trainable params
464 K     Total params
1.857     Total estimated model params size (MB)
Finally, the model is evaluated.
def evaluate_model(model, test_loader):
    model.eval()
    mae_metric = MeanAbsoluteError()
    rmse_metric = MeanSquaredError(squared=False)
    with torch.no_grad():
        for x, y in test_loader:
            # x already has the expected shape, so there is no need to unsqueeze
            y_hat = model(x)  # Forward pass
            # Flatten the batch and sequence dimensions to treat all predictions equally
            y_hat_flat = y_hat.reshape(-1)
            y_flat = y.reshape(-1)
            # Update metrics
            mae_metric.update(y_hat_flat, y_flat)
            rmse_metric.update(y_hat_flat, y_flat)
    # Compute final metric values
    mae = mae_metric.compute()
    rmse = rmse_metric.compute()
    return mae.item(), rmse.item()

# Assuming 'model' is the trained model instance and 'test_loader' is set up
mae, rmse = evaluate_model(model, test_loader)
print(f"Mean Absolute Error on Test Set: {mae:.4f}")
print(f"Root Mean Square Error on Test Set: {rmse:.4f}")
Mean Absolute Error on Test Set: 21.0991
Root Mean Square Error on Test Set: 57.6080
The resulting MAE is actually the average of the forecasting accuracies at the different positions in the 24-hour output vector. The bar chart below shows the actual accuracy at each position. As expected, accuracy degrades at later time points, although less than anticipated.
hours = range(24)
# Plot the bar chart
plt.figure(figsize=(15, 8))  # Adjust the width and height as needed
bars = plt.bar(hours, maes, color='skyblue')
plt.xlabel('Hour')
plt.ylabel('MAE')
plt.title('Mean Absolute Error (MAE) for Each Hour')
plt.xticks(hours)    # Ensure all hours are displayed on the x-axis
plt.grid(axis='y')   # Add grid lines along the y-axis

# Annotate each bar with its corresponding MAE value
for bar, mae in zip(bars, maes):
    plt.text(bar.get_x() + bar.get_width() / 2,  # x-coordinate of text
             bar.get_height() + 0.1,             # y-coordinate of text
             f'{mae:.2f}',                       # Text to display (formatted MAE value)
             ha='center', va='bottom',           # Text alignment
             color='black',                      # Text color
             fontsize=8)

plt.show()
The standard deviation of the per-hour accuracies is displayed.
std = np.std(maes)
print("Standard Deviation:", std)
Standard Deviation: 0.22962380641058933
We show how the actual values and the forecasts align at the different positions in the output sequence, on a sample of the data, so as to make the differences more prominent.
timepoint_index = 0
hours_forecasts = []
hours_actuals = []
numhours = 24

# Iterate over the test loader to get forecasts and actual values for each forecast position
with torch.no_grad():
    for tp in range(0, numhours):
        forecasts = []
        actuals = []
        for inputs, targets in test_loader:
            # Make predictions using the model
            predictions = model(inputs)
            # Extract forecasts and actual values for the chosen timepoint from each sample in the batch
            for i in range(len(inputs)):
                forecast = predictions[i, tp].item()
                actual = targets[i, tp].item()
                # Append the forecasts and actual values to the respective lists
                forecasts.append(forecast)
                actuals.append(actual)
        hours_forecasts.append(forecasts)
        hours_actuals.append(actuals)
start = 6000
end = 6300
num_hours = len(hours_forecasts)
maes = []

# Create a figure with multiple subplots
num_rows = (num_hours + 1) // 2  # Number of rows for subplots
fig, axes = plt.subplots(num_rows, 2, figsize=(20, num_rows*4))

# Iterate over the forecasts and actuals
for i in range(num_hours):
    actuals_array = np.array(hours_actuals[i])
    forecasts_array = np.array(hours_forecasts[i])
    mae = np.mean(np.abs(actuals_array - forecasts_array))
    maes.append(mae)
    row_index = i // 2  # Calculate the row index for the subplot
    col_index = i % 2   # Calculate the column index for the subplot
    # Plot the forecasts and actual values in the subplot
    ax = axes[row_index, col_index]
    hours = np.arange(len(hours_forecasts[i]))
    ax.plot(hours[start:end], hours_forecasts[i][start:end], label='Forecasts')
    ax.plot(hours[start:end], hours_actuals[i][start:end], label='Actuals')
    ax.set_xlabel('Hour')
    ax.set_ylabel('deltae')
    ax.set_title(f'Forecasts vs Actuals at hour {i}, MAE: {mae}')
    ax.legend()
    ax.grid(True)

# Adjust layout
plt.tight_layout()
# Show the plot
plt.show()
Check out the Jupyter notebook with the above code in the XAI4HEAT repo.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. arXiv:1706.03762 [cs]. http://arxiv.org/abs/1706.03762
Shaw, P., Uszkoreit, J., & Vaswani, A. (2018). Self-Attention with Relative Position Representations. https://doi.org/10.48550/ARXIV.1803.02155
This research was supported by the Science Fund of the Republic of Serbia, Grant №23-SSF-PRISMA-206, Explainable AI-assisted operations in district heating systems (XAI4HEAT).