District Heating Systems (DHS) are an important part of urban infrastructure, providing heat to homes and businesses. One of the key challenges in managing these systems is predicting hourly heat demand accurately. This kind of forecast helps operators fine-tune the system's operation, reduce costs, lower environmental impact, and keep customers comfortable.
In our research, we explored how state-of-the-art Deep Learning (DL) methods can be applied to make these predictions more accurate. This level of detail helps operators make informed decisions about daily operations. For example, they can adjust production levels based on anticipated demand, manage the secondary circuit at substations, and schedule maintenance during low-demand periods, such as overnight. Specifically, we focused on forecasting heat demand over the next 24 hours using the Transformer architecture, well known for its use in today's most capable Large Language Models.
We started with the hypothesis that modern DL models, especially those designed for sequence forecasting and combined with custom feature engineering (including weather forecast data), can deliver highly accurate results. Our goal was to achieve accuracy comparable to single-step forecasting while maintaining stable performance across the entire forecast sequence.
Follow XAI4HEAT on LinkedIn for more stories like this.
To test this, we used data from a local DHS substation, comprising 38,710 time points over five heating seasons from 2019 to 2024. We employed custom feature engineering techniques and a state-of-the-art Transformer algorithm adapted for time series forecasting.
Our results were promising. Without incorporating weather data, our model achieved a Mean Absolute Error (MAE) of 28.15 kWh, which is better than the benchmark single-step model using a stacked Long Short-Term Memory (LSTM) network (MAE of 28.73 kWh). When we included weather forecast features, the model's performance improved significantly, reaching an MAE of 21.09 kWh. Furthermore, the model maintained consistent performance across the forecasted hours, with a standard deviation of MAE of just 0.23 kWh. We used a vanilla Transformer model adapted for time series problems, trained over 5 epochs.
In summary, our analysis shows that modern DL models, when combined with thoughtful feature engineering, can significantly improve the accuracy of heat demand forecasts in DHS. This advance not only supports better operational decisions but also contributes to more efficient and sustainable urban heating.
A District Heating System is a centralized way of providing heat to multiple buildings through a network of insulated pipes that carry hot water or steam from a central source. This approach is used to efficiently heat residential, commercial, and industrial buildings within a specific district or area.
The components of a DHS are the plant (central heat source) and the distribution network of insulated pipes transporting hot water or steam from the central source to the buildings. These pipes form two closed loops: the primary circuit (heating fluid supply and return between the central plant and the substations) and the secondary circuit (heating fluid supply and return between the substations and the consumers). Heat is transferred from the primary to the secondary circuit by heat exchangers located at the substations.
Below is an illustration of the small DHS we are working with, comprising five substations (L4, L8, L12, L17 and L22). The figure also shows where the most important data in our datasets is measured.
Let's start by importing the required libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
import seaborn as sns
import time
from tqdm import tqdm
All data from the five substations of the local DHS, together with the meteorological data, is stored in a GitHub folder.
github_folder = 'https://github.com/xai4heat/xai4heat/raw/main/datasets/'
files = ['xai4heat_scada_L4.csv',
'xai4heat_scada_L8.csv',
'xai4heat_scada_L12.csv',
'xai4heat_scada_L17.csv',
'xai4heat_scada_L22.csv']
weather_file = 'weather/ni_20_rs.csv'
The weather dataset is opened and some clearly irrelevant columns are removed.
dfw = pd.read_csv(github_folder+weather_file)
dfw['datetime'] = pd.to_datetime(dfw['datetime'])
dfw.set_index('datetime',inplace=True)
dfw=dfw.drop(['name',
'precipprob',
'preciptype',
'icon',
'stations'], axis=1)
All substation data is opened, some preliminary pre-processing is done, the data is merged with the weather data, and the resulting datasets are stored in a list for later use. Preliminary pre-processing includes stripping the data down to an hourly time series and removing off-season data. November 1st and April 1st are adopted as the heating season start and end dates; usually the season begins earlier and ends later, but those periods are characterized by irregularities, such as system testing and very unbalanced consumption, so they are discarded.
In addition, the number of time points with zero transmitted energy is displayed below. Such points correspond to data transmission failures and need appropriate treatment.
def strip_out_season_data(df):
date_range_season1 = (df.index >= pd.to_datetime('2018-11-01 06:00:00')) & (df.index < pd.to_datetime('2019-04-01 23:00:00'))
date_range_season2 = (df.index >= pd.to_datetime('2019-11-01 06:00:00')) & (df.index < pd.to_datetime('2020-04-01 23:00:00'))
date_range_season3 = (df.index >= pd.to_datetime('2020-11-01 06:00:00')) & (df.index < pd.to_datetime('2021-04-01 23:00:00'))
date_range_season4 = (df.index >= pd.to_datetime('2021-11-01 06:00:00')) & (df.index < pd.to_datetime('2022-04-01 23:00:00'))
date_range_season5 = (df.index >= pd.to_datetime('2022-11-01 06:00:00')) & (df.index < pd.to_datetime('2023-04-01 23:00:00'))
date_range_season6 = (df.index >= pd.to_datetime('2023-11-01 06:00:00')) & (df.index < pd.to_datetime('2024-04-01 23:00:00'))
df = df[date_range_season1 | date_range_season2 | date_range_season3 | date_range_season4 | date_range_season5 | date_range_season6]
    return df

all_data = []
for i in files:
df = pd.read_csv(github_folder+i)
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index('datetime',inplace=True)
    # For each substation, show the data acquisition period
print(i)
print('Timeline (from/to): ', df.index.min(), df.index.max())
    # Remove data outside of the heating season
df=strip_out_season_data(df)
    # Keep only data acquired at the full hour
df = df[df.index.minute == 0]
#Insert missing timepoints, populate with NaNs
complete_time_index = pd.date_range(start=df.index.min(), end=df.index.max(), freq='H')
df = df.reindex(complete_time_index)
    # Show the number of zero energy readings - faulty readings at the calorimeter
zero_count = (df['e'] == 0).sum()
print('Data transmission failures: ', str(zero_count)+'/'+str(len(df)))
    # Merge with weather data
    df = pd.merge(df, dfw, left_index=True, right_index=True, how='inner')
all_data.append(df)
xai4heat_scada_L4.csv
Timeline (from/to): 2019-08-05 13:00:00 2024-04-04 11:52:00
Data transmission failures: 2200/38712
xai4heat_scada_L8.csv
Timeline (from/to): 2021-05-25 13:00:00 2024-04-04 11:52:00
Data transmission failures: 13/21168
xai4heat_scada_L12.csv
Timeline (from/to): 2021-05-25 13:00:00 2024-04-04 11:52:00
Data transmission failures: 16/21168
xai4heat_scada_L17.csv
Timeline (from/to): 2019-08-05 13:00:00 2024-04-04 11:52:00
Data transmission failures: 89/38712
xai4heat_scada_L22.csv
Timeline (from/to): 2021-05-25 13:00:00 2024-04-04 11:52:00
Data transmission failures: 64/21168
Let's deal with the missing transmitted energy data. First, zero values in the e column are replaced with NaNs, since a calorimeter reading cannot be zero. Then, for all other columns, only the data at the positions of the zero values in the e column is replaced with NaNs. Finally, all NaNs are imputed using simple linear interpolation, which is considered a sufficiently good approximation for hourly time series data.
This approach follows the assumption that a zero calorimeter reading corresponds to a transmission failure, which also affects the readings of the fluid and ambient temperature sensors. A zero reading from the temperature sensors alone is not a strong indication of a transmission failure, especially for the ambient temperature sensor, which can obviously legitimately read zero degrees.
columns_to_update = ['t_amb', 't_ref', 't_sup_prim', 't_ret_prim', 't_sup_sec', 't_ret_sec']
for i, dfa in enumerate(all_data):
    dfa['e'] = dfa['e'].replace(0, np.nan)
for column in columns_to_update:
dfa.loc[dfa['e'].isna(), column] = np.nan
dfa.interpolate(method='linear', inplace=True)
all_data[i]=dfa
For visual inspection, the signals corresponding to the transmitted energy and the supply and return fluid temperatures in the secondary circuit are plotted. In addition, a simple z-score method is used to identify outliers in the secondary circuit fluid temperatures. All data with a z-score > 5 is considered an outlier and irregular data, to be treated appropriately.
num_plots = len(all_data)
num_rows = math.ceil(num_plots / 2)
def find_outliers(series):
    """Calculate z-scores and identify outliers in a series."""
    mean = series.mean()
    std = series.std()
    z_scores = (series - mean) / std
    return series[np.abs(z_scores) > 5]
# Create subplots
fig, axs = plt.subplots(num_rows, 2, figsize=(25, 6*num_rows))
axs = axs.flatten()

# Loop through each DataFrame in the list
for i, df in enumerate(all_data):
axs[i].plot(df['t_sup_sec'], label='t_sup_sec', linewidth=1, color='blue')
    axs[i].plot(df['t_ret_sec'], label='t_ret_sec', linewidth=1, color='green')
    axs[i].plot(df['t_amb'], label='t_amb', linewidth=1, color='orange')
    # Calculate and plot outliers for t_sup_sec
    outliers_sup = find_outliers(df['t_sup_sec'])
    axs[i].scatter(outliers_sup.index, outliers_sup, color='blue', s=40, label='Outliers t_sup_sec', edgecolors='k')
    # Calculate and plot outliers for t_ret_sec
    outliers_ret = find_outliers(df['t_ret_sec'])
    axs[i].scatter(outliers_ret.index, outliers_ret, color='green', s=40, label='Outliers t_ret_sec', edgecolors='k')
    # Calculate and plot outliers for t_amb
    outliers_amb = find_outliers(df['t_amb'])
    axs[i].scatter(outliers_amb.index, outliers_amb, color='orange', s=40, label='Outliers t_amb', edgecolors='k')
axs_e = axs[i].twinx()
    axs_e.plot(df.index, df['e'], color='red', label='e', linewidth=2)
    axs_e.set_ylabel('e')
    axs[i].set_title(f'{files[i]}\nFluid temp at secondary circuit')
    lines, labels = axs[i].get_legend_handles_labels()
    lines_e, labels_e = axs_e.get_legend_handles_labels()
    axs[i].legend(lines + lines_e, labels + labels_e)
    axs[i].tick_params(axis='x', rotation=90)
    axs[i].grid(True)

if num_plots % 2 != 0:
    fig.delaxes(axs[num_plots])
plt.tight_layout()
plt.show()
All identified outliers are replaced with linearly interpolated data.
def replace_outliers(series):
    """Identify outliers using z-scores and replace them with NaN."""
    mean = series.mean()
    std = series.std()
    z_scores = (series - mean) / std
    # Replace values with NaN where the condition is met
    series[np.abs(z_scores) > 5] = np.nan
    return series
for i, df in enumerate(all_data):
    # Replace outliers with NaNs for each relevant column
    df['t_sup_sec'] = replace_outliers(df['t_sup_sec'].copy())
    df['t_ret_sec'] = replace_outliers(df['t_ret_sec'].copy())
    df['t_amb'] = replace_outliers(df['t_amb'].copy())
    # Interpolate to fill NaNs
    df['t_sup_sec'].interpolate(inplace=True)
    df['t_ret_sec'].interpolate(inplace=True)
    df['t_amb'].interpolate(inplace=True)
    all_data[i] = df
Basic date features and the transmitted energy in the previous hour are introduced. In addition, the dimensionality of the meteorological data is reduced.
dropcolumns=['solarenergy',
'uvindex',
'severerisk',
'visibility',
'cloudcover',
'snow',
'dew',
'conditions',
'e',
'pe']
for i, df in enumerate(all_data):
df['hour_of_day'] = df.index.hour
df['month'] = df.index.month
df['day_of_week'] = df.index.dayofweek
df['is_working_day'] = df['day_of_week'].apply(lambda x: 1 if x < 5 else 0)
    deltae = (df['e'] - df['e'].shift(1)) * 1000
    df['heating_on'] = deltae.apply(lambda x: 1 if x != 0 else 0)
    df['deltae'] = deltae
    df = df.drop(columns=dropcolumns, axis=1)
df=df.dropna()
all_data[i]=df
Based on a correlation analysis, some features are removed (see the sketch after the removal code below). Some features are removed due to a very low correlation with the transmitted energy (day_of_week, is_working_day, precip, windgust, windspeed, winddir). The temperature measurement signal at the substation level is replaced with the ambient temperature signal from the official meteorological station, because of its better precision and resilience. Some features are removed due to a very high correlation with deltae, namely t_sup_prim, t_ret_prim, t_sup_sec and t_ret_sec, which introduces the risk of multicollinearity (destabilized model coefficients that become sensitive to small changes in the data, etc.) and overfitting.
rmv= ['day_of_week', 'is_working_day', 'precip', 'windgust', 'windspeed', 'winddir', 't_amb', 't_sup_prim', 't_ret_prim', 't_sup_sec', 't_ret_sec', 't_ref', 'month']
for i, df in enumerate(all_data):
dfx=df.drop(rmv, axis=1)
all_data[i]=dfx
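The correlation analysis itself is not shown here. A minimal sketch of how such a check could be reproduced with pandas and seaborn (intended to be run on one substation's DataFrame just before the removal loop above; the exact analysis used may differ) might look like this:

# Hypothetical sketch of the correlation check, run before dropping the columns in rmv
df_check = all_data[0]
corr = df_check.corr(numeric_only=True)
# Rank features by the absolute strength of their correlation with the target deltae
print(corr['deltae'].abs().sort_values(ascending=False))
# Visualize the full correlation matrix to spot highly correlated feature groups
plt.figure(figsize=(12, 10))
sns.heatmap(corr, cmap='coolwarm', center=0)
plt.show()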
Recently, Attention and Transformer models have started to gain popularity as methods for time series forecasting problems. Transformers were originally developed for natural language processing, but they can be adapted for time series forecasting. They handle long-range dependencies well, making them suitable for complex forecasting tasks.
The Transformer architecture was originally proposed in [Vaswani et al, 2017]. The architecture consists of a number of stacked encoders, followed by the same number of stacked decoder units. All encoders share the same structure, as do all decoders. The main components of an encoder are the self-attention and Feed-Forward Neural Network (FFNN) layers, while a decoder consists of self-attention, encoder-decoder attention and FFNN layers. See below the illustration of the Transformer model architecture.
In the inference step, the trained Transformer model essentially "translates" the input sequence into the output sequence, i.e. a forecast. In the time series problem, the input sequence is X = {x1, x2, .., xn}, where each xi is a vector corresponding to all input feature data at data point i.
All input vectors are fed into the self-attention layer directly. Self-attention allows the model to attend to different parts of the input sequence when encoding or decoding data, capturing dependencies and relationships between them. It is a key component of Transformer architectures and has been shown to perform well in various natural language processing tasks.
The self-attention process yields the attention score vector, which effectively quantifies these dependencies. To calculate the attention scores, the query, key and value matrices are computed from the input sequence:
Q = X·WQ, K = X·WK, V = X·WV
where WQ, WK, WV are learnable weight matrices. The attention scores are then computed as follows:
scores = softmax(Q·KT / √dk)
where dk is the dimension of the key vectors. Finally, the output sequence is calculated as the weighted sum (where the weights are the attention scores) of the value vectors V, namely:
Z = scores·V
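To make these formulas concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention; the weight matrices are random placeholders rather than learned parameters, and the dimensions are arbitrary:

# Minimal sketch of scaled dot-product self-attention (illustrative only)
n, d, d_k = 24, 8, 8               # sequence length, feature dimension, key dimension
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))        # input sequence, one row per time step

W_Q = rng.normal(size=(d, d_k))    # learnable projections (random placeholders here)
W_K = rng.normal(size=(d, d_k))
W_V = rng.normal(size=(d, d_k))

Q, K, V = X @ W_Q, X @ W_K, X @ W_V

raw = Q @ K.T / np.sqrt(d_k)                              # scaled dot products
scores = np.exp(raw - raw.max(axis=-1, keepdims=True))
scores = scores / scores.sum(axis=-1, keepdims=True)      # row-wise softmax
Z = scores @ V                                            # weighted sum of value vectors
print(Z.shape)                                            # (24, 8): one output per time step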
Multi-head attention is a concept that facilitates separating the focus on different parts of the input sequence vectors, by looking (vertically) at groups of features. This approach is known to improve representation learning (better capturing of diverse patterns and dependencies), interpretability (the attention weights computed by each head give insight into which groups of features are important for the forecast), and more.
In this approach, the attention scores and the attention output itself are calculated for each head, where the number of features in the input sequence must be divisible by the number of heads. The overall attention is computed as the concatenation of all the attention heads (where m is the number of heads), multiplied by a weight matrix:
MultiHead(X) = Concat(head1, ..., headm)·WO
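A corresponding sketch of the multi-head variant, splitting the feature dimension across heads and concatenating the per-head outputs (again with random placeholder weights and arbitrary dimensions), could look like this:

# Minimal sketch of multi-head attention (illustrative only)
n, d, m = 24, 8, 4                 # sequence length, feature dimension, number of heads
d_head = d // m                    # feature count must be divisible by the number of heads
rng = np.random.default_rng(1)
X = rng.normal(size=(n, d))

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

heads = []
for _ in range(m):
    W_Q, W_K, W_V = (rng.normal(size=(d, d_head)) for _ in range(3))
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    heads.append(softmax(Q @ K.T / np.sqrt(d_head)) @ V)   # per-head attention output

W_O = rng.normal(size=(d, d))
multi_head = np.concatenate(heads, axis=-1) @ W_O          # concatenate heads, project back
print(multi_head.shape)                                    # (24, 8)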
Positional encoding is a technique used in Transformer models to provide positional information to the input embeddings, allowing the model to understand the order of elements in a sequence. Positional encoding is based on the sine and cosine functions. Given an input sequence of length n and an input feature dimension d, the positional encoding PE(i, j) for the j-th feature at position i in the input sequence is computed as follows:
PE(i, 2k) = sin(i / 10000^(2k/d)), PE(i, 2k+1) = cos(i / 10000^(2k/d))
The positional encoding vector for each position i is then obtained by concatenating the values of PE(i, j) across all dimensions j of the input vector.
This positional encoding vector is then added element-wise to the corresponding input feature vector to obtain the final input representation:
x'i = xi + PEi
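The sinusoidal encoding and its element-wise addition to the inputs can be sketched in NumPy as follows (illustrative only; dimensions are arbitrary and d is assumed even):

# Minimal sketch of absolute sinusoidal positional encoding (illustrative only)
def positional_encoding(n, d):
    PE = np.zeros((n, d))
    positions = np.arange(n)[:, None]           # i = 0 .. n-1
    div = 10000 ** (np.arange(0, d, 2) / d)     # frequency term per pair of dimensions
    PE[:, 0::2] = np.sin(positions / div)       # even dimensions: sine
    PE[:, 1::2] = np.cos(positions / div)       # odd dimensions: cosine
    return PE

n, d = 24, 8
rng = np.random.default_rng(2)
X = rng.normal(size=(n, d))                     # input feature vectors
X_pe = X + positional_encoding(n, d)            # element-wise addition of positional info
print(X_pe.shape)                               # (24, 8)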
The method described above is also known as absolute positional encoding. In subsequent research, relative positional encoding was proposed [Shaw et al, 2018], on the premise that pairwise positional information between two input feature vectors is more useful than their absolute positions.
Before the data is fed to the feed-forward network, the input sequence is once again added to the multi-head attention output (a residual connection) and the result is normalized, as follows:
Z' = LayerNorm(X + MultiHead(X))
The decoder components follow the same logic. However, some specific parts are different.
First, while the encoded representation of the input sequence is fed to the decoder part of the architecture, the encoder also provides the key and value matrices to the encoder-decoder attention components, enabling the decoder to attend to the relevant parts of the input sequence while producing the output. This layer works just like multi-headed self-attention, except that it creates its Q matrix from the layer below it and takes the K and V matrices from the output of the encoder stack.
Second, the decoder operates in an autoregressive manner to generate the output sequence step by step. Here, the self-attention layer is only allowed to attend to earlier positions in the output sequence. This is done by masking future positions before the softmax step in the self-attention calculation.
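A minimal sketch of such masking: future positions receive a score of negative infinity, so that after the softmax their attention weights become zero (illustrative only):

# Minimal sketch of causal (look-ahead) masking of attention scores (illustrative only)
n = 5
rng = np.random.default_rng(3)
scores = rng.normal(size=(n, n))                  # raw decoder self-attention scores

mask = np.triu(np.full((n, n), -np.inf), k=1)     # -inf above the diagonal = future positions
masked = scores + mask

e = np.exp(masked - masked.max(axis=-1, keepdims=True))
weights = e / e.sum(axis=-1, keepdims=True)       # softmax; future positions get weight 0
print(np.round(weights, 2))                       # the upper triangle is all zeros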
The output of the decoder is a sequence of hidden states, where each hidden state represents the model's prediction for a specific time step in the forecast horizon. These hidden states are passed through a linear layer, which applies a linear transformation to map the hidden states to a higher-dimensional space. The output of the linear layer is then passed through a softmax layer, which normalizes the values across the forecast horizon to obtain a probability distribution over the possible future values. The softmax function ensures that the predicted probabilities sum to 1, allowing the model to output a probability distribution for each time step. The final step involves selecting the most likely value for each time step based on the predicted probability distributions. This can be done by taking the argmax of the softmax output at each time step, resulting in a point forecast for each future time step.
Implementation
A simple Transformer architecture adapted for time series forecasting was used in this case, implemented with the PyTorch library.
!pip install pytorch-lightning
import torch
import torch.nn as nn
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset, random_split
from sklearn.preprocessing import StandardScaler
from torchmetrics import MeanAbsoluteError, MeanSquaredError
The model is trained using data from the L12 substation.
data = all_data[3]
We are adding the previous 24 values of the transmitted energy as features. Since heat demand shows very strong daily seasonality and similar daily consumption patterns, we want to emphasize this to the model by introducing the previous 24 hours of transmitted energy data into the input data vector. Given that the length of the input vector is 24, this effectively means that we are using data from the previous 48 hours to forecast the energy demand in the next 24 hours.
num_lags = 24
# Create lagged features for deltae
for lag in range(1, num_lags + 1):
    data[f'deltae_lag_{lag}'] = data['deltae'].shift(lag)

# Remove rows with NaN values that result from the shifting
data = data.dropna()
We are adding the weather forecast, specifically the temperature forecast for the next 24 hours. Since we are using actual meteorological station readings, the next 24 hours of readings are introduced into the input data vector. Obviously, this is not a real forecast. Under the assumption of accurate forecasts, it is a good substitute.
# Create lead features for the temperature over the next 24 hours
for lag in range(1, 25):
    data[f'temp_next_{lag}'] = data['temp'].shift(-lag)
# Remove rows with NaN values that result from the shifting
data = data.dropna()
The dataset is preprocessed, normalized, and converted into PyTorch tensors. PyTorch tensors are the fundamental data structures used in PyTorch, a popular open-source machine learning library. PyTorch tensors can store multi-dimensional data, with different data types supported. PyTorch includes a wide range of operations for tensor manipulation, including mathematical operations, slicing, reshaping, and more. These operations are optimized for performance and are integral to building efficient machine learning models. PyTorch tensors are interoperable with other data structures, such as NumPy arrays. They can also be moved between different devices, such as CPUs and GPUs, to speed up computations.
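As a small standalone illustration (not part of the forecasting pipeline), a NumPy array can be converted to a tensor, reshaped, and moved to the GPU when one is available:

# Standalone illustration of PyTorch tensors (not part of the forecasting pipeline)
arr = np.array([[1.0, 2.0], [3.0, 4.0]])
t = torch.tensor(arr, dtype=torch.float32)   # tensor created from NumPy data
t = t.reshape(4, 1)                          # reshaping, one of many tensor operations
if torch.cuda.is_available():
    t = t.to('cuda')                         # move to the GPU to speed up computation
print(t.shape, t.dtype, t.device)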
Data loaders are also created. In PyTorch, data loaders are an essential component for efficiently handling and iterating over datasets. The primary purpose of a data loader is to load data in batches, which is crucial for training models on large datasets that do not fit into memory at once.
train_size = int(0.8*len(data))
train = data[:train_size]
test = data[train_size:]
# Selecting features and target
features = train.drop(columns=['deltae'])
target = train['deltae']

# Normalize features
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)

# Convert to PyTorch tensors
features_tensor = torch.tensor(features_scaled, dtype=torch.float32)
target_tensor = torch.tensor(target.values, dtype=torch.float32).view(-1, 1)

# Input sequence length and future prediction length
N = 24  # Number of time steps in each input sequence
M = 24  # Number of time steps to predict

# Prepare sequences
sequences = [features_tensor[i:i+N] for i in range(len(features_tensor) - N - M + 1)]
targets = [target_tensor[i+N:i+N+M] for i in range(len(target_tensor) - N - M + 1)]

# Convert to tensors and create datasets
sequences_tensor = torch.stack(sequences)
targets_tensor = torch.stack(targets).view(len(targets), M)  # Ensure targets are correctly shaped

# DataLoader setup
dataset = TensorDataset(sequences_tensor, targets_tensor)
train_size = int(0.8 * len(dataset))
train_dataset, val_dataset = random_split(dataset, [train_size, len(dataset) - train_size])
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=False)
val_loader = DataLoader(val_dataset, batch_size=32)
# Test set handling (same approach)
test_features = test.drop(columns=['deltae'])
test_target = test['deltae']
test_features_scaled = scaler.transform(test_features)  # Use the same scaler as for the training data
test_features_tensor = torch.tensor(test_features_scaled, dtype=torch.float32)
test_target_tensor = torch.tensor(test_target.values, dtype=torch.float32).view(-1, 1)

# Prepare sequences for testing
test_sequences = [test_features_tensor[i:i+N] for i in range(len(test_features_tensor) - N - M + 1)]
test_targets = [test_target_tensor[i+N:i+N+M] for i in range(len(test_target_tensor) - N - M + 1)]

# Convert to tensors and create the test dataset
test_sequences_tensor = torch.stack(test_sequences)
test_targets_tensor = torch.stack(test_targets).view(len(test_targets), M)  # Ensure targets are correctly shaped

test_dataset = TensorDataset(test_sequences_tensor, test_targets_tensor)
test_loader = DataLoader(test_dataset, batch_size=32)
The TimeSeriesTransformer class is a custom neural network model built with PyTorch Lightning and designed for time series forecasting tasks. It uses the Transformer architecture, which is known for handling sequences and capturing dependencies across time.
Parameters:
- input_dim: The number of features in the input data. In the context of Transformers, this also corresponds to d_model, the size of the expected input embeddings.
- num_outputs: The size of the output layer, which determines how many values the model predicts at each time step.
- dim_feedforward: The dimensionality of the feed-forward network inside the Transformer layers.
- nhead: The number of heads in the multi-head attention modules.
- num_layers: The number of sub-encoder and sub-decoder layers in the Transformer.
- dropout: The dropout rate, a regularization parameter that prevents overfitting by randomly setting a fraction of the input units to 0 at each update during training.
Components:
- self.transformer: The Transformer model from PyTorch's neural network library. It is configured with the encoder and decoder having the same number of layers, number of heads, etc.
- self.linear_out: A linear layer that maps the Transformer's output to the required output dimension. This layer acts as the final prediction layer.
- self.val_mae and self.val_rmse: Metrics from torchmetrics that compute the mean absolute error and the root mean squared error, respectively. These are used to evaluate the model during validation.
class PlotLossCallback(pl.Callback):
    def __init__(self):
        super().__init__()
        self.losses = []

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        # Append the loss from the current batch to the list of losses
        self.losses.append(outputs['loss'].item())

    def on_train_end(self, trainer, pl_module):
        # Plot the training loss curve
        plt.plot(np.arange(len(self.losses)), self.losses, label='Training Loss')
        plt.xlabel('Batch')
        plt.ylabel('Loss')
        plt.title('Training Loss Curve')
        plt.legend()
        plt.show()

class TimeSeriesTransformer(pl.LightningModule):
    def __init__(self, input_dim, num_outputs=24, dim_feedforward=512, nhead=4, num_layers=3, dropout=0.2):
        super().__init__()
        self.transformer = nn.Transformer(
            d_model=input_dim,
            nhead=nhead,
            num_encoder_layers=num_layers,
            num_decoder_layers=num_layers,
            dim_feedforward=dim_feedforward,
            dropout=dropout
        )
        # Assuming each time step outputs one feature
        self.linear_out = nn.Linear(input_dim, 1)
        self.val_mae = MeanAbsoluteError()
        self.val_rmse = MeanSquaredError(squared=False)

    def forward(self, src):
        src = src.permute(1, 0, 2)  # [sequence_length, batch_size, features]
        output = self.transformer(src, src)
        output = self.linear_out(output)
        output = output.permute(1, 0, 2)  # [batch_size, sequence_length, num_outputs]
        return output

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        # Ensure y is [batch_size, sequence_length, num_outputs]
        y = y.view(y_hat.shape)  # Reshape y to match y_hat
        loss = nn.MSELoss()(y_hat, y)
        self.log('train_loss', loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        y = y.view(y_hat.shape)
        loss = nn.MSELoss()(y_hat, y)
        self.log('val_loss', loss, on_epoch=True)
        self.val_mae(y_hat.flatten(), y.flatten())
        self.val_rmse(y_hat.flatten(), y.flatten())
        return {"val_loss": loss}

    def on_validation_epoch_end(self):
        self.log('val_mae', self.val_mae.compute(), prog_bar=True)
        self.log('val_rmse', self.val_rmse.compute(), prog_bar=True)
        self.val_mae.reset()
        self.val_rmse.reset()

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.001)
The number of heads (nhead) must be a divisor of the input feature count. The model is initialized with the parameters given below; this selection of parameters was made after numerous experiments.
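An optional sanity check of this divisibility constraint, using the variable names from the code above, might look like this:

# Optional check: the number of input features must be divisible by the number of heads
n_features = features_tensor.shape[1]
nhead = 7  # the value used below when weather forecast features are included
print(n_features, 'features,', nhead, 'heads, divisible:', n_features % nhead == 0)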
# Initialize the model
model = TimeSeriesTransformer(input_dim=features_tensor.shape[1],
                              num_outputs=24,
                              nhead=7,  # with weather forecast
                              #nhead=4,  # without weather forecast
                              dim_feedforward=512,  # Size of the feedforward network
                              num_layers=3,  # Number of layers in the encoder and decoder
                              dropout=0.1)  # Dropout rate
# Create a PyTorch Lightning trainer and fit the model
trainer = pl.Trainer(max_epochs=5, accelerator='gpu', callbacks=[PlotLossCallback()], devices=1 if torch.cuda.is_available() else None)
#trainer = pl.Trainer(max_epochs=10, accelerator='cpu', callbacks=[PlotLossCallback()])
trainer.fit(model, train_loader, val_loader)
  | Name        | Type              | Params
---------------------------------------------
0 | transformer | Transformer       | 464 K
1 | linear_out  | Linear            | 57
2 | val_mae     | MeanAbsoluteError | 0
3 | val_rmse    | MeanSquaredError  | 0
---------------------------------------------
464 K     Trainable params
0         Non-trainable params
464 K     Total params
1.857     Total estimated model params size (MB)
Finally, the model is evaluated.
def evaluate_model(model, test_loader):
model.eval()
mae_metric = MeanAbsoluteError()
rmse_metric = MeanSquaredError(squared=False)
with torch.no_grad():
for x, y in test_loader:
            # Ensure x has the expected shape; assuming it is already correct, no need to unsqueeze
            y_hat = model(x)  # Forward pass
            # Reshape y_hat and y if needed, ensure these tensors are compatible
            # Flatten the sequence and batch dimensions to treat all predictions equally
            y_hat_flat = y_hat.reshape(-1)  # Flatten all batches and sequences
            y_flat = y.reshape(-1)  # Flatten all batches and sequences

            # Update metrics
            mae_metric.update(y_hat_flat, y_flat)
            rmse_metric.update(y_hat_flat, y_flat)

    # Compute final metric values
    mae = mae_metric.compute()
    rmse = rmse_metric.compute()
    return mae.item(), rmse.item()

# Assuming 'model' is your trained model instance and 'test_loader' is set up
mae, rmse = evaluate_model(model, test_loader)
print(f"Mean Absolute Error on Test Set: {mae:.4f}")
print(f"Root Mean Squared Error on Test Set: {rmse:.4f}")
Mean Absolute Error on Test Set: 21.0991
Root Mean Squared Error on Test Set: 57.6080
The resulting MAE is in fact an average of the forecasting accuracies at the different positions in the 24-hour output vector. The bar chart below shows the actual accuracy at each of those positions (the per-hour MAE values, maes, are computed in the visualization code further below). As expected, the accuracy degrades for the later time points, but not by as much as one might expect.
hours = range(24)
# Plot the bar chart
plt.figure(figsize=(15, 8))  # Adjust the width and height as needed
bars = plt.bar(hours, maes, color='skyblue')
plt.xlabel('Hour')
plt.ylabel('MAE')
plt.title('Mean Absolute Error (MAE) for Each Hour')
plt.xticks(hours)  # Ensure all hours are displayed on the x-axis
plt.grid(axis='y')  # Add grid lines along the y-axis

# Annotate each bar with its corresponding MAE value
for bar, mae in zip(bars, maes):
    plt.text(bar.get_x() + bar.get_width() / 2,  # x-coordinate of text
             bar.get_height()+0.1,  # y-coordinate of text
             f'{mae:.2f}',  # Text to display (formatted MAE value)
             ha='center', va='bottom',  # Text alignment
             color='black',  # Text color
             fontsize=8)

plt.show()
The standard deviation of the per-hour accuracies is displayed below.
std = np.std(maes)
print("Standard Deviation:", std)
Standard Deviation: 0.22962380641058933
We show visualizations of the alignment between the actual values and the forecasts at the different positions in the output sequence, on a data sample chosen so as to make the differences more visible.
timepoint_index=0
hours_forecasts=[]
hours_actuals=[]
numhours=24
# Iterate over the test loader to get forecasts and actual values for each timepoint
with torch.no_grad():
for tp in range(0,numhours):
forecasts = []
actuals = []
for inputs, targets in test_loader:
# Make predictions using the model
            predictions = model(inputs)

            # Extract forecasts and actual values for the chosen timepoint from each sample in the batch
            for i in range(len(inputs)):
                # Extract the forecast and actual value at the specified timepoint from each sample
                forecast = predictions[i, tp].item()
                actual = targets[i, tp].item()

                # Append the forecasts and actual values to the respective lists
                forecasts.append(forecast)
                actuals.append(actual)
hours_forecasts.append(forecasts)
hours_actuals.append(actuals)
start = 6000
end = 6300
num_hours = len(hours_forecasts)
maes=[]
# Create a figure with multiple subplots
num_rows = (num_hours + 1) // 2 # Number of rows for subplots
fig, axes = plt.subplots(num_rows, 2, figsize=(20, num_rows*4))
# Iterate over the forecasts and actuals
for i in range(num_hours):
actuals_array = np.array(hours_actuals[i])
forecasts_array = np.array(hours_forecasts[i])
    mae = np.mean(np.abs(actuals_array - forecasts_array))
maes.append(mae)
row_index = i // 2 # Calculate the row index for the subplot
    col_index = i % 2  # Calculate the column index for the subplot
    # Plot the forecasts and actual values in the subplot
ax = axes[row_index, col_index]
hours = np.arange(len(hours_forecasts[i]))
ax.plot(hours[start:end], hours_forecasts[i][start:end], label='Forecasts')
ax.plot(hours[start:end], hours_actuals[i][start:end], label='Actuals')
ax.set_xlabel('Hour')
ax.set_ylabel('deltae')
ax.set_title(f'Forecasts vs Actuals at hour {i}, MAE: {mae}')
ax.legend()
    ax.grid(True)

# Adjust layout
plt.tight_layout()
# Show the plot
plt.show()
Check out the Jupyter notebook with the above code in the XAI4HEAT repo.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. arXiv:1706.03762 [cs]. http://arxiv.org/abs/1706.03762
Shaw, P., Uszkoreit, J., & Vaswani, A. (2018). Self-Attention with Relative Position Representations. https://doi.org/10.48550/ARXIV.1803.02155
This research was supported by the Science Fund of the Republic of Serbia, Grant No. 23-SSF-PRISMA-206, Explainable AI-assisted operations in district heating systems (XAI4HEAT).