District Heating Systems (DHS) are a vital part of city infrastructure, providing heat to homes and businesses. One of the key challenges in managing these systems is accurately predicting hourly heat demand. Such forecasts help operators fine-tune system performance, cut costs, reduce environmental impact, and keep customers comfortable.
In our research, we explored how state-of-the-art Deep Learning (DL) methods can be applied to make these predictions more precise. This level of detail helps operators make informed decisions about day-to-day operations. For example, they can adjust production levels based on anticipated demand, manage secondary flow at substations, and schedule maintenance during low-demand periods, such as overnight. Specifically, we focused on forecasting heat demand over the next 24 hours using the Transformer architecture, well known for its use in today's most capable Large Language Models.
We started from the hypothesis that modern DL models, particularly those designed for sequence forecasting and combined with custom feature engineering (including weather forecast data), can deliver highly accurate results. Our goal was to achieve accuracy comparable to single-step forecasting while maintaining stable performance across the entire forecast sequence.
Follow XAI4HEAT on LinkedIn for more stories like this.
To test this, we used data from a local DHS substation, comprising 38,710 time points over five heating seasons from 2019 to 2024. We employed custom feature engineering methods and a state-of-the-art Transformer algorithm tailored for time series forecasting.
Our results were promising. Without incorporating weather data, our model achieved a Mean Absolute Error (MAE) of 28.15 kWh, which is better than the benchmark single-step model using a stacked Long Short-Term Memory (LSTM) network (MAE of 28.73 kWh). Once we included weather forecast features, the model's performance improved considerably, reaching an MAE of 21.09 kWh. Moreover, the model maintained consistent performance across the forecasted hours, with a standard deviation of the MAE of just 0.23 kWh. We used a vanilla Transformer model adapted for time series problems, trained over 5 epochs.
In summary, our research shows that modern DL models, when combined with thoughtful feature engineering, can significantly improve the accuracy of heat demand forecasts in DHS. This improvement not only supports better operational decisions but also contributes to more efficient and sustainable urban heating.
A District Heating System is a centralized way of providing heat to multiple buildings through a network of insulated pipes that deliver hot water or steam from a central source. It is used to efficiently heat residential, commercial, and industrial buildings within a particular district or area.
The components of a DHS are the DHS plant, or central heat source, and the distribution network of insulated pipes transporting the hot water or steam from the central source to the buildings. These pipes form two closed loops: the primary flow (heating fluid supply and return between the central plant and the substations) and the secondary flow (heating fluid supply and return between the substations and the consumers). Heat is transferred from the primary to the secondary flow by heat exchangers located at the substations.
Below is an illustration of the small DHS we are working with, comprising five substations (L4, L8, L12, L17 and L22). The figure also shows where the most important data in our datasets is measured.
Let's start by importing the required libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
import seaborn as sns
import time
from tqdm import tqdm
All data from the five substations of the local DHS, along with the meteorological data, is stored in a GitHub folder.
github_folder = 'https://github.com/xai4heat/xai4heat/raw/main/datasets/'
data = ['xai4heat_scada_L4.csv',
'xai4heat_scada_L8.csv',
'xai4heat_scada_L12.csv',
'xai4heat_scada_L17.csv',
'xai4heat_scada_L22.csv']
weather_file='weather/ni_20_rs.csv'
The weather dataset is opened and some clearly irrelevant columns are removed.
dfw = pd.read_csv(github_folder+weather_file)
dfw['datetime'] = pd.to_datetime(dfw['datetime'])
dfw.set_index('datetime',inplace=True)
dfw=dfw.drop(['name',
'precipprob',
'preciptype',
'icon',
'stations'], axis=1)
All substation data is opened, some preliminary pre-processing is done, the data is merged with the weather data, and the resulting datasets are stored in a list for later use. Preliminary pre-processing includes reducing the data to an hourly time series and stripping off-season data. November 1st and April 1st are adopted as the heating season start and end dates. The season often starts earlier and ends later, but those periods are characterized by irregularities, such as system testing and very unbalanced consumption, so they are discarded.
In addition, the number of time points with zero transmitted energy is displayed below. Such readings correspond to data transmission failures and need appropriate treatment.
def strip_out_season_data(df):
    date_range_season1 = (df.index >= pd.to_datetime('2018-11-01 06:00:00')) & (df.index < pd.to_datetime('2019-04-01 23:00:00'))
    date_range_season2 = (df.index >= pd.to_datetime('2019-11-01 06:00:00')) & (df.index < pd.to_datetime('2020-04-01 23:00:00'))
    date_range_season3 = (df.index >= pd.to_datetime('2020-11-01 06:00:00')) & (df.index < pd.to_datetime('2021-04-01 23:00:00'))
    date_range_season4 = (df.index >= pd.to_datetime('2021-11-01 06:00:00')) & (df.index < pd.to_datetime('2022-04-01 23:00:00'))
    date_range_season5 = (df.index >= pd.to_datetime('2022-11-01 06:00:00')) & (df.index < pd.to_datetime('2023-04-01 23:00:00'))
    date_range_season6 = (df.index >= pd.to_datetime('2023-11-01 06:00:00')) & (df.index < pd.to_datetime('2024-04-01 23:00:00'))
    df = df[date_range_season1 | date_range_season2 | date_range_season3 | date_range_season4 | date_range_season5 | date_range_season6]
    return df

all_data = []
for i in data:
    df = pd.read_csv(github_folder+i)
    df['datetime'] = pd.to_datetime(df['datetime'])
    df.set_index('datetime', inplace=True)
    # For each substation, show the data acquisition period
    print(i)
    print('Timeline (from/to): ', df.index.min(), df.index.max())
    # Remove data outside of the heating season
    df = strip_out_season_data(df)
    # Strip all data except data acquired at the full hour
    df = df[df.index.minute == 0]
    # Insert missing timepoints, populate with NaNs
    complete_time_index = pd.date_range(start=df.index.min(), end=df.index.max(), freq='H')
    df = df.reindex(complete_time_index)
    # Show the number of zero energy readings - inaccurate readings at the calorimeter
    zero_count = (df['e'] == 0).sum()
    print('Data transmission failures: ', str(zero_count)+'/'+str(len(df)))
    # Merging with weather data
    df = pd.merge(df, dfw, left_index=True, right_index=True, how='inner')
    all_data.append(df)
xai4heat_scada_L4.csv
Timeline (from/to):  2019-08-05 13:00:00 2024-04-04 11:52:00
Data transmission failures:  2200/38712
xai4heat_scada_L8.csv
Timeline (from/to):  2021-05-25 13:00:00 2024-04-04 11:52:00
Data transmission failures:  13/21168
xai4heat_scada_L12.csv
Timeline (from/to):  2021-05-25 13:00:00 2024-04-04 11:52:00
Data transmission failures:  16/21168
xai4heat_scada_L17.csv
Timeline (from/to):  2019-08-05 13:00:00 2024-04-04 11:52:00
Data transmission failures:  89/38712
xai4heat_scada_L22.csv
Timeline (from/to):  2021-05-25 13:00:00 2024-04-04 11:52:00
Data transmission failures:  64/21168
Let's address the missing transmitted energy data. First, zero values in the e column are replaced with NaNs, since a calorimeter reading cannot be zero. Then, for all other columns, only the values at the positions of the zeros in the e column are replaced with NaNs. Finally, all NaNs are imputed using simple linear interpolation, which is considered a sufficiently good approximation for hourly time series data.
This approach follows the assumption that a zero calorimeter reading corresponds to a transmission problem, which also affects the readings of the fluid and ambient temperature sensors. A zero reading from a temperature sensor on its own is not a strong indication of a transmission failure, especially for the ambient temperature sensor, which can legitimately read a temperature of zero.
columns_to_update = ['t_amb', 't_ref', 't_sup_prim', 't_ret_prim', 't_sup_sec', 't_ret_sec']
for i, dfa in enumerate(all_data):
    dfa['e'] = dfa['e'].replace(0, np.nan)
    for column in columns_to_update:
        dfa.loc[dfa['e'].isna(), column] = np.nan
    dfa.interpolate(method='linear', inplace=True)
    all_data[i] = dfa
For visual inspection, signals such as the transmitted energy and the supply and return fluid temperatures on the secondary line are plotted. In addition, a simple z-score method is used to identify outliers in the secondary-flow fluid temperatures. All data with a z-score > 5 is considered an outlier and irregular data, to be treated appropriately.
num_plots = len(all_data)
num_rows = math.ceil(num_plots / 2)

def find_outliers(series):
    """Calculate z-scores and identify outliers in a series."""
    mean = series.mean()
    std = series.std()
    z_scores = (series - mean) / std
    return series[np.abs(z_scores) > 5]

# Create subplots
fig, axs = plt.subplots(num_rows, 2, figsize=(25, 6*num_rows))
axs = axs.flatten()

# Loop through every DataFrame in the list
for i, df in enumerate(all_data):
    axs[i].plot(df['t_sup_sec'], label='t_sup_sec', linewidth=1, color='blue')
    axs[i].plot(df['t_ret_sec'], label='t_ret_sec', linewidth=1, color='green')
    axs[i].plot(df['t_amb'], label='t_amb', linewidth=1, color='orange')
    # Calculate and plot outliers for t_sup_sec
    outliers_sup = find_outliers(df['t_sup_sec'])
    axs[i].scatter(outliers_sup.index, outliers_sup, color='blue', s=40, label='Outliers t_sup_sec', edgecolors='k')
    # Calculate and plot outliers for t_ret_sec
    outliers_ret = find_outliers(df['t_ret_sec'])
    axs[i].scatter(outliers_ret.index, outliers_ret, color='green', s=40, label='Outliers t_ret_sec', edgecolors='k')
    # Calculate and plot outliers for t_amb
    outliers_amb = find_outliers(df['t_amb'])
    axs[i].scatter(outliers_amb.index, outliers_amb, color='orange', s=40, label='Outliers t_amb', edgecolors='k')
    # Plot transmitted energy on a secondary y-axis
    axs_e = axs[i].twinx()
    axs_e.plot(df.index, df['e'], color='red', label='e', linewidth=2)
    axs_e.set_ylabel('e')
    axs[i].set_title(f'{data[i]} \nFluid temp at secondary flow')
    lines, labels = axs[i].get_legend_handles_labels()
    lines_e, labels_e = axs_e.get_legend_handles_labels()
    axs[i].legend(lines + lines_e, labels + labels_e)
    axs[i].tick_params(axis='x', rotation=90)
    axs[i].grid(True)

if num_plots % 2 != 0:
    fig.delaxes(axs[num_plots])

plt.tight_layout()
plt.show()
All identified outliers are replaced with linearly interpolated data.
def replace_outliers(series):
    """Identify outliers using z-scores and replace them with NaN."""
    mean = series.mean()
    std = series.std()
    z_scores = (series - mean) / std
    # Replace with NaN where the condition is met
    series[np.abs(z_scores) > 5] = np.nan
    return series

for i, df in enumerate(all_data):
    # Replace outliers with NaNs for each relevant column
    df['t_sup_sec'] = replace_outliers(df['t_sup_sec'].copy())
    df['t_ret_sec'] = replace_outliers(df['t_ret_sec'].copy())
    df['t_amb'] = replace_outliers(df['t_amb'].copy())
    # Interpolate to fill NaNs
    df['t_sup_sec'].interpolate(inplace=True)
    df['t_ret_sec'].interpolate(inplace=True)
    df['t_amb'].interpolate(inplace=True)
    all_data[i] = df
Basic date features and the transmitted energy in the previous hour are introduced. In addition, the dimensionality of the meteorological data is reduced.
dropcolumns = ['solarenergy',
               'uvindex',
               'severerisk',
               'visibility',
               'cloudcover',
               'snow',
               'dew',
               'conditions',
               'e',
               'pe']
for i, df in enumerate(all_data):
    df['hour_of_day'] = df.index.hour
    df['month'] = df.index.month
    df['day_of_week'] = df.index.dayofweek
    df['is_working_day'] = df['day_of_week'].apply(lambda x: 1 if x < 5 else 0)
    deltae = (df['e'] - df['e'].shift(1)) * 1000
    df['heating_on'] = deltae.apply(lambda x: 1 if x != 0 else 0)
    df['deltae'] = deltae
    df = df.drop(columns=dropcolumns, axis=1)
    df = df.dropna()
    all_data[i] = df
Based on correlation analysis, some features are removed. Some are removed due to very low correlation with the transmitted energy (day_of_week, is_working_day, precip, windgust, windspeed, winddir). The temperature measurement signal at the substation is replaced with the ambient temperature signal from the official meteorological station, due to its higher precision and resilience. Other features are removed due to very high correlation with deltae, namely t_sup_prim, t_ret_prim, t_sup_sec and t_ret_sec, which would introduce the risk of multicollinearity (destabilized model coefficients that become sensitive to small changes in the data, etc.) and overfitting.
rmv= ['day_of_week', 'is_working_day', 'precip', 'windgust', 'windspeed', 'winddir', 't_amb', 't_sup_prim', 't_ret_prim', 't_sup_sec', 't_ret_sec', 't_ref', 'month']
for i, df in enumerate(all_data):
    dfx = df.drop(rmv, axis=1)
    all_data[i] = dfx
Recently, Attention and Transformer models have been gaining popularity as methods for time series forecasting. Transformers were originally developed for natural language processing, but they can be adapted to time series forecasting problems. They handle long-range dependencies well, making them suitable for complex forecasting tasks.
The Transformer architecture was originally proposed in [Vaswani et al, 2017]. The architecture consists of a number of stacked encoders, followed by the same number of stacked decoder units. All encoders share the same structure, as do all decoders. The main components of the encoder are the self-attention and Feed-Forward Neural Network (FFNN) layers, while the decoder consists of self-attention, encoder-decoder attention and FFNN layers. See below the illustration of the Transformer model architecture.
During inference, a trained Transformer model essentially "translates" the input sequence into the output sequence, i.e. a forecast. In a time series problem, the input sequence is X = {x1, x2, ..., xn}, where each xi is a vector containing all input feature data at the i-th data point.
All input vectors enter the self-attention layer at once. Self-attention allows the model to attend to different parts of the input sequence when encoding or decoding, capturing dependencies and relationships between them. It is a key component of Transformer architectures and has been shown to perform well in various natural language processing tasks.
The self-attention process produces attention scores that quantify these dependencies. To calculate the attention scores, query, key and value matrices Q, K and V are first computed from the input sequence using the learnable weight matrices WQ, WK and WV. The attention scores are then obtained from the scaled dot-product of the queries and keys, where dk is the dimension of the key vectors. Finally, the output sequence is calculated as the weighted sum (where the weights are the attention scores) of the value vectors V.
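In LaTeX notation, this standard scaled dot-product attention from [Vaswani et al, 2017] reads:

Q = XW^Q, \quad K = XW^K, \quad V = XW^V

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right) V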
Multi-head attention facilitates attending separately to different parts of the input feature vectors, by looking (vertically) at groups of features. This approach is known to improve representation learning (better capturing diverse patterns and dependencies), interpretability (the attention weights computed by each head show which groups of features matter for the forecast) and other aspects.
In this scheme, attention scores and the attention output itself are calculated for each head, and the number of features in the input sequence must be divisible by the number of heads. The overall attention is computed as the concatenation of all attention heads (where m is the number of heads), multiplied by a learnable weight matrix.
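In the notation of [Vaswani et al, 2017], with W^O the learnable output projection:

\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V), \quad i = 1, \dots, m

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_m)\, W^O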
Positional encoding is a technique used in Transformer models to add positional information to the input embeddings, allowing the model to understand the order of elements in a sequence. Positional encoding is based on the sine and cosine functions. Given an input sequence of length n and an input feature dimension d, the positional encoding PE(i, j) for the j-th feature at position i in the input sequence is computed as written below.
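The standard sinusoidal form, for even and odd feature indices j = 2k and j = 2k + 1, is:

PE(i, 2k) = \sin\left(\frac{i}{10000^{2k/d}}\right), \qquad PE(i, 2k+1) = \cos\left(\frac{i}{10000^{2k/d}}\right)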
The positional encoding vector for each position i is then obtained by concatenating the values of PE(i, j) across all dimensions j of the input vector.
This positional encoding vector is then added element-wise to the corresponding input feature vector to obtain the final input representation:
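That is, for each position i:

\tilde{x}_i = x_i + PE_i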
The approach described above is known as absolute positional encoding. In later research, relative positional encoding was proposed [Shaw et al, 2018], based on the idea that the pairwise positional relationship between two input feature vectors is more useful than their absolute positions.
Before the data is fed to the feed-forward network, the multi-head attention output is added back to the input sequence (a residual connection) and the sum is normalized, as follows:
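Written out, this Add & Norm step of the standard Transformer is:

Z = \mathrm{LayerNorm}\bigl(X + \mathrm{MultiHead}(X, X, X)\bigr)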
The decoder components follow the same logic; however, some specific components differ.
First, while the encoded representation of the input sequence is fed to the decoder part of the architecture, the encoder also provides the key and value matrices to the encoder-decoder attention components, enabling the decoder to focus on relevant parts of the input sequence while generating the output. This layer works just like multi-headed self-attention, except that it creates its Q matrix from the layer below it and takes the K and V matrices from the output of the encoder stack.
Second, the decoder operates in an autoregressive manner to generate the output sequence step by step. Here, the self-attention layer is only allowed to attend to earlier positions in the output sequence. This is achieved by masking future positions before the softmax step in the self-attention calculation.
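As a minimal sketch (not part of the forecasting code later in this post), such a causal mask can be built in PyTorch as an upper-triangular matrix of -inf values that is added to the attention scores before the softmax:

import torch

def causal_mask(size):
    # Positions above the diagonal get -inf, so position i can only attend to positions <= i
    return torch.triu(torch.full((size, size), float('-inf')), diagonal=1)

mask = causal_mask(24)  # e.g. for a 24-step output sequence
# nn.Transformer also exposes a generate_square_subsequent_mask() helper that builds
# the same mask, and accepts it through its tgt_mask argument.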
The output of the decoder is a sequence of hidden states, where each hidden state represents the model's prediction for a particular time step in the forecast horizon. These hidden states are passed through a linear layer, which applies a linear transformation to map them to the output space. In the original language setting, a softmax layer then normalizes these values into a probability distribution over possible outputs, and the final step selects the most likely value at each time step, for example by taking the argmax of the softmax output. In a time series regression setting such as ours, the linear layer instead directly produces a point forecast for each future time step.
Implementation
A simple Transformer architecture adapted for time series forecasting was used in this case, implemented with the PyTorch library.
!pip install pytorch-lightning
import torch
import torch.nn as nn
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset, random_split
from sklearn.preprocessing import StandardScaler
from torchmetrics import MeanAbsoluteError, MeanSquaredError
The model is trained using data from the L12 substation.
data=all_data[3]
We are adding the previous 24 values of transmitted energy as features. Since heat demand exhibits very strong daily seasonality and similar daily consumption patterns, we want to emphasize that to the model by introducing the previous 24 hours of transmitted energy into the input data vector. Given that the input sequence length is 24, this effectively means that we are using data from the previous 48 hours to forecast the energy demand for the next 24 hours.
num_lags = 24
# Create lagged features for deltae
for lag in range(1, num_lags + 1):
    data[f'deltae_lag_{lag}'] = data['deltae'].shift(lag)

# Remove rows with NaN values that result from shifting
data = data.dropna()
We are adding the weather forecast, namely temperature forecasts for the next 24 hours. Since we are using actual meteorological station readings, the next 24 hours of readings are added to the input data vector. Obviously, this is not a real forecast, but under the assumption of accurate forecasts it is a good substitute.
# Create future temperature features (next 24 hourly readings)
for lag in range(1, 25):
    data[f'temp_next_{lag}'] = data['temp'].shift(-lag)

# Remove rows with NaN values that result from shifting
data = data.dropna()
The dataset is preprocessed, normalized, and converted into PyTorch tensors. PyTorch tensors are the basic data structures used in PyTorch, a popular open-source machine learning library. They can store multi-dimensional data with different data types. PyTorch includes a wide range of operations for tensor manipulation, including mathematical operations, slicing, reshaping, and more. These operations are optimized for performance and are integral to building efficient machine learning models. PyTorch tensors are interoperable with other data structures, such as NumPy arrays, and can be moved between different devices, such as CPUs and GPUs, to speed up computations.
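As a minimal sketch of these points (not part of the forecasting pipeline, with arr just a toy array), the NumPy-to-tensor round trip can look like this:

import numpy as np
import torch

arr = np.random.rand(4, 3).astype('float32')  # toy example data
t = torch.from_numpy(arr)                     # tensor sharing memory with the NumPy array
t = t.unsqueeze(0)                            # reshape: add a leading batch dimension -> [1, 4, 3]
if torch.cuda.is_available():
    t = t.to('cuda')                          # move to the GPU to speed up computation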
Data loaders are also created. In PyTorch, data loaders are an essential component for efficiently handling and iterating over datasets. The main purpose of a data loader is to load data in batches, which is essential for training models on large datasets that do not fit into memory at once.
train_size = int(0.8 * len(data))
train = data[:train_size]
test = data[train_size:]

# Selecting features and target
features = train.drop(columns=['deltae'])
target = train['deltae']

# Normalize features
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)

# Convert to PyTorch tensors
features_tensor = torch.tensor(features_scaled, dtype=torch.float32)
target_tensor = torch.tensor(target.values, dtype=torch.float32).view(-1, 1)

# Input sequence length and future prediction length
N = 24  # Number of time steps in each input sequence
M = 24  # Number of time steps to predict

# Prepare sequences
sequences = [features_tensor[i:i+N] for i in range(len(features_tensor) - N - M + 1)]
targets = [target_tensor[i+N:i+N+M] for i in range(len(target_tensor) - N - M + 1)]

# Convert to tensors and create datasets
sequences_tensor = torch.stack(sequences)
targets_tensor = torch.stack(targets).view(len(targets), M)  # Ensure targets are correctly shaped

# DataLoader setup
dataset = TensorDataset(sequences_tensor, targets_tensor)
train_size = int(0.8 * len(dataset))
train_dataset, val_dataset = random_split(dataset, [train_size, len(dataset) - train_size])
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=False)
val_loader = DataLoader(val_dataset, batch_size=32)

# Test set handling (same approach)
test_features = test.drop(columns=['deltae'])
test_target = test['deltae']
test_features_scaled = scaler.transform(test_features)  # Use the same scaler as for the training data
test_features_tensor = torch.tensor(test_features_scaled, dtype=torch.float32)
test_target_tensor = torch.tensor(test_target.values, dtype=torch.float32).view(-1, 1)

# Prepare sequences for testing
test_sequences = [test_features_tensor[i:i+N] for i in range(len(test_features_tensor) - N - M + 1)]
test_targets = [test_target_tensor[i+N:i+N+M] for i in range(len(test_target_tensor) - N - M + 1)]

# Convert to tensors and create the test dataset
test_sequences_tensor = torch.stack(test_sequences)
test_targets_tensor = torch.stack(test_targets).view(len(test_targets), M)  # Ensure targets are correctly shaped
test_dataset = TensorDataset(test_sequences_tensor, test_targets_tensor)
test_loader = DataLoader(test_dataset, batch_size=32)
The TimeSeriesTransformer class is a custom neural network model built using PyTorch Lightning and designed for time series forecasting tasks. It uses the Transformer architecture, which is known for handling sequences and capturing dependencies across time.
Parameters:
- input_dim: The number of features in the input data. In the context of Transformers, this also corresponds to d_model, the size of the expected input embeddings.
- num_outputs: The number of future values the model predicts, i.e. the length of the forecast horizon (24 hours in our case).
- dim_feedforward: The dimensionality of the feedforward network inside the Transformer layers.
- nhead: The number of heads in the multi-head attention modules.
- num_layers: The number of sub-encoder and sub-decoder layers in the Transformer.
- dropout: The dropout rate, a regularization parameter that prevents overfitting by randomly setting a fraction of the input units to 0 at each update during training.
Components:
- self.transformer: The Transformer model from PyTorch's neural network library. It is configured with encoder and decoder stacks of the same number of layers, number of heads, etc.
- self.linear_out: A linear layer that maps the Transformer's output to the required output dimension. It acts as the final prediction layer.
- self.val_mae and self.val_rmse: Metrics from torchmetrics that compute the mean absolute error and the root mean squared error, respectively. They are used to evaluate the model during validation.
class PlotLossCallback(pl.Callback):
    def __init__(self):
        super().__init__()
        self.losses = []

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        # Append the loss from the current batch to the list of losses
        self.losses.append(outputs['loss'].item())

    def on_train_end(self, trainer, pl_module):
        # Plot the training loss curve
        plt.plot(np.arange(len(self.losses)), self.losses, label='Training Loss')
        plt.xlabel('Batch')
        plt.ylabel('Loss')
        plt.title('Training Loss Curve')
        plt.legend()
        plt.show()

class TimeSeriesTransformer(pl.LightningModule):
    def __init__(self, input_dim, num_outputs=24, dim_feedforward=512, nhead=4, num_layers=3, dropout=0.2):
        super().__init__()
        self.transformer = nn.Transformer(
            d_model=input_dim,
            nhead=nhead,
            num_encoder_layers=num_layers,
            num_decoder_layers=num_layers,
            dim_feedforward=dim_feedforward,
            dropout=dropout
        )
        # Assuming each time step outputs one feature
        self.linear_out = nn.Linear(input_dim, 1)
        self.val_mae = MeanAbsoluteError()
        self.val_rmse = MeanSquaredError(squared=False)

    def forward(self, src):
        src = src.permute(1, 0, 2)  # [sequence_length, batch_size, features]
        output = self.transformer(src, src)
        output = self.linear_out(output)
        output = output.permute(1, 0, 2)  # [batch_size, sequence_length, num_outputs]
        return output

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        # Ensure y is [batch_size, sequence_length, num_outputs]
        y = y.view(y_hat.shape)  # Reshape y to match y_hat
        loss = nn.MSELoss()(y_hat, y)
        self.log('train_loss', loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        y = y.view(y_hat.shape)
        loss = nn.MSELoss()(y_hat, y)
        self.log('val_loss', loss, on_epoch=True)
        self.val_mae(y_hat.flatten(), y.flatten())
        self.val_rmse(y_hat.flatten(), y.flatten())
        return {"val_loss": loss}

    def on_validation_epoch_end(self):
        self.log('val_mae', self.val_mae.compute(), prog_bar=True)
        self.log('val_rmse', self.val_rmse.compute(), prog_bar=True)
        self.val_mae.reset()
        self.val_rmse.reset()

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.001)
The number of heads (nhead) must be a divisor of the input feature count. The model is initialized with the parameters below; this selection was made after numerous experiments.
# Initialize the model
model = TimeSeriesTransformer(input_dim=features_tensor.shape[1],
                              num_outputs=24,
                              nhead=7,  # with weather forecast
                              #nhead=4,  # without weather forecast
                              dim_feedforward=512,  # Size of the feedforward network
                              num_layers=3,  # Number of layers in the encoder and decoder
                              dropout=0.1)  # Dropout rate

# Create a PyTorch Lightning trainer and fit the model
trainer = pl.Trainer(max_epochs=5, accelerator='gpu', callbacks=[PlotLossCallback()], devices=1 if torch.cuda.is_available() else None)
#trainer = pl.Trainer(max_epochs=10, accelerator='cpu', callbacks=[PlotLossCallback()])
trainer.fit(model, train_loader, val_loader)
  | Name        | Type              | Params
-------------------------------------------------
0 | transformer | Transformer       | 464 K
1 | linear_out  | Linear            | 57
2 | val_mae     | MeanAbsoluteError | 0
3 | val_rmse    | MeanSquaredError  | 0
-------------------------------------------------
464 K     Trainable params
0         Non-trainable params
464 K     Total params
1.857     Total estimated model params size (MB)
Finally, the model is evaluated.
def evaluate_model(model, test_loader):
    model.eval()
    mae_metric = MeanAbsoluteError()
    rmse_metric = MeanSquaredError(squared=False)
    with torch.no_grad():
        for x, y in test_loader:
            # Ensure x has the expected shape; assuming it is already correct, no need to unsqueeze
            y_hat = model(x)  # Forward pass
            # Flatten the sequence and batch dimensions to treat all predictions equally
            y_hat_flat = y_hat.reshape(-1)  # Flatten all batches and sequences
            y_flat = y.reshape(-1)          # Flatten all batches and sequences
            # Update metrics
            mae_metric.update(y_hat_flat, y_flat)
            rmse_metric.update(y_hat_flat, y_flat)
    # Compute final metric values
    mae = mae_metric.compute()
    rmse = rmse_metric.compute()
    return mae.item(), rmse.item()

# Assuming 'model' is your trained model instance and 'test_loader' is set up
mae, rmse = evaluate_model(model, test_loader)
print(f"Mean Absolute Error on Test Set: {mae:.4f}")
print(f"Root Mean Squared Error on Test Set: {rmse:.4f}")
Mean Absolute Error on Test Set: 21.0991
Root Mean Squared Error on Test Set: 57.6080
The resulting MAE is actually an average of the forecasting accuracies at the different positions in the 24-hour output vector. The bar chart below shows the actual accuracy at each of these positions. As expected, accuracy degrades for the later time points, although not as much as anticipated.
hours = range(24)
# Plot the bar chart
plt.figure(figsize=(15, 8))  # Adjust the width and height as needed
bars = plt.bar(hours, maes, color='skyblue')
plt.xlabel('Hour')
plt.ylabel('MAE')
plt.title('Mean Absolute Error (MAE) for Each Hour')
plt.xticks(hours)  # Ensure all hours are displayed on the x-axis
plt.grid(axis='y')  # Add grid lines along the y-axis

# Annotate each bar with its corresponding MAE value
for bar, mae in zip(bars, maes):
    plt.text(bar.get_x() + bar.get_width() / 2,  # x-coordinate of text
             bar.get_height() + 0.1,             # y-coordinate of text
             f'{mae:.2f}',                       # Text to show (formatted MAE value)
             ha='center', va='bottom',           # Text alignment
             color='black',                      # Text color
             fontsize=8)

plt.show()
The standard deviation of the per-hour accuracies is displayed.
std = np.std(maes)
print("Standard Deviation:", std)
Standard Deviation: 0.22962380641058933
We visualize the alignment of actual values and forecasts at the different positions in the output sequence, on a sample of the data, to make the differences more visible.
timepoint_index = 0
hours_forecasts = []
hours_actuals = []
numhours = 24

# Iterate over the test loader to get forecasts and actual values for each timepoint
with torch.no_grad():
    for tp in range(0, numhours):
        forecasts = []
        actuals = []
        for inputs, targets in test_loader:
            # Make predictions using the model
            predictions = model(inputs)
            # Extract forecasts and actual values for the chosen timepoint from each sample in the batch
            for i in range(len(inputs)):
                forecast = predictions[i, tp].item()
                actual = targets[i, tp].item()
                # Append the forecasts and actual values to the respective lists
                forecasts.append(forecast)
                actuals.append(actual)
        hours_forecasts.append(forecasts)
        hours_actuals.append(actuals)
start = 6000
end = 6300
num_hours = len(hours_forecasts)
maes = []

# Create a figure with multiple subplots
num_rows = (num_hours + 1) // 2  # Number of rows for subplots
fig, axes = plt.subplots(num_rows, 2, figsize=(20, num_rows*4))

# Iterate over the forecasts and actuals
for i in range(num_hours):
    actuals_array = np.array(hours_actuals[i])
    forecasts_array = np.array(hours_forecasts[i])
    mae = np.mean(np.abs(actuals_array - forecasts_array))
    maes.append(mae)
    row_index = i // 2  # Calculate the row index for the subplot
    col_index = i % 2   # Calculate the column index for the subplot
    # Plot the forecasts and actual values in the subplot
    ax = axes[row_index, col_index]
    hours = np.arange(len(hours_forecasts[i]))
    ax.plot(hours[start:end], hours_forecasts[i][start:end], label='Forecasts')
    ax.plot(hours[start:end], hours_actuals[i][start:end], label='Actuals')
    ax.set_xlabel('Hour')
    ax.set_ylabel('deltae')
    ax.set_title(f'Forecasts vs Actuals at hour {i}, MAE: {mae}')
    ax.legend()
    ax.grid(True)

# Adjust layout
plt.tight_layout()
# Show the plot
plt.show()
Check out the Jupyter notebook with the above code at the XAI4HEAT repo.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. arXiv:1706.03762 [cs]. http://arxiv.org/abs/1706.03762
Shaw, P., Uszkoreit, J., & Vaswani, A. (2018). Self-Consideration with Relative Place Representations. https://doi.org/10.48550/ARXIV.1803.02155
This research was supported by the Science Fund of the Republic of Serbia, Grant №23-SSF-PRISMA-206, Explainable AI-assisted operations in district heating systems (XAI4HEAT).