Ultra-short-term photovoltaic power forecasting based on TCN contrastive encoding and xLSTM.
The inherent intermittency and non-stationarity of photovoltaic (PV) power challenge high-renewable power systems. Conventional methods struggle with diurnal non-stationarity, and standard long short-term memory (LSTM) networks are limited by scalar hidden states. We propose CL-TCN-xLSTM, an ultra-short-term PV forecasting model that combines contrastive scene encoding with extended LSTM (xLSTM). A trend-relative power ratio decomposition decouples the raw power into a deterministic trend and a stationary relative power ratio, reducing prediction complexity. A temporal convolutional network (TCN) encoder, pre-trained via contrastive learning, extracts scene embeddings from historical weather sequences, and these embeddings are used to initialize the xLSTM predictor. The xLSTM then employs matrix memory to capture multi-timescale fluctuations and forecasts the relative power ratio, which is finally multiplied by the trend to obtain the power. Experiments on two years of data from eight PV plants show that CL-TCN-xLSTM achieves the lowest MAE and RMSE across all forecast horizons from 1 to 4 h. For the 4-h forecast, it also yields the best nMAE and nRMSE. Compared with the best baseline, TCR-Reformer, CL-TCN-xLSTM reduces MAE by 25.8% and RMSE by 24.8%. The model exhibits the slowest error accumulation and the strongest cross-site generalization. On a highly volatile day, its forecast curve follows the actual power fluctuations reasonably well without excessive extrapolation. Semantic analysis of the learned scene embeddings reveals that the contrastively pre-trained encoder captures meaningful weather regimes, reaching a linear-probe accuracy of 92.7% even though it was pre-trained without any labels. Furthermore, the model handles dawn and dusk transitions smoothly, showing no abrupt error spikes.
Ablation studies further confirm that the trend-relative power ratio decomposition is the most critical component, that the xLSTM memory mechanism excels at suppressing large errors, that contrastive pre-training provides substantial gains, and that the TCN contrastive encoder captures meteorological patterns in its scene embeddings.
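The decomposition step described above can be illustrated with a minimal sketch. The abstract does not specify how the deterministic trend is computed, so the version below assumes a simple stand-in: the mean power at each time-of-day slot. The function names, the `EPS` floor, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np

EPS = 1e-6  # hypothetical floor: avoids dividing by a near-zero night-time trend


def diurnal_trend(power, steps_per_day):
    """Assumed trend (not the paper's definition): mean power per time-of-day slot.

    `power` must be a 1-D array whose length is a multiple of `steps_per_day`.
    """
    days = power.reshape(-1, steps_per_day)
    profile = days.mean(axis=0)
    return np.tile(profile, days.shape[0])


def decompose(power, trend):
    """Relative power ratio = power / trend; set to 0 where the trend is ~0."""
    ratio = np.zeros_like(power, dtype=float)
    mask = trend > EPS
    ratio[mask] = power[mask] / trend[mask]
    return ratio


def recompose(ratio, trend):
    """Recover power by multiplying the (forecast) ratio by the trend."""
    return ratio * trend


# Synthetic example: a clear-sky-like daily shape scaled by random cloud factors.
rng = np.random.default_rng(0)
steps_per_day, n_days = 24, 3
hours = np.arange(steps_per_day)
clear = np.clip(np.sin(np.pi * (hours - 6) / 12), 0.0, None)  # zero at night
cloud = rng.uniform(0.3, 1.0, size=(n_days, steps_per_day))
power = (clear * cloud).ravel()

trend = diurnal_trend(power, steps_per_day)
ratio = decompose(power, trend)

# Multiplying the ratio by the trend reproduces the original power series.
assert np.allclose(recompose(ratio, trend), power)
```

In this sketch the predictor would be trained on `ratio` rather than on `power`, and a forecast ratio would be passed through `recompose` with the known trend to produce the final power forecast.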