|
|
|
Abstract
This paper examines the use of machine learning methods for forecasting inflation
in Croatia. Out-of-sample forecasts are generated for multiple horizons using ten
models and four alternative sets of input features comprising lags of the target
variable, conventional macroeconomic indicators, unconventional variables
including Google Trends data, and a combined feature set. Forecast accuracy is
assessed across models and relative to a benchmark for the full sample as well as
for the periods before and after March 2020 (COVID-19). The results indicate
that no single model consistently outperforms others across all settings; however,
machine learning methods, particularly tree-based models, deliver superior performance
under specific conditions. The forecasts produced by the two best-performing
models, SARIMA and LightGBM, exceed the accuracy of the European
Commission’s projections. As the first paper to apply machine learning to inflation
forecasting in Croatia, this study introduces modern analytical techniques into the
Croatian forecasting literature.
Keywords: inflation; machine learning; forecasting; macroeconomics; Croatia
JEL: E37
1 Introduction
After decades of low and stable inflation in developed economies, galloping inflation seemed almost unimaginable. In extreme cases, some economists considered it a phenomenon of the past. However, due to recent shocks, inflation has once again emerged as one of the key macroeconomic issues, spurring intense interest from both the academic and professional communities. In this context, successful inflation forecasting becomes crucial for economic policymakers, especially for central banks implementing monetary policy. Timely and reliable inflation forecasts are essential for making adequate monetary policy decisions, such as those related to changing key interest rates, primarily because of the medium-term nature of the monetary transmission mechanism. For this reason, central banks must act in an anticipatory manner and rely primarily on projections of future inflation developments.
Inflation forecasting is a challenging task, as confirmed by the large forecast errors produced by models used by many central banks and international institutions (Medeiros et al., 2021: 100). In recent years, machine learning (ML) models have gained importance due to their numerous advantages over traditional econometric models used in macroeconomic modelling. The big data environment naturally complements ML methods because it allows for the exploitation of their ability to process a large number of features and uncover complex relationships within the data.
The main goal of this paper is to determine which ML model is the most successful in out-of-sample inflation forecasting in Croatia. Particular emphasis is placed on comparing the predictive capabilities of ML models against traditional econometric time series approaches, such as SARIMA. The paper further examines the forecasting performance of the models under different conditions, before and after the COVID crisis. The pandemic, a strong exogenous shock, caused high levels of volatility and uncertainty, producing an environment that the models, trained during more stable macroeconomic periods (with the exception of the 2008 crisis), had not previously encountered. Separating less volatile from more volatile periods, therefore, shows whether the models can maintain accuracy in a changed macroeconomic environment and how informative conventional versus unconventional features are in stable as compared to stressful times. Finally, the best model is selected and its forecasts are compared with the forecasts of the European Commission.
This paper contributes to the literature in at least two ways. First, the paper investigates the application of modern ML methods to inflation forecasting in Croatia and constitutes, to the best of the authors’ knowledge, the first contribution of its kind in the Croatian literature. Second, the comparison of inflation forecasts across different economic regimes, using various models and features, provides a wealth of information regarding the predictive capabilities of the models, as well as the informativeness and importance of the features employed.
The remainder of the paper is organized as follows. The second chapter presents an extensive review of the international and domestic literature on inflation forecasting using various models, with a particular focus on ML models. The third chapter includes a description of all variables used, including an explanation of their categorization, and sets out the methodological framework, focusing on model selection, data preparation and transformation, the forecasting procedure, and measures of forecast accuracy (the prediction error indicators used). The fourth chapter presents the results, namely the forecast errors of all utilized models, and selects the best models, whose predictions are then compared with those of the European Commission. The fifth chapter discusses the results. The conclusion is provided at the end of the paper.
2 Literature review
The modelling of inflation is one of the fundamental issues in macroeconomic analysis and has been the subject of various theoretical and empirical approaches over decades. The typical framework for understanding inflation dynamics long relied on the concept of the Phillips curve. Although Blinder (1997: 241) emphasizes that the Phillips curve served as a reliable tool in inflation forecasting for decades, later works question its predictive power. For instance, Atkeson and Ohanian (2001) demonstrate that the Phillips curve in the US does not even outperform simple naive models in forecasting inflation and conclude that “the search for yet another Phillips curve-based inflation forecasting model should be abandoned”. Similar findings are confirmed by Ang, Bekaert and Wei (2005), who show that ARMA models and, in particular, forecasts based on surveys, achieve lower forecast errors than Phillips curve models.
The poor empirical forecasting performance of traditional theoretical models has prompted researchers to experiment with different sets and types of variables, as well as various model approaches, in order to improve the accuracy of inflation forecasts. In this context, the literature shows a growing interest in including additional information in models to enhance the accuracy of inflation forecasts. For example, Chen, Turnovsky and Zivot (2014) include aggregates of world commodity prices in models, Forni et al. (2003), Stock and Watson (2003), and Monteforte and Moretti (2013) use financial variables, while Groen, Paap and Ravazzolo (2013) and Ang, Bekaert and Wei (2005) use expectation variables. As the set of relevant predictors expanded, researchers often resorted to factor models, which compress information from a multitude of variables into a few latent components – factors (e.g. Eickmeier and Ziegler, 2008). However, their efficiency largely depends on the quality of the variables used in constructing the factors. Some research indicates that expanding the database does not necessarily result in better forecasting outcomes (e.g. Barhoumi, Darné and Ferrara, 2009).
In the last decade, ML models have received special attention in the literature. Unlike factor models, which reduce dimensionality by summarizing information into a few components, ML models allow for the direct inclusion of a large number of features. Although the application of ML to inflation forecasting is still relatively new, numerous empirical studies in the international literature already confirm its advantages over traditional econometric models. Nakamura (2004) shows that neural networks outperform a simple univariate AR model in forecasting US inflation at short horizons. Subsequent studies confirm the superior performance of ML and regularized models to that of traditional linear benchmarks. In particular, Medeiros and Mendes (2016) and Garcia, Medeiros and Vasconcelos (2017) find that LASSO-type models dominate standard AR and factor models in forecasting inflation in the US and Brazil, respectively. Using a broader set of ML techniques, Medeiros et al. (2021) report that Random Forest (RF) delivers the lowest forecast errors for US inflation and performs robustly across different economic conditions, a result also supported by Ülke, Sahin and Subasi (2018) for more volatile time series. However, evidence suggests that this superiority is not uniform across time. Naghi, O’Neill and Zaharieva (2024) show that while RF outperforms benchmark models prior to COVID-19, its performance deteriorates during the pandemic and high-inflation period, when SVM and GBM yield more accurate forecasts. Overall, the literature indicates that ML methods tend to outperform ARMA-type benchmarks (Araujo and Gaglianone, 2023), particularly at longer horizons where tree-based models such as RF and XGBoost perform especially well, although simpler autoregressive models may remain competitive in smaller samples and data-constrained economies (Ivașcu, 2023).
Inflation has received limited attention in the Croatian economic literature, as most studies have concentrated on explaining inflation dynamics and their links with other macroeconomic variables (e.g. Payne, 2002), with little emphasis placed on inflation forecasting. This gap in the literature constitutes the main motivation for this paper. To the authors’ best knowledge, only two studies explicitly examine inflation forecasting in Croatia. Pufnik and Kunovac (2006) use a SARIMA model to forecast short-term inflation and show that, over longer horizons, aggregating forecasts of individual CPI sub-components improves accuracy relative to directly forecasting the aggregate CPI. Kunovac (2007) applies principal component analysis and finds that information extracted from a large set of macroeconomic variables enhances forecasting performance, with even a single factor outperforming benchmark models.
3 Methodology and data
Although numerous definitions of machine learning exist in the literature, this paper relies on the definition provided by Masini, Medeiros and Mendes (2021: 77), according to which: “machine learning is the combination of automated computer algorithms with powerful statistical methods to learn (discover) hidden patterns in rich datasets”. This definition highlights the key characteristic of ML: the model’s ability to autonomously learn complex patterns from a large amount of data, including nonlinearities and interactions among variables.
To forecast inflation in Croatia, a total of ten linear and non-linear models were employed, utilizing the traditional SARIMA model as a primary benchmark. The forecasting success of each model is also compared with a naive model’s forecast using the slightly adjusted MASE forecast error, which is introduced later.
3.1 Data
The target variable is the month-on-month inflation rate, measured as the monthly change in the Harmonised Index of Consumer Prices (HICP). The model specification includes lagged inflation and ten additional variables which, after applying the transformations described in section 3.2, give rise to several hundred input features. This approach aims to maximise forecasting accuracy while remaining feasible given the available computational resources.
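The target-variable construction described above can be sketched as follows. This is an illustration, not the authors' code, and the index values are invented for the example, not actual Croatian HICP data:

```python
# Sketch: deriving the target variable, the month-on-month inflation rate,
# from a series of HICP index levels. Index values below are illustrative.

def mom_inflation(hicp):
    """Month-on-month inflation in % from consecutive HICP index levels."""
    return [100.0 * (curr / prev - 1.0) for prev, curr in zip(hicp, hicp[1:])]

hicp = [100.0, 100.5, 101.0, 100.8]   # illustrative index levels
pi = mom_inflation(hicp)              # one rate per consecutive pair of months
```

A series of n index levels yields n − 1 monthly inflation rates, the first of which is lost to the initial observation.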
Table 1: Input features
Table 1 provides an overview of the input features. The first variable (laghicp) captures lagged inflation. The remaining variables are grouped into conventional and unconventional features. The former correspond to the variables listed in rows two to seven and represent macroeconomic indicators commonly used in inflation modelling (hard data). The latter correspond to the last four rows and include high-frequency indicators and sentiment measures (soft data) that provide timely information on current economic conditions and expectations and are widely used in contemporary modelling.
3.2 Feature transformation
The time series used in the analysis begin on 31 January 2006 and were retrieved on 31 March 2025. To replicate authentic forecasting conditions and prevent data leakage, missing values caused by publication delays (e.g., unemployment data) were filled by shifting the corresponding variables forward by one month. Following this alignment, an extensive feature engineering process was applied to the 11 base variables using five distinct transformation methods:
- Rolling mean: the average of the previous six months is computed, which reduces short-term volatility and emphasises medium-term trends.
- Exponential moving average (EMA): unlike the rolling mean, the EMA assigns greater weight to more recent observations, meaning that newer data points carry greater importance in the transformed feature than older ones.
- First differences: obtained by subtracting consecutive observations (yt – yt–1), which removes the trend component and transforms the series into stationary processes.
- Lags: for each variable, lags from one to nine months were generated, as some of the utilized variables have a proven forecasting property several months ahead.
- Standardization: every feature was converted to a form with a mean of zero and a standard deviation of one.
These transformations expanded the initial input space to a maximum of 440 features. Consequently, due to the initialization period required for lags and moving averages, the final usable dataset consists of 215 observations.
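The five transformations can be sketched in a few lines of pure Python. The window lengths follow the text (6-month rolling mean, lags 1 to 9); the EMA smoothing factor is an assumption for illustration, as the paper does not report it:

```python
# Minimal sketches of the five transformations, applied to one toy series.
# `None` marks the initialization period consumed by windows and lags.

def rolling_mean(x, window=6):
    return [sum(x[i - window + 1:i + 1]) / window if i >= window - 1 else None
            for i in range(len(x))]

def ema(x, alpha=0.3):  # alpha is illustrative, not taken from the paper
    out = [x[0]]
    for v in x[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def first_diff(x):
    return [None] + [b - a for a, b in zip(x, x[1:])]

def lag(x, k):
    return [None] * k + x[:-k]

def standardize(x):
    vals = [v for v in x if v is not None]
    mean = sum(vals) / len(vals)
    sd = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
    return [(v - mean) / sd if v is not None else None for v in x]

series = [float(i % 12) for i in range(24)]   # toy seasonal series
features = {"rm6": rolling_mean(series),
            "ema": ema(series),
            "diff": first_diff(series),
            **{f"lag{k}": lag(series, k) for k in range(1, 10)}}
```

Applying such transformations to each base variable multiplies the feature count, which is how 11 variables expand to several hundred inputs, while the longest lag and window determine the observations lost at the start of the sample.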
3.3 Models
The forecast is given by the equation:
| πt+h = Gh(xt) + ut+h, h = 1,…,H; t = 1,…,T | (1) |
where πt+h is the inflation in month t + h, xt = (x1t, ..., xnt)′ is the vector of input data (features), Gh(·) is the function that connects the input data with inflation, and ut+h is a random error with a mean value of zero (Medeiros et al., 2021). The goal of the model is to estimate the function Gh that minimizes the forecast error.
Ridge regression (RR) was proposed by Hoerl and Kennard (1970). In essence, it is a linear regression model that differs from OLS only in how the coefficients are estimated: the sum of squared deviations is minimized, as is standard, but with a penalty term added to the loss function. The penalty function is defined as:
| p(βh,i; λ) = λβh,i² | (2) |
where βh,i is the i-th parameter for horizon h, λ is the regularization factor which, as a hyperparameter, determines the strength of the penalization, and the penalty function p(βh,i; λ) assigns a penalty of λ times the square of each coefficient β (Medeiros et al., 2021).
Lasso was proposed by Tibshirani (1996). Like the RR model, it is a linear regression model. Unlike the former, which uses L2 penalization, it uses L1 penalization, which can be defined as follows:
| p(βh,i; λ) = λ|βh,i| | (3) |
where the notation is read similarly to that for ridge regression. Lasso performs simultaneous regularization and variable selection. Less informative features receive a coefficient equal to zero, meaning the model discards features that do not contribute to inflation forecasting.
The Elastic net (ENet) model, developed by Zou and Hastie (2005), combines the RR and Lasso approaches. In other words, the ENet model uses L1 and L2 penalization for balanced regularization in the following way:
| p(βh,i; λ, α) = λ(αβh,i² + (1 – α)|βh,i|) | (4) |
where α ∈ [0,1] (Medeiros et al., 2021).
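The three penalty terms in equations (2)-(4) differ only in how a coefficient's magnitude is charged, which a short sketch makes concrete (values are illustrative; the convention here, with α weighting the L2 part, follows equation (4) rather than any particular library's parameterization):

```python
# Per-coefficient penalty terms for ridge (L2), lasso (L1), and elastic net.

def ridge_penalty(beta, lam):
    return lam * beta ** 2

def lasso_penalty(beta, lam):
    return lam * abs(beta)

def enet_penalty(beta, lam, alpha):
    # alpha = 1 recovers the ridge penalty, alpha = 0 recovers the lasso penalty
    return lam * (alpha * beta ** 2 + (1 - alpha) * abs(beta))
```

The practical consequence is visible in the functions themselves: the L1 term keeps a constant marginal cost near zero, which is what pushes uninformative coefficients exactly to zero under lasso, while the L2 term's marginal cost vanishes near zero, so ridge only shrinks.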
Support vector regression (SVR), introduced by Drucker et al. (1997), is a regularised regression approach that can capture nonlinear relationships through a feature mapping ϕ(⋅) (or, equivalently, a kernel). In SVR, the model is chosen to be as flat as possible while allowing errors within an ε-insensitive tube, which leads to the optimisation problem:
| min(w,b,ξ,ξ*) ½‖w‖² + C Σi (ξi + ξ*i), subject to yi − ⟨w, ϕ(xi)⟩ − b ≤ ε + ξi, ⟨w, ϕ(xi)⟩ + b − yi ≤ ε + ξ*i, ξi, ξ*i ≥ 0 | (5) |
Here, ε sets the width of the tube, while C controls how strongly violations outside the tube are penalised via the slack variables ξi,ξ*i. In practice, the problem is solved by moving to the dual formulation and solving a quadratic programming problem, as described in the paper. The resulting predictor depends only on a subset of training observations (the “support vectors”), which makes the method particularly attractive in high-dimensional settings. In forecasting applications, SVR is then applied by tuning ε, C, and kernel parameters (e.g., via cross-validation) and evaluating the fitted function on new inputs to generate predictions.
The Random Forest (RF) is an ensemble of multiple simple decision trees designed to reduce the variance of individual regression trees. It is created by combining a large number of randomly constructed trees (Breiman, 2001). A regression tree is in itself a nonparametric model that approximates an unknown nonlinear function using local predictions and the recursive partitioning of the covariate space (Breiman, 1996). The RF for a regression problem is defined as an ensemble of M randomly generated trees. For the j-th tree in the ensemble, the predicted value at query point x is denoted by mn(x; Θj, Dn), where Θj represents independent random variables controlling the construction process of individual trees, and Dn represents the training sample. The goal is to use the dataset Dn to construct an estimate mn : [0,1]p → ℝ of the function m. The estimate of the RF with a finite number of trees is then given by the expression:
| mM,n(x; Θ1, ..., ΘM, Dn) = (1/M) Σj=1,...,M mn(x; Θj, Dn) | (6) |
(Scornet, Biau and Vert, 2015). Since the number of trees can be made very large in practice, it is natural to consider the case M → ∞. Then, the RF estimate converges to the expectation of the individual tree predictions (Scornet, Biau and Vert, 2015):
| m∞,n(x; Dn) = EΘ[mn(x; Θ, Dn)] | (7) |
The Gradient Boosting Model (GBM) constructs a complex predictor by summing shallow regression trees, where each new tree fm is trained to minimize the residual errors of the previous ensemble (Friedman, 2001). The final model is represented as:
| FM(x) = F0(x) + v Σm=1,...,M fm(x) | (8) |
where v acts as a shrinkage parameter to control overfitting. While the standard GBM effectively captures non-linearities, advanced implementations have been developed to improve efficiency. XGBoost (Extreme Gradient Boosting) optimizes the standard GBM framework by adding explicit regularization terms Ω( f) and utilizing parallel processing, resulting in more robust generalization (Friedman, 2001; Chen and Guestrin, 2016). For further scalability in high-dimensional environments, LightGBM introduces Gradient-based One-side sampling (GOSS) and Exclusive feature bundling (EFB). These techniques drastically reduce data scanning requirements by focusing on instances with large gradients and bundling sparse features, achieving up to 20-fold faster training speeds with comparable accuracy (Ke et al., 2017).
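The boosting recursion behind equation (8) can be illustrated with a minimal pure-Python implementation. This is a sketch, not the paper's implementation: the weak learners here are single-split regression stumps on one-dimensional input, and the data are invented:

```python
# Minimal gradient boosting for squared-error loss: each weak learner is fit
# to the residuals of the current ensemble and added with shrinkage v.

def fit_stump(x, y):
    """Best single-split regression stump on 1-D inputs."""
    best = None
    for s in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= s]
        right = [yi for xi, yi in zip(x, y) if xi > s]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((yi - (lm if xi <= s else rm)) ** 2 for xi, yi in zip(x, y))
        if best is None or sse < best[0]:
            best = (sse, s, lm, rm)
    _, s, lm, rm = best
    return lambda xi: lm if xi <= s else rm

def gbm_fit(x, y, n_trees=50, v=0.1):
    f0 = sum(y) / len(y)               # initial prediction: the sample mean
    trees, pred = [], [f0] * len(y)
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(x, resid)     # fit the next learner to the residuals
        trees.append(tree)
        pred = [pi + v * tree(xi) for pi, xi in zip(pred, x)]
    return lambda xi: f0 + v * sum(t(xi) for t in trees)

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.1, 0.0, 0.2, 1.1, 0.9, 1.0]     # step-like toy target
model = gbm_fit(x, y)
```

Production implementations such as LightGBM and XGBoost follow the same additive logic but use deeper trees, gradient information, regularization, and the sampling and bundling tricks described above.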
Neural networks (NNs) are multi-layer systems of interconnected neurons inspired by biological processes, whose purpose is to map input features into forecasts through a sequence of nonlinear transformations. Although NNs are highly effective at modelling complex nonlinear relationships, they are often described as “black-box” models due to the limited interpretability of their parameters. Consider a multilayer perceptron (MLP) with a total of L layers, where the first L − 1 layers are hidden and the L-th layer is the output layer. Let xt denote the input vector at time t. We define the layer outputs (activations) recursively using the notation z as:
| z(1) = σ1(W1xt + b1), z(l) = σl(Wlz(l−1) + bl), l = 2, ..., L − 1 | (9) |
where Wl and bl denote the weight matrix and bias vector of layer l, and σl(⋅) is the nonlinear activation function of that layer (e.g., ReLU). The h-step-ahead forecast is then obtained via an affine projection from the last hidden layer:
| π̂t+h = g(WLz(L−1) + bL) | (10) |
where g(⋅) is the output-layer function. The set of parameters θ = {Wl, bl}l=1,...,L is estimated by minimizing a loss function L(θ), typically the mean squared error between forecasts and realized values. The gradients of the loss with respect to the parameters are computed using the backpropagation algorithm, and the parameters are updated using a chosen optimization procedure (Rumelhart, Hinton and Williams, 1986).
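The forward pass in equations (9)-(10) amounts to alternating affine maps and activations, which a tiny sketch makes explicit. The weights below are illustrative, g is taken to be the identity, and no training step is shown:

```python
# Forward pass for a tiny MLP: affine map + ReLU per hidden layer, then a
# plain affine projection at the output (g = identity). Weights illustrative.

def relu(v):
    return [max(0.0, vi) for vi in v]

def affine(W, x, b):
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def mlp_forecast(x, layers):
    """layers: list of (W, b, activation); the last activation is identity."""
    z = x
    for W, b, act in layers:
        z = act(affine(W, z, b))
    return z

identity = lambda v: v
layers = [([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1], relu),   # hidden layer
          ([[1.0, 1.0]], [0.0], identity)]                 # output layer
forecast = mlp_forecast([0.2, -0.4], layers)
```

Training then consists of repeating this pass, measuring the loss against realized inflation, and propagating gradients back through the same chain of maps.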
Although traditionally considered a statistical time series model, the Seasonal Autoregressive Integrated Moving Average (SARIMA) is increasingly integrated into modern ML frameworks, often serving as a robust benchmark for automated predictive systems. SARIMA extends the standard ARIMA architecture by explicitly incorporating seasonal components, enabling the effective recognition of periodic patterns. The model is formally defined by the expression:
| ϕ(B)Φ(BS)(1 − B)d(1 − BS)D yt = θ(B)Θ(BS)εt | (11) |
where B denotes the lag operator, (1 – B) and (1 – BS) represent the non-seasonal and seasonal differencing required for stationarity, while the polynomials ϕ and θ (along with their seasonal counterparts) capture the autoregressive and moving average dynamics. Parameter selection is performed by automated algorithms (auto-SARIMA) that optimize the model structure based on information criteria such as the Akaike information criterion (AIC), ensuring both transparency and diagnostic robustness (Hyndman and Athanasopoulos, 2021).
Developed by Facebook’s research division, the Prophet model specializes in forecasting time series characterized by pronounced seasonal patterns and occasional anomalies. Unlike traditional autoregressive models, Prophet relies on an additive decomposition methodology that treats the forecasting problem as a curve-fitting exercise, separating the signal into three distinct components: | y(t) = g(t) + s(t) + h(t) + εt | (12) |
where g(t) represents the trend component, s(t) is the seasonal component, h(t) represents the influence of holidays or specific events, and εt is the error term (Taylor and Letham, 2018; Hyndman and Athanasopoulos, 2021).
3.4 Forecasting procedure
To ensure generalization and prevent overfitting, this study employs an expanding window cross-validation technique, specifically adapted for time series data. As illustrated in figure 1, the model is initially trained on a fixed set, which progressively expands by the addition of one new month of data in each iteration. Following each expansion, the model is fully retrained and validated on the subsequent month, a process that meticulously simulates real-world forecasting and eliminates data leakage.
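The expanding-window scheme reduces to a simple index bookkeeping exercise, sketched below; `fit` and `predict` would stand in for any of the models above, and the sizes are illustrative:

```python
# Expanding-window cross-validation: the training set grows by one observation
# (one month) per iteration, and the model is validated on the next month.

def expanding_window_splits(n_obs, initial_train):
    """Yield (train_indices, test_index) pairs for one-step-ahead validation."""
    for end in range(initial_train, n_obs):
        yield list(range(end)), end

splits = list(expanding_window_splits(n_obs=6, initial_train=3))
# splits -> [([0, 1, 2], 3), ([0, 1, 2, 3], 4), ([0, 1, 2, 3, 4], 5)]
```

Because each test month lies strictly after every observation used for training, the scheme cannot leak future information into the fit, which is the property the text emphasises.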
Figure 1: Illustrative display of the expanding window approach
Model selection within each cross-validation fold is determined by minimizing the Mean Squared Error (MSE):
| MSE = (1/n) Σt=1,...,n (πt − π̂t)² | (13) |
Final predictive performance is evaluated using two metrics. The first is the RMSE, defined as:
| RMSE = √[(1/n) Σt=1,...,n (πt − π̂t)²] | (14) |
The second metric used is the out-of-sample MASE (OMASE), which is nearly identical to the measure proposed by Hyndman and Koehler (2006), with the exception that the denominator here is the out-of-sample MAE of the naive model, whereas Hyndman and Koehler employ the in-sample MAE of the naive model. OMASE can be written as:
| OMASE = Σt |πt − π̂t| / Σt |πt − πt−m| | (15) |
where m is the length of the seasonal cycle (for monthly data 12). An OMASE value below 1 indicates that the model under evaluation outperforms the naive seasonal benchmark.
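The ratio in equation (15) can be sketched directly: the out-of-sample MAE of the model divided by the out-of-sample MAE of the seasonal naive forecast, which simply repeats the value observed m periods earlier. The data-layout convention below (a single series containing the m history months followed by the evaluated months) is an assumption for illustration:

```python
# OMASE: out-of-sample MAE of the model over out-of-sample MAE of the
# seasonal naive benchmark (forecast for month t = actual value at t - m).

def omase(actual, forecast, m=12):
    """`actual` holds m history observations followed by the evaluated months;
    `forecast` aligns with the evaluated months."""
    window = actual[m:]                      # evaluated (out-of-sample) period
    mae_model = sum(abs(a - f) for a, f in zip(window, forecast)) / len(window)
    naive = actual[:-m]                      # value observed m months earlier
    mae_naive = sum(abs(a - n) for a, n in zip(window, naive)) / len(window)
    return mae_model / mae_naive
```

A value below 1 means the model's average absolute error over the evaluation window is smaller than that of repeating last year's value, which is exactly how the tables in chapter 4 are read.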
To rigorously evaluate forecasting performance and feature importance under varying economic conditions, 12 experimental configurations were developed by crossing three evaluation periods with four distinct input feature sets:
- Evaluation periods:
The entire available period. The period up to February 2020. The period from March 2020 onward (the start of the lockdown).
- Input feature combinations:
Target variable lags only. Lags + conventional features. Lags + unconventional features. All available variables combined.
To benchmark performance against that of the European Commission (EC), generated monthly forecasts are aggregated into annual inflation rates (see appendix). The study strictly replicates the data availability constraints of the EC’s three key reporting cycles to ensure a fair comparison:
- Spring Forecast (current year): Published in May with an April cut-off. Only information available by the end of April is used, with delayed-release variables imputed using lagged values. Given the availability of a HICP flash estimate for April, eight monthly inflation rates must be forecast, corresponding to horizons h = 1, ..., 8.
- Autumn Forecast (current year): Published in November with an October cut-off. Data availability is aligned analogously, including a flash estimate for October. As ten monthly observations are known, forecasts are generated for the remaining two months (h = 1, 2).
- Autumn Forecast (following year): Based on the same October cut-off and information set as the autumn forecast for the current year. Constructing the annual inflation rate for the following year requires forecasting the final two months of the current year and all twelve months of the next year, resulting in fourteen monthly projections (h = 1, ..., 14).
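One common way to turn month-on-month forecasts into an average annual inflation rate is sketched below: compound the monthly rates into index levels, then compare the 12-month average index with that of the previous year. This is a hedged illustration under that assumption; the paper's exact aggregation, which mixes realised and forecast months, is described in its appendix and not reproduced here:

```python
# Compound monthly rates into index levels, then compute average annual
# inflation as the percentage change in 12-month average index levels.

def compound_index(start_level, monthly_rates_pct):
    levels = [start_level]
    for r in monthly_rates_pct:
        levels.append(levels[-1] * (1.0 + r / 100.0))
    return levels[1:]

def average_annual_inflation(prev_year_levels, this_year_levels):
    prev_avg = sum(prev_year_levels) / len(prev_year_levels)
    this_avg = sum(this_year_levels) / len(this_year_levels)
    return 100.0 * (this_avg / prev_avg - 1.0)

prev = [100.0] * 12                                # flat base-year index
this_year = compound_index(100.0, [0.5] * 12)      # 0.5% m/m for 12 months
rate = average_annual_inflation(prev, this_year)
```

In the EC comparison, the months already realised at the cut-off enter this calculation as data, so only the remaining h months of the index path depend on the model.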
4 Results
This chapter reports the results of all models used to forecast monthly inflation in Croatia. The forecasting setup and evaluation procedure follow the framework outlined in chapter 3, with specific reference to section 3.4. Owing to computational constraints, forecasts are first generated for horizons of 3, 6, and 9 months, after which the best-performing model is employed to produce forecasts for the full set of 14 horizons. Table 2 presents RMSE forecast errors for the period before March 2020, corresponding to the pre-COVID-19 sample.
Table 2: RMSE in the pre-COVID-19 period for the 3-, 6- and 9-month forecasting horizons
The results presented in table 2 indicate that tree-based models (LightGBM and XGBoost) achieve above-average accuracy in a period of subdued volatility (the period preceding the outbreak of the COVID-19 pandemic). At longer forecasting horizons (6 and 9 months), they consistently outperform the traditional SARIMA model, with their performance in some cases further enhanced by the inclusion of external features. Among the linear models, only the Ridge model maintains OMASE values below one across all horizons. Given that ENet and Lasso perform substantially worse than Ridge, it can be concluded that the feature exclusion induced by the L1 penalty does not, in this case, contribute to improved forecast accuracy. Prophet is the only model in the pre-COVID-19 period that fails to outperform the simple (naïve) forecasting model under any combination of input features. Moreover, for no other model does the inclusion of external features deteriorate forecasting performance to the same extent as it does for Prophet. Table 2 does not reveal any clear or systematic pattern whereby unconventional variables consistently improve forecasting accuracy relative to conventional variables, or vice versa. Table 3 reports the forecasting errors for the period following the onset of the COVID crisis.
Table 3: RMSE in the post-COVID-19 period for the 3-, 6- and 9-month forecasting horizons
Table 3 shows that the SARIMA model consistently achieves the lowest RMSE across all forecasting horizons. Among the ML models, the closest competitor to SARIMA is LightGBM. At shorter horizons, LGBM particularly benefits from the inclusion of conventional variables, whereas at longer horizons it attains higher predictive accuracy when unconventional features are included. Among the linear models, the Ridge model without external variables proves to be the most stable across horizons, although Lasso and ENet fall within a comparable accuracy range. Prophet records OMASE values below 1 only when it relies exclusively on lags of the target variable, which suggests that the expanded feature set offers insufficient informational value for this model. Neural networks also perform best when only temporal lags of the target variable are used, indicating that they have limited ability to extract useful information from additional external predictors. Furthermore, tables 2 and 3 reveal that OMASE values fall below 1 more frequently in the post-COVID than in the pre-pandemic period. This implies that, after the onset of the pandemic, models more often outperformed the naïve benchmark in terms of forecast accuracy. At the same time, table 3 shows that RMSE values for all models are considerably higher than those reported in table 2, pointing to increased unpredictability of inflation in the post-pandemic environment. Heightened volatility, structural changes in the economy, and global shocks such as supply chain disruptions and the sharp rise in energy prices all contributed to this increased uncertainty, ultimately reducing the predictive accuracy of all models under consideration. Finally, table 4 reports forecasting errors for the full sample period.
The results in table 4 for the full observation period reveal several important patterns regarding model performance. First, SARIMA attains the lowest forecasting errors across the selected horizons. Second, among the linear models, the lowest forecast errors are most frequently obtained when only temporal lags of the target variable are used as input features. Third, among the nonlinear models, LightGBM with external variables exhibits consistent predictive superiority over the other models. Across multiple feature combinations and all horizons, it records low RMSE and OMASE values, often the lowest among all models except SARIMA. Alongside LightGBM, XGBoost also stands out among nonlinear models, although its predictive performance is somewhat less consistent across different sets of external variables.
To enable a comparison between the best-performing model and the European Commission’s forecasts in the final stage of the analysis, we proceed to select a model that will serve as the representative model for the final forecast and comparison. Since it is difficult to establish a single criterion to unambiguously identify the most successful model, the most appropriate approach appears to be focusing on the consistency of each model’s results across different macroeconomic regimes.
Given that SARIMA achieves the lowest forecasting errors in most cases, across all horizons and all periods examined, it can be regarded as the best-performing model for forecasting inflation in Croatia. If SARIMA is excluded as a traditional time-series benchmark, LightGBM emerges as the most consistent performer among the ML models. LightGBM regularly attains low forecasting errors in all periods considered and across nearly all combinations of input variables (see appendix table A4).
Table 4: RMSE over the full observation period for the 3-, 6- and 9-month forecasting horizons
Accordingly, two models are selected for the final inflation forecasting exercise: SARIMA, as the traditional time-series model that consistently delivers the strongest results, and LightGBM, which has proven to be the most successful among ML approaches. Using these models, monthly inflation rates (month-over-month) are forecast for each of the 14 horizons. The resulting monthly forecasts are then transformed, as described in section 3.4, into annual inflation rates (year-over-year) by incorporating the contributions of realised monthly inflation rates. This enables comparison with the European Commission’s forecasts across the three forecasting cycles. Figure 2 presents the comparison of actual average annual inflation with the European Commission’s projections and the forecasts generated by the SARIMA and LightGBM models for all three forecasting scenarios: (1) the autumn forecast for the current year (h = 2); (2) the spring forecast for the current year (h = 8); and (3) the autumn forecast for the following year (h = 14).
Figure 2: Forecasted average annual inflation rate by the European Commission compared with forecasts from the SARIMA and LightGBM models (in %)
Figure 2 shows that the SARIMA and LightGBM models outperform the European Commission’s forecasts in all forecasting scenarios. For the shortest horizon (h = 2, autumn forecast for the current year), the differences between the models and the European Commission are small, as most observations for that year are already known; nevertheless, our models provide slightly more accurate estimates. At the medium horizon (h = 8, spring forecast for the current year), the deviations between forecasted and realised values become larger, which is expected given the greater uncertainty and the longer time span until the end of the year. Even so, SARIMA and LightGBM continue to exhibit smaller errors relative to actual outcomes than the European Commission. The largest deviations occur at the longest horizon (h = 14, autumn forecast for the following year), clearly illustrating the difficulty of predicting inflation more than a year in advance. Even in this case, the forecasts generated by the two models remain closer to the realised values than those of the European Commission, thereby confirming their superiority under conditions of pronounced uncertainty. Finally, the figure reveals a clear distinction between SARIMA and LightGBM. SARIMA forecasts inflation more accurately during the stable pre-COVID period, whereas LightGBM delivers more precise predictions in the volatile post-pandemic environment.
5 Discussion
The central question addressed in this study is whether ML models outperform benchmark models in forecasting inflation in Croatia and, if so, which ML approach performs best. The results suggest that ML methods achieve forecasting accuracy comparable to, and in some cases exceeding, that of standard univariate econometric models. This finding is consistent with recent evidence showing that ML techniques can improve inflation forecasts relative to traditional approaches (Medeiros et al., 2021; Araujo and Gaglianone, 2023). Among the ML methods considered, tree-based models perform particularly well, in line with results reported in related empirical studies.
Beyond model choice, forecasting performance is strongly influenced by the quantity and informational content of the predictors. Recent studies emphasize the benefits of data-rich environments, where models exploit large sets of potential predictors (e.g. Medeiros et al., 2021; Araujo and Gaglianone, 2023). In contrast, this paper adopts a parsimonious, ad hoc selection of key variables. While this simplifies the modelling framework, it may limit the gains typically associated with ML in data-rich settings. Nevertheless, given the available sample size, the selected predictors appear adequate: LightGBM performs competitively relative to SARIMA, suggesting that the essential information for forecasting inflation is captured by this reduced feature set. This approach is also motivated by structural constraints of the Croatian data environment, characterized by relatively short time series and a limited number of high-frequency macroeconomic indicators, a limitation also noted for other small economies (e.g. Ivașcu, 2023). As longer time series become available, the relative performance of ML models, particularly tree-based methods, may improve.
Differences between our results and those reported for larger economies can also be explained by specific features of the Croatian macroeconomic environment. For much of the sample period, monetary policy relied on the exchange rate as the nominal anchor, while external shocks played a dominant role in shaping inflation dynamics (Globan, Arčabić and Sorić, 2015). Under such conditions, inflation exhibits strong persistence and pronounced trend and seasonal components, making it particularly amenable to models such as SARIMA. Previous studies document the strong performance of ARIMA-type models in forecasting Croatian inflation (Pufnik and Kunovac, 2006) and in explaining its dynamics (Živko and Bošnjak, 2017), while our results confirm that SARIMA remains a reliable benchmark in this setting. The pronounced autoregressive nature of inflation implies that models explicitly exploiting its temporal structure can be highly effective, helping to explain the strong performance of SARIMA relative to more complex alternatives.
Finally, the results indicate that, over the sample period, our models consistently produced forecasts closer to realised inflation outcomes than those published by the European Commission, even in periods of heightened uncertainty. These results should, however, be interpreted with caution. Forecasts produced by institutions such as the European Commission may incorporate elements of forward guidance aimed at shaping public expectations, which can generate systematic deviations from realised outcomes, particularly during episodes of large shocks. While such practices may support short-term expectation anchoring, persistent forecast errors may ultimately weaken institutional credibility, raising the question of whether expectation management through official forecasts enhances macroeconomic stability or undermines trust over time. This trade-off remains an open empirical question with important implications for the design of official forecasting frameworks.
6 Caveats
This section outlines several important limitations related to data availability and methodological choices underlying the empirical analysis.
A first limitation concerns the use of single-extraction Google Trends series. Recent studies have shown that Google Trends data are subject to sampling variation, implying that repeated extractions of the same query may yield different values and thereby raising concerns regarding reproducibility (e.g. Cebrián and Domenech, 2024). To address this issue, some contributions propose averaging multiple extractions of the same query. However, implementing such procedures would substantially increase the data-collection burden and delay model estimation. Consequently, the analysis relies on a single extraction per query, a practice that has also been adopted in earlier studies (e.g. Choi and Varian, 2012). Nevertheless, this choice represents a limitation that should be considered when interpreting the results.
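The averaging procedure proposed in that literature is straightforward to sketch. In the toy example below, the query values and the number of extractions are hypothetical placeholders; the point is only that repeated extractions of the same query are averaged point by point to dampen sampling variation.

```python
import numpy as np

# Hypothetical illustration: three repeated extractions of the same
# Google Trends query (values are made up), averaged element-wise
# to reduce the extraction-to-extraction sampling noise.
extractions = np.array([
    [54, 60, 58, 61],  # extraction 1
    [52, 63, 57, 60],  # extraction 2
    [56, 60, 59, 62],  # extraction 3
])
averaged = extractions.mean(axis=0)  # one smoothed series across periods
```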
A second limitation relates to hyperparameter optimisation. Due to computational constraints, the range of hyperparameters considered in the grid search was necessarily restricted, although it was not excessively narrow. Future work could address this limitation by expanding the grid search to cover a wider set of hyperparameter values, provided sufficient computational resources are available. Alternatively, the use of alternative hyperparameter tuning strategies beyond grid search, such as random search or Bayesian optimisation, could be considered in future research.
A third limitation concerns differences in model selection and evaluation procedures across model classes. The SARIMA models were specified using the Akaike information criterion (AIC), in line with standard practice in the econometric time-series literature, where information criteria are commonly employed for model selection. In contrast, machine learning models were trained and evaluated using an expanding-window cross-validation approach, which is more typical in the contemporary forecasting literature. This design choice was motivated by the intention to place each modelling approach in its natural methodological setting. Nevertheless, the use of different training and evaluation frameworks may limit the strict comparability of results across model classes. Future research could address this issue by applying a unified evaluation strategy (such as CV) to all models.
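The expanding-window idea used for the ML models can be illustrated with a generic sketch (not the paper's exact configuration; `initial_train` and `test_size` are hypothetical settings): the training window always starts at the first observation and grows with each fold, while the test window moves forward in time.

```python
import numpy as np

def expanding_window_splits(n_obs, initial_train, test_size):
    """Yield (train, test) index arrays where the training window grows by
    test_size observations each fold and the test window moves forward,
    so no future observation ever enters the training set."""
    end = initial_train
    while end + test_size <= n_obs:
        yield np.arange(end), np.arange(end, end + test_size)
        end += test_size

splits = list(expanding_window_splits(n_obs=12, initial_train=6, test_size=2))
# three folds with training windows of 6, 8 and 10 observations
```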
7 Conclusion
After several years of stability and low inflation in advanced economies, inflation has once again moved to the forefront of attention in both academic and policy circles. The sudden shocks associated with the COVID-19 pandemic, disruptions in global supply chains and increased geopolitical tensions triggered a sharp rise in prices, posing a significant challenge for policymakers, particularly central banks, which rely on accurate inflation forecasts to design effective monetary measures. In this context, timely and precise inflation projections are essential. This paper therefore examines the potential of modern machine learning methods for forecasting inflation in Croatia, a country where the literature on inflation forecasting, and especially on the application of machine learning, remains notably limited.
The findings show that no single model performs best across all time periods and feature sets. However, SARIMA, a traditional time-series model, stands out as the most reliable and consistently accurate model overall. Tree-based models, especially LightGBM, also exhibit strong predictive capabilities and outperform other machine learning methods across a variety of feature combinations. As data availability expands and new macroeconomic conditions are incorporated over time, it is reasonable to expect that models such as LightGBM will increasingly assume a leading role in inflation forecasting.
In the final comparison with the European Commission’s projections, both SARIMA and LightGBM produce more accurate forecasts of average annual inflation across all forecasting scenarios, even at longer horizons. This further confirms their predictive superiority. The results also reveal an important pattern: before the COVID-19 crisis, during a period of greater macroeconomic stability, SARIMA generated more accurate forecasts than LightGBM, whereas in the post-COVID period LightGBM delivered more precise predictions than SARIMA.
Overall, the results point to a considerable potential for machine learning methods in inflation forecasting. By incorporating diverse input variables and evaluating models across different economic regimes, this study contributes to the empirical literature and highlights several avenues for further research. Given that inflation is a multidimensional and complex phenomenon, sophisticated tools are required, and machine learning models, when applied with due care, appear to offer a promising solution.
8 Appendix
Table A1 illustrates the procedure for converting monthly inflation rates into year-over-year inflation rates, which facilitates understanding of equations (A1) to (A4). The year-over-year inflation rate in month t is defined as:

$$\pi_t^{yoy} = \left(\frac{P_t}{P_{t-12}} - 1\right) \times 100 \qquad (A1)$$

where $P_t$ denotes the consumer price index in month $t$. The contribution of monthly inflation rates to the year-over-year rate for the most recent $h$ months is given by:

$$\hat{C}_t = \prod_{s=t-h+1}^{t} \left(1 + \frac{\hat{\pi}_s^{mom}}{100}\right) \qquad (A2)$$

In equation (A2), we use the forecasted monthly inflation rates $\hat{\pi}_s^{mom}$, while in equation (A3) we retain the realised monthly rates $\pi_s^{mom}$. The contribution of realised monthly changes in the HICP index over the months that remain in the base after the most recent $h$ months drop out is expressed as:

$$C_t = \prod_{s=t-11}^{t-h} \left(1 + \frac{\pi_s^{mom}}{100}\right) \qquad (A3)$$

Combining equations (A2) and (A3) yields:

$$\pi_t^{yoy} = \left(C_t \cdot \hat{C}_t - 1\right) \times 100 \qquad (A4)$$
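The combination in equation (A4) can be computed directly. The sketch below assumes monthly rates expressed in percent; the function name and the example values are illustrative, not taken from the paper's data.

```python
import numpy as np

def yoy_from_monthly(realised_mom, forecast_mom):
    """Combine the 12-h realised and h forecasted month-over-month rates
    (in %) into the year-over-year rate for month t, as in equation (A4)."""
    rates = np.concatenate([realised_mom, forecast_mom]) / 100.0
    return (np.prod(1.0 + rates) - 1.0) * 100.0

# Illustrative values: ten realised months at 0.5% m-o-m combined with
# h = 2 forecasted months at 0.4% m-o-m
yoy = yoy_from_monthly(np.full(10, 0.5), np.full(2, 0.4))
```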
Table A1: RMSE expressed as indices relative to the SARIMA model for the pre-COVID-19 period
Table A2: RMSE expressed as indices relative to the SARIMA model for the post-COVID-19 period
Table A3: RMSE expressed as indices relative to the SARIMA model for the full observation period
Table A4: Average RMSE by feature set used
Notes
* We express our sincere gratitude to Associate Professor Silvija Vlah Jerić, PhD, for her exceptional dedication and generosity, as well as for the time, motivation, and invaluable guidance she devoted to helping us complete this research. We are also grateful to Assistant Professor Ivana Lolić for her contribution, together with Associate Professor Silvija Vlah Jerić, to the modernization of the study program at the Faculty of Economics and Business in Zagreb through the introduction of the course Applied Machine Learning, which provided the foundational knowledge necessary for the development of this research. We further thank two anonymous reviewers for their constructive comments, which significantly improved the quality of this paper. The article was judged the best student article in the 2025 annual competition of the Hanžeković Foundation.
Disclosure statement
The authors have no conflicts of interest to declare.
References
Ang, A., Bekaert, G. and Wei, M., 2005. Do macro variables, asset markets, or surveys forecast inflation better? Journal of Monetary Economics, 54(4), pp. 1163-1212 [ CrossRef]
Araujo, G. S. and Gaglianone, W. P., 2023. Machine learning methods for inflation forecasting in Brazil: new contenders versus classical models. Latin American Journal of Central Banking, 4(2), 100087 [ CrossRef]
Atkeson, A. and Ohanian, L. E., 2001. Are Phillips curves useful for forecasting inflation? Federal Reserve Bank of Minneapolis Quarterly Review, 25(1), pp. 2-11 [ CrossRef]
Barhoumi, K., Darné, O. and Ferrara, L., 2009. Are disaggregate data useful for factor analysis in forecasting French GDP? Journal of Forecasting, 29(1-2), pp. 132-144 [ CrossRef]
Blinder, A. S., 1997. Is there a core of practical macroeconomics that we should all believe? The American Economic Review, 87(2), pp. 240-243.
Breiman, L., 1996. Bagging predictors. Machine Learning, 24(2), pp. 123-140 [ CrossRef]
Breiman, L., 2001. Random forests. Machine Learning, 45(1), pp. 5-32 [ CrossRef]
Cebrián, E. and Domenech, J., 2024. Addressing Google Trends inconsistencies. Technological Forecasting and Social Change, 202, p. 123318 [ CrossRef]
Chen, T. and Guestrin, C., 2016. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery, pp. 785-794 [ CrossRef]
Chen, Y.-C., Turnovsky, S. J. and Zivot, E., 2014. Forecasting inflation using commodity price aggregates. Journal of Econometrics, 183(1), pp. 117-134 [ CrossRef]
Choi, H. and Varian, H., 2012. Predicting the present with Google Trends. Economic Record, 88(s1), pp. 2–9 [ CrossRef]
Drucker, H. [et al.], 1997. Support vector regression machines. Advances in Neural Information Processing Systems, 28, pp. 779-784.
Eickmeier, S. and Ziegler, C., 2008. How successful are dynamic factor models at forecasting output and inflation? A meta‐analytic approach. Journal of Forecasting, 27(3), pp. 237-265 [ CrossRef]
Forni, M. [et al.], 2003. Do financial variables help forecasting inflation and real activity in the euro area? Journal of Monetary Economics, 50(6), pp. 1243-1255 [ CrossRef]
Friedman, J. H., 2001. Greedy function approximation: a gradient boosting machine. The Annals of Statistics, 29(5), pp. 1189-1232 [ CrossRef]
Garcia, M. G. P., Medeiros, M. C. and Vasconcelos, G. F. R., 2017. Real-time inflation forecasting with high-dimensional models: the case of Brazil. International Journal of Forecasting, 33(3), pp. 679-693 [ CrossRef]
Globan, T., Arčabić, V. and Sorić, P., 2015. Inflation in new EU member states: a domestically or externally driven phenomenon? Emerging Markets Finance and Trade, 52(1), pp. 154-168 [ CrossRef]
Groen, J. J. J., Paap, R. and Ravazzolo, F., 2013. Real-time inflation forecasting in a changing world. Journal of Business and Economic Statistics, 31(1), pp. 29-44 [ CrossRef]
Hoerl, A.E. and Kennard, R.W., 1970. Ridge regression: applications to nonorthogonal problems. Technometrics, 12(1), pp. 69-82 [ CrossRef]
Hyndman, R. J. and Koehler, A. B., 2006. Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), pp. 679-688 [ CrossRef]
Ivașcu, C., 2023. Can machine learning models predict inflation? Proceedings of the International Conference on Business Excellence, 17(1), pp. 1748-1756 [ CrossRef]
Ke, G. [et al.], 2017. LightGBM: a highly efficient gradient boosting decision tree. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17). Red Hook, NY: Curran Associates Inc., pp. 3149-3157.
Masini, R. P., Medeiros, M. C. and Mendes, E. F., 2021. Machine learning advances for time series forecasting. Journal of Economic Surveys, 37(1), pp. 76-111 [ CrossRef]
Medeiros, M. C. [et al.], 2021. Forecasting inflation in a data-rich environment: the benefits of machine learning methods. Journal of Business and Economic Statistics, 39(1), pp. 98-119 [ CrossRef]
Medeiros, M. C. and Mendes, E. F., 2016. ℓ1-regularization of high-dimensional time-series models with non-Gaussian and heteroskedastic errors. Journal of Econometrics, 191(1), pp. 255-271 [ CrossRef]
Monteforte, L. and Moretti, G., 2013. Real‐time forecasts of inflation: the role of financial variables. Journal of Forecasting, 32(1), pp. 51-61 [ CrossRef]
Naghi, A. A., O’Neill, E. and Zaharieva, M. D., 2024. The benefits of forecasting inflation with machine learning: new evidence. Journal of Applied Econometrics, 39(7), pp. 1321-1331 [ CrossRef]
Nakamura, E., 2004. Inflation forecasting using a neural network. Economics Letters, 86(3), pp. 373-378 [ CrossRef]
Payne, J. E., 2002. Inflationary dynamics of a transition economy: the Croatian experience. Journal of Policy Modeling, 24(3), pp. 219-230 [ CrossRef]
Pufnik, A. and Kunovac, D., 2006. Short-term inflation forecasting in Croatia using seasonal ARIMA processes. Istraživanja, I-18.
Rumelhart, D. E., Hinton, G. E. and Williams, R. J., 1986. Learning representations by back-propagating errors. Nature, 323(6088), pp. 533-536 [ CrossRef]
Scornet, E., Biau, G. and Vert, J.-P., 2015. Consistency of random forests. The Annals of Statistics, 43(4), pp. 1716-1741 [ CrossRef]
Sorić, P. and Lolić, I., 2017. Economic uncertainty and its impact on the Croatian economy. Public Sector Economics, 41(4), pp. 443-477 [ CrossRef]
Stock, J. H. and Watson, M., 2003. Forecasting output and inflation: the role of asset prices. Journal of Economic Literature, 41(3), pp. 788-829 [ CrossRef]
Taylor, S. J. and Letham, B., 2018. Forecasting at scale. The American Statistician, 72(1), pp. 37-45 [ CrossRef]
Tibshirani, R., 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B, 58(1), pp. 267-288.
Ülke, V., Sahin, A. and Subasi, A., 2018. A comparison of time series and machine learning models for inflation forecasting: empirical evidence from the USA. Neural Computing and Applications, 30(5), pp. 1519–1527 [ CrossRef]
Zou, H. and Hastie, T., 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B (Statistical Methodology), 67(2), pp. 301–320 [ CrossRef]
Živko, I. and Bošnjak, M., 2017. Time series modeling of inflation and its volatility in Croatia. Notitia – časopis za ekonomske, poslovne i društvene teme, 3(1), pp. 1-10 [ CrossRef]