Time Series Analysis (ARIMA)

Jan 4, 2025

RPI-Gold and Silver Jewelry: Month-on-Month Historical Trends

The ARIMA model, short for AutoRegressive Integrated Moving Average, is a widely used statistical model for time series forecasting. It combines the autoregressive (AR) and moving average (MA) parts and can handle non-stationary time series.

The ARIMA model is defined by three main parameters: (p), (d), and (q).

▪︎ Autoregressive (AR) part: The parameter (p) indicates the number of lag observations in the model, representing how many past values are used to predict the current value.

▪︎ Differencing (I) part: The parameter (d) is the degree of differencing, used to make the time series stationary.

▪︎ Moving Average (MA) part: The parameter (q) indicates the number of lagged forecast errors in the prediction model.

Building an ARIMA model typically follows four stages:

▪︎ Identification: Determine the parameters (p), (d), (q) by conducting descriptive analysis and examining ACF/PACF plots.

▪︎ Estimation: Use statistical software to estimate the model parameters.

▪︎ Diagnostic Checking: Conduct model diagnostics through residual analysis and statistical tests to validate the model's adequacy.

▪︎ Forecasting: Utilize the fitted model to make time series predictions.

A common extension of the ARIMA model is SARIMA (Seasonal ARIMA), used for handling time series with seasonal components.
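To make the roles of (p), (d), and (q) concrete, here is a minimal sketch that builds a synthetic ARIMA(1,1,1) series by hand with NumPy. The coefficients `phi` and `theta` and the random seed are illustrative assumptions, not values from the jewelry data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
phi, theta = 0.5, 0.3          # illustrative AR(1) and MA(1) coefficients (assumed)
eps = rng.normal(size=n)       # white-noise shocks

# ARMA(1,1): x_t = phi * x_{t-1} + eps_t + theta * eps_{t-1}
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t] + theta * eps[t - 1]

# "Integrated" with d = 1: the observed series is the running sum of x
y = np.cumsum(x)

# First-order differencing undoes the integration and recovers the ARMA part
recovered = np.diff(y)
```

The AR term feeds the previous value back in, the MA term feeds the previous shock back in, and differencing once turns the non-stationary `y` back into the stationary `x`.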

The main steps in conducting time series analysis include:

▪︎ Data Import and Cleaning: Check for missing values and outliers, and handle them. Convert the data into time series format and ensure the time index is correct.

▪︎ Plotting the Time Series: Visualize the data to understand its structure.

▪︎ Differencing and Stationarity Tests: Perform differencing to stabilize the mean of the time series and check for stationarity.

▪︎ ACF and PACF Plots: Plot the autocorrelation and partial autocorrelation functions to identify the order of the AR and MA parts.

▪︎ Model Fitting and Forecasting: Fit the ARIMA model to the data and generate forecasts.

The rest of this article walks through each step, together with the prompt used to carry it out:

Step 1. Data Import and Cleaning

Check for missing values and outliers, and handle them accordingly.

Convert the data into time series format, ensuring the time index is correct.

Prompt:

Read XXX.xlsx, check for missing values and outliers, and handle them accordingly. Save the original file after processing.
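For readers who want to see what such a prompt might translate to, here is a rough pandas sketch. The small in-memory table, the column names `date` and `value`, the 1.5 × IQR outlier rule, and linear interpolation are all assumptions chosen for illustration; the real file would be loaded with `pd.read_excel`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the spreadsheet; in practice: pd.read_excel("XXX.xlsx")
raw = pd.DataFrame({
    "date": ["2022-01-01", "2022-02-01", "2022-03-01",
             "2022-04-01", "2022-05-01", "2022-06-01"],
    "value": [0.4, 0.6, np.nan, 0.5, 25.0, 0.3],
})

# Convert to a time series with a sorted datetime index
df = raw.copy()
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date").sort_index()

n_missing = int(df["value"].isna().sum())

# Flag outliers with the 1.5 * IQR rule (one common convention among several)
q1, q3 = df["value"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["value"] < q1 - 1.5 * iqr) | (df["value"] > q3 + 1.5 * iqr)

# Treat outliers as missing, then fill all gaps by linear interpolation
clean = df["value"].mask(is_outlier).interpolate(method="linear")
```

Whether an extreme value should be smoothed away or kept is a judgment call: in a volatile price series a large move may be real information rather than an error.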

Step 2. Plotting the Time Series

Plot time series graphs to observe data trends and seasonality.

Generate line charts for the main fields (latest value, change percentage, etc.) to visually represent the time series data.

Prompt:

Read Month-on-Month Historical Trends.xlsx, plot a time series graph to observe data trends and seasonality. Generate line charts for the main fields (latest value, change percentage, etc.) to visually represent the time series data. Follow these steps to plot a 3-year change percentage time series graph:

Read and Preprocess Data:

Read the original dataset containing the date and change percentage fields. Ensure that the change percentage data is converted from a string percentage to a float. For example, convert "-29.41%" to -29.41 using float(value.strip('%')).

Filter Data for the Recent 3 Years:

Select data from the past three years from the dataset. Assuming the current year is 2023, filter data starting from 2020.

Plot the Time Series Graph:

Use a plotting library such as Matplotlib or Seaborn to create a time series graph, with the date on the x-axis and change percentage on the y-axis.

Optimize the Chart:

Add a title and axis labels to clarify the chart's content. Ensure the y-axis range is appropriate to highlight the changes in percentage. Use distinct colors and appropriate line styles to enhance the chart's readability. Add data labels or annotations at key data points or significant changes to emphasize important variations.
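The plotting steps in this prompt could look roughly like the sketch below. The sample rows are hypothetical (only the "-29.41%" string comes from the prompt), and the column names `date` and `change_pct` are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows; column names are assumptions
raw = pd.DataFrame({
    "date": ["2019-12-01", "2020-06-01", "2021-06-01", "2022-11-01", "2022-12-01"],
    "change_pct": ["5.00%", "12.50%", "-3.20%", "41.67%", "-29.41%"],
})

df = raw.copy()
df["date"] = pd.to_datetime(df["date"])
# Convert percentage strings such as "-29.41%" to floats
df["change_pct"] = df["change_pct"].map(lambda v: float(v.strip("%")))

# Keep 2020 onward, following the prompt's "current year is 2023" assumption
recent = df[df["date"] >= "2020-01-01"].set_index("date")

fig, ax = plt.subplots(figsize=(9, 4))
ax.plot(recent.index, recent["change_pct"], color="tab:blue", marker="o", linewidth=1.5)
ax.axhline(0, color="gray", linewidth=0.8, linestyle="--")
ax.set_title("Change percentage, 2020 onward")
ax.set_xlabel("Date")
ax.set_ylabel("Change (%)")

# Annotate the largest absolute move
peak = recent["change_pct"].abs().idxmax()
peak_value = recent.loc[peak, "change_pct"]
ax.annotate(f"{peak_value:.2f}%", xy=(peak, peak_value),
            xytext=(0, 8), textcoords="offset points", ha="center")
fig.tight_layout()
# Call fig.savefig("change_pct.png") or plt.show() to view the chart
```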

Step 3. Differencing and Stationarity Testing

Perform differencing on the data to eliminate non-stationarity.

Use the Augmented Dickey-Fuller (ADF) test to verify if the data is stationary.

If the data remains non-stationary after differencing, try differencing of different orders until stationarity is achieved.

Prompt:

Read Month-on-Month Historical Trends.xlsx and perform differencing on the data to eliminate non-stationarity. Use the Augmented Dickey-Fuller (ADF) test to verify if the data is stationary.

If the data remains non-stationary after differencing, try differencing of different orders until stationarity is achieved. Save the processed data results locally as "Differencing and Stationarity Testing".

Preview and display whether the data is stationary, and if not, list the final differencing order attempted.

Step 4. Plotting Autocorrelation and Partial Autocorrelation Function Charts (ACF & PACF)

Calculate and plot ACF and PACF charts to help determine the parameters (p) and (q) of the ARIMA model.

Analyze the cutoff and tailing phenomena in the ACF and PACF charts to select suitable parameters.

Prompt:

Read Month-on-Month Historical Trends.xlsx, then calculate and plot ACF and PACF charts to help determine the parameters (p) and (q) of the ARIMA model.

ACF Chart Analysis:

The ACF chart shows the correlation between the time series and its lagged values. If the ACF cuts off (drops to near zero) after lag q, it usually indicates a Moving Average (MA) component of order q. If the ACF tails off gradually instead, it points to an Autoregressive (AR) component.

PACF Chart Analysis:

The PACF chart shows the direct correlation between the time series and its lagged values, excluding the influence of intermediate lags. If the PACF cuts off after a certain lag, it usually indicates the presence of an AR component. If the PACF tails off, it may indicate the presence of an MA component.

Observed from the Charts:

ACF Chart: There's a significant spike at lag 1, followed by gradual decay. The spike at low lags is consistent with an MA order (q) of 1 or 2, while the gradual decay is the pattern an AR component produces, so the series likely mixes both.

PACF Chart: There's a significant spike at lag 1, then it quickly cuts off to zero, indicating a possible AR component. The cutoff pattern of the PACF suggests a possible AR order (p) of 1.

Conclusion:

Based on the ACF and PACF charts, the possible ARIMA model parameters are p=1 and q=1 or q=2. Further model diagnostics and validation are needed to determine the optimal values of p and q, as well as whether differencing (d) is needed to make the series stationary.

Step 5. Model Fitting

Determine the parameters (p), (d), and (q) of the ARIMA model based on the previous steps. The possible ARIMA model parameters are p=1 and q=1 or q=2.

Fit the ARIMA model using the data and evaluate the model's fitting results and parameter significance.

Prompt:

Read Differencing and Stationarity Testing.xlsx and Month-on-Month Historical Trends.xlsx. The ARIMA model parameters are p=1, q=1 or q=2, d=0.

Fit the data using the ARIMA model and review the model fit and parameter significance. Save the analyzed data locally, named "Model Fit".

[ARIMA Model Fit Plot - Interpretation]

This plot shows the fit of the ARIMA model:

The blue line represents the observed values.

The red line represents the fitted values by the model.

From the plot, it can be seen that the model fits well at certain time points, but at other points, especially during high volatility periods, the fit may not be as good. This may indicate limitations of the model in capturing certain characteristics of the data.

[Data Section - Interpretation] ARIMA Model Parameters and Statistical Test Results

Model Parameters:

ARIMA(1,0,1): This indicates an Autoregressive Integrated Moving Average model with (p = 1), (d = 0), (q = 1).

Log Likelihood: -482.384, used for model comparison; the larger, the better.

AIC (Akaike Information Criterion): 972.768, used for model selection; the smaller, the better.

BIC (Bayesian Information Criterion): 986.674, also used for model selection; the smaller, the better.

HQIC (Hannan-Quinn Information Criterion): 978.372.
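The three information criteria are all derived from the log likelihood, which can be checked with a few lines of arithmetic. The parameter count k = 4 (const, ar.L1, ma.L1, sigma2) follows from the estimates listed below; the sample size n = 239 is an assumption, back-solved from the reported BIC because the article does not state it.

```python
import math

log_likelihood = -482.384
k = 4  # estimated parameters: const, ar.L1, ma.L1, sigma2

# AIC = 2k - 2 ln L
aic = 2 * k - 2 * log_likelihood

# Assumption: n = 239 observations, inferred from the reported BIC, not stated in the source
n = 239

# BIC = k ln(n) - 2 ln L ; HQIC = 2k ln(ln n) - 2 ln L
bic = k * math.log(n) - 2 * log_likelihood
hqic = 2 * k * math.log(math.log(n)) - 2 * log_likelihood

print(round(aic, 3), round(bic, 3), round(hqic, 3))
```

All three criteria share the same fit term and differ only in how heavily they penalize extra parameters, which is why BIC tends to favor smaller models than AIC on longer series.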

Parameter Estimates:

const (Constant term): 0.4290, indicating the constant offset in the model.

ar.L1 (Autoregressive term): 0.3603, indicating the first-order autoregressive coefficient.

ma.L1 (Moving average term): -0.0354, indicating the first-order moving average coefficient.

sigma2 (Variance): 3.3145, indicating the variance of the residuals.

Significance Test:

P>|z|: This is the p-value used to test each parameter's significance; a p-value below 0.05 is considered significant.

The p-value of the constant term is 0.033, which is less than 0.05, hence significant.

The p-value of the autoregressive term is 0.082, slightly above 0.05, so it is not significant at the 5% level, though it would be at the 10% level.

The p-value of the moving average term is 0.868, which is much higher than 0.05, hence not significant.

Diagnostic Test:

Ljung-Box test: Q-statistic is 0.00, p-value is 0.98, indicating no evidence of autocorrelation in the residuals, which is consistent with an adequate fit.

Jarque-Bera test: JB-statistic is 229.70, p-value is 0.00, indicating residuals are not normally distributed.

Heteroskedasticity test: H-statistic is 0.63, p-value is 0.04, indicating the presence of heteroskedasticity.

Conclusion:

The constant term of the model is significant, but the moving average term is not, suggesting that the MA(1) term adds little to the model.

The Ljung-Box test shows no autocorrelation issues, but the Jarque-Bera test and heteroskedasticity test indicate the model residuals are not normally distributed and exhibit heteroskedasticity, potentially affecting predictive performance.

Consider improving the model by adjusting parameters or exploring other models, such as increasing the differencing order (d), or other time series models.

Overall, while this ARIMA model performs well in certain aspects, the issues with residual normality and heteroskedasticity suggest that further tuning and validation are needed.

Step 6. Model Diagnostics

Analyze the residuals and plot them to check that they have zero mean and behave like white noise.

Validate residuals as white noise using the Ljung-Box test or other statistical tests.

If the model does not meet requirements, adjust model parameters or consider alternative models.

Prompt:

Read Model Fit.csv. The ARIMA model parameters are p=1, q=1 or q=2, d=0.

Analyze model residuals, plot residuals to ensure the residual mean is zero and they are white noise.

Validate residuals as white noise using the Ljung-Box test or other statistical tests.

If the model does not meet requirements, adjust model parameters or consider alternative models.

Residual Plot Interpretation:

Residual Distribution: Residuals should be randomly distributed around the zero line (red dashed line) without any obvious patterns or trends. If there are clear patterns in the residual plot, this may indicate that the model has not captured certain information in the data.

Residual Mean: The plot mentions a residual mean of -0.0023, which is very close to zero. This is expected because a good model should balance positive and negative prediction errors.

Ljung-Box Test: The plot also provides the results of the Ljung-Box test to check for autocorrelation in the residual series.

lb_stat: The value of the Ljung-Box statistic is 10.854.

lb_pvalue: The corresponding p-value is 0.58, which is much higher than the commonly used significance level of 0.05. This means there is not enough evidence to reject the null hypothesis that the residual series is white noise, indicating no significant autocorrelation among residuals.

Residual Volatility: The residual plot shows the volatility of the residuals. Ideally, the volatility should be constant, indicating no heteroskedasticity. If residual volatility changes over time, it may suggest poor model fit during certain periods.

Conclusion:

Model Fit: Based on the Ljung-Box test results, there is no significant autocorrelation in the model's residuals, indicating good performance in capturing the data's autocorrelation structure.

Model Parameters: The ARIMA model parameters are (p=1), (q=1) or (q=2), (d=0). This means the model uses a first-order autoregressive term and a first- or second-order moving average term, with no differencing applied because the series was already treated as stationary.

Improvement Suggestions: Although the Ljung-Box test shows no significant autocorrelation in residuals, the residual plot shows larger volatility during certain periods. This may indicate poor model fit during these times, possibly requiring further parameter adjustments or exploring other models.

Overall, this ARIMA model performs well in capturing data autocorrelation but may have room for improvement in handling data volatility.

Step 7. Forecasting

Prompt:

Read Model Fit.csv to ensure the file is loaded successfully and handle exceptions such as file not found or empty content.

Process the date column by converting it to datetime format, print the length of the preprocessed data, and display the first few rows to confirm proper date parsing.

If date formats are inconsistent, use pandas to automatically infer the date format to ensure data integrity.

Verify the date column is intact after preprocessing for data checks.

Fit an ARIMA model with parameters p=1, d=0, q=1 or q=2, and select the appropriate model for fitting.

Use the fitted ARIMA model to forecast the next 10 steps in the time series, obtaining predicted values and confidence intervals. Forecast data beyond 2022.

Check the forecast results and handle potential exceptions that may occur.

Create a chart comparing actual values, predicted values, and confidence intervals. Ensure the chart is clear and accurately displays the forecast results.

Finally, save the time series forecast chart and prediction results table locally, ensuring they are stored in the same Excel file.

Note: The confidence interval area should be clear and prominent, adjust the X-axis to enhance the visual proportion of the confidence interval.

If the closed parameter is passed to pd.date_range() and is unsupported by the installed pandas version, remove it or replace it with inclusive, the parameter that superseded it in newer pandas releases.

The simplest alternative is to omit closed from pd.date_range calls altogether.

Step 8. Interpret and Apply Results

Model Fitting and Prediction Interpretation:

Model Fitting: The ARIMA model (parameters p=1, q=1 or q=2, d=0) was used to fit historical data. The fit can be evaluated by how closely actual values align with model predictions.

Key Figures Summary:

Price Change: Measures the month-to-month change in jewelry prices. For example, a -29.41% change in December 2022 indicates a nearly one-third price drop from the previous month.

Changes over 3, 6, and 12 months: Reflect price changes over different periods. For example, a 220.0% change over the past three months in December 2022 shows significant volatility.

Forecast Results: The model provides future predicted values and confidence intervals. For instance, a prediction for December 31, 2022, of 0.6936 with a confidence interval of [-2.8746, 4.2619] indicates a slight increase with high uncertainty: the interval spans zero, so even the direction of the change is uncertain.
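The width of that interval can be traced back to the residual variance reported in Step 5. The sketch below assumes the quoted value is a one-step-ahead forecast from the same ARIMA(1,0,1) fit and that the interval is a normal-theory 95% interval, in which case the standard error is simply the square root of sigma2.

```python
from statistics import NormalDist

point_forecast = 0.6936
sigma2 = 3.3145                    # residual variance reported in Step 5

# Assumption: one-step-ahead forecast, so the standard error is sqrt(sigma2)
z = NormalDist().inv_cdf(0.975)    # two-sided 95% critical value
half_width = z * sigma2 ** 0.5

lower = point_forecast - half_width
upper = point_forecast + half_width
spans_zero = lower < 0 < upper

print(round(lower, 4), round(upper, 4), spans_zero)
```

Under these assumptions the computed bounds agree with the quoted interval to about three decimal places. Since the Jarque-Bera test in Step 5 rejected normality of the residuals, the nominal 95% coverage should be read as approximate.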

Future Data Trend Interpretation:

Volatility: Jewelry prices show high volatility, influenced by supply-demand, economic conditions, monetary policy, etc.

Trends: Despite volatility, long-term trends may be driven by macroeconomic and industry-specific factors. Economic growth expectations could push jewelry prices higher.

Practical Measures or Strategies:

Risk Management: Use tools like futures or options to hedge price risks.

Market Research: Analyze factors affecting prices, including macroeconomic indicators, industry trends, and consumer preferences, to better understand and predict price changes.

Flexible Pricing Strategies: Adjust pricing based on forecasts to attract consumers and protect profit margins.

Inventory Management: Adjust inventory levels based on forecasts to avoid excessive stock during price falls or increase stock before price rises.

Consumer Education: Educate consumers on the potential value and risks of investing in jewelry to help them make informed buying decisions.

Monitor Macroeconomics: Stay updated on economic indicators and policy changes to understand their impact on jewelry price trends.

Implementing these strategies allows jewelers and investors to better navigate market uncertainty, optimize decisions, and mitigate potential negative impacts.
