Store Sales Prediction in Python - Time Series Machine Learning Project
By NeuralNine
Time Series Forecasting · Machine Learning Model Development · Data Preprocessing · Deep Learning Architectures
Key Concepts
- Time Series Prediction: Forecasting future values based on historical data.
- Temporal Convolutional Neural Network (TCN): A deep learning architecture using 1D convolutional layers for sequence modeling.
- Encoder-Decoder Architecture: A common neural network structure where an encoder processes input and a decoder generates output.
- Kaggle Store Sales Dataset: A benchmark dataset for time series forecasting, involving predicting store sales for different items and stores.
- Pandas Pivot: A data manipulation technique to reshape data from a long format to a wide format.
- StandardScaler: A preprocessing technique from scikit-learn to standardize features by removing the mean and scaling to unit variance.
- PyTorch Tensors: The fundamental data structure in PyTorch for numerical computation, similar to NumPy arrays but with GPU acceleration.
- DataLoader and Dataset: PyTorch utilities for efficient data loading and batching during model training.
- 1D Convolutional Layers: Layers that apply filters to sequential data, capturing local patterns.
- Dilation and Padding: Techniques in convolutional layers to expand the receptive field and maintain temporal dimensions.
- Mean Squared Error (MSE) / Root Mean Squared Error (RMSE): Common loss functions for regression tasks.
- Backpropagation: The process of adjusting model weights based on the calculated loss.
- Optimizer (Adam): An algorithm used to update model weights during training.
- Inference: The process of using a trained model to make predictions on new, unseen data.
- Inverse Transform: Reversing the scaling applied during preprocessing to obtain original data values.
Machine Learning Process for Time Series Prediction
This video details the end-to-end machine learning process for time series prediction, specifically using a Temporal Convolutional Neural Network (TCN) on the Kaggle Store Sales dataset. The goal is to predict store sales for the next 16 days based on the past 120 days of data.
1. Dataset Overview and Setup
- Dataset: Kaggle Store Sales dataset, a permanent competition for time series forecasting.
- Data Files:
  - `train.csv`: contains historical sales data.
  - `test.csv`: contains the structure for prediction, but without sales values.
  - `sample_submission.csv`: shows the required submission format (`id`, `sales`).
- Objective: Predict sales for each store and item category for 16 days.
- Environment Setup:
  - Utilized JupyterLab for interactive development.
  - Recommended `uv` for package management (`pip install uv`, `uv init`, `uv add pandas numpy torch scikit-learn matplotlib jupyterlab`).
  - Installed the necessary libraries: `pandas` (data manipulation), `numpy` (numerical operations), `torch` (PyTorch for deep learning), `scikit-learn` (preprocessing), and `matplotlib` (visualization).
2. Data Exploration and Preprocessing
- Loading Data: `pandas.read_csv('data/train.csv')` loads the training data.
- Data Structure: The raw data has columns such as `id`, `date`, `store_nbr`, `family`, `onpromotion`, and `sales`.
- Problem Formulation: The task requires predicting sales for each unique combination of `store_nbr` and `family` over a future time period, which means treating each combination as a separate time series.
- Data Reshaping (Pivoting):
  - Feature Engineering: A new column `store_family` was created by combining `store_nbr` and `family` using `df.apply(lambda x: f"{x['store_nbr']}_{x['family']}", axis=1)`.
  - Pivoting: `pandas.pivot` was used to transform the data into a wide format with `index='date'`, `columns='store_family'`, and `values='sales'`.
  - This resulted in a DataFrame `df_pivoted` where each row is a date and each column is a unique `store_family` time series (see the sketch after this list).
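A minimal sketch of this reshaping step, assuming the Kaggle file layout described above (columns `store_nbr`, `family`, `sales`):

```python
import pandas as pd

df = pd.read_csv("data/train.csv")

# One identifier per (store, product-family) combination.
df["store_family"] = df.apply(
    lambda x: f"{x['store_nbr']}_{x['family']}", axis=1
)

# Wide format: one row per date, one column per store_family series.
df_pivoted = df.pivot(index="date", columns="store_family", values="sales")
```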
- Data Visualization:
  - Used `matplotlib.pyplot` to plot a sample of time series (8 families × 3 stores) to understand patterns, seasonality, and potential outliers.
  - Observed varying sales patterns across different store-family combinations, including zero sales in some periods.
- Data Scaling:
  - Necessity: Neural networks are sensitive to feature scales, so standardization is crucial.
  - Method: `sklearn.preprocessing.StandardScaler` was used.
  - Train-Validation Split: The pivoted data was split into training (80%) and validation (20%) sets. Crucially, no shuffling was performed, to preserve the temporal order of the data.
  - `scaler.fit_transform(train_data)` fitted the scaler on the training data and transformed it; `scaler.transform(test_data)` transformed the validation data using the statistics fitted on the training data (see the sketch below).
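A sketch of the split and scaling, assuming `df_pivoted` from the previous step; the 80/20 boundary follows the video:

```python
from sklearn.preprocessing import StandardScaler

# Chronological 80/20 split -- no shuffling, to keep temporal order intact.
split = int(len(df_pivoted) * 0.8)
train_data = df_pivoted.iloc[:split]
test_data = df_pivoted.iloc[split:]

scaler = StandardScaler()
train_data_scaled = scaler.fit_transform(train_data)  # fit on training data only
test_data_scaled = scaler.transform(test_data)        # reuse training statistics
```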
- Creating Input/Output Sequences:
  - A helper function `create_xy(data, input_length, output_length)` was defined.
  - It iterates through the scaled data to create sequences of `input_length` (120 days) as input features (X) and `output_length` (16 days) as target values (Y).
  - The iteration range was `len(data) - input_length - output_length + 1`, to ensure enough data for both input and output.
  - The output sequences (Y) were the `output_length` days immediately following each input sequence.
  - The function returned X and Y as NumPy arrays.
  - It was applied to `train_data_scaled` and `test_data_scaled` with `input_length=120` and `output_length=16`, as in the sketch below.
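A sketch of the windowing helper as described; variable names mirror the summary:

```python
import numpy as np

def create_xy(data, input_length, output_length):
    """Slide a window over the series: the past input_length days become X,
    the following output_length days become Y."""
    X, Y = [], []
    for i in range(len(data) - input_length - output_length + 1):
        X.append(data[i : i + input_length])
        Y.append(data[i + input_length : i + input_length + output_length])
    return np.array(X), np.array(Y)

X_train, y_train = create_xy(train_data_scaled, input_length=120, output_length=16)
X_test, y_test = create_xy(test_data_scaled, input_length=120, output_length=16)
```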
- PyTorch Tensor Conversion:
  - Converted the NumPy arrays to PyTorch tensors: `torch.FloatTensor(numpy_array)`.
  - Moved the tensors to the GPU if available: `.to('cuda')`.
- DataLoader Setup:
  - Created `torch.utils.data.TensorDataset` objects for the training and testing data.
  - Created `torch.utils.data.DataLoader` instances with `batch_size=32` for efficient batching.
  - `shuffle=True` for the training loader to introduce randomness during training; `shuffle=False` for the test loader, since shuffling is not needed for inference. Both are shown in the sketch below.
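A sketch of the tensor conversion and loaders, assuming the arrays from `create_xy` above:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"

X_train_tensor = torch.FloatTensor(X_train).to(device)
y_train_tensor = torch.FloatTensor(y_train).to(device)
X_test_tensor = torch.FloatTensor(X_test).to(device)
y_test_tensor = torch.FloatTensor(y_test).to(device)

train_loader = DataLoader(TensorDataset(X_train_tensor, y_train_tensor),
                          batch_size=32, shuffle=True)   # randomize training batches
test_loader = DataLoader(TensorDataset(X_test_tensor, y_test_tensor),
                         batch_size=32, shuffle=False)   # keep order for evaluation
```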
3. Temporal Convolutional Neural Network (TCN) Model
- Architecture: An encoder-decoder structure using 1D convolutional layers.
- Key Components:
  - `nn.Conv1d`: 1D convolutional layers.
    - `in_channels`: number of input features per time step (initially 1782, i.e., 33 families × 54 stores).
    - `out_channels`: number of filters (e.g., 64).
    - `kernel_size`: the size of the sliding window (e.g., 3).
    - `padding`: adds zero-padding to the input to maintain the temporal dimension.
    - `dilation`: spreads out the kernel to increase the receptive field without increasing kernel size or depth; the dilation rate was doubled in each subsequent layer (1, 2, 4).
  - Activation Function: `nn.ReLU` (Rectified Linear Unit) to introduce non-linearity.
  - Cropping: After convolution and activation, the padding was cropped off to restore the original temporal length; the amount cropped matched the padding applied.
  - `nn.Linear` (fully connected layer): a final layer mapping the extracted features to the desired output shape.
    - `in_features`: the number of channels from the last convolutional layer (64).
    - `out_features`: the total number of output values required, i.e., `output_length * number_of_channels` (16 days × 1782 store-family combinations).
  - `view()`: reshaped the output of the linear layer into the format `(batch_size, output_length, number_of_channels)`.
- Forward Pass:
  - The input tensor `x` was transposed to the `(batch_size, channels, sequence_length)` format expected by `nn.Conv1d`.
  - Data was passed sequentially through convolutional layers, activation functions, and cropping.
  - The output of the last convolutional block was passed through the linear layer.
  - The output was reshaped with `view()` into the final prediction format. A sketch of the full module follows.
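A sketch of a TCN module matching this description. The layer sizes follow the summary; feeding only the last time step's 64 features into the linear head is an assumption, since the summary does not say how the temporal dimension is collapsed:

```python
import torch
import torch.nn as nn

class Crop(nn.Module):
    """Crop the trailing padding off a Conv1d output to keep it causal."""
    def __init__(self, amount):
        super().__init__()
        self.amount = amount

    def forward(self, x):
        return x[:, :, :-self.amount]

class TCNModel(nn.Module):
    def __init__(self, num_channels=1782, hidden=64, kernel_size=3,
                 output_length=16):
        super().__init__()
        self.output_length = output_length
        self.num_channels = num_channels
        layers = []
        in_ch = num_channels
        for dilation in (1, 2, 4):  # dilation doubled in each layer
            padding = (kernel_size - 1) * dilation
            layers += [
                nn.Conv1d(in_ch, hidden, kernel_size,
                          padding=padding, dilation=dilation),
                nn.ReLU(),
                Crop(padding),  # crop exactly as much as was padded
            ]
            in_ch = hidden
        self.network = nn.Sequential(*layers)
        self.linear = nn.Linear(hidden, output_length * num_channels)

    def forward(self, x):
        # (batch, seq_len, channels) -> (batch, channels, seq_len) for Conv1d
        x = self.network(x.transpose(1, 2))
        # Assumption: the last time step's features feed the linear head.
        x = self.linear(x[:, :, -1])
        return x.view(-1, self.output_length, self.num_channels)
```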
4. Model Training
- Initialization:
  - Instantiated the `TCNModel`.
  - Defined the optimizer: `torch.optim.Adam` with a learning rate of `0.0001`.
  - Defined the loss function: `nn.MSELoss` (mean squared error); the square root was applied manually to obtain RMSE.
- Training Loop (see the sketch below):
  - Iterated for a fixed number of epochs (e.g., 30) to avoid overfitting.
  - Set the model to training mode: `model.train()`.
  - Iterated through batches from the `train_loader`.
  - Steps within each batch:
    - Zero the gradients: `optimizer.zero_grad()`.
    - Forward pass: `predictions = model(x_batch)`.
    - Calculate the loss: `loss = torch.sqrt(criterion(predictions, y_batch))`.
    - Backward pass: `loss.backward()`.
    - Optimizer step: `optimizer.step()`.
    - Accumulate the epoch loss.
  - Printed the epoch loss every 5 epochs.
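A sketch of the training loop, assuming `TCNModel`, `train_loader`, and `device` from the earlier sketches; the epoch count and learning rate follow the summary:

```python
import torch
import torch.nn as nn

model = TCNModel().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
criterion = nn.MSELoss()

for epoch in range(30):
    model.train()
    epoch_loss = 0.0
    for x_batch, y_batch in train_loader:
        optimizer.zero_grad()
        predictions = model(x_batch)
        # RMSE: take the square root of the MSE criterion manually.
        loss = torch.sqrt(criterion(predictions, y_batch))
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    if (epoch + 1) % 5 == 0:
        print(f"Epoch {epoch + 1}: RMSE {epoch_loss / len(train_loader):.4f}")
```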
- Evaluation on Validation Set:
  - Set the model to evaluation mode: `model.eval()`.
  - Disabled gradient calculation: `with torch.no_grad():`.
  - Made predictions on `X_test_tensor`.
  - Calculated the test loss (RMSE), as in the sketch below.
  - Observation: The test loss was higher than the training loss, potentially due to outliers in the validation data. The speaker noted that a single large outlier in `Y_test` significantly inflated the RMSE, while the median values were similar, suggesting the model might still perform reasonably on Kaggle.
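A sketch of the evaluation step, reusing `model`, `criterion`, and the validation tensors from above:

```python
model.eval()
with torch.no_grad():
    test_predictions = model(X_test_tensor)
    test_rmse = torch.sqrt(criterion(test_predictions, y_test_tensor))
print(f"Validation RMSE: {test_rmse.item():.4f}")
```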
5. Full Dataset Training and Prediction
- Retraining on Full Data:
  - The entire training dataset (without the train-validation split) was used to retrain the model.
  - The `StandardScaler` was refitted on the entire pivoted training data.
  - The `create_xy` function was applied to the scaled full data.
  - Tensors and a DataLoader were created for the full dataset.
  - A new model (`final_model`) was initialized and trained for the same number of epochs using the `full_loader`.
- Generating Test Predictions (see the sketch below):
  - `final_model` was set to evaluation mode (`final_model.eval()`).
  - Crucial Step: The last 120 days of the full scaled training data (`full_data_scaled[-120:]`) were used as input to predict the first 16 days of the test set, because the test set's sales are unknown and the model needs historical context from the training data.
  - The input sequence was unsqueezed to add a batch dimension.
  - Predictions were generated: `predictions = final_model(last_sequence)`.
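A sketch of the inference step, assuming `final_model`, `full_data_scaled`, and `device` from the retraining step:

```python
final_model.eval()
with torch.no_grad():
    # The last 120 training days provide the context for the 16 test days;
    # unsqueeze(0) adds the batch dimension the model expects.
    last_sequence = torch.FloatTensor(full_data_scaled[-120:]).unsqueeze(0).to(device)
    predictions = final_model(last_sequence)
```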
- Post-processing Predictions (see the sketch below):
  - Predictions were moved back to the CPU (`.cpu()`) and converted to NumPy arrays (`.numpy()`).
  - The extra batch dimension was squeezed out.
  - Inverse Scaling: `scaler.inverse_transform(predictions)` converted the scaled predictions back to original sales values.
  - Capping: Negative predictions were clipped to zero with `np.maximum(predictions, 0)`.
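A sketch of the post-processing, continuing from the inference step:

```python
import numpy as np

predictions = predictions.cpu().numpy().squeeze()    # (16, 1782)
predictions = scaler.inverse_transform(predictions)  # back to original sales units
predictions = np.maximum(predictions, 0)             # sales cannot be negative
```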
- Formatting for Submission:
  - The `test.csv` file was loaded into a DataFrame (`test_df`).
  - The `store_family` feature was recreated for the test DataFrame.
  - The unique dates from the test set were extracted.
  - A prediction DataFrame (`prediction_df`) was created with the predicted sales, indexed by date and with columns representing `store_family`.
  - `prediction_df` was "stacked" using `.stack()` and then `.reset_index()` to transform it back into a long format, matching the original training data structure.
  - `test_df` and the long prediction DataFrame were merged on `date` and `store_family` to align predictions with the correct IDs.
  - The final submission DataFrame was created by selecting the `id` and `sales` columns.
  - It was saved to `submission.csv` using `to_csv(index=False)`, as in the sketch below.
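A sketch of the submission formatting, assuming `predictions` and `df_pivoted` from the earlier steps; the lowercase `id` column follows the Kaggle dataset:

```python
import pandas as pd

test_df = pd.read_csv("data/test.csv")
test_df["store_family"] = test_df.apply(
    lambda x: f"{x['store_nbr']}_{x['family']}", axis=1
)

# Wide predictions -> long format that matches the original data layout.
dates = test_df["date"].unique()
prediction_df = pd.DataFrame(predictions, index=dates, columns=df_pivoted.columns)
long_preds = prediction_df.stack().reset_index()
long_preds.columns = ["date", "store_family", "sales"]

# Align each prediction with its submission id, then write the file.
submission = test_df.merge(long_preds, on=["date", "store_family"])
submission[["id", "sales"]].to_csv("submission.csv", index=False)
```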
6. Kaggle Submission and Conclusion
- Submission: The generated `submission.csv` file was uploaded to the Kaggle Store Sales competition.
- Result: Achieved a score of 0.48, placing 94th on the leaderboard.
- Key Takeaway: The presented approach provides a solid baseline for time series prediction using TCNs. The speaker encourages further experimentation with more complex architectures, feature engineering, and hyperparameter tuning to improve the score.
This comprehensive process demonstrates how to handle time series data, build and train a TCN model, and generate predictions for submission on a platform like Kaggle.