Pipeline Overview
Our prediction system follows a rigorous end-to-end pipeline that ensures data quality, feature relevance, model robustness, and prediction validation.
Data Sources & Quality
Quality predictions start with quality data. We aggregate historical and real-time price data from multiple exchanges to ensure accuracy and reliability.
Data Collection
- Hourly OHLCV (Open, High, Low, Close, Volume) data for 20+ cryptocurrencies
- Historical data spanning multiple market cycles for robust training
- Real-time price feeds for timely predictions
- Fundamental data including FOMC decisions and macro indicators
Quality Assurance
- Automated data validation to detect anomalies and gaps
- Cross-exchange verification for price accuracy
- Outlier detection and handling protocols
- Missing data imputation using robust statistical methods
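The checks above can be sketched in a few lines. This is a minimal illustration, not the production validator: the function names, the one-hour bar spacing, the median-absolute-deviation outlier rule, and the 0.5% cross-exchange tolerance are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta
from statistics import median

def find_gaps(timestamps, step=timedelta(hours=1)):
    """Return (previous, next) timestamp pairs that are more than one step apart."""
    return [(a, b) for a, b in zip(timestamps, timestamps[1:]) if b - a > step]

def flag_outliers(closes, threshold=5.0):
    """Return bar indices whose return is far from the median return.

    "Far" means more than `threshold` median absolute deviations (MAD),
    a rule that is itself robust to the outliers it is looking for.
    """
    returns = [closes[i] / closes[i - 1] - 1 for i in range(1, len(closes))]
    med = median(returns)
    mad = median(abs(r - med) for r in returns)
    if mad == 0:
        return []  # no dispersion to measure against
    return [i + 1 for i, r in enumerate(returns) if abs(r - med) > threshold * mad]

def deviating_exchanges(prices_by_exchange, tolerance=0.005):
    """Return exchanges whose price differs from the cross-exchange median
    by more than `tolerance` (as a fraction of the median)."""
    mid = median(prices_by_exchange.values())
    return sorted(
        name for name, price in prices_by_exchange.items()
        if abs(price - mid) / mid > tolerance
    )
```

Note that a single bad tick shows up as two flagged bars in flag_outliers: the jump into the bad price and the jump back out of it.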
Feature Engineering
We engineer over 100 features from raw price data, combining traditional technical indicators with custom-designed predictive features.
Technical Indicators
- Trend: Ichimoku Cloud (Tenkan, Kijun, Senkou Span A/B), Moving Averages (SMA, EMA)
- Momentum: RSI, MACD, Stochastic Oscillator, Williams %R
- Volatility: Bollinger Bands, ATR, Keltner Channels, Standard Deviation
- Volume: OBV, Volume Profile, Volume-Price Trend
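As an illustration of how a few of these indicators are derived from closing prices, here is a standard-library sketch of SMA, EMA, and Wilder-smoothed RSI. The function names and the default 14-bar RSI period are assumptions for the example; the periods actually used in the pipeline are not stated here.

```python
def sma(values, period):
    """Simple moving average; one value per full window."""
    return [
        sum(values[i - period + 1 : i + 1]) / period
        for i in range(period - 1, len(values))
    ]

def ema(values, period):
    """Exponential moving average seeded with the first value."""
    alpha = 2 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def _rsi_value(avg_gain, avg_loss):
    # With no losses in the window the ratio is undefined; 100 is the usual convention.
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

def rsi(closes, period=14):
    """Relative Strength Index with Wilder smoothing.

    Returns one value per bar starting at index `period`.
    """
    deltas = [b - a for a, b in zip(closes, closes[1:])]
    if len(deltas) < period:
        return []
    gains = [max(d, 0.0) for d in deltas]
    losses = [max(-d, 0.0) for d in deltas]
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    out = [_rsi_value(avg_gain, avg_loss)]
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
        out.append(_rsi_value(avg_gain, avg_loss))
    return out
```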
Advanced Features
- Pattern Detection: Volatility squeeze (BB inside Keltner), momentum expansion signals
- Divergence Analysis: RSI divergence, MACD histogram divergence
- Periodic Patterns: Day-of-week effects, hour-of-day seasonality
- Cross-Asset Features: BTC correlation, sector momentum
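The volatility squeeze feature can be sketched as follows. This is a simplified version under stated assumptions: both channels share a simple-moving-average midline (Keltner Channels are often built on an EMA instead), the ATR is a plain mean of true ranges, and the 20-bar period and 2.0 / 1.5 multipliers are common defaults rather than values confirmed by this page.

```python
from statistics import fmean, pstdev

def true_ranges(highs, lows, closes):
    """True range per bar: the largest of high-low and the gaps from the prior close."""
    trs = [highs[0] - lows[0]]
    for i in range(1, len(closes)):
        trs.append(max(
            highs[i] - lows[i],
            abs(highs[i] - closes[i - 1]),
            abs(lows[i] - closes[i - 1]),
        ))
    return trs

def is_squeeze(highs, lows, closes, period=20, bb_mult=2.0, kc_mult=1.5):
    """True when the Bollinger Bands over the last `period` bars lie inside the Keltner Channel."""
    if len(closes) < period:
        raise ValueError("need at least `period` bars")
    window = closes[-period:]
    mid = fmean(window)
    bb_half = bb_mult * pstdev(window)                      # Bollinger half-width
    atr = fmean(true_ranges(highs, lows, closes)[-period:])
    kc_half = kc_mult * atr                                 # Keltner half-width
    return (mid + bb_half < mid + kc_half) and (mid - bb_half > mid - kc_half)
```

Because the midline is shared in this sketch, the test reduces to comparing the two half-widths; with separate midlines both band comparisons matter.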
Feature Selection
Not all features are created equal. We use importance ranking from gradient boosting models and recursive feature elimination to identify the most predictive features, reducing noise and improving generalization.
Model Architecture
We deploy an ensemble of five distinct model architectures, each with unique strengths for capturing different patterns in market data.
XGBoost
Our primary model uses extreme gradient boosting with careful hyperparameter tuning. XGBoost excels at capturing non-linear relationships between features and handles missing values gracefully.
LightGBM
Microsoft's gradient boosting framework trains faster than XGBoost and often produces complementary predictions. It is particularly effective on high-cardinality features.
Deep Neural Network (DNN)
A multi-layer feedforward network with dropout regularization captures complex feature interactions that tree-based models might miss.
LSTM (Long Short-Term Memory)
Our sequence model uses a sliding window of historical data to capture temporal dependencies and momentum patterns in price movements.
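A sliding window turns one long series into many fixed-length training examples. The sketch below shows the idea on a plain list; the function name, the window length, and the one-step horizon are illustrative assumptions, and a real input would carry a feature vector per bar rather than a single number.

```python
def make_windows(series, window, horizon=1):
    """Split a series into (inputs, targets) pairs for a sequence model.

    Each input holds `window` consecutive observations; its target is the
    observation `horizon` steps after the end of that window.
    """
    inputs, targets = [], []
    for start in range(len(series) - window - horizon + 1):
        end = start + window
        inputs.append(series[start:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets
```

For example, make_windows([1, 2, 3, 4, 5, 6], window=3) yields the inputs [1, 2, 3], [2, 3, 4], [3, 4, 5] with targets 4, 5, 6.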
Graph Neural Network (GNN)
An experimental model that learns correlations between assets, capturing how movements in major cryptocurrencies influence altcoins.
Ensemble Strategy
We combine predictions from all five models using a weighted ensemble approach. Weights are dynamically adjusted based on recent performance, giving more influence to models that have been accurate in current market conditions.
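One simple way to realize performance-based weighting is to weight each model by the inverse of its recent error. The sketch below assumes that scheme; the exact weighting rule, the error metric, and the lookback window are not specified on this page, and the model names in the usage note are placeholders.

```python
def performance_weights(recent_errors, eps=1e-9):
    """Turn each model's recent mean absolute error into a normalized weight.

    Lower error gives a higher weight; `eps` guards against division by zero.
    """
    inverse = {name: 1.0 / (err + eps) for name, err in recent_errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}

def ensemble_predict(predictions, weights):
    """Weighted average of the per-model predictions."""
    return sum(weights[name] * pred for name, pred in predictions.items())
```

With recent errors of 1.0 for a hypothetical "xgboost" entry and 3.0 for "lstm", the weights come out near 0.75 and 0.25, so the more accurate model dominates the combined prediction.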
Training & Validation
Rigorous training and validation procedures ensure our models generalize to unseen data and don't overfit to historical patterns.
Walk-Forward Validation
We use time-series cross-validation with a walk-forward approach. Models are trained on historical data and validated on subsequent periods, mimicking real-world deployment conditions.
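The walk-forward scheme can be sketched as an index generator. The function name and parameters are illustrative; the window sizes used in practice are not stated here. Setting expanding=True grows the training window from the start of the data instead of sliding it.

```python
def walk_forward_splits(n_samples, train_size, test_size, step=None, expanding=False):
    """Return (train_indices, test_indices) pairs in chronological order.

    Every test block starts exactly where its training block ends, so a
    model is never validated on data older than what it was trained on.
    """
    step = step or test_size
    splits = []
    train_end = train_size
    while train_end + test_size <= n_samples:
        train_start = 0 if expanding else train_end - train_size
        splits.append((
            range(train_start, train_end),
            range(train_end, train_end + test_size),
        ))
        train_end += step
    return splits
```

For 10 samples with train_size=4 and test_size=2, this produces three folds: train on 0-3 and test on 4-5, train on 2-5 and test on 6-7, train on 4-7 and test on 8-9.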
Train/Validation/Test Split
- Training: Historical data for model fitting, kept strictly earlier than the validation and test periods to avoid data leakage
- Validation: Recent historical data for hyperparameter tuning
- Test: Holdout data never seen during training for final evaluation
Preventing Overfitting
- Early stopping based on validation loss
- L2 regularization in neural networks
- Tree depth limits and minimum samples per leaf in boosting models
- Feature importance pruning to remove noisy predictors
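Early stopping, the first item above, follows a simple rule: stop once validation loss has failed to improve for a set number of epochs, and keep the model from the best epoch. The sketch below shows that rule on a list of losses; the function name, the patience of 3, and the min_delta parameter are assumptions for illustration.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops at the first epoch where the loss has not improved by
    more than `min_delta` for `patience` consecutive epochs. If that never
    happens, stop_epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch
```

Gradient boosting libraries and deep learning frameworks typically ship their own early-stopping hooks; this standalone version only illustrates the decision rule.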
Prediction Validation
Every prediction is timestamped and saved before its target time. Once the prediction window closes, we compare predictions against actual outcomes.
Metrics We Track
- Direction Accuracy: Did we correctly predict up/down movement?
- Mean Absolute Error: Average magnitude of prediction errors
- Hit Rate by Confidence: Accuracy stratified by prediction confidence
- Model-Specific Performance: Individual model accuracy tracking
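The first three metrics can be computed as sketched below. The sketch assumes predictions and outcomes are expressed as signed returns and that confidence is a number between 0 and 1; the bucket boundaries of 0.6 and 0.8 are placeholders, not the thresholds used on the Performance page.

```python
def _is_hit(predicted, actual):
    # A zero return is treated as "not up" on both sides.
    return (predicted > 0) == (actual > 0)

def direction_accuracy(predicted, actual):
    """Fraction of predictions whose sign matches the realized move."""
    hits = [_is_hit(p, a) for p, a in zip(predicted, actual)]
    return sum(hits) / len(hits)

def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and realized values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

def hit_rate_by_confidence(predicted, actual, confidence, bounds=(0.6, 0.8)):
    """Direction accuracy within low / medium / high confidence buckets.

    Buckets with no predictions are omitted from the result.
    """
    buckets = {"low": [], "medium": [], "high": []}
    for p, a, c in zip(predicted, actual, confidence):
        label = "low" if c < bounds[0] else "medium" if c < bounds[1] else "high"
        buckets[label].append(_is_hit(p, a))
    return {label: sum(h) / len(h) for label, h in buckets.items() if h}
```

If confidence is well calibrated, the hit rate should rise from the low bucket to the high bucket.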
Transparency Commitment
All prediction outcomes are published on our Performance page. We believe in full transparency: both successes and failures are documented and analyzed.
Important Disclaimer
Past performance does not guarantee future results. Cryptocurrency markets are highly volatile and unpredictable. Our predictions are probabilistic estimates, not financial advice. Always do your own research and never invest more than you can afford to lose.
Continuous Improvement
Markets evolve, and our models must adapt. We employ a continuous improvement process to maintain prediction quality.
Model Retraining
- Weekly model retraining with latest data
- Regime detection to identify market condition changes
- Automatic performance degradation alerts
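A performance degradation alert can be as simple as a rolling accuracy check against a baseline. The class below is a hypothetical sketch: the name DegradationMonitor, the 100-prediction window, and the baseline and tolerance values are illustrative assumptions, not the production alerting logic.

```python
from collections import deque

class DegradationMonitor:
    """Raise an alert when rolling direction accuracy drops below a floor."""

    def __init__(self, window=100, baseline=0.55, tolerance=0.05):
        self.hits = deque(maxlen=window)   # most recent hit/miss outcomes
        self.floor = baseline - tolerance  # accuracy below this triggers an alert

    def record(self, hit):
        """Store one outcome and return True if an alert should fire.

        No alert fires until the window is full, to avoid reacting to
        a handful of early misses.
        """
        self.hits.append(bool(hit))
        if len(self.hits) < self.hits.maxlen:
            return False
        return sum(self.hits) / len(self.hits) < self.floor
```

An alert from a monitor like this could be used to trigger an off-schedule retrain or to reduce the affected model's ensemble weight.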
Research & Development
- Ongoing feature engineering experimentation
- New model architecture testing (transformers, attention mechanisms)
- Alternative data source integration (sentiment, on-chain metrics)